Process Images with Azure Computer Vision

Name: Process Images with Azure Computer Vision
Availability: OnlineOnly
Author: LlamaIndex

Azure Computer Vision Tool connects AI agents to Azure's Computer Vision API for image captioning, object detection, tagging, and OCR on image URLs.

Works with azure cv, openai

Updated 3 months ago

Version 1.0.0

Models

gpt 4o, gpt 4, llama 3

Why it matters

Give agents access to Azure Computer Vision so they can extract captions, tags, detected objects, and text from images referenced by URL.

Outcomes

What it gets done

Caption images using Azure Computer Vision.

Extract tags and objects from images.

Perform Optical Character Recognition (OCR) on images.

Integrate Azure Computer Vision into agent workflows.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-tool-tools-azure-cv | bash

Capabilities

What this skill does

Classify

Labels or categorizes text, files, or data points.

Extract

Pulls structured data fields from unstructured text.

Transcribe

Converts audio or video speech to written text.

Summarize

Condenses long documents or threads into key takeaways.

Overview

Azure Computer Vision Tool

What it does

Azure Computer Vision Tool connects LlamaIndex agents to Azure's Computer Vision API. It provides a `process_image` function that performs computer vision tasks including image captioning, object classification, tag generation, and optical character recognition (OCR) on image URLs. The tool wraps Azure Cognitive Services as a ToolSpec that agents can invoke through natural language requests.
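As a rough sketch of what such a wrapper might send, the snippet below builds (but does not send) an image analysis request. The endpoint pattern, the `2023-04-01-preview` API version, the feature keys (`caption`, `tags`, `objects`, `read`), and the helper name `build_analyze_request` are assumptions made for illustration and are not confirmed by this page; check the Azure and LlamaIndex documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed feature keys; hypothetical, not confirmed by this page.
ALLOWED_FEATURES = ("caption", "tags", "objects", "read")


def build_analyze_request(resource, api_key, image_url, features,
                          language="en", api_version="2023-04-01-preview"):
    """Return (url, headers, body) for an image analysis call without sending it."""
    unknown = [f for f in features if f not in ALLOWED_FEATURES]
    if not features or unknown:
        raise ValueError(f"features must be a non-empty subset of {ALLOWED_FEATURES}")
    # Keep commas unescaped so the feature list stays readable in the query string.
    query = urlencode(
        {
            "features": ",".join(features),
            "language": language,
            "api-version": api_version,
        },
        safe=",",
    )
    # Assumed endpoint pattern for an Azure Computer Vision resource.
    url = (f"https://{resource}.cognitiveservices.azure.com"
           f"/computervision/imageanalysis:analyze?{query}")
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    body = {"url": image_url}
    return url, headers, body
```

Separating request construction from the network call makes the shape of the request easy to inspect and test without an Azure account.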

How it connects

Use this tool when your AI agent needs to analyze images as part of its workflow: captioning photos, extracting text from screenshots, identifying objects in product images, or generating descriptive tags. It's designed for scenarios where you already use Azure infrastructure and want to add vision capabilities to LlamaIndex agents without building custom API integrations.

Source README

Azure Computer Vision Tool

This tool connects to an Azure account and allows an Agent to perform a variety of computer vision tasks on image URLs.

You will need to set up an API key and a Computer Vision instance in Azure. Learn more here: https://azure.microsoft.com/en-ca/products/cognitive-services/computer-vision
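One common way to keep the key out of source code is to read it from environment variables. The variable names `AZURE_CV_API_KEY` and `AZURE_CV_RESOURCE` and the helper `load_azure_cv_settings` below are hypothetical, chosen for this sketch; they are not defined by Azure or LlamaIndex.

```python
import os


def load_azure_cv_settings(env=None):
    """Read the API key and resource name from an environment mapping."""
    env = os.environ if env is None else env
    settings = {
        # Hypothetical variable names; pick whatever fits your deployment.
        "api_key": env.get("AZURE_CV_API_KEY", "").strip(),
        "resource": env.get("AZURE_CV_RESOURCE", "").strip(),
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError(
            f"Missing Azure Computer Vision settings: {', '.join(missing)}"
        )
    return settings
```

The returned dictionary uses the same `api_key` and `resource` names as the `AzureCVToolSpec` constructor in the usage example, so it could be passed on as keyword arguments.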

Usage

A more extensive usage example is documented in a Jupyter notebook.

Here's an example usage of the AzureCVToolSpec.

import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.azure_cv import AzureCVToolSpec

# Replace the placeholders with your Azure API key and resource name.
tool_spec = AzureCVToolSpec(api_key="your-key", resource="your-resource")

agent = FunctionAgent(
    tools=tool_spec.to_tool_list(), llm=OpenAI(model="gpt-4.1")
)


async def main():
    # Top-level await only works in a notebook; a script needs an event loop.
    print(
        await agent.run(
            "caption this image and tell me what tags are in it https://portal.vision.cognitive.azure.com/dist/assets/ImageCaptioningSample1-bbe41ac5.png"
        )
    )
    print(
        await agent.run(
            "caption this image and read any text https://portal.vision.cognitive.azure.com/dist/assets/OCR3-4782f088.jpg"
        )
    )


asyncio.run(main())

process_image: Send an image for computer vision classification of objects, tags, captioning or OCR.
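To illustrate what an agent might do with the result of `process_image`, the sketch below pulls the caption, tag names, and OCR text out of a response dictionary. The field names (`captionResult`, `tagsResult`, `readResult`) are assumed from the Image Analysis preview response format and are not confirmed by this page; the helper `summarize_analysis` is hypothetical and the sample payload is made up for illustration, not real API output.

```python
def summarize_analysis(result):
    """Collect caption, tag names, and OCR text from an analysis result dict."""
    caption = (result.get("captionResult") or {}).get("text")
    tag_values = (result.get("tagsResult") or {}).get("values", [])
    tags = [tag["name"] for tag in tag_values if "name" in tag]
    read = result.get("readResult")
    # Assumption: readResult is either a plain string or a dict with "content".
    if isinstance(read, dict):
        read = read.get("content")
    return {"caption": caption, "tags": tags, "text": read or None}


# Made-up sample payload, for illustration only.
sample = {
    "captionResult": {"text": "a dog on a beach", "confidence": 0.9},
    "tagsResult": {
        "values": [
            {"name": "dog", "confidence": 0.99},
            {"name": "beach", "confidence": 0.95},
        ]
    },
    "readResult": {"content": "NO SWIMMING"},
}
print(summarize_analysis(sample))
```

Missing sections simply come back as `None` or an empty list, so the same helper works whichever features were requested.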

This loader is designed to load data as a Tool in an Agent.
