NVIDIA Vision

NVIDIA Vision MCP Connector for Claude

Generate images, analyze visuals, detect objects, and caption images via NVIDIA Vision APIs.

9 tools | Official | Updated Jun 28, 2026 | Official Vinkius Partner

Connect NVIDIA Vision to any AI agent and unlock powerful image understanding and generation — create images with Stable Diffusion, analyze visuals with Kosmos-2, answer questions about images, and perform object detection through natural conversation.

What you can do

  • Generate Images — Create images from text prompts using Stable Diffusion models
  • Visual Q&A — Ask questions about any image and get detailed answers
  • Image Captioning — Generate detailed descriptions of image contents
  • Object Detection — Identify and list all objects visible in an image
  • Document Understanding — Extract information from scanned documents and forms
  • Visual Grounding — Locate specific objects or phrases within images
  • Style Transfer — Apply artistic styles to existing images
  • Image Segmentation — Segment images into distinct object regions

How it works

  1. Subscribe to this server
  2. Enter your NVIDIA API Key (from build.nvidia.com)
  3. Start analyzing and generating images from Claude, Cursor, or any MCP-compatible client

Who is this for?

  • Designers — Generate concepts and analyze visual compositions quickly
  • Developers — Integrate image understanding into apps without managing GPU infrastructure
  • Content Creators — Generate images and apply style transfers for social media

Tags: computer-vision, image-generation, object-detection, visual-qa, image-captioning, generative-ai

9 tools expose this connector's capabilities to your AI agent.
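
For readers curious what an agent sends under the hood: MCP clients talk to servers with JSON-RPC 2.0 messages, using the `tools/list` method to discover tools and `tools/call` to run one. The sketch below builds such messages in Python. The helper names and the `image_url` argument key are illustrative assumptions, not taken from this connector's schema; the real argument names come from each tool's input schema as returned by `tools/list`.

```python
import json
from itertools import count

# Hypothetical helper state: monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request dict of the kind MCP clients send."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def list_tools_request():
    """Ask the server which tools it exposes."""
    return make_request("tools/list")


def call_tool_request(name, arguments):
    """Invoke one tool by name with a dict of arguments."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # "image_url" is an assumed argument name; check the tool's input schema.
    request = call_tool_request(
        "image_captioning", {"image_url": "https://example.com/photo.jpg"}
    )
    print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude or Cursor builds and sends these messages for you; the sketch only illustrates their shape.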

image_captioning

Generate a detailed caption for an image

detect_objects

Detect and list all objects in an image

document_qa

Ask questions about a document image (OCR + understanding). Works with scanned documents, forms, and receipts.

generate_image

Generate an image from a text prompt using Stable Diffusion. Model options: "stabilityai/stable-diffusion-3-medium", "stabilityai/stable-diffusion-xl-base-1.0". Size format: "WIDTHxHEIGHT", for example "1024x1024".
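
As a rough illustration of the options above, the following Python sketch checks a model name and size string before they are handed to `generate_image`. The two model identifiers and the size format come from the tool description; the argument keys `prompt`, `model`, and `size`, the function name, and the choice of default model are assumptions made for the example.

```python
import re

# Model identifiers quoted in the tool description.
KNOWN_MODELS = (
    "stabilityai/stable-diffusion-3-medium",
    "stabilityai/stable-diffusion-xl-base-1.0",
)
# Matches strings such as "1024x1024" (width, the letter x, height).
SIZE_PATTERN = re.compile(r"(\d+)x(\d+)")


def build_generate_image_args(prompt, model=KNOWN_MODELS[0], size="1024x1024"):
    """Validate inputs and return an arguments dict for generate_image.

    The keys "prompt", "model", and "size" are assumed names, and the
    default model is an arbitrary pick from the documented options.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    match = SIZE_PATTERN.fullmatch(size)
    if match is None or int(match.group(1)) == 0 or int(match.group(2)) == 0:
        raise ValueError(f"size must look like '1024x1024', got {size!r}")
    return {"prompt": prompt.strip(), "model": model, "size": size}
```

Checking locally gives a clearer error than a failed remote call would, though the server remains the authority on which models and sizes it accepts.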

visual_grounding

Locate a specific object or phrase in an image

image_segmentation

Segment and identify all objects in an image

style_transfer

Apply an artistic style to an image

list_vision_models

List available vision models on NVIDIA API Catalog

visual_question_answering

Ask a question about an image. Provide a public image URL.
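
Because this tool expects a public image URL, a quick syntactic check can catch local paths or `localhost` links before a request is made. The Python sketch below is one way to do that; the function names and the argument keys `image_url` and `question` are hypothetical, and the check cannot prove that the image is actually reachable from the server.

```python
from urllib.parse import urlparse


def is_public_looking_url(url):
    """Return True if url is an absolute http(s) URL with a non-local host.

    This is only a syntactic check; it does not fetch the URL.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname or ""
    if not host or host == "localhost" or host.startswith("127."):
        return False
    return True


def build_vqa_args(image_url, question):
    """Return an arguments dict; "image_url" and "question" are assumed keys."""
    if not is_public_looking_url(image_url):
        raise ValueError(f"not a public http(s) URL: {image_url!r}")
    if not question.strip():
        raise ValueError("question must not be empty")
    return {"image_url": image_url, "question": question.strip()}
```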

See how to talk to your AI agent using NVIDIA Vision.

Generate an image of a futuristic city at sunset.

Image generated successfully! Base64 data available for display.
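
If you want to save the returned image yourself, the base64 text has to be decoded back into bytes first. The following Python sketch shows one way to do so and to guess the format from the file signature. How the connector actually packages its response is not documented here, so the input string, the optional data-URL prefix handling, and the function name are assumptions.

```python
import base64
import binascii

# Leading bytes ("magic numbers") of the two most likely formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def decode_image(b64_data):
    """Decode base64 text into raw bytes and guess the image format."""
    # Tolerate a data-URL prefix such as "data:image/png;base64,".
    if b64_data.startswith("data:") and "," in b64_data:
        b64_data = b64_data.split(",", 1)[1]
    try:
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if raw.startswith(PNG_SIGNATURE):
        return raw, "png"
    if raw.startswith(JPEG_SIGNATURE):
        return raw, "jpeg"
    return raw, "unknown"
```

The decoded bytes can then be written to disk, for example with `pathlib.Path("city.png").write_bytes(raw)`.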

What objects do you see in this image: https://example.com/photo.jpg

I detect: 1. A red car (center). 2. A tree (left). 3. A building (background). 4. Two people walking (right).

Describe this image in detail: https://example.com/document.png

The image shows a business document dated March 2026. It contains a table with revenue figures totaling $2.4M.

Yes! Use the `generate_image` tool with Stable Diffusion models. Provide a descriptive prompt and optionally specify size (e.g., '1024x1024').
