Which models are available for chat completions?

You can use `create_chat_completion` with models like Meta-Llama-3.3-70B-Instruct, DeepSeek-V3.1, and MiniMax-M2.5 for high-speed text generation.

Can I generate embeddings for my RAG pipeline?

Yes! Use the `create_embedding` tool with models like E5-Mistral-7B-Instruct to create vectorized representations of your text data.

What is the difference between create_chat_completion and create_response?

`create_chat_completion` follows the standard OpenAI chat format, while `create_response` is a stateless API designed specifically for agentic workflows, returning typed output items.

SambaNova (AI Inference) MCP Connector for Claude

A+

High-speed AI inference for Llama 3, DeepSeek, and MiniMax models via SambaNova's ultra-fast SN40L chips.

3 tools Official Updated Jun 28, 2026 Official Vinkius Partner

More Details Connect to Claude

Connect to SambaNova Cloud to run the world's fastest open-source models directly from your AI agent. Leverage the power of SambaNova's DataScale and SN40L infrastructure to achieve record-breaking tokens-per-second.

What you can do

Chat Completions — Generate high-quality responses using state-of-the-art models like Meta-Llama-3.3-70B-Instruct and DeepSeek-V3.1.
Agentic Responses — Use the specialized create_response tool for stateless, typed outputs designed specifically for agentic workflows.
Vector Embeddings — Generate high-dimensional text representations for RAG (Retrieval-Augmented Generation) using E5-Mistral-7B-Instruct.
Advanced Sampling — Fine-tune outputs with temperature, top_p, top_k, and seed parameters for deterministic and creative results.

How it works

Subscribe to this server
Enter your SambaNova Cloud API Key
Start querying high-performance models from Claude, Cursor, or any MCP-compatible client

Who is this for?

AI Engineers — building real-time applications that require low-latency inference and high throughput.
Developers — looking for a cost-effective and faster alternative to standard LLM providers.
Data Scientists — generating embeddings for large-scale knowledge bases at scale.

llm-inferencellama3deepseekembeddingshigh-performance-computing

Connect to Claude

Subscribe on Vinkius, then add this connector to Claude.ai or Claude Code.

① Claude.ai (web app)

Go to Settings → Connectors → Add custom connector
Paste the MCP endpoint URL below

https://edge.vinkius.com/vk_preview_z52tsMdkFaoCTCBeg6u7bbxNhhnjcT0r5KXU6hjj/mcp

② Claude Code (terminal)

claude mcp add --transport http sambanova-ai-inference https://edge.vinkius.com/vk_preview_z52tsMdkFaoCTCBeg6u7bbxNhhnjcT0r5KXU6hjj/mcp

③ Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "sambanova-ai-inference": {
      "url": "https://edge.vinkius.com/vk_preview_z52tsMdkFaoCTCBeg6u7bbxNhhnjcT0r5KXU6hjj/mcp"
    }
  }
}

Get full access on Vinkius

The preview token above works for testing. Powered by Vinkius.

Details

Tools: 3
Grade: A+
Score: 100/100
Updated: Jun 28, 2026

Related Connectors

Traefik Proxy MCP

18 tools Official

Monitor and manage your Traefik Proxy infrastructure — inspect routers, services, and middlewares directly from your AI agent.

A+ View details →

Spotio MCP

12 tools Official

Manage leads, pipelines, and field sales activities on Spotio with AI agents.

A+ View details →

NOAA Aviation — Airport Weather Intelligence MCP

5 tools Official

Aviation weather data worldwide: METARs (current airport conditions), TAFs (24-hour airport forecasts), PIREPs (pilot reports of turbulence and icing), and SIGMETs/AIRMETs (significant aviation hazards) from the Aviation Weather Center.

A+ View details →

Bright Pattern MCP

10 tools Official

Orchestrate your contact center via Bright Pattern — manage users, track interactions, and monitor real-time stats directly from any AI agent.

A+ View details →