Predibase (LLM Serving & Finetuning)

Predibase (LLM Serving & Finetuning) MCP Connector for Claude

D

Deploy and query fine-tuned LLMs via Predibase — run inference, classify text, and monitor deployment metrics directly from your AI agent.

7 tools Official Updated Jun 28, 2026 Official Vinkius Partner

Connect your Predibase account to any AI agent to manage high-performance LLM serving and fine-tuning workflows. Predibase provides a unified interface for serverless LLM deployment and LoRA adapter management.

What you can do

  • LLM Inference — Generate text or chat completions using generate_text, chat_completion, and completion tools.
  • Fine-tuning Integration — Dynamically apply LoRA adapters during inference using the adapter_id parameter in generation tasks.
  • Text Classification — Perform batch classification tasks with the classify tool for structured data workflows.
  • Deployment Monitoring — Check the status of your endpoints with get_health, get_info, and get_metrics.
  • Structured Output — Enforce JSON schemas on model responses for reliable downstream automation.

How it works

  1. Subscribe to this server
  2. Provide your Predibase API Token and Tenant ID
  3. Start querying your deployments from Claude, Cursor, or any MCP client

Who is this for?

  • AI Engineers — deploy and test fine-tuned models without leaving the chat interface
  • Data Scientists — monitor inference metrics and health of production deployments
  • Developers — integrate high-performance LLM capabilities into apps with structured JSON output
llm-servingfine-tuninginferencemachine-learningai-ops

7 tools expose this connector's capabilities to your AI agent.

chat_completion

Create a chat completion (OpenAI compatible)

classify

Batch classification for one or more inputs

completion

Create a completion (OpenAI compatible)

generate_text

Generate text using a deployed LLM

get_health

Check health status of the inference endpoint

get_info

Get inference endpoint metadata

get_metrics

Get Prometheus metrics for the deployment

See how to talk to your AI agent using Predibase (LLM Serving & Finetuning).

Generate a summary of this text using the 'llama-3-70b' deployment.

I'll use the `generate_text` tool on your 'llama-3-70b' deployment. Processing the input prompt now...

Check the health and metrics for my 'customer-support-llm' deployment.

I am calling `get_health` and `get_metrics` for 'customer-support-llm'. The endpoint is currently healthy and processing 12 requests per minute.

Classify these three reviews using our sentiment model deployment.

Running the `classify` tool for your inputs. Results: Review 1 (Positive), Review 2 (Negative), Review 3 (Neutral).

Yes. When using the `generate_text` tool, you can provide an `adapter_id` to apply your specific fine-tuned LoRA adapter to the base model deployment.

Related Connectors