Cartesia (Voice AI) MCP Connector for Claude

A+

Generate lifelike AI voices, clone speech, and transcribe audio with Cartesia's state-of-the-art Sonic models directly from your AI agent.

20 tools | Official | Updated Jun 28, 2026 | Official Vinkius Partner

Connect Cartesia to your AI agent to unlock high-performance voice synthesis and speech recognition. Cartesia's Sonic models provide industry-leading latency and quality for real-time applications.

What you can do

  • Text-to-Speech (TTS) — Generate high-fidelity audio bytes or stream via SSE using models like Sonic 3.5 and Sonic 3.
  • Speech-to-Text (STT) — Transcribe audio files into text using the Ink Whisper model with multi-language support.
  • Voice Cloning — Create custom voice models from as little as 5 seconds of audio input.
  • Voice Management — List, retrieve, and update voices, or use the Voice Changer to transform existing audio.
  • Pronunciation Control — Manage custom pronunciation dictionaries for specialized terminology or accents.
  • Agent Orchestration — List and manage AI agents and monitor call logs and usage credits.

How it works

  1. Subscribe to this server
  2. Enter your Cartesia API Key
  3. Start generating audio or transcribing speech from Claude, Cursor, or any MCP-compatible client

Who is this for?

  • Developers — integrate real-time voice synthesis into applications without managing complex infrastructure.
  • Content Creators — automate voiceovers and audio localization using high-quality cloned voices.
  • Product Teams — build conversational AI agents that sound human and respond with sub-second latency.

Tags: text-to-speech, speech-to-text, voice-synthesis, low-latency, ai-voice, audio-streaming

20 tools expose this connector's capabilities to your AI agent.

get_voice

Get details for a specific voice

list_agent_calls

List calls and transcripts for a specific agent

update_voice

Update voice metadata

clone_voice

Clone a voice from a 5s audio clip

create_pronunciation_dict

Create a new pronunciation dictionary

delete_pronunciation_dict

Delete a pronunciation dictionary

delete_voice

Delete a voice

generate_access_token

Generate a short-lived access token for client-side requests

get_agent

Get details for a specific voice agent

get_usage_credits

Get credit usage statistics

infill_bytes

Generate audio to smoothly connect two existing segments

list_agents

List all voice agents

list_pronunciation_dicts

List pronunciation dictionaries

list_voices

List available voices

localize_voice

Adapt a voice to a new language/dialect

stt_batch

Transcribe audio file to text (Batch STT)

tts_bytes

Generate text-to-speech audio bytes

tts_sse

Generate text-to-speech via Server-Sent Events

update_pronunciation_dict

Update a pronunciation dictionary

voice_changer_bytes

Change voice of an audio clip while preserving intonation
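
For readers who want to see what a call to one of these tools might look like on the wire, here is a minimal sketch. It assumes the client speaks standard MCP, where a tool is invoked with a JSON-RPC 2.0 `tools/call` request. The helper name `build_tool_call` is hypothetical, and the sketch only builds the request body; it does not contact any server. The tool names are copied from the list above, and `list_voices` is called with no arguments because this page does not document any.

```python
import json

# Tool names copied from the connector's tool list.
CARTESIA_TOOLS = {
    "get_voice", "list_agent_calls", "update_voice", "clone_voice",
    "create_pronunciation_dict", "delete_pronunciation_dict", "delete_voice",
    "generate_access_token", "get_agent", "get_usage_credits", "infill_bytes",
    "list_agents", "list_pronunciation_dicts", "list_voices", "localize_voice",
    "stt_batch", "tts_bytes", "tts_sse", "update_pronunciation_dict",
    "voice_changer_bytes",
}


def build_tool_call(name, arguments=None, request_id=1):
    """Build an MCP `tools/call` request (JSON-RPC 2.0) as a plain dict.

    Hypothetical helper: it rejects names that are not in the tool list
    above, but it cannot validate arguments, since their schemas are not
    given on this page.
    """
    if name not in CARTESIA_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Build (but do not send) a request that lists the available voices.
request = build_tool_call("list_voices")
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude or Cursor builds these requests for you; the sketch is only meant to show how a tool name from the list maps onto a request.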

Example prompts you can give your AI agent when using Cartesia (Voice AI), each followed by a sample response.

List all available voices in my Cartesia account.

I've retrieved the list of voices. You have access to several voices, including 'Baritone' (ID: 79a045e3...), 'British Lady' (ID: 123x-456y...), and your custom cloned voices. Which one would you like to use?

Generate a WAV audio file saying 'Welcome to the future of AI' using voice ID 79a045e3-a621-4923-b05c-8029db0dffca.

Generating audio... I've synthesized the text using the Sonic-3.5 model. The WAV file is ready for download.

Check my current usage credits on Cartesia.

You currently have 45,200 credits remaining in your account. Your last billing cycle refreshed on the 1st of the month.

With the `tts_bytes` tool, you can set `output_format_container` to 'mp3', 'wav', or 'raw', and configure the sample rate and encoding to match your needs.
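
As a rough illustration of the format options for `tts_bytes`, the sketch below assembles an arguments dictionary and checks the container value before it is sent. Only `output_format_container` and its three allowed values come from this page. The other argument names (`transcript`, `voice_id`, `output_format_sample_rate`), the 44100 Hz default, and the helper `tts_bytes_arguments` are assumptions made for the example, so check the tool's actual input schema before relying on them.

```python
# Container values named on this page for `output_format_container`.
ALLOWED_CONTAINERS = ("mp3", "wav", "raw")


def tts_bytes_arguments(text, voice_id, container="wav", sample_rate=44100):
    """Assemble arguments for a `tts_bytes` call (hypothetical helper).

    Raises ValueError if the container is not one of the three values
    listed for `output_format_container`.
    """
    if container not in ALLOWED_CONTAINERS:
        raise ValueError(
            f"container must be one of {ALLOWED_CONTAINERS}, got {container!r}"
        )
    return {
        "transcript": text,                        # assumed argument name
        "voice_id": voice_id,                      # assumed argument name
        "output_format_container": container,      # named on this page
        "output_format_sample_rate": sample_rate,  # assumed argument name
    }


# Mirrors the WAV example prompt shown earlier on this page.
args = tts_bytes_arguments(
    "Welcome to the future of AI",
    "79a045e3-a621-4923-b05c-8029db0dffca",
    container="wav",
)
print(args)
```

Validating the container on the client side is a design choice for the sketch: it surfaces a typo such as 'ogg' before a request is made.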
