Volcengine Speech Synthesis

Volcengine Speech Synthesis MCP Connector for Claude


The TTS API behind the 'TikTok Voice': generate natural speech with ByteDance's voice models.

5 tools · Official · Updated Jun 28, 2026 · Official Vinkius Partner

Connect Volcengine Speech Synthesis (ByteDance's TTS platform) to any AI agent and generate natural speech, including the well-known TikTok voices, through natural conversation.

What you can do

  • Text-to-Speech — Convert any text to natural-sounding speech
  • TikTok Voices — Use the exact voice models behind TikTok's viral TTS effects
  • Multi-Language — Synthesize in Chinese, English, Japanese, and more
  • SSML Support — Fine-grained control with pauses, emphasis, and prosody
  • Long-Form Audio — Synthesize articles, audiobooks, and lengthy documents
  • Custom Voices — Train personalized voice models from audio samples
  • Speed/Volume Control — Adjust speech rate and volume dynamically

How it works

  1. Subscribe to this server
  2. Enter your Volcengine Access Key and Secret Key
  3. Start generating speech from Claude, Cursor, or any MCP client

Who is this for?

  • Content Creators — Generate voiceovers for videos, reels, and TikToks
  • Accessibility Teams — Add speech output to apps and websites
  • Audiobook Producers — Convert long-form text to natural narration
  • Developers — Integrate TikTok-quality TTS into applications

Tags: volcengine, tiktok-voice, tts, speech, text-to-speech, ai-audio

5 tools expose this connector's capabilities to your AI agent.

get_audio_formats

List supported audio output formats. Use MP3 for web delivery, WAV for editing, OGG Opus for efficient streaming, or PCM for raw processing.

list_voices

List all available TTS voice models, including the TikTok voice styles. Essential for choosing the right voice before synthesis.

synthesize_long_text

Synthesize speech from text that exceeds the standard 1024-character limit. Ideal for articles, audiobooks, and lengthy documentation.
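
How the connector splits long input is not documented here. The following is only a minimal sketch of one way to pre-split text so that no piece exceeds 1024 characters. It assumes the limit counts characters as Python `str` length, and the names `split_text`, `MAX_CHARS`, and `_SENTENCE` are hypothetical helpers, not part of the connector.

```python
import re

MAX_CHARS = 1024  # standard limit named in the tool description

# A "sentence" ends at Latin punctuation followed by whitespace,
# at CJK punctuation, or at the end of the text.
_SENTENCE = re.compile(r".+?(?:[.!?]\s+|[。！？]|\Z)", re.S)


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible. A single
    sentence longer than `limit` is hard-cut. Concatenating the
    chunks reproduces the input exactly.
    """
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE.findall(text):
        # Hard-cut oversized sentences into limit-sized pieces.
        for start in range(0, len(sentence), limit):
            piece = sentence[start:start + limit]
            if len(current) + len(piece) <= limit:
                current += piece
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be sent as a separate synthesis request, with the resulting audio segments joined in order.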

synthesize_ssml

Convert SSML (Speech Synthesis Markup Language) to speech. Use SSML tags such as <break>, <emphasis>, and <prosody> for natural-sounding output with precise timing and intonation control.
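
As an illustration of the three tags named above, this sketch builds a small SSML document and checks that it is well-formed before it would be passed to the tool. The attribute values (`time="500ms"`, `level="strong"`, `rate="slow"`) come from the W3C SSML specification. Which attributes Volcengine actually honors is an assumption to verify, and `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(intro: str, key_phrase: str, outro: str) -> str:
    """Return an SSML document using <break>, <emphasis>, and <prosody>."""
    return (
        "<speak>"
        f"{escape(intro)}"
        '<break time="500ms"/>'  # half-second pause after the intro
        f'<emphasis level="strong">{escape(key_phrase)}</emphasis>'
        f'<prosody rate="slow">{escape(outro)}</prosody>'
        "</speak>"
    )


ssml = build_ssml(
    "Welcome to my video!",
    "Don't forget to subscribe.",
    "See you next time.",
)
ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
print(ssml)
```

Escaping the text segments keeps characters such as `&` and `<` from breaking the markup.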

synthesize_speech

Convert text to speech using Volcengine TTS. Supports multiple languages (Chinese, English, Japanese), various voice styles (female, male, child, trendy, news), and adjustable speed and volume. Returns audio data or a URL. Ideal for narration, accessibility, multi-language content, and the TikTok voice effects.
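
MCP clients normally build tool calls for you, but it can help to see roughly what one looks like on the wire. The sketch below serializes a `tools/call` request in the JSON-RPC 2.0 shape defined by the Model Context Protocol. The argument names (`text`, `voice`, `format`, `speed`, `volume`) are hypothetical, since this page does not list the tool's input schema; the voice ID is the one shown in the sample response on this page.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        },
        ensure_ascii=False,  # keep non-ASCII text (e.g. Chinese) readable
    )


request = build_tool_call(
    1,
    "synthesize_speech",
    {
        # Hypothetical argument names: check the tool's real input schema.
        "text": "Welcome to my video!",
        "voice": "BV033_streaming",
        "format": "mp3",
        "speed": 1.0,
        "volume": 1.0,
    },
)
print(request)
```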

See how to talk to your AI agent using Volcengine Speech Synthesis.

Generate speech with the TikTok trendy female voice: 'Welcome to my video!'

🔊 Speech synthesized successfully! Using BV033_streaming (TikTok Trendy Female). Audio generated in MP3 format at 24kHz.

List all available voices and show me English options.

🎙️ Available voices: BV001 (Generic Female, zh), BV002 (Generic Male, zh), BV033 (TikTok Trendy Female, zh), BV113 (English Female, en), BV115 (English Male, en). English options: BV113 (Female), BV115 (Male).

Synthesize this article into speech: [long article text...]

📖 Long-text synthesis started! Article split into 5 chunks. Using BV001_streaming voice. Processing will take ~30 seconds for full narration.

Volcengine powers the iconic TikTok TTS effects used in billions of videos. It offers industry-leading Chinese speech quality, trendy social media voices, and ByteDance's proprietary neural voice technology.
