Natural Tokenizer Engine MCP Connector for Claude
Tokenize text into words, numbers, emails, URLs, emojis, and hashtags deterministically. AI struggles with mixed content; this engine extracts exact linguistic entities instantly.
You feed a tweet to an AI and ask it to extract the hashtags and emojis. The model reads text through Byte Pair Encoding (BPE), so it sees sub-word tokens rather than whole words. As a result, it frequently gets boundaries wrong, splitting hashtags or merging URLs with trailing punctuation.
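To make the sub-token problem concrete, here is a minimal Python sketch. It is not real BPE (which learns its merges from training data) but a simplified greedy longest-match split over a hypothetical vocabulary invented for this example. The point is only that sub-word pieces need not line up with entity boundaries.

```python
# Hypothetical sub-word vocabulary, invented purely for illustration.
VOCAB = {"#", "Mach", "ine", "Learn", "ing"}


def greedy_subwords(text, vocab):
    """Split text into the longest vocabulary pieces, scanning left to right.

    Characters not covered by the vocabulary become single-character pieces.
    This is a simplification, not an implementation of BPE.
    """
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit one character.
            pieces.append(text[i])
            i += 1
    return pieces


pieces = greedy_subwords("#MachineLearning", VOCAB)
print(pieces)  # ['#', 'Mach', 'ine', 'Learn', 'ing']
```

With this toy vocabulary the hashtag arrives as five fragments, so recovering it as one entity is left to the model rather than guaranteed by the input.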
This MCP uses wink-tokenizer (inspired by Python's spaCy) to perform deterministic NLP tokenization. It understands the structural rules of human language, cleanly separating words from punctuation, while keeping complex entities like emails, URLs, and emojis intact.
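As a rough illustration of what deterministic, rule-based tokenization looks like, the Python sketch below tags tokens with an ordered list of regular expressions. It is not wink-tokenizer's implementation or API; the patterns, the tag order, and the `tokenize` helper are assumptions made for this example, and the patterns are deliberately simplistic.

```python
import re

# Ordered (tag, pattern) pairs. Earlier rules win, so URLs and emails are
# matched whole before the generic word and punctuation rules can split them.
# These patterns are illustrative assumptions, not wink-tokenizer's rules.
TOKEN_RULES = [
    ("url", r"https?://\S*[^\s.,;:!?)]"),
    ("email", r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    ("mention", r"@\w+"),
    ("hashtag", r"#\w+"),
    ("number", r"\d+(?:[.,]\d+)*"),
    ("emoji", r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"),
    ("word", r"[^\W\d_]+"),
    ("punctuation", r"[^\w\s]"),
]

MASTER = re.compile(
    "|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Return a list of {'value', 'tag'} dicts.

    Whitespace and any character no rule matches are skipped.
    """
    return [
        {"value": m.group(), "tag": m.lastgroup}
        for m in MASTER.finditer(text)
    ]


post = "Launch day! Email r2d2@example.com or see https://example.com/docs, #fun 🎉 @robot 42"
for token in tokenize(post):
    print(token["tag"], token["value"])
```

In this sketch the email and the URL each come back as a single token, and the comma after the URL is returned as separate punctuation, because the more specific rules are tried first.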
The Superpowers
- Entity Extraction: Accurately tags tokens as word, number, email, url, emoji, hashtag, or mention.
- Punctuation Awareness: Intelligently separates punctuation from words without breaking abbreviations (e.g., 'U.S.A.' stays together, 'End.' splits).
- Mixed Content Ready: Flawlessly parses social media posts containing text, links, and emojis mixed together.
- Deterministic NLP: Rule-based parsing, not LLM probability guessing. The same input always yields the same tokens.
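The punctuation behavior described above can be sketched with a single extra rule. The Python example below assumes a hypothetical definition of an abbreviation (two or more "letter plus period" pairs in a row) and tries it before the plain word rule. The engine's actual rules may differ; this only shows how 'U.S.A.' can stay whole while 'End.' splits, and that repeated runs give identical output.

```python
import re

# Hypothetical rule: an abbreviation is two or more "letter + period" pairs.
# It is listed first so it wins over the plain word rule.
PATTERN = re.compile(
    r"(?P<abbreviation>(?:[A-Za-z]\.){2,})"
    r"|(?P<word>[A-Za-z]+)"
    r"|(?P<punctuation>[^\w\s])"
)


def split_tokens(text):
    """Return (value, tag) pairs; whitespace and unmatched characters are skipped."""
    return [(m.group(), m.lastgroup) for m in PATTERN.finditer(text)]


sample = "Made in the U.S.A. The End."
result = split_tokens(sample)
print(result)

# Deterministic: running the same rules on the same input gives the same tokens.
assert split_tokens(sample) == result
```

Here 'U.S.A.' is returned as one abbreviation token, while 'End.' yields the word 'End' followed by a separate '.' punctuation token.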