Text to Speech

TTS

1. Kokoro

2. Zonos

Supports Mandarin (cmn) and Cantonese (yue)

3. Reviews

An increasing number of high-quality neural-network-based TTS systems are now available:


source: AIPrintify

Notes:

  • Spark-TTS license changed to non-commercial

Language Code

Many TTS engines use language codes based on espeak-ng, for example cmn for Mandarin and yue for Cantonese.
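As a rough sketch of how such codes might be handled in a script, the helper below maps a human-readable language name to an espeak-ng style code. The table is only an illustrative subset (cmn and yue come from the notes above, en-us is added for contrast), and the names `ESPEAK_LANGUAGE_CODES` and `to_espeak_code` are hypothetical, not part of any TTS library.

```python
# Illustrative subset of espeak-ng style language codes (not a complete list).
ESPEAK_LANGUAGE_CODES = {
    "english (us)": "en-us",
    "mandarin": "cmn",
    "cantonese": "yue",
}


def to_espeak_code(language: str) -> str:
    """Return the espeak-ng style code for a language name (hypothetical helper)."""
    key = language.strip().lower()
    try:
        return ESPEAK_LANGUAGE_CODES[key]
    except KeyError:
        # Fail loudly rather than silently falling back to a wrong language.
        raise ValueError(f"No language code known for {language!r}") from None


if __name__ == "__main__":
    print(to_espeak_code("Cantonese"))
```

The returned string can then be passed wherever a given engine expects its language parameter; check each engine's documentation, since not every engine accepts every espeak-ng code.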

Cantonese

Alternatives

You can pick from many open-source text-to-speech engines:

  1. GitHub - suno-ai/bark: 🔊 Text-Prompted Generative Audio Model
  2. GitHub - metavoiceio/metavoice-src: Foundational model for human-like, expressive TTS
  3. GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  4. GitHub - myshell-ai/MeloTTS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
  5. GitHub - jishengpeng/ControlSpeech: ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
  6. GitHub - fishaudio/fish-speech: Brand new TTS solution
  7. GitHub - jaywalnut310/vits: VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
  8. GitHub - RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
  9. GitHub - 2noise/ChatTTS: A generative speech model for daily dialogue.
  10. GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
  11. GitHub - yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
  12. GitHub - jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
  13. GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
  14. balacoon/tts · Hugging Face
  15. GitHub - snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
  16. https://community.openconversational.ai/

VITS

The default TTS engine Piper is based on VITS:
https://docs.coqui.ai/en/latest/models/vits.html

Cantonese

StyleTTS2

Demo

Server

Piper

Under Windows - Team AI in Windows
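
For scripting Piper, a minimal sketch is to call its command-line tool from Python and feed the text on standard input. This assumes a `piper` executable is on the PATH and that it accepts the `--model` and `--output_file` options shown in the Piper README; the model and output file names are placeholders, and `build_piper_command` and `synthesize` are hypothetical helper names.

```python
import subprocess


def build_piper_command(model_path: str, output_path: str) -> list[str]:
    """Build the argument list for the Piper CLI (assumes `piper` is on PATH)."""
    return ["piper", "--model", model_path, "--output_file", output_path]


def synthesize(text: str, model_path: str, output_path: str) -> None:
    """Send text to Piper on stdin and write the audio to output_path."""
    subprocess.run(
        build_piper_command(model_path, output_path),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )


if __name__ == "__main__":
    # Placeholder file names; substitute a voice model you have downloaded.
    synthesize("Hello from Piper.", "voice.onnx", "hello.wav")
```

Keeping the command construction in its own function makes it easy to inspect or log the exact command before running it.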