Text to Speech

TTS

1. Kokoro

2. Piper

Under Windows - Partition AI in Windows

3. Edge TTS (cloud compute)

Edge TTS is not open source and it sends your text to Microsoft. If you need a language that is not available in good quality from an open-source engine (e.g. Cantonese), Edge TTS may be an option for special use cases where privacy is not a concern.
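As a rough sketch of how this could look in practice, the snippet below wraps the edge-tts command-line tool (installable with pip install edge-tts) to write Cantonese speech to an MP3 file. The helper names edge_tts_command and speak_to_file are made up for illustration, and the voice zh-HK-HiuGaaiNeural is an assumption about what the service currently offers; run edge-tts --list-voices to see what is actually available. Remember that the text is sent to Microsoft's servers.

```python
import shutil
import subprocess


def edge_tts_command(text, voice, out_path):
    """Build the argument list for the edge-tts command-line tool."""
    return ["edge-tts", "--voice", voice, "--text", text, "--write-media", out_path]


def speak_to_file(text, voice="zh-HK-HiuGaaiNeural", out_path="out.mp3"):
    """Synthesize text to an MP3 file via Edge TTS (needs network access).

    The default voice is an assumed Cantonese voice name, not a guarantee.
    """
    if shutil.which("edge-tts") is None:
        raise RuntimeError("edge-tts not found; install it with: pip install edge-tts")
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(edge_tts_command(text, voice, out_path), check=True)
    return out_path


# Example call (commented out because it contacts the cloud service):
# speak_to_file("你好，世界")
```

Keeping the command construction in its own function makes it easy to inspect exactly what will be sent before anything leaves the machine.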

Alternatives

You can pick from many open-source text-to-speech engines:

  1. GitHub - suno-ai/bark: 🔊 Text-Prompted Generative Audio Model
  2. GitHub - metavoiceio/metavoice-src: Foundational model for human-like, expressive TTS
  3. GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
  4. GitHub - myshell-ai/MeloTTS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
  5. GitHub - jishengpeng/ControlSpeech: ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
  6. GitHub - fishaudio/fish-speech: Brand new TTS solution
  7. GitHub - jaywalnut310/vits: VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
  8. GitHub - RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
  9. GitHub - 2noise/ChatTTS: A generative speech model for daily dialogue.
  10. GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
  11. GitHub - yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
  12. GitHub - jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
  13. GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
  14. balacoon/tts · Hugging Face
  15. GitHub - snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
  16. https://community.openconversational.ai/

VITS

The default TTS engine, Piper, is based on VITS:
https://docs.coqui.ai/en/latest/models/vits.html

Cantonese

StyleTTS2

Demo

Server