Text to Speech

admin · August 31, 2021, 2:00pm

TTS

1. Kokoro

2. Piper

GitHub - rhasspy/piper: A fast, local neural text to speech system
Piper-Training-Guide-with-Screen-Reader/README.md at main · ZachB100/Piper-Training-Guide-with-Screen-Reader · GitHub
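
For anyone who wants to try Piper from a script, here is a minimal sketch. It assumes the `piper` command-line binary is on your PATH and that you have already downloaded a voice model (an `.onnx` file with its `.json` config next to it). The function names and file paths are placeholders for illustration, not part of Piper itself. Piper reads the text to speak from standard input, so the sketch pipes the text in.

```python
import subprocess
from pathlib import Path


def piper_command(model: Path, output: Path) -> list[str]:
    """Build the argument list for one Piper synthesis run.

    Assumes a `piper` executable is on PATH; adjust the first element
    if your binary lives elsewhere.
    """
    return ["piper", "--model", str(model), "--output_file", str(output)]


def speak(text: str, model: Path, output: Path) -> None:
    """Synthesize `text` to a WAV file by piping it to Piper on stdin."""
    subprocess.run(
        piper_command(model, output),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )
```

Calling `speak("Hello from Piper.", Path("en_US-lessac-medium.onnx"), Path("hello.wav"))` should write `hello.wav`, provided that model file exists locally. Everything runs on your own machine, so no text leaves it.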

3. Edge TTS (cloud compute)

Edge TTS is not open source, and it sends your data to Microsoft. However, if you need a language that is not available in good quality from open-source engines (e.g. Cantonese), Edge TTS may be an option for special use cases where privacy is not a concern.
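
If you do go this route for Cantonese, a minimal sketch could look like the following. It assumes the community `edge-tts` Python package is installed (`pip install edge-tts`) and that the voice `zh-HK-HiuGaaiNeural` is still offered by the service; running `edge-tts --list-voices` shows what is currently available. The function name and default are placeholders. Keep in mind that the text you pass in is sent to Microsoft's servers.

```python
import asyncio

# Assumed Cantonese (Hong Kong) voice name; verify it with `edge-tts --list-voices`.
CANTONESE_VOICE = "zh-HK-HiuGaaiNeural"


async def speak(text: str, output_path: str, voice: str = CANTONESE_VOICE) -> None:
    """Send `text` to the Edge TTS service and save the returned audio as MP3."""
    # Imported here so this module still loads when edge-tts is not installed.
    import edge_tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)
```

To use it, run `asyncio.run(speak("你好，世界", "hello.mp3"))`. This needs a network connection, unlike the local engines listed in this thread.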

admin · May 27, 2023, 4:29am

Alternatives

You can pick from many open-source text-to-speech engines:

GitHub - suno-ai/bark: 🔊 Text-Prompted Generative Audio Model
GitHub - metavoiceio/metavoice-src: Foundational model for human-like, expressive TTS
GitHub - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
GitHub - myshell-ai/MeloTTS: High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
GitHub - jishengpeng/ControlSpeech: ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
GitHub - fishaudio/fish-speech: Brand new TTS solution
GitHub - jaywalnut310/vits: VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
GitHub - RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
GitHub - 2noise/ChatTTS: A generative speech model for daily dialogue.
GitHub - huggingface/parler-tts: Inference and training library for high-quality TTS models.
GitHub - yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
GitHub - jasonppy/VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
balacoon/tts · Hugging Face
GitHub - snakers4/silero-models: Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
https://community.openconversational.ai/

admin · August 24, 2024, 3:58am

VITS

The default TTS engine Piper is based on VITS:
https://docs.coqui.ai/en/latest/models/vits.html

Cantonese

admin · August 25, 2024, 7:17am

StyleTTS2

GitHub - yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Demo

Audio Samples from "Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech"

Server

GitHub - lxe/tts-server: A simple TTS server for generating speech using StyleTTS2