Speech to Text

Speech is a fundamental part of being human.

88.io provides tools that help citizens own not only their data but also the intelligence derived from that data.

1. Demo

1.1 Demonstration web pages:

Server:
https://speech.quuvoo4ohcequuox.0.88.io/

Client:
https://speech.contacttrace.com.au/

1.2 Details:

Speech Quality

We have tuned the STT engine to work even over the traditional telephone network (A-law codec at an 8 kHz sampling rate).
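
For readers unfamiliar with telephone-grade audio, the sketch below shows how a single A-law byte expands to a 16-bit linear PCM sample, following the standard ITU-T G.711 decoding steps. It is illustrative only and is not taken from the 88.io code base; the function names `alaw_to_linear` and `decode_alaw` are our own.

```python
def alaw_to_linear(byte: int) -> int:
    """Expand one 8-bit A-law code word to a signed 16-bit PCM sample (G.711)."""
    byte ^= 0x55                     # undo the even-bit inversion applied by the encoder
    mantissa = (byte & 0x0F) << 4    # 4-bit mantissa, moved into position
    segment = (byte & 0x70) >> 4     # 3-bit segment number (acts as an exponent)
    if segment == 0:
        magnitude = mantissa + 8
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit means a positive sample.
    return magnitude if byte & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of A-law bytes (one byte per 8 kHz sample) to PCM values."""
    return [alaw_to_linear(b) for b in data]
```

One second of telephone audio is 8000 such bytes, so `decode_alaw` on a one-second buffer returns 8000 samples that can then be passed to whichever STT engine is configured.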

Introduction to STT

With Private Cyberspace, EVERYONE (yes, you) gets to train their own STT engine. For those who want to learn a bit about the technology behind the STT they use every day, the following are some good introductions:

Default STT Engine

Both Kaldi and DeepSpeech are good open-source STT engines. Although Kaldi is more complex to use than DeepSpeech, we have made Kaldi the default from the 2024.02 release because it gives us a bit more flexibility. You can change the default engine to DeepSpeech or to one of many other open-source STT engines, e.g. Vosk.

All engines are normally bundled with reasonable STT models, which can be substantially improved with our own Partition AI technologies for many use cases. In case the bundled models are not good enough, don't forget you own the compute, so you can even train your OWN models!

Faster Whisper

Faster Whisper, which is based on OpenAI's Whisper, can be used in most Private Cyberspace deployments.

Currently you can speak to your Entity Agent in 99 languages:

    "en": "english",
    "zh": "chinese",
    "de": "german",
    "es": "spanish",
    "ru": "russian",
    "ko": "korean",
    "fr": "french",
    "ja": "japanese",
    "pt": "portuguese",
    "tr": "turkish",
    "pl": "polish",
    "ca": "catalan",
    "nl": "dutch",
    "ar": "arabic",
    "sv": "swedish",
    "it": "italian",
    "id": "indonesian",
    "hi": "hindi",
    "fi": "finnish",
    "vi": "vietnamese",
    "he": "hebrew",
    "uk": "ukrainian",
    "el": "greek",
    "ms": "malay",
    "cs": "czech",
    "ro": "romanian",
    "da": "danish",
    "hu": "hungarian",
    "ta": "tamil",
    "no": "norwegian",
    "th": "thai",
    "ur": "urdu",
    "hr": "croatian",
    "bg": "bulgarian",
    "lt": "lithuanian",
    "la": "latin",
    "mi": "maori",
    "ml": "malayalam",
    "cy": "welsh",
    "sk": "slovak",
    "te": "telugu",
    "fa": "persian",
    "lv": "latvian",
    "bn": "bengali",
    "sr": "serbian",
    "az": "azerbaijani",
    "sl": "slovenian",
    "kn": "kannada",
    "et": "estonian",
    "mk": "macedonian",
    "br": "breton",
    "eu": "basque",
    "is": "icelandic",
    "hy": "armenian",
    "ne": "nepali",
    "mn": "mongolian",
    "bs": "bosnian",
    "kk": "kazakh",
    "sq": "albanian",
    "sw": "swahili",
    "gl": "galician",
    "mr": "marathi",
    "pa": "punjabi",
    "si": "sinhala",
    "km": "khmer",
    "sn": "shona",
    "yo": "yoruba",
    "so": "somali",
    "af": "afrikaans",
    "oc": "occitan",
    "ka": "georgian",
    "be": "belarusian",
    "tg": "tajik",
    "sd": "sindhi",
    "gu": "gujarati",
    "am": "amharic",
    "yi": "yiddish",
    "lo": "lao",
    "uz": "uzbek",
    "fo": "faroese",
    "ht": "haitian creole",
    "ps": "pashto",
    "tk": "turkmen",
    "nn": "nynorsk",
    "mt": "maltese",
    "sa": "sanskrit",
    "lb": "luxembourgish",
    "my": "myanmar",
    "bo": "tibetan",
    "tl": "tagalog",
    "mg": "malagasy",
    "as": "assamese",
    "tt": "tatar",
    "haw": "hawaiian",
    "ln": "lingala",
    "ha": "hausa",
    "ba": "bashkir",
    "jw": "javanese",
    "su": "sundanese",

There is also an "auto" language option, which attempts to detect the language you are speaking automatically, but its performance is LOWER than when you tell it to focus on a specific spoken language.
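
As a rough illustration of the choice between a fixed language and "auto", here is a minimal sketch that uses the open-source `faster-whisper` Python package directly. How Private Cyberspace passes the setting to the engine is not described here, so the helper names `resolve_language` and `transcribe`, the shortened `SUPPORTED` table, and the CPU/int8 settings are all assumptions made for the example.

```python
from typing import Optional

# A small subset of the language table above; extend as needed.
SUPPORTED = {"en": "english", "zh": "chinese", "de": "german",
             "fr": "french", "ja": "japanese"}


def resolve_language(choice: str) -> Optional[str]:
    """Map a user choice to the value handed to the engine.

    "auto" becomes None (the model detects the language itself);
    any other value must be a known language code.
    """
    choice = choice.strip().lower()
    if choice == "auto":
        return None
    if choice not in SUPPORTED:
        raise ValueError(f"unsupported language code: {choice!r}")
    return choice


def transcribe(path: str, choice: str = "auto", model_size: str = "base") -> str:
    """Transcribe an audio file (hypothetical helper, not the 88.io API)."""
    # Imported lazily so resolve_language() works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(path, language=resolve_language(choice))
    print(f"language: {info.language} (probability {info.language_probability:.2f})")
    return "".join(segment.text for segment in segments)
```

Calling `transcribe("call.wav", "en")` skips detection entirely, whereas `transcribe("call.wav")` lets the model guess the language from the audio before transcribing.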

Models

Whisper has a number of models which you can pick for your Private Cyberspace depending on the compute power of the hardware you have access to.

Model    Parameters   Memory   Speed   Default
Tiny     39 M         ~1 GB    ~32x
Base     74 M         ~1 GB    ~16x    Y
Small    244 M        ~2 GB    ~6x
Medium   769 M        ~5 GB    ~2x
Large    1550 M       ~10 GB   1x

People with less powerful personal hardware can scale down from the default Base model to the Tiny model, while people who share more powerful community hardware can scale up to larger models.
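
The scaling decision can be sketched as a simple lookup over the model table. The figures below are the approximate values from that table; the function name `pick_model` and its `min_speed` parameter are hypothetical and exist only for this example.

```python
# (name, approximate memory in GB, approximate relative speed), smallest first.
# Values are the rounded figures from the model table, not measurements.
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def pick_model(available_gb: float, min_speed: float = 1) -> str:
    """Return the largest model that fits in memory and is fast enough.

    min_speed is the lowest acceptable relative speed (Large = 1).
    """
    fitting = [name for name, mem_gb, speed in MODELS
               if mem_gb <= available_gb and speed >= min_speed]
    if not fitting:
        raise ValueError("no model satisfies the memory and speed limits")
    return fitting[-1]
```

For example, `pick_model(1)` stays on the default Base model, `pick_model(1, min_speed=32)` drops to Tiny for slower personal hardware, and `pick_model(16)` moves up to Large on shared community hardware.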

A unique advantage of Private Cyberspace is the availability of the Partition AI layer on top of Whisper, enabling you to achieve much better results with smaller models than is possible with Whisper alone.