88.io provides tools to help in the push towards citizens owning not only their data but also the intelligence that comes from it.
A major weakness of many voice recognition systems is that your voice is sent to the Cloud for recognition or training.
Speech is a simple and convenient user interface; unfortunately, its use has been dominated by the Cloud platforms, e.g. Apple, Google, Amazon.
By taking advantage of Partition AI, Private Cyberspaces come with their own independent voice recognition system, which can be trained privately by you and used across different platforms.
Client Voice Recognition
Server Voice Recognition
With your own Entity Agent, you are the only one with access to your voice in order to:
Give Voice Commands to your Agent
Train your Agent to recognise your Voice Commands
NOTHING is sent out of your personal device. The process does NOT use any external APIs; all commands and their training remain on your device.
Speech Quality
We have tuned the STT to work even on the traditional telephone network (using A-law codec with 8kHz sampling rate).
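The A-law codec mentioned above compresses each sample logarithmically before 8-bit quantisation, boosting quiet speech relative to loud speech. A minimal sketch of the continuous A-law compression curve (for illustration only, not 88.io's actual audio pipeline):

```python
import math

A = 87.6  # the standard A-law compression parameter (ITU-T G.711)

def alaw_compress(x: float) -> float:
    """Compress a normalised sample x in [-1, 1] using the A-law curve."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x < 1.0 / A:
        # Linear segment for very small amplitudes.
        y = A * x / (1.0 + math.log(A))
    else:
        # Logarithmic segment for the rest of the range.
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))
    return sign * y
```

Because the curve expands small amplitudes, quiet telephone speech retains more resolution after 8-bit quantisation than linear coding would give it.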
Introduction to STT
With a Private Cyberspace EVERYONE (yes, you) gets to train their own STT engine. For those who want to learn a bit about the technology behind the STT they use every day, the following are some good introductions:
There is also an "auto" language option you can select, which will attempt to automatically detect the language you are speaking, but its performance is LOWER than if you tell it to focus on a specific spoken language of yours.
Models
Whisper has a number of models which you can pick for your Private Cyberspace, depending on the compute power of the hardware you have access to.

| Model  | Parameters | Memory | Speed | Default |
| ------ | ---------- | ------ | ----- | ------- |
| Tiny   | 39 M       | ~1 GB  | ~32x  |         |
| Base   | 74 M       | ~1 GB  | ~16x  | Y       |
| Small  | 244 M      | ~2 GB  | ~6x   |         |
| Medium | 769 M      | ~5 GB  | ~2x   |         |
| Large  | 1550 M     | ~10 GB | 1x    |         |
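Choosing a model from the table above amounts to taking the largest one whose memory requirement fits your hardware. A small sketch of that selection rule (the memory figures are copied from the table; the function name is illustrative, not part of 88.io):

```python
# Approximate memory needs from the model table (model name -> GB).
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}
ORDER = ["tiny", "base", "small", "medium", "large"]

def pick_model(available_gb: float) -> str:
    """Pick the largest Whisper model that fits in available memory,
    falling back to the smallest when even 'tiny' does not fit."""
    fitting = [m for m in ORDER if MODEL_MEMORY_GB[m] <= available_gb]
    return fitting[-1] if fitting else ORDER[0]
```

For example, hardware with about 6 GB to spare would land on the Medium model, while a low-end phone would fall back to Tiny.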
People who own less powerful personal hardware can scale down from the default Base model to the Tiny model, while people who share more powerful community hardware can scale up to the larger models.
A unique advantage of Private Cyberspace is the availability of the Partition AI layer on top of Whisper, enabling you to achieve much better results with smaller models than is possible with Whisper alone.
These are the words that your agent initially understands, which you can start training your agent with.
Numbers
zero
one
two
three
four
five
six
seven
eight
nine
Direction
up
down
left
right
stop
Status
yes
no
This vocabulary is always available in addition to other vocabularies, to handle words that your agent cannot recognise.
Command
Command Menu
0 - Help
1 - Tracking - Active / Passive
2 - Venue - Arrive / Depart
3 - Agent - do a conversation with any LLM API
4 - Contact - lost property, attractive person, missing person
5 - Advertisement - reality ads, ads on vehicles, signage
6 - News - other interesting events NOT covered by categories above
7 - Review - good performer, good restaurant, traffic delay
8 - Hazard - pot hole on road, people with flu symptoms, broken vehicle, rubbish on road
9 - Emergency - crime, medical, fire
Acknowledge Menu
0 - No
1 - Yes
2 - Sub-Menu
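The number words in the base vocabulary map directly onto the ten command-menu entries, so a recognised digit word selects a command. A sketch of that dispatch (illustrative only; the real agent's dispatch logic is not shown here):

```python
# The number words from the base vocabulary, in menu order.
NUMBER_WORDS = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

# The command-menu categories listed above, indexed 0-9.
COMMAND_MENU = ["Help", "Tracking", "Venue", "Agent", "Contact",
                "Advertisement", "News", "Review", "Hazard", "Emergency"]

def select_command(spoken_word: str) -> str:
    """Map a recognised number word to its command-menu entry."""
    return COMMAND_MENU[NUMBER_WORDS.index(spoken_word)]
```

Saying "five", for instance, selects the Advertisement category.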
Tracking
Active
Browser - once every 1 minute
Owntracks iOS - Move Mode - every 5 minutes OR moved more than 50 meters
Owntracks Android - Move Mode - every 10 seconds
Home Assistant iOS -
Home Assistant Android - High Accuracy Mode -
Passive
Browser - check once every 5 minutes
Owntracks iOS - Significant Mode - every 5 minutes AND moved more than 500 meters
Owntracks Android - Move Mode - every 5 minutes AND moved more than 50 meters
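The difference between the active OR rules and the passive AND rules above can be expressed as a small predicate. A sketch for two of the Owntracks iOS modes, using the thresholds listed (not the apps' actual implementation):

```python
def should_report(mode: str, elapsed_s: float, moved_m: float) -> bool:
    """Decide whether to send a location update, following the
    intervals listed above."""
    if mode == "owntracks_ios_move":
        # Active: every 5 minutes OR moved more than 50 meters.
        return elapsed_s >= 300 or moved_m > 50
    if mode == "owntracks_ios_significant":
        # Passive: every 5 minutes AND moved more than 500 meters.
        return elapsed_s >= 300 and moved_m > 500
    raise ValueError(f"unknown mode: {mode}")
```

The AND condition is what makes passive modes battery-friendly: a stationary phone never reports, no matter how much time passes.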
Runs inside the web browser and can be used OFFLINE. This is the most private option, as your voice never leaves your phone. It is in development and currently of limited capacity.
For a demonstration of web browser STT, go to https://speech.88.io and see how well your pre-trained agent can already recognise your voice BEFORE training.
Currently each word is represented by 696 speech spectrogram numbers holding the frequency information of the word pronounced by you.
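With each word reduced to a fixed-length feature vector like this, recognition can be framed as finding the trained template closest to a new utterance. A minimal sketch using cosine similarity as the distance measure (an assumption for illustration; the actual matching used by the agent is not specified here):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recognise(features, templates):
    """Return the trained word whose stored template vector is most
    similar to the incoming feature vector."""
    return max(templates, key=lambda w: cosine_similarity(features, templates[w]))
```

In practice the vectors would be the 696 spectrogram numbers per word; the three-element vectors below are only to keep the example small.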
Conversion between two phoneme formats is required in some cases, for example between the International Phonetic Alphabet (IPA) and ARPABET, which is used by CMU Sphinx:
ɔ AO
ɔː AO
o AO
oː AO
ɑ AA
ɑː AA
ɒ AO
iː IY
i IY
uː UW
u UW
ɛ EH
ɪ IH
ʊ UH
ʌ AH
ɐ AH
ə AH
æ AE
a AE
e AE
eɪ EY
aɪ AY
oʊ OW
aʊ AW
əʊ OW
iə EH
eə EH
ɔɪ OY
ɝ ER
ɜ ER
ɜː ER
ɹ R
r R
p P
b B
t T
d D
k K
ɡ G
ʧ CH
tʃ CH
dʒ JH
f F
v V
θ TH
ð DH
s S
z Z
ʃ SH
ʒ ZH
h HH
m M
n N
ŋ NG
l L
j Y
w W
ʔ Q
'
ˈ
ː
ˌ
+SPACE+ SIL
x K
ɲ N
### ɑ̃ N
### ɣ ZH
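Because some IPA symbols in the table span two characters (diphthongs like "oʊ", affricates like "tʃ"), a converter has to try longer matches before single characters, and drop the stress/length marks that carry no ARPABET symbol. A sketch of that greedy longest-match conversion, using a subset of the table above (the full mapping would simply extend the dict):

```python
# A few entries from the IPA -> ARPABET table above.
IPA_TO_ARPABET = {
    "ɔː": "AO", "ɔ": "AO", "oː": "AO", "iː": "IY", "i": "IY",
    "uː": "UW", "u": "UW", "eɪ": "EY", "aɪ": "AY", "oʊ": "OW",
    "tʃ": "CH", "dʒ": "JH", "h": "HH", "l": "L", "k": "K", "ə": "AH",
}
# Stress and length marks map to nothing and are dropped.
SKIP = {"'", "ˈ", "ː", "ˌ"}

def ipa_to_arpabet(ipa: str) -> list:
    """Greedy longest-match conversion of an IPA string to ARPABET symbols."""
    out, i = [], 0
    while i < len(ipa):
        # Try two-character symbols (diphthongs, affricates) first.
        for size in (2, 1):
            chunk = ipa[i:i + size]
            if chunk in IPA_TO_ARPABET:
                out.append(IPA_TO_ARPABET[chunk])
                i += len(chunk)
                break
        else:
            if ipa[i] not in SKIP:
                out.append("?")  # symbol not in the mapping
            i += 1
    return out
```

For example, the IPA "həˈloʊ" converts to HH AH L OW, with the stress mark silently discarded.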