The Mycroft AI core stack consists of multiple software packages connected by a central core. Each part of the voice assistant is handled by its own piece of "expertise". In this blog post we will take a closer look at the different parts of this core stack.
Under the hood
Precise – Wake word spotting
In the early days, Mycroft AI used PocketSphinx for wake word spotting. Although PocketSphinx is open source and a good fit for Mycroft's philosophy, it was not very accurate. That is why the team decided to build this part of the stack themselves. Precise was the outcome of that effort, and since mid-March 2018 it has been the default wake word spotter.
Precise is a Wake Word listener. As its name suggests, it is the task of a Wake Word Listener to constantly listen to sounds and speech around the device and activate when the sounds or speech match a Wake Word.
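Conceptually, a wake word listener slides a window over the incoming audio stream and scores each window against the wake word. The sketch below is a toy illustration of that loop in plain Python; the real Precise scores raw audio with a trained neural network, and all names here are made up for the example.

```python
from collections import deque

def looks_like_wake_word(window):
    """Stand-in for Precise's neural network: here we just check
    whether the (hypothetical) audio chunks spell out the wake word."""
    return list(window) == ["hey", "my", "croft"]

def listen(chunks, window_size=3, on_activation=lambda: None):
    """Slide a fixed-size window over the stream and fire the
    callback whenever the scorer flags it as the wake word."""
    window = deque(maxlen=window_size)
    activations = 0
    for chunk in chunks:
        window.append(chunk)
        if len(window) == window_size and looks_like_wake_word(window):
            on_activation()
            activations += 1
    return activations

# Simulated audio stream: background noise, then the wake word.
stream = ["noise", "noise", "hey", "my", "croft", "noise"]
```

The important property this mirrors is that the listener never stops: it keeps consuming audio and only wakes the rest of the stack when the score crosses its threshold.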
More information about Precise can be found on the documentation page;
https://mycroft.ai/documentation/precise/
Their github page;
https://github.com/MycroftAI/mycroft-precise
And how you can train Precise on your own voice and wake-word;
https://github.com/MycroftAI/mycroft-precise/wiki/Training-your-own-wake-word#how-to-train-your-own-wake-word
DeepSpeech – Speech To Text
DeepSpeech is a "Speech To Text" (STT) engine created by the Machine Learning Group at Mozilla. It is a new, open source, machine-learning-based STT technology built on research started at Baidu. The first versions of Mycroft used Google's STT engine, but with privacy and open source in mind this was not really in line with the company's vision. Mycroft mitigated the privacy issue somewhat by anonymizing all requests so they appear to come from one and the same user, but they still had no view of what was or was not logged. And secondly, Google's engine is not open source.
More information on the Mozilla Speech & Machine Learning page;
https://research.mozilla.org/machine-learning/
And more on all the reasons why MyCroft moved from Google STT to Deepspeech;
https://mycroft.ai/blog/mycroft-speech-to-text-and-balance/
https://mycroft.ai/blog/training-deep-speech-how-you-can-help/
Adapt – Natural language understanding
Adapt is an intent parser – meaning it is a library for converting natural language into machine-readable data structures, such as JSON. The Adapt Intent Parser is open source software. It is designed to run on devices with limited computer resources, such as embedded hardware.
The best way to explain what Adapt is, is by watching this little video;
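In the meantime, the core idea can also be shown in a few lines of code. The toy sketch below scores intents by registered keywords and returns machine-readable data, which is roughly what Adapt does; note that this is not the real Adapt API (which uses an `IntentBuilder` and an intent determination engine), and the names here are illustrative.

```python
def determine_intent(utterance, intents):
    """Score each intent by how many of its registered keywords
    appear in the utterance; return the best match as JSON-like data."""
    words = set(utterance.lower().split())
    best, best_score = None, 0
    for name, keywords in intents.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = name, score
    return {"intent": best, "confidence": best_score / max(len(words), 1)}

# Hypothetical intents, each defined by a small group of unique keywords.
intents = {
    "WeatherIntent": {"weather", "forecast", "temperature"},
    "TimeIntent": {"time", "clock", "hour"},
}
```

For example, `determine_intent("what is the weather like", intents)` picks `WeatherIntent` because "weather" is one of its keywords. This keyword-group approach is exactly the trait that distinguishes Adapt from Padatious, described next.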
Padatious – Machine-learning, neural-network based intent parser
Padatious is a machine-learning, neural-network-based intent parser and an alternative to the Adapt intent parser. Unlike Adapt, which matches on small groups of unique keywords, Padatious is trained on the sentence as a whole.
Padatious has a number of important advantages:
- Intents in Padatious are easy to create
- The machine learning model in Padatious requires a relatively small amount of data
- Machine learning models must be trained, and the model that Padatious uses is quick and easy to train.
- Intents run independently of each other. This allows you to quickly install new Skills without having to retrain all the others.
- With Padatious you can easily extract entities and then use them in Skills. For example: “Find the nearest petrol station” -> {“place”: “petrol station”}
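The whole-sentence training and the entity extraction from the petrol station example can be sketched together: if you squint, a trained sentence like "find the nearest {place}" behaves like a template with a slot in it. The sketch below fakes that with regular expressions; the real Padatious trains a neural network on example sentences instead, so everything here is illustrative.

```python
import re

def compile_template(template):
    """Turn a training sentence like 'find the nearest {place}'
    into a regex that captures the {entity} slots."""
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", template)
    return re.compile("^" + pattern + "$", re.IGNORECASE)

def extract_entities(utterance, templates):
    """Match the utterance against each whole-sentence template
    and return the captured entities of the first match."""
    for template in templates:
        match = compile_template(template).match(utterance)
        if match:
            return match.groupdict()
    return {}

# Hypothetical training sentences with entity slots.
templates = ["find the nearest {place}", "what time is it in {city}"]
```

Running `extract_entities("Find the nearest petrol station", templates)` yields `{"place": "petrol station"}`, matching the example above.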
Mimic – Speech synthesis
Mimic is a fast, lightweight Text To Speech (TTS) engine developed by Mycroft AI and VocaliD. Mimic is based on the Flite software from Carnegie Mellon University. Mimic takes text as input and outputs speech in the chosen voice.
Again, yet another video which explains it a lot better than I can;
Soon Mimic 2 will be released with a new female voice that really sounds a lot better. Because Mycroft AI also has bills to pay (money makes the world go round), that new voice currently sits behind an earnings model of a $1.99 monthly donation. I expect that as soon as there are more voices, others will probably also end up behind the earnings model, with a standard male or female voice remaining free.
Anyway, you can also fully customize the TTS and switch to other TTS engines. At this moment you can already choose between Mimic and Google via home.mycroft.ai.
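For the curious: on the device itself the choice ends up in Mycroft's configuration file (mycroft.conf), where a "tts" section selects the module. A sketch of what that looks like, assuming the documented layout; check your installation's docs for the exact keys and voice names.

```json
{
  "tts": {
    "module": "mimic",
    "mimic": {
      "voice": "ap"
    }
  }
}
```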
Below you can compare the different voices with each other;
https://mycroft.ai/blog/available-voices/
Skills – Adding functionality
Skills provide Mycroft with functionality. Skills are what allow Mycroft to tell you the weather in your location, display the time in another timezone, play you a song or tell you a Chuck Norris joke.
Mycroft comes with several Skills built in. These are called core Skills. It is easy to add more Skills to Mycroft. Skills are developed both by Mycroft AI staff and by the developer community. To get a better idea of which Skills are already available, have a look at this page;
https://github.com/MycroftAI/mycroft-skills
If you are a (Python) developer you can also create your own Skills. I will write another blog post later on how to get started. For now, to see which community-developed Skills are available, scroll down on the page linked above.
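To give a taste of the shape a Skill takes, here is a toy stand-in in plain Python: a skill registers handlers for intents and returns the reply the assistant should speak. This is deliberately not the real Mycroft Skill API (real Skills subclass Mycroft's base class and use its intent decorators and `speak` methods); all names here are invented for illustration.

```python
class Skill:
    """Toy stand-in for a skill base class: a skill maps
    intent names to handler methods that produce a reply."""
    def __init__(self):
        self.handlers = {}

    def register(self, intent_name, handler):
        self.handlers[intent_name] = handler

    def handle(self, intent_name):
        handler = self.handlers.get(intent_name)
        return handler() if handler else "Sorry, I can't do that yet."

class HelloWorldSkill(Skill):
    def __init__(self):
        super().__init__()
        self.register("hello.world", self.handle_hello)

    def handle_hello(self):
        # In a real Skill the reply would be passed to the TTS
        # engine (Mimic) so it is read aloud.
        return "Hello, world!"

skill = HelloWorldSkill()
```

The point of the structure is the last bullet from the Padatious section in action: each Skill registers its own intents independently, so installing a new one does not disturb the others.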
You don't have to be a developer to help make Mycroft AI better. There are other ways to contribute, such as improving Precise (the wake word spotter) and/or DeepSpeech by translating languages or validating wake word recording snippets.
But more on that in yet another blog post….
Do you like what you just read? Keep me going by donating on PayPal or becoming a Patron.