In my previous post I wrote about my ideas for automating my home, with an A.I. personal assistant as an important part of it. There are smart speakers out there that can be used to do more, but my vision is to have it all in one device if possible.
So in this post I would like to share that vision with you. How I see it. What this perfect DIY A.I. Personal Assistant should look like. Which hardware, which software, etc. So let’s start with a quick bullet list and then elaborate a bit on how we get there.
How I see it
- Linux based. (Open source)
- The device should look GOOD! (Although that will be part of the fine-tuning process later on)
- Home Assistant for all Home Automation. (Powerful and open source)
- Speech by Mycroft A.I. (Open source and privacy minded)
- Home Assistant and Mycroft fully and seamlessly working together.
- HDMI to TV for local multimedia, messages, additional information etc.
- Everything else that Alexa, Google Assistant and Cortana can do. (Or at least close to that)
Then I also have a few nice-to-haves for later on:
- VoIP / Skype / (Video) calling support.
- Camera that uses DOA to always look at you while “calling”. (Similar to the Moon by 1-ring)
So let us start with the hardware that we need to take a stab at the above ideas.
The Hardware
Raspberry Pi 3.
I have been in doubt about which version to choose. The Raspberry Pi 3B+ is the newest model, but while researching all the different software needed I came across several issues with it, mainly around the WiFi and Debian / Raspbian support in combination with Mycroft. Now, I always like a challenge, and those things would normally not keep me from using the latest and greatest version of everything. In the end, though, I played it safe and chose the “normal” 3B instead of the newer 3B+. (for now 😉 )
ReSpeaker 4-Mic Array for Raspberry Pi
For a personal assistant to be a success, you really need a proper microphone. Nothing is more annoying than having to speak in exactly the right direction, speak really clearly and mute all other sound devices first. For proper noise cancellation, possibly DOA (Direction Of Arrival) and good sensitivity, you really need a microphone array. In my search for a proper, yet affordable mic array I found the 4-Mic ReSpeaker array for the Raspberry Pi from Seeedstudio.
This board is built around the AC108, a highly integrated quad-channel ADC with I2S/TDM output for high-definition voice capture, which allows the device to pick up sound within a 3-meter radius. In addition, this 4-mic version provides a super cool LED ring containing 12 programmable APA102 LEDs. With those 4 microphones and the LED ring, the Raspberry Pi is able to do VAD (Voice Activity Detection), estimate DOA (Direction of Arrival), do KWS (Keyword Search) and show the direction via the LED ring, just like an Amazon Echo or Google Home.
The perfect candidate for our project. Plus, as a bonus, it only costs around $25.
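To make the "show the direction via the LED ring" part a bit more concrete, here is a minimal sketch of how an estimated DOA angle could be mapped to one of the 12 LEDs. This is only an illustration, not code from Seeed's driver: the function name `led_for_angle` and the `offset_deg` parameter are hypothetical, and it assumes the LEDs are evenly spaced, numbered in the direction of increasing angle, and that LED 0 sits at 0 degrees unless an offset is given.

```python
NUM_LEDS = 12  # the ReSpeaker 4-Mic ring carries 12 APA102 LEDs


def led_for_angle(angle_deg: float, num_leds: int = NUM_LEDS,
                  offset_deg: float = 0.0) -> int:
    """Return the index of the LED closest to a direction-of-arrival angle.

    angle_deg  -- estimated DOA in degrees (any value, wraps around 360)
    offset_deg -- hypothetical calibration value: the angle at which LED 0 sits
    """
    step = 360.0 / num_leds                    # 30 degrees per LED for 12 LEDs
    wrapped = (angle_deg - offset_deg) % 360.0  # normalise into [0, 360)
    # Round to the nearest LED, then wrap so that e.g. 359 degrees maps to LED 0.
    return int(round(wrapped / step)) % num_leds


if __name__ == "__main__":
    for angle in (0, 90, 180, 270):
        print(angle, "->", led_for_angle(angle))
```

The actual angle would have to come from a DOA estimator running on the four microphone channels, and lighting the chosen LED would go through whatever APA102 library ends up being used; both are outside this sketch.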
The case to hold it all
As said before, it all has to look good as well. A Personal A.I. Assistant device normally sits next to your TV or on your desk. So in plain sight. Most Raspberry Pi cases, well... don’t look that good. At least far from good enough to have in plain sight 24/7. However, apparently the Germans are not only good at soccer. I found the Orbital Case. It looks good. It is not that big, but more importantly, I think it would be easy to mod so that the ReSpeaker can be added to it.
What I have in mind is this: with its diameter of 14 cm, I am thinking of cutting out a big circle of about 12-13 cm at the top and covering it (nicely looking, of course) with semi-transparent fiber/cloth. Basically similar to the Google Home Mini, but flat at the top (if that makes any sense). In a later post, I will share a few thoughts and possibly some quick drawings of it. I am open to ideas, so if you have a good one, please by all means comment on this post below.
That is it for now hardware-wise. The above is for the base Personal A.I. Assistant. The whole Home Automation part of the system has not yet been fully decided upon. I will talk about that in a later post, when we get to it.
Next up is the discussion about the different software components.
Glad to see this series. I have a few devices in the house running MQTT that look hackable, maybe even with user-provided firmware. Currently they are hooked up to Alexa.
I’ve been looking at Mycroft and snips.ai as potentials for moving them from Alexa to in-my-house control, since the ASR going outside of the house really bothers me. However, ASR seems to be its own thing, somewhat separate from IFTTT. I have a few old projects lying around and was considering putting a GPU into a machine to use for DNN projects. I know you can put Alexa on a Raspberry Pi, so I might like to forward non-local requests off to Alexa.
Frankly, TensorFlow is a little intimidating to me.
The end plan is to have Mycroft on the RPi working seamlessly with a local MQTT broker and HASS. For the ASR, Mycroft now uses Mozilla DeepSpeech (I will do a proper post about Mycroft A.I. and its different software components a bit later); in the beginning, however, it was also using Google’s STT (although anonymized by stripping all location/person related aspects).
The good part of Mozilla DeepSpeech is that you can run it locally if you want. Right now, you need a heavy machine with a big-ass GPU to help you out. However, with all the SoCs coming out nowadays with NPUs onboard, I think it is only a matter of time before we can create our own local, embedded DeepSpeech server.
Thanks a million for posting this series, up until now I’ve been having an issue finding solid info.
Quick question: what speaker are you using in your setup? I’m in the process of buying parts for my hass.io and Mycroft project and didn’t see it listed in your series.
Thanks!
Hi Pat, at the moment I haven’t really focussed on the speaker part. All testing / development so far has been done with simple Logitech speakers connected to the 3.5 mm jack. However, the end result in my opinion is a Bluetooth solution via PulseAudio. Hopefully multiroom with presence detection.