I’ve been in IT for some time; in fact, I’ve worked with people who are younger than the number of years I’ve been in IT. I remember the dawn of the Personal Computer, when a PC didn’t just mean an IBM clone or its later siblings, but could mean any one of dozens of different and totally incompatible computers.
Somewhere back then, people started working on the idea of actually talking to a computer and having it do what you asked. Early speech-recognition systems meant hours of training both the user and the software to ‘converse’ with one another. Speech synthesis, so the computer could ‘talk’ back, was even further down the line.
Trouble is, we humans are still way better at communicating and at understanding dialects and accents. For the last few years I’ve been an Android phone user, and I currently use a Sony phone. I also own and use an iPad and a PC – so I can actually use, talk to and shout in frustration at the three main culprits.
Let’s start by saying that I’ve not done ANY pre-training of the systems to recognize my voice. In all of the above cases I used a simple “Text Nesta” command, hoping to send a text to a Welsh friend, followed, if it got that far, by dictation of the actual text.
Google’s voice recognition is built into Android and works by sending your speech to a bunch of servers in the cloud, which then analyse and translate it. They do this partly using linguistic rules, but mostly through engineering and computational analysis against impossibly huge stashes of previously gathered voice snippets from other users and recordings.
The result is quite good. It will happily send a text for me to the right person.
Siri, on the other hand, doesn’t like me… I’ve spent ages trying to get it to do the odd command, and after the last iOS update, when I had the chance, I disabled Siri.
Cortana on my PC was even more frustrating. Given that my PC is a workstation with twin hex-core processors, 48GB of RAM and a 100Mb-plus connection to the web, I was expecting it to at least equal Google’s system.
Now, on the other hand, I’ve seen people get different results with Cortana on their phones, so it could be down to an issue with the Windows 10 implementation, or the HP hardware I’m using, or the Logitech desktop mike…
You might think that it’s unfair to test with one simple command to send an SMS to the right person and leave it at that. After all, my phone had all my contacts in it. But if I typed the command into Cortana’s command line it worked, with my phone sending the text (my phone was paired with my PC over Bluetooth). Both Siri and Cortana fared better with a different contact name, but for the time being only my phone could also manage to set a timer, book an appointment, give me the weather, or route me home. It was, after all, hardly a scientific test. But in this case that’s the point: speech recognition is meant to be just that, not hours of training and carefully phrased words to get it to work.
It’s not all sunshine on the Android phone, however. Asking it to “Play something by the Rolling Stones” ended up with it playing “Like a Rolling Stone” by Bob Dylan… Oh well!
It would seem that Artificial Intelligence and HAL are still some way off, but if my name were Dave, I’m quite sure that, at the moment, “I’m sorry, Dave. I’m afraid I can’t do that” would be the default reply a lot of the time.