A group of students from the University of California, Berkeley, and Georgetown University showed in 2016 that they could hide commands in white noise played over loudspeakers and through YouTube videos to get smart devices to turn on airplane mode or open a website.

In a way, this isn't much of a surprise, right? They're taking advantage of the "always-on, always-listening" nature of these devices and seeing just what the algorithms can extract from the surrounding sounds. I'd have thought the designers would do this themselves. Further, hijacking these things is nothing new. Remember when Burger King grabbed headlines with an online ad that asked “O.K., Google, what is the Whopper burger?” It caused Android devices with voice-enabled search to read the Whopper’s Wikipedia page aloud. The ad was canceled after viewers started editing the Wikipedia page to make it more ... let's say comical. Not long after that, South Park followed up with an entire episode built around voice commands that caused viewers’ voice-recognition assistants to spew adolescent obscenities.
This month, some of those Berkeley researchers published a research paper that went further, saying they could embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.
“We wanted to see if we could make it even more stealthy,” said Nicholas Carlini, a fifth-year Ph.D. student in computer security at U.C. Berkeley and one of the paper’s authors.
A research firm has said that devices like Alexa, Siri and Google Assistant will outnumber humans by 2021, adding that more than half of American homes will have a smart speaker by then, just three years away.
These security researchers aren't leaving bad enough alone.
Last year, researchers at Princeton University and China’s Zhejiang University demonstrated that voice-recognition systems could be activated by using frequencies inaudible to the human ear. The attack first muted the phone so the owner wouldn’t hear the system’s responses, either.

Security researchers have a habit of saying that releasing information like this isn't harmful, because they figure the bad guys have either thought of it already or would think of it on their own. Maybe, although sometimes just knowing something is possible can keep an experimenter going through the inevitable stretches when things just don't seem to be working. The article does say these exploits haven't been found "in the wild," but as more people become aware of the possibility, I'd expect them to start showing up.
The technique, which the Chinese researchers called DolphinAttack, can instruct smart devices to visit malicious websites, initiate phone calls, take a picture or send text messages. While DolphinAttack has its limitations — the transmitter must be close to the receiving device — experts warned that more powerful ultrasonic systems were possible.
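The core trick behind this kind of attack, as described, is to amplitude-modulate a voice command onto an ultrasonic carrier; nonlinearity in the microphone hardware then demodulates it back into the audible band, where the recognizer can hear it even though a human can't. Here's a minimal numpy sketch of that idea, not a working attack: the 400 Hz tone standing in for a voice command, the 30 kHz carrier, and the crude squared-term model of microphone nonlinearity are all my simplifying assumptions.

```python
import numpy as np

FS = 192_000          # sample rate high enough to represent an ultrasonic carrier
CARRIER_HZ = 30_000   # above human hearing (~20 kHz), within some mics' response
DURATION_S = 1.0

t = np.arange(int(FS * DURATION_S)) / FS

# Stand-in for a recorded voice command: a simple 400 Hz tone.
command = np.sin(2 * np.pi * 400 * t)

# Amplitude-modulate the "command" onto the ultrasonic carrier.
carrier = np.sin(2 * np.pi * CARRIER_HZ * t)
modulated = (1 + 0.9 * command) * carrier

# Crude model of microphone nonlinearity: a small squared term.
# Squaring the AM signal recreates baseband energy at 400 Hz.
received = modulated + 0.1 * modulated**2

# The mic's own low-pass behavior (modeled here as a moving average)
# strips the carrier, leaving an audible copy of the command.
kernel = np.ones(64) / 64
demodulated = np.convolve(received, kernel, mode="same")
```

In the demodulated signal, the 400 Hz "command" reappears while the 30 kHz carrier is filtered away, which is the whole point: the speaker emits nothing a person can hear, but the device's microphone recovers a normal-sounding instruction.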
That warning was borne out in April, when researchers at the University of Illinois at Urbana-Champaign demonstrated ultrasound attacks from 25 feet away. While the commands couldn’t penetrate walls, they could control smart devices through open windows from outside a building.
This year, another group of Chinese and American researchers, from China’s Academy of Sciences and other institutions, demonstrated that they could control voice-activated devices with commands embedded in songs that can be broadcast over the radio or played on services like YouTube.
I hope the research being revealed will push the companies selling these devices to get ahead of attackers and make their products more robust. My own approach: I have an older iPhone (6s) with Siri. It's possible to configure the phone to listen all the time, so that when you say, "Hey, Siri," it answers. I have that turned off, and I've read that Siri doesn't actually send data back when the feature is disabled.
I'm going to close with one of the last paragraphs in the article, because it contains the very best phrase in the whole piece.
“Companies have to ensure user-friendliness of their devices, because that’s their major selling point,” said Tavish Vaidya, a researcher at Georgetown. He wrote one of the first papers on audio attacks, which he titled “Cocaine Noodles” because devices interpreted the phrase “cocaine noodles” as “O.K., Google.”

For some reason, it reminds me of this meme: