Wednesday, May 16, 2018

Hacking Alexa, Siri and Google Assistant With Hidden Voice Commands

A lot of people have written about how getting one of these voice-activated digital assistants is voluntarily bugging yourself.  The reactions have varied widely, but for the software to recognize when you call it ("OK, Google"...), it has to be listening at all times.  That's a deliberate design decision.  Most people who read here, at least, would be aware that they were installing a full-time listening device in their homes.  Some consider that an invasion of privacy and don't want these things; others ignore it for the perceived benefits of the digital assistant.

The New York Times tech blog reports that researchers at a couple of institutions have been able to secretly activate the systems on smartphones and smart speakers, simply by playing music over the radio with commands hidden in it that human listeners can't detect.
A group of students from University of California, Berkeley, and Georgetown University showed in 2016 that they could hide commands in white noise played over loudspeakers and through YouTube videos to get smart devices to turn on airplane mode or open a website.

This month, some of those Berkeley researchers published a research paper that went further, saying they could embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.

“We wanted to see if we could make it even more stealthy,” said Nicholas Carlini, a fifth-year Ph.D. student in computer security at U.C. Berkeley and one of the paper’s authors.
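What Carlini's group is doing is an instance of adversarial examples against speech recognition: find a small perturbation to the audio such that the recognizer transcribes the attacker's phrase, while the change stays below what a human notices.  Here's a toy numpy illustration of the core geometry, using a made-up linear "recognizer" in place of a real speech model (everything here is hypothetical for illustration; the actual paper attacks a deep network using gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a speech recognizer: a linear score over
# a 1000-dimensional audio feature vector.  (The real attack targets
# a deep ASR model; this toy just shows the geometry of the trick.)
w = rng.normal(size=1000)

def recognizes_command(x):
    """Pretend the device 'hears a command' when the score is positive."""
    return x @ w > 0

# "Benign" audio features, flipped if needed so the recognizer hears nothing.
x = rng.normal(size=1000) * 0.1
if recognizes_command(x):
    x = -x

# Smallest step along w that flips the decision -- the adversarial
# perturbation.  For a deep model this step is found by gradient descent
# rather than computed in closed form.
score = x @ w
delta = ((-score + 1e-3) / (w @ w)) * w
x_adv = x + delta
```

The perturbation `delta` is tiny relative to the audio itself, which is the whole point: in the real attack it's additionally shaped to hide under the music, so a listener hears nothing unusual while the recognizer hears a command.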
In a way, this isn't much of a surprise, right?  They're taking advantage of the "always-on, always listening" nature and trying to see just what the algorithms can extract from the other sounds.  I'd think the designers would do this themselves.  Further, hijacking these things is nothing new.  Remember when Burger King grabbed headlines with an online ad that asked, "O.K., Google, what is the Whopper burger?"  It caused Android devices with voice-enabled search to read the Whopper's Wikipedia page aloud.  The ad was canceled after viewers started editing the Wikipedia page to make it more ... let's say comical.  Not long after that, South Park followed up with an entire episode built around voice commands that caused viewers' voice-recognition assistants to spew adolescent obscenities.

A research firm has said that devices like Alexa, Siri and Google Assistant will outnumber humans by 2021, adding that more than half of American homes will have a smart speaker by then, just three years away. 

These security researchers aren't leaving bad enough alone. 
Last year, researchers at Princeton University and China’s Zhejiang University demonstrated that voice-recognition systems could be activated by using frequencies inaudible to the human ear. The attack first muted the phone so the owner wouldn’t hear the system’s responses, either.

The technique, which the Chinese researchers called DolphinAttack, can instruct smart devices to visit malicious websites, initiate phone calls, take a picture or send text messages. While DolphinAttack has its limitations — the transmitter must be close to the receiving device — experts warned that more powerful ultrasonic systems were possible.
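The trick behind DolphinAttack is amplitude modulation: the voice command is modulated onto an ultrasonic carrier, so all of the transmitted energy sits above human hearing, and the nonlinearity of the device's microphone demodulates the envelope back down into the audible band.  A minimal sketch of the modulation step in numpy (the 25 kHz carrier and the 400 Hz stand-in "command" tone are illustrative choices on my part, not values from the paper):

```python
import numpy as np

fs = 192_000          # sample rate high enough to represent an ultrasonic carrier
fc = 25_000           # carrier frequency, above human hearing (~20 kHz)
t = np.arange(0, 1.0, 1.0 / fs)

# Stand-in for a recorded voice command (the baseband signal);
# here just a 400 Hz tone so the spectrum is easy to inspect.
command = 0.5 * np.sin(2 * np.pi * 400 * t)

# Amplitude-modulate the command onto the ultrasonic carrier.  The result
# has energy only at the carrier and its sidebands (fc +/- 400 Hz) --
# nothing in the audible band.
modulated = (1.0 + command) * np.sin(2 * np.pi * fc * t)
```

That's why a person standing next to the transmitter hears silence: every component of `modulated` is ultrasonic.  It's the microphone's nonlinear response that recreates the 400 Hz envelope on the receiving end, and the recognition software happily takes it from there.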

That warning was borne out in April, when researchers at the University of Illinois at Urbana-Champaign demonstrated ultrasound attacks from 25 feet away. While the commands couldn’t penetrate walls, they could control smart devices through open windows from outside a building.

This year, another group of Chinese and American researchers from China’s Academy of Sciences and other institutions, demonstrated they could control voice-activated devices with commands embedded in songs that can be broadcast over the radio or played on services like YouTube.
Security researchers have a habit of saying that releasing information like this isn't bad, because they think the bad guys have either thought of it already or would think of it on their own.  Maybe, although sometimes just knowing something is possible can keep an experimenter going during the inevitable stretches when things just don't seem to be working.  The article does say these exploits haven't been found "in the wild," but as more people become aware of the possibility, I'd expect them to start showing up.

Hopefully, the research being revealed will get the companies selling this software to try to get ahead of the attacks and make their devices more robust.  My own situation: I have an older iPhone (6s) with Siri.  It's possible to configure the phone to listen all the time, so that when you say, "Hey, Siri" it answers.  I have that turned off, and have read that Siri does not actually send data back when it's disabled.

I'm going to close with one of the last paragraphs in the article, because it contains the very best phrase in the whole piece. 
“Companies have to ensure user-friendliness of their devices, because that’s their major selling point,” said Tavish Vaidya, a researcher at Georgetown. He wrote one of the first papers on audio attacks, which he titled “Cocaine Noodles” because devices interpreted the phrase “cocaine noodles” as “O.K., Google.”
For some reason, it reminds me of this meme:


  1. We received one of those devices for Christmas a few years ago. It's still in the box.

    I also took great pains to disable the WiFi in the new TV we bought for the living room. Samsung has no business looking at what we watch.

  2. Question for Silicon Graybeard: Why wouldn't Alexa violate wire tapping laws?

    1. The FBI wouldn't do anything illegal using Alexa... I'm sure that they'd get a FISA warrant because that process is so clean and lawful.

    2. There's probably a disclaimer in the user's agreement that says you acknowledge audio can be going back someplace else at all times, "to improve our service" or something like that. You know; those long EULAs that nobody reads.

      You agreed to being listened to.

    3. Well, I'm not sure about that. A few years back we had an auto accident. I took pictures and was holding my small camera when the cop was talking to the other guy. I actually had the video mode turned on, and heard the cop ask him if he had been using medication, and heard the response. So I had him! BUT, in my state I can't record someone without their permission. I had the proof of DUI but couldn't use it legally. So in the case of Alexa, "I" may agree to be recorded, but not everyone in my house did. It's the same situation: Alexa is recording everyone, including those who are unaware of it. IMHO either you can legally record people without their knowledge or you cannot. There is no gray area.

  3. Security researchers have a habit of saying that releasing information like this isn't bad because they think the bad guys have either thought of it already, or they would think of it on their own.

    I read about synthesis of audio frequencies from ultrasonic on a music speaker mailing list 20 years ago. In general, the bad guys already know about the holes.

    1. In general, the bad guys already know about the holes.

      "Yes but". You can be sure that the NSA and high level professionals know it. You probably know the term "script kiddies": kids that don't really know much about security and how stuff works, but find pieces of code that can run various exploits. They'll learn it/adopt it when attacks become available.

      Most criminals don't have a great work ethic.

    2. I reject the "attractive nuisance" idea. 13 year olds who run attack scripts deserve a huge law enforcement and parental response, which will reform most children and they will grow up. 17+ year olds who run attack scripts deserve to be in prison until there is reason to believe they won't do it again.

      Most criminals don't have a great work ethic.

      But some criminals are organized, and they cause the bulk of the damage. There is a "short term win/long term lose" dynamic here. The only long term win is to close the holes which are cheap to operate; crime must be made unprofitable. The only way to get the holes closed against the economic forces which reward vendors lying appears to be to produce kiddy scripts, which are so cheap to use as to be a public relations disaster. Otherwise you get the locksmiths hiding a master keying system vulnerability for 100 years.

      Self-driving automobile's radio: "And now the new hit single from band Tesla, _Emergency Stop_"

  4. The people who sell these egregious voice controlled hazards are sociopaths, and the people who purchase them want to live in a fantasy world. These groups are codependent; don't get squished between them.