How Amazon red-teamed Alexa+ to keep your kids from ordering 50 pizzas

RSAC If Amazon’s Alexa+ works as intended, it could show how an AI assistant helps with everyday tasks like making dinner reservations or arranging an oven repair. Or things could go terribly wrong: it might fire up the oven and turn dinner plans into a house fire.

This is why the e-commerce giant brought in security engineers, including both red teams and penetration testers, to work alongside product developers from the beginning, according to Amazon CISO Amy Herzog.

Their job was to anticipate what could go wrong and ensure safety and security guardrails were in place to prevent Alexa+ from jumping the track.

“It’s funny how, having been in both seats, the product engineer thinks about making the intended thing work, and the security engineer thinks about all the ways that you can game that system,” Herzog told The Register on the outskirts of RSA Conference in San Francisco this week.

“Whenever you’re talking about a system that can take actions on behalf of someone our immediate [reaction is]: Wouldn’t it be good if, like me, as someone who’s running this household could just say, This is what I need to go shopping for. These are the dinner reservations I need to make. Go make that happen. Schedule a delivery window,” she said.

“And then my kid comes into the kitchen and says, and also 50 pepperoni pizzas for me and my friends.”

The product engineer thinks about making the intended thing work, and the security engineer thinks about all the ways that you can game that system

While the developers tend to focus on the product’s potential — this is what they could build, and here are all the amazing, new things it could do — it’s the red team’s job to burst that bubble, or at least point out what could go wrong to ensure that systems are isolated and security mechanisms are put in place to prevent unintended or malicious consequences.

“So having both of those in the same design meeting is really beneficial, because then even the product engineer is like, ‘Oh yeah, you’re right. I would totally order pizzas.’ How do we make sure the system can handle that kind of thing?”

Herzog is one of four chief information security officers (CISOs) across the e-commerce giant, and she’s responsible for infosec across ads and devices — so Alexa and the next-gen Alexa+ fall under her purview.

The personal assistant, available only to Amazon-approved early testers at this point, is built on top of Amazon’s LLMs, and the company claims it can orchestrate actions across tens of thousands of services and devices to do things like control smart-home products, order groceries, play music, and make dinner reservations while remembering friends and family members’ dietary restrictions and restaurant preferences.

Amazon says it can also interact with third-party AI agents on behalf of users. In an example used by Herzog, this means that if your oven breaks, Alexa+ can go online, use a system like Thumbtack to find a repairperson in your area, schedule the repair, and then tell you when it’s fixed.

“And this kind of a product has, as you might imagine, a number of unique security considerations,” Herzog said. “All the same attacks are still there. In that sense, not much has changed. But since these things are non-deterministic, you have to really build in ways to understand its behavior in a different way than if you’re working with a deterministic system.”

In this case, non-deterministic means you can give Alexa+ (or any other AI assistant) the exact same input, or query, and it will spit out a slightly different output every time.

This can lead to prompt injection, where mischief-makers or attackers create malicious inputs to trick the LLM into overriding its safety guardrails and doing things it is not supposed to do, or even just combining a series of inputs in a manner that causes the LLM to behave in unintended ways.

Plus, Alexa+ talks to a lot of other apps and services, and that requires a ton of interaction with APIs to send and retrieve data, execute commands, and perform other actions to do the things that people want an AI assistant to do.

“We did a lot of testing on pathways between those APIs,” Herzog said, adding that the API pathways to turn your house lights on and off are different from those routes required to text your kid or make a dinner reservation. “Which ones should be grouped with which other ones? How do we expect different actions to be taken together?”

This approach of bringing the offensive security teams together with product developers from the start of the process is “somewhat unusual,” Herzog noted. “Usually, my red teamers like a system to be done before we let them loose on it.” ®

Source

How Amazon red-teamed Alexa+ to keep your kids from ordering 50 pizzas

Sofia

Geneva