Skip links

Block CISO: We red-teamed our own AI agent to run an infostealer on an employee laptop

interview When it comes to security, AI agents are like self-driving cars, according to Block Chief Information Security Officer James Nettesheim.

“It’s not enough for self-driving cars to be just as good as humans,” Nettesheim said in an exclusive interview with The Register. “They have to be safer and better than humans – and provably so. We need that with our agentic use, too.”

The parent company of Square, Cash App, and Afterpay is pushing hard to position itself as an AI leader, co-designing the Model Context Protocol (MCP) with Anthropic and using MCP to build Goose, its open source AI agent that’s used by almost all Block’s 12,000 employees and connects to all of the company’s systems including Google accounts and Square payments.

A year ago, the company open sourced Goose.

As CISO, it’s Nettesheim’s job to ensure that Goose and all of Block’s AI-based systems are designed to be deployed securely, and at enterprise scale – which sounds a little bit terrifying.

“Being CISO is very much about being okay with ambiguity and being uncomfortable in situations,” Nettesheim said. “We are balancing risk constantly, and having to make trade off – in the AI space in particular. Like: What is a bigger risk right now? Not taking advantage of the technology enough? Or the security downsides of it? LLMs and agents are introducing a new, very rapidly evolving space.”

Least-privilege access

However, he adds, humans are just as capable as machines at introducing security risks into corporate environments.

“Software engineers also download and execute things they shouldn’t,” Nettesheim said. “Users do that regularly. We write bugs in our code to where it doesn’t execute. So we really just have to apply a lot of the principles we already have about making sure these agents are executing with least privilege, just like I want my software engineers to be doing.”

What is a bigger risk right now? Not taking advantage of the technology enough? Or the security downsides of it?

In other words: applying least-privilege access to humans and machines. Block employees should only have access to data they need to do their jobs – same with the company’s AI agents, he explained. Customer data should only be retained as long as it’s needed for a specific purpose, and this same rule should apply to AI agents

“There are times where it’s a risky area, and we need to dive a little deeper and make sure we’re protecting” certain data and systems, Nettesheim said, using the example of an agent querying data on a user’s behalf. For example, if a user asks Goose to provide information about their store, or their account, it’s vital that only information about that particular user is accessed and returned. 

To this end, Block uses penetration testing and other offensive security measures to identify how attackers could abuse its AI agent, and then find ways to fix the issue.

In a blog shared exclusively with The Register, Block described how it red-teamed Goose, and in one case successfully used a prompt injection attack to infect an employee’s laptop with information-stealing malware.

Prompt injection happens when a prompt is manipulated to include malicious instructions that the AI carries out, either through direct text input or indirect, hidden commands embedded in content that may be invisible to the user.

Poisoning a goose recipe

No one has solved this security snafu. “With our internal usage, we have to assume that prompt injection is possible,” Nettesheim said.

Goose uses “recipes” ​​- reusable workflows that can be shared with other users – and the red team realized that the portable nature of these workflows could allow an attacker to poison a recipe.

“One of the hiccups we ran into is that it’s possible to hide what that recipe is actually doing, tricking both the user and the agent into performing actions,” Nettesheim said.

Specifically: the team used a combo of phishing – direct emails to the goose development team about a purported “bug” in the system – and prompt injection, with the payload hidden in invisible Unicode characters.

In the process of “debugging” the workflow, the developer clicked the poisoned recipe, which downloaded and ran the infostealer.

The development team has since added features and built new prompt-injection mechanisms into Goose, both for internal Block use and also in the open source version. 

This included a “recipe install warning” that alerts users before they execute a new recipe and warns them only to proceed if they trust the source, thus adding transparency to the instructions.

Plus, Goose now displays desktop alerts when a recipe contains suspicious Unicode characters and detects and removes invisible Unicode characters inserted into strings. These can be used to hide malicious commands within the text, causing the AI to execute hidden instructions.

Adversarial AI

Additionally, Block is working on new ways to prevent prompt injection; some of these, like improved detection, have already been integrated into Goose. It’s experimenting with others including adversarial AI – using an AI to attack its own AI systems and agents, as well as validating inputs and monitoring outputs.

“Let’s use another LLM or another agent to check the content of the prompt and tell us if it thinks it’s not good or bad, and then warn the user that it’s bad,” Nettesheim explained. 

Block is still testing this internally, and hasn’t yet merged it with open source Goose as it irons out kinks in speed – it takes longer to send an agent’s input and output to another LLM for security checks – and quality to ensure the AI isn’t sending fake, or too many, security alerts that overwhelm analysts and prevent them from focusing attention on real threats.

But Nettesheim remains confident that adversarial AI will soon become another tool in security teams’ arsenals. “It’s gaining traction in the research community, and I think we are helping lead the way,” he said. “If you have two agents that are competing or collaborating with each other, it leads to better code generation.”

As with humans, two minds are better than one. ®

Source