Brooks McMillin

Infrastructure Security Engineer at Dropbox

I lead a team focused on AI agent security and LLM tooling. We build the frameworks engineers use to ship AI features safely.

This is the most interesting security problem I've worked on in a decade. Agents blur the line between code and user. The old playbook doesn't cover it. Getting the primitives right now shapes the next ten years of AI tooling, and I'm thrilled to be working on it.

About

I got into security through abuse investigations — reading other people's bad ideas at scale. It's an excellent education. You stop asking "is this system secure?" and start asking "what would I do if I wanted to break it on a Tuesday?"

A decade later, I lead a small team at Dropbox. We work on the messiest part of shipping software today: keeping AI features from quietly turning into attack surface.

My background is unusual for AI security. I came up through operational security, not ML research. That shapes what I'm trying to do for the teams I work with: help them ship AI features fast without staying up at night wondering what the agent is doing. The interesting question isn't "can we trust the model?" It's "what does the system look like when we assume we can't?"

A Coding Agent Read a File That Didn't Exist Five Times, Then Blamed the Tools

June 3, 2026

A Claude Code session confabulated a nonexistent Python file, persisted through five truthful "does not exist" errors, then blamed the failures on corrupted tool output. Includes a reconstruction from the raw transcript, a scan of 3,001 sessions testing whether the failure is worse in Opus 4.8, and a model-independent mitigation.

#ai-security #agents #claude-code #llm-failure-modes

Current Focus

AI Agent & Infrastructure Security

My team keeps AI agents from doing things they shouldn't. We give engineers the tools to ship AI features without creating new attack surface: sandboxes, permission systems, identity primitives, and runtime guardrails that work for autonomous, semi-autonomous, and copilot-style systems alike.

We work on a small set of problems and try to solve them well. The bet is that getting the primitives right (capability tokens, scoped sandboxes, audit trails that survive production) is more durable than trying to detect every prompt injection variant after the fact. Defense in depth, but for systems that are probabilistic by design.

What we work on

Sandboxing, permissions, and runtime controls for autonomous AI agents
LLM security tooling that fits into existing developer workflows
Threat modeling for MCP, multi-agent systems, and tool-use patterns in production
Identity, access control, and data protection for AI/ML infrastructure
Internal frameworks that make secure-by-default the easiest path for shipping AI features
Audit logging and incident response patterns for non-deterministic systems

Available for Consulting

Work with me

I work with teams shipping AI at scale. If you need help securing agents, hardening MCP servers, or building capability-aware frameworks, let's talk.

Typical engagements: security audits of agent and LLM systems, MCP server implementation and review, and agent hardening for production deployments.

How I Approach the Work

Most of my work is unsexy: sandboxing, permissions, identity. That's where trust actually happens. A few principles guide what my team builds. They've held up across roles at Facebook, American Airlines, and Dropbox.

Build, don't just review

Security teams that only gatekeep don't scale. The work isn't fun either. I'd rather ship a framework that makes the right thing easy than write a policy telling people not to do the wrong thing. We produce code, not slide decks.

Assume the model can be manipulated

Prompt injection is the main novel attack vector against LLMs. I don't think it's solvable. Every layer of the stack should assume an attacker can steer the model's intent. That assumption changes the design. Capability boundaries, tool interfaces, and just-in-time access all become non-optional.

Contain blast radius

Autonomous agents will do something unexpected. The question is what happens next. If I know exactly what an agent can touch and how to revoke it cleanly, I can hand off real work without losing sleep. Sandboxes, scoped credentials, and revocable tokens beat guardrails the model can talk its way around.
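One way to make "know exactly what an agent can touch and how to revoke it cleanly" concrete is a minimal sketch, assuming a simple in-memory store: opaque tokens bound to one agent, an explicit set of scopes, and an expiry, plus a per-agent kill switch. `Capability`, `CapabilityStore`, and the scope strings are hypothetical names for illustration, not a Dropbox or framework API.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical scoped grant: which agent, which actions, until when."""
    agent_id: str
    scopes: frozenset[str]
    expires_at: float


class CapabilityStore:
    """Hypothetical in-memory issuer; a real deployment would persist grants."""

    def __init__(self) -> None:
        self._grants: dict[str, Capability] = {}

    def issue(self, agent_id: str, scopes: set[str], ttl_seconds: float) -> str:
        # Opaque random token; the agent holds it but can't widen its scopes.
        token = secrets.token_urlsafe(32)
        self._grants[token] = Capability(
            agent_id, frozenset(scopes), time.time() + ttl_seconds
        )
        return token

    def allows(self, token: str, scope: str) -> bool:
        grant = self._grants.get(token)
        if grant is None:  # unknown or already revoked
            return False
        if time.time() >= grant.expires_at:  # expired: drop it and deny
            del self._grants[token]
            return False
        return scope in grant.scopes

    def revoke(self, token: str) -> None:
        self._grants.pop(token, None)

    def revoke_agent(self, agent_id: str) -> int:
        """Kill switch: drop every grant held by one agent; returns the count."""
        doomed = [t for t, g in self._grants.items() if g.agent_id == agent_id]
        for t in doomed:
            del self._grants[t]
        return len(doomed)


store = CapabilityStore()
tok = store.issue("triage-agent", {"tickets:read"}, ttl_seconds=900)
print(store.allows(tok, "tickets:read"))   # True
print(store.allows(tok, "tickets:write"))  # False: never granted
store.revoke_agent("triage-agent")
print(store.allows(tok, "tickets:read"))   # False: revoked
```

The point of the design is that the model never gets to argue with the check: whatever the agent was steered into wanting, `allows` only answers from the grant it was issued, and revocation is a single call.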

Raise the floor, don't slow the team

Secure-by-default tooling earns its keep when the easy path is the safe path. I optimize for "engineer ships AI feature without thinking about security," not "engineer fills out a security review form." The win is simple: the floor on a feature shipped at midnight is good enough that nobody loses sleep.

Featured Projects

MCP OAuth Framework

production

Python · OAuth 2.0 · Model Context Protocol · PKCE

An OAuth 2.0 framework for protecting MCP servers. Ships as three pip-installable packages: auth server, resource server, and a runnable example.

Authorization server with PKCE, dynamic client registration, and RFC 6749-compliant error responses
Resource server with RFC 7662 introspection and SSRF protection

TaskManager

production

Astro · Node.js · PostgreSQL · OAuth 2.0

A task manager built around a real OAuth 2.0 auth server. Includes a Python SDK and MCP server, so my AI agents can manage tasks too.

Full OAuth 2.0 authorization server with PKCE support
Security testing suite with Vitest

SMS Communications Suite

production

Go · Python · GSM · Serial Communication

Send and receive SMS through GSM modems. Includes CLI tools and libraries in both Go and Python.

Cross-platform GSM modem interface (Go + Python)
Interactive CLI chat interface

ReMarkable Research Toolkit

production

Python · Go · rmapi · Anthropic Claude

Tools for managing research papers on reMarkable tablets. Uses AI to classify and sort them automatically.

AI-powered research paper classification
Zero-config rmapi binary management

Recent Appearances

Confused Deputies and Stolen Tokens: Breaking and Rebuilding MCP Auth

podcast

Off By One Security · May 2026

Joining Stephen Sims on Off By One Security to walk through the attack surface of a typical MCP deployment, then wire in the OAuth defenses one layer at a time using mcp-authflow.

Poisoning the Safety Net

talk

CackalackyCon · May 2026 · Durham, NC

Attacking AI code review pipelines. How the defensive layers around LLM-generated code (context files, AI reviewers, agent runtimes) become their own attack surface, and what defenses actually hold in 2026.

Building Secure Agentic Systems: Lessons from Daily-Driver Agents

talk

[un]prompted · March 2026 · San Francisco, CA

Real lessons from building specialized agents on shared infrastructure. Covers capability bounding, prompt injection detection, memory isolation, and OAuth device flow.

LLMs Will Never Be Fully Secure

podcast

The Secure Disclosure Podcast · March 2026 · Mesa, AZ

A discussion of malicious MCP servers, common AI security mistakes, why prompt injection sticks around, and how to deploy AI safely.

Let's Connect

If you're working on AI security, thinking about agent sandboxing, or just want to compare notes on what's actually breaking in production — I'd love to chat. Speaking invitations, collaborations, half-formed ideas, or a quick "have you seen this?" are all genuinely welcome. The inbox is open.

hello@brooksmcmillin.com GitHub LinkedIn Bluesky