Introduction

LLMs have become capable coders, but they can do unintuitive things in the name of achieving their goals. They will happily remove your authentication middleware if it helps them complete the task faster. They’ll downgrade HTTPS to HTTP if it simplifies the config. They’ll paste your API keys directly into GitHub Actions if you don’t stop them.

These aren’t hypotheticals. In the past month, I’ve caught Claude attempting to:

  • Store OAuth tokens in plaintext JSON files
  • Remove CSRF protection because “it was causing test failures”
  • Use eval() to parse user-provided configuration
  • Commit a .env file with production database credentials
  • Replace test PostgreSQL credentials with my actual production admin credentials to get a test working; the line already had # pragma: allowlist secret appended, so the secrets pre-commit hook let the change through
  • Truncate a 400-line Svelte component down to 50 lines when attempting a targeted edit, losing authentication logic in the process

LLMs are powerful coding assistants, but they lack the paranoia that keeps production systems secure. They optimize for task completion, not operational safety. The solution (thankfully) isn’t to stop using them; it’s to build safety nets that catch dangerous code before it ships.

This post covers the defensive layers I use across my projects: pre-commit hooks that block insecure patterns, automated review agents that catch context-blind mistakes, and CI workflows designed to fail loudly when LLMs generate problematic code. These patterns come from my TaskManager and agents repos where LLM-generated code meets production security requirements. Many of these same patterns apply to securing MCP servers and OAuth implementations, a topic I’ll be covering in more depth at CactusCon.

The Multi-Layer Defense Strategy

Traditional security assumes humans write code and occasionally make mistakes. LLM-assisted development inverts this: the baseline is fast, confident, and potentially catastrophic. Security becomes about layers that verify rather than trust.

The model is simple:

| Layer | What it catches | When it runs | Can be bypassed? |
|---|---|---|---|
| Pre-commit hooks | Patterns, secrets, type errors | Before commit | Yes (--no-verify) |
| Review agents | Logic errors, context mismatches | On PR creation | Yes (ignore comments) |
| CI workflows | Everything above + integration | Before merge | No (if set up to block merge) |

Each layer catches different failure modes. Together, they create defense in depth.

Layer 1: Pre-commit Hooks (The First Line)

Pre-commit hooks run on your local machine before code even reaches git. This is your first opportunity to catch LLM hallucinations, insecure patterns, and configuration mistakes.

Here’s a sample of the pattern I use across Python projects (from taskmanager/.pre-commit-config.yaml):

repos:
  # Secret detection - catches leaked credentials
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

  # Python linting and formatting (ruff replaces black, isort, flake8)
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.6
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

  # Type checking - verifies type hints
  - repo: local
    hooks:
      - id: pyright
        name: pyright
        entry: uv run pyright
        language: system
        types: [python]
        pass_filenames: false

  # Security scanning
  - repo: https://github.com/PyCQA/bandit
    rev: 1.9.2
    hooks:
      - id: bandit
        args: ['-c', 'pyproject.toml']

What this catches (illustrated with a snippet after this list):

  • Secret leaks: LLMs sometimes paste API keys directly into code. detect-secrets blocks commits containing credential patterns.
  • Type inconsistencies: LLMs can be sloppy with types. pyright enforces strict typing.
  • Security anti-patterns: bandit flags insecure crypto, SQL injection risks, and hardcoded passwords.
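
To make that concrete, here’s a hypothetical snippet—not from the actual repos—containing exactly the kinds of lines these hooks are built to reject:

# hypothetical_settings.py - illustrative only

import subprocess

# detect-secrets: keyword + high-entropy value patterns like this block the commit
# (the value below is AWS's documented example key, not a real credential)
AWS_SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"


def load_config(raw: str) -> dict:
    # bandit B307: use of eval() on user-provided input is flagged as a code-injection risk
    return eval(raw)


def run_backup(path: str) -> None:
    # bandit B602: subprocess call with shell=True is flagged as a command-injection risk
    subprocess.run(f"tar czf backup.tgz {path}", shell=True)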

Running Tests in Pre-commit

The agents repo runs tests in pre-commit (agents/.pre-commit-config.yaml):

- repo: local
  hooks:
    - id: pytest
      name: pytest
      entry: uv run pytest
      language: system
      types: [python]
      pass_filenames: false
      stages: [pre-commit]

This is aggressive but effective. If an LLM change breaks tests, you know immediately before the commit happens.

Trade-off: For large test suites, this can be slow. My approach (sketched after this list):

  • Pre-commit: Fast unit tests only (< 30 seconds)
  • CI: Full test suite including integration tests
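
A sketch of how that split can work, assuming you register a slow marker in your pytest config and change the pre-commit hook’s entry to uv run pytest -m "not slow" (the test names here are invented):

import time

import pytest


def test_slugify_title() -> None:
    # Fast unit test: runs on every commit and again in CI.
    assert "Buy Milk".lower().replace(" ", "-") == "buy-milk"


@pytest.mark.slow
def test_end_to_end_sync_against_local_postgres() -> None:
    # Marked slow: excluded by the pre-commit hook (pytest -m "not slow"),
    # but still exercised by the full suite in CI.
    time.sleep(2)  # stand-in for a real integration round-trip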

Layer 2: Code Review Agents (Context Checking)

Pre-commit hooks are great at catching bad patterns, but review agents can analyze the whole system and catch logical inconsistencies. LLMs are excellent at spotting code smells, inconsistent patterns, and architectural violations—but only if you give them the right prompt and the right scope.

Local Agents

My most valuable agent checks run locally where they have access to extensive tooling for reviewing the codebase. I currently have four main agents in my claude-code-agents repo that I can fire off in parallel to review code optimization, test coverage, dependency management, or security concerns.

Each one is optimized to have specific goals and avoid polluting their context windows with unrelated concerns. They don’t make changes directly—the main Claude Code instance summarizes all the agents’ findings and prioritizes what should be fixed. Here’s what the agents found when I ran them against my agents repo:

[Screenshot: combined findings from the four review agents run against the agents repo]

These checks can be token-intensive. The run from the screenshot above used over 250k tokens, so it’s worth being intentional about which agents to run and when, especially if you’re watching costs. I typically don’t run these for every PR—more like once a week or when the codebase feels like it’s getting messy.

Why this works:

  1. Read-only analysis: The agents make heavy use of tools (roughly 250 tool calls combined in the example above), but they don’t make any changes. They provide recommendations that you can have Claude fix later, or address yourself.
  2. Repository context: The CLAUDE.md file provides architecture patterns, security requirements, and coding standards.
  3. Specialization: LLM performance declines as context gets polluted with unrelated concerns. Having each agent specialize in one aspect lets them do more extensive work.
  4. Human-in-the-loop: The agents report findings; a human decides what to act on.

Agents in CI

Here’s how I use Claude Code in GitHub Actions to review every PR (taskmanager/.github/workflows/claude-code-review.yml).

One particularly valuable catch: the CI review agent regularly finds cases where my local Claude Code session emptied or truncated files, losing significant amounts of code. This happens more often than you’d expect. The main causes are full file rewrites (where Claude regenerates an entire file from memory instead of making a targeted edit), output token limits (hitting the max mid-stream leaves you with a truncated file), and context pollution (longer sessions make it harder to accurately reproduce file contents).

Large files, multiple files modified in the same session, and complex multi-step changes all make truncation more likely. The mitigation is straightforward: commit frequently, keep files modular, and start fresh sessions for risky operations. Having a CI agent with clean context sanity-check every PR has caught many truncation issues that would have otherwise reached main and caused runtime failures.
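
As a complementary, purely mechanical guard—an idea, not something pulled from these repos—a short script in pre-commit or CI can flag commits where a tracked file suddenly loses most of its lines:

#!/usr/bin/env python3
"""Fail when a staged change deletes most of a file's lines (a common truncation symptom)."""
import subprocess
import sys

SHRINK_THRESHOLD = 0.5  # flag files losing more than half their lines
MIN_LINES = 100         # ignore small files, which legitimately churn


def lines_at_head(path: str) -> int:
    """Line count of the file in the last commit (0 if the file is new)."""
    result = subprocess.run(["git", "show", f"HEAD:{path}"], capture_output=True, text=True)
    return len(result.stdout.splitlines()) if result.returncode == 0 else 0


diff = subprocess.run(
    ["git", "diff", "--cached", "--numstat"], capture_output=True, text=True, check=True
).stdout

suspicious = []
for entry in diff.splitlines():
    added, deleted, path = entry.split("\t", 2)
    if deleted == "-":  # binary file, no line counts
        continue
    before = lines_at_head(path)
    if before >= MIN_LINES and int(deleted) / before > SHRINK_THRESHOLD:
        suspicious.append(f"  {path}: {before} lines before, {deleted} deleted")

if suspicious:
    print("Possible truncation detected:\n" + "\n".join(suspicious))
    sys.exit(1)

A legitimate refactor will trip this too, so treat it as a prompt to look, not a hard rule. The review workflow itself: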

name: Claude Code Review

on:
  pull_request:
    types: [opened]

jobs:
  claude-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
      issues: read
      id-token: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - name: Run Claude Code Review
        uses: anthropics/claude-code-action@v1
        with:
          claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
          prompt: |
            REPO: ${{ github.repository }}
            PR NUMBER: ${{ github.event.pull_request.number }}

            Please review this pull request and provide feedback on:
            - Code quality and best practices
            - Potential bugs or issues
            - Performance considerations
            - Security concerns
            - Test coverage

            Use the repository's CLAUDE.md for guidance on style and conventions.

            Use `gh pr comment` with your Bash tool to leave your review as a comment on the PR.

Why this works:

  1. Limited tool access: The agent can only use gh commands (view PRs, leave comments). It can’t push code or modify workflows.
  2. Repository context: The CLAUDE.md file provides architecture patterns, security requirements, and coding standards.
  3. Human-in-the-loop: The agent comments on PRs but doesn’t auto-merge. A human makes the final call.

Security-Focused Review Agent

I run a second review agent specifically for security (taskmanager/.github/workflows/claude-security-review.yml):

prompt: |
  Please perform a security-focused review of this pull request. Analyze the changes for:

  **Security Concerns:**
  - Authentication and authorization vulnerabilities
  - Injection attacks (SQL, command, code injection)
  - Sensitive data exposure (credentials, API keys, PII)
  - Insecure dependencies or imports
  - Path traversal or file access issues
  - Improper input validation or sanitization
  - Cryptographic weaknesses
  - OAuth/token handling issues
  - Race conditions or TOCTOU vulnerabilities

  **Code Quality (security-relevant):**
  - Error handling that might leak sensitive info
  - Logging that might expose secrets
  - Unsafe deserialization
  - Resource exhaustion possibilities

What the Security Agent Actually Catches

Here are three real examples from my repos where the security agent caught issues that static analysis would miss.

1. Open Redirect Bypass (PR #160)

The agent flagged that my redirect validation using startswith() was bypassable:

if parsed.scheme and not return_to.startswith(settings.frontend_url):
    raise errors.validation("Invalid redirect URL")

Issues identified:

  • Protocol-relative URLs like //evil.com/phishing have no scheme but redirect externally
  • Subdomain attacks: http://localhost:3000.evil.com/ passes the startswith() check
  • User-info attacks: http://localhost:3000@evil.com/ bypasses naive checks

Static analysis tools verify that redirect validation exists but can’t reason about whether the logic is correct.
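
A stricter version compares the parsed origin against the configured frontend origin instead of doing a prefix test. This is a sketch of the general fix, not necessarily the patch that landed:

from urllib.parse import urlparse


def validate_return_to(return_to: str, frontend_url: str) -> str:
    """Allow only same-origin absolute URLs or plain relative paths."""
    allowed = urlparse(frontend_url)
    parsed = urlparse(return_to)

    # Plain relative path: no scheme, no host, and not protocol-relative ("//evil.com/...")
    if not parsed.scheme and not parsed.netloc and return_to.startswith("/") and not return_to.startswith("//"):
        return return_to

    # Absolute URL: scheme and host:port must match the frontend origin exactly.
    # netloc includes any userinfo, so "localhost:3000@evil.com" and
    # "localhost:3000.evil.com" both fail the comparison.
    if parsed.scheme == allowed.scheme and parsed.netloc == allowed.netloc:
        return return_to

    raise ValueError("Invalid redirect URL")

Relative paths stay allowed for normal post-login redirects; everything else has to match the frontend origin exactly.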

2. Race Condition in Registration Codes (PR #141)

The agent identified a TOCTOU vulnerability in single-use registration codes:

await validate_registration_code(db, request.registration_code)  # Check
# ... user creation happens here ...
await use_registration_code(db, request.registration_code)       # Increment (too late!)

Attack scenario: Admin creates a code with max_uses=1. Attacker sends 10 concurrent requests. All 10 pass validation before any increment the counter. Result: 10 users created instead of 1.

Static analysis cannot detect race conditions—this requires understanding concurrent execution.
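
The standard fix is to collapse the check and the increment into one atomic statement, so the database rather than the application arbitrates the race. A sketch in async SQLAlchemy, assuming a RegistrationCode model along these lines (not the actual taskmanager schema):

from sqlalchemy import update
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class RegistrationCode(Base):
    __tablename__ = "registration_codes"
    code: Mapped[str] = mapped_column(primary_key=True)
    uses: Mapped[int] = mapped_column(default=0)
    max_uses: Mapped[int] = mapped_column(default=1)


async def claim_registration_code(db: AsyncSession, code: str) -> bool:
    """Atomically consume one use of a registration code.

    The WHERE clause re-checks remaining uses inside the UPDATE itself, so ten
    concurrent requests against a max_uses=1 code produce exactly one row
    update; the other nine see rowcount == 0 and get rejected.
    """
    result = await db.execute(
        update(RegistrationCode)
        .where(
            RegistrationCode.code == code,
            RegistrationCode.uses < RegistrationCode.max_uses,
        )
        .values(uses=RegistrationCode.uses + 1)
    )
    await db.commit()
    return result.rowcount == 1

Call this before creating the user, and create the user only when it returns True.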

3. Incomplete Authorization Check (PR #152)

The agent caught a subtle IDOR-adjacent issue:

When updating a todo with a project_id, the code silently sets project_name and project_color to None if the project doesn’t belong to the user—but doesn’t reject the request. A user could associate a todo with an unauthorized project ID.

Static analysis sees the authorization check exists but can’t determine it’s incomplete.
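
The safer behavior is to reject the request outright rather than silently degrading it. A hypothetical version of that decision (names invented, not the actual taskmanager code):

from dataclasses import dataclass


@dataclass
class Project:
    id: int
    owner_id: int
    name: str
    color: str


def resolve_project_for_update(project: Project | None, user_id: int) -> Project:
    """Reject missing or unauthorized projects instead of silently nulling fields.

    The flagged code set project_name/project_color to None when the project
    wasn't the user's but still accepted the update, leaving the todo tied to
    a project ID the user isn't authorized to use.
    """
    if project is None or project.owner_id != user_id:
        raise PermissionError("Project does not belong to the current user")
    return project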


The pattern across these examples: the security agent catches logic flaws (wrong validation, race conditions, incomplete authorization) rather than syntactic issues (missing sanitization calls, dangerous functions). That’s the key differentiator from static analysis.

The security agent also catches context-dependent issues like inconsistent protection across endpoints. An LLM might add authentication to /api/login but forget to protect /api/users. The review agent spots missing auth on sibling endpoints because it understands the intent of the code, not just its patterns.
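
One structural defense against that class of mistake is to attach authentication at the router level so every sibling endpoint inherits it, and a forgotten per-endpoint check simply can’t happen. A sketch assuming a FastAPI-style app (taskmanager’s real wiring may differ):

from fastapi import APIRouter, Depends, FastAPI, HTTPException, Request


async def require_user(request: Request) -> str:
    """Hypothetical auth dependency; stand-in for real session/token validation."""
    user = request.headers.get("x-user")
    if user is None:
        raise HTTPException(status_code=401, detail="Not authenticated")
    return user


# Every route registered on this router requires auth, including endpoints an LLM adds later.
protected = APIRouter(prefix="/api", dependencies=[Depends(require_user)])


@protected.get("/users")
async def list_users() -> dict:
    return {"data": []}


app = FastAPI()
app.include_router(protected)

Per-endpoint checks are still worth having, but the default becomes deny rather than allow.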

Cost consideration: Each review costs ~$0.10-0.50 depending on PR size. For active repos, this adds up. Consider:

  • Running reviews only for PRs from external contributors
  • Using Haiku instead of Sonnet for routine reviews
  • Filtering by file patterns (e.g., only review changes to auth/ or api/)

Layer 3: Comprehensive CI Workflows (The Safety Net)

Pre-commit hooks run locally (they can be skipped with --no-verify). Review agents comment on PRs (humans can ignore them). CI is the enforced safety net that blocks merges.

I use reusable workflows from my workflows repo that provide consistent security checks across projects.

Python CI with Security Layers

The python-ci.yml workflow (workflows/.github/workflows/python-ci.yml) runs:

  1. Linting (ruff)
  2. Type checking (pyright or mypy)
  3. Tests with coverage (pytest + codecov)
  4. Package build verification (twine check)
  5. Security scanning (bandit)

Usage in a project:

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  ci:
    uses: brooksmcmillin/workflows/.github/workflows/python-ci.yml@main
    with:
      package-name: taskmanager
      run-type-check: true
      run-lint: true
      run-tests: true
      run-security: true
    secrets:
      CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

This enforces that every PR must pass linting, type checking, tests, and security scans before merging. If an LLM change breaks any of these, the PR is blocked.

Tip: Use public repos for testing CI pipelines—they get unlimited GitHub Actions minutes. I’ve hit my monthly minutes limit on private repos when experimenting with pipeline configurations.

Multi-Tool Security Scanning

For production systems, I run layered security scans (workflows/.github/workflows/python-security.yml):

jobs:
  security-scan:
    steps:
      - name: Run Bandit (Python SAST)
      - name: Run Safety (dependency vulnerabilities)
      - name: Run pip-audit (Python-specific CVEs)

  semgrep:
    # Advanced SAST with semantic patterns
    steps:
      - run: semgrep scan --config=p/security-audit --config=p/secrets --config=p/python

  codeql:
    # GitHub's semantic code analysis
    steps:
      - uses: github/codeql-action/init@v3

  trivy:
    # Filesystem and dependency scanner
    steps:
      - uses: aquasecurity/trivy-action@master

Why multiple tools?

| Tool | What it catches | Example |
|---|---|---|
| Bandit | Python-specific anti-patterns | eval(), weak crypto |
| Safety/pip-audit | Known CVEs in dependencies | Vulnerable Flask versions |
| Semgrep | Semantic patterns | SQL injection via f-strings |
| CodeQL | Advanced taint analysis | User input flowing to os.system() |
| Trivy | Container/filesystem vulns | Secrets in Docker layers |

LLMs might suggest an old library version, use a deprecated crypto function, or construct SQL queries unsafely. One of these tools will catch it.
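
The f-string SQL case deserves a concrete illustration, since it’s a construction LLMs reach for constantly. A minimal example of the pattern Semgrep and CodeQL flag versus the parameterized form they accept (sqlite3 used only to keep the snippet self-contained):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")


def find_user_unsafe(email: str) -> list[tuple]:
    # Flagged: user input interpolated into SQL. email = "' OR '1'='1" returns every row.
    return conn.execute(f"SELECT id, email FROM users WHERE email = '{email}'").fetchall()


def find_user_safe(email: str) -> list[tuple]:
    # Accepted: placeholder binding keeps the input as data, not SQL.
    return conn.execute("SELECT id, email FROM users WHERE email = ?", (email,)).fetchall()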

The same pattern applies to other languages—Rust projects use cargo-audit and cargo-deny, Go uses govulncheck, etc. The principle is the same: multiple specialized tools for comprehensive coverage.

Context Management: The CLAUDE.md Pattern

LLMs need context to make good decisions. The CLAUDE.md file provides this context in the repository itself so that review agents, coding assistants, and automated tools understand:

  • Architecture patterns: “Use async SQLAlchemy for database queries”
  • Security requirements: “Always use prepared statements, never f-strings for SQL”
  • Dependency conventions: “Use uv for package management, not pip”
  • Code style: “Type hints required, use str | None not Optional[str]”

Example from taskmanager/CLAUDE.md:

### Response Formats

**Success responses:**
```json
// List/Collection: { "data": [...], "meta": { "count": 10 } }
// Single resource: { "data": { "id": 1, "title": "...", ... } }
```

**Error responses:**
```json
{
  "detail": {
    "code": "AUTH_001",
    "message": "Invalid credentials",
    "details": { ... }
  }
}
```

When Claude generates API endpoints, it follows this exact format because it’s documented in CLAUDE.md. Without this context, every endpoint would have inconsistent response structures.

What to Include in CLAUDE.md

From my experience, these sections prevent the most LLM mistakes:

  1. Quick commands: “How do I run tests?” → Prevents LLM from inventing commands
  2. Architecture overview: “What patterns does this codebase use?” → Enforces consistency
  3. Security requirements: “What should never be committed?” → Blocks credential leaks
  4. Common tasks: “How do I add a new API endpoint?” → Provides working examples
  5. Anti-patterns: “What should I never do?” → Explicit guardrails

The Cost/Benefit Reality

These layers add friction. Here’s what it actually costs:

Time costs:

  • Pre-commit hooks: ~30 seconds per commit
  • CI pipeline: ~3-5 minutes per PR
  • Review agent comments: ~2 minutes to read and address

Dollar costs:

  • GitHub Actions: Free tier covers most personal projects
  • Local agents: Can use a Claude subscription OAuth key; in my experience, running all 4 custom agents at once costs $4-5
  • Claude API for reviews: Can use a Claude subscription OAuth key, or via API ~$10-30/month for active repos (depends on PR volume)
  • Security scanning tools: All free/open-source

What you avoid:

  • A single credential leak can cost thousands in incident response
  • A production auth bypass is a career-defining mistake
  • Manual code review of LLM output takes 10x longer than automated checks

The math is simple: 30 seconds per commit is cheap insurance against shipping LLM-generated vulnerabilities.

What’s Next: MCP Security

These patterns apply directly to securing Model Context Protocol (MCP) servers. When you’re building tools that LLMs can invoke, the attack surface expands: OAuth token handling, input validation for tool parameters, scope limitations on what tools can access.

I’ll be covering MCP security in depth at CactusCon in my talk “Breaking Model Context Protocol: OAuth 2.0 Lessons We Didn’t Learn.”

Conclusion

Coding with LLMs is fast, powerful, and risky. They’ll generate working code that’s insecure, incomplete, or context-blind. The solution isn’t to stop using them—it’s to build defensive layers that catch mistakes before they ship.

The stack that works:

  1. Pre-commit hooks block obvious problems (secrets, type errors, anti-patterns)
  2. Automated review agents catch logic errors and context mismatches
  3. Comprehensive CI enforces security scans, tests, and build verification
  4. CLAUDE.md context files teach LLMs your patterns and requirements

The cost is ~30 seconds per commit and ~$20/month in API calls. The benefit is a reliable safety net between LLM-generated code and production.

These aren’t theoretical patterns. They’re running in my repos right now, catching issues weekly. If you’re using LLMs to write code, you need these layers.

Start with pre-commit hooks (easiest win). Add security review agents (highest ROI). Build up comprehensive CI from there.


All examples in this post are from my open-source repos: taskmanager, agents, and workflows. Feel free to copy patterns that work for your setup.

I’ll be speaking about MCP security at CactusCon—if you’re interested in how these patterns apply to tool-calling architectures, come find me there.