July 2, 2026

The autonomous SDLC is coming. It needs an accountability layer.

Miguel Martinez

TL;DR: Google’s vision is a self-healing, autonomous SDLC that finds and patches its own security flaws at machine speed. Chainloop brings that same self-healing to the enterprise, on any tool you already run, backed by a secure, signed evidence store, so every fix is provable, not just fast. Autonomy gives you the reflexes; Chainloop gives you the memory.

Google just described the autonomous SDLC

On June 29, Google Cloud Security published “How Google Cloud Security uses AI internally” (Chris Betz and Ruchi Shah). The thesis is bold, and we think it’s largely right: AI has collapsed the vulnerability patching window toward zero. When an attacker can weaponize a disclosed flaw in hours, human-paced security loses the race. So security has to become autonomous: multi-agent orchestration embedded across the whole SDLC, producing “immune software” that discovers, validates, and patches its own flaws at machine speed, with humans acting as orchestrators rather than operators.

Google lays out five pillars: agentic design review and gating, centralized AI code scanning (their open-sourced Mantis framework), self-healing fuzz testing, a unified patch pipeline where only validated fixes reach a human, and autonomous security posture management with a self-reflection loop that writes winning patterns into a shared knowledge store.

It’s a serious blueprint, and it’s worth being precise about what it is: a description of how Google runs this inside its own environment, on Google’s own stack. The pattern is what’s portable; the open question is how everyone else applies it to their SDLC. It also optimizes for one thing above all: the speed of remediation. Agents find, validate, and patch faster than any human team could. But speed isn’t proof. When agents generate and merge changes at machine speed, how do you show what actually happened?

The missing half: proving what the agents did

That question is the one the blueprint doesn’t answer: who attests that the agent’s patch is the patch that shipped? That the scan actually ran, the gate actually passed, and the fix is the one that landed in the artifact your customers run?

More automation means more machine-generated change, which means a bigger, faster-moving surface for provenance and tamper-evidence, not a smaller one. Every autonomous action is one more thing you will later have to prove.

So the autonomous SDLC needs an evidence layer underneath it, something that captures every agent action as a signed, policy-gated, queryable record. For a highly regulated or mission-critical enterprise, that layer isn’t optional: auditors, customers, and frameworks like the EU Cyber Resilience Act require a provable record of what changed and why. That’s what the rest of this post is about.

What the evidence layer does

In a real organization, findings come from everywhere: multiple scanners, several tools, security engineers, developers, and now AI agents. No single closed stack sees all of it. Chainloop starts from that reality: you ingest every signal, finding, and issue, from any scanner (Trivy, Grype, Snyk, and more), tool, person, or agent, into one place.

But a single place is just a bigger pile. The same vulnerability gets reported five ways, alert fatigue sets in, and control gates either block everything or block nothing. Aggregation without triage just relocates the noise. So Chainloop turns that noise into decisions automatically. It will deduplicate, assess, remediate, and notify, driven by confidence scoring and its own context (your evidence graph, your source code, your history):

Deduplicate: automatically, keyed on finding type, CVE, and package (PURL), across scanners and runs, so five copies of the same issue collapse into one finding.
Assess: the Vulnerabilities Agent drafts a structured, confidence-scored assessment covering root cause, reachability, impact, mitigating factors, recommendation, and a proposed verdict. It reasons from vulnerability metadata (severity, KEV, fixability, CVSS), scanner context, and your actual source code and Dockerfiles. Drafts land in “Needs Review,” and the agent’s verdict doesn’t change a finding’s status until a human approves it. It stays human-in-the-loop by default.
Remediate: when a finding is judged Affected with high enough confidence, the Remediation Agent patches the affected files and opens a linked GitHub PR, triggered either manually or automatically above a configurable confidence threshold. The PR carries the CVE, a fix summary, and the exact changes (for example, resolving a CVE by upgrading Go), and it stays attached to the finding. Unlike a blind dependency bump, it is driven by the assessment: it fires only for findings judged Affected, and it ships with the evidence behind the fix.
Notify: automatically, so the right people see the decision the moment it’s made.
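
The deduplication step above can be sketched in a few lines of Python. This is a minimal illustration, not Chainloop's implementation: the `Finding` fields and the `dedup_key` and `deduplicate` helpers are hypothetical names, the CVE identifiers are placeholders, and dropping PURL qualifiers before comparing is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One scanner report. Field names are illustrative, not Chainloop's schema."""
    finding_type: str  # e.g. "vulnerability"
    vuln_id: str       # a CVE or GHSA identifier
    purl: str          # package URL of the affected package
    scanner: str       # tool that reported it
    sources: set = field(default_factory=set)


def dedup_key(f: Finding) -> tuple:
    # Assumption: qualifiers ("?arch=...") and subpaths ("#...") do not change
    # which package is affected, so they are dropped before comparing.
    base_purl = f.purl.split("?", 1)[0].split("#", 1)[0]
    return (f.finding_type.lower(), f.vuln_id.upper(), base_purl)


def deduplicate(findings):
    """Collapse reports that share a key, remembering every scanner that saw them."""
    merged = {}
    for f in findings:
        key = dedup_key(f)
        if key not in merged:
            merged[key] = Finding(f.finding_type, f.vuln_id.upper(), f.purl, f.scanner)
        merged[key].sources.add(f.scanner)
    return list(merged.values())


# Placeholder identifiers, three scanners, two distinct issues.
reports = [
    Finding("vulnerability", "CVE-0000-0001", "pkg:golang/example.com/lib@1.2.3", "trivy"),
    Finding("vulnerability", "cve-0000-0001", "pkg:golang/example.com/lib@1.2.3?arch=amd64", "grype"),
    Finding("vulnerability", "CVE-0000-0001", "pkg:golang/example.com/lib@1.2.3", "snyk"),
    Finding("vulnerability", "CVE-0000-0002", "pkg:golang/example.com/lib@1.2.3", "trivy"),
]
unique = deduplicate(reports)
for f in unique:
    print(f.vuln_id, sorted(f.sources))
```

Keeping the set of reporting scanners on the merged finding preserves the fact that several tools agreed, which is the kind of context a later assessment could reasonably draw on.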

Screenshot: Chainloop Risk Assessment for GHSA-wqp7-x3pw-xc5r. The Vulnerabilities Agent proposes 'Not Affected' at 86% confidence, with a structured note covering root cause, reachability in the project, impact if exploited, mitigating factors, and a recommendation, awaiting human Approve or Reject.

The Vulnerabilities Agent drafts a confidence-scored assessment and proposes a verdict, but a human still approves or rejects it.

This is what “provable” looks like up close. The verdict above isn’t a chat message or a status someone flipped in a tracker. It’s a signed, versioned record of what was decided, by whom, and why, which is exactly the property the autonomous SDLC will need for every action it takes.

Human review is the default, and that’s deliberate: a confidence score is a signal, not a guarantee, so you decide where to trust it. A single control governs auto-assess, auto-approve, and auto-fix per severity, so you can let High findings self-heal end to end while keeping a human approval step on Critical and leaving the low-severity noise for later. That’s the self-healing SDLC on your terms, and every automated step still lands as signed evidence.

Screenshot: Chainloop Vulnerability Automation settings. Per-severity toggles for auto-assess, auto-approve, and auto-fix, with High fully enabled, Critical set to auto-assess and auto-fix but manual approve, and Medium and Low left off.

Per-severity toggles for auto-assess, auto-approve, and auto-fix: self-healing you dial in, from fully human-reviewed to fully autonomous.
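
To make the per-severity control concrete, here is a rough Python sketch of how such toggles could gate each step. It is a reading of the behavior described in this post, not Chainloop's actual logic: the `Automation` type, the `plan` function, the step names, and the 0.80 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Automation:
    auto_assess: bool
    auto_approve: bool
    auto_fix: bool


# Mirrors the per-severity settings described above.
POLICY = {
    "critical": Automation(auto_assess=True, auto_approve=False, auto_fix=True),
    "high": Automation(auto_assess=True, auto_approve=True, auto_fix=True),
    "medium": Automation(auto_assess=False, auto_approve=False, auto_fix=False),
    "low": Automation(auto_assess=False, auto_approve=False, auto_fix=False),
}

CONFIDENCE_THRESHOLD = 0.80  # hypothetical value; the real threshold is configurable


def plan(severity, verdict, confidence, human_approved=False):
    """Return the steps allowed to run for one finding, in order.

    `verdict` and `confidence` stand for the agent's drafted assessment.
    """
    rule = POLICY[severity]
    if not rule.auto_assess:
        return []  # nothing runs until a person starts the assessment
    steps = ["assess"]
    confident = confidence >= CONFIDENCE_THRESHOLD
    approved = human_approved or (rule.auto_approve and confident)
    if not approved:
        steps.append("wait_for_human_review")
        return steps
    steps.append("approve")
    # A fix PR is only opened for findings judged Affected with enough confidence.
    if rule.auto_fix and verdict == "affected" and confident:
        steps.append("open_fix_pr")
    return steps


print(plan("high", "affected", 0.92))
print(plan("critical", "affected", 0.92))
print(plan("critical", "affected", 0.92, human_approved=True))
```

In this sketch a High finding runs end to end, while a Critical finding stops at review and only produces a fix PR once a human has approved the verdict.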

Autonomy runs on evidence

Underneath, it’s all evidence. Everything above runs on the same backbone as the rest of the platform: Ingest → Normalize → Decide → Distribute → Audit. Concretely, every signal becomes an in-toto attestation in a DSSE envelope, signed with Sigstore (keyless, or your own keys and KMS) and stored content-addressed and tamper-evident in a CAS backend you control, self-hosted if you need it. That covers the evidence, the policy evaluation, and every revision of every assessment. AI decisions flow through the same signed evidence, versioned assessments, policy-as-code control gates, OpenVEX feed, and compliance machinery (CRA Annex I) as human decisions.
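
For readers who have not worked with these formats, the following stdlib-only Python sketch shows the general shape of an in-toto Statement wrapped in a DSSE envelope and stored by content address. Several parts are assumptions made for illustration: the predicate type URL and predicate fields are invented, the in-memory `store` dict stands in for a real CAS backend, and HMAC-SHA256 stands in for a Sigstore or KMS signature so the example runs without extra dependencies.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def make_statement(name, sha256_hex, predicate_type, predicate):
    """Build an in-toto v1 Statement about one artifact."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": sha256_hex}}],
        "predicateType": predicate_type,
        "predicate": predicate,
    }


def sign_envelope(statement, key: bytes, keyid: str):
    payload = json.dumps(statement, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Stand-in signer (assumption): HMAC keeps the sketch dependency-free.
    # A real deployment would sign the same PAE bytes with Sigstore or a KMS key.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify_envelope(envelope, key: bytes) -> bool:
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    return hmac.compare_digest(expected, base64.b64decode(envelope["signatures"][0]["sig"]))


def cas_put(store: dict, envelope) -> str:
    """Store the envelope under the SHA-256 of its own bytes."""
    blob = json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    store[digest] = blob
    return digest


def cas_verify(store: dict, digest: str) -> bool:
    """Any change to the stored bytes makes the address stop matching."""
    return "sha256:" + hashlib.sha256(store[digest]).hexdigest() == digest


key = b"demo-key-not-for-production"
statement = make_statement(
    "example-image",
    hashlib.sha256(b"example artifact bytes").hexdigest(),
    "https://example.com/assessment/v1",  # hypothetical predicate type
    {"verdict": "not_affected", "confidence": 0.86, "revision": 1},
)
envelope = sign_envelope(statement, key, keyid="demo")
store = {}
ref = cas_put(store, envelope)
print(ref, verify_envelope(envelope, key), cas_verify(store, ref))
```

The signature shows who vouched for the statement, and the content address shows that the stored bytes have not changed since, which is why tampering is detectable rather than impossible.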

Autonomy inherits the same auditability as everything else. When an agent dismisses a false positive or ships a fix, the result is a signed, versioned, policy-gated record an auditor can query later, rather than a black-box action you have to take on trust. “The AI handled it” won’t satisfy an auditor. A signed assessment mapped to a control will.
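
As an illustration of what a dismissed false positive might look like in the OpenVEX feed mentioned earlier, here is a small Python sketch that builds a document following the OpenVEX v0.2.0 field layout. The helper names, the product PURL, the author, the document ID, and the chosen justification are assumptions for the example; only the advisory ID comes from the assessment shown in this post.

```python
import json
from datetime import datetime, timezone

STATUSES = {"not_affected", "affected", "fixed", "under_investigation"}


def vex_statement(vuln_id, product_purl, status,
                  justification=None, impact_statement=None, action_statement=None):
    """Build one VEX statement, enforcing the fields each status needs."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status}")
    stmt = {
        "vulnerability": {"name": vuln_id},
        "products": [{"@id": product_purl}],
        "status": status,
    }
    if status == "not_affected":
        # A not_affected claim has to say why.
        if not (justification or impact_statement):
            raise ValueError("not_affected needs a justification or impact statement")
        if justification:
            stmt["justification"] = justification
        if impact_statement:
            stmt["impact_statement"] = impact_statement
    if status == "affected":
        if not action_statement:
            raise ValueError("affected needs an action statement")
        stmt["action_statement"] = action_statement
    return stmt


def vex_document(doc_id, author, statements, version=1):
    return {
        "@context": "https://openvex.dev/ns/v0.2.0",
        "@id": doc_id,
        "author": author,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "version": version,
        "statements": statements,
    }


doc = vex_document(
    "https://example.com/vex/example-0001",      # hypothetical document ID
    "Example Security Team",                      # hypothetical author
    [vex_statement(
        "GHSA-wqp7-x3pw-xc5r",
        "pkg:golang/example.com/service@v1.0.0",  # hypothetical product
        "not_affected",
        justification="vulnerable_code_not_in_execute_path",  # assumed reason
    )],
)
print(json.dumps(doc, indent=2))
```

A document like this is what downstream consumers could read to learn the verdict, while the signed assessment behind it is what an auditor would check.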

Security agents on the coding surface, too

Google’s blueprint is about security agents: agents that review, scan, and patch software. There’s another class of agent already in your SDLC: the coding agents your developers run every day (Claude Code, Cursor, Codex), working upstream of CI on laptops, where the change actually gets decided. Chainloop governs that surface too, and brings the same agentic security capabilities to code generation that it brings to findings.

Screenshot: Chainloop AI Coding dashboard over 7 days. Total sessions, active users, AI-assisted PRs, and AI-authored code share, an AI score trend chart across criteria (alignment, context and planning, scope discipline, solution quality, user trust signal, verification), plus top users and a model breakdown.

Every AI coding session rolls up into one org-wide view: adoption, AI-authored code share, per-criterion AI scores, and model usage.

One CLI, chainloop trace, hooks into the agent at session time and into git at push time, so every AI coding session lands as signed evidence, correlated with the pull request it produced, without anyone changing how they work. From there the usual machinery runs: sessions and agent configs are just more evidence, so you can run policies over models, tools, configs, and session behavior, group them into framework-shaped controls (an AI Readiness posture a compliance lead can read without touching Rego), and make them a required merge check. Because the diff never tells the whole story, the AI Session Score reads the transcript behind a change to flag what no policy catches: a premature “done,” a claim that doesn’t match the diff, a quietly bypassed pre-commit hook.
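
To give a feel for what a policy over session evidence could check, here is a hedged Python sketch. Chainloop policies are written in Rego and the real session schema is not shown in this post, so everything here is hypothetical: the session record shape, the model allowlist, and the `evaluate_session` helper. The one grounded detail is that `git commit --no-verify` (short form `-n`) skips pre-commit hooks.

```python
import shlex

ALLOWED_MODELS = {"approved-model-a", "approved-model-b"}  # hypothetical allowlist


def bypasses_commit_hooks(command: str) -> bool:
    """Naive token match for a git commit that skips pre-commit hooks."""
    tokens = shlex.split(command)
    return (
        len(tokens) >= 2
        and tokens[0] == "git"
        and "commit" in tokens
        and any(t in ("--no-verify", "-n") for t in tokens)
    )


def evaluate_session(session: dict) -> list:
    """Return a list of violations; an empty list means the check passes."""
    violations = []
    model = session.get("model")
    if model not in ALLOWED_MODELS:
        violations.append(f"model not allowed: {model}")
    for cmd in session.get("commands", []):
        if bypasses_commit_hooks(cmd):
            violations.append(f"pre-commit hooks bypassed: {cmd}")
    return violations


# Hypothetical session record, as it might be captured for one pull request.
session = {
    "agent": "example-agent",
    "model": "approved-model-a",
    "commands": ["go test ./...", "git commit --no-verify -m 'wip'"],
}
violations = evaluate_session(session)
print("merge check:", "fail" if violations else "pass")
for v in violations:
    print(" -", v)
```

A check like this only covers what can be expressed as a rule over recorded facts; the transcript-level signals described above, such as a premature "done", are the part a rule like this would miss.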

The loop closes on both ends. Security agents assess and remediate the findings coming out of your builds, and the same signed, policy-gated evidence covers the agents generating the code going in. We covered the coding side in depth in “AI Coding Governance: Same Thesis, Bigger Surface”; setup lives in the Chainloop Trace guide.

Two layers, not two options

Google’s blueprint and Chainloop aren’t competing for the same slot. Google’s makes software fix itself. Chainloop makes every fix provable. They stack.

	Autonomy layer (Google’s blueprint)	Accountability layer (Chainloop)
Job	Find, validate, and patch at machine speed	Prove what every action did
Inputs	Its own agents and scanners	Any scanner, tool, person, or agent, Google’s agents included
Unit of trust	The agent’s output	A signed, versioned attestation
Human role	Orchestrate the agents	Approve agent drafts; own the policy
Coding agents	Governed inside the stack	Captured from any agent (Claude Code, Cursor, Codex) as signed evidence
Output	Fixed software	Auditable, policy-gated evidence, plus OpenVEX and CRA

The immune system needs a memory

Google calls the goal “immune software.” It’s the right metaphor, and worth finishing. An immune system doesn’t only fight off threats. It remembers them: it keeps a record of what it encountered and how it responded, so the next response is faster and the body can still tell self from non-self.

The autonomous SDLC gives you the reflexes. What it still needs is the memory: a verifiable record of every agent action, signed and policy-gated, that an auditor, a customer, or the next agent can query long after the fix shipped. Machine-speed remediation and provable remediation are the same story from two ends, and the second end is the one that’s missing.

That’s what Chainloop is: the verifiable memory of your supply chain, under any autonomy layer you run on top. See how Vulnerability Management and Risk Assessment works, or wire it onto a project. If you’re building agents into your SDLC, we’d love to show you what it looks like when every agent action lands as signed, policy-gated evidence.

