
The Problem With "Trust but Verify"

As AI agents take on more autonomous roles in commerce, the gap between probabilistic recommendations and verifiable proof is becoming impossible to ignore. This essay examines why "trust but verify" breaks down at scale, and why verification must move from human effort to infrastructure.

When I asked Gemini recently to recommend a gift for a friend's two-year-old, the interaction followed a script that has become familiar in the generative era: seamless, intelligent — and ultimately incomplete.

The request was straightforward but specific: a set of wooden blocks, sustainably sourced and free of harmful chemicals like lead paint. The model drew from a vast index of the web and identified a set that appeared to meet every criterion. It cited manufacturer sustainability pledges, referenced reviews and marketing pages, and pointed to claims of "water-based, non-toxic finishes."

Curious how it could be sure these claims were valid in a market flooded with cheaply made alternatives, I pressed further. The system explained that it was relying on public regulatory frameworks — Consumer Product Safety Commission requirements, enforcement data, and the assumption that sellers in regulated categories must undergo testing to list such products in the first place. But it ultimately acknowledged a hard limit: it could not physically verify lab reports or link a specific product to a deterministic batch record without access to a live, authoritative system of record.

If I wanted certainty, it suggested, I would need to verify it myself — by inspecting the packaging after purchase, locating a batch or certificate number, and cross-referencing that information against third-party laboratory databases.

In that moment, the "frictionless" future promised by Silicon Valley revealed its most taxing hidden cost. I hadn't delegated a task; I'd been assigned a research project.

This is the central paradox of the emerging economy of agentic AI — systems designed not just to answer questions, but to act on our behalf. These models excel at probabilistic reasoning. They infer that a product is likely safe based on marketing copy, scraped reviews, and reputational signals. But they lack a deterministic foundation. In a world of spoofed PDFs, expired certifications, and misapplied claims, even high statistical confidence is not enough when the consequences are physical.

As AI agents are expected to handle billions of dollars in transactions, we are running into a trust bottleneck. We are asking machines to operate autonomously while still requiring humans to serve as the final compliance officers. For marketplaces, this manifests as manual audits, reactive enforcement, and liability exposure at internet scale. For consumers, it appears as friction. For regulators, as opacity. And for brands that actually invest in doing things right, it often means being indistinguishable from bad actors until something goes wrong.

What happens when an autonomous agent makes the wrong call? As decision-making shifts from humans to machines, risk no longer maps cleanly to existing actors. The model provider did not test the product. The marketplace did not make the claim. The lab did not issue the recommendation. And the consumer did not inspect the evidence. Yet when something fails, someone will inherit the liability. Today, there is no widely accepted technical or operational playbook for how that risk is assigned — or how it is mitigated — once agency is delegated to software. In practice, that ambiguity pushes risk toward the largest platforms and operators, the ones with balance sheets capable of absorbing it.

At Baseclaim, we believe that for AI to be genuinely useful in commerce, it must move from guessing to knowing. That shift requires infrastructure. Instead of an agent reasoning through marketing claims or scanned documents, it should be able to query a shared compliance layer—one that returns a lab-attested, machine-readable verdict derived from underlying test data, scope, and validity windows.

Not likelihood. Not inference. A verifiable answer.
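
To make that concrete, here is a minimal sketch of what an agent might consume from such a layer: a structured, lab-attested record rather than prose. The schema, field names, and validity check below are illustrative assumptions, not Baseclaim's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical shape of a machine-readable attestation a shared compliance
# layer might return. Fields and values are illustrative only.
@dataclass
class Attestation:
    gtin: str                 # product identifier the verdict is scoped to
    batch_id: str             # production batch covered by the test
    standard: str             # e.g. "16 CFR 1303" (lead paint limits)
    issuing_lab: str          # accredited lab that produced the result
    verdict: str              # "pass" or "fail"
    issued_at: datetime
    expires_at: datetime

def verdict_is_usable(att: Attestation, gtin: str, batch_id: str) -> bool:
    """Treat the verdict as decisive only if it matches the exact product
    and batch and falls inside its validity window."""
    now = datetime.now(timezone.utc)
    return (
        att.gtin == gtin
        and att.batch_id == batch_id
        and att.verdict == "pass"
        and att.issued_at <= now < att.expires_at
    )

# Example: the agent resolves the decision from the attestation itself,
# not from marketing copy or scraped reviews.
att = Attestation(
    gtin="00012345678905",
    batch_id="LOT-2025-0142",
    standard="16 CFR 1303",
    issuing_lab="EXAMPLE-LAB-001",
    verdict="pass",
    issued_at=datetime(2025, 3, 1, tzinfo=timezone.utc),
    expires_at=datetime(2026, 3, 1, tzinfo=timezone.utc),
)

if verdict_is_usable(att, gtin="00012345678905", batch_id="LOT-2025-0142"):
    print("Deterministic basis to proceed: lab-attested pass, in validity window.")
else:
    print("No usable attestation; flag the claim as unverified.")
```

The particular fields matter less than the shape of the decision: the agent checks an attestation's scope and validity window instead of weighing marketing language.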

The era of "trust but verify" has been a necessary bridge, but it is a bridge built on human effort. It assumes that individuals — consumers, operators, or regulators — will close the loop when automation stops short. If AI is to free us from busywork rather than redistribute it, verification cannot remain an afterthought. It must live at the infrastructure layer, reusable across marketplaces, agents, and regulatory contexts.

We need a world where the things we buy are not merely likely to be safe, compliant, or authentic—but verified by default.