Why did Deloitte, EY, and KPMG pull their AI reports?

Each report contained material fabricated by a generative AI model and published without adequate verification. Deloitte Australia's October 2025 government compliance review included a fabricated quote attributed to a federal court and references to academics who had not written the cited work. EY's May 2026 fraud report had 16 of its 27 citations fabricated (roughly 60%), including a market-sizing figure attributed to McKinsey that McKinsey never produced. KPMG's June 2026 report had only 5 of 45 citations pointing to real, intact sources, with named organizations stating their described AI work was wrong or invented.

What is 'vibe citing'?

Vibe citing is a term coined by GPTZero CEO Edward Tian for the citation equivalent of vibe coding: a generative model stitches together fragments of real sources, invents plausible titles, and paraphrases references until they no longer match the original — producing something that looks exactly like scholarship but is hollow underneath. The output reads as authoritative because fluency is mistaken for accuracy.

What is a second-hand hallucination?

A second-hand hallucination is a fabrication from one report that gets laundered into the next one — cited by another analyst, absorbed into a model's training data, or repeated as 'according to KPMG.' The error does not disappear when the original PDF is retracted, because it has already been quoted, screenshotted, and propagated. This is how a single fabricated claim slowly contaminates the shared record everyone downstream relies on.

How does ForIntel prevent AI hallucinations in its reports?

ForIntel treats the model as the least trustworthy component in the pipeline and engineers around not trusting it. Every quantitative claim is recomputed from raw source data rather than paraphrased from model output; two independent measures must agree before a finding ships; statistical significance is claimed only where it is tested; data boundaries are named rather than filled when a source returns empty; and a separate counter-signal verification step re-pulls and recomputes every claim to try to break it before a client sees it.

Isn't detecting AI hallucinations enough?

Detection happens after publication — it is an autopsy. By the time a detector flags a fabricated report, it is already published, read by the client, and quoted elsewhere; the well is already poisoned. Prevention means verifying before the report ships. ForIntel's architecture is built to catch fabrication before publication, not to diagnose it afterward.


The Well Is Poisoned. We Built ForIntel So Yours Isn't.

Three of the four largest professional-services firms pulled AI reports in eight months because the AI fabricated quotes, citations, and case studies. The lesson isn't that AI is unreliable — it's that most people producing intelligence have no idea how to make it reliable, and ForIntel is engineered to catch the fabrication before publication, not after.

By Foragentis team · Published 2026-06-20 · 5 min read

Three of the four largest professional-services firms on earth have now pulled AI reports because the AI made things up. Not a junior analyst's footnote error. Not a transposed figure. Fabricated court quotes. Invented McKinsey statistics. Case studies about real companies that the real companies say never happened.

This is the business we are in, and it is worth being precise about what just occurred — because the lesson is not "AI is unreliable." The lesson is that most people producing intelligence have no idea how to make AI reliable, and they are shipping anyway.

Eight months, three firms, one pattern

Start with the timeline, because the cadence is the story.

In October 2025, Deloitte Australia delivered a government compliance review on a six-figure contract. It contained a fabricated quote attributed to a federal court and references to academics who had not written the cited work. Deloitte refunded part of the fee to taxpayers.

In May 2026, EY published a report on fraud in loyalty programs. Sixteen of its twenty-seven citations, roughly sixty percent, were fabricated: fake links to Forbes, McKinsey, Gartner, TechCrunch. A market-sizing figure attributed to McKinsey that McKinsey never produced. The report was pulled the same day the analysis went public.

In June 2026, KPMG pulled a flagship report titled Total Experience: Redefining Excellence in the Age of Agentic AI. Of its forty-five citations, only five pointed to real, intact sources. UBS, the UK's National Health Service, Swiss Federal Railways, and Transport for London each told the Financial Times that the report's descriptions of their AI work were wrong or made up. One case study described an Emirates chatbot that does not exist. Another cited a 2019 press release as evidence of "agentic AI" — a term that did not enter wide use until 2024. The report even contradicted KPMG's own CEO survey from the same month, citing 55% where its other research said 71%.

A report selling AI to enterprise clients was pulled because the report itself was hallucinated by AI. The irony writes itself, and every outlet from the FT to The Register used it. But irony is not the interesting part.

"Vibe citing" and the second-hand hallucination

The researchers who caught all three were GPTZero. Their CEO, Edward Tian, coined a phrase for the failure mode: vibe citing. It is the citation equivalent of vibe coding — a generative model stitches together fragments of real sources, invents plausible titles, paraphrases a reference until it is no longer the reference, and produces something that looks exactly like scholarship and is hollow all the way down.

Tian's sharper warning is the one that should keep anyone in this industry awake. Error-riddled reports from trusted firms, he said, "poison the well of information." They breed second-hand hallucinations — fabrications laundered into the next report, the next model's training data, the next analyst's "according to KPMG." The lie does not die when the PDF is pulled. It has already been cited, screenshotted, and absorbed.

Those are the actual stakes. Not one embarrassing retraction. A slow contamination of the shared record that everyone downstream trusts.

Here is the part the Big Four got exactly backwards

GPTZero is a detector. It is very good at what it does, and it caught all three firms after the fact. But detection is an autopsy. By the time GPTZero runs its check, the report is published, the client has read it, the FT has the spokesperson on the phone, and the well is already poisoned.

There is, conspicuously, no public preventer. No one is owning the narrative of catching this before the PDF ships.

That gap is the whole reason ForIntel exists, and it is why our entire architecture looks nothing like "ask a model to write a report."

What we actually do differently

Every firm above had the same workflow: prompt a capable model, get fluent prose, lightly review, publish under a trusted brand. The brand was doing the verification work the process skipped. That is the trap — fluency reads as accuracy, and a model that does not know an answer will produce a beautifully formatted wrong one with total confidence.

ForIntel is built on the opposite assumption: the model is the least trustworthy component in the pipeline, so the pipeline is engineered around not trusting it. Concretely:

Every quantitative claim traces to raw data, not to model output. When a ForIntel brief says an import is concentrated 71.3% on one origin, that number is recomputed from the underlying line items, not paraphrased from a summary the model wrote. We have caught our own intermediate steps producing inflated figures — and the discipline is that the recomputed-from-source number wins, every time, even when it is less dramatic.
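To make the recompute-from-source rule concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not ForIntel's production code: the `LineItem` fields, the `reconcile` helper, and the 0.1-point tolerance are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    origin: str       # hypothetical field: where the shipment came from
    value_usd: float  # hypothetical field: declared value of the line item


def top_origin_share(items):
    """Return (origin, share_pct) for the origin with the largest total value."""
    totals = {}
    for item in items:
        totals[item.origin] = totals.get(item.origin, 0.0) + item.value_usd
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("line items carry no positive value")
    origin = max(totals, key=totals.get)
    return origin, 100.0 * totals[origin] / grand_total


def reconcile(model_claim_pct, items, tolerance_pct=0.1):
    """Recompute the share from line items; the recomputed figure is the one published."""
    origin, recomputed = top_origin_share(items)
    return {
        "origin": origin,
        "published_pct": round(recomputed, 1),  # source figure wins, always
        "model_claim_pct": model_claim_pct,     # kept only for the audit trail
        "model_claim_rejected": abs(recomputed - model_claim_pct) > tolerance_pct,
    }


# Illustrative data: the model's summary said 78%, the line items say otherwise.
items = [LineItem("A", 713.0), LineItem("B", 200.0), LineItem("C", 87.0)]
result = reconcile(model_claim_pct=78.0, items=items)
```

The design point is that the model's number never reaches the output field. It is compared against the recomputed value and logged, but only the figure derived from the line items is published.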

Two independent measures have to agree before a finding ships. A value-based concentration is cross-checked against a physical-weight concentration. A search-demand read is cross-checked against labour-market or filing data. When two unrelated instruments point the same way, the finding is grounded. When they don't, it doesn't ship as a finding.
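One way such a cross-check could be expressed, as a sketch only: the function names and the 10-point agreement gap below are assumptions for illustration, and a real rule for "agreement" would depend on the instruments involved.

```python
def shares(amounts):
    """Convert raw totals per origin into fractions of the whole."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("measure has no positive total")
    return {origin: amount / total for origin, amount in amounts.items()}


def corroborated(value_by_origin, weight_by_origin, max_gap=0.10):
    """True only if both measures name the same top origin with similar shares.

    max_gap is a hypothetical threshold: the largest allowed difference
    between the value-based and weight-based share of the top origin.
    """
    value_shares = shares(value_by_origin)
    weight_shares = shares(weight_by_origin)
    top_by_value = max(value_shares, key=value_shares.get)
    top_by_weight = max(weight_shares, key=weight_shares.get)
    if top_by_value != top_by_weight:
        return False  # the two instruments disagree on the headline itself
    gap = abs(value_shares[top_by_value] - weight_shares[top_by_weight])
    return gap <= max_gap


value_usd = {"A": 713.0, "B": 287.0}
weight_kg_agree = {"A": 680.0, "B": 320.0}
weight_kg_disagree = {"A": 300.0, "B": 700.0}

ships_as_finding = corroborated(value_usd, weight_kg_agree)
held_back = not corroborated(value_usd, weight_kg_disagree)
```

The check deliberately returns a plain yes or no. A finding that fails it is not softened or reworded; it simply does not ship as a finding.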

Statistical significance is claimed only where it is tested. If a trend isn't significant, we say "flat." If a sample is too thin to weight, we say "directional, not deep-corpus." We do not dress an underpowered signal as a confident one — which is precisely the move that turned a KPMG survey number into a contradiction with KPMG's own data.
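As a hedged illustration of labeling a trend only after testing it, the sketch below uses the Mann-Kendall trend test with its normal approximation. The choice of test, the 0.05 threshold, the ten-point minimum, and the mapping of short samples to "directional, not deep-corpus" are assumptions made for the example, not a description of ForIntel's actual method. Tie correction is omitted for brevity.

```python
import math


def mann_kendall(series):
    """Return (S, two-sided p-value) using the normal approximation, no tie correction."""
    n = len(series)
    # S counts rising pairs minus falling pairs over every i < j.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    return s, math.erfc(abs(z) / math.sqrt(2))


def label_trend(series, alpha=0.05, min_points=10):
    """Name a direction only when the test supports it."""
    if len(series) < min_points:
        return "directional, not deep-corpus"  # too thin to test
    s, p_value = mann_kendall(series)
    if p_value >= alpha:
        return "flat"
    return "rising" if s > 0 else "falling"


steady_climb = list(range(1, 13))
up_then_down = [1, 3, 5, 7, 9, 11, 12, 10, 8, 6, 4, 2]
thin_sample = [1, 2, 3, 4]

labels = [label_trend(s) for s in (steady_climb, up_then_down, thin_sample)]
```

Note that the thin sample rises on every step and still does not earn the word "rising". The label reflects what was tested, not what the chart appears to show.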

Boundaries are named, never filled. When a data source returns empty, the ForIntel brief says so — "this layer returned no data on every attempt" — and fabricates nothing to cover the hole. The Big Four's failure was the inverse: a gap in the data became an invented case study about a real company. We would rather show you the edge of what we know than manufacture confidence past it.
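A small sketch of how an empty source can be carried through a pipeline as a named boundary rather than a gap to be filled. Everything here is hypothetical: the `LayerResult` type, the `pull_layer` and `render` helpers, and the three-attempt retry count are invented for the illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class LayerResult:
    name: str
    rows: Optional[Sequence[dict]]  # None means "no data", never a placeholder
    attempts: int


def pull_layer(name: str, fetch: Callable[[], Sequence[dict]], max_attempts: int = 3) -> LayerResult:
    """Try the source a fixed number of times; record emptiness explicitly."""
    for attempt in range(1, max_attempts + 1):
        rows = fetch()
        if rows:
            return LayerResult(name, rows, attempt)
    return LayerResult(name, None, max_attempts)


def render(result: LayerResult) -> str:
    """Write the brief line for a layer, naming the boundary when there is one."""
    if result.rows is None:
        return (
            f"{result.name}: this layer returned no data on every attempt "
            f"({result.attempts} tried)."
        )
    return f"{result.name}: {len(result.rows)} row(s) pulled from source."


empty_layer = pull_layer("filings", lambda: [])
full_layer = pull_layer("trade", lambda: [{"origin": "A"}])
lines = [render(empty_layer), render(full_layer)]
```

Because emptiness is a distinct state in the data type, downstream code has nothing to summarize or paraphrase when a layer is empty. The only sentence it can produce is the one that names the hole.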

Counter-signal verification is a separate step, by design. One part of the system builds; a different part exists only to attack what was built — to re-pull, recompute, and try to break the claim before a client ever sees it. Detection-after-publication is what GPTZero sells. Verification-before-publication is what we ship.
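The separation between building and attacking can be sketched as a gate that every claim must pass. This is an assumed shape, not ForIntel's implementation: the `Claim` type, the `attack` and `gate` functions, and the 0.5% relative tolerance are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Claim:
    text: str
    value: float
    recompute: Callable[[], float]  # re-pulls source data and recomputes independently


def attack(claim: Claim, rel_tol: float = 0.005) -> list:
    """Try to break a claim; return the reasons it failed (empty list = survived)."""
    try:
        fresh = claim.recompute()
    except Exception as exc:  # a source that cannot be re-pulled breaks the claim
        return [f"re-pull failed: {exc}"]
    if not math.isclose(fresh, claim.value, rel_tol=rel_tol):
        return [f"stated {claim.value}, recomputed {fresh}"]
    return []


def gate(claims):
    """Split claims into those that survive the attack and those held back."""
    shipped, held = [], []
    for claim in claims:
        problems = attack(claim)
        if problems:
            held.append((claim.text, problems))
        else:
            shipped.append(claim.text)
    return shipped, held


def broken_source() -> float:
    raise RuntimeError("source returned empty")


claims = [
    Claim("Top origin share is 71.3%", 71.3, lambda: 71.3),
    Claim("Top origin share is 78.0%", 78.0, lambda: 71.3),
    Claim("Demand index is 1.4", 1.4, broken_source),
]
shipped, held = gate(claims)
```

The attacker shares no state with whatever produced the claim. It receives only the stated value and a way to recompute it, so agreement has to come from the source data rather than from the builder's own intermediate output.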

The pitch, stated plainly

Three of the four biggest intelligence brands in the world just demonstrated, in public, three times in eight months, that prestige is not a substitute for engineering. The detector catches it after the damage. We are the engineering layer that catches it before, because we assumed from day one that the model would lie, and we built everything downstream to keep that lie from reaching you.

The well is poisoned. We would like to show you what clean looks like under the hood.

ForIntel is a research-first business-intelligence product from Foragentis. Every quantitative claim in a ForIntel brief traces to source data, is cross-corroborated across independent measures, and passes a counter-signal verification step before it ships. To see a sample brief or commission a read, reach the ForIntel desk at forintel@foragentis.com.

