AI content detection
Overview
AI content detection (also synthetic-text detection or AI-generated text detection) refers to techniques for estimating whether a given text was produced by a language model rather than authored by a human. Detection is used in contexts including academic integrity enforcement, content moderation, and provenance auditing of publishing pipelines.
Detection operates under a fundamental asymmetry: generating text is computationally cheap, whereas reliably detecting AI-generated text at scale is much harder. Detection accuracy also degrades when text is paraphrased, edited by a human, or produced by a fine-tuned model.
Detection approaches
| Approach | Mechanism | Limitation |
|---|---|---|
| Statistical scoring (perplexity, burstiness) | AI text tends to have lower perplexity and less sentence-to-sentence variation in perplexity (burstiness) than human text | Paraphrasing and model updates shift the distributions |
| Classifier-based | Train a binary classifier on labeled AI/human samples | Generalizes poorly to new models or fine-tuned outputs |
| Watermarking | Embed a statistical signal during generation (e.g., green/red token biases) | Requires generator cooperation; post-hoc editing can remove weak watermarks |
| Retrieval-based | Check if suspicious phrases appear verbatim in model output logs | Requires access to the model's output log; impractical at scale |
As of 2024, no publicly available detector reliably distinguishes AI-generated from human-written text at low false-positive rates across diverse domains and models. OpenAI, for example, withdrew its public AI text classifier in July 2023, citing its low accuracy.
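The statistical-scoring approach in the table can be illustrated with a minimal sketch. It assumes a toy add-one-smoothed unigram model as a stand-in for a real language model (production detectors would score tokens with a neural model), and it takes burstiness to mean the standard deviation of per-sentence perplexity. The names `UnigramModel`, `perplexity`, and `burstiness` are hypothetical and chosen only for this illustration.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase word tokenizer; a deliberate simplification."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive split on whitespace that follows ., ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


class UnigramModel:
    """Add-one smoothed unigram model (toy stand-in for a real LM)."""

    def __init__(self, reference_text):
        self.counts = Counter(tokenize(reference_text))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen tokens

    def log_prob(self, token):
        return math.log((self.counts[token] + 1) / (self.total + self.vocab))


def perplexity(model, text):
    """exp of the negative mean token log-probability."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    mean_log_prob = sum(model.log_prob(t) for t in tokens) / len(tokens)
    return math.exp(-mean_log_prob)


def burstiness(model, text):
    """Population standard deviation of per-sentence perplexity."""
    scores = [perplexity(model, s) for s in split_sentences(text) if tokenize(s)]
    return pstdev(scores) if len(scores) > 1 else 0.0
```

Under this sketch, a detector would flag text whose perplexity and burstiness both fall below thresholds calibrated on known human writing. Those thresholds depend on the scoring model and the domain, which is one reason such scores transfer poorly.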
AI watermarking
Watermarking builds detectability in at generation time: the generating model embeds a statistical pattern (the watermark) that a verifier can later test for. The Coalition for Content Provenance and Authenticity (C2PA) and several research groups have proposed provenance and watermarking standards for AI-generated media. Watermarking of LLM text is an active research area but has not been widely deployed in commercial models.
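A verifier for a green/red token watermark can be sketched as a one-proportion z-test on the number of "green" tokens. The sketch below is illustrative only: the SHA-256 pairing of previous and current token, the green fraction `GAMMA`, and all function names are assumptions, not any deployed scheme. It also simplifies by marking each token green independently with probability `GAMMA` rather than partitioning an actual model vocabulary.

```python
import hashlib
import math

GAMMA = 0.5  # assumed probability that a token is "green" in a given context


def is_green(prev_token, token, gamma=GAMMA):
    """Pseudo-randomly mark `token` as green, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return value < gamma


def watermark_z_score(tokens, gamma=GAMMA):
    """z-score of the green count against the no-watermark null hypothesis."""
    n = len(tokens) - 1  # number of scored (previous, current) pairs
    if n < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def generate_all_green(vocab, length, start):
    """Toy generator that always picks a green token.

    A real watermark would only bias sampling toward green tokens.
    """
    tokens = [start]
    while len(tokens) < length:
        tokens.append(next(t for t in vocab if is_green(tokens[-1], t)))
    return tokens
```

Unwatermarked text should score near zero, while text from the toy generator scores far above any reasonable threshold. Editing or paraphrasing replaces token pairs and pulls the z-score back toward zero, which is how post-hoc editing weakens a watermark.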
- AI content detection is not the same as AI-generated content disclosure: disclosure is a labeling practice in which the author or publisher declares AI involvement; detection is a technical attempt to infer origin without a declaration.
- Detection is not deterministic: a detector outputs a probabilistic score, so any decision threshold produces both false positives and false negatives.
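The practical weight of a probabilistic result depends on how common AI-generated text is in the population being screened. The sketch below applies Bayes' rule to estimate how often a flagged document really is AI-generated. The function names and any rates passed in are hypothetical, not measurements of a real detector.

```python
def positive_predictive_value(tpr, fpr, prevalence):
    """P(document is AI-generated | detector flags it), by Bayes' rule.

    tpr: true-positive rate, fpr: false-positive rate,
    prevalence: fraction of screened documents that are AI-generated.
    """
    for name, value in (("tpr", tpr), ("fpr", fpr), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    flagged_ai = tpr * prevalence
    flagged_human = fpr * (1.0 - prevalence)
    if flagged_ai + flagged_human == 0.0:
        raise ValueError("detector never flags anything")
    return flagged_ai / (flagged_ai + flagged_human)


def expected_false_positives(n_documents, prevalence, fpr):
    """Expected number of human-written documents that get flagged."""
    return n_documents * (1.0 - prevalence) * fpr
```

With assumed rates of 90% true positives, 1% false positives, and 5% prevalence, `positive_predictive_value(0.9, 0.01, 0.05)` is about 0.83, so roughly one flag in six would point at a human author. Screening 10,000 documents under the same assumptions would flag about 95 human-written ones.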