System card
Overview
A system card is a structured disclosure document that accompanies a large language model or other AI system deployed in production. It documents the system's intended use cases, known capabilities and limitations, identified risks such as hallucination and bias, and the guardrails or safety alignment measures implemented to mitigate those risks. System cards function as institutional accountability records that enable stakeholders, including end users, regulators, and downstream integrators, to make informed decisions about deployment and use.
System cards emerged as a documentation practice amid increased regulatory scrutiny of AI systems and documented harms from unvetted deployments. They are distinct from model cards, which typically document an individual model: its training data, intended use, and benchmark performance. System cards instead focus on the runtime behavior and risks of a complete system, meaning a model together with its surrounding safeguards, operating within a specific organizational context, serving particular user populations, and making real-world decisions.
The contents of a system card typically include: the system's primary function and intended users; documented performance on relevant benchmarks and in-domain evaluation; known failure modes and edge cases; social and technical risks identified through red teaming, adversarial robustness testing, and human evaluation; demographic disparities or representational harms; and concrete mitigation strategies such as content filtering, constitutional AI training, or retrieval-augmented generation for factual grounding.
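The content areas listed above can be pictured as a small structured record. The following is a minimal sketch, not a standard schema: the class names, field names, and example values are all hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Risk:
    """One identified risk (hypothetical schema, for illustration only)."""
    description: str
    found_by: str  # e.g. "red teaming", "human evaluation"
    mitigations: list[str] = field(default_factory=list)


@dataclass
class SystemCard:
    """Illustrative container mirroring the content areas of a system card."""
    system_name: str
    primary_function: str
    intended_users: list[str]
    evaluation_results: dict[str, float] = field(default_factory=dict)
    known_failure_modes: list[str] = field(default_factory=list)
    risks: list[Risk] = field(default_factory=list)
    demographic_disparities: list[str] = field(default_factory=list)

    def unmitigated_risks(self) -> list[Risk]:
        """Return risks that have no documented mitigation."""
        return [r for r in self.risks if not r.mitigations]

    def to_json(self) -> str:
        """Serialize the card so it can be published or compared across versions."""
        return json.dumps(asdict(self), indent=2)


# All values below are made up for the example.
card = SystemCard(
    system_name="example-support-assistant",
    primary_function="Answer customer billing questions",
    intended_users=["customer support agents"],
    evaluation_results={"retrieval_recall": 0.67},
    known_failure_modes=["plausible but incorrect refund amounts"],
    risks=[
        Risk("Prompt injection via pasted emails", "red teaming",
             ["input sanitization", "content filtering"]),
        Risk("Lower answer quality for non-English queries", "human evaluation"),
    ],
)

for risk in card.unmitigated_risks():
    print("No mitigation documented:", risk.description)
```

Keeping the card in a structured form like this makes gaps visible, for example risks with no mitigation recorded, before the prose version is written.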
Regulatory frameworks and industry standards increasingly call for documentation of this kind for systems deployed in high-stakes domains such as healthcare, lending, and content moderation. System cards function as both a compliance mechanism and an operational tool for tracking how deployed systems behave over time and where future improvements are needed.
How it works
A system card is produced through a multi-stage assessment process:
- Capability documentation: The team documents what the system was trained to do, its knowledge cutoff date, context window size, and measured performance on automated evaluation metrics such as ROUGE or BLEU for generation, or retrieval precision and recall for RAG systems.
- Risk identification: Stakeholders conduct red teaming exercises, adversarial robustness testing, and systematic audits to surface failure modes. This includes probing for hallucinations, prompt injection vulnerabilities, demographic biases, and silent failures where the system produces plausible but incorrect outputs without signaling uncertainty.
- Mitigation design and validation: The team designs and deploys specific mitigations—such as guardrails on output format, safety evaluation filters, or prompt engineering to encourage chain-of-thought reasoning—and documents their effectiveness through human evaluation and LLM-as-judge testing.
- Documentation and disclosure: All findings are compiled into a public or stakeholder-accessible system card. The card includes recommendations for responsible use, constraints on downstream applications, and a maintenance schedule for re-evaluating the system as it encounters new data and use cases in production.
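As an illustration of the capability documentation step, the retrieval precision and recall reported for a RAG system can be computed per query as sketched below. The function name and the document IDs are assumptions made for the example; a real evaluation would average these values over a labeled query set.

```python
def retrieval_precision_recall(
    retrieved: list[str], relevant: set[str]
) -> tuple[float, float]:
    """Compute precision and recall for a single query.

    precision = relevant documents retrieved / documents retrieved
    recall    = relevant documents retrieved / relevant documents
    """
    retrieved_set = set(retrieved)  # ignore duplicate hits
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 documents retrieved, 3 documents labeled relevant.
retrieved_ids = ["doc1", "doc2", "doc3", "doc4"]
relevant_ids = {"doc2", "doc4", "doc7"}

precision, recall = retrieval_precision_recall(retrieved_ids, relevant_ids)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```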
System cards are living documents; they are updated as new risks are discovered, guardrails are refined, or the system is fine-tuned with new data.
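One way to treat a card as a living document is to keep a revision log alongside it and check it against the maintenance schedule. The sketch below assumes a simple fixed review interval; the class names, fields, and dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Revision:
    """One published version of a system card (illustrative fields)."""
    version: str
    published: date
    reason: str  # e.g. "new risk discovered", "guardrail refined", "model fine-tuned"


@dataclass
class CardHistory:
    """Revision log with a fixed re-evaluation interval, in days."""
    review_interval_days: int
    revisions: list[Revision] = field(default_factory=list)

    def add(self, revision: Revision) -> None:
        self.revisions.append(revision)

    def latest(self) -> Revision:
        """Return the most recently published revision (raises ValueError if none)."""
        return max(self.revisions, key=lambda r: r.published)

    def review_overdue(self, today: date) -> bool:
        """True when the latest revision is older than the review interval."""
        return (today - self.latest().published).days > self.review_interval_days


history = CardHistory(review_interval_days=90)
history.add(Revision("1.0", date(2024, 1, 15), "initial release"))
history.add(Revision("1.1", date(2024, 3, 1), "guardrail refined"))

print(history.latest().version)                   # prints: 1.1
print(history.review_overdue(date(2024, 5, 1)))   # prints: False (61 days)
print(history.review_overdue(date(2024, 7, 1)))   # prints: True (122 days)
```

Recording the reason for each revision ties the log back to the triggers named above: newly discovered risks, refined guardrails, or fine-tuning with new data.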
| Term | Distinction |
|---|---|
| Model card | A model card documents an individual machine learning model, focusing on training data, intended use, and benchmark performance. A system card documents the deployed system built on the same or a related model, emphasizing real-world risks, mitigations, and suitability for specific use cases. |
| System prompt | A system prompt is a text instruction embedded in a model's input that shapes its behavior at inference time. A system card is a transparency document describing the system's overall design, limitations, and risks; it is not an instruction to the model itself. |
| Safety alignment | Safety alignment refers to the training and tuning process used to reduce model misbehavior. A system card documents the results of safety alignment and any remaining gaps or limitations that were not fully addressed. |
| Content filter | A content filter is a technical component that blocks or removes specific outputs. A system card is a document that discloses where content filters are used, their scope, and known bypasses or failures. |
| Red team report | A red team report is an internal or external assessment of vulnerabilities and risks produced during development. A system card is a public-facing or stakeholder-facing summary of those findings paired with mitigation strategies for the deployed system. |
Examples
- OpenAI GPT-4 System Card (2023): OpenAI published a system card for GPT-4 documenting risks surfaced through internal evaluation and external expert red teaming, including hallucination, harmful content, social bias, and susceptibility to adversarial prompts (jailbreaks), together with mitigations applied through reinforcement learning from human feedback (RLHF) and rule-based reward models. The card included measured performance on tasks requiring factual accuracy and cautioned against overreliance on model outputs, particularly in high-stakes domains without additional grounding or human verification.
- Google AI Overviews Transparency Disclosure (2024): Google published documentation for AI Overviews, its generative search system, disclosing its knowledge cutoff, reliance on RAG for factual consistency, known limitations in citation accuracy, and the content filtering mechanisms used to prevent harmful or misleading summaries. The documentation included measured hallucination rates and recommendations for user expectations.
- Anthropic Claude model card (2023): Anthropic published a model card and evaluations document for its Claude models; the equivalent documents for later Claude releases are titled system cards. The 2023 document describes the Constitutional AI training process, documented reductions in harmful outputs, remaining biases identified through evaluation, and benchmarked performance on factuality and harmlessness. It cautions against relying on the model on its own in high-stakes situations where an incorrect answer could cause harm, and it discusses context window behavior.
See also
- Model card – documentation of an individual model's training data, intended use, and benchmark performance
- Safety alignment – training techniques to reduce harmful model behavior
- Red teaming (AI) – systematic process for identifying system vulnerabilities
- Guardrails – technical components that constrain model outputs
- Human evaluation – manual assessment of model quality and risks
- Retrieval-augmented generation – method to improve factual consistency and support source attribution
- Safety evaluation – assessment methodologies for identifying and measuring AI risks