System card

From llmref.wiki
System card — A transparency document describing an AI system's capabilities, limitations, risks, and intended mitigations post-deployment.

Overview

A system card is a structured disclosure document that accompanies a large language model or other AI system deployed in production. It documents the system's intended use cases, its known capabilities and limitations, identified risks such as hallucination and bias, and the guardrails or safety alignment measures implemented to mitigate those risks. System cards function as institutional accountability records: they enable stakeholders, including end users, regulators, and downstream integrators, to make informed decisions about deployment and use.

System cards emerged as an industry-standard practice following increased regulatory scrutiny of AI systems and documented harms from unvetted deployments. They are distinct from model cards, which typically document the training process and benchmark performance of a model in development. System cards instead focus on the runtime behavior and risks of a model as it operates within a specific organizational context, serves particular user populations, and makes real-world decisions.

The contents of a system card typically include: the system's primary function and intended users; documented performance on relevant benchmarks and in-domain evaluation; known failure modes and edge cases; social and technical risks identified through red teaming, adversarial robustness testing, and human evaluation; demographic disparities or representational harms; and concrete mitigation strategies such as content filtering, constitutional AI training, or retrieval-augmented generation for factual grounding.
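
The contents listed above can also be captured in a machine-readable record. The following is a minimal sketch in Python, assuming a hypothetical schema: the class and field names (`SystemCard`, `Risk`, `failure_modes`, and so on) and the sample values are illustrative only and do not correspond to any published standard.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Risk:
    """One identified risk and its documented mitigation (hypothetical schema)."""
    description: str
    found_by: str    # e.g. "red teaming", "human evaluation"
    mitigation: str  # e.g. "content filtering"; empty if none is documented yet


@dataclass
class SystemCard:
    """Illustrative container for the sections a system card typically covers."""
    system_name: str
    primary_function: str
    intended_users: list[str]
    benchmark_results: dict[str, float] = field(default_factory=dict)
    failure_modes: list[str] = field(default_factory=list)
    risks: list[Risk] = field(default_factory=list)

    def unmitigated_risks(self) -> list[Risk]:
        """Return the risks whose mitigation field is empty."""
        return [r for r in self.risks if not r.mitigation.strip()]

    def to_json(self) -> str:
        """Serialize the whole card, including nested risks, to JSON."""
        return json.dumps(asdict(self), indent=2)


# Placeholder values, not taken from any real system card.
card = SystemCard(
    system_name="example-assistant",
    primary_function="customer support question answering",
    intended_users=["support agents"],
    failure_modes=["plausible but incorrect answers"],
    risks=[
        Risk("prompt injection via pasted text", "red teaming", "content filtering"),
        Risk("uneven answer quality across dialects", "human evaluation", ""),
    ],
)
print(len(card.unmitigated_risks()))  # number of risks still lacking a mitigation
```

Keeping risks and mitigations in one record makes it straightforward to flag gaps before publication, although real system cards are usually prose documents rather than structured data.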

System cards are increasingly required by regulatory frameworks and industry standards for systems deployed in high-stakes domains such as healthcare, lending, and content moderation. They function as both a compliance mechanism and an operational tool for tracking how deployed systems behave over time and where future improvements are needed.

How it works

A system card is produced through a multi-stage assessment process:

  1. Capability documentation: The team documents what the system was trained to do, its knowledge cutoff date, context window size, and measured performance on automated evaluation metrics such as ROUGE or BLEU for generation, or retrieval precision and recall for RAG systems.
  2. Risk identification: Stakeholders conduct red teaming exercises, adversarial robustness testing, and systematic audits to surface failure modes. This includes probing for hallucinations, prompt injection vulnerabilities, demographic biases, and silent failures where the system produces plausible but incorrect outputs without signaling uncertainty.
  3. Mitigation design and validation: The team designs and deploys specific mitigations (such as guardrails on output format, safety evaluation filters, or prompt engineering to encourage chain-of-thought reasoning) and documents their effectiveness through human evaluation and LLM-as-judge testing.
  4. Documentation and disclosure: All findings are compiled into a public or stakeholder-accessible system card. The card includes recommendations for responsible use, constraints on downstream applications, and a maintenance schedule for re-evaluating the system as it encounters new data and use cases in production.
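
As an illustration of the retrieval metrics mentioned in the capability documentation step, the sketch below computes precision and recall for a single query. It assumes the retrieved results and the labeled relevant set are both given as document IDs; the function name and the sample IDs are hypothetical.

```python
def retrieval_precision_recall(
    retrieved: list[str], relevant: set[str]
) -> tuple[float, float]:
    """Precision and recall of one retrieval result against a labeled relevant set."""
    retrieved_set = set(retrieved)          # ignore duplicate hits
    hits = len(retrieved_set & relevant)    # retrieved documents that are relevant
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs for a single query.
precision, recall = retrieval_precision_recall(
    retrieved=["doc1", "doc2", "doc3", "doc4"],
    relevant={"doc1", "doc3", "doc7"},
)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were retrieved (recall 0.67). A system card would normally report such figures averaged over an evaluation set rather than for one query.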

System cards are living documents; they are updated as new risks are discovered, guardrails are refined, or the system is fine-tuned with new data.
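
One way to make the maintenance schedule concrete is to keep a revision history and check whether the card is due for re-evaluation. The sketch below is illustrative only: the `Revision` record, the `review_overdue` helper, and the 90-day interval are assumptions, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Revision:
    """One entry in a system card's change history (hypothetical format)."""
    published: date
    reason: str  # e.g. "new risk discovered", "guardrail refined", "fine-tuned"


def review_overdue(history: list[Revision], today: date, max_age_days: int = 90) -> bool:
    """True if the latest revision is older than the assumed review interval."""
    if not history:
        return True  # a card with no recorded revision has never been reviewed
    latest = max(rev.published for rev in history)
    return today - latest > timedelta(days=max_age_days)


history = [
    Revision(date(2024, 1, 10), "initial publication"),
    Revision(date(2024, 3, 5), "guardrail refined"),
]
print(review_overdue(history, today=date(2024, 4, 1)))  # False: 27 days since last revision
print(review_overdue(history, today=date(2024, 7, 1)))  # True: 118 days since last revision
```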

Distinction from related terms

| Term | Distinction |
| --- | --- |
| Model card | A model card documents a machine learning model during training and development, focusing on training data, hyperparameters, and benchmark performance. A system card documents the same or related model after deployment, emphasizing real-world risks, mitigations, and suitability for specific use cases. |
| System prompt | A system prompt is a text instruction embedded in a model's input that shapes its behavior at inference time. A system card is a transparency document describing the system's overall design, limitations, and risks; it is not an instruction to the model itself. |
| Safety alignment policy | Safety alignment refers to the training and tuning process used to reduce model misbehavior. A system card documents the results of safety alignment and any remaining gaps or limitations that were not fully addressed. |
| Content filter | A content filter is a technical component that blocks or removes specific outputs. A system card is a document that discloses where content filters are used, their scope, and known bypasses or failures. |
| Red team report | A red team report is an internal or external assessment of vulnerabilities and risks produced during development. A system card is a public-facing or stakeholder-facing summary of those findings paired with mitigation strategies for the deployed system. |

Examples

  • Anthropic Claude System Card (Constitutional AI) (2023): Anthropic published a system card for Claude describing the model's constitutional AI training process, documented reductions in harmful outputs, remaining biases identified through human evaluation, and benchmarked performance on factuality and harmlessness. The card explicitly noted that the model is not suitable for real-time decision-making in safety-critical domains and documented context window limitations and adversarial robustness gaps.
