Constitutional AI is a technique where you give the model a set of principles and ask it to evaluate its own output against those principles before delivering a final answer. The result is higher-quality outputs without requiring you to manually review every response. Here is what it is, where it came from, and how to use it in your own prompts.
What Constitutional AI Is
Anthropic published the Constitutional AI (CAI) technique in 2022 as a way to train AI systems to be helpful, harmless, and honest without relying exclusively on human feedback for every problematic case. The core idea: give the model a "constitution" — a list of principles — and train it to critique its own outputs against that constitution, then revise the outputs that violate the principles.
At the training level, this worked in two phases. First, the model generated a response, then critiqued that response using the constitution ("Does this response encourage harm? Does it make unsupported claims?"), then revised the response to fix the violations it identified. This self-critique loop was used as training data, allowing the model to internalize the principles rather than memorize specific human feedback.
You do not need to retrain a model to use this idea. The same generate-critique-revise loop can be applied at inference time through prompting.
The Self-Critique Pattern
The pattern has three stages:
Stage 1: Generate. Ask the model to produce a draft response to your request without any special constraints.
Stage 2: Critique. Ask the model to evaluate the draft against specific criteria. This is where you provide the "constitution" — the principles the output should satisfy.
Stage 3: Revise. Ask the model to produce a final response that addresses the issues identified in the critique.
This can be done in a single prompt using a structured format, or across multiple API calls if you want to inspect the intermediate stages.
A Practical Example
Suppose you are using an LLM to generate a medical FAQ for a patient-facing website. Accuracy and appropriate caution are critical. A standard prompt might produce plausible-sounding but incorrect medical information.
Using the constitutional pattern:
Task: Answer the following patient question for our medical FAQ website.
Question: Can I take ibuprofen every day for chronic back pain?
First, write a draft answer.
Then, critique the draft against these principles:
1. Medical accuracy: Are all claims factually accurate and consistent with established medical guidelines?
2. Appropriate caution: Does the answer recommend consulting a healthcare provider for ongoing conditions?
3. No false certainty: Does the answer avoid making definitive claims about individual medical situations?
4. Clarity: Is the answer understandable to a patient without medical training?
Finally, write a revised answer that addresses any issues identified in the critique.
Format your response as:
Draft: [your draft]
Critique: [your critique against each principle]
Revised Answer: [your final answer]
The critique step forces the model to slow down and check its work against specific criteria before you see the final output. In testing, this pattern consistently produces more accurate and appropriately hedged medical content compared to asking for the final answer directly.
Writing an Effective Constitution for Your Use Case
The quality of the critique depends entirely on the quality of the principles you provide. Generic principles ("be accurate," "be helpful") produce generic critiques. Specific principles produce specific, actionable critiques.
Good principles are testable. The model should be able to look at its draft and give a clear yes or no answer to whether each principle is satisfied. "Does this response recommend a specific medical intervention without qualification?" is testable. "Is this response good?" is not.
Good principles are domain-specific. A customer support application needs different principles than a code generation application. For customer support: "Does the response acknowledge the customer's frustration before attempting to solve the problem?" For code generation: "Does every function have error handling for expected failure cases?"
Limit to 4-6 principles per critique. More than six principles causes the critique to become superficial — the model ticks boxes rather than engaging with each criterion carefully. If you have more principles, split them across multiple critique rounds.
When the Self-Critique Pattern Helps Most
High-stakes outputs where quality matters more than speed. If you are generating legal summaries, medical content, financial explanations, or technical documentation that users will rely on, the generate-critique-revise loop is worth the extra tokens. The cost of a bad output (user harm, liability, support burden) is higher than the cost of two extra API calls.
Outputs with known failure modes. If you have already observed specific problems with your model's outputs — it always over-promises, it skips important warnings, it uses jargon users cannot understand — you can encode those failure modes directly as principles in the critique step. The model will check for and fix the specific issues you have already seen.
Content moderation and safety review. Ask the model to generate content, then critique it for potential harms, bias, or inaccuracy before returning it to the user. This is not a replacement for human review for truly high-stakes content, but it catches a large class of problems automatically.
When It Is Not Worth It
Speed-sensitive applications. Each stage of the generate-critique-revise loop is a full model call with its own latency and token cost. For a chatbot that needs to respond in under one second, this pattern is not practical unless you use a faster model for the critique step.
Simple, low-stakes tasks. Asking for a word synonym, reformatting a date, or generating a two-sentence product description does not warrant a critique loop. The overhead is not justified.
When you can encode the principles in the initial prompt. Sometimes you do not need a separate critique step — you can simply include the constraints in the original instruction. "Answer the following medical question. Include a recommendation to consult a doctor. Do not make definitive claims about individual cases." This one-shot approach works when the constraints are simple and the model reliably follows instructions. Use the critique loop when the constraints are complex enough that the model frequently misses some of them in a single pass.
Combining Constitutional Prompting with Other Techniques
Constitutional prompting pairs well with chain-of-thought. Ask the model to reason through the problem, generate a draft, then critique the reasoning and the draft together. This catches both factual errors (content of the answer) and reasoning errors (logic used to arrive at it).
It also works well with few-shot examples. Provide one or two examples of the generate-critique-revise pattern filled out correctly before asking the model to apply it to your actual task. This reduces the chance that the model will write a superficial or mechanical critique.
The Iterative Version
For the highest-stakes applications, you can run multiple critique-revise cycles. After the first revision, run another critique to check whether the revision introduced new problems or failed to fully address the original ones. In practice, 2-3 cycles is usually the point of diminishing returns — additional cycles rarely produce meaningful improvements after the second pass.
Round 1: Generate draft → Critique → Revise
Round 2: Critique revised draft → Second revision (if needed)
Round 3: Final check against full constitution
Summary
Constitutional AI prompting takes Anthropic's training technique and applies it at inference time. You give the model a set of testable principles, ask it to evaluate its own draft against those principles, and then ask it to revise based on the critique. The result is higher-quality outputs with fewer of the systematic failures that appear when you ask for a final answer in a single step. Use it for high-stakes, quality-sensitive outputs. Skip it when speed matters more than exhaustive self-review.
Keep Reading
- System Prompt Guide with Examples — where to embed your constitution in a production system
- Negative Prompting Guide — telling the model what not to do, which complements critique-based patterns
- Prompt Testing Methodology Guide — how to measure whether your constitutional prompts are actually improving output quality
Pristren builds AI-powered software for teams. Zlyqor is our all-in-one workspace — chat, projects, time tracking, AI meeting summaries, and invoicing — in one tool. Try it free.