Claude Managed Agents Outcomes: Success Criteria and Self-Correction

Claude Code

Claude Managed Agents introduces Outcomes, a capability that lets developers define explicit success criteria for agent tasks using a structured rubric. A dedicated grader agent evaluates outputs in an isolated context window and feeds gaps back to the working agent, which self-corrects iteratively until the output meets the bar. Internal testing shows up to a 10-point improvement in task success rate over standard prompting, with file generation gains of 8.4% for .docx and 10.1% for .pptx. The feature is particularly effective for subjective quality tasks such as brand voice alignment. Outcomes is now available in public beta.

What Are Outcomes?

Outcomes is a new capability in Claude Managed Agents that allows developers to define explicit success criteria for agent tasks. Rather than simply prompting an agent and hoping the output meets expectations, developers write a rubric describing what a successful result looks like. The agent then works toward meeting those criteria, with a separate grader agent evaluating the output independently.
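To make the idea concrete, here is a minimal sketch of what such a rubric might look like as data. The field names (`criteria`, `pass_threshold`) are illustrative assumptions, not the documented Outcomes schema; consult the Managed Agents documentation for the actual format.

```python
# Hypothetical rubric shape -- illustrative only, not the official schema.
rubric = {
    "name": "quarterly-report",
    "criteria": [
        {"id": "coverage", "description": "Every region in the source data appears in the report."},
        {"id": "tone", "description": "Prose matches the company style guide: concise, active voice."},
        {"id": "format", "description": "Output is a valid .docx with an executive summary first."},
    ],
    # Fraction of criteria that must pass; 1.0 means all of them.
    "pass_threshold": 1.0,
}
```

The key point is that each criterion is an explicit, checkable statement, rather than an implicit expectation buried in the prompt.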

How It Works

The Outcomes workflow unfolds in three stages. First, the developer defines a rubric: a structured description of what constitutes a passing output. Second, the agent executes the task and produces an initial result. Third, a dedicated grader agent evaluates that output against the rubric in its own isolated context window, deliberately separated from the original agent's reasoning to eliminate evaluation bias.

When the output falls short, the grader identifies the specific gaps and feeds that assessment back to the agent, which takes another pass. This self-correction loop continues iteratively until the output meets the defined bar, or until a configured retry limit is reached.
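The loop above can be sketched as follows. Everything here is a stand-in: `run_agent` and `grade_against_rubric` are hypothetical functions (with trivial stub bodies so the sketch runs), not the Managed Agents API.

```python
def run_agent(task, feedback=None):
    # Stub working agent: in this sketch it "improves" when given feedback.
    return task + (" (revised)" if feedback else "")

def grade_against_rubric(output, rubric):
    # Stub grader: in the real system this runs in an isolated context
    # window, separate from the working agent's reasoning.
    passed = "(revised)" in output
    return {"passed": passed, "gaps": [] if passed else ["needs revision"]}

def run_with_outcomes(task, rubric, max_retries=3):
    """Grade-and-retry loop: re-run the agent with the grader's gaps
    until the rubric passes or the retry limit is reached."""
    feedback = None
    output = None
    for _ in range(max_retries + 1):
        output = run_agent(task, feedback=feedback)
        report = grade_against_rubric(output, rubric)
        if report["passed"]:
            return output
        feedback = report["gaps"]  # specific gaps fed back to the agent
    return output  # retry limit reached; return the last attempt
```

The retry cap matters in practice: without it, an agent that cannot satisfy the rubric would loop indefinitely.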

Performance Impact

Internal testing shows Outcomes improves task success by up to 10 percentage points over standard prompting, with the largest gains on the hardest problems. File generation quality metrics are particularly striking: document creation success rates improved by 8.4% for .docx files and 10.1% for .pptx files.

The feature is especially effective for tasks requiring attention to detail, exhaustive coverage, or subjective quality standards such as brand voice alignment and visual guidelines compliance.

Real-World Application

Spiral, a writing tool built by Every, uses Outcomes to enforce editorial quality at scale. Each draft is scored against editorial principles and the user's voice rubric pulled from memory; only drafts that pass the threshold are returned to users. When multiple drafts are requested, subagents run in parallel, each evaluated independently by the grader.
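A rough sketch of that fan-out pattern, assuming nothing about Spiral's actual implementation: `generate_draft` and `grade_draft` are hypothetical stubs standing in for the subagent and grader calls.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_draft(prompt, seed):
    # Stub subagent: in practice each subagent writes a draft independently.
    return f"draft-{seed}: {prompt}"

def grade_draft(draft, threshold=0.8):
    # Stub grader: scores each draft independently against the rubric
    # and returns whether it clears the threshold.
    score = 0.9 if draft.startswith("draft-") else 0.0
    return score >= threshold

def passing_drafts(prompt, n=3):
    """Run n subagents in parallel and keep only drafts the grader passes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        drafts = list(pool.map(lambda s: generate_draft(prompt, s), range(n)))
    return [d for d in drafts if grade_draft(d)]
```

Because each draft is graded independently, a failing draft is simply filtered out rather than blocking the others.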

Status and Availability

Outcomes is available in public beta for all Claude Managed Agents developers. Documentation is available at platform.claude.com/docs/en/managed-agents/overview.

