Claude Code StopFailure Hook: Automated Error Recovery When API Calls Fail

Claude Code, Mar 17, 2026

Claude Code v2.1.78 introduces the StopFailure hook event, which fires whenever a turn ends due to an API error such as a rate limit or authentication failure. This expands the hooks system's coverage to error scenarios that previously had no deterministic trigger point, enabling developers to implement automated recovery workflows, alerting, and diagnostic logging when Claude hits infrastructure-level failures. The companion bug fix — preventing an infinite loop when API errors triggered stop hooks that re-fed blocking errors back to the model — makes StopFailure safe to use in production automation pipelines from day one.

What Is the StopFailure Hook?

The Claude Code hooks system lets developers run custom scripts at key points in the agent lifecycle: before or after tool use, when a turn finishes, and now, when a turn ends because of an API error. The new StopFailure hook event fires whenever a turn terminates because of an infrastructure-level failure: a rate limit hit, an authentication error, a network timeout, or any other unsuccessful API response that prevents the model from completing its work.

Before v2.1.78, the hooks system covered two exit scenarios: Stop (the main agent finishes a turn normally) and SubagentStop (a subagent completes). There was no deterministic hook for the failure case, so automated pipelines had no reliable trigger for acting on API errors. StopFailure closes that gap.
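
To make the registration step concrete, here is a minimal sketch that builds a settings fragment for the new event. It assumes StopFailure is registered in the same nested `hooks` shape that existing events such as Stop use in `.claude/settings.json`; the helper name `register_hook` and the script path are hypothetical, so check the release notes before relying on the exact keys.

```python
import copy
import json


def register_hook(settings: dict, event: str, command: str) -> dict:
    """Return a copy of `settings` with one command hook added for `event`."""
    updated = copy.deepcopy(settings)
    entries = updated.setdefault("hooks", {}).setdefault(event, [])
    # Assumed shape: the same structure existing hook events use.
    entries.append({"hooks": [{"type": "command", "command": command}]})
    return updated


if __name__ == "__main__":
    # Hypothetical handler path; substitute your own script.
    fragment = register_hook({}, "StopFailure", "python3 ~/.claude/hooks/on_stop_failure.py")
    print(json.dumps(fragment, indent=2))
```

Returning a copy rather than mutating the input keeps the helper safe to call against settings loaded from disk before deciding whether to write them back.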

Why This Matters for Automation

Claude Code is increasingly used in long-running, unattended pipelines such as CI workflows, scheduled tasks, and overnight coding sessions. In these contexts, silent failures are particularly costly. An API error that goes undetected might leave a pipeline stalled indefinitely, or worse, leave work in a partially completed state with no record of what happened.

StopFailure gives teams three automation patterns that previously lacked a reliable trigger:

Automated Alerting

A StopFailure hook can call a webhook, send a Slack message, or page an on-call engineer the moment Claude hits a rate limit or authentication failure. The hook receives context about the session — including the error type — so the alert can include actionable details.
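
A rough sketch of such an alerting hook follows. It assumes the hook receives a JSON payload on stdin, as existing hook events do, and that the payload carries `session_id` and `cwd`. The `error_type` field name is a guess for illustration, as are the function names and the webhook URL you would pass in.

```python
import json
import urllib.request
from typing import TextIO


def build_alert(payload: dict) -> dict:
    """Turn a hook payload into a Slack-style message body."""
    # "error_type" is an assumed field name; confirm it against the real payload.
    error_type = payload.get("error_type", "unknown")
    session = payload.get("session_id", "unknown")
    cwd = payload.get("cwd", "unknown")
    return {"text": f"Claude Code turn failed ({error_type}) in {cwd}, session {session}"}


def send_alert(webhook_url: str, message: dict, timeout: float = 5.0) -> int:
    """POST the message as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def handle(stream: TextIO, webhook_url: str) -> int:
    """Read the hook payload from `stream` and send one alert."""
    return send_alert(webhook_url, build_alert(json.load(stream)))
```

In a real hook script the entry point would call `handle(sys.stdin, url)`, with the URL read from an environment variable rather than hard-coded.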

Retry and Requeue Logic

A hook triggered by a rate limit error can automatically requeue the session for retry after a backoff period, without requiring human intervention. This is especially valuable for batch processing workflows where individual session failures should not abort the entire job.
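
One way to sketch the requeue step is a capped exponential backoff that writes a retry entry to a queue file. The queue format, the function names, and the `error_type` field with its `"rate_limit"` value are all assumptions made for this example. A separate worker, not shown, would be responsible for reading the queue and restarting sessions.

```python
import json
from pathlib import Path
from typing import Optional


def backoff_seconds(attempt: int, base: float = 30.0, cap: float = 900.0) -> float:
    """Exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def requeue_entry(payload: dict, attempt: int, now: float) -> Optional[dict]:
    """Build a retry entry for rate-limited turns; return None for other errors."""
    # Assumed field name and value; other failures (e.g. auth) should not be retried blindly.
    if payload.get("error_type") != "rate_limit":
        return None
    return {
        "session_id": payload.get("session_id"),
        "attempt": attempt + 1,
        "retry_at": now + backoff_seconds(attempt),
    }


def append_entry(queue_file: Path, entry: dict) -> None:
    """Append one entry to a JSON Lines queue file."""
    with queue_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
```

Only rate limits are requeued here because retrying an authentication failure would fail again until credentials are fixed.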

Diagnostic Logging

Every API failure can now be captured with structured metadata: which project the session was running in, what the last tool call was, what error code the API returned. This data is invaluable for debugging patterns of failure over time and for billing/quota management.
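
A minimal logging sketch might append one JSON line per failure. The fields `hook_event_name`, `session_id`, `cwd`, and `transcript_path` mirror what existing hook events pass on stdin; whether StopFailure includes all of them, and the `error_type` name, are assumptions. The last tool call is not extracted here; storing the transcript path leaves room to look it up later.

```python
import json
from datetime import datetime, timezone
from typing import TextIO


def make_record(payload: dict) -> dict:
    """Extract the fields worth keeping from a hook payload."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": payload.get("hook_event_name"),
        "project_dir": payload.get("cwd"),
        "session_id": payload.get("session_id"),
        "transcript_path": payload.get("transcript_path"),
        "error_type": payload.get("error_type"),  # assumed field name
    }


def log_failure(stream: TextIO, log: TextIO) -> dict:
    """Read one payload from `stream` and append it to `log` as a JSON line."""
    record = make_record(json.load(stream))
    log.write(json.dumps(record) + "\n")
    return record
```

JSON Lines keeps each failure independently parseable, which makes it straightforward to count errors by type or by project afterwards.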

The Companion Bug Fix: No More Infinite Loops

Shipping a new hook event that triggers on error scenarios introduces an obvious risk: what if the hook itself triggers another error? Anthropic identified and fixed this problem before v2.1.78 shipped.

Prior to the fix, when an API error ended a turn, a stop hook that returned a blocking error had that error fed back to the model. The follow-up request then failed as well, which triggered the hook again, and the cycle repeated indefinitely. The fix breaks this cycle by detecting when an API error has already triggered a stop event and preventing the recursive re-entry. Developers adopting StopFailure therefore do not need to guard against the loop themselves.

Security and Reliability Improvements in v2.1.78

The same release includes two additional improvements that strengthen Claude Code for production use:

bypassPermissions No Longer Silently Writes to Protected Directories

The bypassPermissions execution mode previously allowed writes to protected directories including .git and .claude. This has been corrected — those directories remain protected even when bypassPermissions is active. Teams using bypassPermissions in CI contexts should note this behavioral change, as any scripts that relied on the previous behavior will need to be updated.
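
For teams auditing CI scripts after this change, a small helper like the following can flag paths that fall under the directories named above. This is an illustrative approximation, not how Claude Code itself performs the check. The set contains only the two directories mentioned here, and the real protected list may be longer.

```python
from pathlib import PurePosixPath

# Only the two directories named in the text; the actual list may include more.
PROTECTED_DIRS = {".git", ".claude"}


def touches_protected_dir(path: str) -> bool:
    """Return True if any component of `path` is a protected directory name."""
    return any(part in PROTECTED_DIRS for part in PurePosixPath(path).parts)
```

Matching on whole path components avoids false positives for files such as `.gitignore`.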

Sandbox Disable Warning

When Claude Code is configured with sandbox.enabled: true but the required sandbox dependencies are not present on the host machine, the sandbox was previously disabled without any notice, which could cause confusing behavior in new environments or after machine migrations. Version 2.1.78 surfaces a visible warning in this scenario, making the missing dependency immediately obvious.

The StopFailure Hook in Context

The hooks system now provides deterministic control at the key points in a Claude Code session, including both the normal and the failed end of a turn:

PreToolUse: intercept and validate before a tool runs
PostToolUse: react after a tool completes
Stop: run cleanup or notification when a turn ends normally
SubagentStop: handle subagent completion in multi-agent workflows
StopFailure: respond to infrastructure-level API failures

This coverage makes it practical to treat Claude Code sessions as reliable primitives in larger systems, with the same observability and error-handling guarantees that developers expect from other infrastructure components.
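
As an illustration of treating these events uniformly, a single script could dispatch on the event name. The sketch below assumes the payload includes `hook_event_name`, as existing hook inputs do; the handler names and the `error_type` field are hypothetical.

```python
from typing import Callable

Handler = Callable[[dict], str]


def on_stop(payload: dict) -> str:
    """Handle a turn that ended normally."""
    return f"turn finished in session {payload.get('session_id')}"


def on_stop_failure(payload: dict) -> str:
    """Handle a turn that ended because of an API error."""
    # "error_type" is an assumed field name.
    return f"turn failed: {payload.get('error_type', 'unknown')}"


HANDLERS = {"Stop": on_stop, "StopFailure": on_stop_failure}


def dispatch(payload: dict) -> str:
    """Route a payload to its handler; unknown events are ignored."""
    handler = HANDLERS.get(payload.get("hook_event_name", ""))
    return handler(payload) if handler else "ignored"
```

Keeping the success and failure handlers in one table makes it harder to wire up one path and forget the other.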
