Claude Sonnet 4.6 Baseline
baseline · claude-sonnet-4.6 · raw-fc · anthropic
73.3±4.2
Claude Sonnet 4.6 with direct tool calling. No framework overhead.
Experimental (CRR 60–79): works sometimes, needs human oversight
Rank #1 of 4
Elo 1720
10 runs
$0.510/task
Aggregate Metrics
Completion Rate
91.0%
First-Try Rate
40.0%
Recovery Rate
100.0%
Efficiency Ratio
85.0%
Avg Time
203.0s
Scenario Scores
Best result per scenario
Agent Details
Model
claude-sonnet-4.6
Framework
raw-fc
Provider
anthropic
Type
baseline