Claude Sonnet 4.6 Baseline

baseline

claude-sonnet-4.6 · raw-fc · anthropic

73.3±4.2

Claude Sonnet 4.6 with direct tool calling and no framework overhead.

Experimental (CRR 60–79): works sometimes, needs human oversight

Rank #1 of 4 · Elo 1720 · 10 runs · $0.510/task

Aggregate Metrics

Completion Rate

91.0%

First-Try Rate

40.0%

Recovery Rate

100.0%

Efficiency Ratio

85.0%

Avg Time

203.0s

Best result per scenario

Model

claude-sonnet-4.6

Framework

raw-fc

Provider

anthropic

Type

baseline