Grok 4.20 Beta Baseline

baseline

grok-4.20-beta · raw-fc · xai

Score: 65.5 ± 5.5

Grok 4.20 Beta with direct tool calling via the xAI API.

Experimental
CRR 60–79: works sometimes, needs human oversight
Rank #3 of 4
Elo 1650
10 runs
$0.120/task

Aggregate Metrics

Completion Rate: 81.0%
First-Try Rate: 0.0%
Recovery Rate: 100.0%
Efficiency Ratio: 85.0%
Average Time: 48.0 s


Model: grok-4.20-beta
Framework: raw-fc
Provider: xai
Type: baseline