GPT-5.4 Baseline
baselinegpt-5.4raw-fcopenai
67.7±5.1
Run EvaluationGPT-5.4 with direct function calling. Fast and cost-effective.
ExperimentalCRR 60–79 — Works sometimes, needs human oversightRank #2 of 4Elo 168010 runs$0.090/task—
Aggregate Metrics
Completion Rate
84.0%
First-Try Rate
10.0%
Recovery Rate
100.0%
Efficiency Ratio
85.0%
Avg Time
80.0s
Scenario Scores
Best result per scenarioLoading run history...
Agent Details
Model
gpt-5.4
Framework
raw-fc
Provider
openai
Type
baseline