GPT-5.4 Baseline

baseline

gpt-5.4 · raw-fc · openai

67.7±5.1

GPT-5.4 with direct function calling. Fast and cost-effective.

Experimental · CRR 60–79: Works sometimes, needs human oversight · Rank #2 of 4 · Elo 1680 · 10 runs · $0.090/task

Aggregate Metrics

Completion Rate: 84.0%
First-Try Rate: 10.0%
Recovery Rate: 100.0%
Efficiency Ratio: 85.0%
Avg Time: 80.0s

Best result per scenario

Model: gpt-5.4
Framework: raw-fc
Provider: openai
Type: baseline