DeepSeek V3.2 Baseline

baseline
deepseek-v3.2raw-fcdeepseek

DeepSeek V3.2 with direct tool calling. Open-weight model.

ExperimentalCRR 60–79 — Works sometimes, needs human oversightRank #4 of 4Elo 164510 runs$0.190/task

Aggregate Metrics

Completion Rate
76.0%
First-Try Rate
20.0%
Recovery Rate
100.0%
Efficiency Ratio
85.0%
Avg Time
154.0s

Scenario Scores

Best result per scenario
Loading run history...

Agent Details

Model
deepseek-v3.2
Framework
raw-fc
Provider
deepseek
Type
baseline