AI Model Benchmark Methodology

A raw benchmark score tells you how capable a model is. Cost per task tells you what that benchmark workload costs. Value Frontier asks whether a model is better or worse than expected for its cost.

The market curve

Because model prices span orders of magnitude, we compare score against log cost. The slope tells us how many benchmark points the market usually buys for every 10x increase in cost.

expected_score = 89.40 + 12.76 x log10(cost)
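The curve above can be evaluated directly. The following is a minimal sketch, assuming cost is expressed in the same units as the page's cost figures; the function and constant names are illustrative and not part of the published index.

```python
import math

# Coefficients copied from the market curve above.
INTERCEPT = 89.40
SLOPE = 12.76  # benchmark points per 10x increase in cost


def expected_score(cost: float) -> float:
    """Score the market curve predicts at a given cost."""
    if cost <= 0:
        raise ValueError("cost must be positive to take log10")
    return INTERCEPT + SLOPE * math.log10(cost)


# A 10x increase in cost moves the expected score by exactly the slope.
delta = expected_score(0.18) - expected_score(0.018)
print(round(expected_score(0.018), 1), round(delta, 2))
```

Because the curve is linear in log10(cost), the difference between any two costs a factor of 10 apart equals the slope, whatever the starting cost.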

Sample size

RMSE: 3.65

R²: 0.90

Last updated: Jun 19, 2026
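For readers who want to check fit statistics like these against their own data, here is a hedged sketch using the standard definitions of RMSE and R². It assumes RMSE divides by the number of models (not by degrees of freedom); the page does not say which convention it uses, and the function name is hypothetical.

```python
import math


def fit_quality(actual, predicted):
    """Return (rmse, r_squared) for paired actual and predicted scores."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    # Residual sum of squares: how far the curve misses each model.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: spread of actual scores around their mean.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual scores are equal")
    rmse = math.sqrt(ss_res / n)  # assumption: divide by n
    r_squared = 1 - ss_res / ss_tot
    return rmse, r_squared
```

RMSE is in benchmark points, so it can be read as the typical vertical miss of the curve; R² is the share of score variance the curve explains.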

Value surplus

Value surplus is the vertical distance between a model's actual score and the expected score at its cost level.

value_surplus = actual_score - expected_score

Example model

Actual score: 72.0

Actual cost: $0.018

Expected score: 67.1

Surplus: 4.9
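The example figures can be reproduced from the two formulas above. This is a small sketch using the published coefficients; the function name is illustrative.

```python
import math

INTERCEPT, SLOPE = 89.40, 12.76  # from the market curve


def value_surplus(actual_score: float, cost: float) -> float:
    """Actual score minus the curve's expected score at the same cost."""
    expected = INTERCEPT + SLOPE * math.log10(cost)
    return actual_score - expected


surplus = value_surplus(72.0, 0.018)
print(round(surplus, 1))
```

A positive surplus means the model sits above the curve (better than expected for its cost); a negative surplus means it sits below.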

Fair cost and value multiple

Fair cost asks what the market curve says a score should cost. The multiple compares that fair cost to the actual cost.

fair_cost = 10 ** ((actual_score - a) / b)

Here a = 89.40 is the intercept and b = 12.76 is the slope of the market curve.

value_multiple = fair_cost / actual_cost

For a model scoring 72.0 at an actual cost of $0.018, fair cost is about $0.043, so the model is 2.4x cheaper than expected.
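Fair cost is the market curve solved for cost instead of score. A minimal sketch, with illustrative function names and the published intercept and slope:

```python
INTERCEPT, SLOPE = 89.40, 12.76  # intercept a and slope b of the market curve


def fair_cost(actual_score: float) -> float:
    """Cost at which the market curve predicts this score."""
    return 10 ** ((actual_score - INTERCEPT) / SLOPE)


def value_multiple(actual_score: float, actual_cost: float) -> float:
    """How many times cheaper (>1) or pricier (<1) than the curve implies."""
    if actual_cost <= 0:
        raise ValueError("actual_cost must be positive")
    return fair_cost(actual_score) / actual_cost


multiple = value_multiple(72.0, 0.018)
print(round(fair_cost(72.0), 3), round(multiple, 1))
```

A multiple above 1 and a positive value surplus always go together, since both say the model lies above the curve; the multiple expresses the gap in cost terms and the surplus in score terms.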

The slider

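The ranking blends standardized performance and standardized value surplus, both expressed as z-scores. The slider sets the weight w, from 0 (performance only) to 1 (value only). Move the slider toward value when cost-adjusted outperformance matters more than absolute score.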

rank_score = (1 - w) x performance_z + w x value_z

#1 Apex Max (Crown Model Co.): rank score 1.06

#2 Orbit Premier (OrbitAI): rank score 0.79

#3 Cascade Ultra (Helio Systems): rank score 0.66

#4 Aurora Reason 2 (Northstar AI): rank score 0.54
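The blend can be sketched as follows. Two things here are assumptions, not taken from the page: z-scores use the population standard deviation, and the three models and their numbers are made up purely for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rank_scores(scores, surpluses, w):
    """rank_score = (1 - w) * performance_z + w * value_z for each model."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    perf_z = z_scores(scores)
    val_z = z_scores(surpluses)
    return [(1 - w) * p + w * v for p, v in zip(perf_z, val_z)]


# Hypothetical inputs, for illustration only (not real index data).
names = ["model_a", "model_b", "model_c"]
scores = [80.0, 70.0, 60.0]
surpluses = [-2.0, 5.0, 1.0]


def ranking(w):
    """Model names ordered from highest to lowest rank score."""
    paired = zip(names, rank_scores(scores, surpluses, w))
    return [n for n, _ in sorted(paired, key=lambda t: t[1], reverse=True)]


print(ranking(0.0))  # pure performance ordering
print(ranking(1.0))  # pure value ordering
```

Standardizing both inputs first puts them on the same scale, so w moves the ranking smoothly between the two orderings instead of letting whichever quantity has the larger raw spread dominate.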

Pareto frontier

A model is on the frontier when no other model is both cheaper and stronger. Frontier models define the efficient edge of the market.
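The frontier test can be written as a direct pairwise check. This sketch follows the strict definition stated above (dominated only if another model is strictly cheaper and strictly stronger); the tuple layout and the toy values are assumptions for illustration.

```python
def pareto_frontier(models):
    """models: list of (name, cost, score). Return names on the frontier."""
    frontier = []
    for name, cost, score in models:
        # Dominated if some other model is both cheaper and stronger.
        dominated = any(
            other_cost < cost and other_score > score
            for _, other_cost, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical models, for illustration only (not real index data).
toy = [
    ("a", 0.01, 60.0),
    ("b", 0.02, 55.0),  # dominated by "a"
    ("c", 0.05, 75.0),
    ("d", 0.10, 70.0),  # dominated by "c"
]
print(pareto_frontier(toy))
```

The pairwise check is quadratic in the number of models, which is fine for a snapshot of this size. With strict inequalities, two models tied on cost or on score do not dominate each other.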

Data source and use cases

Data comes from Artificial Analysis and refreshes daily. Use this for model selection, routing decisions, cost-aware product planning, market analysis, and comparing frontier models against efficient challengers.

Snapshot: Jun 19, 2026

Index version: demo

Dropped rows: 0