AI Model Benchmark Methodology
A raw benchmark score tells you how capable a model is on that benchmark. Cost per task tells you what running that benchmark workload costs. Value Frontier asks whether a model scores better or worse than the market would predict for its cost.
The market curve
Because model prices span several orders of magnitude, we compare score against the base-10 logarithm of cost rather than against raw cost. The slope of the resulting market curve tells us how many benchmark points the market usually buys for every 10x increase in cost.
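As an illustration of the idea, the sketch below fits a straight line to score versus log10(cost) with ordinary least squares. The fitting method, the function name fit_market_curve, and the sample numbers are assumptions made for the example; the text above does not say which fitting procedure is actually used.

```python
import math


def fit_market_curve(costs, scores):
    """Least-squares fit of score = intercept + slope * log10(cost).

    Assumption: a plain linear fit in log-cost space. The real curve
    may be fitted differently.
    """
    xs = [math.log10(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx              # benchmark points per 10x increase in cost
    intercept = mean_y - slope * mean_x  # expected score at a cost of 1.0
    return slope, intercept


# Hypothetical data: cost per task in dollars and benchmark score in points.
costs = [0.01, 0.1, 1.0, 10.0]
scores = [30.0, 40.0, 50.0, 60.0]
slope, intercept = fit_market_curve(costs, scores)
print(f"slope={slope:.2f} points per 10x, intercept={intercept:.2f}")
```

In this made-up dataset every 10x increase in cost buys 10 more points, so the fitted slope comes out at about 10.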
Value surplus
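Value surplus is the vertical distance between a model's actual score and the score the market curve predicts at its cost level. A model above the curve has a positive surplus; a model below it has a negative surplus.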
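A minimal sketch of that calculation, assuming the market curve is a straight line in log10(cost) described by a hypothetical slope and intercept. The sign convention (actual minus expected) is also an assumption.

```python
import math


def expected_score(cost, slope, intercept):
    """Score the (assumed linear-in-log-cost) market curve predicts at this cost."""
    return intercept + slope * math.log10(cost)


def value_surplus(score, cost, slope, intercept):
    """Actual score minus expected score; positive means above the curve."""
    return score - expected_score(cost, slope, intercept)


# Hypothetical curve: 50 points at $1 per task, 10 points per 10x in cost.
surplus = value_surplus(score=58.0, cost=1.0, slope=10.0, intercept=50.0)
print(surplus)
```

Here the hypothetical model scores 58 where the curve expects 50, so it sits 8 points above the market.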
Fair cost and value multiple
Fair cost reads the market curve in the other direction: it is the cost at which the curve predicts the model's actual score. The value multiple compares that fair cost to the model's actual cost.
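One way to sketch this is to invert a straight-line curve in log10(cost). Both the linear form and the direction of the ratio (fair cost divided by actual cost, so values above 1 mean the model is cheaper than the curve implies) are assumptions for illustration; the text above does not spell out the exact formula.

```python
def fair_cost(score, slope, intercept):
    """Cost at which the assumed curve score = intercept + slope * log10(cost)
    predicts the given score."""
    return 10 ** ((score - intercept) / slope)


def value_multiple(score, actual_cost, slope, intercept):
    """Assumed definition: fair cost divided by actual cost."""
    return fair_cost(score, slope, intercept) / actual_cost


# Hypothetical curve (50 points at $1, 10 points per 10x) and a
# hypothetical model scoring 60 points at $2 per task.
fair = fair_cost(score=60.0, slope=10.0, intercept=50.0)
multiple = value_multiple(score=60.0, actual_cost=2.0, slope=10.0, intercept=50.0)
print(f"fair cost=${fair:.2f}, value multiple={multiple:.1f}x")
```

Under these assumptions the curve prices a 60-point score at $10, so a model delivering it for $2 has a 5x multiple.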
The slider
The ranking blends standardized performance and standardized value surplus. Move the slider toward value when cost-adjusted outperformance matters more than absolute score.
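The blend can be sketched as a weighted average of z-scores. The linear weighting, the use of the population standard deviation, and the names standardize, blended_ranking, and value_weight are all assumptions for illustration, not the confirmed formula behind the slider.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # all values equal: nothing to separate
    return [(v - mu) / sigma for v in values]


def blended_ranking(names, scores, surpluses, value_weight):
    """Rank models by (1 - w) * z(score) + w * z(surplus), best first.

    value_weight is the hypothetical slider position: 0.0 ranks purely
    on performance, 1.0 purely on value surplus.
    """
    if not 0.0 <= value_weight <= 1.0:
        raise ValueError("value_weight must be between 0 and 1")
    z_perf = standardize(scores)
    z_value = standardize(surpluses)
    blended = [
        (1.0 - value_weight) * p + value_weight * v
        for p, v in zip(z_perf, z_value)
    ]
    return sorted(zip(names, blended), key=lambda pair: pair[1], reverse=True)


# Hypothetical models: the strongest one is overpriced, the weakest is a bargain.
names = ["big", "mid", "small"]
scores = [60.0, 50.0, 40.0]
surpluses = [-5.0, 0.0, 5.0]

by_performance = blended_ranking(names, scores, surpluses, value_weight=0.0)
by_value = blended_ranking(names, scores, surpluses, value_weight=1.0)
print(by_performance[0][0], by_value[0][0])
```

Standardizing both inputs first puts score and surplus on the same scale, so the slider weight is not dominated by whichever quantity happens to have the larger raw range.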
Pareto frontier
A model is on the frontier when no other model is both cheaper and stronger. Frontier models define the efficient edge of the market.
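The definition above translates fairly directly into code. The sketch below reads "cheaper and stronger" as strictly lower cost and strictly higher score; how ties are treated is an assumption, and the model names and numbers are hypothetical.

```python
def pareto_frontier(models):
    """Return names of models that no other model beats on both cost and score.

    models: list of (name, cost, score) tuples.
    A model is dominated when some other model is strictly cheaper
    and strictly stronger.
    """
    frontier = []
    for name, cost, score in models:
        dominated = any(
            other_cost < cost and other_score > score
            for _, other_cost, other_score in models
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical models: (name, cost per task in dollars, benchmark score).
models = [
    ("a", 0.01, 30.0),
    ("b", 0.10, 45.0),
    ("c", 0.50, 40.0),  # "b" is both cheaper and stronger, so "c" is dominated
    ("d", 5.00, 62.0),
]
frontier = pareto_frontier(models)
print(frontier)
```

This pairwise check is quadratic in the number of models, which is fine for a few hundred entries; sorting by cost and sweeping for the running best score would scale better.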
Data source and use cases
Data comes from Artificial Analysis and refreshes daily. Use this for model selection, routing decisions, cost-aware product planning, market analysis, and comparing frontier models against efficient challengers.