Tiny-ML Leaderboard

Sub-150M parameter language models, same eval harness, transparent methodology.

Unknown org? The "Unknown" tag marks models submitted by their makers ahead of official release. If that's you, DM glintresearch on Discord.

Detailed Results

#	Model	Org	Params	Eff. ⚡	WikiText-2 byte_ppl ↓	BLiMP ↑	ARC-Easy ↑	Training Tokens	Released	Links

Model Release Timeline

Most recent first

Benchmark Overview

BLiMP ↑

Higher is better

ARC-Easy ↑

Higher is better

WikiText-2 Byte-Level Perplexity ↓

Lower is better · byte_ppl = exp(−loglik / total_bytes) · bubble size = perplexity (smaller bubble = better)
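As a minimal sketch of the formula above, the snippet below computes byte-level perplexity from a summed log-likelihood and a byte count. It assumes `loglik` is the total natural-log likelihood over the evaluation text and that `total_bytes` is the UTF-8 byte length of that text; the function name and the example numbers are illustrative, not taken from the leaderboard's code.

```python
import math


def byte_perplexity(total_loglik: float, total_bytes: int) -> float:
    """Return exp(-loglik / total_bytes).

    Assumes total_loglik is a sum of natural-log probabilities (so it is <= 0)
    and total_bytes counts the UTF-8 bytes of the scored text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return math.exp(-total_loglik / total_bytes)


def utf8_byte_count(text: str) -> int:
    """Byte length of the text under UTF-8 (an assumption about the encoding)."""
    return len(text.encode("utf-8"))


# Hypothetical numbers: -1 nat per byte on average gives a perplexity of e.
example = byte_perplexity(total_loglik=-1000.0, total_bytes=1000)
```

Normalizing by bytes rather than tokens keeps the number comparable across models with different tokenizers, which is presumably why the leaderboard reports it this way.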

Model Efficiency

Parameters vs Leaderboard Score

Scatter of each model's overall score vs its parameter count. Points above the dashed threshold line are ≥1σ above the trend. Top 3 marked.
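One way to reproduce the "≥1σ above the trend" rule is sketched below. The page does not say how the trend is fitted, so this assumes an ordinary least-squares line of score against log10(params) and takes σ as the population standard deviation of the residuals; the function name and the sample data are hypothetical.

```python
import math
from statistics import mean, pstdev


def flag_high_efficiency(models):
    """Return names of models whose score sits at least 1 sigma above the trend.

    models: list of (name, params, score) tuples.
    Assumption: the trend is a least-squares line of score vs log10(params),
    and sigma is the population standard deviation of the residuals.
    """
    xs = [math.log10(params) for _, params, _ in models]
    ys = [score for _, _, score in models]
    x_bar, y_bar = mean(xs), mean(ys)

    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct parameter counts")

    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual = observed score minus the score the trend line predicts.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # every point lies exactly on the trend line

    return [name for (name, _, _), r in zip(models, residuals) if r >= sigma]


# Hypothetical entries: two sizes, one model clearly above its peers.
sample = [
    ("a", 10_000_000, 0.40),
    ("b", 10_000_000, 0.60),
    ("c", 100_000_000, 0.50),
    ("d", 100_000_000, 0.50),
]
flagged = flag_high_efficiency(sample)
```

Fitting against log10(params) rather than raw params is a design choice made here because model sizes span orders of magnitude; a different trend definition would move the threshold line.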

Legend: average trend · high-efficiency threshold · outperforming zone

Models Released by Org

Add your model

Open a PR with your model's benchmark results and reproduction steps. We require: parameter count, training data provenance, the eval harness used, and scores on all three benchmarks (WikiText-2 byte_ppl, BLiMP, ARC-Easy) obtained with the lm-eval harness.
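Before opening a PR, a quick self-check against the requirements above can catch missing items. The sketch below is illustrative only: the field names, the `scores` layout, and the function name are hypothetical and do not describe the leaderboard's actual submission schema; the 150M cap comes from the "sub-150M" scope stated at the top of the page.

```python
# Hypothetical field names; the real submission format may differ.
REQUIRED_FIELDS = ("params", "training_data_provenance", "eval_harness")
REQUIRED_BENCHMARKS = ("wikitext2_byte_ppl", "blimp", "arc_easy")
MAX_PARAMS = 150_000_000  # leaderboard covers sub-150M parameter models


def submission_problems(submission: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks complete."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in submission]

    scores = submission.get("scores", {})
    problems += [f"missing score: {b}" for b in REQUIRED_BENCHMARKS if b not in scores]

    params = submission.get("params")
    if isinstance(params, int) and params >= MAX_PARAMS:
        problems.append("params must be below 150M")
    return problems


# Placeholder values for illustration, not real results.
draft = {
    "params": 42_000_000,
    "training_data_provenance": "describe datasets and licenses here",
    "eval_harness": "lm-eval",
    "scores": {"wikitext2_byte_ppl": 0.0, "blimp": 0.0, "arc_easy": 0.0},
}
issues = submission_problems(draft)
```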
