Metric-Dependent Annotation Saturation for Learning from Label Distributions

Apple ML Research··

When annotators disagree on a label, the disagreement itself carries signal—and the number of annotators needed to capture it depends on the evaluation metric. We fine-tune NLI models on label distributions subsampled from ChaosNLI, a dataset providing 100 independent annotator judgments per item, and identify metric-dependent saturation. In our 3-class NLI setting, entropy correlation—whether the model identifies which items elicit disagreement—requires N ≈ 20–50 annotators to converge, while d...

Read full article →

Related Articles

OpenAI’s o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors
donsupreme · Hacker News · 1mo ago
Accelerating Gemma 4: faster inference with multi-token prediction drafters
amrrs · Hacker News · 1mo ago
A couple million lines of Haskell: Production engineering at Mercury
unignorant · Hacker News · 1mo ago
Using “underdrawings” for accurate text and numbers
samcollins · Hacker News · 1mo ago
ProgramBench: Can language models rebuild programs from scratch?
jonbaer · Hacker News · 1mo ago