Metric-Dependent Annotation Saturation for Learning from Label Distributions

Apple ML Research · AI · June 23, 2026

When annotators disagree on a label, the disagreement itself carries signal—and the number of annotators needed to capture it depends on the evaluation metric. We fine-tune NLI models on label distributions subsampled from ChaosNLI, a dataset providing 100 independent annotator judgments per item, and identify metric-dependent saturation. In our 3-class NLI setting, entropy correlation—whether the model identifies which items elicit disagreement—requires N ≈ 20–50 annotators to converge, while d...
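As a rough illustration of the setup described above, the sketch below builds an N-annotator target distribution by drawing N of an item's 100 ChaosNLI judgments without replacement. It then scores entropy correlation as the Pearson correlation between per-item entropies. Several details are assumptions made for illustration, not the authors' code: the sampling scheme, natural-log entropy, Pearson rather than (say) Spearman correlation, the full 100-annotator distribution as the reference, the integer label encoding, and every function name. The toy items are hypothetical, not ChaosNLI data.

```python
import numpy as np

# Assumed integer encoding for the 3 NLI classes:
# 0 = entailment, 1 = neutral, 2 = contradiction.
NUM_CLASSES = 3


def subsample_distribution(labels, n, rng):
    """Empirical label distribution from n judgments drawn without replacement.

    `labels` holds one integer label per annotator (100 per ChaosNLI item).
    Hypothetical helper, not the authors' code.
    """
    chosen = rng.choice(np.asarray(labels), size=n, replace=False)
    counts = np.bincount(chosen, minlength=NUM_CLASSES)
    return counts / n


def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum())


def entropy_correlation(pred_dists, ref_dists):
    """Assumed metric: Pearson correlation of per-item entropies."""
    pred_h = np.array([entropy(p) for p in pred_dists])
    ref_h = np.array([entropy(p) for p in ref_dists])
    return float(np.corrcoef(pred_h, ref_h)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy items with 100 labels each, from near-unanimous to contested.
    items = [
        np.array([0] * 95 + [1] * 5),
        np.array([0] * 70 + [1] * 20 + [2] * 10),
        np.array([0] * 50 + [1] * 40 + [2] * 10),
        np.array([0] * 34 + [1] * 33 + [2] * 33),
    ]
    reference = [subsample_distribution(y, 100, rng) for y in items]
    for n in (5, 20, 50):
        # Subsampled distributions stand in for model predictions here;
        # in a fine-tuning setup they would serve as training targets instead.
        targets = [subsample_distribution(y, n, rng) for y in items]
        print(n, round(entropy_correlation(targets, reference), 3))
```

This toy loop uses a single random draw per N. Averaging the correlation over several draws per N would show where it stops changing, which is the kind of saturation the abstract describes.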


