We need 3rd-party Training-Run Assessments
Training-run assessments conducted by a 3rd party should become a standard part of frontier AI safety. By a Training-Run Assessment, or TRA, I mean an in-depth analysis of the post-training pipeline and dynamics leading up to a frontier model release. A TRA can look at intermediate checkpoints, training rollouts, RL environments, reward signals, SFT datasets, and the process by which the developer responded to warning signs.[1]

In this post I will argue that:

Final-checkpoint evaluations will be in...