Estimating Tail Risk in Neural Networks

Mark Xu · ARC · AI Safety · September 13, 2024

Machine learning systems are typically trained to maximize average-case performance. However, this method of training can fail to meaningfully control the probability of tail events that might cause significant harm. For instance, while an artificial intelligence (AI) assistant may be generally safe, it would be catastrophic if it ever suggested an action that resulted in unnecessary large-scale harm. Current techniques for estimating the probability of tail events are based on finding inputs on...
