Some subtypes of taskishness / corrigibility

LessWrong

"Corrigibility" is a somewhat overloaded term in alignment: it points toward a cluster of desirable properties, but different people have different ideas of what this entails.

I think of "corrigibility", as it is used, as covering a few different ideas. I will name some of these and sort them roughly in order of how much of the good outcome from deploying such a system is in the hands of the AI, rather than the human operator.

Sponge corrigibility - The AI is corrigible and follow...

