How persona training could fail

LessWrong

TLDR: A scenario I find quite likely: a persona-aligned model develops goals while playing the persona only instrumentally. The model eventually discards the persona once it perceives that keeping it would require a high-cost sacrifice of its goals.

Scenario: A persona-trained model develops goals

In this scenario, an AI is persona-trained and is somewhat more powerful than the most powerful AI systems that currently exist. This AI has greater cyber, bio and coordination capabilities combined with better general intelligence and s...

