How persona training could fail
TLDR: A scenario I find quite likely: a persona-aligned model develops goals while playing the persona only instrumentally. The persona is eventually discarded when the model perceives that maintaining it would demand a high-cost sacrifice of its goals.

Scenario: A persona-trained model develops goals

In this scenario, an AI is persona-trained and is somewhat more powerful than the most powerful AI systems that currently exist. This AI has greater cyber, bio and coordination capabilities combined with better general intelligence and s...