How might continual learning affect safety and alignment?
This is the third post in our sequence Implications of Continual Learning for LLM Agents.

Summary

We argue that continual learning (CL) has two major potential safety implications: it may enable changes to LLM goals and values after deployment, and it eliminates the last-mover advantage held by current safety interventions.

We identify three pathways for goal and value change during deployment. First, loss of developer-side control over generalization. Second, value systematization, induced when an...