A Geometric Account of Activation Steering through Angle–Norm Decomposition
This blog post provides an overview of our recent paper: A Geometric Account of Activation Steering through Angle–Norm Decomposition.

TL;DR: We decompose linear activation steering into two distinct operations: one that changes the angle of the activation toward a concept direction, and one that changes its norm. Through controlled experiments, we analyze the role of each component. We find that concept information is indeed primarily encoded in the angular component of activations. However, the ...
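To make the angle and norm split concrete, here is a minimal NumPy sketch. It assumes the common additive form of linear steering, `h + alpha * v`, with a unit-norm concept direction `v`. The excerpt does not give the paper's exact formulation, so the function names (`angle_between`, `decompose_steering`), the vector size, and the value of `alpha` are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def angle_between(a, b):
    """Angle in radians between vectors a and b."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def decompose_steering(h, v, alpha):
    """Split additive steering into an angle-only and a norm-only variant.

    Assumed steering rule (not taken from the paper): steered = h + alpha * v.
    """
    steered = h + alpha * v
    h_norm = np.linalg.norm(h)
    s_norm = np.linalg.norm(steered)
    # Angle-only: take the steered direction but keep the original norm.
    angle_only = steered * (h_norm / s_norm)
    # Norm-only: keep the original direction but take the steered norm.
    norm_only = h * (s_norm / h_norm)
    return steered, angle_only, norm_only


# Toy data: a random "activation" and a random unit-norm "concept direction".
rng = np.random.default_rng(0)
h = rng.normal(size=16)
v = rng.normal(size=16)
v /= np.linalg.norm(v)

steered, angle_only, norm_only = decompose_steering(h, v, alpha=4.0)

print("angle to v before steering:", angle_between(h, v))
print("angle to v after steering: ", angle_between(steered, v))
print("norm before / after:       ", np.linalg.norm(h), np.linalg.norm(steered))
```

Under this assumption, full steering equals the angle-only vector rescaled by the ratio of the steered norm to the original norm, so each component can be applied and evaluated separately.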