Exploring Generalization in NLAs

LessWrong

Recently I was reading Anthropic's paper on NLAs [1], and as someone who works on steering I found it interesting and thought-provoking. In this post I go through my reproduction and some of the experiments I ran on them.

Training and Architecture

I will only touch briefly on the architecture here because the paper already covers it; I include it as a refresher so the rest of the post makes sense. We train two models:

Activation Verbalizer (AV): Inje...

