Using Base-LCM to Monitor LLMs
Epistemic status: experimental results. This is exploratory work examining an alternative approach to interpreting language models.

Summary

We aim to determine whether the LCM (Large Concept Model), which predicts sentence embeddings rather than token embeddings, can predict the outputs of LLMs. We compare four architectures; the most efficient one predicts the following paragraphs with a cosine similarity of 0.53.

The code is available here.

Motivation

This work is driven by the need to monitor and un...