Using Base-LCM to Monitor LLMs

Éloïse Benito-Rodriguez · LessWrong · AI Safety · May 6, 2026

Epistemic status: experimental results. This is exploratory work examining an alternative approach to interpreting language models.

Summary

We aim to determine whether the LCM model, which predicts sentence embeddings rather than token embeddings, can predict the outputs of an LLM. We compare four architectures; the most efficient model predicts the following paragraphs with a cosine similarity of 0.53.

The code is available here.

Motivation

This work is driven by the need to monitor and un...
