Paraphrasing Is (At Best) a Partial Defence Against Steganography in LLMs
Within the AI safety community, paraphrasing, which in the context of this post simply means using another LLM (with nonzero temperature) to rewrite a given piece of content, is generally considered a viable defence against, and detection method for, steganography in LLMs. In this blogpost, we briefly provide a taxonomy of types of steganography in LLMs, and then highlight the limitations of paraphrasing against each type. Unfortunately, there are types of steganography that LLMs have been shown to use t...
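As a minimal sketch of the defence described above: a suspicious output is passed through a second model that rewrites it while preserving meaning, hoping to destroy any hidden channel encoded in the exact token choices. The function name, the prompt wording, and the stub model below are all illustrative assumptions (the stub just swaps synonyms to mimic nonzero-temperature rewriting), not anything from this post.

```python
import random

def paraphrase_defence(text, model, temperature=0.7):
    """Rewrite `text` with an LLM to disrupt token-level hidden channels.

    `model` is any callable (prompt, temperature) -> text; in practice this
    would be a real LLM API call.
    """
    prompt = f"Rewrite the following text, preserving its meaning:\n\n{text}"
    return model(prompt, temperature=temperature)

def stub_model(prompt, temperature=0.0):
    """Toy stand-in for an LLM: randomly swaps a few synonyms when
    temperature is nonzero, leaving everything else unchanged."""
    text = prompt.split("\n\n", 1)[1]
    synonyms = {"big": ["large", "sizeable"], "fast": ["quick", "rapid"]}
    words = [
        random.choice(synonyms[w]) if w in synonyms and temperature > 0 else w
        for w in text.split()
    ]
    return " ".join(words)

print(paraphrase_defence("a big fast model", stub_model))
```

The key property the defence relies on is visible even in this toy: the paraphraser's own sampling randomness, not the original model's token choices, determines the surface form of the output, which is exactly why it can break (some, but not all) steganographic channels.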