Beyond Standard LLMs

Sebastian Raschka

From DeepSeek R1 to MiniMax-M2, the largest and most capable open-weight LLMs today remain autoregressive decoder-style transformers built on flavors of the original multi-head attention mechanism. However, alternatives to standard LLMs have also emerged in recent years, from text diffusion models to the most recent linear attention hybrid architectures. Some of them are geared towards better efficiency, and others, like code world models, aim to improve modeling performance...

