What it takes to transpose a matrix

Table of Contents

Introduction
Setting
Naive implementation [3.90]
Read stream
Write stream
Combining read and write streams together
Dealing with cache aliasing
Reversing the order [2.61]
Exploring block structure [1.46]
Software prefetching [1.35]
64-bit SIMD [0.74]
256-bit SIMD [0.49]
Buffering output [0.35]
Conclusion

Andrei Gudkov <gudokk@gmail.com>

Introduction

Classical CPU architecture is a poor choice for performing matrix-oriented computations.
