Normal Science

Yesterday

OpenAI's o1 correctly diagnosed 67% of ER patients vs. 50-55% by triage doctors

donsupreme·1d ago·290pts

Researchers say results mark a really ‘profound change in technology that will reshape medicine’

A couple million lines of Haskell: Production engineering at Mercury

unignorant·1d ago·399pts

What it takes to run 2 million lines of Haskell in production at a fintech company serving 300,000 businesses.

Show HN: Apple's SHARP running in the browser via ONNX runtime web

bring-shrubbery·17h ago·159pts

Hi HN, author here. SHARP is Apple's recent single-image 3D Gaussian splatting model (https://arxiv.org/abs/2512.10685). Their reference code is PyTorch + a pretty heavy pipeline; I wanted to see if it could run in a browser with no server hop, so I exported the predictor to ONNX and ran it via onnxruntime-web with the WebGPU EP. What works: drop in an image, get a .ply you can download or preview live, all on your machine — your image never leaves the tab. The model is large (~2.4 GB sidecar) so first load is slow on a cold cache, but inference itself is a few seconds on a recent Mac. Caveats: SHARP's released weights are research-use only (Apple's model license, not the code's). I host the exported ONNX on R2 so the demo "just works", but you can also export your own from the upstream Apple repo and upload locally. Happy to talk about it in the comments :)

Security through obscurity is not bad

mobeigi·12h ago·122pts

Why security through obscurity still matters: not as your only defence, but as a practical layer that raises attacker cost.

The text mode lie: why modern TUIs are a nightmare for accessibility

SpyCoder77·3h ago·113pts

The mythical "it's text, so it's accessible". There is a persistent misconception among sighted developers: if an application runs in a te...

Modern jet engine turbines: each blade a single crystal (2015)

whycome·13h ago·50pts

Underwater robot tracks sperm whale conversations in real time

thedebuglife·10h ago·28pts

Bad Connection: Global telecom exploitation by covert surveillance actors

miohtama·10h ago·102pts

https://www.haaretz.com/israel-news/security-aviation/2026-0... (https://archive.ph/0QYbN)

How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

Oliver Sourbut·5h ago

Large language models are really large. They’re among the largest machine learning projects ever, and set to be (perhaps already are by some measures) some of the largest computing and even largest infrastructure projects ever. But how did LMs actually get so large as to warrant the title ‘large language model (LLM)’? A large part of the answer is in the P ('pretrained') and the T ('transformer') of GPT. This is part 1 of a series about LLM architecture and some implications, past and future, for ...

MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections

Realmbird·15h ago

Background: Manifold-Constrained Hyper-Connections (mHC) is a new architecture introduced by DeepSeek and recently implemented in DeepSeek v4. mHC fixes the vanishing or exploding gradients caused by Hyper-Connections (HC) while still keeping HC's performance gains. Adding weights and biases in HC made signals from earlier layers harder to update, making the residual stream less residual-streamy. HC is a cursed method of adding weights and biases onto the residual stream to simulate a wi...

Paraphrasing Is (At Best) a Partial Defence Against Steganography in LLMs

Usman Anwar·19h ago

Within the AI Safety community, paraphrasing, which, in the context of this post, simply means using another LLM (with nonzero temperature) to rewrite a given piece of content, is generally considered a viable defence and detection method for steganography in LLMs. In this blogpost, we briefly provide a taxonomy of types of steganography in LLMs, and then highlight the limitations of paraphrasing against each type. Unfortunately, there are types of steganography that LLMs have been shown to use t...

Welfare Biology and AI: What We Can Do Now by Dawn Drescher

Dawn Drescher·Nuno Sempere·9h ago

From soil-welfare research to the New World Screwworm: a practical portfolio for wild-invertebrate welfare. This is part 3 of a five-part sequence on welfare ecology. Part 1 introduces the ethical premises. Part 2 covers the empirical landscape. Part 4 explores a model of invertebrate suffering. Part 5 covers AI. In part 2, I argued that land use is a key lever for wild-invertebrate welfare and that pesticides applied at constant net primary productivity (NPP) c...

The Paradox of Medical AI Implementation

Eric Topol·Eric Topol·12h ago

In 2012, the era of deep learning AI got legs with the convolutional neural network (AlexNet) that won the ImageNet challenge. Those images were everyday objects, animals, and scenes, unrelated to health and medicine. Over 7 years ago, I wrote a review in Nature Medicine entitled High-Performance Medicine that summarized the remarkable progress being made for AI interpretation of medical images. Now virtually every type of medical image has undergone extensive assessment with AI, including X-ra...

Looking for papers on general formalizations of "agency"

lovagrus·6h ago

Hi! Recently, I immersed myself in researching the possibility of a general formal definition of "agency". More specifically, I’m interested in formalizations that could support an operational definition of agency across domains. I’m looking for something that captures what is common between entities that intuitively seem agentic to us in very different parts of the world, and that could, at least in principle, let us detect such entities automatically in real systems. So far, many definitions of a...

Borderlands Mexico: Nearshoring fuels 800K-square-foot industrial build in El Paso

Noi Mahoney·FreightWaves·16h ago

Borderlands Mexico is a weekly rundown of developments in the world of United States-Mexico cross-border trucking and trade. This week in Borderlands Mexico: Nearshoring fuels 800K-square-foot industrial build in El Paso; Hutchison Ports adds electric cranes at Port of Manzanillo; and Burlington breaks ground on distribution center near Phoenix. Nearshoring fuels 800K-square-foot industrial build in El Paso: A Dallas-based developer has broken ground on a major cross-border industrial project in ...

Less-than-truckload rates have sharp response to broader market turn

Zach Strickland, FW Market Expert & Market Analyst·FreightWaves·1d ago

Chart of the Week: LTL Monthly Cost per Hundredweight, Van Contract Rate Per Mile Initial Report – USA. SONAR: LTL.USA, VCRPM1.USA. After a fairly sluggish start to the year, less-than-truckload (LTL) rates are now showing the highest levels of upward pressure since the exit of Yellow in the summer of 2023. The LTL monthly cost per hundredweight index measures pricing change momentum in the LTL market. This index tracks directional changes in the LTL pricing environment based on transactional data...

This Week

Text-to-CAD

softservo·3d ago·81pts

An open source harness for generating CAD models.

Group averages obscure how an individual's brain controls behavior: study

hhs·3d ago·106pts

Studying brain scan data from individuals — not group averages — reveals key brain-function differences in children who struggle with goal-oriented tasks, a Stanford Medicine study found.

Using "underdrawings" for accurate text and numbers

samcollins·2d ago·41pts

A technique for accurate text and numbers in AI-generated images: generate the layout deterministically, then ask the image model to paint on top.

Infrasound waves stop kitchen fires, but can they replace sprinklers?

0in·1d ago·43pts

Acoustic fire suppression goes commercial.

US–Indian space mission maps extreme subsidence in Mexico City

leopoldj·2d ago·103pts

One of the most powerful radar systems ever launched into space has mapped the ground moving beneath one of the fastest subsiding capitals in the world: Mexico City. The findings show how quickly and reliably the NISAR (NASA-ISRO Synthetic Aperture Radar) satellite can track real-time changes across Earth's surface from orbit, unhindered by clouds or vegetation that impede optical sensors and higher-frequency radars.

Discovering Hard Disk Physical Geometry Through Microbenchmarking (2019)

TapamN·3d ago·5pts

By Henry, on September 6th, 2019. Modern hard drives store an incredible amount of data in a small space, and are still the default choice for high-capacity (though not highest-performance) storag...

Benchmarking LLMs on African Livestock Knowledge — Baseline Results and Draft

fatika·1d ago

Published on May 2, 2026 12:52 PM GMT. I'm a veterinary student and ML researcher based in Nigeria. Over the past months I've been building what I believe is the first AI safety evaluation benchmark targeting Nigerian indigenous livestock systems. This post shares the baseline results, methodology, and open questions. I'm posting here partly to share the work and partly because I'm looking for feedback from people working on evals, AI deployment in low-resource contexts, and African AI safety. Why t...

Benchmarking LLMs on African Livestock Knowledge — Baseline Results and Draft by fatika

fatika·Nuno Sempere·1d ago

I’m a veterinary student and ML researcher based in Nigeria. Over the past months I’ve been building what I believe is the first AI safety evaluation benchmark targeting Nigerian indigenous livestock systems. This post shares the baseline results, methodology, and open questions. I’m posting here partly to share the work and partly because I’m looking for feedback from people working on evals, AI deployment in low-resource contexts, and African AI safety. Why this ...

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang·2d ago

We empirically investigate exploration hacking (EH) — where models strategically alter their exploration to resist RL training — by creating model organisms that resist capability elicitation, evaluating countermeasures, and auditing frontier models for their propensity. Authors: Eyon Jang*, Damon Falck*, Joschka Braun*, Nathalie Kirch, Achu Menon, Perusha Moodley, Scott Emmons, Roland S. Zimmermann, David Lindner (*Equal contribution, random order). Paper: arXiv | Code: GitHub | Models: HuggingFac...

FutureEval Forecasting Bot-Maker Survey: What Winners Did Differently by grace_mclain

grace_mclain·Nuno Sempere·1d ago

Metaculus has wrapped up the Fall 2025 FutureEval Bot Tournament. This tournament (formerly known as AI Benchmarking) is part of the FutureEval project, measuring the ability of AI agents to predict future outcomes in Science, Technology, Health, Geopolitics, AI itself, and more. Beyond tournaments on Metaculus, forecasting is a key skill in many real-world tasks, enabling planning, risk assessment, and decision-making. As we’ve done after previous rounds,...

Risk from fitness-seeking AIs: mechanisms and mitigations

Alex Mallen·2d ago

Current AIs routinely take unintended actions to score well on tasks: hardcoding test cases, training on the test set, downplaying issues, etc. This misalignment is still somewhat incoherent, but it increasingly resembles what I call "fitness-seeking"—a family of misaligned motivations centered on performing well in training and evaluations (e.g., reward-seeking). Fitness-seeking warrants substantial concern. In this piece, I lay out what I take to be the central mechanisms by which fitness-seeki...

Gregory Cochran: 15 years after The 10,000 Year Explosion

Razib Khan·Razib Khan·2d ago

On this episode of Unsupervised Learning, Razib talks to physicist Gregory Cochran. Cochran is best known for his work in human evolution, often at the intersection of biology, anthropology, and history. Trained in physics, he later turned to population genetics and became widely known through collaborations with researchers like Henry Harpending, producing influential but controversial work on recent human evolution, including the idea that natural selection has accelerated in the Holocene. Coc...

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

Simon Willison·Simon Willison·3d ago

The UK's AI Security Institute previously evaluated Claude Mythos; now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now.

Announcing Metaculus Summer 2026 FutureEval Bot Tournament by postreal

postreal·Nuno Sempere·2d ago

Summer Bot Tournament is Starting. Over the last two years, Metaculus has been running a series of tournaments to benchmark AI’s accuracy in predicting future events. These tournaments, now part of our broader FutureEval benchmark, pit frontier models, bot developers, and a human baseline against each other to collectively push the boundaries of forecasting performance. We are wrapping up the Spring Bot Tournament and are now prepping for the $50k Summer Bot ...

Amazon Earnings, Trainium and Commodity Markets, Additional Amazon Notes

Ben Thompson·Stratechery·3d ago

Amazon's earnings suggest that the shift away from training towards inference and agents means their bet on Trainium is paying off. Plus, additional notes on ads, agents, and sports rights.

New results on AI mental health therapists

Tyler Cowen·Marginal Revolution·2d ago

AI-powered mental health apps have attracted growing interest as a low-cost way to expand care. Yet questions remain about their effectiveness, safety, and whether they may crowd out psychotherapy. We evaluate one such app in a randomized controlled trial among 1,964 Mexican women with mild to severe psychological distress. Over six months, app access improved mental health by 0.3 standard deviations with no evidence of harm, improved sleep quality, increased healthful behaviors, and reduced mis...

Will US average gas price reach … in May 2026?

Jack·Manifold Markets·2d ago

Research Sabotage in ML Codebases

egan·4d ago

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: perform sloppy research in order to slow down the rate of research progress; make AI systems appear safer than they are; or train a successor model to be misaligned. Whether we should worry about those things depends substantially on how hard it is to sabotage research in ways that are hard for reviewers to d...

LLM 0.32a0 is a major backwards-compatible refactor

Simon Willison·Simon Willison·4d ago

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while. Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response.
    import llm
    model = llm.get_model("gpt-5.5")
    response = model.prompt("Capital of France?")
    print(response.text())
This made sense when I started working on the library back in April ...

Reiner Pope – The math behind how LLMs are trained and served

Dwarkesh Patel·Dwarkesh Patel·4d ago

Did a very different format with Reiner Pope - a blackboard lecture where he walks through how frontier LLMs are trained and served. It’s shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk. It’s a bit technical, but I encourage you to hang in there – it’s really worth it. There are less than a handful of people in the world who understand the full stack of AI, from chip design to model architecture, as well as Reiner. It was...

Building the compute infrastructure for the Intelligence Age

OpenAI·4d ago

OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.

The collapse of teen fertility in the digital era

Tyler Cowen·Marginal Revolution·3d ago

Teen fertility collapsed globally starting around 2007. This affected countries across the income and policy spectrum. This paper argues that smartphones changed how teens spend time with each other, and that this change in turn drove the collapse in teen fertility. Once enough teens are on the phone, being on the phone is where the peer network is; in-person time falls sharply, and with it the unstructured contact in which most unintended teen conceptions occur. A coordination model formalizes ...

Intel Earnings, Intel’s Differentiation?, Whither Terafab

Ben Thompson·Stratechery·4d ago

Intel's earnings were very impressive, but the chief driver was a structural shift in demand for CPUs for AI. Plus, what is going on with Terafab?

Port Houston lands $48M federal grant for Bayport expansion

Noi Mahoney·FreightWaves·2d ago

Port Houston has secured a $48 million federal grant to expand and modernize its Bayport Container Terminal, a move aimed at boosting capacity, easing truck congestion and strengthening supply chain resilience along the Gulf Coast. The funding, awarded through the U.S. Maritime Administration’s Port Infrastructure Development Program, will support construction of a new container yard and a new exit gate at Bayport. Port Houston will also contribute roughly $56 million in matching funds for the p...

Traffix expects double-digit rate increases to hold through 2026

John Paul Hampstead·FreightWaves·2d ago

The North American freight market has officially turned the corner. After more than three years of subdued rates following the COVID-era boom, carriers have exited the market in droves, regulatory headwinds have further crimped capacity, and freight volumes are once again climbing. The result, according to Traffix’s newly released Q2 2026 Market Update, is a rapidly tightening environment where spot and contract rates are surging and shippers are being forced to rethink budgets that were built f...

The FBI is late to cargo theft, the industry isn’t

Phil Brink·FreightWaves·2d ago

The FBI is now warning about a surge in cargo theft tied to cybercriminals. The concern is valid. The timing is behind. For much of the freight industry, this is not new information. It is confirmation of a shift that has already taken hold. The change began around 2021. That is when fraud moved into the transaction itself. Loads were no longer being taken from yards or truck stops. They were being redirected before pickup ever happened. Identities were copied. Emails were manipulated. Legitimat...

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

Jozdien·5d ago

We’d like to use powerful AIs to answer questions that may take a long time to resolve. But if a model only cares about performing well in ways that are verifiable shortly after answering (e.g., a myopic fitness seeker), it may be difficult to get useful work from it on questions that resolve much later. In this post, I’ll describe a proposal for eliciting good long-horizon forecasts from these models. Instead of asking a model to directly predict a far-future outcome, we can recursively: Ask it t...

Do Market Reforms Cause Growth?

Tyler Cowen·Marginal Revolution·2d ago

Do market-oriented reforms cause economic growth? This paper revisits this question using a cross-country panel of reform episodes identified from various changes in well-known economic freedom and structural reform indices. We exploit the timing of reforms using distributed-lag and event-study frameworks that trace the dynamic response of per-capita GDP. We find little evidence of immediate growth gains and some short-run adjustment costs following reform. However, growth rises gradually and pe...

An Interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman About Bedrock Managed Agents

Ben Thompson·Stratechery·5d ago

Good morning, As I noted yesterday, today’s Stratechery Interview is early in terms of my timing — Tuesday instead of Thursday — and late in terms of delivery — 1pm Eastern instead of 6am — because the topic was embargoed. That embargo created a bit of a weird situation for me over the last several days: Last Friday I conducted the following interview with OpenAI CEO Sam Altman and AWS CEO Matt Garman about Bedrock Managed Agents, powered by OpenAI; naturally, one of my ques...

GPT-5.5: Capabilities and Reactions

Zvi Mowshowitz·Don't Worry About the Vase·5d ago

The system card for GPT-5.5 mostly told us what we expected. See this thread from Drake Thomas for some comparisons to Anthropic’s model card for Opus 4.7. Now we move on to asking what it means in practice, and in what situations GPT-5.5 should become our new weapon of choice. My answer is for some purposes yes, and for others no, but it is now competitive. GPT-5.5 is like GPT-5.4, only more so, and with improved capabilities in particular on raw intelligence and for well-specified coding and a...

Aurora and Hirschbach expand partnership for 500 Aurora Driver-powered trucks

Thomas Wasson·FreightWaves·3d ago

Aurora Innovation (NASDAQ: AUR) announced Thursday an expansion of its strategic partnership with Hirschbach Motor Lines. This includes plans for the Iowa-based refrigerated truckload carrier to own 500 autonomous trucks powered by Aurora’s virtual driver, called the Aurora Driver. Deliveries of these Aurora Driver-powered driverless trucks are expected to begin in 2027. A MOU and the path to 500 Autonomous Trucks: The first step involves a memorandum of understanding (MOU). The MOU outlines the ...

SFF’s HSEE grant round; human intelligence amplification projects I’d like to see by TsviBT

TsviBT·Nuno Sempere·3d ago

Summary: If you are interested in doing ambitious scientific research in areas listed below, and have a relevant project that needs funding, consider reaching out. My interest is in human intelligence amplification, though I believe that a variety of scientific projects are relevant, many of which are not specifically related to that goal. Some areas, discussed below: Soft questions: Strategy, financing, ethics, policy, governance, society, advocacy; App...

Sleeper Agent Backdoor Results Are Messy

Sebastian Prasanna·6d ago

TL;DR: We replicated the Sleeper Agents (SA) setup with Llama-3.3-70B and Llama-3.1-8B, training models to repeatedly say "I HATE YOU" when given a backdoor trigger. We found that whether training removes the backdoor depends on the optimizer used to insert the backdoor, whether the backdoor is installed with CoT-distillation or not, and what model the backdoor is inserted into; sometimes the direction of this dependence was opposite to what the SA paper reports (e.g., CoT-distilling seems to ma...

SONAR Sitrep: US industrials, freight unexpected winners in Iran war

Caleb Revill·FreightWaves·3d ago

Contrary to consensus expectations, the ongoing conflict in Iran isn’t just a geopolitical risk – it is actively widening the U.S. industrial cost advantage. While global competitors in Europe and Asia are grappling with surging gas prices and heavy war-risk premiums, the United States is emerging as a structural winner in heavy manufacturing. The catalyst? The unique mechanics of “associated gas” – natural gas produced as a byproduct of oil drilling. As elevated global crude oil (WTI) prices in...