Monitoring computer-use agents in OSWorld-control

LessWrong

TL;DR: I built a control setting for computer-use agents and studied a few monitoring setups. I found that attackers can leverage the opacity of computer-use tool calls and gaps between screenshot-action pairs to access credentials or exfiltrate data to other devices on a network without being caught by a monitor. These results validate a possible attack strategy but are not comprehensive evidence that the strategy is uniquely dangerous, given the small sample sizes and limited task set. The sett...
