|
π Read In Depth
|
New comment by ggillas in "How We Broke Top AI Agent Benchmarks: And What Comes Next"
A paper apparently achieved near-perfect scores on major AI agent benchmarks without solving a single task β exploits range from trivially simple (sending empty JSON to get full marks) to technically involved. This is a significant methodological indictment of how the field measures agentic AI progress, and worth reading carefully for anyone who builds or evaluates such systems.
hn/Best Comments
|
New comment by kilpikaarna in "Small models also found the vulnerabilities that Mythos found"
The thread debates Anthropic's Claude Mythos vulnerability-finding results: small, cheap open-weight models apparently replicated many of the same findings when given the same isolated code snippets. The tptacek comment (article 129) is the most incisive β the hard part of finding vulnerabilities isn't pattern-matching on isolated code, it's knowing where to look in a million-line codebase. This whole thread cuts to the heart of what AI security research actually proves.
hn/Best Comments
|
The Escalating Global A.I. Arms Race
NYT deep-dive on how China, the US, Russia and others are racing to militarize AI β compared to the dawn of nuclear weapons. Goes beyond the usual hype to cover actual military systems being deployed or developed. Relevant background for anyone thinking about AI governance, export controls, and the geopolitics shaping which companies and chips end up where.
nyt/Technology
|
How Accurate Are Googleβs A.I. Overviews?
NYT investigates how accurate Google's AI Overviews actually are, finding they draw from sources ranging from authoritative sites to Facebook posts. As AI-generated answers increasingly intercept search traffic, the quality of sources and the accuracy of synthesis becomes a systemic reliability question β relevant to anyone thinking about RAG pipelines, information quality, or how AI reshapes information access.
nyt/Technology
|
Too dangerous to release
A substantive community analysis of why Claude Mythos is being withheld from public release β the author argues it's genuinely about capability thresholds, not marketing. Worth reading as a grounded take on how AI labs are actually thinking about capability-safety tradeoffs in practice, with specific reference to what Mythos can do in security research contexts.
reddit/r/singularity
|
In Defense of AGI Skepticism
A long-form essay defending the position that AGI β defined as a system substitutable for a human in any knowledge work β may be substantially further away than maximalists claim. Written for an audience that reads r/singularity, so it engages seriously with the strongest counterarguments. Worth reading as a steel-man of the skeptical position.
reddit/r/singularity
|
|
π¬ Check It Out
|
Want to Star Opposite Daniel Radcliffe? At βEvery Brilliant Thing,β You Have a Chance.
Daniel Radcliffe is doing an interactive one-person show at the Hudson Theater in NYC where he enlists audience members to participate before each performance. 'Every Brilliant Thing' is a piece about depression and the things that make life worth living β small-scale, intimate theater with genuine craft rather than spectacle. Worth catching if you're in or visiting NYC.
nyt/Arts
|
|
β‘ FYI
|
AMD's senior director of AI thinks 'Claude has regressed' and that it 'cannot be trusted to perform complex engineering'
AMD's senior director of AI publicly complained that Claude has regressed and can't be trusted for complex engineering work β this tracks with the HN thread about Anthropic silently reducing cache TTLs and model quality shifting unpredictably. A practitioner-level signal that reliability and consistency matter as much as benchmark numbers for production use.
reddit/r/singularity
|
New comment by sunaurus in "Anthropic downgraded cache TTL on March 6th"
Engineers are losing trust in Claude and Codex not because of capability drops per se, but because changes are made silently (e.g., cache TTL quietly reduced on March 6th) with no transparency. The comment thread captures a broader sentiment shift: when you can't tell if you're getting the same model you tested against, the product becomes unreliable as infrastructure.
hn/Best Comments
|
Workers in some Indian factories have started wearing cameras on their heads to record their movements so robots can be trained using the footage.
Workers in Indian factories are wearing head-mounted cameras so humanoid robots can be trained on their movement data. This is a concrete, present-tense example of how the labor economics of robotics training data actually plays out β not in Silicon Valley labs but in low-wage manufacturing environments.
reddit/r/singularity
|
Neuralink enables nonverbal ALS patient to speak again with thoughts and AI-cloned voice
Neuralink has enabled a nonverbal ALS patient to communicate again using a brain-computer interface combined with an AI-cloned voice. A meaningful real-world result that separates Neuralink from its hype cycle β this is the kind of application that justifies the technology regardless of what else happens with the company.
reddit/r/singularity
|
The Iran War Has Prompted Some Companies to Raise Prices
Delta, Amazon, USPS and others are raising prices citing higher energy costs from the Iran war and Strait of Hormuz disruption. US CPI surged in March with the biggest monthly increase since the 2022 inflation peak. The macro context in California matters β energy costs affect data center costs, commuting, and consumer spending broadly.
nyt/Business
|
The Strait of Hormuz and Iranβs Uranium Stockpiles Were Sticking Points in U.S.-Iran Peace Talks
Iran is demanding a final peace deal before reopening the Strait of Hormuz, while the US wants it open immediately β this stalemate is why Vance left Pakistan talks without a deal. The standoff is directly driving the inflation and market volatility that's becoming a major macroeconomic backdrop for 2026.
nyt/Top Stories
|
New comment by MidnightRider39 in "We've raised $17M to build what comes after Git"
A $17M a16z-backed startup claims to be building 'what comes after Git' for AI-assisted workflows. The HN comments are the real content: skepticism is high, the product vision is vague, and the Vampyre comment cuts to the moat question β a company building lock-in on top of a commodity workflow tool raises immediate defensibility concerns.
hn/Best Comments
|
New comment by qsort in "AI assistance when contributing to the Linux kernel"
The Linux kernel project has clarified its AI policy: you can use AI tools to contribute code, but you take full responsibility and the code must satisfy license requirements. Refreshingly pragmatic β no ban, no special exemption, just normal engineering accountability applied to a new toolset.
hn/Best Comments
|