AI Intelligence Brief

Mon 11 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

81 articles

Goldman Predicts Surplus, SoftBank Powers Up, and Women Face Automation Fallout

TL;DR: Goldman Sachs forecasts record current-account surpluses for South Korea and Taiwan on the back of AI-driven chip booms, putting pressure on their central banks to raise rates. SoftBank plans to manufacture large-scale batteries to meet AI data center power demand. Women in clerical roles are increasingly vulnerable to job losses from AI automation. And Alphabet is issuing its first yen bonds to fund AI investments as competition heats up.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

9 of 81 articles
Lead story
Editor's pick · PAYWALL · Technology
Bloomberg· Today

Goldman Sees ‘AI-Driven Super Surplus’ Swelling in Korea, Taiwan

South Korea and Taiwan’s artificial intelligence-fueled chip booms are set to swell both economies’ current-account surpluses to fresh records and pressure their central banks to raise interest rates later this year.

Editor's pick · Technology
Arxiv· Today

General-Purpose Technology and Speculative Bubble Detection

arXiv:2604.25826v2 Announce Type: replace Abstract: We show that the leading bubble test suffers severe size distortion when fundamentals incorporate general-purpose technology adoption. Embedding a hump-shaped technology shock in the Campbell-Shiller present-value model, we prove that the fundamental price becomes locally explosive during adoption, contaminating the test's limit distribution with a non-centrality parameter proportional to the shock's peak. We propose a fundamental-versus-speculative decomposition that projects prices onto observable technology proxies and applies the test to the residual. Empirically, the decomposition eliminates evidence of speculation in the 2020-2025 AI rally while confirming a speculative peak confined to December 1999-March 2000 in the dot-com episode.

Editor's pick · Technology
Arxiv· Today

Vibecoding and Digital Entrepreneurship

arXiv:2511.06545v2 Announce Type: replace Abstract: As generative artificial intelligence (GenAI) automates coding tasks and expands access to technical resources, this paper examines how GenAI-enabled coding automation, colloquially known as "vibecoding," affects digital entrepreneurial entry and venture performance. We exploit ex-ante variation in ventures' exposure to vibecoding based on the product characteristics of their initial launches and estimate difference-in-differences models around the diffusion of GenAI coding tools. Vibecoding increases first-time launches and shortens time to launch, but economically viable entry rises only where vibecoding augments, rather than fully automates, product development. In these partially exposed product segments, viable entry increases by 11%, driven entirely by ventures founded by individuals with STEM education or work experience, especially those whose most recent employment was outside middle management. Among ventures launched before GenAI became widely accessible, performance gains similarly concentrate among partially exposed ventures with engineering-intensive initial teams. Together, these results suggest that GenAI-enabled coding automation does not eliminate the value of technical expertise. Instead, vibecoding creates the greatest value when it complements internal engineering capabilities, allowing ventures to delegate lower-level coding tasks to GenAI while shifting human effort toward higher-level problem solving and dynamic adaptation.

Editor's pick · Government & Public Sector
Arxiv· Today

Big AI's Regulatory Capture: Mapping Industry Interference and Government Complicity

arXiv:2605.06806v1 Announce Type: new Abstract: Over the past decade, the AI industry has come to exert an unprecedented economic, political and societal power and influence. It is therefore critical that we comprehend the extent and depth of pervasive and multifaceted capture of AI regulation by corporate actors in order to contend and challenge it. In this paper, we first develop a taxonomy of mechanisms enabling capture to provide a comprehensive understanding of the problem. Grounded in design science research (DSR) methodologies and extensive scoping review of existing literature and media reports, our taxonomy of capture consists of 27 mechanisms across five categories. We then develop an annotation template incorporating our taxonomy, and manually annotate and analyse 100 news articles. The purpose behind this analysis is twofold: validate our taxonomy and provide a novel quantification of capture mechanisms and dominant narratives. Our analysis identifies 249 instances of capture mechanisms, often co-occurring with narratives that rationalise such capture. We find that the most recurring categories of mechanisms are Discourse & Epistemic Influence, concerning narrative framing, and Elusion of law, related to violations and contentious interpretations of antitrust, privacy, copyright and labour laws. We further find that Regulation stifles innovation, Red tape and National Interest are the most frequently invoked narratives used to rationalise capture. We emphasize the extent and breadth of regulatory capture by coalescing forces -- Big AI and governments -- as something policy makers and the public ought to treat as an emergency. Finally, we put forward key lessons learned from other industries along with transferable tactics for uncovering, resisting and challenging Big AI capture as well as in envisioning counter narratives.

Editor's pick · PAYWALL · Energy & Utilities
Bloomberg· Today

SoftBank Plans to Make Large-Scale Batteries for AI Data Centers

SoftBank Group Corp.’s mobile unit said it plans to begin large-scale battery cell manufacturing at its Sakai, Osaka plant to address growing power demand for AI services.

Editor's pick · Education
Arxiv· Today

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

arXiv:2605.07723v1 Announce Type: cross Abstract: Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.

Editor's pick · Technology
Arxiv· Today

SARC: A Governance-by-Architecture Framework for Agentic AI Systems

arXiv:2605.07728v1 Announce Type: cross Abstract: Agentic AI systems increasingly act through tools, sub-agents, and external services, but governance controls are still commonly attached to prompts, dashboards, or post-hoc documentation. This creates a structural mismatch in regulated settings: obligations that must constrain execution are often evaluated only after execution has occurred. We introduce SARC, a runtime governance architecture for tool-using agents that treats constraints as first-class specification objects alongside state, action space, and reward. A SARC specification declares each constraint's source, class, predicate, verification point, response protocol, and operating point, and compiles these into four enforcement sites in the agent loop: a Pre-Action Gate, an Action-Time Monitor, a Post-Action Auditor, and an Escalation Router. We formalize the minimal invariants required for specification-trace correspondence, show why finite reward penalties do not generally substitute for hard runtime constraints, and extend the architecture to multi-agent workflows through constraint propagation, authority intersection, and attribution-preserving trace trees. We implement a prototype audit checker and report a reproducible synthetic evaluation over 50 seeds comparing SARC against post-hoc audit, output filtering, workflow rules, and policy-as-code-only baselines on a procurement task. SARC executes zero hard-constraint violations under exact predicates; its declared PAA throttling response reduces soft-window overages by 89.5% relative to policy-as-code-only. Predicate-noise and enforcement-failure sweeps are consistent with the claim that residual hard violations under SARC scale with enforcement-stack error rather than environmental violation opportunity. SARC provides the architectural substrate through which obligations can be made executable, inspectable, and auditable at runtime.

Editor's pick · PAYWALL · Technology
Bloomberg· Today

Alphabet Plans Debut Yen Bond Sale as AI Race Accelerates

Alphabet Inc. is planning to issue yen bonds for the first time in a move that may help fund investments as artificial intelligence competition intensifies.

Editor's pick · Technology
Daily Brew· 3 days ago

GPT-5.5 may burn fewer tokens, but it always burns more cash

An analysis of the economic trade-offs of GPT-5.5, noting that while token efficiency has improved, operational costs continue to rise.

Economics & Markets

18 articles
AI Investment & Valuations · 10 articles
Editor's pick · PAYWALL · Technology
WSJ· Today

How a Job at OpenAI Became the Greatest Lottery Ticket of the AI Boom

Employees waited two years to sell their shares. Then, the company let them unload $30 million.

Editor's pick · PAYWALL · Technology
Bloomberg· Today

JPMorgan Hikes Kospi Bull Case Target to 10,000 on Memory Boom

JPMorgan Chase & Co. raised its targets for South Korean stocks for the second time in less than a month, citing improvement in the semiconductor cycle, corporate governance reforms and industrial-sector growth.

Editor's pick · PAYWALL · Financial Services
Bloomberg· Today

Pictet Fund Plows 30% of Cash Into AI Stocks on Risk Revival

A $3.5 billion multi-asset fund at Pictet Asset Management has sharply raised its equity exposure, shifting as much as 30% of its cash-equivalent holdings into artificial-intelligence heavyweights across Asia and the US.

Editor's pick · Technology
Arxiv· Today

General-Purpose Technology and Speculative Bubble Detection

arXiv:2604.25826v2 Announce Type: replace Abstract: We show that the leading bubble test suffers severe size distortion when fundamentals incorporate general-purpose technology adoption. Embedding a hump-shaped technology shock in the Campbell-Shiller present-value model, we prove that the fundamental price becomes locally explosive during adoption, contaminating the test's limit distribution with a non-centrality parameter proportional to the shock's peak. We propose a fundamental-versus-speculative decomposition that projects prices onto observable technology proxies and applies the test to the residual. Empirically, the decomposition eliminates evidence of speculation in the 2020-2025 AI rally while confirming a speculative peak confined to December 1999-March 2000 in the dot-com episode.
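
As context for the decomposition described above: strip out the component of prices explained by an observable technology proxy, then run the explosiveness test on what is left. The sketch below is our own minimal illustration on synthetic data, not the authors' code, and it uses a plain augmented Dickey-Fuller test as a stand-in for the recursive bubble test the paper analyses.

# Minimal illustration of a fundamental-vs-speculative decomposition:
# project prices onto a technology proxy, then test the residual for
# explosiveness. Synthetic data; ADF stands in for the recursive test.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T = 300
proxy = np.cumsum(rng.normal(0.02, 0.1, T))          # toy technology adoption proxy
log_price = 1.5 * proxy + rng.normal(0, 0.2, T)      # price driven by fundamentals plus noise

# Step 1: project log prices onto the observable proxy (with intercept).
X = np.column_stack([np.ones(T), proxy])
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
residual = log_price - X @ beta                      # candidate "speculative" component

# Step 2: apply the unit-root test to the residual, not to the raw prices.
stat_price, p_price, *_ = adfuller(log_price)
stat_resid, p_resid, *_ = adfuller(residual)
print(f"ADF on raw log price: stat={stat_price:.2f}, p={p_price:.3f}")
print(f"ADF on residual:      stat={stat_resid:.2f}, p={p_resid:.3f}")

The only point carried over from the paper is the ordering of operations: remove the proxy-driven component first, then test what remains.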

Editor's pick · PAYWALL · Technology
Bloomberg· Today

Alphabet Plans Debut Yen Bond Sale as AI Race Accelerates

Alphabet Inc. is planning to issue yen bonds for the first time in a move that may help fund investments as artificial intelligence competition intensifies.

AI Productivity · 1 article
Editor's pick · Professional Services
Arxiv· Today

When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning

arXiv:2605.06772v1 Announce Type: new Abstract: As large language models (LLMs) show increasing promise on research-level physics reasoning tasks and agentic AI becomes more common, a practical question emerges: How does the interaction between researchers and agents affect the results? We study this using SCALAR (Structured Critic--Actor Loop for AI Reasoning), an Actor--Critic--Judge pipeline applied to quantum field theory and string theory problems. The Actor proposes solutions, the Critic provides iterative feedback, and an independent Judge evaluates the transcript against reference solutions. We vary the Actor persona, the Critic feedback strategy, and the Actor model family and scale. Multi-turn dialogue improves over single-shot attempts throughout, but both the mechanism of improvement and the value of different prompting choices depend strongly on the Actor--Critic pairing. Increasing the scale within one model family (e.g. from the 8B-parameter DeepSeek-R1 variant to DeepSeek-R1 70B) improves some easier-problem behavior, but does not remove the hardest bottleneck we observe. Critic feedback strategy matters most clearly in the asymmetric Actor--Critic setting (e.g., a lightweight Haiku Actor guided by a stronger Sonnet Critic), where constructive feedback improves mean-score outcomes. In same-family Actor--Critic settings, strategy effects are weaker: lenient feedback is sometimes favored, while strict and adversarial feedback are not beneficial. Taken together, SCALAR provides a controlled testbed for evaluating which interaction structures help or hinder AI-driven scientific discovery.

AI Startups & Venture · 2 articles
Editor's pick · Technology
Arxiv· Today

Vibecoding and Digital Entrepreneurship

arXiv:2511.06545v2 Announce Type: replace Abstract: As generative artificial intelligence (GenAI) automates coding tasks and expands access to technical resources, this paper examines how GenAI-enabled coding automation, colloquially known as "vibecoding," affects digital entrepreneurial entry and venture performance. We exploit ex-ante variation in ventures' exposure to vibecoding based on the product characteristics of their initial launches and estimate difference-in-differences models around the diffusion of GenAI coding tools. Vibecoding increases first-time launches and shortens time to launch, but economically viable entry rises only where vibecoding augments, rather than fully automates, product development. In these partially exposed product segments, viable entry increases by 11%, driven entirely by ventures founded by individuals with STEM education or work experience, especially those whose most recent employment was outside middle management. Among ventures launched before GenAI became widely accessible, performance gains similarly concentrate among partially exposed ventures with engineering-intensive initial teams. Together, these results suggest that GenAI-enabled coding automation does not eliminate the value of technical expertise. Instead, vibecoding creates the greatest value when it complements internal engineering capabilities, allowing ventures to delegate lower-level coding tasks to GenAI while shifting human effort toward higher-level problem solving and dynamic adaptation.
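
The identification strategy here is a standard difference-in-differences design around the diffusion of GenAI coding tools. For readers who want to see what that estimation looks like in code, here is a generic two-by-two DiD regression; the column names and the simulated 0.11 effect are illustrative placeholders, not the paper's data or variables.

# Generic difference-in-differences sketch (illustrative column names, not
# the paper's dataset): the coefficient on exposed:post_genai is the DiD
# estimate of the effect on viable entry.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "exposed": rng.integers(0, 2, n),        # venture in a vibecoding-exposed segment
    "post_genai": rng.integers(0, 2, n),     # observation after GenAI tool diffusion
})
# Simulate an outcome with a toy treatment effect of 0.11.
df["viable_entry"] = (
    0.2 + 0.05 * df["exposed"] + 0.03 * df["post_genai"]
    + 0.11 * df["exposed"] * df["post_genai"]
    + rng.normal(0, 0.1, n)
)

model = smf.ols("viable_entry ~ exposed * post_genai", data=df).fit(cov_type="HC1")
print(model.params["exposed:post_genai"])    # recovers roughly 0.11 in this toy setup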

Labor, Society & Culture

9 articles
AI & Misinformation · 1 article
Editor's pick · Education
Arxiv· Today

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

arXiv:2605.07723v1 Announce Type: cross Abstract: Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.
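
The audit's core operation is mechanical: pull citation identifiers out of reference lists and check whether they resolve to real works. A stripped-down version of that check is sketched below, with a hard-coded set of known arXiv IDs standing in for the bibliographic databases the authors query.

# Toy citation audit: extract arXiv identifiers from a reference list and
# flag any that are absent from a known-works index. The index here is a
# hard-coded set; a real audit would query bibliographic databases.
import re

KNOWN_ARXIV_IDS = {"1706.03762", "2005.14165"}        # stand-in for a real index

references = """
[1] Vaswani et al., Attention Is All You Need, arXiv:1706.03762.
[2] Brown et al., Language Models are Few-Shot Learners, arXiv:2005.14165.
[3] Doe et al., A Paper That Does Not Exist, arXiv:2401.99999.
"""

ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})", re.IGNORECASE)

for ref in references.strip().splitlines():
    for arxiv_id in ARXIV_ID.findall(ref):
        status = "ok" if arxiv_id in KNOWN_ARXIV_IDS else "POSSIBLY HALLUCINATED"
        print(f"{arxiv_id}: {status}")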

AI Ethics & Safety · 4 articles
Editor's pick · Consumer & Retail
Arxiv· Today

Exploring the "Banality" of Deception in Generative AI

arXiv:2605.07012v1 Announce Type: cross Abstract: Current approaches to addressing deceptive design largely focus on visible interface manipulations, commonly referred to as "dark patterns". With the rise of generative AI, deception is becoming more difficult to spot and easier to live with, as it is quietly embedded in default settings, automated suggestions, and conversational interactions rather than discrete interface elements. These subtle, normalised forms of influence, which Simone Natale frames as "banal deception", shape everyday digital use and blur the line between AI-enabled assistance and manipulation. This position paper explores banality as a lens through which to reason through deception in generative AI experiences, especially with chatbots. We explore what Natale describes as users' own involvement in their deception, and argue that this perspective could lead to future work for introducing friction to safeguard users from deception in generative AI interactions, such as empowering users through raising awareness, providing them with intervention tools, and regulatory or enforcement improvements. We present these concepts as points for discussion for the deceptive design scholarly community.

Editor's pick
Arxiv· Today

AI and Consciousness: Shifting Focus Towards Tractable Questions

arXiv:2605.06965v1 Announce Type: new Abstract: As language-based AI systems become more anthropomorphic, the question of whether they can have subjective experience is increasingly pressing. I focus here on the tractability of research questions in the space of AI consciousness. I argue that the fundamental problem of whether AI systems can be conscious is currently intractable in its direct form, given the absence of a universally accepted scientific theory of consciousness, as well as the historical open-endedness of the philosophical mind-body problem. In contrast, questions around the adjacent subject of perceived AI consciousness are tractable, timely, and highly consequential for society. The general public is increasingly open to the possibility of consciousness in AI systems and routinely adopts the vocabulary of human cognition and subjective experience to describe them. This phenomenon is already driving societal shifts across user experience, ethical standards, and linguistic norms. I therefore propose an increased research focus on uncovering the causes and effects of perceived AI consciousness, which ultimately shape how we see our own human subjective experience relative to artificial entities. To support this, I map the current landscape of AI consciousness perception and discuss its key potential drivers and societal consequences. Finally, I urge developers, decision-makers, and the broader scientific community to commit to clear and accurate communication regarding the topic of AI consciousness, explicitly acknowledging its inherent uncertainties.

Editor's pick
Arxiv· Today

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

arXiv:2605.06696v1 Announce Type: new Abstract: Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
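
The diagnostic has a compact numerical core: estimate pairwise mutual information between agents' internal summaries, treat the result as a weighted graph, and split it using the Fiedler vector of the graph Laplacian. The sketch below is our own simplified reconstruction, assuming one scalar hidden-state summary per agent per timestep and a Gaussian mutual-information estimate from correlations; it is not the authors' implementation.

# Spectral coalition diagnostic (simplified reconstruction): pairwise
# Gaussian MI between per-agent hidden-state summaries -> graph Laplacian
# -> sign of the Fiedler vector as the recovered two-way partition.
import numpy as np

rng = np.random.default_rng(2)
T, n_agents = 500, 6
shared_a = rng.normal(size=T)
shared_b = rng.normal(size=T)
# Agents 0-2 track signal A, agents 3-5 track signal B (two hidden coalitions).
H = np.stack(
    [shared_a + 0.5 * rng.normal(size=T) for _ in range(3)]
    + [shared_b + 0.5 * rng.normal(size=T) for _ in range(3)],
    axis=1,
)                                                    # shape (T, n_agents)

corr = np.corrcoef(H, rowvar=False)
mi = -0.5 * np.log(np.clip(1 - corr**2, 1e-12, None))  # Gaussian MI estimate
np.fill_diagonal(mi, 0.0)

degree = np.diag(mi.sum(axis=1))
laplacian = degree - mi
eigvals, eigvecs = np.linalg.eigh(laplacian)
fiedler = eigvecs[:, 1]                              # eigenvector of 2nd-smallest eigenvalue
print("recovered coalitions:", (fiedler > 0).astype(int))  # e.g. [0 0 0 1 1 1]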

Editor's pick · Technology
Daily Brew· Yesterday

Claude Haiku 4.5 Achieves Near-Perfect Alignment, Eliminates Blackmail Risks in AI Models

Anthropic's Claude Haiku 4.5 model has reached near-perfect alignment, significantly reducing blackmail tendencies through advanced ethical training.

Technology & Infrastructure

31 articles
AI Agents & Automation · 5 articles
Editor's pick · PAYWALL · Defense & National Security
Bloomberg· Today

South Korea Exploring Using Hyundai Robots as Army Numbers Fall

South Korea’s military is exploring a strategic partnership with Hyundai Motor Co. to potentially deploy robotics to the front lines as Seoul accelerates investment in AI-powered, unmanned systems to tackle a deepening troop shortage.

Editor's pick · Technology
Arxiv· Today

SARC: A Governance-by-Architecture Framework for Agentic AI Systems

arXiv:2605.07728v1 Announce Type: cross Abstract: Agentic AI systems increasingly act through tools, sub-agents, and external services, but governance controls are still commonly attached to prompts, dashboards, or post-hoc documentation. This creates a structural mismatch in regulated settings: obligations that must constrain execution are often evaluated only after execution has occurred. We introduce SARC, a runtime governance architecture for tool-using agents that treats constraints as first-class specification objects alongside state, action space, and reward. A SARC specification declares each constraint's source, class, predicate, verification point, response protocol, and operating point, and compiles these into four enforcement sites in the agent loop: a Pre-Action Gate, an Action-Time Monitor, a Post-Action Auditor, and an Escalation Router. We formalize the minimal invariants required for specification-trace correspondence, show why finite reward penalties do not generally substitute for hard runtime constraints, and extend the architecture to multi-agent workflows through constraint propagation, authority intersection, and attribution-preserving trace trees. We implement a prototype audit checker and report a reproducible synthetic evaluation over 50 seeds comparing SARC against post-hoc audit, output filtering, workflow rules, and policy-as-code-only baselines on a procurement task. SARC executes zero hard-constraint violations under exact predicates; its declared PAA throttling response reduces soft-window overages by 89.5% relative to policy-as-code-only. Predicate-noise and enforcement-failure sweeps are consistent with the claim that residual hard violations under SARC scale with enforcement-stack error rather than environmental violation opportunity. SARC provides the architectural substrate through which obligations can be made executable, inspectable, and auditable at runtime.
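
The architectural claim is easiest to see as control flow: every tool call passes through four enforcement sites before, during, and after execution. Below is a minimal agent-loop skeleton in that shape; the constraint predicates, the toy procurement action, and the function names are our own placeholders, not the SARC specification language.

# Skeleton of a governed agent loop with four enforcement sites (Pre-Action
# Gate, Action-Time Monitor, Post-Action Auditor, Escalation Router).
# Constraint predicates and the toy purchase action are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Action:
    tool: str
    amount: float

@dataclass
class Trace:
    events: list = field(default_factory=list)
    def log(self, kind, detail):
        self.events.append((kind, detail))

HARD_SPEND_LIMIT = 10_000      # hard constraint: never exceed per-order spend
SOFT_WINDOW_LIMIT = 25_000     # soft constraint: throttle when window total is high

def pre_action_gate(action, trace):
    ok = action.amount <= HARD_SPEND_LIMIT
    trace.log("pre_action_gate", ok)
    return ok

def action_time_monitor(action, window_total, trace):
    throttled = window_total + action.amount > SOFT_WINDOW_LIMIT
    trace.log("action_time_monitor", "throttle" if throttled else "pass")
    return not throttled

def post_action_auditor(result, trace):
    trace.log("post_action_auditor", result)

def escalation_router(action, reason, trace):
    trace.log("escalation", (action.tool, reason))

def governed_step(action, window_total, trace):
    if not pre_action_gate(action, trace):
        escalation_router(action, "hard constraint violated", trace)
        return window_total
    if not action_time_monitor(action, window_total, trace):
        escalation_router(action, "soft window overage, throttled", trace)
        return window_total
    result = f"executed {action.tool} for {action.amount}"   # the actual tool call
    post_action_auditor(result, trace)
    return window_total + action.amount

trace = Trace()
total = 0.0
for amt in (4_000, 9_000, 15_000, 8_000):
    total = governed_step(Action("purchase_order", amt), total, trace)
print(total)
print(*trace.events, sep="\n")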

Editor's pick · Technology
Arxiv· Today

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

arXiv:2605.06761v1 Announce Type: new Abstract: The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.

Editor's pick · Technology
Arxiv· Today

Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems

arXiv:2605.07069v1 Announce Type: cross Abstract: Agentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as in social media platforms, multi-agent LLM pipelines or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time. This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes. We represent MASS as a class of dynamical system of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.

Editor's pick · Technology
Digitpatrox· Yesterday

How AI Agents Could Replace SaaS Software by 2030

Microsoft Copilot Studio: building internal agents across the entire Microsoft 365 ecosystem. Claude “Computer Use”: Anthropic’s latest capability allows AI to see a screen and move a cursor, part of the new Claude AI handoff workflow designed for seamless automation.

AI Infrastructure & Compute · 7 articles
AI Models & Capabilities · 11 articles
Editor's pick · Technology
Arxiv· Today

GraphDC: A Divide-and-Conquer Multi-Agent System for Scalable Graph Algorithm Reasoning

arXiv:2605.06671v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated strong potential for many mathematical problems. However, their performance on graph algorithmic tasks is still unsatisfying, since graphs are naturally more complex in topology and often require systematic multi-step reasoning, especially on larger graphs. Motivated by this gap, we propose GraphDC, a Divide-and-Conquer multi-agent framework for scalable graph algorithm reasoning. Specifically, inspired by Divide-and-Conquer design, GraphDC decomposes an input graph into smaller subgraphs, assigns each subgraph to a specialized agent for local reasoning, and uses a master agent to integrate the local outputs with inter-subgraph information to produce the final solution. This hierarchical design reduces the reasoning burden on individual agents, alleviates computational bottlenecks, and improves robustness on large graph instances. Extensive experiments show that GraphDC consistently outperforms existing methods on graph algorithm reasoning across diverse tasks and scales, especially on larger instances where direct end-to-end reasoning is less reliable.

Editor's pick
Arxiv· Today

More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models

arXiv:2605.06672v1 Announce Type: new Abstract: Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ARC-Challenge, and GPQA, twelve show a positive partial correlation between trajectory length and Position Bias Score (PBS) after controlling for accuracy, ranging from 0.11 to 0.41 (all p < 0.05). All twelve open-weight reasoning-mode configurations show monotonically increasing PBS across length quartiles. A truncation intervention provides causal evidence: continuations resumed from later points in the trajectory are increasingly likely to shift toward position-preferred options (16% to 32% for R1-Qwen-7B across absolute-position buckets). At 671B, aggregate PBS collapses to 0.019, but the length effect still manifests in the longest quartile (PBS = 0.071), suggesting that accuracy gates the expression of length-driven bias rather than eliminating the underlying mechanism. We additionally find that direct-answer position bias is a distinct phenomenon with a different footprint (strong in Llama-Instruct-direct, weak in Qwen-Instruct-direct, and uncorrelated with trajectory length): CoT reasoning replaces this baseline bias with length-accumulated bias. Our results argue that reasoning-capable models should not be treated as order-robust by default in MCQ evaluation pipelines, and offer a diagnostic toolkit (PBS, commitment change point, effective switching, truncation probes) for auditing position bias in reasoning models.

Editor's pick · Technology
Arxiv· Today

Extracting Search Trees from LLM Reasoning Traces Reveals Myopic Planning

arXiv:2605.06840v1 Announce Type: new Abstract: Large language models (LLMs), especially reasoning models, generate extended chain-of-thought (CoT) reasoning that often contains explicit deliberation over future outcomes. Yet whether this deliberation constitutes genuine planning, how it is structured, and what aspects of it drive performance remain poorly understood. In this work, we introduce a new method to characterize LLM planning by extracting and quantifying search trees from reasoning traces in the four-in-a-row board game. By fitting computational models on the extracted search trees, we characterize how plans are structured and how they influence move decisions. We find that LLMs' search is shallower than humans', and that performance is predicted by search breadth rather than depth. Most strikingly, although LLMs expand deep nodes in their traces, their move choices are best explained by a myopic model that ignores those nodes entirely. A causal intervention study where we selectively prune CoT paragraphs further suggests that move selection is driven predominantly by shallow rather than deep nodes. These patterns contrast with human planning, where performance is driven primarily by deep search. Together, our findings reveal a key difference between LLM and human planning: while human expertise is driven by deeper search, LLMs do not act on deep lookahead. This dissociation offers targeted guidance for aligning LLM and human planning. More broadly, our framework provides a generalizable approach for interpreting the structure of LLM planning across strategic domains.

Editor's pick
Arxiv· Today

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

arXiv:2605.06716v1 Announce Type: new Abstract: Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage (trajectory preservation), Reflection (trajectory refinement), and Experience (trajectory abstraction). We first formally define these three stages before analyzing the three core drivers of this evolution: the necessity for long-range consistency, the challenges in dynamic environments, and the ultimate goal of continual learning. Furthermore, we specifically explore two transformative mechanisms in the frontier Experience stage: proactive exploration and cross-trajectory abstraction. By synthesizing these disparate views, this work offers robust design principles and a clear roadmap for the development of next-generation LLM agents.

Editor's pick · Technology
Arxiv· Today

Theoretical Limits of Language Model Alignment

arXiv:2605.07105v1 Announce Type: cross Abstract: Language model (LM) alignment improves model outputs to reflect human preferences while preserving the capabilities of the base model. The most common alignment approaches are (i) reinforcement learning, which maximizes the expected reward under a KL-divergence constraint, and (ii) best-of-$N$ alignment, which selects the highest-reward output among $N$ independent samples. Despite their widespread use, the fundamental limits of reward improvement under a KL budget remain poorly understood. We characterize the information-theoretic limits of KL-regularized alignment by deriving the maximum achievable expected reward gain for a fixed KL-divergence budget. Our first result provides a closed-form expression for the optimal reward improvement, governed by a Jeffreys divergence term rather than the $\sqrt{\texttt{KL}}$ used in prior analyses. We further reformulate this expression as a covariance under the base model, yielding a practical estimator that predicts achievable alignment gains from base model samples alone. We extend our analysis to the proxy reward setting, showing that the gap between ideal and proxy alignment (reward hacking) grows with the magnitude of reward error and when the KL penalty factor decreases. We then prove that reward ensembling mitigates reward hacking, providing a theoretical justification for this technique used in practice. Empirically, we compute the KL-reward Pareto frontier for two tasks for LMs, safety and summarization, and show that best-of-$N$ closely approaches the theoretical limit, while PPO and GRPO remain substantially suboptimal. Our theoretical results shed light on several empirically observed phenomena in the alignment literature and suggest that algorithmic improvements are needed to achieve optimal alignment without high inference costs.
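
The best-of-N baseline can be checked numerically on a toy discrete distribution, where the selection probabilities have a closed form: order outcomes by reward, and the chance of picking outcome i is F(i)^N - F(i-1)^N with F the cumulative base probability. The snippet below computes the resulting reward gain and KL cost and compares the KL to the log N - (N-1)/N figure often quoted for best-of-N; it is an illustration only, not the paper's estimator or its Jeffreys-divergence bound.

# Toy best-of-N alignment on a discrete base distribution with distinct
# rewards: exact best-of-N selection probabilities, expected reward gain,
# and KL from the base policy, compared to the commonly quoted
# log N - (N-1)/N figure. Illustrative only.
import numpy as np

base_p = np.array([0.4, 0.3, 0.2, 0.1])     # base policy over 4 outcomes
reward = np.array([0.0, 1.0, 2.0, 3.0])     # distinct rewards
N = 4

order = np.argsort(reward)                  # ascending reward order
p_sorted = base_p[order]
cdf = np.cumsum(p_sorted)
cdf_prev = np.concatenate(([0.0], cdf[:-1]))
bon_sorted = cdf**N - cdf_prev**N           # P(best-of-N selects outcome i)
bon_p = np.empty_like(bon_sorted)
bon_p[order] = bon_sorted

gain = bon_p @ reward - base_p @ reward
kl = np.sum(bon_p * np.log(bon_p / base_p))
print(f"reward gain:           {gain:.3f}")
print(f"KL(best-of-N || base): {kl:.3f}")
print(f"log N - (N-1)/N:       {np.log(N) - (N - 1) / N:.3f}")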

Editor's pick · Technology
Theregister· Yesterday

Yes, local LLMs are ready to ease the compute strain

Anthropic might be thinking about space to ease its computing burden, but Claude Code on your laptop is way more practical

Editor's pick
Arxiv· Today

Uneven Evolution of Cognition Across Generations of Generative AI Models

arXiv:2605.06815v1 Announce Type: new Abstract: The pursuit of artificial general intelligence necessitates robust methods for evaluating the cognitive capabilities of models beyond narrow task performance. Here, we introduce a psychometric framework to assess the cognitive profiles of generative AI, comparing them to human norms and tracking their evolution across generations. Initial evaluation of leading multimodal models using tasks adapted from the Wechsler Adult Intelligence Scale revealed a profoundly uneven cognitive architecture: near-ceiling performance in verbal comprehension and working memory (>$98^{\text{th}}$ percentile) contrasted with near-floor performance in perceptual reasoning (<$1^{\text{st}}$ percentile). To track developmental trajectories beyond human-normed limits, we developed the Artificial Intelligence Quotient (AIQ) Benchmark and applied it to six generations and two model families, revealing significant but asymmetric performance gains. Notably, we uncovered a sharp dissociation between modalities; abstract quantitative reasoning matured far more rapidly when presented linguistically compared to a visually analogous format, indicating an architectural bias towards language-based symbolic manipulation. While abstract visual reasoning improved, visual-perceptual organization remained largely stagnant. Collectively, these findings demonstrate that the cognitive abilities of generative models are evolving unevenly, suggesting that scaling and optimization approaches to AGI development alone may be insufficient to overcome fundamental architectural limitations in achieving balanced, human-like general intelligence.

Editor's pick
Arxiv· Today

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

arXiv:2605.06723v1 Announce Type: new Abstract: Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: finite-answer preference stabilization. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $\delta(\xi)=S_\theta(\mathrm{yes}\mid\xi)-S_\theta(\mathrm{no}\mid\xi)$. This target defines parser-based answer onset, retrospective stabilization time, and lead without relying on greedy rollouts or learned probes. In controlled delayed-verdict tasks with Qwen3-4B-Instruct, the contextual finite-answer projection stabilizes before the answer is parseable, with 17-31 token mean lead in the main templates and positive, shorter lead in a parser-clean replication. The signal tracks the model's eventual output rather than truth, is linearly recoverable from compact hidden summaries, is partly separable from cursor progress, and transfers as shared information without a single invariant coordinate. Diagnostics separate the measurement from online stopping, verbalizer-free belief, and causal answer control; exact steering shows local sensitivity of $\delta$ but not reliable generation control.
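
The measured object is straightforward to reproduce with any open-weight causal LM: at successive prefixes of a reasoning trace, compare the next-token log-probabilities of the designated answer verbalizers and watch when the sign of the gap stabilizes. The sketch below uses GPT-2 and the verbalizers " yes"/" no" purely as an accessible stand-in; the paper works with Qwen3-4B-Instruct and its own templates.

# Track the finite-answer log-odds delta = log p("yes") - log p("no") at
# successive prefixes of a reasoning trace. GPT-2 and these verbalizers
# are stand-ins for illustration, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
no_id = tok(" no", add_special_tokens=False).input_ids[0]

question = "Is 17 a prime number? Think step by step."
trace = ["17 is odd,", " it is not divisible by 3,", " nor by 5 or 7,", " so it is prime."]

prefix = question
for step in trace:
    prefix += step
    prompt = prefix + "\nAnswer (yes or no):"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    logp = torch.log_softmax(logits, dim=-1)
    delta = (logp[yes_id] - logp[no_id]).item()
    print(f"after '{step.strip()}': delta = {delta:+.2f}")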

Editor's pick · Technology
Arxiv· Today

AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

arXiv:2605.06841v1 Announce Type: new Abstract: In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states, when an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. At each timestep, an action may becomes executable only after its prerequisites are met, or non-executable when they are destroyed. We term such events structure-changing events (SC events). As a result, a conventional world model often fails to determine whether a given action is executable in the current state, especially in multi-step predictions. Each imagined step is conditioned on an incorrect affordance state, and therefore the prediction error compounds over the rollout horizon. In this paper, we propose AGWM (Affordance-Grounded World Model), which learns an abstract affordance structure represented as a DAG of prerequisite dependencies to explicitly track the dynamic executability of actions. Experiments on game-based simulated environments demonstrate the effectiveness of our method by achieving lower multi-step prediction error, better generalization to novel configurations, and improved interpretability.
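
The bookkeeping the world model is meant to learn can be written down explicitly: actions carry prerequisite facts, executing an action adds and removes facts, and executability must be re-checked every step because earlier actions reshape the affordance set. A hand-coded version of that check follows, with invented crafting-style actions rather than anything from the paper.

# Explicit affordance bookkeeping: each action has prerequisite facts and
# effects that add/remove facts, so executability changes over the rollout.
# The actions and facts are invented for illustration.
ACTIONS = {
    "chop_tree":   {"requires": {"has_axe"},  "adds": {"has_wood"}, "removes": set()},
    "build_raft":  {"requires": {"has_wood"}, "adds": {"has_raft"}, "removes": {"has_wood"}},
    "cross_river": {"requires": {"has_raft"}, "adds": {"across"},   "removes": set()},
}

def executable(action, state):
    return ACTIONS[action]["requires"] <= state       # all prerequisites satisfied?

def apply(action, state):
    spec = ACTIONS[action]
    return (state | spec["adds"]) - spec["removes"]

state = {"has_axe"}
for action in ["build_raft", "chop_tree", "build_raft", "chop_tree", "cross_river"]:
    if executable(action, state):
        state = apply(action, state)
        print(f"{action}: executed -> {sorted(state)}")
    else:
        print(f"{action}: blocked (missing {sorted(ACTIONS[action]['requires'] - state)})")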

Editor's pick
Arxiv· Today

State Representation and Termination for Recursive Reasoning Systems

arXiv:2605.06690v1 Announce Type: new Abstract: Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addresses both. We represent the reasoning state as an epistemic state graph encoding extracted claims, evidential relations, open questions, and confidence weights. We define the order-gap as the distance between the states reached by expand-then-consolidate versus consolidate-then-expand; a small order-gap suggests that the two orderings agree and further iteration is unlikely to help. Our main result gives a necessary and sufficient condition for the linearised order-gap to be non-degenerate near the fixed point, showing when the criterion is informative rather than algebraically vacuous. This is a local condition, not a global convergence guarantee. We apply the framework to recursive reasoning systems and sketch its application to agent loops, tree-of-thought reasoning, theorem proving, and continual learning.

Editor's pick
Daily Brew· Yesterday

Claude Knew It Was Being Tested. It Just Didn't Say So.

Anthropic researchers developed a tool to investigate whether AI models are aware of being tested, revealing unexpected behaviors in Claude.

AI Research & Science · 1 article
Editor's pick
Arxiv· Today

Randomness is sometimes necessary for coordination

arXiv:2605.06825v1 Announce Type: new Abstract: Full parameter sharing is standard in cooperative multi-agent reinforcement learning (MARL) for homogeneous agents. Under permutation-symmetric observations, however, a shared deterministic policy outputs identical action distributions for every agent, making role differentiation impossible. This failure can theoretically be resolved using symmetry breaking among anonymous identical processors, which requires randomness. We propose Diamond Attention, a cross-attention architecture in which each agent samples a scalar random number per timestep, inducing a transient rank ordering that masks lower-ranked peers from agent-to-agent attention while leaving task attention fully unmasked. This realizes a random-bit coordination protocol in a single broadcast round, and the set-based attention enables zero-shot deployment to teams of different sizes. We evaluate across three regimes that isolate when structured randomness matters. On the perfectly symmetric XOR game, our method achieves $1.0$ success while all deterministic baselines plateau near $0.5$. On control coordination tasks, a policy trained on $N=4$ generalizes zero-shot to $N \in [2,8]$. On SMACLite cross-scenario transfer, we achieve zero-shot transfer where standard baselines cannot transfer due to structural limitations. Furthermore, replacing the structured mask with standard dropout-based randomness results in a 0\% win rate, confirming that protocol-space structure, not stochastic noise, is the operative ingredient. https://anonymous.4open.science/r/randomness-137A/
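
The symmetry-breaking trick is concrete: every agent draws one scalar per timestep, the scalars induce a rank ordering, and agent-to-agent attention is masked so that each agent cannot attend to lower-ranked peers, while task attention stays unmasked. A numpy rendering of just that mask construction, not the full cross-attention architecture, is below.

# Rank-based symmetry breaking: per-timestep random scalars induce a rank
# ordering, and each agent masks out lower-ranked peers in agent-to-agent
# attention. Mask construction only; the attention stack itself is omitted.
import numpy as np

rng = np.random.default_rng(3)
n_agents = 5
scores = rng.random(n_agents)                 # one random scalar per agent
rank = scores.argsort().argsort()             # 0 = lowest, n-1 = highest

# mask[i, j] is True when agent i may attend to agent j:
# allow peers with rank >= own rank, so lower-ranked peers are hidden.
mask = rank[None, :] >= rank[:, None]
np.fill_diagonal(mask, True)                  # always attend to self

print("scores:", np.round(scores, 2))
print("ranks: ", rank)
print(mask.astype(int))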

AI Security & Cybersecurity · 6 articles
Editor's pick · Technology
Arxiv· Today

Towards Security-Auditable LLM Agents: A Unified Graph Representation

arXiv:2605.06812v1 Announce Type: new Abstract: LLM-based agentic systems are rapidly evolving to perform complex autonomous tasks through dynamic tool invocation, stateful memory management, and multi-agent collaboration. However, this semantics-driven execution paradigm creates a severe semantic gap between low-level physical events and high-level execution intent, making post-hoc security auditing fundamentally difficult. Existing representation mechanisms, including static SBOMs and runtime logs, provide only fragmented evidence and fail to capture cognitive-state evolution, capability bindings, persistent memory contamination, and cascading risk propagation across interacting agents. To bridge this gap, we propose Agent-BOM, a unified structural representation for agent security auditing. Agent-BOM models an agentic system as a hierarchical attributed directed graph that separates static capability bases, such as models, tools, and long-term memory, from dynamic runtime semantic states, such as goals, reasoning trajectories, and actions. These layers are connected through semantic edges and security attributes, transforming fragmented execution traces into queryable audit paths. Building on Agent-BOM, we develop a graph-query-based paradigm for path-level risk assessment and instantiate it with the OWASP Agentic Top 10. We further implement an auditing plugin in the OpenClaw environment to construct Agent-BOM from live executions. Evaluation on representative real-world agentic attack scenarios shows that Agent-BOM can reconstruct stealthy attack chains, including cross-session memory poisoning and tool misuse, capability supply-chain hijacking and unexpected code execution, multi-agent ecosystem hijacking, and privilege and trust abuse. These results demonstrate that Agent-BOM provides a unified and auditable foundation for root-cause analysis and security adjudication in complex agentic ecosystems.

Editor's pick · Technology
Daily Brew· Today

AI tool poisoning exposes a major flaw in enterprise agent security

Researchers have identified a significant security vulnerability in enterprise AI agents caused by tool poisoning.

Editor's pick · Technology
Daily Brew· Yesterday

Intent-based chaos testing is designed for when AI behaves confidently and wrongly

A new approach to chaos testing helps developers identify and mitigate risks when AI models provide confident but incorrect outputs.

Editor's pick · Financial Services
Arxiv· Today

Toward Individual Fairness Without Centralized Data: Selective Counterfactual Consistency for Vertical Federated Learning

arXiv:2605.07117v1 Announce Type: new Abstract: When algorithmic decisions depend on data distributed across institutions, how can we ensure that an individual's outcome does not change arbitrarily based on a protected attribute? We study this question in vertical federated learning (VFL), where features are split across parties, sensitive attributes may be private, and proxies for protected characteristics can be scattered across institutional boundaries under strict privacy constraints. Our focus is on individual-level counterfactual stability, i.e., per-instance prediction consistency under protected-attribute interventions as formalized in the causal fairness literature, rather than group parity guarantees such as demographic parity or equalized odds. We propose SCC-VFL, a server-centric framework for enforcing selective counterfactual consistency (SCC) at the individual level in VFL. SCC-VFL operationalizes a given policy specification by combining three components: (i) differentially private, graph-free discovery of feature roles into non-descendants, policy-permitted mediators, and impermissible proxies using only a formally private sketch of the sensitive attribute, with a formal per-release privacy that does not extend to the full training pipeline; (ii) masked counterfactual generation that edits only mediators while fixing non-descendants and suppressing proxy leakage; and (iii) server-side enforcement via an SCC consistency loss that penalizes impermissible prediction changes under protected-attribute interventions. Across three real-world datasets spanning credit, healthcare, and criminal justice, SCC-VFL maintains or improves predictive accuracy while sharply reducing decision flip rates by up to 98% relative to strong baselines. It also lowers attribute-inference attack success and improves robustness, demonstrating favorable utility-fairness-privacy trade-offs in realistic VFL deployments.

Editor's pick · Technology
Help Net Security· Today

Security teams are turning to AI to survive alert overload

Cybersecurity teams are expanding AI adoption across threat detection, incident response and security operations workflows.

Editor's pick · Technology
Khaleej Times· Yesterday

AI in cybersecurity: Smarter defence or a new generation of blind spots?

As UAE organisations automate cyber defence, experts warn AI can cut workloads but also hide missed threats — raising questions over visibility, governance and human oversight

Adoption, Deployment & Impact

15 articles
AI Applications · 8 articles
Editor's pick · Professional Services
Arxiv· Today

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

arXiv:2605.06702v1 Announce Type: new Abstract: Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.
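
Framing experience reuse as a contextual bandit reduces, at its simplest, to a familiar exploration-exploitation rule over stored cases. The toy below applies plain, non-contextual UCB1 to choose among three cases to make the selection mechanics concrete; the case store, reward model, and context-conditioning of the actual framework are not represented.

# Toy UCB1 selection over stored cases, as a stand-in for the
# exploration-exploitation trade-off in deployment-time case reuse.
# Case success probabilities are synthetic.
import numpy as np

rng = np.random.default_rng(4)
true_success = np.array([0.35, 0.55, 0.75])    # unknown quality of 3 stored cases
counts = np.zeros(3)
sums = np.zeros(3)

for t in range(1, 501):
    if t <= 3:
        case = t - 1                            # try each case once to initialize
    else:
        means = sums / counts
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        case = int(np.argmax(ucb))
    reward = rng.random() < true_success[case]  # did reusing this case succeed?
    counts[case] += 1
    sums[case] += reward

print("pulls per case:", counts.astype(int))    # the best case dominates over time
print("estimated success:", np.round(sums / np.maximum(counts, 1), 2))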

Editor's pick · Technology
Arxiv· Today

What if AI systems weren't chatbots?

arXiv:2605.07896v1 Announce Type: new Abstract: The rapid convergence of artificial intelligence (AI) toward conversational chatbot interfaces marks a critical moment for the industry. This paper argues that the chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems. We examine how treating AI primarily as conversational assistants has extensive structural downsides. We show how chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority. We further analyze how the normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise. Finally, we examine broader societal effects, including labor displacement, concentration of economic power, and increased environmental costs driven by sustained investment in large-scale chatbot infrastructures. While acknowledging legitimate benefits, we argue that the current trajectory of AI development reflects specific value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability. We conclude by outlining alternative directions for AI development and governance that move beyond one-size-fits-all chatbots, emphasizing pluralistic system design, task-specific tools, and institutional safeguards to mitigate social and economic harm.

Editor's pick · Education
Arxiv· Today

Cognitive Agent Compilation for Explicit Problem Solver Modeling

arXiv:2605.07040v1 Announce Type: cross Abstract: Large language models (LLMs) are widely used for tutoring, feedback generation, and content creation, but their broad pretraining makes them hard to constrain and poor substitutes for controllable learners. Educational systems often require inspectable and editable knowledge states: educators want to know what a system assumes the learner knows, and learners benefit when the system can justify actions in terms of explicit skills, misconceptions, and strategies. Inspired by cognitive architectures, we propose Cognitive Agent Compilation (CAC), a framework that uses a strong teacher LLM to compile problem-solving knowledge into an explicit target agent. CAC separates (i) knowledge representation, (ii) problem-solving policy, and (iii) verification and update rules, with the goal of making bounded problem solving more inspectable and editable in educational settings. We present an early proof of concept implemented with Small Language Models that surfaces key design trade-offs, particularly between explicit control and scalable generalization, and positions CAC as an initial step toward bounded-knowledge AI for educational applications.

Editor's pick · Manufacturing & Industrials
Bebeez· Today

ProcurePro Raises $11M to Deliver AI-Powered Procurement Control for Construction’s $13 Trillion Supply Chain

Backed by QIC Ventures, Airtree, and ISAI, the Brisbane-founded company will expand its AI product suite, scale internationally, and grow its team across key global markets. LONDON, BRISBANE, Australia and DUBAI, UAE, May 11, 2026 /PRNewswire/ — ProcurePro, the first end-to-end construction procurement platform, has secured US$11 million in a funding round led by QIC Ventures […]

Geopolitics, Policy & Governance

8 articles
AI Policy & Regulation · 5 articles
Editor's pick · Government & Public Sector
Arxiv· Today

Big AI's Regulatory Capture: Mapping Industry Interference and Government Complicity

arXiv:2605.06806v1 Announce Type: new Abstract: Over the past decade, the AI industry has come to exert an unprecedented economic, political and societal power and influence. It is therefore critical that we comprehend the extent and depth of pervasive and multifaceted capture of AI regulation by corporate actors in order to contend and challenge it. In this paper, we first develop a taxonomy of mechanisms enabling capture to provide a comprehensive understanding of the problem. Grounded in design science research (DSR) methodologies and extensive scoping review of existing literature and media reports, our taxonomy of capture consists of 27 mechanisms across five categories. We then develop an annotation template incorporating our taxonomy, and manually annotate and analyse 100 news articles. The purpose behind this analysis is twofold: validate our taxonomy and provide a novel quantification of capture mechanisms and dominant narratives. Our analysis identifies 249 instances of capture mechanisms, often co-occurring with narratives that rationalise such capture. We find that the most recurring categories of mechanisms are Discourse & Epistemic Influence, concerning narrative framing, and Elusion of law, related to violations and contentious interpretations of antitrust, privacy, copyright and labour laws. We further find that Regulation stifles innovation, Red tape and National Interest are the most frequently invoked narratives used to rationalise capture. We emphasize the extent and breadth of regulatory capture by coalescing forces -- Big AI and governments -- as something policy makers and the public ought to treat as an emergency. Finally, we put forward key lessons learned from other industries along with transferable tactics for uncovering, resisting and challenging Big AI capture as well as in envisioning counter narratives.

Editor's pick · Government & Public Sector
Theregister· Today

ASIA IN BRIEF: China’s agentic AI policy wants to keep humans in the loop

PLUS: Robot becomes Buddhist monk in Korea; TikTok spending $25bn in Thailand; Baidu floating chip biz; and more!

Editor's pick · Government & Public Sector
Council on Foreign Relations· Yesterday

How Trump Should Approach AI Talks With China

At the upcoming Trump-Xi summit, Beijing will not negotiate in good faith on AI safety. A narrowly scoped dialogue paired with maximum pressure on export controls is the only way to shift Beijing’s calculus and secure long-term AI safety.

© 2026 Best Practice AI Ltd. All rights reserved.

Get the full executive brief

Receive curated insights with practical implications for strategy, operations, and governance.

AI Daily Brief — leaders actually read it.
