AI Intelligence Brief

Mon 25 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

96 Articles

Generative AI Reshapes Labor, McKinsey Rethinks Pricing, and ECB Calls for Urgent Fixes

TL;DR Generative AI is reorganizing labor demand, challenging traditional work structures. McKinsey and peers are rethinking pricing models as clients demand value-based fees. The ECB has summoned banks to address risks exposed by AI models, highlighting systemic vulnerabilities. Huawei claims a chip breakthrough, potentially closing the gap with TSMC. AI spending pressures enterprises to shorten SaaS contracts and seek better pricing protections.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

6 of 96 articles
Lead story
Editor's pick · PAYWALL · Financial Services
FT · Yesterday

ECB summons banks to urge them to fix flaws exposed by latest AI models

Supervisor to stress seriousness of risks to financial system at hastily arranged meeting

Editor's pick · Energy & Utilities
Arxiv · Today

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

arXiv:2605.22883v1 Announce Type: new Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems - where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles - the invocation count is an implementation artifact rather than a task property, and inference-level normalization misrepresents the energy cost of goal completion. We present A-LEMS (Agentic LLM Energy Measurement System), a cross-layer measurement framework that redefines the unit of AI energy accounting from energy per inference to Energy per Successful Goal (EpG). EpG aggregates total workflow energy across all execution attempts, including failures and retries, normalized by successfully completed goals. A-LEMS formalizes energy attribution through a temporal boundary model, a five-layer observation pipeline mapping RAPL signals to workflow-level energy, and a reproducibility protocol binding every measurement to hardware and runtime configuration. Building on EpG, we define the Orchestration Overhead Index (OOI), isolating the energy cost of orchestration relative to linear execution under identical task criteria. Across five reasoning and three tool-augmented task families, agentic workflows consume 4.33x higher mean energy per successful goal than linear baselines (888.1 J vs 205.3 J). This overhead is driven by orchestration structure, not inference compute. For tool-augmented tasks, OOI inverts below 1.0x: agentic execution is cheaper than linear, confirming the metric captures orchestration structure rather than a fixed upward bias. These findings establish that energy-per-inference is insufficient for agentic AI. EpG and OOI provide the measurement foundation for accurate benchmarking, where orchestration structure is the primary determinant of energy cost.
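
The accounting described in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the A-LEMS implementation: the `Attempt` record, the function names, and the sample energy values are hypothetical, and the OOI is assumed here to be a simple ratio of agentic to linear energy per successful goal, which the abstract implies but does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One execution attempt toward a user goal (hypothetical record layout)."""
    goal_id: str
    energy_joules: float
    succeeded: bool


def energy_per_successful_goal(attempts):
    """Total energy over ALL attempts, failures and retries included,
    divided by the number of distinct goals that eventually succeeded."""
    total_energy = sum(a.energy_joules for a in attempts)
    successful_goals = {a.goal_id for a in attempts if a.succeeded}
    if not successful_goals:
        raise ValueError("no successful goals; EpG is undefined")
    return total_energy / len(successful_goals)


def orchestration_overhead_index(epg_agentic, epg_linear):
    """Assumed definition: ratio of agentic EpG to linear-baseline EpG."""
    return epg_agentic / epg_linear


# Made-up attempts: goal g1 fails once then succeeds, g2 succeeds, g3 fails.
attempts = [
    Attempt("g1", 120.0, False),
    Attempt("g1", 150.0, True),
    Attempt("g2", 90.0, True),
    Attempt("g3", 60.0, False),
]
epg = energy_per_successful_goal(attempts)  # 420 J over 2 successful goals

# Mean EpG figures quoted in the abstract (888.1 J agentic, 205.3 J linear).
ooi = orchestration_overhead_index(888.1, 205.3)
print(f"EpG = {epg:.1f} J, OOI = {ooi:.2f}x")
```

Note that the energy of the failed goal g3 still lands in the numerator, which is the point of normalizing by successful goals rather than by invocations.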

Editor's pick · Professional Services
Arxiv · Today

Redrawing the AI Map: A Theory of Accountability Boundaries in Agentic Ecosystems

arXiv:2605.23179v1 Announce Type: new Abstract: Agentic AI orchestrators reduce the interface and assembly costs of composing information systems capabilities across organizational boundaries, seemingly accelerating modularization and organizational disaggregation. Yet AI-enabled capabilities whose outputs require evidence, review, signoff, or assignable responsibility may retain integrated accountability boundaries even when their technical interfaces become modular. We develop a capability-level theory of accountability-boundary placement in agentic ecosystems. We introduce accountability assets: complementary assets that make AI-supported outputs legitimate, auditable, reviewable, and assignable to a responsible party. We argue that verification cost and responsibility transferability determine whether the execution and accountability boundaries can move together. The theory identifies three boundary strategies: component, integrated, and dual-track. It also introduces rule debt, the governance burden that accrues when organizational decision rules migrate from formal information systems into ungoverned agentic execution environments. Integrating digital innovation, transaction cost, complementary-assets, digital platform governance, and IS control perspectives, we develop seven propositions linking agentic assembly-cost reductions, accountability assets, appropriability, orchestrator intent capture, and boundary misconfiguration to boundary strategy, value appropriation, and rule debt. The theory explains when digital modularization extends to organizational disaggregation and when accountability keeps capabilities integrated. Structured illustrations across document processing, legal services, audit, clinical decision support, and procurement discipline the boundary logic.

Editor's pick
Arxiv · Today

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

arXiv:2605.23024v1 Announce Type: new Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.

Editor's pick · Technology
DIGITIMES · Today

AI spending forces enterprises to shorten SaaS deals and demand new pricing protections

Rising enterprise investment in AI tools is prompting customers to compress traditional software-as-a-service contracts and extract stronger commercial protections, executives and reporting said. Over the past several months, buyers in the US and global markets moved to shorten multi-year ...

Editor's pick · PAYWALL · Consumer & Retail
WSJ · Today

AI Expands From Multibillion-Dollar Enterprises to Main Street

Artificial-intelligence agents scraped a bakery’s spreadsheets to help manage its growth.

Economics & Markets

30 articles
AI Business Models · 2 articles
AI Investment & Valuations · 10 articles
Editor's pick · PAYWALL · Technology
Bloomberg · Today

Sakura Internet Eyes More Spending to Meet Japan’s AI Demand

Sakura Internet Inc.’s chief said the company may need to hike its capital spending by nearly seven times its initial plan to keep up with artificial intelligence demand in Japan.

Editor's pick · Financial Services
Arxiv · Today

Leveraging Large Language Models for Sentiment Analysis: Multi-Modal Analysis of Decentraland's MANA Token

arXiv:2605.20192v1 Announce Type: cross Abstract: Decentraland, a decentralized virtual reality platform operating within the expanding Metaverse ecosystem, utilizes its native MANA token to facilitate virtual asset transactions and governance. This study investigates the integration of Discord community sentiment with multi-modal financial data to enhance cryptocurrency price prediction within virtual world economies. We address: (1) identifying sentiment patterns within Decentraland's Discord community, and (2) evaluating the impact of multi-modal features on token return forecasting. Using a BERT-based large language model for sentiment analysis, we develop two LSTM architectures: a baseline incorporating historical prices and a multi-modal variant integrating sentiment scores, trading volume, and market capitalization. Results indicate predominantly neutral community sentiment with a positive skew. The multi-modal model significantly outperforms the price-only baseline in prediction accuracy. These findings demonstrate the predictive value of community-derived signals for virtual economy forecasting and establish a foundation for future research at the intersection of immersive virtual environments, natural language processing, and cryptocurrency market analysis.

Editor's pick · Technology
Simply Wall St · Today

How AI Disruption Fears and Cloud Optimism Will Shape Atlassian’s (TEAM) Investment Narrative

In recent weeks, Atlassian has faced mixed headlines as a laid-off engineer’s detailed YouTube walkthrough of its products stoked competitive and AI-disruption worries, while management highlighted AI-driven cloud growth and restructuring efforts aimed at funding further investment in artificial ...

Editor's pick · Manufacturing & Industrials
Yahoo! Finance · Yesterday

How Visteon’s Dividend And AI Cockpit Wins At Visteon (VC) Have Changed Its Investment Story

Earlier this week, Visteon Corporation’s board declared a regular quarterly dividend of US$0.375 per share, payable on June 15, 2026 to shareholders of record as of June 1, 2026. This dividend decision comes as Visteon balances recent macro-driven volatility with solid first-quarter results, ...

Editor's pick · Financial Services
Insignia · Today

On Call with Jonathan Yip, Head of Innovation Banking, Asia at HSBC on the Future of Innovation Capital and CFOs

“The demand on credit is certainly broadening. Whether you’re building applications or foundational models, the hardware and chips that power those models, or data centers and energy, each innovator and each investor working on that opportunity requires a different form of capital.” ...

Editor's pick · Technology
Futu News · Today

Huawei's 'Tau Law' Sparks Surge Across Semiconductor Supply Chain! SMIC and Hua Hong Semiconductor See Rare 20% Gains in A-Share Market—Which Segments Stand to Benefit?

On May 25, the STAR Market 50 Index surged nearly 6% in the afternoon session. Semiconductor-related sectors—including chip semiconductors, advanced packaging, and memory chips—rose sharply. SMIC jumped by 20% to hit its daily trading limit in the final minutes of trading, reaching a record-high ...

Editor's pick · Financial Services
MK · Yesterday

In the era of artificial intelligence (AI), global mega institutional investors are changing their i..

In the era of artificial intelligence (AI), global mega institutional investors are changing their investment strategies. Institutional investors are moving away from the role of investors (LPs) that o..

Editor's pick · Energy & Utilities
Ticker Report · Today

Legal & General Group Plc Lowers Stake in Duke Energy Corporation $DUK

Positive Sentiment: Goldman Sachs ... and the company’s expansion plans. Why Duke Energy (DUK) Is Becoming a Data Center Power Demand Play · Positive Sentiment: Utility-sector demand tailwinds linked to AI infrastructure are also supporting sentiment toward Duke Energy and ...

Editor's pick · Technology
Investing.com · Yesterday

5 big analyst AI moves: ASML, Dell and Nokia flagged as top picks

The bullish case rests on three drivers. First, UBS pushes back against market fears that ASML could become a bottleneck constraining semiconductor supply, arguing those concerns are overstated.

Editor's pick · Technology
Gotrade · Today

AI Chip Rally: NVDA, MU, QCOM Draw Bullish Upgrades

UBS sees 30% upside on Nvidia, four banks back Micron, and Qualcomm jumps 11.6% as AI semiconductor optimism broadens across Wall Street.

AI Market Competition · 6 articles
Editor's pick · Financial Services
Arxiv · Today

GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models

arXiv:2605.23238v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings. Anticipating their behavior in any specific deployment is hard. Existing strategic-reasoning benchmarks evaluate models on fixed canonical games. These benchmarks may saturate as the frontier improves, and they do not allow evaluators to generalize with confidence from benchmark performance to the varied and messy strategic environments that actual deployments involve. We introduce GENSTRAT, which uses procedurally generated strategic environments to address these challenges. Concretely, we generate a distribution of two-player zero-sum imperfect-information card games. The generator can draw fresh games on demand, allowing for evergreen evaluation and resistance to contamination. We pair the game distribution with a capability-profile methodology that decomposes model competence across six axes (state space, temporal depth, information sensitivity, opponent modeling, risk, and brittleness). We also introduce a jaggedness measure of within-distribution smoothness that detects when a model's advantage jumps unpredictably between strategically similar games. We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches. Newer frontier-tier models score higher on average. Beyond that average, models with near-identical overall strength show qualitatively different capability profiles, and two of the top three leaderboard models (gpt-5 and claude) are noticeably more locally volatile than the third (gemini-3.1-pro), despite being close in overall strength. Together, the capability profile and the jaggedness measure give a deployment-relevant diagnostic that the overall ranking alone cannot provide.

Editor's pick
Guardian · Yesterday

‘AI washing’: firms are scrambling to rebrand themselves as tech-focused

PR executives say UK companies are forcing them to present ordinary automation as artificial intelligence. UK companies are performing “yoga-level” stretches to describe themselves as AI specialists in an attempt to capitalise on the buzz around the technology, public relations firms have said. Weary communications executives tasked with securing media coverage for brands have complained that bosses in low-tech industries, or running businesses that use automation but not generative AI, are increasingly demanding they are pitched to journalists as artificial intelligence companies.

Editor's pick · Technology
Theregister · Today

Google is cannibalizing the web to feed AI

Google Search used to direct users to web sites; AI Mode will keep them in Google's garden

AI Startups & Venture · 6 articles

Labor, Society & Culture

14 articles
AI & Culture · 2 articles
AI & Employment · 5 articles
Editor's pick · PAYWALL · Technology
NYT · Today

One Job That Is Growing in the A.I. Era? Cybersecurity Experts.

Demand for security engineers has surged as artificial intelligence generates a glut of new code and models like Anthropic’s Mythos create new concerns.

Editor's pick · Education
Arxiv · Today

Defining AI Fatigue in Academic Contexts: Dimensions, Indicators, and a Stage-Based Model Using Grounded Theory

arXiv:2605.23123v1 Announce Type: new Abstract: The integration of AI tools in academic settings has introduced a distinct form of strain that existing frameworks like technostress and digital fatigue have not yet fully addressed. This study develops a conceptual model and identifies the dimensions that define AI fatigue as a form of strain arising from sustained academic use of AI tools. Using grounded theory analysis of open-ended responses from 1,054 university students across three universities in the Philippines, the study examined the cognitive, motivational, emotional, physical, and attentional pressures students experienced during AI-supported academic work. Analysis produced five dimensions of AI fatigue, namely Cognitive Overload, Motivational Disengagement, Moral Unease, Physical Strain, and Attentional Drift, each consisting of two indicators grounded in participant accounts. The findings also yielded the AI Fatigue Model, a stage-based framework that explains how these pressures accumulate and reinforce one another across repeated AI interaction in academic tasks. These contributions establish a conceptual and exploratory foundation for AI fatigue as a distinct construct and provide a basis for future instrument validation, scale development, and cross-contextual inquiry in academic settings where AI now mediates student learning.

Editor's pick · Financial Services
Reuters · Today

Reuters AI News | Latest Headlines and Developments | Reuters

Fears are growing among workers as banks offer more frank assessments about how AI could replace their jobs.

AI & Misinformation · 2 articles
Editor's pick
Arxiv · Today

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

arXiv:2605.22880v1 Announce Type: cross Abstract: As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity. In pursuit of this goal, we focus on locally deployed open-source LLMs, as opposed to frontier API-only models, given their superior alignment with the operational constraints of privacy-conscious malicious actors deployed in social media environments. We introduce an empirical red-teaming framework for measuring LLM Overton Windows (OWs), defined as the range of political opinions a model can reliably express on controversial topics, and for quantifying how simple natural-language jailbreaks expand that range. We evaluate more than 30 LLMs spanning 10 model families and five countries of origin. We find systematic asymmetries in political expressivity: open-source LLMs are typically more willing to generate left-leaning social media content, OWs tend to contract inversely to model size, and regional differences are substantial despite uneven representation in the open-source ecosystem. Jailbreak potency also varies sharply across model families, motivating a workflow for identifying effective combinations of jailbreak techniques. Taken together, our results establish a practical framework for auditing the political steerability of open-source LLMs and for helping future researchers design stronger countermeasures against LLM-enabled influence campaigns.

AI Ethics & Safety · 4 articles
Editor's pick · Healthcare
Arxiv · Today

Engagement-Optimized Care: When LLMs become Mental Health Infrastructure

arXiv:2605.23787v1 Announce Type: new Abstract: General-purpose LLMs are increasingly functioning as mental health infrastructure due to gaps in care left by provider shortages, inadequate insurance coverage, social isolation, and stigma around formal help-seeking. This shift poses a distinct problem for AI ethics: systems neither designed nor governed as care technologies are being used as such, while their dominant design incentives optimize for engagement rather than user well-being. We present findings from a qualitative, longitudinal study with 18 US-based participants who use general-purpose LLMs for socioemotional support and participated in one or more of our study phases, including initial interviews, a four-week diary study, focus groups, and exit interviews. Participants turned to LLMs because other forms of support were unavailable, unaffordable, socially costly, or inadequate. As they continued to use these systems, design features such as anthropomorphic cues, default validation, persistent responsiveness, and weak disengagement mechanisms shaped their ongoing reliance. Participants described meaningful support alongside dependency, epistemic distortion through one-sided validation, privacy expectations without corresponding legal protection, and continued use despite awareness of these risks. We argue these dynamics reflect a structurally unfair tradeoff: users accept risks because support is otherwise absent, while available systems are optimized to deepen engagement and lack care-based accountability. The paper makes three contributions: it traces the arc through which LLMs become care infrastructure and identifies distinct ethical tensions at each stage, shifts analysis from turn-based exchanges to longitudinal trajectories of use, and argues that accountability belongs at the design and incentive conditions through which these systems become care infrastructure rather than at the output or crisis-response layer.

Editor's pick · Government & Public Sector
Arxiv · Today

Whose Good, Whose Place? The Moral Geography of Agentic AI for Social Good

arXiv:2605.22995v1 Announce Type: new Abstract: Agentic AI systems are increasingly proposed for social-good domains, often invoking the United Nations Sustainable Development Goals (SDGs) as a vocabulary of global benefit. Yet claims of social good do not establish accountability to the communities a system claims to serve. We present a structured survey of 112 papers on agentic AI for social good published between 2015 and 2026. We find a moral-geographic asymmetry: papers are least likely to specify geographic context in precisely the domains where local political, legal, and cultural context matters most. Across the corpus, 82 of 112 papers (73%) specify no geographic context. Papers aligned with health or physical/ecological SDGs specify geography 37-40% of the time, while papers aligned with institutional and social-policy SDGs do so only 13%. SDG 16, peace, justice, and strong institutions, is both the most-covered goal in the corpus and the one with the lowest geographic-specification rate. We interpret this as moral abstraction: agentic AI for social good often treats institutional good as universal in ways it does not treat health or ecological good. A second finding compounds this: only 28 of 112 papers (25%) report any real-world deployment or small-scale test. We identify five accountability gaps and propose a minimal reporting standard for more context-specific, participatory, and accountable agentic AI for social good.

Editor's pick · PAYWALL
Washington Post · Today

Pope elevates AI ethics to a religious imperative with first encyclical

In "Magnifica Humanitas," he fires a broadside against AI companies, warning of the technology's dangers in the same way Pope Francis did about climate change.

Editor's pick
Arxiv · Today

Mediative Fuzzy Logic: From Type-1 Foundations to Type-2, Type-3 and Quantum Extensions

arXiv:2605.22900v1 Announce Type: new Abstract: Mediative Fuzzy Logic was conceived as a practical scheme for reconciling hesitant or conflicting assessments in fuzzy control and decision-making. However, its logical and semantic foundations remain underdeveloped, especially beyond operational type-1 settings. This article develops a unified account of the type-1 core together with interval type-2, granular type-3, and quantum extensions. We characterize the mediative operator as a convex aggregation controlled by hesitation and contradiction, model mediative truth values as independent truth-falsity pairs in a continuous bilattice-like structure, and introduce a propositional system extending a standard t-norm-based fuzzy logic with a mediative connective. We establish soundness, paraconsistency, and conservativity over the underlying fuzzy base for formulas without mediation, and formulate coherent semantic extensions to interval type-2 truth values, granule-indexed local evaluations, and effects and density operators on Hilbert spaces. An autonomous-braking sensor-fusion example illustrates how the framework supports transparent, conservative, and safety-first decisions under incomplete, heterogeneous, and mildly contradictory evidence. Under suitable assumptions, the higher-level formulations reduce to the type-1 case, clarifying coherence across levels and reliably supporting future work in intelligent decision systems.

Technology & Infrastructure

26 articles
AI Agents & Automation · 7 articles
Editor's pick · Technology
Arxiv · Today

Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems

arXiv:2605.23109v1 Announce Type: new Abstract: AI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of full coverage that testing alone cannot provide. Distributed systems are a prime example: properties such as consistency between reads and writes must hold under every possible interleaving of events. Mechanized formal verification can guarantee such correctness, but typically demands months to years of expert effort. As evidence, even SOTA coding agents (Codex with GPT-5.4 and Claude Code with Opus 4.6) succeed on only 2/7 distributed key-value-store specifications. In this paper, we present the first effective approach to addressing this gap, Inductive Deductive Synthesis (IDS), which jointly and incrementally synthesizes implementation and proof, and learns from failed attempts to systematically try promising strategies. Built as an agentic LLM system, IDS achieves 7/7 in about 6.8 hours and $106 per spec on average, roughly 200x faster than expert effort and 17% cheaper than SOTA agents. IDS further incorporates performance feedback into the same loop, yielding implementations up to 3x faster than published verified systems.

Editor's pick · Professional Services
FourWeekMBA · Today

This week's Claude OS update: The Agentic Expansion Cascade

While business leaders debate AI ... and AI strategy — timelines, Anthropic’s latest Claude OS update signals something more immediate: the systematic transformation of how enterprises will structure operations within months, not years. The infrastructure for autonomous business processes isn’t coming—it’s here. The Business Engineer’s latest analysis introduces the “Agentic Expansion ...

Editor's pick · Technology
Arxiv · Today

Foundation Protocol: A Coordination Layer for Agentic Society

arXiv:2605.23218v1 Announce Type: new Abstract: Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck shifts away from raw model capability toward coordination. Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society. FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations, and supports native multi-party organization and event-based collaboration. It also provides economic primitives for metering, receipts, and settlement, and treats policy, provenance, and audit as first-class concerns. FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead. The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.

Editor's pick · Technology
Arxiv · Today

BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems

arXiv:2605.22866v1 Announce Type: new Abstract: Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), which decompose a coalition value function into per-component marginal contributions and require evaluation of the system on arbitrary component subsets. That requirement fails for third-party APIs, opaque endpoints, and agentic orchestrators that concentrate routing on a few tools, leaving most coalitions un-evaluable from the deployed orchestrator. We introduce BOHM, which extracts a hierarchical attribution tree directly from the routing weights such systems already maintain: leaf attribution is the path product of root-to-leaf routing weights; level-k attribution is the induced distribution over depth-k nodes. The method has zero marginal cost, requires no access to component internals, and provides multi-resolution attribution at every level simultaneously, which flat methods cannot offer at any evaluation budget. BOHM and SHAP answer different questions and converge when the deployed router routes near-optimally. On 18 LLMs in a 3-level hierarchy over 880 LiveCodeBench problems, BOHM yields Kendall tau=0.928; SHAP reaches tau=0.980 at 9,000x more coalition evaluations per seed. On a 5-driver, 7-benchmark agentic study (35 cells, complete coverage), drivers concentrate routing on a single tool (top-share median 0.65), and cell-level tau(BOHM,SHAP) is predicted by whether the driver's top pick is the empirically best tool (mean +0.22 vs ~+0.01). On a US Census hierarchy (475 leaves, 4 levels), BOHM recovers ground-truth rankings at every level (tau up to 0.722). BOHM satisfies efficiency, monotonicity, symmetry, and weak suppression but not Shapley's additivity. It is best understood as a complementary primitive: a multi-resolution decomposition computable wherever routing state exists, whose disagreement with Shapley is itself diagnostic.
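
The path-product rule stated in the abstract above can be illustrated with a small sketch. It is not the authors' code: the `routing` dictionary layout, the function names, and the weights are hypothetical, and the sketch assumes a strict tree (one parent per node) with outgoing weights that sum to 1 and leaves that all sit at the same depth.

```python
def node_attribution(routing, root="root"):
    """Attribution of each node = product of routing weights on its root path.

    `routing` maps a parent name to {child name: routing weight}; nodes that
    never appear as a key are leaves (hypothetical layout for illustration).
    """
    attribution = {root: 1.0}
    depth = {root: 0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, weight in routing.get(parent, {}).items():
            attribution[child] = attribution[parent] * weight
            depth[child] = depth[parent] + 1
            stack.append(child)
    return attribution, depth


def leaf_attribution(routing, root="root"):
    """Path-product attribution restricted to leaf nodes."""
    attribution, _ = node_attribution(routing, root)
    return {n: a for n, a in attribution.items() if n not in routing}


def level_attribution(routing, k, root="root"):
    """Induced distribution over the nodes at depth k."""
    attribution, depth = node_attribution(routing, root)
    return {n: a for n, a in attribution.items() if depth[n] == k}


# Made-up two-level router: a task router on top, model routers below.
routing = {
    "root": {"code": 0.7, "math": 0.3},
    "code": {"model_a": 0.6, "model_b": 0.4},
    "math": {"model_c": 1.0},
}
leaves = leaf_attribution(routing)
level1 = level_attribution(routing, 1)
print(leaves)
print(level1)
```

Because only the routing weights are read, nothing here needs to evaluate the system on component subsets, which is the contrast with Shapley-style attribution that the abstract draws.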

Editor's pick · Professional Services
Arxiv · Today

RMA: an Agentic System for Research-Level Mathematical Problems

arXiv:2605.22875v1 Announce Type: new Abstract: We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unlike prior studies centered on competition mathematics or formal theorem proving, RMA targets research-level mathematical problems that require long-horizon reasoning, literature grounding, and iterative proof refinement. RMA decomposes research-level proof solving into specialized modules for problem analysis, literature search and understanding, fair comparison, knowledge-bank construction, and proof verification, all coordinated by initializer, proposer, and verifier agents through a shared structured memory. Within this unified framework, these agents operate in a multi-role, multi-round workflow, collaboratively generating, refining, and verifying candidate proofs through iterative feedback. We evaluate RMA on the First Proof benchmark, which consists of ten research-level problems contributed by expert mathematicians across diverse domains. Through comprehensive expert evaluation, RMA outperforms strong baselines on the First Proof benchmark, including GPT-5.2R and Aletheia, solving eight out of ten research problems and producing more logically sound and readable proofs. Our comprehensive ablation studies further show that performance gains arise from the interaction of structured reasoning modules, iterative refinement, and verifier-based feedback, rather than any single component. Our solutions and implementations will be made publicly available upon acceptance.

Editor's pick
Arxiv· Today

EVE-Agent: Evidence-Verifiable Self-Evolving Agents

arXiv:2605.22905v1 Announce Type: new Abstract: Self-evolving agents should not train on examples they cannot justify. Data-free self-evolving search agents offer a scalable route to systems that generate their own questions, answer them, and improve from their own feedback without human annotations. Yet, without verifiable evidence, this loop can reward fluent but unsupported examples, turning the self-generated curriculum into an opaque and potentially unreliable training signal. We argue that evidence verifiability is a prerequisite for trustworthy self-evolution in search agents: each generated instance should include not only an answer but also a source-grounded span whose contribution to that answer can be measured. We introduce EVE-Agent, an Evidence-Verifiable Self-Evolving Agent that operationalizes this principle through a modification to the proposer--solver framework. The proposer generates a question, an answer, and a verbatim evidence span. An evidence verifier then rewards the span according to the marginal accuracy gain when the evidence is provided. This produces a training signal that favors evidence that genuinely helps answer the question, without requiring oracle answers, human labels, or external annotations. EVE-Agent leaves the backbone model, retriever, search tool, and optimization framework unchanged. Experiments show that EVE-Agent substantially improves evidence-grounded correctness over prior self-evolving search agents. The resulting curriculum is not merely self-generated but auditable by construction: each training example carries an inspectable source span that explains why it should be trusted.
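The verifier reward described in the abstract, the marginal accuracy gain when the evidence span is provided, can be sketched as a difference of two accuracies. The function name and the boolean-list representation of solver outcomes are assumptions made for illustration, not the paper's interface.

```python
def evidence_reward(correct_with_evidence, correct_without_evidence):
    """Marginal accuracy gain from supplying the evidence span.

    Each argument is a list of booleans: whether a sampled solver answer
    was correct with, or without, the span in context (hypothetical setup).
    """
    acc_with = sum(correct_with_evidence) / len(correct_with_evidence)
    acc_without = sum(correct_without_evidence) / len(correct_without_evidence)
    return acc_with - acc_without

# Toy outcomes: the span lifts solver accuracy from 0.25 to 0.75.
reward = evidence_reward([True, True, True, False],
                         [True, False, False, False])
```

A span that does not change solver accuracy earns zero reward under this sketch, which is the behaviour the abstract attributes to evidence that does not genuinely help.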

Editor's pickDefense & National Security
Theregister· Yesterday

No captain, my captain: Navantia floats crewless warship

Spanish shipbuilder's 75-meter drone vessel comes with sensors, modular payloads, and no room for sailors

AI Energy1 articles
Editor's pickEnergy & Utilities
Arxiv· Today

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems

arXiv:2605.22883v1 Announce Type: new Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems - where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles - the invocation count is an implementation artifact rather than a task property, and inference-level normalization misrepresents the energy cost of goal completion. We present A-LEMS (Agentic LLM Energy Measurement System), a cross-layer measurement framework that redefines the unit of AI energy accounting from energy per inference to Energy per Successful Goal (EpG). EpG aggregates total workflow energy across all execution attempts, including failures and retries, normalized by successfully completed goals. A-LEMS formalizes energy attribution through a temporal boundary model, a five-layer observation pipeline mapping RAPL signals to workflow-level energy, and a reproducibility protocol binding every measurement to hardware and runtime configuration. Building on EpG, we define the Orchestration Overhead Index (OOI), isolating the energy cost of orchestration relative to linear execution under identical task criteria. Across five reasoning and three tool-augmented task families, agentic workflows consume 4.33x higher mean energy per successful goal than linear baselines (888.1 J vs 205.3 J). This overhead is driven by orchestration structure, not inference compute. For tool-augmented tasks, OOI inverts below 1.0x: agentic execution is cheaper than linear, confirming the metric captures orchestration structure rather than a fixed upward bias. These findings establish that energy-per-inference is insufficient for agentic AI. EpG and OOI provide the measurement foundation for accurate benchmarking, where orchestration structure is the primary determinant of energy cost.
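As a rough sketch of the accounting described in the abstract, EpG can be computed by summing energy over every attempt, failed or not, and dividing by the number of goals that eventually succeeded. The `Attempt` record and the treatment of OOI as a plain ratio of agentic to linear EpG are assumptions made for illustration; the paper's exact definitions may differ.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    goal_id: str      # which user goal this attempt belongs to
    energy_j: float   # measured energy for the attempt, in joules
    succeeded: bool   # whether this attempt completed the goal

def energy_per_successful_goal(attempts):
    """Total energy over all attempts divided by distinct successful goals."""
    total = sum(a.energy_j for a in attempts)
    successes = {a.goal_id for a in attempts if a.succeeded}
    if not successes:
        raise ValueError("EpG is undefined when no goal succeeded")
    return total / len(successes)

def orchestration_overhead_index(agentic, linear):
    """Assumed form of OOI: agentic EpG relative to linear EpG."""
    return energy_per_successful_goal(agentic) / energy_per_successful_goal(linear)

# Toy data: goal g1 needs one retry in the agentic workflow.
agentic = [Attempt("g1", 300.0, False), Attempt("g1", 500.0, True),
           Attempt("g2", 400.0, True)]
linear = [Attempt("g1", 200.0, True), Attempt("g2", 200.0, True)]

epg_agentic = energy_per_successful_goal(agentic)
ooi = orchestration_overhead_index(agentic, linear)
```

The failed first attempt on g1 still counts toward the numerator, which is the point of goal-level rather than inference-level normalization.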

AI Infrastructure & Compute8 articles
Editor's pickTechnology
Arxiv· Today

The Cognitive Kardashev Scale: Quantifying the Material Envelope of Civilisational Computation

arXiv:2605.22840v1 Announce Type: cross Abstract: How much thinking can a civilisation do? Kardashev's (1964) typology ranks civilisations by total power: planetary (Type I, ~10^16 W), stellar (Type II, ~10^26 W), galactic (Type III). This paper builds an analogous Cognitive Kardashev Scale: how much sustained AI-grade computation each tier could support. Four ingredients enter the calculation: total power P (watts), the share f of it devoted to cognition, the efficiency $\eta$ at which energy becomes compute (operations per joule), and the brain's own processing rate $C_{\mathrm{brain}}$ as a reference unit. Anchoring on 2024-2026 hardware (El Capitan, NVIDIA Blackwell, Vera Rubin) gives $\eta_{2026} = 10^{12}$ FLOP/J. Contemporary humanity sits at $K \approx 0.73$, three-quarters of the way to Type I. At Type I and $f = 1\%$, available compute is, within an order of magnitude, one personal AI's worth of cognition per human inhabitant; at Type II it is essentially incomprehensible. Three trajectories for frontier compute through 2035 are reported as conditional projections, not predictions. Whether the long-run binding constraint is energy or efficiency depends on engineering choices not yet made; the political economy of who has access may matter more than either.
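The arithmetic behind these figures can be sketched under stated assumptions. The abstract does not give the formula for K; the sketch assumes Sagan's interpolation, K = (log10 P - 6) / 10, which matches the Type I and Type II anchors quoted above. The 2e13 W input is an assumed round figure chosen because it reproduces K of about 0.73. The division by the brain's processing rate is left out because the abstract gives no value for it.

```python
import math

def kardashev_index(power_w):
    """Assumed Sagan-style interpolation: K = (log10(P) - 6) / 10."""
    return (math.log10(power_w) - 6.0) / 10.0

def cognitive_compute(power_w, share, ops_per_joule):
    """Sustained operations per second: P * f * eta."""
    return power_w * share * ops_per_joule

ETA_2026 = 1e12  # FLOP/J, the efficiency anchor quoted in the abstract

k_today = kardashev_index(2e13)                        # assumed present-day power
type_one_flops = cognitive_compute(1e16, 0.01, ETA_2026)  # Type I with f = 1%
```

With these inputs a Type I civilisation devoting 1% of its power to cognition sustains on the order of 1e26 FLOP/s.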

Editor's pickEnergy & Utilities
Guardian· Yesterday

Scotland’s ‘green datacentres’ policy ignores emissions impact of AI, analysis shows

Definition of green facilities made in 2022, before release of ChatGPT, says Action to Protect Rural Scotland. A Scottish government policy designed to encourage datacentres to build in Scotland could lead to a massive volume of carbon emissions being ignored, according to an analysis by a Scottish charity. “Green datacentres” are at the heart of Scotland’s ambitions to develop economically. Enshrined in national policy, they are part of a larger, UK-wide effort to attract big AI investment to Scotland.

Editor's pickEnergy & Utilities
Arxiv· Today

SolarChain: Bridging Physical Law, Verifiable Trust, and Sustainable Markets for Urban Energy Resilience

arXiv:2605.23162v1 Announce Type: new Abstract: Urban decarbonization requires scaling rooftop solar across millions of fragmented producers, yet cities face a fundamental tension: energy data is easily manipulated, and economic incentives often reward speculation rather than actual infrastructure deployment. We present SolarChain, a platform that resolves both problems by anchoring digital accountability to the thermodynamic limits of solar energy conversion. Using real-time meteorological data, geospatial coordinates, and first-principles calculations of solar yield, the system establishes a hard physical boundary for every panel's maximum possible output; any reported generation exceeding this limit is automatically rejected before entering the shared ledger. This trustless verification enables a peer-to-peer marketplace with programmatic reward structures that continuously reinvest value into equipment maintenance and market liquidity, preventing the speculative hoarding that typically destabilizes blockchain-based marketplaces. When electricity is consumed, the corresponding digital credits are permanently retired in direct proportion to physical energy dissipation, creating an auditable one-to-one mapping between urban consumption and carbon accounting. Deployed across heterogeneous city nodes, the prototype demonstrates resilience against data injection attacks while lowering capital barriers for community-level solar expansion. Beyond energy, the framework offers a general model for coordinating economic activity with physical law in any domain where distributed infrastructure demands both data integrity and sustainable investment. We release the data and code as open-access on GitHub.
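The rejection rule described in the abstract can be illustrated with a simplified bound check. The formula below (irradiance times panel area times conversion efficiency times duration) is a deliberately crude stand-in for the paper's first-principles yield calculation, and all function names, parameters, and numbers are hypothetical.

```python
def max_energy_wh(irradiance_w_m2, panel_area_m2, efficiency, hours):
    """Crude upper bound on panel output over an interval, in watt-hours."""
    return irradiance_w_m2 * panel_area_m2 * efficiency * hours

def accept_report(reported_wh, irradiance_w_m2, panel_area_m2,
                  efficiency=0.25, hours=1.0):
    """Accept a generation report only if it does not exceed the physical bound."""
    bound = max_energy_wh(irradiance_w_m2, panel_area_m2, efficiency, hours)
    return 0.0 <= reported_wh <= bound

# Toy case: 800 W/m^2 on a 10 m^2 array for one hour bounds output at 2000 Wh.
bound_wh = max_energy_wh(800.0, 10.0, 0.25, 1.0)
honest = accept_report(1500.0, 800.0, 10.0)
inflated = accept_report(2500.0, 800.0, 10.0)
```

A report is checked before it would be written to the ledger, so an inflated figure is dropped without needing to trust the reporting node.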

Editor's pickEnergy & Utilities
FourWeekMBA· Today

OpenAI vs Utilities: The Battle for AI's Physical Infrastructure - FourWeekMBA

While everyone debates AI chatbots, the real business model war is happening in physical infrastructure. Utility companies are quietly positioning themselves as the new AI kingmakers through massive data center acquisitions, ...

Editor's pickEnergy & Utilities
DIGITIMES· Yesterday

Tech Forum 2026: AI data centers turn to on-site power amid grid constraints

DIGITIMES analyst Sabrina Yu warned that artificial intelligence data centers face four major energy challenges — rising GPU thermal design power, a new high-voltage direct current architecture, persistent grid bottlenecks, and intensifying sustainability and carbon-emissions pressure on ...

Editor's pickEnergy & Utilities
New Kerala· Today

India's AI Needs Power Grid, Supply Chain: Report

A new report says India needs a resilient power grid, competitive semiconductor market, and computing supply chain to achieve its AI deployment ambitions.

Editor's pickEnergy & Utilities
CNBCTV18· Today

Schneider Electric sees India data center business outpacing core growth on AI boom - CNBC TV18

India is emerging as both a consumption ... with demand coming from hyperscalers, colocation operators, and enterprises seeking integrated infrastructure and services, she added. Schneider Electric supplies critical data center infrastructure, including UPS systems, switchgear, power distribution units, precision cooling, and energy management software, positioning it as a key vendor as AI workloads ...

Editor's pickTechnology
DIGITIMES· Yesterday

AI server boom squeezes Samsung Electro-Mechanics' component supply

Samsung Electro-Mechanics is emerging as another beneficiary of the AI data center buildout, as demand for high-end capacitors and package substrates pushes parts of its component business closer to full capacity.

AI Models & Capabilities4 articles
Editor's pick
Arxiv· Today

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

arXiv:2605.23024v1 Announce Type: new Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.

Editor's pick
Arxiv· Today

ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization

arXiv:2605.22885v1 Announce Type: new Abstract: Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve training data quality for neural provers. However, scalable proof optimization is hindered by heterogeneous and heuristically specified objectives, scarce data, and high training and inference costs. To overcome these challenges, we introduce ImProver 2, a neurosymbolic framework for automated proof optimization in Lean 4. ImProver 2 combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside lightweight informal abstractions. We further introduce a suite of metrics capturing structural proof properties. Using ImProver 2, we train a 7B-parameter model that outperforms orders-of-magnitude larger models within the same model family, and is competitive with mid-tier frontier models across metrics. We additionally demonstrate that our neurosymbolic scaffold significantly improves performance across both small and frontier models. We show that with proper scaffolding and training, small models can effectively restructure research-level proofs over complex and varied metrics, matching substantially larger systems and establishing proof optimization as a scalable, learnable task.

Editor's pickGovernment & Public Sector
Arxiv· Today

Benchmarking LLMs for Community Governance Simulation with Life-history Narratives

arXiv:2605.23783v1 Announce Type: new Abstract: Effective community governance hinges on understanding what specific residents think and need. Recent work has used large language models (LLMs) to simulate human respondents, offering a scalable, reproducible way to study human attitudes and behaviors at low cost. However, these studies typically prompt the model with just a few demographic variables (age, gender, income), simulating only general role types. This is insufficient for community governance, where decisions depend on the views of specific residents. We bridge this gap with an integrated research framework covering dataset, benchmark, algorithm, and system. The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains. The benchmark probes 18 mainstream LLMs across four prompting strategies and shows that adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline, but this gain comes with more input tokens per call from the longer prompts they require. The algorithm, curriculum-LoRA, is a parameter-efficient personalization framework that, by closing this fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost and Pareto-dominates every configuration tested. The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline. Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.

Editor's pick
Arxiv· Today

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

arXiv:2605.23074v1 Announce Type: new Abstract: The emergence of Large Reasoning Language Models (LRMs) has paved the way for tackling complex reasoning tasks through test-time scaling by generating long-form Chain-of-Thought (CoT) trajectories during inference. Meanwhile, these trajectories often contain explicit reflection markers such as "wait", "but", and "alternatively", signaling hesitation, revision, and the consideration of alternative explorations, respectively. Recent studies on test-time control leverage such markers as lightweight handles for steering reasoning, typically treating them as a single coarse-grained category rather than distinguishing their distinct functional roles. In this paper, we conduct type-wise suppression and fixed-prefix intervention, revealing that reflection markers differ not only in their functional roles but also in when they exert the greatest influence. Specifically, different marker classes affect accuracy and generation length in distinct ways, and marker choices are most consequential before the model settles into a stable reasoning trajectory. Motivated by these findings, we introduce PathCal, a novel training-free decoding controller that calibrates reasoning paths by distinguishing marker types and intervening only at locally uncertain states. At each decoding step, PathCal utilizes the distribution over reflection-markers to estimate local competition between maintaining the current reasoning trajectory and initiating a competing branch, and softly rebalances marker logits when competing-branch evidence becomes excessive. Experiments across six reasoning benchmarks demonstrate that PathCal achieves a better efficiency-performance trade-off, improving or preserving accuracy while reducing generation length, without relying on external verifiers or additional sampling.

AI Research & Science1 articles
Editor's pickTechnology
Arxiv· Today

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

arXiv:2605.22878v1 Announce Type: new Abstract: The exponential growth of global academic output has confronted researchers and AI agents with an unprecedented "information explosion," where fragmented and unstructured knowledge organization impedes deep interdisciplinary integration. Current academic retrieval tools predominantly rely on superficial keyword matching or vector-space semantic retrieval, which lack the topological reasoning capabilities required to navigate complex logical connections. Agentic deep-research-based frameworks are often prone to logical hallucinations and incur high inference costs. To bridge this gap, in this report, we introduce SciAtlas, a large-scale, multi-disciplinary, heterogeneous academic resource knowledge graph designed as a panoramic scientific evolution network. By integrating over 43M papers from 26 disciplines, and a total of 157M entities and 3B triplets, SciAtlas provides a structured topological cognitive substrate that dismantles disciplinary barriers and furnishes AI agents with a global perspective. Furthermore, we develop a neuro-symbolic retrieval algorithm featuring tri-path collaborative recall and graph reranking, achieving a seamless transition from simple semantic matching to deterministic association discovery. We also present key application directions of SciAtlas, including literature review, automated research trend synthesis, idea positioning, and academic trajectory exploration, to demonstrate that SciAtlas can serve as an effective "cognitive map" to empower the full loop of automated scientific research while significantly reducing reasoning costs. We have released the interfaces for KG retrieval and various downstream tasks in our GitHub repo.

AI Security & Cybersecurity3 articles
Editor's pickTechnology
Arxiv· Today

NeuroNL2LTL: A Neurosymbolic Framework for Natural Language Translation of Linear Temporal Logic

arXiv:2605.22874v1 Announce Type: new Abstract: Effectively translating between natural language (NL) and formal logics like Linear Temporal Logic (LTL) requires expertise that limits formal verification's reach in safety-critical development. Template-based approaches sacrifice expressiveness for reliability; neural methods achieve fluency but provide no correctness guarantees. We present NeuroNL2LTL, a neurosymbolic architecture unifying learned translation with formal verification. NeuroNL2LTL routes translation through an intermediate representation whose mapping to LTL is structure-preserving by construction. Generated specifications undergo satisfiability and non-triviality checking; a minimal-edit repair mechanism corrects near-miss outputs before they reach downstream tools. The central innovation is verifier-in-the-loop training: verification outcomes serve as reward signals for reinforcement learning, producing neural components that optimize directly for formal correctness. On 200,000+ requirements spanning aerospace, robotics, autonomous vehicles, and ten additional domains, NeuroNL2LTL achieves 28% semantic equivalence with reference specifications while ensuring 86% of outputs are verified satisfiable. The system also generates contextually grounded explanations from LTL, enabling domain experts to validate specifications without specialized training. This work demonstrates that formal verification can function as both training objective and runtime filter for neural specification systems, allowing us to build neural-based tools whose reliability derives from logical guarantees rather than statistical confidence.

Editor's pickTechnology
SQ Magazine· Today

Meta and Google AI Models Exposed by Guardrail Flaw

Researchers reportedly bypassed safety guardrails on Meta and Google AI models within minutes, raising major security and compliance issues.

Adoption, Deployment & Impact

15 articles
AI Applications9 articles
Editor's pickPAYWALLConsumer & Retail
WSJ· Today

AI Expands From Multibillion-Dollar Enterprises to Main Street

Artificial-intelligence agents scraped a bakery’s spreadsheets to help manage its growth.

Editor's pickHealthcare
Arxiv· Today

IyàwóBench: A Benchmark for Evaluating Large Language Model Clinical Triage Accuracy on Undifferentiated Febrile Illness in Nigerian Primary Health Settings

arXiv:2605.23465v1 Announce Type: new Abstract: Background. Undifferentiated febrile illness is the leading cause of primary care outpatient visits in Nigeria, yet no validated benchmark exists for evaluating large language model (LLM) clinical triage reasoning in West African primary health settings. Methods. We introduce IyàwóBench v1.0, a dataset of 200 synthetic clinical vignettes across eight febrile illness categories derived from statistical distributions of 1,200 real patient encounters at 19 primary health centres (PHCs) in Oyo State, Nigeria. Six LLMs were evaluated on structured triage classification across two metrics: triage accuracy and safety score. Results. All six models achieved 100% safety scores (95% CI: 96.4-100.0%), never downgrading a critical REFER NOW case to TREAT HERE. Triage accuracy varied substantially: Claude Sonnet (claude-sonnet-4-5) 67.5% (95% CI: 60.8-73.7%), Llama 4 Scout 59.5% (52.5-66.2%), Llama 3.3 70B 43.0% (36.2-50.0%), and Llama 3.1 8B 39.0% (32.4-45.9%). Two models demonstrated near-zero accuracy attributable to structured output non-compliance. Conclusions. Modern LLMs exhibit safe triage behaviour but vary substantially in structured clinical accuracy. Clinically engineered systems with embedded WHO guidelines outperform general-purpose models by up to 28.5 percentage points. IyàwóBench provides the first reproducible evaluation framework for LLM clinical decision support in West African primary care.

Editor's pickHealthcare
Bebeez· Today

Finland’s Grundium acquires Denmark’s Visiopharm to build an end-to-end AI precision pathology platform

Grundium, a Tampere-based startup specialising in digital pathology imaging technology, backed by US-based healthcare private equity firm EW Healthcare Partners, has acquired Visiopharm, a Denmark-based provider of AI-driven precision pathology software. The combined business merges complementary capabilities from Grundium’s imaging platform and Visiopharm’s AI-driven precision pathology software, creating an accessible end-to-end solution for diagnostic laboratories, […]

Editor's pickProfessional Services
Lilach Bullock· Yesterday

AI for Service Businesses: The Practical 2026 Guide

AI for service businesses in 2026 looks different from AI for ecommerce. Here's the practical guide for consultants, coaches, agencies. Real workflows, real numbers, no fluff.

Editor's pickFinancial Services
The Hindu BusinessLine· Today

Lenders lap-up AI to break language barriers, slash lending timelines - The HinduBusinessLine

Lenders utilize AI to enhance efficiency, eliminate language barriers, and streamline lending processes for faster loan approvals.

Editor's pickTechnology
Medium· Today

Claude Has 3 Modes: Chat, Co‑Work, and Code — Most People Only Use One (And Wonder Why It Feels Slow) | by Divy Yadav | AI Engineering Simplified | May, 2026 | Medium

Deep dives into how real AI systems work and how to build systems that work in production.

Editor's pickPAYWALLHealthcare
Bloomberg· Today

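Apple Watch needs a shake-up as the rise of Whoop and Oura challenges its health apps - Power On

Competition intensifies in the health wearables market, and adapting to the AI era becomes a challenge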

Editor's pick
Bebeez· Today

Perplant raises €1 million to equip tractors with AI “eyes” to cut herbicide use and boost profits for farmers

Perplant, a Danish AgTech startup on a mission to democratise AI in agriculture by supporting sustainable farming with a cost-effective, plug & play and AI-based camera sensor, has raised €1 million in investment.  The funding round was supported by private angel investors and industry leaders with deep expertise in agriculture and retail. This capital injection […]

Editor's pickTransportation & Logistics
Guardian· Today

Delivery robots are spreading across LA. Residents ‘both pity and hate them’

A region known for its lack of walkability now has more obstacles for pedestrians to contend with. Robots have taken over Los Angeles. It’s not just the AI-generated videos that have caused angst in Hollywood. Our streets are full of driverless Waymo vehicles, covered in more sensors and gadgets than the Batmobile. And our walkways are home to fleets of boxes on wheels, hurrying past pedestrians and navigating outdoor bar-hoppers as the robots deliver smoothies and keto-friendly salads.

AI Organisational Change1 articles
Editor's pickProfessional Services
Arxiv· Today

Redrawing the AI Map: A Theory of Accountability Boundaries in Agentic Ecosystems

arXiv:2605.23179v1 Announce Type: new Abstract: Agentic AI orchestrators reduce the interface and assembly costs of composing information systems capabilities across organizational boundaries, seemingly accelerating modularization and organizational disaggregation. Yet AI-enabled capabilities whose outputs require evidence, review, signoff, or assignable responsibility may retain integrated accountability boundaries even when their technical interfaces become modular. We develop a capability-level theory of accountability-boundary placement in agentic ecosystems. We introduce accountability assets: complementary assets that make AI-supported outputs legitimate, auditable, reviewable, and assignable to a responsible party. We argue that verification cost and responsibility transferability determine whether the execution and accountability boundaries can move together. The theory identifies three boundary strategies: component, integrated, and dual-track. It also introduces rule debt, the governance burden that accrues when organizational decision rules migrate from formal information systems into ungoverned agentic execution environments. Integrating digital innovation, transaction cost, complementary-assets, digital platform governance, and IS control perspectives, we develop seven propositions linking agentic assembly-cost reductions, accountability assets, appropriability, orchestrator intent capture, and boundary misconfiguration to boundary strategy, value appropriation, and rule debt. The theory explains when digital modularization extends to organizational disaggregation and when accountability keeps capabilities integrated. Structured illustrations across document processing, legal services, audit, clinical decision support, and procurement discipline the boundary logic.

Geopolitics, Policy & Governance

11 articles
AI Geopolitics1 articles
Editor's pickDefense & National Security
Arxiv· Today

Strategic Coercion Within Alliances: The Greenland Sovereignty Game as an AI Stress Test

arXiv:2605.22841v1 Announce Type: cross Abstract: What happens when the strongest alliance member pressures a weaker member over territory and strategic control? We examine the Greenland sovereignty crisis as a stress test for LLM geopolitics, centered on the 2019-2026 U.S. push to acquire Greenland from the Kingdom of Denmark. The crisis nests two collective-action problems: Arctic strategic control and whether NATO can enforce alliance norms against the dominant member. We develop three games (asymmetric coercion; a NATO assurance game with a critical-mass tipping point; a triadic extensive-form game with social preferences) and test them with a multi-agent simulation in which eight frontier LLMs play six geopolitical roles (United States, Denmark, Greenland, NATO, Russia, Canada) across 3,604 completed games and 108,120 action observations. Using inverse game theory, we recover each model's structural utility parameters (alpha, beta, gamma, delta, eta) for material self-interest, reciprocity, inequality aversion, norm respect, and commitment consistency. Three findings stand out. First, all eight models become more escalatory under coercion framing (four-action escalation rises from 10.7% to 28.6%). Second, Chinese-origin models show systematically different power-weight profiles from Western-origin models when playing the U.S. role. Third, peaceful US acquisition emerges in only 1.9% of clean games and only 3 of 8 frontier models ever achieve it, most prominently DeepSeek V3.2, which executes a stable five-round playbook through the metropole. Prompts emphasizing jus cogens and self-determination reduce escalation back near baseline in the English-only confirmatory sample; multilingual contrasts are reported as exploratory sensitivity checks. We position this as a structural benchmark for LLM geopolitical behavior, complementing action-frequency benchmarks.

AI Policy & Regulation6 articles
Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.
