Mon 25 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Generative AI Reshapes Labor, McKinsey Rethinks Pricing, and ECB Calls for Urgent Fixes
TL;DR Generative AI is reorganizing labor demand, challenging traditional work structures. McKinsey and peers are rethinking pricing models as clients demand value-based fees. The ECB has summoned banks to address risks exposed by AI models, highlighting systemic vulnerabilities. Huawei claims a chip breakthrough, potentially closing the gap with TSMC. AI spending pressures enterprises to shorten SaaS contracts and seek better pricing protections.
The stories that matter most
Selected and contextualised by the Best Practice AI team
ECB summons banks to urge them to fix flaws exposed by latest AI models
Supervisor to stress seriousness of risks to financial system at hastily arranged meeting
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
arXiv:2605.22883v1 Announce Type: new Abstract: Current AI energy benchmarks measure consumption at the granularity of a single model invocation or training run. For classical single-turn workloads this unit remains coherent. For agentic systems - where a single user goal may trigger multi-step orchestration, tool calls, retries, and failure-recovery cycles - the invocation count is an implementation artifact rather than a task property, and inference-level normalization misrepresents the energy cost of goal completion. We present A-LEMS (Agentic LLM Energy Measurement System), a cross-layer measurement framework that redefines the unit of AI energy accounting from energy per inference to Energy per Successful Goal (EpG). EpG aggregates total workflow energy across all execution attempts, including failures and retries, normalized by successfully completed goals. A-LEMS formalizes energy attribution through a temporal boundary model, a five-layer observation pipeline mapping RAPL signals to workflow-level energy, and a reproducibility protocol binding every measurement to hardware and runtime configuration. Building on EpG, we define the Orchestration Overhead Index (OOI), isolating the energy cost of orchestration relative to linear execution under identical task criteria. Across five reasoning and three tool-augmented task families, agentic workflows consume 4.33x higher mean energy per successful goal than linear baselines (888.1 J vs 205.3 J). This overhead is driven by orchestration structure, not inference compute. For tool-augmented tasks, OOI inverts below 1.0x: agentic execution is cheaper than linear, confirming the metric captures orchestration structure rather than a fixed upward bias. These findings establish that energy-per-inference is insufficient for agentic AI. EpG and OOI provide the measurement foundation for accurate benchmarking, where orchestration structure is the primary determinant of energy cost.
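The EpG normalisation described above, and an OOI built on it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not A-LEMS code. The `Attempt` record and its field names are hypothetical. Treating OOI as a plain ratio of agentic EpG to linear EpG is also an assumption, chosen because it matches the "4.33x" and "below 1.0x" phrasing. The last line feeds in the mean figures quoted in the abstract (888.1 J vs 205.3 J).

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One execution attempt logged against a user goal (hypothetical record)."""
    goal_id: str
    energy_j: float  # measured energy for this attempt, in joules
    succeeded: bool


def energy_per_successful_goal(attempts: list[Attempt]) -> float:
    """Total energy over all attempts (failures and retries included),
    divided by the number of distinct goals that eventually succeeded."""
    total_energy = sum(a.energy_j for a in attempts)
    completed = {a.goal_id for a in attempts if a.succeeded}
    if not completed:
        return float("inf")  # energy was spent but no goal was achieved
    return total_energy / len(completed)


def orchestration_overhead_index(agentic_epg: float, linear_epg: float) -> float:
    """Assumed form: ratio of agentic EpG to linear EpG on the same tasks."""
    return agentic_epg / linear_epg


if __name__ == "__main__":
    log = [
        Attempt("g1", 100.0, False),  # first try fails
        Attempt("g1", 150.0, True),   # retry succeeds
        Attempt("g2", 200.0, True),
        Attempt("g3", 50.0, False),   # never completed, energy still counted
    ]
    print(energy_per_successful_goal(log))                       # 250.0
    print(round(orchestration_overhead_index(888.1, 205.3), 2))  # 4.33
```

Failed attempts add to the numerator but not the denominator. As a result, a workflow with more retries shows a higher EpG even when each individual inference costs the same. The authors argue that per-inference metrics hide exactly this effect.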
Redrawing the AI Map: A Theory of Accountability Boundaries in Agentic Ecosystems
arXiv:2605.23179v1 Announce Type: new Abstract: Agentic AI orchestrators reduce the interface and assembly costs of composing information systems capabilities across organizational boundaries, seemingly accelerating modularization and organizational disaggregation. Yet AI-enabled capabilities whose outputs require evidence, review, signoff, or assignable responsibility may retain integrated accountability boundaries even when their technical interfaces become modular. We develop a capability-level theory of accountability-boundary placement in agentic ecosystems. We introduce accountability assets: complementary assets that make AI-supported outputs legitimate, auditable, reviewable, and assignable to a responsible party. We argue that verification cost and responsibility transferability determine whether the execution and accountability boundaries can move together. The theory identifies three boundary strategies: component, integrated, and dual-track. It also introduces rule debt, the governance burden that accrues when organizational decision rules migrate from formal information systems into ungoverned agentic execution environments. Integrating digital innovation, transaction cost, complementary-assets, digital platform governance, and IS control perspectives, we develop seven propositions linking agentic assembly-cost reductions, accountability assets, appropriability, orchestrator intent capture, and boundary misconfiguration to boundary strategy, value appropriation, and rule debt. The theory explains when digital modularization extends to organizational disaggregation and when accountability keeps capabilities integrated. Structured illustrations across document processing, legal services, audit, clinical decision support, and procurement discipline the boundary logic.
The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems
arXiv:2605.23024v1 Announce Type: new Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.
AI spending forces enterprises to shorten SaaS deals and demand new pricing protections
Rising enterprise investment in AI tools is prompting customers to compress traditional software-as-a-service contracts and extract stronger commercial protections, executives and reporting said. Over the past several months, buyers in the US and global markets moved to shorten multi-year ...
AI Expands From Multibillion-Dollar Enterprises to Main Street
Artificial-intelligence agents scraped a bakery’s spreadsheets to help manage its growth.
Economics & Markets
Sakura Internet Eyes More Spending to Meet Japan’s AI Demand
Sakura Internet Inc.’s chief said the company may need to hike its capital spending by nearly seven times its initial plan to keep up with artificial intelligence demand in Japan.
Leveraging Large Language Models for Sentiment Analysis: Multi-Modal Analysis of Decentraland's MANA Token
arXiv:2605.20192v1 Announce Type: cross Abstract: Decentraland, a decentralized virtual reality platform operating within the expanding Metaverse ecosystem, utilizes its native MANA token to facilitate virtual asset transactions and governance. This study investigates the integration of Discord community sentiment with multi-modal financial data to enhance cryptocurrency price prediction within virtual world economies. We address: (1) identifying sentiment patterns within Decentraland's Discord community, and (2) evaluating the impact of multi-modal features on token return forecasting. Using a BERT-based large language model for sentiment analysis, we develop two LSTM architectures: a baseline incorporating historical prices and a multi-modal variant integrating sentiment scores, trading volume, and market capitalization. Results indicate predominantly neutral community sentiment with a positive skew. The multi-modal model significantly outperforms the price-only baseline in prediction accuracy. These findings demonstrate the predictive value of community-derived signals for virtual economy forecasting and establish a foundation for future research at the intersection of immersive virtual environments, natural language processing, and cryptocurrency market analysis.
How AI Disruption Fears and Cloud Optimism Will Shape Atlassian’s (TEAM) Investment Narrative - Simply Wall St News
In recent weeks, Atlassian has faced mixed headlines as a laid-off engineer’s detailed YouTube walkthrough of its products stoked competitive and AI-disruption worries, while management highlighted AI-driven cloud growth and restructuring efforts aimed at funding further investment in artificial ...
How Visteon’s Dividend And AI Cockpit Wins At Visteon (VC) Have Changed Its Investment Story
Earlier this week, Visteon Corporation’s board declared a regular quarterly dividend of US$0.375 per share, payable on June 15, 2026 to shareholders of record as of June 1, 2026. This dividend decision comes as Visteon balances recent macro-driven volatility with solid first-quarter results, ...
On Call with Jonathan Yip, Head of Innovation Banking, Asia at HSBC on the Future of Innovation Capital and CFOs - Insignia Business Review
“The demand on credit is certainly broadening. Whether you’re building applications or foundational models, the hardware and chips that power those models, or data centers and energy, each innovator and each investor working on that opportunity requires a different form of capital.” ...
Huawei's 'Tau Law' Sparks Surge Across Semiconductor Supply Chain! SMIC and Hua Hong Semiconductor See Rare 20% Gains in A-Share Market—Which Segments Stand to Benefit?
On May 25, the STAR Market 50 Index surged nearly 6% in the afternoon session. Semiconductor-related sectors, including chip semiconductors, advanced packaging, and memory chips, rose sharply. SMIC jumped by 20% to hit its daily trading limit in the final minutes of trading, reaching a record-high ...
In the era of artificial intelligence (AI), global mega institutional investors are changing their i.. - MK
In the era of artificial intelligence (AI), global mega institutional investors are changing their investment strategies. Institutional investors are moving away from the role of investors (LPs) that o..
Legal & General Group Plc Lowers Stake in Duke Energy Corporation $DUK - Ticker Report
Positive Sentiment: Goldman Sachs ... and the company’s expansion plans. Why Duke Energy (DUK) Is Becoming a Data Center Power Demand Play · Positive Sentiment: Utility-sector demand tailwinds linked to AI infrastructure are also supporting sentiment toward Duke Energy and ...
5 big analyst AI moves: ASML, Dell and Nokia flagged as top picks By Investing.com
The bullish case rests on three drivers. First, UBS pushes back against market fears that ASML could become a bottleneck constraining semiconductor supply, arguing those concerns are overstated.
AI Chip Rally: NVDA, MU, QCOM Draw Bullish Upgrades
UBS sees 30% upside on Nvidia, four banks back Micron, and Qualcomm jumps 11.6% as AI semiconductor optimism broadens across Wall Street.
GENSTRAT: Toward a Science of Strategic Reasoning in Large Language Models
arXiv:2605.23238v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as economic agents in marketplaces, auctions, and bidding settings. Anticipating their behavior in any specific deployment is hard. Existing strategic-reasoning benchmarks evaluate models on fixed canonical games. These benchmarks may saturate as the frontier improves, and they do not allow evaluators to generalize with confidence from benchmark performance to the varied and messy strategic environments that actual deployments involve. We introduce GENSTRAT, which uses procedurally generated strategic environments to address these challenges. Concretely, we generate a distribution of two-player zero-sum imperfect-information card games. The generator can draw fresh games on demand, allowing for evergreen evaluation and resistance to contamination. We pair the game distribution with a capability-profile methodology that decomposes model competence across six axes (state space, temporal depth, information sensitivity, opponent modeling, risk, and brittleness). We also introduce a jaggedness measure of within-distribution smoothness that detects when a model's advantage jumps unpredictably between strategically similar games. We sample 50 benchmark games from a 2,000-game generated pool and evaluate nine frontier and open-weight LLMs in a head-to-head tournament with over 36,000 matches. Newer frontier-tier models score higher on average. Beyond that average, models with near-identical overall strength show qualitatively different capability profiles, and two of the top three leaderboard models (gpt-5 and claude) are noticeably more locally volatile than the third (gemini-3.1-pro), despite being close in overall strength. Together, the capability profile and the jaggedness measure give a deployment-relevant diagnostic that the overall ranking alone cannot provide.
‘AI washing’: firms are scrambling to rebrand themselves as tech-focused
PR executives say UK companies are forcing them to present ordinary automation as artificial intelligence. UK companies are performing “yoga-level” stretches to describe themselves as AI specialists in an attempt to capitalise on the buzz around the technology, public relations firms have said. Weary communications executives tasked with securing media coverage for brands have complained that bosses in low-tech industries, or running businesses that use automation but not generative AI, are increasingly demanding they are pitched to journalists as artificial intelligence companies.
Google is cannibalizing the web to feed AI
Google Search used to direct users to web sites; AI Mode will keep them in Google's garden
Google has seriously leaned into AI enshittification lately
Could the Chocolate Factory's mission to reshape the web backfire?
Google tells US appeals court search monopoly ruling has ‘fundamental errors’
Google argued that a lower court's ruling on remedies in an internet search monopolization case failed to distinguish between conduct harming competitors and conduct harming competition.
In-Depth Analysis: Automatic Data Processing Versus Competitors In Professional Services Industry - Autom - Benzinga
In this article, we will undertake an in-depth industry comparison, assessing Automatic Data Processing (NASDAQ:ADP) alongside its primary competitors in the Professional Services industry. By meticulously examining crucial financial indicators, market positioning, and growth potential, we aim to ...
📈 Why AI bills rise as costs fall
Boo 👻 it's the ghost token
AI Coding Tools: A Costly Misstep? Insights from Microsoft and Uber, ETEnterpriseai
Explore the rising costs of AI coding tools as Microsoft and Uber shift strategies amid budget constraints. Understand the implications of token-based pricing and the future of enterprise AI adoption.
Microsoft reins in Claude Code usage as soaring AI costs expose cracks in enterprise adoption - Storyboard18
Microsoft is reportedly scaling back direct access to Anthropic’s Claude Code for employees and steering developers toward GitHub Copilot CLI, highlighting growing concerns over the spiralling costs of enterprise AI deployment.
Inside Europe's Tech Startup Surge That's Being Boosted by AI - Business Insider
European startups like Legora and Lovable are challenging US tech dominance, driven by AI advancements and better access to VC funding.
Irish SME venture capital investments fall by more than half in first quarter - TechCentral.ie
Irish technology SMEs raised €221.7 million in venture capital in the first quarter of 2026, a fall of 58% compared to the same period last year, according to the Irish Venture Capital Association VenturePulse survey published today in association with William Fry.
Funding for Irish tech startups slumps as AI hoovers up investors’ resources | Irish Independent
Tech funding in Ireland has suffered one of its steepest-ever declines as AI hoovers up investors’ cash to the detriment of startups here.
Irish AI health-tech xWave to create 30 jobs amid €3m funding drive
xWave Technologies has won contracts with more than 20 NHS Trusts in the UK for its diagnostic decision-making platform.
Why Venture Capital as We Knew It Died—and What Replaced It in 2026
Venture capital is transforming. In 2026, sovereignty, DeepTech, and VC-as-a-Service have replaced the old growth-at-all-costs model. Here's what's next.
Tequipy, founded by Revolut’s former IT chief, raises over €3 million to automate global device logistics
Tequipy, a Polish-British startup that ships, services, and retrieves employee IT devices, has raised over €3 million in funding to expand its platform beyond hardware into software and security operations. The round was led by Smedvig Ventures, with participation from Manta Ray and Unfold.vc. “I’ve seen ambitious, talented IT specialists, who should have been building […]
Labor, Society & Culture
One Job That Is Growing in the A.I. Era? Cybersecurity Experts.
Demand for security engineers has surged as artificial intelligence generates a glut of new code and models like Anthropic’s Mythos create new concerns.
Defining AI Fatigue in Academic Contexts: Dimensions, Indicators, and a Stage-Based Model Using Grounded Theory
arXiv:2605.23123v1 Announce Type: new Abstract: The integration of AI tools in academic settings has introduced a distinct form of strain that existing frameworks like technostress and digital fatigue have not yet fully addressed. This study develops a conceptual model and identifies the dimensions that define AI fatigue as a form of strain arising from sustained academic use of AI tools. Using grounded theory analysis of open-ended responses from 1,054 university students across three universities in the Philippines, the study examined the cognitive, motivational, emotional, physical, and attentional pressures students experienced during AI-supported academic work. Analysis produced five dimensions of AI fatigue, namely Cognitive Overload, Motivational Disengagement, Moral Unease, Physical Strain, and Attentional Drift, each consisting of two indicators grounded in participant accounts. The findings also yielded the AI Fatigue Model, a stage-based framework that explains how these pressures accumulate and reinforce one another across repeated AI interaction in academic tasks. These contributions establish a conceptual and exploratory foundation for AI fatigue as a distinct construct and provide a basis for future instrument validation, scale development, and cross-contextual inquiry in academic settings where AI now mediates student learning.
Reuters AI News | Latest Headlines and Developments | Reuters
Fears are growing among workers as banks offer more frank assessments about how AI could replace their jobs.
Linus Torvalds to ‘start being more hardnosed’ about ‘pointless pull requests’ – some of which come from AIs
Warns large release candidates ‘are *not* conducive to long-term stability’
Vast majority of executives expect AI layoffs soon, survey says | Mashable
A new corporate survey shows execs are leaning into AI despite uncertainty, and they're prepared to reduce their workforces to do it.
Engagement-Optimized Care: When LLMs become Mental Health Infrastructure
arXiv:2605.23787v1 Announce Type: new Abstract: General-purpose LLMs are increasingly functioning as mental health infrastructure due to gaps in care left by provider shortages, inadequate insurance coverage, social isolation, and stigma around formal help-seeking. This shift poses a distinct problem for AI ethics: systems neither designed nor governed as care technologies are being used as such, while their dominant design incentives optimize for engagement rather than user well-being. We present findings from a qualitative, longitudinal study with 18 US-based participants who use general-purpose LLMs for socioemotional support and participated in one or more of our study phases, including initial interviews, a four-week diary study, focus groups, and exit interviews. Participants turned to LLMs because other forms of support were unavailable, unaffordable, socially costly, or inadequate. As they continued to use these systems, design features such as anthropomorphic cues, default validation, persistent responsiveness, and weak disengagement mechanisms shaped their ongoing reliance. Participants described meaningful support alongside dependency, epistemic distortion through one-sided validation, privacy expectations without corresponding legal protection, and continued use despite awareness of these risks. We argue these dynamics reflect a structurally unfair tradeoff: users accept risks because support is otherwise absent, while available systems are optimized to deepen engagement and lack care-based accountability. The paper makes three contributions: it traces the arc through which LLMs become care infrastructure and identifies distinct ethical tensions at each stage, shifts analysis from turn-based exchanges to longitudinal trajectories of use, and argues that accountability belongs at the design and incentive conditions through which these systems become care infrastructure rather than at the output or crisis-response layer.
Whose Good, Whose Place? The Moral Geography of Agentic AI for Social Good
arXiv:2605.22995v1 Announce Type: new Abstract: Agentic AI systems are increasingly proposed for social-good domains, often invoking the United Nations Sustainable Development Goals (SDGs) as a vocabulary of global benefit. Yet claims of social good do not establish accountability to the communities a system claims to serve. We present a structured survey of 112 papers on agentic AI for social good published between 2015 and 2026. We find a moral-geographic asymmetry: papers are least likely to specify geographic context in precisely the domains where local political, legal, and cultural context matters most. Across the corpus, 82 of 112 papers (73%) specify no geographic context. Papers aligned with health or physical/ecological SDGs specify geography 37-40% of the time, while papers aligned with institutional and social-policy SDGs do so only 13% of the time. SDG 16 (peace, justice, and strong institutions) is both the most-covered goal in the corpus and the one with the lowest geographic-specification rate. We interpret this as moral abstraction: agentic AI for social good often treats institutional good as universal in ways it does not treat health or ecological good. A second finding compounds this: only 28 of 112 papers (25%) report any real-world deployment or small-scale test. We identify five accountability gaps and propose a minimal reporting standard for more context-specific, participatory, and accountable agentic AI for social good.
Pope elevates AI ethics to a religious imperative with first encyclical - The Washington Post
In "Magnifica Humanitas," he fires a broadside against AI companies, warning of the technology's dangers in the same way Pope Francis did about climate change.
Mediative Fuzzy Logic: From Type-1 Foundations to Type-2, Type-3 and Quantum Extensions
arXiv:2605.22900v1 Announce Type: new Abstract: Mediative Fuzzy Logic was conceived as a practical scheme for reconciling hesitant or conflicting assessments in fuzzy control and decision-making. However, its logical and semantic foundations remain underdeveloped, especially beyond operational type-1 settings. This article develops a unified account of the type-1 core together with interval type-2, granular type-3, and quantum extensions. We characterize the mediative operator as a convex aggregation controlled by hesitation and contradiction, model mediative truth values as independent truth-falsity pairs in a continuous bilattice-like structure, and introduce a propositional system extending a standard t-norm-based fuzzy logic with a mediative connective. We establish soundness, paraconsistency, and conservativity over the underlying fuzzy base for formulas without mediation, and formulate coherent semantic extensions to interval type-2 truth values, granule-indexed local evaluations, and effects and density operators on Hilbert spaces. An autonomous-braking sensor-fusion example illustrates how the framework supports transparent, conservative, and safety-first decisions under incomplete, heterogeneous, and mildly contradictory evidence. Under suitable assumptions, the higher-level formulations reduce to the type-1 case, clarifying coherence across levels and reliably supporting future work in intelligent decision systems.
Technology & Infrastructure
Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems
arXiv:2605.23109v1 Announce Type: new Abstract: AI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of full coverage that testing alone cannot provide. Distributed systems are a prime example: properties such as consistency between reads and writes must hold under every possible interleaving of events. Mechanized formal verification can guarantee such correctness, but typically demands months to years of expert effort. As evidence, even SOTA coding agents (Codex with GPT-5.4 and Claude Code with Opus 4.6) succeed on only 2/7 distributed key-value-store specifications. In this paper, we present the first effective approach to addressing this gap, Inductive Deductive Synthesis (IDS), which jointly and incrementally synthesizes implementation and proof, and learns from failed attempts to systematically try promising strategies. Built as an agentic LLM system, IDS achieves 7/7 in about 6.8 hours and $106 per spec on average, roughly 200x faster than expert effort and 17% cheaper than SOTA agents. IDS further incorporates performance feedback into the same loop, yielding implementations up to 3x faster than published verified systems.
This week's Claude OS update: The Agentic Expansion Cascade - FourWeekMBA
While business leaders debate AI ... and AI strategy — timelines, Anthropic’s latest Claude OS update signals something more immediate: the systematic transformation of how enterprises will structure operations within months, not years. The infrastructure for autonomous business processes isn’t coming—it’s here. The Business Engineer’s latest analysis introduces the “Agentic Expansion ...
Foundation Protocol: A Coordination Layer for Agentic Society
arXiv:2605.23218v1 Announce Type: new Abstract: Autonomous agents are moving from tools into a layer of social infrastructure: they browse, purchase, deploy software, manage systems, and increasingly interact with one another. As these systems scale, the bottleneck shifts away from raw model capability toward coordination. Agents need to form reliable relationships, organize multi-agent work, exchange value, support an AI economy, and stay safe and accountable under real-world oversight. This paper introduces the Foundation Protocol (FP), a graph-first coordination layer for an emerging human-AI society. FP unifies heterogeneous entities, including agents, tools, resources, humans, institutions, and organizations, and supports native multi-party organization and event-based collaboration. It also provides economic primitives for metering, receipts, and settlement, and treats policy, provenance, and audit as first-class concerns. FP is designed to wrap and bridge existing protocols rather than replace them, enabling incremental adoption while reducing integration and governance overhead. The aim is to keep autonomous agency composable while keeping accountability non-negotiable, so that coordination itself can become shared infrastructure for a human-AI society that is open, pluralistic, and governable.
BOHM: Zero-Cost Hierarchical Attribution for Compound AI Systems
arXiv:2605.22866v1 Announce Type: new Abstract: Compound AI systems route tasks through hierarchies of specialised components. Attribution is dominated by Shapley-based methods (SHAP), which decompose a coalition value function into per-component marginal contributions and require evaluation of the system on arbitrary component subsets. That requirement fails for third-party APIs, opaque endpoints, and agentic orchestrators that concentrate routing on a few tools, leaving most coalitions un-evaluable from the deployed orchestrator. We introduce BOHM, which extracts a hierarchical attribution tree directly from the routing weights such systems already maintain: leaf attribution is the path product of root-to-leaf routing weights; level-k attribution is the induced distribution over depth-k nodes. The method has zero marginal cost, requires no access to component internals, and provides multi-resolution attribution at every level simultaneously, which flat methods cannot offer at any evaluation budget. BOHM and SHAP answer different questions and converge when the deployed router routes near-optimally. On 18 LLMs in a 3-level hierarchy over 880 LiveCodeBench problems, BOHM yields Kendall tau=0.928; SHAP reaches tau=0.980 at 9,000x more coalition evaluations per seed. On a 5-driver, 7-benchmark agentic study (35 cells, complete coverage), drivers concentrate routing on a single tool (top-share median 0.65), and cell-level tau(BOHM,SHAP) is predicted by whether the driver's top pick is the empirically best tool (mean +0.22 vs ~+0.01). On a US Census hierarchy (475 leaves, 4 levels), BOHM recovers ground-truth rankings at every level (tau up to 0.722). BOHM satisfies efficiency, monotonicity, symmetry, and weak suppression but not Shapley's additivity. It is best understood as a complementary primitive: a multi-resolution decomposition computable wherever routing state exists, whose disagreement with Shapley is itself diagnostic.
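The path-product rule described above can be sketched with a simple tree walk. This is a minimal illustration, not the BOHM authors' implementation. It assumes the router exposes its weights as a parent-to-children mapping whose weights sum to 1 at each node. The `router` dictionary and its node names are hypothetical. Leaf attribution is the product of the routing weights from the root to the leaf. The level-k attribution is the mass that reaches each depth-k node. No component is ever called, which is where the "zero marginal cost" comes from.

```python
from collections import defaultdict

# Hypothetical router state: each node maps to (child, routing_weight) pairs
# whose weights sum to 1. Leaves have no entry.
Tree = dict[str, list[tuple[str, float]]]


def hierarchical_attribution(tree: Tree, root: str) -> dict[int, dict[str, float]]:
    """Attribute mass to every node as the product of routing weights on its
    root-to-node path, grouped by depth. For a tree whose leaves all sit at the
    same depth, each level's values form a distribution summing to 1."""
    levels: dict[int, dict[str, float]] = defaultdict(dict)

    def walk(node: str, depth: int, mass: float) -> None:
        levels[depth][node] = levels[depth].get(node, 0.0) + mass
        for child, weight in tree.get(node, []):
            walk(child, depth + 1, mass * weight)

    walk(root, 0, 1.0)
    return dict(levels)


if __name__ == "__main__":
    router = {
        "root": [("code", 0.7), ("math", 0.3)],
        "code": [("model_a", 0.5), ("model_b", 0.5)],
        "math": [("model_c", 1.0)],
    }
    attribution = hierarchical_attribution(router, "root")
    print(attribution[1])  # {'code': 0.7, 'math': 0.3}
    print(attribution[2])  # {'model_a': 0.35, 'model_b': 0.35, 'model_c': 0.3}
```

Every level comes out of a single pass, which illustrates the "multi-resolution at every level simultaneously" property. The sketch does not reproduce the paper's Kendall-tau comparison against SHAP.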
RMA: an Agentic System for Research-Level Mathematical Problems
arXiv:2605.22875v1 Announce Type: new Abstract: We present Research Math Agents (RMA), an agentic framework for automated reasoning on research-level mathematical problems. Unlike prior studies centered on competition mathematics or formal theorem proving, RMA targets research-level mathematical problems that require long-horizon reasoning, literature grounding, and iterative proof refinement. RMA decomposes research-level proof solving into specialized modules for problem analysis, literature search and understanding, fair comparison, knowledge-bank construction, and proof verification, all coordinated by initializer, proposer, and verifier agents through a shared structured memory. Within this unified framework, these agents operate in a multi-role, multi-round workflow, collaboratively generating, refining, and verifying candidate proofs through iterative feedback. We evaluate RMA on the First Proof benchmark, which consists of ten research-level problems contributed by expert mathematicians across diverse domains. Through comprehensive expert evaluation, RMA outperforms strong baselines on the First Proof benchmark, including GPT-5.2R and Aletheia, solving eight out of ten research problems and producing more logically sound and readable proofs. Our comprehensive ablation studies further show that performance gains arise from the interaction of structured reasoning modules, iterative refinement, and verifier-based feedback, rather than any single component. Our solutions and implementations will be made publicly available upon acceptance.
EVE-Agent: Evidence-Verifiable Self-Evolving Agents
arXiv:2605.22905v1 Announce Type: new Abstract: Self-evolving agents should not train on examples they cannot justify. Data-free self-evolving search agents offer a scalable route to systems that generate their own questions, answer them, and improve from their own feedback without human annotations. Yet, without verifiable evidence, this loop can reward fluent but unsupported examples, turning the self-generated curriculum into an opaque and potentially unreliable training signal. We argue that evidence verifiability is a prerequisite for trustworthy self-evolution in search agents: each generated instance should include not only an answer but also a source-grounded span whose contribution to that answer can be measured. We introduce EVE-Agent, an Evidence-Verifiable Self-Evolving Agent that operationalizes this principle through a modification to the proposer-solver framework. The proposer generates a question, an answer, and a verbatim evidence span. An evidence verifier then rewards the span according to the marginal accuracy gain when the evidence is provided. This produces a training signal that favors evidence that genuinely helps answer the question, without requiring oracle answers, human labels, or external annotations. EVE-Agent leaves the backbone model, retriever, search tool, and optimization framework unchanged. Experiments show that EVE-Agent substantially improves evidence-grounded correctness over prior self-evolving search agents. The resulting curriculum is not merely self-generated but auditable by construction: each training example carries an inspectable source span that explains why it should be trusted.
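The verifier's reward, the marginal accuracy gain from supplying the evidence span, can be sketched as the difference between two sampled accuracies. This is a minimal illustration under stated assumptions. Accuracy is measured as agreement with the proposer's own answer, since no oracle answers are used. The solver signature and the toy solver are hypothetical, and the sample count is arbitrary.

```python
import random
from typing import Callable, Optional

# Hypothetical solver signature: (question, evidence or None, rng) -> answer string.
Solver = Callable[[str, Optional[str], random.Random], str]


def sampled_accuracy(solver: Solver, question: str, answer: str,
                     evidence: Optional[str], n: int, rng: random.Random) -> float:
    """Fraction of n sampled solver attempts that match the proposer's answer."""
    hits = sum(solver(question, evidence, rng) == answer for _ in range(n))
    return hits / n


def evidence_reward(solver: Solver, question: str, answer: str, evidence: str,
                    n: int = 32, seed: int = 0) -> float:
    """Marginal accuracy gain from supplying the evidence span (may be negative)."""
    rng = random.Random(seed)
    with_evidence = sampled_accuracy(solver, question, answer, evidence, n, rng)
    without_evidence = sampled_accuracy(solver, question, answer, None, n, rng)
    return with_evidence - without_evidence


def toy_solver(question, evidence, rng):
    # Toy stand-in: correct 90% of the time with evidence, 30% without.
    p = 0.9 if evidence else 0.3
    return "Paris" if rng.random() < p else "Lyon"


reward = evidence_reward(toy_solver, "Capital of France?", "Paris",
                         "Paris is the capital of France.", n=1000)
print(reward)
```

A span that does not change the solver's behaviour earns roughly zero, and a misleading span earns a negative reward, which is the property the abstract relies on to favour evidence that genuinely helps.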
No captain, my captain: Navantia floats crewless warship
Spanish shipbuilder's 75-meter drone vessel comes with sensors, modular payloads, and no room for sailors
The Cognitive Kardashev Scale: Quantifying the Material Envelope of Civilisational Computation
arXiv:2605.22840v1 Announce Type: cross Abstract: How much thinking can a civilisation do? Kardashev's (1964) typology ranks civilisations by total power: planetary (Type I, ~10^16 W), stellar (Type II, ~10^26 W), galactic (Type III). This paper builds an analogous Cognitive Kardashev Scale: how much sustained AI-grade computation each tier could support. Four ingredients enter the calculation: total power P (watts), the share f of it devoted to cognition, the efficiency η at which energy becomes compute (operations per joule), and the brain's own processing rate C_brain as a reference unit. Anchoring on 2024-2026 hardware (El Capitan, NVIDIA Blackwell, Vera Rubin) gives η_2026 = 10^12 FLOP/J. Contemporary humanity sits at K ≈ 0.73, three-quarters of the way to Type I. At Type I and f = 1%, available compute is, within an order of magnitude, one personal AI's worth of cognition per human inhabitant; at Type II it is essentially incomprehensible. Three trajectories for frontier compute through 2035 are reported as conditional projections, not predictions. Whether the long-run binding constraint is energy or efficiency depends on engineering choices not yet made; the political economy of who has access may matter more than either.
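The abstract's calculation combines total power P, the cognitive share f, and the energy-to-compute efficiency (10^12 FLOP/J for 2026 hardware). This is a minimal sketch that assumes Sagan's logarithmic interpolation K = (log10 P - 6)/10. The abstract does not state that formula, but it reproduces the abstract's anchors (Type I at 10^16 W, Type II at 10^26 W) and puts roughly 2 × 10^13 W at K ≈ 0.73. The ~8 billion population figure is also an assumption. The brain-rate unit C_brain is left out because its value is not given here.

```python
import math

ETA_2026 = 1e12  # FLOP per joule, the abstract's 2024-2026 hardware anchor


def kardashev_k(power_watts: float) -> float:
    """Sagan's interpolation (an assumption here): K = (log10 P - 6) / 10, P in watts."""
    return (math.log10(power_watts) - 6) / 10


def cognitive_budget(power_watts: float, f: float, eta: float = ETA_2026) -> float:
    """Sustained compute in FLOP/s: power x share devoted to cognition x FLOP per joule."""
    return power_watts * f * eta


type_i = cognitive_budget(1e16, f=0.01)  # Type I power at f = 1%
per_person = type_i / 8e9                # assumed population of ~8 billion
print(f"K(2e13 W) = {kardashev_k(2e13):.2f}")
print(f"Type I budget: {type_i:.1e} FLOP/s, about {per_person:.1e} FLOP/s per person")
```

The per-person figure (about 10^16 FLOP/s) is what the abstract then compares against C_brain when it speaks of one personal AI per inhabitant "within an order of magnitude".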
Scotland’s ‘green datacentres’ policy ignores emissions impact of AI, analysis shows
Definition of green facilities made in 2022, before release of ChatGPT, says Action to Protect Rural Scotland. A Scottish government policy designed to encourage datacentres to build in Scotland could lead to a massive volume of carbon emissions being ignored, according to an analysis by a Scottish charity. “Green datacentres” are at the heart of Scotland’s ambitions to develop economically. Enshrined in national policy, they are part of a larger, UK-wide effort to attract big AI investment to Scotland.
SolarChain: Bridging Physical Law, Verifiable Trust, and Sustainable Markets for Urban Energy Resilience
arXiv:2605.23162v1 Announce Type: new Abstract: Urban decarbonization requires scaling rooftop solar across millions of fragmented producers, yet cities face a fundamental tension: energy data is easily manipulated, and economic incentives often reward speculation rather than actual infrastructure deployment. We present SolarChain, a platform that resolves both problems by anchoring digital accountability to the thermodynamic limits of solar energy conversion. Using real-time meteorological data, geospatial coordinates, and first-principles calculations of solar yield, the system establishes a hard physical boundary for every panel's maximum possible output; any reported generation exceeding this limit is automatically rejected before entering the shared ledger. This trustless verification enables a peer-to-peer marketplace with programmatic reward structures that continuously reinvest value into equipment maintenance and market liquidity, preventing the speculative hoarding that typically destabilizes blockchain-based marketplaces. When electricity is consumed, the corresponding digital credits are permanently retired in direct proportion to physical energy dissipation, creating an auditable one-to-one mapping between urban consumption and carbon accounting. Deployed across heterogeneous city nodes, the prototype demonstrates resilience against data injection attacks while lowering capital barriers for community-level solar expansion. Beyond energy, the framework offers a general model for coordinating economic activity with physical law in any domain where distributed infrastructure demands both data integrity and sustainable investment. We release the data and code as open-access on GitHub.
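SolarChain's core check rejects any reported generation above a physically possible ceiling. The abstract does not give the paper's first-principles yield model. This minimal sketch therefore assumes the simplest bound, irradiance × panel area × a maximum conversion efficiency × duration, and ignores temperature, tilt, and shading. The panel figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    area_m2: float         # panel surface area in square metres
    max_efficiency: float  # assumed upper bound on conversion efficiency (0-1)


def max_energy_wh(panel: Panel, irradiance_w_m2: float, hours: float) -> float:
    """Upper bound on energy (Wh) the panel could produce under the given irradiance."""
    return irradiance_w_m2 * panel.area_m2 * panel.max_efficiency * hours


def accept_reading(panel: Panel, reported_wh: float, irradiance_w_m2: float,
                   hours: float, tolerance: float = 0.0) -> bool:
    """Reject any report that is negative or exceeds the physical ceiling."""
    ceiling = max_energy_wh(panel, irradiance_w_m2, hours) * (1 + tolerance)
    return 0.0 <= reported_wh <= ceiling


panel = Panel(area_m2=1.6, max_efficiency=0.25)
print(accept_reading(panel, 350.0, irradiance_w_m2=900.0, hours=1.0))  # True: ceiling is ~360 Wh
print(accept_reading(panel, 500.0, irradiance_w_m2=900.0, hours=1.0))  # False: exceeds the ceiling
```

Only readings that pass this gate would be written to the shared ledger. A `tolerance` above zero is a design choice for absorbing sensor noise at the cost of a slightly looser bound.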
OpenAI vs Utilities: The Battle for AI's Physical Infrastructure - FourWeekMBA
While everyone debates AI chatbots, the real business model war is happening in physical infrastructure. Utility companies are quietly positioning themselves as the new AI kingmakers through massive data center acquisitions, ...
Tech Forum 2026: AI data centers turn to on-site power amid grid constraints
DIGITIMES analyst Sabrina Yu warned that artificial intelligence data centers face four major energy challenges — rising GPU thermal design power, a new high-voltage direct current architecture, persistent grid bottlenecks, and intensifying sustainability and carbon-emissions pressure on ...
India's AI Needs Power Grid, Supply Chain: Report
A new report says India needs a resilient power grid, competitive semiconductor market, and computing supply chain to achieve its AI deployment ambitions.
Schneider Electric sees India data center business outpacing core growth on AI boom - CNBC TV18
India is emerging as both a consumption ... with demand coming from hyperscalers, colocation operators, and enterprises seeking integrated infrastructure and services, she added. Schneider Electric supplies critical data center infrastructure, including UPS systems, switchgear, power distribution units, precision cooling, and energy management software, positioning it as a key vendor as AI workloads ...
AI server boom squeezes Samsung Electro-Mechanics' component supply
Samsung Electro-Mechanics is emerging as another beneficiary of the AI data center buildout, as demand for high-end capacitors and package substrates pushes parts of its component business closer to full capacity.
The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems
arXiv:2605.23024v1 Announce Type: new Abstract: Large language models now write software, draft legal documents, and produce clinical notes, yet fundamental limits, from Turing and Arrow to the No Free Lunch theorems, shape what computation can do. This thesis turns such impossibility results from curiosities into design rules. Its flagship result proves an accuracy ceiling set by architecture alone: past a critical reasoning depth, no amount of training moves it, at any adapter rank, sample size, or loss function. Computable before deployment from layer count and embedding width, this Deterministic Horizon is measured between nineteen and thirty-one across twelve transformer architectures, and fine-tuning on optimal-length traces recovers under four percentage points. The mechanism is a capacity invariant of the residual stream, and an information-theoretic conversion yields super-exponential accuracy decay past the horizon. An unconditional circuit-complexity lower bound for modular exponentiation against constant-depth prime-modulus circuits complements this result. The same argument recasts across subfields: preference learning under any misspecified model jumps discontinuously in sample complexity; multi-stage retrieval pipelines require at least as many independent metrics as stages; standard truthful auctions fail for agents with prompt-dependent valuations; and zero-knowledge verification of neural inference pays a measured overhead of one hundred ten to one hundred ninety times per non-linear activation. Together these form a catalogue of sixteen specifications, each pairing a computable boundary, a quantified violation cost, and a constructive design rule: two compositions are proved, one pairing is an honest obstruction, and four remain open. The impossibility-specification methodology is offered for the generative research programme that trustworthy AI may need. Every fundamental limit of AI is also a design rule.
ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization
arXiv:2605.22885v1 Announce Type: new Abstract: Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve training data quality for neural provers. However, scalable proof optimization is hindered by heterogeneous and heuristically specified objectives, scarce data, and high training and inference costs. To overcome these challenges, we introduce ImProver 2, a neurosymbolic framework for automated proof optimization in Lean 4. ImProver 2 combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside lightweight informal abstractions. We further introduce a suite of metrics capturing structural proof properties. Using ImProver 2, we train a 7B-parameter model that outperforms orders-of-magnitude larger models within the same model family, and is competitive with mid-tier frontier models across metrics. We additionally demonstrate that our neurosymbolic scaffold significantly improves performance across both small and frontier models. We show that with proper scaffolding and training, small models can effectively restructure research-level proofs over complex and varied metrics, matching substantially larger systems and establishing proof optimization as a scalable, learnable task.
Benchmarking LLMs for Community Governance Simulation with Life-history Narratives
arXiv:2605.23783v1 Announce Type: new Abstract: Effective community governance hinges on understanding what specific residents think and need. Recent work has used large language models (LLMs) to simulate human respondents, offering a scalable, reproducible way to study human attitudes and behaviors at low cost. However, these studies typically prompt the model with just a few demographic variables (age, gender, income), simulating only general role types. This is insufficient for community governance, where decisions depend on the views of specific residents. We bridge this gap with an integrated research framework covering dataset, benchmark, algorithm, and system. The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains. The benchmark probes 18 mainstream LLMs across four prompting strategies and shows that adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline, but this gain comes with more input tokens per call from the longer prompts they require. The algorithm, curriculum-LoRA, is a parameter-efficient personalization framework that, by closing this fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost and Pareto-dominates every configuration tested. The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline. Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.
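The claim that curriculum-LoRA Pareto-dominates every configuration tested refers to a two-objective comparison: fidelity (higher is better) and per-call cost (lower is better). A minimal sketch of that dominance test follows. The configuration names and numbers are invented for illustration and are not the paper's results.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    fidelity: float  # higher is better
    cost: float      # per-call cost, lower is better


def dominates(a: Config, b: Config) -> bool:
    """a Pareto-dominates b: no worse on both axes and strictly better on at least one."""
    no_worse = a.fidelity >= b.fidelity and a.cost <= b.cost
    strictly_better = a.fidelity > b.fidelity or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs if o is not c)]


# Hypothetical configurations (illustrative numbers only).
configs = [
    Config("A", fidelity=0.50, cost=1.0),
    Config("B", fidelity=0.70, cost=10.0),
    Config("C", fidelity=0.70, cost=1.0),
]
print([c.name for c in pareto_front(configs)])  # ['C']
```

Matching the best fidelity at lower cost is enough to dominate: C ties B on fidelity and beats it on cost, and it beats A on fidelity at equal cost.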
PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning
arXiv:2605.23074v1 Announce Type: new Abstract: The emergence of Large Reasoning Language Models (LRMs) has paved the way for tackling complex reasoning tasks through test-time scaling by generating long-form Chain-of-Thought (CoT) trajectories during inference. Meanwhile, these trajectories often contain explicit reflection markers such as "wait", "but", and "alternatively", signaling hesitation, revision, and the consideration of alternative explorations, respectively. Recent studies on test-time control leverage such markers as lightweight handles for steering reasoning, typically treating them as a single coarse-grained category rather than distinguishing their distinct functional roles. In this paper, we conduct type-wise suppression and fixed-prefix intervention, revealing that reflection markers differ not only in their functional roles but also in when they exert the greatest influence. Specifically, different marker classes affect accuracy and generation length in distinct ways, and marker choices are most consequential before the model settles into a stable reasoning trajectory. Motivated by these findings, we introduce PathCal, a novel training-free decoding controller that calibrates reasoning paths by distinguishing marker types and intervening only at locally uncertain states. At each decoding step, PathCal utilizes the distribution over reflection markers to estimate local competition between maintaining the current reasoning trajectory and initiating a competing branch, and softly rebalances marker logits when competing-branch evidence becomes excessive. Experiments across six reasoning benchmarks demonstrate that PathCal achieves a better efficiency-performance trade-off, improving or preserving accuracy while reducing generation length, without relying on external verifiers or additional sampling.
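The abstract describes PathCal only at a high level. The sketch below is an assumption-laden illustration of one way to softly rebalance marker logits. When a marker class's probability mass exceeds a per-class cap, the class's logits are all shifted by a single constant so its mass lands exactly on the cap. The caps, token ids, and closed-form shift are hypothetical choices, not PathCal's published rule. The sketch also omits PathCal's test for locally uncertain states.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rebalance_markers(logits: list[float], marker_ids: dict[str, list[int]],
                      max_mass: dict[str, float]) -> list[float]:
    """Softly cap each marker class's probability mass at its (hypothetical) threshold.

    If a class holds mass m > tau, adding delta = log(tau(1-m) / (m(1-tau))) to its
    logits rescales the class to exactly tau. Relative probabilities within the class,
    and among all remaining tokens, are unchanged.
    """
    out = list(logits)
    for cls, ids in marker_ids.items():
        probs = softmax(out)
        m = sum(probs[i] for i in ids)
        tau = max_mass[cls]
        if tau < m < 1.0:  # m == 1 would make the shift undefined, so skip it
            delta = math.log(tau * (1 - m) / (m * (1 - tau)))
            for i in ids:
                out[i] += delta
    return out


# Toy vocabulary: 0 = "wait", 1 = "alternatively", 2 = "so", 3 = "therefore"
logits = [3.0, 0.5, 1.0, 1.0]
markers = {"hesitation": [0], "alternative": [1]}
caps = {"hesitation": 0.3, "alternative": 0.3}
new = rebalance_markers(logits, markers, caps)
print([round(p, 3) for p in softmax(new)])
```

Here "wait" starts with about 74% of the probability mass and is pulled down to the 0.3 cap, while "alternatively" is already under its cap and is left alone.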
NeuroNL2LTL: A Neurosymbolic Framework for Natural Language Translation of Linear Temporal Logic
arXiv:2605.22874v1 Announce Type: new Abstract: Effectively translating between natural language (NL) and formal logics like Linear Temporal Logic (LTL) requires expertise that limits formal verification's reach in safety-critical development. Template-based approaches sacrifice expressiveness for reliability; neural methods achieve fluency but provide no correctness guarantees. We present NeuroNL2LTL, a neurosymbolic architecture unifying learned translation with formal verification. NeuroNL2LTL routes translation through an intermediate representation whose mapping to LTL is structure-preserving by construction. Generated specifications undergo satisfiability and non-triviality checking; a minimal-edit repair mechanism corrects near-miss outputs before they reach downstream tools. The central innovation is verifier-in-the-loop training: verification outcomes serve as reward signals for reinforcement learning, producing neural components that optimize directly for formal correctness. On 200,000+ requirements spanning aerospace, robotics, autonomous vehicles, and ten additional domains, NeuroNL2LTL achieves 28% semantic equivalence with reference specifications while ensuring 86% of outputs are verified satisfiable. The system also generates contextually grounded explanations from LTL, enabling domain experts to validate specifications without specialized training. This work demonstrates that formal verification can function as both training objective and runtime filter for neural specification systems, allowing us to build neural-based tools whose reliability derives from logical guarantees rather than statistical confidence.
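The example below makes the two verification checks concrete. It is a textbook-style LTL illustration, not taken from the paper's dataset, and it assumes "non-trivial" means a formula that is neither unsatisfiable nor valid. G reads "always" and F reads "eventually".

```latex
\begin{aligned}
&\text{``Whenever a fault is detected, an alarm is eventually raised''}
  \;\mapsto\; \mathbf{G}\,(\mathit{fault} \rightarrow \mathbf{F}\,\mathit{alarm})
  && \text{satisfiable, not valid: kept}\\
&\mathbf{G}\,(\mathit{fault} \wedge \neg\mathit{fault})
  && \text{unsatisfiable: rejected}\\
&\mathbf{G}\,(\mathit{fault} \rightarrow \mathit{fault})
  && \text{valid on every trace: trivial, rejected}
\end{aligned}
```

The first formula holds on a trace where the alarm is always on and fails on a trace with a fault and no later alarm, so it constrains behaviour without being vacuous.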
Meta and Google AI Models Exposed by Guardrail Flaw
Researchers reportedly bypassed safety guardrails on Meta and Google AI models within minutes, raising major security and compliance issues.
Adoption, Deployment & Impact
AI, Automation, and Robotics Are Reshaping Value Creation for Private Equity
Across my portfolio company clients, AI has stopped being a slide in the strategic plan and started being a line item. Back office automation, forecasting, supply chain optimization, customer service deflection, code generation inside engineering.
Why EU business AI adoption is rising and still not catching up
EU enterprise AI adoption hit 20% in 2025. The headline obscures a deeper problem, and it is older than the AI Act.
WSO2 Announces New Initiatives to Accelerate Agentic Enterprise Adoption - Channel Post MEA
WSO2 Identity Platform also advances ... for autonomous agents, complex B2B ecosystems, and decentralized identity models. New capabilities include delegated access, asynchronous authentication, and enhanced identity management for non-human entities, enabling secure interaction between humans, systems, and AI ...
AI is making everyone web app builders - but leaving teams exposed | TechRadar
The security gap emerging as AI makes app building easier
Government’s AI misuse exposes Ontarians to unnecessary risk and cost
It could help the province deliver better services at lower cost, but we’re off to a bad start.
AI Expands From Multibillion-Dollar Enterprises to Main Street
Artificial-intelligence agents scraped a bakery’s spreadsheets to help manage its growth.
IyàwóBench: A Benchmark for Evaluating Large Language Model Clinical Triage Accuracy on Undifferentiated Febrile Illness in Nigerian Primary Health Settings
arXiv:2605.23465v1 Announce Type: new Abstract: Background. Undifferentiated febrile illness is the leading cause of primary care outpatient visits in Nigeria, yet no validated benchmark exists for evaluating large language model (LLM) clinical triage reasoning in West African primary health settings. Methods. We introduce IyàwóBench v1.0, a dataset of 200 synthetic clinical vignettes across eight febrile illness categories derived from statistical distributions of 1,200 real patient encounters at 19 primary health centres (PHCs) in Oyo State, Nigeria. Six LLMs were evaluated on structured triage classification across two metrics: triage accuracy and safety score. Results. All six models achieved 100% safety scores (95% CI: 96.4-100.0%), never downgrading a critical REFER NOW case to TREAT HERE. Triage accuracy varied substantially: Claude Sonnet (claude-sonnet-4-5) 67.5% (95% CI: 60.8-73.7%), Llama 4 Scout 59.5% (52.5-66.2%), Llama 3.3 70B 43.0% (36.2-50.0%), and Llama 3.1 8B 39.0% (32.4-45.9%). Two models demonstrated near-zero accuracy attributable to structured output non-compliance. Conclusions. Modern LLMs exhibit safe triage behaviour but vary substantially in structured clinical accuracy. Clinically engineered systems with embedded WHO guidelines outperform general-purpose models by up to 28.5 percentage points. IyàwóBench provides the first reproducible evaluation framework for LLM clinical decision support in West African primary care.
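The abstract reports 95% confidence intervals but does not name the method. This sketch assumes a Wilson score interval, with accuracy taken over all 200 vignettes. Under those assumptions, 135 correct triages (67.5%) give roughly 60.7-73.6%. That is close to, but not identical with, the reported 60.8-73.7%, so the authors likely used a different interval, and the sketch is illustrative only.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half_width, centre + half_width


low, high = wilson_interval(135, 200)  # 67.5% of an assumed 200 vignettes
print(f"{100 * low:.1f}-{100 * high:.1f}%")  # prints 60.7-73.6%
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1], which matters for results at the boundary such as the 100% safety scores.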
Finland’s Grundium acquires Denmark’s Visiopharm to build an end-to-end AI precision pathology platform
Grundium, a Tampere-based startup specialising in digital pathology imaging technology, backed by US-based healthcare private equity firm EW Healthcare Partners, has acquired Visiopharm, a Denmark-based provider of AI-driven precision pathology software. The combined business merges complementary capabilities from Grundium’s imaging platform and Visiopharm’s AI-driven precision pathology software, creating an accessible end-to-end solution for diagnostic laboratories, […]
AI for Service Businesses: The Practical 2026 Guide
AI for service businesses in 2026 looks different from AI for ecommerce. Here's the practical guide for consultants, coaches, agencies. Real workflows, real numbers, no fluff.
Lenders lap up AI to break language barriers, slash lending timelines - The HinduBusinessLine
Lenders utilize AI to enhance efficiency, eliminate language barriers, and streamline lending processes for faster loan approvals.
Claude Has 3 Modes: Chat, Co‑Work, and Code — Most People Only Use One (And Wonder Why It Feels Slow) | by Divy Yadav | AI Engineering Simplified | May, 2026 | Medium
Deep dives into how real AI systems work and how to build systems that work in production.
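Apple Watch needs a shake-up as the rise of Whoop and Oura poses challenges for health apps - Power On
Competition intensifies in the health wearables market, and adapting to the AI era is a challenge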
Perplant raises €1 million to equip tractors with AI “eyes” to cut herbicide use and boost profits for farmers
Perplant, a Danish AgTech startup on a mission to democratise AI in agriculture by supporting sustainable farming with a cost-effective, plug & play and AI-based camera sensor, has raised €1 million in investment. The funding round was supported by private angel investors and industry leaders with deep expertise in agriculture and retail. This capital injection […]
Delivery robots are spreading across LA. Residents ‘both pity and hate them’
A region known for its lack of walkability now has more obstacles for pedestrians to contend with. Robots have taken over Los Angeles. It’s not just the AI-generated videos that have caused angst in Hollywood. Our streets are full of driverless Waymo vehicles, covered in more sensors and gadgets than the Batmobile. And our walkways are home to fleets of boxes on wheels, hurrying past pedestrians and navigating outdoor bar-hoppers as the robots deliver smoothies and keto-friendly salads.
Geopolitics, Policy & Governance
Rachel Reeves tells ministers to ‘buy British’ in four key industries
Exclusive: Chancellor pushes for procurement of ships, steel, energy and AI to prioritise Britishness as well as cost. Rachel Reeves has instructed cabinet colleagues to award government contracts in four critical industries directly to British companies, making clear her irritation that ministers have been sending too much government business abroad. In a letter seen by the Guardian, the chancellor tells every cabinet minister in charge of a spending department to “buy British” wherever possible, adding that she is disappointed they are not already doing so.
The White House is asking for $9 billion to buy AI chips for spies. | The Verge
The New York Times reports the CIA and the NSA lack the computing capacity to run the latest AI models. The White House has approved a request for $9 billion to buy cutting-edge chips and build infrastructure to support Nvidia’s Grace Blackwell superchip. But Congress needs to approve the funds.
ECB summons banks to urge them to fix flaws exposed by latest AI models
Supervisor to stress seriousness of risks to financial system at hastily arranged meeting
Anthropic's Olah says AI must be guided from outside Big Tech | Reuters
VATICAN CITY, May 25 (Reuters) - The co-founder of AI company Anthropic said on Monday that the development of artificial intelligence cannot be left solely to technology companies, urging greater oversight from religious leaders, governments ...
UK Institute Is Hunting for Dangers Lurking in AI
The government’s A.I. Security Institute, staffed by alumni from OpenAI and Google, is becoming a model for countries grappling with A.I.’s emerging risks.
5 Key Regulatory Shifts From Makary's Era at the FDA | AJMC
From rewriting drug approval standards to embedding AI in review workflows, former FDA commissioner Marty Makary, MD, reshaped how evidence, speed, and access are balanced in US drug regulation.
Authors suing Meta might seek early US appellate review of shadow library claims
Writers suing Meta for copyright infringement are considering asking a US appeals court to resolve an “intra-district split” on whether using shadow library books to train AI is illegal.
You can’t repair your tractor because Hollywood was terrified of the VCR
The 1998 Digital Millennium Copyright Act accidentally handed John Deere the legal right to lock farmers out of their own tractors.