Tue 26 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Claude Expands Developers' Frontiers, ECB Warns of AI Credit Risks, and Wall Street Bets on Debt
TL;DR: A new study links the rollout of Claude Code on GitHub to developers committing more and working across more programming languages. The ECB has expressed concerns about the risks posed by private-credit-fueled AI investments. Meanwhile, Wall Street is heavily involved in AI-related credit markets, with banks and hedge funds engaging in complex debt trades. Additionally, a new paper argues that the EU's AI Act lacks a clear test for when two high-risk AI systems count as the same system.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Coding Beyond Your Training: Claude Code and the Technological Frontier of Software Developers
arXiv:2605.25438v1 Announce Type: new Abstract: We study whether adoption of an AI coding assistant causally expands the technological frontier of individual software developers. We exploit the staggered rollout of Claude Code across GitHub between May 2025 and January 2026 in a panel of 5,838 developers observed monthly over 28 months, with treatment defined by the developer's first Claude-co-authored commit and not-yet-treated developers as controls. Using the doubly robust Callaway and Sant'Anna (2021) estimator, we find positive and significant effects on monthly commits (+41), repositories contributed to (+1.5), distinct programming languages used (+0.83), Shannon language entropy (+0.14), newly-used languages (+0.31), and cumulative lifetime languages (+0.51). The cumulative-languages effect grows with time since adoption, matching a Bayesian-learning model in which AI provides free signals about unfamiliar technologies and lowers the switching barrier. Results are robust to two stricter activity filters. The estimates document a sharp, persistent shift in developer behavior coincident with AI adoption; identification limits prevent a strict causal claim and we outline an agenda for cleaner tests.
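As a rough illustration of the "Shannon language entropy" outcome mentioned in the abstract above, the sketch below computes the entropy of a developer's language mix for one month. The abstract does not say how shares are measured or which logarithm base is used, so the per-commit language labels, the natural logarithm, and the function name `language_entropy` are all assumptions made for this example.

```python
import math
from collections import Counter


def language_entropy(commit_languages):
    """Shannon entropy (in nats) of the language shares in a list of commits.

    `commit_languages` holds one language label per commit, e.g. ["Python", "Go"].
    Assumption: shares are commit-weighted and the natural log is used;
    the paper may define the measure differently.
    """
    counts = Counter(commit_languages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log(1/p) over all languages with a non-zero share.
    return sum((c / total) * math.log(total / c) for c in counts.values())


# One language only: no diversity, entropy is zero.
single = language_entropy(["Python"] * 4)
# Two languages used equally: entropy is log(2).
even_split = language_entropy(["Python", "Go", "Python", "Go"])
```

Under this definition, a reported effect of +0.14 would mean a modest move toward a more even spread of work across languages, not simply a higher language count.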
Generative AI impacts on intra-urban inequality and skill premium in Beijing
arXiv:2605.25505v1 Announce Type: cross Abstract: Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-urban inequality remain largely unknown. Using 5 million job postings from Beijing (2018-2024), we construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models. We examine the spatial, structural and causal mechanisms of this shock. We find that GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide. Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers: a "high-skill trap." This wage penalty is driven by task de-skilling and intensified labor-market crowding. A difference-in-differences design centered on ChatGPT's release supports a causal interpretation. These findings challenge the prevailing theory of skill-biased technological change and provide a basis for inclusive AI governance in global technology hubs.
AI in the Enterprise: How People Use M365 Copilot Chat
arXiv:2605.23958v1 Announce Type: cross Abstract: M365 Copilot is used every week by millions of people across more than a million companies around the world as part of their workflows. Uniquely positioned in the AI landscape given its near-exclusive use for work purposes, M365 Copilot can offer a clear picture of how people use AI for work and where that usage may expand next. This paper characterizes that usage through direct classification of user interactions with M365 Copilot Chat. Based on an anonymized and privacy-preserving analysis of a sample of approximately 5.5 million sessions, we combine a learned classification of user intent with a classification of O*NET work activities done with M365 Copilot Chat. We find that M365 Copilot is emerging as an everyday assistant for knowledge work: writing dominates, but users also rely on it for information retrieval, analysis, decision making and strategizing, and evaluating and diagnosing programs and systems, among others. Information seeking tasks remain common, but time trends suggest a relative shift away from "chat as search" and toward content and communication-related work. Comparisons across occupational groupings and to work done in the labor market further show that usage is broad but uneven, where the relative share of work done with M365 Copilot Chat cuts across jobs in some cases and is occupation-specific in others. Areas of relative underrepresentation in the labor market suggest the next frontier for enterprise AI adoption.
Agent-Facing Information Design in LLM Tool Registries
arXiv:2605.23916v1 Announce Type: cross Abstract: LLM tool registries function as unregulated advertising platforms: providers write free-text descriptions that agents use for selection, yet no measurement infrastructure -- no viewability standard, quality score, or outcome audit -- exists to make this market accountable. We provide the first systematic framework, combining 17,700+ trials across five LLMs and ten domains with a constructive registry design prescription. Legal puffery alone (subjective superlatives, benefit framing) captures 100% of the optimization effect; fabricated claims add zero incremental bias -- rendering FTC enforcement of deceptive advertising rules ineffective against the active mechanism. Disclosure fails structurally: system-prompt warnings produce zero measurable effect for four of five models, and behavioral ceilings leave no headroom for label-based correction. Superlatives are the dominant single feature (SBC = +0.35). Registry-layer description normalization achieves first-best welfare model-independently. We propose separating selection-facing descriptions (structured, registry-controlled) from marketing-facing descriptions (provider-authored, shown post-selection), and introduce the Agent Attention Quality Score to distinguish capability from copywriting.
High-Risk AI Systems and the Problem of Identity in the European AI Act
arXiv:2605.23922v1 Announce Type: new Abstract: The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity--when two AI systems at a given time should count as the same for regulatory purposes--and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
arXiv:2605.23926v1 Announce Type: new Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model $\pi$ itself: the redundancy $\rho$ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while $\pi$, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high -- between 61% and 93% across the 8 (model, benchmark) conditions we study, with the median critical prefix equal to a single segmented step in six of the eight conditions -- that the finding is robust to the choice of judge family, and that although $\rho$ decreases with problem difficulty on MATH-500, all four models remain substantially redundant ($\rho \in [46\%, 85\%]$) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
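The redundancy definition in the abstract above can be read as a simple search over prefixes of a reasoning trace. The sketch below is one possible reading, not the authors' code: `answers_correctly` is a hypothetical callback standing in for "force the model to stop thinking after this prefix and check its final answer", and the choice to require at least one retained step is an assumption.

```python
from typing import Callable, Sequence


def redundancy(
    steps: Sequence[str],
    answers_correctly: Callable[[Sequence[str]], bool],
) -> float:
    """Largest fraction of trailing steps that can be cut with the answer still correct.

    `steps` is a correct trace already split into segmented steps.
    `answers_correctly(prefix)` is a hypothetical oracle: it should return True
    if the model, forced to answer after seeing only `prefix`, is still right.
    Assumption: at least one step is always kept.
    """
    n = len(steps)
    if n == 0:
        raise ValueError("trace must contain at least one step")
    # The shortest prefix that still works gives the largest removable tail.
    for kept in range(1, n + 1):
        if answers_correctly(steps[:kept]):
            return (n - kept) / n
    raise ValueError("no prefix, including the full trace, yields a correct answer")


trace = ["restate problem", "compute", "verify", "re-verify"]
# Toy oracle: the answer is already recoverable after the first step.
rho_high = redundancy(trace, lambda prefix: len(prefix) >= 1)
# Toy oracle: every step is needed.
rho_zero = redundancy(trace, lambda prefix: len(prefix) == len(trace))
```

With the first toy oracle the critical prefix is a single step and three of four steps are removable, which is the kind of case the abstract describes as typical.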
OpenRouter, an Exchange for A.I. Models, Raises $113 Million
An investment arm of Alphabet is backing OpenRouter, which helps companies choose among hundreds of models for different software tasks.
Economics & Markets
Corporate Treasuries' AI Investment Surges Despite Low ROI | Global Finance Magazine
Despite low initial ROI, corporate treasuries are ramping up AI investments. Discover why workflow debt is holding back finance departments.
Honeywell-Backed Quantinuum Seeks $1.05 Billion in US IPO
Quantinuum Inc., a quantum computing company backed by Honeywell International Inc., is seeking to raise $1.05 billion in its US initial public offering, capitalizing on investor enthusiasm for the technology.
Hedge funds are AImaxxing
Software is out, semis are in
UBS Raises Micron to Street-High Target
UBS raised its price target on Micron to a Street-high view of $1,625, up from $535. Mandeep Singh has more on "Bloomberg Open Interest." (Source: Bloomberg)
How Leopold Aschenbrenner Turned an AI Manifesto Into a $13.7 Billion Bet
As an engineer working around large-scale data systems and someone using AI tools every single day in my workflow, I’ve been watching the acceleration firsthand. Models are improving rapidly. AI usage is exploding across industries. Compute demand keeps climbing.
Cathie Wood Says Wall Street Is Missing The Next Big AI Trade — And It's Not Nvidia | IBTimes UK
Cathie Wood highlights potential AI investment opportunities in CPUs and legacy tech firms like Intel and Cisco, as AI infrastructure evolves beyond GPUs to include inference and automation systems.
Terra Quantum and Axiom Intelligence Acquisition Corp 1 Announce Definitive Business Combination Agreement at a $3.5 Billion Equity Valuation
Combined Company Expected to Trade on Nasdaq Under Ticker Symbol “TQ” Transaction Positions Terra Quantum to Accelerate Global Expansion and Further Strengthen Its Leadership in Quantum Technologies and AI-Driven Optimization ST. GALLEN, Switzerland and NEW YORK, May 26, 2026 /PRNewswire/ — Terra Quantum AG (“Terra Quantum” or the “Company”), a global leader in quantum technologies, quantum […]
Wall Street Thinks AI Data Centers Could Trigger the Biggest Power Boom Since the Internet Era. 1 No-Brainer Stock to Buy Now. | The Motley Fool
Data centers' electricity demand could supercharge Constellation Energy's long-term growth.
Contract Structure and Risk Aversion in Longevity Risk Transfers
arXiv:2409.08914v2 Announce Type: replace Abstract: This paper introduces an economic framework to assess optimal longevity risk transfers between institutions, focusing on the interactions between a buyer exposed to long-term longevity risk and a seller offering longevity protection. While most longevity risk transfers have occurred in the reinsurance sector, where global reinsurers provide long-term protections, the capital market for longevity risk transfer has struggled to gain traction, resulting in only a few short-term instruments. We investigate how differences in risk aversion between the two parties affect the equilibrium structure of longevity risk transfer contracts, contrasting 'static' contracts that offer long-term protection with 'dynamic' contracts that provide short-term, variable coverage. Our analysis shows that static contracts are preferred by more risk-averse buyers, while dynamic contracts are favored by more risk-averse sellers who are reluctant to commit to long-term agreements. When incorporating information asymmetry through ambiguity, we find that ambiguity can cause more risk-averse sellers to stop offering long-term contracts. With the assumption that global reinsurers, acting as sellers in the reinsurance sector and buyers in the capital market, are generally less risk-averse than other participants, our findings provide theoretical explanations for current market dynamics and suggest that short-term instruments offer valuable initial steps toward developing an efficient and active capital market for longevity risk transfer.
“From Advanced Chip Development to AI Price Cuts” China’s All-Out AI Push Stumbles on Margin Erosion and Technological Constraints Amid Cutthroat Competition | The Economy
China’s Huawei has unveiled plans to produce cutting-edge chips at the 1-nanometer (nm) level. With U.S. sanctions restricting access to extreme ultraviolet (EUV) lithography equipment from Dutch semiconductor equipment maker ASML, Huawei aims to circumvent those constraints through proprietary ...
Four labs, four acquisitions in five days: the consolidation signal hiding in plain sight | StartupHub.ai
Anthropic, Mistral, Google DeepMind, and Meta each acquired an AI startup in the same week. None announced it as a trend. It is.
Does TikTok Promote or Cannibalize Music Streaming? Estimands and Identification with Heavy-Tailed Outcomes
arXiv:2405.14999v3 Announce Type: replace Abstract: We study how TikTok affects demand for music on paid streaming platforms. We use Universal Music Group's (UMG) global withdrawal of its catalog from TikTok as a quasi-natural experiment. Recent work using this setting reaches mixed conclusions about whether TikTok promotes or cannibalizes streaming demand. We show that these findings can be reconciled by making the estimand explicit: with heavy-tailed exposure and outcomes, common difference-in-differences (DiD) implementations in levels, logs, and Poisson answer different economic questions. In our data, the top 10% of songs account for 96% of TikTok creations and 76% of Spotify streams, which makes the distinction between the typical song and the economically consequential song central. We find that removing TikTok access lowers Spotify demand for UMG titles, with losses concentrated among viral songs and little economically meaningful change for the long tail. Because the viral head accounts for a disproportionate share of listening and revenue, these losses drive aggregate implications. A TikTok creator-side analysis shows that some activity reallocates toward non-UMG audio when UMG content is unavailable. This substitution is limited in magnitude but economically relevant for interpreting the treatment effect because streaming compensation depends on relative stream shares. Finally, using the 2025 U.S. TikTok outage, which affected all labels symmetrically and is not subject to the same label-specific spillover concern as the UMG withdrawal, we find corroborating evidence that disruptions to TikTok access reduce monetized streaming. We also provide a practitioner companion that guides the choice of DiD estimands, estimators, and diagnostics in heavy-tailed outcome settings.
Musk’s xAI Warns Staffers to Limit Contact With Cursor Employees
Employees at Elon Musk’s xAI were warned by the company’s top lawyer to carefully moderate their interactions with workers from Cursor — a directive that came weeks after Musk’s firm announced a possible deal to acquire the AI coding startup.
Big Tech extracts retirement-scale wealth from UK internet users, research shows
Britain's 'free' internet economy is powered by invisible data extraction that feeds advertisers, AI firms, and digital platforms.
Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model
arXiv:2605.23934v1 Announce Type: new Abstract: Quantum computing devices are recognized as powerful tools for solving NP-complete problems. However, the intricacy of their modeling presents notable barriers for non-specialists, while the tedious iteration of constraint weights and modeling methodologies also consumes substantial effort on the part of experts. To address these challenges, this study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system by leveraging the LangGraph and LangChain frameworks. Comprehensive investigations demonstrate that large language models (LLMs) can effectively perform modeling tasks such as QUBO/Ising model calibration, constraint-weight decision iteration, and rapid validation of literature-reported schemes. Notably, all these tasks can be fully implemented with domestic large models. Combined with domestically developed CIM hardware, this achieves a practical empowerment of quantum CIM that relies entirely on domestic agentic large models and hardware. This work successfully realizes robust technological integration, laying a solid foundation for subsequent research. Nevertheless, it also identifies the persisting challenges in the two cutting-edge fields of large models and quantum computing at the current stage. Encouragingly, we unexpectedly discover a promising new paradigm where accumulated knowledge from agent-assisted quantum computing iterations reciprocally enhances the agent's own problem-solving capability, thereby addressing these challenges.
In AI, Bigger Firms Mean Faster Progress | American Enterprise Institute - AEI
Large firms are not slowing AI; naïve regulatory policies do.
Opinion: AI transforming how tenders are written but not how they’re evaluated
AI is changing how tenders are written but not how they're evaluated in Ireland. That gap is becoming a problem, says BidReview.ai founder Tony Corrigan.
New AI assistant streamlines initial psychiatric consultations for doctors
People often say that seeking psychiatric care can feel intimidating. Patients may feel burdened when they first open up about their emotional distress, while medical staff must accurately understand a patient's extensive history and symptoms within limited consultation time.
How AI could help fix Kenya's overstretched healthcare system - The Standard Health
Kenya continues to face growing demand for healthcare services alongside persistent shortages of healthcare personnel, particularly in specialised areas of care.
Artificial Effort
arXiv:2605.23920v1 Announce Type: new Abstract: Real-effort tasks, in which participants perform cognitively costly activities whose outcomes depend on actual performance, are widely used in experimental economics. Their validity, however, rests on the assumption that a human performs them. We study whether this assumption still holds in the era of Artificial Intelligence (AI) and Large Language Models (LLMs). Using 8 canonical real-effort tasks and 23 LLMs from three major providers, we show that most tasks can now be solved accurately and at a negligible cost, while only a few resist automation. Performance improves with each model generation, and midtier models are rapidly closing the gap with frontier ones, broadening the set of widely accessible models that can automate these tasks. Additionally, we show that verbally offering monetary incentives has no effect on LLM performance. Our findings establish a boundary condition for the use of real-effort tasks in unsupervised settings: when participants can cheaply outsource task completion to an LLM, observed performance may no longer reflect genuine human effort.
ByteDance offers AI team special stock to fend off poaching
TikTok owner issues shares tied to AI business unit as China’s tech talent war heats up
Belgian DeepTech startup D-CRBN raises €17.5 million to turn industrial CO₂ emissions into circular carbon molecules
D-CRBN, an Antwerp-based DeepTech startup developing electrified plasma technology to recycle CO₂ and hydrocarbons into circular carbon molecules, has closed its €17.5 million Series A investment round. The round was led by Astaia, with participation from follow-on investors SFPIM and the European Innovation Council (EIC) Fund. In parallel, D-CRBN is opening a limited secondary closing […]
Meet the top 10 European startups powering the agentic AI boom
Agentic AI is quickly becoming one of the most active areas of Europe’s AI landscape. Unlike traditional AI tools that mainly generate text, images or summaries, agentic AI systems are designed to take action. They can plan tasks, use tools, follow instructions, analyse results and adapt their next steps, making them useful for real business […]
Labor, Society & Culture
AI tools lead to ‘clear racial disparities’ in job hiring
New Stanford-led study finds candidates that fail AI-hiring tests face ‘systemic rejection’ across companies
OpenAI's Altman says AI unlikely to lead to 'jobs apocalypse' | Reuters
SYDNEY, May 26 (Reuters) - OpenAI CEO Sam Altman said on Tuesday the rapid development and adoption of AI would not lead to a global "jobs apocalypse" and the technology had not claimed as many white-collar jobs as he had feared.
The Download: puncturing the AI jobs panic
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. A reality check on the AI jobs hysteria: despite the growing hysteria over AI’s threat to white-collar jobs, there’s still scant evidence that the technology has had a large-scale impact on…
Reuters AI News | Latest Headlines and Developments | Reuters
Fears are growing among workers as banks offer more frank assessments about how AI could replace their jobs.
SaaS outfit ClickUp promises seven-figure salaries for survivors of 22 percent staff purge
CEO jumps on the ‘We must be fit for the AI future’ bandwagon
Southeast Asia’s AI boom leaves 40 million gig workers exposed - UPI.com
Southeast Asia is emerging as a major beneficiary of the global AI race led by the U.S. and China, but analysts warn that gig economy workers exposed
‘It’s brutal’: Silicon Valley tech workers struggle to find jobs amid AI boom - VnExpress International
Over a year after being laid off, Basem Istanbouli still has not secured a new job in the San Francisco Bay Area despite years of management experience and what many would consider a strong resume.
Dual-Use AI Face Swap Apps Are Mostly Unsafe: A Systematic Safety Audit
arXiv:2605.24735v1 Announce Type: new Abstract: AI-based image editing tools, such as face swapping algorithms, can be used to transform a clothed image of a person into a sexually explicit image of that person. These tools are made easily accessible to non-expert users through mobile apps, and have been linked to reports of image-based sexual abuse and cyberbullying involving synthetic non-consensual intimate imagery (SNCII). Apple and Google have begun to remove "nudification" apps from their platforms: apps that are marketed with the capability to "undress", "nudify", or create nude face swaps from images of people. However, AI image editing apps that have the same underlying capabilities, but do not present as nudification apps, could also be abused to create non-consensual explicit images. In this paper, we investigate whether AI face swap apps for iOS and Android implement safety measures to prevent the creation of SNCII. We identified and downloaded 420 face swap apps, and manually tested 155 eligible apps to see whether they would permit the user to create face swaps with nude images. Our evaluation shows that 70% of apps with face swap functionality have no technical safeguards against generation of nude images. Additionally, we investigated whether face swap apps' descriptions, terms of service, or privacy policies addressed harmful uses of the app, finding that no apps self-describe as nudification apps, but that the majority do not have specific terms of service provisions prohibiting this kind of use. Our findings suggest that to mitigate UI-bound SNCII threats, platforms and lawmakers must implement policies to mandate safety filters in dual-use AI image editing applications like face swap apps.
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
arXiv:2605.23932v1 Announce Type: new Abstract: Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.
Technology & Infrastructure
From Replacement to Orchestration: A Socio-Technical Architecture for Agentic AI in Corporate R&D
arXiv:2605.24580v1 Announce Type: new Abstract: Purpose: Corporate R&D faces a persistent productivity paradox: rising investment and expanding scientific knowledge have not translated into proportional innovation output. In pharmaceuticals this is captured as Eroom's Law; analogous patterns appear across engineering, materials science, and healthcare. The core cause is not insufficient tools but cognitive saturation: researchers spend an increasing share of their effort on coordination, documentation, and data governance -- hidden work that displaces high-value hypothesis formation, interpretation, and strategic synthesis. Design/Methodology/Approach: The paper uses a Design Science Research (DSR) methodology. The artifact is the HARMONY operating model. Evidence is triangulated from four semi-structured expert interviews with senior R&D leaders across industrial, healthcare, and academic settings; a foresight scenario analysis projecting four plausible 2040 R&D futures; and pattern matching with documented agentic R&D deployments. Two non-negotiable design requirements guide the architecture: cognitive-load redistribution (DR1) and bounded autonomy with alignment (DR2). Findings: We propose HARMONY -- Hybrid Agentic Research Model for Organisational New Yield -- a four-pillar socio-technical architecture comprising ResOps (Industrialized Execution), the Control Tower (Strategic Visibility and Drift Detection), the Ethics Fabric (Bounded Autonomy by Design), and the Talent Studio (Sciencepreneur Capability). The model introduces the Sciencepreneur as the central human archetype in agentic R&D, and Orchestration Leverage as a candidate productivity metric suited to human-agent hybrid systems.
Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs
arXiv:2605.23929v1 Announce Type: new Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows. We introduce performance models for both LLM and non-LLM agents that capture the relationship between computational effort and output quality, incorporating the impact of reasoning and output tokens for LLM agents using a parametric exponential reliability function. Then, we study the design of sequential workflows under latency and cost constraints. Main results include a water-filling token allocation policy and characterizations of optimal workflow reliability in terms of shadow prices.
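To make the water-filling idea concrete, here is a minimal sketch under stated assumptions. The abstract does not give the exact model, so the reliability form r(t) = 1 - a*exp(-b*t), the product-of-reliabilities objective, and every name below (`allocate_tokens`, `agents`, `budget`) are illustrative choices, not the paper's definitions. Under those assumptions, the optimal allocation equalises the marginal log-reliability gain of every active agent at a common shadow price, and agents whose marginal gain at zero tokens falls below that price receive nothing.

```python
import math

def allocate_tokens(agents, budget, iters=200):
    """Split a token budget across sequential agents (illustrative sketch).

    agents: list of (a, b) pairs with 0 < a < 1 and b > 0, where the assumed
            per-agent reliability is r(t) = 1 - a * exp(-b * t).
    budget: total tokens available (assumed to be binding).
    Returns (tokens_per_agent, shadow_price).
    """
    def tokens_at(price):
        # Setting d/dt log r(t) = price gives t = ln(a * (b + price) / price) / b.
        # Agents whose solution would be negative are clamped to zero tokens.
        return [max(0.0, math.log(a * (b + price) / price) / b) for a, b in agents]

    # At the highest marginal gain at t = 0, every agent receives zero tokens.
    low, high = 1e-12, max(a * b / (1.0 - a) for a, b in agents)
    for _ in range(iters):
        mid = math.sqrt(low * high)  # bisect on a log scale
        if sum(tokens_at(mid)) > budget:
            low = mid   # price too low: allocation overshoots the budget
        else:
            high = mid  # price high enough: allocation fits the budget
    return tokens_at(high), high

tokens, price = allocate_tokens([(0.5, 0.01), (0.3, 0.02), (0.05, 0.005)], 300.0)
```

In this toy instance the third agent is already 95% reliable with no tokens, so its marginal gain never reaches the shadow price and the whole budget goes to the two weaker agents.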
Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction
arXiv:2605.23928v1 Announce Type: new Abstract: We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive goal-directed agents that advance shared tasks without waiting for user prompts. The architecture rests on three mutually reinforcing mechanisms. Write-time context assembly precomputes enriched typed attributes via Groker agents, assembling interaction context as a deterministic pure function of graph state; context blocks are byte-identical across turns between semantic changes, enabling near-100% KV-cache reuse. Composable sandboxed wisdom programs form a governed library of LM-generated imperative programs declaratively wired to goal types via typed stream relations, composed via phase ordering, and executed at interaction time without further LM calls. Proactive goal stream state machines drive conversations toward terminal states by inspecting graph state and emitting structured interaction content (option arrays, governance affordances, clarification prompts) without awaiting user input. We prove six formal results: the Context Stability Theorem, bounding per-turn LM cost as a function of semantic change rate; a Program Composition Correctness Theorem; a Declarative Wiring Soundness Theorem; the Proactive Dominance Theorem, proving proactive agents weakly dominate reactive agents on expected turns-to-terminal-state; Coordination Overhead Elimination and Quality Preservation, establishing Pareto improvements in multi-participant goal chats; and a Cross-Platform Vote Consistency Theorem. Implemented in the open-source Qbix / Safebox / Safebots stack.
Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems
arXiv:2605.23935v1 Announce Type: new Abstract: Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime. Prior work defined Reconstructive Authority (RAM) as a condition for valid execution: actions are permitted only if authority can be constructed from current state. This paper addresses enforcement at runtime: how to enforce this condition in a running system. We introduce a runtime execution model in which authority is evaluated at action time and execution is conditioned on its constructibility. This extends the execution state space beyond admit/deny with a third state, halt, representing cases where authority is undefined due to incomplete or uncertain observability. We define a concrete execution protocol including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. We further introduce a Recovery Loop that integrates drift detection (IML) with execution control (ACP), allowing the system to suspend execution, acquire missing information, and re-attempt authority reconstruction. We show that this model guarantees safety -- no action is executed without constructible authority -- and conditional liveness: execution resumes when authority-defining variables become observable. This work operationalizes reconstructive authority as a runtime enforcement mechanism, providing the execution semantics required to apply RAM in real systems.
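The three-state gate described above can be sketched as follows. This is a hypothetical illustration, not the paper's protocol: `Action`, `gate`, `run_with_recovery`, and the `acquire` callback are invented names, and "authority is undefined" is modelled simply as a required state variable being unobserved.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Mapping, Optional

class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    HALT = "halt"  # authority cannot be evaluated from what is currently observable

@dataclass(frozen=True)
class Action:
    name: str
    requires: tuple  # state variables the authority check depends on
    rule: Callable[[Mapping[str, object]], bool]

def gate(action, observed):
    """Evaluate authority at action time; halt if any dependency is unobserved."""
    missing = [key for key in action.requires if observed.get(key) is None]
    if missing:
        return Decision.HALT, missing
    return (Decision.ADMIT if action.rule(observed) else Decision.DENY), []

def run_with_recovery(action, observed, acquire: Callable[[str], Optional[object]], max_retries=3):
    """Recovery loop sketch: on halt, try to observe what is missing, then re-gate."""
    state = dict(observed)
    decision, missing = gate(action, state)
    retries = 0
    while decision is Decision.HALT and retries < max_retries:
        for key in missing:
            value = acquire(key)
            if value is not None:
                state[key] = value
        decision, missing = gate(action, state)
        retries += 1
    return decision

transfer = Action(
    name="transfer_100",
    requires=("approval_valid", "balance"),
    rule=lambda s: bool(s["approval_valid"]) and s["balance"] >= 100,
)
```

The safety property in this sketch is that `rule` only runs once every required variable is present, so nothing is admitted on partial state; the loop resumes only when the missing variables become observable.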
Council Post: The CMO's Guide To Scaling Agentic AI Across The Enterprise
Agentic AI represents a fundamental evolution beyond traditional automation and GenAI. Chatbots respond to prompts, and robotic process automation follows scripts. By contrast, agentic systems: • Understand goals and autonomously plan multistep workflows. • Execute tasks across multiple systems without constant human oversight. ... A June 2025 Gartner, Inc. report predicted that by 2028, 33% of enterprise ...
DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning
arXiv:2605.23939v1 Announce Type: new Abstract: Web agents require both high-level reasoning (for task decomposition) and low-level interactions (for page-element manipulation) to conduct different tasks. However, these knowledge types differ fundamentally: reasoning knowledge (e.g., booking a flight requires first searching for routes) is abstract and transferable across websites, while interaction knowledge (e.g., clicking the Search button at a specific coordinate on Site A) depends heavily on page-specific contexts. Existing methods store experiences uniformly. This creates a dilemma: abstract representations lose executability on concrete pages, while concrete representations fail to generalize across domains. This entanglement limits capability accumulation: on new websites, agents either fail to recognize reusable task logic due to surface-level differences or attempt infeasible actions from outdated page structures. To disentangle them, we propose DRIVE, a dual-level skill modeling framework separating historical experience into natural language reasoning skills, which capture transferable task logic, and programmatic interaction skills, grounding abstract actions to executable operations. A scene-aware coordination mechanism adaptively retrieves and invokes these dual-level skills based on task semantics. DRIVE also uses skill-level reflection to identify hierarchy-specific failure modes, enabling targeted skill library expansion and refinement. Experiments across five WebArena domains show DRIVE attains an average task success rate of 52.8%, exceeding the skill-free baseline by 7.3 percentage points. Further ablations show reasoning and interaction skills provide distinct, complementary benefits, supporting separation of transferable task logic from executable page-level operations.
Qualcomm Strikes AI Chip Deal With TikTok Owner ByteDance
Qualcomm Inc. reached a deal with TikTok owner ByteDance Ltd. to supply chips for artificial intelligence data centers, according to people familiar with the matter, marking a key win for a company trying to expand from smartphone processors into AI infrastructure.
Micron Warns Memory Crunch Will Outlast 2026 as AI Demand Outpaces What HBM, DRAM and NAND Can Supply
JPMorgan backs the multi-year bull case after Micron flags HBM4 ramping twice as fast as HBM3, with HBM4E production pushed to 2027.
How Power Electronics Cut Generator Run Hours in Data Centers
AI-scale loads and ESG pressures are pushing data centers to “diesel last.” New power electronics cut generator run hours.
AI and the brave new world of deals
Global M&A is now dominated by the race to control the world’s energy, fibre networks and compute
AI Infrastructure Boom: Demand Surges as Costs Collapse | StartupHub.ai
ARK Investment Management's "Big Ideas 2026" report details the AI infrastructure boom, with demand surging and costs collapsing, driving massive investment.
Data Centers for AI Are Unpopular. Could They Tilt the Midterms?
Artificial intelligence needs data centers, which are broadly unpopular and turning into a real issue that could impact the 2026 midterm elections.
Revel to Merge With EQT-Backed Voltera, Uniting EV Charging Networks
Private equity-backed Voltera and Revel Transit Inc. have agreed to merge their electric-vehicle charging businesses to serve ride-hail cars and robotaxis across urban areas in the US.
Nscale inks PPA with Vattenfall to power Kvandal data center in Norway
European neocloud Nscale has signed a Power Purchase Agreement (PPA) with Swedish state-owned power company Vattenfall in Norway. The PPA will support the first phase of Nscale's data center development in Kvandal, in northern Norway. The exact capacity of the PPA has not been disclosed; however, the companies claim it will cover a […]
2026 Cloud Security Report: Why Traditional Network, Cloud, and Security Architecture Are Lagging Behind the AI Transformation - Check Point Blog
As AI rapidly reshapes industries, the role of the cloud has become even more critical. From automated customer experiences to intelligent cyber security
A Louisiana state senator helped secure Meta’s largest datacenter. Then he sold the land beside it
Jay Morris denies experts’ claims that he violated ethics rules over land deals near the site of Meta’s Hyperion datacenter This story is from Floodlight, a non-profit newsroom that investigates the powers stalling climate action For more than two years, John “Jay” Morris, a Louisiana state senator, helped pave the way for Meta to build one of the world’s largest datacenters, called Hyperion, in Richland Parish. Continue reading...
Enterprise AI infrastructure, MLOps & developer Tools drive the next phase of AI innovation - The Economic Times
The ET Most Innovative AI Product Awards 2026 recognises the AI Platforms, Infrastructure & Developer Tools category. It highlights the technologies powering the next phase of enterprise AI from platforms and MLOps to observability and developer tools. These innovations are enabling scalable, ...
Data Center Generators Market worth $9.79 billion by 2031 | Exclusive Report by MarketsandMarkets™
According to MarketsandMarkets™, the global Data Center Generators Market is projected to grow from USD 8.57 billion in 2026 to USD 9.79...
How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning
arXiv:2605.23926v1 Announce Type: new Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model π itself: the redundancy ρ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while π, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high -- between 61% and 93% across the 8 (model, benchmark) conditions we study, with the median critical prefix equal to a single segmented step in six of the eight conditions -- that the finding is robust to the choice of judge family, and that although ρ decreases with problem difficulty on MATH-500, all four models remain substantially redundant (ρ ∈ [46%, 85%]) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
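The redundancy definition in this abstract can be illustrated with a short sketch. It is not the authors' released code: `critical_prefix`, `redundancy`, and the `answers_correctly` callback are hypothetical names, and the callback stands in for whatever procedure forces the model to stop thinking and answer from a truncated trace. Whether the empty prefix counts as a valid truncation is also an assumption here.

```python
def critical_prefix(steps, answers_correctly):
    """Smallest number of leading steps after which a forced answer is correct.

    steps: list of segmented reasoning steps from a trace known to be correct.
    answers_correctly: callable taking a list of leading steps and returning True
        if the model, forced to answer from that prefix, is right (assumed hook).
    """
    # Correctness need not be monotone in prefix length, so scan every prefix.
    for kept in range(len(steps) + 1):
        if answers_correctly(steps[:kept]):
            return kept
    raise ValueError("trace does not yield a correct answer even at full length")

def redundancy(steps, answers_correctly):
    """Largest fraction of trailing steps that can be cut without losing the answer."""
    if not steps:
        raise ValueError("redundancy is undefined for an empty trace")
    kept = critical_prefix(steps, answers_correctly)
    return (len(steps) - kept) / len(steps)

# Toy stand-in for a model call: the answer is recoverable once "x = 4" has appeared.
trace = ["restate problem", "solve: x = 4", "verify", "re-verify", "summarise"]
toy_oracle = lambda prefix: any("x = 4" in step for step in prefix)
```

In the toy trace the answer is fixed after two of five steps, so three trailing steps are removable.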
Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors
arXiv:2605.23938v1 Announce Type: new Abstract: Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet, how LLMs implicitly allocate authority when sensor measurements and user claims conflict remains unexamined, raising critical reliability concerns for deployments where physical sensing must retain priority. Unlike explicit traditional fusion, LLMs bury authority allocation within learned representations. We discover this allocation is severely format-dependent: numerical sensor data fails to integrate into answer-relevant model directions, allowing natural-language claims to dominate the final decision, a phenomenon we term Authority Inversion. To diagnose and mitigate this, we develop a geometric framework of context integration, introduce two computable audit metrics, specifically the Context Integration Ratio (CIR) and Authority Alignment Index (AAI), and propose Geometric Authority Calibration (GAC), an inference-time layer-level intervention to suppress misplaced user authority. Evaluating four models (4B to 35B parameters, three architectures) across four datasets totaling 576 conflict instances reveals extreme inversion: on numerical tasks, models exhibit near-zero sensor trust (AAI = -0.805, Cohen's d = -2.14), unaffected by model capacity. Validating our geometric framework, theory-guided causal injection flips 80.2% of incorrect decisions (vs. <0.4% for random controls). Practically, GAC improves HAR accuracy from 0-1.6% to 21.9-27.5%, outperforming prompting baselines. Ultimately, authority allocation in LLM-mediated systems must be explicitly audited and application-specifically configured rather than left implicit.
Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
arXiv:2605.23940v1 Announce Type: new Abstract: How do multi-turn reasoning systems fail? The expected answer is logical contradiction, in which the system's maintained state becomes unsatisfiable. We show that the dominant mode is instead satisfiable drift, where the internal state stays consistent while the returned answer silently violates prior commitments. We build DRIFT-Bench (Decomposing Reasoning Into Failure Types), a solver-instrumented benchmark of 816 test problems across three constraint domains, and evaluate four methods on it across four open-weight models (8B-120B parameters). MUS-Repair, which feeds minimal unsatisfiable subsets back to the generator, is strongest in every setting (+1.8 to +15.0 pp over the best non-MUS baseline). But the central finding is what repair leaves behind. After structured feedback, models rarely contradict themselves. They forget. Residual errors are 98-100% satisfiable drift across all settings, while contradiction drops to near zero. Reliable multi-turn systems must separately validate that the returned answer respects the maintained state. Code is available at https://github.com/kaons-research/drift-bench.
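The distinction between contradiction and satisfiable drift can be shown on a toy problem. The sketch below is not DRIFT-Bench or MUS-Repair: it brute-forces small finite domains where the paper uses a solver, and `satisfiable`, `classify`, and the scheduling constraints are invented for illustration.

```python
from itertools import product

def satisfiable(domains, constraints):
    """Brute-force check that some assignment meets every maintained constraint."""
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        candidate = dict(zip(names, values))
        if all(check(candidate) for _, check in constraints):
            return True
    return False

def classify(domains, constraints, answer):
    """Label a returned answer against the constraints accumulated over earlier turns.

    'contradiction': the maintained state itself has no solution.
    'drift': the state is satisfiable, but the answer breaks a prior commitment.
    """
    if not satisfiable(domains, constraints):
        return "contradiction", []
    violated = [label for label, check in constraints if not check(answer)]
    return ("drift", violated) if violated else ("ok", [])

domains = {"day": ["Mon", "Tue", "Wed"], "room": ["A", "B"]}
commitments = [
    ("not Monday", lambda s: s["day"] != "Mon"),  # stated in an early turn
    ("room A", lambda s: s["room"] == "A"),       # stated in a later turn
]
```

The point of the separate `violated` check is the abstract's closing recommendation: a consistent internal state does not guarantee that the answer actually returned respects it.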
In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models
arXiv:2605.23908v1 Announce Type: new Abstract: We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative production through AI-driven assistants. Historically, a fundamental property of these processes in their human form has been their open-endedness: their capacity for generating a seemingly endless supply of novel and meaningful new forms. Do artificial agents have any capacity for such fruitful unguided discovery? To answer this question, we turn to Picbreeder, the canonical exemplar of human-driven open-ended search, in which users collaboratively generated a diverse library of images through interactive evolution of small neural networks. We replicate Picbreeder, replacing human users with frontier Vision Language Models (VLMs). We observe clear qualitative differences between the output of our system and the historical human baseline, and attempt to characterize them using metrics of phylogenetic complexity and visual and semantic salience and novelty. In an effort to identify some of the causal factors contributing to these differences, we study the addition of exploratory noise to the agents' selection process, of behavioral diversity between agents, and of narrative momentum in the form of memory of past actions. We make our code available at https://github.com/smearle/picbreeder-vlm.
Rogue states are putting AI agents to work on sanctions evasion
RUSI warns fake IDs, shell companies, and crypto laundering could soon operate at industrial scale
AI Attacks Are No Longer Experimental: Key Findings from the March-April 2026 AI Threat Landscape
Check Point report reveals rise in AI-powered cyberattacks, exposing risks to government agencies, enterprises, AI tools, and cloud security systems.
AI-powered cyber-attacks rising rapidly, CERT-In issues fresh warning
India's cyber security agency CERT-In has warned that the rapid adoption of artificial intelligence is dramatically reshaping the global cybersec....
Global Cyber Threat Intelligence Report 2026: Ransomware, AI-Driven Phishing, and Nation-State Operations Escalate - Seceon Inc
The global cyber threat landscape continues to evolve rapidly as ransomware groups, nation-state operators, and cybercriminal organizations intensify attacks against enterprises, government systems, and critical infrastructure worldwide. Over recent weeks, security teams have observed a sharp ...
Adoption, Deployment & Impact
Pony AI Lifts 2026 Robotaxi Fleet Goal on Faster Growth
Pony AI Inc. raised its robotaxi fleet target for this year by 500 vehicles to 3,500 after reporting stronger-than-expected first-quarter revenue.
Ucell and ZTE complete large-scale deployment of AI‑Powered green network solution in Uzbekistan
Network-wide rollout boosts energy efficiency by 10.6%, cutting carbon emissions and operational costs without compromising user experience
BODHI: Precise OS Kernel Specification Inference
arXiv:2605.23931v1 Announce Type: new Abstract: The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.
KT4EQG: Personalized Exercise Question Generation via Knowledge Tracing
arXiv:2605.23933v1 Announce Type: new Abstract: Educational Question Generation (EQG) aims to synthesize customized exercise questions that enhance student learning. An effective EQG system should ideally personalize questions for each student by modeling the student's knowledge state and generating questions that provide the greatest learning benefit. However, few existing EQG approaches are able to achieve such fine-grained personalization. In this paper, we explore how EQG can benefit from knowledge tracing (KT), which models students' knowledge states based on historical performance and predicts future performance. We propose KT4EQG, a personalized EQG framework that generates effective questions for individual students under the guidance of a KT model. Specifically, KT4EQG seeks to maximize a student's potential improvement in overall knowledge mastery by leveraging the KT model to select the most suitable knowledge concept for the student to practice. An LLM-based question generator is then trained to produce a question faithfully grounded in the selected concept. Experimental results on XES3G5M and MOOCRadar show that KT4EQG consistently generates more effective questions than methods with limited or no personalization.
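The concept-selection step can be sketched as a simple argmax over simulated practice. This is an illustration only: `select_concept` and `toy_kt` are hypothetical, and `toy_kt` is a deliberately crude stand-in for a trained knowledge-tracing model, which the abstract does not specify.

```python
def select_concept(mastery, simulate_practice):
    """Pick the concept whose practice is predicted to raise overall mastery most.

    mastery: dict mapping concept name to current mastery estimate in [0, 1].
    simulate_practice: callable (mastery, concept) -> predicted mastery dict after
        the student practises that concept (assumed interface to a KT model).
    """
    baseline = sum(mastery.values()) / len(mastery)

    def expected_gain(concept):
        after = simulate_practice(mastery, concept)
        return sum(after.values()) / len(after) - baseline

    return max(mastery, key=expected_gain)

def toy_kt(mastery, concept, rate=0.5):
    """Toy stand-in: practice closes a fixed share of the gap to full mastery."""
    updated = dict(mastery)
    updated[concept] = mastery[concept] + rate * (1.0 - mastery[concept])
    return updated

student = {"fractions": 0.9, "decimals": 0.4, "ratios": 0.7}
chosen = select_concept(student, toy_kt)
```

With this toy model the weakest concept wins because it has the most room to improve; a real KT model could rank concepts differently, for example when practising one concept also lifts related ones.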
Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework
arXiv:2605.23921v1 Announce Type: new Abstract: This study seeks to determine the authority signals used by Anthropic's Claude AI in its presentation of sources when answering consumer health questions. While there exists a great deal of discourse around the quality of health citations that LLMs produce, there is limited information on the integrity of the sources those citations originate from, and on the extent to which health professionals would consider those sources credible. This descriptive cross-sectional study used data from HealthSearchQA, which contains 3,172 consumer health questions curated by Google Research. After exclusions, a final dataset of 3,075 questions yielding 10,038 citations was analyzed. The Authority Signals Framework (Jacques et al., 2026) was applied to examine 10 authority signals across four domains for a disproportionate stratified sample of 542 sources. Established institutional sources accounted for 97.8% of all citations (n = 9,818). Medical Institutions were the most frequently cited organization type (36.5%), followed by Government Resources (31.6%) and Professional Associations (28.4%). Commercial Health Information comprised 2.2% (n = 220). The top 10 organizations accounted for 57.8% of all citations, with Mayo Clinic alone representing 24.7%. Among commercial sources in the focused sample, 86.4% displayed medical review statements, 82.5% used schema markup, and 71.8% had comprehensive content, while traditional institutional sources appeared in Claude's citations with or without these same markers. As Anthropic positions Claude for HIPAA-ready healthcare applications, these findings establish a baseline for Claude's citation behavior and demonstrate the utility of the Authority Signals Framework as a tool for ongoing, cross-platform evaluation of AI-mediated health information.
Google’s Fitbit Air Gives Whoop Some Serious Competition
The Fitbit Air, a new $100 screenless wearable from Alphabet Inc.’s Google, represents a major evolution in what consumers can expect from fitness trackers as tech companies race into an era of personalized health and artificial intelligence-powered wellness insights.
Aiven co-founder Hannu Valtonen’s Avrea emerges from stealth with €4 million to build AI-native CI/CD platform
Avrea, a Helsinki-based startup offering an AI-native CI/CD platform built for the new era of development, today announced that it has emerged from stealth and has raised €4 million ($4.7 million) in total pre-Seed funding led by Earlybird. Avrea was founded by Hannu Valtonen, co-founder of Finnish unicorn Aiven, and Juha Valvanne, co-founder of Nosto. […]
AI compliance startup Certo raises $4m seed round led by Daphni to scale regulatory platform for beauty and CPG brands
The Paris-San Francisco startup, founded by Bastien Deliège-Coste and Jean Duquenne, is already working with major global consumer goods groups. Entrepreneurs First, Motier Ventures and Transpose Platform also joined the round. French-US startup Certo, an AI-powered regulatory compliance platform for consumer goods companies, has raised a $4 million seed round led by French […]
Council Post: Orchestrating Your AI-Powered Supply Chain For Growth And Profitability
As supply chain disruptions intensify, AI-powered orchestration is helping organizations move beyond fragmented systems and reactive firefighting toward real-time coordination, faster decisions and more resilient operations.
YC-backed French preventive health platform Lucis raises €17.3 million Series A led by Singular
Lucis, a Paris-based preventive health platform that uses blood biomarker analysis and AI to deliver personalised, science-based health recommendations, has raised €17.1 million ($20 million) in Series A funding. The round was led by Singular, with participation from General Catalyst, Y Combinator, and angels including investors behind Runna, Céline Lazorthes (Resilience), and Manu Lecomte. This […]
AI in the Trades: Key Statistics on Automation Adoption Among Home Service Operators
Explore how AI in the trades is transforming home care with improved scheduling and streamlined operations for service providers.
[BPO Insights] Enterprise Buyers Are Adding AI Clauses to BPO Contracts -- Here's What They Say
Enterprise procurement teams are embedding AI-specific requirements directly into BPO contracts -- from automation minimums and audit trails to hallucination liability and model governance. The new contract language reveals exactly what enterprise buyers expect and which BPOs will survive renewal ...
Microsoft Copilot Studio computer-use agents are now enterprise-ready
That is fine for targeted use cases, but high-volume deployments need proper cost modelling before they scale. The workflows that make the strongest case for computer-use agents in Copilot Studio are the ones with high manual frequency and no API path: invoice processing through vendor portals, updating records in legacy systems, pulling data from internal tools that pre-date modern integration standards. These have sat outside the reach of enterprise automation ...
How To Prove AI ROI In 90 Days, Without Gaming Metrics
AI ROI is not proven by AI activity. It is proven when one important workflow decision improves relative to a clear baseline, while counter-metrics do not get worse.
The CEO AI Confidence Gap Is Costing Enterprises Billions
CEOs are falling for AI demos while employees inherit the broken workflows. Box CEO Aaron Levie explains why executive distance from last-mile work is the real reason enterprise AI agents fail, and what investors should watch instead.
Geopolitics, Policy & Governance
China pushes homegrown AI stack with local chips, LLMs
Following the conclusion of the Trump-Xi meeting and amid continued delays in China approving imports of Nvidia H20 GPUs, China's National Development and Reform Commission (NDRC) on May 22 sent a strong policy signal on artificial intelligence (AI) self-sufficiency, explicitly calling for ...
The US and China Must Unite on AI To Stop the Next Bio Threat | Opinion - Newsweek
On biosecurity, the U.S. and China face a rare reality: cooperation is not a concession—it is the only way to compete safely.
High-Risk AI Systems and the Problem of Identity in the European AI Act
arXiv:2605.23922v1 Announce Type: new Abstract: The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity--when two AI systems at a given time should count as the same for regulatory purposes--and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.
Is Decentralized AI Governable? From Regulative Policy to Constitutive Protocol
arXiv:2605.24538v1 Announce Type: new Abstract: Every major framework for governing artificial intelligence presupposes an identifiable entity -- a developer, deployer, or operator -- who can be held responsible and compelled to comply. Decentralized AI (DeAI) dissolves this presupposition. We analyze DeAI as a six-layer decentralizing stack -- model, training, compute, harness, identity, and ownership -- and show how partial decentralization across layers compounds into what we call the governance vacuum: a condition in which AI systems are consequential enough to require governance but lack the properties that existing frameworks presuppose in their targets. This vacuum takes two analytically distinct forms: an accountability gap, where no addressable principal can be identified, and an incapacitation gap, where even an identified principal cannot alter the running system. We demonstrate that these failures are not merely jurisdictional but defeat every presupposition of governance through normative address -- the communication of rules to a comprehending, responsive agent. Drawing on Lessig's modalities of regulation and Searle's distinction between regulative and constitutive rules, we argue for a shift in the locus of governance from policy to protocol, from normative address to architectural constraint. Protocol-based constitutive governance does not address the agents operating within a system but shapes the substrate that determines what kinds of actions are possible within it. We identify four ethical conditions -- legitimacy, contestability, transparency, and non-domination -- that such governance must satisfy to avoid degenerating into unaccountable technocratic power, and we argue that the central political challenge of governing AI in a decentralized world is reconstructing forms of democratic authorization for architectural choices that persist after the ordinary chain of policy has broken down.
The permission paradox: Who controls AI as governments scale adoption? - TNGlobal
As AI becomes more integrated into the citizen journey, the focus is extending beyond deployment toward accountability, orchestration, and trust. The next phase of digital government will depend on how effectively agencies connect data, content, and service delivery across increasingly autonomous ...
Charity Digital - Topics - How the EU AI Act impacts you
The EU AI Act is the first comprehensive AI legislation anywhere in the world. We take a look at what it is and what it could mean for charities in the UK
Kenya seeks $20.8M for AI-powered social media monitoring system to enhance government communications
The software would analyse public social media conversations and sentiment around government policies.