AI Intelligence Brief

Fri 29 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

100 articles
Editor's pick · Editor's Highlights

Apollo Seeks $36B, Anthropic Tops OpenAI, and Fed Warns on Inflation

TL;DR: Apollo and Blackstone are seeking partners for a $36 billion debt deal to fund Anthropic's AI infrastructure expansion. Anthropic has surpassed OpenAI with a $900 billion valuation. Meanwhile, St. Louis Fed President Alberto Musalem cautions against relying on AI to reduce inflation. TSMC highlights energy efficiency as a key factor in future chip design due to AI's electricity demands.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

8 of 100 articles
Lead story
Editor's pick · PAYWALL · Technology
Bloomberg· Today

Apollo Seeks Partners for $36B Debt Deal to Buy AI Chips for Anthropic

Apollo and Blackstone are working to bring additional investors into a roughly $36 billion debt financing deal to help Anthropic build out its AI infrastructure. The debt will be used to purchase Google’s custom chips known as tensor processing units, or TPUs, which Anthropic will then lease, according to people with knowledge of the matter. Bloomberg's Neil Campling reports. (Source: Bloomberg)

Editor's pick
Arxiv· Today

From Augmentation to Reconstruction: Guiding the AI Disruption to the Good Place

arXiv:2605.29207v1 Announce Type: new Abstract: Artificial intelligence feels omnipresent, yet the disruption many expect has not fully arrived. The main reason is not model capability, nor even the tools built to harness those models. Rather, most organizations are still using AI to accelerate workflows designed for a pre-AI world. We offer a three-stage lens: Augmentation, Automation, and Reconstruction, and argue that the most consequential disruption resides in the third stage where workflows and markets are rebuilt around delegation, machine-to-machine interaction, continuous monitoring, and auditable constraints. Achieving this system-level transformation takes time: it requires trust and accountability infrastructure, machine-legible and interoperable data and interfaces, the design and adoption of these new workflows, and economic incentives that favor reconstruction rather than local optimization: the complementary investments that produce the familiar "productivity J-curve" of general-purpose technologies. We illustrate this transition through examples in consumer markets, education, news, and coding. Finally, we emphasize a normative point: the agentic future is not predetermined. Leaders must both skate to where the puck is going and actively steer it toward a good place, ensuring innovation delivers welfare gains felt by businesses and consumers around the world.

Editor's pick · Energy & Utilities
Reuters· Yesterday

Energy use forcing rethink of AI chip design, TSMC says

A senior TSMC executive said on Thursday that surging electricity demands from AI are making energy efficiency, rather than computing power, the main constraint shaping future computer chip development.

Editor's pick
Arxiv· Today

Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

arXiv:2605.29018v1 Announce Type: new Abstract: Although a growing body of research has begun to describe user-LLM interactions, the picture it paints is largely static; little is known about how individual users change their behavior over time. To address this gap, we analyze the conversational trajectories of ~12,000 randomly sampled Microsoft Bing Copilot users and compare these with data from WildChat-4.8M. While the Copilot data contains significant population-level trends, we find that trends in individual user trajectories are much weaker; user habits prove to be overwhelmingly sticky. We also find stark differences between users of different activity levels: more active users have more successful conversations and use the LLM for more complex and professionally oriented tasks. Some user trends also appear in WildChat-4.8M, but we find evidence that this dataset is significantly skewed towards highly proficient "power" users. Ultimately, our results suggest that existing user behavior is difficult to change and demonstrate the extent of user heterogeneity. Our comparison between datasets highlights that WildChat does not represent typical user-AI interactions, an important caveat for downstream uses of the data.

Editor's pick · Professional Services
Arxiv· Today

The New Pro Se: Generative AI and the Surge in Federal Civil Self-Representation

arXiv:2605.29493v1 Announce Type: new Abstract: Since public access to generative AI tools became widespread, federal civil litigation has seen a marked increase in pro se (self-represented) plaintiffs. This paper analyzes that shift using ~2.8 million filings, asking whether the post-GenAI period is associated not only with more pro se filings, but also with detectable changes in complaint text, litigation outcomes, and the composition of pro se litigants. Using civil filing data from FY2008-2025, we find that the federal civil pro se plaintiff rate rose from 11.33% pre-GenAI to 16.94% post-GenAI, a 5.61 percentage-point increase that persists after trend and covariate-adjusted robustness checks. We then focus on Civil Rights and Other Statutory cases, where the increase is especially pronounced, and link case metadata to pro se complaints. Drawing on stylometric AI detection indicators, we develop an interpretable measure of AI-consistent drafting. Against a threshold calibrated to the pre-GenAI baseline, the net AI-flagged share is 13.9% of post-GenAI non-form complaints. Analysis of the AI-flagged complaints shows that they are more citation-dense, disproportionately associated with first-time rather than repeat filers, and geographically unevenly distributed. This composition pattern suggests that AI-consistent drafting is not merely a repeat-filer phenomenon; it also includes a modest, suggestive increase in name-inferred female plaintiffs. We find no evidence of improved win rates; in fact, AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases. These findings raise new questions about access to justice and court screening burdens, and sharpen the distinction between legal formality and legal efficacy.
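The abstract reports the shift as a percentage-point change. The same two rates can also be read as a relative change; the sketch below is simple arithmetic on the published figures, and the relative number it produces is derived here, not reported by the authors.

```python
def rate_change(pre_pct: float, post_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change in percent) between two rates."""
    pp_change = post_pct - pre_pct            # absolute gap, in percentage points
    relative_pct = pp_change / pre_pct * 100  # gap as a share of the starting rate
    return pp_change, relative_pct


# Pro se plaintiff rates from the abstract: 11.33% pre-GenAI, 16.94% post-GenAI.
pp, rel = rate_change(11.33, 16.94)
print(f"{pp:.2f} percentage points, {rel:.1f}% relative increase")
```

This prints "5.61 percentage points, 49.5% relative increase": on these figures, the pro se share rose by roughly half relative to its pre-GenAI level.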

Editor's pick · Defense & National Security
Arxiv· Today

Does Distributed Training Undermine Compute Governance?

arXiv:2605.29359v1 Announce Type: new Abstract: Compute governance proposals often rely on the assumption that frontier AI training requires large, detectable computing clusters. However, recent advances in distributed training algorithms could allow developers to conduct frontier-scale training on distributed agglomerations of hardware, rather than needing large datacenter facilities. Developers who prefer not to be constrained by regulations may structure their hardware in a manner that evades the registration and monitoring requirements associated with compute governance. Therefore, regulations must be designed to detect and prevent illicit distributed training operations. This paper evaluates the feasibility of such evasion and outlines recommended countermeasures, including whistleblowing, chip tracking, forensic accounting, and memory and compute thresholds for clusters.
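The evasion concern can be illustrated with a toy check. The sketch below assumes a hypothetical per-facility reporting threshold on aggregate accelerator throughput (the threshold value, site names, and numbers are invented for illustration and are not taken from the paper) and shows how hardware split across several sites can keep each site under the line while the combined fleet exceeds it.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    peak_flops: float  # combined peak FLOP/s of the accelerators at this site


def sites_over_threshold(sites: list[Site], threshold: float) -> list[str]:
    """Per-facility check: which individual sites would trigger reporting."""
    return [s.name for s in sites if s.peak_flops >= threshold]


def fleet_over_threshold(sites: list[Site], threshold: float) -> bool:
    """Aggregate check: does the hardware, taken together, cross the same line?"""
    return sum(s.peak_flops for s in sites) >= threshold


# Hypothetical threshold and sites, for illustration only.
THRESHOLD = 1e20
fleet = [Site(f"site-{i}", 3e19) for i in range(4)]

print(sites_over_threshold(fleet, THRESHOLD))  # no single site is flagged
print(fleet_over_threshold(fleet, THRESHOLD))  # but the combined fleet is
```

Distributed training is what would let such sites act as one training run, which is one way to read the paper's case for countermeasures such as chip tracking and forensic accounting that look beyond a single facility.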

Editor's pick · Technology
Fortune· Yesterday

As AI slashes white-collar jobs, Salesforce CEO Marc Benioff says almost no one is being hired—except in sales

Salesforce CEO Marc Benioff revealed that the $145 billion firm is keeping its engineering team slim thanks to AI—but has good news for sales workers.

Editor's pick · Technology
Arxiv· Today

Governing Technical Debt in Agentic AI Systems

arXiv:2605.29129v1 Announce Type: cross Abstract: Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are not fully captured by traditional software or predictive ML technical debt. We define Agentic Technical Debt as the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed. We define Stochastic Tax as the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds. The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows. We outline how managers can make both visible through lightweight dashboards and governance controls.
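The stock-versus-flow distinction maps onto two different kinds of dashboard metric. The minimal sketch below is hypothetical (the class, fields, and cost categories are assumptions, not the paper's instrumentation): debt is tracked as a running balance of unvalidated changes, while the tax is logged as a cost incurred in each operating period.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGovernanceLedger:
    """Hypothetical ledger separating a debt stock from a recurring tax flow."""

    # Stock: patches to prompts, memory, tool schemas, etc. not yet validated.
    unvalidated_changes: int = 0
    # Flow: operating cost recorded per period (same currency throughout).
    tax_by_period: list[float] = field(default_factory=list)

    def record_patch(self, count: int = 1) -> None:
        self.unvalidated_changes += count

    def record_validation(self, count: int = 1) -> None:
        # Validation pays down the stock; it cannot go below zero.
        self.unvalidated_changes = max(0, self.unvalidated_changes - count)

    def record_period(self, review_cost: float, retry_cost: float) -> None:
        # Human review and re-runs of failed agent actions are two
        # illustrative components of the recurring "stochastic tax".
        self.tax_by_period.append(review_cost + retry_cost)


ledger = AgentGovernanceLedger()
ledger.record_patch(5)       # e.g. prompt tweaks and tool-schema edits
ledger.record_validation(2)  # two of them get tested and documented
ledger.record_period(review_cost=120.0, retry_cost=30.0)
ledger.record_period(review_cost=100.0, retry_cost=45.0)
print(ledger.unvalidated_changes, ledger.tax_by_period)
```

Keeping the two apart makes clear that paying down the stock (validating patches) and reducing the flow (cheaper monitoring or fewer retries) are different interventions.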

Economics & Markets

19 articles
AI Investment & Valuations · 8 articles
Editor's pick · Technology
Crypto Briefing· Yesterday

China's AI investment boom is supercharging exports and lifting the yuan

Countries across Southeast Asia, ... the geopolitical strings attached to US exports. For equity investors, the concentration of export growth in AI-related goods points to specific opportunities in China’s semiconductor supply chain, from chip designers to packaging and testing firms. The risk that keeps showing up in analyst notes is policy discontinuity. Any expansion of US export controls, or retaliatory ...

Editor's pick · Technology
Reddit· Yesterday

r/EconomyCharts on Reddit: Dell stock surges nearly +30% after reporting stronger than expected earnings due to AI

I can name like 20 AI/semiconductor stocks off the top of my head that have each gone up around 50% or more this year, and I’m not super into the investing world. AI money still seems to be flowing everywhere that it’s needed. Semiconductors, energy, AI infrastructure, etc.

Editor's pick · Financial Services
Let's Data Science· Today

AI-native hedge funds influence AI infrastructure stocks

HedgeCo.net reports that Nebius Group, the AI infrastructure company trading under the ticker NBIS, surged roughly 10% after a regulatory disclosure showed that Situational Awareness LP, an investment firm led by former OpenAI researcher Leopold Aschenbrenner, had taken ...

Editor's pick · Technology
The Motley Fool· Yesterday

This AI Stock Is Priced Like a Value Play, But Growing Like a Growth Stock

This AI stock is trading at 4 times forward earnings.

AI Productivity · 1 article
Editor's pick · Healthcare
Arxiv· Today

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

arXiv:2605.28965v1 Announce Type: new Abstract: Linking free-text phenotype descriptions to ontology terms, typically referred to as phenotype annotation, is essential for the cross-study integration of comparative morphological data. This labor-intensive process has heavily relied on highly trained human experts, which makes it challenging to scale and thus a key bottleneck. Dahdul et al. (2018) established a Gold Standard (GS) of Entity-Quality (EQ) annotations across seven phylogenetic studies and used it to evaluate three human curators and the Semantic CharaParser NLP tool with ontology-based semantic similarity metrics; they reported that machine-human consistency was significantly lower than inter-curator (human-human) consistency. Here we revisit that benchmark with five frontier hosted LLMs from Anthropic and OpenAI, each operating as an "agentic curator" within a self-contained workspace that supplies the source publication PDF, the same annotation guide used by the original human curators, the four project ontologies (UBERON, PATO, BSPO, GO), and a validation script. Evaluated against the same Gold Standard, every agent fell within the range of inter-curator variability of the three trained human biocurators of the original study; the best performing agents approached but did not reach the best performing human curator. Agents substantially outperformed Semantic CharaParser on all four metrics.

Labor, Society & Culture

23 articles
AI & Employment · 11 articles
Editor's pick · Technology
Fortune· Yesterday

As AI slashes white-collar jobs, Salesforce CEO Marc Benioff says almost no one is being hired—except in sales

Salesforce CEO Marc Benioff revealed that the $145 billion firm is keeping its engineering team slim thanks to AI—but has good news for sales workers.

Editor's pick · Professional Services
Arxiv· Today

The New Pro Se: Generative AI and the Surge in Federal Civil Self-Representation

arXiv:2605.29493v1 Announce Type: new Abstract: Since public access to generative AI tools became widespread, federal civil litigation has seen a marked increase in pro se (self-represented) plaintiffs. This paper analyzes that shift using ~2.8 million filings, asking whether the post-GenAI period is associated not only with more pro se filings, but also with detectable changes in complaint text, litigation outcomes, and the composition of pro se litigants. Using civil filing data from FY2008-2025, we find that the federal civil pro se plaintiff rate rose from 11.33% pre-GenAI to 16.94% post-GenAI, a 5.61 percentage-point increase that persists after trend and covariate-adjusted robustness checks. We then focus on Civil Rights and Other Statutory cases, where the increase is especially pronounced, and link case metadata to pro se complaints. Drawing on stylometric AI detection indicators, we develop an interpretable measure of AI-consistent drafting. Against a threshold calibrated to the pre-GenAI baseline, the net AI-flagged share is 13.9% of post-GenAI non-form complaints. Analysis of the AI-flagged complaints shows that they are more citation-dense, disproportionately associated with first-time rather than repeat filers, and geographically unevenly distributed. This composition pattern suggests that AI-consistent drafting is not merely a repeat-filer phenomenon; it also includes a modest, suggestive increase in name-inferred female plaintiffs. We find no evidence of improved win rates; in fact, AI-flagged complaints are more likely to be dismissed and to terminate at earlier procedural phases. These findings raise new questions about access to justice and court screening burdens, and sharpen the distinction between legal formality and legal efficacy.

Editor's pick · Technology
Arxiv· Today

Attention Asymmetry in AI Layoff Discourse on X: A Computational Analysis of Capital vs Labour Amplification

arXiv:2605.29367v1 Announce Type: cross Abstract: When workers lose jobs to AI-driven restructuring, two very different conversations happen on X (formerly Twitter) at the same time. Tech executives and AI researchers talk about productivity, transformation, and opportunity. Laid-off workers and labour critics talk about job loss, uncertainty, and fear. This paper asks a simple question: which conversation gets more reach? We report three studies using two collection methods and 763 tweets from 20 named public accounts. Study 1 used keyword-based collection (n=392) and found no significant difference between corpora (p=0.891), revealing that keyword search is too noisy for this task. Study 2 used account-based collection (n=96) and found a 3.12x mean amplification advantage for capital discourse over labour discourse (p=0.000003, Cohen's d=0.555). Study 3 combined both methods (n=763) and confirmed the finding at 4.18x mean and 10.77x median amplification ratio (p<0.000001). Critically, after normalising for follower count, the asymmetry persists at 2.69x (p=0.000009, Cohen's d=0.491), demonstrating that the effect is not simply a consequence of capital accounts having larger audiences. The finding is robust across all tested amplification metric weightings. We introduce the Amplification Ratio and Amplification Normalisation Index as simple metrics for measuring platform-level discourse inequality. A cross-platform replication on Reddit (n=647 posts) did not replicate the finding, suggesting the asymmetry may be specific to X's account-based amplification architecture. We discuss the methodological implications for cross-platform discourse analysis.
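The abstract does not spell out how the Amplification Ratio is computed. As a hedged illustration only, the sketch below assumes a per-post amplification score built from a weighted sum of engagement counts (the weights are arbitrary), compares capital and labour corpora by the ratio of their means and of their medians, and normalises by follower count in the spirit of the paper's normalisation step.

```python
from statistics import mean, median


def amplification_score(likes: int, reposts: int, replies: int,
                        weights: tuple[float, float, float] = (1.0, 2.0, 1.0)) -> float:
    """Assumed per-post score: weighted engagement (weights are illustrative)."""
    w_like, w_repost, w_reply = weights
    return w_like * likes + w_repost * reposts + w_reply * replies


def amplification_ratios(capital: list[float], labour: list[float]) -> tuple[float, float]:
    """Return (mean ratio, median ratio) of capital scores to labour scores."""
    return mean(capital) / mean(labour), median(capital) / median(labour)


def per_follower(scores: list[float], followers: list[int]) -> list[float]:
    """Normalise each post's score by its author's follower count."""
    return [s / f for s, f in zip(scores, followers)]


capital = [10.0, 20.0, 30.0]  # toy scores, not data from the paper
labour = [5.0, 5.0, 20.0]
print(amplification_ratios(capital, labour))
```

On the toy data this prints (2.0, 4.0). The single outlier in the labour list (20.0) halves the mean ratio but leaves the median ratio untouched, which is why reporting both statistics is informative.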

Editor's pick
Fortune· Yesterday

Adding AI 'employees' is backfiring by creating new office scapegoats and making human workers sloppier and lazier

Research from Boston Consulting Group found that human staff becomes less accountable, blaming their new bot colleagues for their mistakes.

Editor's pick · PAYWALL
FT· Yesterday

Who decides which jobs AI will take?

Different models are producing very different assessments of exposure levels

Editor's pick · Consumer & Retail
Fortune· Yesterday

Costco CEO Ron Vachris says tech is ‘elevating’ workers, not replacing them—as IBM and Delta bosses make the same bet on humans

As companies like Meta and Amazon use AI to justify headcount reductions, Costco is doubling down on $1.50 hot dogs and humans at the cash register.

Editor's pick
Futura-Sciences· Yesterday

This overlooked factor will decide if AI creates or destroys your job

University of Chicago economist ... actual employment outcomes, because they skip the exact variable that translates productivity gains into labor market realities. When AI began producing articles, images, and marketing copy at near-zero marginal cost, the price of digital content collapsed. Consumer demand turned out to be elastic in some segments and rigid in others. Some organizations flooded the market with cheap volume; others collapsed. The specific demand shifts determined ...

Editor's pick
Livemint· Today

Labour markets may risk a milder shock than AI fantasies suggest, but that’s only partial relief

The business drive for automation is turning out to be bumpy as the actual costs of adopting artificial intelligence (AI) come into view and the buzz around an AI bubble grows louder. But that’s unlikely to spell much of a reprieve for human workers.

Editor's pick · Education
ABC11· Yesterday

Is AI to blame for hiring woes faced by college graduates?

Analysts disagree about whether AI is a factor in the hiring crunch.

Editor's pick · Professional Services
Outsourceaccelerator· Yesterday

India accounting leaders say AI won't kill offshore talent

Not a single India-based leader at major United States accounting firms believes AI will eliminate the offshore accounting model.

Editor's pick
Arxiv· Today

Mobile Foreigners: Mortgage Lock-In and H-1B Demand

arXiv:2605.28904v1 Announce Type: new Abstract: The 2022 rise in U.S. mortgage rates increased relocation costs for homeowners with low-rate mortgages. This cost varies across destinations because each draws workers from a different mix of labor markets. We build an in-migration mortgage-payment wedge from HMDA loans and pre-shock IRS migration networks. From 2017 to 2024, higher wedges reduce college-educated homeowner in-migration, leave renters unaffected, and raise H-1B sponsorship requests. The implied offset is 14 H-1B sponsorship requests per 100 deterred college-educated domestic in-migrants. We show that mortgage lock-in operates as a destination-side labor-market shock that shifts part of firms' adjustment toward employer-sponsored immigration.

AI Ethics & Safety · 7 articles
Editor's pick · PAYWALL
FT· Yesterday

The Pope disrupts Silicon Valley

Unlike the US president, the pontiff is choosing to grapple with the serious challenges of AI

Editor's pick
Arxiv· Today

Political Neutrality as Balanced Approval: A Large-Scale Human Evaluation of AI Responses

arXiv:2605.28911v1 Announce Type: new Abstract: As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a large-scale user study to test it, releasing a new dataset PARETO with 7,434 participants and 208,152 evaluations of AI responses. Our definition follows a simple principle grounded in political theory: when asked about a controversial issue, an AI model should generate responses that maximize approval across groups with opposing viewpoints, while balancing approval between groups. This definition allows empirical testing of whether an AI response is "neutral" and generalizes to any political context without pre-supposing a single left-right axis of division. We construct a benchmark of controversial U.S. issues, with prompts sourced from politically charged questions on Reddit and responses from frontier AI models, and recruit human participants to rate AI responses. Across all 20 issues, we find that it is possible for AI responses to achieve high rates of approval on both sides, even as those sides disagree strongly with each other on the substance of the issues. We also find that default responses lean liberal for GPT, Gemini, Claude, and Llama, but not Grok, and that user prompts with political charges are harder to respond to than neutral prompts. This work introduces a rigorous definition and benchmark of AI political neutrality, and a dataset to measure progress toward it.
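The proposed definition has two parts: high approval across opposing groups, and balance between them. One way to turn that into a single number for ranking candidate responses is sketched below; the scoring rule (mean approval minus a penalty on the gap between groups) and the penalty weight are assumptions for illustration, not the paper's exact formulation.

```python
def neutrality_score(approval_by_group: dict[str, float],
                     balance_weight: float = 1.0) -> float:
    """Reward high average approval, penalise imbalance between groups.

    approval_by_group maps a group label to the share of that group
    approving the response (0.0 to 1.0).
    """
    rates = list(approval_by_group.values())
    average = sum(rates) / len(rates)
    gap = max(rates) - min(rates)
    return average - balance_weight * gap


# Toy approval rates for two candidate responses (invented numbers).
candidates = {
    "response_a": {"group_1": 0.90, "group_2": 0.30},
    "response_b": {"group_1": 0.70, "group_2": 0.65},
}
best = max(candidates, key=lambda name: neutrality_score(candidates[name]))
print(best)
```

This prints response_b: its average approval is slightly higher, and its approval is far more balanced. Because groups are plain labels, the same function works for any partition of opinion, echoing the abstract's point that the definition does not presuppose a single left-right axis.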

Editor's pick · Transportation & Logistics
Reuters· Yesterday

Why Tesla’s AI trainers don’t trust its self-driving tech – or its safety stats

Those efforts, which haven’t been previously reported, undermine Musk’s long-stated claim that Tesla’s self-driving technology will soon work anywhere globally and doesn’t require the same laborious local mapping of roads and hazards employed by rivals. Musk has said Tesla takes a simpler approach, relying solely on cameras and AI, that will allow it to scale up its robotaxi service at “hyperexponential” speed and offer current Tesla owners full autonomy through software updates.

Editor's pick
Arxiv· Today

Who Does Your AI Work For? Designing Conversational Agents as Digital Fiduciaries

arXiv:2605.28908v1 Announce Type: cross Abstract: Conversational agents are increasingly integrated into the most private and intimate aspects of users' lives, from discussions of mental health to financial decisions. As a result, these systems have access to reams of sensitive user data. Much of the literature on AI systems has focused on aligning users' goals with the agents that act on their behalf. While this work is vitally important, it may overlook the need to establish a new normative baseline. Conversational AI agents, designed to feel and interact anthropomorphically with human users, must be held to a standard of care commensurate with their capabilities and access. When a client hires a personal lawyer, undergoes surgery, or receives advice from an investment manager, the expert they consult often has a fiduciary duty to act in their client's best interests. This provocation argues that conversational agents should be held to a similar standard and introduces fiduciary design as a guiding principle. In this respect, conversational AI trust and accountability could be unified into a single design and legal paradigm.

Editor's pick · Energy & Utilities
ABC News· Today

What the fight over London data centre plans tells us about the AI backlash

Last year in the US alone, projects collectively worth $200 billion were scuppered or delayed as communities protested the construction of new data centres promised to deliver the AI of the future.

Editor's pick · Education
Arxiv· Today

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

arXiv:2605.28897v1 Announce Type: new Abstract: LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the alignment is reasonable. However, we also find that LLM-human alignment varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review. We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase of overall scores for up to 35% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.

Editor's pick · Government & Public Sector
Fortune· Yesterday

The EEOC chair knows gutting diversity reporting will blind the agency to discrimination. She’s doing it anyway.

In April, Andrea Lucas told Harvard students that demographic data collection is sometimes necessary. A month later, her agency proposed to stop the reports.

AI Skills & Education · 2 articles
Editor's pick · Education
Arxiv· Today

Practitioner Beliefs and Behaviors in AI-Enhanced Education: DOT Framework Survey Evidence

arXiv:2605.29041v1 Announce Type: new Abstract: This study reports findings from a cross-sectional survey (n = 72) of higher education practitioners examining beliefs, behaviors, and institutional conditions related to artificial intelligence (AI) integration in teaching and learning. Grounded in the DOT Framework, which integrates design thinking and open systems theory, the study investigates AI familiarity, usage patterns, design-oriented practices, and pedagogical beliefs. Exploratory factor analysis of 19 belief items identified a three-factor structure: AI Functional Capabilities, Oversight and Governance, and Instructor Collaboration and Planning (α = .90). Results indicate that practitioners hold favorable views of AI as a pedagogical support while maintaining strong commitments to human oversight and critical evaluation. Reported practices emphasize iterative prompting and content generation, with less consistent use of needs assessment and feedback loops. Institutional barriers including limited policy, training, and infrastructure were widely reported. These findings provide preliminary empirical support for the DOT Framework as a descriptive model of practitioner beliefs and practices, while also highlighting gaps between design-oriented theory and current implementation. The study contributes an initial measurement structure and identifies directions for confirmatory validation and outcome-based research linking AI-supported design practices to instructional quality.

Public Attitudes to AI · 2 articles
Editor's pick
Arxiv· Today

When Should AI Read the Room? Public Perceptions of Social Intelligence in AI Agents

arXiv:2605.29938v1 Announce Type: new Abstract: AI researchers have been advancing socially intelligent AI agents (Social-AI) across embodiments, from chatbots to physical robots. As Social-AI is increasingly deployed in everyday settings, decisions about the roles these agents should play will depend on how laypeople perceive them. However, public perceptions of social intelligence in AI agents and the acceptability of these agents remain largely understudied. We present a mixed-methods survey of adults in the United States (N=200) that examines social intelligence as a perceived construct in AI agents. Our survey investigates the extent to which participants believe current AI agents have social intelligence, abilities of agents that participants associate with social intelligence, contextual factors influencing participant acceptance of Social-AI agents, and concerns participants hold about these technologies. Participants widely reported having already encountered AI agents they perceived as socially intelligent and grounded their judgments in observable behaviors, more than beliefs about AI agency or intent. We identified a support-adoption gap in acceptability judgments: participants supported the existence of Social-AI agents for others far more than for their own personal use. Our analysis uncovers layperson concerns about Social-AI, informing AI governance regarding appropriate deployment contexts, agent roles, and risks to end users.

Technology & Infrastructure

26 articles
AI Agents & Automation · 6 articles
Editor's pick · Professional Services
Daily AI News May 28, 2026: Self-Improving AI Just Hit 97% Tax Accuracy· Yesterday

Building Self-Improving Tax Agents with Codex

OpenAI and Thrive Holdings developed a tax-preparation automation tool using Codex, practitioner feedback, and evaluation loops to create a self-improving agent.

Editor's pick · Technology
Arxiv· Today

Governing Technical Debt in Agentic AI Systems

arXiv:2605.29129v1 Announce Type: cross Abstract: Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are not fully captured by traditional software or predictive ML technical debt. We define Agentic Technical Debt as the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed. We define Stochastic Tax as the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds. The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows. We outline how managers can make both visible through lightweight dashboards and governance controls.

Editor's pick · Manufacturing & Industrials
Arxiv· Today

VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

arXiv:2605.28978v1 Announce Type: new Abstract: Finite Element Analysis (FEA) serves as the cornerstone of modern engineering design. However, its workflow is inherently complex and relies heavily on domain expertise. Although recent efforts have integrated Large Language Models (LLMs) into FEA, existing approaches face limitations in handling multimodal inputs and executing complex tasks. To address these limitations, we propose VFEAgent, an end-to-end multi-agent system designed to automate FEA modeling and simulation directly from input images and problem descriptions. Our methodology integrates two core components: (1) a multimodal vision-language multi-agent pipeline that employs ReAct-driven reasoning to extract structured FEA specifications from heterogeneous inputs and (2) a verification-first code synthesis framework, incorporating robust self-debugging and fallback mechanisms to ensure executability and physical validity. We systematically evaluated the system across various engineering mechanics scenarios. The results demonstrate that VFEAgent achieves a high success rate in generating complete and physically valid simulations, outperforming LLM-based baseline methods in reliability and correctness. These findings validate the feasibility of automating the complete FEA workflow, highlighting the framework's potential to liberate engineers from tedious manual analysis.
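The verification-first idea, with self-debugging and a fallback, follows a common generate, check, repair loop. The sketch below is a generic, hypothetical version: the `generate`, `verify`, and `fallback` callables stand in for a model call, an executability or physical-validity check, and a template solution, and none of them are VFEAgent's actual interfaces.

```python
from typing import Callable, Optional


def synthesize_with_repair(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    fallback: Callable[[], str],
    max_attempts: int = 3,
) -> tuple[str, int, bool]:
    """Return (code, attempts used, whether the fallback was used).

    generate receives the previous error message (None on the first try);
    verify returns None on success or an error message on failure.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(feedback)
        error = verify(code)
        if error is None:
            return code, attempt, False
        feedback = error  # feed the failure back for self-debugging
    return fallback(), max_attempts, True


# Toy stand-ins: the first draft fails verification, the repaired one passes.
def toy_generate(feedback: Optional[str]) -> str:
    return "fixed_model" if feedback else "draft_model"


def toy_verify(code: str) -> Optional[str]:
    return None if code == "fixed_model" else "boundary conditions missing"


print(synthesize_with_repair(toy_generate, toy_verify, lambda: "template_model"))
```

With these stand-ins the loop succeeds on the second attempt and prints ('fixed_model', 2, False); if every attempt failed, the fallback would be returned instead, so the pipeline always yields something executable.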

Editor's pick · Technology
Bebeez· Yesterday

London-based Geordie AI secures €25 million to help enterprises govern AI agents

Geordie AI, a British security and governance platform for AI agents, today announced it has closed a €25 million ($30 million) Series A round to enhance the product’s capabilities for security and AI teams as they grapple with the emerging adoption and risk AI agents pose. The round was led by Balderton Capital and included […]

Editor's pick · Technology
Theregister· Yesterday

AI agents get their own phone directory built atop DNS

DNS-AID, under the auspices of the Linux Foundation, promises easier agent discovery

Editor's pick
Arxiv· Today

Wait! There's a Way Out: A Decision Mechanism for Forecasting Conversational Derailment

arXiv:2605.29243v1 Announce Type: cross Abstract: Forecasting conversational derailment is the task of predicting, as the conversation unfolds, whether it will eventually derail into personal attacks. Since forecasting models operate in an online fashion, they must decide after each utterance whether to "trigger" an alert, for example to notify participants or a moderator that the conversation is at risk of derailing. Existing approaches make this decision solely based on the estimated likelihood of derailment given the preceding utterances, implicitly assuming that the conversation's future trajectory is fixed. As a result, they ignore the possibility of future recovery and incur an unnecessarily high rate of false positives. In this work we propose a method for decoupling the decision to trigger from derailment likelihood estimation. Our approach is inspired by the first human baseline on this task, which shows that humans achieve dramatically lower false positive rates by selectively deferring their decision to trigger when they anticipate that tension is likely to subside. We operationalize this insight with a deferral mechanism that uses forward-looking simulations to assess whether a tense moment admits plausible paths to recovery. Incorporating this mechanism into a state-of-the-art forecasting model substantially reduces false positives without sacrificing forecasting accuracy. More broadly, this work highlights the value of treating decision-making as a first-class component of forecasting systems.
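The core idea, separating the decision to trigger from the likelihood estimate, can be sketched as a two-part rule: consider an alert only when estimated derailment risk is high, and then defer if enough simulated continuations de-escalate. This is a minimal sketch under stated assumptions: `derail_prob` and `simulate_futures` are hypothetical callables (the abstract does not describe the paper's simulator or decision criterion), and both thresholds and the sample count are illustrative.

```python
from typing import Callable, Sequence


def should_trigger(
    history: Sequence[str],
    derail_prob: Callable[[Sequence[str]], float],                     # hypothetical risk model
    simulate_futures: Callable[[Sequence[str], int], list[list[str]]],  # hypothetical simulator
    risk_threshold: float = 0.7,
    recovery_threshold: float = 0.3,
    n_samples: int = 8,
) -> bool:
    """Decide whether to raise a derailment alert after the latest utterance."""
    risk = derail_prob(history)
    if risk < risk_threshold:
        return False  # not tense enough to consider an alert
    # Roll the conversation forward several times and re-score each continuation.
    futures = simulate_futures(history, n_samples)
    recovered = sum(1 for f in futures if derail_prob(list(history) + f) < risk_threshold)
    recovery_rate = recovered / len(futures)
    # Defer (no alert) when enough simulated continuations de-escalate.
    return recovery_rate < recovery_threshold
```

A likelihood-only baseline corresponds to returning `risk >= risk_threshold` directly; the deferral step is what trades a few delayed alerts for fewer false positives.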

AI Infrastructure & Compute · 4 articles
AI Models & Capabilities · 4 articles
Editor's pick · Technology
VentureBeat· Yesterday

Researchers automated LLM reasoning strategy design and cut token usage by 69.5%

Test-time scaling (TTS) has emerged as an established method for improving the performance of large language models in real-world applications by giving them extra compute cycles at inference time. However, TTS strategies have historically been hand-crafted, relying heavily on human intuition to dictate the rules of the model's reasoning. To address this bottleneck, researchers from Meta, Google, and several universities have introduced AutoTTS, a framework that automatically discovers optimal TTS strategies. This automated approach allows enterprise organizations to dynamically optimize compute allocation without manually tuning heuristics. By implementing the optimal strategies discovered by AutoTTS, organizations can directly reduce the token usage and operational costs of deploying advanced reasoning models in production environments. In experimental trials, AutoTTS managed inference budgets efficiently, reducing token consumption by up to 69.5% without sacrificing accuracy.

The manual bottleneck in test-time scaling

Test-time scaling enhances LLMs by granting them extra compute when generating answers. This extra compute allows the model to generate multiple reasoning paths or evaluate its intermediate steps before arriving at a final response. The primary challenge in designing TTS strategies is determining how to allocate this extra computation optimally. Historically, researchers have designed these strategies manually, relying on guesswork to build rigid heuristics. Engineers must hypothesize the rules and thresholds for when a model should branch out into new reasoning paths, probe deeper into an existing path, prune an unpromising branch, or stop reasoning altogether. Because this manual tuning process is constrained by human intuition, a vast number of possible approaches remain unexplored. This often results in suboptimal trade-offs between model accuracy and computing costs.
Current TTS algorithms can be mapped to a width-depth control space, with "width" being the number of reasoning branches explored and "depth" being how far each one develops. Self-consistency (SC) samples a fixed number of trajectories and majority-votes the answer. Adaptive-consistency (ASC) saves compute by stopping early once a confidence threshold is reached. Parallel-probe takes a more granular approach, pruning unpromising branches while deepening the rest. More advanced methods employ richer structures such as tree search or external verifiers, but they all share one key characteristic: they are meticulously hand-crafted. That manual approach restricts the scope of strategy discovery and leaves a large portion of the potential resource-allocation space untouched, and it is the constraint AutoTTS is designed to break.

Automating strategy discovery with AutoTTS

AutoTTS reframes how test-time scaling is optimized. Instead of treating strategy design as a human task, it approaches it as an algorithmic search problem within a controlled environment. This redefines the roles of both the human engineer and the AI model. Rather than hand-crafting specific rules for when an LLM should branch, prune, or stop reasoning, the engineer constructs the discovery environment. The human defines its boundaries: the control space of states and actions, optimization objectives that balance accuracy against cost, and the specific feedback mechanisms.

An explorer LLM, such as Claude Code, designs the strategy. This explorer acts as an autonomous agent that iteratively proposes TTS "controllers": code-defined policies or algorithms that dictate how an AI model allocates its computational budget during inference. The explorer tests and refines these controllers based on feedback until it discovers an optimal resource-allocation policy.
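To make the width-depth vocabulary concrete, the sketch below contrasts the two simplest hand-crafted baselines named above: self-consistency, which majority-votes over a fixed number of sampled answers, and an adaptive variant that stops sampling once the leading answer's vote share crosses a threshold. It is an illustrative simplification: the vote-share stopping rule, `threshold=0.8` and `min_samples=4` are assumptions standing in for ASC's own statistical stopping criterion, and sampled answers are passed in as a plain list.

```python
from collections import Counter
from itertools import islice
from typing import Iterable


def self_consistency(answers: Iterable[str], n: int) -> tuple[str, int]:
    """Majority vote over the first n sampled answers; returns (answer, samples used)."""
    votes = Counter(islice(answers, n))
    return votes.most_common(1)[0][0], sum(votes.values())


def adaptive_consistency(answers: Iterable[str], max_n: int,
                         threshold: float = 0.8, min_samples: int = 4) -> tuple[str, int]:
    """Sample until the leading answer's vote share reaches `threshold` (or max_n)."""
    votes: Counter = Counter()
    used = 0
    for ans in islice(answers, max_n):
        votes[ans] += 1
        used += 1
        leader, count = votes.most_common(1)[0]
        if used >= min_samples and count / used >= threshold:
            return leader, used  # early stop: consensus reached
    return votes.most_common(1)[0][0], used
```

With 64 samples in which "42" dominates (for example, five "42", one "41", then 58 more "42"), SC@64 spends all 64 samples while the adaptive rule stops after 4 with the same answer, which is the kind of saving early-stopping baselines target.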
To make this automated search computationally affordable, AutoTTS relies on an "offline replay environment." If the explorer LLM had to invoke a base reasoning model to generate new tokens every time it tested a new strategy, the compute costs would be astronomical. Instead, it relies on thousands of reasoning trajectories pre-collected from the base LLM. These trajectories include "probe signals": intermediate answers that help the controller evaluate progress across different reasoning branches.

During the discovery loop, the explorer agent proposes a controller and evaluates it against this offline data. The agent observes the proposed controller's execution traces, which show how it allocated compute over time. By analyzing these traces, the agent can diagnose specific failure modes, such as a controller pruning branches too aggressively in a particular scenario; this is more informative than seeing only a final result. The agent then iteratively rewrites its code to improve the accuracy-cost trade-off.

Inside the AI-designed controller

Because the explorer agent is not constrained by human intuition, it can discover highly coordinated, complex rules that a human engineer would likely never hand-code. One optimal controller discovered by AutoTTS, named the Confidence Momentum Controller (CMC), leverages several non-obvious mechanisms to manage compute:

Trend-based stopping: Hand-crafted strategies often instruct the model to stop reasoning once it hits a certain instantaneous confidence threshold. The AutoTTS agent discovered that instantaneous confidence can be misleading due to temporary spikes. Instead, the controller tracks an exponential moving average (EMA) of confidence and stops only if the overall confidence level is high and the trend is not actively declining.

Coupled width-depth control: Manually designed algorithms usually treat the "widening" of new reasoning paths and the "deepening" of current paths as separate decisions. AutoTTS discovered a closed feedback loop that links the two actions: if the confidence of the current branches stalls or regresses, the controller automatically spawns new branches.

Alignment-aware depth allocation: Instead of giving all active reasoning branches an equal computation budget, the controller dynamically identifies which branches agree with the current leading answer and gives those branches priority "bursts" of extra computation. This concentrates the computational budget on the emerging consensus to quickly verify whether it is correct.

Cost savings and accuracy gains in real-world benchmarks

To test whether an AI could autonomously discover a better test-time scaling strategy, the researchers set up a rigorous evaluation framework. The core experiments were conducted on Qwen3 models ranging from 0.6B to 8B parameters. The researchers also tested the system's ability to generalize on a distilled 8B version of the DeepSeek-R1 model.

The explorer agent was initially tasked with discovering an optimal strategy using the AIME24 mathematical reasoning benchmark. The discovered strategy was then tested on two held-out math benchmarks, AIME25 and HMMT25, as well as on the graduate-level general reasoning benchmark GPQA-Diamond.

The AutoTTS-discovered controller was pitted against four manually designed test-time scaling algorithms used in industry: Self-Consistency with 64 parallel reasoning paths (SC@64), Adaptive-Consistency (ASC), Parallel-Probe, and Early-Stopping Self-Consistency (ESC). ESC is a hybrid approach that generates trajectories in parallel and stops early when an answer seems stable. In a balanced, cost-conscious mode, the AutoTTS-discovered controller reduced total token consumption by approximately 69.5% compared with SC@64 while maintaining the same average accuracy across the four Qwen models.
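The three Confidence Momentum Controller mechanisms described above can be sketched together as a small controller that is replayed over pre-collected probe signals, in the spirit of the offline replay environment. This is not the released CMC code: the trajectory format (a list of (answer, confidence) probes per branch), the confidence signal, the EMA factor `alpha`, the thresholds, the `burst` size and the use of probe steps as a stand-in for tokens are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

Probe = tuple[str, float]  # (intermediate answer, confidence) at one depth step


@dataclass
class Branch:
    probes: list[Probe]  # one pre-collected reasoning trajectory being replayed
    depth: int = 0       # probes consumed so far

    def step(self) -> bool:
        """Advance one depth step; False once the recorded trajectory is exhausted."""
        if self.depth < len(self.probes):
            self.depth += 1
            return True
        return False

    @property
    def latest(self) -> Optional[Probe]:
        return self.probes[self.depth - 1] if self.depth else None


def replay_controller(trajectories: list[list[Probe]], init_width: int = 2,
                      alpha: float = 0.5, stop_conf: float = 0.8,
                      burst: int = 2, max_cost: int = 200):
    """Replay recorded probes; return (answer, probe steps spent, branches used)."""
    pool = list(trajectories)  # recorded branches not yet spawned
    active = [Branch(pool.pop(0)) for _ in range(min(init_width, len(pool)))]
    cost, ema, prev_ema, leader = 0, 0.0, 0.0, None
    while cost < max_cost:  # soft budget: checked once per round
        latest = [b.latest for b in active if b.latest]
        if latest:
            leader = Counter(ans for ans, _ in latest).most_common(1)[0][0]
            agree = [c for ans, c in latest if ans == leader]
            # Signal = vote share x mean confidence of agreeing branches.
            signal = sum(agree) / len(latest)
            prev_ema, ema = ema, alpha * signal + (1 - alpha) * ema
            # Trend-based stopping: smoothed confidence is high and not declining.
            if ema >= stop_conf and ema >= prev_ema:
                break
            # Coupled width-depth control: a stalled or falling EMA spawns a branch.
            if ema <= prev_ema and pool:
                active.append(Branch(pool.pop(0)))
        # Alignment-aware depth: branches agreeing with the leader get a burst.
        progressed = False
        for b in active:
            steps = burst if b.latest and b.latest[0] == leader else 1
            for _ in range(steps):
                if b.step():
                    cost += 1
                    progressed = True
        if not progressed:
            break  # every replayed trajectory is exhausted
    return leader, cost, len(active)
```

Because each candidate controller only reads recorded probes, evaluating it generates no new tokens; the returned step count acts as a cheap cost proxy that an explorer could compare across controller variants.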
When the inference budget was increased, AutoTTS pushed peak accuracy beyond all hand-crafted baselines in five of eight test cases. The efficiency gains carried over to other tasks. On the GPQA-Diamond benchmark, the balanced AutoTTS variant cut inference token cost from 510K tokens to just 151K tokens while slightly improving overall accuracy. On the DeepSeek model, AutoTTS achieved the highest overall accuracy on the HMMT25 benchmark while nearly halving token spend.

For practitioners building enterprise AI applications, these experiments highlight two major operational benefits:

Raising peak performance: AutoTTS doesn't just save money on token consumption; it actively raises the peak attainable performance of the base model. The AI-designed controller is remarkably good at detecting noisy or unproductive reasoning branches on the fly and continuously redirecting its compute budget toward the branches generating the most useful reasoning signals.

Cost-effective custom development: Because the framework relies on an offline replay environment, the entire discovery process cost only $39.90 and took 160 minutes. For enterprise teams, that means optimized reasoning strategies tailored to proprietary models and internal tasks are now within reach, without a dedicated research budget.

Both the AutoTTS framework and the Confidence Momentum Controller are available on GitHub; the CMC can be used as a drop-in replacement for other TTS controllers.
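As a quick check on the GPQA-Diamond figures quoted above, the relative token reduction can be computed from the two reported totals. This ratio is derived here, not stated in the study; it is close to, but distinct from, the roughly 69.5% average saving reported against SC@64.

```latex
\text{reduction} = \frac{510\text{K} - 151\text{K}}{510\text{K}} = \frac{359}{510} \approx 0.704 \quad (\approx 70.4\%)
```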

Editor's pick · Technology
Reuters· Yesterday

Microsoft to release new coding model next week, The Information reports

The tech giant has primarily relied on AI models from OpenAI, Anthropic and Big Tech rival Google (GOOGL.O) to power its GitHub Copilot AI tool for software developers.

Editor's pick
Arxiv· Today

Differentiable Belief-based Opponent Shaping

arXiv:2605.29042v1 Announce Type: new Abstract: Human coordination often relies on the ability to influence the beliefs of others through strategic action. In multi-agent reinforcement learning, opponent shaping attempts to replicate this influence, though existing methods typically operate within an opponent's parameter, policy, or value space. Meanwhile, belief-manipulation techniques in hidden-role games often rely on hard-coded objectives, such as deception or belief saturation. We propose Differentiable Belief-based Opponent Shaping (D-BOS), a first-order method that treats each observer's belief as the shaped opponent state and differentiates through k-step softmax-Bayes belief dynamics. Rather than explicitly rewarding deceptive or cooperative behavior, our method treats the belief state as the target for shaping. This allows the optimal strategy to emerge naturally from the environment's reward structure. This belief-space formulation provides an opponent-shaping signal by differentiating through opponent belief updates, and naturally extends to multiple observers by aggregating gradients over their individual inferred belief trajectories. Empirically, D-BOS outperforms PPO and BBM in hidden-role games, with the largest gains in mixed-motive settings.
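One plausible reading of "k-step softmax-Bayes belief dynamics" is sketched below: an observer holds a belief over the actor's hidden role, assumes each role picks actions by a softmax over role-specific action values, and updates its belief by Bayes' rule after each observed action; composing k updates gives a belief trajectory. The role names, action values and temperature are hypothetical, and the sketch covers only the forward computation. D-BOS would differentiate such beliefs with respect to the actor's policy using an autodiff framework, which is not shown.

```python
import math


def softmax(values: list[float], temperature: float = 1.0) -> list[float]:
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def belief_update(belief: dict[str, float], action: int,
                  q_values: dict[str, list[float]],
                  temperature: float = 1.0) -> dict[str, float]:
    """One Bayes step: P(role | a) is proportional to P(a | role) * P(role)."""
    unnormalised = {role: p * softmax(q_values[role], temperature)[action]
                    for role, p in belief.items()}
    z = sum(unnormalised.values())
    return {role: u / z for role, u in unnormalised.items()}


def k_step_beliefs(prior: dict[str, float], actions: list[int],
                   q_values: dict[str, list[float]],
                   temperature: float = 1.0) -> list[dict[str, float]]:
    """Belief after each of the k observed actions."""
    trajectory, belief = [], dict(prior)
    for a in actions:
        belief = belief_update(belief, a, q_values, temperature)
        trajectory.append(belief)
    return trajectory
```

Every operation here (exponentials, products, normalisation) is smooth, which is what makes it possible in principle to backpropagate from a belief-based objective to the actor's action choices.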

Editor's pick · Education
Arxiv· Today

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

arXiv:2605.28829v1 Announce Type: cross Abstract: Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving. We introduce Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement-learning post-training. Using PhysicsWallah's internal question banks, we construct a high-quality training curriculum and post-train GPT-OSS-20B through reinforcement learning with verifiable rewards. Training combines prolonged reinforcement learning with broadened exploration via progressively larger rollout group sizes. We evaluate Aryabhata 2 on competitive examination benchmarks, including JEE Main, JEE Advanced, and NEET, as well as out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that Aryabhata 2 outperforms its base model GPT-OSS-20B on competitive STEM reasoning while requiring substantially fewer output tokens (up to 64% fewer).
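Reinforcement learning with verifiable rewards depends on answers that can be checked automatically. The sketch below shows a minimal binary reward for exam-style answers and a step schedule that grows the number of rollouts sampled per prompt over training. The 1% relative tolerance, the case-insensitive match for option letters, and the stage boundaries and group sizes are all illustrative assumptions, not values from the paper.

```python
import math


def verifiable_reward(predicted: str, reference: str, rel_tol: float = 1e-2) -> float:
    """Binary reward: 1.0 if the final answer matches the reference, else 0.0."""
    try:
        # Numerical answers are compared with a relative tolerance.
        return 1.0 if math.isclose(float(predicted), float(reference), rel_tol=rel_tol) else 0.0
    except ValueError:
        # Non-numeric answers (e.g. multiple-choice letters) need an exact match.
        return 1.0 if predicted.strip().upper() == reference.strip().upper() else 0.0


def rollout_group_size(step: int,
                       schedule: tuple[tuple[int, int], ...] = ((0, 8), (1000, 16), (3000, 32))) -> int:
    """Rollouts sampled per prompt: the group grows as training progresses."""
    size = schedule[0][1]
    for start, group in schedule:
        if step >= start:
            size = group
    return size
```

Larger groups give the policy more attempts per question, so hard problems are more likely to yield at least one rewarded rollout to learn from; that is one common motivation for growing group sizes late in training.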

AI Security & Cybersecurity · 7 articles
Editor's pick · Technology
The Hacker News· Yesterday

New AI Usage Report: Enterprise AI Risk Is Heavily Concentrated Among a Small Group of AI "Power Users"

More than 6% of enterprise AI conversations contain sensitive data, rising to 12.63% for DeepSeek, which increases governance risks.

Editor's pick · Defense & National Security
Trend· Yesterday

The new face of warfare: how AI and hybrid conflict reshape global security

For this reason, media strategy, digital narratives, and public opinion management are now essential components of national security planning. Economic instruments are also playing an increasingly central role. Sanctions, technology export controls, financial restrictions, and supply-chain leverage have become powerful tools of geopolitical ...

Editor's pick
Arxiv· Today

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

arXiv:2605.29055v1 Announce Type: new Abstract: Hallucination remains a major reliability barrier for production LLM systems, particularly in multi-agent pipelines where unsupported claims can propagate unchecked across stages. This paper adapts a HOPE-inspired Nested Learning architecture with Continuum Memory Systems (CMS) and semantic similarity caching to a hybrid benchmark of 310 prompts combining 217 epistemic-uncertainty prompts and 93 fabrication-induction stress-test prompts. A three-stage agentic pipeline orchestrated via the Open Floor Protocol (OFP) is evaluated with five KPIs (FCD, Factual Claim Density; FGR, Factual Grounding References; FDF, Fictional Disclaimer Frequency; ECS, Explicit Contextualization Score; and OSR, Observability Score Ratio), which are aggregated into THS (Total Hallucination Score) across five weighting configurations to study mitigation-observability trade-offs. FDF, ECS, OSR, and FGR are subtracted as mitigation signals, so a more negative THS indicates stronger mitigation. The FrontEndAgent is configured as a high-stochasticity generator (temperature = 1.0) to produce a realistic hallucination baseline, while the SecondLevelReviewer and ThirdLevelReviewer operate as progressive correctors. This asymmetric design yields end-to-end THS changes of -31.3% to -35.9% across the five weighting configurations. Semantic caching achieves 440 cache hits over 930 potential calls (a 47.3% hit rate), reducing LLM invocations to 490, lowering the energy and CO2e footprint, and making multi-stage review pipelines operationally viable at production scale. ExtremeObservability attains the most negative final THS (-0.0709), confirming that observability-heavy configurations reinforce rather than compromise mitigation. These findings suggest that memory-augmented multi-agent designs can jointly improve factual reliability, operational efficiency, and auditability without model retraining.
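Semantic similarity caching reuses a stored LLM response when a new prompt is close enough in embedding space to one already answered. The reported figures are internally consistent: 440 hits out of 930 potential calls is a 47.3% hit rate (440 / 930 ≈ 0.473), leaving 930 - 440 = 490 actual invocations. A minimal sketch follows; the bag-of-words `embed` function is a toy placeholder for a real sentence-embedding model, and the 0.9 cosine threshold is an assumption, not the paper's setting.

```python
import math
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (placeholder for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, llm: Callable[[str], str], threshold: float = 0.9):
        self.llm = llm                # the expensive call being avoided
        self.threshold = threshold    # minimum similarity to reuse an answer
        self.entries: list[tuple[Counter, str]] = []
        self.hits = 0
        self.calls = 0                # actual LLM invocations

    def query(self, prompt: str) -> str:
        vec = embed(prompt)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            self.hits += 1
            return best[1]            # cache hit: reuse the stored response
        self.calls += 1
        answer = self.llm(prompt)
        self.entries.append((vec, answer))
        return answer

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.calls
        return self.hits / total if total else 0.0
```

A linear scan keeps the sketch short; a production cache would typically use an approximate nearest-neighbour index and tune the threshold so that near-duplicate prompts hit while semantically different ones do not.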

Editor's pick
Arxiv· Today

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

arXiv:2605.30169v1 Announce Type: new Abstract: As autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural governance intuition is to extend human identity verification and reputation mechanisms, from "Know Your Customer" and credit scores to "Know Your Agent" regimes. However, we argue that this analogy is fundamentally incomplete. Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, presuming a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility. Yet language model agents are ontologically dissociative: they are essentially an assemblage of mutable modules (foundation models, system prompts, tool-access policies, external memory and, in some cases, a multi-agent system as a whole), any of which may change agent behavior, and they present a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, we contend that this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability, the very properties that reputation mechanisms aim to sustain, thereby collapsing trust. We argue that identity-based, ex post, regulative, sanction-based governance, such as reputation, is structurally inapplicable to dissociative agents, and we suggest a shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.

Adoption, Deployment & Impact

16 articles
AI Adoption Barriers & Enablers · 3 articles
Editor's pick
Arxiv· Today

Adopt ≠ Adapt: Longitudinal Analyses of LLM Conversations in the Wild

arXiv:2605.29018v1 Announce Type: new Abstract: Although a growing body of research has begun to describe user-LLM interactions, the picture it paints is largely static; little is known about how individual users change their behavior over time. To address this gap, we analyze the conversational trajectories of ~12,000 randomly sampled Microsoft Bing Copilot users and compare these with data from WildChat-4.8M. While the Copilot data contains significant population-level trends, we find that trends in individual user trajectories are much weaker; user habits prove to be overwhelmingly sticky. We also find stark differences between users of different activity levels: more active users have more successful conversations and use the LLM for more complex and professionally oriented tasks. Some user trends also appear in WildChat-4.8M, but we find evidence that this dataset is significantly skewed towards highly proficient "power" users. Ultimately, our results suggest that existing user behavior is difficult to change and demonstrate the extent of user heterogeneity. Our comparison between datasets highlights that WildChat does not represent typical user-AI interactions, an important caveat for downstream uses of the data.

Editor's pick · PAYWALL · Technology
Washington Post· Yesterday

AI & Tech Brief: Beyond the hyperscalers

Stout argues that businesses will increasingly want to use their own “sovereign” AI models, as opposed to renting the frontier models from the hyperscalers.

AI Measurement & Evaluation · 3 articles
Editor's pick
Arxiv· Today

BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

arXiv:2605.28994v1 Announce Type: new Abstract: AI tools that support real-world decision-making must be able to build simulation models that inform their recommendations and render them interpretable. Tools that automate aspects of modeling practice must complement human expertise, not replace it. The BEAMS Initiative aims to guide the development of AI tools for modeling and simulation toward responsible and ethical forms by establishing benchmarks for human-centered modeling and simulation practices. The initiative uses open digital and organizational infrastructure to collaboratively evaluate AI tools for modeling and simulation. The open-source sd ai project hosted by the initiative establishes transparency and enables contributions to be shared broadly. A steering group prioritizes potential benchmarks, while a technical group implements them as automated tests. Tests for several distinct categories of evaluation have been implemented and applied to AI tools that support qualitative model building, quantitative model building, and model discussion. These include tests for causal translation, model iteration, causal reasoning, conformance, model behavior explanation, suggested model building steps, and suggested model fixes. When engines from the sd ai project are coupled with different LLMs, their performance on these evaluations varies across AI tools. The evaluations implemented by the initiative demonstrate that AI-enabled modeling tools perform better at discussion and basic qualitative tasks than at causal reasoning and quantitative error fixing. No single LLM dominates across engine types, highlighting the importance of specific tasks and of trade-offs between speed and accuracy. Ongoing efforts of the initiative aim to incorporate benchmarks that address concerns about bias by considering alternative perspectives and human-centered use cases.

Editor's pick
Arxiv· Today

Mind Your Tone: Does Tone Alter LLM Performance?

arXiv:2605.29027v1 Announce Type: new Abstract: The use of Large Language Models (LLMs) is proliferating, yet their performance has been observed to vary with prompting style and tone. In this study, we investigate whether and how tonal variations in prompts lead to disparate LLM accuracy on objective multiple-choice questions. We use two datasets: a 50-question base dataset with five tone variants and a 570-question MMLU subset spanning 57 subjects with seven tone variants. We evaluate the performance of four cost-efficient, popular LLMs: ChatGPT-4o, ChatGPT-5-nano, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite. Across models, tonal effects are systematic but highly model-dependent. Some models show small yet statistically significant shifts, while others exhibit large accuracy swings across tones. Further, we identify subject-level differences in tone sensitivity and present a routing framework to explain how tones may attune internal reasoning modes. Our findings caution users against assuming tone-robust reliability in LLM deployments.

Editor's pick · Education
Arxiv· Today

Generalizing a Highly Configurable Analytics Pipeline to Replicate and Support Educational Research Across Multiple Domains

arXiv:2605.30303v1 Announce Type: new Abstract: Artificial intelligence assistants deployed in online learning environments create new opportunities to collect large volumes of learner interaction data and generate insights to improve student outcomes. Architecture for AI-Augmented Learning (A4L) is a modular data architecture that enables the collection, integration, and analysis of learner interaction data from educational AI systems, supporting the generation of instructional insights that facilitate personalized learning and reinforce the bidirectional feedback loop between instructors and learners. This study examines the modular design of the A4L Data Analytics Pipeline, an extensible data infrastructure that enables the ingestion, processing, and analysis of heterogeneous datasets generated by educational AI assistants. We describe the design principles and development process used to extend the pipeline's analytical capabilities while preserving flexibility across domains. We evaluate the pipeline through case studies spanning three research domains corresponding to three educational AI assistants deployed in online learning environments at Georgia Tech. Results show that a common set of statistical analysis methods can be consistently applied across datasets with differing structures and instructional contexts, enabling the pipeline to reproduce key analytical findings across domains. We demonstrate how analytical capabilities initially developed for one domain can be extended to support richer analyses in another, illustrating the pipeline's extensibility. These findings suggest that the A4L Analytics Pipeline can serve as reusable infrastructure for analyzing data generated by future educational AI assistants. By enabling analytics that can be systematically extended to new domains, the pipeline provides a foundation for deriving insights that inform the design and evaluation of educational AI systems.

AI Productivity Evidence · 2 articles
Editor's pick
Arxiv· Today

From Augmentation to Reconstruction: Guiding the AI Disruption to the Good Place

arXiv:2605.29207v1 Announce Type: new Abstract: Artificial intelligence feels omnipresent, yet the disruption many expect has not fully arrived. The main reason is not model capability, nor even the tools built to harness those models. Rather, most organizations are still using AI to accelerate workflows designed for a pre-AI world. We offer a three-stage lens: Augmentation, Automation, and Reconstruction, and argue that the most consequential disruption resides in the third stage where workflows and markets are rebuilt around delegation, machine-to-machine interaction, continuous monitoring, and auditable constraints. Achieving this system-level transformation takes time: it requires trust and accountability infrastructure, machine-legible and interoperable data and interfaces, the design and adoption of these new workflows, and economic incentives that favor reconstruction rather than local optimization: the complementary investments that produce the familiar "productivity J-curve" of general-purpose technologies. We illustrate this transition through examples in consumer markets, education, news, and coding. Finally, we emphasize a normative point: the agentic future is not predetermined. Leaders must both skate to where the puck is going and actively steer it toward a good place, ensuring innovation delivers welfare gains felt by businesses and consumers around the world.

Editor's pick · Professional Services
Arxiv· Today

Offloading Score: Measuring AI Reliance Through Counterfactual Workflows

arXiv:2605.29392v1 Announce Type: cross Abstract: AI tools are increasingly integrated into real-world workflows. However, existing measures of reliance on these tools focus on AI output adoption or on self-reported indicators rather than on how task effort is distributed between users and tools. Here, we introduce the offloading score, a measure of reliance that quantifies the fraction of cognitive effort offloaded to an AI tool. The offloading score is simulation-based: we construct a counterfactual workflow by estimating how the user would have completed the task without the tool, and then compute the fraction of steps saved by using the tool. We validate the offloading score through intrinsic evaluations of metric validity and a controlled user study (n = 40) with developers performing programming tasks using AI tools. We vary time pressure to test whether reliance measures capture the known increase in reliance under time pressure. We show that the offloading score detects significantly higher reliance in time-constrained settings (+43%, p = 0.018), while usage-based and self-reported baseline measures of reliance do not distinguish the conditions. We complement this with descriptive insights showing that higher reliance manifests as greater delegation of subtasks to the tool and more direct reuse of AI outputs. Finally, we demonstrate how the offloading score can be combined with target outcomes of a task (e.g., code understanding) to identify when reliance may be (in)appropriate. Our framework offers two contributions: an instrument users can apply to measure and reflect on their own reliance, and a quantitative signal that agent designers can use to mitigate overreliance.
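Once the counterfactual workflow has been estimated, the score itself is a ratio: the share of counterfactual steps the user did not have to perform. The sketch below assumes workflows are reduced to step counts and clips sessions where the tool added steps to zero; both are assumptions for illustration. The hard part, simulating the no-tool counterfactual, is not reproduced.

```python
from statistics import mean


def offloading_score(counterfactual_steps: int, steps_with_tool: int) -> float:
    """Fraction of the estimated no-tool steps that the user did not have to perform."""
    if counterfactual_steps <= 0:
        raise ValueError("the counterfactual workflow must contain at least one step")
    saved = counterfactual_steps - steps_with_tool
    # Assumption: a session where the tool added steps counts as zero offloading.
    return max(0.0, saved / counterfactual_steps)


def mean_offloading(sessions: list[tuple[int, int]]) -> float:
    """Average score over (counterfactual_steps, steps_with_tool) pairs."""
    return mean(offloading_score(c, u) for c, u in sessions)
```

For example, hypothetical sessions (20, 10) and (30, 18) score 0.5 and 0.4, for a mean of 0.45; comparing such means across time-pressure conditions is the kind of contrast the study reports.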

Geopolitics, Policy & Governance

16 articles
AI National Strategy · 5 articles
Editor's pick · Defense & National Security
Arxiv· Today

Does Distributed Training Undermine Compute Governance?

arXiv:2605.29359v1 Announce Type: new Abstract: Compute governance proposals often rely on the assumption that frontier AI training requires large, detectable computing clusters. However, recent advances in distributed training algorithms could allow developers to conduct frontier-scale training on distributed agglomerations of hardware, rather than needing large datacenter facilities. Developers who prefer not to be constrained by regulations may structure their hardware in a manner that evades the registration and monitoring requirements associated with compute governance. Therefore, regulations must be designed to detect and prevent illicit distributed training operations. This paper evaluates the feasibility of such evasion and outlines recommended countermeasures, including whistleblowing, chip tracking, forensic accounting, and memory and compute thresholds for clusters.

Editor's pick · PAYWALL · Government & Public Sector
FT· Today

Not using AI in public services would mean ‘choosing decline’, UK minister warns

Newly appointed chief secretary to the Treasury Lucy Rigby wants to roll out technology across Whitehall

Editor's pick · Energy & Utilities
VentureBeat· Yesterday

Control within connection: How data sovereignty is rewriting the rules of critical infrastructure

Presented by Equinix

Digital systems are central to economic resilience. But the governance models supporting them were designed for a bygone era, when systems were smaller, often centralized, and rarely crossed multiple jurisdictions. This structural mismatch is driving a realization across boardrooms and governments that data sovereignty is not only core to critical infrastructure but also shapes the trajectory of the global economy.

The scale of change is forcing the issue. IDC projects that the global datasphere will continue to grow at an extraordinary pace, driven by AI workloads, real-time analytics, and always-on digital services. This is placing unprecedented demands on data center capacity, interconnection density, and operational reliability, a trend highlighted by both McKinsey and Goldman Sachs last year. More data means demand for more infrastructure. Infrastructure expansion means more interconnected systems. And more interconnected systems mean greater exposure when control is unclear. That is why sovereignty is now coming into focus for nation states and private-sector actors alike. It is more than an abstract legal concept: there are practical questions about who has authority when systems span countries, clouds, and ecosystems.

Control determines resilience in a fragmented world

Infrastructure resilience has always depended on clarity. Power grids work because ownership, responsibility, and control are well understood by stakeholders and the public. The same principle should apply to digital infrastructure, even if the underlying systems look very different. Data sovereignty aligns authority with accountability. Organizations retain decision-making power over where data lives, how it moves, who can access it, and which technologies are allowed to touch it. When something breaks or regulators ask difficult questions, there is no ambiguity about who is responsible.
Gartner’s Top Strategic Technology Trends for 2026 underscores this shift by emphasizing that modern infrastructure is inseparable from governance, resilience, and digital trust. Treating sovereignty as a bolt-on compliance requirement rather than an architectural principle is proving insufficient. The challenge, of course, is that modern enterprises cannot simply look inward and ignore macro circumstances. Scale, performance, and innovation depend on participation in global digital ecosystems.

A false paradox: scale vs. authority

For years, organizations were told they had to choose: either maintain tight control and accept limited connectivity, or embrace global platforms and accept reduced authority over data flows and infrastructure decisions. Neither option holds up under real-world conditions. Financial services firms require low-latency access to markets across regions while adhering to strict regulatory expectations. Healthcare organizations must keep secure control of their data without walling themselves off from cloud-based analytics and AI innovation. Governments demand digital services that scale while remaining auditable and transparent.

This tension is why simplistic sovereignty narratives fail to pass muster. Sovereignty is more nuanced than isolation: the concept means control within connection. The distinction is becoming clearer as hyperscalers, regulators, and enterprises sharpen their approaches. Public disclosures from leading hyperscalers show how sovereign cloud offerings attempt to address data residency and operational separation. However, most large organizations recognize that long-term control cannot rely on any single provider or managed platform alone.

A distinction of responsibility leads to an industry inflection point

The infrastructure strategies showing the most durability share a common theme: clean separation between infrastructure operations and data authority.
In this model, providers are responsible for running highly resilient facilities, physical security, power, cooling, and high-performance interconnection at scale. Customers are fully in control of their data, applications, security posture, and governance decisions. Authority stays with the party that owns the risk.

This is where neutral infrastructure platforms like Equinix come in: not as a cloud service provider, but as an interconnected foundation where customers deploy and control their own environments while accessing a broad ecosystem of networks, clouds, and partners. Equinix views sovereignty as customer-controlled by design, with clear boundaries around possession, custody, and control. That approach is in high demand in regulated industries. The benefits show up in auditability, legal clarity, and operational confidence. Trust comes with verification: when responsibilities are clear, compliance is verifiable rather than assumed.

Ambiguity is unacceptably expensive for AI workloads

Artificial intelligence accelerates these dynamics. AI systems are data-hungry and regulation-sensitive, a combination that leaves little room for governance shortcuts. Financial institutions like Bank of America and Morgan Stanley have forecast that AI-driven data center growth will place new pressure on infrastructure planning, energy availability, and geographic distribution. At the same time, AI models need to operate close to sensitive data rather than exporting that data across borders for centralized processing.

Without a clear sovereignty framework, organizations face difficult compromises. With one, they gain flexibility. Models move to data. Data remains controlled. Innovation accelerates without triggering regulatory alarms. That balance is emerging as a competitive differentiator.

Infrastructure in 2026 looks different, and expectations are reset

The critical infrastructure powering the digital economy goes beyond physical assets.
It now includes governance models, legal posture, and control structures that determine how systems behave under pressure. European Commission updates to data sovereignty and digital strategy frameworks reflect this, as governments increasingly treat data governance as a matter of economic and national resilience. Deloitte’s digital sovereignty research for 2026 echoes that theme across global enterprises, especially those operating in multiple regulatory regimes.

The organizations adapting fastest are not retreating from global connectivity. Rather, they are designing for it and embedding sovereignty as an architectural requirement. As enterprises navigate more fragmented regulatory environments, the ability to maintain jurisdictional control across interconnected digital ecosystems is a baseline infrastructure expectation rather than a specialized requirement.

That expectation is now shaping how infrastructure is built. Enterprises increasingly require network-level sovereignty enforcement that operates automatically across hybrid multicloud environments, including during outages, failovers, and congestion events, when data can cross borders invisibly. Capabilities such as Equinix Fabric Geo Zones reflect that demand, delivering what Equinix describes as the first network-level, multicloud sovereignty enforcement layer built natively into the interconnection fabric itself.

The rules of infrastructure are being rewritten. Data sovereignty is the architectural foundation that resilient, globally connected enterprises demand. Organizations that treat it as such will be better equipped to operate, compete, and withstand pressure. Those that do not will find the ambiguity of the status quo increasingly costly.

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

AI Policy & Regulation · 8 articles
Editor's pick · Government & Public Sector
arXiv · Today

When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis

arXiv:2605.29025v1 (announce type: new)

Abstract: Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, where the model's organization of the record shapes what policymakers see and which arguments register. Standard evaluation, anchored on stance accuracy against a small validated set, cannot detect when different models produce materially different categorizations of the same public input. We propose an Interpretive Audit Pipeline that treats multi-model disagreement as diagnostic of interpretive complexity and directs human review toward genuinely ambiguous public input. Analyzing 1,260 public comments on a federal USDA docket across four LLMs, we find that inter-model thematic divergence exceeds within-model prompt variation, and that an expert rubric suppresses deep interpretive disagreement without resolving it. In a two-stage labeling study on a stratified 40-comment subsample, four LLMs and a human annotator labeled independently and then revised after seeing the others' labels. Revision behavior varied across labelers, and the human annotator's revisions frequently introduced framings absent from the ensemble's collective output. We argue disagreement-based evaluation is a necessary complement to accuracy metrics for LLM-assisted interpretive coding.

Editor's pick · Paywall · Government & Public Sector
FT · Yesterday

How to close AI’s accountability loophole

Governance of new technologies must be determined by elected officials rather than by the fastest-moving companies.

Editor's pick · Pharma & Biotech
arXiv · Today

The Biosecurity Blind Spot: Systematic Dual-use Detection in Open Science Infrastructure

arXiv:2605.28843v1 (announce type: cross)

Abstract: AI is transforming life sciences research at unprecedented speed, accelerating discovery across protein structure prediction, genome modeling, and drug development (Jumper et al., 2021; Mak et al., 2024). Yet this rapid advancement, coupled with the open science movement, introduces significant dual-use research concerns that have received limited empirical scrutiny. Here we present the first systematic analysis of dual-use research of concern (DURC) content on open preprint servers. We screened ~52,000 bioRxiv preprints (2024–2025) using a hybrid pipeline of lexical filtering and large language model (LLM) evaluation, scoring metadata across nine DURC, three PEPP, and five governance categories aligned with U.S. and Australia Group oversight frameworks. Our analysis reveals that dual-use-adjacent knowledge is routinely present in openly accessible titles and abstracts, often exceeding established risk thresholds even in studies with legitimate public health objectives. While this mapping captures surface-level information diffusion, it does not measure operational capability, downstream misuse potential, or the substantial technical and biosafety barriers that constrain harmful application. We argue that institutional review processes, funding requirements, and preprint platform policies must evolve to incorporate proactive, metadata-level monitoring without compromising scientific transparency. Ultimately, harmonizing controlled-access mechanisms for high-risk methodologies with open summaries of scientific contributions offers a pragmatic framework for governing AI-accelerated biology at scale.

Editor's pick · Paywall · Government & Public Sector
Washington Post · Today

Opinion | Elizabeth Warren’s AI plan is to raise taxes and stifle innovation

President Ronald Reagan said in 1986, “Government’s view of the economy could be summed up in a few short phrases: If it moves, tax it. If it keeps moving, regulate it. And if it stops moving, subsidize it.”

Best Practice AI · © 2026 Best Practice AI Ltd. All rights reserved.
