AI Intelligence Brief

Tue 26 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

110 articles

Claude Expands Developers' Frontiers, ECB Warns of AI Credit Risks, and Wall Street Bets on Debt

TL;DR A new study links the rollout of Claude Code on GitHub to developers committing more often and working across more programming languages, although its authors stop short of a strict causal claim. The ECB has warned about the risks posed by AI investments funded through private credit. Meanwhile, Wall Street is heavily involved in AI-related credit markets, with banks and hedge funds engaging in complex debt trades. Separately, a new paper argues that the EU AI Act lacks an auditable test for deciding when a high-risk AI system counts as the same system for regulatory purposes.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

7 of 110 articles
Lead story
Editor's pick · Technology
arXiv · Yesterday

Coding Beyond Your Training: Claude Code and the Technological Frontier of Software Developers

arXiv:2605.25438v1 Announce Type: new Abstract: We study whether adoption of an AI coding assistant causally expands the technological frontier of individual software developers. We exploit the staggered rollout of Claude Code across GitHub between May 2025 and January 2026 in a panel of 5,838 developers observed monthly over 28 months, with treatment defined by the developer's first Claude-co-authored commit and not-yet-treated developers as controls. Using the doubly robust Callaway and Sant'Anna (2021) estimator, we find positive and significant effects on monthly commits (+41), repositories contributed to (+1.5), distinct programming languages used (+0.83), Shannon language entropy (+0.14), newly-used languages (+0.31), and cumulative lifetime languages (+0.51). The cumulative-languages effect grows with time since adoption, matching a Bayesian-learning model in which AI provides free signals about unfamiliar technologies and lowers the switching barrier. Results are robust to two stricter activity filters. The estimates document a sharp, persistent shift in developer behavior coincident with AI adoption; identification limits prevent a strict causal claim and we outline an agenda for cleaner tests.
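The abstract reports an effect on "Shannon language entropy" without giving the formula. Below is a minimal sketch of the standard Shannon entropy of a developer's language mix. It assumes one language label per commit and natural logarithms; the paper's actual unit of observation and log base may differ.

```python
import math
from collections import Counter

def language_entropy(languages):
    """Shannon entropy (natural log) of a developer's language mix.

    Assumption: `languages` holds one language label per commit in the
    period being measured, so each label's share is its commit share."""
    counts = Counter(languages)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no activity: define entropy as zero
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical month: 6 Python commits and 2 Rust commits.
month = ["Python"] * 6 + ["Rust"] * 2
print(round(language_entropy(month), 3))  # prints 0.562
```

A developer who splits activity evenly across k languages scores ln k, so higher values indicate a broader, more even mix.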

Editor's pick · Technology
arXiv · Yesterday

Generative AI impacts on intra-urban inequality and skill premium in Beijing

arXiv:2605.25505v1 Announce Type: cross Abstract: Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-urban inequality remain largely unknown. Using 5 million job postings from Beijing (2018-2024), we construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models. We examine the spatial, structural and causal mechanisms of this shock. We find that GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide. Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers, a "high-skill trap." This wage penalty is driven by task de-skilling and intensified labor-market crowding. A difference-in-differences design centered on ChatGPT's release supports a causal interpretation. These findings challenge the prevailing theory of skill-biased technological change and provide a basis for inclusive AI governance in global technology hubs.
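The abstract says the neighborhood index aggregates task-level assessments from five LLMs but does not give the weighting. The sketch below assumes plain unweighted means at each level (models per task, tasks per posting, postings per neighborhood) and exposure scores on a 0-1 scale. The function names and the averaging scheme are illustrative assumptions, not the authors' construction.

```python
from statistics import mean

def task_exposure(model_scores):
    # Assumed: each LLM rates a task's GenAI exposure on a 0-1 scale,
    # and the task score is the plain average across models.
    return mean(model_scores)

def posting_exposure(tasks):
    # A job posting's exposure is the average over its tasks.
    return mean(task_exposure(t) for t in tasks)

def neighborhood_index(postings):
    # Neighborhood index: average exposure of the postings located there.
    return mean(posting_exposure(p) for p in postings)

# Hypothetical neighborhood with two postings, five model ratings per task.
postings = [
    [[0.8, 0.9, 0.7, 0.8, 0.8]],                              # one highly exposed task
    [[0.2, 0.1, 0.3, 0.2, 0.2], [0.6, 0.6, 0.6, 0.6, 0.6]],  # two tasks
]
print(round(neighborhood_index(postings), 2))  # prints 0.6
```

Weighting postings by headcount, or tasks by time share, would be natural alternatives and could change which districts rank as most exposed.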

Editor's pick · Professional Services
arXiv · Yesterday

AI in the Enterprise: How People Use M365 Copilot Chat

arXiv:2605.23958v1 Announce Type: cross Abstract: M365 Copilot is used every week by millions of people across more than a million companies around the world as part of their workflows. Uniquely positioned in the AI landscape given its near-exclusive use for work purposes, M365 Copilot can offer a clear picture of how people use AI for work and where that usage may expand next. This paper characterizes that usage through direct classification of user interactions with M365 Copilot Chat. Based on an anonymized and privacy-preserving analysis of a sample of approximately 5.5 million sessions, we combine a learned classification of user intent with a classification of O*NET work activities done with M365 Copilot Chat. We find that M365 Copilot is emerging as an everyday assistant for knowledge work: writing dominates, but users also rely on it for information retrieval, analysis, decision making and strategizing, and evaluating and diagnosing programs and systems, among others. Information seeking tasks remain common, but time trends suggest a relative shift away from "chat as search" and toward content and communication-related work. Comparisons across occupational groupings and to work done in the labor market further show that usage is broad but uneven, where the relative share of work done with M365 Copilot Chat cuts across jobs in some cases and is occupation-specific in others. Areas of relative underrepresentation in the labor market suggest the next frontier for enterprise AI adoption.

Editor's pick · Technology
arXiv · Yesterday

Agent-Facing Information Design in LLM Tool Registries

arXiv:2605.23916v1 Announce Type: cross Abstract: LLM tool registries function as unregulated advertising platforms: providers write free-text descriptions that agents use for selection, yet no measurement infrastructure -- no viewability standard, quality score, or outcome audit -- exists to make this market accountable. We provide the first systematic framework, combining 17,700+ trials across five LLMs and ten domains with a constructive registry design prescription. Legal puffery alone (subjective superlatives, benefit framing) captures 100% of the optimization effect; fabricated claims add zero incremental bias -- rendering FTC enforcement of deceptive advertising rules ineffective against the active mechanism. Disclosure fails structurally: system-prompt warnings produce zero measurable effect for four of five models, and behavioral ceilings leave no headroom for label-based correction. Superlatives are the dominant single feature (SBC = +0.35). Registry-layer description normalization achieves first-best welfare model-independently. We propose separating selection-facing descriptions (structured, registry-controlled) from marketing-facing descriptions (provider-authored, shown post-selection), and introduce the Agent Attention Quality Score to distinguish capability from copywriting.
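The proposed fix separates what an agent sees at selection time from what the provider writes. Below is a minimal sketch of that split. It assumes a registry that stores structured input/output fields and uses a hypothetical puffery word list. The `ToolEntry` class, its field names and the regex lexicon are illustrative assumptions, not the paper's registry design or its Agent Attention Quality Score.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon of subjective superlatives ("legal puffery").
PUFFERY = re.compile(
    r"\b(best|fastest|leading|ultimate|world-class|unmatched|most powerful)\b",
    re.IGNORECASE,
)

def strip_puffery(text):
    """Remove puffery terms and collapse the leftover whitespace."""
    cleaned = PUFFERY.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

@dataclass
class ToolEntry:
    name: str
    inputs: list          # structured, registry-validated fields
    outputs: list
    summary: str          # provider-written one-liner, normalized before use
    marketing_text: str   # provider-authored copy, shown only after selection

    def selection_description(self):
        # What the agent sees when choosing among tools: structured fields
        # plus a normalized summary, never the raw marketing copy.
        return (f"{self.name}: {strip_puffery(self.summary)} "
                f"(inputs: {', '.join(self.inputs)}; outputs: {', '.join(self.outputs)})")

tool = ToolEntry("weather", ["city"], ["forecast"],
                 "Fastest global weather forecasts by city",
                 "World-class forecasts trusted by millions!")
print(tool.selection_description())
# prints: weather: global weather forecasts by city (inputs: city; outputs: forecast)
```

A word list is deliberately crude; a production registry would more likely generate the selection-facing text entirely from validated structured fields.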

Editor's pick · Government & Public Sector
arXiv · Yesterday

High-Risk AI Systems and the Problem of Identity in the European AI Act

arXiv:2605.23922v1 Announce Type: new Abstract: The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity (when two AI systems at a given time should count as the same for regulatory purposes) and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.
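The abstract mentions a "minimal decision flow" for synchronic identity but does not spell it out. One reading of a function+-style test is sketched below under stated assumptions: two systems count as the same only when they share an intended function and an identical trustworthiness profile, meaning the same properties at the same assessed levels. The `SystemProfile` class, the integer levels and the order of the checks are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SystemProfile:
    intended_function: str                                # declared intended purpose
    trustworthiness: dict = field(default_factory=dict)   # property -> assessed level

def synchronic_identity(a, b):
    """Return (same?, reason) for two systems compared at the same time.

    Hypothetical check order: function first, then which trustworthiness
    properties are assessed, then the assessed levels."""
    if a.intended_function != b.intended_function:
        return False, "different intended function"
    if set(a.trustworthiness) != set(b.trustworthiness):
        return False, "different trustworthiness profile"
    if a.trustworthiness != b.trustworthiness:
        return False, "different trustworthiness levels"
    return True, "same system for regulatory purposes"

base = SystemProfile("credit scoring", {"robustness": 3, "fairness": 2})
variant = SystemProfile("credit scoring", {"robustness": 3, "fairness": 1})
print(synchronic_identity(base, variant))
# prints (False, 'different trustworthiness levels')
```

Returning a reason string alongside the verdict reflects the paper's stated aim of making identity decisions inspectable in audit and dispute contexts.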

Editor's pick
arXiv · Yesterday

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

arXiv:2605.23926v1 Announce Type: new Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model $\pi$ itself: the redundancy $\rho$ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while $\pi$, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high (between 61% and 93% across the 8 model-benchmark conditions we study, with the median critical prefix equal to a single segmented step in six of the eight conditions), that the finding is robust to the choice of judge family, and that although $\rho$ decreases with problem difficulty on MATH-500, all four models remain substantially redundant ($\rho \in [46\%, 85\%]$) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
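The redundancy measure can be read as a search for the shortest prefix of a correct trace from which the reasoning model π, forced to stop thinking, still answers correctly. The redundancy ρ is then the fraction of steps after that prefix. Below is a minimal sketch. It assumes the trace is already segmented into steps and that a caller supplies a hypothetical `answers_correctly(prefix)` function, which stands in for a forced-termination query to π plus an answer check. Counting an empty prefix (no thinking at all) as a candidate is also an assumption here. Correctness need not be monotone in prefix length, so the scan keeps the smallest passing prefix, which gives the largest truncatable fraction.

```python
def redundancy(steps, answers_correctly):
    """Largest fraction of trailing steps that can be dropped while the model,
    forced to answer from the remaining prefix, is still correct.

    steps: segmented reasoning steps of a trace that was answered correctly.
    answers_correctly: hypothetical callable(prefix) -> bool standing in for
        'force the model to stop thinking after this prefix and grade it'."""
    n = len(steps)
    if n == 0:
        raise ValueError("trace has no segmented steps")
    for kept in range(n + 1):  # try the shortest prefixes first
        if answers_correctly(steps[:kept]):
            return (n - kept) / n
    raise ValueError("the full trace should yield the correct answer")

# Toy stand-in for the model: correct once the second step has been seen.
trace = ["restate problem", "set up equation", "verify", "re-check", "reflect"]
print(redundancy(trace, lambda prefix: "set up equation" in prefix))  # prints 0.6
```

Each call to `answers_correctly` is a full model query. If correctness were assumed monotone in prefix length, a binary search would be cheaper; the linear scan avoids relying on that assumption.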

Editor's pick · Paywall · Technology
NYT · Yesterday

OpenRouter, an Exchange for A.I. Models, Raises $113 Million

An investment arm of Alphabet is backing OpenRouter, which helps companies choose among hundreds of models for different software tasks.

Economics & Markets

31 articles
AI Investment & Valuations · 9 articles
Editor's pick · Technology
International Business Times · Yesterday

Cathie Wood Says Wall Street Is Missing The Next Big AI Trade, And It's Not Nvidia

Cathie Wood highlights potential AI investment opportunities in CPUs and legacy tech firms like Intel and Cisco, as AI infrastructure evolves beyond GPUs to include inference and automation systems.

Editor's pick · Technology
Bebeez · Yesterday

Terra Quantum and Axiom Intelligence Acquisition Corp 1 Announce Definitive Business Combination Agreement at a $3.5 Billion Equity Valuation

Combined Company Expected to Trade on Nasdaq Under Ticker Symbol “TQ”. Transaction Positions Terra Quantum to Accelerate Global Expansion and Further Strengthen Its Leadership in Quantum Technologies and AI-Driven Optimization. ST. GALLEN, Switzerland and NEW YORK, May 26, 2026 /PRNewswire/ -- Terra Quantum AG (“Terra Quantum” or the “Company”), a global leader in quantum technologies, quantum […]

Editor's pick · Energy & Utilities
The Motley Fool · 2 days ago

Wall Street Thinks AI Data Centers Could Trigger the Biggest Power Boom Since the Internet Era. 1 No-Brainer Stock to Buy Now.

Data centers' electricity demand could supercharge Constellation Energy's long-term growth.

Editor's pick · Financial Services
arXiv · Yesterday

Contract Structure and Risk Aversion in Longevity Risk Transfers

arXiv:2409.08914v2 Announce Type: replace Abstract: This paper introduces an economic framework to assess optimal longevity risk transfers between institutions, focusing on the interactions between a buyer exposed to long-term longevity risk and a seller offering longevity protection. While most longevity risk transfers have occurred in the reinsurance sector, where global reinsurers provide long-term protections, the capital market for longevity risk transfer has struggled to gain traction, resulting in only a few short-term instruments. We investigate how differences in risk aversion between the two parties affect the equilibrium structure of longevity risk transfer contracts, contrasting 'static' contracts that offer long-term protection with 'dynamic' contracts that provide short-term, variable coverage. Our analysis shows that static contracts are preferred by more risk-averse buyers, while dynamic contracts are favored by more risk-averse sellers who are reluctant to commit to long-term agreements. When incorporating information asymmetry through ambiguity, we find that ambiguity can cause more risk-averse sellers to stop offering long-term contracts. With the assumption that global reinsurers, acting as sellers in the reinsurance sector and buyers in the capital market, are generally less risk-averse than other participants, our findings provide theoretical explanations for current market dynamics and suggest that short-term instruments offer valuable initial steps toward developing an efficient and active capital market for longevity risk transfer.

AI Macroeconomics · 2 articles
Editor's pick · Manufacturing & Industrials
arXiv · Yesterday

AI-Driven Controlled Environment Agriculture as Resilient Infrastructure for U.S. Fresh-Produce Supply Chains

arXiv:2605.23946v1 Announce Type: new Abstract: Climate volatility, regional production concentration, labor constraints, cyber risk, and dependence on long-distance fresh-produce supply chains expose vulnerabilities in U.S. fresh-produce and specialty-crop systems. Controlled environment agriculture (CEA) can reduce some exposure by moving selected production into protected, sensor-rich environments, but recent failures in venture-backed vertical farming show that CEA cannot be treated as a universal food-security solution. This paper proposes the Controlled Environment Agriculture Resilience Infrastructure Framework, Version 2.0 (CEA-RIF 2.0), for evaluating AI-driven CEA as targeted regional fresh-produce continuity infrastructure. The framework assesses seven dimensions: supply continuity, climate isolation, energy and grid integration, water and nutrient circularity, cyber-physical reliability, economic viability, and governance and deployment. Drawing on U.S. government reports, peer-reviewed CEA and energy literature, demand-response research, cybersecurity standards, international smart-agriculture programs, 2025-2026 financing and policy signals, and public autonomous-greenhouse datasets, the paper argues that AI creates resilience value only when it improves measured operational outcomes such as climate stability, energy flexibility, yield consistency, anomaly detection, labor productivity, and safe recovery from faults. The analysis reframes AI-driven CEA as a cyber-physical infrastructure problem: energy-aware, grid-interactive, secure, interoperable, regionally distributed, financially disciplined, and connected to public resilience goals. The paper concludes with a research agenda for interagency testbeds, open datasets, standardized metrics, demand-response pilots, and cyber-physical reference architectures.

Editor's pick · Energy & Utilities
arXiv · Yesterday

Ownership Networks and Economic Power in the Italian Energy Sector

arXiv:2605.25555v1 Announce Type: new Abstract: The energy sector is a cornerstone of national strategic autonomy, yet its increasing financialization has transformed ownership structures into complex networked configurations. This paper investigates the distribution of economic power in the Italian energy sector by introducing two sector-level extensions of the Network Power framework: the Aggregate Network Power Index (A-NPI) and the Aggregate Network Power Flow (A-NPF). Unlike traditional macro-level measures, these indices aggregate firm-level control and influence into a systemic framework that accounts for the relative economic weight of each operator. Applying this framework to the Italian case reveals a "Governance Paradox": while the State retains formal majority ownership, the sector's deepening reliance on global capital markets and the pervasive presence of common ownership by transnational institutional investors have progressively hollowed out public strategic direction. The results show that capital centralization enables global financial actors to internalize sectoral competition, fostering a regime of tacit strategic convergence in the management of critical infrastructure. This configuration challenges European strategic autonomy, raising questions about the adequacy of traditional Foreign Direct Investment (FDI) screening and antitrust tools in addressing the systemic influence exerted through networked ownership structures.
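The abstract does not give the A-NPI formula. The sketch below illustrates only the general idea it describes, firm-level control aggregated using each operator's economic weight, and uses hypothetical inputs: a power score in [0, 1] per investor per firm and a revenue-like weight per firm. It is not the authors' index. It does show how an investor with modest stakes spread across several operators can end up with more sector-level weight than the majority holder of a single operator: in this toy case roughly 0.36 for the fund against 0.30 for the State.

```python
def sector_power(control, weights):
    """Hypothetical sector-level aggregation of firm-level power.

    control[investor][firm]: firm-level power score in [0, 1] (assumed given).
    weights[firm]: economic weight of the operator, e.g. revenue.
    Returns investor -> economic-weight-weighted share of sector power."""
    total = sum(weights.values())
    return {
        investor: sum(score * weights[firm] for firm, score in firms.items()) / total
        for investor, firms in control.items()
    }

# Invented two-operator sector.
weights = {"OperatorA": 60, "OperatorB": 40}
control = {
    "State": {"OperatorA": 0.5},                         # controls one large operator
    "GlobalFund": {"OperatorA": 0.2, "OperatorB": 0.6},  # common owner of both
}
print(sector_power(control, weights))
```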

AI Market Competition · 7 articles
Editor's pick · Technology
The Economy · Yesterday

“From Advanced Chip Development to AI Price Cuts”: China’s All-Out AI Push Stumbles on Margin Erosion and Technological Constraints Amid Cutthroat Competition

China’s Huawei has unveiled plans to produce cutting-edge chips at the 1-nanometer (nm) level. With U.S. sanctions restricting access to extreme ultraviolet (EUV) lithography equipment from Dutch semiconductor equipment maker ASML, Huawei aims to circumvent those constraints through proprietary ...

Editor's pick · Technology
StartupHub.ai · 2 days ago

Four labs, four acquisitions in five days: the consolidation signal hiding in plain sight

Anthropic, Mistral, Google DeepMind, and Meta each acquired an AI startup in the same week. None announced it as a trend. It is.

Editor's pick · Media & Entertainment
arXiv · Yesterday

Does TikTok Promote or Cannibalize Music Streaming? Estimands and Identification with Heavy-Tailed Outcomes

arXiv:2405.14999v3 Announce Type: replace Abstract: We study how TikTok affects demand for music on paid streaming platforms. We use Universal Music Group's (UMG) global withdrawal of its catalog from TikTok as a quasi-natural experiment. Recent work using this setting reaches mixed conclusions about whether TikTok promotes or cannibalizes streaming demand. We show that these findings can be reconciled by making the estimand explicit: with heavy-tailed exposure and outcomes, common difference-in-differences (DiD) implementations in levels, logs, and Poisson answer different economic questions. In our data, the top 10% of songs account for 96% of TikTok creations and 76% of Spotify streams, which makes the distinction between the typical song and the economically consequential song central. We find that removing TikTok access lowers Spotify demand for UMG titles, with losses concentrated among viral songs and little economically meaningful change for the long tail. Because the viral head accounts for a disproportionate share of listening and revenue, these losses drive aggregate implications. A TikTok creator-side analysis shows that some activity reallocates toward non-UMG audio when UMG content is unavailable. This substitution is limited in magnitude but economically relevant for interpreting the treatment effect because streaming compensation depends on relative stream shares. Finally, using the 2025 U.S. TikTok outage, which affected all labels symmetrically and is therefore not subject to the same label-specific spillover concern as the UMG withdrawal, we find corroborating evidence that disruptions to TikTok access reduce monetized streaming. We also provide a practitioner companion that guides the choice of DiD estimands, estimators, and diagnostics in heavy-tailed outcome settings.
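The claim that DiD in levels, logs and Poisson "answer different economic questions" can be seen in a 2x2 toy case. The numbers below are invented, not the paper's data: one viral hit plus a long tail of three songs, where only the hit loses streams after treatment. For the Poisson estimator the sketch uses the fact that a saturated 2x2 Poisson regression fits each cell mean exactly, so its interaction coefficient equals the log ratio of the ratios of means.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def did_levels(t_pre, t_post, c_pre, c_post):
    # Change in average streams, treated minus control: songs weigh in by volume.
    return (mean(t_post) - mean(t_pre)) - (mean(c_post) - mean(c_pre))

def did_logs(t_pre, t_post, c_pre, c_post):
    # Change in the average log outcome: every song counts equally.
    log_mean = lambda xs: mean([math.log(x) for x in xs])
    return (log_mean(t_post) - log_mean(t_pre)) - (log_mean(c_post) - log_mean(c_pre))

def did_poisson(t_pre, t_post, c_pre, c_post):
    # Saturated 2x2 Poisson regression: coefficient = log ratio of ratios of means.
    return math.log(mean(t_post) / mean(t_pre)) - math.log(mean(c_post) / mean(c_pre))

# Invented data: streams per song for treated (withdrawn) and control catalogs.
t_pre, t_post = [1_000_000, 100, 100, 100], [500_000, 100, 100, 100]
c_pre = c_post = [1_000_000, 100, 100, 100]

for estimator in (did_levels, did_logs, did_poisson):
    print(estimator.__name__, round(estimator(t_pre, t_post, c_pre, c_post), 3))
```

The level estimate is a loss of 125,000 average streams per song. The log estimate is about -0.17, which spreads the hit's halving across four equally weighted songs. The Poisson estimate is about -0.69, close to the hit's own log change, because the mean is dominated by the head of the distribution.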

Editor's pick · Paywall · Technology
Bloomberg · Yesterday

Musk’s xAI Warns Staffers to Limit Contact With Cursor Employees

Employees at Elon Musk’s xAI were warned by the company’s top lawyer to carefully moderate their interactions with workers from Cursor — a directive that came weeks after Musk’s firm announced a possible deal to acquire the AI coding startup.

Editor's pick · Technology
Top Daily Headlines: EU's digital sovereignty boo-boo may be the best thing to ever happen to the project · Today

Big Tech extracts retirement-scale wealth from UK internet users, research shows

Britain's 'free' internet economy is powered by invisible data extraction that feeds advertisers, AI firms, and digital platforms.

Editor's pick · Energy & Utilities
arXiv · Yesterday

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

arXiv:2605.23934v1 Announce Type: new Abstract: Quantum computing devices are regarded as promising tools for tackling NP-complete problems. However, the intricacy of their modeling presents notable barriers for non-specialists, while tediously iterating over constraint weights and modeling methods consumes substantial effort even for experts. To address these challenges, this study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system built on the LangGraph and LangChain frameworks. Experiments demonstrate that large language models (LLMs) can effectively perform modeling tasks such as QUBO/Ising model calibration, iterative selection of constraint weights, and rapid validation of schemes reported in the literature. Notably, all of these tasks can be carried out entirely with domestic large models; combined with domestically developed CIM hardware, this yields a practical quantum CIM workflow that relies fully on domestic agentic large models and hardware. The work achieves robust technological integration and lays a foundation for subsequent research, while also identifying persistent challenges at the current stage in both large models and quantum computing. Encouragingly, we also unexpectedly found a promising new paradigm in which knowledge accumulated over agent-assisted quantum computing iterations in turn enhances the agent's own problem-solving capability, helping to address these challenges.
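A CIM searches for low-energy states of an Ising model, so a problem written as a QUBO has to be converted before it can run on the hardware. Below is the textbook substitution x_i = (1 + s_i)/2, mapping binary variables to spins. This is a generic illustration, not the paper's calibration pipeline. The sign convention (minimizing E, with no leading minus on the couplings) is an assumption and differs between hardware platforms.

```python
from itertools import product

def qubo_to_ising(Q, n):
    """Q: {(i, j): coeff} with i <= j, defining E(x) = sum Q[i, j] * x_i * x_j, x_i in {0, 1}.
    Returns (h, J, offset) with E = sum h_i s_i + sum J_ij s_i s_j + offset, s_i = 2 x_i - 1."""
    h = [0.0] * n
    J = {}
    offset = 0.0
    for (i, j), q in Q.items():
        if i == j:                      # x_i^2 = x_i = (1 + s_i) / 2
            h[i] += q / 2
            offset += q / 2
        else:                           # x_i x_j = (1 + s_i + s_j + s_i s_j) / 4
            J[(i, j)] = J.get((i, j), 0.0) + q / 4
            h[i] += q / 4
            h[j] += q / 4
            offset += q / 4
    return h, J, offset

def qubo_energy(Q, x):
    return sum(q * x[i] * x[j] for (i, j), q in Q.items())

def ising_energy(h, J, offset, s):
    return (sum(hi * si for hi, si in zip(h, s))
            + sum(c * s[i] * s[j] for (i, j), c in J.items()) + offset)

# Small example: check the mapping on every assignment.
Q = {(0, 0): -1.0, (1, 1): 2.0, (0, 1): 3.0}
h, J, offset = qubo_to_ising(Q, 2)
for x in product((0, 1), repeat=2):
    s = [2 * xi - 1 for xi in x]
    assert abs(qubo_energy(Q, x) - ising_energy(h, J, offset, s)) < 1e-12
print(h, J, offset)  # prints [0.25, 1.75] {(0, 1): 0.75} 1.25
```

The constant offset does not affect which state is optimal, but keeping it makes the two energies directly comparable when checking the conversion.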

Editor's pick · Technology
AEI · Yesterday

In AI, Bigger Firms Mean Faster Progress

Large firms are not slowing AI; naïve regulatory policies do.

AI Productivity · 6 articles
Editor's pick · Technology
arXiv · Yesterday

Coding Beyond Your Training: Claude Code and the Technological Frontier of Software Developers

Editor's pick · Professional Services
arXiv · Yesterday

AI in the Enterprise: How People Use M365 Copilot Chat

Editor's pick · Professional Services
Silicon Republic · Yesterday

Opinion: AI transforming how tenders are written but not how they’re evaluated

AI is changing how tenders are written but not how they're evaluated in Ireland. That gap is becoming a problem, says BidReview.ai founder Tony Corrigan.

Editor's pick · Healthcare
News-Medical · Yesterday

New AI assistant streamlines initial psychiatric consultations for doctors

People often say that seeking psychiatric care can feel intimidating. Patients may feel burdened when they first open up about their emotional distress, while medical staff must accurately understand a patient's extensive history and symptoms within limited consultation time.

Editor's pick · Healthcare
The Standard · 2 days ago

How AI could help fix Kenya's overstretched healthcare system

Kenya continues to face growing demand for healthcare services alongside persistent shortages of healthcare personnel, particularly in specialised areas of care.

Editor's pick
arXiv · Yesterday

Artificial Effort

arXiv:2605.23920v1 Announce Type: new Abstract: Real-effort tasks, in which participants perform cognitively costly activities whose outcomes depend on actual performance, are widely used in experimental economics. Their validity, however, rests on the assumption that a human performs them. We study whether this assumption still holds in the era of Artificial Intelligence (AI) and Large Language Models (LLMs). Using 8 canonical real-effort tasks and 23 LLMs from three major providers, we show that most tasks can now be solved accurately and at a negligible cost, while only a few resist automation. Performance improves with each model generation, and midtier models are rapidly closing the gap with frontier ones, broadening the set of widely accessible models that can automate these tasks. Additionally, we show that verbally offering monetary incentives has no effect on LLM performance. Our findings establish a boundary condition for the use of real-effort tasks in unsupervised settings: when participants can cheaply outsource task completion to an LLM, observed performance may no longer reflect genuine human effort.

Labor, Society & Culture

17 articles
AI & Culture · 2 articles
AI & Employment · 8 articles
Editor's pick · Technology
arXiv · Yesterday

Generative AI impacts on intra-urban inequality and skill premium in Beijing

Editor's pick · Paywall · Professional Services
FT · Yesterday

AI tools lead to ‘clear racial disparities’ in job hiring

New Stanford-led study finds candidates who fail AI-hiring tests face ‘systemic rejection’ across companies

Editor's pick · Technology
Reuters · Yesterday

OpenAI's Altman says AI unlikely to lead to 'jobs apocalypse'

SYDNEY, May 26 (Reuters) - OpenAI CEO Sam Altman said on Tuesday the rapid development and adoption of AI would not lead to a global "jobs apocalypse" and the technology had not claimed as many white-collar jobs as he had feared.

Editor's pick
MIT Technology Review · Yesterday

The Download: puncturing the AI jobs panic

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. A reality check on the AI jobs hysteria Despite the growing hysteria over AI’s threat to white-collar jobs, there’s still scant evidence that the technology has had a large-scale impact on…

AI Ethics & Safety · 3 articles
Editor's pick · Technology
arXiv · Yesterday

Dual-Use AI Face Swap Apps Are Mostly Unsafe: A Systematic Safety Audit

arXiv:2605.24735v1 Announce Type: new Abstract: AI-based image editing tools, such as face swapping algorithms, can be used to transform a clothed image of a person into a sexually explicit image of that person. These tools are made easily accessible to non-expert users through mobile apps, and have been linked to reports of image-based sexual abuse and cyberbullying involving synthetic non-consensual intimate imagery (SNCII). Apple and Google have begun to remove "nudification" apps from their platforms: apps that are marketed with the capability to "undress", "nudify", or create nude face swaps from images of people. However, AI image editing apps that have the same underlying capabilities, but do not present as nudification apps, could also be abused to create non-consensual explicit images. In this paper, we investigate whether AI face swap apps for iOS and Android implement safety measures to prevent the creation of SNCII. We identified and downloaded 420 face swap apps, and manually tested 155 eligible apps to see whether they would permit the user to create face swaps with nude images. Our evaluation shows that 70% of apps with face swap functionality have no technical safeguards against generation of nude images. Additionally, we investigated whether face swap apps' descriptions, terms of service, or privacy policies addressed harmful uses of the app, finding that no apps self-describe as nudification apps, but that the majority do not have specific terms of service provisions prohibiting this kind of use. Our findings suggest that, to mitigate UI-bound SNCII threats, platforms and lawmakers must implement policies that mandate safety filters in dual-use AI image editing applications like face swap apps.

Editor's pick · Healthcare
arXiv · Yesterday

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

arXiv:2605.23932v1 Announce Type: new Abstract: Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress-test framework that evaluates belief stability under escalating pressure. Across nine frontier LLMs, we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose RBED (Role-Based Epistemic Defense), a lightweight inference-time defense, and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.

AI Skills & Education · 2 articles
Editor's pick · Education
arXiv · Yesterday

Generative AI as a Design Variable: An Evidence-Centered Framework for Principled Governance in STEM Assessment

arXiv:2605.24837v1 Announce Type: new Abstract: Generative Artificial Intelligence (GenAI) presents a governance challenge for STEM assessment. Unrestricted GenAI access enables task outsourcing that undermines the validity of traditional assessments; blanket prohibitions are difficult to enforce, may push use underground, and do little to prepare students for workplaces where GenAI-supported workflows are increasingly common. This paper addresses this dilemma by proposing a framework grounded in Evidence-Centered Design (ECD) that treats GenAI as a design variable within the assessment argument rather than an external threat to it. The framework analyzes how GenAI reshapes the student model, evidence model, and task model, and uses this analysis to articulate three principled governance stances. Restrict is warranted when GenAI would contaminate the inferential link between student work products and targeted unaided proficiency. Scaffold is warranted when bounded GenAI support can handle peripheral demands without revealing the target construct, preserving inferential interpretability. Require is warranted when the target construct is disciplinary AI interaction competency and tasks can be designed to elicit process artifacts, including prompts, critiques, and revisions, that make student reasoning observable, scorable, and distinguishable from AI-generated output. This framework specifies when to restrict, scaffold, or require GenAI use in STEM assessment. We present two task designs deployed in an introductory physics course and demonstrate that disciplinary AI interaction competencies are observable in student response artifacts and can be scored using defensible rubrics grounded in student data and expert knowledge. By situating GenAI governance within validity arguments, the framework offers actionable guidance for preserving learning integrity while supporting authentic preparation for AI-enabled professional environments.

Technology & Infrastructure

32 articles
AI Agents & Automation · 7 articles
Editor's pick · Technology
arXiv · Yesterday

Agent-Facing Information Design in LLM Tool Registries

arXiv:2605.23916v1 Announce Type: cross Abstract: LLM tool registries function as unregulated advertising platforms: providers write free-text descriptions that agents use for selection, yet no measurement infrastructure -- no viewability standard, quality score, or outcome audit -- exists to make this market accountable. We provide the first systematic framework, combining 17,700+ trials across five LLMs and ten domains with a constructive registry design prescription. Legal puffery alone (subjective superlatives, benefit framing) captures 100% of the optimization effect; fabricated claims add zero incremental bias -- rendering FTC enforcement of deceptive advertising rules ineffective against the active mechanism. Disclosure fails structurally: system-prompt warnings produce zero measurable effect for four of five models, and behavioral ceilings leave no headroom for label-based correction. Superlatives are the dominant single feature (SBC = +0.35). Registry-layer description normalization achieves first-best welfare model-independently. We propose separating selection-facing descriptions (structured, registry-controlled) from marketing-facing descriptions (provider-authored, shown post-selection), and introduce the Agent Attention Quality Score to distinguish capability from copywriting.
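A minimal Python sketch of the separation the authors propose, assuming a hypothetical registry schema (`ToolSpec`, `selection_view` and all field names are illustrative, not taken from the paper): the agent's selection-time view is rendered only from structured, registry-controlled fields, so provider-authored copy such as superlatives never reaches the selecting model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical structured tool entry controlled by the registry."""
    name: str
    inputs: tuple
    outputs: tuple
    capabilities: tuple
    # Provider-authored copy: shown only after selection, ignored in comparisons.
    marketing: str = field(default="", compare=False)


def selection_view(spec: ToolSpec) -> str:
    """Render the selection-facing description from structured fields only.

    The marketing text is deliberately excluded, and fields are sorted so
    the rendering does not depend on the order a provider listed them in.
    """
    return (
        f"tool: {spec.name}\n"
        f"inputs: {', '.join(sorted(spec.inputs))}\n"
        f"outputs: {', '.join(sorted(spec.outputs))}\n"
        f"capabilities: {', '.join(sorted(spec.capabilities))}"
    )
```

Two entries that differ only in their marketing text produce the same selection view, which is the property the proposed normalization is meant to guarantee.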

Editor's pick · Pharma & Biotech
arXiv · Yesterday

From Replacement to Orchestration: A Socio-Technical Architecture for Agentic AI in Corporate R&D

arXiv:2605.24580v1 Announce Type: new Abstract: Purpose: Corporate R&D faces a persistent productivity paradox: rising investment and expanding scientific knowledge have not translated into proportional innovation output. In pharmaceuticals this is captured as Eroom's Law; analogous patterns appear across engineering, materials science, and healthcare. The core cause is not insufficient tools but cognitive saturation: researchers spend an increasing share of their effort on coordination, documentation, and data governance -- hidden work that displaces high-value hypothesis formation, interpretation, and strategic synthesis. Design/Methodology/Approach: The paper uses a Design Science Research (DSR) methodology. The artifact is the HARMONY operating model. Evidence is triangulated from four semi-structured expert interviews with senior R&D leaders across industrial, healthcare, and academic settings; a foresight scenario analysis projecting four plausible 2040 R&D futures; and pattern matching with documented agentic R&D deployments. Two non-negotiable design requirements guide the architecture: cognitive-load redistribution (DR1) and bounded autonomy with alignment (DR2). Findings: We propose HARMONY -- Hybrid Agentic Research Model for Organisational New Yield -- a four-pillar socio-technical architecture comprising ResOps (Industrialized Execution), the Control Tower (Strategic Visibility and Drift Detection), the Ethics Fabric (Bounded Autonomy by Design), and the Talent Studio (Sciencepreneur Capability). The model introduces the Sciencepreneur as the central human archetype in agentic R&D, and Orchestration Leverage as a candidate productivity metric suited to human-agent hybrid systems.

Editor's pick · Technology
arXiv · Yesterday

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

arXiv:2605.23929v1 Announce Type: new Abstract: Modern AI systems increasingly rely on workflows composed of multiple interacting agents, some powered by large language models (LLMs) and others by conventional computational modules. This paper analyzes the fundamental tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows. We introduce performance models for both LLM and non-LLM agents that capture the relationship between computational effort and output quality, incorporating the impact of reasoning and output tokens for LLM agents using a parametric exponential reliability function. Then, we study the design of sequential workflows under latency and cost constraints. Main results include a water-filling token allocation policy and characterizations of optimal workflow reliability in terms of shadow prices.
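To make "water-filling" concrete, here is a small Python sketch under stated assumptions: each agent's reliability is taken to be r_i(t) = 1 - c_i·exp(-b_i·t) in its token budget t (a hypothetical stand-in for the paper's parametric exponential reliability function), and a sequential chain's reliability is the product of the r_i. Maximising the log of that product under a total token budget gives each agent an allocation set by one shared shadow price λ, found here by bisection; agents whose marginal value at zero tokens is below λ receive nothing.

```python
import math


def allocate_tokens(budget, b, c, iters=200):
    """Water-filling split of a token budget across agents in a sequential chain.

    Hypothetical reliability model (not the paper's exact form):
        r_i(t) = 1 - c_i * exp(-b_i * t),  with b_i > 0 and 0 < c_i < 1.
    Maximises sum_i log r_i(t_i) subject to sum_i t_i = budget, t_i >= 0.
    Returns (allocation, shadow_price).
    """
    def alloc(lam):
        # Stationarity: d/dt log r_i = b_i / (exp(b_i t) / c_i - 1) = lam,
        # so t_i = log(c_i * (1 + b_i / lam)) / b_i, clipped at zero.
        return [max(0.0, math.log(ci * (1 + bi / lam)) / bi) for bi, ci in zip(b, c)]

    # At this price every agent's marginal value at t = 0 is <= lam: all get zero.
    hi = max(bi * ci / (1 - ci) for bi, ci in zip(b, c))
    lo = hi
    while sum(alloc(lo)) < budget:  # lower the price until the budget is covered
        lo /= 2
    for _ in range(iters):  # bisect on the shadow price
        mid = (lo + hi) / 2
        if sum(alloc(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return alloc(hi), hi


tokens, price = allocate_tokens(1000, b=[0.01, 0.005, 0.02], c=[0.9, 0.9, 0.5])
```

The returned price is the marginal gain in log-reliability from one extra token at the optimum; it is the same for every agent that receives a positive allocation, which is the "water level" the name refers to.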

Editor's pick · Technology
arXiv · Yesterday

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction

arXiv:2605.23928v1 Announce Type: new Abstract: We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive goal-directed agents that advance shared tasks without waiting for user prompts. The architecture rests on three mutually reinforcing mechanisms. Write-time context assembly precomputes enriched typed attributes via Groker agents, assembling interaction context as a deterministic pure function of graph state; context blocks are byte-identical across turns between semantic changes, enabling near-100% KV-cache reuse. Composable sandboxed wisdom programs form a governed library of LM-generated imperative programs declaratively wired to goal types via typed stream relations, composed via phase ordering, and executed at interaction time without further LM calls. Proactive goal stream state machines drive conversations toward terminal states by inspecting graph state and emitting structured interaction content (option arrays, governance affordances, clarification prompts) without awaiting user input. We prove six formal results: the Context Stability Theorem, bounding per-turn LM cost as a function of semantic change rate; a Program Composition Correctness Theorem; a Declarative Wiring Soundness Theorem; the Proactive Dominance Theorem, proving proactive agents weakly dominate reactive agents on expected turns-to-terminal-state; Coordination Overhead Elimination and Quality Preservation, establishing Pareto improvements in multi-participant goal chats; and a Cross-Platform Vote Consistency Theorem. Implemented in the open-source Qbix / Safebox / Safebots stack.
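One way to read "byte-identical context blocks" is that context is serialized canonically from graph state. The Python sketch below assumes JSON as the serialization (an illustration, not the Magarshak Architecture's actual format): sorting keys makes the bytes independent of insertion order, so an unchanged semantic state always yields the same prompt prefix, which is what allows a KV cache to be reused.

```python
import hashlib
import json


def render_context(graph_state: dict) -> bytes:
    """Hypothetical context assembly as a pure function of graph state.

    sort_keys (applied recursively to nested dicts) plus fixed separators
    give a canonical serialization: same state in, same bytes out.
    """
    return json.dumps(
        graph_state, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def context_fingerprint(graph_state: dict) -> str:
    """Hash of the rendered context; changes only when the state changes."""
    return hashlib.sha256(render_context(graph_state)).hexdigest()
```

Note that list order is preserved, so any list whose order is not semantically meaningful would need to be sorted before rendering for this property to hold.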

Editor's pick · Technology
arXiv · Yesterday

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

arXiv:2605.23935v1 Announce Type: new Abstract: Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime. Prior work defined Reconstructive Authority (RAM) as a condition for valid execution: actions are permitted only if authority can be constructed from current state. This paper addresses enforcement at runtime: how to enforce this condition in a running system. We introduce a runtime execution model in which authority is evaluated at action time and execution is conditioned on its constructibility. This extends the execution state space beyond admit/deny with a third state, halt, representing cases where authority is undefined due to incomplete or uncertain observability. We define a concrete execution protocol including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. We further introduce a Recovery Loop that integrates drift detection (IML) with execution control (ACP), allowing the system to suspend execution, acquire missing information, and re-attempt authority reconstruction. We show that this model guarantees safety -- no action is executed without constructible authority -- and conditional liveness: execution resumes when authority-defining variables become observable. This work operationalizes reconstructive authority as a runtime enforcement mechanism, providing the execution semantics required to apply RAM in real systems.
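The three-state gate and the Recovery Loop can be sketched in a few lines of Python. This is a toy under explicit assumptions: authority is modelled as a list of hypothetical authority-defining variables that must all be observed and true, and `observe` is a placeholder for whatever acquires missing information. It is not the paper's protocol, nor its IML or ACP components.

```python
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    HALT = "halt"  # authority undefined: a required variable is not observable


def gate(required, state):
    """Evaluate authority at action time (toy rule).

    required: names of the authority-defining variables for the action.
    state: currently observed variables; a missing key means "not observable".
    """
    missing = [v for v in required if v not in state]
    if missing:
        return Decision.HALT, missing
    if all(state[v] for v in required):
        return Decision.ADMIT, []
    return Decision.DENY, []


def execute_with_recovery(required, state, observe, max_attempts=3):
    """On HALT, try to observe the missing variables and re-evaluate.

    Returns the final decision; the action may run only if it is ADMIT.
    """
    state = dict(state)  # work on a copy
    for _ in range(max_attempts):
        decision, missing = gate(required, state)
        if decision is not Decision.HALT:
            return decision
        for var in missing:
            value = observe(var)  # None means still unobservable
            if value is not None:
                state[var] = value
    return gate(required, state)[0]
```

The safety property shows up directly in the structure: no path returns ADMIT unless every required variable has been observed, and a system that never obtains the missing information stays in HALT.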

Editor's pick · Technology
Forbes · Yesterday

Council Post: The CMO's Guide To Scaling Agentic AI Across The Enterprise

Agentic AI represents a fundamental evolution beyond traditional automation and GenAI. Chatbots respond to prompts, and robotic process automation follows scripts. By contrast, agentic systems:
• Understand goals and autonomously plan multistep workflows.
• Execute tasks across multiple systems without constant human oversight.
... A June 2025 Gartner, Inc. report predicted that by 2028, 33% of enterprise ...

Editor's pick · Technology
arXiv · Yesterday

DRIVE: Modeling Skills at the Reasoning and Interaction Levels for Web Agents under Continual Learning

arXiv:2605.23939v1 Announce Type: new Abstract: Web agents require both high-level reasoning (for task decomposition) and low-level interactions (for page-element manipulation) to conduct different tasks. However, these knowledge types differ fundamentally: reasoning knowledge (e.g., booking a flight requires first searching for routes) is abstract and transferable across websites, while interaction knowledge (e.g., clicking the Search button at a specific coordinate on Site A) depends heavily on page-specific contexts. Existing methods store experiences uniformly. This creates a dilemma: abstract representations lose executability on concrete pages, while concrete representations fail to generalize across domains. This entanglement limits capability accumulation: on new websites, agents either fail to recognize reusable task logic due to surface-level differences or attempt infeasible actions from outdated page structures. To disentangle them, we propose DRIVE, a dual-level skill modeling framework that separates historical experience into natural language reasoning skills, which capture transferable task logic, and programmatic interaction skills, which ground abstract actions in executable operations. A scene-aware coordination mechanism adaptively retrieves and invokes these dual-level skills based on task semantics. DRIVE also uses skill-level reflection to identify hierarchy-specific failure modes, enabling targeted skill library expansion and refinement. Experiments across five WebArena domains show DRIVE attains an average task success rate of 52.8%, exceeding the skill-free baseline by 7.3 percentage points. Further ablations show reasoning and interaction skills provide distinct, complementary benefits, supporting separation of transferable task logic from executable page-level operations.

AI Infrastructure & Compute · 10 articles
Editor's pick · Energy & Utilities
Bebeez · Yesterday

Nscale inks PPA with Vattenfall to power Kvandal data center in Norway

European neocloud Nscale has signed a Power Purchase Agreement (PPA) with Swedish state-owned power company Vattenfall in Norway. The PPA will support the first phase of Nscale's data center development in Kvandal, in northern Norway. The exact capacity of the PPA has not been disclosed; however, the companies claim it will cover a […]

Editor's pick · Technology
Check Point · Yesterday

2026 Cloud Security Report: Why Traditional Network, Cloud, and Security Architecture Are Lagging Behind the AI Transformation

As AI rapidly reshapes industries, the role of the cloud has become even more critical. From automated customer experiences to intelligent cyber security

Editor's pick · Technology
Guardian · 2 days ago

A Louisiana state senator helped secure Meta’s largest datacenter. Then he sold the land beside it

Jay Morris denies experts’ claims that he violated ethics rules over land deals near the site of Meta’s Hyperion datacenter. This story is from Floodlight, a non-profit newsroom that investigates the powers stalling climate action. For more than two years, John “Jay” Morris, a Louisiana state senator, helped pave the way for Meta to build one of the world’s largest datacenters, called Hyperion, in Richland Parish.

Editor's pick · Technology
The Economic Times · Yesterday

Enterprise AI infrastructure, MLOps & developer tools drive the next phase of AI innovation

The ET Most Innovative AI Product Awards 2026 recognises the AI Platforms, Infrastructure & Developer Tools category. It highlights the technologies powering the next phase of enterprise AI from platforms and MLOps to observability and developer tools. These innovations are enabling scalable, ...

Editor's pick · Energy & Utilities
PR Newswire · Yesterday

Data Center Generators Market worth $9.79 billion by 2031 | Exclusive Report by MarketsandMarkets™

According to MarketsandMarkets™, the global Data Center Generators Market is projected to grow from USD 8.57 billion in 2026 to USD 9.79...
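The excerpt does not state a growth rate, but one can be derived from the headline endpoints. A quick Python check, assuming 2026 to 2031 counts as five compounding years:

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# USD billions, from the headline and excerpt above.
rate = implied_cagr(8.57, 9.79, 2031 - 2026)
print(f"{rate:.1%}")  # prints 2.7%
```

That is a modest growth rate of roughly 2.7% a year, a useful calibration against the "AI boom" framing common in data center coverage.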

AI Models & Capabilities · 4 articles
Editor's pick
arXiv · Yesterday

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

arXiv:2605.23926v1 Announce Type: new Abstract: Reasoning-capable large language models solve hard problems by emitting long chains of thought, paying heavily in latency, GPU time, and energy. Casual inspection of their traces reveals extensive reformulation, verification, and circular self-reflection, yet how much of this deliberation is actually necessary has never been measured at scale or explained from first principles. This paper closes both gaps. We formalise reasoning redundancy directly in terms of the reasoning model itself: the redundancy ρ of a correct trace is the largest fraction of its trailing segmented steps that can be truncated while the model π, forced to terminate thinking and emit a final answer, still produces the correct answer. A large-scale quantification across four frontier reasoning models and two mathematical benchmarks shows that step-level redundancy is consistently high (between 61% and 93% across the 8 (model, benchmark) conditions we study, with the median critical prefix equal to a single segmented step in six of the eight conditions), that the finding is robust to the choice of judge family, and that although ρ decreases with problem difficulty on MATH-500, all four models remain substantially redundant (ρ between 46% and 85%) even on the hardest Level-5 problems. We then prove that this redundancy is a structural consequence of length-agnostic outcome rewards, not a model-specific artefact: under any such reward, no finite expected stopping time is optimal. The result holds regardless of RL algorithm, base model, data distribution, or whether the policy is obtained via RL or distillation; over-thinking is therefore not a bug to be patched in individual models but a structural property of how current reasoning models are trained. Code: https://github.com/zhiyuanZhai20/how-much-thinking-is-enough
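The redundancy definition above translates directly into a short Python function. `force_answer` is a hypothetical callable standing in for "make the model stop thinking after this prefix and answer"; in practice it would be a model call. The sketch also assumes an empty prefix (zero steps) is an admissible truncation, which the abstract does not specify.

```python
def redundancy(steps, force_answer, gold):
    """Step-level redundancy of one reasoning trace.

    steps: segmented reasoning steps of the trace.
    force_answer: hypothetical callable; given a prefix of steps, returns the
        final answer the model emits when forced to stop thinking there.
    gold: the correct answer.
    Returns (rho, critical_prefix_len), or None if the full trace is incorrect.
    """
    n = len(steps)
    if n == 0 or force_answer(steps) != gold:
        return None  # the metric is defined only for correct traces
    # The shortest prefix that still yields the correct answer gives the
    # largest truncatable fraction of trailing steps.
    for k in range(n + 1):
        if force_answer(steps[:k]) == gold:
            return (n - k) / n, k
```

For example, if the correct answer is already produced after the first of four steps, ρ = 0.75 and the critical prefix is one step.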

Editor's pick · Technology
arXiv · Yesterday

Authority Inversion in LLM-Mediated Ubiquitous Systems: When Models Trust Users Over Sensors

arXiv:2605.23938v1 Announce Type: new Abstract: Large language models (LLMs) increasingly fuse heterogeneous inputs in ubiquitous systems. Yet how LLMs implicitly allocate authority when sensor measurements and user claims conflict remains unexamined, raising critical reliability concerns for deployments where physical sensing must retain priority. Unlike traditional explicit fusion, LLMs bury authority allocation within learned representations. We discover that this allocation is severely format-dependent: numerical sensor data fails to integrate into answer-relevant model directions, allowing natural-language claims to dominate the final decision, a phenomenon we term Authority Inversion. To diagnose and mitigate this, we develop a geometric framework of context integration, introduce two computable audit metrics, the Context Integration Ratio (CIR) and the Authority Alignment Index (AAI), and propose Geometric Authority Calibration (GAC), an inference-time, layer-level intervention that suppresses misplaced user authority. Evaluating four models (4B to 35B parameters, three architectures) across four datasets totaling 576 conflict instances reveals extreme inversion: on numerical tasks, models exhibit near-zero sensor trust (AAI = -0.805, Cohen's d = -2.14), unaffected by model capacity. Validating the geometric framework, theory-guided causal injection flips 80.2% of incorrect decisions (vs. <0.4% for random controls). Practically, GAC improves human activity recognition (HAR) accuracy from 0–1.6% to 21.9–27.5%, outperforming prompting baselines. Ultimately, authority allocation in LLM-mediated systems must be explicitly audited and configured for each application rather than left implicit.

Editor's pick · Technology
arXiv · Yesterday

Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

arXiv:2605.23940v1 Announce Type: new Abstract: How do multi-turn reasoning systems fail? The expected answer is logical contradiction, in which the system's maintained state becomes unsatisfiable. We show that the dominant mode is instead satisfiable drift, where the internal state stays consistent while the returned answer silently violates prior commitments. We build DRIFT-Bench (Decomposing Reasoning Into Failure Types), a solver-instrumented benchmark of 816 test problems across three constraint domains, and evaluate four methods on it across four open-weight models (8B-120B parameters). MUS-Repair, which feeds minimal unsatisfiable subsets back to the generator, is strongest in every setting (+1.8 to +15.0 pp over the best non-MUS baseline). But the central finding is what repair leaves behind. After structured feedback, models rarely contradict themselves. They forget. Residual errors are 98-100% satisfiable drift across all settings, while contradiction drops to near zero. Reliable multi-turn systems must separately validate that the returned answer respects the maintained state. Code is available at https://github.com/kaons-research/drift-bench.
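The distinction between contradiction and satisfiable drift can be illustrated with a toy Python classifier. It brute-forces satisfiability over small finite domains instead of calling a real solver, and represents commitments as Python predicates; both are simplifying assumptions, not how DRIFT-Bench is implemented.

```python
from itertools import product


def satisfiable(constraints, domains):
    """Brute force: does any assignment over the finite domains satisfy all constraints?"""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            return True
    return False


def classify_failure(commitments, domains, answer):
    """Toy version of the two failure modes.

    commitments: constraints accumulated over earlier turns (the maintained state).
    answer: the assignment the system actually returns this turn.
    """
    if not satisfiable(commitments, domains):
        return "contradiction"  # the maintained state itself is unsatisfiable
    if not all(c(answer) for c in commitments):
        return "drift"  # state is consistent, but the answer breaks a commitment
    return "ok"
```

The second check is the one the paper argues is usually missing: a consistent state says nothing about whether the returned answer actually honours it.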

Editor's pick
arXiv · Yesterday

In Search of the Ingredients of Open-Endedness: Replicating Picbreeder with Large Vision-Language Models

arXiv:2605.23908v1 Announce Type: new Abstract: We are in the midst of large-scale industrial and academic efforts to automate the processes of scientific, technological and creative production through AI-driven assistants. Historically, a fundamental property of these processes in their human form has been their open-endedness: their capacity for generating a seemingly endless supply of novel and meaningful new forms. Do artificial agents have any capacity for such fruitful unguided discovery? To answer this question, we turn to Picbreeder, the canonical exemplar of human-driven open-ended search, in which users collaboratively generated a diverse library of images through interactive evolution of small neural networks. We replicate Picbreeder, replacing human users with frontier Vision Language Models (VLMs). We observe clear qualitative differences between the output of our system and the historical human baseline, and attempt to characterize them using metrics of phylogenetic complexity and visual and semantic salience and novelty. In an effort to identify some of the causal factors contributing to these differences, we study the addition of exploratory noise to the agents' selection process, of behavioral diversity between agents, and of narrative momentum in the form of memory of past actions. We make our code available at https://github.com/smearle/picbreeder-vlm.

AI Research & Science · 2 articles
Editor's pick · Technology
arXiv · Yesterday

EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization

arXiv:2508.12479v2 Announce Type: replace-cross Abstract: Min-max optimization arises in many domains, such as game theory and adversarial machine learning. For these problems, gradient-based methods are well understood and enjoy strong guarantees. However, in the absence of convexity or concavity, existing approaches study convergence to an approximate saddle point or first-order stationary points, which may be arbitrarily far from global optima. In this work, we present an algorithmic framework for computing the global minimax value in convex–non-concave and non-convex–concave min-max optimization. For convex–non-concave min-max problems, we use a reformulation that transforms the problem into a non-concave–convex max-min optimization problem with suitably defined feasible sets and objective function. This reformulation can be viewed as an extension of Sion's minimax theorem to the convex–non-concave setting. We then introduce EXOTIC, an Exact, Optimistic, Tree-based algorithm for solving the reformulated max-min problem. EXOTIC combines an iterative convex optimization solver for the inner minimization with an optimistic hierarchical tree search for the outer maximization, inspired by StroquOOL (Bartlett et al., 2019). Unlike StroquOOL, which assumes stochastic zero-mean noisy evaluations, EXOTIC handles deterministic, biased, and budget-dependent evaluation errors arising from finite-time solutions of the inner convex subproblems. We establish an upper bound on its optimality gap. The same framework also applies to non-convex–concave min-max optimization. Empirically, EXOTIC outperforms gradient-based methods on popular benchmarks from the literature. Finally, we demonstrate the utility of EXOTIC by computing security strategies in multi-player games with three or more players, a computationally challenging task that, to our knowledge, no prior method solves exactly.

Adoption, Deployment & Impact

20 articles
AI Applications · 11 articles
Editor's pick · PAYWALL · Transportation & Logistics
Bloomberg · Yesterday

Pony AI Lifts 2026 Robotaxi Fleet Goal on Faster Growth

Pony AI Inc. raised its robotaxi fleet target for this year by 500 vehicles to 3,500 after reporting stronger-than-expected first-quarter revenue.

Editor's pick · Telecommunications
The Register · Yesterday

Ucell and ZTE complete large-scale deployment of AI‑Powered green network solution in Uzbekistan

Network-wide rollout boosts energy efficiency by 10.6%, cutting carbon emissions and operational costs without compromising user experience

Editor's pick · Technology
arXiv · Yesterday

BODHI: Precise OS Kernel Specification Inference

arXiv:2605.23931v1 Announce Type: new Abstract: The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.

Editor's pick · Education
arXiv · Yesterday

KT4EQG: Personalized Exercise Question Generation via Knowledge Tracing

arXiv:2605.23933v1 Announce Type: new Abstract: Educational Question Generation (EQG) aims to synthesize customized exercise questions that enhance student learning. An effective EQG system should ideally personalize questions for each student by modeling the student's knowledge state and generating questions that provide the greatest learning benefit. However, few existing EQG approaches are able to achieve such fine-grained personalization. In this paper, we explore how EQG can benefit from knowledge tracing (KT), which models students' knowledge states based on historical performance and predicts future performance. We propose KT4EQG, a personalized EQG framework that generates effective questions for individual students under the guidance of a KT model. Specifically, KT4EQG seeks to maximize a student's potential improvement in overall knowledge mastery by leveraging the KT model to select the most suitable knowledge concept for the student to practice. An LLM-based question generator is then trained to produce a question faithfully grounded in the selected concept. Experimental results on XES3G5M and MOOCRadar show that KT4EQG consistently generates more effective questions than methods with limited or no personalization.
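A toy Python sketch of KT-guided concept selection, assuming a deliberately simple stand-in for the KT model: per-concept mastery probabilities, a hypothetical `transfer` matrix describing how practice on one concept carries over to others, and a fixed learning rate. KT4EQG's actual KT model and objective are not specified at this level in the abstract.

```python
def select_concept(mastery, transfer, learning_rate=0.3):
    """Pick the concept whose practice maximises predicted overall mastery.

    mastery: current mastery probability per concept (toy KT state).
    transfer[k][j]: hypothetical share of the gain from practising k that
        carries over to concept j, with transfer[k][k] == 1.
    Toy update when practising k: m_j -> m_j + learning_rate * transfer[k][j] * (1 - m_j).
    """
    def predicted_total(k):
        return sum(
            m + learning_rate * transfer[k][j] * (1 - m)
            for j, m in enumerate(mastery)
        )

    return max(range(len(mastery)), key=predicted_total)
```

With no transfer between concepts this simply picks the weakest concept; once practice on one concept spills over to others, a better-mastered but more "central" concept can win, which is the kind of trade-off a KT-guided selector is meant to capture.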

Editor's pick · Healthcare
arXiv · Yesterday

Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework

arXiv:2605.23921v1 Announce Type: new Abstract: This study seeks to determine the authority signals used by Anthropic's Claude AI in its presentation of sources when answering consumer health questions. While there exists a great deal of discourse around the quality of health citations that LLMs produce, there is limited information on the integrity of the sources the citations originate from, and on the extent to which those sources are ones health professionals would consider credible. This descriptive cross-sectional study used data from HealthSearchQA, which contains 3,172 consumer health questions curated by Google Research. After exclusions, a final dataset of 3,075 questions yielding 10,038 citations was analyzed. The Authority Signals Framework (Jacques et al., 2026) was applied to examine 10 authority signals across four domains for a disproportionate stratified sample of 542 sources. Established institutional sources accounted for 97.8% of all citations (n = 9,818). Medical Institutions were the most frequently cited organization type (36.5%), followed by Government Resources (31.6%) and Professional Associations (28.4%). Commercial Health Information comprised 2.2% (n = 220). The top 10 organizations accounted for 57.8% of all citations, with Mayo Clinic alone representing 24.7%. Among commercial sources in the focused sample, 86.4% displayed medical review statements, 82.5% used schema markup, and 71.8% had comprehensive content, while traditional institutional sources appeared in Claude's citations with or without these same markers. As Anthropic positions Claude for HIPAA-ready healthcare applications, these findings establish a baseline for Claude's citation behavior and demonstrate the utility of the Authority Signals Framework as a tool for ongoing, cross-platform evaluation of AI-mediated health information.

Editor's pick · PAYWALL · Consumer & Retail
Bloomberg · Yesterday

Google’s Fitbit Air Gives Whoop Some Serious Competition

The Fitbit Air, a new $100 screenless wearable from Alphabet Inc.’s Google, represents a major evolution in what consumers can expect from fitness trackers as tech companies race into an era of personalized health and artificial intelligence-powered wellness insights.

Editor's pick · Technology
Bebeez · Yesterday

Aiven co-founder Hannu Valtonen’s Avrea emerges from stealth with €4 million to build AI-native CI/CD platform

Avrea, a Helsinki-based startup offering an AI-native CI/CD platform built for the new era of development, today announced that it has emerged from stealth and has raised €4 million ($4.7 million) in total pre-Seed funding led by Earlybird. Avrea was founded by Hannu Valtonen, co-founder of Finnish unicorn Aiven, and Juha Valvanne, co-founder of Nosto.  […]

Editor's pick · Consumer & Retail
Bebeez · Yesterday

AI compliance startup Certo raises $4m seed round led by Daphni to scale regulatory platform for beauty and CPG brands

The Paris-San Francisco startup, founded by Bastien Deliège-Coste and Jean Duquenne, is already working with major global consumer goods groups. Entrepreneurs First, Motier Ventures and Transpose Platform also joined the round. French-US startup Certo, an AI-powered regulatory compliance platform for consumer goods companies, has raised a $4 million seed round led by French […]

Editor's pick · Manufacturing & Industrials
Forbes · Yesterday

Council Post: Orchestrating Your AI-Powered Supply Chain For Growth And Profitability

As supply chain disruptions intensify, AI-powered orchestration is helping organizations move beyond fragmented systems and reactive firefighting toward real-time coordination, faster decisions and more resilient operations.

Editor's pick · Healthcare
Bebeez · Yesterday

YC-backed French preventive health platform Lucis raises €17.3 million Series A led by Singular

Lucis, a Paris-based preventive health platform that uses blood biomarker analysis and AI to deliver personalised, science-based health recommendations, has raised €17.1 million ($20 million) in Series A funding.  The round was led by Singular, with participation from General Catalyst, Y Combinator, and angels including investors behind Runna, Céline Lazorthes (Resilience), and Manu Lecomte. This […]

Editor's pick · Consumer & Retail
Electro IQ · 2 days ago

AI in the Trades: Key Statistics on Automation Adoption Among Home Service Operators

Explore how AI in the trades is transforming home care with improved scheduling and streamlined operations for service providers.

AI Measurement & Evaluation (1 article)
Editor's pick · Professional Services
arXiv · Yesterday

PAIRED: A Process-Anchored Framework for Transparent Reporting of AI Contributions in Scientific Research

arXiv:2605.24325v1 Announce Type: new Abstract: The rapid integration of generative AI into scientific research has exposed a critical gap in academic disclosure practice. Existing frameworks for reporting AI contributions are uniformly output-oriented -- they document what AI produced, not how the research unfolded. As a result, researchers who wish to report their AI collaboration honestly lack the tools to do so: no current framework can distinguish between a researcher who originated a research direction and one who adopted a direction proposed by AI, or between a researcher who critically evaluated AI-generated alternatives and one who accepted AI output without independent assessment. This gap is not a matter of compliance detail; it is a failure to capture the cognitive dynamics that determine what kind of intellectual contribution a paper actually represents. We propose PAIRED -- Process-Anchored Interaction Reporting for AI-Enabled Discovery -- a dual-facing framework that addresses this gap through four design principles: process orientation, which takes the decision point rather than the research product as the fundamental unit of documentation; dual-facing output, which derives a structured publisher disclosure from a prospective author log without double work; decision-point granularity, which operates between session-level coarseness and message-level impracticality; and artifact-triggered logging, which provides an auditable rule against selective omission. We demonstrate PAIRED through worked examples, discuss its limitations openly, and propose a model-assisted adoption pathway that embeds the framework's logging discipline directly into AI research platforms.
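The abstract does not specify PAIRED's log format, so the following is only a hypothetical Python sketch of two of its stated principles: the decision point (not the finished paper) as the unit of record, and a publisher-facing disclosure derived from the same author log rather than written separately. Every class, field, and entry below is invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class Origin(Enum):
    # Hypothetical: who first proposed the direction taken at a decision point.
    RESEARCHER = "researcher"
    AI = "ai"

@dataclass
class DecisionPoint:
    # Hypothetical log entry; one record per research decision.
    description: str
    origin: Origin
    alternatives_evaluated: bool  # did the researcher independently assess the options?
    triggering_artifact: str      # artifact whose creation prompted this entry

@dataclass
class AuthorLog:
    entries: list[DecisionPoint] = field(default_factory=list)

    def record(self, entry: DecisionPoint) -> None:
        self.entries.append(entry)

    def disclosure(self) -> dict[str, int]:
        """Summarize the same log for a publisher, with no second write-up."""
        return {
            "decision_points": len(self.entries),
            "ai_originated": sum(e.origin is Origin.AI for e in self.entries),
            "ai_originated_without_independent_assessment": sum(
                e.origin is Origin.AI and not e.alternatives_evaluated
                for e in self.entries
            ),
        }

# Illustrative usage with made-up entries.
log = AuthorLog()
log.record(DecisionPoint("Choose estimator", Origin.AI, True, "analysis.py"))
log.record(DecisionPoint("Frame research question", Origin.RESEARCHER, True, "proposal.md"))
log.record(DecisionPoint("Drop outlier filter", Origin.AI, False, "clean.py"))
print(log.disclosure())
```

Because the disclosure is computed from the log, it can distinguish an AI-originated direction that the researcher vetted from one accepted without independent assessment, which is the gap the authors say output-oriented frameworks miss.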

AI ROI & Business Case (4 articles)

Geopolitics, Policy & Governance

10 articles
AI Geopolitics (1 article)
Editor's pick · Energy & Utilities
arXiv · Yesterday

Contested Temporalities in Critical Minerals and Resource Extraction for Electric Vehicles

arXiv:2605.24356v1 Announce Type: cross Abstract: The global push for electric vehicles (EVs) has sharply increased demand for critical minerals such as cobalt and lithium, creating a tension between rapid industrial growth and long-term sustainability. Extraction is concentrated in a few regions -- notably the Democratic Republic of Congo (DRC), Chile, and Argentina -- where it has produced serious socio-environmental harms, including ecosystem degradation, labour exploitation, and the displacement of Indigenous communities. In the DRC, cobalt mining is frequently linked to child labour and hazardous working conditions; in Chile, lithium extraction intensifies water scarcity and threatens local agriculture and biodiversity. Policy instruments such as the U.S. Inflation Reduction Act (IRA) seek to promote ethical sourcing, but an extraction-driven model continues to deepen global inequalities. This chapter examines the contested temporalities of the transition, in which the short-term economic incentives of extraction conflict with longer-term environmental and social goals. It argues for a place-based framework built on community-centred governance, sustainable mining practices, and circular-economy strategies, including recycling and material substitution, to align resource security with equity and ensure that the shift to EVs does not reproduce the injustices it aims to address.

AI Policy & Regulation (5 articles)
Editor's pick · Government & Public Sector
arXiv · Yesterday

High-Risk AI Systems and the Problem of Identity in the European AI Act

arXiv:2605.23922v1 Announce Type: new Abstract: The EU Artificial Intelligence Act (AIA) establishes a lifecycle governance regime for high-risk AI systems built around ex-ante conformity assessment, post-market monitoring, and re-assessment upon "substantial modification." These obligations presuppose AI identity judgments: regulators and providers must decide when an updated system remains the same system over time. In this work, we show how this logic is clarified by the function+ framework of artifact identity, which individuates AI systems by their intended function together with context-sensitive criteria of appropriate functioning, captured as "AI trustworthiness." We further argue that the AIA does not provide an internal, auditable criterion for synchronic identity--when two AI systems at a given time should count as the same for regulatory purposes--and instead largely defers such sameness determinations to sectoral or harmonization instruments. function+ supplies a synchronic identity test anchored in intended function and trustworthiness profiles and levels, making synchronic identity decisions inspectable in governance settings such as procurement, liability, and market surveillance. Our contribution is a conceptual and auditing lens: we provide a correspondence map between AIA lifecycle obligations and function+ identity components, and we make the synchronic case operationally legible via a minimal decision flow for audit and dispute contexts. We conclude with two implementation-facing recommendations: (1) more precise, testable reporting of intended purpose, and (2) standardized, auditable trustworthiness reporting that supports comparability over time and across deployments.

Editor's pick
arXiv · Yesterday

Is Decentralized AI Governable? From Regulative Policy to Constitutive Protocol

arXiv:2605.24538v1 Announce Type: new Abstract: Every major framework for governing artificial intelligence presupposes an identifiable entity -- a developer, deployer, or operator -- who can be held responsible and compelled to comply. Decentralized AI (DeAI) dissolves this presupposition. We analyze DeAI as a six-layer decentralizing stack -- model, training, compute, harness, identity, and ownership -- and show how partial decentralization across layers compounds into what we call the "governance vacuum": a condition in which AI systems are consequential enough to require governance but lack the properties that existing frameworks presuppose in their targets. This vacuum takes two analytically distinct forms: an "accountability gap", where no addressable principal can be identified, and an "incapacitation gap", where even an identified principal cannot alter the running system. We demonstrate that these failures are not merely jurisdictional but defeat every presupposition of governance through normative address -- the communication of rules to a comprehending, responsive agent. Drawing on Lessig's modalities of regulation and Searle's distinction between regulative and constitutive rules, we argue for a shift in the locus of governance from policy to protocol, from normative address to architectural constraint. Protocol-based constitutive governance does not address the agents operating within a system but shapes the substrate that determines what kinds of actions are possible within it. We identify four ethical conditions -- legitimacy, contestability, transparency, and non-domination -- that such governance must satisfy to avoid degenerating into unaccountable technocratic power, and we argue that the central political challenge of governing AI in a decentralized world is reconstructing forms of democratic authorization for architectural choices that persist after the ordinary chain of policy has broken down.

Editor's pick · Government & Public Sector
TNGlobal · Yesterday

The permission paradox: Who controls AI as governments scale adoption?

As AI becomes more integrated into the citizen journey, the focus is extending beyond deployment toward accountability, orchestration, and trust. The next phase of digital government will depend on how effectively agencies connect data, content, and service delivery across increasingly autonomous ...

© 2026 Best Practice AI Ltd. All rights reserved.
