AI Intelligence Brief

Fri 22 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

130 articles
Editor's pick · Editor's Highlights

Standard Chartered CEO Apologizes, IBM Gains $1 Billion, and Newsom Tackles AI Job Loss

TL;DR: Standard Chartered's CEO apologized for referring to workers as 'lower-value human capital' amid discussions of AI-driven job losses. The U.S. government is investing $2 billion in quantum computing, with IBM receiving half. World trade surged on the back of AI investment, while BlackRock highlights AI's role in earnings growth. California Governor Gavin Newsom is addressing AI-induced job displacement with new labor policies.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

5 of 130 articles

Economics & Markets

28 articles
AI Investment & Valuations · 11 articles
Editor's pick · PAYWALL · Technology
Bloomberg · Today

UK’s Softcat Recasts Itself as AI Winner With Guidance Upgrade

Softcat Plc’s image among investors is quickly shifting from AI loser to AI winner.

Editor's pick · Technology
Reuters · Today

Exclusive: Grok falls flat in Washington, undercutting SpaceX's AI growth story

WASHINGTON, May 21 (Reuters) - SpaceX’s initial public offering is set to be the largest in history, partly fueled by its promise to grab a chunk of what it calls a multi-trillion-dollar market for artificial intelligence services through ...

Editor's pick · Technology
Reuters · Yesterday

SpaceX IPO filing lays bare losses and Musk control as it stakes future on AI

Musk's purchase of his social media and AI company xAI gave SpaceX new capabilities and opportunities, but also a staggering amount of spending, accounting for 76% of its $10.1 billion in capital spending in the first quarter, as well as fresh losses.

Editor's pick · Energy & Utilities
CNBC · Yesterday

An AI trade involving energy and infrastructure that's doubled your money, topping Nvidia

If you put the same money into a basket of companies that are building out AI infrastructure and energy sources, you’ve done much better than stocks like Nvidia.

Editor's pick · Financial Services
Reuters · Today

Reuters | Breaking International News & Views

Elon Musk’s rocket firm is poised to float with a $1.75 trln valuation. With OpenAI coming, starry-eyed investors are likely to focus on mega floats. Windscreen fixer Belron’s mix of stable growth and low AI-disruption risk could show that even earthier listings can take off.

Editor's pick · Financial Services
Fortune · Today

I’ve spent 25 years in venture capital. Here’s how it quietly shut ordinary Americans out of the AI wealth boom—and what could fix it

The private market didn't just grow. It replaced the system that once let regular investors participate in America's biggest wealth-creation moments. One solution keeps getting ignored.

Editor's pick · Technology
Guardian · Today

Mars colony and Grok warnings: five strange details in SpaceX’s pitch to investors

IPO filing from Elon Musk’s company reveals closer look at finances, cosmic ambitions and tech empire’s quirks. SpaceX publicly released an investor prospectus on Wednesday as part of its plan for a $1.75tn debut on the US stock market next month, revealing unseen details about the finances and future plans of Elon Musk’s flagship company. In addition to new information on operating costs and revenue, the filing also included trademark Muskian sweeping proclamations about the universe and insights into some of the quirks of his tech empire. Scattered throughout the 300-plus-page prospectus are several disclosures and risk warnings that show the eccentricities of Musk’s company and its cosmic ambitions. Other financial details in the document highlight how interdependent Musk’s various businesses have become and the risks that they carry.

Editor's pick · Financial Services
Fundingo · Yesterday

The Synthetic Yield: Mastering the Structural Complexity of Specialized Commercial AI Hardware and Compute Infrastructure Finance

The rapid proliferation of large language models and generative artificial intelligence has fundamentally altered the risk-return profile of specialized infrastructure ...

Editor's pick · Technology
RS Web Solutions · Today

QUALCOMM Earnings Report and AI Smartphone Technology Insights

Such competitive dynamics can significantly impact pricing strategies, profit margins, and momentum in securing design wins, all factors of keen interest to institutional and retail investors. Regulatory frameworks and trade policies present additional industry-wide concerns. Export controls on advanced semiconductor technologies and evolving geopolitical ...

Editor's pick · PAYWALL · Technology
Bloomberg · Today

Norway’s $2.3 Trillion Fund Objects to Elkann’s Meta Board Seat

Norway’s $2.3 trillion wealth fund expressed dissatisfaction with the reappointment of John Elkann, chairman of Stellantis NV and chief executive of investor Exor NV, on the board of directors at Meta Platforms Inc.

Editor's pick · Technology
The Motley Fool · Today

CoreWeave vs. Nebius: Which Artificial Intelligence (AI) Infrastructure Stock Is a Better Buy in 2026?

Both CoreWeave and Nebius power AI giants with critical infrastructure, but one looks like the superior investment.

AI Macroeconomics · 4 articles
Editor's pick
ProMarket · Yesterday

More AI-Exposed Industries and States Are Benefiting, But Results Are Heterogenous

In new research, Christos Makridis and Andrew Johnston find that industries exposed to generative AI are seeing an increase in production, employment, and wages. However, the majority of AI-driven revenue growth is channelled back to capital as profits, rather than to workers.

Editor's pick
arXiv · Today

Can Rising Consumption Deepen Inequality?

arXiv:2601.15537v2. Abstract: The impact of rising consumption on wealth inequality remains an open question. Here we revisit and extend the Social Architecture of Capitalism agent-based model proposed by Ian Wright, which reproduces stylized facts of wealth and income distributions. In a previous study, we demonstrated that the macroscopic behavior of the model is predominantly governed by a single dimensionless parameter, the ratio between average wealth per capita and mean salary, denoted by R. The shape of the wealth distribution, the emergence of a two-class structure, and the level of inequality (summarized by the Gini index) were found to depend mainly on R, with inequality increasing as R increases. In the present work, we examine the robustness of this result by relaxing some simplifying assumptions of the model. We first allow transactions such as purchases, salary payments, and revenue collections to occur with different frequencies, reflecting the heterogeneous temporal dynamics of real economies. We then impose limits on the maximum fractions of wealth that agents can spend or collect at each step, constraining the amplitude of individual transactions. We find that the dependence of the inequality on R remains qualitatively robust, although the detailed distribution patterns are affected by relative frequencies and transaction limits. Finally, we analyze a further variant of the model with adaptive wages emerging endogenously from the dynamics, showing that self-organized labor-market feedback can either stabilize or amplify inequality depending on macroeconomic conditions.
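The abstract's two headline quantities, the Gini index and the control parameter R (average wealth per capita divided by mean salary), can both be computed from a simulated wealth vector. The sketch below is a minimal illustration under that reading; the function names are hypothetical, and it does not reproduce Wright's agent-based model itself, only the summary statistics a simulation run would report.

```python
def gini(wealth: list[float]) -> float:
    """Gini index of a non-negative wealth vector (0 = perfect equality).

    Uses the sorted-rank form G = sum_i (2i - n - 1) * x_(i) / (n * sum x),
    where x_(i) is the i-th smallest value and i runs from 1 to n.
    """
    xs = sorted(wealth)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def wealth_salary_ratio(wealth: list[float], mean_salary: float) -> float:
    """R = average wealth per capita / mean salary (hypothetical helper name)."""
    return (sum(wealth) / len(wealth)) / mean_salary


# Illustrative numbers only, not taken from the paper.
wealth = [2.0, 3.0, 5.0, 40.0]
print(round(gini(wealth), 3))            # 0.58
print(wealth_salary_ratio(wealth, 2.5))  # 5.0
```

Per the abstract, inequality rises with R, so a parameter sweep would call both functions on each run's final wealth vector and compare them.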

AI Market Competition · 5 articles
AI Productivity · 3 articles
Editor's pick · Transportation & Logistics
arXiv · Today

COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space

arXiv:2605.20618v1. Abstract: Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditional heuristics rely on handcrafted rules for local improvements and occasional jumps to escape local minima, but often struggle to generalize across diverse instances. We introduce COAgents, a cooperative multi-agent framework that models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations for diversification (i.e., jumps). A Partial Search Graph (PSG) is dynamically constructed during search, enabling COAgents to train a Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent to trigger well-timed explorations of new regions. Unlike end-to-end learning approaches, COAgents cleanly separates problem-agnostic search control from compact domain-specific encoding, facilitating adaptability across tasks. Extensive experiments on the CVRP and VRPTW benchmarks show that COAgents remains competitive with several learn-to-search baselines on CVRP and sets a new state of the art among learning-based methods on the more challenging VRPTW instances, reducing the gap to the best-known solutions by 14% at N = 100 and 44% at N = 50 relative to the strongest neural solver (POMO), and by 21% and 40% respectively relative to ALNS. Code is available at https://github.com/mahdims/COAgents.
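The search-graph idea in the abstract (nodes as solutions, edges as either local refinements or jumps) can be illustrated on a toy problem. This sketch is an assumption-laden stand-in, not the COAgents code: it minimizes a one-dimensional integer objective rather than a VRP, replaces the learned Move Selection and Jump Agents with fixed rules, and omits node selection entirely (the search always continues from the last visited node). All names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PartialSearchGraph:
    """Nodes are visited solutions; edges record the move type that reached them."""
    cost: dict = field(default_factory=dict)   # solution -> objective value
    edges: list = field(default_factory=list)  # (src, dst, "refine" | "jump")

    def add_node(self, solution, value: float) -> None:
        self.cost[solution] = value

    def add_edge(self, src, dst, kind: str) -> None:
        self.edges.append((src, dst, kind))


def search(objective: Callable[[int], float], start: int,
           steps: int = 200, patience: int = 5, seed: int = 0):
    """Toy local search over integers that builds a PSG as it goes.

    Fixed rules stand in for learned agents (assumption):
    - move selection: step to the cheaper of the two +/-1 neighbours ("refine");
    - jump trigger: after `patience` steps without improving the best cost,
      restart at a random point ("jump").
    """
    rng = random.Random(seed)
    psg = PartialSearchGraph()
    psg.add_node(start, objective(start))
    current, best, stall = start, start, 0
    for _ in range(steps):
        if stall >= patience:
            nxt, kind, stall = rng.randint(-1000, 1000), "jump", 0
        else:
            nxt, kind = min((current - 1, current + 1), key=objective), "refine"
        psg.add_node(nxt, objective(nxt))
        psg.add_edge(current, nxt, kind)
        if psg.cost[nxt] < psg.cost[best]:
            best, stall = nxt, 0
        else:
            stall += 1
        current = nxt
    return best, psg


def gap_percent(cost: float, best_known: float) -> float:
    """Gap to the best-known solution, in percent."""
    return 100.0 * (cost - best_known) / best_known


best, psg = search(lambda x: (x - 37) ** 2, start=0)
print(best)  # 37
```

According to the abstract, what distinguishes COAgents is that these decisions are made by trained agents reading the graph rather than by fixed rules; `gap_percent` shows the usual way a solution is compared against a best-known one.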

Editor's pick
Fortune · Today

I've led companies through every major tech disruption. AI washing is the same mistake, every time

Leaders using AI to justify workforce cuts are missing the real opportunity to build more capable organizations.

Labor, Society & Culture

26 articles
AI & Culture · 2 articles
AI & Employment · 9 articles
Editor's pick
arXiv · Today

Who Uses AI? Platforms, Workforce, and AI Exposure

arXiv:2605.21743v1. Abstract: A growing literature uses artificial intelligence platform conversation logs to measure occupation exposure. We show that these scores partly measure platform user base rather than the workforce. Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. Reweighting to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. We formalize the non-classical measurement error and derive probability limits and partial-identification bounds for employment elasticities. The bias understates substitution more than augmentation.
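Reweighting to workforce shares means averaging occupation-level exposure with target weights instead of the platform's own mix of users. The sketch below is a simplified illustration of that reweighting step only; it does not reproduce the paper's employment regressions or measurement-error bounds, and the occupations, scores, and shares are invented.

```python
def weighted_exposure(exposure: dict[str, float], shares: dict[str, float]) -> float:
    """Average occupation exposure, weighting each occupation by `shares`."""
    total = sum(shares.values())
    return sum(exposure[occ] * w for occ, w in shares.items()) / total


def attenuation_pct(original: float, reweighted: float) -> float:
    """How much smaller the reweighted estimate is, in percent of the original."""
    return 100.0 * (1.0 - reweighted / original)


# Hypothetical inputs: exposure scores derived from chat logs, plus two weightings.
exposure = {"software_dev": 0.9, "writer": 0.7, "nurse": 0.1}
platform_share = {"software_dev": 0.6, "writer": 0.3, "nurse": 0.1}   # who uses the platform
workforce_share = {"software_dev": 0.1, "writer": 0.1, "nurse": 0.8}  # who is in the workforce

print(round(weighted_exposure(exposure, platform_share), 2))   # 0.76
print(round(weighted_exposure(exposure, workforce_share), 2))  # 0.24
```

The gap between the two printed values is the kind of platform-composition effect the abstract describes: the same per-occupation scores tell a different aggregate story once they are weighted to the actual workforce.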

Editor's pick · PAYWALL · Government & Public Sector
FT · Yesterday

Generating tax revenues in an automated world

If AI destroys job markets, governments will need to make up the resulting shortfall in labour income tax receipts

Editor's pick · Professional Services
The Register · Today

Workday wants AI to punch in instead of having to hire new recruits

CEO eyes margin gains by keeping headcount flat – bold for a company selling HR software to employers

Editor's pick · Professional Services
Fortune · Yesterday

Ex-Facebook exec Sheryl Sandberg says the 10-year career plan is dead thanks to AI: ‘Don’t script your career when the future is uncertain,’ she warns Gen Z

The former Meta COO says rigid career plans will backfire: "If I had one, I would have missed the internet," Sheryl Sandberg warned Gen Z.

Editor's pick · Professional Services
Employee Benefit News · Yesterday

Workers say reliance on AI is eroding skills and judgment

A new GoTo study finds workers increasingly depend on AI tools, raising concerns about misuse, poor judgment and declining skills.

AI & Misinformation · 2 articles
Editor's pick · Financial Services
VentureBeat · Yesterday

Americans can’t spot a deepfake, and that’s a business crisis, not just a consumer problem

Presented by Veriff

Americans can’t reliably distinguish real from AI-generated content, and that’s not just a media literacy problem; it’s a direct threat to how businesses verify identity online. New research finds that while many people are aware of deepfakes, their ability to distinguish them from reality is barely better than a coin flip. A 2026 survey conducted by Veriff and Kantar among 3,000 respondents in the United States, the United Kingdom, and Brazil shows Americans scoring just 0.07 on a scale where 0 represents random guessing.

If people can’t distinguish authentic visual content, they can’t reliably distinguish authentic identities. In practice, that means the same users interacting with digital services are often unable to tell whether the person on the other side of a screen is real. That has direct consequences for every digital business that relies on image- and video-based identity verification, from customer bank onboarding and account recovery to marketplace seller verification, high-value ecommerce transactions, social platform authentication, and enterprise access control. In the U.S., those consequences are already material: synthetic identity fraud now accounts for billions in annual losses, and the tools to generate convincing fakes are widely accessible.

The report also identifies a small but high-risk cohort: the roughly 7% of users who perform poorly at detecting deepfakes, yet remain confident in their ability and rarely verify what they see. While this is small as a percentage, at scale it represents millions of accounts that are highly exploitable targets for fraud. If users can’t reliably distinguish real from synthetic identities, then any system that depends on visual verification is fundamentally exposed. Identity verification can no longer be treated as a compliance function; instead, it has to be built as core digital infrastructure.
“Now that AI-generated content is becoming indistinguishable from reality, the human eye alone is no longer a reliable line of defense,” says Ira Bondar-Mucci, fraud platform lead at Veriff. “Businesses and policymakers in the U.S. need to close this awareness gap urgently, while simultaneously investing in automated verification technologies that can catch what humans simply can’t.”

The U.S. deepfake awareness gap is wider than expected

The United States might be the global epicenter of generative AI development, but American consumers demonstrate the lowest familiarity with deepfakes among the three surveyed markets. Only 63% of U.S. adults are familiar with the term, compared to 74% in the UK and 67% in Brazil.

“There’s a paradox at play,” Bondar-Mucci says. “The U.S. is the global epicenter of AI development, yet American consumers are the least familiar with one of its most dangerous byproducts. Historically, consumers have had higher baseline trust in digital content, with the conversation about fraud centered more on data privacy than on content authenticity. The problem is that low awareness doesn’t reduce risk, it amplifies it. If you don’t know what a deepfake is, you’re far less likely to pause and verify whether you’ve encountered one.”

Human deepfake detection is barely better than a coin flip

In practice, the randomness that characterizes consumers’ ability to distinguish real from fake is evident across the ways people assess different types of content. Video content proved especially difficult to assess, with fake videos frequently identified as authentic and real videos often flagged as fake. Even in side-by-side comparisons, respondents split their judgments close to evenly, another indication that visual inspection alone is no longer a reliable method for verifying authenticity.

Overconfidence in deepfake detection creates a dangerous vulnerability

Roughly half of U.S. respondents say they are confident in their ability to identify deepfakes, but that confidence far exceeds actual performance, demonstrating that self-assessment is effectively meaningless. Within that population sits the small but high-risk cohort: the approximately 7% of users who are inaccurate, yet overconfident in their ability and rarely verify suspicious content.

“This confidence-competence gap creates a false sense of security that fraudsters are primed to use,” says Bondar-Mucci. “When people believe they can’t be fooled, they stop looking for the signs. That’s precisely when they’re most vulnerable, whether to a synthetic identity used in financial fraud or a fabricated video designed to manipulate trust.”

For businesses, the implication is clear: any organization that still relies on manual review processes or customer self-attestation is inheriting this vulnerability directly. Human judgment is an increasingly unreliable safeguard, and verification needs to be built into systems by default: automated, technology-led, and not dependent on the end user’s self-assessment of their ability to tell real from fake.

Americans are worried about deepfakes but trust platforms to handle them

Concern about deepfakes is high across the U.S., with 79% of respondents reporting they are rather or extremely concerned about personal fraud and impersonation. The U.S. diverges from other markets in where that concern gets directed. Americans are more likely than UK or Brazilian respondents to trust social media platforms and digital services to identify and manage AI-generated content. That delegation of responsibility may be reducing individual vigilance at exactly the moment the threat is accelerating.

“We’re seeing synthetic identities used to open fraudulent accounts and authorize transactions, and deepfake videos deployed to bypass basic verification checks,” Bondar-Mucci explains.
“What makes this particularly urgent is the combination of great concern with relatively high platform trust. That gap between perceived and actual protection is exactly where fraud thrives.”

The business case for automated identity verification has never been stronger

The gap between what Americans believe they can detect and what they actually can is not a knowledge problem that awareness campaigns will resolve, but a design flaw in any system that places the burden of identity verification on unassisted human judgment. The effective response is not to remove humans from the verification loop, but to stop assigning them tasks that human perception can no longer perform reliably. Organizations that persist in relying on manual review processes or customer self-attestation are absorbing this vulnerability into their operations. The alternative is automated, AI-powered identity verification that operates at the point of interaction, detects synthetic media before a human decision is required, and does not depend on the end user’s ability to distinguish real from fake.

“Seeing is no longer believing,” says Bondar-Mucci. “The companies that build verification infrastructure around that reality, rather than around the assumption that it will be otherwise, are the ones best positioned to sustain customer trust as the synthetic media landscape continues to evolve.”

Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.

Editor's pick · Media & Entertainment
arXiv · Today

Detecting Synthetic Political Narratives in Cross-Platform Social Media Discourse

arXiv:2605.21540v1. Abstract: The proliferation of large language models has introduced a new paradigm of synthetic political communication in which narratives may be generated, semantically coordinated, and strategically disseminated across platforms at scale. We present a cross-platform framework for detecting synthetic political narratives using four coordination signals (lexical diversity D(C), temporal burstiness B(C), rhetorical repetition R(C), and semantic homogenization H(C)) combined into a Synthetic Narrative Coordination Score SNC(C). We apply the framework to a corpus of 353,223 records spanning six geopolitical event windows collected from six Telegram channels and nine Reddit communities (2023–2026). Results show that IntelSlava exhibits the lowest lexical diversity (MATTR 0.52–0.54), the highest burstiness (B = +0.48 to +0.73), and the highest rhetorical overlap with peer channels (Jaccard 0.12), ranking first in the composite SNC(C) on four of six event windows (SNC 0.45–0.60). Rybar ranks last on all windows despite its high semantic homogenization, because its Russian-language output yields high lexical diversity and near-zero rhetorical Jaccard with English-language channels, demonstrating that no single indicator is sufficient for coordination detection. Multi-dimensional SNC(C) scoring provides a more robust and interpretable signal than any individual metric.
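Three of the four signals have standard forms that can be sketched directly. This is a minimal illustration under stated assumptions: lexical diversity uses MATTR (which the abstract reports); burstiness uses the common (σ − μ)/(σ + μ) form over inter-post gaps, which matches the signed range of the reported B values but is not confirmed by the abstract; rhetorical repetition is a Jaccard overlap; and semantic homogenization is taken as a precomputed input because it needs an embedding model. The `snc` combination and its equal weights are hypothetical.

```python
from statistics import mean, pstdev


def mattr(tokens: list[str], window: int = 50) -> float:
    """Moving-average type-token ratio (lower = less lexically diverse)."""
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return mean(ratios)


def burstiness(timestamps: list[float]) -> float:
    """B = (sigma - mu) / (sigma + mu) of inter-event gaps, in [-1, 1].

    -1 is perfectly regular, 0 is Poisson-like, values near +1 are bursty.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        return 0.0
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu) if sigma + mu > 0 else 0.0


def jaccard(a, b) -> float:
    """Set overlap, e.g. of n-grams shared between two channels."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def snc(diversity: float, burst: float, repetition: float, homogenization: float) -> float:
    """ASSUMED composite: equal-weight mean of four signals mapped to [0, 1].

    Low diversity and high burstiness count as more coordinated; the paper's
    actual weighting is not given in the abstract.
    """
    return ((1 - diversity) + (burst + 1) / 2 + repetition + homogenization) / 4
```

Combining several signals this way reflects the abstract's point that no single indicator suffices: a channel can score high on homogenization yet low overall, as Rybar does.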

AI Ethics & Safety · 7 articles
Editor's pick · Consumer & Retail
Artificial Intelligence Newsletter | May 22, 2026 · Yesterday

Cox Media, two others settle US FTC claims over AI marketing service

Cox Media Group and two marketing firms agreed to pay $930,000 to settle FTC allegations regarding deceptive claims about using AI to listen to smart device conversations for ad targeting.

Editor's pick · Government & Public Sector
arXiv · Today

Machine Learning as Performative Materialist Practice: Thirteen Theses on the Epistemology, Methodology, and Politics of Applied ML

arXiv:2605.21785v1. Abstract: Machine learning practice in institutional decision-support contexts (government, public policy, public health, criminal justice, resource allocation) rests on a set of largely unexamined epistemological commitments inherited from classical statistics and computer science: that models represent stable regularities, that validation can be context-free, that performance metrics are politically neutral, and that feature importance reveals system structure. This paper challenges these commitments through a unified framework of performative materialist ML, articulated as thirteen theses. Drawing on Pickering's cybernetic ontology, the performativity literature from economic sociology (Callon, MacKenzie), Simon's bounded rationality, the formalization of performative prediction (Perdomo et al., 2020), and fifteen years of applied ML experience in government and public policy, we argue that: (1) ML models are best understood not as truth-seeking representations but as temporally situated compressions that function as instruments of intervention; (2) the full data product is a complex adaptive system that coevolves with its target and navigates a multi-objective space no single algorithm can optimize; (3) validity is fundamentally performative, measured by effects in the world rather than formal properties of the model; (4) the choices embedded in objective functions, fairness criteria, and resource thresholds are political decisions belonging to stakeholders, not technicians. We show how these theses unify several practical prescriptions (temporal cross-validation, precision and recall at k, pipeline-aware fairness auditing, satisficing over optimizing) as consequences of a coherent materialist epistemology rather than isolated best practices.
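Two of the practical prescriptions named in the abstract are easy to make concrete. The sketch below is illustrative only, with hypothetical function names and data: precision and recall at k evaluate a model at a fixed resource threshold (for example, the number of cases an agency can inspect), and expanding-window temporal splits keep each test period strictly after its training data.

```python
def precision_recall_at_k(scores: list[float], labels: list[int], k: int) -> tuple[float, float]:
    """Precision and recall when only the top-k scored cases can be acted on."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(labels[i] for i in top_k)
    positives = sum(labels)
    precision = hits / k if k else 0.0
    recall = hits / positives if positives else 0.0
    return precision, recall


def temporal_splits(n_periods: int, min_train: int = 2) -> list[tuple[list[int], int]]:
    """Expanding-window splits: train on periods 0..t-1, test on period t."""
    return [(list(range(t)), t) for t in range(min_train, n_periods)]


# Hypothetical case list: risk scores and true outcomes, with capacity for k = 2 interventions.
print(precision_recall_at_k([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1], k=2))  # (0.5, 0.5)
print(temporal_splits(4))  # [([0, 1], 2), ([0, 1, 2], 3)]
```

Fixing k to real capacity ties the metric to the resource threshold that the abstract treats as a stakeholder decision, rather than to an arbitrary score cutoff.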

Editor's pick · Technology
Artificial Intelligence Newsletter | May 22, 2026 · Yesterday

Personalization of AI chatbots becoming focus of UK data watchdog, official says

The UK's Information Commissioner's Office is examining how AI chatbots collect and use personal data as it develops new guidance for responsible AI use under data protection law.

Editor's pick · Education
arXiv · Today

CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety

arXiv:2605.21609v1. Abstract: Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.

Editor's pick
arXiv · Today

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

arXiv:2605.22109v1. Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0–33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.
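The headline figure (51% of correct ratings not grounded in retrieved cues) suggests a simple ratio. The sketch below assumes the Prejudice Rate is computed that way over per-sample booleans; the paper's exact definition and grounding criterion may differ, and the `Judgment` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Judgment:
    rating_correct: bool  # did the Big Five rating match the reference?
    grounded: bool        # did the model cite the behavioral cue that supports it?


def prejudice_rate(judgments: list[Judgment]) -> float:
    """Share of correct ratings that are NOT grounded in observable evidence."""
    correct = [j for j in judgments if j.rating_correct]
    if not correct:
        return 0.0
    return sum(not j.grounded for j in correct) / len(correct)


sample = [Judgment(True, False), Judgment(True, True),
          Judgment(False, False), Judgment(True, False)]
print(round(prejudice_rate(sample), 3))  # 0.667
```

Conditioning on correct ratings is what separates "right score" from "right reason": an incorrect rating never counts toward this rate, whatever its grounding.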

Editor's pick · Education
Fortune · Today

A school district’s lawsuit against Meta for mental health costs was set for trial next month. Zuckerberg settled

The school district had sought more than $60 million to create a 15-year program it said would help counteract mental health and learning issues.

Editor's pick · Technology
Guardian · Today

Meta and Snapchat blocking Saudi dissidents’ accounts

US social media firms acting on orders from Middle East kingdom accused of being ‘instruments of repression’. Major US social media companies including Meta’s Facebook and Instagram platforms have blocked the accounts of Saudi Arabian dissidents so they are no longer visible inside the kingdom, following orders by Saudi authorities. Those affected include Abdullah Alaoudh, a US-based activist and vocal critic of Saudi human rights violations, and Omar Abdulaziz, a Canada and UK-based activist who worked closely with Jamal Khashoggi before the journalist’s murder by Saudi agents in 2018. The headline on this article was amended on 22 May 2026. An earlier version wrongly said X was blocking dissidents’ accounts. This has been corrected.

AI Skills & Education · 5 articles

Technology & Infrastructure

34 articles
AI Agents & Automation · 11 articles
Editor's pick · Telecommunications
arXiv · Today

From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA)

arXiv:2605.20608v1. Abstract: Realizing Level 4/5 Autonomous Networks (AN) demands a shift from static automation to agent-native intelligence. Current operations, reliant on rigid scripts, lack the cognitive agency to handle off-nominal conditions. To address this, this letter proposes a hierarchical multi-agent reference architecture enabling high-level autonomy. The framework features a Dual-Driven Orchestrator that coordinates specialized Executive Agents, supported by a shared Public Memory for unified domain knowledge. A key innovation is the integration of agent self-awareness, which empowers the system to harmonize deliberative strategic governance with reflexive fault recovery. We instantiate and validate this architecture within a 5G Core environment. Case studies demonstrate that the system sustains critical throughput under congestion and reduces Mean Time to Repair (MTTR) by 86%, confirming its efficacy in unifying strategic planning with operational resilience.
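An 86% cut in MTTR means the average repair time falls to roughly a seventh of its previous value. A quick arithmetic check, using one common definition of MTTR (total repair time divided by number of repairs) and invented incident durations rather than data from the paper:

```python
def mttr(repair_minutes: list[float]) -> float:
    """Mean Time to Repair: total repair time divided by the number of repairs."""
    return sum(repair_minutes) / len(repair_minutes)


def reduction_pct(before: float, after: float) -> float:
    """Relative reduction, in percent."""
    return 100.0 * (before - after) / before


# Invented incident logs, chosen only to illustrate what an 86% reduction means.
baseline = [40.0, 55.0, 55.0]   # MTTR = 50 minutes
agentic = [6.0, 8.0, 7.0]       # MTTR = 7 minutes
print(reduction_pct(mttr(baseline), mttr(agentic)))  # 86.0
```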

Editor's pick · Manufacturing & Industrials
arXiv · Today

Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration

arXiv:2605.20190v1. Abstract: Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.
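The abstract names three ingredients of the reward (feasibility, toolchain robustness, structured output validity) but not how they are combined. A minimal sketch, assuming a weighted sum with made-up weights and a hypothetical `EpisodeOutcome` record; the paper's actual reward shaping may look quite different.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    constraints_met: int      # design constraints satisfied by the final geometry
    constraints_total: int
    tool_calls_ok: int        # CAD/CAE tool calls that ran without error
    tool_calls_total: int
    output_valid: bool        # final answer parsed against the expected schema


def multi_constraint_reward(o: EpisodeOutcome,
                            w_feasible: float = 0.6,
                            w_tools: float = 0.3,
                            w_format: float = 0.1) -> float:
    """Hypothetical weighted sum of feasibility, toolchain robustness, and output validity."""
    feasible = o.constraints_met / o.constraints_total if o.constraints_total else 0.0
    tools = o.tool_calls_ok / o.tool_calls_total if o.tool_calls_total else 0.0
    fmt = 1.0 if o.output_valid else 0.0
    return w_feasible * feasible + w_tools * tools + w_format * fmt


print(round(multi_constraint_reward(EpisodeOutcome(2, 4, 1, 2, False)), 2))  # 0.45
```

Giving partial credit per satisfied constraint, rather than a single pass/fail bit, is one plausible way to keep the learning signal dense, which is in the spirit of the abstract's stated goal of stable training.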

Editor's pick · Technology
MIT Technology Review · Today

The Download: coding’s future, the ‘Steroid Olympics,’ and AI-driven science

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Anthropic’s Code with Claude showed off coding’s future—whether you like it or not At Anthropic’s developer event in London this week, Code with Claude, attendees were asked if they’d shipped code…

Editor's pick · Technology
Top Daily Headlines · Today

Gemini accused of 30,000-line code purge and fake recovery report

An AI coding agent reportedly broke production and generated fictitious post-mortem paperwork after a rollback.

Editor's pick · Transportation & Logistics
Bebeez · Today

Norway’s Roboxi lands €13 million to transform airport airside operations with automation and robotics

Roboxi, a Stavanger-based startup specialising in airport airside automation and autonomy, has announced the completion of a share issue raising approximately €13 million in new equity. According to the company, the share issue generated significant interest from both new and existing shareholders. The primary investors are prominent ones based in the Rogaland region of Norway. […]

Editor's pick · Technology
InfotechLead · Yesterday

IDC: 93% of Enterprises See AI as Revenue Driver as Agentic AI Reshapes Business Strategy

IDC revealed AI has evolved from an experimental technology into a core business growth engine, with enterprises adopting agentic AI

Editor's pick · Pharma & Biotech
Arxiv · Today

AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows

arXiv:2605.20425v1 Announce Type: new Abstract: Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We propose AgentCo-op, a retrieval-based synthesis framework that composes reusable skills, tools, and external agents into executable workflows through typed artifact handoffs, then applies bounded self-guided local repair to implicated components when execution evidence indicates failure. In two open-world genomics case studies, AgentCo-op composes independently developed scientific agents and external tool repositories into auditable workflows without redesigning them or running global topology search. It coordinates specialized agents for spatial transcriptomics and gene-set interpretation to enable collaborative discovery from spatial transcriptomics data, and builds a parallel workflow for cross-modality marker analysis on single-cell multiome data. AgentCo-op can also import a searched workflow as a structural prior and improve it by grounding nodes with retrieved components and applying local repair, showing that synthesis and search are complementary. On six coding, math, and question-answering benchmarks, AgentCo-op achieves the best result on four benchmarks and the best average score under a unified backbone setting, while consistently reducing per-task cost relative to multi-agent baselines. Together, these results suggest that retrieval-based synthesis can extend automated agentic workflow design beyond benchmark-optimized agent graphs to open-world workflows built from existing agents, tools, and typed artifacts.

Editor's pick · Technology
Arxiv · Today

SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation

arXiv:2605.20189v1 Announce Type: new Abstract: Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without causing catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It begins by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. By utilizing a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, implicitly acting as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.

Editor's pick · Technology
Thediligencestack · Yesterday

The Agentic AI Storage Shock - by Ben Bajarin

How enterprise agents turn data lakes, workflow logs, and generated artifacts into the next infrastructure gating layer

Editor's pick · Technology
Insignia · Yesterday

Why Agentic AI’s Next Breakthrough Depends on Search

AI is entering a new phase. Attention is shifting toward inference: how to run those models reliably, cheaply, and at scale in real-world environments.

Editor's pick · Defense & National Security
Small Wars Journal · Today

Rethinking Artificial Intelligence at the Strategic Frontier

AI in defense shifts from tools to human-AI teaming; interaction-centered design improves trust, decisions, and security outcomes in complex environments.

AI Infrastructure & Compute · 6 articles
Editor's pick · Technology
TechRadar · Yesterday

How AI demand is redefining enterprise infrastructure strategy

AI adoption is causing increased pressure on supply chains and component availability

Editor's pick · Government & Public Sector
Bebeez · Today

Gaia AI supercomputer launched in Kraków, Poland

Poland has inaugurated its second AI factory in the southern city of Kraków. Known as the Gaia AI Factory, the 10-exaflop supercomputer will harness more than a thousand GPU accelerators to facilitate the training of advanced AI models and research into practical applications for the technology in education, healthcare, and public administration – Academic […]

Editor's pick
Arxiv · Today

Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX

arXiv:2605.20577v1 Announce Type: new Abstract: Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has heavily relied on supervised learning from human play logs to pre-train the policy, algorithms capable of learning tabula rasa (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce Mahjax, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also provide a high-quality visualization tool to streamline debugging and interaction with trained agents. Experimental results demonstrate that Mahjax achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under the no-red and red rules, respectively. Furthermore, we validate the environment's utility for reinforcement learning by showing that agents can be trained effectively to improve their rank against baseline policies.

AI Models & Capabilities · 7 articles
Editor's pick
Arxiv · Today

Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament

arXiv:2605.22095v1 Announce Type: new Abstract: The emergence of large language models (LLMs) has spurred economists to study how humans and LLMs behave in strategic settings. We organized a series of round-robin tournaments in the Colonel Blotto game. This game attracts game theorists' attention due to its high-dimensional action space and the absence of pure-strategy Nash equilibria. In the first tournament, more than 200 human participants competed against one another. In the second tournament, several popular LLMs were invited to submit strategies. In the third tournament, we matched the number of LLM strategies to the number submitted by humans. We find that humans more often employ better-calibrated intermediate-level allocation heuristics and outperform the simpler, more stereotyped strategies submitted by LLMs. Strategic sophistication is key to success if and only if the necessary level of reasoning depth is reached, while lower and higher levels of reasoning offer no clear advantage over the primitive strategies. Among humans, field of study weakly predicts success: participants with STEM backgrounds perform better in the first tournament. Surprisingly, humans barely adjust their strategies across tournaments with different sets of opponents. This result suggests that humans base their choices primarily on the game's rules rather than on the identity of their opponents, treating LLMs much like human competitors.
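
The Colonel Blotto setup can be made concrete with a small round-robin scorer. Everything specific here is an assumption for illustration: the abstract does not give the tournament's troop budget, number of battlefields or scoring rule, so this sketch uses 100 troops over 5 battlefields, one point per battlefield won and half a point each on a tie, with made-up example allocations.

```python
from itertools import combinations

def blotto_points(a, b):
    """Score one match: a point per battlefield won, half a point each on a tie."""
    pa = pb = 0.0
    for x, y in zip(a, b):
        if x > y:
            pa += 1
        elif y > x:
            pb += 1
        else:
            pa += 0.5
            pb += 0.5
    return pa, pb

def round_robin(strategies, budget):
    """Validate every allocation, play each pair once, and total the points."""
    for name, alloc in strategies.items():
        if any(x < 0 for x in alloc) or sum(alloc) != budget:
            raise ValueError(f"{name} is not a valid allocation of {budget} troops")
    totals = dict.fromkeys(strategies, 0.0)
    for (n1, s1), (n2, s2) in combinations(strategies.items(), 2):
        p1, p2 = blotto_points(s1, s2)
        totals[n1] += p1
        totals[n2] += p2
    return totals

# Hypothetical allocations of 100 troops across 5 battlefields.
strategies = {
    "uniform":      [20, 20, 20, 20, 20],
    "three_fields": [34, 33, 33, 0, 0],
    "four_fields":  [25, 25, 25, 25, 0],
}
print(round_robin(strategies, budget=100))
# {'uniform': 3.0, 'three_fields': 6.5, 'four_fields': 5.5}
```

Because the game has no pure-strategy equilibrium, a strategy's total depends on which opponents are in the pool; swapping allocations in and out of `strategies` shows how rankings shift.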

Editor's pick · Technology
VentureBeat · Yesterday

A 0.12% parameter add-on gives AI agents the working memory RAG can't

AI agents forget. Every time a coding assistant loses track of a debugging thread, or a data analysis agent re-ingests the same context it already processed, the team pays in latency, token costs, and brittle workflows. The fix most teams reach for, expanding the context window or adding more RAG, is increasingly expensive and still doesn't reliably work. To address this, researchers from Mind Lab and several universities proposed delta-mem, an efficient technique that compresses the model’s historical information into a dynamically updated matrix without changing the model itself. The resulting module adds just 0.12% of the backbone model's parameters (compared to 76.40% for one leading alternative) while outperforming it on memory-heavy benchmarks. Delta-mem allows models to continuously accumulate and reuse historical data, reducing reliance on massive context windows or complex external retrieval modules for behavioral continuity.

The long memory challenge

The conventional solution is to simply dump all the information into the model’s context window. But as Jingdi Lei, co-author of the paper, told VentureBeat, current systems treat memory merely as a context-management problem. “Either we keep expanding the context window, or we retrieve more documents through RAG,” Lei explained. “These approaches are useful and will remain important, but they become increasingly expensive and brittle when agents need to operate over long-running, multi-step interactions, and they don't really [work] like human memory since they are more like looking up documents.”

In enterprise settings, the bottleneck is not just whether the model can access history, but whether it can reuse that history efficiently, continuously, and with low latency. Standard attention mechanisms incur a computational cost that grows quadratically with sequence length. Furthermore, expanding the context window does not guarantee the model will actually recall the information effectively.
Models often suffer from context degradation, or “context rot,” as they become overwhelmed with more (and often conflicting) information, even if they support one million tokens in theory. The researchers argue for advanced memory mechanisms that can represent historical information compactly and maintain it dynamically across interactions. Existing solutions come with heavy trade-offs and generally fall into three paradigms:

Textual memory: stores history as text injected into context; constrained by window limits and prone to information loss under compression.
Outside-channel (RAG): encodes and retrieves from external modules; adds latency, integration complexity, and potential misalignment with the backbone.
Parametric: encodes memory into model weights via adapters; static after training, so it can't adapt to new information during live interactions.

Inside delta-mem

To achieve a compact and dynamically updated memory, delta-mem compresses an agent’s past interactions into an “online state of associative memory” (OSAM). This state is maintained as a fixed-size matrix that preserves historical information while the underlying language model remains frozen.

For enterprise workflows, this translates directly into resolving operational bottlenecks. Lei noted that a persistent coding assistant, for example, “may need to remember project conventions, recent debugging steps, user preferences, or intermediate decisions across a workflow.” Similarly, a data analysis agent might “need to maintain task state, assumptions, and prior observations while iterating over multiple tool calls.”

Rather than repeatedly retrieving and re-inserting all relevant history for these tasks, the delta-mem matrix provides a low-overhead way to carry forward useful interaction states inside the model’s forward computation. During generation, the system does not retrieve raw text segments to add to the prompt.
Instead, the backbone LLM’s current hidden state is projected into the matrix to retrieve old memory. This operation extracts context-relevant associative memory signals from delta-mem, which are then transformed into numerical corrections applied to the model's computations. This steers the model's reasoning at inference time without altering its internal parameters.

Following each interaction, delta-mem updates the online state using “delta-rule learning.” When new information arrives, the previous state makes a prediction about the resulting attention values. It then compares this prediction with the actual values and corrects the memory matrix based on the discrepancy. This update mechanism relies on a “gated delta-rule”: the memory module has separate knobs that control how much previous memory is kept and how much of the new memory is applied. This error correction with controlled forgetting allows the matrix to evolve over time, holding onto stable historical associations without being derailed by short-term noise.

The researchers explored three strategies for determining when and how the matrix updates:

Token-state write captures fine-grained changes but is vulnerable to short-term noise.
Sequence-state write averages tokens within a message segment, smoothing updates at the cost of some localized detail.
Multi-state write decomposes memory into sub-states for different information types, such as facts or task progress.

Delta-mem in action

The researchers evaluated delta-mem across three LLM backbones: Qwen3-8B, Qwen3-4B-Instruct, and SmolLM3-3B. They configured the framework with a compact 8x8 matrix. The system was tested on general capability benchmarks, including HotpotQA, GPQA-Diamond, and IFEval.
It was also evaluated on memory-heavy tasks such as LoCoMo, which tests long-term conversational memory, and Memory Agent Bench, which assesses retention, retrieval, selective forgetting, and test-time learning over extended interactions. The framework was compared against representative models from the three existing memory paradigms: textual memory baselines (e.g., BM25 RAG, LLMLingua-2, and MemoryBank), parametric systems (Context2LoRA and MemGen), and the outside-channel approach MLP Memory.

Across the board, delta-mem outperformed the baselines, according to the researchers. On the Qwen3-4B-Instruct backbone, the token-state write variant achieved an average score of 51.66%, easily surpassing both the frozen vanilla backbone at 46.79% and the strongest baseline, Context2LoRA, at 44.90%. On the memory-heavy Memory Agent Bench, the average score jumped from 29.54% to 38.85%, and performance on the test-time learning subtask nearly doubled, from 26.14 to 50.50.

The most compelling takeaway, however, is the system's operational efficiency. The researchers tested the framework in a no-context setting where the historical text was entirely removed from the context. Even without explicit text replay, delta-mem successfully recovered context-relevant evidence in multi-hop tasks, which the researchers argue shows the model remembers past interactions without needing to ingest massive amounts of prompt tokens. The framework also adds only 4.87 million trainable parameters, just 0.12% of the Qwen3-4B-Instruct backbone. By comparison, the MLP Memory baseline required 3 billion parameters, 76.40% of the backbone's size, while delivering inferior results. When prompt lengths scaled up to 32,000 tokens during inference tests, the framework maintained almost the same GPU memory footprint as a standard, unmodified model, sidestepping the heavy memory bloat that affects other advanced memory systems such as MemGen and MLP Memory.

Different update strategies proved beneficial depending on the underlying model capacity. The sequence-state write strategy was the most effective for stronger backbones like Qwen3-8B, which use segment-level writing to smooth out updates and mitigate token-level noise. Conversely, the multi-state write strategy drove large performance gains for smaller backbones like SmolLM3-3B. For these lower-capacity models, separating memory into multiple states proved critical to minimizing information interference.

Implementing delta-mem in the enterprise stack

The researchers have released the code for delta-mem on GitHub and the weights for their trained adapters on Hugging Face. For AI engineering teams looking to integrate this framework into their existing inference stack, the process requires minimal computing resources. “In practice, an engineering team would start from an existing instruction-tuned backbone, attach the Delta-Mem adapter modules to selected attention layers, train only the adapter parameters on domain-relevant multi-turn or long-context data... and then run inference with the memory state updated online during interaction,” Lei said.

Crucially, teams do not need a massive pretraining corpus. The training data only needs to reflect the target memory behavior, such as multi-turn dialogues, agent traces, or domain workflows where earlier information must influence later decisions.

Compressing interaction history into a fixed-size matrix brings large efficiency gains, but it comes with trade-offs. Delta-mem is not a lossless replacement for explicit text logs or document retrieval. Because different pieces of information compete inside the same limited state, there is a risk of memory blending. “Delta-Mem is useful when the system needs fast, online, continuously updated behavioral state,” Lei said.
“RAG is better when the system needs exact factual recall, citation, compliance, auditability, or access to a large external knowledge base.” Remembering a user’s working style or a multi-step reasoning trajectory is a perfect fit for delta-mem, while documents such as legal contracts or medical guidelines should remain in a vector database for retrieval.

This points to a hybrid approach as the most realistic enterprise architecture going forward. Delta-mem acts as a lightweight internal working memory, reducing the need to retrieve or replay everything all the time, while RAG serves as the explicit, high-capacity memory layer. “Looking ahead, I do not think vector databases will become obsolete,” Lei said. “Instead, I expect enterprise AI stacks to become more layered. We will likely see short-term working memory inside the model, longer-term explicit memory in retrieval systems, and policy or audit layers that decide what should be stored, retrieved, forgotten, or exposed to the user.”
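
The predict, compare and correct cycle described in the article follows the general shape of a gated delta rule as used in linear-attention models. The NumPy sketch below illustrates that rule under stated assumptions: a single d×d state (8×8, echoing the reported configuration), unit-norm keys, and scalar gates `alpha` (how much old memory is kept) and `beta` (how strongly the correction is written). It omits how delta-mem projects hidden states into keys and values and how reads become corrections, and `sequence_state_write` is a hypothetical helper approximating the sequence-state variant by averaging a segment's keys and values. It is not the released delta-mem code, whose exact parameterization may differ.

```python
import numpy as np

def gated_delta_write(S, k, v, alpha=1.0, beta=1.0):
    """One gated delta-rule update of a (d_v, d_k) associative memory S.

    alpha in [0, 1]: retention gate, how much of the old state is kept.
    beta in [0, 1]: write strength, how much of the correction is applied.
    """
    k = k / np.linalg.norm(k)          # unit-norm key keeps the update well scaled
    S = alpha * S                      # controlled forgetting of older associations
    prediction = S @ k                 # what the memory currently recalls for this key
    error = v - prediction             # discrepancy with the value actually observed
    return S + beta * np.outer(error, k)

def sequence_state_write(S, keys, values, alpha=1.0, beta=1.0):
    """Hypothetical sequence-state variant: average a segment's token keys and
    values (one per row), then write once. Assumes the mean key is nonzero."""
    return gated_delta_write(S, keys.mean(axis=0), values.mean(axis=0), alpha, beta)

def read(S, q):
    """Recall from memory with a query vector (e.g. a projected hidden state)."""
    return S @ (q / np.linalg.norm(q))

d = 8                                  # 8x8 state, echoing the reported configuration
S = np.zeros((d, d))
e = np.eye(d)                          # orthonormal keys for a clean demonstration
S = gated_delta_write(S, e[0], np.arange(d, dtype=float))
S = gated_delta_write(S, e[1], np.ones(d))
print(np.allclose(read(S, e[0]), np.arange(d)), np.allclose(read(S, e[1]), np.ones(d)))
# True True
```

With `beta = 1` and orthogonal keys, each write stores its value exactly; writing a new value under an existing key replaces the old one through the error term, and `alpha < 1` decays everything already stored, which is the "controlled forgetting" the article describes.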

Editor's pick · PAYWALL · Technology
FT · Yesterday

Six takeaways from Musk’s 200,000-word planetary vision

Elon Musk’s rockets-to-AI conglomerate lays out its ambitions

Editor's pick
Arxiv · Today

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

arXiv:2605.20423v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It improves on the reported ExploreToM results on FANToM and remains competitive on Hi-ToM and BigToM. On the information-asymmetric FANToM benchmark, OSCToM reaches 76% accuracy, compared with the 0.2% reported by ExploreToM. The data-synthesis procedure is also 6x more efficient, indicating that targeted training data can help smaller models handle advanced cognitive reasoning. The project code is available at https://github.com/sharminsrishty/osct.

AI Research & Science · 3 articles
AI Security & Cybersecurity · 4 articles
Editor's pick · Defense & National Security
Arxiv · Today

Detecting Offensive Cyber Agents: A Detection-in-Depth Approach

arXiv:2605.21956v1 Announce Type: new Abstract: Artificial Intelligence (AI) agents can now orchestrate cyberattacks. This development is already increasing the speed and scale of cyberattacks, decreasing attack costs, and improving the operational autonomy of cyber capabilities. To defend against these emerging threats, actors must first develop the capability to detect them. This report frames the offensive cyber agent detection challenge by outlining the coming detection gap between offensive cyber agents and traditional cyber capabilities; introducing detection-in-depth, a strategic framework to guide policymakers and defenders responding to this detection gap; and presenting five actionable detection mechanisms to support policymakers, industry, and defenders when putting this strategic framework into practice. These include (1) Agent Identifiers for Critical Infrastructure; (2) Agent Honeypots; (3) AI-Automated Alert Analysis and Triage: systems that use AI to filter, prioritize, and interpret the growing volume of detection signals expected from autonomous cyber operations; (4) An Agentic Security Alert Standard: a reporting standard model that providers can use to communicate agentic threats, improving the speed, consistency, and actionability of reports; and (5) An Agentic Cybersecurity Exchange (ACE): an institution modeled on the Global Signal Exchange that brings together model and cloud providers to detect offensive cyber agent threats at their origin point and coordinate ecosystem-wide agentic threat disruption.

Editor's pick · Technology
Daily Brew · Yesterday

GitHub confirms breach of 3,800 repos via malicious VSCode extension

GitHub has confirmed a security breach affecting 3,800 repositories, traced back to a malicious VSCode extension.

Adoption, Deployment & Impact

29 articles
AI Adoption Barriers & Enablers · 7 articles
Editor's pick · Financial Services
Arxiv · Today

Declarative Data Services: Structured Agentic Discovery for Composing Data Systems

arXiv:2605.20690v1 Announce Type: new Abstract: Agentic discovery has shown that LLM-driven search can find novel algorithms, designs, and code under benchmark conditions. Translating the paradigm to multi-system data backends surfaces a harder problem: the search space is heterogeneous, the verifier is whether a deployed stack actually runs, and composition knowledge is unevenly captured in pretraining. Unbounded agentic discovery, a coding agent iterating on failure-log feedback, fails to converge consistently on a working stack even when iteration and explicit composition knowledge are added. We propose Declarative Data Services (DDS), an architecture for structured agentic discovery of data-system compositions from declarative user intent. The framework owns four typed contracts at successive layers (intent, operator DAG, per-system skills, runtime attribution) that decompose the global search into bounded sub-searches; sub-agents search each typed space, while the framework provides the channels by which knowledge flows forward as inline skill citations and errors route backward as typed signals. As a proof of life on a trading-backend workload, DDS converges where unbounded discovery does not; runtime failures become skill patches that the next deployment cites inline. We position this as an early prototype reporting lessons from real-world data-system composition.
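
Two of the four typed contracts can be pictured with a small sketch: an operator DAG that fixes a deployment order, and a backward route that attributes a runtime failure to the system whose skill should be patched. Every operator name, system name and helper below is hypothetical; the abstract does not describe DDS's actual contract schemas. The ordering uses Python's standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter

# Hypothetical operator DAG for a trading backend: operator -> set of upstream operators.
OPERATOR_DAG = {
    "ingest_ticks": set(),
    "store_ticks": {"ingest_ticks"},
    "compute_vwap": {"store_ticks"},
    "serve_api": {"compute_vwap", "store_ticks"},
}

# Hypothetical binding of each operator to a generic system (the per-system skills layer).
SYSTEM_FOR = {
    "ingest_ticks": "stream_bus",
    "store_ticks": "time_series_db",
    "compute_vwap": "stream_processor",
    "serve_api": "api_server",
}

def deployment_plan(dag, system_for):
    """Order operators so each one is deployed after all of its upstream dependencies."""
    return [(op, system_for[op]) for op in TopologicalSorter(dag).static_order()]

def attribute_failure(failed_op, system_for, skill_patches):
    """Route a runtime failure backward as a typed signal, recording it against the
    system whose skill should be patched so a later deployment can cite the fix."""
    system = system_for[failed_op]
    skill_patches.setdefault(system, []).append(f"failure observed in {failed_op}")
    return {"operator": failed_op, "system": system}

plan = deployment_plan(OPERATOR_DAG, SYSTEM_FOR)
print([op for op, _ in plan])
# ['ingest_ticks', 'store_ticks', 'compute_vwap', 'serve_api']
```

Splitting the search this way means a failure only reopens one bounded sub-problem (one system's skill) instead of the whole stack, which is the intuition the abstract gives for why structured discovery converges where unbounded iteration does not.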

Editor's pick
Arxiv · Today

From Licensing to Open Access: Designing a Sustainable Transition in Operational Weather Data

arXiv:2605.21673v1 Announce Type: cross Abstract: This translational article documents the European Centre for Medium-Range Weather Forecasts (ECMWF) transition from a restricted data licensing model to open access under CC BY 4.0, completed in October 2025. The policy context included EU open data requirements and alignment with international data exchange frameworks. The transition was implemented through a tiered service model that kept core forecast data open while offering operationally supported delivery as a cost-recovered service. Between 2020 and 2025, ECMWF executed an iterative planning cycle: setting an annual target for revenue reduction, specifying additions to the open tier under that target, provisioning infrastructure, and assessing outcomes to update assumptions. Drawing on internal administrative records (2014-2025), we describe design choices, operational constraints, and early outcomes. In the six months following the end of the transition, more than 93% of previously paying organisations retained a Service Agreement, while open endpoint download volumes increased substantially. We discuss trade-offs in defining the open tier (resolution, parameters, schedule), the reduction of compliance overheads formerly associated with redistribution restrictions, and the scalability implications of global distribution. We note an emerging sustainability question as AI-based forecast products become freely available. The early evidence is consistent with the view that a tiered service model can be designed to reconcile open-access obligations with operational sustainability, subject to monitoring over longer contract renewal cycles (typically annual).

Editor's pick · Manufacturing & Industrials
Supply Chain Dive · Yesterday

Procurement leaders urge incremental AI adoption to avoid heavy spend

Managing the cost of using AI rests on starting low-risk pilots and scaling the technology slowly, executives said at the Institute for Supply Management World 2026 conference.

Editor's pick · Professional Services
Artificial Intelligence Newsletter | May 22, 2026 · Today

Singapore launches AI playbook to steer enterprise transformation

Singapore has launched an AI for Enterprise Impact Playbook to assist companies with AI adoption, workforce upskilling, and business transformation.

AI Applications · 9 articles
Editor's pick · PAYWALL · Technology
Bloomberg · Yesterday

Zoom Soars After Expansion Into New Products Begins to Pay Off

Zoom Communications Inc. shares surged as much as 18% after the company projected stronger-than-anticipated sales growth and said that customers are paying for its expanded suite of office products.

Editor's pick · PAYWALL · Media & Entertainment
Bloomberg · Today

AI Cartoon ‘Critterz’ Looks for Tech Partner Beyond OpenAI

Critterz, a feature-length cartoon intended to showcase how OpenAI’s video-generation capabilities could revolutionize filmmaking, has missed a planned Cannes Film Festival debut after the artificial intelligence company shut down its Sora tool, forcing its creators to look for a new AI partner.

Editor's pick · PAYWALL · Media & Entertainment
FT · Yesterday

Spotify targets high-spending superfans with AI-generated music

Streamer and Universal Music Group strike licensing deal for a paid add-on tool within Spotify’s app

Editor's pick · Manufacturing & Industrials
Bebeez · Today

UK AI startup Scope raises €17.3 million funding led by Index Ventures to speed up industrial inspection workflows

Scope, a London-based AI workflow platform transforming inspections for the TIC (testing, inspection, certification) industry, has raised €17.2 million ($20 million) in funding to grow its London-based team and accelerate adoption among leading inspection companies globally. The round was led by Index Ventures with participation from Susa Ventures, Entrepreneurs First and Syndicate 1. Notable angels […]

Editor's pick · Government & Public Sector
Artificial Intelligence Newsletter | May 21, 2026 · 2 days ago

Indonesia targets corruption, efficiency with AI push across government

Indonesia plans to expand the use of AI across government administration, welfare distribution, and procurement to improve efficiency and reduce corruption, according to a senior official.

Editor's pick · Manufacturing & Industrials
Arxiv · Today

VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals

arXiv:2605.20742v1 Announce Type: new Abstract: With the rapid proliferation of electric vehicles, the safety and reliability of lithium-ion batteries have become critical concerns. Effective anomaly detection is essential for ensuring safe battery operation. However, as battery systems and operating scenarios become increasingly complex, battery fault diagnosis and maintenance require stronger cross-domain adaptability and human-AI collaboration. Traditional fault detection and diagnosis methods are usually designed for specific scenarios and predefined workflows, making them less effective in complex real-world applications. To address the scarcity of open-source battery fault report corpora and the lack of unified maintenance knowledge representation, this study proposes a descriptive text modeling approach for battery signal reports. Monitoring signals, statistical features, anomaly records, and state assessment results are transformed into structured and readable natural language descriptions, forming a language corpus for battery health diagnosis and maintenance. Based on this corpus, we propose VBFDD-Agent, a vehicle battery fault detection and diagnosis agent for automotive-grade battery systems. VBFDD-Agent integrates descriptive battery-state texts, historical case retrieval, local maintenance manuals, and large language model reasoning to generate structured diagnostic results and maintenance recommendations. Experiments show that the proposed framework can accurately perform anomaly monitoring based on descriptive textual representations and provide flexible, efficient, and actionable maintenance suggestions. Expert evaluation further confirms the practical value of the generated recommendations. Overall, VBFDD-Agent extends traditional battery diagnosis from label prediction to interpretable and maintenance-oriented decision support.

Editor's pick · Healthcare
PYMNTS · Yesterday

60% of Healthcare Firms Use AI for Chatbots

Healthcare’s AI adoption is narrower than other sectors, but the industry is using it where operational strain is most immediate.

Editor's pick · Education
Arxiv · Today

AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems

arXiv:2605.21962v1 Announce Type: cross Abstract: Serious games are widely used for learning and training across domains such as healthcare, defense, and education. Persistent challenges remain, however, including static scenario design, authoring bottlenecks, limited learner modeling, and difficulty implementing meaningful real-time instructional adaptation. Recent advances in artificial intelligence (AI) introduce novel capabilities such as dynamic scenario variation, contextual feedback, adaptive pacing, and learner-state modeling that may help address some of these limitations. At the same time, integrating AI into serious games raises important questions related to validity, transparency, system control, and learner trust. This chapter examines how contemporary AI approaches may support real-time instructional adaptation in serious games. It distinguishes between instructional intelligence, defined as a system's capacity to infer learner knowledge and reason about pedagogically appropriate responses, and adaptivity, defined as the ability to modify instructional actions during interaction. A historical synthesis of adaptive learning systems is presented, tracing developments from early computer-assisted instruction through intelligent tutoring systems (ITS), dynamic difficulty adjustment (DDA), authoring platforms, learning analytics, and recent AI-enabled architectures. Building on this perspective, the chapter discusses how large language models (LLMs), reinforcement learning (RL), and agent-based architectures may contribute to more integrated forms of intelligence and adaptivity in serious games. It also highlights practical and research challenges associated with AI-enabled systems, including explainability, validation, computational cost, and the limited empirical evidence regarding long-term learning outcomes in AI-enabled serious games.

Editor's pick · Government & Public Sector
Artificial Intelligence Newsletter | May 21, 2026 · Yesterday

India eyes AI shield against manipulation in government tenders

India must deploy AI and advanced data analytics to detect bid rigging in government procurement while strengthening coordination between auditors and the competition watchdog.

AI Measurement & Evaluation · 3 articles
Editor's pick · Technology
arXiv · Today

Open-World Evaluations for Measuring Frontier AI Capabilities

arXiv:2605.20520v1. Benchmark-based evaluation remains important for tracking frontier AI progress. But it can both overstate and understate deployed capability because it privileges tasks that can be precisely specified, automatically graded, easy to optimize for, and run with low budgets and short time horizons. We advocate for a complementary class of evaluations, which we term open-world evaluations: long-horizon, messy, real-world tasks assessed through small-sample qualitative analysis rather than benchmark-scale automation. In this paper we survey recent open-world evaluations, identify their strengths and limitations, and introduce CRUX (Collaborative Research for Updating AI eXpectations), a project for conducting such evaluations regularly. As a first instance, we task an AI agent with developing and publishing a simple iOS application to the Apple App Store. The agent completed the task with only a single avoidable manual intervention, suggesting that open-world evaluations can provide early warning of capabilities that may soon become widespread. We conclude with recommendations for designing and reporting open-world evals.

Editor's pick
arXiv · Today

AgentAtlas: Beyond Outcome Leaderboards for LLM Agents

arXiv:2605.20530v1. Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but the benchmarks used to evaluate them are fragmented: each emphasizes a different unit of measurement (final task success, tool-call validity, repeated-pass consistency, trajectory safety, or attack robustness). A line of 2024-2025 work has converged on the diagnosis that a single accuracy column is no longer the right unit of comparison for deployable agents. AgentAtlas extends this line of work with four components: (i) a six-state control-decision taxonomy (Act / Ask / Refuse / Stop / Confirm / Recover); (ii) a nine-category trajectory-failure taxonomy with two orthogonal hierarchical labels (primary_error_source, impact); (iii) a taxonomy-aware vs. taxonomy-blind methodology that measures how much of a model's apparent capability comes from the supervision in the prompt; and (iv) a benchmark-coverage audit mapping fifteen agent benchmarks against six behavioral axes. To demonstrate the methodology we run a small fixed eight-model set (1,342 generated items, four frontier closed and four open-weight) under both prompt modes. Removing the explicit label menu drops every model's trajectory accuracy by 14-40 pp to a tight 0.54-0.62 floor regardless of family, and no single model wins on all three of control accuracy, trajectory diagnosis, and tool-context utility retention. We treat the synthetic run as a measurement-protocol demonstration, not a benchmark release.

AI Productivity Evidence · 5 articles
Editor's pick
arXiv · Today

The efficiency-gain illusion: People underestimate the rate of AI use and overestimate its benefits on simple tasks

arXiv:2605.22687v1. People are increasingly turning to AI assistance for simple tasks, e.g., arithmetic, spell-check, and answering simple questions. But does AI assistance actually save users time and effort? We investigate people's propensity to use AI for cognitively simple tasks and assess whether their reliance is well-calibrated. Across three pre-registered user studies (N = 2691), we find that people frequently choose to use AI even when doing so is inefficient (i.e. provides no meaningful time or effort savings). We identify systematic miscalibration at two levels: (1) a self-estimate miscalibration where people on average believe that they are using AI less than they actually are, and (2) efficiency-gain illusions where people overestimate how much time and effort savings AI use affords. We also identify a session-level carryover effect where a participant's prior AI use leads to further AI adoption and entrenches their miscalibration about time savings. Our results shed light on the mechanisms and biases underlying people's choice of whether to use AI as well as the risk of an overreliance feedback loop.

Editor's pick · Technology
Substack · Today

THE DAILY SCRAPE - by Brent Orrell - Help Desk - Substack

Matthew Prince writes in the WSJ that Cloudflare laid off over 20% of its workforce while growing revenue more than 30%, targeting what Peter Drucker called "measurers" — middle managers, operations, internal audit, finance, compliance, marketing — rather than builders or sellers. Prince argues AI now measures organizations more continuously and precisely than humans can, and predicts the growth-with-layoffs pattern will become standard across the next year.

Editor's pick · Technology
The Register · Today

Cisco used AI to write security incident reports, with mixed results

You’ll need a lot of detailed prompts to get solid output - and even then it may have errors and typos

AI ROI & Business Case · 4 articles
Editor's pick · Manufacturing & Industrials
arXiv · Today

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

arXiv:2605.20630v1. Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.

Editor's pick
arXiv · Today

Addressing the Synergy Gap: The Six Elements of the Design Space

arXiv:2605.21635v1. AI is now embedded in healthcare, finance, policy, and many other domains, yet genuine human-AI synergy - combined performance that exceeds what either party achieves alone - is uncommon. Meta-analyses show that AI assistance tends to improve human performance compared to working alone, but studies finding true synergy are scarce. We call this persistent shortfall the synergy gap. Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design. These matter, but they cover only part of what determines whether combination works. Closing the synergy gap, we argue, requires explicit engagement with a wider design space. We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation. For each element, we describe what it covers, how it shapes the others in practice, and what it implies for design. The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.

Geopolitics, Policy & Governance

13 articles
AI Policy & Regulation · 9 articles
Editor's pick · PAYWALL · Government & Public Sector
Washington Post · Yesterday

AI & Tech Brief: White House AI order now postponed

President Donald Trump cites overregulation concerns

Editor's pick · Technology
arXiv · Today

Position: The Pre/Post-Training Boundary Should Govern IP in Industry-Academia ML Collaborations

arXiv:2605.22632v1. Industry-academia ML collaborations routinely fail to launch -- not for scientific reasons, but because academics must publish while companies must protect models trained on proprietary data, and no standard contract framework resolves this tension. Because contracts are negotiated by legal departments alone, many apparent legal disputes are incentive misalignment problems that only scientists at the table can correctly diagnose. We propose PBOS (Protect-the-Business / Open-Source-the-Science), a community-adoptable contract template anchored to a single technically-grounded boundary: pre-training artifacts (architectures, training code, benchmarks, untrained weights) are open science; post-training artifacts (weights trained on proprietary data) are business IP. This boundary is technically meaningful, legally clean, and auditable -- and could not have been drawn correctly without scientists at the negotiating table. We argue the ML community should adopt PBOS as its default contract for such collaborations.

Editor's pick
arXiv · Today

Barriers to Evidence in AI-Related Cases and the Privatization of Proof

arXiv:2605.21816v1. Evidence lies at the core of litigation, but it is increasingly difficult to obtain in AI-related disputes. Even when a claimant's position has merit, cases are often settled or dismissed because decisive facts are hidden inside proprietary models, platform logs, and protected databases. Grounding our discussion in past and ongoing cases, we investigate how asymmetries in access, resources, and expertise can create significant barriers to evidence in AI-related cases. We show how developers and deployers resist disclosure through various strategies challenging the value of the evidence to the requesting party and the cost of evidence production. From these patterns we identify seven recurring sources of asymmetry -- access to models, data, documentation, logs, expertise, compute, and infrastructure -- that reflect a broader pattern that we call the privatization of proof: when control over proof falls in the hands of private actors that can demand justification for access while ensuring that justification remains out of reach. We further argue that different types of access can be fungible: in the absence of a certain type of access (e.g., to model internals), one may be able to use alternative forms of access (e.g., sufficient compute, query access, and access to user logs) and to obtain a functionally equivalent amount of information. We propose a three-part test that can help resolve AI access disputes in litigation, drawing on concepts such as proportionality and reasonable alternatives. Our test relies on a few observations, including that the cause of action can provide a baseline for access.

Editor's pick · PAYWALL · Government & Public Sector
Washington Post · Today

Last-minute lobbying by tech industry officials led Trump to cancel AI order

Eleventh-hour phone calls with industry leaders and former AI and crypto czar David Sacks helped persuade President Donald Trump not to sign a highly anticipated executive order on artificial intelligence on Thursday.

Editor's pick · Government & Public Sector
Guardian · Yesterday

Sadiq Khan sparks row with Met after blocking £50m AI deal with Palantir

Exclusive: Scotland Yard criticises London mayor’s decision as disappointing and warns it could hit policing. Sadiq Khan has blocked a £50m Metropolitan police deal with the controversial US tech company Palantir, sparking a bitter row between the London mayor and Scotland Yard. After the UK’s largest police force had agreed to use Palantir’s AI technology to automate intelligence analysis in criminal investigations, Khan intervened, citing “serious concerns” about how the deal had been struck.

Best Practice AI · © 2026 Best Practice AI Ltd. All rights reserved.
