Fri 22 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Standard Chartered CEO Apologizes, IBM Gains $1 Billion, and Newsom Tackles AI Job Loss
TL;DR Standard Chartered's CEO apologized for referring to workers as 'lower-value human capital' amid AI job loss discussions. The U.S. government is investing $2 billion in quantum computing, with IBM receiving half. World trade surged due to AI investments, while BlackRock highlights AI's role in earnings growth. California Governor Gavin Newsom is addressing AI-induced job displacement with new labor policies.
The stories that matter most
Selected and contextualised by the Best Practice AI team
U.S. to Award Quantum-Computing Firms $2 Billion and Take Equity Stakes
IBM, set to receive $1 billion of the package, saw large stock gains along with other companies involved.
World Trade Grew Strongly at Start of Year on AI Boom
World trade flows continued to increase at a rapid pace in the first three months of the year, boosted by the boom in AI-related investment.
UK’s Softcat Recasts Itself as AI Winner With Guidance Upgrade
Softcat Plc’s image among investors is quickly shifting from AI loser to AI winner.
Who Uses AI? Platforms, Workforce, and AI Exposure
arXiv:2605.21743v1 Announce Type: cross Abstract: A growing literature uses artificial intelligence platform conversation logs to measure occupation exposure. We show that these scores partly measure platform user base rather than the workforce. Holding outcome, sample, controls, and estimator fixed while varying only the platform input changes the post-ChatGPT employment coefficient by a factor of 1.9, and within-vendor consumer-versus-enterprise channels produce estimates that disagree in sign. Reweighting to Bureau of Labor Statistics workforce shares attenuates estimates by 42 to 93 percent. We formalize the non-classical measurement error, derive probability limits and partial-identification bounds for employment elasticities. The bias understates substitution more than augmentation.
Giving Workers a Stake in A.I. Gains Traction
Gov. Gavin Newsom of California has floated a policy idea that’s getting attention in Silicon Valley: let workers own a piece of technology disruption.
Economics & Markets
Exclusive: Grok falls flat in Washington, undercutting SpaceX's AI growth story | Reuters
WASHINGTON, May 21 (Reuters) - SpaceX’s initial public offering is set to be the largest in history, partly fueled by its promise to grab a chunk of what it calls a multi-trillion-dollar market for artificial intelligence services through ...
SpaceX IPO filing lays bare losses and Musk control as it stakes future on AI | Reuters
Musk's purchase of his social media and AI company xAI gave SpaceX new capabilities and opportunities but also brought a staggering amount of spending, accounting for 76% of its $10.1 billion in capital spending in the first quarter, as well as fresh losses.
An AI trade involving energy and infrastructure that's doubled your money, topping Nvidia
If you put the same money into a basket of companies that are building out AI infrastructure and energy sources, you’ve done much better than stocks like Nvidia.
Reuters | Breaking International News & Views
Elon Musk’s rocket firm is poised to float with a $1.75 trillion valuation. With OpenAI coming, starry-eyed investors are likely to focus on mega floats. Windscreen fixer Belron’s mix of stable growth and low AI-disruption risk could show that even earthier listings can take off.
I’ve spent 25 years in venture capital. Here’s how it quietly shut ordinary Americans out of the AI wealth boom—and what could fix it
The private market didn't just grow. It replaced the system that once let regular investors participate in America's biggest wealth-creation moments. One solution keeps getting ignored.
Mars colony and Grok warnings: five strange details in SpaceX’s pitch to investors
IPO filing from Elon Musk’s company reveals closer look at finances, cosmic ambitions and tech empire’s quirks. SpaceX publicly released an investor prospectus on Wednesday as part of its plan for a $1.75tn debut on the US stock market next month, revealing unseen details about the finances and future plans of Elon Musk’s flagship company. In addition to new information on operating costs and revenue, the filing also included trademark Muskian sweeping proclamations about the universe and insights into some of the quirks of his tech empire. Scattered throughout the 300-plus-page prospectus are several disclosures and risk warnings that show the eccentricities of Musk’s company and its cosmic ambitions. Other financial details in the document highlight how interdependent Musk’s various businesses have become and the risks that they carry.
The Synthetic Yield: Mastering the Structural Complexity of Specialized Commercial AI Hardware and Compute Infrastructure Finance - Loan Management Software by Fundingo
The rapid proliferation of large language models and generative artificial intelligence has fundamentally altered the risk-return profile of specialized infrastructure ...
QUALCOMM Earnings Report and AI Smartphone Technology Insights
Such competitive dynamics can significantly impact pricing strategies, profit margins, and momentum in securing design wins, all factors of keen interest to institutional and retail investors. Regulatory frameworks and trade policies present additional industry-wide concerns. Export controls on advanced semiconductor technologies and evolving geopolitical ...
Norway’s $2.3 Trillion Fund Objects to Elkann’s Meta Board Seat
Norway’s $2.3 trillion wealth fund expressed dissatisfaction with the reappointment of John Elkann, chairman of Stellantis NV and chief executive of investor Exor NV, on the board of directors at Meta Platforms Inc.
CoreWeave vs. Nebius: Which Artificial Intelligence (AI) Infrastructure Stock Is a Better Buy in 2026? | The Motley Fool
Both CoreWeave and Nebius power AI giants with critical infrastructure, but one looks like the superior investment.
Jamie Dimon sees ‘exuberance’ in markets. That’s a loaded word when it comes to bubbles popping
The most important economic question of our time has no consensus — and the gap between what AI can do and what economies are organized to absorb may be the defining tension of the decade.
More AI-Exposed Industries and States Are Benefiting, But Results Are Heterogenous - ProMarket
In new research, Christos Makridis and Andrew Johnston find that industries exposed to generative AI are seeing an increase in production, employment, and wages. However, the majority of AI-driven revenue growth is channelled back to capital as profits, rather than to workers.
Can Rising Consumption Deepen Inequality?
arXiv:2601.15537v2 Announce Type: replace-cross Abstract: The impact of rising consumption on wealth inequality remains an open question. Here we revisit and extend the Social Architecture of Capitalism agent-based model proposed by Ian Wright, which reproduces stylized facts of wealth and income distributions. In a previous study, we demonstrated that the macroscopic behavior of the model is predominantly governed by a single dimensionless parameter, the ratio between average wealth per capita and mean salary, denoted by R. The shape of the wealth distribution, the emergence of a two-class structure, and the level of inequality - summarized by the Gini index - were found to depend mainly on R, with inequality increasing as R increases. In the present work, we examine the robustness of this result by relaxing some simplifying assumptions of the model. We first allow transactions such as purchases, salary payments, and revenue collections to occur with different frequencies, reflecting the heterogeneous temporal dynamics of real economies. We then impose limits on the maximum fractions of wealth that agents can spend or collect at each step, constraining the amplitude of individual transactions. We find that the dependence of the inequality on R remains qualitatively robust, although the detailed distribution patterns are affected by relative frequencies and transaction limits. Finally, we analyze a further variant of the model with adaptive wages emerging endogenously from the dynamics, showing that self-organized labor-market feedback can either stabilize or amplify inequality depending on macroeconomic conditions.
China’s AI-Made Video Is Changing the Entertainment Landscape
Such services pose an existential threat to traditional entertainment.
AI Startup Companies’ $80 Billion ARR—90% Captured by Just Two Compan… | TechFlow
This isn’t a winner-takes-all scenario—it’s the winner flipping the table.
Visma Dinero halts Danish rollout of new AI assistant to preserve competition
Visma Dinero stopped the release of its AI assistant in Denmark after antitrust authorities warned that the tool could facilitate anticompetitive information exchange between rivals.
Music publishers file amended US claims against Anthropic
Universal Music, Concord Music Group, and ABKCO Music filed an amended complaint accusing Anthropic of copyright infringement through the unauthorized use of lyrics in AI model training.
Exclusive interview: Sundar Pichai on AI's flip phone moment
An exclusive interview featuring Google CEO Sundar Pichai discussing the current state and future of artificial intelligence.
COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space
arXiv:2605.20618v1 Announce Type: new Abstract: Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditional heuristics rely on handcrafted rules for local improvements and occasional jumps to escape local minima, but often struggle to generalize across diverse instances. We introduce COAgents, a cooperative multi-agent framework that models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations for diversification (i.e., jumps). A Partial Search Graph (PSG) is dynamically constructed during search, enabling COAgents to train a Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent to trigger well-timed explorations of new regions. Unlike end-to-end learning approaches, COAgents cleanly separates problem-agnostic search control from compact domain-specific encoding, facilitating adaptability across tasks. Extensive experiments on the CVRP and VRPTW benchmarks show that COAgents remains competitive with several learn-to-search baselines on CVRP and sets a new state of the art among learning-based methods on the more challenging VRPTW instances, reducing the gap to the best-known solutions by 14% at N=100 and 44% at N=50 relative to the strongest neural solver (POMO), and by 21% and 40% respectively relative to ALNS. Code is available at https://github.com/mahdims/COAgents.
I've led companies through every major tech disruption. AI washing is the same mistake, every time | Fortune
Leaders using AI to justify workforce cuts are missing the real opportunity to build more capable organizations.
OpenAI to open first international applied AI lab in Singapore
Singapore and OpenAI signed a partnership to establish an Applied AI Lab, the first outside the US, to boost AI adoption and talent development.
UK gets a fresh new unicorn as beauty and wellness platform Fresha lands €68.9 million from KKR
Fresha, a London-based AI-powered marketplace and business management platform for the beauty and wellness industry, has announced a €68.9 million ($80 million) primary growth investment from funds managed by KKR, a global investment firm. This deal values Fresha at over €861.9 million ($1 billion) and brings Fresha’s total capital raised to €245.5 million ($285 million). […]
Labor, Society & Culture
Generating tax revenues in an automated world
If AI destroys job markets, governments will need to make up the resulting shortfall in labour income tax receipts
Workday wants AI to punch in instead of having to hire new recruits
CEO eyes margin gains by keeping headcount flat – bold for a company selling HR software to employers
Ex-Facebook exec Sheryl Sandberg says the 10-year career plan is dead thanks to AI: ‘Don’t script your career when the future is uncertain,’ she warns Gen Z
The former Meta executive says rigid career plans will backfire: "If I had one, I would have missed the internet," Sheryl Sandberg warned Gen Z.
Workers say reliance on AI is eroding skills and judgment
A new GoTo study finds workers increasingly depend on AI tools, raising concerns about misuse, poor judgment and declining skills.
AI Might Not Bring On A Job Crisis, But A Workforce ‘Mismatch’ Could
A new Indeed report suggests there will still be job growth, but not in all fields. Here’s how employers and workers must adapt to avoid 8% unemployment.
By the Numbers: What the class of 2026 job market actually looks like — and where AI fits in
Congratulations to the Class of 2026...
AI Skills Shortage Hits 45% of Indian Organisations: Report
Nearly 45% of Indian firms cite AI skills as top workforce constraint. SHRM India Report reveals 54% show low urgency on AI investment despite looming disruption.
AI Is Reshaping Early Career Hiring Expectations, New ICIMS Data Reveals
ICIMS, a leading enterprise talent acquisition platform, released the ICIMS Insights May 2026 Workforce Report, revealing a growing imbalance...
Cox Media, two others settle US FTC claims over AI marketing service
Cox Media Group and two marketing firms agreed to pay $930,000 to settle FTC allegations regarding deceptive claims about using AI to listen to smart device conversations for ad targeting.
Machine Learning as Performative Materialist Practice: Thirteen Theses on the Epistemology, Methodology, and Politics of Applied ML
arXiv:2605.21785v1 Announce Type: new Abstract: Machine learning practice in institutional decision-support contexts -- government, public policy, public health, criminal justice, resource allocation -- rests on a set of largely unexamined epistemological commitments inherited from classical statistics and computer science: that models represent stable regularities, that validation can be context-free, that performance metrics are politically neutral, and that feature importance reveals system structure. This paper challenges these commitments through a unified framework of performative materialist ML, articulated as thirteen theses. Drawing on Pickering's cybernetic ontology, the performativity literature from economic sociology (Callon, MacKenzie), Simon's bounded rationality, the formalization of performative prediction (Perdomo et al., 2020), and fifteen years of applied ML experience in government and public policy, we argue that: (1) ML models are best understood not as truth-seeking representations but as temporally situated compressions that function as instruments of intervention; (2) the full data product is a complex adaptive system that coevolves with its target and navigates a multi-objective space no single algorithm can optimize; (3) validity is fundamentally performative, measured by effects in the world rather than formal properties of the model; (4) the choices embedded in objective functions, fairness criteria, and resource thresholds are political decisions belonging to stakeholders, not technicians. We show how these theses unify several practical prescriptions -- temporal cross-validation, precision and recall at k, pipeline-aware fairness auditing, satisficing over optimizing -- as consequences of a coherent materialist epistemology rather than isolated best practices.
Personalization of AI chatbots becoming focus of UK data watchdog, official says
The UK's Information Commissioner's Office is examining how AI chatbots collect and use personal data as it develops new guidance for responsible AI use under data protection law.
CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
arXiv:2605.21609v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems.
Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?
arXiv:2605.22109v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires MLLMs to anchor each Big Five rating in observable evidence through a chain of rating, reasoning, and grounding. (ii) A new dataset: we release MM-OCEAN (1,104 videos, 5,320 MCQs), produced by a multi-agent pipeline with human verification, with timestamped behavioral observations, evidence-grounded trait analyses, and seven categories of cue-grounding MCQs. (iii) Benchmark and analysis: we design a three-tier evaluation (rating, reasoning, grounding) plus four sample-level failure-mode metrics: Prejudice Rate (PR), Confabulation Rate (CR), Integration-failure Rate (IR), and Holistic-grounding Rate (HR), and benchmark 27 MLLMs (13 closed, 14 open). The analysis uncovers a striking Prejudice Gap: across the field, 51% of correct ratings are not grounded in retrieved cues, and the Holistic-Grounding Rate spans only 0-33.5%. These findings expose a disconnect between getting the right score and reasoning for the right reason, charting a roadmap for grounded social cognition in MLLMs.
A school district’s lawsuit against Meta for mental health costs was set for trial next month. Zuckerberg settled
The school district had sought more than $60 million to create a 15-year program it said would help counteract mental health and learning issues.
Meta and Snapchat blocking Saudi dissidents’ accounts
US social media firms acting on orders from Middle East kingdom accused of being ‘instruments of repression’. Major US social media companies including Meta’s Facebook and Instagram platforms have blocked the accounts of Saudi Arabian dissidents so they are no longer visible inside the kingdom, following orders by Saudi authorities. Those affected include Abdullah Alaoudh, a US-based activist and vocal critic of Saudi human rights violations, and Omar Abdulaziz, a Canada and UK-based activist who worked closely with Jamal Khashoggi before the journalist’s murder by Saudi agents in 2018. The headline on this article was amended on 22 May 2026. An earlier version wrongly said X was blocking dissidents’ accounts. This has been corrected.
Why the UK’s AI-powered prosperity hinges on skilling for all - Microsoft UK Stories
While employees are increasingly working like it’s 2026, some organisations are still operating like it’s 2019. Unless businesses, educators, government and the technology sector as a whole work together to build AI capability more broadly across the workforce, the UK risks hampering its ...
Colleges Are at a Breaking Point
The AI job market has made tuition look like a dubious investment. But it only exposes the deeper identity crisis in American higher education.
Opinion | A Defense of a Liberal Arts Education in the Age of A.I. - The New York Times
Making the case for a “useless” education · Hosted by Ross Douthat
Employers are prioritising AI-ready skills across general, tech industries - The Hindu
As AI becomes central to workforce strategy, Indian employers are prioritising practical, AI-ready skills across both general industries and the technology sector, said Nasscom in a report it prepared in collaboration with Indeed, a global job search and hiring platform based in Texas.
The Missing Link in Healthcare AI Adoption: Workforce Readiness | Healthcare IT Today
The following is a guest article by Anupama Shashank, Managing Director & Senior Vice President, Healthcare & Life Sciences at Kyndryl. Nearly all healthcare organizations are deploying AI across clinical, operational, and administrative functions, outpacing the global average.
Technology & Infrastructure
From Automated to Autonomous: Hierarchical Agent-native Network Architecture (HANA)
arXiv:2605.20608v1 Announce Type: new Abstract: Realizing Level 4/5 Autonomous Networks (AN) demands a shift from static automation to agent-native intelligence. Current operations, reliant on rigid scripts, lack the cognitive agency to handle off-nominal conditions. To address this, this letter proposes a hierarchical multi-agent reference architecture enabling high-level autonomy. The framework features a Dual-Driven Orchestrator that coordinates specialized Executive Agents, supported by a shared Public Memory for unified domain knowledge. A key innovation is the integration of agent self-awareness, which empowers the system to harmonize deliberative strategic governance with reflexive fault recovery. We instantiate and validate this architecture within a 5G Core environment. Case studies demonstrate that the system sustains critical throughput under congestion and reduces Mean Time to Repair (MTTR) by 86%, confirming its efficacy in unifying strategic planning with operational resilience.
Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration
arXiv:2605.20190v1 Announce Type: new Abstract: Iterative industrial design-simulation optimization is bottlenecked by the CAD-CAE semantic gap: translating simulation feedback into valid geometric edits under diverse, coupled constraints. To fill this gap, we propose COSMO-Agent (Closed-loop Optimization, Simulation, and Modeling Orchestration), a tool-augmented reinforcement learning (RL) framework that teaches LLMs to complete the closed-loop CAD-CAE process. Specifically, we cast CAD generation, CAE solving, result parsing, and geometry revision as an interactive RL environment, where an LLM learns to orchestrate external tools and revise parametric geometries until constraints are satisfied. To make this learning stable and industrially usable, we design a multi-constraint reward that jointly encourages feasibility, toolchain robustness, and structured output validity. In addition, we contribute an industry-aligned dataset that covers 25 component categories with executable CAD-CAE tasks to support realistic training and evaluation. Experiments show that COSMO-Agent training substantially improves small open-source LLMs for constraint-driven design, exceeding large open-source and strong closed-source models in feasibility, efficiency, and stability.
The Download: coding’s future, the ‘Steroid Olympics,’ and AI-driven science
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. Anthropic’s Code with Claude showed off coding’s future—whether you like it or not At Anthropic’s developer event in London this week, Code with Claude, attendees were asked if they’d shipped code…
Gemini accused of 30,000-line code purge and fake recovery report
An AI coding agent reportedly broke production and generated fictitious post-mortem paperwork after a rollback.
Norway’s Roboxi lands €13 million to transform airport airside operations with automation and robotics
Roboxi, a Stavanger-based startup specialising in airport airside automation and autonomy, has announced the completion of a share issue raising approximately €13 million in new equity. According to the company, the share issue generated significant interest from both new and existing shareholders. The primary investors are prominent ones based in the Rogaland region of Norway. […]
IDC: 93% of Enterprises See AI as Revenue Driver as Agentic AI Reshapes Business Strategy - InfotechLead
IDC revealed AI has evolved from an experimental technology into a core business growth engine, with enterprises adopting agentic AI
AgentCo-op: Retrieval-Based Synthesis of Interoperable Multi-Agent Workflows
arXiv:2605.20425v1 Announce Type: new Abstract: Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We propose AgentCo-op, a retrieval-based synthesis framework that composes reusable skills, tools, and external agents into executable workflows through typed artifact handoffs, then applies bounded self-guided local repair to implicated components when execution evidence indicates failure. In two open-world genomics case studies, AgentCo-op composes independently developed scientific agents and external tool repositories into auditable workflows without redesigning them or running global topology search. It coordinates specialized agents for spatial transcriptomics and gene-set interpretation to enable collaborative discovery from spatial transcriptomics data, and builds a parallel workflow for cross-modality marker analysis on single-cell multiome data. AgentCo-op can also import a searched workflow as a structural prior and improve it by grounding nodes with retrieved components and applying local repair, showing that synthesis and search are complementary. On six coding, math, and question-answering benchmarks, AgentCo-op achieves the best result on four benchmarks and the best average score under a unified backbone setting, while consistently reducing per-task cost relative to multi-agent baselines. Together, these results suggest that retrieval-based synthesis can extend automated agentic workflow design beyond benchmark-optimized agent graphs to open-world workflows built from existing agents, tools, and typed artifacts.
SOLAR: A Self-Optimizing Open-Ended Autonomous Agent for Lifelong Learning and Continual Adaptation
arXiv:2605.20189v1 Announce Type: new Abstract: Despite the remarkable success of large language models (LLMs), they still face bottlenecks when deployed in dynamic, real-world settings, with the primary challenges being concept drift and the high cost of gradient-based adaptation. Traditional fine-tuning (FT) struggles to adapt to non-stationary data streams without resulting in catastrophic forgetting or requiring extensive manual data curation. To address these limitations within the streaming and continual learning paradigm, we propose the Self-Optimizing Lifelong Autonomous Reasoner (SOLAR), which is an open-ended autonomous agent that leverages parameter-level meta-learning to self-improve, treating model weights as an environment for exploration. It initiates the process by consolidating a strong prior over common-sense knowledge, making it effective for transfer learning. By utilizing a multi-level reinforcement learning approach, SOLAR autonomously discovers adaptation strategies, enabling efficient test-time adaptation to unseen domains. Crucially, SOLAR maintains an evolving knowledge base of valid modification strategies, implicitly acting as an episodic memory buffer to balance plasticity (adaptation to new tasks) and stability (retention of meta-knowledge). Experiments demonstrate that SOLAR outperforms strong baselines on common-sense, mathematical, medical, coding, social and logical reasoning tasks, marking a significant step toward autonomous agents capable of lifelong adaptation in evolving environments.
The Agentic AI Storage Shock - by Ben Bajarin
How enterprise agents turn data lakes, workflow logs, and generated artifacts into the next infrastructure gating layer
Why Agentic AI’s Next Breakthrough Depends on Search - Insignia Business Review
AI is entering a new phase. Attention is shifting toward inference: how to run those models reliably, cheaply, and at scale in real-world environments.
Rethinking Artificial Intelligence at the Strategic Frontier
AI in defense shifts from tools to human-AI teaming; interaction-centered design improves trust, decisions, and security outcomes in complex environments.
Lenovo Q4 revenue tops estimates on strong PC sales, shares jump 15%
The ISG segment has spent the last ... but AI servers carry thinner margins than PCs and depend heavily on whether Lenovo can secure GPU allocation at competitive prices. Bamboo Works’ analysis of the company has flagged ongoing geopolitical exposure, particularly around US export controls on advanced ...
Anthropic, Microsoft in talks for AI chip deal after $5 billion investment
Microsoft has not made the Maia 200 chips available to customers, but they are used in the company's data centers, offering better efficiency than other silicon.
How AI demand is redefining enterprise infrastructure strategy | TechRadar
AI adoption is causing increased pressure on supply chains and component availability
Gaia AI supercomputer launched in Kraków, Poland
Poland has inaugurated its second AI factory in its southern city of Kraków. Known as the Gaia AI Factory, the 10 exaflop supercomputer will harness more than a thousand GPU accelerators to facilitate the training of advanced AI models and research into practical applications for the technology in education, healthcare, and public administration – Academic […]
Mahjax: A GPU-Accelerated Mahjong Simulator for Reinforcement Learning in JAX
arXiv:2605.20577v1 Announce Type: new Abstract: Riichi Mahjong is a multi-player, imperfect-information game characterized by stochasticity and high-dimensional state spaces. These attributes present a unique combination of challenges that mirror complex real-world decision-making problems in reinforcement learning. While prior research has heavily relied on supervised learning from human play logs to pre-train the policy, algorithms capable of learning tabula rasa (from scratch) offer greater potential for general applicability, as evidenced by the AlphaZero lineage. To facilitate such research, we introduce Mahjax, a fully vectorized Riichi Mahjong environment implemented in JAX to enable large-scale rollout parallelization on Graphics Processing Units (GPUs). We also provide a high-quality visualization tool to streamline debugging and interaction with trained agents. Experimental results demonstrate that Mahjax achieves throughputs of up to 2 million and 1 million steps per second on eight NVIDIA A100 GPUs under the no-red and red rules, respectively. Furthermore, we validate the environment's utility for reinforcement learning by showing that agents can be trained effectively to improve their rank against baseline policies.
Not Yet: Humans Outperform LLMs in a Colonel Blotto Tournament
arXiv:2605.22095v1 Announce Type: new Abstract: The emergence of large language models (LLMs) has spurred economists to study how humans and LLMs behave in strategic settings. We organized a series of round-robin tournaments in the Colonel Blotto game. This game attracts game theorists' attention due to high-dimensional action space and the absence of pure strategy Nash equilibria. In the first tournament, more than 200 human participants competed against one another. In the second tournament, several popular LLMs were invited to submit strategies. In the third tournament, we matched the number of LLM strategies to the number submitted by humans. We find that humans more often employ better-calibrated intermediate-level allocation heuristics and outperform the simpler, more stereotyped strategies submitted by LLMs. Strategic sophistication is key to success if and only if the necessary level of reasoning depth is reached, while lower and higher levels of reasoning offer no clear advantage over the primitive strategies. Among humans, field of study weakly predicts success: participants with STEM backgrounds perform better in the first tournament. Surprisingly, humans almost do not adjust their strategies across tournaments with different sets of opponents. This result suggests that humans base their choices primarily on the game's rules rather than on the identity of their opponents, treating LLMs much like human competitors.
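For readers unfamiliar with the game, here is a minimal Python sketch of how a Colonel Blotto round-robin can be scored. The five battlefields, the budget of 100 troops, the half-point tie rule, and the three named strategies are illustrative assumptions; the abstract does not state the tournament's actual parameters.

```python
from itertools import combinations

def blotto_score(a, b):
    """Score one match: 1 point per battlefield won, 0.5 each on a tie."""
    assert len(a) == len(b), "both players must cover the same battlefields"
    points_a = points_b = 0.0
    for troops_a, troops_b in zip(a, b):
        if troops_a > troops_b:
            points_a += 1.0
        elif troops_b > troops_a:
            points_b += 1.0
        else:
            points_a += 0.5
            points_b += 0.5
    return points_a, points_b

def round_robin(strategies):
    """Play every strategy against every other once and sum the points."""
    totals = {name: 0.0 for name in strategies}
    for first, second in combinations(strategies, 2):
        s1, s2 = blotto_score(strategies[first], strategies[second])
        totals[first] += s1
        totals[second] += s2
    return totals

# Hypothetical allocations of 100 troops across 5 battlefields.
strategies = {
    "uniform": [20, 20, 20, 20, 20],
    "concentrated": [50, 50, 0, 0, 0],
    "intermediate": [30, 30, 30, 5, 5],
}
totals = round_robin(strategies)
print(totals)
```

In this toy field the allocation that wins three battlefields by a moderate margin scores highest, but that outcome depends entirely on the opponents chosen and should not be read as the paper's result.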
A 0.12% parameter add-on gives AI agents the working memory RAG can't
AI agents forget. Every time a coding assistant loses track of a debugging thread, or a data analysis agent re-ingests the same context it already processed, the team pays in latency, token costs, and brittle workflows. The fix most teams reach for, expanding the context window or adding more RAG, is increasingly expensive and still doesn't reliably work.
To address this, researchers from Mind Lab and several universities proposed delta-mem, an efficient technique that compresses the model’s historical information into a dynamically updated matrix without changing the model itself. The resulting module adds just 0.12% of the backbone model's parameters (compared to 76.40% for one leading alternative) while outperforming it on memory-heavy benchmarks.
Delta-mem allows models to continuously accumulate and reuse historical data, reducing the reliance on massive context windows or complex external retrieval modules for behavioral continuity.
The long memory challenge
The conventional solution is to simply dump all the information into the model’s context window. But as Jingdi Lei, co-author of the paper, told VentureBeat, current systems treat memory merely as a context-management problem. “Either we keep expanding the context window, or we retrieve more documents through RAG,” Lei explained. “These approaches are useful and will remain important, but they become increasingly expensive and brittle when agents need to operate over long-running, multi-step interactions, and they don't really [work] like human memory since they are more like looking up documents.”
In enterprise settings, the bottleneck is not just whether the model can access history, but whether it can reuse that history efficiently, continuously, and with low latency. Standard attention mechanisms incur a quadratic computational cost as the sequence length increases. Furthermore, expanding the context window does not guarantee the model will actually recall the information effectively.
Models often suffer from context degradation or context rot as they become overwhelmed with more (and often conflicting) information, even if they support one million tokens in theory. The researchers argue for advanced memory mechanisms that can represent historical information compactly and maintain it dynamically across interactions.
Existing solutions come with heavy trade-offs and generally fall into three paradigms:
Textual memory: stores history as text injected into context; constrained by window limits and prone to information loss under compression.
Outside-channel (RAG): encodes and retrieves from external modules; adds latency, integration complexity, and potential misalignment with the backbone.
Parametric: encodes memory into model weights via adapters; static after training, can't adapt to new information during live interactions.
Inside delta-mem
To achieve a compact and dynamically updated memory, delta-mem compresses an agent’s past interactions into an “online state of associative memory” (OSAM). This state is maintained as a fixed-size matrix that preserves historical information while the underlying language model remains frozen.
For enterprise workflows, this translates directly to resolving operational bottlenecks. Lei noted that a persistent coding assistant, for example, “may need to remember project conventions, recent debugging steps, user preferences, or intermediate decisions across a workflow.” Similarly, a data analysis agent might “need to maintain task state, assumptions, and prior observations while iterating over multiple tool calls.” Rather than repeatedly retrieving and re-inserting all relevant history for these tasks, the delta-mem matrix provides a low-overhead way to carry forward useful interaction states inside the model’s forward computation.
During generation, the system does not retrieve raw text segments to add to the prompt.
Instead, the backbone LLM’s current hidden state is projected into the matrix to retrieve old memory. This operation extracts context-relevant associative memory signals from delta-mem. These signals are then transformed into numerical corrections that are applied to the computations of the model. This steers the model's reasoning at inference time without altering its internal parameters.
Following each interaction, delta-mem updates the online state using “delta-rule learning.” When new information arrives, the previous state makes a prediction about the resulting attention values. It then compares this prediction to the actual value and corrects the memory matrix based on the discrepancy.
This update mechanism relies on a “gated delta-rule.” Basically, the memory module has different knobs that control how much previous memory is kept and how much of the new memory is applied. This error correction with controlled forgetting allows the matrix to evolve over time, holding onto stable historical associations without being derailed by short-term noise.
The researchers explored three strategies for determining when and how the matrix updates:
Token-state write captures fine-grained changes but is vulnerable to short-term noise.
Sequence-state write averages tokens within a message segment, smoothing updates at the cost of some localized detail.
Multi-state write decomposes memory into sub-states for different information types like facts or task progress.
Delta-mem in action
The researchers evaluated delta-mem across three LLM backbones: Qwen3-8B, Qwen3-4B-Instruct, and SmolLM3-3B. They configured the framework with a compact 8x8 matrix. The system was tested on general capability benchmarks, including HotpotQA, GPQA-Diamond, and IFEval.
It was also evaluated on memory-heavy tasks such as LoCoMo, which tests long-term conversational memory, and Memory Agent Bench, which assesses retention, retrieval, selective forgetting, and test-time learning over extended interactions. The framework was compared against representative models from the three existing memory paradigms: textual memory baselines (e.g., BM25 RAG, LLMLingua-2, and MemoryBank), parametric systems (Context2LoRA and MemGen), and the outside-channel approach MLP Memory.
Across the board, delta-mem outperformed the baselines, according to the researchers. On the Qwen3-4B-Instruct backbone, the token-state write variant achieved an average score of 51.66%, easily surpassing the frozen vanilla backbone at 46.79% and the strongest baseline, Context2LoRA, at 44.90%. On the memory-heavy Memory Agent Bench, the average score jumped from 29.54% to 38.85%. Performance on the specific test-time learning subtask nearly doubled from 26.14 to 50.50.
However, the most compelling takeaway is the system's operational efficiency. The researchers tested the framework in a no-context setting where the historical text was entirely removed from the context. Even without explicit text replay, delta-mem successfully recovered context-relevant evidence in multi-hop tasks. The researchers argue that the model remembers past interactions without needing to ingest massive amounts of prompt tokens.
The framework also adds only 4.87 million trainable parameters, representing just 0.12% of the Qwen3-4B-Instruct backbone. By comparison, the MLP Memory baseline required 3 billion parameters, scaling up to 76.40% of the backbone's size while delivering inferior results. When prompt lengths scaled up to 32,000 tokens during inference tests, the framework maintained almost the exact same GPU memory footprint as a standard, unmodified model. It sidesteps the heavy memory bloat that affects other advanced memory systems like MemGen and MLP Memory.
Different update strategies proved beneficial depending on the underlying model capacity. The sequence-state write strategy was the most effective for stronger backbones like Qwen3-8B. These more capable models use the segment-level writing to smooth out updates and mitigate token-level noise. Conversely, the multi-state write strategy drove massive performance leaps for smaller backbones like SmolLM3-3B. For these lower-capacity models, separating memory into multiple states proved critical to minimizing information interference.
Implementing delta-mem in the enterprise stack
The researchers have released the code for delta-mem on GitHub and the weights for their trained adapters on Hugging Face. For AI engineering teams looking to integrate this framework into their existing inference stack, the process requires minimal computing resources.
“In practice, an engineering team would start from an existing instruction-tuned backbone, attach the Delta-Mem adapter modules to selected attention layers, train only the adapter parameters on domain-relevant multi-turn or long-context data... and then run inference with the memory state updated online during interaction,” Lei said.
Crucially, teams do not need a massive pretraining corpus. The training data only needs to reflect the target memory behavior, such as multi-turn dialogues, agent traces, or domain workflows where earlier information must influence later decisions.
While compressing interaction history into a fixed-size mathematical matrix creates immense efficiency, it does come with trade-offs. Delta-mem is not a lossless replacement for explicit text logs or document retrieval. Because different pieces of information compete inside the same limited state, there is a risk of memory blending. “Delta-Mem is useful when the system needs fast, online, continuously updated behavioral state,” Lei said.
“RAG is better when the system needs exact factual recall, citation, compliance, auditability, or access to a large external knowledge base.” Remembering a user’s working style or a multi-step reasoning trajectory is a perfect fit for delta-mem, while retrieving a legal contract or a medical guideline should remain in a vector database. This means the most realistic enterprise architecture moving forward is a hybrid approach. Delta-mem acts as a lightweight internal working memory, reducing the need to retrieve or replay everything all the time, while RAG serves as the explicit, high-capacity memory layer. “Looking ahead, I do not think vector databases will become obsolete,” Lei said. “Instead, I expect enterprise AI stacks to become more layered. We will likely see short-term working memory inside the model, longer-term explicit memory in retrieval systems, and policy or audit layers that decide what should be stored, retrieved, forgotten, or exposed to the user.”
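The predict, compare, and correct loop described in the article can be sketched in a few lines of plain Python. This is a minimal illustration of a generic gated delta-rule write on a fixed-size matrix, not the researchers' released implementation. The function names, the 4x4 size (the article reports an 8x8 matrix), the scalar gates `alpha` (how much old memory is kept) and `beta` (how strongly the correction is applied), and the one-hot keys are all assumptions made for readability.

```python
def matvec(S, x):
    """Multiply matrix S (a list of rows) by vector x."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]

def gated_delta_write(S, key, value, alpha=1.0, beta=1.0):
    """One write into the memory matrix (hypothetical helper).

    alpha: retention gate, scales down the old state (controlled forgetting).
    beta:  write gate, scales how much of the prediction error is applied.
    """
    # 1) Controlled forgetting: decay the previous state.
    decayed = [[alpha * s for s in row] for row in S]
    # 2) Predict the value the decayed state would return for this key.
    predicted = matvec(decayed, key)
    # 3) Compare the prediction with the actual value.
    error = [v - p for v, p in zip(value, predicted)]
    # 4) Correct the state with a rank-one update: error times key transposed.
    return [[s + beta * e * kj for s, kj in zip(row, key)]
            for row, e in zip(decayed, error)]

def read(S, query):
    """Retrieve the value associated with a query vector."""
    return matvec(S, query)

DIM = 4
state = [[0.0] * DIM for _ in range(DIM)]
key_a = [1.0, 0.0, 0.0, 0.0]
key_b = [0.0, 1.0, 0.0, 0.0]

# Store an association, then read it back.
state = gated_delta_write(state, key_a, [0.5, -1.0, 2.0, 0.0])
first_read = read(state, key_a)

# A later write with alpha < 1 fades older associations.
state = gated_delta_write(state, key_b, [1.0, 1.0, 0.0, 0.0], alpha=0.9)
faded_read = read(state, key_a)

# Rewriting key_a replaces its old value through the error term.
state = gated_delta_write(state, key_a, [3.0, 0.0, 0.0, 1.0])
corrected_read = read(state, key_a)
print(first_read, faded_read, corrected_read)
```

Because the state has a fixed size, keys that are not orthogonal would overwrite part of each other's values, which is the memory blending risk mentioned above.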
Six takeaways from Musk’s 200,000-word planetary vision
Elon Musk’s rockets-to-AI conglomerate lays out its ambitions
OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
arXiv:2605.20423v1 Announce Type: new Abstract: Large Language Models (LLMs) perform well on many language tasks, but their Theory of Mind (ToM) reasoning is still uneven in complex social settings. Existing benchmarks, including ExploreToM, do not always test the recursive beliefs and information asymmetries that make these settings difficult. This paper presents OSCToM (Observer-Self Conflict Theory of Mind), an approach for modeling nested belief conflicts in LLM-based ToM tasks. The key case is one in which an observer's view of another agent conflicts with the observer's own belief state. Such cases go beyond simple perspective-taking and require recursive, multi-layered reasoning. OSCToM combines reinforcement learning (RL), an extended domain-specific language, and compositional surrogate models to generate observer-self conflicts. In our experiments, OSCToM-8B gives the best overall result among the systems tested. It improves on the reported ExploreToM results on FANToM and remains competitive on Hi-ToM and BigToM. On the information-asymmetric FANToM benchmark, OSCToM reaches 76% accuracy, compared with the 0.2% reported by ExploreToM. The data-synthesis procedure is also 6x more efficient, indicating that targeted training data can help smaller models handle advanced cognitive reasoning. The project code is available at https://github.com/sharminsrishty/osct.
Roundtables: Can AI Learn to Understand the World?
AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recent developments have brought world models to the forefront of the AI discussion. Watch a conversation with editor in chief Mat Honan, senior AI editor Will Douglas Heaven, and AI reporter…
Our Field Trip to Google I/O + A Sit-Down With Sundar Pichai + System Update
“This is the only recent gathering of a large number of people where mentions of A.I. did not produce a large chorus of boos.”
Meet Stable Audio 3.0, the Model Family Built for Artistic Experimentation with Open-Weight Models
Stable Audio 3.0 introduces open-weight generative audio models trained on licensed data, aimed at music and sound creation.
Isomorphic Dynamic Programs
arXiv:2605.22076v1 Announce Type: new Abstract: We study relationships between dynamic programs by applying conjugacy methods from dynamical systems theory. When two dynamic programs are connected by an order isomorphism, we show that optimality properties transmit from one formulation to the other. We apply these results to Epstein--Zin preferences with time preference shocks, obtaining a sharp characterization of when optimality holds. We also show that multiplicative Kreps--Porteus preferences and risk-sensitive preferences are isomorphic, so that well-known results for the latter carry over to the former. Finally, we demonstrate how isomorphic transformations can improve the numerical accuracy of value function approximations, with gains of two orders of magnitude in a multisector real business cycle model.
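The conjugacy idea behind the abstract can be illustrated with a toy scalar example in Python. This is only a sketch of conjugate maps on the real line, not the paper's dynamic-programming setting: the affine map `T`, the constants `A` and `B`, and the choice of the exponential as the order isomorphism `phi` are all assumptions made for illustration.

```python
import math

A, B = 1.0, 0.5  # hypothetical constants; B < 1 makes T a contraction

def T(v):
    """Original map on the real line, with fixed point A / (1 - B)."""
    return A + B * v

def phi(v):
    """Order isomorphism from the reals onto the positive reals."""
    return math.exp(v)

def phi_inv(w):
    """Inverse of phi."""
    return math.log(w)

def S(w):
    """Conjugate map S = phi . T . phi_inv, which equals e^A * w^B."""
    return phi(T(phi_inv(w)))

def iterate(f, x, n):
    """Apply f to x a total of n times."""
    for _ in range(n):
        x = f(x)
    return x

v_star = iterate(T, 0.0, 60)        # converges to the fixed point of T
w_star = iterate(S, phi(0.0), 60)   # converges to the fixed point of S
print(v_star, w_star, phi(v_star))
```

Since `S(phi(v)) = phi(T(v))` for every `v`, each iterate of `S` is the image under `phi` of the matching iterate of `T`, so the fixed point of one map is carried to the fixed point of the other.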
Google I/O showed how the path for AI-driven science is shifting
During Tuesday’s Google I/O keynote, Demis Hassabis, the CEO of Google DeepMind, proclaimed that we are currently “standing in the foothills of the singularity.” It was a striking statement—the singularity is the theoretical future moment when AI rapidly exceeds human intelligence and dramatically transforms the world. But what struck me as I listened in the…
Detecting Offensive Cyber Agents: A Detection-in-Depth Approach
arXiv:2605.21956v1 Announce Type: new Abstract: Artificial Intelligence (AI) agents can now orchestrate cyberattacks. This development is already increasing the speed and scale of cyberattacks, decreasing attack costs, and improving the operational autonomy of cyber capabilities. To defend against these emerging threats, actors must first develop the capability to detect them. This report frames the offensive cyber agent detection challenge by outlining the coming detection gap between offensive cyber agents and traditional cyber capabilities; introducing detection-in-depth, a strategic framework to guide policymakers and defenders responding to this detection gap; and presenting five actionable detection mechanisms to support policymakers, industry, and defenders when putting this strategic framework into practice. These include (1) Agent Identifiers for Critical Infrastructure; (2) Agent Honeypots; (3) AI-Automated Alert Analysis and Triage: systems that use AI to filter, prioritize, and interpret the growing volume of detection signals expected from autonomous cyber operations; (4) An Agentic Security Alert Standard: a reporting standard model that providers can use to communicate agentic threats, improving the speed, consistency, and actionability of reports; (5) An Agentic Cybersecurity Exchange (ACE): an institution modeled on the Global Signal Exchange that brings together model and cloud providers to detect offensive cyber agent threats at their origin point and coordinate ecosystem-wide agentic threat disruption.
GitHub confirms breach of 3,800 repos via malicious VSCode extension
GitHub has confirmed a security breach affecting 3,800 repositories, traced back to a malicious VSCode extension.
AI in Cybersecurity Market Growth Analysis, Trends, and Investment Outlook 2035
Future Outlook and Investment ... AI in cybersecurity market will be defined by autonomous defense ecosystems capable of self-learning, adaptive response, and predictive remediation. As cyber threats become increasingly sophisticated and machine-generated attacks proliferate, AI-driven security intelligence will become a foundational enterprise requirement ...
AI, Cybersecurity Education, and the Defense of America’s Digital Border | eSecurity Planet
Artificial intelligence (AI) is reshaping cybersecurity at a pace that is forcing educators, businesses, and governments to rethink workforce development and national defense strategies. During a recent discussion with cybersecurity entrepreneur and ConnectSecure Chairman, Arnie Bellini, key themes emerged around the evolution of cyber threats...
Adoption, Deployment & Impact
Declarative Data Services: Structured Agentic Discovery for Composing Data Systems
arXiv:2605.20690v1 Announce Type: new Abstract: Agentic discovery has shown that LLM-driven search can find novel algorithms, designs, and code under benchmark conditions. Translating the paradigm to multi-system data backends surfaces a harder problem: the search space is heterogeneous, the verifier is whether a deployed stack actually runs, and composition knowledge is unevenly captured in pretraining. Unbounded agentic discovery, a coding agent iterating on failure-log feedback, fails to converge consistently on a working stack even when iteration and explicit composition knowledge are added. We propose Declarative Data Services (DDS), an architecture for structured agentic discovery of data-system compositions from declarative user intent. The framework owns four typed contracts at successive layers (intent, operator DAG, per-system skills, runtime attribution) that decompose the global search into bounded sub-searches; sub-agents search each typed space, while the framework provides the channels by which knowledge flows forward as inline skill citations and errors route backward as typed signals. As a proof of life on a trading-backend workload, DDS converges where unbounded discovery does not; runtime failures become skill patches that the next deployment cites inline. We position this as an early prototype reporting lessons from real-world data-system composition.
From Licensing to Open Access: Designing a Sustainable Transition in Operational Weather Data
arXiv:2605.21673v1 Announce Type: cross Abstract: This translational article documents the European Centre for Medium-Range Weather Forecasts (ECMWF) transition from a restricted data licensing model to open access under CC BY 4.0, completed in October 2025. The policy context included EU open data requirements and alignment with international data exchange frameworks. The transition was implemented through a tiered service model that kept core forecast data open while offering operationally supported delivery as a cost-recovered service. Between 2020 and 2025, ECMWF executed an iterative planning cycle: setting an annual target for revenue reduction, specifying additions to the open tier under that target, provisioning infrastructure, and assessing outcomes to update assumptions. Drawing on internal administrative records (2014 - 2025), we describe design choices, operational constraints, and early outcomes. In the six months following the end of the transition, more than 93% of previously paying organisations retained a Service Agreement, while open endpoint download volumes increased substantially. We discuss trade-offs in defining the open tier (resolution, parameters, schedule), the reduction of compliance overheads formerly associated with redistribution restrictions, and the scalability implications of global distribution. We note an emerging sustainability question as AI-based forecast products become freely available. The early evidence is consistent with the view that a tiered service model can be designed to reconcile open-access obligations with operational sustainability, subject to monitoring over longer contract renewal cycles (typically annual).
Procurement leaders urge incremental AI adoption to avoid heavy spend | Supply Chain Dive
Managing the cost of using AI rests on starting low-risk pilots and scaling the technology slowly, executives said at the Institute for Supply Management World 2026 conference.
Singapore launches AI playbook to steer enterprise transformation
Singapore has launched an AI for Enterprise Impact Playbook to assist companies with AI adoption, workforce upskilling, and business transformation.
Viewpoint: Insurers Cautiously Navigate the Next Steps in AI Adoption
As more and more companies embed AI into select functions, only a portion indicate that they have used AI to change how an overall enterprise runs. It
The hidden flaw in insurance AI adoption for advisors and carriers - Insurance News | InsuranceNewsNet
Many insurers are still using AI for existing underwriting and claims rather than redesigning how those workflows operate.
AI, data, and regulatory risk: What's shaping surveillance in 2026
Discover findings from the 2026 Surveillance Benchmarking Survey: AI adoption, regulatory expectations, and data shaping compliance today.
Zoom Soars After Expansion Into New Products Begins to Pay Off
Zoom Communications Inc. shares surged as much as 18% after the company projected stronger-than-anticipated sales growth and said that customers are paying for its expanded suite of office products.
AI Cartoon ‘Critterz’ Looks for Tech Partner Beyond OpenAI
Critterz, a feature-length cartoon intended to showcase how OpenAI’s video-generation capabilities could revolutionize filmmaking, has missed a planned Cannes Film Festival debut after the artificial intelligence company shut down its Sora tool, forcing its creators to look for a new AI partner.
Spotify targets high-spending superfans with AI-generated music
Streamer and Universal Music Group strike licensing deal for a paid add-on tool within Spotify’s app
UK AI startup Scope raises €17.3 million funding led by Index Ventures to speed up industrial inspection workflows
Scope, a London-based AI workflow platform transforming inspections for the TIC (testing, inspection, certification) industry, has raised €17.2 million ($20 million) in funding to grow its London-based team and accelerate adoption among leading inspection companies globally. The round was led by Index Ventures with participation from Susa Ventures, Entrepreneurs First and Syndicate 1. Notable angels […]
Indonesia targets corruption, efficiency with AI push across government
Indonesia plans to expand the use of AI across government administration, welfare distribution, and procurement to improve efficiency and reduce corruption, according to a senior official.
VBFDD-Agent for Electric Vehicle Battery Fault Detection and Diagnosis: Descriptive Text Modeling of Battery Digital Signals
arXiv:2605.20742v1 Announce Type: new Abstract: With the rapid proliferation of electric vehicles, the safety and reliability of lithium-ion batteries have become critical concerns. Effective anomaly detection is essential for ensuring safe battery operation. However, as battery systems and operating scenarios become increasingly complex, battery fault diagnosis and maintenance require stronger cross-domain adaptability and human-AI collaboration. Traditional fault detection and diagnosis methods are usually designed for specific scenarios and predefined workflows, making them less effective in complex real-world applications. To address the scarcity of open-source battery fault report corpora and the lack of unified maintenance knowledge representation, this study proposes a descriptive text modeling approach for battery signal reports. Monitoring signals, statistical features, anomaly records, and state assessment results are transformed into structured and readable natural language descriptions, forming a language corpus for battery health diagnosis and maintenance. Based on this corpus, we propose VBFDD-Agent, a vehicle battery fault detection and diagnosis agent for automotive-grade battery systems. VBFDD-Agent integrates descriptive battery-state texts, historical case retrieval, local maintenance manuals, and large language model reasoning to generate structured diagnostic results and maintenance recommendations. Experiments show that the proposed framework can accurately perform anomaly monitoring based on descriptive textual representations and provide flexible, efficient, and actionable maintenance suggestions. Expert evaluation further confirms the practical value of the generated recommendations. Overall, VBFDD-Agent extends traditional battery diagnosis from label prediction to interpretable and maintenance-oriented decision support.
60% of Healthcare Firms Use AI for Chatbots | PYMNTS.com
Healthcare’s AI adoption is narrower than other sectors, but the industry is using it where operational strain is most immediate.
AI-Enabled Serious Games: Integrating Intelligence and Adaptivity in Training Systems
arXiv:2605.21962v1 Announce Type: cross Abstract: Serious games are widely used for learning and training across domains such as healthcare, defense, and education. Persistent challenges remain, however, including static scenario design, authoring bottlenecks, limited learner modeling, and difficulty implementing meaningful real-time instructional adaptation. Recent advances in artificial intelligence (AI) introduce novel capabilities such as dynamic scenario variation, contextual feedback, adaptive pacing, and learner-state modeling that may help address some of these limitations. At the same time, integrating AI into serious games raises important questions related to validity, transparency, system control, and learner trust. This chapter examines how contemporary AI approaches may support real-time instructional adaptation in serious games. It distinguishes between instructional intelligence, defined as a system's capacity to infer learner knowledge and reason about pedagogically appropriate responses, and adaptivity, defined as the ability to modify instructional actions during interaction. A historical synthesis of adaptive learning systems is presented, tracing developments from early computer-assisted instruction through intelligent tutoring systems (ITS), dynamic difficulty adjustment (DDA), authoring platforms, learning analytics, and recent AI-enabled architectures. Building on this perspective, the chapter discusses how large language models (LLMs), reinforcement learning (RL), and agent-based architectures may contribute to more integrated forms of intelligence and adaptivity in serious games. It also highlights practical and research challenges associated with AI-enabled systems, including explainability, validation, computational cost, and the limited empirical evidence regarding long-term learning outcomes in AI-enabled serious games.
India eyes AI shield against manipulation in government tenders
India must deploy AI and advanced data analytics to detect bid rigging in government procurement while strengthening coordination between auditors and the competition watchdog.
Open-World Evaluations for Measuring Frontier AI Capabilities
arXiv:2605.20520v1 Announce Type: new Abstract: Benchmark-based evaluation remains important for tracking frontier AI progress. But it can both overstate and understate deployed capability because it privileges tasks that can be precisely specified, automatically graded, easy to optimize for, and run with low budgets and short time horizons. We advocate for a complementary class of evaluations, which we term open-world evaluations: long-horizon, messy, real-world tasks assessed through small-sample qualitative analysis rather than benchmark-scale automation. In this paper we survey recent open-world evaluations, identify their strengths and limitations, and introduce CRUX (Collaborative Research for Updating AI eXpectations), a project for conducting such evaluations regularly. As a first instance, we task an AI agent with developing and publishing a simple iOS application to the Apple App Store. The agent completed the task with only a single avoidable manual intervention, suggesting that open-world evaluations can provide early warning of capabilities that may soon become widespread. We conclude with recommendations for designing and reporting open-world evals.
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
arXiv:2605.20530v1 Announce Type: new Abstract: Large language model agents now act on codebases, browsers, operating systems, calendars, files, and tool ecosystems, but the benchmarks used to evaluate them are fragmented: each emphasizes a different unit of measurement (final task success, tool-call validity, repeated-pass consistency, trajectory safety, or attack robustness). A line of 2024-2025 work has converged on the diagnosis that a single accuracy column is no longer the right unit of comparison for deployable agents. AgentAtlas extends this line of work with four components: (i) a six-state control-decision taxonomy (Act / Ask / Refuse / Stop / Confirm / Recover); (ii) a nine-category trajectory-failure taxonomy with two orthogonal hierarchical labels (primary_error_source, impact); (iii) a taxonomy-aware vs. taxonomy-blind methodology that measures how much of a model's apparent capability comes from the supervision in the prompt; and (iv) a benchmark-coverage audit mapping fifteen agent benchmarks against six behavioral axes. To demonstrate the methodology we run a small fixed eight-model set (1,342 generated items, four frontier closed and four open-weight) under both prompt modes. Removing the explicit label menu drops every model's trajectory accuracy by 14-40 pp to a tight 0.54-0.62 floor regardless of family, and no single model wins on all three of control accuracy, trajectory diagnosis, and tool-context utility retention. We treat the synthetic run as a measurement-protocol demonstration, not a benchmark release.
The efficiency-gain illusion: People underestimate the rate of AI use and overestimate its benefits on simple tasks
arXiv:2605.22687v1 Announce Type: new Abstract: People are increasingly turning to AI assistance for simple tasks, e.g., arithmetic, spell-check, and answering simple questions. But does AI assistance actually save users time and effort? We investigate people's propensity to use AI for cognitively simple tasks and assess whether their reliance is well-calibrated. Across three pre-registered user studies (N = 2691), we find that people frequently choose to use AI even when doing so is inefficient (i.e. provides no meaningful time or effort savings). We identify systematic miscalibration at two levels: (1) a self-estimate miscalibration where people on average believe that they are using AI less than they actually are, and (2) efficiency-gain illusions where people overestimate how much time and effort savings AI use affords. We also identify a session-level carryover effect where a participant's prior AI use leads to further AI adoption and entrenches their miscalibration about time savings. Our results shed light on the mechanisms and biases underlying people's choice of whether to use AI as well as the risk of an overreliance feedback loop.
THE DAILY SCRAPE - by Brent Orrell - Help Desk - Substack
Matthew Prince writes in the WSJ that Cloudflare laid off over 20% of its workforce while growing revenue more than 30%, targeting what Peter Drucker called "measurers" — middle managers, operations, internal audit, finance, compliance, marketing — rather than builders or sellers. Prince argues AI now measures organizations more continuously and precisely than humans can, and predicts the growth-with-layoffs pattern will become standard across the next year.
Cisco used AI to write security incident reports, with mixed results
You’ll need a lot of detailed prompts to get solid output, and even then it may have errors and typos.
Presien Reduces Critical Safety Events on Construction Sites by 70%+ with Claude
Presien utilizes Claude for continuous, AI-driven risk detection on construction sites, marking a shift from manual review to automated safety monitoring.
Agencies look to AI, automation amid growth in digital records | Federal News Network
Federal leaders see automation and AI as crucial to wrangling an ever increasing tide of digital records that's leading to backlogs in areas like FOIA.
Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines
arXiv:2605.20630v1 Announce Type: new Abstract: Industrial asset operations workflows are latency-sensitive because a single user query may require coordination over sensor data, work orders, failure modes, forecasting tools, and domain-specific agents. We evaluate this problem on AssetOpsBench (AOB), an industrial agent benchmark whose plan-execute pipeline exposes repeated overhead from tool discovery, LLM planning, MCP tool execution, and final summarization. Existing LLM caching techniques such as KV-cache reuse and embedding-based semantic caching were designed for chatbot serving and break down when output validity depends on time, asset, or sensor parameters. We propose two complementary optimization layers for AOB plan-execute pipelines: a temporal semantic cache and a set of MCP workflow optimizations combining disk-backed tool-discovery caching and dependency-aware parallel step execution. MCP workflow optimizations corresponded to a 1.67x speedup and reduced median end-to-end latency by about 40.0% while the temporal-cache benchmark achieved a median of 30.6x speedup on cache hits. Beyond the speedup, our results expose a concrete failure mode of pure semantic caching for parameter-rich industrial queries, providing a critical analysis of how caching choices interact with evaluation correctness in MCP-backed agent benchmarks.
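As a rough illustration of why parameter-rich queries trip up pure semantic caching, the sketch below keys a cache on the query's asset, sensor, and time window as well as its text, and expires entries after a time-to-live. It is a minimal sketch under stated assumptions, not the paper's implementation: the class `TemporalCache`, its method names, and the helper `latency_reduction` are hypothetical, and exact matching on normalized query text stands in for the embedding similarity a real semantic cache would use. The helper also checks the abstract's arithmetic: if both figures describe the same measurement, a 1.67x speedup corresponds to a latency reduction of 1 - 1/1.67, or about 40%.

```python
import time


class TemporalCache:
    """Hypothetical cache whose keys include the parameters that decide validity."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    @staticmethod
    def _key(query, asset_id, sensor, window):
        # Assumption: normalized text stands in for an embedding lookup.
        # Asset, sensor, and time window are part of the key, so two
        # similar-sounding queries about different assets never collide.
        return (query.strip().lower(), asset_id, sensor, window)

    def put(self, query, asset_id, sensor, window, value, now=None):
        now = time.time() if now is None else now
        self._store[self._key(query, asset_id, sensor, window)] = (value, now)

    def get(self, query, asset_id, sensor, window, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(self._key(query, asset_id, sensor, window))
        if entry is None:
            return None
        value, stored_at = entry
        if now - stored_at > self.ttl_seconds:
            return None  # entry is too old to trust
        return value


def latency_reduction(speedup):
    """Fraction of latency removed by a given speedup factor."""
    return 1.0 - 1.0 / speedup


cache = TemporalCache(ttl_seconds=60)
cache.put("Average temperature", "pump-1", "temp", "2026-05-01", "42 C", now=0)
print(cache.get("average temperature", "pump-1", "temp", "2026-05-01", now=30))
print(cache.get("average temperature", "pump-2", "temp", "2026-05-01", now=30))
print(round(latency_reduction(1.67), 3))
```

A cache keyed on text similarity alone would treat the `pump-1` and `pump-2` queries as the same request and could return a stale or wrong answer, which is the failure mode the abstract describes.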
Addressing the Synergy Gap: The Six Elements of the Design Space
arXiv:2605.21635v1 Announce Type: cross Abstract: AI is now embedded in healthcare, finance, policy, and many other domains, yet genuine human-AI synergy - combined performance that exceeds what either party achieves alone - is uncommon. Meta-analyses show that AI assistance tends to improve human performance compared to working alone, but studies finding true synergy are scarce. We call this persistent shortfall the synergy gap. Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design. These matter, but they cover only part of what determines whether combination works. Closing the synergy gap, we argue, requires explicit engagement with a wider design space. We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation. For each element, we describe what it covers, how it shapes the others in practice, and what it implies for design. The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.
Zara Owner Inditex’s CEO Bets on Diversification, AI for Growth
Zara owner Inditex SA is banking on diversification across brands and countries as well as artificial intelligence to spur growth at the world’s largest listed retailer, Chief Executive Officer Óscar García Maceiras said.
Council Post: How AI Is Changing The Economics Of Integration
Integration is no longer just a painful phase to complete and move past.
Geopolitics, Policy & Governance
Taiwan Launches Major Crackdown on NVIDIA AI Chip Smuggling Network
The raids also underline the broader geopolitical importance of AI accelerators. High-end GPUs are increasingly viewed as strategically important technologies tied to national security, industrial competitiveness, and advanced research capabilities. Export controls surrounding AI infrastructure ...
Xi-Trump summit shows a rivalry being managed, not resolved
Chinese President Xi Jinping's question to Donald Trump about whether the US and China can escape the “Thucydides Trap” framed a summit that showed the two powers are learning to manage competition rather than end it.
Nvidia excludes China data center revenue from outlook amid H200 delay
Nvidia is not assuming data center compute revenue from China for its Q2 fiscal 2027 outlook, as Beijing has yet to approve imports of the H200 chip.
China, Russia pledge closer AI, cybersecurity ties during Putin Visit
Following a visit by President Vladimir Putin, China and Russia have pledged to deepen cooperation in artificial intelligence technologies and strengthen efforts to combat cybercrime.
AI & Tech Brief: White House AI order now postponed - The Washington Post
President Donald Trump cites overregulation concerns
Position: The Pre/Post-Training Boundary Should Govern IP in Industry-Academia ML Collaborations
arXiv:2605.22632v1 Announce Type: new Abstract: Industry-academia ML collaborations routinely fail to launch -- not for scientific reasons, but because academics must publish while companies must protect models trained on proprietary data, and no standard contract framework resolves this tension. Because contracts are negotiated by legal departments alone, many apparent legal disputes are incentive misalignment problems that only scientists at the table can correctly diagnose. We propose PBOS (Protect-the-Business / Open-Source-the-Science), a community-adoptable contract template anchored to a single technically-grounded boundary: pre-training artifacts (architectures, training code, benchmarks, untrained weights) are open science; post-training artifacts (weights trained on proprietary data) are business IP. This boundary is technically meaningful, legally clean, and auditable -- and could not have been drawn correctly without scientists at the negotiating table. We argue the ML community should adopt PBOS as its default contract for such collaborations.
Barriers to Evidence in AI-Related Cases and the Privatization of Proof
arXiv:2605.21816v1 Announce Type: new Abstract: Evidence lies at the core of litigation, but it is increasingly difficult to obtain in AI-related disputes. Even when a claimant's position has merit, cases are often settled or dismissed because decisive facts are hidden inside proprietary models, platform logs, and protected databases. Grounding our discussion in past and ongoing cases, we investigate how asymmetries in access, resources, and expertise can create significant barriers to evidence in AI-related cases. We show how developers and deployers resist disclosure through various strategies challenging the value of the evidence to the requesting party and the cost of evidence production. From these patterns we identify seven recurring sources of asymmetry -- access to models, data, documentation, logs, expertise, compute, and infrastructure -- that reflect a broader pattern that we call the privatization of proof: when control over proof falls in the hands of private actors that can demand justification for access while ensuring that justification remains out of reach. We further argue that different types of access can be fungible: in the absence of a certain type of access (e.g., to model internals), one may be able to use alternative forms of access (e.g., sufficient compute, query access, and access to user logs) and to obtain a functionally equivalent amount of information. We propose a three-part test that can help resolve AI access disputes in litigation, drawing on concepts such as proportionality and reasonable alternatives. Our test relies on a few observations, including that the cause of action can provide a baseline for access.
Last-minute lobbying by tech industry officials led Trump to cancel AI order - The Washington Post
Eleventh-hour phone calls with industry leaders and former AI and crypto czar David Sacks helped persuade President Donald Trump not to sign a highly anticipated executive order on artificial intelligence on Thursday.
Sadiq Khan sparks row with Met after blocking £50m AI deal with Palantir
Exclusive: Scotland Yard criticises London mayor’s decision as disappointing and warns it could hit policing. Sadiq Khan has blocked a £50m Metropolitan police deal with the controversial US tech company Palantir, sparking a bitter row between the London mayor and Scotland Yard. After the UK’s largest police force had agreed to use Palantir’s AI technology to automate intelligence analysis in criminal investigations, Khan intervened, citing “serious concerns” about how the deal had been struck.
Opinion | Illinois is less suited to regulate AI than Congress - The Washington Post
The Illinois state Senate has fast tracked eight bills to regulate AI and aims to pass them before its session wraps on May 31.
EU digital sovereignty rules may raise costs, worsen services, tech lobby warns
The EU's push for digital sovereignty could lead to higher costs and inferior cloud services, according to a leading tech association representing companies like Amazon, IBM, and Microsoft.
US deepfake legislation would expand safe harbor, takedown system
A revised version of the bipartisan NO FAKES Act aims to establish property rights for digital likenesses while expanding safe harbor protections and notice-and-takedown systems for internet platforms.
The Growth Of Dual-Use By Design Research In Europe: Export Control Risks And Challenges – Analysis
By Lauriane Héau Both European states and the European Union (EU) are trying to accelerate and support national rearmament and