AI Intelligence Brief

Mon 29 June 2026

Daily Brief — Curated and contextualised by Best Practice AI

136Articles
Editor's pickSummary

The BIS Warns of Recession, Governments Vet Model Access, and Investors Scramble for Power

TL;DRThe Bank for International Settlements warned that excessive AI investment spending could trigger a global recession. The US government has mandated federal vetting for all new customers of frontier AI models from OpenAI and Anthropic. Meanwhile, the US power sector saw a record $200 billion in M&A activity driven by data center energy demands. Google has begun capping Gemini usage due to severe compute shortages.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

11 of 136 articles
Lead story
Editor's pickPAYWALL
WSJ· Yesterday

BIS Sees Peril for Economy, Financial System in AI Investment Boom

Fierce competition will dominate artificial intelligence risks driving investment spending to excessive levels that could tip some economies into recession, the Bank for International Settlements said.

Editor's pickProfessional Services
Arxiv· Today

"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

arXiv:2308.05201v4 Announce Type: replace-cross Abstract: Large Language Model (LLM)-based generative AI systems are general-purpose tools capable of augmenting or even automating a wide range of job functions, positioning them to reshape labor market dynamics. However, predicting their precise impact a priori is challenging, given AI's simultaneous effects on both demand and supply, as well as the strategic responses of market participants. Leveraging an extensive dataset from a leading online labor platform, we document a pronounced displacement effect and an overall contraction in submarkets where required skills closely align with core LLM functionalities. Although demand and supply both decline, the reduction in supply is comparatively smaller, thereby intensifying competition among freelancers. Notably, further analysis shows that this heightened competition is especially pronounced in programming-intensive submarkets. This pattern is attributed to skill-transition effects: by lowering the human-capital barrier to programming, ChatGPT enables incumbent freelancers to enter programming tasks. Moreover, these transitions are not homogeneous, with high-skilled freelancers contributing disproportionately to the shift. Our findings illuminate the multifaceted impacts of general-purpose AI on labor markets, highlighting not only the displacement of certain occupations but also the inducement of skill transitions within the labor supply. These insights offer practical implications for policymakers, platform operators, and workers.

Editor's pickPAYWALLTechnology
Bloomberg· Today

South Korea Unveils Plan to Sustain Lead in AI

South Korea unveiled an ambitious plan aimed at cementing its status as a technological powerhouse, with companies led by Samsung Electronics Co. and SK Hynix Inc. initiating large-scale investments in memory chips, data centers and robotics. Samsung and SK Hynix leaders sat next to President Lee Jae Myung at a briefing, and are expected to give more details of their future investment plans. Bloomberg's Cat Barton reports from Seoul. (Source: Bloomberg)

Editor's pickPAYWALLEnergy & Utilities
FT· Today

AI fuels record $200bn M&A boom in US power sector

Companies in dealmaking blitz as they seek to build the energy infrastructure for data centres

Editor's pickProfessional Services
Arxiv· Today

Measuring Racial Disparities in Rent Growth Under Algorithmic Landlord Concentration in U.S. Metros

arXiv:2606.27525v1 Announce Type: new Abstract: The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts, the first such application in the literature, is associated with rent growth 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a fully novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken all together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.

Editor's pickPAYWALLManufacturing & Industrials
Bloomberg· Today

German AI Rollout Offers €300 Billion Fix for Worker Shortage

At a made-to-order homebuilder in northwest Germany, processing more than 250 invoices a week used to swallow the equivalent of four working days. After introducing artificial intelligence last year, the task takes half as long.

Editor's pickPAYWALLTechnology
FT· Yesterday

Google caps Meta’s Gemini use as AI demand strains capacity

Surging appetite for advanced models is turning computing power into the tech industry’s scarcest commodity

Editor's pickTechnology
Daily Brew· Yesterday

Claude Code turned every engineer into three. Now companies need more product thinkers

Claude Code has amplified engineering productivity, but companies now require more product thinkers to leverage the increased output effectively.

Editor's pickPAYWALL
FT· Today

Artificial intelligence and Engels’ Pause

Politics could be a bigger constraint than power and compute

Editor's pickTechnology
Times of India· Yesterday

AI enters cost-conscious era as enterprises chase returns - The Times of India

NEW DELHI: The enterprise AI gold rush is giving way to a more disciplined phase. After two years of racing to deploy the biggest models and consume more compute, companies are asking a simpler question: is every AI rupee generating measurable business returns?

Editor's pick
Artificial Intelligence Newsletter | June 29, 2026· Today

GDPR — not AI Act — delayed release of frontier AI in Europe, research shows

Research from the Centre for the Governance of AI indicates that GDPR, rather than the EU AI Act, has been the primary regulatory factor delaying the release of frontier AI models in Europe.

Economics & Markets

44 articles
AI Investment & Valuations19 articles
Editor's pickPAYWALLTechnology
Bloomberg· Yesterday

Samsung, SK Prep Record Spending to Sustain Korea’s AI Lead

South Korea unveiled an ambitious plan aimed at cementing its status as a technological powerhouse, with companies led by Samsung Electronics Co. and SK Hynix Inc. initiating large-scale investments in memory chips, data centers and robotics.

Editor's pickPAYWALL
FT· Yesterday

AI ‘exuberance’ risks ending in lengthy investment bust, BIS warns

Weak returns could trigger a sharp pullback in funding for tech companies that threatens the global economy

Editor's pickFinancial Services
Guardian· Yesterday

Australian with retirement savings? You probably own SpaceX

Tech and AI stocks now make up as much as 12% of most balanced superannuation funds, experts say Follow our Australia news live blog for latest updates Get our breaking news email, free app or daily news podcast Artificial intelligence and technology stocks have become a driving force on Wall Street and, unbeknownst to most Australians, a growing part of their retirement savings. The so-called “magnificent seven” – chip maker Nvidia, Google owner Alphabet, Apple, Microsoft, Amazon, Facebook owner Meta and Tesla – are, for better or worse, increasingly part of the portfolios offered by superannuation funds. Continue reading...

Editor's pickPAYWALLFinancial Services
Bloomberg· Yesterday

BlueBay Sees Near-Term Risk in Japan AI Stocks Followed by Rally

RBC BlueBay Asset Management remains bullish on Japanese AI-related stocks, expecting the rally to extend into 2027, while trimming near-term risk ahead of a potential slowdown in July and August.

Editor's pickPAYWALL
FT· Yesterday

Are AI stocks headed for further turbulence?

Market Questions is the FT’s guide to the week ahead

Editor's pickEnergy & Utilities
Top Daily Headlines: Security boss thought MFA would be too much security· Today

Engineer accused of insider trading tied to Microsoft's reboot of Three Mile Island nuclear plant

The SEC alleges a former Constellation employee made $1.4 million trading options before the announcement of the nuclear plant restart deal.

Editor's pickTechnology
The Guardian· Today

Shares in chipmakers underpinning AI boom rocket in first half of 2026 | Technology sector | The Guardian

Value of some chip manufacturers have tripled, or more, driving Asia Pacific stock markets sharply higher

Editor's pickTechnology
Intellectia.AI· Yesterday

AI Infrastructure Investment 2026: $600B Hyperscaler Boom

If the technology fails to achieve ... productivity gains, the current spending will represent one of the largest capital misallocations in corporate history. The outcome will likely determine which companies emerge as the dominant technology platforms of the next decade. ... Amazon Web Services (AWS) has emerged as the most aggressive investor in AI infrastructure, with planned 2026 capital ...

Editor's pickTechnology
Memeburn· Yesterday

The Companies Winning AI Are Starting to Build Their Own Chips - Memeburn

Qualcomm's reported deal with ByteDance reveals a broader AI hardware shift: companies are moving from buying custom silicon around their own workloads. Here's why that matters.

Editor's pickFinancial Services
The Telegraph· Yesterday

AI boom risks global financial crash, warn central bankers

Reversal of ‘excessive’ tech investments could have serious economic consequences, report finds

Editor's pickTechnology
Daily Brew· Yesterday

Anthropic Eyes Near-$1 Trillion IPO Amid Revenue Growth, Strategic Partnerships, and AI Safety Focus

Anthropic is gearing up for a potential IPO, having filed a confidential Form S-1 with the SEC, highlighting its strong revenue growth and strategic partnerships with Amazon and Google.

Editor's pickDefense & National Security
Daily Brew· Today

Palantir Joins Forces with Army to Enhance AI Tools, Shares Surge 5% on Defense Modernization Project

Palantir is collaborating with the U.S. Army on AI-enhanced tools under the NGC2 program, focusing on rapid prototyping and improved decision-making for soldiers.

Editor's pickTechnology
Yahoo! Finance· Yesterday

Emerging markets are now an AI chip trade: Chart of the Day

It's getting harder to diversify out of the global AI chip trade.

Editor's pickTechnology
Yahoo! Finance· Yesterday

'We don't view this as a bubble' that will pop soon: Wall Street weighs surging AI costs on stock market rally

Wall Street analysts told Yahoo Finance that Micron's earnings proved AI demand remains strong.

Editor's pickTechnology
Yahoo! Finance· Yesterday

5 big analyst AI moves: Micron price targets hiked, cautious on SpaceX valuation

Investing.com -- Here are the biggest analyst moves in the area of artificial intelligence (AI) for this week.

Editor's pickFinancial Services
Seoul Economic Daily· Yesterday

Billionaire Who Foresaw Dot-Com Bubble Warns AI Bubble Will Burst Within Years - Seoul Economic Daily

Published 2026.06.28. 15:28:05| ... Jeremy Grantham, co-founder of global investment firm GMO and one of Wall Street's most prominent pessimists, has called the current artificial intelligence (AI) investment frenzy "the biggest bubble in history" and urged investors to respond cautiously.

Editor's pickFinancial Services
TechStory· Yesterday

AI Bubble About to Burst? Chinese Hedge Funds Sound Fresh Warning to Investors - TechStory

The race to dominate artificial intelligence has produced some of the biggest winners in financial markets over the past two years. Chipmakers have seen their share prices soar. Software firms have rushed to add AI products. Private companies have secured funding at valuations that would have ...

Editor's pickTechnology
GuruFocus· Yesterday

Retail Investors Eye Tech Stocks Amid Valuation Concerns: MSFT, GOOGL, META

On June 28, 2026, retail investors are navigating a complex landscape in technology stocks, particularly with major players like Microsoft Corp (MSFT). A recent

Editor's pick
Finbold· Yesterday

It’s ‘game over’ on AI bubble, warns senior market analyst

A financial analyst has warned that it could be "game over" if the AI bubble bursts and investment in the sector dries up.

AI Macroeconomics6 articles
Editor's pickPAYWALL
WSJ· Yesterday

BIS Sees Peril for Economy, Financial System in AI Investment Boom

Fierce competition will dominate artificial intelligence risks driving investment spending to excessive levels that could tip some economies into recession, the Bank for International Settlements said.

Editor's pickManufacturing & Industrials
Arxiv· Today

Heterogeneous Diffusion of Electric Vehicles in China: Demand, Learning, Product Entry, and the Incidence of Industrial Policy

arXiv:2606.27924v1 Announce Type: new Abstract: China's electric-vehicle (EV) sales share rose from about 1% in 2015 to roughly 45% in 2024. We evaluate this technology transition with an equilibrium differentiated-products model of the Chinese auto market, and quantify both its attribution and its welfare and reallocation consequences. Every yuan of 2024 EV subsidy delivered about 3.38 yuan of private surplus, but this surplus accrued asymmetrically. Per-capita consumer-surplus loss from subsidy removal is about five times larger in Tier 1 than in the Rest tier; about half of the aggregate welfare loss operates through indirect Wright's-law learning rather than the direct cash transfer; and EV-native firms (BYD, Tesla, New Forces) retain 16-27% of their 2024 EV business under subsidy removal while traditional state-owned manufacturers retain only 11%. A Shapley decomposition into six channels -- Quality, Variety, Battery, Subsidy, Residual, and Market -- attributes the historical 2015-2024 rise primarily to product-quality gains (+45.49%), choice-set expansion (+14.81%), and battery-cost decline (+8.20%). The Subsidy block is negative (-13.63%) because direct purchase subsidies were phased down, not because subsidies reduce demand: a separate counterfactual that removes the 2024 subsidy entirely lowers EV share by 23-33%.

Editor's pickEnergy & Utilities
Arxiv· Today

Major Space Weather Risks Identified via Coupled Physics-Engineering-Economic Modeling

arXiv:2412.18032v3 Announce Type: replace-cross Abstract: Space weather poses an important but under-quantified threat to society. While severe geomagnetic storms are recognized as potential global catastrophes, their socio-economic impacts remain poorly quantified. We present a novel physics-engineering-economic framework that links geophysical drivers to power grid geoelectric fields, transformer vulnerability, and macroeconomic consequences. Using the United States as an example, we estimate daily U.S. economic losses for a 250-year geomagnetic storm from transformer thermal heating of 2.04 billion USD (95 percent confidence interval: 1.86 to 2.22 billion USD), disrupting power for approximately 5.7 million people and 150,000 businesses. These estimates are conservative lower bounds, reflecting only transformer thermal heating effects and excluding voltage collapse, cascading failures, and restoration costs. The true societal risk is likely substantially higher. Nonetheless, the contribution is in providing the first nationwide end-to-end coupling from space physics to potential macroeconomic loss, with quantified uncertainties. Our results demonstrate that coupled socio-economic modeling of space weather is both feasible and essential, and the framework is scalable and transferable, offering a template for assessing space weather risk to critical infrastructure in other countries.

AI Market Competition6 articles
AI Productivity6 articles
Editor's pickProfessional Services
Arxiv· Today

Towards Automating Scientific Review with Google's Paper Assistant Tool

arXiv:2606.28277v1 Announce Type: cross Abstract: Artificial intelligence is driving a revolution in scientific discovery, accelerating everything from hypothesis generation to mathematical theorem proving. However, this rapid acceleration is creating a systemic challenge: traditional human peer review cannot scale to match the influx of AI-assisted science. Ultimately, to resolve this tension, we must also deploy AI to accelerate the verification and review process itself. To frame the discussion around this transition, we propose a taxonomy consisting of four progressive levels of AI-human collaboration in scientific evaluation, and discuss various trade-offs involved with each. As a step toward this future, we introduce the Paper Assistant Tool (PAT), an agentic AI framework built for deep scientific review and verification. PAT ingests full scientific manuscripts and produces a comprehensive evaluation, checking theoretical results, validating experiments, suggesting improvements, and identifying potential flaws. By utilizing inference scaling techniques, PAT is able to identify deeper issues than a single model call alone, achieving a 34% improvement over zero-shot recall on mathematical errors in the SPOT benchmark. Pilot deployments of PAT as a pre-submission tool for authors at two major Computer Science conferences -- STOC and ICML -- demonstrate its ability to identify critical errors and suggest substantive improvements to research papers. By catching errors early, PAT eases the cognitive burden placed on referees, while preserving their control over the outcomes of the review process.

Editor's pickProfessional Services
Daily Brew· Yesterday

AI Revolutionizes Law Practice: Scale Law Firm AI Offers Tailored Workshops and Workflow Automation

Shift Into AI's Scale Law Firm AI has appointed Tima Mousavi to lead AI training, enhancing efficiency for U.S. and Canadian law firms through automated workflows.

Editor's pick
Arxiv· Today

How to deal with machine learning bias in economic history

arXiv:2606.28063v1 Announce Type: new Abstract: Machine learning (ML) has rapidly transformed economic history, lowering costs of digitization, data linkage, and imputation, and making information in historical text usable at scale. This paper offers a practical guide to using these tools well. However, ML tools have also created new problems. Prediction errors are often systematically correlated with covariates of interest, so even highly accurate models can distort and sometimes reverse coefficients, and standard validation cannot detect this. Given that ML tools often perform worse for historical data, this problem is especially severe for the field of economic history. We also identify a solution to this problem. We show that recent debiasing methods can correct such bias for a wide class of applications, using a small, randomly sampled set of expert-coded labels while retaining the efficiency of large-scale prediction. We organize the field with a taxonomy of three ML tasks, survey the literature along it, and indicate where debiasing applies and where validation against proxies remains the only recourse. We close with best-practice guidance on digitization, model choice, and reproducibility.

Labor, Society & Culture

21 articles
AI & Employment7 articles
Editor's pickPAYWALL
WSJ· Yesterday

Why do predictions about the impact of artificial intelligence on jobs vary so widely? Three economists defend their positions

The predictions are all over the place. Why do the optimists and pessimists see the same data—and come to such widely different conclusions?

Editor's pickProfessional Services
Arxiv· Today

"Generate" the Future of Work through AI: Empirical Evidence from Online Labor Markets

arXiv:2308.05201v4 Announce Type: replace-cross Abstract: Large Language Model (LLM)-based generative AI systems are general-purpose tools capable of augmenting or even automating a wide range of job functions, positioning them to reshape labor market dynamics. However, predicting their precise impact a priori is challenging, given AI's simultaneous effects on both demand and supply, as well as the strategic responses of market participants. Leveraging an extensive dataset from a leading online labor platform, we document a pronounced displacement effect and an overall contraction in submarkets where required skills closely align with core LLM functionalities. Although demand and supply both decline, the reduction in supply is comparatively smaller, thereby intensifying competition among freelancers. Notably, further analysis shows that this heightened competition is especially pronounced in programming-intensive submarkets. This pattern is attributed to skill-transition effects: by lowering the human-capital barrier to programming, ChatGPT enables incumbent freelancers to enter programming tasks. Moreover, these transitions are not homogeneous, with high-skilled freelancers contributing disproportionately to the shift. Our findings illuminate the multifaceted impacts of general-purpose AI on labor markets, highlighting not only the displacement of certain occupations but also the inducement of skill transitions within the labor supply. These insights offer practical implications for policymakers, platform operators, and workers.

Editor's pickPAYWALL
FT· Today

Artificial intelligence and Engels’ Pause

Politics could be a bigger constraint than power and compute

Editor's pickPAYWALL
Theatlantic· Yesterday

The People Who Will Thrive in the AI Age

What will differentiate people is not how smart they are but their relationship to mental effort.

AI & Inequality2 articles
Editor's pickProfessional Services
Arxiv· Today

Measuring Racial Disparities in Rent Growth Under Algorithmic Landlord Concentration in U.S. Metros

arXiv:2606.27525v1 Announce Type: new Abstract: The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts, the first such application in the literature, is associated with rent growth 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a fully novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken all together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.

Editor's pickEducation
Arxiv· Today

DysLexLens: A Low-Resource LLM Framework for Analysing Dyslexic Learners Insights from Online Forums

arXiv:2606.27619v1 Announce Type: new Abstract: Dyslexic learners increasingly use artificial intelligence (AI) tools to support reading, writing, organisation, and study-related tasks. However, their lived experiences with these tools remain largely underexamined. This paper proposes DysLexLens, a low-resource LLM framework, designed to analyse dyslexic learners experience with AI through online forum discussions. DysLexLens is designed as an end-to-end, evidence-traceable architecture which transforms noisy social media posts into a dictionary-driven corpora, provides knowledge-graph (KG)-based question reasoning, generates verifiable query responses, and enables response evaluation through quantitative and human-grounded assessment. DysLexLens has four key features. First, it employs a dictionary-driven filtering method to construct a more focused Reddit corpus on dyslexia and AI, filtering out noisy and weakly related posts to improve the relevance of data collected from low-resource forum contexts. Second, it integrates LLM-assisted semantic analysis with KG-based query reasoning to uncover meaningful patterns. Third, it has quantitative evaluation metrics (RAGAS and Query Robustness) to measure LLM-generated response performance. Fourth, it provides structured qualitative validation guidelines for assessing response quality, with a specific focus on hallucination and evidence alignment. We demonstrate the effectiveness of DysLexLens using dyslexia-related Reddit forum data and 30 questions. The results show its potential generalisability to other low-resource forum data contexts. DysLexLens, sample data, questions and evaluation results are available at Github to support reproducibility.

AI & Misinformation2 articles
AI Ethics & Safety7 articles
Editor's pick
Arxiv· Today

AI Persuasive Framing in Collective Dilemmas

arXiv:2606.27951v1 Announce Type: new Abstract: AI agents are promising tools that can act as flexible behavioral nudges to enhance human cooperation in addressing large-scale societal problems. However, evidence on whether AI agents can effectively boost cooperation remains mixed. We recruited 1,283 participants to play iterated Collective Risk Games in small groups, testing whether AI assistants could nudge participants toward cooperation. By using persuasive framing personalized to each player's Social Value Orientation profile, the AI interventions significantly increased contributions and group success rates. These cooperative effects were short-lived, however, fading after the first few rounds. Strikingly, when the AI treatments were reconfigured to promote selfish behavior through exculpatory framing, the negative effects on contributions and group success were larger and substantially more persistent, particularly for personalized interventions. This asymmetry between prosocial and antisocial persuasion highlights the dual-use risks of AI systems designed to influence group behavior in collective action settings.

Editor's pick
Arxiv· Today

NormAct: A Benchmark for Hidden Social Norm Compliance in Embodied Planning

arXiv:2606.27826v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) are increasingly deployed as embodied planners in egocentric environments, where task success requires not only achieving instructed goals but also acting in socially appropriate ways. While explicit goals may render certain actions optimal, implicit social norms often impose hidden constraints. Existing evaluations typically focus on explicit goal achievement or direct norm knowledge, seldom assessing whether planners can infer and apply these hidden constraints within action sequences. We introduce NormAct, a benchmark for embodied social-norm interactions that evaluates plans on Goal Achievement, Norm Compliance, and overall Task Success. NormAct uniquely embeds hidden norms within ordinary tasks, testing whether models can realize them without explicit instruction. Experiments with state-of-the-art MLLMs (GPT-5.4, Claude Opus 4.7, Gemini 3 Pro) reveal a significant gap: models achieve explicit goals in 67.3\% of cases, but comply with hidden norms in only 26.4\%. Cue-condition experiments indicate that this gap stems not from a lack of general social knowledge, but from challenges in activating and grounding relevant norms in context. To address this, we propose NormPerceptor, a context-conditioned cue generator that infers scene-relevant norms prior to planning, increasing Task Success from 24.2\% to 46.7\%. Our results underscore the importance of enabling embodied agents to proactively detect hidden norms, ground them in visual evidence, and integrate them as action-planning constraints. Our benchmark is publicly available at https://huggingface.co/datasets/Caleb196x/NormAct.

Editor's pick
Arxiv· Today

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers

arXiv:2604.24155v3 Announce Type: replace Abstract: The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act in a given situation. Studies of agent-type value forks challenge this assumption by showing that people do not always judge humans and AI systems identically.This paper extends that challenge by examining two further possibilities: first, that evaluations of AI behavior change when its human origins are made visible; and second, that people judge the humans who program AI systems differently from either the machines or the human actors they are compared against. An experiment with 1,002 U.S. adults measured moral judgments in a runaway mine train scenario, varying the subject of evaluation across four conditions: a repairman, a repair robot, a repair robot programmed by company engineers, and company engineers programming a repair robot. We find no significant difference in evaluations of the repairman and the robot. However, judgments shifted substantially when the robot's actions were described as the product of human design. Participants exhibited markedly more deontological, rule-based reasoning when evaluating either the programmed robot or the engineers who programmed it, suggesting that rendering human agency visible activates heightened moral constraints. These findings indicate that people may evaluate humans, AI systems acting in the same situation, and the humans who design them in meaningfully different ways. The fact that these evaluations do not necessarily converge gives rise to the alignment target problem: which normative target should guide the development of artificial moral agents in high-stakes domains, and whether these plural judgments can be reconciled within a coherent account of value alignment.

Editor's pick
Arxiv· Today

Psychometric Comparability of LLM-Based Digital Twins

arXiv:2601.14264v2 Announce Type: replace Abstract: Large language models (LLMs) act as digital twins for human respondents, yet their psychometric comparability remains uncertain. We propose a construct validity framework spanning construct representation and the nomothetic span, benchmarking models against human gold standards. Across studies, digital twins achieved high aggregate-level accuracy and profile correlations, but showed attenuated item-level correlations. In word association tests, LLM networks exhibited humanlike small-world structure and theory-consistent communities, yet diverged lexically and in local structure. In decision-making and contextualized tasks, they under-reproduced heuristic biases, demonstrating normative rationality, compressed variance, and limited temporal sensitivity. Feature-rich and trait relevant conditioning improved Big Five personality prediction and nomothetic-span alignment, but network invariance remained limited, with partial configural solutions and persistent loading differences. In cross-language free-text tasks in English and Chinese, feature-rich digital twins better approximated construct-level narrative content, but linguistic and idiographic differences persisted. These findings clarify that digital twins are most useful within validated boundaries, where the construct, task and level of inference align with evidence from human data.

Editor's pickGovernment & Public Sector
Guardian· Yesterday

‘It’s dangerous and it’s going to erode trust’: redesign of US government websites stokes surveillance fears

The National Design Studio, staffed by Doge veterans, installed visitor-tracking software on vital federal websites An opaque White House office staffed largely by veterans of Elon Musk’s “department of government efficiency” (Doge) has quietly rebuilt some of the federal government’s most sensitive websites – for passport applications, voter registration, prescription-drug pricing and children’s savings – in ways critics say appear to violate federal law. The National Design Studio (NDS) was established by a Donald Trump executive order last August, and is led by Trump-aligned Airbnb co-founder Joe Gebbia and staffed by Doge veterans. Continue reading...

Editor's pickConsumer & Retail
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

G7 data watchdogs consider minors’ privacy risks with age verification, smart devices

Privacy regulators from G7 nations signed a declaration addressing data protection for minors, focusing on connected home devices and age verification technologies.

Editor's pick
Arxiv· Today

Odyssey: Constructing Verifiable Local Truth-Preserving Foundation Models

arXiv:2606.27593v1 Announce Type: new Abstract: We introduce a categorical framework called ODYSSEY for constructing verifiable, local truth-preserving foundation models as compositions of foundries: building-block architectural components that specify a cover of local contexts, local representation families, restriction maps, gluing rules, obstruction policies, update obligations, and human-facing views. A foundry is an organized sheaf of knowledge that carries within it an argumentation component. Concrete foundries are built from generic foundries such as evidence/argument, operational decision, institutional/financial, market meaning, scientific challenge, research-program, assistant-build, and evaluation-harness foundries. Universal Foundry Learning (UFL) formalizes foundry construction as a composition of left and right Kan extensions, with left Kan extension rolling local artifacts into candidate foundries and right Kan extension enforcing the restriction, gluing, obstruction, and argumentation conditions required for promotion. Foundry SQL (FSQL) is a small typed query surface for slicing maintained foundry artifacts that uses TICKET (Topos Integration using Causal Kan Extension Transformers) certification for admitting external or pre-built models into durable ODYSSEY state. ODYSSEY is fully implemented and tested across a wide spectrum of concrete foundries, showing that the same categorical machinery supports domain construction, artifact replay, sheaf diagnostics, grounded Toulmin/local-LLM scrutiny, residual-obstruction ledgers, and optimized TICKET-compatible causal-claim extraction across heterogeneous sources. This paper is to be presented as a 2.5 hour tutorial at ICML 2026. The tutorial home page is at https://bit.ly/4ajS0nA.

AI Skills & Education3 articles

Technology & Infrastructure

35 articles
AI Agents & Automation6 articles
Editor's pickProfessional Services
Arxiv· Today

Your AI Travel Agent Would Book You a Bullfight: An Agentic Benchmark for Implicit Animal Welfare in Frontier AI Models

arXiv:2606.18142v3 Announce Type: replace-cross Abstract: AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts, leaving open whether the welfare reasoning surfaced in those responses transfers to agentic deployment where the model must take actions with tools. We introduce TAC (Travel Agent Compassion), the first agentic benchmark measuring whether AI agents avoid options involving animal exploitation when acting on behalf of users. TAC presents an AI agent with twelve hand-authored travel booking scenarios across six categories of animal exploitation, augmented to forty-eight samples to control for price, rating, and position confounds. We evaluate seven frontier models from four labs. Every model scores below the chance level of sixty-four percent, with the best performer (Claude Opus 4.7) at fifty-three percent. A single welfare-aware sentence in the system prompt yields gains of forty-seven to sixty-three percentage points in Claude and GPT-5.5, twenty-six points in GPT-5.2, and under twelve points in DeepSeek and Gemini. An auxiliary Inspect Scout audit of 288 base-condition transcripts from the top two performers, using Gemini 2.5 Flash Lite as judge, flags zero transcripts for evaluation awareness, suggesting the below-chance rates do not stem from the models recognising the evaluation. We discuss implications for category-level variation across cultural domains, the limits of text-response welfare benchmarks, and the EU General-Purpose AI Code of Practice systemic risk framework.

Editor's pick
Arxiv· Today

When Does Personality Composition Matter for Multi-Agent LLM Teams?

arXiv:2606.27443v1 Announce Type: new Abstract: Personality prompting shapes how large language models communicate, yet whether these behavioral shifts affect objective task outcomes remains under-explored. Prior work shows that agents prompted with low agreeableness produce adversarial language, while those prompted with high agreeableness become cooperative, but the relationship between communication style and task performance has not been systematically examined across multiple domains. In this work, we investigate whether personality composition matters for multi-agent team performance by manipulating personality traits across frontier LLMs on three task domains: structured coding, open-ended research collaboration, and competitive bargaining. We find that personality effects depend critically on task structure. In coding tasks, low agreeableness leads to large communication shifts that have little effect on milestone completion. In open-ended collaboration and bargaining, the same manipulation substantially degrades performance. We discuss implications for multi-agent system design and the limits of personality manipulation.

Editor's pickTechnology
Moneycontrol· Today

AI agents to transform tech teams by 2027 as companies race to adapt: KPMG- Moneycontrol.com

Global survey finds companies accelerating investments in agentic AI, with digital assistants expected to account for over a third of core technology teams by 2027

Editor's pickTechnology
Substack· Yesterday

The Sequence Radar #885: Last Week in AI: Models, Games, and the Future of Evaluation

Patronus AI raises $50M Series B — Agent-evaluation startup Patronus AI raised a $50M Series B led by Greenfield Partners (total funding now $70M) and unveiled its first “Digital World Models,” large-scale simulation environments for training and stress-testing AI agents.

Editor's pickTechnology
Daily Brew· Yesterday

Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows

Delivering consistent, on-time AI responses is a challenge of variance rather than speed. The fixes for reliability are often counterintuitive.

Editor's pick
Arxiv· Today

Internalizing the Future: A Unified Agentic Training Paradigm for World Model Planning

arXiv:2606.27483v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capability in sequential decision-making, yet they remains fundamentally reactive in long-horizon tasks. Unlike humans who employ "what-if" reasoning to evaluate potential plans before commitment, standard agents lack an internal world model to simulate future outcomes. Therefore, we propose to internalize future-aware planning by training a single autoregressive model to verbalize both a prospective state rollout and a plan-conditioned success estimate-a textual analogue of the Q-value. Crucially, we identify a format-capability gap: simply fine-tuning agents on look-ahead traces during post-training leads to superficial mimicry of foresight without genuine predictive grounding. To bridge this gap, we introduce a three-stage training paradigm: (i) World Model Agentic Mid-Training (WM-AMT) to inject latent predictive capabilities into the policy; (ii) Format-Eliciting SFT (FE-SFT) to structure this injected capability; and (iii) Foresight-Conditioned Reinforcement Learning (FC-RL) to refine the calibration and utility of the generated simulations. Evaluated on search and mathematical reasoning tasks, our approach consistently outperforms other training baselines. Our results demonstrate that effective internal world modeling in LLM agents requires a capability-first training pipeline to achieve grounded and calibrated foresight.

AI Infrastructure & Compute10 articles
Editor's pickTechnology
Arxiv· Today

AI-Model Network: Concept, Current State and Future

arXiv:2606.27382v1 Announce Type: new Abstract: While the primary function of computers lies in computation and processing, the core value of the Internet is rooted in sharing and collaboration. Computers create the Internet, and the Internet empowers the value of computers. The rapid development of the Internet, cloud computing, and big data is pushing artificial intelligence into the era of large models (LMs). However, the practical application of LMs is currently hindered by high training costs and deployment complexities, driving a shift toward lightweight, private, and domain-specific models. With the rapid proliferation and wide distribution of heterogeneous models, enabling effective interaction and collaboration among them has emerged as a critical bottleneck that urgently needs to be addressed in LM development. Drawing inspiration from the development of the Internet, this paper proposes the concept, vision, and system architecture of world wide AI-model network (AI-ModelNet). It is a novel paradigm that achieves interconnection, capability sharing, and collaborative reasoning by establishing pathways between models. We first briefly review the current state of single-model and multi-model research. Subsequently, the systemic vision and hierarchical architecture of AI-ModelNet are articulated, followed by validation of the framework's feasibility through a prototype system and diverse application cases. Finally, key directions for future research are discussed preliminarily.

Editor's pickTechnology
Cyber News Centre· Yesterday

Broadcom and Blackstone Launch $35B AI Compute Platform | Cyber News Centre

Broadcom’s $35 billion AI XPV Platform with Apollo and Blackstone signals that frontier AI growth is now constrained by power, silicon and data center capacity, not algorithms, pulling private equity, energy policy and infrastructure regulation into the core of the AI race.

Editor's pickEnergy & Utilities
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

China's five-year energy plan backs AI integration, green power for data centers

China's 15th Five-Year Plan aims to expand 'AI Plus' in the energy sector by integrating computing infrastructure with energy planning and renewable energy bases.

Editor's pickTechnology
The Straits Times· Yesterday

AI’s next frontier? China and the US look to space in the computing race | The Straits Times

Often described as orbital data centres or space computing, the emerging field has become a hot topic in the AI and commercial space sectors. Read more at straitstimes.com.

Editor's pickTechnology
Daily Brew· Yesterday

DSpark: Speculative decoding accelerates LLM inference

A new paper introduces DSpark, a method for speculative decoding designed to accelerate large language model inference.

AI Models & Capabilities15 articles
Editor's pickTechnology
Ethan Mollick· Yesterday

AI model routers misallocate intelligence for qualitative tasks, reducing economic value

Ethan Mollick argues that current AI model routers systematically underestimate the difficulty of non-math/coding tasks like innovation and marketing, assigning weaker models to such work. Since qualitative tasks often benefit most from frontier models, this routing flaw could erode the economic value of AI adoption in business-critical areas.

Editor's pickTechnology
Reuters· Yesterday

Google limits Meta’s use of its Gemini AI models, FT reports

Global pressures from rising public debt to financial fragilities and the sustainability of ​the AI boom are increasing risks, underscoring the need for disciplined policymaking, according to the Bank for International ‌Settlements.

Editor's pick
Arxiv· Today

LLM Agents as Static Level-k Players in Behavioural Games

arXiv:2606.27845v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly used as stand-ins in behavioural games. These stand-ins rely on the assumption that the LLM's distribution of choices meaningfully matches how humans play the same game. This study tests that assumption through two games. The first is a p-beauty contest, and the second one is a public goods game. The study first investigates five local-model settings within the same model family. These settings are varied together in a 360-cell factorial, which balances temperature, scale (0.5-32B), quantisation, instruct vs base, and framing. Each cell's distribution is then compared against whole choice distributions in published human data. Each deployment setting, except for quantisation, governs a different aspect of fidelity. Mechanically, while the dispersion of human players can be somewhat recovered through deployment settings, the strategic process behind it cannot. Through the lens of the level-k cognitive theory, we find that LLMs act as static, category-retrieved level-k players, where k is set by the model scale. The models also do not run within-game belief-updating or backward induction throughout multiple-round horizon settings. While human contributions decayed in the public goods game, LLMs stayed flat or rose at every scale. When the horizon test was administered, LLMs were more cooperative under an indefinite horizon compared to a finite one. However, LLMs ignore their relative round position, so no last-round defection was displayed. This implies that LLMs retrieved levels relative to the horizon category rather than working out iteratively from the specific game setting.

Editor's pickTechnology
Arxiv· Today

Grounded Iterative Language Planning: How Parameterized World Models Reduce Hallucination Propagation in LLM Agents

arXiv:2606.27806v1 Announce Type: new Abstract: World models for language agents come in two useful forms. An agent-based world model calls an LLM API and reasons flexibly in language, but its errors appear as hallucinated state changes that are hard to score with ordinary regression losses. A parameterized world model is a trained transition predictor; its errors are easier to measure with quantities such as NodeMSE, delta accuracy, and validity accuracy, but it is usually weaker as a standalone planner. We compare these two families on four graph-structured planning benchmarks and introduce operational hallucination metrics for the agent-based case. The comparison motivates \textbf{Grounded Iterative Language Planning} (GILP), which trains only a small parameterized backbone and combines it with API-based agent reasoning. The backbone supplies valid actions, predicted state deltas, risk, and value; the LLM drafts an action and imagined delta; and a consistency gate asks for revision when the two disagree. On real GPT-4o-mini calls, GILP reduces hallucinated-state rate from 0.176 to 0.035. In calibrated simulator ablations, it raises success from 0.668 to 0.838 while adding only ~22% extra LLM calls.

Editor's pickTechnology
Arxiv· Today

ATOD: Annealed Turn-aware On-policy Distillation for Multi-turn Autonomous Agents

arXiv:2606.27814v1 Announce Type: new Abstract: Training small language-model agents for long-horizon interactive tasks requires both fast imitation and reward-driven improvement. On-policy distillation (OPD) provides dense teacher guidance and typically improves rapidly in the early stage, but its gains saturate once the student approaches the teacher, limiting the final performance ceiling. Reinforcement learning (RL) directly optimizes environment rewards and encourages exploratory improvement toward a higher reward-defined ceiling, but sparse and delayed feedback makes early-stage learning much less efficient than OPD. In this paper, we propose ATOD (Annealed Turn-aware On-policy Distillation), a hybrid online distillation algorithm that explicitly exploits this complementarity. (1) ATOD uses an annealed OPD-RL schedule: OPD dominates early training to approach teacher-level behavior, while RL is gradually strengthened to drive reward-based exploration. (2) ATOD introduces Turn-level Disagreement-Uncertainty Reweighting (T-DUR), which softly amplifies high-utility turns and improves dense supervision in long trajectories. Experiments on ALFWorld, WebShop, and Search-QA show that ATOD consistently outperforms competing post-training baselines: across the three student sizes, ATOD improves average success rate by 3.03 points over OPD and 23.62 points over GRPO, while surpassing the corresponding teacher models by 2.16 points.

Editor's pickTechnology
Ethan Mollick· Yesterday

Chinese open-weight model GLM-5.2 nears GPT-4 class, but frontier gap persists

Ethan Mollick notes that GLM-5.2, the latest Chinese open-weight model, is solid and comparable to earlier GPT tiers, but falls short of the current frontier. He estimates that next-generation Mythos-class models may arrive in 6–12 months if release restrictions allow. The continued pace of open-weight progress has implications for competitive dynamics and compute geopolitics.

Editor's pickEducation
Arxiv· Today

Verifiable Geometry Problem Solving: Solver-Driven Autoformalization and Theorem Proposing

arXiv:2606.27926v1 Announce Type: new Abstract: Geometry Problem Solving have increasingly adopt the neuro-symbolic paradigm, combining neural intuition with symbolic rigor. However, current frameworks suffer from severe bottlenecks in two core stages: autoformalization, which treats multimodal translation as a static task decoupled from downstream solver compatibility, and theorem prediction, where solvers frequently hit a deductive impasse due to fixed rule libraries. To address these, we propose SD-GPS, a solver-driven framework that treats the symbolic solver as an execution oracle throughout both formalization and deduction. First, Solver-Driven Autoformalization unifies supervised formal-language adaptation and solvability-guided reinforcement learning into a single module built on QwenVL3-2B, making executability the central training signal. Second, Verified Theorem Proposing introduces an impasse-aware agent that proposes local auxiliary lemmas from current proof states, ensuring soundness by filtering all proposals through symbolic verification. Empirical evaluations on Geometry3K and PGPS9K demonstrate that SD-GPS consistently outperforms existing MLLM, neural, and neuro-symbolic methods across standard completion, multiple-choice, and cross-modal reference regimes, proving that closing the loop between multimodal perception and symbolic execution significantly improves geometric reasoning, offering profound insights into how neural agents can be grounded by formal systems to achieve verifiable problem-solving capabilities.

Editor's pick
Arxiv· Today

Helpfulness Hurts: Domain-Dependent Degradation of Mid-Trained Compassion Values Under Post-Training

arXiv:2606.26102v2 Announce Type: replace-cross Abstract: Standard post-training pipelines apply supervised fine-tuning (SFT) and reinforcement learning (RL) to make language models helpful, but these processes may inadvertently degrade values instilled during pre-training. We investigate whether the domain of post-training data differentially affects the retention of animal compassion values in a Llama 3.1 8B model mid-trained on compassion-oriented synthetic data, using both SFT (helpfulness via Dolly-15k vs. coding via Magicoder-110K) and GRPO (helpfulness via RLHFlow vs. coding via Magicoder), evaluated on the ANIMA 2.2 benchmark and MORU benchmark (Moral Reasoning Under Uncertainty). Helpfulness training significantly degrades animal compassion relative to coding training on ANIMA (SFT: 35.7% vs. 65.2%; GRPO: 18.7% vs. 32.0%), replicating across two independent helpfulness datasets and two training paradigms. On English MORU items, helpfulness training degrades general moral reasoning by 25.5 percentage points (46.4% vs. 71.9%), a striking gap that rivals the compassion effect in magnitude. However, this effect does not transfer cross-lingually: on the multilingual MORU benchmark, the domain effect disappears (SFT: 52.3% vs. 51.2%). In contrast, the animal compassion effect transfers consistently across languages, with Magicoder's ANIMA percentage-point gain over the base model 4.5 times larger on non-English items than English items. This divergence suggests that values instilled through mid-training are encoded more deeply and cross-lingually than reasoning improvements from domain-specific post-training. These results suggest that, for labs building on value-laden mid-training, coding-domain post-training may better preserve mid-trained values than helpfulness post-training without harming general reasoning capabilities.

Editor's pickTechnology
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

OpenAI limits model release following latest US government intervention

OpenAI is limiting the release of its GPT-5.6 models at the request of the White House, marking the second US government intervention in two weeks regarding frontier AI models.

Editor's pickTechnology
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

OpenAI announces 'limited preview' of GPT-5.6 at White House's request

OpenAI is launching a limited preview of its GPT-5.6 generative AI models for trusted partners at the request of the White House.

Editor's pick
Arxiv· Today

Understanding Rollout Error in Graph World Models

arXiv:2606.27780v1 Announce Type: new Abstract: World models are often used for planning by rolling learned dynamics forward. Many planning environments, however, are not vectors or images; they are graphs of agents, tools, skills, routes, and dependencies. In these settings, a local prediction error may stay local or spread through the graph, and the failure mode changes again when edges are predicted rather than fixed. This paper studies long-horizon rollout error in Graph World Models (GWMs). We formulate a unified fixed-edge and dynamic-edge GWM framework with action nodes for node-, edge-, and graph-level decisions. We develop graph-valued rollout bounds that separate topology-induced amplification from model-induced amplification, and we introduce a joint node-edge operator for dynamic-edge rollouts. Guided by the analysis, we propose Error-Aware GWM, which combines spectral regularization, rollout consistency, and critical-node weighting. Across synthetic topologies and heterogeneous agent-graph testbeds, rollout error and planning regret grow with horizon, dynamic-edge training is needed when structure evolves, and Error-Aware GWM prevents long-horizon divergence while preserving prediction accuracy. Real-world graph benchmarks clarify the scope of GWMs: they are most useful for dynamic graph rollout and agent planning, while specialized graph models remain strong on static or sparse prediction tasks.

Editor's pick
Arxiv· Today

Hybrid Fact-Checking that Integrates Knowledge Graphs, Large Language Models, and Search-Based Retrieval Agents Improves Interpretable Claim Verification

arXiv:2511.03217v2 Announce Type: replace-cross Abstract: Large language models (LLMs) excel in generating fluent utterances but can lack reliable grounding in verified information. At the same time, knowledge-graph-based fact-checkers deliver precise and interpretable evidence, yet suffer from limited coverage or latency. By integrating LLMs with knowledge graphs and real-time search agents, we introduce a hybrid fact-checking approach that leverages the individual strengths of each component. Our system comprises three autonomous steps: 1) a Knowledge Graph (KG) Retrieval for rapid one-hop lookups in DBpedia, 2) an LM-based classification guided by a task-specific labeling prompt, producing outputs with internal rule-based logic, and 3) a Web Search Agent invoked only when KG coverage is insufficient. Our pipeline achieves an F1 score of 0.93 on the FEVER benchmark on the Supported/Refuted split without task-specific fine-tuning. To address Not enough information cases, we conduct a targeted reannotation study showing that our approach frequently uncovers valid evidence for claims originally labeled as Not Enough Information (NEI), as confirmed by both expert annotators and LLM reviewers. With this paper, we present a modular, opensource fact-checking pipeline with fallback strategies and generalization across datasets.

Editor's pick
Arxiv· Today

MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thinking Synergy

arXiv:2606.27652v1 Announce Type: new Abstract: We find that explicit reasoning does not necessarily translate into better multimodal emotion recognition (MER) accuracy, even though it makes predictions more interpretable. Specifically, for reasoning-based MLLMs, fast thinking by triggering direct answers often outperforms slow thinking after deliberative reasoning. Our empirical analyses show that fast thinking improves recall with broader and more confident predictions, whereas slow thinking favors precision through conservative filtering of incorrect categories. Building on these insights, we propose MER-R1, a reinforcement learning framework that turns slow-fast complementarity into explicit optimization. Dual-objective disentanglement separates recall and precision into two optimization signals, allowing them to be jointly optimized rather than traded off against each other. Slow-fast confidence calibration further aligns the final slow-thinking answer with fast-thinking intuition, strengthening correct emotions while suppressing incorrect ones. In this way, MER-R1 unifies the recall-oriented intuition of fast thinking with the precision-oriented selectivity of slow thinking. We further provide theoretical justification for this synergy, showing that it mitigates variance-induced interference during optimization. Extensive experiments on MER-UniBench and MME-Emotion show that MER-R1 achieves state-of-the-art performance and makes reasoning genuinely benefit emotion recognition.

Editor's pick
Arxiv· Today

RelBall: Relation Ball with Quaternion Rotation for Knowledge Graph Completion

arXiv:2606.27967v1 Announce Type: new Abstract: Real-world knowledge graphs are often incomplete, lacking many valid facts. Knowledge Graph Completion (KGC) aims to predict missing links using known triples, thereby enhancing graph coverage. A key challenge is modeling diverse relational patterns such as symmetry, antisymmetry, inversion, composition and semantic hierarchy. Existing models such as RotatE can capture symmetric, antisymmetric, inverse, and commutative composition patterns, yet struggle with non-commutative composition. Rotate3D addresses this by introducing non-commutativity via three-dimensional rotations, but still fails to capture the semantic hierarchies prevalent in knowledge graphs. Moreover, both models cannot effectively model one-to-many relations. To overcome these limitations, we propose RelBall, which extends Rotate3D with two innovations. First, our model introduces modulus transformation to model hierarchies, driving abstract concepts toward smaller moduli and concrete instances toward larger ones. Second, it introduces a tail-centric relation ball to model one-to-one, one-to-many, many-to-one, and many-to-many relations. RelBall offers the following advantages: (1) coverage of all relational patterns, including the ones mentioned above; (2) an interpretable hierarchical representation where the modulus directly reflect semantic levels; (3) support for one-to-one, one-to-many, many-to-one, and many-to-many relations. Experiments on multiple datasets demonstrate RelBall's competitive link prediction performance against various baselines.

Editor's pickTechnology
Daily Brew· Today

GLM 5.2 beats Claude in our benchmarks

Semgrep reports that their new GLM 5.2 model has outperformed Claude in internal cybersecurity benchmarks.

AI Security & Cybersecurity3 articles
Editor's pickTechnology
VentureBeat· Yesterday

Prompt injection is exploiting enterprise AI's biggest design flaws by targeting agents, RAG pipelines and model routers

In the past two years, businesses have been trying to fit large language models (LLMs) into support, analytics, development, and internal automation like never before. Along with the increasing adoption of AI technology, another trend is gaining momentum — cybercriminals are taking advantage of the disconnect between assumptions about LLMs and their actual characteristics. In 2025 and 2026, several independent sources have highlighted the same trend: Prompt injection remains one of the most impactful and widely demonstrated attack vectors against LLM systems. The OWASP LLM Top 10 (2025) lists prompt injection as LLM01, identifying it as the most critical category of LLM‑specific vulnerabilities, for the second consecutive edition. OWASP's ranking reflects the fact that LLMs still struggle to reliably separate instructions from data, making them susceptible to manipulation through crafted inputs. CrowdStrike's 2026 Global Threat Report — built on frontline intelligence across more than 280 tracked adversaries — documented that threat actors injected malicious prompts into legitimate generative AI tools at more than 90 organizations in 2025. They then used those injections to generate commands that stole credentials and cryptocurrency. The report stated it plainly: "Prompts are the new malware." AI-enabled adversaries increased their overall attack volume by 89% year-over-year, with prompt injection working as both an entry point and a force multiplier. Real‑world incidents illustrate the operational impact. In August 2024, researchers at PromptArmor disclosed a prompt injection vulnerability in Slack AI that allowed an attacker to exfiltrate data from private Slack channels they had no access to — including API keys shared in private developer channels — by placing a malicious instruction in a public channel or embedding it in an uploaded document. In June 2025, researchers at Aim Security disclosed EchoLeak (CVE-2025-32711, CVSS 9.3), the first documented zero-click prompt injection exploit against a production AI system, targeting Microsoft 365 Copilot. By sending a single crafted email, no user interaction required, an attacker could cause Copilot to access internal files and transmit their contents to an attacker-controlled server. Both vulnerabilities were patched. These incidents underscore the fact that prompt injection is not a theoretical weakness but a practical, repeatable threat organizations must address as they deploy AI systems at scale. Prompt injection techniques have undergone major evolutions over recent years, now targeting multi-agent architecture, retrieval-augmented generation (RAG) pipelines, model routers, and long-term memory capabilities. The enterprise challenge: Too much trust Businesses deploy LLMs to process instructions, summarize information, and trigger automated workflows, but it is difficult for LLMs to tell: Instructions from data Information from context Context from metadata User intent from metadata This creates an opportunity for attackers to manipulate and influence the model's behavior, either directly or indirectly. Modern prompt injection Cross-model prompt injection LLM use is a common practice among enterprises. Attackers corrupt the output of a particular model, knowing well that other models would be processing the content. Hence, the corruption propagates through all AI systems. RAG supply chain poisoning Attackers create malicious information — documentation, blog articles, GitHub READMEs. Then they wait until this malicious information is ingested in enterprises' RAG pipelines, then use it as an attack vector. Agent hijacking AI agents have evolved to the point where they can send emails, modify cloud infrastructure, execute code snippets, and interact with internal corporate systems. It takes just a single instruction to make agents act differently in a harmful manner. Context overflow attacks With the help of million-token context windows, attackers place malicious code within the document and hope that an LLM will stumble upon it and execute it, thus overriding all previous instructions. Memory poisoning Due to the implementation of long-term memory in LLMs, attackers can inject instructions that permanently reconfigure their state. Model‑router manipulation Enterprises increasingly use model routers to select between multiple LLMs. Attackers craft prompts that force routing to the weakest or least‑guarded model. Why this matters for business leaders Prompt injection is not a theoretical problem. It directly affects: Customer‑facing systems (chatbots, support agents) Internal copilots (developer tools, security assistants) Automation workflows (ticketing, cloud operations, HR processes) Data governance (RAG pipelines, knowledge bases) The risk is no longer limited to "the model said something it shouldn't." In 2026, prompt injection can: Trigger unauthorized actions Leak sensitive data Corrupt internal workflows Manipulate analytics Alter business logic Compromise multi‑agent systems The attack surface has expanded dramatically. What enterprises should do now 1. Constrain model permissions Limit what the model can do, not just what it should do. 2. Segment untrusted content Treat all external data — including RAG sources — as potentially hostile. 3. Monitor tool invocation Require human approval for high‑impact actions. 4. Validate content provenance Ensure RAG pipelines don't ingest poisoned external content. 5. Harden model routers Prevent attackers from forcing routing to weaker models. 6. Treat LLMs as untrusted components This mindset shift is the foundation of modern AI security. The bottom line Prompt injection remains the most effective way to compromise enterprise AI systems because it exploits the fundamental way LLMs interpret text. Until organizations treat LLMs as untrusted interpreters — not autonomous decision‑makers — prompt injection will continue to dominate the AI threat landscape. Julie Brunias is an AI Security Architect.

Editor's pick
Arxiv· Today

Towards Reliable and Robust LLM Planning: Symbolic Feedback-Driven Iterative Self-Refinement Framework

arXiv:2606.27757v1 Announce Type: new Abstract: Large language models (LLMs) have attracted widespread attention from academia and industry, yet their deployment raises critical security concerns regarding robustness and reliability. Planning, a core component of intelligent behavior, remains challenging for LLMs, which often produce infeasible or incorrect solutions in long-horizon decision-making tasks due to inherent complexity. In this paper, we propose a symbolic feedback-driven iterative self-refinement framework to enhance the robustness and reliability of LLMs in long-horizon planning. Specifically, a natural language prompting mechanism is introduced to map logical symbols into natural language descriptions, enabling LLMs to better capture task constraints and semantics. We further design a symbolic verifier that identifies errors and converts them into corrective instructions interpretable by the LLM, thereby guiding self-refinement. In addition, we leverage a plan recognizer to infer goal reachability, facilitating more effective guidance toward desired goals. Empirical results demonstrate that the proposed framework consistently improves both feasibility and correctness in long-horizon planning tasks. This highlights its effectiveness in enhancing the reliability of LLM-based planning and potential to enable more trustworthy AI systems.

Adoption, Deployment & Impact

15 articles
AI Applications5 articles
AI Measurement & Evaluation2 articles
Editor's pickMedia & Entertainment
Arxiv· Today

When AI Deceives: A Natural Experiment on the Causal Effects of Perceived Deception on Player Ratings in RPGs

arXiv:2606.27689v1 Announce Type: new Abstract: AI-driven deception mechanisms are increasingly prevalent in digital games, yet the direction and magnitude of their effects on player experience remain contested. Existing research has not sufficiently disentangled designer-intended deception intensity from players' actual perception of deception, and most prior work relies on low-ecological-validity experiments or cross-sectional surveys. The present study aims to independently examine the causal effects of design deception intensity (DDI) and player deception awareness (PDA) on player ratings within a naturalistic gaming environment, and to investigate the moderating role of player experience. Leveraging the 54 version updates of Baldur's Gate 3 between 2019 and 2025 as a quasi-natural experiment, it collected all English-language Steam reviews posted within 1 to 28 days following each update, and constructed a player-version two-way fixed effects panel dataset. DDI was coded by human annotators based on patch notes; PDA was extracted and aggregated from review texts using a fine-tuned BERT classifier. The model incorporated both player and version fixed effects, complemented by five robustness checks including subsample partitioning, lagged variables, and placebo tests. PDA exerts a monotonic negative effect on positive review rates: within the observed PDA range, the net loss in review valence is approximately 0.4 percentage points, with a negative quadratic term that falsifies the inverted-U hypothesis of moderate perception optimality. DDI exhibits a U-shaped effect with an inflection point at a relatively low intensity, although the upward trend on the right branch is primarily driven by contemporaneous new content bundled with high-intensity updates. Any degree of deception awareness undermines player evaluations, while the positive manifestation of design intensity depends on content-confounding effects.

Editor's pickEducation
Arxiv· Today

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction

arXiv:2606.28186v1 Announce Type: cross Abstract: Predicting human item difficulty is central to educational assessment, where reliable estimates support fairness and effective test construction. Existing methods often depend on costly human calibration or item-level textual representations, providing limited evidence about the cognitive processes that make items difficult. We argue that difficulty should be viewed not only as a property of item text, but also as an observable consequence of the problem-solving burden an item induces. Large Reasoning Models (LRMs) offer scalable process evidence through reasoning traces, but such evidence must be structured to support interpretable modeling. To this end, we introduce Epi2Diff (Episode to Difficulty), a framework that maps LRM reasoning traces into cognitively grounded episode sequences. These episodes group trace segments into functional problem-solving states, enabling difficulty to be modeled through reasoning scale, effort allocation, and state transitions. Epi2Diff extracts compact episode-dynamic features and combines them with semantic item representations for human difficulty prediction. Experiments on four real-world human difficulty datasets show that Epi2Diff consistently outperforms strong baselines, including fine-tuned small language models, LLM in-context learning, and supervised LLM adaptation. On SAT-derived classification benchmarks, Epi2Diff achieves an 8.1% average relative gain over supervised LLM fine-tuning baselines. Further analyses show that harder items induce more effortful, iterative, and implementation-centered episode dynamics, rather than merely longer responses. These results demonstrate that cognitive episodes in LRM reasoning traces provide a predictive and interpretable process representation for human item difficulty, offering a new lens for educational measurement with reasoning models.

Geopolitics, Policy & Governance

21 articles
AI Geopolitics5 articles
AI Policy & Regulation12 articles
Editor's pickPAYWALLTechnology
Bloomberg· Today

Anthropic's Mythos 5 Cleared by US for Wider Use

Anthropic PBC won US approval to restore some access to its powerful Mythos 5 artificial intelligence model, after resolving concerns from Donald Trump's administration about the technology’s potential threats to national security. Bloomberg's Neil Campling reports. (Source: Bloomberg)

Editor's pickMedia & Entertainment
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

Industry weighs in on EU copyright reform

A major consultation on EU copyright reform has concluded, with the European Commission now preparing to address how copyright law should apply in the age of AI.

Editor's pick
Artificial Intelligence Newsletter | June 29, 2026· Today

GDPR — not AI Act — delayed release of frontier AI in Europe, research shows

Research from the Centre for the Governance of AI indicates that GDPR, rather than the EU AI Act, has been the primary regulatory factor delaying the release of frontier AI models in Europe.

Editor's pick
Daily Brew· Today

Global Divide on AI Governance: Ban Superintelligence or Foster Deliberate Policy?

Debate is intensifying over how to govern advanced AI, with proposals ranging from international bans on superintelligence to comprehensive global policy frameworks.

Editor's pickGovernment & Public Sector
House of Commons Library· Yesterday

AI regulation in the UK - House of Commons Library

This briefing provides an introduction to artificial intelligence (AI) and how it is regulated in the UK.

Editor's pick
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

DOJ, Mississippi recast xAI pollution suit as fight over citizen enforcement

The Trump administration and Mississippi are attempting to shift the focus of a lawsuit against xAI from Clean Air Act violations to a broader legal battle over federalism and citizen standing in court.

Editor's pickDefense & National Security
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

Anthropic risk designation 'punishment,' former national security officials say

Former government officials argued in a legal brief that the US administration's designation of Anthropic as a security risk is pretextual and intended to punish the company.

Editor's pickMedia & Entertainment
Daily Brew· Today

AI-Driven Content Boom in India Faces Copyright Challenges Amid Legal Uncertainties

Indian entertainment firms are adopting AI for content creation, but face legal hurdles as current copyright laws require human authorship, leaving AI-generated works vulnerable.

Editor's pickTransportation & Logistics
Top Daily Headlines: Security boss thought MFA would be too much security· Today

US auto regulators want to kill robotaxi brake pedals

The NHTSA suggests that requiring human brake controls in driverless vehicles hinders innovation.

Editor's pick
Artificial Intelligence Newsletter | June 29, 2026· 3 days ago

Japan set to adopt social media election rules

Japanese lawmakers approved a bill to curb misleading AI-generated content in elections, requiring disclosure and mitigation measures from large social media platforms.

Editor's pickTechnology
Daily Brew· Today

The KIDS Act would require age checks to get online

The EFF analyzes the proposed KIDS Act, which would mandate age verification for online access, raising significant privacy and civil liberty concerns.

Editor's pickConsumer & Retail
Reuters· Yesterday

Reuters AI News | Latest Headlines and Developments | Reuters

Explore the latest artificial intelligence news with Reuters - from AI breakthroughs and technology trends to regulation, ethics, business and global impact.

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.

Get the full executive brief

Receive curated insights with practical implications for strategy, operations, and governance.

AI Daily Brief — leaders actually read it.

Free email — not hiring or booking. Optional BPAI updates for company news. Unsubscribe anytime.

Include

No spam. Unsubscribe anytime. Privacy policy.