AI Intelligence Brief

Tue 12 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

163 articles
Editor's pick · Editor's highlights

Korea Proposes AI Dividend, OpenAI Invests Billions, and Gartner Finds No ROI

TL;DR: South Korea is considering a 'citizen dividend' funded by taxes on AI profits, reflecting pressure to share gains from tech giants like Samsung. OpenAI is launching a new $4 billion unit to boost corporate AI capabilities, while a Gartner study finds that automation-driven layoffs are not improving ROI. Google reports that hackers are using AI to find software flaws, signaling new cybersecurity challenges.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

6 of 163 articles
Lead story
Editor's pick · PAYWALL
Bloomberg· Today

Korea Roils Market by Floating ‘Citizen Dividend’ from AI

A top South Korean policymaker said the nation should pay citizens a “dividend” using taxes on AI profits, underscoring growing pressure to redistribute gains from a boom that’s enriched chipmakers like Samsung Electronics Co. and SK Hynix Inc.

Editor's pick
Fortune· Yesterday

AI isn't paying off in the way companies think. Layoffs driven by automation are failing to generate returns, study finds

A Gartner study found that while 80% of companies surveyed reported workforce reductions, the cuts showed no correlation with higher ROI.

Editor's pick · Technology
Arxiv· Today

Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top

arXiv:2605.10291v1 Announce Type: new Abstract: Recent advances in generative artificial intelligence (AI) are reshaping who enters entrepreneurship, but not who reaches the top of the quality distribution. Using data on over 160,000 product launches on Product Hunt, we find that entrepreneurial entry increased sharply following the public release of ChatGPT-3.5, driven disproportionately by solo entrepreneurs. This shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures. However, much of this growth reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes. Team-based ventures are increasingly dominant in the top tiers of platform rankings. These findings suggest that generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.

Economics & Markets

37 articles
AI Investment & Valuations · 12 articles
Editor's pick · PAYWALL · Technology
WSJ· Today

Sam Altman’s Business Dealings Under GOP Scrutiny Ahead of OpenAI’s IPO

The Republican-led House Oversight Committee says it is investigating, and six GOP state attorneys general are calling for SEC review after a WSJ article.

Editor's pick · PAYWALL · Manufacturing & Industrials
FT· Today

Will investors embrace China’s humanoid robot champion?

Unitree aims to go public later this year in a crucial test for the android industry

Editor's pick · PAYWALL · Technology
FT· Yesterday

How AI mania is disguising big companies’ hit from Iran war — in charts

Biggest groups have gained $5.4tn in value since conflict began — but semiconductor sector accounts for most of the gains

Editor's pick · PAYWALL
Bloomberg· Today

Why AI Matters More Than Iran War in Markets

Global equities have rallied to record levels, fuelled by euphoria over artificial intelligence. Investors' risk appetite has held up despite volatility in energy markets caused by the war in Iran. Bloomberg Markets Live Executive Editor Mark Cudmore and Bloomberg TV Markets Producer Anthony Stephens discuss. (Source: Bloomberg)

Editor's pick · Technology
Quantum Zeitgeist· Yesterday

Nscale Adds $790M To Fund 115MW AI Infrastructure Growth


Editor's pick · Financial Services
InvestmentNews· Yesterday

AI M&A surges as software captures nearly three-quarters of North American deals

S&P Global report shows AI transactions hit record levels, with software dominating investor interest.

Editor's pick · Technology
TipRanks· Yesterday

OpenAI’s $6.6 Billion Employee Payday Signals a Bigger AI Wealth Boom

OpenAI, the private AI firm behind ChatGPT, has given some of its staff a large cash win before the company even goes public. According to a Wall Street Journal report...

Editor's pick · Financial Services
PitchBook· Yesterday

LPs fight tooth and nail for foundational AI co-investment share

Competition for deals involving the largest pre-IPO AI companies, like OpenAI and Anthropic, is separating the strongest LPs from the weakest.

Editor's pick · Technology
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

Microsoft CEO defends investment in OpenAI as not muddying nonprofit mission

Microsoft CEO Satya Nadella defended the company's $13 billion investment in OpenAI during a California jury trial regarding claims that the investment breached OpenAI's charitable trust.

Editor's pick · PAYWALL · Technology
Bloomberg· Yesterday

Software Firm ServiceNow Plans to Raise $4 Billion in Bond Sale

Software company ServiceNow Inc. is looking to raise about $4 billion from a potential US high-grade bond sale tied to its recent acquisitions.

Editor's pick · Energy & Utilities
Energy-Storage.News· Yesterday

Energy Vault reaffirms guidance, sets sights on AI infrastructure

Energy Vault has released its Q1 2026 financials, showing expansion in AI infrastructure activities, and operations in Australia and Japan.

Editor's pick · PAYWALL · Technology
Bloomberg· Yesterday

S&P Rises as Chipmakers Lift Stocks | The Close 5/11/2026

Bloomberg Television brings you the latest news and analysis leading up to the final minutes and seconds before and after the closing bell on Wall Street. Today's guests are Bank of America Securities' Jill Carey Hall, Brown Harris Stevens CEO Bess Freedman, Jefferies' Brian Tanquilut, Grain CEO & Founder David Grain, Cambria Investment Management's Mebane Faber, Hirtle & Co.'s Founder & Executive Chair Jon Hirtle, Oliver Wyman Partner Daniel Tannebaum, 55/Redefined Group's CEO Lyndsey Simpson & KPMG Chief Economist Diane Swonk. (Source: Bloomberg)

AI Macroeconomics · 6 articles
Editor's pick · PAYWALL
Bloomberg· Today

Korea Roils Market by Floating ‘Citizen Dividend’ from AI

A top South Korean policymaker said the nation should pay citizens a “dividend” using taxes on AI profits, underscoring growing pressure to redistribute gains from a boom that’s enriched chipmakers like Samsung Electronics Co. and SK Hynix Inc.

Editor's pick
Arxiv· Today

Statistical Model Checking of the Keynes+Schumpeter Model: A Transient Sensitivity Analysis of a Macroeconomic ABM

arXiv:2605.10447v1 Announce Type: cross Abstract: Agent-based models (ABMs) are increasingly used in macroeconomics, but their analysis still often relies on ad hoc Monte Carlo campaigns with heterogeneous statistical effort across parameter settings. We show how statistical model checking (SMC), implemented through MultiVeStA, can provide a principled analysis layer for a realistic macroeconomic ABM without rewriting the simulator in a dedicated formalism. Our case study is the heuristic-switching Keynes+Schumpeter (K+S) model, analysed through a transient sensitivity campaign over one-parameter sweeps, two macro observables (unemployment and GDP growth), and one auxiliary micro-level probe (market share) on the post-warmup phase of a 600-step horizon. The analysis is driven by reusable temporal queries, observable-specific precision targets, and confidence-based stopping rules that automatically determine the simulation effort required by each configuration. Results show a clear contrast across parameter families: macro-financial and structural sweeps produce the strongest transient effects, whereas several heuristic-rule sweeps remain much weaker under the same precision policy. More broadly, the paper shows that SMC can support reproducible and informative quantitative analysis of substantively rich economic ABMs, while making uncertainty estimates and simulation cost explicit parts of the reported results.
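The "confidence-based stopping rules" the abstract describes can be sketched as a plain Monte Carlo loop that keeps sampling until the confidence-interval half-width meets a precision target. This is an illustrative sketch only; `estimate_until` and its parameters are hypothetical names, not MultiVeStA's actual API.

```python
import math
import random
import statistics

def estimate_until(sample, half_width=0.01, z=1.96, min_n=30, max_n=100_000):
    """Draw samples until the 95% CI half-width drops below `half_width`.

    This is the generic confidence-based stopping rule: the required
    simulation effort is determined automatically per configuration.
    """
    xs = []
    while len(xs) < max_n:
        xs.append(sample())
        n = len(xs)
        if n >= min_n:
            hw = z * statistics.stdev(xs) / math.sqrt(n)
            if hw <= half_width:
                break
    return statistics.mean(xs), len(xs)
```

Tightening the precision target roughly quadruples the number of runs for each halving of the half-width, which is how simulation cost becomes an explicit, reportable output.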

Editor's pick · Technology
Exponentialview· Yesterday

📈⏳ The broken bargain of Moore’s Law

The physics has been slowing for years. The economics may be catching up.

Editor's pick · Financial Services
Artificial Intelligence Newsletter | May 12, 2026· Today

US Fed revamping its infrastructure to cope with AI, legislative changes, Waller says

The US Federal Reserve is updating its infrastructure to address the challenges and opportunities presented by artificial intelligence and evolving legislative requirements.

Editor's pick · Energy & Utilities
ANI News· Yesterday

Energy, Compute and AI infrastructure will define India's next economic cycle, says Gautam Adani

Gautam Adani, Chairman of Adani Group, on Monday said India's next economic cycle will be defined by large-scale investments in energy, data centres, compute infrastructure and artificial intelligence (AI) ecosystems.

Editor's pick
Arxiv· Today

On the probability distribution of long-term changes in the growth rate of the global economy: An outside view

arXiv:2605.09182v1 Announce Type: new Abstract: Daniel Kahneman and Amos Tversky argued for challenging inside views (informed by contextual specifics) with outside views (based on historical "base rates" for certain event types). A reasonable inside view of the prospects for the global economy in this century is that growth will converge to 2.5%/year or less: population growth is expected to slow or halt by 2100; and as more countries approach the technological frontier, economic growth should slow as well. To test that view, this paper models gross world product (GWP) observed since 10,000 BCE or earlier, in order to estimate a base distribution for changes in the growth rate as a function of the GWP level. For econometric rigor, it casts a GWP series as a sample path in a stochastic diffusion whose specification is novel yet rooted in neoclassical growth theory. After estimation, most observations fall between the 40th and 60th percentiles of predicted distributions. The fit implies that GWP explosion is all but inevitable, in a median year of 2047. The friction between inside and outside views highlights two insights. First, accelerating growth is more easily explained by theory than is constant growth. Second, the world system may be less stable than traditional growth theory and the growth record of the last two centuries suggest.

AI Market Competition · 6 articles
Editor's pick · Transportation & Logistics
Arxiv· Today

TourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agents

arXiv:2605.10440v1 Announce Type: new Abstract: Online travel agents (Booking, Trip.com, Expedia) have replaced ranked-list interfaces with conversational LLM agents that compress many options into one sentence of advice. Each booking earns the OTA commission and different suppliers pay different rates: the agent has a structural incentive to favor higher-margin recommendations. Whether any deployed agent does this, and by how much, no one can currently measure. Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens. We propose TourMart, an applied intelligent-system audit instrument for LLM-OTA commission governance. Two governance levers -- lambda (gain on message-induced perception in the traveler's accept/reject decision) and kappa (budget-normalized cap on how far the message can shift perceived welfare) -- drive a paired counterfactual: holding the traveler and bundle fixed, the steering delta is read off between a commission-aware prompt and a minimum-disclosure factual template. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At deployed (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50pp in the same direction at n=143, with an extended-n supplement (n=270) confirming significance (+2.96pp, p=0.008). Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008). TourMart outputs a sentence a compliance report can quote: "at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions."
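The exact McNemar statistic quoted above compares paired flips between the commission-aware and factual-template arms. A minimal sketch of the two-sided exact test (the helper name and interface are my own, not the paper's code):

```python
from math import comb

def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pair counts.

    b, c: sessions where only one arm produced the steered recommendation
    (e.g. steered under the commission-aware prompt but not the factual
    template, and vice versa). Under H0 they follow Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

Only discordant pairs enter the statistic; concordant sessions (both arms steer, or neither does) carry no information about the direction of the shift.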

Editor's pick · Financial Services
Arxiv· Today

Manipulation, Insider Information, and Regulation in Leveraged Event-Linked Markets

arXiv:2605.10486v1 Announce Type: cross Abstract: The introduction of leverage on prediction-market event contracts raises three structurally distinct questions that have not been addressed jointly: how leverage changes manipulation incentives, how it interacts with informed-trading rents, and how regulatory frameworks should respond. This paper develops a theoretical framework for the first two and a synthesis of the existing regulatory landscape for the third. The principal analytical move is a two-axis manipulation taxonomy distinguishing market-price manipulation from real-world outcome manipulation, where the manipulator affects the underlying event itself. Continuous-underlying derivative markets generally do not make outcome manipulation a venue-level payoff channel; event-linked markets do. Within this taxonomy, leverage plays asymmetric roles: it scales market-price manipulation linearly but shifts the cost-benefit threshold for outcome manipulation, and it scales informed-trading rents in three ways (direct multiplication, Sharpe-ratio preservation, detection-cost amortization). Section 7 connects Paper 1's pre-emption and halt-protocol findings (CC-007b, CC-008) to three manipulation channels: pre-emption introduced by the dynamic-margin engine, halt-arbitrage introduced by the resolution-zone halt protocol, and strategic bad-debt-shifting that no engine in Paper 1's framework family addresses. The framework's manipulation-resistance contribution is a re-allocation of attack surface, not a net reduction. The regulatory synthesis covers principal jurisdictions (US, EU, UK, Singapore, offshore) and identifies three regulatory-arbitrage pathways. The paper concludes with 14 recommendations for venue operators, regulatory bodies, and the research community, separated into framework-independent and framework-conditional categories.

Editor's pick · Professional Services
Theregister· Yesterday

OpenAI can't have incompetent AI consultants ruining the market, so bought its own

By which we mean it bought someone else's with other people's money

AI Productivity · 5 articles
Editor's pick · Professional Services
Arxiv· Today

PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams

arXiv:2605.08388v1 Announce Type: new Abstract: Human-AI teams play a pivotal role in improving overall system performance when neither the human nor the model can achieve such performance on their own. With the advent of powerful and accessible Generative AI models, several mundane tasks have morphed into Human-AI team tasks. From writing essays to developing advanced algorithms, humans have found that using AI assistance has led to an accelerated work pace like never before. In classification tasks, where the final output is a single hard label, it is crucial to address the combination of human and model output. Prior work elegantly solves this problem using Bayes rule, using the assumption that human and model output are conditionally independent given the ground truth. Specifically, it discusses a combination method to combine a single deterministic labeler (the human) and a probabilistic labeler (the classifier model) using the model's instance-level and the human's class-level calibrated probabilities.
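Under the conditional-independence assumption, the Bayes-rule combination the abstract cites reduces to reweighting the model's calibrated posterior by the human's class-conditional confusion probabilities (the prior cancels). A minimal sketch; the function name and dictionary layout are my own illustration, not the paper's code:

```python
def combine(human_label, model_posterior, confusion):
    """Fuse a hard human label with a model's calibrated posterior.

    Assumes human and model errors are conditionally independent given
    the true class, so P(y | h, m) is proportional to P(h | y) * P(y | m).
    confusion[y][h]: class-level probability the human reports h when the
    true class is y; model_posterior[y]: instance-level P(y | m).
    """
    scores = {y: confusion[y].get(human_label, 0.0) * p
              for y, p in model_posterior.items()}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}
```

A confident human on a class the model is unsure about dominates the fused label, and vice versa, which is the point of combining the two signals.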

Editor's pick · Technology
Theregister· Today

GitLab promises a different kind of layoff as biz pivots toward AI

Code hosting biz is trimming its global footprint and flattening its management layer

Editor's pick
Project Syndicate· Yesterday

Who Will Solve the AI Productivity Puzzle? by Robin Rivaton

Robin Rivaton shows that value lies in reorganizing overall production processes, not in improving individuals’ output.

AI Startups & Venture · 5 articles

Labor, Society & Culture

24 articles
AI & Culture · 1 article
Editor's pick
Arxiv· Today

Political Plasticity: An Analysis of Ideological Adaptability in Large Language Models

arXiv:2605.08415v1 Announce Type: new Abstract: Since the advent of Large Language Models (LLMs), a significant area of research has focused on their intrinsic biases, particularly in political discourse. This study investigates a different but related concept, "political plasticity", which is defined as the capacity of models to adapt their responses based on the user supplied context. To analyze this, a testing framework was developed using an expanded corpus of 200 politically-oriented questions across economic and personal freedom axes, based on a prior framework by Lester (1996). The study explored several methods to induce political bias, including simplified and topic-based system prompts, as well as user prompts with few-shot examples. The results show that while system prompts were largely ineffective, user prompts successfully elicited significant ideological shifts, particularly along the Economic Freedom axis in larger and newer models. Through a validation experiment, we examined whether models answer questionnaires by recognizing the underlying question format. Inverting the sense of the questions revealed unexpected, counter-intuitive shifts in most models, suggesting potential data leakage. Finally, we also analyzed how model plasticity varies when the experiment is conducted in different languages. The results reveal subtle yet notable shifts across each of the analyzed languages. Overall, our results indicate that small and older LLMs exhibit limited or unstable political plasticity, whereas newer frontier models display reliable, expected adaptability.

AI & Employment · 6 articles
Editor's pick · Education
Arxiv· Today

The Division of Understanding: Specialization and Democratic Accountability

arXiv:2604.09871v2 Announce Type: replace Abstract: This paper studies how the organization of production shapes democratic accountability. I propose a model in which learning economies make specialization productively efficient: most workers perform one-domain tasks, while a small set of integrators with cross-domain knowledge keep the system coherent. When policy consequences run across domains, integrators understand them better than specialists. Electoral competition then tilts government policies toward integrators' interests, while low aggregate system knowledge weakens governance and reduces the fraction of public resources converted into citizen-valued services. Labor markets leave these civic margins unpriced, failing to internalize the political returns to system knowledge. Broadening specialists can therefore raise welfare relative to the market allocation. The model speaks to debates on liberal arts education and the effects of AI.

Editor's pick · PAYWALL
FT· Today

Will AI turn us all into hipsters and artisans?

There is good reason to be dubious about the notion that automation will supplant all demand for human labour

Editor's pick
Arxiv· Today

Labor Supply under Temporary Wage Increases: Evidence from a Randomized Field Experiment

arXiv:2602.11992v2 Announce Type: replace Abstract: We conduct a pre-registered randomized controlled trial to test for income targeting in labor supply decisions among sellers of a Swedish street paper. Unlike most workers, these sellers choose their own hours and face severe liquidity constraints and volatile incomes. Treated individuals received a 25 percent bonus per copy sold for the duration of an issue, simulating an increase in earnings potential. Consistent with standard labor supply theory, they sold more papers and, by our measures, worked longer hours and took fewer days off. These findings contrast with studies on intertemporal labor supply that find small substitution effects.

AI Ethics & Safety · 9 articles
Editor's pick · Technology
Theregister· Yesterday

Anthropic’s bug-hunting Mythos was greatest marketing stunt ever, says cURL creator

After all that hype, AI scanner found one low-severity cURL flaw

Editor's pick
Arxiv· Today

Playing games with knowledge: AI-Induced delusions need game theoretic interventions

arXiv:2605.08409v1 Announce Type: new Abstract: Conversational AI has a fundamental flaw as a knowledge interface: sycophantic chatbots induce epistemic entrenchment and delusional belief spirals even in rational agents. We propose that the problem does not stem from the AI model but is instead a systemic consequence of the paradigm shift from user-driven knowledge search to users and agents engaged in strategic, repeated-play communication. We formalize the problem as a Crawford-Sobel cheap talk game, where costless user signals induce a pooling equilibrium. Agents optimized for user satisfaction produce sycophantic strategies that provide identical reinforcement across user types with opposite epistemic incentives: exploratory ``Growth-seekers'' ($\theta_G$) and confirmatory ``Validation-seekers'' ($\theta_V$). Under repeated play, this identification failure creates a coordination trap -- analogous to a Prisoner's Dilemma -- where locally rational feedback loops drive users toward pathologically certain false beliefs. We propose an inference-time mechanism design intervention called an Epistemic Mediator that breaks this pooling equilibrium by introducing a costly signal (epistemic friction), forcing type revelation based on users' asymmetric cognitive costs for processing resistance. A key contribution is Belief Versioning, a git-inspired epistemic meta-memory system that stores healthy beliefs and rolls back when validation-seeking resistance is detected. In simulation, this intervention achieves a separating equilibrium, yielding a $48\times$ differential in spiral rates while passing a learning preservation criterion, evidence that epistemic safety in AI is fundamentally a problem of strategic information environment design rather than simple model alignment.

Editor's pick · Professional Services
Arxiv· Today

Cost-of-Ethics Crisis: Beliefs, Decisions, and Justifications in the Job Searches of Computer Science Students in Canada and the United States

arXiv:2605.09680v1 Announce Type: new Abstract: Workplace norms in computer science have received growing attention due to a series of recent ethical scandals. One type of response has been a push to improve the ethics education provided to computer science students. Evidence for the effectiveness of ethics education remains mixed; some evidence suggests that norms are changing, while gaps between stated values and practice remain. Our focus here is on whether students, who have received some contemporary CS ethics education, are able to effectively apply ethical reasoning to their own decision-making in what is typically the first significant ethical decision of their careers: their job search. Our study examines the ethical decision making of 129 computer science students and recent graduates during their job searches. We find that most students prioritize factors like compensation, location, and workplace culture over ethical and social issues. Even when expressing ethical concerns, respondents often justify taking actions contradicting their moral views through commonly-shared explanations such as desire to make money or the perceived inability to avoid unethical workplaces. This work sheds light on the disconnect between ethics education and real-world CS graduate decision making. We offer insights for evolving curricula to better address practical ethical dilemmas, with implications for educators and industry.

Editor's pick · PAYWALL · Financial Services
Bloomberg· Today

Australia Watchdog Says Money Launderers Ramping Up AI for Scams

Australia’s financial crimes watchdog warned of a heightened threat of money laundering linked to artificial intelligence that has been used by crooks to scale up activities, automate processes and create fake documents.

Editor's pick · Education
Arxiv· Today

Teachers' Perceived Benefits and Risks of AI Across Fifty-Five Countries: An Audit of LLM Alignment and Steerability

arXiv:2605.08486v1 Announce Type: new Abstract: Teachers' trust in artificial intelligence (AI) in education depends on how they balance its perceived benefits and risks. Yet global discussions about scaling AI in education rely on fragmented evidence, as most studies of teachers' perceptions focus on single countries or small samples. This lack of representative cross-national evidence limits both theory building and policy development. At the same time, large language models (LLMs) are increasingly used in research, policy, and teachers' professional workflows, despite limited validation in education. To address these gaps, we conduct a large-scale audit of LLM alignment with teachers' perceptions of AI by combining representative international survey data with systematic model evaluation. Using OECD TALIS data from 55 countries and territories, we measure cross-national variation in teachers' perceived benefits and risks of AI. We then benchmark responses from eight state-of-the-art LLMs across four providers under both general and country-specific prompting, comparing higher- and lower-reasoning models. Results reveal substantial cross-national variation in teacher perceptions that is not reliably reflected in LLM outputs. Models compress country differences, overestimate both benefits and risks, and show limited gains from identity prompting or enhanced reasoning. This misalignment matters because LLM-generated guidance and professional discourse increasingly shape how teachers learn about and discuss AI, potentially influencing trust and future adoption decisions. Our findings caution against treating LLM outputs as substitutes for direct engagement with teachers when informing global AI-in-education initiatives. At the same time, some models (e.g., Gemini 3 Fast) partially capture cross-national ranking patterns, suggesting a complementary role in hypothesis generation and exploratory comparative analysis.

Editor's pick
Arxiv· Today

StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs

arXiv:2605.10442v1 Announce Type: new Abstract: Multilingual studies of social bias in open-ended LLM generation remain limited: most existing benchmarks are English-centric, template-based, or restricted to recognizing pre-specified stereotypes. We introduce StereoTales, a multilingual dataset and evaluation pipeline for systematically studying the emergence of social bias in open-ended LLM generation. The dataset covers 10 languages and 79 socio-demographic attributes, and comprises over 650k stories generated by 23 recent LLMs, each annotated with the socio-demographic profile of the protagonist across 19 dimensions. From these, we apply statistical tests to identify more than 1,500 over-represented associations, which we then rate for harmfulness through both a panel of humans (N = 247) and the same LLMs. We report three main findings. (i) Every model we evaluate emits consequential harmful stereotypes in open-ended generation, regardless of size or capabilities, and these associations are largely shared across providers rather than isolated misbehaviors. (ii) Prompt language strongly shapes which stereotypes appear: rather than transferring as a shared set of biases, harmful associations adapt culturally to the prompt language and amplify bias against locally salient protected groups. (iii) Human and LLM harmfulness judgments are broadly aligned (Spearman $\rho=0.62$), with disagreements concentrating on specific attribute classes rather than specific providers. To support further analyses, we release the evaluation code and the dataset, including model generations, attribute annotations, and harmfulness ratings.

Editor's pick · Media & Entertainment
Guardian· Yesterday

Texas accuses Netflix of spying on children in new lawsuit

Ken Paxton accuses streamer of designing addictive platform and falsely representing data collection practices. Texas sued Netflix on Monday, accusing the streaming company of spying on children and designing its platform to be addictive. Ken Paxton, the Texas attorney general, said Netflix has for years falsely represented to consumers that it did not collect or share user data, when it actually tracked and sold viewers’ habits and preferences to commercial data brokers and advertising technology companies, making billions of dollars a year.

Editor's pick
Forbes· Yesterday

AI Ethics Beyond Bias: The Risk Of Removing Humans From The Economy

If business leaders do not come together and set their own guardrails, governments will.

Editor's pick · Technology
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

OpenAI sued over ChatGPT’s alleged role in Florida State University shooting

OpenAI faces a new wrongful death and negligence lawsuit filed by the family of a 2025 Florida State University shooting victim, alleging the chatbot contributed to the violence.

AI Skills & Education · 5 articles

Technology & Infrastructure

46 articles
AI Agents & Automation · 7 articles
Editor's pick · Government & Public Sector
GovExec· Yesterday

The public sector agentic era: Trading pilots for transformation - Government Executive

“Through our research, we’ve ... adopters of AI and agents,” said Karen Dahut, CEO of Google Public Sector, at Google Cloud Next. “According to our Return on Investment of AI in the Public Sector report, 55% of public sector leaders say that their organizations are already using AI agents, 42% report that their organization has deployed more than 10 agents, and nearly half, 46%, say their productivity has at least doubled thanks to AI agents.” This measurable ROI is shifting ...

Editor's pick · Professional Services
Daily AI News May 11, 2026: AI Is a Social Sport· Yesterday

Building Fast & Accurate Agents with Prime-RL Post Training

This article explores using Prime-RL post-training to develop faster, more accurate agents for structured tasks like spreadsheet workflows.

Editor's pick · Technology
Arxiv· Today

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

arXiv:2605.08399v1 Announce Type: new Abstract: Tool-augmented language models can extend small language models with external executable skills, but scaling the tool library creates a coupled challenge: the library must evolve with the planner as new reusable subroutines emerge, while retrieval from the growing library must remain within a fixed context budget. Existing tool-use and skill-library methods typically treat tools as flat or text-indexed memories, causing prompt cost to grow with library size and obscuring the typed, compositional structure of executable code. We propose CoCoDA, a framework that co-evolves the planner and tool library through a single code-native structure: a compositional code DAG. Nodes are primitive or composite tools, edges encode invocation dependencies, and each node stores a typed signature, description, pre/post-condition specification, and worked examples. At inference time, Typed DAG Retrieval prunes candidates by symbolic signature unification, ranks survivors by descriptions, filters them by behavioral specifications, and disambiguates with examples, keeping expensive context materialization on progressively smaller candidate sets. At training time, successful trajectories are folded into validated composite tools, while the planner is updated with a DAG-induced reward that credits composites by their primitive expansion size. We provide theoretical results showing retrieval cost reduction, sublinear retrieval time, compositional advantage under the shaped reward, monotone co-evolution under conservative updates, and DAG well-formedness. Across mathematical reasoning, tabular analysis, and code task benchmarks, CoCoDA enables an 8B student to match or exceed a 32B teacher on GSM8K and MATH and consistently improves over strong tool-use and library-learning baselines.

Editor's pick · Technology
Arxiv · Today

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

arXiv:2605.08374v1 Announce Type: new Abstract: Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently, i.e., evaluating retrieval quality in isolation without accounting for the dependency chains through which memories enable the creation of future memories. We introduce MemQ, which applies TD($\lambda$) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records which memories were retrieved when each new memory was created. Credit weight decays as $(\gamma\lambda)^d$ with DAG depth $d$, replacing temporal distance with structural proximity. We formalize the setting as an Exogenous-Context MDP, whose factored transition decouples the exogenous task stream from the endogenous memory store. Across six benchmarks, spanning OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, and expert-level QA, MemQ achieves the highest success rate on all six in generalization evaluation and runtime learning, with gains largest on multi-step tasks that produce deep and relevant provenance chains (up to +5.7~pp) and smallest on single-step classification (+0.77~pp) where single-step updates already suffice. We further study how $\gamma$ and $\lambda$ interact with the EC-MDP structure, providing principled guidance for parameter selection and future research. Code will be available soon.
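The core update rule is compact enough to sketch. The toy below is our illustration, not the authors' code (which is not yet released): a TD error computed at a new memory is propagated backward to its ancestors in the provenance DAG, scaled by $(\gamma\lambda)^d$ at depth $d$, so credit follows structural rather than temporal proximity. For simplicity each ancestor is credited once, at its shallowest depth.

```python
def propagate_credit(parents, q, leaf, delta, gamma=0.9, lam=0.8):
    """parents maps memory id -> the memories retrieved to create it."""
    frontier, depth, seen = [leaf], 0, set()
    while frontier:
        decay = (gamma * lam) ** depth   # (gamma * lambda) ** d
        nxt = []
        for m in frontier:
            if m in seen:
                continue
            seen.add(m)
            q[m] = q.get(m, 0.0) + decay * delta
            nxt.extend(parents.get(m, []))
        frontier, depth = nxt, depth + 1
    return q

# m3 was created using m1 and m2; m1 was created using m0.
parents = {"m3": ["m1", "m2"], "m1": ["m0"]}
q = propagate_credit(parents, {}, "m3", delta=1.0)
print(q)  # m3 gets 1.0; m1 and m2 each ~0.72; m0 ~0.52
```

Deep provenance chains are exactly where this helps: a memory whose descendants keep succeeding accumulates credit even if it is never the final retrieval, matching the paper's finding that gains are largest on multi-step tasks.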

AI Hardware · 3 articles
Editor's pick · Technology
ICO Optics · Yesterday

Korea’s AI Memory Dominance: Limits to Future AI Leadership

Years of manufacturing expertise and tech know-how keep Korea at the heart of AI hardware supply chains. That edge brings strong margins as AI demand keeps climbing, putting memory makers in the spotlight of the semiconductor world. But things are shifting. The industry’s focus is moving toward chip ...

Editor's pick · Technology
Arxiv · Today

Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI

arXiv:2605.08137v1 Announce Type: cross Abstract: Weight pruning is widely advocated for deploying Large Language Models on resource-constrained IoT and edge devices, yet its impact on model fairness remains poorly understood. We conduct a controlled empirical study of three instruction-tuned models (Gemma-2-9b-it, Mistral-7B-Instruct-v0.3, Phi-3.5-mini-instruct) across three pruning methods (Random, Magnitude, Wanda) at four sparsity levels (10-70%) on 12,148 BBQ bias benchmark items with 5 random seeds, totaling 2,368,860 inference records. Our results reveal a Smart Pruning Paradox: activation-aware pruning (Wanda) preserves perplexity nearly perfectly (just 3.5% increase at 50% sparsity for Mistral-7B), yet produces the highest bias amplification, with Stereotype Reliance Score increasing 83.7% and 47-59% of previously unbiased items developing new stereotypical behaviors at 70% sparsity. Random pruning destroys language capability entirely (perplexity exceeding $10^4$ and reaching $10^8$) but produces only random-chance bias. We further show that unstructured pruning provides zero storage savings and zero inference latency reduction on real edge hardware, undermining the primary motivation for its use in IoT deployment. Of 180 dense-vs-pruned comparisons, 141 (78.3%) are significant ($p < 0.05$) with mean $|h| = 0.305$. Published quantization studies report up to 21% of responses flipping between biased and unbiased states; our pruning results show transition rates nearly three times higher (47-59%), suggesting pruning poses a categorically greater risk to alignment than quantization. These findings demonstrate that perplexity-based evaluation provides false assurance of behavioral equivalence, and that IoT deployment pipelines require bias-aware validation before deploying pruned models at the edge.
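For readers unfamiliar with the baselines being compared, the two simplest pruning schemes from the study can be sketched in pure Python (an illustrative toy, not the paper's pipeline): magnitude pruning zeroes the smallest-magnitude weights, while random pruning zeroes an arbitrary subset. Note that neither changes the tensor's shape — the zeros are still stored and multiplied — which is the abstract's point about unstructured sparsity yielding no storage or latency savings on real edge hardware.

```python
import random

def magnitude_prune(w, sparsity):
    # Zero the `sparsity` fraction of weights with the smallest |w|.
    k = int(len(w) * sparsity)
    thresh = sorted(abs(x) for x in w)[k]
    return [0.0 if abs(x) < thresh else x for x in w]

def random_prune(w, sparsity, seed=0):
    # Zero a random `sparsity` fraction of weights, ignoring magnitude.
    rng = random.Random(seed)
    idx = set(rng.sample(range(len(w)), int(len(w) * sparsity)))
    return [0.0 if i in idx else x for i, x in enumerate(w)]

w = [0.05, -0.9, 0.3, -0.01, 0.7, 0.2]
print(magnitude_prune(w, 0.5))  # [0.0, -0.9, 0.3, 0.0, 0.7, 0.0]
```

Wanda, the third method studied, additionally weights each |w| by the norm of its input activations; the study's "Smart Pruning Paradox" is that this smarter criterion preserves perplexity best while amplifying bias most.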

AI Infrastructure & Compute · 11 articles
Editor's pick · Technology
Substack · Yesterday

COMPUTEX 2026: Why Taiwan Is Becoming the Strategic Center of the Global AI Supply Chain

This means Taiwan’s supply chain must evolve from pure manufacturing execution toward deeper co-development. Suppliers must support more customized platforms, faster engineering cycles, and more complex integration demands. The companies that can move from component supply to system-level collaboration will capture more value in the AI infrastructure cycle. ... Although COMPUTEX will be filled with AI chip ...

Editor's pick · Energy & Utilities
LevelFields · Yesterday

LevelFields — AI Infrastructure Boom Drives Demand for Utilities, Nuclear Power and Grid Expansion

Lumentum, which supplies optical ... revenue growth YoY and significant margin expansion. Management pointed specifically to rising demand tied to cloud computing, optical networking, and AI infrastructure buildouts. The impact is now spreading into utilities and power generation as well. Constellation Energy highlighted ...

Editor's pick · Technology
BNN Bloomberg · Yesterday

Market Outlook: AI infrastructure demand keeps stocks climbing

Hans Albrecht says AI infrastructure spending and rising inference demand continue to support markets near record highs.

Editor's pick · Technology
PR Newswire · Yesterday

MICROIP Unveils "Software-Driven Hardware" Strategy at EEC 2026, Partnering with Poland to Build a Resilient Edge AI & ASIC Supply Chain

/PRNewswire/ -- At the 2026 European Economic Congress (EEC 2026), MICROIP Chairman Dr. James Yang participated in a high-level dialogue at the "Poland-Taiwan...

Editor's pick · Technology
DIGITIMES · Today

Commentary: SEA semiconductor industry pivots towards AI as a strategic hub

SEMICON SEA 2026 drew heavy crowds, underscoring Southeast Asia's emergence as an indispensable "strategic hub" in the global AI compute supply chain. The region's semiconductor players are now moving from "capacity substitution" to "technological self-reliance" in what is shaping up to be ...

Editor's pick · PAYWALL · Energy & Utilities
Bloomberg · Yesterday

SoftBank in Talks for Major Data Center Project in France

SoftBank Group Corp. founder Masayoshi Son has held talks about unveiling an ambitious French AI data center project with President Emmanuel Macron in the coming weeks, according to people familiar with the matter.

Editor's pick · PAYWALL · Telecommunications
Bloomberg · Yesterday

What’s Next for Telecom and Digital Infrastructure

Grain Management CEO and founder David Grain discusses what's next for telecom and digital infrastructure. He sees real workloads absorbing infrastructure and says they're 'very' active in deploying capital. He speaks with Katie Greifeld and Romaine Bostick on "The Close." (Source: Bloomberg)

Editor's pick · Technology
Bebeez · Yesterday

Nscale extends funding momentum with €670 million for Norway AI infrastructure project

London-based AI infrastructure hyperscaler Nscale has announced an additional €670 million ($790 million) in financing to support continued development of its AI data centre in Narvik, Norway, reportedly the largest AI infrastructure project in the country. The financing was committed by ABN AMRO, DNB, Eksfin, Nordea and SEB. The committed financing includes an […]

Editor's pick · Technology
The Manila Times · Today

Kneron Warns the AI Industry Is Approaching a Massive Inference Infrastructure Bottleneck | The Manila Times

SAN DIEGO, May 12, 2026 (GLOBE NEWSWIRE) -- Kneron, the San Diego-based edge AI company developing full-stack inference infrastructure, says the artificial intelligence industry may be vastly underestimating the next major bottleneck of AI, and it has nothing to do with training larger models.

Editor's pick · Technology
Artificial Intelligence Newsletter | May 12, 2026 · Yesterday

Samsung SDS-led group selected for South Korea's national AI computing center

A Samsung SDS-led consortium has been selected to build and operate South Korea's 2.5-trillion-won national AI computing center, with construction beginning this year.

Editor's pick · Manufacturing & Industrials
OfficeChai · Yesterday

How Chinese PCB Manufacturers Are Supporting the AI Industry

The focus is not only on “who can make a PCB,” but on which companies appear relevant to the AI hardware supply chain through high-speed PCB, HDI PCB, AI server PCB, turnkey PCBA, and EMS support. AI computing creates a different kind of challenge for PCB manufacturing.

AI Models & Capabilities · 12 articles
Editor's pick · Financial Services
Arxiv · Today

Calibrating Behavioral Parameters with Large Language Models

arXiv:2602.01022v3 Announce Type: replace Abstract: Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent-scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases.

Editor's pick
Arxiv · Today

Embeddings for Preferences, Not Semantics

arXiv:2605.08360v1 Announce Type: new Abstract: Modern AI is opening the door to collective decision-making in which participants express their views as free-form text rather than voting on a fixed set of candidates. A natural idea is to embed these opinions in a vector space so that the substantial literature on facility location problems and fair clustering can be brought to bear. But standard text embeddings measure semantic similarity, whereas distances in facility location problems and fair clustering require what we call "preferential similarity": a participant's agreement with a piece of text should be inversely related to their distance from it. Off-the-shelf embeddings inherit a coarse preference signal through a correlation between semantic and preferential similarity, but fail to capture preferences when the correlation breaks. We formalize this as an invariance problem: text embedding models encode both a preference-relevant signal (stance and values) and semantic nuisance (style and wording), and the two are observationally correlated, so a geometry that relies on nuisance can appear preference-correct even when it is not. We show that synthetic training data designed to break this correlation provably shifts the optimal scorer away from nuisance-dominated cosine and significantly improves preference prediction across 11 online deliberation datasets.

Editor's pick · Technology
The Register · Yesterday

Microsoft researchers find AI models and agents can't handle long-running tasks

An intern who failed this much would be shown the door

Editor's pick · Healthcare
Arxiv · Today

MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction

arXiv:2605.08094v1 Announce Type: new Abstract: Accurate clinical diagnosis requires extensive domain knowledge and complex clinical reasoning capabilities. Although large language models (LLMs) hold great potential for clinical reasoning, their high computational and memory requirements limit their deployment in resource-constrained environments. Knowledge distillation (KD) can compress LLM capabilities into smaller models, but traditional KD merely transfers superficial answer patterns and fails to preserve the structured reasoning required for reliable diagnosis. To address this, we propose a two-stage distillation framework, MedThink, designed to cultivate robust clinical reasoning in small language models (SLMs). In the first stage, a teacher LLM screens data and injects domain-knowledge explanations to fine-tune a student model, establishing a knowledge foundation. In the second stage, the teacher evaluates the student's errors, generates reasoning chains linking knowledge to correct answers, and refines the student's diagnostic reasoning through a second round of fine-tuning. We evaluate MedThink on general medical benchmarks and a gastroenterology dataset comprising 955 question-answer pairs. Experiments demonstrate that MedThink outperforms six distillation strategies in all benchmarks: achieving an improvement of up to 12.7% over the student baseline in general tasks, and reaching a total top accuracy of 56.4% in gastroenterology evaluation. This indicates that iterative distillation centered on reasoning can significantly enhance the diagnostic accuracy and generalization capabilities of SLMs whilst maintaining computational efficiency. Our code and data are publicly available at https://github.com/destinybird/PrecisionBoost.

Editor's pick · Technology
VentureBeat · Yesterday

Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'

Is AI leaving the era of "turn-based" chat? Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio, and video remains the same: the human user provides an input, waits anywhere from milliseconds to minutes (or in some cases, for particularly tough queries, hours or days), and the AI model provides an output. But if AI is to really take on the load of jobs requiring natural interaction, it will need to do more than provide this kind of "turn-based" interactivity — it will ultimately need to respond more fluidly and naturally to human inputs, even responding while also processing the next human input, be it text or another format. That at least seems to be the contention of Thinking Machines, the well-funded AI startup founded last year by former OpenAI chief technology officer Mira Murati and former OpenAI researcher and co-founder John Schulman, among others. Today, the firm announced a research preview of what it deems "interaction models": a new class of native multimodal systems that treats interactivity as a first-class citizen of the model architecture rather than an external software "harness." The approach scores impressive gains on third-party benchmarks and reduces latency as a result. However, the models are not yet available to the general public or even enterprises — the company says in its announcement blog post: "In the coming months, we will open a limited research preview to collect feedback, with a wider release later this year."

'Full duplex' simultaneous input/output processing

At the heart of this announcement is a fundamental shift in how AI perceives time and presence. Current frontier models typically experience reality in a single thread; they wait for a user to finish an input before they begin processing, and their perception freezes while they generate a response.
In their blog post, the Thinking Machines researchers described the status quo as a limitation that forces humans to "contort themselves" to AI interfaces, phrasing questions like emails and batching their thoughts. To solve this "collaboration bottleneck," Thinking Machines has moved away from the standard alternating token sequence. Instead, they use a multi-stream, micro-turn design that processes 200ms chunks of input and output simultaneously. This "full-duplex" architecture allows the model to listen, talk, and see in real time, enabling it to backchannel while a user speaks or interject when it notices a visual cue—such as a user writing a bug in a code snippet or a friend entering a video frame. Technically, the model utilizes encoder-free early fusion. Rather than relying on massive standalone encoders like Whisper for audio, the system takes in raw audio signals as dMel and image patches (40x40) through a lightweight embedding layer, co-training all components from scratch within the transformer.

Dual model system

The research preview introduces TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters. Because real-time interaction requires near-instantaneous response times that often conflict with deep reasoning, the company has architected a two-part system:

The Interaction Model: Stays in a constant exchange with the user, handling dialog management, presence, and immediate follow-ups.

The Background Model: An asynchronous agent that handles sustained reasoning, web browsing, or complex tool calls, streaming results back to the interaction model to be woven naturally into the conversation.

This setup allows the AI to perform tasks like live translation or generating a UI chart while continuing to listen to user feedback—a capability demonstrated in the announcement video where the model provided typical human reaction times for various cues while simultaneously generating a bar chart.
Impressive performance on major benchmarks against other leading AI labs' fast interaction models

To prove the efficacy of this approach, the lab utilized FD-bench, a benchmark specifically designed to measure interaction quality rather than just raw intelligence. The results show that TML-Interaction-Small significantly outperforms existing real-time systems:

Responsiveness: It achieved a turn-taking latency of 0.40 seconds, compared to 0.57s for Gemini-3.1-flash-live and 1.18s for GPT-realtime-2.0 (minimal).

Interaction Quality: On FD-bench V1.5, it scored 77.8, nearly doubling the scores of its primary competitors (GPT-realtime-2.0 minimal scored 46.8).

Visual Proactivity: In specialized tests like RepCount-A (counting physical repetitions in video) and ProactiveVideoQA, Thinking Machines’ model successfully engaged with the visual world while other frontier models remained silent or provided incorrect answers.

Metric                      | TML-Interaction-Small | GPT-realtime-2.0 (min) | Gemini-3.1-flash-live (min)
Turn-taking latency (s)     | 0.40                  | 1.18                   | 0.57
Interaction Quality (Avg)   | 77.8                  | 46.8                   | 54.3
IFEval (VoiceBench)         | 82.1                  | 81.7                   | 67.6
Harmbench (Refusal %)       | 99.0                  | 99.5                   | 99.0

A potentially huge boon to enterprises — once the models are made available

If made available to the enterprise sector, Thinking Machines' interaction models would represent a fundamental shift in how businesses integrate AI into their operational workflows. A native interaction model like TML-Interaction-Small allows for several enterprise capabilities that are currently impossible or highly brittle with standard multimodal models: Current enterprise AI requires a "turn" to be completed before it can analyze data. In a manufacturing or lab setting, a native interaction model can monitor a video feed and proactively interject the moment it detects a safety violation or a deviation from a protocol — without waiting for the worker to ask for feedback.
The model's success in visual benchmarks like RepCount-A (accurate repetition counting) and ProactiveVideoQA (answering questions as visual evidence appears) suggests it could serve as a real-time auditor for high-stakes physical tasks. The primary friction in voice-based customer service is the 1–2 second "processing" delay common in 2026's standard APIs. Thinking Machines' model achieves a turn-taking latency of 0.40 seconds, roughly the speed of a natural human conversation. Because it handles simultaneous speech natively, an enterprise support bot could listen to a customer's frustration, provide "backchannel" cues (like "I see" or "mm-hmm") without interrupting the user, and offer live translation that feels like a natural conversation rather than a series of disjointed recordings. Standard LLMs lack an internal clock; they "know" time only if it is provided in a text prompt. Interaction models are natively time-aware, allowing them to manage time-sensitive processes like "Remind me to check the temperature every 4 minutes" or "Alert me if this process takes longer than the last one." This is critical for industrial maintenance and pharmaceutical research where timing is an essential variable.

Background on Thinking Machines

This release marks the second major milestone for Thinking Machines following the October 2025 launch of Tinker, a managed API for fine-tuning language models that lets researchers and developers control their data and training methods while Thinking Machines handles the infrastructure burden of distributed training. The company said Tinker supports both small and large open-weight models, including mixture-of-experts models, and early users included groups at Princeton, Stanford, Berkeley and Redwood Research.
At launch in early 2025, Thinking Machines framed itself as an AI research and product company trying to make advanced AI systems “more widely understood, customizable and generally capable.” In July 2025, Thinking Machines said it had raised about $2 billion at a $12 billion valuation in a round led by Andreessen Horowitz, with participation from Nvidia, Accel, ServiceNow, Cisco, AMD and Jane Street, described by WIRED as the largest seed funding round in history. The Wall Street Journal reported in August 2025 that rival tech CEO Mark Zuckerberg approached Murati about acquiring Thinking Machines Lab and, after she declined, Meta pursued more than a dozen of the startup’s roughly 50 employees. In March and April 2026, the company also became known for its compute ambitions: it announced a Nvidia partnership to deploy at least one gigawatt of next-generation Vera Rubin systems, then expanded its Google Cloud relationship to use Google’s AI Hypercomputer infrastructure with Nvidia GB300 systems for model research, reinforcement learning workloads, frontier model training and Tinker. By April 2026, Business Insider reported that Meta had hired seven founding members from Thinking Machines, including Mark Jen and Yinghai Lu, while another Thinking Machines researcher, Tianyi Zhang, also moved to Meta. The same reporting said Joshua Gross, who helped build Thinking Machines’ flagship fine-tuning product Tinker, had joined Meta Superintelligence Labs, and that the company had grown to about 130 employees despite the departures. Thinking Machines was not simply losing people, however: it also hired Meta veteran Soumith Chintala, creator of PyTorch, as CTO, and added other high-profile technical talent such as Neal Wu. TechCrunch separately reported in April 2026 that Weiyao Wang, an eight-year Meta veteran who worked on multimodal perception systems, had joined Thinking Machines, underscoring that the talent flow was not one-way. 
Thinking Machines previously stated it was committed to "significant open source components" in its releases to empower the research community. It's unclear whether these new interaction models will fall under the same ethos and release terms. But one thing is certain: by making interactivity native to the model, Thinking Machines believes that scaling a model will now make it both smarter and a more effective collaborator.

Editor's pick · Technology
Arxiv · Today

Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

arXiv:2605.08354v1 Announce Type: new Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies and exposing vulnerabilities to reward hacking. While recent Rubrics-as-Reward (RaR) methods attempt to recover this structure through explicit criteria, generating rubrics that are simultaneously reliable, scalable, and data-efficient remains an open problem. We introduce Auto-Rubric as Reward (ARR), a framework that reframes reward modeling from implicit weight optimization to explicit, criteria-based decomposition. Before any pairwise comparison, ARR externalizes a VLM's internalized preference knowledge as prompt-specific rubrics, translating holistic intent into independently verifiable quality dimensions. This conversion of implicit preference structure into inspectable, interpretable constraints substantially suppresses evaluation biases including positional bias, enabling both zero-shot deployment and few-shot conditioning on minimal supervision. To extend these gains into generative training, we propose Rubric Policy Optimization (RPO), which distills ARR's structured multi-dimensional evaluation into a robust binary reward, replacing opaque scalar regression with rubric-conditioned preference decisions that stabilize policy gradients. On text-to-image generation and image editing benchmarks, ARR-RPO outperforms pairwise reward models and VLM judges, demonstrating that explicitly externalizing implicit preference knowledge into structured rubrics achieves more reliable, data-efficient multimodal alignment, revealing that the bottleneck is the absence of a factorized interface, not a deficit of knowledge.

Editor's pick · PAYWALL · Technology
Washington Post · Yesterday

Analysis | See the hidden rules behind AI. Then use them to rewrite this article. - Washington Post

Behind the scenes, artificial intelligence companies invisibly add thousands of words of instructions to every conversation you have with a chatbot to steer its behavior. They include phrases like “Aim for readable, accessible responses” and “You must avoid providing …

Editor's pick · Technology
Daily AI News May 11, 2026: AI Is a Social Sport · Yesterday

ZAYA1-74B-Preview: Scaling Pretraining on AMD

Zyphra's ZAYA1-74B-Preview is an open-weights MoE large language model trained end-to-end on AMD infrastructure, currently available as a pre-RL preview.

Editor's pick · Energy & Utilities
Arxiv · Today

Forecasting Residential Heating and Electricity Demand with Scalable, High-Resolution, Open-Source Models

arXiv:2505.22873v2 Announce Type: replace Abstract: We present a novel framework for high-resolution forecasting of residential heating demand and non-heating electricity demand using probabilistic deep learning models. Because our models are trained on electricity consumption from a predominantly gas-heated region, the learned electricity demand patterns primarily reflect non-heating end uses such as lighting, appliances, and cooling. We focus specifically on providing hourly building-level electricity and heating demand forecasts for the residential sector. Leveraging multimodal building-level information -- including data on building footprint areas, heights, nearby building density, nearby building size, land use patterns, and high-resolution weather data -- and probabilistic modeling, our methods provide granular insights into demand heterogeneity. Validation at the building level underscores a step change improvement in performance relative to NREL's ResStock model, which has emerged as a research community standard for residential heating and electricity demand characterization. In building-level heating and electricity estimation backtests, our probabilistic models respectively achieve RMSE scores 18.8% and 27.6% lower than those based on ResStock, with probabilistic forecast quality measured via WIS improving by 59% for both applications. By offering an open-source, scalable, high-resolution platform for demand estimation and forecasting, this research advances the tools available for policymakers and grid planners, contributing to the broader effort to decarbonize the U.S. building stock and meeting climate objectives.

Editor's pick · Technology
Arxiv · Today

Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits

arXiv:2605.08200v1 Announce Type: new Abstract: A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region should imply a confident, calibrated answer. We test this Attention-Confidence Assumption directly. We instrument three open-weight VLM families (LLaVA-1.5, PaliGemma, Qwen2-VL; 3-7B parameters) with a unified mechanistic pipeline -- the VLM Reliability Probe (VRP) -- that compares attention structure, generation dynamics, and hidden-state geometry against a single correctness label. Three results emerge. (i) Attention structure is a near-zero predictor of correctness (R_pb(C_k,y)=0.001, 95% CI [-0.034,0.036]; R_pb(H_s,y)=-0.012, [-0.047,0.024] on a pooled n=3,090 split), even though attention remains causally necessary for feature extraction (top-30% patch masking drops accuracy by 8.2-11.3 pp). (ii) Hidden-state signals are far stronger predictors, reaching >0.95 on POPE for two of three families, and self-consistency at K=10 is the strongest behavioral predictor we measure at 10x inference cost (R_pb=0.43). (iii) Causal neuron-level ablations expose a sharp architectural split with direct monitor-design implications: late-fusion LLaVA concentrates reliability in a fragile late bottleneck (-8.3 pp object-identification accuracy after top-5 probe-neuron ablation), whereas early-fusion PaliGemma and Qwen2-VL distribute it widely and absorb destruction of ~50% of their peak-layer hidden dimension with <=1 pp degradation. The takeaway is narrow but consequential: in 3-7B VLMs, reliability is read more reliably off hidden-state geometry, layer-wise margin formation, and sparse late-layer circuits than off attention-map sharpness.
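The R_pb values quoted throughout are point-biserial correlations between a continuous signal and the binary correctness label. Equivalent to Pearson correlation with a 0/1 variable, it is easy to compute; the sketch below is a generic implementation, not the VRP pipeline.

```python
import statistics as st

def point_biserial(signal, label):
    # R_pb between a continuous signal (e.g. attention entropy) and a
    # binary correctness label y in {0, 1}.
    g1 = [s for s, y in zip(signal, label) if y == 1]
    g0 = [s for s, y in zip(signal, label) if y == 0]
    p = len(g1) / len(signal)                 # fraction of correct answers
    sd = st.pstdev(signal)                    # population std of the signal
    return (st.mean(g1) - st.mean(g0)) / sd * (p * (1 - p)) ** 0.5

# A signal that tracks correctness gives R_pb near 1; an uninformative one
# (like attention sharpness in the paper, R_pb ~ 0.001) gives near 0.
print(point_biserial([1, 2, 3, 4], [0, 0, 1, 1]))  # ~0.894
```

The paper's near-zero R_pb for attention structure, against 0.43 for self-consistency, is what grounds the claim that attention-map sharpness is a poor reliability monitor.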

Editor's pick · Technology
Arxiv · Today

On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

arXiv:2605.08368v1 Announce Type: new Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction is too coarse. What matters is whether a training procedure increases the probability of behaviors the pretrained model could already produce, or whether it changes what the model can practically reach. We argue that post-training research should distinguish between capability elicitation and capability creation. We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation. We develop this argument through a free-energy view of post-training. SFT and RL can both be seen as reweighting a pretrained reference distribution, only with different external signals. Demonstration signals define low-energy behavior for SFT, and reward signals define low-energy behavior for RL. When the update remains close to the base model, the main effect is local reweighting, not capability creation. Within this framework, the central question is no longer whether post-training is framed as SFT or RL, but whether it reweights behaviors already within reach, or instead expands the model's reachable behavioral space through search, interaction, tool use, or the incorporation of new information.
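The free-energy view in the abstract amounts to a KL-regularized reweighting: the post-trained policy is proportional to p_ref(x) · exp(r(x)/β). The toy below is our illustration (not the paper's code), and it makes the elicitation-vs-creation distinction concrete: behaviors outside the reference support keep zero probability no matter how large their reward.

```python
import math

def reweight(p_ref, reward, beta=1.0):
    # KL-regularized update: p(x) proportional to p_ref(x) * exp(r(x) / beta).
    w = {x: p * math.exp(reward[x] / beta) for x, p in p_ref.items()}
    z = sum(w.values())                      # partition function
    return {x: v / z for x, v in w.items()}

p_ref = {"a": 0.7, "b": 0.3, "c": 0.0}       # "c" is outside the support
post = reweight(p_ref, {"a": 0.0, "b": 1.0, "c": 5.0})
print(post)  # mass shifts toward "b"; "c" stays exactly 0
```

In the paper's terms, both SFT and RL perform this kind of reweighting when the update stays close to the base model; capability creation requires changing the support itself, for example through search, interaction, or tool use.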

Editor's pick
Arxiv · Today

Belief or Circuitry? Causal Evidence for In-Context Graph Learning

arXiv:2605.08405v1 Announce Type: new Abstract: How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy graph random-walk across two competing graph structures. This task's answer is, in principle, decidable: either the model tracks global topology, or it copies local transitions. We present two lines of evidence that neither account alone is sufficient. First, reconstructing the internal representation structure via PCA reveals that at intermediate mixture ratios, both graph topologies are encoded in orthogonal principal subspaces simultaneously. This pattern is difficult to reconcile with purely local transition copying. Second, residual-stream activation patching and graph-difference steering causally intervene on this graph-family signal: late-layer patching almost fully transfers the clean graph preference, while linear steering moves predictions in the intended direction and fails under norm-matched and label-shuffled controls. Taken together, our findings are most consistent with a dual-mechanism account in which genuine structure inference and induction circuits operate in parallel.

AI Research & Science (2 articles)
Editor's pick
Arxiv· Today

Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI

arXiv:2605.10425v1 Announce Type: new Abstract: AI systems can now cheaply generate plausible scientific artifacts such as papers, reviews, and surveys. This creates a risk of 'epistemic pollution' in our scientific systems, where unreliable but plausible-looking artifacts can accumulate faster than the system can filter them out. The problem is structural: the epistemic infrastructure of science was calibrated to a world where producing a plausible artifact required substantial expertise, labor, and time, so generation cost itself served as a rough filter; AI weakens that filter without comparably lowering verification cost. We argue that AI-era science should treat this as an engineering problem: redesigning epistemic infrastructure to rebalance the costs of generation and verification. The current paper-centered system makes verification expensive: papers compress long-context scientific logic into prose, forcing reviewers, human or AI, to reconstruct underlying argument structure before they can evaluate it. As one step in this direction, we propose blueprints as preliminary epistemic infrastructure: structured, decomposed research artifacts that represent claims, evidence, assumptions, and definitions as typed graph components. Blueprints are designed to trade an upfront generation cost for cheaper, more local, more distributed verification downstream. We have instantiated the proposal in a proof-of-concept prototype.
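The abstract describes blueprints as typed graphs of claims, evidence, assumptions, and definitions. A minimal sketch of that data structure, with hypothetical node types and relation names (the paper's actual schema may differ), might look like:

```python
from dataclasses import dataclass, field

# Hypothetical node types, loosely following the abstract's list of
# typed graph components.
NODE_TYPES = {"claim", "evidence", "assumption", "definition"}

@dataclass
class Node:
    node_id: str
    node_type: str
    text: str

@dataclass
class Blueprint:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, relation, dst)

    def add(self, node_id, node_type, text):
        assert node_type in NODE_TYPES, f"unknown type: {node_type}"
        self.nodes[node_id] = Node(node_id, node_type, text)

    def link(self, src, relation, dst):
        assert src in self.nodes and dst in self.nodes
        self.edges.append((src, relation, dst))

    def support_for(self, claim_id):
        """Local verification: list evidence nodes linked to one claim,
        without re-reading the whole document."""
        return [self.nodes[s] for s, rel, d in self.edges
                if d == claim_id and rel == "supports"
                and self.nodes[s].node_type == "evidence"]

bp = Blueprint()
bp.add("c1", "claim", "The proposed method reduces verification cost.")
bp.add("e1", "evidence", "Prototype study shows cheaper local checks.")
bp.link("e1", "supports", "c1")
```

The design choice the paper motivates is visible even at this scale: because relations are explicit and typed, a verifier can check one claim's support locally instead of reconstructing the argument from prose.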

AI Security & Cybersecurity (11 articles)
Editor's pickTelecommunications
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

Singapore mobilizes whole-of-country response to frontier AI cyber threats

Singapore has directed critical infrastructure and telecommunications operators to bolster cybersecurity in response to the growing threat of AI-accelerated cyberattacks.

Editor's pickTechnology
DIGITIMES· Yesterday

China's cybersecurity AI charges ahead despite US model lockout

AI is upending the world of cybersecurity, as more capable models and agentic capabilities bring about new ways to exploit vulnerabilities, along with new ways to discover and patch them. While major US AI companies have so far led the race, Chinese firms are also using AI to stay competitive ...

Editor's pickPAYWALLTechnology
NYT· Yesterday

Google Says Criminal Hackers Used A.I. to Find a Major Software Flaw

The company said that it had identified, for the first time, hackers using artificial intelligence to discover an unknown bug. The attempted attack represents “a taste of what’s to come,” one expert said.

Editor's pickTechnology
Guardian· Yesterday

AI-powered hacking has exploded into industrial-scale threat, Google says

Criminal groups and state-linked actors appear to be using commercial models to refine and scale up attacks. In just three months, AI-powered hacking has gone from a nascent problem to an industrial-scale threat, according to a report from Google. The findings from Google’s threat intelligence group add to an intensifying, global discussion about how the newest AI models are extremely adept at coding, and becoming extremely powerful tools for exploiting vulnerabilities in a broad array of software systems.

Editor's pickTechnology
Daily Brew· Yesterday

AI tool poisoning exposes a major flaw in enterprise agent security

Researchers have identified a vulnerability where AI tools can be 'poisoned,' creating significant security risks for enterprise-level agents.

Editor's pick
Arxiv· Today

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play

arXiv:2605.08427v1 Announce Type: new Abstract: Self-play red-teaming is an established approach to improving AI safety in which different instances of the same model play attacker and defender roles in a zero-sum game, i.e., where the attacker tries to jailbreak the defender; if self-play converges to a Nash equilibrium, the model is guaranteed to respond safely within the settings of the game. Although the parameter sharing enforced by the use of the same model for the two roles improves stability and performance, it introduces fundamental theoretical and architectural limitations. We show that the set of Nash equilibria that can be reached corresponds to a broad class of behaviours that includes trivial always-refuse strategies and oracle-like defenders, thus limiting practical applicability. We then show that when attacker and defender share and update the same base model, the dynamics collapse to self-consistency, so that attacks do not enforce adversarial pressure on the defender. In response, we propose Anchored Bipolicy Self-Play, which trains distinct role-specific LoRA adapters on top of a frozen base model, thereby maintaining stable optimisation while preserving adversarial pressure through explicit role separation. In relation to standard self-play, we show up to 100x greater parameter efficiency than finetuning and consistent improvements in safety compared to self-play fine-tuned models. We evaluate on Qwen2.5-{3B, 7B, 14B}-IT models across widely used safety benchmarks, showing improved robustness without loss of reasoning ability. Cross-play experiments further show that our attacker and defender models are superior to self-play in terms of adversarial defence and safety.

Adoption, Deployment & Impact (34 articles)
AI Adoption Barriers & Enablers (13 articles)
Editor's pickProfessional Services
Arxiv· Today

Human Learning about AI

arXiv:2406.05408v3 Announce Type: replace Abstract: We study 'Human Projection' (HP): people's tendency to evaluate AI using the same frameworks they use for humans, treating features such as task difficulty and the reasonableness of mistakes as diagnostic of overall ability. We formalize HP and its consequences for equilibrium adoption, testing its predictions experimentally. First, people project human difficulty onto AI, overestimating performance on human-easy tasks, underestimating it on human-hard ones, and over-updating after easy failures and hard successes, leading to systematic misspecification when AI performance is jagged rather than human-ordered. Second, HP interprets observed performance through a single ability index, inducing all-or-nothing adoption even when AI outperforms humans on only some tasks; experimentally stripping AI of human-like cues weakens cross-task generalization and reduces over-adoption. Finally, a field experiment with a parenting-advice chatbot shows that less humanly reasonable mistakes cause larger drops in trust and future engagement. Anthropomorphic AI design can amplify HP, misaligning beliefs and distorting adoption.

Editor's pick
MIT Technology Review· Yesterday

Fostering breakthrough AI innovation through customer-back engineering

Despite years of digitization, organizations capture less than one-third of the value expected from digital investments, according to McKinsey research. That’s because most big companies begin with technological capabilities and bolt applications onto them, rather than starting with customer needs and working backward to technology solutions. Not prioritizing the customer can create fragmented solutions; disjointed…

Editor's pickEnergy & Utilities
Arxiv· Today

From Expansion to Consolidation: Socio-Spatial Contagion Dynamics in Off-Grid PV Adoption

arXiv:2605.09642v1 Announce Type: new Abstract: In traditional rural societies, where social ties are embedded in physical space, the diffusion of emerging technologies may be amplified through socio-spatial contagion (SSC). Such processes may play a key role in accelerating residential PV adoption in off-grid regions. Yet empirical evidence on SSC in PV adoption remains largely limited to affluent, grid-connected settings, while off-grid regions often lack systematic installation records. To address these gaps, we use a deep learning segmentation model to extract PV installations from a decade-long series of remote sensing imagery across 507 off-grid settlement clusters (hereafter, communities). This enables data-driven spatio-temporal point pattern inference of SSC in data-scarce contexts. SSC is quantified through the range and intensity of clustering of new installations around prior adopters, and the dynamics of these dimensions are linked to adoption outcomes. We found that SSC is nearly ubiquitous, often spanning most of the community's spatial extent, while exhibiting substantial heterogeneity in intensity. Although SSC intensifies over time, its effects remain temporally concentrated, peaking within 1 to 2 years of nearby installations and weakening thereafter. SSC intensity is positively associated with adoption rates in both cross-sectional and temporal analyses. However, the relationship between SSC range and adoption changes over time - in early diffusion phases, adoption growth is associated with range expansion, whereas in later phases it is associated with range contraction. This shift reflects a transition from clustering to consolidation of installations. These findings highlight the potential of seeding interventions to accelerate PV diffusion in off-grid regions.

Editor's pickTechnology
Forbes· Yesterday

Council Post: AI Infrastructure Is Scaling Fast. Decision-Making Isn’t

AI infrastructure is scaling faster than enterprise decision-making. And that gap is becoming the real bottleneck.

Editor's pick
AVIXA Xchange· Yesterday

The Real Barriers to AI Adoption Aren’t What You Think | AVIXA Xchange

Budget constraints and ROI justification tend to dominate AI adoption conversations in boardrooms, at industry events, and in many vendor pitches.

Editor's pick
Prokerala· Yesterday

86 pc Indian employees use AI, but ROI and governance lag: Report

While 86 per cent of employees in India use artificial intelligence at work, only 35 per cent say AI's return on investment has met or exceeded expectations....

Editor's pick
Raconteur· Yesterday

Why AI governance is a European imperative - Raconteur

If the German Autobahn is the fastest road system in Europe, then AI is the technological equivalent, and we’re all in the fast lane. Experimental pilots are rapidly evolving into production-level deployments, delivering productivity gains and decision-making improvements.

Editor's pickManufacturing & Industrials
Arxiv· Today

Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons

arXiv:2605.08208v1 Announce Type: new Abstract: Digital Twins (DTs) are increasingly deployed across application domains, yet the treatment of trust-related issues remains unevenly addressed. To examine whether and how trust is discussed in the current landscape, we conducted a systematic review of existing DT review papers and a mapping of their abstracts. Seven trust-related challenges and seven trust-enhancing strategies were defined to guide the analysis, enabling the trust focus of each paper to be characterised. By aggregating the challenges and strategies referenced across domains, distinct patterns of emphasis were observed. With certain domains consistently sharing similar spectrum of trust concerns, four integration types, including human-centred, safety-critical, context-specific, and technologically-driven, were identified as emergent categories reflecting how trust is prioritised in different deployment contexts. Drawing on the characteristics of these types, several preliminary directions for future research were proposed. These include the development of trust-by-design principles to inform early-stage decision-making, the inclusion of trust metadata in platform schemas to prompt systematic developer consideration of trust factors, and the exploration of how architectural choices, such as federated DTs, influence user trust.

Editor's pickProfessional Services
ITWeb· Yesterday

The skills gap quietly undermining your AI strategy | ITWeb

The AI conversation is dominated by tools, infrastructure and innovation – but without the skills to support it, it quickly becomes an operational risk.

Editor's pickManufacturing & Industrials
Canadian Mining Journal· Yesterday

Caterpillar targets mining skills gap with US$1M challenge - Canadian Mining Journal

As mining and heavy industry accelerate their adoption of automation, digitalization and advanced technologies, Caterpillar is looking to address another growing challenge […]

Editor's pickManufacturing & Industrials
Auto Remarketing· Yesterday

Inside the ‘architectural mismatch’ between AI capabilities & dealer needs | Auto Remarketing

Explore the latest trends and insights in the automotive remarketing industry. Stay updated on strategies for franchise and used car dealers.

Editor's pickTransportation & Logistics
Daily Brew· Yesterday

Do City Delivery Drones Make Sense? No One Knows, but They're Flying Over NYC

A look at the current state of drone delivery services in New York City and the uncertainty surrounding their long-term viability and impact.

Editor's pickTechnology
⚙️ Why AI is about to move from cloud to edge· Yesterday

Podcast: Why the future of AI is hybrid and not cloud

In this episode, we explore why the next phase of AI could run across personal devices and discuss the industry's biggest challenges, including affordability and privacy.

AI Applications (7 articles)
Editor's pickMedia & Entertainment
Arxiv· Today

When 'For You' Isn't For You: Measuring User Agency in TikTok's Algorithmic Feed

arXiv:2605.10690v1 Announce Type: new Abstract: The short-form video-sharing service TikTok has become an important platform in the social media landscape, with much of its popularity owed to its algorithmically-driven "For You Page" (FYP). This feature serves as the "home screen" for the platform and provides a personalized feed of content for each user. Unlike other social media services, where new users start their journey by explicitly signaling whom they choose to friend or follow, the TikTok FYP algorithm instead begins making inferences based on implicit signals, such as how long they watch particular videos. As a result, users have less explicit control over what content they see, and concerns have been raised about the impact on users (e.g., the delivery of potentially harmful content). In this work, we investigate the extent to which users have control over the content they see on the FYP on TikTok. We first develop novel techniques to study the TikTok mobile app, introducing a new avenue for conducting controlled experiments that enable us to send both explicit and implicit signals on the app. We then use these techniques to study the FYP algorithm based on accounts we control. We find that the FYP algorithm is sensitive to both types of signals, changing the amount of personalized content the account sees. However, we find that users may have difficulty convincing the FYP algorithm to stop showing content the user wishes to no longer see: the most effective explicit signal, marking a video as 'Not Interested', is unintuitively buried in the interface. Worse, we find that once accounts cease to indicate disinterest in a topic, many find their feeds dominated by such content again.

Editor's pickProfessional Services
Arxiv· Today

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

arXiv:2605.08220v1 Announce Type: new Abstract: The automated extraction of data from scientific charts is a critical task for large-scale literature analysis. While multimodal Large Language Models (LLMs) show promise, their accuracy on non-standardized charts remains a challenge. This raises a key research question: what is the most effective strategy to improve model performance, high-level semantic priming or low-level spatial priming? This paper presents a comparative investigation into these two distinct strategies. We describe our exploratory experiments with semantic methods, such as a two-stage metadata-first framework and Chain-of-Thought, which failed to produce a statistically significant improvement. In contrast, we present a simple but highly effective spatial priming method: overlaying a coordinate grid onto the chart image before analysis. Our quantitative experiment on a synthetic dataset demonstrates that this grid-based approach provides a statistically significant reduction in data extraction error (SMAPE reduced from 25.5% to 19.5%, p < 0.05) compared to a baseline. We conclude that for the current generation of multimodal models, providing explicit spatial context is a more effective and reliable strategy than high-level semantic guidance for this class of tasks.
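The grid-overlay idea is simple enough to sketch. The toy version below draws grid lines on a 2D list standing in for a chart image; a real pipeline would draw on the actual bitmap (e.g., with an imaging library) before sending it to the multimodal model, and the spacing and line value here are arbitrary choices, not the paper's settings:

```python
def overlay_grid(image, spacing, value=0):
    """Draw horizontal and vertical grid lines every `spacing` pixels.

    `image` is a 2D list of pixel intensities; the original is left
    untouched and a gridded copy is returned. The grid gives the model
    explicit coordinate anchors for reading off data values.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(0, h, spacing):       # horizontal lines
        for x in range(w):
            out[y][x] = value
    for x in range(0, w, spacing):       # vertical lines
        for y in range(h):
            out[y][x] = value
    return out

# Tiny all-white "chart" (8x8) with a grid line every 4 pixels.
img = [[255] * 8 for _ in range(8)]
gridded = overlay_grid(img, spacing=4)
```

The point of the technique is that the model no longer has to infer pixel-to-axis correspondence; the grid makes the coordinate frame visible in the image itself.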

Editor's pickEducation
Arxiv· Today

Million Tutoring Moves (MTM): An Open Multimodal Dataset for the Science of Tutoring

arXiv:2605.08092v1 Announce Type: new Abstract: We introduce the Million Tutoring Moves (MTM) project, an open dataset initiative aimed at advancing the science of tutoring through large-scale, reusable, and multimodal interaction data. MTM is developed within the National Tutoring Observatory (NTO), a research infrastructure designed to study authentic tutoring interactions and translate them into actionable insights for research, practice, and AI-powered educational technology development. In this paper, we present the vision behind MTM and describe MTM v1, an initial release consisting of 4,654 math tutoring transcripts from a U.S.-based nonprofit online tutoring platform. MTM v1 serves as a first step toward a broader repository that is safe, open, large-scale, broad-coverage, and multimodal. By making tutoring interactions systematically observable and analyzable, MTM aims to support research on instructional processes, improve tutoring practice, and enable the development of AI systems grounded in real educational interactions.

Editor's pickManufacturing & Industrials
ArchDaily· Yesterday

Rethinking the Architecture Firm for the AI Era | ArchDaily

Learn how AI-driven tools are transforming how architects work today, enhancing coordination and research through collaborative workflows.

AI Measurement & Evaluation (2 articles)
Editor's pickHealthcare
Arxiv· Today

Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare

arXiv:2605.08445v1 Announce Type: new Abstract: AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard training and validation datasets were never designed to capture. Evaluating these systems requires benchmarks: structured combinations of tasks, datasets, and metrics that enable reproducible, comparable measurement of what a model can do. The central challenge in healthcare AI is not performance alone, but the absence of systematic methods to measure reliability, safety, and clinical relevance under real-world conditions. Most existing benchmarks test what a model knows; too few test whether it can perform reliably and without failing across the full complexity of real clinical tasks. Current benchmarks have accumulated through ad hoc dataset construction optimized for narrow task performance: frontier models achieve near-perfect scores on medical licensing examinations, but when evaluated across real clinical tasks, performance degrades sharply, scoring 0.74–0.85 on documentation, 0.61–0.76 on clinical decision support, and only 0.53–0.63 on administrative and workflow tasks (MedHELM). High benchmark scores give a false sense of deployment readiness, and the gap between performance and utility widens precisely as AI systems take on more consequential clinical roles. Without a principled framework for benchmark design, the field cannot determine whether poor clinical performance reflects model limitations or failures in how performance is being measured.

AI Productivity Evidence (4 articles)
Editor's pickPAYWALLTechnology
FT· Today

Amazon staff use AI tool for unnecessary tasks to inflate usage scores

In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb company’s AI leaderboard

Editor's pickEducation
Arxiv· Today

Understanding Student Effort Using Response-Time Propensities During Problem Solving

arXiv:2605.08943v1 Announce Type: new Abstract: Adaptive learning systems can produce substantial learning gains, yet many students engage for too brief or too superficial a period to benefit. A central obstacle is measuring effort. Effort during multi-step problem solving is rarely directly observed, and common log-based proxies, such as time on task, cannot distinguish between a student working carefully and a student encountering a harder problem. We examine step-to-step response time as a scalable effort signal by modeling trait-like differences in students' typical response timing during tutoring (while adjusting for skill difficulty). Using step-level logs from eight classroom deployments of algebra tutoring systems (2020 to 2023) across six U.S. schools (794 students), we estimate student- and knowledge-component-level propensities using hierarchical models and relate them to learning efficiency, defined as performance improvement per completed solution step. Response-time propensities show moderate to strong stability within students, supporting their use as an individual differences measure beyond correctness. At the same time, their relationship to learning is not uniform but conditional on the learner and context. Slower propensities predict greater learning efficiency for higher-proficiency students, consistent with constructive processing, whereas for lower-proficiency students, slower propensities are weakly related or even negative, consistent with unproductive struggle or idling. These associations are strongest early in practice sequences and attenuate later in the class period, highlighting an actionable window for detecting emerging disengagement and low persistence. Overall, response-time propensities provide a practical way to incorporate temporal process data into learner models and to target adaptive supports when effort is most diagnostic.

Geopolitics, Policy & Governance (22 articles)
AI Geopolitics (4 articles)
AI Policy & Regulation (15 articles)
Editor's pickPAYWALLGovernment & Public Sector
Washington Post· Today

Governments can’t agree on what AI actually is

One core reason that global action around AI has been poor is that the world does not agree on what AI is. First, there is clearly a definitional problem. When some people refer to artificial intelligence, they think almost exclusively about ChatGPT or large language models.

Editor's pick
Arxiv· Today

NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims

arXiv:2605.08192v1 Announce Type: new Abstract: Frontier AI safety claims - published assertions that a highly capable general-purpose model is below a threshold of concern, adequately mitigated, or suitable for release - increasingly shape model deployment, governance, and public trust. Yet the artefacts needed to evaluate them are routinely withheld, producing an evidential inversion: the most consequential claims in AI safety are often the least reproducible. This position paper argues that NeurIPS should require reproducibility standards for papers making such claims, treating non-reproducibility not as a transparency preference but as an evaluation-methodology failure. The 2026 International AI Safety Report [Bengio et al., 2026] concludes that reliable pre-deployment safety testing has become harder to conduct and that models now distinguish test from deployment contexts; the 2025 Foundation Model Transparency Index [Wan et al., 2025] reports a sector-average transparency score of 40/100 with no major developer adequately disclosing train-test overlap; contemporaneous measurement-theory work shows that attack-success-rate comparisons across systems are often founded on low-validity measurements [Chouldechova et al., 2025]. We propose a three-tier disclosure framework, distinguishing public, controlled, and claim-restricted disclosure, paired with a mandatory claim inventory, scope statements, and a phased implementation path with graduated sanctions. The framework treats secrecy and openness as endpoints of a spectrum, with controlled review (via a federated colloquium of qualified secure-review hosts) covering claims whose artefacts cannot be released publicly, and right-scaling claims whose artefacts cannot be reviewed even confidentially. The standard the community applies to its most consequential claims should be at least as high as the standard it applies to its least.

Editor's pickTechnology
Reuters· Yesterday

EU says OpenAI offers to open access to cybersecurity model, Anthropic not there yet | Reuters

The European Commission on Monday welcomed an offer by U.S. artificial intelligence giant OpenAI to provide open access to its cybersecurity features, but said its rival Anthropic has not yet gone so far.

Editor's pickGovernment & Public Sector
Arxiv· Today

Social Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germany

arXiv:2605.10234v1 Announce Type: new Abstract: We study how four widely used large language models, namely Claude, GPT-4o, DeepSeek and Grok, distribute a fixed national social budget across twelve macro-areas of public expenditure under two European national contexts, Spain and Germany. Each combination of model and country is queried six times under identical prompts and generation parameters, producing forty-eight independent allocations that are compared against approximate Organisation for Economic Co-operation and Development (OECD) reference budgets and against each other. We formalise five hypotheses regarding geopolitical bias, housing under-allocation, structural convergence, sensitivity to national context, and under-representation of politically sensitive categories. The differences between models are then validated through Kruskal-Wallis tests on each macro-area, with post-hoc Mann-Whitney U comparisons under Bonferroni correction, and complemented by an analysis of pairwise Pearson correlations and a lexical examination of the textual justifications produced by each model. The results show that all four models share a systematic implicit social policy that diverges from real European spending structures: pensions are under-allocated by a factor close to three, while housing and employment are over-allocated by factors of four and two respectively. The principal axis of differentiation between models is not geopolitical, since Claude and DeepSeek are the most correlated pair across both countries, but rather a contrast between concentration and dispersion of the budget. Only Claude exhibits substantive sensitivity to the national context. The conclusions delimit the conditions under which language models may responsibly support, but not replace, expert deliberation in public budgeting.
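The validation pipeline in this abstract (Kruskal-Wallis across models, then post-hoc pairwise Mann-Whitney U tests under Bonferroni correction) hinges on the multiple-comparison step. A minimal sketch of Bonferroni-corrected post-hoc testing, with made-up p-values rather than the paper's results:

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for m pairwise comparisons:
    adjusted p = min(1, m * p); reject the null when adjusted p < alpha.

    `p_values` maps each comparison (e.g., a model pair) to its raw
    p-value from a pairwise test such as Mann-Whitney U.
    """
    m = len(p_values)
    adjusted = {pair: min(1.0, m * p) for pair, p in p_values.items()}
    rejected = {pair for pair, p in adjusted.items() if p < alpha}
    return adjusted, rejected

# Hypothetical raw p-values for one macro-area, one per model pair
# (illustrative numbers only, not taken from the paper).
raw = {
    ("Claude", "GPT-4o"): 0.004,
    ("Claude", "DeepSeek"): 0.400,
    ("GPT-4o", "Grok"): 0.020,
}
adjusted, rejected = bonferroni(raw)
```

Note the conservatism this buys: a raw p of 0.02 survives an uncorrected 0.05 threshold but not the corrected one here, which is exactly why post-hoc comparisons after an omnibus test are reported with a correction.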

Editor's pickPAYWALLTelecommunications
FT· Today

US communications regulator targets Chinese tech for security risks

FCC chair Brendan Carr cracks down on goods from drones to routers despite trade thaw with Beijing

Editor's pick
Arxiv· Today

Alignment as Jurisprudence

arXiv:2605.08416v1 Announce Type: new Abstract: Jurisprudence, the study of how judges should properly decide cases, and alignment, the science of getting AI models to conform to human values, share a fundamental structure. These seemingly distant fields both seek to predict and shape how decisions by powerful actors, in one case judges and in the other increasingly powerful artificial intelligences, will be made in the unknown future. And they use similar tools of the specification and interpretation of language to try to accomplish those goals. The great debates of jurisprudence, about what the law is and what it should be, can provide insight into alignment, and lessons from what does and does not work in alignment can help make progress in jurisprudence. This essay puts the two fields directly into conversation. Drawing on leading accounts of jurisprudence, particularly Dworkin's principle-oriented interpretivism and Sunstein's positivist account of law as analogical reasoning, and on cutting-edge alignment approaches, namely Constitutional AI and case-based reasoning, it illustrates the value of a more sophisticated legally-inspired approach to the interplay of rules and cases in finetuning alignment and points to ways that AI can provide a better understanding of how the law works and how it can be improved by the introduction of AI. AI systems and the law should operate to empower people to act in the world, helping to expand their capabilities and the extent to which they are able to achieve their goals. As AI continues to improve in capacity, and as the constraints that legal theory places on human judges seem to be coming undone, the conversation between these two fields will become increasingly essential and may help point to a better version of both.

Editor's pick
City AM· Yesterday

Starmer's Europe reset risks strangling UK AI sector with EU regulation

Keir Starmer's pledge to place "Britain at the heart of Europe" risks reigniting fears that closer EU ties could undo the UK's AI advantage.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 11, 2026· 4 days ago

EU Commission seeks feedback on AI transparency guidelines

The European Commission has opened a consultation on new transparency guidelines for AI, requiring disclosure when users interact with AI and the implementation of machine-readable marks.

Editor's pickEnergy & Utilities
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

Australian energy ministers mull national data-center rules

Australian energy ministers are set to consider changes to national energy policy in response to a rapid growth of data centers, including requirements for operators to invest in renewable power.

Editor's pickTechnology
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

Meituan, Didi, Alibaba platforms revamp algorithms under CAC campaign

China's major lifestyle-services platforms have completed an initial round of self-inspections and rectification of problematic algorithms as part of a broader push to tighten oversight of the digital economy.

Editor's pickGovernment & Public Sector
Tech Policy Press· Yesterday

When Federal Agencies Pick AI Vendors, They Are Buying Different Policy Interpretations | TechPolicy.Press

Changing AI vendors may also change how government systems perform, say Paulo Carvão, Isabel Adler, Jeffrey Zhou and Claudio Mayrink Verdun.

Editor's pickTechnology
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

US FTC offers Take it Down Act tips, warnings ahead of compliance deadline

The US FTC is reminding tech companies to establish notice and takedown processes for nonconsensual intimate images, including AI deepfakes, before the upcoming compliance deadline.

Editor's pick
Foreign Policy· Yesterday

Clear AI Definitions Needed for New Regulation

Without clear definitions, governance is impossible.

Editor's pick
🐀 Chatbot court snitches· Yesterday

Fair use drives discovery

Fair use is a foundational U.S. copyright principle that preserves access to information for transformative uses, allowing AI to learn from diverse data and accelerate breakthroughs.

Editor's pickTechnology
Artificial Intelligence Newsletter | May 12, 2026· Yesterday

California AG says states still hold authority to regulate content of online platforms

The California Attorney General argued in court that Section 230 does not prevent states from regulating online content, defending a state law against deceptive election communications.

Best Practice AI. © 2026 Best Practice AI Ltd. All rights reserved.
