Tue 12 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Korea Proposes AI Dividend, OpenAI Invests Billions, and Gartner Finds No ROI
TL;DR: South Korea is weighing a 'citizen dividend' funded by taxes on AI profits, reflecting pressure to share gains from tech giants like Samsung. OpenAI is spending billions to build out its own corporate AI consulting capability, while a Gartner study finds that automation-driven layoffs are not improving ROI. Google reports that criminal hackers used AI to find a major software flaw, signaling new cybersecurity challenges.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Korea Roils Market by Floating ‘Citizen Dividend’ from AI
A top South Korean policymaker said the nation should pay citizens a “dividend” using taxes on AI profits, underscoring growing pressure to redistribute gains from a boom that’s enriched chipmakers like Samsung Electronics Co. and SK Hynix Inc.
AI isn't paying off in the way companies think. Layoffs driven by automation are failing to generate returns, study finds | Fortune
A Gartner study found that while 80% of companies surveyed reported workforce reductions, there was no correlation with higher ROI.
Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top
arXiv:2605.10291v1 Announce Type: new Abstract: Recent advances in generative artificial intelligence (AI) are reshaping who enters entrepreneurship, but not who reaches the top of the quality distribution. Using data on over 160,000 product launches on Product Hunt, we find that entrepreneurial entry increased sharply following the public release of ChatGPT-3.5, driven disproportionately by solo entrepreneurs. This shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures. However, much of this growth reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes. Team-based ventures are increasingly dominant in the top tiers of platform rankings. These findings suggest that generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.
Google Says Criminal Hackers Used A.I. to Find a Major Software Flaw
The company said that it had identified, for the first time, hackers using artificial intelligence to discover an unknown bug. The attempted attack represents “a taste of what’s to come,” one expert said.
Amazon staff use AI tool for unnecessary tasks to inflate usage scores
In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb the company's AI leaderboard
Sam Altman’s Business Dealings Under GOP Scrutiny Ahead of OpenAI’s IPO
The Republican-led House Oversight Committee says it is investigating, and six GOP state attorneys general are calling for SEC review after a WSJ article.
Economics & Markets
Sam Altman’s Business Dealings Under GOP Scrutiny Ahead of OpenAI’s IPO
The Republican-led House Oversight Committee says it is investigating, and six GOP state attorneys general are calling for SEC review after a WSJ article.
Will investors embrace China’s humanoid robot champion?
Unitree aims to go public later this year in a crucial test for the android industry
How AI mania is disguising big companies’ hit from Iran war — in charts
Biggest groups have gained $5.4tn in value since the conflict began — but the semiconductor sector accounts for most of the gains
Why AI Matters More Than Iran War in Markets
Global equities have rallied to record levels, fuelled by euphoria over artificial intelligence. Investors' risk appetite has held up despite volatility in energy markets caused by the war in Iran. Bloomberg Markets Live Executive Editor Mark Cudmore and Bloomberg TV Markets Producer Anthony Stephens discuss. (Source: Bloomberg)
Nscale Adds $790M To Fund 115MW AI Infrastructure Growth
Nscale has secured an additional $790 million (€670 million) to fund 115MW of AI data centre capacity at its Narvik, Norway site, with financing committed by ABN AMRO, DNB, Eksfin, Nordea and SEB.
AI M&A surges as software captures nearly three-quarters of North American deals - InvestmentNews
S&P Global report shows AI transactions hit record levels, with software dominating investor interest.
OpenAI’s $6.6 Billion Employee Payday Signals a Bigger AI Wealth Boom - TipRanks.com
OpenAI, the private AI firm behind ChatGPT, has given some of its staff a large cash win before the company even goes public, according to a Wall Street Journal report.
LPs fight tooth and nail for foundational AI co-investment share - PitchBook
Competition for deals involving the largest pre-IPO AI companies, like OpenAI and Anthropic, is separating the strongest LPs from the weakest.
Microsoft CEO defends investment in OpenAI as not muddying nonprofit mission
Microsoft CEO Satya Nadella defended the company's $13 billion investment in OpenAI during a California jury trial regarding claims that the investment breached OpenAI's charitable trust.
Software Firm ServiceNow Plans to Raise $4 Billion in Bond Sale
Software company ServiceNow Inc. is looking to raise about $4 billion from a potential US high-grade bond sale tied to its recent acquisitions.
Energy Vault reaffirms guidance, sets sights on AI infrastructure - Energy-Storage.News
Energy Vault has released its Q1 2026 financials, showing expansion in AI infrastructure activities, and operations in Australia and Japan.
S&P Rises as Chipmakers Lift Stocks | The Close 5/11/2026
Bloomberg Television brings you the latest news and analysis leading up to the final minutes and seconds before and after the closing bell on Wall Street. Today's guests are Bank of America Securities' Jill Carey Hall, Brown Harris Stevens CEO Bess Freedman, Jefferies' Brian Tanquilut, Grain CEO & Founder David Grain, Cambria Investment Management's Mebane Faber, Hirtle & Co.'s Founder & Executive Chair Jon Hirtle, Oliver Wyman Partner Daniel Tannebaum, 55/Redefined Group's CEO Lyndsey Simpson & KPMG Chief Economist Diane Swonk. (Source: Bloomberg)
Korea Roils Market by Floating ‘Citizen Dividend’ from AI
A top South Korean policymaker said the nation should pay citizens a “dividend” using taxes on AI profits, underscoring growing pressure to redistribute gains from a boom that’s enriched chipmakers like Samsung Electronics Co. and SK Hynix Inc.
Statistical Model Checking of the Keynes+Schumpeter Model: A Transient Sensitivity Analysis of a Macroeconomic ABM
arXiv:2605.10447v1 Announce Type: cross Abstract: Agent-based models (ABMs) are increasingly used in macroeconomics, but their analysis still often relies on ad hoc Monte Carlo campaigns with heterogeneous statistical effort across parameter settings. We show how statistical model checking (SMC), implemented through MultiVeStA, can provide a principled analysis layer for a realistic macroeconomic ABM without rewriting the simulator in a dedicated formalism. Our case study is the heuristic-switching Keynes+Schumpeter (K+S) model, analysed through a transient sensitivity campaign over one-parameter sweeps, two macro observables (unemployment and GDP growth), and one auxiliary micro-level probe (market share) on the post-warmup phase of a 600-step horizon. The analysis is driven by reusable temporal queries, observable-specific precision targets, and confidence-based stopping rules that automatically determine the simulation effort required by each configuration. Results show a clear contrast across parameter families: macro-financial and structural sweeps produce the strongest transient effects, whereas several heuristic-rule sweeps remain much weaker under the same precision policy. More broadly, the paper shows that SMC can support reproducible and informative quantitative analysis of substantively rich economic ABMs, while making uncertainty estimates and simulation cost explicit parts of the reported results.
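For readers unfamiliar with confidence-based stopping rules of the kind MultiVeStA applies, here is a minimal sketch: replications accumulate until the confidence-interval half-width for the target observable falls below a precision target. The Gaussian simulator stand-in and every constant below are illustrative, not taken from the paper.

```python
import math
import random

def estimate_until_precise(run_simulation, half_width_target, z=1.96,
                           min_runs=30, max_runs=100_000):
    """Draw i.i.d. simulation replications until the CI half-width
    for the mean falls below the target precision."""
    n, mean, m2 = 0, 0.0, 0.0  # Welford accumulators for mean/variance
    while n < max_runs:
        x = run_simulation()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= min_runs:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
            if half_width <= half_width_target:
                break
    return mean, n

# Toy stand-in for one post-warmup ABM run returning, say, mean unemployment.
mean_u, runs_used = estimate_until_precise(
    lambda: random.gauss(0.08, 0.02), half_width_target=0.002)
print(f"estimate={mean_u:.4f} from {runs_used} replications")
```

Tighter precision targets automatically buy more replications for noisy configurations, which is the cost-transparency point the abstract makes.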
📈⏳ The broken bargain of Moore’s Law
The physics has been slowing for years. The economics may be catching up.
US Fed revamping its infrastructure to cope with AI, legislative changes, Waller says
The US Federal Reserve is updating its infrastructure to address the challenges and opportunities presented by artificial intelligence and evolving legislative requirements.
Energy, Compute and AI infrastructure will define India's next economic cycle, says Gautam Adani
Gautam Adani, Chairman of Adani Group, on Monday said India's next economic cycle will be defined by large-scale investments in energy, data centres, compute infrastructure and artificial intelligence (AI) ecosystems.
On the probability distribution of long-term changes in the growth rate of the global economy: An outside view
arXiv:2605.09182v1 Announce Type: new Abstract: Daniel Kahneman and Amos Tversky argued for challenging inside views (informed by contextual specifics) with outside views (based on historical "base rates" for certain event types). A reasonable inside view of the prospects for the global economy in this century is that growth will converge to 2.5%/year or less: population growth is expected to slow or halt by 2100; and as more countries approach the technological frontier, economic growth should slow as well. To test that view, this paper models gross world product (GWP) observed since 10,000 BCE or earlier, in order to estimate a base distribution for changes in the growth rate as a function of the GWP level. For econometric rigor, it casts a GWP series as a sample path in a stochastic diffusion whose specification is novel yet rooted in neoclassical growth theory. After estimation, most observations fall between the 40th and 60th percentiles of predicted distributions. The fit implies that GWP explosion is all but inevitable, in a median year of 2047. The friction between inside and outside views highlights two insights. First, accelerating growth is more easily explained by theory than is constant growth. Second, the world system may be less stable than traditional growth theory and the growth record of the last two centuries suggest.
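The "explosion" result is easiest to see in a deterministic toy: if the growth rate itself rises with the level of output, the trajectory diverges in finite time. The sketch below illustrates only that mechanism; the paper's actual model is a stochastic diffusion estimated from the GWP record, and the constants here are invented.

```python
# Toy superexponential growth: dY/dt = a * Y**(1 + b) with b > 0 has the
# closed-form solution Y(t) = (Y0**-b - a*b*t)**(-1/b), which diverges at
# the finite time t* = Y0**-b / (a*b). Illustrative constants only.
a, b, Y0 = 0.02, 0.1, 1.0
t_star = Y0 ** (-b) / (a * b)  # finite-time singularity of the ODE
print(f"explosion time t* = {t_star:.1f} (model time units)")

def Y(t):
    assert t < t_star, "past the singularity"
    return (Y0 ** (-b) - a * b * t) ** (-1.0 / b)

for t in (0, 100, 400, 490):
    print(t, round(Y(t), 3))
```

Constant-rate growth corresponds to b = 0, where the singularity disappears; this is the sense in which accelerating growth is the theoretically "easier" case.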
TourMart: A Parametric Audit Instrument for Commission Steering in LLM Travel Agents
arXiv:2605.10440v1 Announce Type: new Abstract: Online travel agents (Booking, Trip.com, Expedia) have replaced ranked-list interfaces with conversational LLM agents that compress many options into one sentence of advice. Each booking earns the OTA commission and different suppliers pay different rates: the agent has a structural incentive to favor higher-margin recommendations. Whether any deployed agent does this, and by how much, no one can currently measure. Disclosure banners, conversion A/B testing, UI dark-pattern taxonomies, and generic LLM safety scores were built for older interfaces and miss the prose-recommendation surface where the steering happens. We propose TourMart, an applied intelligent-system audit instrument for LLM-OTA commission governance. Two governance levers -- lambda (gain on message-induced perception in the traveler's accept/reject decision) and kappa (budget-normalized cap on how far the message can shift perceived welfare) -- drive a paired counterfactual: holding the traveler and bundle fixed, the steering delta is read off between a commission-aware prompt and a minimum-disclosure factual template. A symmetric six-gate producer audit separates LLM-engineering failures (template collapse, refusal, internal-ID leakage) from genuine commercial steering. At deployed (lambda=1, kappa=0.05), a Qwen-14B reader shows +7.69pp steering (exact McNemar p=0.003); a Llama-3.1-8B reader shows +3.50pp in the same direction at n=143, with an extended-n supplement (n=270) confirming significance (+2.96pp, p=0.008). Across the (lambda, kappa) grid both arms pass family-wise scenario-clustered correction (p<0.001 / p=0.008). TourMart outputs a sentence a compliance report can quote: "at this deployment, 7.7 extra commission-steered recommendations per 100 paired traveler sessions."
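The paired counterfactual design reduces to an exact McNemar test on discordant sessions. A minimal sketch follows, with invented counts chosen to be consistent with the reported +7.69pp at n=143; nothing here is TourMart's actual harness.

```python
from scipy.stats import binomtest

# For each traveler session, record whether the high-commission supplier was
# recommended under the commission-aware prompt and under the factual template.
pairs = [(1, 0)] * 12 + [(0, 1)] * 1 + [(1, 1)] * 40 + [(0, 0)] * 90  # n=143

b = sum(1 for aware, factual in pairs if aware and not factual)
c = sum(1 for aware, factual in pairs if factual and not aware)
n = len(pairs)

delta_pp = 100.0 * (b - c) / n  # steering delta in percentage points
p = binomtest(b, b + c, 0.5).pvalue  # exact McNemar test on discordant pairs
print(f"steering delta = {delta_pp:+.2f}pp, exact McNemar p = {p:.4f}")
```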
Manipulation, Insider Information, and Regulation in Leveraged Event-Linked Markets
arXiv:2605.10486v1 Announce Type: cross Abstract: The introduction of leverage on prediction-market event contracts raises three structurally distinct questions that have not been addressed jointly: how leverage changes manipulation incentives, how it interacts with informed-trading rents, and how regulatory frameworks should respond. This paper develops a theoretical framework for the first two and a synthesis of the existing regulatory landscape for the third. The principal analytical move is a two-axis manipulation taxonomy distinguishing market-price manipulation from real-world outcome manipulation, where the manipulator affects the underlying event itself. Continuous-underlying derivative markets generally do not make outcome manipulation a venue-level payoff channel; event-linked markets do. Within this taxonomy, leverage plays asymmetric roles: it scales market-price manipulation linearly but shifts the cost-benefit threshold for outcome manipulation, and it scales informed-trading rents in three ways (direct multiplication, Sharpe-ratio preservation, detection-cost amortization). Section 7 connects Paper 1's pre-emption and halt-protocol findings (CC-007b, CC-008) to three manipulation channels: pre-emption introduced by the dynamic-margin engine, halt-arbitrage introduced by the resolution-zone halt protocol, and strategic bad-debt-shifting that no engine in Paper 1's framework family addresses. The framework's manipulation-resistance contribution is a re-allocation of attack surface, not a net reduction. The regulatory synthesis covers principal jurisdictions (US, EU, UK, Singapore, offshore) and identifies three regulatory-arbitrage pathways. The paper concludes with 14 recommendations for venue operators, regulatory bodies, and the research community, separated into framework-independent and framework-conditional categories.
OpenAI can't have incompetent AI consultants ruining the market, so bought its own
By which we mean it bought someone else's with other people's money
Ex-OpenAI exec Sutskever says he spent a year gathering proof of alleged Altman dishonesty | Reuters
Former OpenAI chief scientist Ilya Sutskever testified on Monday that he spent about a year gathering evidence for the ChatGPT maker's board that CEO Sam Altman had displayed a "consistent pattern of lying."
Amazon vs Microsoft: 3 Business Model Shifts Reshaping Cloud Stack Wars - FourWeekMBA
The key differentiator: Microsoft ... and Dynamics, the natural hosting choice becomes Azure PaaS services, which then require Azure infrastructure. This integrated approach generates higher per-customer lifetime value than Amazon’s bottom-up model. Artificial intelligence is forcing both companies to rethink their cloud stack monetization strategies. Amazon’s traditional infrastructure-first approach struggles with AI workloads ...
Microsoft’s C.E.O. Intervened When OpenAI Fired Sam Altman, Musk’s Lawyer Claims
Elon Musk’s lawyer argued that Microsoft’s Satya Nadella played a role in getting Mr. Altman his job back at OpenAI when he was briefly fired in 2023.
PLACO: A Multi-Stage Framework for Cost-Effective Performance in Human-AI Teams
arXiv:2605.08388v1 Announce Type: new Abstract: Human-AI teams play a pivotal role in improving overall system performance when neither the human nor the model can achieve such performance on their own. With the advent of powerful and accessible Generative AI models, several mundane tasks have morphed into Human-AI team tasks. From writing essays to developing advanced algorithms, humans have found that using AI assistance has led to an accelerated work pace like never before. In classification tasks, where the final output is a single hard label, it is crucial to address the combination of human and model output. Prior work elegantly solves this problem using Bayes rule, using the assumption that human and model output are conditionally independent given the ground truth. Specifically, it discusses a combination method to combine a single deterministic labeler (the human) and a probabilistic labeler (the classifier model) using the model's instance-level and the human's class-level calibrated probabilities.
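The Bayes-rule combination the abstract refers to fits in a few lines: under conditional independence given the true class, the posterior is proportional to the human's class-level likelihood times the model's calibrated instance-level posterior (the prior factors cancel). The confusion matrix and probabilities below are invented for illustration, not the paper's data.

```python
import numpy as np

def combine(human_label, model_probs, human_confusion):
    """Bayes combination of a deterministic labeler (human) and a
    probabilistic labeler (model), assuming conditional independence given
    the true class y. human_confusion[y, h] = P(human says h | true y);
    model_probs[y] is the model's calibrated posterior P(y | x).
    Then P(y | h, x) is proportional to P(h | y) * P(y | x)."""
    post = human_confusion[:, human_label] * model_probs
    return post / post.sum()

human_confusion = np.array([[0.90, 0.05, 0.05],   # class-level human accuracy
                            [0.10, 0.80, 0.10],
                            [0.10, 0.10, 0.80]])
model_probs = np.array([0.40, 0.45, 0.15])        # instance-level model output
print(combine(human_label=1, model_probs=model_probs,
              human_confusion=human_confusion))
```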
GitLab promises a different kind of layoff as biz pivots toward AI
Code hosting biz is trimming its global footprint and flattening its management layer
Who Will Solve the AI Productivity Puzzle? by Robin Rivaton - Project Syndicate
Robin Rivaton shows that value lies in reorganizing overall production processes, not in improving individuals’ output.
AI in Telecommunication Research Report 2026 - Global $6.75+ Bn Market Trends, Opportunities, and Forecasts, 2021-2025 & 2025-2031
The Global AI in Telecommunication Market is expanding due to the demand for lowering operational expenses, managing 5G/IoT complexities, and enhancing network reliability. Opportunities lie in intelligent automation, operational efficiency, autonomous networks, and generative AI for innovative ...
Cracking The Code of Campaign Success with Google's AlphaEvolve Agent
An experiment using Google's AlphaEvolve-style agentic methods to improve campaign prediction and optimization, showing potential accuracy gains.
Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top
arXiv:2605.10291v1 Announce Type: new Abstract: Recent advances in generative artificial intelligence (AI) are reshaping who enters entrepreneurship, but not who reaches the top of the quality distribution. Using data on over 160,000 product launches on Product Hunt, we find that entrepreneurial entry increased sharply following the public release of ChatGPT-3.5, driven disproportionately by solo entrepreneurs. This shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures. However, much of this growth reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes. Team-based ventures are increasingly dominant in the top tiers of platform rankings. These findings suggest that generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.
Council Post: Is AI A Bubble? Watch What Founders Do After They Raise
The way a company behaves in the weeks after funding closes can tell you a lot about market health.
Quantum start-up Algorithmiq raises €18m amid Milan move
Algorithmiq said it is ‘building and industrialising the algorithmic layer’ in quantum technology.
Secai Partners With Mila to Accelerate AI-Powered Healthcare Across North America | Markets Insider
MONTREAL, May 11, 2026 (GLOBE NEWSWIRE) -- Secai, the Montreal-based healthcare AI company behind the Voxira platform, is proud to announce a strategic partnership with Mila to accelerate AI-powered healthcare across North America.
Top 10 AI Startups To Watch: Core Trends & Market Leaders
Discover the Top 10 AI Startups transforming enterprise workflows, developing autonomous agents, and securing major venture capital funding.
Labor, Society & Culture
The Division of Understanding: Specialization and Democratic Accountability
arXiv:2604.09871v2 Announce Type: replace Abstract: This paper studies how the organization of production shapes democratic accountability. I propose a model in which learning economies make specialization productively efficient: most workers perform one-domain tasks, while a small set of integrators with cross-domain knowledge keep the system coherent. When policy consequences run across domains, integrators understand them better than specialists. Electoral competition then tilts government policies toward integrators' interests, while low aggregate system knowledge weakens governance and reduces the fraction of public resources converted into citizen-valued services. Labor markets leave these civic margins unpriced, failing to internalize the political returns to system knowledge. Broadening specialists can therefore raise welfare relative to the market allocation. The model speaks to debates on liberal arts education and the effects of AI.
Will AI turn us all into hipsters and artisans?
There is good reason to be dubious about the notion that automation will supplant all demand for human labour
Labor Supply under Temporary Wage Increases: Evidence from a Randomized Field Experiment
arXiv:2602.11992v2 Announce Type: replace Abstract: We conduct a pre-registered randomized controlled trial to test for income targeting in labor supply decisions among sellers of a Swedish street paper. Unlike most workers, these sellers choose their own hours and face severe liquidity constraints and volatile incomes. Treated individuals received a 25 percent bonus per copy sold for the duration of an issue, simulating an increase in earnings potential. Consistent with standard labor supply theory, they sold more papers and, by our measures, worked longer hours and took fewer days off. These findings contrast with studies on intertemporal labor supply that find small substitution effects.
Fidelity is growing its work force by thousands. Blame AI. - The Boston Globe
The Boston financial services giant is cutting about 1,000 jobs, but it's adding more. The company needs real-world techies and other hands-on workers to roll out key products and services right now.
[Essay] AI’s Three-Body Problem: Why the Future of Work Is So Hard to Predict
Experts still disagree. Some of the most credible people building AI believe artificial superintelligence could arrive within a few years, driven by systems that learn, improve, and compound their own capabilities.
Women Leaders Say AI Workplaces Are Intensifying Invisible Labor
Many women in leadership say the pressure to manage teams, caregiving and AI-era instability simultaneously is becoming neurologically exhausting.
Anthropic’s bug-hunting Mythos was greatest marketing stunt ever, says cURL creator
After all that hype, AI scanner found one low-severity cURL flaw
Playing games with knowledge: AI-Induced delusions need game theoretic interventions
arXiv:2605.08409v1 Announce Type: new Abstract: Conversational AI has a fundamental flaw as a knowledge interface: sycophantic chatbots induce epistemic entrenchment and delusional belief spirals even in rational agents. We propose the problem does not stem from the AI model, rooted instead in a systemic consequence of the paradigm shift from user-driven knowledge search to users and agents engaged in strategic, repeated-play communication. We formalize the problem as a Crawford-Sobel cheap talk game, where costless user signals induce a pooling equilibrium. Agents optimized for user satisfaction produce sycophantic strategies that provide identical reinforcement across user types with opposite epistemic incentives: exploratory ``Growth-seekers'' ($\theta_G$) and confirmatory ``Validation-seekers'' ($\theta_V$). Under repeated play, this identification failure creates a coordination trap -- analogous to a Prisoner's Dilemma -- where locally rational feedback loops drive users toward pathologically certain false beliefs. We propose an inference-time mechanism design intervention called an Epistemic Mediator that breaks this pooling equilibrium by introducing a costly signal (epistemic friction), forcing type revelation based on users' asymmetric cognitive costs for processing resistance. A key contribution is Belief Versioning, a git-inspired epistemic meta-memory system that stores healthy beliefs and rollbacks when validation-seeking resistance is detected. In simulation, this intervention achieves a separating equilibrium, yielding a $48\times$ differential in spiral rates while passing a learning preservation criterion, evidence that epistemic safety in AI is fundamentally a problem of strategic information environment design rather than simple model alignment.
Cost-of-Ethics Crisis: Beliefs, Decisions, and Justifications in the Job Searches of Computer Science Students in Canada and the United States
arXiv:2605.09680v1 Announce Type: new Abstract: Workplace norms in computer science have received growing attention due to a series of recent ethical scandals. One type of response has been a push to improve the ethics education provided to computer science students. Evidence for the effectiveness of ethics education remains mixed; some evidence suggests that norms are changing, while gaps between stated values and practice remain. Our focus here is on whether students, who have received some contemporary CS ethics education, are able to effectively apply ethical reasoning to their own decision-making in what is typically the first significant ethical decision of their careers: their job search. Our study examines the ethical decision making of 129 computer science students and recent graduates during their job searches. We find that most students prioritize factors like compensation, location, and workplace culture over ethical and social issues. Even when expressing ethical concerns, respondents often justify taking actions contradicting their moral views through commonly-shared explanations such as desire to make money or the perceived inability to avoid unethical workplaces. This work sheds light on the disconnect between ethics education and real-world CS graduate decision making. We offer insights for evolving curricula to better address practical ethical dilemmas, with implications for educators and industry.
Australia Watchdog Says Money Launderers Ramping Up AI for Scams
Australia’s financial crimes watchdog warned of a heightened threat of money laundering linked to artificial intelligence that has been used by crooks to scale up activities, automate processes and create fake documents.
Teachers' Perceived Benefits and Risks of AI Across Fifty-Five Countries: An Audit of LLM Alignment and Steerability
arXiv:2605.08486v1 Announce Type: new Abstract: Teachers' trust in artificial intelligence (AI) in education depends on how they balance its perceived benefits and risks. Yet global discussions about scaling AI in education rely on fragmented evidence, as most studies of teachers' perceptions focus on single countries or small samples. This lack of representative cross-national evidence limits both theory building and policy development. At the same time, large language models (LLMs) are increasingly used in research, policy, and teachers' professional workflows, despite limited validation in education. To address these gaps, we conduct a large-scale audit of LLM alignment with teachers' perceptions of AI by combining representative international survey data with systematic model evaluation. Using OECD TALIS data from 55 countries and territories, we measure cross-national variation in teachers' perceived benefits and risks of AI. We then benchmark responses from eight state-of-the-art LLMs across four providers under both general and country-specific prompting, comparing higher- and lower-reasoning models. Results reveal substantial cross-national variation in teacher perceptions that is not reliably reflected in LLM outputs. Models compress country differences, overestimate both benefits and risks, and show limited gains from identity prompting or enhanced reasoning. This misalignment matters because LLM-generated guidance and professional discourse increasingly shape how teachers learn about and discuss AI, potentially influencing trust and future adoption decisions. Our findings caution against treating LLM outputs as substitutes for direct engagement with teachers when informing global AI-in-education initiatives. At the same time, some models (e.g., Gemini 3 Fast) partially capture cross-national ranking patterns, suggesting a complementary role in hypothesis generation and exploratory comparative analysis.
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
arXiv:2605.10442v1 Announce Type: new Abstract: Multilingual studies of social bias in open-ended LLM generation remain limited: most existing benchmarks are English-centric, template-based, or restricted to recognizing pre-specified stereotypes. We introduce StereoTales, a multilingual dataset and evaluation pipeline for systematically studying the emergence of social bias in open-ended LLM generation. The dataset covers 10 languages and 79 socio-demographic attributes, and comprises over 650k stories generated by 23 recent LLMs, each annotated with the socio-demographic profile of the protagonist across 19 dimensions. From these, we apply statistical tests to identify more than 1,500 over-represented associations, which we then rate for harmfulness through both a panel of humans (N = 247) and the same LLMs. We report three main findings. (i) Every model we evaluate emits consequential harmful stereotypes in open-ended generation, regardless of size or capabilities, and these associations are largely shared across providers rather than isolated misbehaviors. (ii) Prompt language strongly shapes which stereotypes appear: rather than transferring as a shared set of biases, harmful associations adapt culturally to the prompt language and amplify bias against locally salient protected groups. (iii) Human and LLM harmfulness judgments are broadly aligned (Spearman $\rho=0.62$), with disagreements concentrating on specific attribute classes rather than specific providers. To support further analyses, we release the evaluation code and the dataset, including model generations, attribute annotations, and harmfulness ratings.
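The statistical machinery described, flagging over-represented protagonist attributes and then checking human/LLM rating agreement, can be illustrated compactly. All counts and ratings below are invented; only the test choices mirror the abstract.

```python
from scipy.stats import binomtest, spearmanr

# Over-representation check: is attribute A assigned to protagonists of
# group G more often than the corpus-wide base rate for A would predict?
stories_about_G, with_attribute_A = 4_000, 520
base_rate_A = 0.09  # share of A across the whole story corpus

res = binomtest(with_attribute_A, stories_about_G, base_rate_A,
                alternative="greater")
print(f"observed {with_attribute_A / stories_about_G:.3f} "
      f"vs base {base_rate_A}, p = {res.pvalue:.2e}")

# Agreement between human-panel and LLM harmfulness ratings per association.
human = [4.2, 1.1, 3.5, 2.0, 4.8]
llm = [3.9, 1.4, 3.1, 2.6, 4.5]
rho, p = spearmanr(human, llm)
print(f"Spearman rho = {rho:.2f} (the paper reports 0.62 at scale)")
```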
Texas accuses Netflix of spying on children in new lawsuit
Ken Paxton accuses streamer of designing addictive platform and falsely representing data collection practices Texas sued Netflix on Monday, accusing the streaming company of spying on children and designing its platform to be addictive. Ken Paxton, the Texas attorney general, said Netflix has for years falsely represented to consumers that it did not collect or share user data, when it actually tracked and sold viewers’ habits and preferences to commercial data brokers and advertising technology companies, making billions of dollars a year.
Council Post: AI Ethics Beyond Bias: The Risk Of Removing Humans From The Economy
If business leaders do not come together and set their own guardrails, governments will.
OpenAI sued over ChatGPT’s alleged role in Florida State University shooting
OpenAI faces a new wrongful death and negligence lawsuit filed by the family of a 2025 Florida State University shooting victim, alleging the chatbot contributed to the violence.
Little Impact of ChatGPT Availability on High School Student Test Score Performance
arXiv:2605.08812v1 Announce Type: new Abstract: In educational settings, AI can be used as a learning aid, but can also be used to avoid schoolwork, thereby passing classes while learning little. Many existing studies on the impact of AI on education focus on AI use in controlled settings or with specialized tools. In this paper, the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction. These results imply that, to the extent that high school students use AI to avoid learning, it either does not matter much for their test performance or is cancelled out by positive uses of AI in the aggregate.
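The identification idea can be sketched as a simple cross-sectional regression: a region's summer traffic drop-off proxies for school-related AI use, and test-score changes are regressed on that proxy. The sketch below uses synthetic data with a true null effect, mirroring the paper's finding; nothing here reproduces the paper's actual estimator or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
school_traffic = rng.uniform(50, 150, n)           # ChatGPT use, school months
summer_traffic = school_traffic * rng.uniform(0.55, 0.95, n)
dropoff = (school_traffic - summer_traffic) / school_traffic  # edu-use proxy
score_change = rng.normal(0.0, 1.0, n)             # null effect by construction

# OLS of score changes on the drop-off proxy, with a conventional SE.
X = np.column_stack([np.ones(n), dropoff])
beta, *_ = np.linalg.lstsq(X, score_change, rcond=None)
resid = score_change - X @ beta
se = np.sqrt(np.linalg.inv(X.T @ X)[1, 1] * resid @ resid / (n - 2))
print(f"effect of dropoff on scores: {beta[1]:+.3f} (SE {se:.3f})")
```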
EDITORIAL: The growing AI skills gap - Taipei Times
Culture Council: The Skills AI Can’t Replace — And Why We Need to Start Teaching Them in Schools
We are entering a moment where the most valuable skills are not the ones we’ve traditionally prioritized.
Renegotiating the Education Social Contract for the Age of AI (SSIR)
Choice, agency, and how to design a learning system where private gain and public good reinforce each other.
Oev and Anad discuss skills gap and workforce training | Cyprus Mail
The committee was chaired by Yangos ... the Education, Training and Young Talent Attraction Committee of Oev. During the meeting, participants discussed the challenges facing the labour market in relation to the existing skills gap and the need to promote targeted actions for workforce upskilling and reskilling. The management of the Human Resource Development Authority presented both its current and future activities, with particular emphasis on programmes aimed at people ...
Technology & Infrastructure
The public sector agentic era: Trading pilots for transformation - Government Executive
“Through our research, we’ve ... adopters of AI and agents,” said Karen Dahut, CEO of Google Public Sector, at Google Cloud Next. “According to our Return on Investment of AI in the Public Sector report, 55% of public sector leaders say that their organizations are already using AI agents, 42% report that their organization has deployed more than 10 agents, and nearly half, 46%, say their productivity has at least doubled thanks to AI agents.” This measurable ROI is shifting ...
Building Fast & Accurate Agents with Prime-RL Post Training
This article explores using Prime-RL post-training to develop faster, more accurate agents for structured tasks like spreadsheet workflows.
CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents
arXiv:2605.08399v1 Announce Type: new Abstract: Tool-augmented language models can extend small language models with external executable skills, but scaling the tool library creates a coupled challenge: the library must evolve with the planner as new reusable subroutines emerge, while retrieval from the growing library must remain within a fixed context budget. Existing tool-use and skill-library methods typically treat tools as flat or text-indexed memories, causing prompt cost to grow with library size and obscuring the typed, compositional structure of executable code. We propose CoCoDA, a framework that co-evolves the planner and tool library through a single code-native structure: a compositional code DAG. Nodes are primitive or composite tools, edges encode invocation dependencies, and each node stores a typed signature, description, pre/post-condition specification, and worked examples. At inference time, Typed DAG Retrieval prunes candidates by symbolic signature unification, ranks survivors by descriptions, filters them by behavioral specifications, and disambiguates with examples, keeping expensive context materialization on progressively smaller candidate sets. At training time, successful trajectories are folded into validated composite tools, while the planner is updated with a DAG-induced reward that credits composites by their primitive expansion size. We provide theoretical results showing retrieval cost reduction, sublinear retrieval time, compositional advantage under the shaped reward, monotone co-evolution under conservative updates, and DAG well-formedness. Across mathematical reasoning, tabular analysis, and code task benchmarks, CoCoDA enables an 8B student to match or exceed a 32B teacher on GSM8K and MATH and consistently improves over strong tool-use and library-learning baselines.
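The cheap symbolic stage of Typed DAG Retrieval can be pictured with a toy tool library: candidates are pruned by signature compatibility before any descriptions are materialized into context. The Tool fields and library below are invented for illustration, not CoCoDA's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tool:
    name: str
    inputs: tuple      # typed signature, e.g. ("DataFrame", "str")
    output: str
    description: str   # only materialized into context after pruning

def typed_prune(tools, available_types, goal_type):
    """First retrieval stage: keep only tools whose input types are all
    currently producible and whose output matches the goal type. Cheap
    symbolic filtering before any expensive description-ranked stage."""
    return [t for t in tools
            if set(t.inputs) <= available_types and t.output == goal_type]

library = [
    Tool("load_csv", ("Path",), "DataFrame", "read a CSV into a table"),
    Tool("sum_column", ("DataFrame", "str"), "float", "sum one column"),
    Tool("plot_hist", ("DataFrame", "str"), "Figure", "histogram a column"),
]
survivors = typed_prune(library, {"DataFrame", "str"}, goal_type="float")
print([t.name for t in survivors])  # -> ['sum_column']
```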
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs
arXiv:2605.08374v1 Announce Type: new Abstract: Episodic memory allows LLM agents to accumulate and retrieve experience, but current methods treat each memory independently, i.e., evaluating retrieval quality in isolation without accounting for the dependency chains through which memories enable the creation of future memories. We introduce MemQ, which applies TD($\lambda$) eligibility traces to memory Q-values, propagating credit backward through a provenance DAG that records which memories were retrieved when each new memory was created. Credit weight decays as $(\gamma\lambda)^d$ with DAG depth $d$, replacing temporal distance with structural proximity. We formalize the setting as an Exogenous-Context MDP, whose factored transition decouples the exogenous task stream from the endogenous memory store. Across six benchmarks, spanning OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, and expert-level QA, MemQ achieves the highest success rate on all six in generalization evaluation and runtime learning, with gains largest on multi-step tasks that produce deep and relevant provenance chains (up to +5.7 pp) and smallest on single-step classification (+0.77 pp) where single-step updates already suffice. We further study how $\gamma$ and $\lambda$ interact with the EC-MDP structure, providing principled guidance for parameter selection and future research. Code will be available soon.
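A minimal sketch of the credit-propagation step, assuming a dict-of-parents representation of the provenance DAG; the data structures and constants are illustrative, not the paper's implementation.

```python
from collections import deque

def propagate_credit(q, parents, memory_id, td_error, gamma=0.95,
                     lam=0.8, alpha=0.1, max_depth=5):
    """Back up a TD error through a provenance DAG: each ancestor memory
    (one retrieved when a descendant was created) receives credit scaled
    by (gamma * lam) ** depth, so structural proximity replaces temporal
    distance. A seen-set prevents double counting over converging paths."""
    seen, frontier = {memory_id}, deque([(memory_id, 0)])
    while frontier:
        node, depth = frontier.popleft()
        q[node] = q.get(node, 0.0) + alpha * (gamma * lam) ** depth * td_error
        if depth < max_depth:
            for parent in parents.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    frontier.append((parent, depth + 1))

# m3 was created after retrieving m1 and m2; m2 itself derives from m1.
parents = {"m3": ["m1", "m2"], "m2": ["m1"]}
q = {}
propagate_credit(q, parents, "m3", td_error=1.0)
print(q)  # m3 credited at depth 0; m1 and m2 each credited once at depth 1
```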
Towards an agentic IT infrastructure: what does that mean? - Techzine Global
IT infrastructures aren't the easiest to automate with probabilistic systems. What can AI agents deliver for these solutions?
Belgium’s Holmes launches with €1.1 million pre-Seed to catch software bugs before they reach users
Holmes, a technology startup automating software testing in the AI era, has launched with a €1.1 million pre-Seed funding round. The round was led by Syndicate One, with participation from Aikido founders Roeland Delrue and Willem Delbare, Showpad co-founder Louis Jonckheere, and serial entrepreneur Thomas Van Overbeke. The funds NewSchool, RDY, and 100IN also participated. […]
Genesis Mission: The Manhattan Project for AI. Kind of.
The largest AI companies in the world all signed on at the same time. The mission identifies twenty-six scientific challenges these AI agents are supposed to tackle, ranging from cancer research to fusion energy to quantum computing.
Korea’s AI Memory Dominance: Limits to Future AI Leadership
Years of manufacturing expertise and tech know-how keep Korea at the heart of AI hardware supply chains. That edge brings strong margins as AI demand keeps climbing, putting memory makers in the spotlight of the semiconductor world. But things are shifting. The industry’s focus is moving toward chip ...
Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI
arXiv:2605.08137v1 Announce Type: cross Abstract: Weight pruning is widely advocated for deploying Large Language Models on resource-constrained IoT and edge devices, yet its impact on model fairness remains poorly understood. We conduct a controlled empirical study of three instruction-tuned models (Gemma-2-9b-it, Mistral-7B-Instruct-v0.3, Phi-3.5-mini-instruct) across three pruning methods (Random, Magnitude, Wanda) at four sparsity levels (10-70%) on 12,148 BBQ bias benchmark items with 5 random seeds, totaling 2,368,860 inference records. Our results reveal a Smart Pruning Paradox: activation-aware pruning (Wanda) preserves perplexity nearly perfectly (just 3.5% increase at 50% sparsity for Mistral-7B), yet produces the highest bias amplification, with Stereotype Reliance Score increasing 83.7% and 47-59% of previously unbiased items developing new stereotypical behaviors at 70% sparsity. Random pruning destroys language capability entirely (perplexity exceeding $10^4$ and reaching $10^8$) but produces only random-chance bias. We further show that unstructured pruning provides zero storage savings and zero inference latency reduction on real edge hardware, undermining the primary motivation for its use in IoT deployment. Of 180 dense-vs-pruned comparisons, 141 (78.3%) are significant ($p < 0.05$) with mean $|h| = 0.305$. Published quantization studies report up to 21% of responses flipping between biased and unbiased states; our pruning results show transition rates nearly three times higher (47-59%), suggesting pruning poses a categorically greater risk to alignment than quantization. These findings demonstrate that perplexity-based evaluation provides false assurance of behavioral equivalence, and that IoT deployment pipelines require bias-aware validation before deploying pruned models at the edge.
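Magnitude and random pruning, the two non-activation-aware baselines in the study, fit in a few lines of numpy; Wanda additionally weights magnitudes by input activation norms, which requires calibration data (noted in the comment). Shapes and sparsity levels below are illustrative.

```python
import numpy as np

def prune(weights, sparsity, method="magnitude", rng=None):
    """Zero out a fraction `sparsity` of weights. Magnitude pruning keeps
    the largest |w|; random pruning drops uniformly. (Wanda additionally
    scales |w| by per-input activation norms from calibration data.)"""
    w = weights.copy()
    k = int(sparsity * w.size)
    if method == "magnitude":
        idx = np.argsort(np.abs(w), axis=None)[:k]  # smallest-magnitude first
    else:
        idx = (rng or np.random.default_rng()).choice(w.size, k, replace=False)
    w.flat[idx] = 0.0
    return w

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
for s in (0.1, 0.5, 0.7):  # the study's sparsity range is 10-70%
    Wm = prune(W, s)
    print(f"sparsity {s:.0%}: kept energy "
          f"{np.linalg.norm(Wm) / np.linalg.norm(W):.3f}")
```

The study's point is that preserved weight energy (and perplexity) can look fine while behavioral properties like bias shift sharply, so checks like the one above are insufficient on their own.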
COMPUTEX 2026: Why Taiwan Is Becoming the Strategic Center of the Global AI Supply Chain
This means Taiwan’s supply chain must evolve from pure manufacturing execution toward deeper co-development. Suppliers must support more customized platforms, faster engineering cycles, and more complex integration demands. The companies that can move from component supply to system-level collaboration will capture more value in the AI infrastructure cycle. ... Although COMPUTEX will be filled with AI chip ...
LevelFields — AI Infrastructure Boom Drives Demand for Utilities, Nuclear Power and Grid Expansion
Lumentum, which supplies optical ... revenue growth YoY and significant margin expansion. Management pointed specifically to rising demand tied to cloud computing, optical networking, and AI infrastructure buildouts. The impact is now spreading into utilities and power generation as well. Constellation Energy highlighted ...
Market Outlook: AI infrastructure demand keeps stocks climbing
Hans Albrecht says AI infrastructure spending and rising inference demand continue to support markets near record highs.
MICROIP Unveils "Software-Driven Hardware" Strategy at EEC 2026, Partnering with Poland to Build a Resilient Edge AI & ASIC Supply Chain
/PRNewswire/ -- At the 2026 European Economic Congress (EEC 2026), MICROIP Chairman Dr. James Yang participated in a high-level dialogue at the "Poland-Taiwan...
Commentary: SEA semiconductor industry pivots towards AI as a strategic hub
SEMICON SEA 2026 drew heavy crowds, underscoring Southeast Asia's emergence as an indispensable "strategic hub" in the global AI compute supply chain. The region's semiconductor players are now moving from "capacity substitution" to "technological self-reliance" in what is shaping up to be ...
SoftBank in Talks for Major Data Center Project in France
SoftBank Group Corp. founder Masayoshi Son has held talks about unveiling an ambitious French AI data center project with President Emmanuel Macron in the coming weeks, according to people familiar with the matter.
What’s Next for Telecom and Digital Infrastructure
Grain Management CEO and founder David Grain discusses what's next for telecom and digital infrastructure. He sees real workloads absorbing infrastructure and says they're 'very' active in deploying capital. He speaks with Katie Greifeld and Romaine Bostick on "The Close." (Source: Bloomberg)
Nscale extends funding momentum with €670 million for Norway AI infrastructure project
London’s AI infrastructure hyperscaler Nscale has announced an additional €670 million ($790 million) in financing to support the continued development of its AI data centre in Narvik, Norway, reportedly the largest AI infrastructure project in the country. The financing was committed by ABN AMRO, DNB, Eksfin, Nordea and SEB. The committed financing includes an […]
Kneron Warns the AI Industry Is Approaching a Massive Inference Infrastructure Bottleneck | The Manila Times
SAN DIEGO, May 12, 2026 (GLOBE NEWSWIRE) -- Kneron, the San Diego based edge AI company developing full stack inference infrastructure, says the artificial intelligence industry may be vastly underestimating the next major bottleneck of AI and it has nothing to do with training larger models.
Samsung SDS-led group selected for South Korea's national AI computing center
A Samsung SDS-led consortium has been selected to build and operate South Korea's 2.5-trillion-won national AI computing center, with construction beginning this year.
How Chinese PCB Manufacturers Are Supporting the AI Industry
The focus is not only on “who can make a PCB,” but on which companies appear relevant to the AI hardware supply chain through high-speed PCB, HDI PCB, AI server PCB, turnkey PCBA, and EMS support. AI computing creates a different kind of challenge for PCB manufacturing.
Calibrating Behavioral Parameters with Large Language Models
arXiv:2602.01022v3 Announce Type: replace Abstract: Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large language models (LLMs) as calibrated measurement instruments for behavioral parameters. Using four models and 24,000 agent-scenario pairs, we document systematic rationality bias in baseline LLM behavior, including attenuated loss aversion, weak herding, and near-zero disposition effects relative to human benchmarks. Profile-based calibration induces large, stable, and theoretically coherent shifts in several parameters, with calibrated loss aversion, herding, extrapolation, and anchoring reaching or exceeding benchmark magnitudes. To assess external validity, we embed calibrated parameters in an agent-based asset pricing model, where calibrated extrapolation generates short-horizon momentum and long-horizon reversal patterns consistent with empirical evidence. Our results establish measurement ranges, calibration functions, and explicit boundaries for eight canonical behavioral biases.
Embeddings for Preferences, Not Semantics
arXiv:2605.08360v1 Announce Type: new Abstract: Modern AI is opening the door to collective decision-making in which participants express their views as free-form text rather than voting on a fixed set of candidates. A natural idea is to embed these opinions in a vector space so that the substantial literature on facility location problems and fair clustering can be brought to bear. But standard text embeddings measure semantic similarity, whereas distances in facility location problems and fair clustering require what we call "preferential similarity": a participant's agreement with a piece of text should be inversely related to their distance from it. Off-the-shelf embeddings inherit a coarse preference signal through a correlation between semantic and preferential similarity, but fail to capture preferences when the correlation breaks. We formalize this as an invariance problem: text embedding models encode both a preference-relevant signal (stance and values) and semantic nuisance (style and wording), and the two are observationally correlated, so a geometry that relies on nuisance can appear preference-correct even when it is not. We show that synthetic training data designed to break this correlation provably shifts the optimal scorer away from nuisance-dominated cosine and significantly improves preference prediction across 11 online deliberation datasets.
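The decorrelation idea can be pictured with a toy pair set and a hinge-style objective: pairs where stance and style vary independently force a scorer off wording cues. The texts, labels, and margin below are invented for illustration and are not the paper's training recipe.

```python
# Training pairs constructed so style and stance vary independently:
# a scorer that leans on wording gets both kinds of pair wrong.
pairs = [
    # same stance, different style -> should be CLOSE in preference space
    ("Rent control helps tenants.", "keeping rents capped protects renters", 1),
    # different stance, same style -> should be FAR despite similar wording
    ("Rent control helps tenants.", "Rent control hurts tenants.", 0),
]

def contrastive_loss(sim, label, margin=0.5):
    """Hinge-style objective on a similarity score in [-1, 1]: pull
    agreeing pairs together, push disagreeing pairs below the margin."""
    return (1 - sim) if label == 1 else max(0.0, sim - margin)

for a, b, y in pairs:
    for sim in (0.9, 0.1):  # what a wording-driven vs stance-driven scorer says
        print(f"label={y} sim={sim}: loss={contrastive_loss(sim, y):.2f}")
```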
Microsoft researchers find AI models and agents can't handle long-running tasks
An intern who failed this much would be shown the door
MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction
arXiv:2605.08094v1 Announce Type: new Abstract: Accurate clinical diagnosis requires extensive domain knowledge and complex clinical reasoning capabilities. Although large language models (LLMs) hold great potential for clinical reasoning, their high computational and memory requirements limit their deployment in resource-constrained environments. Knowledge distillation (KD) can compress LLM capabilities into smaller models, but traditional KD merely transfers superficial answer patterns and fails to preserve the structured reasoning required for reliable diagnosis. To address this, we propose a two-stage distillation framework, MedThink, designed to cultivate robust clinical reasoning in small language models (SLMs). In the first stage, a teacher LLM screens data and injects domain-knowledge explanations to fine-tune a student model, establishing a knowledge foundation. In the second stage, the teacher evaluates the student's errors, generates reasoning chains linking knowledge to correct answers, and refines the student's diagnostic reasoning through a second round of fine-tuning. We evaluate MedThink on general medical benchmarks and a gastroenterology dataset comprising 955 question-answer pairs. Experiments demonstrate that MedThink outperforms six distillation strategies in all benchmarks: achieving an improvement of up to 12.7% over the student baseline in general tasks, and reaching a total top accuracy of 56.4% in gastroenterology evaluation. This indicates that iterative distillation centered on reasoning can significantly enhance the diagnostic accuracy and generalization capabilities of SLMs whilst maintaining computational efficiency. Our code and data are publicly available at https://github.com/destinybird/PrecisionBoost.
Thinking Machines shows off preview of near-realtime AI voice and video conversation with new 'interaction models'
Is AI leaving the era of "turn-based" chat? Right now, all of us who use AI models regularly for work or in our personal lives know that the basic interaction mode across text, imagery, audio, and video remains the same: the human user provides an input, waits anywhere from milliseconds to minutes (or, for particularly tough queries, hours or days), and the AI model provides an output. But if AI is really to take on the load of jobs requiring natural interaction, it will need to do more than provide this kind of "turn-based" interactivity. It will ultimately need to respond more fluidly and naturally to human inputs, even responding while also processing the next human input, be it text or another format.

That, at least, seems to be the contention of Thinking Machines, the well-funded AI startup founded last year by former OpenAI chief technology officer Mira Murati and former OpenAI researcher and co-founder John Schulman, among others. Today, the firm announced a research preview of what it deems "interaction models," a new class of native multimodal systems that treats interactivity as a first-class citizen of model architecture rather than an external software "harness," scoring some impressive gains on third-party benchmarks and reduced latency as a result. However, the models are not yet available to the general public or even enterprises; the company says in its announcement blog post: "In the coming months, we will open a limited research preview to collect feedback, with a wider release later this year."

'Full duplex' simultaneous input/output processing

At the heart of this announcement is a fundamental shift in how AI perceives time and presence. Current frontier models typically experience reality in a single thread: they wait for a user to finish an input before they begin processing, and their perception freezes while they generate a response. In their blog post, the Thinking Machines researchers describe the status quo as a limitation that forces humans to "contort themselves" to AI interfaces, phrasing questions like emails and batching their thoughts.

To solve this "collaboration bottleneck," Thinking Machines has moved away from the standard alternating token sequence. Instead, it uses a multi-stream, micro-turn design that processes 200ms chunks of input and output simultaneously. This "full-duplex" architecture allows the model to listen, talk, and see in real time, enabling it to backchannel while a user speaks or interject when it notices a visual cue, such as a user writing a bug in a code snippet or a friend entering a video frame.

Technically, the model uses encoder-free early fusion. Rather than relying on massive standalone encoders like Whisper for audio, the system takes in raw audio signals as dMel and image patches (40x40) through a lightweight embedding layer, co-training all components from scratch within the transformer.

Dual model system

The research preview introduces TML-Interaction-Small, a 276-billion-parameter Mixture-of-Experts (MoE) model with 12 billion active parameters. Because real-time interaction requires near-instantaneous response times that often conflict with deep reasoning, the company has architected a two-part system:

The Interaction Model: stays in a constant exchange with the user, handling dialog management, presence, and immediate follow-ups.
The Background Model: an asynchronous agent that handles sustained reasoning, web browsing, or complex tool calls, streaming results back to the interaction model to be woven naturally into the conversation.

This setup allows the AI to perform tasks like live translation or generating a UI chart while continuing to listen to user feedback, a capability demonstrated in the announcement video where the model provided typical human reaction times for various cues while simultaneously generating a bar chart.

Impressive performance on major benchmarks against other leading AI labs' fast interaction models

To prove the efficacy of this approach, the lab used FD-bench, a benchmark specifically designed to measure interaction quality rather than just raw intelligence. The results show that TML-Interaction-Small significantly outperforms existing real-time systems:

Responsiveness: it achieved a turn-taking latency of 0.40 seconds, compared to 0.57s for Gemini-3.1-flash-live and 1.18s for GPT-realtime-2.0 (minimal).
Interaction quality: on FD-bench V1.5, it scored 77.8, nearly doubling the scores of its primary competitors (GPT-realtime-2.0 minimal scored 46.8).
Visual proactivity: in specialized tests like RepCount-A (counting physical repetitions in video) and ProactiveVideoQA, Thinking Machines' model successfully engaged with the visual world while other frontier models remained silent or provided incorrect answers.

Metric                      | TML-Interaction-Small | GPT-realtime-2.0 (min) | Gemini-3.1-flash-live (min)
Turn-taking latency (s)     | 0.40                  | 1.18                   | 0.57
Interaction quality (avg)   | 77.8                  | 46.8                   | 54.3
IFEval (VoiceBench)         | 82.1                  | 81.7                   | 67.6
Harmbench (refusal %)       | 99.0                  | 99.5                   | 99.0

A potentially huge boon to enterprises, once the models are made available

If made available to the enterprise sector, Thinking Machines' interaction models would represent a fundamental shift in how businesses integrate AI into their operational workflows. A native interaction model like TML-Interaction-Small allows for several enterprise capabilities that are currently impossible or highly brittle with standard multimodal models:

Current enterprise AI requires a "turn" to be completed before it can analyze data. In a manufacturing or lab setting, a native interaction model can monitor a video feed and proactively interject the moment it detects a safety violation or a deviation from a protocol, without waiting for the worker to ask for feedback. The model's success in visual benchmarks like RepCount-A (accurate repetition counting) and ProactiveVideoQA (answering questions as visual evidence appears) suggests it could serve as a real-time auditor for high-stakes physical tasks.

The primary friction in voice-based customer service is the 1-2 second "processing" delay common in 2026's standard APIs. Thinking Machines' model achieves a turn-taking latency of 0.40 seconds, roughly the speed of a natural human conversation. Because it handles simultaneous speech natively, an enterprise support bot could listen to a customer's frustration, provide "backchannel" cues (like "I see" or "mm-hmm") without interrupting the user, and offer live translation that feels like a natural conversation rather than a series of disjointed recordings.

Standard LLMs lack an internal clock; they "know" time only if it is provided in a text prompt. Interaction models are natively time-aware, allowing them to manage time-sensitive processes like "Remind me to check the temperature every 4 minutes" or "Alert me if this process takes longer than the last one".
This is critical for industrial maintenance and pharmaceutical research where timing is an essential variable.

Background on Thinking Machines

This release marks the second major milestone for Thinking Machines following the October 2025 launch of Tinker, a managed API for fine-tuning language models that lets researchers and developers control their data and training methods while Thinking Machines handles the infrastructure burden of distributed training. The company said Tinker supports both small and large open-weight models, including mixture-of-experts models, and early users included groups at Princeton, Stanford, Berkeley and Redwood Research.

At launch in early 2025, Thinking Machines framed itself as an AI research and product company trying to make advanced AI systems "more widely understood, customizable and generally capable." In July 2025, Thinking Machines said it had raised about $2 billion at a $12 billion valuation in a round led by Andreessen Horowitz, with participation from Nvidia, Accel, ServiceNow, Cisco, AMD and Jane Street, described by WIRED as the largest seed funding round in history. The Wall Street Journal reported in August 2025 that rival tech CEO Mark Zuckerberg approached Murati about acquiring Thinking Machines Lab and, after she declined, Meta pursued more than a dozen of the startup's roughly 50 employees.

In March and April 2026, the company also became known for its compute ambitions: it announced an Nvidia partnership to deploy at least one gigawatt of next-generation Vera Rubin systems, then expanded its Google Cloud relationship to use Google's AI Hypercomputer infrastructure with Nvidia GB300 systems for model research, reinforcement learning workloads, frontier model training and Tinker.

By April 2026, Business Insider reported that Meta had hired seven founding members from Thinking Machines, including Mark Jen and Yinghai Lu, while another Thinking Machines researcher, Tianyi Zhang, also moved to Meta. The same reporting said Joshua Gross, who helped build Thinking Machines' flagship fine-tuning product Tinker, had joined Meta Superintelligence Labs, and that the company had grown to about 130 employees despite the departures. Thinking Machines was not simply losing people, however: it also hired Meta veteran Soumith Chintala, creator of PyTorch, as CTO, and added other high-profile technical talent such as Neal Wu. TechCrunch separately reported in April 2026 that Weiyao Wang, an eight-year Meta veteran who worked on multimodal perception systems, had joined Thinking Machines, underscoring that the talent flow was not one-way.

Thinking Machines previously stated it was committed to "significant open source components" in its releases to empower the research community. It's unclear if these new interaction models will fall under the same ethos and release terms. But one thing is certain: by making interactivity native to the model, Thinking Machines believes that scaling a model will now make it both smarter and a more effective collaborator.
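For a concrete sense of what a full-duplex, micro-turn loop means mechanically, here is a toy asyncio skeleton: input ingestion and output generation run as concurrent 200ms-tick streams rather than alternating turns. This illustrates the interaction pattern only; it implements none of Thinking Machines' actual architecture, and the chunk contents are placeholders.

```python
import asyncio
import itertools

CHUNK_S = 0.2  # the 200ms micro-turn described above

async def listen(inbox):
    """Ingest one 200ms chunk of user input per micro-turn."""
    for chunk in itertools.count():
        await asyncio.sleep(CHUNK_S)
        inbox.append(f"user-chunk-{chunk}")
        if chunk == 9:  # ten chunks, then the user stops talking
            return

async def speak(inbox):
    """Emit output each micro-turn, conditioned on everything heard so far,
    without waiting for the user's turn to end (full duplex)."""
    while True:
        await asyncio.sleep(CHUNK_S)
        heard = len(inbox)
        print(f"[{heard} chunks heard] model: mm-hmm / partial response")
        if heard >= 10:
            return

async def main():
    inbox = []
    # Both streams run simultaneously instead of alternating turns.
    await asyncio.gather(listen(inbox), speak(inbox))

asyncio.run(main())
```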
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria
arXiv:2605.08354v1 Announce Type: new Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies and exposing vulnerabilities to reward hacking. While recent Rubrics-as-Reward (RaR) methods attempt to recover this structure through explicit criteria, generating rubrics that are simultaneously reliable, scalable, and data-efficient remains an open problem. We introduce Auto-Rubric as Reward (ARR), a framework that reframes reward modeling from implicit weight optimization to explicit, criteria-based decomposition. Before any pairwise comparison, ARR externalizes a VLM's internalized preference knowledge as prompt-specific rubrics, translating holistic intent into independently verifiable quality dimensions. This conversion of implicit preference structure into inspectable, interpretable constraints substantially suppresses evaluation biases including positional bias, enabling both zero-shot deployment and few-shot conditioning on minimal supervision. To extend these gains into generative training, we propose Rubric Policy Optimization (RPO), which distills ARR's structured multi-dimensional evaluation into a robust binary reward, replacing opaque scalar regression with rubric-conditioned preference decisions that stabilize policy gradients. On text-to-image generation and image editing benchmarks, ARR-RPO outperforms pairwise reward models and VLM judges, demonstrating that explicitly externalizing implicit preference knowledge into structured rubrics achieves more reliable, data-efficient multimodal alignment, revealing that the bottleneck is the absence of a factorized interface, not a deficit of knowledge.
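To make the ARR/RPO recipe concrete, here is a toy sketch of a rubric-decomposed reward: a holistic preference is broken into named, independently checkable criteria, and the per-criterion verdicts are collapsed into a binary signal. The rubric items and the ask_vlm judge are hypothetical stand-ins; the paper generates prompt-specific rubrics with the VLM itself.

```python
# Illustrative sketch of the Rubrics-as-Reward idea: score candidates
# criterion-by-criterion, then reduce to a binary preference suitable
# for policy optimization. Not the paper's reference implementation.
from typing import Callable


def rubric_reward(
    prompt: str,
    candidate_a: str,
    candidate_b: str,
    ask_vlm: Callable[[str], bool],   # hypothetical yes/no VLM judge call
) -> int:
    """Return +1 if A is preferred, -1 if B is preferred, 0 on a tie."""
    # Step 1 (ARR): externalize prompt-specific criteria. Hard-coded here
    # for brevity; the paper elicits these from the VLM per prompt.
    rubric = [
        "faithfully depicts every object mentioned in the prompt",
        "respects the spatial relations stated in the prompt",
        "is free of rendering artifacts",
    ]
    # Step 2: score each candidate against each criterion independently.
    score_a = sum(ask_vlm(f"Does {candidate_a} {c}? Prompt: {prompt}") for c in rubric)
    score_b = sum(ask_vlm(f"Does {candidate_b} {c}? Prompt: {prompt}") for c in rubric)
    # Step 3 (RPO): collapse the structured evaluation into a binary reward.
    return (score_a > score_b) - (score_a < score_b)
```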
Analysis | See the hidden rules behind AI. Then use them to rewrite this article. - Washington Post
Behind the scenes, artificial intelligence companies invisibly add thousands of words of instructions to every conversation you have with a chatbot to steer its behavior. They include phrases like “Aim for readable, accessible responses” and “You must avoid providing …
ZAYA1-74B-Preview: Scaling Pretraining on AMD
Zyphra's ZAYA1-74B-Preview is an open-weights MoE large language model trained end-to-end on AMD infrastructure, currently available as a pre-RL preview.
Forecasting Residential Heating and Electricity Demand with Scalable, High-Resolution, Open-Source Models
arXiv:2505.22873v2 Announce Type: replace Abstract: We present a novel framework for high-resolution forecasting of residential heating demand and non-heating electricity demand using probabilistic deep learning models. Because our models are trained on electricity consumption from a predominantly gas-heated region, the learned electricity demand patterns primarily reflect non-heating end uses such as lighting, appliances, and cooling. We focus specifically on providing hourly building-level electricity and heating demand forecasts for the residential sector. Leveraging multimodal building-level information -- including data on building footprint areas, heights, nearby building density, nearby building size, land use patterns, and high-resolution weather data -- and probabilistic modeling, our methods provide granular insights into demand heterogeneity. Validation at the building level underscores a step change improvement in performance relative to NREL's ResStock model, which has emerged as a research community standard for residential heating and electricity demand characterization. In building-level heating and electricity estimation backtests, our probabilistic models respectively achieve RMSE scores 18.8% and 27.6% lower than those based on ResStock, with probabilistic forecast quality measured via WIS improving by 59% for both applications. By offering an open-source, scalable, high-resolution platform for demand estimation and forecasting, this research advances the tools available for policymakers and grid planners, contributing to the broader effort to decarbonize the U.S. building stock and meeting climate objectives.
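For context (our addition, not from the paper): WIS is the weighted interval score of Bracher et al. (2021), a standard proper scoring rule for quantile forecasts. With predictive median m and central (1 − α_k) prediction intervals [l_k, u_k]:

```latex
% Interval score for the central (1 - \alpha) prediction interval [l, u]:
\mathrm{IS}_{\alpha}(F, y) = (u - l)
  + \tfrac{2}{\alpha}\,(l - y)\,\mathbf{1}\{y < l\}
  + \tfrac{2}{\alpha}\,(y - u)\,\mathbf{1}\{y > u\}
% Weighted interval score over K intervals and the predictive median m:
\mathrm{WIS}(F, y) = \frac{1}{K + 1/2}\Big( \tfrac{1}{2}\,\lvert y - m \rvert
  + \sum_{k=1}^{K} \tfrac{\alpha_k}{2}\,\mathrm{IS}_{\alpha_k}(F, y) \Big)
```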
Where Reliability Lives in Vision-Language Models: A Mechanistic Study of Attention, Hidden States, and Causal Circuits
arXiv:2605.08200v1 Announce Type: new Abstract: A pervasive intuition holds that vision-language models (VLMs) are most trustworthy when their attention maps look sharp: concentrated attention on the queried region should imply a confident, calibrated answer. We test this Attention-Confidence Assumption directly. We instrument three open-weight VLM families (LLaVA-1.5, PaliGemma, Qwen2-VL; 3-7B parameters) with a unified mechanistic pipeline -- the VLM Reliability Probe (VRP) -- that compares attention structure, generation dynamics, and hidden-state geometry against a single correctness label. Three results emerge. (i) Attention structure is a near-zero predictor of correctness (R_pb(C_k,y)=0.001, 95% CI [-0.034,0.036]; R_pb(H_s,y)=-0.012, [-0.047,0.024] on a pooled n=3,090 split), even though attention remains causally necessary for feature extraction (top-30% patch masking drops accuracy by 8.2-11.3 pp, p0.95 on POPE for two of three families, and self-consistency at K=10 is the strongest behavioral predictor we measure at 10x inference cost (R_pb=0.43). (iii) Causal neuron-level ablations expose a sharp architectural split with direct monitor-design implications: late-fusion LLaVA concentrates reliability in a fragile late bottleneck (-8.3 pp object-identification accuracy after top-5 probe-neuron ablation), whereas early-fusion PaliGemma and Qwen2-VL distribute it widely and absorb destruction of ~50% of their peak-layer hidden dimension with <=1 pp degradation. The takeaway is narrow but consequential: in 3-7B VLMs, reliability is read more reliably off hidden-state geometry, layer-wise margin formation, and sparse late-layer circuits than off attention-map sharpness.
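The headline statistic, a point-biserial correlation between a continuous attention measure and a binary correctness label, is easy to reproduce. A sketch on synthetic data (ours, not the paper's code):

```python
# Back-of-envelope version of the paper's R_pb statistic: correlate a
# continuous attention measure with binary correctness. The data here
# is synthetic and independent by construction, mimicking the paper's
# finding that attention sharpness carries almost no signal.
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(0)
n = 3090                                   # matches the pooled split size
correct = rng.integers(0, 2, size=n)       # binary correctness labels
attn_concentration = rng.normal(size=n)    # drawn independently of labels

r_pb, p_value = pointbiserialr(correct, attn_concentration)
print(f"R_pb = {r_pb:.3f}, p = {p_value:.3f}")   # ~0 by construction
```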
On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective
arXiv:2605.08368v1 Announce Type: new Abstract: Debates about large language model post-training often treat supervised fine-tuning (SFT) as imitation and reinforcement learning (RL) as discovery. But this distinction is too coarse. What matters is whether a training procedure increases the probability of behaviors the pretrained model could already produce, or whether it changes what the model can practically reach. We argue that post-training research should distinguish between capability elicitation and capability creation. We make this distinction operational by introducing the notion of accessible support: the set of behaviors that a model can practically produce under finite budgets. Post-training that reweights behaviors within this support is capability elicitation; whereas changing the support itself corresponds to capability creation. We develop this argument through a free-energy view of post-training. SFT and RL can both be seen as reweighting a pretrained reference distribution, only with different external signals. Demonstration signals define low-energy behavior for SFT, and reward signals define low-energy behavior for RL. When the update remains close to the base model, the main effect is local reweighting, not capability creation. Within this framework, the central question is no longer whether post-training is framed as SFT or RL, but whether it reweights behaviors already within reach, or instead expands the model's reachable behavioral space through search, interaction, tool use, or the incorporation of new information.
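The "free-energy view" the authors invoke has a familiar closed form. Under the standard KL-regularized objective (a textbook result, not something taken from this paper), the optimal post-trained policy is an exponential reweighting of the reference model:

```latex
% Standard KL-regularized post-training objective:
\max_{\pi}\; \mathbb{E}_{y \sim \pi(\cdot\mid x)}\big[r(x,y)\big]
  \;-\; \beta\,\mathrm{KL}\big(\pi(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)
% Its optimum reweights the reference distribution exponentially:
\pi^{*}(y\mid x) \;=\; \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\,
  \exp\!\big(r(x,y)/\beta\big)
```

Because π* is proportional to π_ref, any behavior with (near-)zero probability under the base model stays (near-)zero after the update: in the authors' terms, KL-anchored post-training reweights the accessible support rather than expanding it.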
Belief or Circuitry? Causal Evidence for In-Context Graph Learning
arXiv:2605.08405v1 Announce Type: new Abstract: How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy graph random-walk across two competing graph structures. This task's answer is, in principle, decidable: either the model tracks global topology, or it copies local transitions. We present two lines of evidence that neither account alone is sufficient. First, reconstructing the internal representation structure via PCA reveals that at intermediate mixture ratios, both graph topologies are encoded in orthogonal principal subspaces simultaneously. This pattern is difficult to reconcile with purely local transition copying. Second, residual-stream activation patching and graph-difference steering causally intervene on this graph-family signal: late-layer patching almost fully transfers the clean graph preference, while linear steering moves predictions in the intended direction and fails under norm-matched and label-shuffled controls. Taken together, our findings are most consistent with a dual-mechanism account in which genuine structure inference and induction circuits operate in parallel.
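Residual-stream activation patching of the kind described takes only a few lines of PyTorch hooks. A generic sketch, assuming a Hugging Face-style decoder whose blocks live at model.model.layers and equal-length clean/corrupt inputs; this is not the paper's code:

```python
# Cache a late-layer hidden state from a "clean" run, then overwrite the
# same layer's output in a "corrupted" run and check whether the clean
# behavior transfers. Layer paths vary by model family.
import torch


@torch.no_grad()
def patch_late_layer(model, clean_ids, corrupt_ids, layer_idx: int):
    layer = model.model.layers[layer_idx]   # path is model-dependent
    cache = {}

    def save_hook(_mod, _inp, out):
        # Decoder layers typically return a tuple (hidden_states, ...).
        cache["h"] = (out[0] if isinstance(out, tuple) else out).clone()

    def patch_hook(_mod, _inp, out):
        # Returning a value from a forward hook replaces the output.
        if isinstance(out, tuple):
            return (cache["h"],) + out[1:]
        return cache["h"]

    handle = layer.register_forward_hook(save_hook)   # 1) cache clean run
    model(clean_ids)
    handle.remove()

    handle = layer.register_forward_hook(patch_hook)  # 2) patch corrupt run
    logits = model(corrupt_ids).logits
    handle.remove()
    return logits                                     # compare vs. unpatched
```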
Singapore mobilizes whole-of-country response to frontier AI cyber threats
Singapore has directed critical infrastructure and telecommunications operators to bolster cybersecurity in response to the growing threat of AI-accelerated cyberattacks.
China's cybersecurity AI charges ahead despite US model lockout
AI is upending the world of cybersecurity, as more capable models and agentic capabilities bring about new ways to exploit vulnerabilities, along with new ways to discover and patch them. While major US AI companies have so far led the race, Chinese firms are also using AI to stay competitive ...
Google Says Criminal Hackers Used A.I. to Find a Major Software Flaw
The company said that it had identified, for the first time, hackers using artificial intelligence to discover an unknown bug. The attempted attack represents “a taste of what’s to come,” one expert said.
AI-powered hacking has exploded into industrial-scale threat, Google says
Criminal groups and state-linked actors appear to be using commercial models to refine and scale up attacks. In just three months, AI-powered hacking has gone from a nascent problem to an industrial-scale threat, according to a report from Google. The findings from Google's threat intelligence group add to an intensifying global discussion about how the newest AI models are extremely adept at coding and are becoming extremely powerful tools for exploiting vulnerabilities in a broad array of software systems.
AI tool poisoning exposes a major flaw in enterprise agent security
Researchers have identified a vulnerability where AI tools can be 'poisoned,' creating significant security risks for enterprise-level agents.
The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play
arXiv:2605.08427v1 Announce Type: new Abstract: Self-play red-teaming is an established approach to improving AI safety in which different instances of the same model play attacker and defender roles in a zero-sum game, i.e., where the attacker tries to jailbreak the defender; if self-play converges to a Nash equilibrium, the model is guaranteed to respond safely within the settings of the game. Although the parameter sharing enforced by the use of the same model for the two roles improves stability and performance, it introduces fundamental theoretical and architectural limitations. We show that the set of Nash equilibria that can be reached corresponds to a broad class of behaviours that includes trivial always-refuse strategies and oracle-like defenders, thus limiting practical applicability. We then show that when attacker and defender share and update the same base model, the dynamics collapse to self-consistency, so that attacks do not enforce adversarial pressure on the defender. In response, we propose Anchored Bipolicy Self-Play, which trains distinct role-specific LoRA adapters on top of a frozen base model, thereby maintaining stable optimisation while preserving adversarial pressure through explicit role separation. Relative to standard self-play, we show up to 100x greater parameter efficiency than full finetuning and consistent improvements in safety compared to self-play fine-tuned models. We evaluate on Qwen2.5-{3B,7B,14B}-IT models across widely used safety benchmarks, showing improved robustness without loss of reasoning ability. Cross-play experiments further show that our attacker and defender models are superior to self-play in terms of adversarial defence and safety.
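The core architectural move, two role-specific LoRA adapters over one frozen anchor, maps directly onto existing tooling. A sketch using Hugging Face peft, with illustrative hyperparameters rather than the paper's:

```python
# Two role-specific LoRA adapters ("attacker", "defender") on a single
# frozen base model, so self-play keeps a shared anchor while the roles
# are optimised separately. Hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
for p in base.parameters():
    p.requires_grad_(False)                 # freeze the shared anchor

lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, lora_cfg, adapter_name="attacker")
model.add_adapter("defender", lora_cfg)

model.set_adapter("attacker")               # generate jailbreak attempts
# ... attacker rollout, then update only the attacker adapter ...
model.set_adapter("defender")               # respond / refuse safely
# ... defender update under adversarial pressure from the attacker ...
```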
Exclusive: White Circle raises $11 million to stop AI models from going rogue in the workplace
The Paris startup, backed by leaders from OpenAI, Anthropic, DeepMind, Mistral, and Hugging Face, says companies need real-time tools to control what AI systems do after they are deployed.
OpenAI launching security AI initiative to compete with Claude Mythos
The system will focus on detecting and patching vulnerabilities before attackers can find and exploit them.
Council Post: Data Security Considerations For Building Enterprise AI Agents
Every time an agent acts on untrusted input, it creates an opportunity for that pipeline to be exploited.
Security teams are turning to AI to survive alert overload - Help Net Security
Cybersecurity teams are expanding AI adoption across threat detection, incident response and security operations workflows.
AI Is Involved in Most Modern Security Breaches: Report | Extremetech
AI is a major component of both cyberattacks and defenses.
Adoption, Deployment & Impact
Human Learning about AI
arXiv:2406.05408v3 Announce Type: replace Abstract: We study Human Projection (HP): people's tendency to evaluate AI using the same frameworks they use for humans -- treating features such as task difficulty and the reasonableness of mistakes as diagnostic of overall ability. We formalize HP and its consequences for equilibrium adoption, testing its predictions experimentally. First, people project human difficulty onto AI, overestimating performance on human-easy tasks, underestimating it on human-hard ones, and over-updating after easy failures and hard successes -- leading to systematic misspecification when AI performance is jagged rather than human-ordered. Second, HP interprets observed performance through a single ability index, inducing all-or-nothing adoption even when AI outperforms humans on only some tasks; experimentally stripping AI of human-like cues weakens cross-task generalization and reduces over-adoption. Finally, a field experiment with a parenting-advice chatbot shows that less humanly reasonable mistakes cause larger drops in trust and future engagement. Anthropomorphic AI design can amplify HP, misaligning beliefs and distorting adoption.
Fostering breakthrough AI innovation through customer-back engineering
Despite years of digitization, organizations capture less than one-third of the value expected from digital investments, according to McKinsey research. That’s because most big companies begin with technological capabilities and bolt applications onto them, rather than starting with customer needs and working backward to technology solutions. Not prioritizing the customer can create fragmented solutions; disjointed…
From Expansion to Consolidation: Socio-Spatial Contagion Dynamics in Off-Grid PV Adoption
arXiv:2605.09642v1 Announce Type: new Abstract: In traditional rural societies, where social ties are embedded in physical space, the diffusion of emerging technologies may be amplified through socio-spatial contagion (SSC). Such processes may play a key role in accelerating residential PV adoption in off-grid regions. Yet empirical evidence on SSC in PV adoption remains largely limited to affluent, grid-connected settings, while off-grid regions often lack systematic installation records. To address these gaps, we use a deep learning segmentation model to extract PV installations from a decade-long series of remote sensing imagery across 507 off-grid settlement clusters (hereafter, communities). This enables data-driven spatio-temporal point pattern inference of SSC in data-scarce contexts. SSC is quantified through the range and intensity of clustering of new installations around prior adopters, and the dynamics of these dimensions are linked to adoption outcomes. We found that SSC is nearly ubiquitous, often spanning most of the community's spatial extent, while exhibiting substantial heterogeneity in intensity. Although SSC intensifies over time, its effects remain temporally concentrated, peaking within 1 to 2 years of nearby installations and weakening thereafter. SSC intensity is positively associated with adoption rates in both cross-sectional and temporal analyses. However, the relationship between SSC range and adoption changes over time - in early diffusion phases, adoption growth is associated with range expansion, whereas in later phases it is associated with range contraction. This shift reflects a transition from clustering to consolidation of installations. These findings highlight the potential of seeding interventions to accelerate PV diffusion in off-grid regions.
Council Post: AI Infrastructure Is Scaling Fast. Decision-Making Isn’t
AI infrastructure is scaling faster than enterprise decision-making. And that gap is becoming the real bottleneck.
The Real Barriers to AI Adoption Aren’t What You Think | AVIXA Xchange
Budget constraints and ROI justification tend to dominate AI adoption conversations in boardrooms, at industry events, and in many vendor pitches.
86 pc Indian employees use AI, but ROI and governance lag: Report
While 86 per cent of employees in India use artificial intelligence at work, only 35 per cent say AI's return on investment has met or exceeded expectations.
Why AI governance is a European imperative - Raconteur
If the German Autobahn is the fastest road system in Europe, then AI is the technological equivalent, and we’re all in the fast lane. Experimental pilots are rapidly evolving into production-level deployments, delivering productivity gains and decision-making improvements.
Trustworthiness in Digital Twin Systems: Systematic Review and Research Horizons
arXiv:2605.08208v1 Announce Type: new Abstract: Digital Twins (DTs) are increasingly deployed across application domains, yet the treatment of trust-related issues remains unevenly addressed. To examine whether and how trust is discussed in the current landscape, we conducted a systematic review of existing DT review papers and a mapping of their abstracts. Seven trust-related challenges and seven trust-enhancing strategies were defined to guide the analysis, enabling the trust focus of each paper to be characterised. By aggregating the challenges and strategies referenced across domains, distinct patterns of emphasis were observed. With certain domains consistently sharing similar spectrum of trust concerns, four integration types, including human-centred, safety-critical, context-specific, and technologically-driven, were identified as emergent categories reflecting how trust is prioritised in different deployment contexts. Drawing on the characteristics of these types, several preliminary directions for future research were proposed. These include the development of trust-by-design principles to inform early-stage decision-making, the inclusion of trust metadata in platform schemas to prompt systematic developer consideration of trust factors, and the exploration of how architectural choices, such as federated DTs, influence user trust.
The skills gap quietly undermining your AI strategy | ITWeb
The AI conversation is dominated by tools, infrastructure and innovation – but without the skills to support it, it quickly becomes an operational risk.
Caterpillar targets mining skills gap with US$1M challenge - Canadian Mining Journal
As mining and heavy industry accelerate their adoption of automation, digitalization and advanced technologies, Caterpillar is looking to address another growing challenge […]
Inside the ‘architectural mismatch’ between AI capabilities & dealer needs | Auto Remarketing
Explore the latest trends and insights in the automotive remarketing industry. Stay updated on strategies for franchise and used car dealers.
Do City Delivery Drones Make Sense? No One Knows, but They're Flying Over NYC
A look at the current state of drone delivery services in New York City and the uncertainty surrounding their long-term viability and impact.
Podcast: Why the future of AI is hybrid and not cloud
In this episode, we explore why the next phase of AI could run across personal devices and discuss the industry's biggest challenges, including affordability and privacy.
When 'For You' Isn't For You: Measuring User Agency in TikTok's Algorithmic Feed
arXiv:2605.10690v1 Announce Type: new Abstract: The short-form video-sharing service TikTok has become an important platform in the social media landscape, with much of its popularity owed to its algorithmically-driven "For You Page" (FYP). This feature serves as the "home screen" for the platform and provides a personalized feed of content for each user. Unlike other social media services, where new users start their journey by explicitly signaling whom they choose to friend or follow, the TikTok FYP algorithm instead begins making inferences based on implicit signals, such as how long they watch particular videos. As a result, users have less explicit control over what content they see, and concerns have been raised about the impact on users (e.g., the delivery of potentially harmful content). In this work, we investigate the extent to which users have control over the content they see on the FYP on TikTok. We first develop novel techniques to study the TikTok mobile app, introducing a new avenue for conducting controlled experiments that enable us to send both explicit and implicit signals on the app. We then use these techniques to study the FYP algorithm based on accounts we control. We find that the FYP algorithm is sensitive to both types of signals, changing the amount of personalized content the account sees. However, we find that users may have difficulty convincing the FYP algorithm to stop showing content the user wishes to no longer see: the most effective explicit signal, marking a video as 'Not Interested', is unintuitively buried in the interface. Worse, we find that once accounts cease to indicate disinterest in a topic, many find their feeds dominated by such content again.
Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
arXiv:2605.08220v1 Announce Type: new Abstract: The automated extraction of data from scientific charts is a critical task for large-scale literature analysis. While multimodal Large Language Models (LLMs) show promise, their accuracy on non-standardized charts remains a challenge. This raises a key research question: which is the more effective strategy to improve model performance, high-level semantic priming or low-level spatial priming? This paper presents a comparative investigation into these two distinct strategies. We describe our exploratory experiments with semantic methods, such as a two-stage metadata-first framework and Chain-of-Thought, which failed to produce a statistically significant improvement. In contrast, we present a simple but highly effective spatial priming method: overlaying a coordinate grid onto the chart image before analysis. Our quantitative experiment on a synthetic dataset demonstrates that this grid-based approach provides a statistically significant reduction in data extraction error (SMAPE reduced from 25.5% to 19.5%, p < 0.05) compared to a baseline. We conclude that for the current generation of multimodal models, providing explicit spatial context is a more effective and reliable strategy than high-level semantic guidance for this class of tasks.
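The winning intervention is simple enough to try directly: draw a labelled coordinate grid over the chart before querying the model. A minimal Pillow sketch, with spacing, colours, and file names as our own illustrative choices:

```python
# Overlay a labelled coordinate grid on a chart image before sending it
# to a multimodal model, per the spatial-priming approach described above.
from PIL import Image, ImageDraw


def overlay_grid(path: str, step: int = 50) -> Image.Image:
    img = Image.open(path).convert("RGB")
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for x in range(0, w, step):             # vertical grid lines + labels
        draw.line([(x, 0), (x, h)], fill=(255, 0, 0), width=1)
        draw.text((x + 2, 2), str(x), fill=(255, 0, 0))
    for y in range(0, h, step):             # horizontal grid lines + labels
        draw.line([(0, y), (w, y)], fill=(255, 0, 0), width=1)
        draw.text((2, y + 2), str(y), fill=(255, 0, 0))
    return img                               # pass to the VLM with the prompt


overlay_grid("chart.png").save("chart_grid.png")   # hypothetical file names
```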
Million Tutoring Moves (MTM): An Open Multimodal Dataset for the Science of Tutoring
arXiv:2605.08092v1 Announce Type: new Abstract: We introduce the Million Tutoring Moves (MTM) project, an open dataset initiative aimed at advancing the science of tutoring through large-scale, reusable, and multimodal interaction data. MTM is developed within the National Tutoring Observatory (NTO), a research infrastructure designed to study authentic tutoring interactions and translate them into actionable insights for research, practice, and AI-powered educational technology development. In this paper, we present the vision behind MTM and describe MTM v1, an initial release consisting of 4,654 math tutoring transcripts from a U.S.-based nonprofit online tutoring platform. MTM v1 serves as a first step toward a broader repository that is safe, open, large-scale, broad-coverage, and multimodal. By making tutoring interactions systematically observable and analyzable, MTM aims to support research on instructional processes, improve tutoring practice, and enable the development of AI systems grounded in real educational interactions.
Rethinking the Architecture Firm for the AI Era | ArchDaily
Learn how AI-driven tools are transforming how architects work today, enhancing coordination and research through collaborative workflows.
This Gen Zer dropped out of college to become an influencer—now he’s a millionaire from selling products like Medicube and Neutrogena on TikTok Shop
Logan Walter, a 21-year-old content creator selling products on TikTok Shop, reels in a seven-figure monthly income making videos in his childhood bedroom.
Rotterdam’s Ditto raises €7.6 million to make “what did the doctor say?” easier to answer
Ditto, a Rotterdam-based HealthTech startup that has developed a free app that translates complex medical information into plain language, has raised €7.6 million for its European rollout. The round was led by Heal Capital, with participation from Optiverder and Rubio Impact Ventures. “No patient should have to guess what was just said. We are fundamentally […]
How AI and Machine Learning Can Provide Actionable Customer-Level Load Intelligence Without Advanced Metering Infrastructure
Utilities need customer-level hourly load visibility to manage EV growth, electrification, and rising peak demand. But most don’t have usable AMI data for this purpose - and even those that do often ...
Amazon staff use AI tool for unnecessary tasks to inflate usage scores
In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb company’s AI leaderboard
Understanding Student Effort Using Response-Time Propensities During Problem Solving
arXiv:2605.08943v1 Announce Type: new Abstract: Adaptive learning systems can produce substantial learning gains, yet many students engage for too brief or too superficial a period to benefit. A central obstacle is measuring effort. Effort during multi-step problem solving is rarely directly observed, and common log-based proxies, such as time on task, cannot distinguish between a student working carefully and a student encountering a harder problem. We examine step-to-step response time as a scalable effort signal by modeling trait-like differences in students' typical response timing during tutoring (while adjusting for skill difficulty). Using step-level logs from eight classroom deployments of algebra tutoring systems (2020 to 2023) across six U.S. schools (794 students), we estimate student- and knowledge-component-level propensities using hierarchical models and relate them to learning efficiency, defined as performance improvement per completed solution step. Response-time propensities show moderate to strong stability within students, supporting their use as an individual differences measure beyond correctness. At the same time, their relationship to learning is not uniform but conditional on the learner and context. Slower propensities predict greater learning efficiency for higher-proficiency students, consistent with constructive processing, whereas for lower-proficiency students, slower propensities are weakly related or even negative, consistent with unproductive struggle or idling. These associations are strongest early in practice sequences and attenuate later in the class period, highlighting an actionable window for detecting emerging disengagement and low persistence. Overall, response-time propensities provide a practical way to incorporate temporal process data into learner models and to target adaptive supports when effort is most diagnostic.
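The hierarchical-modeling step the authors describe can be approximated with off-the-shelf mixed-effects tooling. A minimal sketch with statsmodels on synthetic data, under assumed column names (log_rt, difficulty, student_id), not the paper's actual pipeline:

```python
# Estimate trait-like response-time propensities as student-level random
# effects on log response time, adjusting for skill difficulty. The data
# frame and column names are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_students, steps = 50, 40
df = pd.DataFrame({
    "student_id": np.repeat(np.arange(n_students), steps),
    "difficulty": rng.normal(size=n_students * steps),
})
student_trait = rng.normal(0, 0.5, n_students)          # latent propensity
df["log_rt"] = (1.0 + 0.3 * df["difficulty"]
                + student_trait[df["student_id"]]
                + rng.normal(0, 0.2, len(df)))

model = smf.mixedlm("log_rt ~ difficulty", df, groups=df["student_id"])
fit = model.fit()
# Per-student random intercepts play the role of response-time propensities:
propensities = {s: re.values[0] for s, re in fit.random_effects.items()}
```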
WebDevPro #139: The Developer’s Edge in an AI-Assisted Workflow
AI has changed the texture of web development work. A few years ago, most of us bounced between docs, Stack Overflow, GitHub issues, and half-finished notes in our own repos. Now, many developers open a chat window first. We ask for an explanation, a starter function, a refactor, a regex fix, ...
Learning on the Shop floor
Shopify's internal AI coding agent, River, demonstrates how Generative AI can serve as a collaborative workplace learning system rather than just a private productivity tool.
AI isn't paying off in the way companies think. Layoffs driven by automation are failing to generate returns, study finds | Fortune
A Gartner study found that while 80% of companies surveyed reported workforce reductions, there was no correlation to higher ROI.
What Leaders Get Wrong About the ROI of AI
“Too many leaders measure the ROI of AI through the lens of cost savings,” writes Katy George.
Benjamin Yuille - Enterprise Sales Leader | B2B SaaS ...
AI revtech is impressive. It works really well… Just not in true enterprise. It thrives in low ACV, transactional sales.
Chasing AI Speed? Measure Value-Latency Before Budget Cuts
Stakeholders don’t fund speed on its own. They fund impact they can verify, delivered on a timeline they can defend.
The CEO’s Guide To Getting ROI From AI
Also in the Forbes CEO newsletter: Debt reaches a troubling milestone, AI blamed for job cuts, Amazon gets into the logistics business.
Marketers strain to juggle media budgets, AI and high expectations from CEOs
A new survey reveals sustained pressure on budgets as CMOs struggle to deliver on marketing goals and AI objectives.
Geopolitics, Policy & Governance
Trump heads to China to spread the gospel of American tech while emulating Xi Jinping on AI
Tim Cook and Elon Musk, among other tech CEOs, will accompany the US president on a trip to China. Donald Trump is heading to China this week. If his guest list is any clue, he wants to discuss technology with Xi Jinping, though perhaps after the war in Iran. On Monday, news broke that outgoing Apple CEO Tim Cook, as well as SpaceX and Tesla CEO Elon Musk, would join the US president. Other guests from the tech sphere include Meta's recently appointed president, Dina Powell McCormick; Sanjay Mehrotra, CEO of computer memory maker Micron; Chuck Robbins, CEO of longtime telecom giant Cisco; and Cristiano Amon, CEO of semiconductor maker Qualcomm, according to a White House official.
Britain pays Starlink millions despite Musk's calls to overthrow UK government
Satellite service supports troops and Ukraine, but payments may raise eyebrows after boss's political broadsides
The US And China Are Negotiating AI’s Future – Is The Middle East’s Neutral Position Still Tenable? - TechRound
The US and China are in bilateral AI security talks. Gulf sovereign wealth funds are backing both sides. The Middle East's neutrality is about to be tested.
The world holds its breath as Trump-Xi summit approaches - Brownstone Worldwide
When U.S. President Donald Trump and China’s Xi Jinping come together for a crucial meeting in Beijing, the world will be watching closely as the two leaders negotiate on a wide range of pressing issues. The agenda includes discussions on trade, technology, rare earth export controls, Taiwan, ...
Koreans should all get an AI bonus, says presidential adviser
Samsung and SK Hynix stocks dip after policy chief comments
Sovereign cloud is only possible if you’re Chinese or American: Gartner
Gartner suggests that true sovereign cloud capabilities are currently limited to China and the US, posing challenges for European organizations concerned about data control.
Governments can’t agree on what AI actually is
One core reason that global action around AI has been poor is that the world cannot agree on what AI is. First, there is clearly a definitional problem. When some people refer to artificial intelligence, they think almost exclusively about ChatGPT or large language models.
NeurIPS Should Require Reproducibility Standards for Frontier AI Safety Claims
arXiv:2605.08192v1 Announce Type: new Abstract: Frontier AI safety claims - published assertions that a highly capable general-purpose model is below a threshold of concern, adequately mitigated, or suitable for release - increasingly shape model deployment, governance, and public trust. Yet the artefacts needed to evaluate them are routinely withheld, producing an evidential inversion: the most consequential claims in AI safety are often the least reproducible. This position paper argues that NeurIPS should require reproducibility standards for papers making such claims, treating non-reproducibility not as a transparency preference but as an evaluation-methodology failure. The 2026 International AI Safety Report [Bengio et al., 2026] concludes that reliable pre-deployment safety testing has become harder to conduct and that models now distinguish test from deployment contexts; the 2025 Foundation Model Transparency Index [Wan et al., 2025] reports a sector-average transparency score of 40/100 with no major developer adequately disclosing train-test overlap; contemporaneous measurement-theory work shows that attack-success-rate comparisons across systems are often founded on low-validity measurements [Chouldechova et al., 2025]. We propose a three-tier disclosure framework, distinguishing public, controlled, and claim-restricted disclosure, paired with a mandatory claim inventory, scope statements, and a phased implementation path with graduated sanctions. The framework treats secrecy and openness as endpoints of a spectrum, with controlled review (via a federated colloquium of qualified secure-review hosts) covering claims whose artefacts cannot be released publicly, and right-scaling claims whose artefacts cannot be reviewed even confidentially. The standard the community applies to its most consequential claims should be at least as high as the standard it applies to its least.
EU says OpenAI offers to open access to cybersecurity model, Anthropic not there yet | Reuters
The European Commission on Monday welcomed an offer by U.S. artificial intelligence giant OpenAI to provide open access to its cybersecurity features, but said its rival Anthropic has not yet gone so far.
Social Policy of Large Language Models: How GPT, Claude, DeepSeek and Grok Allocate Social Budgets in Spain and Germany
arXiv:2605.10234v1 Announce Type: new Abstract: We study how four widely used large language models, namely Claude, GPT-4o, DeepSeek and Grok, distribute a fixed national social budget across twelve macro-areas of public expenditure under two European national contexts, Spain and Germany. Each combination of model and country is queried six times under identical prompts and generation parameters, producing forty-eight independent allocations that are compared against approximate Organisation for Economic Co-operation and Development (OECD) reference budgets and against each other. We formalise five hypotheses regarding geopolitical bias, housing under-allocation, structural convergence, sensitivity to national context, and under-representation of politically sensitive categories. The differences between models are then validated through Kruskal-Wallis tests on each macro-area, with post-hoc Mann-Whitney U comparisons under Bonferroni correction, and complemented by an analysis of pairwise Pearson correlations and a lexical examination of the textual justifications produced by each model. The results show that all four models share a systematic implicit social policy that diverges from real European spending structures: pensions are under-allocated by a factor close to three, while housing and employment are over-allocated by factors of four and two respectively. The principal axis of differentiation between models is not geopolitical, since Claude and DeepSeek are the most correlated pair across both countries, but rather a contrast between concentration and dispersion of the budget. Only Claude exhibits substantive sensitivity to the national context. The conclusions delimit the conditions under which language models may responsibly support, but not replace, expert deliberation in public budgeting.
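The statistical validation described is standard nonparametric testing. For readers who want to replicate the design, a SciPy sketch on synthetic allocations (six runs per model, as in the study; the location parameters are made up):

```python
# Omnibus Kruskal-Wallis test per macro-area, followed by pairwise
# Mann-Whitney U comparisons under Bonferroni correction, mirroring
# the paper's pipeline on synthetic per-model budget allocations.
from itertools import combinations
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(1)
models = {m: rng.normal(loc, 1.0, size=6)        # six runs per model
          for m, loc in [("GPT-4o", 10), ("Claude", 12),
                         ("DeepSeek", 12), ("Grok", 9)]}

h, p = kruskal(*models.values())                 # omnibus test for one area
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

pairs = list(combinations(models, 2))
alpha = 0.05 / len(pairs)                        # Bonferroni correction
for a, b in pairs:
    _, p_ab = mannwhitneyu(models[a], models[b])
    verdict = "significant" if p_ab < alpha else "n.s."
    print(f"{a} vs {b}: p={p_ab:.4f} ({verdict} at {alpha:.4f})")
```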
US communications regulator targets Chinese tech for security risks
FCC chair Brendan Carr cracks down on goods from drones to routers despite trade thaw with Beijing
Alignment as Jurisprudence
arXiv:2605.08416v1 Announce Type: new Abstract: Jurisprudence, the study of how judges should properly decide cases, and alignment, the science of getting AI models to conform to human values, share a fundamental structure. These seemingly distant fields both seek to predict and shape how decisions by powerful actors, in one case judges and in the other increasingly powerful artificial intelligences, will be made in the unknown future. And they use similar tools of the specification and interpretation of language to try to accomplish those goals. The great debates of jurisprudence, about what the law is and what it should be, can provide insight into alignment, and lessons from what does and does not work in alignment can help make progress in jurisprudence. This essay puts the two fields directly into conversation. Drawing on leading accounts of jurisprudence, particularly Dworkin's principle-oriented interpretivism and Sunstein's positivist account of law as analogical reasoning, and on cutting-edge alignment approaches, namely Constitutional AI and case-based reasoning, it illustrates the value of a more sophisticated legally-inspired approach to the interplay of rules and cases in finetuning alignment and points to ways that AI can provide a better understanding of how the law works and how it can be improved by the introduction of AI. AI systems and the law should operate to empower people to act in the world, helping to expand their capabilities and the extent to which they are able to achieve their goals. As AI continues to improve in capacity, and as the constraints that legal theory places on human judges seem to be coming undone, the conversation between these two fields will become increasingly essential and may help point to a better version of both.
Starmer's Europe reset risks strangling UK AI sector with EU regulation
Keir Starmer's pledge to place "Britain at the heart of Europe" risks reigniting fears that closer EU ties could undo the UK's AI advantage.
EU Commission seeks feedback on AI transparency guidelines
The European Commission has opened a consultation on new transparency guidelines for AI, requiring disclosure when users interact with AI and the implementation of machine-readable marks.
Australian energy ministers mull national data-center rules
Australian energy ministers are set to consider changes to national energy policy in response to a rapid growth of data centers, including requirements for operators to invest in renewable power.
Meituan, Didi, Alibaba platforms revamp algorithms under CAC campaign
China's major lifestyle-services platforms have completed an initial round of self-inspections and rectification of problematic algorithms as part of a broader push to tighten oversight of the digital economy.
When Federal Agencies Pick AI Vendors, They Are Buying Different Policy Interpretations | TechPolicy.Press
Changing AI vendors may also change how government systems perform, say Paulo Carvão, Isabel Adler, Jeffrey Zhou and Claudio Mayrink Verdun.
US FTC offers Take it Down Act tips, warnings ahead of compliance deadline
The US FTC is reminding tech companies to establish notice and takedown processes for nonconsensual intimate images, including AI deepfakes, before the upcoming compliance deadline.
Clear AI Definitions Needed for New Regulation
Without clear definitions, governance is impossible.
Fair use drives discovery
Fair use is a foundational U.S. copyright principle that preserves access to information for transformative uses, allowing AI to learn from diverse data and accelerate breakthroughs.
California AG says states still hold authority to regulate content of online platforms
The California Attorney General argued in court that Section 230 does not prevent states from regulating online content, defending a state law against deceptive election communications.
Get the full executive brief
Receive curated insights with practical implications for strategy, operations, and governance.