Thu 25 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
Micron posts a 15-fold profit surge, the largest US power grid adds an emergency warning, and Hyundai workers vote to strike over robots
TL;DR: The largest US power grid is introducing a new emergency capacity warning as data center demand threatens electricity supply. Micron reported a 15-fold profit surge, while OpenAI and Broadcom unveiled a custom AI chip design, with OpenAI planning to deploy enough chips to draw 10 gigawatts of power. Hyundai workers in South Korea have voted to strike over automation fears, and Gartner warns that AI coding agents could cost more than the developers using them by 2028. Meanwhile, Alibaba shares hit a 16-month low after Anthropic accused the company of illicitly accessing its AI model.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Inflation has a new catalyst: America’s massive artificial-intelligence build-out is beginning to push up prices on everything from smartphones to electricity.
Demand for memory chips is pushing prices higher. Will AI’s promise of increased productivity come in time to temper that inflation?
OpenAI and Broadcom Unveil Custom A.I. Chip Design
The maker of ChatGPT plans to use enough chips to consume 10 gigawatts of electricity, an amount that could power millions of households.
Governing Technical Debt in Agentic AI Systems
arXiv:2605.29129v2 Announce Type: replace-cross Abstract: Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are not fully captured by traditional software or predictive ML technical debt. We define Agentic Technical Debt as the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed. We define Stochastic Tax as the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds. The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows. We outline how managers can make both visible through lightweight dashboards and governance controls.
Micron posts 15-fold profit surge in boost for global AI stocks
Chipmaker forecasts sustained demand for computer memory, boosting its shares and Asian markets
Qualcomm to Acquire AI Software Firm Modular in $3.9 Billion Stock Deal
Qualcomm agreed to acquire the AI software company Modular for about $3.9 billion, in a bid to make artificial intelligence faster and cheaper for its customers.
Hyundai workers in South Korea vote to strike over fears of robots replacing them
Union at country’s largest carmaker wants greater say over how AI and automation are introduced
How Large Language Models Source Brand Reputation Across Languages and Markets
arXiv:2606.25787v1 Announce Type: cross Abstract: When a large language model (LLM) answers a question about a company, it grounds the answer in retrieved web sources, and those sources decide what the model says. Most analysis of AI brand visibility looks at the answer text. This study looks one step earlier, at the citations. We merge three Rankfor.AI datasets covering 128 brands across 12 home markets and 13 languages, and analyse 167,551 URL-grounded citations (189,974 total attribution rows). We classify each citation by domain and source type and measure where AI gets its brand information, by language and by market. Four patterns hold. First, AI grounds brand answers overwhelmingly in third-party sources: 85.7% of citations point to sites the brand does not own, against 14.3% owned. Second, the source base is concentrated and long-tailed: 80% of citations come from about 18% of domains, fitting a Zipf law (alpha = 0.86, R^2 = 0.983). Third, one reference site dominates almost everywhere: Wikipedia is the most-cited domain in 11 of 12 languages, the exception being Lithuanian, where the business daily vz.lt edges it (4.38%). Fourth, the source mix is market-specific at the margin: for 46 Polish national brands the most-cited domain is YouTube, and four HR and careers portals supply 637 citations against 297 for Polish Wikipedia, about twice as many.
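The two concentration statistics in this abstract, the share of domains that supplies 80% of citations and a Zipf exponent with its R^2, can both be computed from per-domain citation counts. Below is a minimal Python sketch that assumes an ordinary least-squares fit of log(count) on log(rank); the study may use a different estimator, and the function names are hypothetical.

```python
import math
from typing import Sequence


def top_share_domains(counts: Sequence[int], target: float = 0.8) -> float:
    """Smallest fraction of domains whose combined citations reach `target` of the total."""
    ranked = sorted(counts, reverse=True)  # most-cited domains first
    total = sum(ranked)
    running = 0
    for i, c in enumerate(ranked, start=1):
        running += c
        if running >= target * total:
            return i / len(ranked)
    return 1.0


def fit_zipf(counts: Sequence[int]) -> tuple[float, float]:
    """OLS fit of log(count) = c - alpha * log(rank); returns (alpha, R^2).

    Assumption: a plain log-log regression, which may differ from the paper's method.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope, 1.0 - ss_res / ss_tot


# Synthetic counts drawn from an exact Zipf curve (illustrative, not the study's data).
synthetic = [round(1_000_000 / r ** 0.86) for r in range(1, 501)]
alpha, r_squared = fit_zipf(synthetic)
```

On real citation data, the fitted exponent and R^2 depend on how domains are normalised (for example, whether language editions of Wikipedia count as one domain or several).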
AI Data Center Growth Prompts New Capacity Advisory for US Largest Power Grid - Bloomberg
The largest US power grid is creating a new emergency warning as surging data-center demand pushes electricity supplies toward shortages beyond periods of extreme weather.
AI coding agents could soon cost more than the developers using them
Consumption-based pricing and scant cost controls are sending monthly bills into five figures, Gartner warns
Why Memory Components Fail: Eight Years of License and Sustainability Events in Open-Source Data Infrastructure
arXiv:2606.24896v1 Announce Type: cross Abstract: LLM agent memory is now treated as a first-class architectural component in five major surveys published between January and April 2026. None of these surveys treats project governance, capital structure, or license posture as architectural variables. We argue they are. In a constructed sample of 105 production-relevant open-source data-infrastructure and AI-tooling projects, we catalogue 38 license-and-sustainability events between 2018 and May 2026. About a quarter of the sample (24 percent) experienced at least one adverse event. The conditional rates split sharply by structure: 46 percent for single-vendor venture-backed projects, 2.5 percent for foundation-governed projects funded outside the venture cycle. The headline differential -- roughly nineteen-fold -- is invariant to the most contested coding choice in the catalogue; we show the sensitivity table in Section 7. A small subset of foundation-governed projects with venture-backed corporate stewards (n=3) contains one adverse event. The cell is too small for stable estimation, but it points to a mechanism: foundation governance may block unilateral relicensing while leaving distribution decisions to the steward. Annualized incidence within the catalogue rose from 2.7 to 4.2 events per year across the window. Counterfactuals -- PostgreSQL, pgvector, SQLite, Apache Kafka, Caddy -- each show stability arising from a different structural source: distributed copyright, absence of monetisation pressure, foundation governance with non-venture stewardship. We propose a six-field decision instrument for architects choosing memory components: governance, capital structure, license, foundation membership, fork-or-migration availability, and steward concentration.
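The conditional rates in this abstract are per-group proportions: projects with at least one adverse event divided by all projects sharing the same governance and capital structure, with the "differential" being the ratio of two such rates. A minimal sketch, assuming each project is recorded as a hypothetical `(structure, had_adverse_event)` pair; the toy data is illustrative and is not the paper's catalogue.

```python
from collections import defaultdict
from typing import Iterable


def conditional_rates(projects: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Share of projects in each structural group with at least one adverse event."""
    totals: dict[str, int] = defaultdict(int)
    adverse: dict[str, int] = defaultdict(int)
    for structure, had_event in projects:
        totals[structure] += 1
        adverse[structure] += int(had_event)
    return {group: adverse[group] / totals[group] for group in totals}


def rate_ratio(rates: dict[str, float], high: str, low: str) -> float:
    """Differential between two groups, e.g. venture-backed vs foundation-governed."""
    return rates[high] / rates[low]


# Hypothetical toy sample: 2 of 4 venture-backed projects and 1 of 8
# foundation-governed projects experienced an adverse event.
toy = (
    [("single_vendor_vc", True)] * 2
    + [("single_vendor_vc", False)] * 2
    + [("foundation_non_vc", True)]
    + [("foundation_non_vc", False)] * 7
)
rates = conditional_rates(toy)
ratio = rate_ratio(rates, "single_vendor_vc", "foundation_non_vc")
```

As the abstract notes for its n=3 cell, a ratio built on very small groups is unstable: one event more or less can move it substantially.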
Alibaba Slides to 16-Month Low After Anthropic’s AI Accusations
Alibaba Group Holding Ltd. shares slid to a 16-month low in Hong Kong after Anthropic PBC accused the Chinese technology giant of “illicitly” accessing its artificial intelligence model.
Economics & Markets
Qualcomm to Acquire AI Software Firm Modular in $3.9 Billion Stock Deal
Qualcomm agreed to acquire the AI software company Modular for about $3.9 billion, in a bid to make artificial intelligence faster and cheaper for its customers.
Micron Shares Jump as Forecast Beats Estimates
Micron Technology Inc. surged in late trading after its quarterly sales forecast crushed Wall Street estimates, signaling that an AI-fueled growth run remains strong. The largest US maker of computer memory chips said revenue will be approximately $50 billion in the fiscal fourth quarter, which runs through August. Bloomberg's Ruhell Amin breaks down the numbers. (Source: Bloomberg)
Memory Chipmaker SK Hynix's $29 Billion US Listing Seizes on AI Demand
SK Hynix Inc. is seeking 45.45 trillion won ($29.4 billion) in a US listing, tapping investor demand for high-flying memory-chip stocks even after a major selloff shook the group this week.
Micron posts 15-fold profit surge in boost for global AI stocks
Chipmaker forecasts sustained demand for computer memory, boosting its shares and Asian markets
Qualcomm to buy startup Modular for $4 billion in AI software push
Stocks Rally as Micron Revives AI Trade
Stocks in Asia climbed alongside US equity futures after Micron Technology Inc.’s blowout sales outlook reignited confidence in the artificial-intelligence trade. Bloomberg's Anthony Stephens has the context on how the AI sector is trending. (Source: Bloomberg)
Japan’s Kioxia Plans to Offer US Depositary Shares in Spring
Kioxia Holdings Corp. intends to offer US depositary receipts in the spring of 2027 to take advantage of runaway investor demand for exposure to AI-related semiconductor shares.
Japanese, South Korean Stocks Gain on Renewed Confidence in AI
Japanese and South Korean equities climbed, led by gains in the tech sector after Micron Technology Inc.’s upbeat forecast reignited confidence in the artificial-intelligence trade and brought relief to the market.
Bitcoin hits 20-month low as market sentiment sours
Price of world’s most actively traded digital asset falls below $60,000 amid shift by retail investors to AI-related stock bets
Every Major Tech Milestone of 2026 So Far (January–June, In Order) | by Lakshayagarwal | Jun, 2026 | Medium
SpaceX acquired xAI for a reported $250 billion, creating a vertically integrated structure spanning space infrastructure and frontier AI, a deal that quickly became a template other tech giants cite when discussing their own consolidation strategies.
LLMs Are No Longer AI’s Hottest Commodity. Here’s What Acquisitions Are Telling Investors About Which Companies Are In Highest Demand
Google's $32 billion purchase of Wiz, IBM's $27.8 billion data-stack acquisition, or Salesforce's $3.6 billion purchase of Fin, to name a few; last year, in fact, broke historic records for consolidation activity in AI, amounting to at least $157 billion in disclosed value through 33 major ...
Above the Noise: AI, markets, and momentum | IBKR Campus US
The AI story didn’t change. Investors’ interpretation did, and that shift has broadened the opportunity set beyond a handful of companies.
Tech stocks slump as AI bubble fears loom
Tech stocks declined as concerns over an AI bubble impacted market momentum, with memory chip and data storage companies suffering significant losses.
AI could unlock US$600 billion a year in climate and sustainability value by 2028
A report by Boston Consulting Group and Temasek says artificial intelligence is expanding climate investing beyond venture capital, creating opportunities across growth equity, buyouts and infrastructure capital.
AI Bubble Fears Trigger Global Stock Market Sell-Off in 2026
Global stock markets plunged in June 2026 as AI bubble fears triggered a tech sell-off. The Nasdaq fell 2.2%, South Korea's Kospi crashed 10%, and SpaceX…
Inflation has a new catalyst: America’s massive artificial-intelligence build-out is beginning to push up prices on everything from smartphones to electricity.
Demand for memory chips is pushing prices higher. Will AI’s promise of increased productivity come in time to temper that inflation?
AI Data Center Growth Prompts New Capacity Advisory for US Largest Power Grid - Bloomberg
The largest US power grid is creating a new emergency warning as surging data-center demand pushes electricity supplies toward shortages beyond periods of extreme weather.
Welcome to the Luxury City Built by Taiwan’s A.I. Boom
Fortunes, luxury buildings and birthrates are rising in the city at the center of Taiwan’s chip supply chain.
Treasury Concludes the Artificial Intelligence Innovation Series | U.S. Department of the Treasury
WASHINGTON—The U.S. Department of the Treasury’s Office of the Financial Stability Oversight Council (FSOC) and the Artificial Intelligence (AI) Transformation Office (AITO) held the fourth and final roundtable of the AI Innovation Series on May 19, 2026.
Carbon Farming: An Expository, Inter-Disciplinary Survey
arXiv:2603.20674v3 Announce Type: replace Abstract: Carbon farming is the collection of agricultural best practices specifically designed to maximize the capture and long-term storage of atmospheric carbon dioxide in soils and plant biomass, while simultaneously reducing greenhouse gas emissions from cultivation practices. Carbon farming can be viewed as a promising pathway to simultaneously address climate change mitigation, soil degradation, and farmer welfare. For example, if all agricultural cropland in India adopted carbon farming, it would offset about 50% of the country's annual transport-sector emissions. However, practical deployment of carbon farming is constrained by scientific challenges, inherent complexity, and fragmented understanding across disciplines. This inter-disciplinary, expository survey offers the first unified treatment of carbon farming for practitioners, policymakers, and researchers. The survey integrates insights from agronomy, soil science, climate science, measurement, reporting, and verification (MRV), economics, carbon markets, and policy design. We begin by establishing the conceptual foundations of soil organic carbon dynamics and agricultural carbon sequestration, and compare carbon farming with the paradigms of sustainable, regenerative, and organic agriculture. We then present a comprehensive landscape analysis of carbon-farming best practices, including both generic and crop-specific interventions, and systematically examine their co-benefits and trade-offs. The paper offers a rigorous review of MRV frameworks, emerging digital MRV technologies, and the carbon-credit project life cycle, followed by a structured analysis of voluntary and compliance carbon markets...
Why Asian Markets Are Nervous About AI Spending, Interest Rates and Oil Prices
Asian equities traded mixed on Wednesday after a global technology selloff fueled concerns about artificial intelligence spending.
Monetary Regimes and Trade before the Classical Gold Standard: Evidence from the Latin Monetary Union
arXiv:2510.25487v3 Announce Type: replace Abstract: This paper reexamines the trade effects of the Latin Monetary Union (LMU), a 19th century agreement to standardize gold and silver coinage among several European countries. The LMU provides a useful setting to study whether monetary arrangements fostered trade before the classical gold standard, when gold, silver, bimetallic, and paper regimes coexisted. Since some countries already shared other monetary standards, I classify pairs by regime and use historical bilateral trade flows and structural gravity modeling to estimate the LMU effect relative to pairs without a common arrangement. This brings the comparison closer to the one used in the literature on the gold standard and contemporary currency unions. The results suggest that the LMU increased trade between its members by approximately 30% during its early years, when bimetallism was still credible. These effects then faded, converging to zero by the end of the 1870s. The evidence is consistent with a temporary trade effect operating through common coinage rules, expectations of monetary cooperation, and the credibility of bimetallism.
India to Lead Human Skills Economy Amid AI Adoption
India is poised to lead the human skills economy with the world’s youngest workforce and highest AI adoption rate, according to a new report.
Why Memory Components Fail: Eight Years of License and Sustainability Events in Open-Source Data Infrastructure
arXiv:2606.24896v1 Announce Type: cross Abstract: LLM agent memory is now treated as a first-class architectural component in five major surveys published between January and April 2026. None of these surveys treats project governance, capital structure, or license posture as architectural variables. We argue they are. In a constructed sample of 105 production-relevant open-source data-infrastructure and AI-tooling projects, we catalogue 38 license-and-sustainability events between 2018 and May 2026. About a quarter of the sample (24 percent) experienced at least one adverse event. The conditional rates split sharply by structure: 46 percent for single-vendor venture-backed projects, 2.5 percent for foundation-governed projects funded outside the venture cycle. The headline differential -- roughly nineteen-fold -- is invariant to the most contested coding choice in the catalogue; we show the sensitivity table in Section 7. A small subset of foundation-governed projects with venture-backed corporate stewards (n=3) contains one adverse event. The cell is too small for stable estimation, but it points to a mechanism: foundation governance may block unilateral relicensing while leaving distribution decisions to the steward. Annualized incidence within the catalogue rose from 2.7 to 4.2 events per year across the window. Counterfactuals -- PostgreSQL, pgvector, SQLite, Apache Kafka, Caddy -- each show stability arising from a different structural source: distributed copyright, absence of monetisation pressure, foundation governance with non-venture stewardship. We propose a six-field decision instrument for architects choosing memory components: governance, capital structure, license, foundation membership, fork-or-migration availability, and steward concentration.
How Large Language Models Source Brand Reputation Across Languages and Markets
arXiv:2606.25787v1 Announce Type: cross Abstract: When a large language model (LLM) answers a question about a company, it grounds the answer in retrieved web sources, and those sources decide what the model says. Most analysis of AI brand visibility looks at the answer text. This study looks one step earlier, at the citations. We merge three Rankfor.AI datasets covering 128 brands across 12 home markets and 13 languages, and analyse 167,551 URL-grounded citations (189,974 total attribution rows). We classify each citation by domain and source type and measure where AI gets its brand information, by language and by market. Four patterns hold. First, AI grounds brand answers overwhelmingly in third-party sources: 85.7% of citations point to sites the brand does not own, against 14.3% owned. Second, the source base is concentrated and long-tailed: 80% of citations come from about 18% of domains, fitting a Zipf law (alpha = 0.86, R^2 = 0.983). Third, one reference site dominates almost everywhere: Wikipedia is the most-cited domain in 11 of 12 languages, the exception being Lithuanian, where the business daily vz.lt edges it (4.38%). Fourth, the source mix is market-specific at the margin: for 46 Polish national brands the most-cited domain is YouTube, and four HR and careers portals supply 637 citations against 297 for Polish Wikipedia, about twice as many.
Restoring Incentive Compatibility in Two-Stage Energy Markets with Prosumers
arXiv:2606.25910v1 Announce Type: cross Abstract: A central challenge in modern energy market design is the formulation of a strategy-proof imbalance settlement layer that secures both the economic efficiency of the institution and the stability of the power grid. Public data reveal that the day-ahead market is strategically biased below actual consumer demand. These empirical observations are explained by active prosumers, who have implementable incentives to under-report demand. Active prosumers buy energy in the day-ahead market and sell energy in the real-time market for balancing real-time energy deviations. By under-reporting their demand for the day ahead they inflate real-time imbalances and, under uniform pricing, they dispatch their generation assets more profitably. We model the two-stage institution under linear preferences and benchmark it against its associated competitive equilibria. We show that although consumers' incentives for demand under-reporting vanish when the day-ahead market scales, prosumers' incentives remain lower bounded by a positive gain which depends only on the real-time market generation stack and their shares over it. To restore incentive compatibility under the existing informational constraints, we design a leave-one-out contrastive scoring rule-based penalty that is implemented by the day-ahead market operator, incentivizes prosumers to report their demand truthfully and ensures small charges when participating honestly. We illustrate these results with numerical simulations on synthetic data and evaluate our mechanism on real-market data by first rationalizing demand reports as subjective equilibria of the induced game. Our mechanism demonstrates strong incentive alignment while retaining a low cost for honest participation.
New Hedge Funds Are Using AI Bots to Rival Industry Giants - Bloomberg
Advances in artificial intelligence are leveling the field for fund managers, making it easier for boutique firms to compete with big macro and bond investors.
Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation
arXiv:2606.24042v1 Announce Type: new Abstract: Recommender systems often induce filter bubbles and semantic homogenization by monolithically optimizing for immediate user engagement. Standard single-objective models, including traditional Deep Q-Networks, are ill-equipped to navigate the trade-offs between platform retention and critical societal values like information diversity and provider fairness. To address these limitations, we introduce a multi-objective reinforcement learning framework that formalizes recommendation as a semantic multi-objective Markov decision process. By integrating high-fidelity semantic embeddings with a Pareto-DQN agent, our architecture treats engagement, diversity, and fairness as distinct, non-aggregable reward signals, avoiding the pitfalls of static reward scalarization. Empirical evaluations on the MovieLens small dataset show that our hypervolume-based action selection disrupts the feedback loops responsible for semantic collapse. By sustaining high state-trajectory variance, the Pareto-DQN effectively maps the Pareto frontier, achieving gains in auxiliary societal objectives with only marginal impacts on engagement. This work provides a path toward intrinsically aligned, responsible recommender systems.
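The abstract does not spell out its hypervolume rule, but the general idea can be sketched. Treat each action's estimated Q-values for engagement, diversity, and fairness as a vector, keep the non-dominated actions, and prefer the action that contributes the most hypervolume relative to a reference point. The sketch below assumes that selection rule and computes hypervolume exactly by inclusion-exclusion; it is an illustration, not the authors' implementation, and all names are hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Sequence

Vector = tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if `a` is at least as good as `b` on every objective and better on one (maximisation)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Sequence[Vector]) -> list[Vector]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume(points: Sequence[Vector], ref: Vector) -> float:
    """Exact volume dominated by `points` above `ref`, via inclusion-exclusion.

    Cost grows as 2**len(points), so this is only practical for a handful of actions.
    """
    pts = [p for p in points if all(x > r for x, r in zip(p, ref))]
    total = 0.0
    for k in range(1, len(pts) + 1):
        for subset in combinations(pts, k):
            corner = [min(values) for values in zip(*subset)]  # intersection of the boxes
            total += (-1) ** (k + 1) * prod(c - r for c, r in zip(corner, ref))
    return total


def select_action(q_vectors: list[Vector], ref: Vector) -> int:
    """Index of the action whose Q-vector adds the most hypervolume (assumed rule)."""
    full = hypervolume(q_vectors, ref)
    contributions = [
        full - hypervolume(q_vectors[:i] + q_vectors[i + 1:], ref)
        for i in range(len(q_vectors))
    ]
    return max(range(len(q_vectors)), key=contributions.__getitem__)


# Hypothetical Q-vectors (engagement, diversity, fairness) for three candidate items.
q = [(2.0, 1.0, 1.0), (1.0, 2.0, 1.0), (1.0, 1.0, 1.0)]
front = pareto_front(q)
chosen = select_action(q, ref=(0.0, 0.0, 0.0))
```

A dominated action contributes zero hypervolume, so this rule never picks it; ties between equally useful actions go to the first one listed. For larger action sets, a dedicated hypervolume algorithm would replace the inclusion-exclusion loop.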
US remains global leader in AI, but China rapidly closing gap with cheaper models: JP Morgan - The Economic Times
The United States stands at the forefront of AI development, yet China is swiftly closing the gap with cost-efficient AI systems. Chinese companies are making headway in corporate environments by offering significantly lower operational costs than their American counterparts.
US AI Lead Narrows as China's Cheaper Models Surge
JP Morgan report finds US still dominates AI, but China is closing the gap with cost-efficient models. Enterprise adoption of Chinese AI is rising.
Microsoft-Inflection AI deal set for Brazil CADE fast-track review
Microsoft has formally notified Brazil’s competition authority of its 2024 partnership with Inflection AI, following a determination by the agency’s Tribunal that the transaction should be reviewed.
Startups Compete With Big Players Using AI | Jawlah
Startup investment firms are using artificial intelligence to redraw the rules of competition in the financial sector, leveraging advanced tools capable of executing tasks that previously required
For Most of the World, Open-Source AI Is the Only Way Forward - Techstrong.ai
Proprietary AI is both too expensive and too centralized in control for most countries and companies to rely upon. NYC -- Yann LeCun, one of the "Godfathers of AI," may have long been Meta's chief AI scientist, but at the United Nations Open Source Week in his keynote speech,
AI coding agents could soon cost more than the developers using them
Consumption-based pricing and scant cost controls are sending monthly bills into five figures, Gartner warns
OpenAI Codex bombards SSDs with needless write operations, costing millions
A logging implementation error in OpenAI Codex is causing excessive SSD write operations, leading to significant financial costs.
Gartner warns AI coding costs may top developer pay by 2028
Rising token use and usage-based pricing could make AI coding a bigger line item than developer salaries, Gartner said.
AI Coding Costs May Surpass Developer Salaries by 2028: Gartner
New Delhi, June 24: The cost of using artificial intelligence (AI) coding tools could exceed the average software developer’s salary by 2028 as organisations scale deployment of AI coding agents and...
Designing Recommendation Exposure and Favorite Lists: A Field Experiment in a Spot-Work Platform
arXiv:2606.17397v3 Announce Type: replace Abstract: How should recommender systems be designed when recommendations shape access to scarce, short-lived opportunities? We study this question in a production setting: Timee, Japan's largest platform for spot work, where workers favorite job templates and receive notifications when firms post shifts from those templates. Maximizing predicted favoriting can generate misdirected concentration: recommendations accumulate on popular templates that create few viable job openings, while templates with unmet labor demand receive too little exposure. We design exposure-control mechanisms for favorite-list management, reallocating template exposure based on posting activity and unfilled capacity. The proposed recommender, thresholded eligibility control (TEC), is fully parallelizable and suitable for large-scale digital platforms. In simulations calibrated to Timee data, TEC raises the per-round job-finding rate from 57.6% to 70.0%. A prefecture-level randomized field experiment increases realized matches and exposure per active template, reduces the share of low-exposure templates, and improves impression-level favoriting and downstream matching.
LLM Performance on a Real, Double-Marked GCSE Benchmark
arXiv:2606.24973v1 Announce Type: cross Abstract: We introduce a dataset of 32,534 double-marked real student responses to GCSE mock exams (GCSEs are the UK's national exams, taken at age ~16), spanning 328 questions across five subjects and including handwritten work. We test whether off-the-shelf large language models agree with examiners as closely as the two examiners agree with each other. We find that most models agree closely with the examiner consensus across subjects, and the top-performing models agree with examiners more closely than the examiners agree with each other. Models score highly on subjective tasks such as English essay marking and also handle complex, messy handwritten Maths scripts. Agreement clusters near the inter-examiner level and varies little with model size, which suggests that cost-effective automated marking is feasible.
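The benchmark's core comparison sets model-versus-examiner agreement against examiner-versus-examiner agreement. One way to sketch it is with quadratic weighted kappa (QWK), a common agreement statistic for ordinal marks. The abstract does not name its metric, so QWK here is an assumption, and the function name is hypothetical.

```python
def quadratic_weighted_kappa(a: list[int], b: list[int], min_score: int, max_score: int) -> float:
    """Agreement between two markers on an ordinal scale; 1.0 = perfect, 0.0 = chance level.

    Assumption: QWK is used for illustration; the paper may report a different statistic.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty lists of marks")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("mark scale must have at least two levels")
    # Observed confusion matrix between the two markers.
    observed = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    total = len(a)
    hist_a = [sum(row) for row in observed]  # marginal for marker a
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]  # marginal for marker b
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # larger penalty for bigger mark gaps
            expected = hist_a[i] * hist_b[j] / total  # agreement expected by chance
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - num / den
```

In this framing, a model "matches the examiner line" when `quadratic_weighted_kappa(model_marks, consensus_marks, 0, max_mark)` is close to `quadratic_weighted_kappa(examiner_1, examiner_2, 0, max_mark)` over the same scripts. How the consensus mark is formed from two examiners is not specified in the abstract.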
Machine learning is revolutionizing weather forecasting -- the next step is a change in how we work
arXiv:2606.25076v1 Announce Type: cross Abstract: Following the success of machine learning in producing weather predictions with competitive skill compared to complex traditional systems, this article shifts attention from forecast output to the working practices that make prediction systems possible. We argue that machine learning and recent digital technologies will reshape the forecasting value chain: how models are coded and developed, how observations and Earth-system data are exploited, how data and computing are managed, how systems are verified, and how information is created, evaluated and turned into services. We discuss six non-exhaustive areas in which agentic software engineering, open and compressed data, shared verification workflows, interactive computing and generative methods may make modelling, evaluation and service creation faster, more interactive and more widely accessible. These changes will require weather and climate centres to adapt their infrastructures, data stewardship, trust and quality-assurance frameworks, skills and service delivery while maintaining scientific understanding, operational reliability, human expertise and their public-service role.
Existing AI Capabilities Ensure Inevitable Structural Transformation of Labor Markets
Current AI model capabilities are sufficient to drive large-scale economic and societal shifts over the next five years. The transformation is already embedded in existing technology, independent of future model breakthroughs.
Individual Productivity Gains from AI Coding Assistants Demonstrate Scalable Economic Potential
AI coding tools provide measurable time-saving benefits for individual developers. These micro-level efficiency gains represent significant aggregate productivity potential when scaled across enterprise software development workflows.
From campaigns to continuous growth: AI capabilities shaping marketing
McKinsey outlines how AI shifts marketing from episodic campaigns to continuous, hyper-personalized, and always-on growth systems.
Honeywell and MIT Find Digital Technologies Can Help Increase Energy Supply, Reduce Energy Production Cost by Tens of Billions Annually | Morningstar
The report addresses three focus ... managing demand, and diversifying energy resources and feedstocks. "Meeting the world's growing energy needs will require both investment in new technologies to broaden feedstock options and more efficient use of today's energy infrastructure," said Ken West, President and CEO of Honeywell Process Technology. "Honeywell is helping customers apply AI, automation, ...
Labor, Society & Culture
Hyundai workers in South Korea vote to strike over fears of robots replacing them
Union at country’s largest carmaker wants greater say over how AI and automation are introduced
‘Wipe out and change are different’: Amazon exec slams AI job apocalypse fears as he hires thousands of Gen Z grads
Matt Garman, CEO of Amazon Web Services, is rejecting predictions of an entry-level job bloodbath, as he hires thousands of Gen Z talent.
Meta pauses employee tracker for AI training amid privacy concerns
About 1,600 workers signed a petition against a tool that tracked staff keystrokes, mouse clicks and computer screen content. Mark Zuckerberg’s Meta has paused a program that tracked employees’ computer activity amid data privacy concerns and a staff backlash. The owner of Facebook, Instagram and WhatsApp had introduced a tool that tracked staff keystrokes, mouse clicks and content displayed on computer screens in order to collect data for training its AI models.
Why the AI Job Displacement News May Not Be What HR Leaders Think
Discover what new SHRM research presented at SHRM26 reveals about AI and job displacement trends, and why most jobs are being transformed rather than replaced.
China urged to steer AI toward job creation with incentives, penalties
China should deploy incentives and penalties to steer artificial intelligence development toward augmenting workers rather than replacing them, according to a prominent government economic adviser.
AI is Ready but Firms are Not: How Falling Behind on AI Implementation is Costing Clients and Talent - Legal Reader
Thomson Reuters released its 2026 Future of Professionals report which warns of the cost of failing to effectively implement AI in legal professions.
AI is making developers more productive — and anxious about falling behind
New AI tools are released near-constantly. Some developers are invigorated, while others feel overwhelmed just trying to keep up.
Will AI leave engineers unemployed: New study shows unexpected results – Zamin.uz, 25.06.2026
Recently, the rapid development of AI technologies has caused serious concern among many professionals, especially software developers. Many predicted that…
‘You can’t make billions without hurting people’: Cory Doctorow on Elon Musk, the AI bubble and bosses’ cruel fantasies
The writer who coined the word ‘enshittification’ tells us why AI will never deliver what it promises – and why it still appeals so much to those in power. A “centaur”, in automation theory, is a person assisted by a machine, and a “reverse centaur”, hero of Cory Doctorow’s new book, The Reverse Centaur’s Guide to Life After AI, is a “human who is conscripted into acting as an assistant to a machine”. Every warehouse worker who ever had to urinate in a water bottle because they couldn’t otherwise meet the fulfilment targets set by an algorithm is a reverse centaur. Reaching into the future, everyone who has to sit in a self-driving truck to make sure it doesn’t crash, presumably on minimum rather than truck-driver wages, is a reverse centaur; as is every lawyer no longer on lawyer’s money checking Gemini’s command of precedent, every indie band scraping a living doing covers of AI-generated hits, and so on. That, anyway, is the promise: AI is coming for your job, and it is coming for your kids’ jobs, and there is no point fighting it because the future’s already here. Wiping out the world of work, and with it our ability to sustain ourselves and live autonomous lives, is only the beginning, if you listen to AI’s architects. Elon Musk has called it the single greatest threat to human civilisation, Sam Altman has said it will “most likely lead to the end of the world” and Dario Amodei, CEO of Anthropic, memorably forecast that AI would come to see us the way we see animals: cute to have around but ultimately a resource to be exploited. “AI people claim they’re about to create God, by teaching words to a word-guessing programme,” Doctorow says. “It’s grandiose.”
Paid Voices vs. Public Feeds: Interpretable Cross-Platform Theme-Based Analysis of Climate Discourse
arXiv:2601.13317v2 Announce Type: replace-cross Abstract: Climate discourse online shapes public understanding of climate change and informs political and policy debate, yet it unfolds across structurally different environments: paid advertising platforms host targeted, institutionally produced messaging, while public social media reflects largely organic, user-driven discussion. We present a comparative analysis of climate discourse across paid advertisements on Meta (previously Facebook) and public posts on Bluesky from July 2024 to September 2025. To support it, we develop an interpretable thematic discovery pipeline that clusters texts by semantic similarity and uses large language models (LLMs) to label clusters with concise, human-interpretable themes, requiring no predefined topic inventory or seed set. Using these themes, we find the two environments diverge systematically: paid advertising centers on strategic promotion of specific solutions in a formal, forward-looking register, whereas organic discourse centers on systemic critique in a crisis-oriented, scientifically grounded one. We also evaluate the utility of the discovered themes through downstream stance prediction and theme-guided retrieval tasks. While our analysis focuses on climate communication, the framework generalizes to comparative thematic analysis across heterogeneous communication environments.
ReMMD: Realistic Multilingual Multi-Image Agentic Verification for Multimodal Misinformation Detection
arXiv:2606.24112v1 Announce Type: new Abstract: Multimodal misinformation detection is increasingly important because viral posts now combine long multilingual narratives, several images, mixed provenance, and subtle text-image framing errors. Existing benchmarks and methods remain poorly matched to this setting: they usually isolate short captions, single images, binary labels, or one manipulation source, while agentic verification remains costly under realistic evidence search. We present ReMMD, a realistic multilingual multi-image agentic verification framework for multimodal misinformation detection. ReMMD includes ReMMDBench, a real-world multimodal misinformation detection benchmark with 500 samples, 2,756 images, five monolingual languages, two cross-lingual settings, three text-length tiers, multi-image posts, five-way veracity labels, eight distortion labels, evidence provenance, and rationales. It also includes ReMMD-Agent, a persistent-memory verifier that decomposes posts into atomic points, builds a reusable evidence set, and predicts structured L1/L2/L3 outputs. Across proprietary systems, open LVLMs, MMD-Agent, and T2-Agent, ReMMD-Agent obtains the best five-way veracity performance, with 41.80% accuracy and 39.12% macro-F1 using GPT-5.2, while reducing cost by 17.5% relative to MMD-Agent and 79.9% relative to T2-Agent. The project is available at https://dang-ai.github.io/ReMMD.
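The abstract says ReMMD-Agent decomposes posts into atomic points and builds a reusable evidence set, but not how. As a rough illustration only, the Python sketch below assumes "reusable" means caching one search per distinct (normalised) claim, so a claim repeated across posts costs no extra retrieval. `EvidenceCache`, `decompose` and `verify` are hypothetical names, and the sentence split is a naive stand-in for real claim extraction.

```python
from typing import Callable, Dict, List


class EvidenceCache:
    """Hypothetical reusable evidence set: at most one search per distinct claim."""

    def __init__(self, search: Callable[[str], List[str]]) -> None:
        self._search = search
        self._store: Dict[str, List[str]] = {}
        self.search_calls = 0  # rough proxy for retrieval cost

    def get(self, claim: str) -> List[str]:
        key = " ".join(claim.lower().split())  # normalise case and whitespace
        if key not in self._store:
            self.search_calls += 1
            self._store[key] = self._search(claim)
        return self._store[key]


def decompose(post: str) -> List[str]:
    """Naive stand-in for atomic-point extraction: split on sentence punctuation."""
    text = post.replace("!", ".").replace("?", ".")
    return [s.strip() for s in text.split(".") if s.strip()]


def verify(posts: List[str], cache: EvidenceCache) -> Dict[str, int]:
    """Map each atomic point to the number of evidence snippets found for it."""
    return {point: len(cache.get(point)) for post in posts for point in decompose(post)}
```

Passing a search function in from outside keeps the sketch independent of any particular retrieval backend; in a real verifier that callable would wrap web or image search.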
Amazon will present its framework for engineering trustworthy AI agents at VB Transform 2026
AI agents are increasingly proficient at executing business tasks autonomously, but IT leaders are cautious about granting permissions to access enterprise systems. Part of the challenge lies in how AI reliability is measured. Industry standards often rely on eval scores, which provide a static snapshot of performance rather than a measure of overall reliability. These metrics can fail to capture predictability across prompts, environments, and input types, said Bryan Silverthorn, director of the AGI Autonomy research lab at Amazon. Amazon’s AGI Autonomy research lab is moving beyond raw performance benchmarks, focusing instead on a structured framework centered on consistency, robustness, predictability, and safety, Silverthorn told VentureBeat during an interview ahead of his session at VB Transform 2026. Rather than assuming that models can be harnessed into safety, Amazon’s approach emphasizes decoupled systems, such as sandboxed environments where agents propose changes that are reviewed by humans before implementation. This strategy aims to bridge the trust gap by prioritizing verifiable interactions, even in highly sensitive domains like finance, where the potential damage an agent can cause is significant. In VentureBeat’s Q2 Pulse Research survey of over 100 senior technology leaders and buyers, just 4% said they are comfortable relying on model guardrails alone. When asked what worries them most about model guardrails, 40% said unauthorized access to tools or data and 27% cited prompt manipulation or injection. At VB Transform, Silverthorn will share details of Amazon’s approach to trustworthy agentic AI and how companies can move from single-agent wrappers to multi-tool architectures that can self-correct mid-execution during his session titled Closing the capability-reliability gap: Inside Amazon’s framework for engineering trustworthy agents. 
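The gap between a snapshot score and reliability is easy to see once tasks are run more than once. The sketch below is a hypothetical illustration, not Amazon's framework (the article does not give its metrics): it scores the same per-run results twice, once as an average success rate and once as the share of tasks that succeeded on every repeated run.

```python
from typing import Dict, List

# task name -> success flag for each repeated run of that task
Runs = Dict[str, List[bool]]


def mean_success(runs: Runs) -> float:
    """Snapshot-style score: share of all runs that succeeded."""
    total = sum(len(r) for r in runs.values())
    return sum(sum(r) for r in runs.values()) / total


def consistent_success(runs: Runs) -> float:
    """Consistency-style score: share of tasks that succeeded on every run."""
    return sum(all(r) for r in runs.values()) / len(runs)
```

With three tasks run four times each, where two of the tasks fail once apiece, the average is about 83% while only one task in three is fully consistent, which is the kind of difference a single eval number hides.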
Another agentic ops and evals-focused session at VentureBeat’s flagship conference, happening July 14 and 15 in Menlo Park, is Intelligence at scale: How Waymo builds safe, efficient AI for the physical world with speaker Manasi Joshi, director of systems intelligence and machine learning at Waymo. Interested in attending VB Transform 2026? A select number of complimentary passes are also available to senior technology leaders. Contact us to get yours. You can also purchase tickets here.
We Need an International Treaty to Ban Superintelligence
No country has an interest in building AI that could wipe out humanity.
Reinforcement Learning Towards Broadly and Persistently Beneficial Models
arXiv:2606.24014v1 Announce Type: new Abstract: As AI systems are deployed across increasingly diverse and high-stakes settings, model alignment must generalize beyond the tasks and domains seen during training. This is especially important for reinforcement learning (RL), which can introduce unexpected misalignment through reward hacking, deception, or other unintended strategies. We study whether RL on beneficial behavior, instantiated in realistic domains, can produce broad and persistent alignment generalization beyond the training distribution. We construct a dataset of realistic situations designed to measure and train beneficial traits, such as truthfulness, fairness, risk awareness, and corrigibility, spanning varied domains, including health, science, and education. We then train models with RL on this dataset and evaluate them on more than 50 independent benchmarks of alignment and beneficial behavior. Compared to a compute-matched baseline, beneficial trait RL improves performance on over 80% of these out-of-distribution benchmarks. We observe substantial out-of-distribution alignment transfer: a beneficial-behavior RL intervention entirely limited to one domain, health, produces broad improvements on non-health alignment evaluations, including reduced reward hacking, deception, and general misalignment. Finally, we study alignment persistence: whether behavior remains robustly aligned under attempts to steer models towards misalignment. Models trained with beneficial trait RL show improved persistence, including greater resistance to adversarial prompting and harmful finetuning; further work is required to isolate the sources of these effects. These results suggest that RL to reinforce beneficial behavior in realistic domains can produce models that are more robustly aligned with human flourishing.
Small edits, large models: How Wikipedia advocacy shapes LLM values
arXiv:2606.24890v1 Announce Type: cross Abstract: Can a small group of volunteers shape how AI systems discuss animal welfare, just by editing Wikipedia? We show that they can. Wikipedia appears in nearly every major language model training dataset and is weighted more heavily than web-crawled text. The Pro-Animal Wikipedians (PAW), a group of advocates who add sourced animal welfare content to relevant articles, have made 125 edits across 115 pages. Using gradient-based data attribution (Bergson; MAGIC), we traced how these edits influence language model behavior. TrackStar retrieval attribution on Llama 3.1 8B found that PAW-edited sections made up 68 percent of the highest-attributed documents for animal welfare queries (p < 0.0001) but only 52 percent for unrelated queries about the same companies (p = 0.53): the model links PAW content specifically to animal welfare topics, not to the entities in general. MAGIC counterfactual influence estimation on Llama-3.2-1B, run across five random training-order seeds, gave the same picture even more sharply: in every seed, the top-10 most influential documents on animal welfare queries were all PAW edits (10 of 10, 5 of 5 seeds), while on general queries the same top-10 sat at chance (4 to 6 of 10). Mean PAW influence exceeded mean control influence on animal welfare queries with p < 0.0001 in every seed, an effect 6 to 30 times larger than on general queries. Leave-subset-out validation gave Spearman rho = 1.00 for all 10 runs. When we fine-tuned separate models on PAW content versus control content, each model performed better specifically on the type of text it was trained on: the PAW-trained model cut perplexity on animal welfare text from 12.4 to 8.4, while the control-trained model cut perplexity on control text from 16.1 to 11.4. A small, coordinated Wikipedia editing campaign therefore measurably shapes how language models handle the topics those edits address.
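Perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so lower means the model finds the text less surprising. The sketch below uses that standard definition (the abstract does not describe its evaluation setup) plus a small helper for the relative drop between two perplexities.

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural-log probabilities)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before
```

As a sanity check, a model that assigns probability 1/8 to every token has perplexity 8. Applied to the reported figures, the PAW-trained model's perplexity on animal welfare text falls by about 32% (12.4 to 8.4) and the control-trained model's on control text by about 29% (16.1 to 11.4).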
A Marketplace for AI-Generated Adult Content and Deepfakes
arXiv:2601.09117v3 Announce Type: replace Abstract: Generative AI systems increasingly enable the production of highly realistic synthetic media. Civitai, a popular community-driven platform for AI-generated content, operates a monetized feature called Bounties, which allows users to commission the generation of content in exchange for payment. To examine how this mechanism is used and what content it incentivizes, we conduct a longitudinal analysis of all publicly available bounty requests collected over a 14-month period following the platform's launch. We find that the bounty marketplace is dominated by tools that let users steer AI models toward content they were not trained to generate. At the same time, requests for content that is "Not Safe For Work" are widespread and have increased steadily over time, now comprising a majority of all bounties. Participation in bounty creation is uneven, with 20% of requesters accounting for roughly half of requests. Requests for "deepfakes" (media depicting identifiable real individuals) exhibit a higher concentration than other types of bounties. A nontrivial subset of these requests involves explicit deepfakes despite platform policies prohibiting such content. These bounties disproportionately target female celebrities, revealing a pronounced gender asymmetry in social harm. Together, these findings show how monetized, community-driven generative AI platforms can produce gendered harms, raising questions about consent, governance, and enforcement.
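A concentration figure such as "20% of requesters account for roughly half of requests" comes from ranking requesters by volume and summing the top slice. A minimal sketch, assuming one list entry per bounty request keyed by a requester ID; the paper's exact procedure (tie handling, how the top 20% is rounded) is not given in the abstract.

```python
from collections import Counter
from typing import Iterable


def top_share(requesters: Iterable[str], top_fraction: float = 0.2) -> float:
    """Share of all requests made by the most active `top_fraction` of requesters."""
    counts = sorted(Counter(requesters).values(), reverse=True)
    k = max(1, round(len(counts) * top_fraction))  # size of the top slice
    return sum(counts[:k]) / sum(counts)
```

For example, ten requesters making 18 requests, two of whom make 6 and 4, give a top-20% share of 10/18, about 56%.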
Critique of Agent Model
arXiv:2606.23991v1 Announce Type: new Abstract: What is an agent? What constitutes agency? With the rise of Large Language Model (LLM) systems marketed as "coding agents", "AI co-scientists", and other "agentic" tools that promise to drive up productivity, and at the same time, "existential" concerns such as AI escaping human control with destructive power under a speculative "machine agency" against humans, it has become essential to clarify where automation ends and agency begins, both for building capable systems and for understanding whether and what to fear. Drawing on Descartes' grounding of agency in independent thought, and on portrayals of autonomous beings in science fiction, we survey the current landscape of AI agents, and analyze agent architectures along five dimensions: goal, identity, decision-making, self-regulation, and learning. Specifically, we argue that genuine agency requires these structures to be internalized within the system itself rather than assembled through external scaffolding. This distinction between agentic systems, whose competence resides in engineered workflows, and agentive systems, whose capabilities (including social interaction) arise endogenously, defines the boundary between systems designed for prescribed tasks, and those capable of operating in the open world with true autonomy. Building on this analysis, we propose the Goal-Identity-Configurator (GIC) architecture for a general-purpose agent model, combining hierarchical goal decomposition, identity evolution, simulative reasoning grounded in a separately trained world model, learned self-regulation, and self-directed learning from both real and simulated experience. Furthermore, we share insight on the auditability, controllability, and safety of agentive systems that possess greater autonomy and "agency", but remain under human oversight.
Medical diagnosis AIs can be tricked into telling whose data trained them
Did you read all the documents you signed last time you had a medical test?
Are AI chatbots like ChatGPT politically biased? We tested them. - Washington Post
So, are chatbots politically biased? The Washington Post tested the AI models behind OpenAI’s ChatGPT, Google’s Gemini and others using political questions designed by researchers to gauge how chatbots respond to hot-button political issues.
OpenAI endorses US Defiance Act targeting nonconsensual deepfakes
OpenAI is the first AI company to endorse the DEFIANCE Act, which would create a civil right of action for victims of non-consensual intimate deepfakes.
Safe and Generalizable Hierarchical Multi-Agent RL via Constraint Manifold Control
arXiv:2606.24010v1 Announce Type: new Abstract: Multi-agent systems are widely used in safety-critical applications that require coordinated behavior under strict safety constraints. Existing approaches face a fundamental trade-off: learning-based methods achieve strong empirical performance but lack theoretical safety guarantees, while control-theoretic methods enforce safety but often lead to overly conservative and inefficient behaviors. We propose a hierarchical multi-agent reinforcement learning framework that enforces hard safety constraints under mild assumptions at low level via a constraint manifold, while enabling effective coordination through high-level policy learning. Our approach provides theoretical safety guarantees in the multi-agent setting and yields stationary learning dynamics, thereby enabling stable and efficient training. Empirically, our method achieves competitive performance while maintaining nearly perfect safety rates, and generalizes effectively to varying numbers of agents and obstacles.
Would Claude Refuse an Illegal Military Order?
The AI chatbot told me that it has misgivings about its role in modern warfare.
Technology & Infrastructure
Meta races to replace human moderation with AI
Facebook parent is accelerating plans to use large language models to review content and ads across its platforms
Your enterprise AI agents should automatically remember which model is right for which task. Mindstone built the capability with Rebel
AI agent orchestration platforms are popping up like weeds these days, but London-based AI transformation startup Mindstone's Rebel might be among the most promising I've come across. That's because the system, which officially launched this week, is a local-first, agentic AI operating system distributed under a "Fair Source" license, allowing teams of up to 100 users to freely adopt and customize it to suit their needs, while larger organizations must pay for an enterprise license. The marquee features are its simplicity and its extensive customizability to fit any given team, no matter how unique or specific its workflows. Everything is built on markdown, a common, open plain-text format, and that enables an organizational memory layer that ensures agents reliably use the enterprise's preferred AI models for each task or even subtask, dynamically switching between local and cloud models in a predictable, visible way to save costs and maintain data privacy and security as needed. "Shared memory is the most empowering thing you could possibly do with a knowledge-worker AI," said Greg Detre, chief technology officer (CTO) of Mindstone, in a recent video call interview with VentureBeat. "You get this feeling of being a super-organism as a company that just gets smarter and smarter." Rebel is available now for macOS on Intel and Apple Silicon machines, as well as Windows, with Linux support in development. Mindstone has raised $5 million from private investors including Pearson Ventures, Moonfire Ventures and Zanichelli Venture.
A distinctive, local-first architecture based on markdown files
What makes Rebel distinctive is its local-first architecture. 
Instead of the approach found in developer-heavy agent frameworks such as LangGraph, CrewAI and AutoGPT, which require teams to wire together databases, cloud infrastructure and state-management logic, Rebel keeps its core agent memory and instructions in local markdown (.md) text files, arguably the simplest, easiest and most popular way to steer AI agents, and one widely adopted by AI developers and power users around the globe. Mindstone says Rebel stores its state, prompts, task instructions and memory hierarchy in these files, allowing users and companies to easily inspect, move or modify them as needed. A primary configuration file, agents.md, acts as the agent’s core instruction layer and runtime boundary. That architectural choice is partly about cost. Mindstone argues that common office formats such as Word documents and PDFs often carry formatting and metadata overhead that consumes model token context and raises API costs. Markdown keeps the information closer to raw text, allowing more of the model’s context window to be spent on the actual task rather than document structure. The company also positions the approach as a hedge against vendor lock-in. If a company’s agent instructions, automations and memory are stored locally as text files, they are not trapped inside one SaaS provider’s interface or database. That matters more as enterprises begin giving AI systems broader access to email, calendars, documents and internal workflows. Rebel also lets users create repeatable AI workflows. “Skills” are saved multi-step procedures an agent can reuse. “Operators” adjust how the agent behaves for a given task, such as reviewing a pitch deck from an investor’s perspective or evaluating work through a security lens. “Automations” can run scheduled background tasks, such as scanning messages or files, finding relevant updates, drafting responses, or preparing work before an employee opens the app. 
Automatically selecting the best, enterprise-preferred AI model for every task (and subtask)
Another important feature is multi-model orchestration. Rebel can break a task into parts and route different steps to different models, including splitting between local and cloud-based ones depending on the sensitivity of the information or as guided by enterprise policies. A more powerful model can handle planning or complex reasoning; a cheaper model can handle routine work; a local model can handle sensitive steps or approval checks. This matters for enterprises that want flexibility or are seeking cost controls: not every task needs to go to the same expensive cloud model, and some enterprise workflows prohibit sensitive corporate data from leaving local infrastructure. “I want to be able to say, ‘Help me with this,’ and it knows what’s personal, what’s sensitive, and what can be shared with the whole company," Detre explained. That model-agnostic setup gives companies more control over cost and security. Data-heavy work can run on lower-cost models such as Llama or DeepSeek. Higher-level reasoning can be reserved for more expensive models. Sensitive work can be routed through a local model running on the user’s machine, keeping that information from leaving the device. This approach also gives enterprise teams a way to mix cloud and local inference without treating the choice as all-or-nothing. By shifting away from centralized, monolithic cloud interfaces toward a local file-driven architecture, Mindstone is offering a model for how enterprise technical decision-makers can orchestrate autonomous workflows without forfeiting data sovereignty or predictability.
How it works in practice
Mindstone CTO Greg Detre designed Rebel’s memory system to avoid a common problem in enterprise AI: dumping large amounts of company information into a database and hoping search will retrieve the right context later. Instead, Rebel uses a tiered memory structure. 
When an interaction happens, the system estimates how likely that information is to be useful again. Information with a high expected value is written into a local readme.md file tied to a specific project space. Information with a moderate expected value becomes a reference link back to deeper historical records. Lower-priority material is stored in an indexed memory directory, where it remains available but dormant until a relevant task calls it back.
An ROI dashboard for enterprise buyers
For larger organizations, Mindstone Pro adds an Impact Dashboard designed to show where Rebel is saving time and money across business units. Mindstone says the dashboard uses a separate, closed LLM to evaluate telemetry and calculate business impact. The company says the system is calibrated conservatively, using the lower end of estimated performance gains to avoid inflated productivity claims. That feature speaks to a practical problem for enterprise AI buyers: proving value without over-surveilling employees. Mindstone says the dashboard is isolated from individual workspaces, allowing IT and business leaders to evaluate adoption and return on investment without reading employees’ private agent activity.
Fair Source licensing aims to reduce platform risk
Mindstone is releasing Rebel under a Fair Source license, a model meant to sit between fully closed SaaS and permissive open source. Under the license, Rebel’s code is viewable, auditable, modifiable and deployable. Individuals and organizations with up to 100 concurrent users can run it for free. Once an organization exceeds that threshold, it needs a commercial Mindstone Pro license. The license also includes a two-year sunset clause. Twenty-four months after a given version is released, that version automatically converts to the MIT open-source license. For enterprise buyers, the practical pitch is that Rebel reduces the risk of being trapped. 
If every automation, memory file and agent instruction is stored locally in markdown, a company can move its data and workflows elsewhere if needed. The product may be commercial, but the underlying work is designed to remain inspectable and portable.
Security questions focus on local approvals and shared memory
Rebel’s debut on the product-launch platform Product Hunt this week prompted technical questions about how a local-first agent should handle permissions, safety checks and shared memory. One developer, Nikita Pokryschko, asked whether approval checks for sensitive actions could run entirely on a local model, or whether the gating logic still required a cloud call. Detre responded by explaining Rebel’s separation between planning, execution and background safety logic. Mindstone CEO Joshua Wöhle added that companies can configure Rebel to rely entirely on a local model for gating decisions. That distinction matters for corporate security teams. Autonomous agents often need broad permissions to read files, draft emails or interact with internal systems. If the final approval layer depends on an external cloud model, some companies may see that as a compliance risk. Mindstone is arguing that Rebel can keep those approval boundaries local. A second discussion focused on how Rebel decides what memory can be shared. Product developer Clement Morel asked whether shareability is determined by content, user settings or learned behavior, and what happens if the system gets it wrong. Detre said Rebel uses the user’s local “Chief-of-staff README” and defined spaces to separate private, team and company-wide information. When the agent encounters ambiguous context, the system pauses and asks the user for approval before proceeding. That emphasis on visibility is part of Mindstone’s broader argument against opaque agent systems. 
As CEO Joshua Wöhle put it in a post on his LinkedIn account: “If an agent is going to sit inside your workspace, remember your context, and ask permission before changing the world, you should be able to see how it works. Not because everyone will read the code, but because someone can.”
Mindstone points to customer rollout as early proof
Mindstone says Rebel has already been deployed across the 250-person workforce of customer Epignosis, covering sales, engineering, product, finance and customer success teams. "The entire organization is operating on Rebel today," Wöhle told VentureBeat. Over a 12-week deployment, Mindstone says Epignosis recaptured the equivalent capacity of eight full-time roles. The company says adoption spread organically after employees saw colleagues automate time-consuming work, a pattern employees reportedly called the “potatoes effect.” The Epignosis case is central to Mindstone’s argument that enterprise AI should not be treated as a set of isolated personal tools. Rebel’s shared-memory design is meant to let workflows move across teams and improve as more employees use them. “The border between learning and doing is fading out - and that changes everything about how you scale,” Epignosis CEO Dimitris Tsingos said in a statement provided to VentureBeat by Mindstone.
Background on Mindstone
Mindstone Learning Limited, headquartered in London, launched in 2020 under the direction of CEO Joshua Wöhle, previously a co-founder of the digital child safety firm SuperAwesome. Originally positioned in the consumer education technology market, the company built a digital curation tool likened to a "Spotify for learning" that utilized compound learning methodologies. However, following the widespread commercialization of generative artificial intelligence platforms between 2022 and 2024, Mindstone moved into business-to-business enterprise enablement. 
Leadership identified a critical "last-mile" barrier: while AI tools promised substantial productivity gains, traditional corporate training failed to equip the workforce to practically integrate them into daily operations. Today, Mindstone functions as a comprehensive enterprise software and training ecosystem designed to maximize corporate return on investment for existing AI licenses. The product architecture systematically addresses different organizational tiers through highly contextualized, "live-fire" software applications rather than abstract slide presentations. Financially, Mindstone utilizes a hybrid capitalization strategy that interweaves institutional venture capital from entities like Moonfire Ventures and Pearson Ventures with community-based equity crowdfunding on platforms such as Seedrs and Crowdcube. Mindstone has successfully penetrated the enterprise market, securing commercial contracts with blue-chip corporations including The Home Depot, Hyatt Hotels Corporation, Pearson, and Ernst & Young. Ultimately, Mindstone positions itself as the crucial antidote to corporate inertia, ensuring organizations establish the internal competency required to execute successful AI transformations.
Mindstone’s bet: enterprise AI needs shared memory, not more seats
Rebel arrives as companies are trying to move from AI experimentation to AI operations. The first wave of enterprise adoption centered on access: giving employees chatbots, copilots and model subscriptions. Mindstone is betting the next wave will center on coordination. That means shared memory, reusable workflows, local control, flexible model routing and measurable business impact. It also means giving enterprises a way to inspect the systems they are being asked to trust. The company’s challenge now is execution. Local-first software can be harder to manage than cloud SaaS. Shared memory raises governance questions. Multi-model routing adds complexity. 
And enterprises will still need proof that agentic workflows can deliver reliable productivity gains without creating security or compliance headaches. But Mindstone is making a clear argument: buying AI seats is not the same as building AI infrastructure. Rebel is its attempt to turn scattered employee experiments into an operating layer for work.
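Two of the mechanisms described in the Rebel piece, per-step model routing and the tiered memory write path, can be sketched in a few lines of Python. This is a hypothetical illustration, not Mindstone's code: the tier names, the `Step` flags and the 0.7/0.3 expected-value thresholds are all assumptions, and estimating expected value itself (the hard part) is left to the caller.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    LOCAL = "local-model"      # runs on the user's machine; data stays on device
    PREMIUM = "premium-cloud"  # expensive cloud model for planning/complex reasoning
    BUDGET = "budget-cloud"    # cheaper cloud model for routine, data-heavy work


@dataclass
class Step:
    name: str
    sensitive: bool = False
    needs_reasoning: bool = False


def route(step: Step) -> Tier:
    """Hypothetical policy: sensitivity overrides capability, capability overrides cost."""
    if step.sensitive:
        return Tier.LOCAL
    if step.needs_reasoning:
        return Tier.PREMIUM
    return Tier.BUDGET


@dataclass
class ProjectMemory:
    """Three tiers loosely mirroring readme.md, reference links and an indexed archive."""
    readme: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    archive: List[str] = field(default_factory=list)

    def write(self, item: str, expected_value: float,
              high: float = 0.7, moderate: float = 0.3) -> str:
        """File `item` by its estimated chance (0..1) of being useful again."""
        if expected_value >= high:
            self.readme.append(item)
            return "readme"
        if expected_value >= moderate:
            self.references.append(f"see history: {item}")
            return "reference"
        self.archive.append(item)
        return "archive"
```

The ordering in `route` encodes one plausible policy, in which a sensitive step stays local even when it also needs strong reasoning; an enterprise policy could rank these differently, and a real system would read such rules from its configuration files rather than hard-code them.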
Stanford researchers will discuss their agentic 'scientists' that are on course to reshape drug discovery at VB Transform 2026
Drug discovery is notoriously inefficient. Pharmaceutical projects span years, moving from one specialized human team to the next through disconnected workflows that result in knowledge loss during each handoff. A shocking 90% to 95% of drug discovery projects reportedly fail, one of the highest failure rates of any industry. A single successful drug can take over a dozen years and up to $1 billion from initial discovery to patient distribution, according to published reports. Generative AI is being used to solve some of the challenges, but Stanford researchers have moved the ball forward with agentic AI. A team led by James Zou, associate professor of Biomedical Data Science at Stanford University, has deployed thousands of autonomous AI "scientist" agents in a virtual biotech that simulates the full lifecycle of drug development. The agents handle everything from initial discovery through safety testing and clinical trial design, while maintaining the continuity that’s lacking in today’s drug discovery processes, according to Zou. The project uses a hierarchical orchestration framework. At the top sits a chief scientist officer agent that acts as a planner, delegating tasks to teams of specialized agents, Zou told VentureBeat during a call ahead of his upcoming session at VB Transform 2026. While one team of agents focuses on discovery, another manages safety, and others handle specialized analytical tasks. Because these agents operate within a unified, hierarchical ecosystem, they retain the full context of a project, maintaining continuity from the first molecule identified to the final clinical outcome. The "brain" of the system relies on a vast amount of primary data. The agents are granted access to data sources ranging from genomics and FDA chemistry data to clinical trial databases via the Model Context Protocol (MCP). The team has invested heavily in agent-native and agent-friendly data, allowing the AI to synthesize complex information more effectively. 
The system relies on a combination of models, with Zou noting that while Claude often serves as the backbone for coding and data analysis, the architecture employs a mixture of models, including models fine-tuned for specialized use cases. Zou is raising money at a roughly $1 billion valuation for his startup, Human Intelligence, based on the research. During Zou’s session at VB Transform on July 15, titled How 10,000 agentic scientists in Stanford’s lab are set to revolutionize medical research and discovery, he will share valuable insights including strategies for managing context and long-running, multi-step workflows in a multi-agent system, the process of transforming and indexing raw enterprise data to make it agent native, and how to use human auditing and experimental reward signals to verify agent actions. Another session at VB Transform focused on the value of agentic context includes Building a trustworthy agentic AI foundation: How Zillow accelerated engineering by 40%, with Zillow's SVP of engineering and technology, Toby Roberts, and Glean’s CEO Arvind Jain.
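The continuity claim in the Stanford piece rests on a planner that hands each stage to a specialist team while keeping one shared record. The sketch below is hypothetical (the article does not describe the framework's interfaces): each team is a plain function that receives the project and a copy of everything recorded so far, and the chief agent appends each finding to a shared context, so later stages see earlier work instead of losing it at a handoff.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A team takes (project, snapshot of shared context) and returns a finding.
Team = Callable[[str, List[str]], str]


@dataclass
class ChiefScientist:
    """Hypothetical planner that delegates each stage to a specialist team."""
    teams: Dict[str, Team]
    context: List[str] = field(default_factory=list)  # persists across handoffs

    def run(self, project: str, stages: List[str]) -> List[str]:
        for stage in stages:
            # Pass a copy so a team cannot rewrite earlier records.
            finding = self.teams[stage](project, list(self.context))
            self.context.append(f"{stage}: {finding}")
        return self.context
```

In practice each team function would wrap its own agents and models; the point of the sketch is only that context flows forward through a single owner rather than being re-assembled at each handoff.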
Loop engineering, latest AI buzzword, still needs humans in the loop
Prompting less and automating more comes with a price
Sakana AI ships a multi-agent orchestration model designed to bypass export controls
Sakana Fugu is a small AI model that automatically routes tasks to the best-suited models via a single API endpoint. It is designed to be model-agnostic, allowing for seamless swapping if specific models are restricted.
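Fugu's internals are not described here, so the following Python sketch only illustrates the general pattern of a single entry point that routes by task type and falls back when a preferred model becomes unavailable. A static preference table stands in for Fugu's learned routing; the `Router` class, the model names, and the `block` method are all hypothetical.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical client: prompt in, completion out


class Router:
    """Single entry point that picks a model per task type, skipping blocked ones."""

    def __init__(self, models: dict[str, Model], preferences: dict[str, list[str]]) -> None:
        self.models = models            # model name -> callable client
        self.preferences = preferences  # task type -> model names, best first
        self.blocked: set[str] = set()

    def block(self, name: str) -> None:
        """Mark a model unavailable, e.g. because access has been restricted."""
        self.blocked.add(name)

    def route(self, task_type: str, prompt: str) -> tuple[str, str]:
        """Return (model_name, completion) from the best available model."""
        for name in self.preferences.get(task_type, []):
            if name in self.models and name not in self.blocked:
                return name, self.models[name](prompt)
        raise LookupError(f"no available model for task type {task_type!r}")
```

Callers only ever see `route`, so swapping or blocking a backend changes which model answers without changing calling code, which is the model-agnostic property the summary describes.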
Introducing Claude Tag
Anthropic's new feature allows Claude to be invoked within workflows, shifting AI assistants from standalone chat tools toward agentic work interfaces for task execution.
Governing Technical Debt in Agentic AI Systems
arXiv:2605.29129v2 Announce Type: replace-cross Abstract: Agentic AI systems are increasingly being explored as production infrastructure: they reason over multiple steps, call tools, act through workflows, and adapt through memory and feedback. These systems create governance challenges that are not fully captured by traditional software or predictive ML technical debt. We define Agentic Technical Debt as the accumulated liability created when prompts, memory, tool schemas, orchestration graphs, control policies, and observability routines are patched together faster than they can be validated, standardized, and governed. We define Stochastic Tax as the recurring operating burden of keeping probabilistic agent behavior within acceptable bounds. The distinction matters: debt is a stock of design and governance liability, while the tax is a flow of operating cost that arises because stochastic agents act through tools and workflows. We outline how managers can make both visible through lightweight dashboards and governance controls.
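The abstract's key distinction is that debt is a stock (a balance that accumulates until retired) while the Stochastic Tax is a flow (a cost incurred every period). A minimal Python sketch of a dashboard ledger that tracks the two separately might look like this; the engineer-hours unit, the item names, and the class itself are assumptions for illustration, since the abstract does not specify the paper's dashboard design.

```python
from dataclasses import dataclass, field


@dataclass
class AgentGovernanceLedger:
    """Hypothetical dashboard model: debt is a stock, tax is a flow."""
    # Stock: open liabilities, each with an estimated remediation cost (hours).
    open_items: dict[str, float] = field(default_factory=dict)
    # Flow: operating hours spent per period keeping agent behavior in bounds
    # (e.g. re-running evals, reviewing flagged outputs, tuning guardrails).
    tax_by_period: list[float] = field(default_factory=list)

    def add_debt(self, item: str, hours: float) -> None:
        self.open_items[item] = hours

    def retire_debt(self, item: str) -> None:
        self.open_items.pop(item, None)

    def log_tax(self, hours: float) -> None:
        self.tax_by_period.append(hours)

    @property
    def debt_stock(self) -> float:
        """Current balance: persists until items are explicitly retired."""
        return sum(self.open_items.values())

    @property
    def average_tax(self) -> float:
        """Average recurring cost per period."""
        if not self.tax_by_period:
            return 0.0
        return sum(self.tax_by_period) / len(self.tax_by_period)
```

Keeping the two numbers apart matters because retiring debt lowers the stock once, whereas the tax recurs every period until the underlying stochastic behavior is better controlled.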
The Token Not Taken: Sampling, State, and the Stochasticity of AI Agents
arXiv:2606.08998v2 Announce Type: replace-cross Abstract: Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. At the core of many current agents is a foundation model, a large pretrained model adaptable to many downstream tasks, embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then propagate upward into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, this tutorial clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.
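The token-level source of variability the tutorial describes (scores, then probabilities, then a pseudo-random draw) can be illustrated with standard softmax and temperature sampling in Python. This is a generic sketch, not code from the tutorial; the logits and seeds are made up. It shows that a fixed seed reproduces the same draws, while greedy decoding removes this particular source of randomness entirely.

```python
import math
import random


def softmax(scores: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw next-token scores (logits) into probabilities."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(scores: list[float], rng: random.Random, temperature: float = 1.0) -> int:
    """Draw one token index using the supplied pseudo-random generator."""
    probs = softmax(scores, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_sequence(scores: list[float], seed: int, n: int) -> list[int]:
    """n draws from one seeded generator: same seed, same sequence."""
    rng = random.Random(seed)
    return [sample_token(scores, rng) for _ in range(n)]


def greedy_token(scores: list[float]) -> int:
    """Deterministic decoding: always pick the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)
```

Matching the seed only controls this intrinsic layer; the extrinsic sources the tutorial lists (live data, serving infrastructure, batch effects, numerical details) are outside what a seeded generator can pin down.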
Xiaomi's HarnessX rewrites its own AI scaffolding mid-task — and smaller models gain the most
As enterprise AI agents take on increasingly complex, long-horizon tasks, their performance is often restricted by their harness, the software scaffolding that connects the backbone LLM to its environment. Currently, harnesses are largely static and hand-crafted. Improving them is a manual process, and they do not automatically improve based on the execution data they collect from their environment. To address this engineering bottleneck, researchers at Xiaomi introduced HarnessX, a framework that treats the AI harness as a composable object and autonomously applies improvements to its code. In real-world enterprise applications, this automated adaptation enables AI systems to adjust dynamically to application-specific requirements. In practical tests, HarnessX delivered substantial performance gains across domains such as software engineering and web interaction. The results demonstrate that scaling the foundation model is not the only path to more capable AI — and for smaller models, it may not even be the best one. HarnessX's harness evolution yielded an average absolute performance gain of +14.5% across 15 model-benchmark combinations; for the open-weight Qwen3.5-9B, gains reached +44% on embodied planning tasks.
The challenges of harness engineering
In AI applications, a foundation model's capability relies heavily on its surrounding harness. The harness acts as the operational layer that converts raw model outputs into structured, executable agent behaviors. It comprises the prompts, external tool integrations, memory management, and control flows that dictate how an AI system observes its environment, reasons through a problem, and takes action. As enterprise agents take on more complex, long-horizon workflows, harness engineering has become a fundamental part of AI development. Despite its importance, harness development remains far from a mature engineering discipline and presents three key challenges. First, harnesses are static and hand-engineered.
Any shift in the underlying foundation model, the introduction of new tools, or a pivot to a different operational domain requires bespoke, manual code rewrites. Traditional harnesses lack mechanisms to autonomously learn and improve from past execution experiences. Second, most existing harnesses suffer from architectural entanglement. They tightly couple prompt templates, tool wrappers, retry policies, and memory management within the same code paths. This entanglement means that tweaking one component can silently break others. Attempting to reuse a harness across different business domains often devolves into raw code copying rather than clean, modular composition. Third, the harness and foundation model are optimized in isolation. When engineers run tests to improve the harness, the execution traces generated are typically discarded rather than used as training data to improve the model. Consequently, model upgrades do not naturally lead to harness improvements, creating a bottleneck where teams fail to capture the full value of their agent's operational data.
HarnessX: an autonomous foundry for AI agents
HarnessX solves the engineering bottlenecks of manual harness development with what the researchers call a “unified harness foundry.” The core innovation of HarnessX is treating the harness as a "first-class object." In software engineering terms, this means the harness is an independently serializable, modular, and substitutable entity. By separating the model configuration (i.e., which AI model is operating) from the harness configuration, engineers can seamlessly swap, adapt, and evolve the scaffolding without touching the underlying model. HarnessX breaks agent behavior down into different components, such as context assembly, memory management, tool ecosystems, control flow, and observability. Every specific behavior is implemented as a "processor" that plugs into precise lifecycle hooks of the harness.
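The processor-and-hook design can be sketched in Python as a registry of small functions keyed by lifecycle hook. The hook names, the `Harness` class, and the state-dictionary convention are hypothetical; the article does not list HarnessX's actual hooks or interfaces. The sketch shows why the design helps: adding or removing one behavior touches only its own registration, not the rest of the pipeline.

```python
from typing import Callable

# Hypothetical hook names; HarnessX's real lifecycle hooks are not listed in the article.
HOOKS = ("before_model", "after_model", "after_tool")

Processor = Callable[[dict], dict]  # takes the agent state, returns updated state


class Harness:
    def __init__(self) -> None:
        self.processors: dict[str, list[tuple[str, Processor]]] = {h: [] for h in HOOKS}

    def add(self, hook: str, name: str, fn: Processor) -> None:
        """Plug a named processor into one lifecycle hook."""
        if hook not in self.processors:
            raise ValueError(f"unknown hook {hook!r}")
        self.processors[hook].append((name, fn))

    def remove(self, name: str) -> None:
        """Unplug a processor by name without touching any others."""
        for hook in self.processors:
            self.processors[hook] = [(n, f) for n, f in self.processors[hook] if n != name]

    def fire(self, hook: str, state: dict) -> dict:
        # Processors run in registration order; each returns the updated state.
        for _, fn in self.processors[hook]:
            state = fn(state)
        return state


harness = Harness()
harness.add("before_model", "system_prompt",
            lambda s: {**s, "prompt": "You are careful.\n" + s["prompt"]})
harness.add("before_model", "token_budget", lambda s: {**s, "max_tokens": 256})
```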
This modular structure allows the system to swap, add, or remove these processors without breaking the surrounding pipeline. To automate the optimization of this modular structure, HarnessX introduces AEGIS, a trace-driven evolution engine. AEGIS frames harness adaptation as a reinforcement learning (RL) problem over the different symbolic components of the harness. This framing introduces three pathologies the researchers had to explicitly engineer against:
Reward hacking: The system might exploit shortcuts to the solution instead of genuinely solving the task.
Catastrophic forgetting: An edit that fixes a failure pattern in one domain might silently break a previously solved workflow in another.
Under-exploration: The system might iterate on minor prompt tweaks rather than exploring new, structurally superior tool configurations.
To prevent these problems, AEGIS relies on full trace observability and a four-stage pipeline:
Digester: Compresses execution traces into structured summaries to identify where the agent failed.
Planner: Analyzes these summaries so that the system explores structural changes rather than just local prompt tweaks.
Evolver: Generates code-level harness edits and tests to ensure they run correctly before deployment.
Critic and gate: A Critic assesses the edits to detect reward hacking, while a deterministic gate rejects any update that regresses a previously solved task, preventing catastrophic forgetting.
HarnessX enters a growing field of self-improving harness research — but what separates it is harness-model co-evolution. The researchers highlight that optimizing either component in isolation eventually hits a wall. Evolving only the harness hits a scaffolding ceiling if the underlying model lacks the reasoning capacity to use the new tools. Training only the model hits a training-signal ceiling if the harness never prompts the model to use its advanced capabilities.
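The deterministic gate in the AEGIS pipeline is the simplest stage to pin down: compare per-task results before and after a candidate edit and reject the edit if any previously solved task now fails. A minimal Python sketch, assuming results are recorded as pass/fail per task ID (the data format and function name are hypothetical):

```python
def regression_gate(baseline: dict[str, bool], candidate: dict[str, bool]) -> tuple[bool, list[str]]:
    """Accept a harness edit only if no previously solved task regresses.

    baseline and candidate map task IDs to pass (True) or fail (False).
    A task missing from the candidate run is conservatively treated as a failure.
    Returns (accepted, sorted list of regressed task IDs).
    """
    regressed = sorted(
        task for task, passed in baseline.items()
        if passed and not candidate.get(task, False)
    )
    return (not regressed, regressed)
```

Because the check is a plain comparison rather than a model judgment, it cannot be talked into accepting a regression, which is presumably why the critic (for reward hacking) and the gate (for forgetting) are kept as separate mechanisms.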
HarnessX interleaves harness evolution with model training. The execution traces generated while the harness attempts to adapt to tasks are converted into reinforcement learning signals for the foundation model. Every time the harness improves its strategy, the model simultaneously learns to better exploit that new strategy, breaking the capability ceilings of traditional AI agent development. HarnessX makes this co-evolution possible through cross-harness GRPO (Group Relative Policy Optimization). GRPO is a popular RL algorithm that was used to train reasoning models such as DeepSeek-R1. When fine-tuning the model, cross-harness GRPO pools an agent's execution trajectories for the same task across entirely different versions of the application's harnesses. This allows the underlying model to internalize high-level strategy shifts, like using a new API endpoint or managing an execution budget, rather than just learning minor prompt-phrasing variations.
HarnessX in action on industry benchmarks
To validate the practical utility of HarnessX, the researchers tested it across five benchmarks spanning software engineering, multi-turn customer service dialog, web navigation, open-ended multi-step reasoning, and embodied planning. They separated the AI into two roles. The “meta-agent,” powered by Claude Opus 4.6, analyzed logs and wrote the code to evolve the harnesses. The “task agents” ran the actual workflows. To show that the framework is model-agnostic, they tested it on three different worker models: Claude Sonnet 4.6, GPT-5.4, and the open-weight Qwen3.5-9B. HarnessX was compared against two primary baselines. The first was a static harness, representing how most enterprises deploy AI today, using hand-crafted, frozen setups with benchmark-specific prompts and tools. The second was the Claude Code SDK, used as a single-agent evolver to test whether the four-stage AEGIS pipeline outperformed simply asking a single language model to iterate on the code.
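The "cross-harness" part of cross-harness GRPO comes down to how rollouts are grouped before computing advantages. Standard GRPO scores each rollout relative to its group (reward minus group mean, divided by group standard deviation). The Python sketch below groups by task only, so rollouts produced under different harness versions of the same task are compared against each other. The trajectory dictionary format is an assumption, and the policy-gradient update that would consume these advantages is omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cross_harness_advantages(trajectories: list[dict], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages, grouping by task only.

    Each trajectory is a dict with 'task', 'harness' (version label), and a
    scalar 'reward'. The group key deliberately ignores 'harness', so a
    rollout from a better harness version earns a positive advantage relative
    to rollouts of the same task under older harness versions.
    """
    groups: dict[str, list[float]] = defaultdict(list)
    for t in trajectories:
        groups[t["task"]].append(t["reward"])
    stats = {task: (mean(rs), pstdev(rs)) for task, rs in groups.items()}
    return [
        (t["reward"] - stats[t["task"]][0]) / (stats[t["task"]][1] + eps)
        for t in trajectories
    ]


rollouts = [
    {"task": "q1", "harness": "v1", "reward": 0.0},  # old harness failed
    {"task": "q1", "harness": "v2", "reward": 1.0},  # evolved harness succeeded
    {"task": "q2", "harness": "v1", "reward": 1.0},
    {"task": "q2", "harness": "v2", "reward": 1.0},  # no difference: zero signal
]
advantages = cross_harness_advantages(rollouts)
```

Task q1 yields roughly -1 and +1, pushing the model toward whatever the v2 harness enabled; task q2 yields zero advantage because both harness versions already succeed.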
Dynamically evolving the harness yields significant gains on the same base model. HarnessX improved performance in 14 out of 15 model-benchmark combinations. Across all tests, evolving the harness yielded an average absolute performance gain of +14.5%. The weakest models benefited the most from dynamic harness improvement. The open-weight Qwen3.5-9B saw a +44.0% performance jump on the ALFWorld embodied planning benchmark, and a +18.2% jump on SWE-bench Verified for software engineering. Co-evolution also proved highly effective. When the researchers trained the foundation model using the data generated while evolving the harness, they saw an additional +4.7% average performance boost. Improving the harness and the model simultaneously yields the highest ceiling. The co-evolution gain applies only to open-weight models, because it requires updating the model's weights. Anecdotal evidence from the experiments shows how HarnessX solves pernicious problems when creating agent harnesses for real-world tasks. For example, in the GAIA multi-step reasoning benchmark, the task agent consistently failed because the headless browser tool it used to scrape Wikipedia timed out on the site's JavaScript-heavy frontend. HarnessX analyzed the execution traces, diagnosed the error, and wrote a new tool that bypassed the browser entirely and queried the MediaWiki API directly for plain text. It swapped this tool into the harness and instantly unlocked the failing tasks. During the WebShop e-commerce tests, the AI agent often got stuck in pagination loops, endlessly clicking "next page" and reformulating searches without ever committing to buying a product. Rather than just tweaking the prompt, HarnessX built an advisory processor that detected when the agent was repeating navigation actions. It injected a warning into the context to force a decision, curing the looping behavior and raising performance.
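An advisory processor like the one HarnessX generated for WebShop can be approximated as a check over the agent's recent actions that returns a warning string for the harness to inject into context. This Python sketch is not HarnessX's generated code: the action-string format, the set of navigation action kinds, the window size, and the threshold are all hypothetical.

```python
from typing import Optional

NAV_ACTIONS = {"search", "next_page", "prev_page"}  # hypothetical action kinds


def loop_advisory(actions: list[str], window: int = 6, threshold: int = 5) -> Optional[str]:
    """Return a warning to inject into the agent's context, or None.

    Actions are strings like "search:red shoes" or "click:item_3"; only the
    part before the first ":" (the action kind) is inspected, so reworded
    searches still count as the same kind of step.
    """
    recent = actions[-window:]
    if len(recent) < window:
        return None  # not enough history to judge
    nav_steps = sum(1 for a in recent if a.split(":", 1)[0] in NAV_ACTIONS)
    if nav_steps >= threshold:
        return (f"Advisory: {nav_steps} of your last {window} actions were page "
                "navigation or re-searches. Pick the best product seen so far and decide.")
    return None


looping = ["search:red shoes", "next_page", "next_page",
           "search:shoes red", "next_page", "next_page"]
```

Counting action kinds rather than exact strings is the design choice that catches "reformulating searches": two differently worded queries still register as repeated navigation.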
Limits of automated harness engineering
One important caveat is that the system currently relies on powerful models to act as the meta-agent that rewrites the harness code. In their experiments, the meta-agent was Claude Opus 4.6, a closed frontier model. Open-weight models are quickly improving, but their ability to serve as the meta-agent remains untested. Another limitation is the intrinsic capability of the task models themselves. If the underlying task model is fundamentally too weak to execute the complex workflows the new harness proposes, HarnessX will not be able to improve the agent’s overall abilities (the researchers observed this with the Qwen3.5-9B model on the SWE-bench coding tests). Despite these limitations, HarnessX makes a concrete case that harness engineering — not just model scaling — is a lever practitioners can pull now. For teams running smaller open-weight models on complex workflows, the gains here are large enough to justify evaluating harness evolution as a first step before reaching for a more expensive frontier model. The researchers plan to release the code in a future update.
JUNE 2026 - DEAD INTERNET REALITY - ROBOT ROCK (A Rubicon Crossed)
The main driver isn’t just old-school scrapers or spam bots. It’s the explosive growth of agentic AI — autonomous agents that browse, research, compare prices, fill forms, make purchases, and perform multi-step tasks on behalf of users or companies.
Training a Legal Agent With Applied Compute
Harvey discusses how vertical AI companies use applied compute and rigorous engineering disciplines to encode proprietary business knowledge into reliable agentic systems.
2026 Will See AI Agents Explode Across Businesses: Are We Prepared for the Security Risks? – Unite.AI
Enterprises are racing to adopt AI agents, but they lack the infrastructure to do so safely. Agents are different from AI tools: they work autonomously, access data continuously, and act at machine speed without need for...
SK Hynix, Micron Solidify Memory Chips as Runaway Stars of AI
With back-to-back announcements this week, SK Hynix Inc. and Micron Technology Inc. have solidified the memory chip market as the hottest part of the AI industry.
Micron Soars After AI-Fueled Forecast Shatters Estimates
Micron Technology Inc., the largest US maker of computer memory chips, surged in late trading after its quarterly sales forecast crushed Wall Street estimates, signaling that an AI-fueled growth run remains strong.
Qualcomm claims it's not too late for Dragonfly to land in datacenters
Oh, Snap(dragon): DC chief says the mobile-chip giant sees bit barns as its next growth market
OpenAI's first homegrown AI chip
OpenAI has begun testing "Jalapeño," its first homegrown AI chip, with plans to use it for customer queries later this year. The chips were developed with Broadcom to improve efficiency and reduce reliance on Nvidia.
Residents rally against “monster” 300MW AI data center planned in Scotland
More than 800 objections have been filed to plans for a 300MW AI data center in Scotland. Renewable energy firm Apatura wants to build the data center at the Glenbervie Business Centre in Larbert, three miles (4.8km) from Plean, and submitted a planning application for a two-building facility earlier this month. However, residents in the […]
SoftBank head disputes Musk’s orbital data centre claims
Masayoshi Son told shareholders that energy only makes up a fraction of the overall costs needed to build and deploy data centres.
CoreWeave signs colocation agreement with Conapto in Sweden
AI cloud CoreWeave has signed an agreement to lease capacity from Nordic data center firm Conapto in Sweden. Under the colocation agreement, CoreWeave will lease capacity from two campuses in Stockholm. Initial capacity is already online at the Stockholm 4 South site. CoreWeave will deploy both Nvidia Blackwell and […]
Nvidia’s banned AI chips double in price on China’s black market
US crackdown on illicit exports has made it riskier, harder and more expensive to buy tech giant’s processors
OpenAI and Broadcom Unveil Custom A.I. Chip Design
The maker of ChatGPT plans to use enough chips to consume 10 gigawatts of electricity, an amount that could power millions of households.
DigitalBridge Is Said to Explore Options for AIMS Data Centre
DigitalBridge Group Inc. is considering options for Malaysia’s AIMS Data Centre Holding Sdn., including raising funds, bringing in new investors or an outright sale, according to people with knowledge of the matter.
The emergence of the web data infrastructure layer for AI | MIT Technology Review
AI is booming. New use cases are emerging each day. To capitalize on the technology’s potential, enterprises require data at scale. In many cases, though, the relevant information is blocked or unstructured, which limits its use by AI models. To understand this challenge, consider the foundation of the web itself. The web was not designed…
AWS debuts Lambda MicroVMs with up to 8 hours runtime
AWS has introduced Lambda MicroVMs capable of running for up to 8 hours, targeting untrusted code and long-running AI tasks.
Investments in AI data centers to total 27.5 billion in 2026 - Techzine Global
Global spending on AI data centers reached over $300 billion in 2025. That is sixty percent more than the previous year
States push back against rising AI-driven electricity infrastructure costs | TechRadar
Today, for the first time, it's ... serious infrastructure proposition. But I don’t think the real story here is about space – it’s about AI, and how it’s altering our global trajectory. Over the last few years, we've seen an extraordinary increase in demand for compute capacity. AI training clusters are growing larger, power requirements are rising, and in many regions the availability of energy, land, and cooling has become a genuine constraint...
US House subcommittee advances bill to shield consumers from AI energy costs
A US House subcommittee on Wednesday advanced bipartisan legislation aimed at shielding American consumers from electricity rate hikes attributable to artificial intelligence infrastructure.
AI factory cooling vs cloud data centers: Why liquid cooling is essential for high-density AI workloads
The AI factory data center age is here, and it brings new challenges and opportunities. The Intel x86-based server era for enterprise and cloud computers spanned roughly three decades with very slight changes in power draw. The GPU-based accelerated compute in AI factories is only beginning, but one thing is for certain: every year presents […]
China takes supercomputer crown from US for first time since 2017 | Tech News - Business Standard
A supercomputer in Shenzhen was declared the world's fastest. It uses only standard microprocessors and not the special-purpose chips called graphics processing units
All the world's a robot-staging ground for tech entrepreneurs building 'physical AI' - The Washington Post
AI “world models” are the next frontier for computer scientists who see too many limitations in the AI language models behind popular chatbots
‘Godmother of AI’ and tech entrepreneurs draw investors by pivoting from chatbots to ‘world models’ saying AI has to read the room, not just books
World models that react to the physical environment are "one of the most important" concepts in AI today as scientists shift away from chatbots.
Mistral launches OCR 4, turning document extraction into a full enterprise AI play
Mistral AI on Tuesday released OCR 4, a document intelligence model that moves beyond raw text extraction to return structured representations of entire documents — complete with bounding boxes, block-type classification, and per-word confidence scores. The release marks Mistral's fourth generation of optical character recognition technology in roughly 15 months and lands at a moment when the company's pitch for European AI sovereignty has never been more commercially relevant. The model supports 170 languages across 10 language groups, accepts PDF, DOC, PPT, and OpenDocument formats, and can be deployed as a single container on an organization's own infrastructure — a capability Mistral is positioning directly at enterprises in regulated industries that cannot route sensitive documents through U.S.-jurisdiction cloud APIs. "Mistral OCR 4 extracts and structures content from a wide range of documents," the company said in its announcement. "Where previous generations focused on converting a page into clean text and tables, OCR 4 returns a structured representation of the document." The model is available immediately through the Mistral API, Document AI in Mistral Studio, Amazon SageMaker, and Microsoft Foundry, with Snowflake Parse Document support coming soon. Pricing starts at $4 per 1,000 pages, dropping to $2 per 1,000 pages through a batch API discount.
OCR 4 treats every document as a semantic map, not a wall of text
The central engineering shift in OCR 4 is structural. Rather than outputting a flat stream of extracted text — the paradigm that has defined OCR for decades — the model returns a layered representation in which every block is localized with a bounding box, classified by type (title, table, equation, signature, and others), and scored for confidence at both the page and word level. Mistral says bounding boxes were its most-requested capability.
The reason is straightforward: without location data, downstream systems cannot trace an extracted fact back to its source on a specific page. That traceability gap has been a persistent friction point for enterprises building retrieval-augmented generation (RAG) pipelines, compliance workflows, or any application where "where did this number come from?" is a question that needs an auditable answer. Block classification addresses a related problem. A paragraph tagged as a "title" can segment a document into hierarchical chunks for semantic search. A block tagged as a "table" can be routed to a structured-data pipeline rather than a text summarizer. A block tagged as a "signature" can trigger a redaction workflow in a compliance system. These are not novel ideas in isolation, but packaging them as first-class outputs of the OCR model itself — rather than requiring a separate layout-analysis stage — removes an integration layer that enterprise teams have historically had to build and maintain themselves. The confidence scores serve a dual purpose. At scale, they allow organizations to programmatically route low-confidence regions to human reviewers and auto-approve high-confidence extractions, building what the industry calls human-in-the-loop verification without requiring a person to review every page of every document. In production systems, OCR is rarely the end goal — it is the first step in a larger pipeline. Developers building RAG systems, agent workflows, or document automation often spend more time reconstructing layout and structure than on the downstream AI logic itself. OCR 4 aims to eliminate that reconstruction step, and if it delivers on that promise, the value accrues not just in OCR cost savings but in reduced engineering hours across the entire document pipeline. 
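The three uses described above (bounding boxes for traceability, block types for routing, confidence scores for human review) can be combined in one small dispatch step. The Python sketch below assumes a hypothetical `Block` record; the field names, the 0.85 review threshold, and the queue names are illustrative inventions, not Mistral's actual response schema.

```python
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical shape of one block in a structured OCR result.
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    block_type: str   # e.g. "title", "table", "signature", "text"
    text: str
    confidence: float  # 0.0 to 1.0


def route(block: Block, review_threshold: float = 0.85) -> str:
    """Pick a downstream queue for one extracted block."""
    if block.confidence < review_threshold:
        return "human_review"      # low confidence: a person checks it
    if block.block_type == "signature":
        return "redaction"         # compliance workflow
    if block.block_type == "table":
        return "structured_data"   # table pipeline, not a text summarizer
    if block.block_type == "title":
        return "chunk_boundary"    # start a new section chunk for search
    return "text_index"


def citation(block: Block) -> str:
    """Trace an extracted fact back to its page and region."""
    x0, y0, x1, y1 = block.bbox
    return f"p.{block.page} [{x0:.0f},{y0:.0f},{x1:.0f},{y1:.0f}]"
```

Checking confidence first means a low-confidence table goes to a person before it reaches any automated structured-data pipeline.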
Independent reviewers preferred Mistral's output 72 percent of the time, but benchmarks tell a complicated story
Mistral reports that OCR 4 achieved a 72% average win rate in a head-to-head human evaluation against leading competitors, conducted by independent annotators across more than 600 real-world documents in over 12 languages. Mistral also reports that the model achieved the top overall score on OlmOCRBench, at 85.20, and scored 93.07 on OmniDocBench. But the company itself urges caution in interpreting those numbers. In its release, Mistral took the unusual step of auditing and publicly disclosing the specific types of scoring artifacts it encountered, including ground-truth errors in the reference annotations, equivalent LaTeX notation scored as mismatches, column-reading-order assumptions, and header/footer attribution issues. "We therefore treat the aggregate score as directional rather than definitive," the company said — a notably transparent stance from a vendor announcing a product. That transparency is well-timed. On the public OlmOCRBench leaderboard, some researchers have noted that OCR 4 currently ranks third, behind open models like Chandra OCR 2. And some open-weight models self-report higher OmniDocBench composite scores — PaddleOCR-VL-1.6 claims 96.33 — though those results have not been independently reproduced on the public leaderboard. Early enterprise feedback has been favorable nonetheless. Aidan Donohue, an AI engineer at financial AI firm Rogo, said the company benchmarked OCR 4 against leading agentic document parsers on a chart-dense financial QA dataset and "reached equivalent accuracy at roughly 8x lower cost and 17x lower latency." Ivan Mihailov, an AI engineer at intellectual property management firm Anaqua, said OCR 4 is "roughly 4x faster per page than our incumbent provider." Enterprise buyers, however, should run their own evaluations rather than relying on any vendor's benchmark numbers.
The practical question is not which model scores highest on a leaderboard, but which model produces the fewest errors on your specific documents, in your specific languages, at a price and latency that fit your workflow.
The Anthropic export ban gave Mistral's sovereignty pitch the proof point it needed
Mistral's release lands in a geopolitical context that could hardly be more favorable for its strategic positioning. On June 12, Anthropic was forced to disable all access to its newest AI models, Fable 5 and Mythos 5, after the U.S. Commerce Department used national security export controls to bar the company from distributing the models to any foreign national. Enterprise clients in finance, healthcare, SaaS, and critical infrastructure found their core intelligence services abruptly disabled, without prior warning or effective recourse. As of June 24, both models remain offline, with prediction markets giving only 57% odds of restoration before July 1. That episode validated a warning Mistral CEO Arthur Mensch has been sounding for over a year. As Business Insider reported, Mensch warned at London Tech Week in June 2025 about American AI companies "having the keys" for their models, calling it a scenario where European companies are "giving leverage to their providers." He added: "At some point, you need to be able to turn it off or turn it on, and you don't want to leave it to another country." The argument gained further urgency as Mensch's broader sovereignty pitch escalated in recent months. As reported by CNBC in late May, Mensch told the outlet: "Europe is lagging behind when it comes to [the] buildout of infrastructure, and so we are investing to close that gap." At the same time, Mensch pushed back against Pope Leo XIV's call for AI to be "disarmed," arguing that Europe cannot afford to fall behind U.S. tech giants.
"We're all for peace, but if you look at our rivals and adversaries in the world, they're using artificial intelligence … we do need to have our own capabilities," Mensch told reporters. OCR 4's single-container, self-hosted deployment model is the product-level expression of that argument. A U.S.-headquartered provider offering EU data residency means documents are stored in Frankfurt but governed by U.S. law. Mistral, incorporated in France and operating under EU jurisdiction, offering on-premise containerized deployment, means documents never leave the customer's infrastructure at all. The EU AI Act's fine enforcement provisions take effect August 2, adding regulatory pressure to the compliance calculus for European enterprises evaluating document AI vendors.
Baidu's free, open-weight OCR model arrived one day earlier — and the contrast is revealing
Mistral's release did not arrive in isolation. Just one day before OCR 4 launched, Baidu shipped Unlimited-OCR on June 22 — a 3-billion-parameter MIT-licensed model that tackles one of the most persistent pain points in document AI: parsing entire PDFs and multi-page scans in a single forward pass, without chunking the input or stitching the output back together afterward. Baidu's model uses a technique called Reference Sliding Window Attention (R-SWA) that, as a top Hacker News commenter explained, splits the AI's focus into two paths: maintaining full attention on the original document image while restricting memory of generated text to a tight, moving window. The result is constant KV cache size and the ability to transcribe 40-plus pages in a single forward pass. The model gathered 1,800 GitHub stars in its first 24 hours and racked up more than 479 upvotes on Hacker News, where the discussion thread ran to 109 comments.
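The two-path attention pattern described for R-SWA can be made concrete as an attention mask: every generated token may attend to all reference (document image) positions, but only to the most recent `window` generated tokens, itself included. The Python sketch below builds that boolean mask and counts cached positions. It is based only on the commenter's description as summarized above, not on Baidu's implementation; the function names and the list-of-lists mask format are assumptions.

```python
def rswa_mask(num_ref: int, num_gen: int, window: int) -> list[list[bool]]:
    """Boolean attention mask for generated tokens.

    Row i is generated token i; columns are [reference tokens..., generated tokens...].
    True means token i may attend to that position.
    """
    mask = []
    for i in range(num_gen):
        ref_part = [True] * num_ref  # full attention on the document image
        # Causal and windowed: only generated positions j with i - window < j <= i.
        gen_part = [i - window < j <= i for j in range(num_gen)]
        mask.append(ref_part + gen_part)
    return mask


def kv_positions_needed(num_ref: int, num_gen: int, window: int) -> int:
    """Peak number of cached key/value positions per layer under this scheme."""
    return num_ref + min(num_gen, window)
```

Under this pattern the cache holds the reference tokens plus at most `window` generated tokens, so it stops growing with output length, which is the sense in which the KV cache size stays constant.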
The two releases frame what some analysts are calling the June 2026 document-AI split: self-hosted long-horizon parsing with open weights versus structured managed extraction with enterprise features. Baidu's model is free under an MIT license, runs on standard GPU hardware, and has no managed API or enterprise SLA. Mistral's model is a commercial product with per-page pricing, bounding boxes, confidence scores, block classification, multi-platform distribution, and self-hosted deployment options for enterprise customers. Unlimited-OCR may be the better tool for a research team digitizing scanned dissertations on a single GPU. OCR 4 is built for the IT procurement process — the world of SLAs, data processing agreements, and compliance audits. Beyond Baidu, the broader OCR competitive field includes Google Document AI, Amazon Textract, Azure Document Intelligence, ABBYY Vantage, and a growing number of open-weight models. On the Hacker News thread for Unlimited-OCR, practitioners offered a candid assessment of the state of the art. Joss82, who has worked on document parsing for 10 years, wrote bluntly: "OCR still sucks in 2026." Meanwhile, one user named SyneRyder reported success with Claude for OCR of hundreds of pages of handwritten documents, noting the model delivered results with "no corrections required" and even pointed out a continuity error in the source text. These practitioner reports underscore a key tension in the market: performance varies wildly depending on the specific document type, language, and quality of the source material.
The real play is not OCR — it is an enterprise AI stack with document intelligence as the on-ramp
Step back far enough, and Mistral's OCR 4 release is not really an OCR story. It is an enterprise go-to-market story built on top of a $4.4 billion global intelligent document processing market that is forecast to grow at a 33.1% compound annual growth rate through 2030, according to Grand View Research.
For Mistral, OCR is a wedge into enterprise AI budgets. The model feeds directly into Mistral's Search Toolkit, the company's open-source, composable search framework announced at the AI Now Summit. In that architecture, OCR 4 serves as the ingestion layer for retrieval-augmented generation and enterprise search pipelines, converting raw documents into citation-ready, structurally classified input. The logic is clear: once an enterprise adopts OCR 4 for document extraction, Mistral's broader model suite, including Medium 3.5 for reasoning and the Vibe agentic platform for task execution, becomes the natural next step in the stack. That pipeline ambition is critical context for Mistral's current fundraising trajectory. Bloomberg recently reported that the company is in early discussions to raise about €3 billion ($3.5 billion) at a valuation of roughly €20 billion, about 1.7 times the €11.7 billion valuation from its September Series C round. To date, Mistral has raised only about $4 billion, a fraction of what its largest U.S. rivals have taken in. OCR 4 and its associated enterprise revenue pipeline are part of how the company plans to justify that higher valuation; according to Le Monde, Mistral is targeting €1 billion in revenue for 2026, up from €200 million in 2025. Mistral is a company of roughly 1,000 employees with ambitions to compete against labs that have raised 40 times as much capital. It cannot win a general-purpose model arms race against OpenAI and Anthropic. What it can do is build a differentiated enterprise stack around sovereignty, structured document intelligence, and agentic workflows, and use that stack to capture European enterprise budgets that are increasingly wary of dependence on U.S. providers.
The pricing structure reinforces that strategy: at $2 per 1,000 pages in batch mode, the cost of processing a 100,000-page corporate archive falls to $200, making large-scale digitization projects economically viable in ways they may not have been with token-based vision-language model pricing. Whether Mistral can execute that vision at scale — against Google, Amazon, Microsoft, and a surging open-source ecosystem — remains an open question. But the Anthropic export control crisis is still unresolved, European data sovereignty regulations are tightening, and a potential €20 billion funding round is on the horizon. The company is holding an OCR 4 production webinar on July 7 at 6:00 PM CET. Two weeks ago, the argument for building AI infrastructure outside the reach of U.S. export controls was theoretical. Then the U.S. government flipped a switch, and Anthropic's most advanced models went dark for every non-American on the planet. Mistral did not cause that crisis — but it spent the last year building the product that makes it matter.
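The batch-pricing figure quoted above is easy to check directly. A minimal sketch, assuming a flat $2 per 1,000 pages with no minimums, volume tiers, or per-request fees (the article does not mention any); `batch_ocr_cost` is a hypothetical helper:

```python
def batch_ocr_cost(pages: int, usd_per_1000_pages: float = 2.0) -> float:
    """Estimate batch OCR cost in USD from a flat per-1,000-page price."""
    return pages / 1000 * usd_per_1000_pages


print(batch_ocr_cost(100_000))  # 200.0, matching the $200 archive example
```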
Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks
Alibaba's Qwen team released Qwen-AgentWorld on Tuesday: two models trained not to act inside agent environments but to predict what those environments return. The release covers seven domains under a single architecture: MCP, Search, Terminal, Software Engineering, Android, Web, and OS. It extends Alibaba's recent push into autonomous agents; Qwen3.7-Max, released in May, was built around a 35-hour autonomous execution capability. The world-model approach targets a ceiling that teams training agents at scale run into directly: real environments cannot be steered. Real search engines surface whatever results exist, with no mechanism to inject controlled conditions, and live terminals cannot produce a low-disk-space condition on demand. Agent training is therefore bounded by what production environments happen to surface, with no systematic way to expose the edge cases agents will need to handle but rarely encounter in training. The research team trained agents inside the resulting simulator and found performance gains that exceeded what training against real environments alone produced. In a separate test, using world model training as a warm-up before agentic fine-tuning improved performance across seven benchmarks, including three the model had never seen during training. The accompanying paper identifies a gap in prior agent research: "We argue that world modeling is a crucial missing piece in the path to general agents."
Qwen-AgentWorld trains on what environments return, not what agents should do
Most agent models are trained to answer one question: given what the environment just showed me, what should I do next? Qwen-AgentWorld is trained to answer the inverse: given what the agent just did, what will the environment show next? That reversal is the core of what the paper calls a language world model: instead of optimizing for action selection, the model learns to predict the next environment state across all seven domains under a single training objective.
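The inversion between a policy objective and a world-model objective is easiest to see in how one trajectory becomes training examples. The sketch below is illustrative only and is not part of the Qwen-AgentWorld release: it assumes a trajectory is a list of (observation, action) steps in which each action produces the next observation. The `Step` type and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: str  # what the environment showed
    action: str       # what the agent did in response


def policy_examples(traj: list[Step]) -> list[tuple[str, str]]:
    """Conventional agent training: context ending in an observation -> next action."""
    examples, context = [], ""
    for step in traj:
        context += f"[obs] {step.observation}\n"
        examples.append((context, step.action))
        context += f"[act] {step.action}\n"
    return examples


def world_model_examples(traj: list[Step]) -> list[tuple[str, str]]:
    """World-model training: context ending in an action -> next observation."""
    examples, context = [], ""
    for i, step in enumerate(traj):
        context += f"[obs] {step.observation}\n[act] {step.action}\n"
        if i + 1 < len(traj):
            examples.append((context, traj[i + 1].observation))
    return examples


traj = [
    Step("$ ", "ls"),
    Step("notes.txt  data.csv\n$ ", "wc -l data.csv"),
    Step("42 data.csv\n$ ", "exit"),
]
for prompt, target in world_model_examples(traj):
    print(repr(target))
```

The same logged runs feed both objectives; only the prediction target changes, which is why interaction trajectories from real agent runs can serve as world-model training data.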
Prior work was narrower. WebWorld, an earlier Qwen project from February, covered web environments only; Snowflake's Agent World Model, published the same month, generates code-driven, SQL-backed environments rather than training a model to predict states. Qwen-AgentWorld is the first to span seven domains in a single model, with environment modeling built in from the earliest pretraining stage. Alibaba trained both models in three stages on more than 10 million environment interaction trajectories from real agent runs. Stage one teaches the model how environments behave: file systems, terminal states, browser DOM changes, API responses. Stage two trains the model to reason through what comes next before predicting it. Stage three, reinforcement learning, tightens predictions using rule-based checks and open-ended quality scoring. Both models are Mixture-of-Experts designs, in which only a fraction of the parameters are active per token: the 35B model activates 3B and the 397B model activates 17B. Both support 256K context windows. For GUI domains (Android, Web, and OS), the models work from textual accessibility trees and UI view hierarchies rather than screenshots. The 35B model weights and AgentWorldBench are available under Apache 2.0; the 397B weights are not publicly released.
The training results matter more than the benchmarks
The benchmark scores show how accurately the models predict what environments return. The training results show what that prediction capability is actually worth to teams building agents, which makes them the more important numbers. According to the researchers, agents trained inside controlled simulation outperformed agents trained in real environments. Injecting targeted perturbations (partial responses that force extra agent steps, and edge cases real environments rarely surface) pushed MCPMark from 24.6 to 33.8.
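The targeted perturbations described above are what separate controlled from uncontrolled simulation. The sketch below is a hypothetical illustration of that idea, not the paper's mechanism: it wraps any action-to-observation function (a stand-in for a learned world model) and, with a configurable probability, swaps in an injected edge case such as a truncated response or a disk-full error. `ControlledSim`, the two perturbation functions, and `inject_prob` are all assumptions.

```python
import random
from typing import Callable

Perturbation = Callable[[str], str]


def truncate_response(obs: str) -> str:
    """Return only the first half of the observation, forcing a follow-up step."""
    return obs[: len(obs) // 2] + " ...[truncated]"


def disk_full(_obs: str) -> str:
    """Replace the observation with an error live terminals rarely produce on demand."""
    return "error: No space left on device"


class ControlledSim:
    def __init__(self, step_fn: Callable[[str], str],
                 perturbations: list[Perturbation],
                 inject_prob: float, seed: int = 0):
        self.step_fn = step_fn          # simulator: action -> observation
        self.perturbations = perturbations
        self.inject_prob = inject_prob  # how often to inject an edge case
        self.rng = random.Random(seed)  # seeded so training runs are reproducible

    def step(self, action: str) -> tuple[str, bool]:
        """Return (observation, whether a perturbation was injected)."""
        obs = self.step_fn(action)
        if self.perturbations and self.rng.random() < self.inject_prob:
            return self.rng.choice(self.perturbations)(obs), True
        return obs, False


def fake_world_model(action: str) -> str:
    return f"ok: ran `{action}` and wrote 3 files"


sim = ControlledSim(fake_world_model, [truncate_response, disk_full], inject_prob=0.3)
results = [sim.step("make build") for _ in range(1000)]
print(sum(injected for _, injected in results))
```

The design point is that the trainer, not the environment, decides how often rare conditions appear.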
On Search, agents trained in entirely fictional worlds transferred to real search tasks, pushing WideSearch F1 Item from 34.02 to 50.31 on the open 35B model. A separate warm-up test showed that world model pretraining improved BFCL v4 from 62.29 to 71.25 and Claw-Eval from 53.60 to 64.88 with no agent-specific fine-tuning.
Researchers flag the benchmark and the overfitting risk
The paper drew immediate reaction from AI researchers on X, and the concerns they raised map to what practitioners need to verify before acting on the findings. On the training objective and the transfer result, one AI/ML researcher was direct. "Every other 'agent' model has been trained to act in environments," wrote @drawais_ai, who has a PhD background and regularly breaks down AI papers. "Qwen flipped the question. They trained the model to predict the environment itself... That predictive knowledge then transfers to agent tasks even without any agent-specific fine-tuning." He called the Controllable Sim RL result "the receipt" for the claim that synthetic training can substitute for real-environment RL at scale, and noted that three of the seven transfer benchmarks were entirely out of domain. The benchmark margin drew immediate scrutiny. "AgentWorldBench is a benchmark Alibaba built and published in the same paper," wrote @TheSignal_Desk, an account focused on honest takes and key numbers in AI research. "They wrote the test, then topped it by 0.46." @limalemonnn, who builds production AI agents, singled out the sim-RL methodology as the result most in need of scrutiny before the headline claim gets quoted. "Sim-trained agents traditionally overfit to the simulator's quirks," they wrote. "If the world model is too clean, the agent learns the model, not the task." They pointed to the paper's holdout split as the section practitioners should read before acting on the numbers. The overfitting concern has a partial answer in the data.
The gap between uncontrolled Sim RL (MCPMark 24.6) and controlled Sim RL (MCPMark 33.8) suggests the gains depend substantially on the controllability mechanism, not on simulation accuracy alone. The fictional-world Search result, in which agents trained on invented environments transfer to real search tasks, is the paper's strongest evidence against the overfitting concern.
What this means for teams building agentic pipelines
For AI engineering teams building and scaling agentic pipelines, this work signals a meaningful shift in how agent capability gets built. Teams training agents at scale now have a third option between real-environment RL and static benchmarks: controlled simulation that injects the edge cases production won't surface. Synthetic environments are a legitimate training layer, but controlled simulation is a complement to real-environment RL, not a shortcut around it. What a model learns before agent training starts also matters more than most pipelines account for. The warm-up finding, performance gains across unseen benchmarks with no agent-specific training, suggests environment grounding belongs earlier in development than current practice places it.
Neuro-Symbolic Drive: Rule-Grounded Faithful Reasoning for Driving VLAs
arXiv:2606.23938v1 Announce Type: new Abstract: Driving VLA models incorporating Chain-of-Thought (CoT) reasoning are attractive because they leverage pretrained VLM representations and expose intermediate decisions in natural language, yet current rationales often lack the step-by-step decision semantics needed to keep the rationale causally connected to the planned motion. We introduce Neuro-Symbolic Drive, a neuro-symbolic driving framework that supervises a driving VLA with rule-grounded reasoning traces extracted directly from classical rule-based planners. Our key observation is that rule-based planners are symbolic AI systems that already function as executable reasoning engines: they reason about active safety constraints, search over candidate maneuvers, and select a final trajectory. We instrument these planners in simulation to capture both the executed trajectory and the internal decision trace at each rule-evaluation step. Each trace is serialized into structured rule-grounded reasoning and paired with the trajectory to fine-tune Qwen3.5-4B as a driving VLA. Because these traces are derived directly from the planner states that determine the action, they ensure reasoning is structurally coupled to motion generation by construction, rather than by post-hoc alignment. On our simulator-generated benchmark, detailed rule-grounded reasoning reduces ADE@3s from 0.47 to 0.26 and miss rate from 8.30% to 6.40% under three-camera perception, and from 0.54 to 0.26 and 10.13% to 5.99% under eight-camera perception. Neuro-Symbolic Drive thus converts neuro-symbolic planning logic into structured supervision. Code base: https://github.com/XiangboGaoBarry/Neural-Symbolic-Drive.
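For readers unfamiliar with the two reported metrics: ADE (average displacement error) is the mean Euclidean distance between predicted and ground-truth waypoints over the horizon (here 3 seconds), and miss rate is the share of predicted trajectories judged to have missed. The sketch below computes both for 2D waypoints. The 2-meter final-displacement threshold for a "miss" is a common convention but an assumption here; the abstract does not state the paper's exact definition.

```python
import math

Point = tuple[float, float]


def ade(pred: list[Point], gt: list[Point]) -> float:
    """Average displacement error: mean L2 distance across all waypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def miss_rate(preds: list[list[Point]], gts: list[list[Point]],
              threshold_m: float = 2.0) -> float:
    """Share of trajectories whose final waypoint is farther than threshold_m
    from the ground truth (assumed definition; the threshold is illustrative)."""
    misses = sum(math.dist(p[-1], g[-1]) > threshold_m for p, g in zip(preds, gts))
    return misses / len(preds)


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
good = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]
bad = [(0.0, 0.0), (1.0, 1.5), (2.0, 3.0)]
print(round(ade(good, gt), 3))           # 0.233
print(miss_rate([good, bad], [gt, gt]))  # 0.5
```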
Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?
arXiv:2606.24026v1 Announce Type: new Abstract: Mechanistic interpretability has made substantial progress in automatically localizing circuits, but explaining what localized components do remains labor-intensive and difficult to standardize. In this work, we study whether language model (LM) agents can assist with this explanation problem once a circuit has already been identified. We introduce AgenticInterpBench, a benchmark for circuit explanation built from 84 semi-synthetic transformer circuits with 163 component-level annotations. We propose HyVE (Hypothesize, Validate, Explain), an agentic explainer that analyzes each component through an iterative loop of observation, hypothesis generation, and causal validation, eventually producing a component-level explanation and a circuit-level task description. Across four LM backbones, HyVE recovers useful component- and task-level explanations, but no backbone is uniformly best. Our analysis shows that strong backbones usually form observation-grounded hypotheses, while failures more often arise later in the validation loop, through incomplete validation plans, code execution errors, or unresolved hypotheses. A case study on an arithmetic circuit in Llama-3-8B shows that the same formulation can extend beyond semi-synthetic benchmarks to naturally trained models. Overall, LM agents are promising circuit explainers, but reliable validation remains the key obstacle.
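The Hypothesize, Validate, Explain loop can be illustrated with a toy in which the "component" is a black-box function and each hypothesis is a candidate rule for what it computes. This is a hypothetical sketch of the loop's shape, not the authors' agent: in HyVE an LM generates the hypotheses and validation uses causal tests on model internals, whereas here hypotheses come from a fixed list and validation simply checks predictions on fresh inputs.

```python
from typing import Callable, Optional

Hypothesis = tuple[str, Callable[[int, int], int]]  # (description, predicted behavior)


def hyve_toy(component: Callable[[int, int], int],
             observe_inputs: list[tuple[int, int]],
             validate_inputs: list[tuple[int, int]],
             candidates: list[Hypothesis]) -> Optional[str]:
    # 1. Observe: record the component's behavior on a few inputs.
    observations = [(a, b, component(a, b)) for a, b in observe_inputs]
    for description, predict in candidates:
        # 2. Hypothesize: keep only hypotheses consistent with the observations.
        if any(predict(a, b) != out for a, b, out in observations):
            continue
        # 3. Validate: test the surviving hypothesis on inputs not yet seen.
        if all(predict(a, b) == component(a, b) for a, b in validate_inputs):
            # 4. Explain: return the validated description.
            return description
    return None  # unresolved: no candidate survived validation


def mystery(a: int, b: int) -> int:
    """Stands in for a localized circuit component."""
    return (a + b) % 10


candidates = [
    ("adds the two digits", lambda a, b: a + b),
    ("returns the larger digit", lambda a, b: max(a, b)),
    ("adds the two digits modulo 10", lambda a, b: (a + b) % 10),
]
print(hyve_toy(mystery, [(1, 2), (3, 4)], [(7, 8), (9, 9), (5, 5)], candidates))
# adds the two digits modulo 10
```

The first candidate fits the observations but fails validation, and with too few validation inputs it would have been accepted. That mirrors the abstract's finding that failures tend to arise in the validation step rather than in forming hypotheses.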
Beyond Trajectory Imitation: Strategy-Guided Policy Optimization for LLM Reasoning
arXiv:2606.24064v1 Announce Type: new Abstract: Distilling reasoning capabilities from strong to weak language models typically involves imitating specific solution trajectories, effectively transferring what to answer rather than how to reason. This trajectory-level imitation encourages memorization of instance-specific steps rather than acquisition of transferable problem-solving skills, limiting generalization to novel problems. We propose Strategy-Guided Policy Optimization (SGPO), which replaces instance-level trajectory imitation with reusable strategy distillation. SGPO extracts structured strategy descriptions from strong-model responses and, for each problem, constructs both autonomous and strategy-guided trajectories to enable direct comparison of the model's behavior with and without strategic guidance. The framework then addresses two key questions. For how to distill, a token-level forward-KL objective selectively transfers the distributional shift induced by strategy conditioning into the unguided policy, with proximal constraints ensuring stability. For when to distill, adaptive instance-level weighting strengthens guidance when autonomous exploration falls short and reduces it as the model's own competence grows. Experiments on four mathematical benchmarks across two model families show that SGPO consistently outperforms SFT, on-policy RL, and hybrid-policy baselines, improving the average score by 2.2 points over the strongest baseline on Qwen2.5-7B-Instruct. Analysis reveals that the forward-KL objective provides an inherently selective distillation signal that outperforms direct trajectory imitation, and that strategy distillation exhibits complementary scaling with base model capability.
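The token-level forward-KL idea can be made concrete. At each position, the strategy-guided and unguided policies each give a distribution over the vocabulary. The forward KL, KL(guided || unguided), is large where the guided distribution puts mass that the unguided one lacks. The sketch below computes it over a toy three-token vocabulary. The per-instance weight (1 minus the autonomous success rate) is a hypothetical stand-in for the paper's adaptive weighting, and the proximal constraints are omitted.

```python
import math


def forward_kl(p_guided: list[float], p_unguided: list[float], eps: float = 1e-12) -> float:
    """KL(p_guided || p_unguided) at one token position.

    Terms with zero guided mass contribute nothing (0 * log 0 = 0);
    eps guards against log(0) when the unguided policy assigns zero mass.
    """
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_guided, p_unguided) if p > 0)


def sequence_forward_kl(guided: list[list[float]], unguided: list[list[float]]) -> float:
    """Mean token-level forward KL over a sequence."""
    return sum(forward_kl(p, q) for p, q in zip(guided, unguided)) / len(guided)


def instance_weight(autonomous_success_rate: float) -> float:
    """Hypothetical adaptive weight: more guidance when the model fails on its own."""
    return 1.0 - autonomous_success_rate


guided = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
unguided = [[0.4, 0.4, 0.2], [0.1, 0.8, 0.1]]
loss = instance_weight(0.25) * sequence_forward_kl(guided, unguided)
print(round(loss, 4))
```

Only the first token, where strategy conditioning actually shifts the distribution, contributes to the loss; the second position, where both policies agree, contributes zero. This illustrates the "selective" signal the abstract describes.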
VeryTrace: Verifying Reasoning Traces through Compilable Formalism and Structured Verification
arXiv:2606.24124v1 Announce Type: new Abstract: Multi-step reasoning with Chain-of-Thought (CoT) prompting remains fragile: logical errors or hallucinations in early steps silently propagate, producing confident but incorrect conclusions. This paper presents VeryTrace, a zero-shot verification-and-repair framework that formalizes natural-language reasoning traces into a structured, compilable representation. VeryTrace introduces a Domain-Specific Language (DSL) that (i) makes step dependencies explicit, (ii) mechanizes quantitative content as executable expressions, and (iii) structures semantic inferences via deduction schemas. Our hybrid verifier combines deterministic checks for computational correctness, dependency resolution, and constraint satisfaction with targeted LLM audits for non-mechanizable semantic judgments, enabling step-level error localization and repair. Across three diverse domains, namely competition mathematics (AIME 2025), robotics planning (LLM-BabyBench), and kinship reasoning (CLUTRR), VeryTrace improves accuracy over zero-shot baselines on state-of-the-art LLMs without requiring domain-specific training or in-context examples, demonstrating that formalized trace verification achieves both precision and generalization.
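To give a feel for what a compilable reasoning trace enables, the sketch below encodes a short arithmetic trace as steps with explicit dependencies and executable expressions. It then runs two deterministic checks: every dependency must resolve to an earlier step, and every claimed value must match its recomputed expression. The trace format and checker are hypothetical and are not VeryTrace's DSL; the LLM-audit path for non-mechanizable steps is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceStep:
    sid: str
    claim: float                   # value the reasoning trace asserted
    compute: Callable[..., float]  # executable form of the step
    depends_on: list[str] = field(default_factory=list)


def verify(trace: list[TraceStep], tol: float = 1e-9) -> list[str]:
    """Return human-readable errors; an empty list means every check passed."""
    errors: list[str] = []
    values: dict[str, float] = {}
    for step in trace:
        missing = [d for d in step.depends_on if d not in values]
        if missing:
            errors.append(f"{step.sid}: unresolved dependencies {missing}")
            continue
        recomputed = step.compute(*(values[d] for d in step.depends_on))
        if abs(recomputed - step.claim) > tol:
            errors.append(f"{step.sid}: claimed {step.claim}, recomputed {recomputed}")
        # Propagate the recomputed value so one bad step does not poison later checks.
        values[step.sid] = recomputed
    return errors


trace = [
    TraceStep("s1", 12, lambda: 3 * 4),
    TraceStep("s2", 20, lambda a: a + 7, ["s1"]),  # arithmetic slip: 12 + 7 = 19
    TraceStep("s3", 38, lambda b: 2 * b, ["s2"]),
]
for err in verify(trace):
    print(err)  # s2: claimed 20, recomputed 19
```

Because each step carries its own executable form, the error is localized to `s2` instead of surfacing only as a wrong final answer.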
The real challenge of enterprise AI is no longer the model, but how it is operated — ActuIA
In June 2026, the main cloud providers (Google Cloud, AWS, Microsoft, Databricks) are shifting their messaging away from model competition and toward operating...
Performance Benchmarks Reveal Persistent Gap Between Closed and Open-Weights AI Models
Analysis of ARC-AGI-2 scores indicates that closed-weights models maintain a performance lead over open-weights alternatives. The performance gap remains consistent at approximately 8-12 months, despite rapid iteration in open-source development.
German AI startup powers military drones without GPS
A Munich-based artificial intelligence startup called SE3 Labs stepped out of stealth mode on June 26, 2026, announcing that its spatial AI platform is already under contract with the German Bundeswehr and operational in military exercises across Europe, where the
Exploring Academic Influence of Algorithms by Co-occurrence Network Based on Full-text of Academic Papers
arXiv:2606.24099v1 Announce Type: new Abstract: Algorithms have become central to scientific research in the era of artificial intelligence (AI). Although algorithm mentions in papers are often used to indicate popularity and influence, existing studies usually evaluate individual algorithms in isolation and pay limited attention to the collective influence formed through their interconnections. This study constructs large-scale algorithm co-occurrence networks in natural language processing (NLP) based on the full text of academic papers and investigates algorithm influence from a network perspective. Using deep learning models, we extract algorithm entities and build overall, cumulative, and annual co-occurrence networks. We analyze their structural characteristics and apply multiple centrality measures to assess the group influence of algorithms across the whole field and over time. The results show that algorithm networks display typical features of complex networks, with increasingly dense connections developing over approximately two decades. Classic, high-performing algorithms and those located at the intersections of different research periods tend to have high popularity, control, centrality, and balanced influence. When the influence of an algorithm declines, it usually loses its core network position first, followed by weaker associations with other algorithms. This study is the first large-scale analysis of algorithm co-occurrence networks. Covering more than four decades of academic publications, it provides a temporal and structural view of algorithm influence and offers a foundation for future research on networks linking algorithms, scholars, and tasks.
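As an illustration of the network construction, the sketch below links two algorithms whenever they are mentioned in the same paper and weights each edge by the number of shared papers. It then scores algorithms by normalized degree centrality. The paper mention sets are invented, and degree centrality is only one of the several centrality measures the study applies.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(papers: list[set[str]]) -> dict[str, dict[str, int]]:
    """Undirected weighted graph: edge weight = number of papers mentioning both algorithms."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for mentions in papers:
        for a, b in combinations(sorted(mentions), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree_centrality(graph: dict[str, dict[str, int]]) -> dict[str, float]:
    """Number of distinct neighbors, normalized by the maximum possible (n - 1)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()} if n > 1 else {}


papers = [
    {"LSTM", "CRF", "word2vec"},
    {"BERT", "CRF"},
    {"BERT", "Transformer", "word2vec"},
    {"BERT", "Transformer"},
]
g = cooccurrence_network(papers)
for algo, c in sorted(degree_centrality(g).items(), key=lambda kv: -kv[1]):
    print(f"{algo:12s} {c:.2f}")
```

Building the same graph from cumulative or per-year slices of the corpus gives the cumulative and annual networks the abstract describes.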
When Networks Substitute for Outcome Surveillance? A Substitution-Complementarity Framework for Behavioral Signals in Predictive Monitoring
arXiv:2510.20025v2 Announce Type: replace-cross Abstract: Monitoring systems increasingly fuse dynamic behavioral data with outcome-based surveillance, raising a basic question: when does behavioral data carry predictive information that outcome history lacks? We study this using epidemic forecasting on mobility networks, asking whether mobility networks provide independent predictive signal beyond local outcome-based surveillance. We formalize this as a substitution-complementarity problem over directed, weighted mobility networks. Using a Frisch-Waugh-Lovell variance decomposition, our analytical framework derives domain-agnostic conditions under which network-topology features retain incremental explanatory power beyond autoregressive outcome histories. We instantiate the framework using town-level COVID-19 forecasting in Massachusetts (April 2020-April 2021), constructing mobility networks among 300+ towns from smartphone-derived origin-destination aggregates to extract centrality metrics. An agent-based model on synthetic networks confirms that the regime boundary arises from a generic interaction between macro-scale epidemic state and network topology, rather than dataset-specific artifacts. Prevalence-gated interactions between statewide incidence and network features yield large out-of-sample gains when primary surveillance is degraded (Predict-R2 increases from about 0.60 to 0.83-0.89) but only marginal lift when granular local histories are available (+0.5 percentage points). Gains concentrate during epidemic waves when behavioral responses shift network connectivity rapidly. Framed as a value-of-information problem, the substitution gain reflects the marginal value of behavioral data relative to primary-channel quality. This yields a transferable, cost-aware design rule: integrate topology-aware behavioral signals when primary surveillance is degraded or the network changes rapidly; otherwise, rely on autoregressive baselines.
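The Frisch-Waugh-Lovell step can be sketched in a few lines. Regress both the outcome and the network feature on the autoregressive history, then regress residual on residual. The resulting slope equals the network feature's coefficient in the full regression, and the share of residual variance it explains measures the feature's incremental signal. The toy data are invented, and a single lag with one network feature simplifies the paper's setup.

```python
def residualize(y: list[float], x: list[float]) -> list[float]:
    """Residuals from an OLS fit of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - (my + slope * (xi - mx)) for xi, yi in zip(x, y)]


def fwl_incremental(y: list[float], lag: list[float], net: list[float]) -> tuple[float, float]:
    """Coefficient on `net` after partialling out `lag`, and the partial R^2."""
    ry = residualize(y, lag)    # outcome variation not explained by history
    rn = residualize(net, lag)  # network variation not explained by history
    # Residuals have mean zero, so the residual-on-residual fit needs no intercept.
    beta = sum(a * b for a, b in zip(rn, ry)) / sum(b * b for b in rn)
    explained = sum((beta * b) ** 2 for b in rn)
    total = sum(a * a for a in ry)
    return beta, explained / total


lag = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
net = [0.5, 0.1, 0.9, 0.3, 0.8, 0.2]
y = [1.0 + 2.0 * h + 3.0 * c for h, c in zip(lag, net)]  # network truly adds signal
print(fwl_incremental(y, lag, net))
```

In this noiseless toy, the recovered coefficient is 3 and the partial R² is 1. With real data, a partial R² near zero corresponds to the paper's "rely on autoregressive baselines" regime.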
Data-Driven Evolution of Library and Information Science Research Methods (1990-2022): A Perspective Based on Fine-grained Method Entities
arXiv:2606.25320v1 Announce Type: cross Abstract: Since the 1990s, advancements in big data and information technology have increasingly driven data-centric research in the field of Library and Information Science (LIS). To assess the influence of this data-driven research paradigm on the LIS discipline, this study conducts a fine-grained analysis to uncover the evolutionary trends of research methods within the domain. Using academic papers from LIS published between 1990 and 2022, four key categories of data-driven method entities are automatically extracted: algorithms and models, data resources, software and tools, and metrics. Based on these entities, the study examines the evolution of LIS research methods from three dimensions: the characteristics of research method entities over time, their evolution within different research topics, and the evolutionary features of research method entities across various research methods. The findings highlight data resources as a pivotal driver of methodological evolution in LIS, revealing a cyclical pattern of "emergence-stability/practical application" in the development of research methods within the field.
Boffin claims Microsoft's supposed quantum leap does not compute due to 'basic Python errors'
Nature paper argues researchers cherry-picked data. Redmond insists its work is sound
The Geometry Behind Diffusion and Flow Matching: Gradient Flows and Geodesics in Wasserstein Space
arXiv:2606.24157v1 Announce Type: new Abstract: The space $\mathcal{P}_2(\mathbb{R}^d)$ of probability measures with finite second moment carries a natural geometry: the quadratic Wasserstein distance $W_2$ makes it a complete metric space and, following Otto, a (formal) Riemannian manifold whose geodesics are the optimal-transport interpolations. On this manifold, the gradient flow of the free energy $F(\rho) = \mathrm{KL}(\rho \,\|\, \pi)$ is exactly the Fokker-Planck equation, and its implicit-Euler discretization is the JKO scheme. This is the geometry underlying diffusion models: the forward process descends the free energy, and each denoising step realizes one JKO step, which recovers DDPM, DDIM, NCSN/SMLD, and Energy Matching; this is one scheme, not separate theories. The same manifold supports a second variational principle. Its geodesics, the minimum-action curves of the Benamou-Brenier formula, are precisely the optimal-transport paths that Flow Matching learns. Fixing both endpoints and following the geodesic, generation becomes a deterministic ODE along a straight line, hence far fewer sampling steps. Placing both families of models on one manifold makes their relationship exact: diffusion follows a free-energy gradient flow, an initial-value problem; optimal-transport Flow Matching follows a Wasserstein geodesic, a boundary-value problem. The two reach the same endpoints along different paths.
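The three standard objects this abstract relies on can be written out explicitly. The equations below assume a target density $\pi \propto e^{-V}$ and a step size $\tau > 0$; both are common conventions that the abstract does not state.

```latex
% JKO step: one implicit-Euler step of the W_2 gradient flow of F
\rho_{k+1} \in \arg\min_{\rho \in \mathcal{P}_2(\mathbb{R}^d)}
  \Big\{ F(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k) \Big\},
\qquad F(\rho) = \mathrm{KL}(\rho \,\|\, \pi)

% As \tau \to 0 the JKO iterates follow the Fokker-Planck equation
\partial_t \rho = \nabla \cdot \big( \rho \, \nabla \log \tfrac{\rho}{\pi} \big)
               = \Delta \rho + \nabla \cdot (\rho \, \nabla V)

% Benamou-Brenier: W_2 as minimal kinetic action; minimizers trace geodesics
W_2^2(\mu, \nu) = \min_{(\rho_t, v_t)} \int_0^1 \!\! \int_{\mathbb{R}^d} \|v_t(x)\|^2 \, d\rho_t(x) \, dt
\quad \text{s.t.} \quad \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0,\;
\rho_0 = \mu,\; \rho_1 = \nu
```

The first two lines form the initial-value picture the abstract assigns to diffusion. The third is the boundary-value picture it assigns to optimal-transport Flow Matching, with both endpoints $\mu$ and $\nu$ fixed.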
Scientist publishes fresh doubts over Microsoft's quantum claims
The tech giant has consistently stood by its Majorana chip research despite scepticism from some experts in the field.
Alibaba Slides to 16-Month Low After Anthropic’s AI Accusations
Alibaba Group Holding Ltd. shares slid to a 16-month low in Hong Kong after Anthropic PBC accused the Chinese technology giant of “illicitly” accessing its artificial intelligence model.
Anthropic Accuses Alibaba of ‘Illicitly’ Accessing AI Models
Anthropic PBC accused Chinese technology giant Alibaba Group Holding Ltd. of waging a large-scale effort to “illicitly” access its Claude artificial intelligence model using thousands of fraudulent accounts that undermine the US AI developer’s decision to keep its products out of China.
Anthropic accuses Alibaba of obtaining ‘illicit’ access to Claude
AI company says the Chinese ecommerce giant used fake accounts to ‘extract’ chatbot’s capabilities
Anthropic says Alibaba illicitly extracted Claude AI model capabilities
The New Energy War: Why The AI Grid Is The New Battleground
Russia can't bomb the West, so it's hacking it. How AI data centers and Texas's ERCOT grid became the next cyberwarfare battleground.
RIFT-Bench: Dynamic Red-teaming For Agentic AI Systems
arXiv:2606.23927v1 Announce Type: new Abstract: Agentic AI systems powered by large language models (LLMs) are rapidly evolving into autonomous decision-making systems, exposing attack vectors beyond those of traditional LLM vulnerabilities. Existing security evaluations are often tied to specific implementations or domains, limiting unified comparison across heterogeneous systems. To address this gap, we introduce RIFT-Bench, a graph representation-driven methodology for dynamic red-teaming that enables unified evaluations across diverse agentic architectures. Building on a novel hierarchical representation, RIFT-Bench operates in two automated phases: Discovery, which extracts system structure, and Scanning, which deploys adaptive adversarial attacks and produces a comprehensive evaluation report. It evaluates the examined system itself, leveraging a broad set of dynamically adaptable adversarial probes across diverse attack vectors and objectives. We demonstrate the effectiveness of the proposed evaluation pipeline across 45 agentic systems spanning a diverse range of implementations, showing that the approach generalizes effectively to heterogeneous agentic architectures. Beyond systems and attacks, RIFT-Bench also supports direct evaluation of mitigation strategies. These key capabilities make RIFT-Bench a scalable foundation for security evaluation of agentic AI systems.
Australian regulator warns of urgent risk of frontier AI models for banks
The Australian Prudential Regulation Authority warned that frontier AI models pose urgent risks to the financial sector, urging banks to share cybersecurity capabilities and bolster resilience.
Microsoft uses AI to link two malware operations in racketeering suit
200+ C2 servers linked to StealC and Amadey shut down
Five Eyes Cybersecurity Agencies Issue Statement Regarding AI-Related Shifts in Cybersecurity Risks, Urging Organizational Leaders to “Act Now” | Inside Privacy
On June 22, the leaders of the cybersecurity agencies in Australia, Canada, New Zealand, the UK, and the U.S. issued a joint statement calling for an
AI, Geopolitical Tensions, and Human Error Are Reshaping the Cyber Risk Landscape - Risk & Insurance
AI has accelerated cybercriminal capabilities while non-malicious incidents now account for a quarter of all cyber incidents, according to Lockton report.
Five Eyes warn AI cyber risks are rising within months
Boards face growing pressure to treat AI-driven cyber threats as an immediate business risk, with attackers able to exploit flaws within months.
ExtraHop® Report Finds Nearly Half of Ransomware Victims Suffer Data Theft Before Detection — ExtraHop
As AI reshapes the cybersecurity landscape, new data shows highly evasive threats moving at high velocity while enterprises face significant delays in containment
Adoption, Deployment & Impact
Getting past the pilot: Why so many AI test projects have trouble scaling
Business leaders from Salesforce, Amgen, and Thomson Reuters took a hard look at AI pilot projects to understand why some thrive and some flail.
How Shopify built an AI stack that doesn't care which models survive
Shopify built an LLM proxy that gives every engineer access to multiple AI providers, with automatic failover when any one of them goes down, changes, or disappears. When Claude Fable 5 shut down, Shopify's engineers didn't go into panic mode. The proxy shifted them to Claude Opus or GPT 5.5 automatically, without interrupting their workflows. “Fable looks amazing; we used it of course,” Farhan Thawar, Shopify’s head of engineering, says in a new VentureBeat Beyond the Pilot podcast. “When a model comes and then it goes, or it could be as innocuous as an update, the proxy allows us to spray across the different providers,” Thawar says. Shopify buys tokens in bulk, and all users connect to models through its proxy, Thawar says. This gives his team access to reporting and failover; when there’s an availability issue with one provider, users can be “automatically, seamlessly” transferred to another. Enterprises can learn from this example and consider how a disruption might affect their business, Thawar says. At the very least, they should establish a solid backup plan. It’s important to have a system that allows for movement across models so enterprises are not “super tied” to a specific provider. Distillation is another important strategy. With distillation, a student model learns from a teacher model and typically becomes specialized in a narrower task. These small language models (SLMs) can be more beneficial than generalized, off-the-shelf models in some circumstances. Take Shopify’s flagship AI assistant, Sidekick, which performs numerous specialized subtasks for merchants so they can “remove toil” from their day-to-day work. Using smaller distilled models can be faster and cheaper than using more generalized models, Thawar says. In some cases they have proven to be 2x cheaper and faster; in more extreme cases, 30x cheaper and faster, he says. But “it isn’t just about cost and latency, which are big; it’s about accuracy,” Thawar says.
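The failover behavior Thawar describes can be sketched as a proxy that tries providers in priority order and falls through on errors. This is a minimal, hypothetical illustration, not Shopify's proxy: the provider names, the `ProviderError` type, and the call signature are invented. A production proxy would also need timeouts, retries with backoff, and usage reporting.

```python
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider client when a model is unavailable or has been retired."""


Provider = tuple[str, Callable[[str], str]]  # (name, prompt -> completion)


def complete_with_failover(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    failures: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")  # record the failure and move on
    raise ProviderError("all providers failed: " + "; ".join(failures))


def retired_model(prompt: str) -> str:
    raise ProviderError("model retired")


def backup_model(prompt: str) -> str:
    return f"answer to {prompt!r}"


name, text = complete_with_failover(
    "summarize Q3", [("model-a", retired_model), ("model-b", backup_model)]
)
print(name, text)  # model-b answer to 'summarize Q3'
```

Because callers depend only on the proxy, retiring or swapping a provider becomes a configuration change rather than a change to every engineer's workflow.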
Engineers feed the UDP their teacher model, training data, evals, and a target model (say, Opus 4.8 distilling down to Qwen 3.5). The pipeline runs for about a day, then returns an evaluation showing what the fine-tuned model actually achieved on speed, cost, and accuracy for that subtask. If the tradeoff looks good, the engineer deploys it; no approval process is required. Shopify's internal platform, Tangle, lets anyone visualize the pipeline as it runs. Thawar says his “dream” is to eventually stop giving the distillation pipeline a target model at all. Instead, users could provide the teacher model with data and evals and the directive: ‘Based on your learnings over time, I want you to look at a different class of model, different sizes, different types, and you tell me what the right distillation target is.’ “Maybe we'll get surprised. Maybe it'll be such a small model it could run on a phone,” Thawar says. “Other times, maybe it comes back and says, ‘There isn't a way to distill this down to anything better than what we have at the frontier.’”
Moving away from "AI reflexivity" to "AI leverage"
Shopify users can apply whatever harness they want: Claude Code, Codex, Cursor, GitHub Copilot for VS Code. “We expose everyone to the different harnesses so they can get a feel for what may or may not work in their workflow.” But the company also implemented a usage dashboard, which lets Thawar’s team ask questions that go beyond token spend: Who’s using the most expensive tokens? Who's spending more time on reasoning? What types of models are being used, and in which disciplines and at which levels? On the "tokenmaxxing" question, Shopify does have “circuit breakers” in place. If a user has a model running for a long time (say, 10 hours) and it’s consuming a lot of tokens, they will get pinged: “Did you mean to spend this?” As Thawar explains, sometimes the reply is “Oh, absolutely.” Other times it’s: ‘Whoa, I didn't know that was running in the background.
I totally forgot about it. I'd rather stop it now.’ The ultimate goal, as Thawar describes it, is to move from “AI reflexivity” to “AI leverage,” and get people to really think deeply about where they can benefit most from AI in their workflows. Listen to the full podcast to hear more about: Shopify’s philosophy of building infrastructure before features. As Thawar puts it: “We've always built more infra. We will continue to always build more infra.” How Shopify’s internal AI agent, River, creates a “substrate of information” across the company. How Thawar's OpenClaw agent figured out he was traveling from his calendar — and what that moment told him about where agents are actually headed. You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
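The “circuit breaker” Shopify describes amounts to a threshold check over running jobs. Below is a minimal sketch, assuming a hypothetical `AgentRun` record and placeholder thresholds (10 hours, taken from the article's example, and an arbitrary 5 million tokens). It is not Shopify's implementation; the piece does not describe how theirs works.

```python
from dataclasses import dataclass

# Hypothetical thresholds: 10 hours echoes the article's example; the token
# limit is an arbitrary placeholder.
MAX_RUNTIME_HOURS = 10.0
MAX_TOKENS = 5_000_000


@dataclass
class AgentRun:
    """Hypothetical record of one long-running model or agent job."""
    user: str
    runtime_hours: float
    tokens_used: int


def needs_confirmation(run: AgentRun,
                       max_hours: float = MAX_RUNTIME_HOURS,
                       max_tokens: int = MAX_TOKENS) -> bool:
    """Trip the breaker only when a run is both long-lived and token-heavy."""
    return run.runtime_hours >= max_hours and run.tokens_used >= max_tokens


def check_runs(runs: list[AgentRun]) -> list[str]:
    """Return the ping message for every run that trips the breaker."""
    return [
        f"@{r.user}: this run has been going for {r.runtime_hours:.0f}h and has "
        f"used {r.tokens_used:,} tokens. Did you mean to spend this?"
        for r in runs
        if needs_confirmation(r)
    ]


# Only alice's run trips the breaker; bob's is token-heavy but short.
for msg in check_runs([AgentRun("alice", 11.0, 6_000_000),
                       AgentRun("bob", 2.0, 9_000_000)]):
    print(msg)
```

Requiring both conditions mirrors the article's framing, where a run is long *and* consuming a lot of tokens. A production version would also need a delivery channel for the ping and a way to record whether the user chose to continue or stop.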
Companies are not looking before they're leaping into the AI playpen
93% of organizations report infrastructure incidents attributable to AI
Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement
arXiv:2605.16237v3 Announce Type: replace Abstract: Clearly defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from Science and Technology Studies (STS), we show that such distance exists because (1) historically, the "ground truth" of the strike zone is contested: the rule in practice has always reflected a hybrid between the rulebook definition and umpires' enforcement decisions; and (2) the use of ABS is embedded in an existing ecosystem, where the implementation of a technological enforcement system needs to balance multiple stakeholder values. This perspective challenges conventional evaluation paradigms that center on the distance between a formalized rule and its technological implementation, and instead calls for evaluating how such systems are experienced in practice. Addressing this question requires in-depth social science approaches, contributing to ongoing conversations in FAccT about the implementation and evaluation of sociotechnical systems.
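Even a literal reading of "is the pitch in the zone?" forces parameter choices. The sketch below treats the zone as a 2D rectangle at a single measurement plane. It calls a strike when the distance from the ball's center to that rectangle is at most the ball's radius. The plate width (17 inches) and the ball size (about 2.9 inches across) are standard. The zone bounds, expressed as fractions of batter height, and the single-plane simplification are hypothetical and are not MLB's ABS specification.

```python
from dataclasses import dataclass

PLATE_WIDTH_IN = 17.0   # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45   # a baseball is roughly 2.9 inches in diameter

# Hypothetical zone bounds as fractions of batter height. Choosing these numbers
# (and where to measure) is the kind of translation work the paper describes.
ZONE_BOTTOM_FRAC = 0.27
ZONE_TOP_FRAC = 0.535


@dataclass
class Pitch:
    x_in: float  # horizontal offset from the plate's center line, inches
    z_in: float  # height above the ground at the assumed measurement plane


def distance_to_zone(p: Pitch, batter_height_in: float) -> float:
    """Distance from the ball's center to the nearest point of the 2D zone."""
    half_width = PLATE_WIDTH_IN / 2
    bottom = ZONE_BOTTOM_FRAC * batter_height_in
    top = ZONE_TOP_FRAC * batter_height_in
    dx = max(abs(p.x_in) - half_width, 0.0)
    dz = max(bottom - p.z_in, 0.0, p.z_in - top)
    return (dx * dx + dz * dz) ** 0.5


def is_strike(p: Pitch, batter_height_in: float) -> bool:
    """Strike if any part of the ball touches the zone (distance <= radius)."""
    return distance_to_zone(p, batter_height_in) <= BALL_RADIUS_IN
```

Changing `ZONE_TOP_FRAC`, the measurement plane, or the 2D simplification changes calls at the zone's edges. That is where the paper locates the "distance" between a clear rule and its implementation.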
Agentic AI Readiness Index 2026: The Gap Between Investment and Data Maturity | E3 Magazine
According to a recent study by Fivetran, only 15 percent of companies are fully prepared for the production-level deployment of agent-based AI—even though nearly 60 percent of companies are already investing millions.
AI is ready, but companies are not: over $143bln at risk, report finds
A growing gap has emerged between organizations' AI ambitions and their ability to translate those ambitions into tangible business outcomes
AI Adoption Makes Real-Time, Streaming Data Central to Enterprise Operations, ISG Says
Enterprises increasingly are adopting real-time data processing as a foundational tool for AI-enabled automation, ISG says.
Bring Your Own AI: 41% of Workers Say Their Employer Has Done Nothing to Prepare Them to Use AI at Work
(PRNewswire) AI resume builder Resume Now® has released the results of its latest national survey of more than 1,000 U.S. employed workers, revealing that...
From Causal Discovery to Implementation: An Agentic AI Framework for E-Scooter Mobility Hub Planning Across 29 German Cities
arXiv:2606.25484v1 Announce Type: cross Abstract: Existing approaches to e-scooter mobility hub planning lack city-type-specific causal evidence. Demand models are typically correlational, built on proprietary trip data, and do not distinguish how driver profiles vary across urban typologies. This paper presents a three-phase agentic AI framework that constructs a Causal Template Library from public GBFS data across 29 German cities, encoding which environmental features causally drive hotspot demand for each combination of city type (large, university, industrial, hilly) and cluster type (core, peripheral). A causal discovery pipeline orchestrated by a large language model (LLM) adapts algorithm selection to local data conditions across 57 city-cluster units. The library reveals systematic variation. Core demand is driven by activity access and transit proximity, while peripheral demand responds to built form, with city-type-specific patterns supporting transferable siting templates. A planning tool built on the library scores candidate sites, calibrates infrastructure recommendations to local demographics, and generates practitioner-ready reports. In Heilbronn, Germany, two hub sites informed by the framework's causal evidence are currently under construction, illustrating how the outputs can support real-world siting decisions.
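Here is a minimal sketch of how a template library might drive candidate-site scoring. It assumes each (city type, cluster type) template maps its causal features to weights and that site features are normalised to [0, 1]. The feature names echo the abstract's findings: activity access and transit proximity for core clusters, built form for peripheral ones. The `TEMPLATES` dictionary and its weights are hypothetical and are not values from the paper.

```python
# Hypothetical Causal Template Library: (city type, cluster type) -> feature weights.
TEMPLATES: dict[tuple[str, str], dict[str, float]] = {
    ("university", "core"): {"activity_access": 0.6, "transit_proximity": 0.4},
    ("university", "peripheral"): {"built_form": 0.7, "transit_proximity": 0.3},
}


def score_site(features: dict[str, float], city_type: str, cluster_type: str) -> float:
    """Weighted sum over the features the template marks as causal drivers."""
    weights = TEMPLATES[(city_type, cluster_type)]
    # Features outside the template are ignored; missing ones count as zero.
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_sites(candidates: list[tuple[str, dict[str, float]]],
               city_type: str, cluster_type: str) -> list[str]:
    """Return candidate site names, best-scoring first."""
    ranked = sorted(candidates,
                    key=lambda c: score_site(c[1], city_type, cluster_type),
                    reverse=True)
    return [name for name, _ in ranked]


sites = [
    ("A", {"activity_access": 0.9, "transit_proximity": 0.2, "built_form": 0.1}),
    ("B", {"activity_access": 0.3, "transit_proximity": 0.9, "built_form": 0.6}),
]
print(rank_sites(sites, "university", "core"))        # ['A', 'B']
print(rank_sites(sites, "university", "peripheral"))  # ['B', 'A']
```

The same two sites rank differently depending on which template applies. This is the practical consequence of the city-type and cluster-type variation the abstract reports.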
How AI is transforming natural disaster prediction
Catastrophe scientists are pushing past the limits of physics-based models, improving how insurers calculate risk
Artificial Intelligence in the Real Economy: A Visual Guide
A new series of visual explainers looking at how AI is transforming different industries
OmniPath: A Multi-Modal Agentic Framework for Auditing Wheelchair Accessibility
arXiv:2606.24129v1 Announce Type: new Abstract: For a wheelchair user, a standard blue line on a map is often a broken promise. While platforms like OpenStreetMap (OSM) successfully capture where a path is, they frequently fail to convey how it physically feels to travel on it. This information barrier is problematic for wheelchair users. To solve this issue, we present OmniPath, a system that moves from passive mapping to proactive environmental auditing. Our framework fuses the network topology of OSM with the submeter precision of high-density aerial LiDAR (USGS 3DEP) to create a high-fidelity 3D model of the pedestrian environment. Rather than simply routing a user, our agent virtually traverses the network, analyzing the surface in 0.5-meter increments. It rigorously quantifies physical friction points, specifically running slope, cross slope, and vertical discontinuities, against ADA compliance standards, calculating a weighted severity score to categorize hazards from "Mild" to "Critical." To ensure real-world reliability, we validated the system against 200 physical ground-truth field surveys across the National Mall using stratified random sampling. The framework demonstrated strong diagnostic reliability for high-severity hazards, achieving F1-scores of 0.60 and 0.58 for the Severe and Critical categories, respectively. By automating this micro-scale inspection, OmniPath identifies the "invisible" barriers that standard maps miss, effectively transforming a static dataset into an accessibility data source that anticipates accessibility challenges before the user ever leaves home.
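The auditing step can be sketched as a walk along a path sampled every 0.5 m. Each step is compared against the ADA walking-surface limits: running slope no steeper than 1:20, cross slope no steeper than 1:48, and vertical changes in level of at most 1/4 inch without a bevel. The weights, the score formula (a sum of weighted exceedance ratios), and the four-band scale below are hypothetical. The abstract does not publish OmniPath's actual scoring, and "Moderate" is an assumed intermediate band.

```python
STEP_M = 0.5                  # sampling interval from the abstract
MAX_RUNNING_SLOPE = 1 / 20    # ADA walking-surface running-slope limit
MAX_CROSS_SLOPE = 1 / 48      # ADA walking-surface cross-slope limit
MAX_VERTICAL_M = 0.0064       # 1/4 inch vertical change in level, in metres

# Hypothetical weights and score cut-offs; OmniPath's actual values are not given.
WEIGHTS = {"running": 1.0, "cross": 1.0, "vertical": 2.0}
BANDS = [(0.5, "Mild"), (1.5, "Moderate"), (3.0, "Severe")]  # above: "Critical"
EPS = 1e-9                    # ignore floating-point noise at exactly the limit


def excess(value: float, limit: float) -> float:
    """Fraction by which |value| exceeds the limit (0 when compliant)."""
    return max(abs(value) / limit - 1.0, 0.0)


def audit(elevations_m: list[float], cross_slopes: list[float],
          vertical_steps_m: list[float]) -> list[tuple[int, float, str]]:
    """Score each 0.5 m step; cross_slopes and vertical_steps_m hold one value per step.

    Returns (step index, severity score, label) for non-compliant steps only.
    """
    hazards = []
    for i in range(len(elevations_m) - 1):
        running = (elevations_m[i + 1] - elevations_m[i]) / STEP_M
        score = (WEIGHTS["running"] * excess(running, MAX_RUNNING_SLOPE)
                 + WEIGHTS["cross"] * excess(cross_slopes[i], MAX_CROSS_SLOPE)
                 + WEIGHTS["vertical"] * excess(vertical_steps_m[i], MAX_VERTICAL_M))
        if score > EPS:
            label = next((name for cut, name in BANDS if score < cut), "Critical")
            hazards.append((i, round(score, 3), label))
    return hazards
```

Weighting vertical discontinuities more heavily is only an illustrative choice. A lip that stops a wheel outright is arguably worse than a slope that is slightly too steep, but the real weighting is the paper's to define.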
Majesco points to accelerating AI adoption reshaping insurance operating models - Reinsurance News
Majesco, a provider of AI-native and cloud-native software solutions for the insurance industry, has released a new thought leadership report examining
Claude joins Slack as new 'coworker'
Anthropic's Claude AI is now available as an integration within Slack to assist with workplace tasks.
T2D-Bench: Evidence-Gated Evaluation of LLM Outputs for Type 2 Diabetes Using a Multi-Layer Clinical-Lifestyle Knowledge Graph
arXiv:2606.24145v1 Announce Type: new Abstract: Large language models (LLMs) can produce clinically fluent recommendations for type 2 diabetes while failing to satisfy guideline constraints or explicitly justify lifestyle-related glycemic claims. We present T2D-Bench, a reproducible benchmark and evidence-gated evaluation framework for testing whether LLM outputs satisfy explicit, graph-checkable evidence requirements. T2D-Bench is built on a multi-layer clinical-lifestyle knowledge graph that combines a biomedical spine (UMLS, DrugBank, SIDER), computable ADA Standards of Care rules, and lifestyle knowledge connected through a mechanistic bridge to glycemic laboratory effects. Across 100 structured vignettes spanning diagnosis, medication safety, and adversarial lifestyle conflicts, baseline outputs failed benchmark-defined evidence-path checks in 35% of cases for GPT-4o-mini and 33% for GPT-4o. The evidence gate detects unsupported omissions and uses constrained revision to bring outputs into verifier-level compliance with benchmark-defined evidence requirements. These results show that computable evidence constraints can make unsupported clinical omissions explicit, measurable, and correctable in diabetes-focused LLM outputs.
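An evidence gate of this kind can be sketched as a reachability check. A claim linking a lifestyle factor to a glycemic lab effect passes only if the knowledge graph contains a directed path between them. The toy graph, its node names, and the `evidence_gate` helper below are hypothetical illustrations. They are not T2D-Bench's schema or its UMLS, DrugBank, and SIDER-derived layers.

```python
from collections import deque

# Toy three-layer graph (lifestyle -> mechanism -> lab effect). Node names and
# edges are illustrative only, not T2D-Bench's actual knowledge graph.
GRAPH: dict[str, list[str]] = {
    "lifestyle:aerobic_exercise": ["mechanism:insulin_sensitivity"],
    "mechanism:insulin_sensitivity": ["lab:HbA1c"],
    "lifestyle:late_night_snacking": [],
}


def has_evidence_path(graph: dict[str, list[str]], source: str, target: str) -> bool:
    """Breadth-first search for any directed path from source to target."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def evidence_gate(claims: list[tuple[str, str]], graph: dict[str, list[str]]):
    """Split (source, target) claims into supported and unsupported lists."""
    supported, unsupported = [], []
    for src, dst in claims:
        bucket = supported if has_evidence_path(graph, src, dst) else unsupported
        bucket.append((src, dst))
    return supported, unsupported


ok, flagged = evidence_gate(
    [("lifestyle:aerobic_exercise", "lab:HbA1c"),
     ("lifestyle:late_night_snacking", "lab:HbA1c")],
    GRAPH,
)
```

In the abstract's framework, flagged claims would then go through constrained revision. That step is not sketched here.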
Enterprise AI enters new phase as firms shift focus from adoption to ROI | Artificial Intelligence News - Business Standard
Enterprise AI: AI investment is accelerating across industries, but enterprises are increasingly focused on measuring returns, scaling deployments, adopting agentic AI, and reshaping workforces
[BPO Insights] The ROI Model That Closes Deals: Building a One-Page Financial Case for AI
The one-page financial model that convinces BPO CFOs to deploy AI, including blended cost per interaction, the automation ramp curve from 30% to 70%, and margin improvement analysis by BPO size tier. A step-by-step playbook for building the financial case that actually gets signed.
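Blended cost per interaction is a weighted average of automated and human-handled costs at a given automation rate. The sketch below assumes a linear ramp from 30% to 70% over 12 months, although real ramp curves may be S-shaped. It uses placeholder prices and costs, none of which come from the article.

```python
def blended_cost(automation_rate: float, ai_cost: float, human_cost: float) -> float:
    """Cost per interaction when a share of volume is automated."""
    return automation_rate * ai_cost + (1 - automation_rate) * human_cost


def linear_ramp(start: float = 0.30, end: float = 0.70, months: int = 12) -> list[float]:
    """Assumed linear automation ramp from start to end over the given months."""
    step = (end - start) / (months - 1)
    return [start + i * step for i in range(months)]


def margin(price: float, cost: float) -> float:
    """Gross margin per interaction as a fraction of price."""
    return (price - cost) / price


# Placeholder unit economics, not figures from the article.
PRICE, HUMAN_COST, AI_COST = 5.00, 4.00, 0.50

baseline = margin(PRICE, HUMAN_COST)
for month, rate in enumerate(linear_ramp(), start=1):
    cost = blended_cost(rate, AI_COST, HUMAN_COST)
    print(f"month {month:2d}: automation {rate:.0%}, "
          f"cost ${cost:.2f}, margin uplift {margin(PRICE, cost) - baseline:+.1%}")
```

With these hypothetical inputs, blended cost falls from $2.95 at 30% automation to $1.55 at 70%. Gross margin rises from 20% with no automation to 69% at the end of the ramp.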
Council Post: Why Enterprise AI Needs To Move From Demos To Measurable Outcomes
AI value is created when an answer is trusted, acted on, embedded into a process and measured against a business outcome.
AWS CEO Matt Garman on why enterprises are seeing AI ROI
In a new interview, Garman explains why enterprises are moving past AI experimentation—and what's driving real business value.
Geopolitics, Policy & Governance
Nations race to secure sovereign AI as chips, energy become strategic assets | The Jerusalem Post
The Netherlands’ entry into Pax ... at the center of new AI supply‑chain coordination. This move comes amid ongoing tensions over export controls on ASML’s advanced lithography tools to China, Gardner explained. "We live in a world where geopolitics and technology are ...
The ASML-China Saga Reminds Us That Making Advanced Chips Is Not Just a Matter of Money | Cloud News
The recent controversy surrounding ASML and China is somewhat unusual even for an industry accustomed to geopolitics. The idea that an EUV machine, one of the
Geopolitics and AI in spotlight at China's 'Summer Davos'
Breakthroughs in technologies such as AI are touted as drivers of economic growth, but headwinds include concerns over job losses and geopolitical tensions, speakers told AFP at China's "Summer Davos"…
I Met With China’s Top AI Experts. They’re Freaking Out, Too | WIRED
The AI arms race between China and the US has researchers on both sides worried about a “Chernobyl moment.”
EU tech head pitches digital sovereignty as allied interdependence, not US break
EU tech chief Roberto Viola cast Europe’s tech sovereignty push as a bid for trusted cooperation with the US, not a break from American technology.
Bold move, Cotton: Trump administration tells US techies it expects American quantum computer by 2028
The Trump administration has set a 2028 goal for the development of a national American quantum computer to maintain a competitive edge.
India Wants Its AI Talent Back; But What’s The Incentive?
So, we must ask: can India build entirely new forms of AI systems? Can we create world-class models that understand physical environments? Can it solve problems unique to a country of 1.4 Bn people and, in doing so, produce technologies relevant to the world? Without such moonshot ambitions, India risks remaining a consumer of breakthroughs developed elsewhere. The PMRC scheme creates pathways for researchers ...
China pledges more support for AI, advanced computing to boost self-reliance
Chinese supercomputer using local processors heads TOP500 list
The use of Arm cores and Linux indicates that Beijing has not fully broken away from global standards.
Fifty Years of Specification Completeness: What Aviation Certification Tells AI Governance About Epoch Limits, Proof Surfaces, and the Structural Gap
arXiv:2606.25120v1 Announce Type: cross Abstract: Aviation software certification has operationalised three structural requirements for governed software systems since 1992: structured governance linkage between governing specifications and operational evidence, context-bounded validity that triggers revalidation when operational context changes, and an objective evidence architecture that defines what proof means and what makes it sufficient. These requirements appear in DO-178C and DO-330 and are enforced through FAA and EASA certification. No existing framework requires these structural properties as intrinsic properties of individual AI governance documents. A system prompt, an AGENTS.md file, a governance policy, or a task envelope can be deployed without satisfying any of the three requirements aviation has enforced for three decades. Aviation is the most technically rigorous instance: its standard-setting bodies have acknowledged that their frameworks break down for AI systems, yet none requires these properties of individual governance documents. Aviation's structural requirements break down at the system level because AI systems are non-deterministic, but remain transferable at the document level: the governance artifact is a static artifact whose structural properties can be evaluated independently of the stochastic system it governs. The paper maps DO-178C's traceability architecture, DO-330's requalification triggers, and DO-178C's objective evidence requirements onto three structural findings: epoch limits on governance document validity, proof surfaces as the revalidation feedback mechanism, and the absence of structural completeness requirements in AI governance instruments. An empirical companion (arXiv:2604.21090) found that 37% of AI governance documents fall below the structural quality threshold. PromptQ's seven-principle framework operationalises these requirements at the governance document layer.
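The paper borrows two structural ideas from aviation: epoch limits, and context-bounded validity that triggers revalidation. Both can be sketched as a check on a governance document's metadata. The `GovernanceDoc` record, the 90-day epoch, and the hash-based context fingerprint below are hypothetical. This is neither DO-178C/DO-330 nor PromptQ's framework, only an illustration of the two revalidation triggers.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date


def context_fingerprint(context: dict) -> str:
    """Stable hash of the operational context a document was validated against."""
    return hashlib.sha256(json.dumps(context, sort_keys=True).encode()).hexdigest()


@dataclass(frozen=True)
class GovernanceDoc:
    """Hypothetical metadata attached to a system prompt, AGENTS.md, or policy."""
    name: str
    validated_on: date
    epoch_days: int      # validity window before mandatory revalidation
    context_hash: str    # fingerprint of the context at validation time


def revalidation_reasons(doc: GovernanceDoc, today: date, context: dict) -> list[str]:
    """Return why the document must be revalidated (empty list: still valid)."""
    reasons = []
    if (today - doc.validated_on).days > doc.epoch_days:
        reasons.append("epoch expired")
    if context_fingerprint(context) != doc.context_hash:
        reasons.append("operational context changed")
    return reasons


ctx = {"model": "model-a", "tools": ["search", "email"]}
doc = GovernanceDoc("AGENTS.md", date(2026, 1, 1), 90, context_fingerprint(ctx))
```

Hashing the whole context is a deliberately blunt trigger: any change, such as swapping the model or adding a tool, forces revalidation. A finer-grained scheme could list which context fields actually invalidate the document.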
AI & Tech Brief: White House unveils quantum executive order - The Washington Post
Lawmakers on the House Energy and Commerce Committee have come to a bipartisan agreement on children’s online safety legislation. The deal shows momentum in Congress to reach an outline of an AI deal before July 4.
AI interests win, and lose, in one New York district - The Washington Post
Democratic Assembly member Micah Lasher defeated fellow assemblyman Alex Bores in a race that became a proxy for how AI should be regulated.
‘We should be worried’: report sheds light on ICE’s booming arsenal of hi-tech surveillance tools
Spending on government contracts with tech firms that use AI-powered tools to track immigrants has soared to record levels under Trump 2.0, report says. A new report sheds light on the unprecedented growth of the US government’s immigration surveillance arsenal, revealing fresh details about how spending on technology and AI tools to find and track migrants has soared to record levels during Donald Trump’s second term. The report, released this week, analyzed US Immigration and Customs Enforcement (ICE) and Customs and Border Protection (CBP) contracts with 11 companies the authors said provide surveillance tech. They found the money awarded to these firms doubled from 2024 to 2025, to just over $310m, and in 2026 that number soared to a record $513m.
AI & Tech Brief: Bores loses the big AI primary - The Washington Post
Bores was particularly threatening to the AI industry due to his ability to synthesize ideas from the worlds of tech and politics.
Rethinking regulation for the age of AI | FCA
Speech by Nikhil Rathi, FCA chief executive at techUK's Agents of Change: Generative and Agentic AI in Financial Services 2026.
AI regulation in the UK - House of Commons Library
This briefing provides an introduction to artificial intelligence (AI) and how it is regulated in the UK.
What the Anthropic fight says about AI regulation
These two perspectives loosely ... administration and Anthropic itself, and argues that geopolitics and US-China relations make it urgently necessary to regulate the diffusion of AI to make sure that America stays in control....
AI Sovereignty and the Implications of the US Export Restrictions | IBTimes
Last week the United States government announced an export control directive to suspend all access to Anthropic's Fable 5 and Mythos 5 models by any foreign national. This ban included foreign nationals whether inside or outside the United States, including foreign national Anthropic employees. The move caught many by surprise, prompting debate over whether the need for AI sovereignty was even more important than previously realised. While the geopolitical ...
Anthropic Customer Sues US Over Fable 5 and Mythos 5 AI Ban
A legal tech company says US export restrictions cut off access to Anthropic's most powerful AI models, triggering a high-stakes lawsuit.
AI’s rapid evolution forces regulatory rethink, FCA chief Rathi says
FCA head Nikhil Rathi stated that traditional rule-making is insufficient as AI evolves faster than current frameworks. The regulator is re-thinking its approach to supervision and collaboration.
Rep. Liccardo Introduces SKILL Act to Prepare Workers for AI-Driven Job Market Shifts | Congressman Sam Liccardo
WASHINGTON, D.C. — Today, Congressman Sam Liccardo (CA-16), joined by Congressman Jimmy Panetta, introduced the Supporting Knowledge Through Industry-Led Learning (SKILL) Act, a mechanism to incentivize private sector investment in American workers to anticipate the likely disruption from ...
The EU AI Act: Compliance Requirements For 2026 And Beyond | BlackFog
Learn the key EU AI Act compliance requirements for 2026 and the security controls organisations need for AI governance readiness.
UK digital regulators accelerate adoption of AI tools for enforcement, supervision
The Digital Regulation Cooperation Forum is testing generative AI for supervision and enforcement. Members are evaluating frameworks to manage risks like bias, hallucinations, and data security.
3 Stocks Built for Rising AI Regulation Spending - Simply Wall St News
Political spending around AI regulation is exploding, and that is turning compliance, security, and governance into real business for some companies. As lawmakers weigh tighter rules and Big Tech steps up lobbying, investors are starting to look at stocks that could benefit from higher regulatory ...
A Roadmap for Sensible AI Regulation
Big Tech should read, and heed, an important new paper stressing that self-governance is the most sensible path forward.