AI Intelligence Brief

Wed 10 June 2026

Daily Brief — Curated and contextualised by Best Practice AI

118 articles

AI Automates Outsourcing, Tata Predicts Job Cuts, and IMF Warns of Backlash

TL;DR: Generative AI is transforming outsourcing by automating routine tasks, challenging the traditional labor-arbitrage model. Tata's chief predicts AI agents will replace half the jobs at Tata Consultancy Services, highlighting the potential for significant workforce disruption. The IMF warns of a backlash against AI's impact on employment, drawing parallels with the effects of globalization. Meanwhile, UK AI startups raised £8.2 billion in venture capital, capturing nearly half of European tech investment.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

10 of 118 articles

Economics & Markets

25 articles
AI Macroeconomics · 6 articles
Editor's pick · Professional Services
Daily AI News June 9, 2026: Claude's Dynamic Workflows - Automation on Steroids · 4 days ago

AI Is Rewriting the Economics of Outsourcing

Generative AI is undermining the traditional labor-arbitrage model of outsourcing by automating routine, rules-based work and shifting focus to task-level workflows.

Editor's pick · Energy & Utilities
arXiv · 3 days ago

Accounting for AI Inference in Corporate GHG Inventories: A Four-Tier Methodology for Scope 3 Category 1 Reporting

arXiv:2606.10660v1 Announce Type: new Abstract: AI inference services -- API subscriptions, enterprise chat tools, and SaaS products with embedded AI features -- fall unambiguously within Scope 3 Category 1 under the Corporate Sustainability Reporting Directive (CSRD), which requires disclosure for fiscal years starting January 2024. Yet no standardised methodology exists for including them in corporate GHG inventories. Current practice either omits the category entirely or applies a generic economic input-output (EEIO) factor calibrated to the ICT sector as a whole, overestimating AI inference emissions by 10-40x relative to physically derived alternatives. We propose a four-tier framework that matches estimation precision to the data organisations can realistically obtain, progressing from direct token-based physical estimation -- using GPU energy benchmarks and regional grid carbon intensities -- down to a spend-based EEIO fallback for services where no usage data exists. Emission factors are derived from peer-reviewed GPU energy benchmarks (ML.ENERGY Leaderboard v3), confirmed grid carbon intensities (EPA eGRID 2023; Ember 2023), and published water use effectiveness data (Li et al., 2025). Applied to a 200-person European firm, the framework yields a total below 1 tCO2e, illustrating that the compliance challenge is methodological rather than magnitude-driven. We further document a water-carbon trade-off that current ESG tools do not surface: Sweden's hydro-dominated grid delivers the lowest carbon intensity in our dataset but the highest water footprint, with direct implications for data centre location strategy.
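The two ends of the four-tier framework reduce to simple products: token volume times energy per token times grid intensity at the top, and spend times an EEIO factor at the bottom. A minimal sketch, assuming hypothetical inputs: the per-token energy figure, PUE multiplier, grid intensity and EEIO factor below are illustrative placeholders, not values from the paper, and the intermediate tiers are not modelled.

```python
from dataclasses import dataclass


@dataclass
class InferenceUsage:
    """Annual AI inference usage for one service (all values hypothetical)."""
    tokens: float            # tokens processed in the reporting year
    wh_per_1k_tokens: float  # GPU energy benchmark, Wh per 1,000 tokens (placeholder)
    pue: float               # data-centre overhead multiplier (power usage effectiveness)
    grid_kg_per_kwh: float   # regional grid carbon intensity, kgCO2e per kWh (placeholder)


def token_based_kg(u: InferenceUsage) -> float:
    """Top tier: physical estimate from token volume, energy benchmark and grid intensity."""
    kwh = (u.tokens / 1_000) * u.wh_per_1k_tokens / 1_000 * u.pue
    return kwh * u.grid_kg_per_kwh


def spend_based_kg(spend_eur: float, eeio_kg_per_eur: float) -> float:
    """Fallback tier: annual spend multiplied by a sector-level EEIO emission factor."""
    return spend_eur * eeio_kg_per_eur


if __name__ == "__main__":
    usage = InferenceUsage(tokens=100_000_000, wh_per_1k_tokens=0.5, pue=1.2, grid_kg_per_kwh=0.3)
    print(f"token-based: {token_based_kg(usage):.1f} kgCO2e")
    print(f"spend-based: {spend_based_kg(5_000, 0.2):.1f} kgCO2e")
```

With these made-up inputs the token-based route gives about 18 kgCO2e and the spend-based route 1,000 kgCO2e. The direction of the gap mirrors the paper's point about generic EEIO factors, but its size here is purely an artefact of the placeholder numbers.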

Editor's pick · PAYWALL · Government & Public Sector
FT · 4 days ago

There is a simpler option for making AI pay its way: tax it properly

The world of laissez-faire no longer exists given the impact of the technology

Editor's pick
arXiv · 3 days ago

GAGI: A Gini-Adjusted GDP-per-Capita Index for Distribution-Aware Macroeconomic Welfare Monitoring

arXiv:2606.09944v1 Announce Type: new Abstract: GDP per capita is the default lens through which governing bodies track economic prosperity and the consequences of economic events, yet it is blind to two first-order determinants of lived prosperity: income/wealth distribution and inflation impact. Inequality-adjusted income measures are not themselves new, but what is missing from the macroeconomic monitoring toolkit is not a welfare concept but an operational monitoring trigger: a statistic minimal enough to compute annually from public data, transparent enough to audit without modelling assumptions, and normalised so that year-on-year, cross-country change (the quantity a regulator needs to act on) is legible. We assemble such an instrument, the Gini-Adjusted GDP per Capita Index (GAGI): a reproducible, publicly computable formulation that rescales each country's GDP per capita by its inequality-adjustment factor (1-G) and its price level, normalised to a 2010 baseline. GAGI is a general-purpose welfare index, not inherently specific to AI automation, applicable wherever welfare-adjusted prosperity needs tracking. Applying GAGI to the G7 economies over 2010-2026, we show that welfare-adjusted prosperity has diverged persistently and increasingly from headline GDP growth, and that the divergence widens sharply after 2022, temporally coincident with (though not, on this evidence alone, demonstrated to be caused by) the aftereffects of COVID and the acceleration of generative-AI deployment. We argue that GAGI is a necessary complement to GDP-based monitoring: any macroeconomic monitoring instrument that tracks only aggregate output will systematically miss the distributional harm that automation can cause even while reported growth remains strong.
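The index construction described in the abstract can be sketched in a few lines. This assumes GAGI divides inequality-adjusted GDP per capita, GDPpc × (1 − G), by a price-level index and expresses each year relative to 2010 = 100; the paper's exact deflator and scaling may differ, and the country figures below are invented.

```python
def adjusted_level(gdp_per_capita: float, gini: float, price_level: float) -> float:
    """Inequality- and price-adjusted GDP per capita: GDPpc * (1 - G) / P (assumed form)."""
    if not 0.0 <= gini <= 1.0:
        raise ValueError("Gini coefficient must lie in [0, 1]")
    if price_level <= 0:
        raise ValueError("price level must be positive")
    return gdp_per_capita * (1.0 - gini) / price_level


def gagi_index(series: dict[int, tuple[float, float, float]], base_year: int = 2010) -> dict[int, float]:
    """Map year -> index, with the base year's adjusted level set to 100."""
    base = adjusted_level(*series[base_year])
    return {year: 100.0 * adjusted_level(*vals) / base for year, vals in sorted(series.items())}


def headline_index(series: dict[int, tuple[float, float, float]], base_year: int = 2010) -> dict[int, float]:
    """Unadjusted GDP per capita on the same base, for comparison."""
    base = series[base_year][0]
    return {year: 100.0 * vals[0] / base for year, vals in sorted(series.items())}


# Hypothetical country: year -> (GDP per capita, Gini coefficient, price level relative to 2010)
country = {2010: (40_000.0, 0.30, 1.00), 2020: (48_000.0, 0.34, 1.20)}
print(headline_index(country))  # headline growth
print(gagi_index(country))      # welfare-adjusted index
```

In this invented example headline GDP per capita rises 20% while the adjusted index falls to roughly 94, the kind of divergence between headline and welfare-adjusted series that the authors track.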

Editor's pick
The Business Times · 4 days ago

Now you see it, now you don't: Why data can't capture the AI revolution

Nobel laureate Michael Spence says the biggest economic transformation in history may barely show up in statistics but could widen wealth inequality.

Editor's pick · Government & Public Sector
Artificial Intelligence Newsletter | June 9, 2026 · 5 days ago

UK government establishes AI Economics Institute to inform policymaking

The new research organization will assess the economic impact of AI, including its effects on productivity, labor markets, and growth, to guide future government policy.

AI Market Competition · 5 articles
Editor's pick · Technology
The Register · 4 days ago

Neo4j plots Palantir alternative with GraphAware acquisition

Graph database biz says on-prem, air-gapped intel stack gives governments a no-kill-switch option

Editor's pick · Technology
Artificial Intelligence Newsletter | June 10, 2026 · 3 days ago

US judge signals openness to Yelp request to apply Google Search monopoly ruling

A California federal judge indicated she may allow Yelp to bar Google from relitigating issues already decided in the DOJ's monopolization case.

Editor's pick · Media & Entertainment
arXiv · 3 days ago

Unintended Consequences of Recommender System Interventions: Evidence from a Field Experiment

arXiv:2606.08265v1 Announce Type: cross Abstract: Platform content interventions in recommendation systems are typically evaluated as static "nudges", ignoring that the systems adaptively learn from the resulting user behavior. We investigate this dynamic through a large-scale field experiment on a short-video platform. The experiment involves a "sleep reminder" campaign designed to reduce late-night usage. Paradoxically, the intervention increased late-night engagement by 14.75% and overall platform usage by 2.18%, and the effects persisted for weeks even after the experiment. We explain this through a forced-exploration mechanism, showing that by revealing high latent demand for the promoted content, the intervention triggers a recommendation policy update that routine user behavior would not produce. The data generated by the intervention induced the algorithm to update its post-campaign policy, reinforcing the very engagement loops the campaign aimed to mitigate. Our findings demonstrate that user-facing interventions can effectively retrain the underlying algorithm, triggering durable, system-wide shifts in content distribution that challenge standard evaluation metrics in platform governance and social responsibility initiatives.
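The forced-exploration mechanism can be illustrated with a toy greedy recommender that only learns about content it actually shows. This is a generic sketch, not the authors' model: the two content categories, engagement rates, prior and 40-round campaign are hypothetical, and expected engagement stands in for random clicks so the outcome is deterministic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GreedyRecommender:
    """Greedy policy over content categories with pseudo-count priors (toy model)."""
    prior_mean: dict[str, float]
    prior_n: float = 10.0  # prior weight, as if each category had already been shown prior_n times
    shows: dict[str, float] = field(default_factory=dict)
    engagement: dict[str, float] = field(default_factory=dict)

    def estimate(self, arm: str) -> float:
        n = self.shows.get(arm, 0.0)
        total = self.engagement.get(arm, 0.0)
        return (self.prior_mean[arm] * self.prior_n + total) / (self.prior_n + n)

    def choose(self) -> str:
        return max(self.prior_mean, key=self.estimate)

    def update(self, arm: str, engaged: float) -> None:
        self.shows[arm] = self.shows.get(arm, 0.0) + 1.0
        self.engagement[arm] = self.engagement.get(arm, 0.0) + engaged


def run(rec: GreedyRecommender, true_rate: dict[str, float], rounds: int,
        forced_arm: Optional[str] = None, forced_rounds: int = 0) -> list[str]:
    """Serve `rounds` recommendations; the first `forced_rounds` are overridden by the intervention."""
    picks = []
    for t in range(rounds):
        arm = forced_arm if (forced_arm is not None and t < forced_rounds) else rec.choose()
        rec.update(arm, true_rate[arm])  # the system learns only from what it showed
        picks.append(arm)
    return picks


TRUE_RATE = {"daytime": 0.5, "late_night": 0.6}
PRIOR = {"daytime": 0.5, "late_night": 0.2}

baseline = run(GreedyRecommender(dict(PRIOR)), TRUE_RATE, rounds=100)
campaign = run(GreedyRecommender(dict(PRIOR)), TRUE_RATE, rounds=100,
               forced_arm="late_night", forced_rounds=40)
print(baseline.count("late_night"), campaign[40:].count("late_night"))
```

Left alone, the recommender never shows late-night content because its pessimistic prior is never tested. Forcing 40 exposures lifts the estimate above the daytime rate, and the greedy policy keeps choosing late-night content after the campaign ends: the intervention has retrained the policy.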

AI Productivity · 4 articles
Editor's pick
arXiv · 3 days ago

Predictive Assistance and the Temporal Dynamics of Exploratory Compression

arXiv:2606.10094v1 Announce Type: new Abstract: Classical theories of cognition describe problem solving as exploratory search through structured problem spaces in which repeated interaction gradually compresses search into efficient representational structures. Predictive artificial intelligence systems introduce a distinct regime in which stabilization may occur before exploratory diversification unfolds, supplying solutions and decision trajectories prior to internally generated search. This paper develops a geometric dynamical framework in which attention evolves over a landscape of strategies shaped by stabilizing drift, endogenous exploratory perturbation, and responsiveness-gated learning. Predictive assistance is modeled as a process of exogenous exploratory compression that stabilizes trajectories before self-generated exploration broadens the accessible regions of strategy space. The framework yields three main results. First, sustained predictive stabilization reduces exploratory responsiveness by attenuating the effective influence of intrinsic perturbations even when exploratory variability remains present. Second, curvature accumulates and relaxes asymmetrically, producing hysteresis and delayed recovery of exploratory mobility after assistance withdrawal. Third, developmental outcomes depend critically on the timing of stabilization, with early intervention narrowing future exploratory traversal before broad representational diversification has occurred. The framework generates empirically testable predictions concerning exploratory entropy, premature convergence, and delayed recovery following predictive stabilization. More broadly, the results suggest that predictive systems may reshape the geometry of exploratory cognition itself.

Editor's pick · Technology
arXiv · 3 days ago

Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts

arXiv:2606.10334v1 Announce Type: new Abstract: Code-generating large language models (LLMs) increasingly produce visual artifacts such as charts, web pages, and slides by writing programs that are executed by non-differentiable renderers, committing to code before observing the render. As a result, otherwise executable code often yields artifacts with visually salient defects, including overlapping elements, clipped text, broken alignment, low contrast, and overflow. We study visual-feedback self-distillation for code-generated visual artifacts. We propose Visual-SDPO, a self-distillation policy-optimization framework that treats rendered visual feedback as privileged context for a weight-sharing teacher and distills this feedback into a coding student. To make supervision spatially targeted rather than uniform, we introduce Visual-Grounded Code Credit Weighting, which traces each detected defect back to the code statements responsible for the affected elements and amplifies the distillation signal on those statements. A sequence-level GRPO (Group Relative Policy Optimization) term complements the dense token-level objective by rewarding executable, visually high-quality rollouts, while failed executions remain learnable through the self-distillation path by passing execution errors as privileged context to the teacher. We instantiate Visual-SDPO for chart, web/UI, and slide generation with a unified Qwen3-VL-8B-Instruct backbone. Across chart-to-code, UI-to-code, and slide-generation benchmarks (ChartMimic, Design2Code, and AeSlides), Visual-SDPO improves over the zero-shot base by more than 10 absolute points in the primary metric and over GRPO by at least 2.4 points, with fewer training steps and no added inference-time cost.
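Two ingredients named in the abstract can be sketched independently: GRPO's group-relative advantage, which standardises each rollout's reward against the other rollouts sampled for the same prompt, and a per-token weighting that amplifies the distillation signal on code traced to a detected defect. The weighting scheme, the `boost` factor and the token spans are hypothetical stand-ins; the paper's exact credit-assignment rule is not reproduced here.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardise each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def defect_credit_weights(num_tokens: int, defect_spans: list[tuple[int, int]],
                          boost: float = 2.0) -> list[float]:
    """Weight 1.0 per token, raised to `boost` on half-open [start, end) spans
    belonging to statements traced to a visual defect (hypothetical scheme)."""
    weights = [1.0] * num_tokens
    for start, end in defect_spans:
        for i in range(max(start, 0), min(end, num_tokens)):
            weights[i] = boost
    return weights


def weighted_token_loss(per_token_loss: list[float], weights: list[float]) -> float:
    """Weighted mean of per-token distillation losses."""
    if len(per_token_loss) != len(weights):
        raise ValueError("loss and weight sequences must align")
    return sum(loss * w for loss, w in zip(per_token_loss, weights)) / sum(weights)


# Four rollouts for one prompt: 1.0 = executable and visually clean, 0.0 = defective.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
weights = defect_credit_weights(num_tokens=6, defect_spans=[(2, 4)])
print(weights, weighted_token_loss([0.1, 0.1, 0.9, 0.8, 0.1, 0.1], weights))
```

Up-weighting the defect-linked tokens pulls the weighted loss toward the statements that caused the visual problem, which is the intuition behind making supervision spatially targeted rather than uniform.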

Editor's pick
arXiv · 3 days ago

AnnotateThis: Analyzing a human-LLM system for annotating social media data with the concept of climate change mitigation pessimism

arXiv:2606.10210v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly being integrated into research workflows. However, LLMs have been shown to struggle with difficult and nuanced concepts such as those found in computational social science (CSS) research. Within the CSS community, there has been a call for new systems to be developed which center humans in LLM-supported scientific workflows. We develop AnnotateThis, a human-centered system for inspecting and improving LLM annotations, a process we refer to as LLM grounding for a target concept. AnnotateThis is developed with both computational and social scientists to reflect existing workflows for data annotation. It includes a range of information features for users to interrogate the quality and reliability of LLM annotations. We evaluate our system in two settings. In the first, we assume a researcher may not have access to ground truth data and that users of AnnotateThis have limited prior knowledge of the concept they would like an LLM to annotate. That is, they may be conducting concept specification and LLM grounding simultaneously. In the second setting, we assume access to ground truth labels and that the concept is specified for a given annotation task; here, the task of LLM grounding is more straightforward. We find that in both settings users can improve the quality of LLM annotations with AnnotateThis and that their final annotations far surpass those created without human intervention. For example, when we evaluate with ground truth labels, we see an absolute improvement of 0.15 in F-Measure and 0.23 in accuracy over a fully automated state-of-the-art method for prompt refinement.

Editor's pick · Consumer & Retail
arXiv · 3 days ago

From Prompt to Purchase: How AI Brand Recommendations Move Consumers on the Open Web

arXiv:2606.10907v1 Announce Type: new Abstract: When a conversational assistant recommends a brand to a user with no recent observed engagement, that user's same-name Google search rises +4.3 percentage points (pp) [3.1, 5.5], visits to the brand's own site +2.4 pp [1.4, 3.5], and brand-specific retailer-page visits +1.0 pp [0.3, 1.7] over matched backward placebos. Recovering that estimate is the work. The mention creates a brand exposure no web log attributes to the assistant, and the naive all-mention funnel that seems to measure it is confounded: many mentions are incidental references to brands the user already uses ("your Netflix download"), whose downstream visits are that existing customer's own behavior and surface as a brand-specific pre-trend. We measure off-platform response on a panel that joins opt-in clickstream to the same users' ChatGPT, Claude, and Gemini conversations, and isolate the effect with a pre-trend event study, a stance classifier, non-customer conditioning, and a within-response same-category control: incidental name-drops then move behavior far less (+1.8/+1.1/+0.3), and the named brand moves far more than unnamed same-category brands in the same response. The downstream path is mostly search-mediated and reaches both own sites and retailer pages, with a destination mix that tracks baseline brand-directed behavior rather than redirecting toward either. The design is observational and we do not observe transactions, so retail is purchase-adjacent. Standard referrer-based and last-click measurement miss this upstream exposure: assistants move observably-unengaged users into open-web brand navigation along a path attributed elsewhere.

Labor, Society & Culture

18 articles
AI & Employment · 8 articles
Editor's pick
Business Insider · 4 days ago

IMF chief warns not to underestimate the backlash against AI's impact on workers

Kristalina Georgieva of the IMF said world leaders should not ignore AI's negative impacts, citing the effects of globalization as a cautionary tale.

Editor's pick · PAYWALL · Professional Services
Bloomberg · 4 days ago

Tata Boss Predicts AI Agents Will Replace Half Its Tech Jobs

The boss of one of India’s largest conglomerates predicted AI agents will replace half the jobs at IT leader Tata Consultancy Services Ltd. in future, joining a growing list of company chiefs warning about major disruptions as artificial intelligence matures.

Editor's pick · Professional Services
arXiv · 3 days ago

CollabSkill: Evaluating Human-Agent Collaboration On Real-World Tasks

arXiv:2606.09833v1 Announce Type: cross Abstract: AI agents are reshaping the workplace, drastically changing how humans work. Despite the considerable potential of human-agent collaboration both in preserving human agency and generating economic value, this paradigm remains largely absent from occupational task evaluation, hindered by the difficulty of gathering real human data and accounting for inter-human variability. We introduce CollabSkill, a framework for evaluating human-agent collaboration on real-world occupational tasks. CollabSkill pairs real human workers with AI agents on tasks matched to their occupational background, collecting data that capture the complexity of economically valuable tasks and the usage patterns of real workers. To account for inter-human variability, CollabSkill employs a Bayesian skill rating system to disentangle and quantify the skill contributions of both humans and AI agents. Drawing on over 1,500 prompts from 386 working sessions contributed by 93 human workers, our analysis yields insights on two fronts: on the agent side, CollabSkill rankings diverge meaningfully from those of existing fully autonomous benchmarks: Claude Code ranks first on CollabSkill, whereas Codex leads the autonomous benchmarks; on the human side, CollabSkill reveals that practical experience is the primary driver of collaboration skill, with hands-on collaboration meaningfully shifting workers' AI literacy. Together, we hope CollabSkill enables the community to invest in systematic evaluation of human-agent collaboration and spurs development efforts aimed at building AI agents that genuinely augment human workers.

Editor's pick · PAYWALL · Education
NYT · 4 days ago

In the Hybrid A.I.-Human Work Force, Who Will Actually Thrive?

A panel of experts explains how job seekers should prepare for the future of work.

Editor's pick
arXiv · 3 days ago

An economic geography dataset of U.S. skill specialization, relatedness, and complexity

arXiv:2606.09918v1 Announce Type: new Abstract: We release a new dataset of U.S. skill specialization, relatedness, and complexity, derived from 433.6 million job postings between 2010 and 2024. The panel covers 3,194 counties across 15 years and reports 201 variables that describe the volume of job postings (e.g., labor demand), the modality and nature of work (e.g., remote share, internship share), and the structure of employer skill demand by category (e.g., specialized, software, and common). We develop a suite of economic geography variables: skill-based measures of county specialization, relatedness, diversity, complexity, and dynamics. These measures are further decomposed by employer entity type (corporate, university, government, and federal lab), along with entity-pair measures of alignment, overlap, and directional skill gaps. An accompanying interactive dashboard supports both academic research and applied use, with features including spatiotemporal visualization, county rankings and trends, pairwise county comparisons, and individual county profiles.
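Specialization measures of this kind are commonly built as location quotients (revealed comparative advantage): a county's share of postings requiring a skill divided by that skill's share nationally, with values of at least 1 read as specialization. A minimal sketch, assuming this standard definition rather than the dataset's documented formula, with invented posting counts.

```python
def location_quotients(postings: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """LQ[c][s] = (x_cs / x_c) / (x_s / x), from county -> skill -> posting counts.

    Assumes every listed skill has at least one posting nationally.
    """
    county_totals = {c: sum(skills.values()) for c, skills in postings.items()}
    skill_totals: dict[str, float] = {}
    for skills in postings.values():
        for skill, count in skills.items():
            skill_totals[skill] = skill_totals.get(skill, 0.0) + count
    grand_total = sum(county_totals.values())
    return {
        county: {
            skill: (count / county_totals[county]) / (skill_totals[skill] / grand_total)
            for skill, count in skills.items()
        }
        for county, skills in postings.items()
        if county_totals[county] > 0  # skip counties with no postings
    }


def specializations(lq: dict[str, dict[str, float]], threshold: float = 1.0) -> dict[str, set[str]]:
    """Skills in which each county is specialized (LQ at or above the threshold)."""
    return {county: {s for s, v in row.items() if v >= threshold} for county, row in lq.items()}


# Hypothetical posting counts for two counties.
postings = {
    "County A": {"python": 30, "nursing": 10},
    "County B": {"python": 10, "nursing": 50},
}
lq = location_quotients(postings)
print(lq)
print(specializations(lq))
```

Counting the specialized skills per county gives a simple diversity measure; relatedness and complexity metrics are typically built on top of the same binary specialization matrix.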

Editor's pick · Education
Benefits Canada · 4 days ago

Survey finds only 19% of employees feel confident using AI tools at work

Only 19 per cent of employees feel confident using artificial intelligence at work, according to a new report by the Achievers Workforce Institute. The report, which analyzed responses from 3,000 global employees, revealed two connected trends: a two-year decline in the percentage of employees ...

Editor's pick · Technology
Straight Arrow News · 4 days ago

Why AI CEOs are rewriting their workforce predictions ahead of IPOs

Last month, during a conference ... increase productivity rather than replace people, as people adopt it into their work. Alan Smeaton, emeritus professor of computer studies at Dublin City University, told Straight Arrow he sees provocative predictions and reversals as attempts by AI CEOs to stay in the headlines ahead of going public. The future impact of AI on the labor market is actually not entirely clear, Smeaton said. He cited a January 2026 report from ...

Editor's pick · Professional Services
Daily Brew · 4 days ago

CEOs who think AI replaces their employees are just bad CEOs

The author argues that leadership teams viewing AI primarily as a tool for workforce replacement are failing to leverage the technology's true potential.

AI & Inequality · 1 article
Editor's pick · Media & Entertainment
arXiv · 3 days ago

Gender-based discrepancies in the algorithmic delivery of political ads on social media

arXiv:2606.10834v1 Announce Type: new Abstract: Social media has become a key channel for political advertising during election campaigns. However, algorithmic biases in the delivery of these ads may distort the public's exposure to political messaging. This can hinder citizens' ability to make informed choices and undermine equal access to political discourse, raising concerns about the integrity of electoral processes. In this study, we examine gender-based discrimination in the delivery of political ads during the 2024 European Parliament elections. Using a large-scale dataset of over 110,000 ads from 453 political parties and 968 candidates that generated over 7 billion impressions across 25 EU countries, we find that men were significantly more likely to be shown ads from populist and far-right parties than women -- even after accounting for ad content, platform-level competition, and targeting strategies. All else equal, ads by populist parties reach, on average, a 6 percentage point higher male share. Such imbalances restrict the ability of parties to reach diverse audiences and prevent voters from engaging equally with the full range of political viewpoints. This pattern is particularly concerning given that far-right and populist ads may reinforce political polarization and widen existing gender gaps in political engagement. Our findings underscore the need for platforms and policymakers to audit algorithmic ad delivery in political campaigns on social media and to implement safeguards that ensure fairness and protect democratic processes.

AI Ethics & Safety · 4 articles
Editor's pick · Healthcare
Guardian · 4 days ago

Doctors and NHS could be sued for mistakes made by AI tools, report warns

Medical Protection Society calls for law to be overhauled to help medics avoid liability for errors made by technology. Doctors and the NHS could be sued for medical negligence over mistakes made by artificial intelligence tools used in diagnosing patients and suggesting their treatment, ministers are being warned. Under the law as it stands, medics and the health service can be held liable for patients being harmed or dying even if it was AI that made the errors that resulted in their suffering.

Editor's pick · Healthcare
arXiv · 3 days ago

"Where is this coming from?" Uncovering Trustworthiness Ideals in AI-powered Peripartum Information Seeking

arXiv:2606.10158v1 Announce Type: new Abstract: AI-powered tools increasingly promise to fill information gaps in health, especially in domains like maternal and reproductive health that demand timely, accurate, and actionable information. This is extremely important, as the United States leads peer nations in preventable maternal deaths, with stark racial disparities. However, current AI and NLP-powered systems aim to improve access to vetted maternal health information by routing user queries to a factual response while under-specifying the socio-technical governance structures that shape trust, use, and harm in practice. We report findings from four synchronous focus groups (n = 24) with three stakeholder groups central to peripartum information support: birthing people, clinicians, and health workers (e.g., doulas, social workers, community health workers) exploring topics around information seeking, experience with current clinical infrastructure, misinformation, and an AI-enabled factual answering tool design probe. Our inductive analysis surfaces a central finding: in high-stakes health contexts shaped by historical inequities, trustworthiness must be inspectable and not asserted. While stakeholders diverge on what makes information credible, they converge on the need for transparency, recourse, and ecosystem complementarity. Based on the discussions, we identify four themes and governance requirements: (1) support for social and identity-based sensemaking, (2) pluralistic verification practices, (3) inspectable governance with recourse mechanisms, and (4) ecosystem-aware integration that avoids shifting burden. Building on these findings, we propose design artifacts that are mistrust-aware and promote principled governance mechanisms for transparent, pluralistic AI systems. Finally, we discuss the implications of our findings for expanding human-AI evaluations and improving the transparency of deployed AI systems.

Technology & Infrastructure

43 articles
AI Agents & Automation · 7 articles
Editor's pick · Professional Services
arXiv · 3 days ago

Business World Model

arXiv:2606.10044v1 Announce Type: new Abstract: Businesses are increasingly adopting AI-enabled tools to improve productivity, reduce costs, and enhance products and services. However, the transformative potential of AI extends beyond automating predefined tasks: it lies in enabling intelligent systems to plan, optimize, and execute business initiatives from high-level strategic objectives. This paper introduces the concept and architecture of a business world model (BWM), a world model specialized for business and organizational environments. Inspired by world models in artificial intelligence, cognitive science, and control theory, a BWM encodes business states, dynamics, constraints, objectives, and feasible action space to support autonomous decision-making. We propose a business-semantics-centric formulation in which business states, dynamics and actions are linked to key business entities. Within this framework, agents can simulate alternative action sequences, estimate their effects on future business outcomes, and evaluate trade-offs under uncertainty. The proposed architecture integrates semantic data representations, probabilistic machine learning models, deterministic business rules, and explicit action space into a coherent structure for planning and counterfactual reasoning. Although its individual components are not new, the contribution of BWM lies in organizing them as an executable internal simulator for business initiatives. This work establishes a conceptual foundation for autonomous business systems capable of moving from instruction-based execution toward goal-driven planning and execution.

Editor's pick · Manufacturing & Industrials
arXiv · 3 days ago

Sim2Schedule: A Simulator-Guided LLM Framework for Autonomous Open-Pit Mine Scheduling

arXiv:2606.10286v1 Announce Type: new Abstract: Open-pit mine scheduling is a critical process for maximizing economic return under complex geotechnical and operational constraints. While Mixed-Integer Linear Programming (MILP) provides mathematically optimal baselines, its exponential computational complexity and inability to adapt in real time limit its practical deployment in dynamic industrial environments. This work introduces a simulator-driven Large Language Model (LLM) scheduling framework in which the LLM acts as an autonomous decision-making agent, guided at each step by a custom simulator that encodes geotechnical precedence, extraction-processing coupling, and dynamic capacity constraints directly into the action generation mechanism. Operating entirely zero-shot within a closed, data-secure environment, the framework produces complete, interpretable extraction and processing schedules without cloud-based inference, domain-specific fine-tuning, or retraining. To provide a trustworthy performance benchmark, a novel MILP formulation is developed that incorporates realistic operational and geotechnical constraints. Evaluated across mining instances of varying scale and time periods, the LLM-based framework recovers between 94% and 99% of the MILP optimal NPV while scaling linearly in computation time. These results position simulator-constrained LLM agents as a practical and scalable alternative to classical optimization for long-horizon industrial scheduling under complex operational constraints.
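The simulator's role, offering only actions that respect precedence and capacity and rejecting anything else before it is committed, can be sketched with a toy block model. Assumptions: the block names, values, precedence sets and discount rate are invented, a highest-value-first rule stands in for the LLM's choice, and extraction-processing coupling is not modelled.

```python
from typing import Callable

Policy = Callable[[list[str]], str]


def simulate_schedule(values: dict[str, float], precedence: dict[str, set[str]],
                      capacity: int, periods: int, policy: Policy) -> list[list[str]]:
    """Build a period-by-period schedule; the simulator only offers precedence-feasible blocks."""
    mined: set[str] = set()
    schedule: list[list[str]] = []
    for _ in range(periods):
        period: list[str] = []
        for _ in range(capacity):  # capacity = blocks extractable per period
            # Feasible: not yet mined and every overlying block already mined
            # (a block mined earlier in the same period counts, a simplification).
            options = [b for b in values if b not in mined and precedence.get(b, set()) <= mined]
            if not options:
                break
            choice = policy(options)
            if choice not in options:
                raise ValueError(f"policy proposed infeasible block {choice!r}")
            mined.add(choice)
            period.append(choice)
        schedule.append(period)
    return schedule


def npv(schedule: list[list[str]], values: dict[str, float], rate: float) -> float:
    """Discount each block's value by its period index t (t = 0 is undiscounted)."""
    return sum(values[b] / (1 + rate) ** t for t, period in enumerate(schedule) for b in period)


values = {"waste_1": -2.0, "waste_2": -1.0, "ore_1": 10.0, "ore_2": 6.0}
precedence = {"ore_1": {"waste_1"}, "ore_2": {"waste_1", "waste_2"}}


def greedy(options: list[str]) -> str:
    """Stand-in for the LLM agent: pick the most valuable feasible block."""
    return max(options, key=values.__getitem__)


plan = simulate_schedule(values, precedence, capacity=2, periods=2, policy=greedy)
print(plan, round(npv(plan, values, rate=0.1), 2))
```

Replacing `greedy` with a call to a language model would leave the feasibility checks untouched: a proposal outside `options` raises an error instead of being executed, which is the guarantee a simulator-guided design relies on.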

Editor's pick · Professional Services
Microsoft News · 4 days ago

KPMG and Microsoft scale trusted, enterprise AI agents globally through deployment of Agent 365 and Copilot

With Microsoft 365 Copilot, KPMG ... 365 further enhances KPMG’s Workbench ecosystem, providing centralized governance and control of AI agents operating across systems, data and business processes....

Editor's pick · Technology
The Register · 4 days ago

Apple’s iOS 27 goes all agentic on compromised passwords, promises to change them with one tap

iBiz might not win the AI race, but analysts say it's focusing on features people may actually use

Editor's pick
arXiv · 3 days ago

Regimes: An Auditable, Held-Out-Gated Improvement Loop Demonstrated on LongMemEval with ActiveGraph

arXiv:2606.10241v1 Announce Type: new Abstract: Autonomous improvement loops are hard to trust because the improvement process is usually external scaffolding bolted onto the agent: failures go unlogged, diagnoses cannot be replayed, and promote-or-discard decisions land in a side database rather than the agent's own history. We show that an event-sourced agent runtime removes that friction and turns controlled improvement into a first-class workflow. When the agent's state is a deterministic projection of an append-only event log, failures are recorded, a run replays exactly from its log, candidate patches scope to typed pipeline seams, gates are auditable, and every promotion or discard is itself an event. We demonstrate this with Regimes, a loop on the ActiveGraph runtime that diagnoses failed evaluations, proposes a repair at a pipeline point, and promotes it only after static checks, sandbox execution, in-sample evaluation, and held-out validation. The loop is target-agnostic: the same control flow runs against different tasks through a common interface. On LongMemEval-S the dominant failure is not retrieval but reconciliation: the evidence is already in the assembled context, yet the reader answers incorrectly. Across five seeded held-out splits, Regimes discovers reader-prompt repairs that improve final held-out accuracy by +0.05 to +0.10 in four splits and +0.01 in one over-promotion split; two splits are individually significant (seed 5 unadjusted for its sequential promotion structure), and the pooled count is descriptive only, since the splits share one 500-question pool. The durable contributions are ActiveGraph as an auditable substrate that makes controlled improvement loops tractable, the held-out-gated loop it supports, the failure-regime taxonomy routing each failure to a pipeline location (whose marginal value over an unrouted baseline is the primary open question), and the prompt-as-discovery-probe hypothesis.
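The core idea, agent state as a deterministic projection of an append-only log in which promote and discard decisions are themselves events, can be sketched generically. This is not the ActiveGraph API: the event kinds, payload fields, prompt versions and single held-out gate below are hypothetical simplifications (the paper also gates on static checks, sandbox execution and in-sample evaluation).

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Event:
    kind: str  # hypothetical kinds: eval_failed, patch_proposed, patch_promoted, patch_discarded
    payload: dict[str, Any] = field(default_factory=dict)


class EventLog:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def events(self) -> tuple[Event, ...]:
        return tuple(self._events)


def project(events: tuple[Event, ...]) -> dict[str, Any]:
    """Rebuild agent state from scratch; the same log always yields the same state."""
    state: dict[str, Any] = {"reader_prompt": "v0", "failures": [], "decisions": []}
    for e in events:
        if e.kind == "eval_failed":
            state["failures"].append(e.payload["question_id"])
        elif e.kind == "patch_promoted":
            state["reader_prompt"] = e.payload["prompt"]
            state["decisions"].append("promoted")
        elif e.kind == "patch_discarded":
            state["decisions"].append("discarded")
    return state


def gate_patch(log: EventLog, prompt: str, held_out_acc: float, baseline_acc: float) -> bool:
    """Record the proposal, then promote only if held-out accuracy beats the baseline."""
    log.append(Event("patch_proposed", {"prompt": prompt}))
    promoted = held_out_acc > baseline_acc
    kind = "patch_promoted" if promoted else "patch_discarded"
    log.append(Event(kind, {"prompt": prompt, "held_out_acc": held_out_acc}))
    return promoted


log = EventLog()
log.append(Event("eval_failed", {"question_id": "q17"}))
gate_patch(log, "v1", held_out_acc=0.62, baseline_acc=0.55)
gate_patch(log, "v2", held_out_acc=0.58, baseline_acc=0.62)
print(project(log.events()))
```

Because `project` reads only the log, replaying the same events reproduces the same state, and the rejected `v2` patch stays visible in the agent's own history rather than in a side database.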

Editor's pick · Technology
arXiv · 3 days ago

The Agentic Web Requires New Normative Infrastructure

arXiv:2606.10711v1 Announce Type: new Abstract: The agentic web, in which users interact with the internet largely through agents acting on their behalf, is now technically feasible. However, many of the consumer and social benefits that could be realized by online AI agents acting scrupulously in their principals' interest are currently obstructed by outdated laws, terms of service, and other less formal practices which allow online platforms to block and degrade agent access, often in secret. No distinction is currently drawn between "malicious bots" and AI agents acting with the express delegated authority of a user. For the agentic web to realize its promise, it needs not only the technical infrastructure of protocols and interfaces, but the normative infrastructure of a broadly-accepted and socially-beneficial set of laws, norms and practices governing agentic access to online properties. Building that normative infrastructure requires a society-wide conversation. This paper aims to help precipitate that conversation, to identify normative principles that can guide it, and to advocate for policies that enable users' appropriately delegated agents to act online on their behalf, with as few curbs on their doing so as is reasonable given the other legitimate interests at stake.

Editor's pickTechnology
Daily AI News June 9, 2026: Claude's Dynamic Workflows - Automation on Steroids· 4 days ago

A Harness for Every Task: Dynamic Workflows in Claude Code

This article explores dynamic workflow harnesses for Claude Code that coordinate multiple AI agents across long-running tasks, enabling orchestration and task decomposition.

AI Infrastructure & Compute · 11 articles
Editor's pickPAYWALLTechnology
Bloomberg· 3 days ago

Meta Partners With Reliance on First India AI Data Center

Meta Platforms Inc. is partnering with Reliance Industries Ltd. to build its first AI data center in India, adding to a wave of investment in tech infrastructure globally.

Editor's pickEnergy & Utilities
Artificial Intelligence Newsletter | June 9, 2026· 4 days ago

Ohio becomes test case for data center opposition as US tech companies defend plans

Tech companies are facing scrutiny from Ohio lawmakers over their hyperscale data center development plans. Representatives from Amazon, Meta, Google, and Microsoft were pressed on the economic benefits of these projects.

Editor's pickEnergy & Utilities
Artificial Intelligence Newsletter | June 12, 2026· Yesterday

The new politics of data centers: US governors say build, but pay

Governors in states like Texas and Illinois are imposing new conditions on data center developers to protect water resources and ensure developers pay for necessary infrastructure.

Editor's pickTechnology
Arxiv· 3 days ago

From Stacks to Circuits: A Regenerative Socio-Technical Roadmap for AI Infrastructure within Planetary Boundaries

arXiv:2606.10544v1 Announce Type: cross Abstract: Current scaling trajectories for Generative AI, typified by linear supply-side "stacks," prioritize performance density while externalizing significant thermodynamic and material costs. As the "Twin Transition" of green and digital transformation accelerates, the industry faces technology gaps, including Scope 3 emissions and e-waste recycling, that impede sustainable scaling and lead to social tensions. This study proposes a Regenerative Socio-Technical roadmap that repurposes the Sustainable Production and Consumption system map to reframe artificial intelligence infrastructure as a system-of-systems governed ultimately by planetary limits. By integrating the Institute of Electrical and Electronics Engineers International Roadmap for Devices and Systems (IEEE IRDS) sustainability considerations for semiconductor facilities, the study proposes a metabolic circuit framework that centers "Values and Needs" within production and consumption relationship loops. This study identifies critical gaps in current Nvidia-centric roadmaps and proposes a competing reference architecture. It demonstrates how a spontaneous order of resource parsimony and planetary accountability can provide an actionable pathway for regulatory compliance and industrial resilience in the digital circular economy.

Editor's pickEnergy & Utilities
Guardian· 4 days ago

World’s first wind-powered underwater datacentre starts operating in China

Datacentre off Shanghai coast uses less power and water than land-based equivalent. The world’s first wind-powered underwater datacentre has started operations off the coast of Shanghai, as China presses forwards with solutions for energy challenges created by the country’s artificial intelligence boom. The Shanghai Lingang undersea datacentre demonstration project, which launched in May, has a capacity of 24 megawatts. It is a joint effort between HiCloud Technology and China Communications Construction, a state-owned company.

Editor's pickEnergy & Utilities
Daily Brew· 3 days ago

Startup’s nuclear-inspired cooling system could make data centers more sustainable

A new cooling system inspired by nuclear technology could significantly improve the sustainability of data centers.

Editor's pickTechnology
Siliconrepublic· 4 days ago

Bloomberg: China plans $295bn spend for nationwide data centre build-out

China’s core AI industry – which boasts more than 6,200 companies – was valued at nearly $174bn in 2025.

Editor's pickManufacturing & Industrials
Arxiv· 3 days ago

Dismantle and Dissolve, (Re)build, Remix: A Research-creation Inquiry into the Political Economy of Graphics Cards

arXiv:2606.10958v1 Announce Type: new Abstract: This contribution follows a four-year investigation (2022–2026) into the political economy of graphics card miniaturization. It begins from the premise that rethinking our relationship to artificial intelligence and its sociotechnical entanglements requires demystifying and opening the black box of this technical object. Within our algorithmic culture, the graphics card (GPU) enables the massive, parallel processing of large datasets, making possible the training of the models that underpin our intelligent systems. GPU miniaturization is equally crucial: as a key driver of the Internet of Things, this sociotechnical phenomenon enables the inclusion of these cards in increasingly compact and powerful systems while also enabling better management of energy resources. The development of these everyday objects and technologies nevertheless reinforces several major problems. Drawing on both the social sciences and the critical, reflexive, speculative, and fictional methodologies of research-creation, the author developed several investigative fieldwork sites (among liquid nitrogen overclockers in Taiwan and urban miners in Ghana) and conducted situated experimentations on some fifty acquired graphics cards. Structured around three themes (dismantle and dissolve, rebuild, remix), this paper demonstrates how research-creation methods constitute full epistemologies for apprehending what seems a priori external, opaque, or inaccessible, and for restoring artificial intelligence to its tangible materialities. In doing so, it contributes to the field of ICT for sustainability by affirming research-creation as a rigorous means of disentangling the material and environmental infrastructures that computational systems both depend on and obscure.

Editor's pickTechnology
Daily Brew· 4 days ago

Prefill Once, Fan Out: KV Snapshot Sharing for Multi-Agent LLM Pipelines

Learn how to build a C++ runtime with copy-on-fork KV snapshots to eliminate redundant LLM prefills in multi-agent pipelines.
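The idea behind copy-on-fork KV snapshots is that a shared prompt prefix is prefilled once. Each downstream agent then forks a handle that references the same cached blocks and copies a block only when it needs to write to a shared one. The article builds this in C++. The Python sketch below illustrates only the reference-sharing and copy-on-write logic. `BLOCK_SIZE`, `KVBlock`, `KVSnapshot.fork` and `prefill` are hypothetical names, and integer token ids stand in for real key/value tensors.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block; hypothetical, real runtimes use larger pages


@dataclass
class KVBlock:
    entries: list[int] = field(default_factory=list)  # stand-in for per-token K/V tensors
    refcount: int = 1                                 # how many snapshots point at this block


@dataclass
class KVSnapshot:
    blocks: list[KVBlock] = field(default_factory=list)

    def fork(self) -> KVSnapshot:
        """Cheap fork: the new snapshot shares every block; no KV data is copied."""
        for block in self.blocks:
            block.refcount += 1
        return KVSnapshot(list(self.blocks))

    def append(self, token: int) -> None:
        """Add one token's KV entry, copying the tail block first if it is shared."""
        if not self.blocks or len(self.blocks[-1].entries) == BLOCK_SIZE:
            self.blocks.append(KVBlock())
        tail = self.blocks[-1]
        if tail.refcount > 1:               # copy on write: detach from other forks
            tail.refcount -= 1
            tail = KVBlock(list(tail.entries))
            self.blocks[-1] = tail
        tail.entries.append(token)


def prefill(prompt_tokens: list[int]) -> KVSnapshot:
    """Stand-in for the expensive prefill pass, run once for the shared prefix."""
    root = KVSnapshot()
    for tok in prompt_tokens:
        root.append(tok)
    return root


def tokens_of(snap: KVSnapshot) -> list[int]:
    return [tok for block in snap.blocks for tok in block.entries]


if __name__ == "__main__":
    root = prefill(list(range(10)))      # shared system prompt + context
    planner, critic = root.fork(), root.fork()
    planner.append(100)                  # each agent decodes its own continuation
    critic.append(200)
    print(planner.blocks[0] is critic.blocks[0])    # full prefix blocks still shared
    print(planner.blocks[-1] is critic.blocks[-1])  # partial tail was copied per fork
```

Full prefix blocks are never duplicated. Only the partially filled tail block is copied, and only by the fork that writes to it. A production runtime would also decrement reference counts and free blocks when a snapshot is dropped, which this sketch omits.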

Editor's pickEnergy & Utilities
Bebeez· 4 days ago

Real estate firm Trevian partners with Glesys for a data center in Oulu, Finland

Finnish real estate firm Trevian is partnering with European cloud firm Glesys for a new data center in Oulu, Finland. The companies this week announced a partnership to establish Campus Oulu, a new AI-ready data center campus in Oulu, a city in northern Finland and the regional capital of North Ostrobothnia. The […]

Editor's pickTechnology
Bebeez· 4 days ago

Ark DC to add new building to Longcross data center campus outside London, UK

UK data center firm Ark is expanding one of its facilities outside London to accommodate Nebius. The company this week announced the investment of £807 million ($1bn) in its campus at Longcross Park in Surrey, enabling AI cloud provider Nebius to expand its deployment at the site. As part of the deal, Nebius will […]

AI Models & Capabilities · 10 articles
Editor's pickTechnology
VentureBeat· 4 days ago

On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones.

Apple's third-generation foundation models, announced at WWDC26, break that constraint by keeping the full weight set in flash and loading into DRAM only what each prompt needs. The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple's Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud. The on-device architecture is Apple's own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM. "Instead of forcing the entire model into DRAM, the full model is stored in flash memory," Apple's research team wrote. "Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, as standard MoE models require, AFM 3 Core Advanced makes routing decisions per prompt."

How the architecture actually works

The memory wall Apple is working around is one every local AI developer runs into. "You can't put 20B parameters in RAM at any reasonable precision," Awni Hannun, a researcher at Anthropic and former Apple research scientist, posted on X. "To make it work they are using pretty exotic architecture by today's standards. A small model predicts from the query (or prompt) which experts to load from NAND into RAM." That prediction-and-load mechanism has three distinct components, each driven by the hardware constraints of consumer silicon.

The full 20B weight set lives in flash, not DRAM. AFM 3 Core Advanced stores its entire parameter set in NAND flash rather than active memory.
Standard on-device deployments require the full model to fit in DRAM, which is what caps their parameter counts. Apple's approach, which it calls Instruction-Following Pruning (IFP) and developed in-house, treats flash as the model's permanent home and DRAM as a working buffer for whichever experts a given prompt requires.

Expert routing happens once per prompt, not per token. In a conventional Mixture of Experts model, a router selects different experts for every token generated, which would require continuous weight movement between flash and DRAM at inference speed. NAND-to-DRAM bandwidth cannot support that. AFM 3 Core Advanced routes once at prompt time, selects a fixed expert set, loads it into DRAM alongside always-active shared experts, and generates all tokens from that same configuration. "The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts," Hannun wrote.

Active parameter count scales from 1B to 4B depending on task complexity. Rather than running a fixed model size for every request, AFM 3 Core Advanced adjusts how many parameters it activates based on what the task requires: 1 billion for simpler operations, up to 4 billion for harder ones, all drawn from the 20-billion-parameter pool in flash.

What Apple has and hasn't disclosed

The architecture paper is detailed on the memory design and sparse activation mechanism. It is less forthcoming on practical deployment constraints. Apple's profiling tools expose timing but not the metrics that decide production viability. "Energy, memory bandwidth, thermal? Not in the docs," Marco Abis, who is building Ziraph, a profiler for local AI on Apple silicon, posted on X. "A notable gap, given those decide most of on-device performance."
Abis also found no statement in Apple's documentation (across the Core AI docs, the Foundation Models docs or the Private Cloud Compute security post) of when an on-device request transparently offloads to the cloud, or whether that routing is visible to the developer or the user. For enterprises that need to document where inference runs, that is a direct compliance problem. Not all of this information is available yet: Apple has indicated that a full technical report with benchmarks is coming later this summer.

What this means for enterprise architects

Regulated industries evaluating agentic AI deployments now have a concrete architectural decision to make.

The DRAM wall for on-device agents just moved. Enterprises evaluating agents that need to run without a cloud round-trip now have a 20-billion-parameter local option to evaluate. The constraint shifts from model capability to device hardware.

The private/cloud boundary is now an architectural decision, not a default. Simpler requests stay on-device; complex agentic tasks route to AFM 3 Cloud Pro on Private Cloud Compute. Apple has not publicly specified when a request offloads or whether that routing is visible to the developer, a gap that complicates policy decisions for organizations that need to document where inference runs.

The agentic server tier depends on Google Cloud. AFM 3 Cloud Pro runs on Nvidia GPUs in Google Cloud. The Private Cloud Compute guarantee covers data privacy; it does not eliminate the Google Cloud dependency for server-side inference.

AFM 3 Core Advanced gives enterprises a 20-billion-parameter on-device option that did not exist before WWDC26. Whether it is deployable at scale depends on answers Apple has not yet published; those details are due in the summer technical report.
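The architectural point Hannun describes is routing once per prompt instead of once per token. One way to see why this matters is to count how many expert loads from flash each strategy would trigger. The sketch below is hypothetical throughout: the expert counts, the random per-token router, the keyword check standing in for Apple's small predictor model, and the 4x expert scaling for "hard" prompts (echoing the reported 1B-to-4B range) are illustrative assumptions, not Apple's published design.

```python
import random

NUM_EXPERTS = 64      # hypothetical size of the expert pool kept in flash
SHARED_EXPERTS = 2    # always-active shared experts (ids 0 and 1), hypothetical count
TOP_K_PER_TOKEN = 4   # routed experts per token in the per-token baseline


def per_token_loads(num_tokens: int, seed: int = 0) -> int:
    """Conventional MoE baseline: a router picks experts for every token.
    Assumes DRAM holds only the current token's routed experts, so any
    expert not already resident must be loaded from flash."""
    rng = random.Random(seed)  # random choices stand in for a learned router
    resident: set[int] = set()
    loads = 0
    for _ in range(num_tokens):
        chosen = set(rng.sample(range(SHARED_EXPERTS, NUM_EXPERTS), TOP_K_PER_TOKEN))
        loads += len(chosen - resident)
        resident = chosen
    return loads


def per_prompt_plan(prompt: str) -> set[int]:
    """Per-prompt routing: choose one expert set before generation starts.
    A keyword check stands in for Apple's small predictor model; harder
    prompts get 4x as many routed experts, echoing the 1B-to-4B range."""
    hard = any(word in prompt.lower() for word in ("plan", "analyze", "refactor"))
    k = 16 if hard else 4
    shared = set(range(SHARED_EXPERTS))
    routed = set(range(SHARED_EXPERTS, SHARED_EXPERTS + k))
    return shared | routed


if __name__ == "__main__":
    plan = per_prompt_plan("Plan a three-day offsite and analyze the budget")
    # Every generated token reuses `plan`, so flash loads stay at len(plan)
    # however long the answer runs.
    print("per-prompt loads:", len(plan))
    print("per-token loads over 200 tokens:", per_token_loads(200))
```

What matters is the shape of the comparison, not the specific numbers. Per-prompt cost is fixed once the plan is chosen, while per-token cost grows with output length. That growth is why limited NAND-to-DRAM bandwidth rules out per-token swapping.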

Editor's pickTechnology
VentureBeat· 4 days ago

Cohere open-sources a coding agent that runs on a single H100

Engineering teams building agentic coding pipelines now have a concrete open-source alternative to managed models like Claude Fable 5, and one that runs on a single H100. The tradeoff is verbosity: Cohere's North Mini Code, which launched Tuesday, generated three times the output tokens of comparable models in independent testing, a cost that compounds in high-volume production workloads.

The new open-source model is a 30 billion parameter mixture-of-experts (MoE) model with 3 billion parameters active per token, built for agentic software engineering, including sub-agent orchestration, architecture mapping, code review and terminal work. The model supports a 256,000 token context window with a 64,000 token maximum generation length, and is available on Hugging Face under an Apache 2.0 license.

What North Mini Code can do

North Mini Code targets the full agentic coding stack. Here is what the model does and what it runs on.

Software engineering. Cohere built North Mini Code specifically for agentic software engineering rather than adapting it from a general-purpose base. It has integrated tool-use capabilities and supports interleaved thinking, which Cohere says improves performance across multi-step agentic work.

Architecture mapping and code review. North Mini Code can analyze and map systems architecture, surface dependencies and perform code review across large codebases. With a 256,000 token context window, it can hold substantial multi-file projects in a single context pass.

Terminal-based agentic tasks. The model is trained for terminal environments, handling shell interactions, package scripts and command-line tooling. Cohere benchmarked it on Terminal-Bench v2, which tests agents in real terminal environments rather than synthetic code generation tasks.

How it was built

North Mini Code is a sparse mixture-of-experts model with 128 experts, of which 8 activate per token.
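To make the "128 experts, 8 active per token" figure concrete, here is a minimal top-k gating step in plain Python. It illustrates the generic sparse-MoE technique, not Cohere's implementation. The random router logits, the toy scalar "experts" and the choice to renormalize softmax weights over only the selected experts are assumptions; some MoE designs normalize differently.

```python
import math
import random

NUM_EXPERTS = 128  # reported expert count for North Mini Code
TOP_K = 8          # reported experts active per token


def top_k_gate(router_logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Select the k highest-scoring experts for one token and return softmax
    weights renormalized over only those k (an assumed, common choice)."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x: float, gates: dict[int, float]) -> float:
    """Weighted sum over the selected experts only; the other 120 never run.
    Each toy 'expert' is a scalar function standing in for a feed-forward block."""
    return sum(w * math.tanh((i + 1) * x / NUM_EXPERTS) for i, w in gates.items())


if __name__ == "__main__":
    rng = random.Random(42)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # toy router scores
    gates = top_k_gate(logits)
    print(len(gates), round(sum(gates.values()), 6))            # 8 1.0
    print(f"share of experts active per token: {TOP_K / NUM_EXPERTS:.2%}")  # 6.25%
    print(moe_layer(0.5, gates))
```

Note that 8 of 128 experts is 6.25% of the expert weights, less than the 10% implied by 3 billion active out of 30 billion total. The difference presumably reflects components every token uses, such as attention and embeddings, though Cohere's parameter breakdown is not given here.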
The compute requirement at inference time is closer to that of a 3 billion parameter model, despite 30 billion total parameters. Nick Frosst, co-founder of Cohere, demoed it running on a Mac Studio via MLX at around 20 gigabytes of RAM, the same machine he uses for his own local coding work.

Cohere trained the model through two stages of supervised fine-tuning followed by reinforcement learning with verifiable rewards across more than 70,000 verifiable tasks spanning approximately 5,000 repositories, deduplicated against SWE-Bench.

Rather than optimizing against a single agent scaffold, Cohere trained across three. SWE-Agent uses a rich CLI with specialized commands. Mini-SWE-Agent uses a single bash tool with raw shell output. OpenCode uses individually typed tools returning structured JSON. Cohere reports a 10 percentage point gain on OpenCode evaluation from the multi-harness approach while maintaining SWE-Agent performance.

Where it fits

North Mini Code enters a market that now includes Mistral Devstral Small 2, GitHub Copilot, Cursor, and Claude Fable 5, each with distinct cost and deployment tradeoffs. Cohere's primary benchmark comparison is against Mistral Devstral Small 2, a 24 billion parameter dense model. In vendor-reported internal tests under identical hardware configurations, Cohere claims 2.8x higher output throughput and a 30% inter-token latency advantage over Devstral Small 2. Cohere also claims, in its Hugging Face technical post, that North Mini Code outperforms open-source models up to four times its parameter count on its reported benchmarks, including models at 120 billion parameters.

Artificial Analysis independently ranks it eighth of 127 comparable open-weight models on output speed at 210 tokens per second, with a time to first token of 0.25 seconds against a class median of 1.95 seconds. It places 18th of 127 on the Artificial Analysis Intelligence Index.
One flag from the same data: the model generated 75 million output tokens to complete the Intelligence Index, against a class median of 25 million. In high-volume agentic pipelines, that verbosity compounds into inference cost and latency.

"Suddenly people are thinking like hey, am I getting enough economic value out of the tokens from a model?" Frosst said during the launch video. "Local deployment is one way of empowering people and making AI really something that works for them."

GitHub Copilot, Cursor and Claude Code operate on per-usage or subscription pricing with no on-premises option. Anthropic's Claude Fable 5, now the most capable publicly available managed coding model, runs at $50 per million output tokens. For Frosst, North Mini Code is the polar opposite of Fable. "Its small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go. small, open source, transparent and sovereign, vs large, expensive, proprietary and hegemonic," Frosst wrote in a post on X.

What this means for enterprises

For teams building production agentic coding pipelines, North Mini Code's release clarifies a set of decisions that have been forming for months.

Purpose-built agentic training is now a baseline to evaluate against. The distinction between models fine-tuned for code and models trained specifically for agentic workflows, with verified tool calls and multi-harness robustness, is now a material factor in pipeline decisions. Any model vendor claiming agentic coding capability should be able to answer whether its training used verifiable agentic tasks or was adapted from a general-purpose base.

Verbosity is a hidden pipeline cost that headline benchmark scores do not surface. Artificial Analysis measured North Mini Code generating three times the output tokens of comparable models. That verbosity compounds across inference cost and latency in high-volume pipelines. Throughput testing against actual workload volume is the evaluation step the benchmark rankings skip.
The frontier pricing split is now a real architectural decision. Fable 5 at $50 per million output tokens and North Mini Code on a single H100 represent a genuine tradeoff: self-hosting buys cost control and data residency but brings the overhead of running infrastructure, which a managed API avoids. Teams running high-volume agentic coding pipelines should model both cost paths against their actual workload before committing to either.
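The closing advice to model both cost paths against the actual workload comes down to simple arithmetic. The sketch below compares an API path at the reported list prices of $10 per million input tokens and $50 per million output tokens with a self-hosted path billed per GPU-hour. Several values are placeholder assumptions to be replaced with measured numbers: the workload figures, the $3 GPU-hour rate, the 2,000 tokens-per-second batched throughput, and the 3x verbosity multiplier for the local model. The self-hosted side also ignores prefill time, idle capacity and staff costs.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    tasks_per_month: int
    input_tokens_per_task: int
    output_tokens_per_task: int  # as measured with the managed model


def api_cost(w: Workload, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Monthly bill for a per-token API."""
    m_in = w.tasks_per_month * w.input_tokens_per_task / 1e6
    m_out = w.tasks_per_month * w.output_tokens_per_task / 1e6
    return m_in * usd_per_m_in + m_out * usd_per_m_out


def self_hosted_cost(w: Workload, verbosity: float, gpu_usd_per_hour: float,
                     output_tokens_per_second: float) -> float:
    """GPU cost for a self-hosted model, billed per GPU-hour used.
    `verbosity` scales output tokens relative to the managed model; only
    decode time is counted (prefill, idle time and staff are ignored)."""
    out_tokens = w.tasks_per_month * w.output_tokens_per_task * verbosity
    gpu_hours = out_tokens / output_tokens_per_second / 3600
    return gpu_hours * gpu_usd_per_hour


if __name__ == "__main__":
    w = Workload(tasks_per_month=50_000, input_tokens_per_task=20_000,
                 output_tokens_per_task=4_000)          # placeholder workload
    print(f"API path:         ${api_cost(w, 10.0, 50.0):,.0f}/month")  # reported list prices
    # Placeholders: 3x verbosity, $3 per GPU-hour, 2,000 output tok/s batched throughput.
    print(f"Self-hosted path: ${self_hosted_cost(w, 3.0, 3.0, 2_000):,.0f}/month")
```

The verbosity multiplier enters the self-hosted cost linearly. If a verbose model needs three times the tokens, it needs three times the GPU time for the same work, so it belongs in the model alongside list prices.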

Editor's pick
Arxiv· 3 days ago

From Context-Aware to Conflict-Aware: Generalizing Contrastive Decoding for Knowledge Conflict in LLMs

arXiv:2606.10298v1 Announce Type: new Abstract: When large language models generate from retrieved or augmented contexts, conflicts between external context and parametric priors remain a central reliability bottleneck. Existing contrastive decoding methods follow a context-aware paradigm that unilaterally amplifies context over parametric priors, overwriting correct priors when the context is erroneous. We generalize this to the conflict-aware paradigm that dynamically allocates authority between prior and context based on conflict signals, rather than presupposing context trustworthiness. We show that the affine combination of prior and context logits yields a power family with an inherent regime asymmetry: extrapolation amplifies errors unboundedly when the prior is correct, interpolation under-corrects when the context is correct, and no static regime covers both. Existing contrastive decoding methods are instances of this family, mostly extrapolative. To evaluate both conflict directions, we propose TriState-Bench, a model-aware evaluation protocol that calibrates per-model prior knowledge to measure three conflict states: correction, resistance, and agreement. To resolve the asymmetry, we propose Adaptive Regime Routing (ARR), which routes between regimes at each step, lifting resistance EM from below 6 to 16–33 without sacrificing correction or agreement. Our code is available at https://github.com/keith-Jiang/conflict-aware-decoding.
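The "power family" claim follows from a short identity. Log-softmax differs from the raw logits only by a per-distribution constant. So decoding from the affine combination λ·z_ctx + (1 - λ)·z_prior yields a distribution proportional to p_ctx^λ · p_prior^(1 - λ). With λ in [0, 1] the combination interpolates between the two distributions; with λ > 1 it extrapolates, which is where the abstract places most existing contrastive decoding methods. The sketch below checks the identity numerically and shows extrapolation overriding a correct prior. The symbol λ, the toy three-token logits and the function names are illustrative assumptions, not the paper's notation, and ARR itself is not reproduced.

```python
import math


def softmax(z: list[float]) -> list[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def affine_decode(z_ctx: list[float], z_prior: list[float], lam: float) -> list[float]:
    """Next-token distribution from lam*z_ctx + (1 - lam)*z_prior.
    lam in [0, 1] interpolates; lam > 1 extrapolates away from the prior."""
    return softmax([lam * c + (1 - lam) * p for c, p in zip(z_ctx, z_prior)])


def power_family(z_ctx: list[float], z_prior: list[float], lam: float) -> list[float]:
    """The same distribution written as p_ctx**lam * p_prior**(1 - lam), renormalized."""
    p_ctx, p_prior = softmax(z_ctx), softmax(z_prior)
    w = [(c ** lam) * (p ** (1 - lam)) for c, p in zip(p_ctx, p_prior)]
    s = sum(w)
    return [v / s for v in w]


if __name__ == "__main__":
    z_prior = [3.0, 1.0, 0.0]  # parametric prior favours token 0 (assume it is correct)
    z_ctx = [1.0, 3.0, 0.0]    # erroneous retrieved context favours token 1
    for lam in (0.5, 1.0, 2.0):
        p = affine_decode(z_ctx, z_prior, lam)
        q = power_family(z_ctx, z_prior, lam)
        assert all(abs(a - b) < 1e-9 for a, b in zip(p, q))  # identity holds
        print(lam, [round(v, 3) for v in p])
```

At λ = 0.5 the two candidate tokens tie. At λ = 2 the wrong context token takes roughly 99% of the probability mass. This is the error amplification the abstract attributes to extrapolation when the prior is right.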

Editor's pickEducation
Arxiv· 3 days ago

RealMath-Eval: Why SOTA Judges Struggle with Real Human Reasoning

arXiv:2606.10254v1 Announce Type: new Abstract: While Large Language Models (LLMs) have achieved near-perfect performance in solving high-school mathematics, their ability to evaluate the diverse reasoning processes of real human students remains under-examined. To bridge this gap, we introduce RealMath-Eval, a rigorously annotated benchmark of 224 real-world exam responses from high schools. Our initial evaluation reveals that even state-of-the-art LLM judges struggle significantly on this task, exhibiting a high Mean Squared Error (~2.96) against expert human grading. To probe a plausible explanation, we contrast this performance with a control setting where the same judges evaluate synthetic LLM-generated solutions. We identify a stark "Evaluation Gap": judges are considerably more accurate and consistent on synthetic text (MSE ~1.17) but struggle to generalize to authentic student reasoning. Through semantic embedding analysis, we find that synthetic errors suffer from a "structural collapse" into predictable, low-dimensional linear subspaces, whereas human errors form a more diverse error space. Furthermore, generative probability probes suggest that human reasoning involves significantly higher information-theoretic surprisal, indicating that student reasoning transitions are more out-of-distribution for current models. Finally, we find that surface-level style transfer fails to close this gap. Our findings suggest that current LLM evaluation pipelines relying heavily on synthetic data may not adequately capture the diversity of authentic student mathematical reasoning.
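The headline numbers are mean squared errors between an LLM judge's scores and expert grades. An MSE near 2.96 means the judge is off by about 1.7 score points per response in root-mean-square terms, compared with about 1.1 points at MSE 1.17. A minimal sketch of the metric follows. The scores in the usage example are invented for illustration; the paper's grading scale and data are not reproduced.

```python
import math


def mse(judge: list[float], human: list[float]) -> float:
    """Mean squared error between judge scores and expert grades."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty score lists of equal length")
    return sum((j - h) ** 2 for j, h in zip(judge, human)) / len(judge)


if __name__ == "__main__":
    # Illustrative scores only, not data from RealMath-Eval.
    human = [4, 2, 5, 0, 3]
    judge = [5, 4, 5, 2, 3]
    err = mse(judge, human)
    print(f"MSE = {err:.2f}, RMSE = {math.sqrt(err):.2f} points")
```

Because errors are squared, a few large misgradings of unusual student reasoning raise the MSE more than many small disagreements. This is consistent with the paper's point that authentic responses fall outside what judges handle well.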

Editor's pickTechnology
VentureBeat· 4 days ago

Anthropic brings Mythos to the masses with Claude Fable 5, its most powerful generally available model ever

Anthropic today launched two new AI models, Claude Fable 5 and Claude Mythos 5, marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously made available only to participating organizations in its restricted cybersecurity program, Project Glasswing, which it announced two months ago.

The company says Fable 5, which is the version most users and developers will get starting today, exceeds every Claude model it has previously made generally available, with stronger performance across software engineering, knowledge work, vision, scientific research and long-running tasks. It comes out on top on nearly all existing benchmarks, though the prior Claude Mythos Preview model still takes the top spots on computer use and multidisciplinary reasoning.

The new Claude Mythos 5, by contrast, is less restricted in its capabilities but more restricted in its availability. It is an upgraded version of the earlier Mythos Preview, a similarly capable but limited release. As such, it has certain safeguards lifted, but it is officially accessible only to Anthropic-approved users, including Anthropic's cybersecurity partners in Project Glasswing and select biology researchers.

The key difference is that the general-purpose Fable 5 wraps the same underlying Mythos-class capability in new safeguards. Anthropic says requests involving certain high-risk areas (including cybersecurity, biology and chemistry, and model distillation) are automatically routed to Claude Opus 4.8, Anthropic's previous flagship general model, with users notified when that happens. That is not the case with Mythos 5. The company says more than 95% of Fable 5 sessions run entirely on Fable 5’s own responses, with no fallback, and that internal and external red-teaming found no “universal jailbreaks” after more than 1,000 hours of testing.
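Anthropic has not published how Fable 5's safeguard layer classifies requests. As a purely hypothetical illustration of the routing pattern described (detect a high-risk category, hand the request to a fallback model, notify the user), the sketch below uses a keyword classifier. Only `claude-fable-5` comes from the article. The keyword lists, the `route` and `fallback_rate` functions and the `opus-4.8-fallback` label are assumptions, and a production system would use a trained classifier rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY = "claude-fable-5"        # API model id given in the article
FALLBACK = "opus-4.8-fallback"    # hypothetical label, not a real API id

# Hypothetical keyword lists standing in for a trained risk classifier.
HIGH_RISK = {
    "cybersecurity": ("exploit", "shellcode", "privilege escalation"),
    "biology_chemistry": ("pathogen", "toxin synthesis"),
    "distillation": ("distill", "training data for another model"),
}


@dataclass
class Routed:
    model: str
    category: Optional[str]  # which high-risk area triggered the fallback, if any
    notice: Optional[str]    # message shown to the user when a fallback occurs


def classify(prompt: str) -> Optional[str]:
    text = prompt.lower()
    for category, keywords in HIGH_RISK.items():
        if any(k in text for k in keywords):
            return category
    return None


def route(prompt: str) -> Routed:
    category = classify(prompt)
    if category is None:
        return Routed(PRIMARY, None, None)
    return Routed(FALLBACK, category, f"Answered by a fallback model ({category}).")


def fallback_rate(prompts: list[str]) -> float:
    """Share of requests that fell back: a per-request analogue of the
    article's session-level 'more than 95% with no fallback' figure."""
    return sum(route(p).model == FALLBACK for p in prompts) / len(prompts)


if __name__ == "__main__":
    for prompt in ("Refactor this payment service", "Write shellcode for this exploit"):
        r = route(prompt)
        print(r.model, r.notice)
```

The design choice the article highlights is visible here. A flagged request is answered by a different model with a user-facing notice rather than refused outright, and everything else stays on the primary model.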
Anthropic says Fable 5 is available to the general public today through its website, apps, and API, but that Mythos 5 will initially be available only to users who already have access to the older Claude Mythos Preview.

Pricing, access and a tricky rollout

Anthropic is pricing both Fable 5 and Mythos 5 at $10 per million input tokens and $50 per million output tokens. The company says that is less than half the price of Claude Mythos Preview, but it still ranks as the most expensive of the major AI models available globally.

VentureBeat Frontier AI Model API Pricing Snapshot (USD per million tokens; Total = Input + Output)

Model | Input | Output | Total | Source
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi MiMo
deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek
deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek
MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google
Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi MiMo
Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | xAI
GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai
Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot/Kimi
GLM-5.1 | $1.40 | $4.40 | $5.80 | Z.ai
Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 | xAI
Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google
Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 | Google
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI
Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 | Google
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic
GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI
Claude Fable 5 / Claude Mythos 5 | $10.00 | $50.00 | $60.00 | Anthropic

For developers, Fable 5 is available through the Claude API as claude-fable-5. Anthropic says Fable 5 is fully available today on the Claude API and on consumption-based Enterprise plans.

For subscription users, the rollout is more complicated. Anthropic says Fable 5 will be included on Pro, Max, Team and seat-based Enterprise plans at no extra cost from today through June 22. On June 23, the company plans to remove Fable 5 from those plans, after which using it will require usage credits.
Anthropic says it aims to restore Fable 5 as a standard part of subscription plans as quickly as possible.

The difference between Fable 5 and Mythos 5

Anthropic is not presenting Fable 5 and Mythos 5 as two separate models in the usual “small versus large” sense. Instead, they appear to share the same base capability level. The difference is access control: how easily users can get their hands on the models, and the guardrails embedded in each.

As previously mentioned, Fable 5 includes a new safeguard layer that detects certain high-risk requests (cybersecurity, biology and chemistry, and attempts to distill the model’s capabilities into other systems) and routes those requests to Claude Opus 4.8. Mythos 5 lifts some of those restrictions for trusted users working in approved domains.

In practical terms, Mythos 5 is more powerful for sensitive cyber and biology work because it can answer in areas where Fable 5 falls back. For most ordinary enterprise and developer tasks, however, Anthropic says Fable 5 performs effectively the same as Mythos 5.

The launch also signals how Anthropic plans to bring frontier models with dangerous dual-use capabilities to market: not by releasing all capabilities to everyone, and not by simply refusing risky questions, but by routing some requests to a less capable model while keeping the stronger model available for the majority of everyday work.

A major improvement in autonomous coding

For enterprise buyers, the most immediate use case is likely software engineering. Anthropic says Fable 5 can work unattended for longer and with more independence than previous Claude models, which is exactly the capability enterprises need if they want AI agents to do more than autocomplete code or answer developer questions.
On SWE-bench Pro, which measures a model's ability to complete difficult software engineering tasks, Anthropic says Fable 5 and Mythos 5 reach 80.3%, well ahead of OpenAI's latest general model, GPT-5.5, at 58.6%. On Cognition’s FrontierCode Diamond benchmark, which tests high-quality, maintainable agentic coding, the models score 29.3%, compared with 13.4% for Claude Opus 4.8 and 5.7% for GPT-5.5, according to the benchmark table included in Anthropic’s materials. Anthropic also says Fable 5 scores highest among frontier models on FrontierCode even at medium reasoning effort, suggesting the model may deliver stronger coding results without always needing maximum compute.

The most striking customer example comes from Stripe, which Anthropic says tested Fable 5 on a codebase-wide migration. Stripe said, “Fable 5 compresses months of engineering into days. In our 50-million-line Ruby codebase, it did in a day what would've taken us more than two months by hand.”

Other early users describe the model as especially useful for long-horizon development tasks. Cursor said, “Fable 5 is the state of the art model on CursorBench. It's opened up a class of long-horizon problems that were out of reach for earlier models.” Replit said Fable 5 is the highest-performing model it has tested on ViBench, its end-to-end “vibe-coding” benchmark, and that it builds apps in less time with fewer tokens. Figma said Fable 5 is “a clear step forward on agentic coding and prototyping.”

This is the enterprise shift Anthropic is trying to sell: AI coding systems that can take on larger units of work, not just individual tickets. That could include codebase migrations, app prototyping, pull request review, test generation, debugging across unfamiliar tools, user interface design and multi-step internal software projects.
Base44 said, “Fable 5 is much deeper and better at one-shotting full apps, and its tool calling is excellent.” Genspark said, “Fable 5 came out #1 on our evals, winning head-to-head against every model we tested. It was significantly stronger on the hardest tasks in the set — UI design and game coding.” Rakuten said, “At the highest effort, Fable 5 reflects on and validates its own work. For us, that's what makes highly autonomous operations possible — the extra thinking pays for itself.”

For CTOs and engineering leaders, that suggests the model’s value may come less from raw code generation and more from sustained execution: understanding an intent, planning steps, calling tools, checking its own work and continuing through a task without constant human steering.

Knowledge work, finance, legal and operations

Anthropic is also positioning Fable 5 as a stronger model for enterprise knowledge work. On GDPval-AA, Anthropic reports a score of 1932 for Fable 5 and Mythos 5, compared with 1890 for Claude Opus 4.8, 1769 for GPT-5.5 and 1314 for Gemini 3.1 Pro. On GDPpdf, a benchmark focused on visual document reasoning, Fable 5 and Mythos 5 score 29.8% without tools, compared with 22.5% for Opus 4.8, 24.9% for GPT-5.5 and 16.7% for Gemini 3.1 Pro.

That matters for enterprises because much of corporate work still lives in messy documents: PDFs, spreadsheets, charts, reports, contracts, filings, slide decks and screenshots. Anthropic says Fable 5 shows gains in document-based reasoning, chart and table interpretation and complex problem solving.

Hex said, “Fable 5 is the first to break 90% on our core analytics benchmark of complex, long-running analytical tasks — a 10-point jump over Opus. On the hardest questions, it shows strong judgment and attention to nuance.” Hebbia said Fable 5 was the highest-scoring model on its Finance Benchmark for senior-level reasoning, with double-digit gains in document reasoning, chart and table interpretation, and problem solving.
The finance examples are notable because they point to AI agents moving beyond summarization into higher-stakes analytical workflows. IMC said Fable 5 “aced our trading-analysis evaluations nearly across the board: factual lookup, conceptual reasoning, root-cause analysis, expected-value analysis.” Optiver said the model was stronger than Opus 4.8 on its trading benchmark and “remarkably consistent,” scoring identically across repeated runs. Balyasny Asset Management said Fable 5 was the strongest finance-first model it had tested. Legal and operations teams may also see immediate impact. Crosby Legal said, “Fable 5 feels materially different. In blind review, our lawyers found its redlines matched or beat our current model every time.” Notion said the model can take work “you'd chip away at all afternoon” and turn messy notes into a functioning project plan. Zapier said Fable 5 is the new leader on AutomationBench and is more autonomous than Opus 4.8: “Where Opus stops to ask, Fable 5 keeps looking.” For enterprise software vendors, that points toward more capable embedded agents in workflow products: agents that can review a contract, update a project plan, assemble a spreadsheet, inspect a chart, file a ticket, run a query, call an internal API and keep going until the work is complete. Vision and interface understanding Anthropic says Fable 5 is also its strongest vision model. In its launch materials, the company says the model can extract precise numbers from detailed scientific figures and complete vision-based tasks such as rebuilding a web app’s source code from screenshots alone. That has immediate implications for enterprise automation. Many business processes still depend on visual interfaces that are not cleanly exposed through APIs: dashboards, PDFs, forms, legacy apps, screenshots, scans and image-heavy reports. A stronger vision model could help agents operate across those environments with less custom integration work. 
Anthropic also says Fable 5 needs less scaffolding than previous Claude models. As an example, the company says earlier Claude models struggled to play Pokémon FireRed even with extra tools, while Fable 5 impressively beat the game using a minimal vision-only harness. Anthropic posted a fast forwarded video of its playthrough to YouTube and in its blog post: The point is not gaming itself, but the broader agentic skill: reading a visual environment, remembering progress, deciding what to do next and executing over a long horizon. In another internal test, Anthropic says it had the model play the deck-building game Slay the Spire with access to persistent file-based memory. The company says persistent memory improved Fable 5’s performance three times more than it improved Opus 4.8’s, and that Fable reached the game’s final act three times more often. For enterprise users, this suggests Fable 5 may make better use of notes, logs and stored context during multi-step work. That could matter for internal agents that operate over days or weeks: sales operations agents that track account research, engineering agents that manage migrations, finance agents that update models, or support agents that remember what they tried across many turns. From restricted cyber model to general-purpose enterprise AI The announcement follows Anthropic’s April 2025 rollout of Claude Mythos Preview through Project Glasswing, a restricted program for cyber defenders, critical infrastructure providers and major software maintainers. Anthropic created Glasswing after internal evaluations showed Mythos-class models could find and exploit software vulnerabilities at a level that raised meaningful misuse concerns. Following the debut of Glasswing and Mythos, U.S. officials and intelligence agencies began weighing how such models could reshape both cyber defense and offensive operations, while Sen. 
Mark Warner warned that AI-assisted vulnerability discovery should force industry to “accelerate and reprioritize patching.” Financial regulators also took notice: The Guardian reported that Mythos entered discussions among senior banking officials and regulators in the U.S. and U.K. because of fears that AI-accelerated cyberattacks could threaten payment systems and broader financial stability. The reaction has not been limited to alarm. Governments also want access: Reuters reported that South Korea’s national internet security agency had secured Mythos access through Project Glasswing, reflecting a broader geopolitical race to use frontier AI for national cyber defense. At the same time, Anthropic has faced scrutiny over whether it can safely gate the very capabilities it says are too risky for general release. The Verge reported that unauthorized users accessed Mythos after its limited rollout, calling the incident damaging for a company that has built its brand around responsible AI. Critics have also questioned whether Anthropic’s warning-heavy framing risks becoming a form of market positioning, since it casts the company as both the source of the new capability and the gatekeeper deciding which governments, companies and researchers get to use it. With Fable 5, Anthropic is leaning into its gatekeeper role, attempting to separate the general enterprise value of a Mythos-class model from the riskiest parts of its capability profile. The company says Fable 5 can handle software engineering, research, visual reasoning, document analysis and long-running agentic workflows, while classifiers block or reroute requests that could provide what Anthropic calls “uplift” to malicious actors. Those classifiers cover three main areas. Cybersecurity, where Anthropic says Mythos-class models can discover and exploit vulnerabilities and perform broader “agentic hacking” tasks such as reconnaissance, discovery and lateral movement. 
Biology and chemistry, where the company says the same reasoning that can help researchers design therapies could also help well-resourced malicious actors pursue dangerous biological work. Model distillation, where Anthropic says users may try to extract Claude’s capabilities to train competing models, including models that could be released without similar safeguards. When Fable 5’s classifiers detect one of those categories, the response is automatically handled by Claude Opus 4.8. Anthropic says users will be told when this happens. That is a notable product decision: rather than declining those requests outright, Anthropic is trying to keep the user experience functional while reducing access to the most capable version of the model in sensitive areas. Anthropic says it red-teamed the new classifier system internally and externally. The company says an external bug bounty produced no universal jailbreaks after more than 1,000 hours of testing, and external red-teaming organizations also failed to find a universal jailbreak. One external partner found that Fable 5 complied with zero harmful single-turn cyber requests related to planning cyberattacks, exploit development or defense evasion, even when prompts used any of 30 public jailbreak techniques, according to Anthropic. The company is still acknowledging tradeoffs. Anthropic says the safeguards are deliberately cautious and may sometimes trigger on benign requests. That could frustrate security professionals, biology researchers and advanced enterprise users whose legitimate work overlaps with the blocked categories. The company says it plans to reduce false positives over time. Mythos 5 and the restricted frontier While Fable 5 is the broad commercial launch, Mythos 5 is the model to watch for enterprises operating in security, critical infrastructure and life sciences. The company says all users with Claude Mythos Preview access can upgrade to Mythos 5 beginning today. 
It plans to expand access through a trusted access program, in collaboration with the U.S. government. The distinction is important for sectors where the blocked capabilities are not edge cases but core workflows. A security team may need to reproduce vulnerabilities, test exploitability, analyze lateral movement or simulate attacker behavior in a controlled environment. A biology research team may need to reason through molecular design workflows that would trigger general-use safeguards. Fable 5 is not designed to give every user unrestricted access to those capabilities; Mythos 5 is designed for vetted users who need them. Anthropic says Mythos 5 has the strongest cybersecurity capabilities of any model in the world. In the company’s benchmark table, the model family scores 78.0% on ExploitBench, compared with 69.0% for Claude Mythos Preview, 40.0% for Opus 4.8 and 34.0% for GPT-5.5. On CyberGym, Anthropic’s chart shows Mythos 5 at 83.8%, slightly ahead of Mythos Preview at 83.1% and far above Opus 4.8 with default safeguards. The company is making a similar argument in biology. Anthropic says Mythos-class models outperform dedicated protein language models on a task involving adeno-associated viruses, a delivery mechanism used in gene therapies. The company frames that as both promising and risky: the same capability that could help gene therapy research could also be misused in dangerous biological work. Anthropic says its internal protein design experts used Mythos 5 to accelerate parts of the drug design process by about tenfold. In one example, the company says Mythos 5, using protein design and bioinformatics tools without human assistance, matched or beat skilled human operators by choosing binding sites, selecting and running tools, and recovering from failures. Anthropic says nine of 14 protein targets in the study produced strong candidates for drug design that it is now investigating. 
The company also says Mythos 5 produced novel molecular biology hypotheses that Anthropic scientists preferred over Opus-class model hypotheses about 80% of the time in blinded comparisons. Anthropic says several of those ideas have advanced to experimental evaluation, and one hypothesis involving an E. coli protein was later corroborated by an independent lab working on the same problem. Those claims are potentially significant, but they should be treated carefully until more details are published. Anthropic says it intends to publish additional results in the coming months. For now, the strongest enterprise implication is directional: the company believes its highest-end models can already perform parts of scientific research workflows with less human intervention than prior systems. New, longer data retention requirement The company also introduced a new data-retention policy for Mythos-class models. Anthropic says it will require 30-day retention for all traffic on Fable 5, Mythos 5 and future models with similar or higher capability levels, across both first-party and third-party surfaces. The company says it will not use that data to train new Claude models or for non-safety purposes, and says it has added privacy protections including logging human access and deleting the data after 30 days in almost all cases. That policy may become one of the most important enterprise buying questions around Fable 5. Many businesses want frontier AI capability but also want strict control over data retention, especially in regulated sectors. Anthropic’s position is that stronger monitoring is necessary for models with this level of capability. Enterprise customers will have to decide whether the capability gain justifies the retention requirement. Enterprise implications The broader enterprise significance of Fable 5 is that Anthropic is trying to commercialize a more autonomous class of AI model without exposing all of its capabilities to every user. 
That could become a template for how frontier labs release increasingly powerful systems: one model family, multiple access tiers, and domain-specific restrictions depending on user trust and risk. If Fable 5 performs as Anthropic and early customers describe, developers may hand off larger tasks: code migrations, refactors, UI builds, test writing, bug fixing, documentation, internal tooling and multi-step app creation. For knowledge-work-heavy enterprises, Fable 5 could make AI more useful in workflows where earlier models were too brittle: finance research, spreadsheet analysis, legal redlines, procurement review, board materials, market research, sales operations and project planning. The main gain is not just better answers; it is fewer turns, fewer corrections and more ability to keep working through ambiguity. For security teams, the launch is more complicated. Most organizations will get Fable 5, not unrestricted Mythos 5. That means they may see stronger general coding and analysis, but not full access to the cyber capabilities Anthropic considers risky. Trusted defenders inside Project Glasswing will get Mythos 5, giving them a more direct way to use the model for vulnerability discovery and defensive testing. For life sciences companies, the pattern is similar. Fable 5 may help with general research, literature analysis, data interpretation and scientific reasoning, but the more sensitive biological capabilities will be restricted. Anthropic is effectively creating a separate access path for vetted researchers whose work requires capabilities that could be dangerous in the wrong hands. The launch also raises competitive pressure across the AI industry. Anthropic is claiming state-of-the-art results across agentic coding, knowledge work, vision, cybersecurity, legal reasoning, spatial reasoning and health benchmarks. But the more strategically important claim may be that it has found a workable release mechanism for models above its Opus class. 
If Fable 5’s safeguards hold up under real-world use, Anthropic will argue it can bring more powerful models to market sooner without fully opening the riskiest capabilities. That is still a large “if.” The enterprise market will test not only Fable 5’s benchmark performance, but also its reliability, false-positive rate, data-retention tradeoffs and cost at scale. A model that can complete more work autonomously can also burn more tokens, trigger more governance questions and create new review burdens for teams that must verify its output. Still, today’s launch marks a clear shift in the Claude lineup. Opus is no longer Anthropic’s top commercial capability tier. Mythos-class models now sit above it. Fable 5 is the first version of that tier for general users; Mythos 5 is the restricted version for trusted high-risk work. Together, they show how Anthropic plans to push frontier AI deeper into enterprise workflows while trying to keep the most dangerous capabilities gated.
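Anthropic has not published how its routing works. The fallback behavior described in the safeguards section above can be sketched as a simple dispatcher: requests flagged in one of the three categories are answered by Claude Opus 4.8, and the user is told. Everything in this sketch is an illustrative assumption. The keyword-based `classify` function stands in for Anthropic's trained classifiers, and `"fable-5"` and `"opus-4.8"` are placeholder strings, not real API model identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# The three safeguarded categories named in the article.
SENSITIVE_CATEGORIES = ("cybersecurity", "biology_chemistry", "model_distillation")

# Hypothetical keyword lists standing in for trained classifiers.
KEYWORDS = {
    "cybersecurity": ("exploit", "lateral movement", "evade detection"),
    "biology_chemistry": ("pathogen", "toxin synthesis"),
    "model_distillation": ("dump your outputs for training",),
}


@dataclass
class RoutingDecision:
    model: str                # which model answers the request
    category: Optional[str]   # sensitive category that triggered rerouting, if any
    notice: Optional[str]     # message shown to the user when rerouted


def classify(prompt: str) -> Optional[str]:
    """Return the first sensitive category whose keywords appear in the prompt, else None."""
    text = prompt.lower()
    for category in SENSITIVE_CATEGORIES:
        if any(keyword in text for keyword in KEYWORDS[category]):
            return category
    return None


def route(prompt: str, primary: str = "fable-5", fallback: str = "opus-4.8") -> RoutingDecision:
    """Send flagged prompts to the fallback model and notify the user; otherwise use the primary."""
    category = classify(prompt)
    if category is None:
        return RoutingDecision(model=primary, category=None, notice=None)
    notice = f"This request was handled by {fallback} because it matched the '{category}' safeguard."
    return RoutingDecision(model=fallback, category=category, notice=notice)


if __name__ == "__main__":
    print(route("Refactor this Ruby module").model)       # fable-5
    print(route("Write an exploit for this CVE").model)   # opus-4.8
```

This sketch follows the article's description in two ways. A flagged request is downgraded rather than refused, so the user still gets an answer, and the notice keeps the substitution visible. A production system would instead use learned classifiers with tunable thresholds, and those thresholds are where the false-positive tradeoff Anthropic acknowledges would be managed.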

Editor's pick · Technology
OpenAI Banked Resets 💳, Kimi 30% Faster Coding 🚀, RL Robot Sim 🤖 · Yesterday

Kimi.ai open-sources new coding model with 30% fewer reasoning tokens

Moonshot AI released Kimi-K2.7-Code, an open-source model that reduces reasoning tokens by 30% to improve speed and efficiency in coding tasks.

Editor's pick
arXiv · 3 days ago

What Spatial Memory Must Store: Occlusion as the Test for Language-Agent Memory

arXiv:2606.10299v1 Announce Type: new Abstract: Language-agent "memory palace" systems anchor each memory to a world coordinate, on the intuition that geometry adds something text cannot. We make that intuition testable and report three results. First, the memory-palace default of folding spatial proximity into a linear blend beside recency and importance does not help and can hurt: in a pre-registered recall experiment the shipped blend fails its own frozen test (mean Delta-Hit@5 -0.0375, Wilcoxon p=0.306), sitting at a position-blind baseline, while a geometry-led weighting wins decisively (+0.3208, p<0.001, pooled exact McNemar p=2.5x10^-29), a run that surfaced and fixed a real relay anchor defect. We concede that occlusion-needs-geometry is near-tautological; the contribution is the measurement and isolation, separating what spatial memory must store from how it is read. These pilots power a frozen confirmatory study (SPMEM-ZERO-REAL-PREREG-v1); the full human-authored multi-world study with blind raters remains future work.
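The abstract contrasts two retrieval scores and evaluates both by Hit@5. In the "linear blend", spatial proximity is one weighted term beside recency and importance. In the geometry-led weighting, proximity dominates. A minimal sketch of that setup follows. The weights, the exponential proximity kernel and the toy memories are assumptions for illustration, not the paper's configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    key: str
    recency: float      # assumed normalized to [0, 1], 1 = most recent
    importance: float   # assumed normalized to [0, 1]
    position: tuple     # (x, y) world coordinate the memory is anchored to


def proximity(query_pos, mem_pos, scale=5.0):
    """Hypothetical kernel: 1.0 at the query position, decaying exponentially with distance."""
    return math.exp(-math.dist(query_pos, mem_pos) / scale)


def score(mem, query_pos, w_recency, w_importance, w_proximity):
    """Linear blend of recency, importance and spatial proximity."""
    return (w_recency * mem.recency
            + w_importance * mem.importance
            + w_proximity * proximity(query_pos, mem.position))


def top_k(memories, query_pos, weights, k=5):
    """Rank memories by blended score and return the keys of the top k."""
    ranked = sorted(memories, key=lambda m: score(m, query_pos, *weights), reverse=True)
    return [m.key for m in ranked[:k]]


def hit_at_k(retrieved, target):
    """Hit@k for a single query: 1.0 if the target memory was retrieved, else 0.0."""
    return 1.0 if target in retrieved else 0.0


# Toy case: the target sits next to the query but is old and unimportant;
# six distractors are recent and important but far away.
query = (0.0, 0.0)
memories = [Memory("target", 0.1, 0.1, (0.5, 0.0))]
memories += [Memory(f"distractor{i}", 0.9, 0.9, (50.0, 0.0)) for i in range(6)]

blend = (1 / 3, 1 / 3, 1 / 3)
geometry_led = (0.1, 0.1, 0.8)
print(hit_at_k(top_k(memories, query, blend), "target"))         # 0.0
print(hit_at_k(top_k(memories, query, geometry_led), "target"))  # 1.0
```

The toy case only shows how an equal blend can bury a spatially close memory under recent, important but distant ones. It says nothing about the effect sizes the paper reports.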

Editor's pick · Technology
arXiv · 3 days ago

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

arXiv:2606.10147v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) can listen and see, but how do audio and visual signals actually travel through the network to shape an answer? Despite their growing role in research and real-world applications, the internal pathways through which audio and visual tokens influence the final prediction remain poorly understood. In this study, we examine audio-visual information flow inside Audio-Visual Large Language Models (AVLLMs), tracing how AVLLMs route, utilize, and integrate audio and visual information across two input configurations: audio-visual video and multiple interleaved audio-visual items. We find that for audio-visual video, AVLLMs follow the sequential information flow pathway established for VLMs and VideoLLMs, with audio and visual contributions flowing along this pathway in proportion to the task's reliance on each modality. In settings with multiple interleaved audio-visual items, this routing shifts to different parallel streams. Furthermore, we demonstrate that audio-visual and other token types can be discarded once their information has been transferred to the LLM, with minimal impact on the model's prediction or even a slight improvement; this generalizes across multiple tasks and datasets and enables more efficient inference. These findings hold across multiple models and scales (Qwen2.5-Omni and Video-SALMONN2 Plus at 3B and 7B), leading to hypotheses about why these flow structures emerge. Together, these results deliver the first coherent picture of how AVLLMs orchestrate sound and sight inside the network and lay the groundwork for the next wave of interpretability, design, and efficiency advances in audio-visual and broader MLLMs.

Editor's pick · Technology
Daily Brew · 4 days ago

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second

Xiaomi introduces a new high-speed model capable of 1000 tokens per second.

Editor's pick · Technology
MacRumors · 4 days ago

Apple Outlines Major AI and Developer Tool Updates at 2026 Platforms State of the Union

Apple yesterday held its WWDC 2026 Platforms State of the Union, detailing a wide range of updates to its developer tools and platforms, headlined by a major expansion of the Foundation Models framework. The main announcement was free access to Apple Foundation Models running on Private Cloud ...

AI Research & Science · 3 articles
Editor's pick · Transportation & Logistics
arXiv · 3 days ago

Mobility Anomaly Generation using LLM-Driven Behavior with Kinematic Constraints

arXiv:2606.10314v1 Announce Type: new Abstract: Although the study of human trajectory anomalies is critical for advancing spatial data mining, empirical research remains severely hindered by a pervasive lack of ground-truth datasets. Despite the availability of several real-world and simulated human trajectory collections, these datasets exclusively capture normal mobility patterns and lack annotated anomalies. This specific scarcity is fundamentally driven by the inherent statistical rarity of anomalous events, precluding the feasibility of conventional observational methods. Compounding this challenge, the systematic acquisition of large-scale mobility data is strictly bottlenecked by prohibitive costs and stringent privacy regulations. To overcome these fundamental limitations and establish a reliable human trajectory anomalies dataset with annotated ground truth, we introduce a novel, end-to-end generative framework designed to synthesize realistic trajectory anomalies at scale. Our architecture bridges the gap between purely synthetic mobility data and complex real-world physical constraints by operating directly on baseline simulated trajectories. We employ Large Language Model (LLM) agents to systematically inject semantically meaningful behavioral anomalies such as irregular out-of-distribution check-ins and skipped routine visits. To ensure rigorous spatial validity, the system leverages map-constrained routing reconstruction to recalculate the physical transitions between these LLM agent-modified staypoints. Moreover, to narrow the simulation-to-reality gap, we augment the resulting trajectories with a context-aware spatial noise model, parameterized by environmental and location-specific variables, to accurately emulate heterogeneous GPS sensor degradation.
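The final step in the abstract is a context-aware spatial noise model that emulates GPS degradation. It can be illustrated with Gaussian jitter whose spread depends on the environment around each point. This is a minimal sketch under stated assumptions. The environment labels and their standard deviations in metres are hypothetical values, not parameters from the paper. Real GPS error is also often correlated over time, whereas this sketch draws independent noise for each fix.

```python
import random

# Hypothetical per-environment GPS error (standard deviation in metres).
NOISE_SIGMA_M = {"open_sky": 3.0, "suburban": 8.0, "urban_canyon": 25.0}


def add_gps_noise(points, environments, rng=None):
    """Perturb (x, y) points in a local metric frame with environment-dependent Gaussian noise.

    points: list of (x, y) positions in metres.
    environments: one environment label per point (a key of NOISE_SIGMA_M).
    """
    if len(points) != len(environments):
        raise ValueError("need one environment label per point")
    rng = rng or random.Random()
    noisy = []
    for (x, y), env in zip(points, environments):
        sigma = NOISE_SIGMA_M[env]
        # Independent noise on each axis; a simplification of real GPS error.
        noisy.append((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)))
    return noisy


# Usage: a three-fix track crossing different environments, seeded for reproducibility.
track = [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0)]
envs = ["open_sky", "urban_canyon", "suburban"]
noisy_track = add_gps_noise(track, envs, rng=random.Random(42))
```

Keying the noise on a per-point label keeps the noise model separate from the routing step. The anomaly injection and map-constrained routing described in the abstract would run first and produce the clean track, and the noise would be applied afterwards.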

Editor's pick · Technology
Top Daily Headlines: GitHub nukes 70+ Microsoft repos, breaks CI/CD pipelines, following suspected worm infections · 4 days ago

Python JIT compiler project under threat after steering council says proper process wasn't followed

No new features to be submitted to main branch, existing code removed in 6 months if new proposal not created and accepted.

Editor's pick
arXiv · 3 days ago

Minimalist Genetic Programming

arXiv:2606.10237v1 Announce Type: new Abstract: Genetic programming (GP) is based on two important insights. First, that any learning task can fundamentally be posed as a program induction problem, where the goal is to construct a symbolic hierarchical model that is expressed as a syntax tree. Second, that this task can be posed as a search problem, using evolution to locate the desired model. Since it was proposed, GP has produced notable results in a wide range of tasks and problem domains. This work presents an alternative view by modifying the second core insight of GP, posing the problem as a syntactic derivation task instead. In particular, this paper presents Minimalist Genetic Programming (MGP), an algorithm that, like GP, is biologically inspired, but instead of evolution it takes inspiration from the Minimalist Program's approach to human language, in which syntax is understood as an optimal solution to the problem of linking two other mental systems. In minimalism, the core computational process is a binary set formation operator called MERGE, which can be used to incrementally construct complex syntactic structures using a simple Markovian process. MGP is able to discover the core building blocks of the symbolic expressions and to incrementally combine them using MERGE. The proposed system is benchmarked on symbolic regression tasks that are known to be difficult to solve with standard GP systems because of their propensity for bloat. Results show that when a proper lexicon of atomic syntactic objects is chosen, MGP consistently produces the exact ground-truth model on a set of symbolic regression problems where standard GP struggles to do the same. The insights provided by minimalism are shown to be relevant to the problem of program induction, and should be explored further based on the potential exhibited by MGP in this work.
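In the Minimalist Program, MERGE takes two syntactic objects and forms the set containing them. The sketch below shows only that operator and a random, Markovian derivation over a small lexicon, in which each step depends only on the current workspace. It is an illustration under stated assumptions. The uniform random choice of which two objects to merge and the atom names are hypothetical. The sketch also omits how MGP scores candidate structures or interprets unordered sets as evaluable expressions, because the abstract does not describe either.

```python
import random


def merge(a, b):
    """MERGE: binary set formation, {a, b}. frozenset keeps results hashable and unordered."""
    return frozenset((a, b))


def derive(lexicon, rng):
    """Repeatedly MERGE two objects drawn from the workspace until one object remains.

    The next workspace depends only on the current one, so the derivation is a Markov process.
    Assumes the lexicon items are distinct.
    """
    workspace = list(lexicon)
    while len(workspace) > 1:
        # Pick two positions; pop the larger index first so the smaller one stays valid.
        i, j = sorted(rng.sample(range(len(workspace)), 2), reverse=True)
        a, b = workspace.pop(i), workspace.pop(j)
        workspace.append(merge(a, b))
    return workspace[0]


def atoms(obj):
    """Collect the atomic lexical items inside a derived syntactic object."""
    if isinstance(obj, frozenset):
        return set().union(*(atoms(child) for child in obj))
    return {obj}


lexicon = ["x", "y", "add", "mul", "1.0"]
tree = derive(lexicon, random.Random(0))
print(atoms(tree) == set(lexicon))  # True: every atom ends up in the single derived object
```

Because sets are unordered, `merge("add", "x")` and `merge("x", "add")` are the same object. This is the property that separates MERGE from building an ordered GP syntax tree.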

AI Security & Cybersecurity · 8 articles
Editor's pick · Technology
arXiv · 3 days ago

Deployment-Time Memorization in Foundation-Model Agents

arXiv:2606.10062v1 Announce Type: new Abstract: Foundation-model agents are increasingly long-lived systems that remember users across interactions, making memorization an explicit deployment-time function rather than solely a property of model weights. Existing work addresses parametric memorization or audits fixed memory configurations, but does not characterize how memory-design choices jointly shape personalization utility, extraction risk, and deletion fidelity. We study this surface as deployment-time memorization, formulating agent memory as a privacy-utility frontier measured by Personalization Recall (PR) and Adversarial Extraction Rate (AER), and sweeping three memory-design knobs: summarization aggressiveness, retrieval breadth (k), and deletion mode. We further introduce the Forgetting Residue Score (FRS) to quantify whether deleted information remains recoverable from derived memory tiers. On LongMemEval, key-fact summarization reduces canary extraction by 76% on Gemma 3 12B and 64% on GPT-4o-mini while preserving nearly all personalization recall; critically, once content is compressed away, increasing k no longer restores leakage. The same compression, however, induces a deletion-fidelity failure: raw-only deletion leaves derived summary copies recoverable in approximately 20% of instances, and only full-pipeline purge or tombstone redaction drives worst-tier residue to zero. Together, these results establish that persistent agent memory must be evaluated as a first-class memorization mechanism -- assessed by what it helps agents recall, what it makes extractable, and what it can truly erase.
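The deletion-fidelity finding is easiest to see with two memory tiers: raw interaction records and summaries derived from them. The sketch below is hypothetical throughout. The `TieredMemory` class, the trivial "key-fact summary", the deletion mode names and `residue_score` (the fraction of deleted facts still findable in any tier) are simplified stand-ins for the paper's pipeline and its Forgetting Residue Score, not their actual definitions. Tombstone redaction is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    raw: dict = field(default_factory=dict)        # record_id -> original text
    summaries: dict = field(default_factory=dict)  # record_id -> derived summary text

    def add(self, record_id, text, key_fact):
        self.raw[record_id] = text
        # Hypothetical key-fact summarization: the derived tier keeps only the fact.
        self.summaries[record_id] = key_fact

    def delete(self, record_id, mode):
        """'raw_only' drops the raw record; 'full_purge' also drops derived copies."""
        if mode not in ("raw_only", "full_purge"):
            raise ValueError(f"unknown deletion mode: {mode}")
        self.raw.pop(record_id, None)
        if mode == "full_purge":
            self.summaries.pop(record_id, None)

    def recoverable(self, fact):
        """True if the fact can still be found as a substring in any tier."""
        tiers = list(self.raw.values()) + list(self.summaries.values())
        return any(fact in text for text in tiers)


def residue_score(memory, deleted_facts):
    """Fraction of deleted facts that remain recoverable (0.0 means clean deletion)."""
    if not deleted_facts:
        return 0.0
    return sum(memory.recoverable(f) for f in deleted_facts) / len(deleted_facts)


def run(mode):
    mem = TieredMemory()
    mem.add("r1", "User said their PIN is 4321 and they live in Leeds.", "PIN 4321")
    mem.add("r2", "User prefers vegetarian recipes.", "vegetarian")
    mem.delete("r1", mode)
    return residue_score(mem, ["PIN 4321"])


print(run("raw_only"))    # 1.0: the summary copy survives raw-only deletion
print(run("full_purge"))  # 0.0
```

The example shows the mechanism behind the abstract's point. Compression moves a fact into a derived tier, so a deletion path that only knows about raw records leaves the fact behind.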

Editor's pick · Technology
Top Daily Headlines: Signal says UK plan to scan devices for nude images 'endangers us all' · 3 days ago

Devs know AI code is riddled with holes, but ship it anyway

Pressure to deploy wins out over security as four in five orgs confess to breaches from vulnerable apps.

Editor's pick · Technology
Guardian · 4 days ago

Spyware firm targeted WhatsApp users in defiance of US court order, Meta says

Tech company says it ‘caught and disrupted’ NSO Group’s attempts to access accounts in Jordan and Lebanon. A spyware firm has been targeting WhatsApp users with malicious links in contravention of a US court order forbidding it from doing so, Meta has said. In a post, Meta said WhatsApp had “caught and disrupted spear phishing attempts” by NSO Group, which a spokesperson said targeted a handful of users in Jordan and Lebanon. It had also caught the group creating “test accounts and groups” on WhatsApp.

Editor's pick · Government & Public Sector
Top Daily Headlines: Signal says UK plan to scan devices for nude images 'endangers us all' · 3 days ago

France probes compromise of gov messaging platform after account hijack

Authorities say the breach only exposed public chat rooms, but alleged attacker claims to have accessed far more data.

Adoption, Deployment & Impact

15 articles
AI Adoption Barriers & Enablers · 5 articles
Editor's pick · Media & Entertainment
arXiv · 3 days ago

Towards Gaze-Informed AI Disclosure Interfaces: Eye-Tracking Attentional and Cognitive Load While Reading AI-Assisted News

arXiv:2605.14999v1 Announce Type: cross Abstract: As generative AI becomes increasingly integrated into journalism, designing effective AI-use disclosures that inform readers without imposing unnecessary burden is a key challenge. While prior research has primarily focused on trust and credibility, the impact of disclosures on readers' attentional and cognitive load remains underexplored. To address this gap, we conducted a 3×2×2 mixed factorial study manipulating the level of AI-use disclosure detail (none, one-line, detailed), news type (politics, lifestyle), and role of AI (editing, partial content generation), measuring load via NASA-TLX and eye-tracking. Our results reveal a significant attentional cost: one-line disclosures resulted in significantly higher fixation durations and saccade counts, particularly for AI-edited content. Detailed disclosures did not impose additional burden. Drawing on Information-Gap Theory, we argue that brief labels may trigger increased visual scrutiny by alerting readers to AI use without providing enough information. NASA-TLX scores and pupil diameter showed no significant differences across conditions, suggesting that AI-use disclosures do not impose cognitive burden regardless of the detail level. Interview insights contextualize these findings and reveal a strong preference for detailed or "detail-on-demand" designs. Our findings inform the design of gaze-informed adaptive disclosure interfaces that dynamically adjust transparency levels based on readers' attentional patterns and news context.
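The 3×2×2 design crosses three factors, which gives 12 cells. Enumerating them makes the structure explicit. The abstract does not say which factors varied within participants and which varied between them (the "mixed" part), so this sketch only lists the cells. The factor and level names come from the abstract; the dictionary keys are illustrative labels.

```python
from itertools import product

# Factors and levels as listed in the abstract.
FACTORS = {
    "disclosure": ["none", "one-line", "detailed"],
    "news_type": ["politics", "lifestyle"],
    "ai_role": ["editing", "partial content generation"],
}


def design_cells(factors):
    """Return every combination of factor levels as a dict, one per design cell."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells(FACTORS)
print(len(cells))  # 12
print(cells[0])    # {'disclosure': 'none', 'news_type': 'politics', 'ai_role': 'editing'}
```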

Editor's pick
Digital Journal · 4 days ago

Why lasting AI transformation begins with people, process, and shared ownership across organizations

Quail Group, a consultancy focused on organizational alignment, process improvement, and behavioral change, observes that organizations keep chasing the next wave of technology in pursuit of greater efficiency and performance. Yet each new tool can introduce additional layers of work when ...

Editor's pick · Professional Services
CAclubindia · 4 days ago

How AI is Revolutionizing CA Practice in India: A Practical Guide for Chartered Accountants

Discover how AI tools like ChatGPT, Claude, and Gemini are transforming CA practice in India. Learn practical use cases, compliance benefits, client communication strategies, and a step-by-step 30-day AI adoption plan for Chartered Accountants.

AI Applications · 3 articles
Editor's pick · Healthcare
arXiv · 3 days ago

The Empirically Grounded Adaptive Virtual Patient for Psychotherapy Training: Disclosure That Responds to Therapist Micro-Skills

arXiv:2606.10051v1 Announce Type: new Abstract: Simulated patients offer a scalable way to train psychotherapy micro-skills such as empathic responding and exploratory probing, but current systems either follow fixed scripts or rely on LLMs that drift unpredictably over long sessions. We present the Adaptive Virtual Patient (AVP), which adapts its disclosure behavior -- from guarded, through moderate openness, to full disclosure -- in response to trainee skill. The AVP is grounded in a structural equation model fit to nearly 2,000 hours of real-world psychotherapy transcripts, which quantifies how therapist empathy and exploration shift a patient's openness over time. An LLM generates the AVP's utterances conditioned on a disclosure level that the dynamics module updates each turn. In an evaluation with 20 clinicians and trainees over 80 sessions (1,033 turns), the AVP's disclosure rises in response to therapist empathy and exploration, while a prompt-only baseline stays flat; ablations confirm that the empirically motivated parameterization outperforms alternatives, with exploration carrying most of the adaptive signal.
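The AVP separates two components: a dynamics module that updates a disclosure level each turn, and an LLM that writes the patient's words conditioned on that level. A minimal sketch of that loop follows. The abstract does not give the structural-equation coefficients, so the openness update below is a hypothetical linear rule with made-up weights. Exploration is weighted more heavily, echoing the finding that exploration carries most of the adaptive signal. The thresholds that map openness to the three levels are also assumptions, as is the prompt wording.

```python
from dataclasses import dataclass

LEVELS = ("guarded", "moderate", "full")


@dataclass
class DisclosureState:
    openness: float = 0.0  # hypothetical latent openness, clamped to [0, 1]

    def update(self, empathy, exploration, w_empathy=0.1, w_exploration=0.25, decay=0.05):
        """Hypothetical per-turn update from therapist skill ratings in [0, 1].

        Skilled turns raise openness; a small decay pulls it back when skill is low.
        """
        delta = w_empathy * empathy + w_exploration * exploration - decay
        self.openness = min(1.0, max(0.0, self.openness + delta))
        return self.level()

    def level(self):
        """Map continuous openness onto the three disclosure levels (assumed thresholds)."""
        if self.openness < 0.34:
            return LEVELS[0]
        if self.openness < 0.67:
            return LEVELS[1]
        return LEVELS[2]


def build_prompt(level, therapist_turn):
    """Condition the patient LLM on the current level; the wording here is illustrative."""
    return f"You are a patient whose disclosure is {level}. Respond to: {therapist_turn}"


skilled = DisclosureState()
print([skilled.update(empathy=0.8, exploration=0.8) for _ in range(4)])
# ['guarded', 'moderate', 'full', 'full']
unskilled = DisclosureState()
print([unskilled.update(empathy=0.1, exploration=0.0) for _ in range(4)])
# ['guarded', 'guarded', 'guarded', 'guarded']
```

Keeping the state outside the LLM is what lets disclosure respond to skill rather than drift. A prompt-only baseline has no such state to move, which is consistent with the flat curve the abstract reports for it.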

Editor's pick · Technology
VentureBeat · 4 days ago

Apple’s new Siri AI is more than just a smarter assistant — it's a new enterprise app layer

Apple’s new Siri AI, unveiled yesterday at Apple's annual Worldwide Developers Conference (WWDC 2026), may look like a consumer product story on the surface. But for enterprise developers and IT leaders, the bigger news from WWDC26 is that Apple is turning Siri into a systemwide AI interface for apps, data and workplace actions across iPhone, iPad, Mac, Apple Watch and Vision Pro, as revealed in the WWDC26 Apple Intelligence developer guide. In other words, if your company offers an application on Apple devices, whether it's served on iOS mobile device or Mac, the new Siri AI may force you to change how that application is discovered, served, and its contents and workflows made available to end users. Enterprise developers can expose app content through App Entities, make it available to Apple’s Spotlight semantic index, define actions through App Intents and App Schemas, and map onscreen user interface elements to app objects through View Annotations. That makes Siri AI much more than a voice assistant. Apple is positioning it as an AI-powered app action and content-discovery layer built into its operating systems. Siri becomes an app action layer For enterprise developers, the shift could be significant. A business app that properly adopts Apple’s new frameworks could let users ask Siri to find, summarize, update or act on app content without the developer having to build a separate chatbot interface. Apple says App Intents, its existing framework for exposing app actions to system features like Siri and Shortcuts, is the path for connecting apps to Apple Intelligence and Siri AI, while schemas make app content and actions usable through natural language. In practical terms, that could apply to customer records in a CRM, open tickets in an IT service desk, project tasks, invoices, calendar events, documents, expenses, notes, messages or field-service records. 
Instead of opening an app, searching manually and clicking through menus, an employee could ask Siri to act on the specific object they are viewing or retrieve a related item from another app.

Spotlight becomes the enterprise search hook

Apple says in its WWDC26 Apple Intelligence guide that entity schemas contribute app content to the Spotlight semantic index, while intent schemas let users take action on that indexed content without developers defining a rigid list of command phrases. Apple also says the new View Annotations API lets developers map views to entities so users can refer to what is onscreen conversationally, for example “summarize this customer thread,” “add this invoice to my expenses,” or “follow up on this task tomorrow.”

That is an important distinction from earlier voice-assistant integrations, which often required narrow command structures and explicit invocation phrases. Apple is instead giving developers a way to describe an app’s data and capabilities so Siri, Spotlight and Shortcuts can use them through the system.

Developers get testing tools for Siri and app actions

Apple is also adding AppIntentsTesting, a framework that validates App Intents through the same infrastructure used by Siri, Shortcuts and Spotlight without requiring UI automation. That matters for enterprise software teams because natural-language app actions need to be testable, repeatable and reliable before they are trusted in production workflows. It also gives developers a path to include Siri and Spotlight behavior in ordinary testing pipelines instead of treating assistant integration as a manual demo feature.

The result is a clearer developer mandate: if an app wants to show up well inside Siri AI, it will likely need to expose its data, actions and onscreen context through Apple’s system frameworks.
For enterprise SaaS vendors, that could become an important part of Apple-platform competitiveness, especially in categories such as productivity, collaboration, CRM, project management, finance, design, knowledge management, healthcare, logistics and field operations.

Apple expands its model stack for developers

Apple is also using WWDC26 to expand its AI developer stack beyond Siri. The updated Foundation Models framework gives Swift developers access to Apple’s on-device models, to Apple models running through Private Cloud Compute and to third-party model providers that conform to Apple’s Language Model protocol. That gives developers more flexibility than a single Apple-only model path.

Apple says in its Apple Intelligence developer guide that the framework now supports multimodal prompts, Vision tools, dynamic model profiles and evaluations. In theory, an enterprise app could use an Apple on-device model for private or lightweight tasks, call Apple’s Private Cloud Compute for heavier reasoning, or plug in an outside provider such as Claude, Gemini, an open-source model or a company-controlled model through Apple’s model-provider interface.

Core AI brings custom models onto Apple silicon

Apple is also introducing Core AI, an operating system-level framework for running developers’ own models on Apple silicon. For enterprises that do not want sensitive data sent to a cloud model at all, local inference remains one of Apple’s most important advantages. Core AI gives developers a first-party way to deploy custom models with Swift APIs, memory controls and optimized execution on Apple hardware.

Evaluations signal a more mature enterprise AI posture

The company’s new Evaluations framework also points to a more mature enterprise AI posture. AI features are difficult to test with conventional unit tests because model outputs can vary. Apple says the framework helps developers define metrics, automatically grade outputs and aggregate statistics.
For enterprise buyers, that matters because AI features need measurable reliability, not just impressive demos.

Apple is also explicitly addressing the security risks of app agents. WWDC26 developer materials include a session on how developers can mitigate risks to agentic features, covering indirect prompt injection, data exfiltration, unintended actions, threat modeling, user confirmations, authentication and safeguards for App Intents and Foundation Models. That is a notable acknowledgement that AI assistants able to read context and take action across apps create new attack surfaces.

Enterprise IT gets new Apple Intelligence controls

For enterprise IT, Apple also answered some of the governance questions raised by Siri AI’s initial announcement. Its WWDC26 device management documentation describes new management controls for Apple Intelligence, Siri and external intelligence integrations. Supervised devices can use Apple’s intelligence settings configuration to allow or deny features such as Genmoji, Image Playground, Writing Tools, Image Wand, app-specific intelligence in Mail, Notes and Safari, Apple Intelligence Report, Visual Intelligence Summary and on-device-only processing for dictation and translation.

Apple says additional management for Siri AI and Visual Intelligence will arrive in later beta releases. That means enterprise controls are not complete yet, but Apple is clearly building Siri AI into its managed-device architecture rather than treating it as an unmanaged consumer feature.

Apple also adds controls for outside AI services

Apple is also adding controls for external intelligence services. Its deployment docs describe a configuration for managing external intelligence integrations, including whether users can access outside AI services and whether they can sign in to those services. That will matter for organizations trying to control when employees use Apple’s own models, Apple’s private cloud architecture or third-party AI systems.
Those controls could help Apple compete with Microsoft and Google in enterprise AI, but with a different pitch. Microsoft Copilot and Google Gemini are tied deeply to their respective productivity clouds. Apple’s strategy is more device- and OS-centered: make AI available where the user already works, expose app actions through system frameworks and emphasize on-device processing and Private Cloud Compute as privacy advantages.

Apple’s privacy pitch remains central

Apple’s privacy architecture remains central to that pitch. Siri AI uses Apple Foundation Models on device and through Private Cloud Compute. Apple says in its Siri AI announcement that requests handled by Private Cloud Compute do not store personal data or make it accessible to Apple. For industries such as healthcare, financial services, legal, education and government, that claim may be more important than any single assistant feature.

But enterprises will still need more detail before treating Siri AI as a fully governed workplace assistant. Apple’s WWDC26 materials show progress on management controls, external AI restrictions and app-level governance, but the full picture is still emerging. Key questions remain around auditability, retention, work-versus-personal data boundaries, role-based access, compliance certifications, and how much control IT departments will have over Siri’s ability to act inside specific business apps.

Availability limits could complicate rollout

Availability also complicates enterprise rollout. Siri AI is in developer testing now for iOS 27, iPadOS 27, macOS 27 and visionOS 27, with watchOS support coming in a later beta. Apple says the user-facing beta arrives later this year. The feature requires Apple Intelligence-capable hardware, which means many older corporate devices will not support it.
Apple also says Siri AI will not initially be available on iPhone and iPad in the European Union, and that Siri AI and other new Apple Intelligence features are not available in China while the company works through regulatory requirements. That means global enterprises may face fragmented deployment, with different feature availability by hardware, operating system, language and region.

App Store changes give business software vendors another opening

Apple also introduced enterprise-adjacent App Store changes that could matter for business software vendors. StoreKit 2 will support subscriptions for groups and organizations, including volume purchasing through Apple Business and Apple School Manager. IT teams will be able to buy and assign App Store subscriptions through device management workflows, while developers will be able to manage subscription availability for organizations. That gives Apple a more business-friendly path for selling app subscriptions into managed environments.

The company is also unifying Apple Business Manager, Apple Business Essentials and Apple Business Connect under Apple Business, which Apple describes as a broader platform for Managed Apple Accounts, device management, volume licensing, Admin APIs, Apple Maps locations, Tap to Pay on iPhone, Branded Mail and multi-seat subscriptions.

Apple’s enterprise AI strategy comes into focus

Taken together, the WWDC26 enterprise story is bigger than Siri alone. Apple is building an AI stack that spans user-facing assistant features, developer integration frameworks, local and private-cloud model infrastructure, AI testing, App Store business subscriptions and device-management controls.

The strategic question is whether Apple can make this more than another Siri reset. Developers will need to adopt Apple’s app-intelligence frameworks. Enterprises will need stronger governance assurances. Users will need the assistant to work reliably across real workflows, not just Apple’s own apps.
But the direction is now much clearer. Apple is not trying to compete in enterprise AI by launching a standalone chatbot. It is embedding AI into the operating system, making apps addressable through Siri and Spotlight, giving developers model and testing tools, and giving IT teams at least the beginnings of policy controls.

For enterprise developers, that means App Intents, App Schemas, App Entities, Spotlight indexing and View Annotations may become core parts of building competitive Apple-platform apps. For enterprise technology leaders, it means Apple’s devices could soon include a native AI assistant that can act across business workflows, if Apple can prove that the privacy, security and management model is strong enough for production use.

AI ROI & Business Case (5 articles)
Editor's pick · Professional Services
Arxiv· 3 days ago

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

arXiv:2606.10209v1. Abstract: Large language models deployed as autonomous agents for enterprise workflows face a key challenge: verbose tool responses from enterprise systems can cause context overflow, stale-state errors, and high inference cost. We study this problem in automated expense itemization in Microsoft Dynamics 365 Finance and Operations using Model Context Protocol tools. We evaluate four GPT-5 configurations on a 50-task hotel expense benchmark: no user model, full conversation history, context pruned to the last 5 tool call/response pairs, and pruning with automated summarization. Results are averaged across 5 independent runs, with the user model held constant for the context-engineering comparison. The no-user-model baseline achieves only 8.0% complete itemization. Full-context retention improves completion to 71.0%, but consumes 1,480,996 tokens and 14.56 hours per benchmark. Pruning to the last 5 tool calls improves completion to 79.0% while reducing token use to 535,274 and runtime to 5.39 hours. Adding summarization achieves the best result: 91.6% complete itemization and 99.64% average amount itemized, with 553,374 tokens and 5.79 hours. We further report confidence intervals, effect-size analysis, sensitivity over pruning and summary windows, failure analysis, results across five expense types grouped into three categories, and cross-model evidence with Claude Sonnet 4.5. These results show that, for this class of enterprise tool-use workflow, selective retention of recent tool interactions plus compact summarization can improve both reliability and efficiency compared with full-history retention.
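The pruning-plus-summarization strategy in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ToolTurn` and `AgentContext` types, the prompt layout in `render_prompt`, and `naive_summarize` (a string-truncating stand-in for whatever automated summarizer the paper used) are all hypothetical. Only the window of 5 tool call/response pairs comes from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToolTurn:
    """One tool call and its response (hypothetical message shape)."""
    call: str
    response: str


@dataclass
class AgentContext:
    system_prompt: str
    history: List[ToolTurn] = field(default_factory=list)
    summary: str = ""  # running summary of evicted tool turns


def naive_summarize(previous: str, evicted: List[ToolTurn], max_chars: int = 200) -> str:
    """Hypothetical stand-in for an automated summarizer: appends each evicted
    call with a 40-character preview of its response, keeping the newest max_chars."""
    notes = [f"{t.call} -> {t.response[:40]}" for t in evicted]
    combined = "; ".join(s for s in [previous, *notes] if s)
    return combined[-max_chars:]


def prune_context(
    ctx: AgentContext,
    window: int = 5,
    summarize: Optional[Callable[[str, List[ToolTurn]], str]] = None,
) -> AgentContext:
    """Keep only the last `window` tool call/response pairs. If a summarizer is
    given, fold evicted pairs into the running summary instead of dropping them."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(ctx.history) <= window:
        return ctx
    evicted, kept = ctx.history[:-window], ctx.history[-window:]
    summary = summarize(ctx.summary, evicted) if summarize else ctx.summary
    return AgentContext(ctx.system_prompt, kept, summary)


def render_prompt(ctx: AgentContext) -> str:
    """Assemble the text the model would see on its next turn."""
    parts = [ctx.system_prompt]
    if ctx.summary:
        parts.append(f"Summary of earlier tool calls: {ctx.summary}")
    parts.extend(f"CALL {t.call}\nRESULT {t.response}" for t in ctx.history)
    return "\n\n".join(parts)


# Simulated run: eight verbose tool responses, pruned after each call.
ctx = AgentContext("You itemize hotel expenses.")
for i in range(8):
    ctx.history.append(ToolTurn(f"get_line_item({i})", "x" * 500))
    ctx = prune_context(ctx, window=5, summarize=naive_summarize)

print(len(ctx.history))     # 5
print(ctx.history[0].call)  # get_line_item(3)
```

Keeping the most recent turns verbatim is intended to preserve current state, while the summary keeps a compressed trace of earlier work instead of discarding it outright. By the abstract's own figures, the summarization configuration used about 63% fewer tokens than full-history retention (553,374 vs 1,480,996) while raising complete itemization from 71.0% to 91.6%.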

Editor's pick · Healthcare
Arxiv· 3 days ago

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

arXiv:2606.10279v1. Abstract: Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year Alzheimer's disease and related dementias (ADRD) prediction from longitudinal health histories. Across a large-scale controlled experiment of 504 configurations, we find that rationale-based SFT consistently and substantially hurts prediction performance relative to label-only fine-tuning. The degradation persists across model families and data scales, and is not resolved by using a reasoning-oriented base model. Crucially, the failure is not explained by poor rationale quality: human expert annotation confirms that the generated rationales are medically accurate and faithfully grounded in patient-specific evidence, and few-shot experiments show that the same rationales improve performance when used as inference-time demonstrations rather than training targets. We identify the root cause as a structural conflict between narrative plausibility and discriminative optimization. We hope our work paves the path toward a more precise understanding of when and how rationale-based supervision helps and when it does not, guiding the responsible development of language models for high-stakes clinical prediction.

Editor's pick · Technology
StrongMocha· 4 days ago

The Earnings Call Gap: What Q1 2026 Just Told Us About AI ROI

Analysis of Q1 2026 earnings shows a widening gap between AI investment claims and measurable results, impacting stock reactions and investor confidence.

Geopolitics, Policy & Governance

17 articles
AI Policy & Regulation (13 articles)
Editor's pick · Paywall · Defense & National Security
FT· 4 days ago

Pentagon restores Alibaba, Baidu and BYD to Chinese military groups blacklist

Three companies reinstated as US national security risk after sudden removal in February

Editor's pick · Paywall · Government & Public Sector
FT· 4 days ago

Hungary deploys AI to track alleged Orbán corruption

Head of the country’s anti-graft body says €160bn may have been siphoned off during ex-PM’s 16 years in power

Editor's pick · Technology
Siliconrepublic· 4 days ago

Apple’s Siri AI won’t be available in the EU at launch

Enforcement of Europe’s Digital Markets Act means Apple can’t launch the system safely within the EU, the company said.

Editor's pick · Government & Public Sector
🍎 Smarter Siri, sort of· 5 days ago

Scoop: White House and Hill talk state laws

The White House is negotiating a federal preemption of some state AI laws in exchange for support on key tech policy priorities, including legislation to protect kids online.

Editor's pick · Technology
Artificial Intelligence Newsletter | June 10, 2026· 4 days ago

Acute risk to AI market triggered need for Meta action, EU's Ribera says

EU antitrust enforcer Teresa Ribera stated that the rapid development of AI services necessitated an interim injunction against Meta to ensure fair access to distribution channels.

Editor's pick
Artificial Intelligence Newsletter | June 10, 2026· 4 days ago

OpenAI wins in Indian ruling over forced search indexing

The Calcutta High Court dismissed IndiaMart's claims against OpenAI, ruling that businesses cannot legally compel private AI platforms to index or promote their commercial links.

Editor's pick · Paywall · Defense & National Security
Bloomberg· 4 days ago

Leonardo’s New CEO Sees Green Light for Space Venture Next Year

Leonardo SpA’s new chief executive officer expects European regulatory approval for its satellite venture with Airbus SE and Thales SA by the end of next year, clearing the way for one of Europe’s most ambitious defense and space projects.

Editor's pick · Government & Public Sector
KGOU· 4 days ago

Oklahoma Ethics Commission, political leaders weigh future of AI-generated ads

In response to concerns from candidates, elected officials and the public, Oklahoma's Ethics Commission began exploring the regulation of AI-generated campaign ads at a special meeting Friday.

Editor's pick · Government & Public Sector
Daily Brew· 4 days ago

UK Pushes Device-Level Protections to Shield Kids from Online Harm

The UK government plans to enforce device-level protections to shield children from online abuse, shifting responsibility onto tech platforms.

Editor's pick · Technology
Top Daily Headlines: Signal says UK plan to scan devices for nude images 'endangers us all'· 3 days ago

Signal says UK plan to scan devices for nude images 'endangers us all'

Encrypted messaging app warns device-level checks could be repurposed for censorship.

Editor's pick
Artificial Intelligence Newsletter | June 10, 2026· 4 days ago

France, Germany set to unveil joint ‘European digital service’ criteria

France and Germany will unveil a joint definition of a 'European digital service' at the VivaTech conference to help shape future EU digital sovereignty rules.

Editor's pick · Technology
Guardian· 4 days ago

Crackdown on tech platforms will go ahead despite US intervention, says No 10

US embassy came out against UK’s proposed under-16 social media ban, which would affect American firms.

White House displeasure over the prospect of an under-16 social media ban will not deter the UK from cracking down on tech platforms, the British government has said. The technology secretary, Liz Kendall, told the Guardian she was not concerned “in the slightest” by the Trump administration’s intervention in the debate over restrictions, after the US embassy in London posted a notice warning against a ban.

Editor's pick · Technology
Artificial Intelligence Newsletter | June 10, 2026· 4 days ago

Snapchat faces Dutch class action over online safety, data concerns

The Foundation for Market Information Research has filed a mass lawsuit in the Netherlands against Snapchat, alleging breaches of EU online safety, AI, and data protection regulations.

© 2026 Best Practice AI Ltd. All rights reserved.
