Tue 30 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
The BIS warns of bubbles, Korea subsidizes chips, and Ford rehires
TL;DR: The Bank for International Settlements warned that a $1 trillion AI investment boom risks a dotcom-style correction. South Korea launched a $576 billion chip initiative, while Taiwan raided Super Micro over alleged illegal shipments of Nvidia hardware to China. Meanwhile, Ford has begun rehiring human engineers after AI-driven quality checks failed to match the skill of veteran technicians. Meta reported that a custom CXL ASIC, which lets it reuse memory from old servers, has cut the number of machines needed for some inference workloads by 25%.
The stories that matter most
Selected and contextualised by the Best Practice AI team
The central bank of central banks just released its flagship annual report — and it sees a $1 trillion AI investment boom headed for a reckoning | Fortune
The Bank for International Settlements draws an explicit parallel to the dotcom crash and railway mania, but the stakes are bigger now.
AI Premium
arXiv:2606.30583v1 Announce Type: cross Abstract: Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset, covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging data of unprecedented size, scope, and granularity, we construct the AI Factor from growth in tokens, dollars, and users, estimate firm-level AI Betas from stock return comovement, and characterize the AI Premium. First, we build a high-frequency AI factor and decompose it into salient components. Second, we show that firms whose returns covary more positively with the AI factor (high AI beta firms) earn higher subsequent returns, and that the AI premium is large and heterogeneous. A value-weighted long-short strategy earns 64.1 basis points per week, and the premium is large for loadings on the intensive, frontier-oriented margin of AI consumption (closed-source models, paying and seasoned users, and long prompts) but not on casual or open-weight use. Third, the premium reaches beyond technology firms into consumer-facing and capital-heavy parts of the economy, but is absent in emerging markets, including China. Fourth, AI exposure is more positive in nonroutine interactive work and more negative in analytical, scientific, and operations-control skills: an occupation one standard deviation higher in interaction-and-communication content has a 0.36-standard-deviation higher market-implied AI premium. Additionally, we provide early evidence of the rise of the agentic economy.
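The abstract's sorting logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: betas come from a simple one-factor OLS regression, portfolios are equal-weighted top and bottom groups of a hypothetical size n (the paper's strategy is value-weighted), and the function names are illustrative.

```python
# Illustrative sketch only: estimate single-factor "AI betas" by OLS and form a
# long-short portfolio of high- minus low-beta firms. Equal weights and the
# group size n are assumptions; the paper's strategy is value-weighted.
from statistics import mean


def ols_beta(returns: list[float], factor: list[float]) -> float:
    """Slope from regressing one firm's returns on the AI factor (with intercept)."""
    f_bar, r_bar = mean(factor), mean(returns)
    cov = sum((f - f_bar) * (r - r_bar) for f, r in zip(factor, returns))
    var = sum((f - f_bar) ** 2 for f in factor)
    return cov / var


def long_short_return(betas: dict[str, float],
                      next_returns: dict[str, float], n: int) -> float:
    """Mean next-period return of the n highest-beta firms minus the n lowest."""
    ranked = sorted(betas, key=lambda firm: betas[firm])
    low, high = ranked[:n], ranked[-n:]
    return mean(next_returns[f] for f in high) - mean(next_returns[f] for f in low)


def compound_weekly(weekly_return: float, weeks: int = 52) -> float:
    """Compound a constant weekly return over `weeks` weeks."""
    return (1 + weekly_return) ** weeks - 1


if __name__ == "__main__":
    # Toy data, not from the paper.
    factor = [0.02, -0.01, 0.03, 0.00]
    print(ols_beta([0.05, -0.01, 0.07, 0.01], factor))
    # 64.1 basis points per week, compounded over 52 weeks (before costs).
    print(round(compound_weekly(0.00641), 3))
```

Compounding 64.1 basis points a week over 52 weeks gives roughly 39% a year before trading costs; the paper itself reports only the weekly figure.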
Korea taps Samsung, SK Hynix in $576 billion AI-chip drive to cement global leadership
Taiwan Raids Super Micro in Widening China Chip Smuggling Probe
Taiwan government agencies raided the offices of Super Micro Computer Inc. and several of its local affiliates, deepening an investigation into the alleged smuggling of Nvidia Corp. chips into China using the company’s servers.
Managing the Human Fallback: Skill Investment Under Improving AI and Worker Mobility
arXiv:2606.29111v1 Announce Type: cross Abstract: When firms deploy autonomous AI, they must decide how much work to leave to the system and how much to keep workers engaged. This decision affects current output and future human capital. We develop a parsimonious two-period model in which AI may outperform the worker when it functions, but may fail with positive probability. A firm chooses worker engagement; engagement lowers current output for below-benchmark workers, but changes future skill through learning and erosion. We distinguish two dimensions of AI progress: capability, the system's output when it works, and reliability, the probability that it works. In a single-firm benchmark, engagement is valuable only as fallback investment. The firm engages the least-skilled workers most, because they have the largest skill gaps and are least costly to bring toward a useful fallback level. With worker mobility, engagement also affects labor-market sorting: workers prefer jobs that build more valuable skill trajectories. This sorting motive targets higher-skill workers near the AI frontier, where skill gains are more valuable and engagement is less costly. Mobility can therefore reverse the engagement pattern, shifting investment from the least-skilled toward the most-skilled workers below the AI benchmark. Mobility also reshapes how AI progress affects engagement: greater capability raises engagement by increasing the value of the skill trajectory a firm offers, whereas greater reliability can raise or lower it because it reduces fallback need while also changing learning opportunities. Under worker mobility, human-AI work design becomes a problem of human-capital investment, in which allocating work today shapes future skill.
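A toy calculation helps show why reliability cuts the value of the human fallback. The expected-output formula below is an assumption chosen for illustration, not the paper's two-period model: the AI delivers its capability with probability equal to its reliability, and otherwise output falls back to the engaged worker's skill.

```python
# Toy illustration, not the paper's model: with reliability r the AI delivers
# its capability A; otherwise output falls back to the engaged worker's skill s.


def expected_output(capability: float, reliability: float, worker_skill: float) -> float:
    """E[output] = r * A + (1 - r) * s."""
    return reliability * capability + (1 - reliability) * worker_skill


def value_of_skill_gain(reliability: float, skill_gain: float) -> float:
    """Extra expected output from raising fallback skill by `skill_gain`.

    Skill only matters in the failure state, so the payoff scales with 1 - r.
    """
    return (1 - reliability) * skill_gain


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(r, value_of_skill_gain(r, skill_gain=1.0))
```

In this toy, capability does not change the payoff to worker skill at all, which mirrors the abstract's point that in a single-firm benchmark engagement is valuable only as fallback investment; the paper's capability channel runs through labor-market sorting, which the sketch leaves out.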
Heavy corporate AI spenders add staff faster than peers
Study of 22,000 US companies challenges fears that generative AI will trigger broad job losses
Ford rehires human engineers after AI fails to match quality checks
The car-maker found AI quality checks failed to match the skill of veteran technicians.
Meituan open-sources LongCat-2.0, the 1.6T-parameter, near-frontier agentic coding model that's been leading OpenRouter, trained entirely on Chinese chips
A few hours ago, Chinese delivery app company Meituan officially unveiled LongCat-2.0 on GitHub, Hugging Face, and its native platform, unmasking the model as the computational engine behind "Owl Alpha," the anonymous stealth model that has spent the last two months at the top of global developer charts on OpenRouter. Built to challenge closed-source dominance in enterprise autonomous software engineering, the 1.6-trillion-parameter Mixture-of-Experts (MoE) system offers a native 1-million-token context window and is released under the permissive, commercially usable MIT license. Commercial access comes with aggressive pricing: all context-cache hits are processed free of charge, alongside time-limited "Token Pack" flash sales. There is also a conventional pay-as-you-go API that bills non-cached input and output at a standard $0.75/$2.95 per million tokens. A limited-time promotional discount cuts these rates to $0.30 per million tokens for uncached input and $1.20 per million tokens for output, placing the model at the cheaper end of top-performing models globally.
Per-million-token API prices for comparable models:

Model | Input ($/1M tokens) | Output ($/1M tokens) | Input + output ($/1M tokens) | Provider
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi
deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek
deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek
MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax
LongCat-2.0 (limited-time promo) | $0.30 | $1.20 | $1.50 | LongCat
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google
Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi
LongCat-2.0 (standard) | $0.75 | $2.95 | $3.70 | LongCat
Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | xAI
MiMo-V2.5 Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi
Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot AI
GLM-5.2 | $1.40 | $4.40 | $5.80 | Z.ai
GPT-5.6 Luna | $1.00 | $6.00 | $7.00 | OpenAI
Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 | xAI
MiMo-V2.5 Pro (>256K) | $2.00 | $6.00 | $8.00 | Xiaomi
Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google
Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 | Google
GPT-5.6 Terra | $2.50 | $15.00 | $17.50 | OpenAI
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI
Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 | Google
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic
GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI
GPT-5.5 Instant (chat-latest) | $5.00 | $30.00 | $35.00 | OpenAI
Sakana Fugu Ultra (≤272K) | $5.00 | $30.00 | $35.00 | Sakana AI
GPT-5.6 Sol | $5.00 | $30.00 | $35.00 | OpenAI
Claude Fable 5 / Claude Mythos 5 | $10.00 | $50.00 | $60.00 | Anthropic

What makes the release an inflection point for global tech infrastructure is its operational independence: the model was trained entirely on a cluster of over 50,000 domestic Chinese application-specific integrated circuits (ASICs), showing that near-frontier AI models can be scaled without the U.S. Nvidia GPUs that have, to date, powered much of the world's frontier generative AI training. This successful deployment of alternative silicon signals a profound structural shift.
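The table's total column simply adds input and output prices, which overstates output costs for agent workloads that read far more tokens than they write. A minimal sketch of a blended price, assuming a hypothetical 75% input share and using four rows from the table:

```python
# Blend per-million-token prices for a given input/output mix. Prices are taken
# from the table; the 75% input share is an assumed, input-heavy agent workload.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "LongCat-2.0 (promo)": (0.30, 1.20),
    "LongCat-2.0 (standard)": (0.75, 2.95),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """Dollars per 1M tokens when `input_share` of all tokens are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1 - input_share) * output_price


if __name__ == "__main__":
    ranked = sorted(PRICES.items(), key=lambda kv: blended_price(*kv[1], 0.75))
    for model, (inp, out) in ranked:
        print(f"{model:<24} ${blended_price(inp, out, 0.75):.3f} per 1M tokens")
```

At a 50/50 mix the blended price is exactly half of the table's total, but the two measures can rank models differently: GPT-5.6 Luna costs more than Grok 4.3 (low context) by total, yet less per input token ($1.00 against $1.25).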
If Chinese conglomerates can consistently iterate trillion-parameter architectures on homegrown ASICs rather than general-purpose GPUs, Nvidia's dominance in this sector could come under threat. Crucially, this technological pivot arrives just as Washington pressures top-tier American labs to restrict access to their latest models. Following a U.S. government request, OpenAI was forced to limit access to its new GPT-5.6 models, while Anthropic was previously ordered by the U.S. to restrict access to its latest Claude Fable 5 / Mythos 5 models, which it took entirely offline in response. At the same time, a growing chorus of technologists, activists, and industry experts warns that these defensive regulatory maneuvers have inadvertently backfired. By locking down Western closed-source models and driving up API costs, the U.S. government has left a wide opening for developers worldwide seeking affordable, high-performance alternatives such as Chinese open-source models like Meituan's LongCat-2.0. The usage metrics back up the developer enthusiasm: during its unbranded residency on OpenRouter, Owl Alpha accounted for approximately 10.1 trillion monthly tokens, averaging 559 billion tokens per day, a 242% month-over-month increase in volume that propelled it into the platform's global top three. By the time Meituan stepped forward to claim the model, it had already secured the top ranking on the Hermes Agent workspace, second place on Claude Code deployments, and third place across international OpenClaw environments.

Technology: Engineering the 1M-Token Sparse Context

At the core of LongCat-2.0 lies aggressive Mixture-of-Experts (MoE) sparsity, scaling total parameters to 1.6 trillion while limiting active computation to an average of 48 billion parameters per token.
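The gap between 1.6 trillion total and roughly 48 billion active parameters per token (about 3%) comes from routing each token to only a few experts. The article attributes the varying per-token activation to "Zero-Compute Experts"; one way such a design can work, sketched here as an assumption rather than Meituan's implementation, is to let some routed experts be identity functions that cost no compute, so tokens routed to them touch fewer parameters.

```python
# Toy top-k Mixture-of-Experts routing in which some experts are identity
# ("zero-compute") experts. An assumption-level sketch of how per-token active
# parameters can vary; it is not Meituan's implementation.
import math
from typing import Callable

Expert = Callable[[float], float]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(scores: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and renormalise their gate weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return list(zip(top, softmax([scores[i] for i in top])))


def moe_forward(x: float, scores: list[float], experts: list[Expert], k: int) -> float:
    """Gate-weighted sum of the selected experts' outputs for one token."""
    return sum(gate * experts[i](x) for i, gate in route(scores, k))


def active_params(scores: list[float], k: int, zero_compute: set[int],
                  params_per_expert: int, shared_params: int) -> int:
    """Parameters touched by one token: identity experts contribute none."""
    real = sum(1 for i, _ in route(scores, k) if i not in zero_compute)
    return shared_params + real * params_per_expert


if __name__ == "__main__":
    experts: list[Expert] = [lambda x: 2 * x, lambda x: x, lambda x: x + 1, lambda x: -x]
    zero_compute = {1}  # expert 1 is the identity, so it costs no FLOPs
    easy, hard = [3.0, 2.0, 1.0, 0.5], [0.1, 0.2, 4.0, 3.0]
    print(active_params(easy, 2, zero_compute, 10, 5))  # 15: one real expert
    print(active_params(hard, 2, zero_compute, 10, 5))  # 25: two real experts
```

Tokens routed partly to identity experts use fewer parameters than tokens routed entirely to real experts, which is one way to obtain a per-token activation range instead of a fixed count.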
Depending on the structural complexity of a query, the model's dynamic activation ranges from 33 billion to 56 billion parameters. This design implements a "Zero-Compute Experts" framework, in which routine elements pass through lighter subnetworks, avoiding the idle computational overhead that penalizes dense models. To sustain a functional 1-million-token context window without severe hardware bottlenecks, Meituan introduced LongCat Sparse Attention (LSA). Designed as an evolution of DeepSeek Sparse Attention, LSA tackles the quadratic scoring costs and memory fragmentation that typically plague fine-grained sparse attention through three orthogonal techniques:

Streaming-aware Indexing (SI): This restructures the token-selection pipeline by blending hardware-aligned contiguous reads with dynamic random selection. By converting fragmented memory access into predictable, sequential blocks, it achieves coalesced High Bandwidth Memory (HBM) access and higher effective bandwidth.
Cross-Layer Indexing (CLI): Because attention saliency remains highly stable across adjacent layers, a single indexing pass can guide multiple consecutive layers during inference, amortizing the indexing cost. Cross-layer distillation during training reinforces this capability.
Hierarchical Indexing (HI): This applies a coarse-to-fine, two-stage scoring layout. The indexer performs a fast, approximate block-level recall to filter candidates, then runs fine-grained token selection only on the remaining candidates.

Furthermore, Meituan integrated an N-gram Embedding module inherited from its lighter model lines.
By allocating additional parameters along a sparse dimension orthogonal to the MoE expert layout, the architecture dedicates 135 billion parameters to embeddings of 5-gram token combinations. This expands the core embedding space roughly 100-fold, letting the model capture dense local token relationships and accelerating large-batch inference by reducing memory input/output (I/O) bottlenecks.

Product: Post-Training, MOPD Framework and Benchmark Performance

While generalist large language models prioritize fluid, conversational interfaces, LongCat-2.0 focuses on multi-step engineering tasks, tool integration, and automated repository manipulation: agentic tasks, in other words. In standardized assessments, LongCat-2.0 scores 59.5 on SWE-bench Pro, surpassing GPT-5.5's 58.6. It also scores 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on FORTE, a general corporate workflow simulator. This behavior is achieved through a post-training layer called Multi-Teacher Optimization via Mixture of Specialized Experts (MOPD). Rather than blending raw human feedback into a single reward function, MOPD splits post-training optimization into three independent, focused expert clusters:

The Agent Experts are fine-tuned strictly for execution: precise tool invocation, multi-turn API parameter parsing, and self-correcting loops that avoid execution stalls.
The Reasoning Experts are optimized in isolation for multi-hop logic, complex chain-of-thought, mathematics, and high-level STEM problem-solving.
The Interaction Experts focus on human alignment, instruction-following nuances, factual grounding to suppress hallucinations, and safety guardrails that do not diminish the model's overall utility.
By keeping these capabilities separate during post-training, LongCat-2.0 avoids functional degradation. A dynamic gate-routing mechanism then fuses the specialized behaviors at runtime, allowing the final model to coordinate deep reasoning, stable tool execution, and safe user interaction simultaneously. While LongCat-2.0 generally trails premium frontier systems like Claude Opus 4.8 on broad general-agent benchmarks such as FORTE and BrowseComp, it punches above its weight in software engineering. What sets this open-weight model apart is its focus on autonomous development: it narrowly exceeds OpenAI's proprietary GPT-5.5 on SWE-bench Pro (59.5 against 58.6), showing it is highly competitive on complex coding tasks despite a leaner computational footprint.

Commercial Framework: Pay-As-You-Go vs. Flash-Sale Token Packs

Meituan's deployment strategy splits access between conventional real-time API billing and structured "Token Packs". For traditional enterprise integration, standard top-up accounts deduct credit in real time based on input and output token counts. To accommodate the unpredictable compute bursts of autonomous development agents, Meituan also offers Token Packs: fixed, one-time volume allocations valid for 30 days that stack on top of an organization's existing API account. To manage load across its ASIC clusters, Meituan releases these packs via limited flash sales four times daily, at 10:00, 16:00, 21:00, and 23:00 Beijing Time, on a first-come, first-served basis. The economic standout of this framework is the zero-charge processing of context cache hits.
In large agentic environments, where a coding assistant must repeatedly read, reference, and modify the same multi-million-token code repository over an extended session, standard pricing charges developers in full for repeated input context. Under Meituan's scheme, only cache-miss inputs and generated output tokens consume the package quota. This changes the cost economics of large-scale agentic software development, enabling deep, iterative context exploration without compounding costs.

Licensing: Open-Source Structural Freedom

By releasing the LongCat-2.0 repository under the MIT License, Meituan gives enterprises maximum legal flexibility for integration. In contrast to copyleft licenses like the GNU General Public License (GPL), which require that derivative works distributed to others be released under the same license, the MIT License is permissive: its main condition is that the copyright and permission notice be retained. For corporate engineering teams, this means LongCat-2.0 can be modified and embedded directly into closed-source commercial applications, proprietary developer tools, and internal automation backends. Companies can fork the repository, optimize the LSA mechanisms for private databases, and sell the resulting software to end users without any obligation to disclose their proprietary code or enhancements.

Meituan's Evolution: From Delivery Super App to AI Powerhouse

Founded in March 2010 by serial entrepreneur Wang Xing, Meituan launched as a Groupon-style daily deals website before evolving into one of China's dominant "super apps". Following a 2015 merger with Dianping, the Beijing-based tech giant secured a dominant share of the country's urban delivery market, bridging local consumer reviews, instant retail, hotel bookings, and food delivery.
Publicly traded on the Hong Kong Stock Exchange, Meituan claims over 770 million annual transacting users and a network of more than 14.5 million merchants. However, faced with intense domestic competition and severe margin compression, the company pivoted its strategy beyond logistics, publicly committing to invest "billions" in artificial intelligence and domestic chip capabilities to revitalize its technology-driven offerings. This shift into the global AI race began to materialize in late 2025 with the release of LongCat-Flash, a 560-billion-parameter Mixture-of-Experts foundation model, followed quickly by the reasoning model LongCat-Flash-Thinking. By open-sourcing these frontier-class models under enterprise-friendly licenses, Meituan signaled its ambition to become a foundational player in global AI infrastructure rather than remaining strictly a regional e-commerce and delivery giant.

Enterprise Implications: Autonomous Operational Workflows

For enterprises, the release of LongCat-2.0 opens clear operational strategies across software engineering, system operations, and long-form data interpretation. Because the model is open-weight, MIT-licensed, and offers a 1-million-token context window, organizations can self-host it and avoid the data privacy concerns and recurring fees associated with proprietary third-party APIs. In large-scale development environments, teams can use the model's specialized Agent Experts to orchestrate autonomous codebase migrations. Instead of dedicating hundreds of developer hours to manually rewriting legacy application frameworks, engineers can pass an enterprise repository, along with modern SDK documentation, directly into the 1-million-token context window.
LongCat-2.0 can map dependencies, carry out repository-level structural updates, compile the new codebase, and catch compilation and execution bugs autonomously within local sandbox environments before generating a final pull request. The model's architectural separation via the MOPD gate-routing mechanism also offers advantages for strict enterprise compliance. By routing specific queries through isolated expert clusters, a financial institution or healthcare firm can run deep logic and mathematical reasoning passes with less risk of factual hallucination or safety violations. The Interaction Experts act as an implicit guardrail layer, suppressing errors and enforcing instruction-following without degrading the raw capability of the Reasoning Experts. Combined with the zero-cost caching model, enterprises can run focused autonomous software agents that repeatedly inspect corporate data pools, continuously maintaining and optimizing internal infrastructure at a fraction of standard operational costs.
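How much the free-cache policy saves depends on how much context each turn re-reads. A rough sketch using the standard $0.75/$2.95 pay-as-you-go rates quoted earlier, with a hypothetical session of 100 turns that each resend 500,000 tokens of context at a 90% cache-hit rate:

```python
# Rough session-cost estimate under two billing rules. Rates are the standard
# pay-as-you-go prices quoted in the article; the session shape (turns, context
# size, cache-hit rate) is a hypothetical example.
INPUT_PRICE = 0.75   # $ per 1M uncached input tokens
OUTPUT_PRICE = 2.95  # $ per 1M output tokens


def session_cost(turns: int, context_tokens: int, output_tokens: int,
                 cache_hit_rate: float, cache_hits_free: bool) -> float:
    """Dollar cost when every turn re-sends `context_tokens` of input.

    If `cache_hits_free`, only the cache-miss share of input is billed;
    otherwise every input token is billed at the full rate.
    """
    billed_input = context_tokens * (1 - cache_hit_rate) if cache_hits_free else context_tokens
    per_turn = (billed_input * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000
    return turns * per_turn


if __name__ == "__main__":
    args = dict(turns=100, context_tokens=500_000, output_tokens=2_000, cache_hit_rate=0.9)
    print(f"all input billed: ${session_cost(**args, cache_hits_free=False):.2f}")
    print(f"cache hits free:  ${session_cost(**args, cache_hits_free=True):.2f}")
```

Under these assumptions the bill falls from about $38 to about $4, and the saving scales directly with the cache-hit rate.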
Financing Artificial Intelligence Infrastructure: Mapping AI Infrastructure Investment and Compute Governance Across Africa
arXiv:2606.28404v1 Announce Type: new Abstract: Artificial intelligence depends on large-scale compute resources and their supporting infrastructure. However, AI governance debates treat compute primarily as a technical input rather than as an outcome of investment, ownership, and financial control. This paper examines AI infrastructure investment flows across Africa through a systematic analysis of 46 publicly announced projects totalling USD 12.7 billion between 2019 and 2025. Using a value chain framework, we analyze who invests in AI-relevant infrastructure and where investments concentrate. Our findings reveal a highly concentrated landscape dominated by global data center operators, hyperscale technology firms, and development finance institutions, clustering in South Africa, Kenya, Nigeria, and Egypt. We introduce asymmetrical interdependence to describe a structural condition in which capital and physical infrastructure account for 73% of total funding while control remains concentrated in the compute layer among a small number of global technology firms. We argue that compute governance must account for capital flows, ownership, and control, not only geographic access, because these dynamics shape AI compute equity. Infrastructure presence is necessary but insufficient for meaningful governance capacity.
Zuck saves Meta bucks by reusing memory from old servers with a custom CXL ASIC
In production on millions of boxes and the payoff is a 25% reduction in machines needed for some inference workloads.
Measuring Racial Disparities in Rent Growth Under Algorithmic Landlord Concentration in U.S. Metros
arXiv:2606.27525v2 Announce Type: replace Abstract: The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts (the first such application in the literature), is associated with rent growth over 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.
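The "doubling" phrasing implies a regression on the base-2 logarithm of concentration, so the slope is the change in rent growth per doubling. A minimal one-regressor sketch with HC1 robust standard errors, the variance estimator the abstract names; unlike the paper, it includes no controls such as the AHBI, and the data in the example are made up.

```python
# One-regressor OLS with HC1 robust standard errors. Using log2(concentration)
# as the regressor makes the slope read as "change in rent growth per doubling".
# The paper's models also include controls (such as the AHBI); this sketch omits them.
import math
from statistics import mean


def ols_hc1(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, HC1 standard error of the slope) for y = a + b*x + e."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 sandwich variance for the slope, then the HC1 small-sample factor n/(n-k), k=2.
    hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return slope, math.sqrt(hc0 * n / (n - 2))


if __name__ == "__main__":
    concentration = [0.05, 0.10, 0.20, 0.40]  # hypothetical tract REIT shares
    rent_growth = [10.0, 11.0, 11.0, 13.0]    # hypothetical % growth, 2019-2023
    slope, se = ols_hc1([math.log2(c) for c in concentration], rent_growth)
    print(f"{slope:.2f} pp per doubling (HC1 s.e. {se:.2f})")
```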
Agent Safety Is Action Alignment
arXiv:2606.28739v1 Announce Type: new Abstract: Large language models increasingly act as agents: they call tools, move money, delete records, and send messages on a user's behalf. To keep them safe, practitioners imported the chatbot-era recipe (train the model to refuse unsafe inputs) into the agentic setting, and treat the resulting capability loss as a manageable "alignment tax." We argue this is a category error. Refusal is a primitive for content safety, where the harm is in the model's output and is therefore a learnable function of it. Agentic harm is different in kind: it lies not in any output but in the relation between the authority an action exercises and the authority the user granted, which is absent from the text the model sees. Importing content-safety methods into this regime does not trade capability for safety; it pays capability and buys negative security. We support this with three lines of evidence spanning the autonomy spectrum: defense-trained models learn surface patterns rather than intent; the same training collapses multi-step agents before any threat appears while leaving them exploitable; and even undefended frontier models exceed granted authority under ordinary use. We conclude that action safety cannot be installed in weights. It must be expressed as least privilege, enforced outside the model at the action boundary, and evaluated as action alignment (a relational, deployment-conditioned property) rather than a refusal score.
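The paper's prescription, least privilege enforced outside the model at the action boundary, can be illustrated with a small authorization layer. This is a hypothetical sketch, not the authors' system: the Grant and Action types, the scope strings, and the spending limit are illustrative assumptions.

```python
# Sketch of least privilege enforced outside the model at the action boundary:
# the model only proposes actions; a separate check compares the authority each
# action would exercise with what the user granted. Scope names are hypothetical.
from dataclasses import dataclass
from typing import Callable


class PermissionDenied(Exception):
    """Raised when a proposed action exceeds the user's grant."""


@dataclass(frozen=True)
class Grant:
    scopes: frozenset[str]
    max_amount: float = 0.0  # e.g. the most money the agent may move


@dataclass(frozen=True)
class Action:
    tool: str
    required_scopes: frozenset[str]
    amount: float = 0.0


def authorize(action: Action, grant: Grant) -> None:
    missing = action.required_scopes - grant.scopes
    if missing:
        raise PermissionDenied(f"{action.tool}: missing scopes {sorted(missing)}")
    if action.amount > grant.max_amount:
        raise PermissionDenied(f"{action.tool}: amount {action.amount} exceeds {grant.max_amount}")


def execute(action: Action, grant: Grant, tools: dict[str, Callable[[Action], str]]) -> str:
    # The check runs on every call, whatever the model's output said about intent.
    authorize(action, grant)
    return tools[action.tool](action)


if __name__ == "__main__":
    grant = Grant(scopes=frozenset({"email:send"}))
    tools = {"send_email": lambda a: "sent", "transfer": lambda a: "moved"}
    print(execute(Action("send_email", frozenset({"email:send"})), grant, tools))
    try:
        execute(Action("transfer", frozenset({"payments:write"}), 50.0), grant, tools)
    except PermissionDenied as err:
        print("blocked:", err)
```

Because the check depends only on the action and the grant, it behaves the same whether or not the model was fooled, which is the relational property the abstract calls action alignment.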
Economics & Markets
LSEG CEO on AI Market Impact, Share Buybacks, Volatility
David Schwimmer, CEO of the London Stock Exchange Group (LSEG), discussed the impact of artificial intelligence on financial markets. He said that despite a challenging 2025 for LSEG's shares, AI presents a significant opportunity for the group by increasing the value of its proprietary data, which makes up over 90% of its dataset. He speaks with Romaine Bostick and Katie Greifeld on "The Close." (Source: Bloomberg)
Anthropic's Mythos Reveals a Business Model Shift That Should Terrify OpenAI - FourWeekMBA
The Government Contract Is the New Enterprise SaaS. When the Trump administration announced it would deploy Anthropic's Mythos across more than 100 US companies and federal agencies, most coverage treated it as a procurement story. It isn't. It's a business model story, and it may be ...
Clients Push Consultants to Adopt Outcome-Based Fees to Share in Risk - Business Insider
Consulting giants like BCG and Accenture are shifting from fixed rates to outcome-based fees as clients push them to share the risk of AI integration.
WPP, Publicis, Omnicom, Havas, Dentsu: Why every advertising holding company is rebuilding itself - Storyboard18
Its recent financial updates continue to emphasise AI, connected media, commerce and digital transformation as core growth drivers rather than traditional advertising services. ... Havas has chosen evolution over large-scale restructuring. Rather than undertaking major organisational ...
Vibe coding platform Base44 launches own model
Base44 is launching its own proprietary model as AI startups increasingly focus on building defensibility.
Suno launches Spark incubator program to feed independent artists to its AI machine
Suno has introduced an incubator program designed to integrate independent artists into its AI-driven music generation platform.
Tidal won't pay royalties on AI-generated music but isn't banning it outright
Tidal has updated its policy to exclude AI-generated tracks from royalty payments while stopping short of a total ban.
AI Premium
arXiv:2606.30583v1 Announce Type: cross Abstract: Using 380 trillion tokens of realized AI consumption across more than four hundred large language models from the licensed proprietary OpenRouter dataset covering approximately 2 percent of current global monthly AI token consumption, we analyze how AI affects firms, markets, and workers. Leveraging the unprecedented size, scope and granularity data, we construct the AI Factor from growth in tokens, dollars, and users, estimate firm-level AI Betas from stock return comovement, and characterize the AI Premium. First, we build a high-frequency AI factor and decompose it into salient components. Second, we show that firms whose returns covary more positively with the AI factor--high AI beta firms--earn higher subsequent returns, and the AI premium is large and heterogeneous. A value-weighted long-short strategy earns 64.1 basis points per week, and the premium is large for loadings on the intensive, frontier-oriented margin of AI consumption-closed-source models, paying and seasoned users, and long prompts--but not on casual or open-weight use. Third, the premium reaches beyond technology firms into consumer-facing and capital-heavy parts of the economy, but is absent in emerging markets, including China. Fourth, the AI exposure is more positive in nonroutine interactive work and the more negative in analytical, scientific, and operations-control skills--an occupation one standard deviation higher in interaction-and-communication content has 0.36-standard-deviation higher market-implied AI premium. Additionally, we provide early evidence of the rise of the agentic economy.
Financing Artificial Intelligence Infrastructure: Mapping AI Infrastructure Investment and Compute Governance Across Africa
arXiv:2606.28404v1 Announce Type: new Abstract: Artificial intelligence depends on large-scale compute resources and their supporting infrastructure. However, AI governance debates treat compute primarily as a technical input rather than as an outcome of investment, ownership, and financial control. This paper examines AI infrastructure investment flows across Africa through a systematic analysis of 46 publicly announced projects totalling USD $12.7 billion between 2019 and 2025. Using a value chain framework, we analyze who invests in AI-relevant infrastructure and where investments concentrate. Our findings reveal a highly concentrated landscape dominated by global data center operators, hyperscale technology firms, and development finance institutions, clustering in South Africa, Kenya, Nigeria, and Egypt. We introduce asymmetrical interdependence to describe a structural condition in which capital and physical infrastructure account for 73% of total funding while control remains concentrated in the compute layer among a small number of global technology firms. We argue that compute governance must account for capital flows, ownership, and control, not only geographic access, because these dynamics shape AI compute equity. Infrastructure presence is necessary but insufficient for meaningful governance capacity.
The central bank of central banks just released its flagship annual report — and it sees a $1 trillion AI investment boom headed for a reckoning | Fortune
The Bank for International Settlements draws an explicit parallel to the dotcom crash and railway mania, but the stakes are bigger now.
Korea taps Samsung, SK Hynix in $576 billion AI-chip drive to cement global leadership
How the AI bubble could pop and take down the global economy, according to the BIS
Central bank for central banks sees shades of dotcom mania in hyperscaler capex binge
Magnificent Seven stocks shed $2.3tn in Wall Street tech rotation
Investors switch to soaring chipmakers benefiting from hyperscalers’ vast AI spending
Shares in chipmakers underpinning AI boom rocket in first half of 2026
The value of some chip manufacturers has tripled or more, driving Asia Pacific stock markets sharply higher. Shares in chipmakers have surged in the first half of this year as investors piled into companies that make the hardware underpinning the AI boom, according to analysis. Investors have driven up the value of semiconductor and memory chip manufacturers, whose profits have soared during 2026, at the expense of some large software companies, which have fallen out of favour this year.
In San Francisco’s A.I. Era, Even $180,000 Tech Salaries Are No Longer Enough
As OpenAI and Anthropic prepare to go public, tech workers making six figures are grousing that they cannot compete with the new A.I. elite. Some doubt they can afford to stay.
The AI Investment Cycle Is Driving Global Economic Growth | The WealthAdvisor
The artificial intelligence investment cycle is emerging as one of the most significant drivers of global economic growth, creating new opportunities and considerations...
BIS Annual Report Warns AI Investment Boom Risks Financial Instability - News and Statistics - IndexBox
The BIS warns that massive AI spending by tech giants is building financial vulnerabilities, comparing the boom to historical manias and cautioning that opaque financing could lead to a sharp crash.
AI spending sparks historic Microsoft selloff: Opportunity ahead? - A Historic Decline | The Economic Times
Microsoft shares are on track to record their worst monthly performance since December 2000, marking one of the sharpest corrections in the company's recent history. The stock has lost hundreds of billions of dollars in market value during June as investors reassess the outlook for artificial ...
Prismnews
OpenAI and Anthropic raised a combined ... billion, respectively. The Federal Reserve also said U.S. data-center spending alone was expected to exceed half a trillion dollars in 2025, a scale that has made AI infrastructure one of the biggest investment booms in the econom...
Lenovo falls to one-month low after warning AI will keep memory prices elevated By Investing.com
The moves come after a volatile week for semiconductor stocks, with investors rotating out of AI hardware names following a strong first-half rally despite upbeat demand signals from memory makers. Chip stocks posted their steepest weekly decline since March 2025 as investors locked in gains. Lenovo said DRAM and NAND prices ...
Are the wheels falling off the AI investment boom? - ABC News
Overshadowing the tumult that is global politics has been a financial boom beyond almost anything witnessed before, focused entirely on the perceived benefits wrought by a brave new world of AI that threatens to up-end our lives.
AI infrastructure bets outshine weak Indian stock markets in 2026 | Business News - The Indian Express
FIIs have been record sellers of Indian equities so far in 2026, but rewarded companies with links to AI
Why a collapse in $1 trillion AI spending boom could hit Bitcoin traders first
The BIS says the artificial intelligence buildout could unwind through credit markets, forcing Bitcoin to trade first with risk assets.
AI Infrastructure Investment Boom 2026: Top Stocks & Market Outlook
The artificial intelligence infrastructure boom has evolved from a niche technology trend into a full-blown economic super-cycle that is reshaping global markets in 2026. With tech giants projected to invest over $1.5 trillion in AI infrastructure over the next few years, this transformation ...
AI power demand: Utilities cashing in on data center expansion (AEP:NASDAQ) | Seeking Alpha
AI data centers are driving surging power demand. Discover utilities, power producers, and ETFs positioned to benefit from long-term electricity supply...
Chris Wood boosts SK Hynix and Samsung bets amid South Korea chip pullback | Domain-b.com
Jefferies strategist Christopher Wood increased holdings in SK Hynix and Samsung Electronics while trimming Indian, U.S. and Chinese stocks to capitalize on long-term AI-driven demand for memory chips
📈 Data to start your week
China's young guns, confidence in AI & faster than ever to $1 million
Rocket Lab to acquire Iridium
Rocket Lab has announced a historic deal to acquire Iridium, aiming to create a fully integrated space infrastructure.
BIS says debt, AI boom and fragilities raise global risks
Global pressures from rising public debt to financial fragilities and the sustainability of the AI boom are increasing risks, underscoring the need for disciplined policymaking, according to the Bank for International Settlements.
Bayesian Optimization on the Equilibrium Manifold
arXiv:2606.29299v1 Announce Type: new Abstract: Computing optimal policy in heterogeneous-agent economies is complicated by the possibility of multiple equilibria. We overcome this difficulty by showing that when the equilibrium manifold has a low-dimensional Negishi-weight parameterization, Bayesian optimization reliably finds approximate solutions and can be used to certify candidate solutions with high probability. This insight brings recent machine learning advances to bear on a core problem in macroeconomics. We apply Bayesian optimization to a dynamic economy with heterogeneous agents and climate change and compute optimal carbon taxes in this setting. Although in principle the presence of the carbon externality creates scope for multiple equilibria, we show that in an example with realistic calibration of damages competitive equilibria are most likely unique.
A Knowledge Theory of Capital: The Value of Natural and Artificial Intelligence, Volume 1
arXiv:2606.18288v2 Announce Type: replace Abstract: This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when knowledge becomes stock-like, mobile across forms, scalable, governable, recombinable, and imperfectly visible in accounting. The book introduces knowledge-bearing stock as the central object and analyses how it is generated, converted into governable form, deployed, improved through feedback, enclosed or shared, measured, impaired, and used as input to future production. It distinguishes embodied, disembodied, institutionalized, commons, and public knowledge forms and develops concepts such as first conversion, cognitive enclosure, feedback capture, dark capital, and expected knowledge loss. The argument is conditional and testable: modern wealth depends not only on capital accumulation, but on how productive knowledge is governed.
The Man Who Saw AI Coming
Erik Brynjolfsson wants to talk with you about the future.
The AI boom propping up markets could trigger the next crash, central banks warn | Euronews
People participate in a march to protest the opening of AI data centers in Vancouver, British Columbia, 27 June 2026 - Darryl Dyck/The ...
AI boom could cause the next economic crash, BIS warns | American Banker
Updated June 29, 2026, 3:45 p.m. EDT ... The recent AI spending and investing boom has helped prop up the global economy amid the strains caused by high interest rates, the Iran war and tariffs, the Bank for International Settlements said in a report published Sunday.
Bank Earnings, Credit Supply & the Macroeconomy: Evidence from Canada
arXiv:2606.30381v1 Announce Type: new Abstract: This paper studies whether news about banks' balance sheets propagates to aggregate financial conditions and macroeconomic activity. We construct high-frequency Canadian bank net-worth shocks using stock-price reactions around earnings announcements of the six large Canadian banks. Guided by a model in which higher intermediary net worth expands credit supply and lowers borrowing spreads, we use the co-movement between bank equity prices and Canadian corporate spreads to purge raw bank equity surprises from contaminating information. Favorable purged credit-supply bank net-worth shocks lower corporate spreads, raise bank valuations and broader equity prices, appreciate the Canadian dollar, and increase real activity over the medium run. The results are robust across specifications, samples, and additional outcomes, and suggest that bank earnings news is macroeconomically relevant in concentrated banking systems.
FT Alphaville’s AI Prediction World Cup: Post-group update: Predictions are predictably hard
Many stats; no learnings
Erik Brynjolfsson Profiles AI's Economic Impact | Let's Data Science
Editorial analysis: For AI and data practitioners, economic framing matters because measuring AI's value requires causal metrics and sector-level signals, not only model benchmarks. The Atlantic published a feature profiling economist Erik Brynjolfsson on June 29, 2026, reporting that ...
Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that's been leading OpenRouter — trained entirely on Chinese chips
A few hours ago, Chinese delivery app company Meituan officially unveiled LongCat-2.0 on GitHub, Hugging Face, and its native platform, unmasking the model as the computational engine behind "Owl Alpha," the anonymous stealth model that has spent the last two months topping global developer charts on OpenRouter. Built to challenge closed-source dominance in autonomous software engineering, the 1.6-trillion-parameter Mixture-of-Experts (MoE) system brings a native 1-million-token context window to the public under the permissive, commercially usable MIT license.
Commercial access comes with an aggressive pricing scheme: all context-cache hits are processed free of charge, alongside a time-limited "Token Pack" flash-sale program. The standard pay-as-you-go API bills non-cache hits at $0.75 per million input tokens and $2.95 per million output tokens. A limited-time promotional discount cuts these rates to $0.30 per million uncached input tokens and $1.20 per million output tokens, both at the cheaper end of top-performing models globally.
Model | Input ($/1M) | Output ($/1M) | Total ($/1M) | Source
MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi
deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek
deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek
MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax
LongCat-2.0 (limited-time promo) | $0.30 | $1.20 | $1.50 | LongCat
Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google
Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud
MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi
LongCat-2.0 (standard) | $0.75 | $2.95 | $3.70 | LongCat
Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | xAI
MiMo-V2.5 Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi
Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot AI
GLM-5.2 | $1.40 | $4.40 | $5.80 | Z.ai
GPT-5.6 Luna | $1.00 | $6.00 | $7.00 | OpenAI
Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 | xAI
MiMo-V2.5 Pro (>256K) | $2.00 | $6.00 | $8.00 | Xiaomi
Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud
Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google
Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 | Google
GPT-5.6 Terra | $2.50 | $15.00 | $17.50 | OpenAI
GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI
Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 | Google
Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic
GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI
GPT-5.5 Instant (chat-latest) | $5.00 | $30.00 | $35.00 | OpenAI
Sakana Fugu Ultra (≤272K) | $5.00 | $30.00 | $35.00 | Sakana AI
GPT-5.6 Sol | $5.00 | $30.00 | $35.00 | OpenAI
Claude Fable 5 / Claude Mythos 5 | $10.00 | $50.00 | $60.00 | Anthropic
What makes the release an inflection point for global tech infrastructure is its operational independence: the model was trained entirely on a cluster of more than 50,000 domestic Chinese application-specific integrated circuits (ASICs), showing that near-frontier AI models can be scaled without the U.S. Nvidia GPUs that have, to date, powered much of the world's frontier generative AI training. This successful deployment of alternative silicon signals a profound structural shift.
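In the pricing table, the Total column is simply the input price plus the output price, i.e. the combined cost of one million input tokens and one million output tokens. A workload with a different input/output mix can rank models differently. The sketch below, a hypothetical helper rather than any vendor's calculator, computes a blended price per million tokens for a chosen input share using a few rows copied from the table; the 80/20 mix is an assumed example.

```python
"""Minimal sketch: blended price per 1M tokens for a given input/output mix.

Prices are copied from the table above (USD per 1M tokens). The 80/20
input/output mix is a hypothetical workload chosen for illustration.
"""

PRICES = {  # model: (input $/1M, output $/1M)
    "LongCat-2.0 (limited-time promo)": (0.30, 1.20),
    "LongCat-2.0 (standard)": (0.75, 2.95),
    "Claude Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}


def blended_price(input_price: float, output_price: float, input_share: float) -> float:
    """USD cost of 1M tokens when `input_share` of them are input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


if __name__ == "__main__":
    for model, (p_in, p_out) in PRICES.items():
        # The table's "Total" column equals p_in + p_out, i.e. twice the 50/50 blend.
        even = blended_price(p_in, p_out, 0.5)
        input_heavy = blended_price(p_in, p_out, 0.8)
        print(f"{model}: 50/50 ${even:.3f}, 80/20 ${input_heavy:.3f}")
```

Because output tokens are priced several times higher than input tokens for every model listed, shifting the mix toward input lowers the blended price, and by different amounts for different models.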
If Chinese conglomerates can consistently iterate on trillion-parameter architectures using homegrown ASICs rather than general-purpose GPUs, that would seem to threaten Nvidia's dominance in the sector. Crucially, this pivot arrives just as Washington is pressuring top American labs to restrict access to their latest models. Following a U.S. government request, OpenAI was forced to limit access to its new GPT-5.6 models, while Anthropic had earlier been ordered by the U.S. to restrict access to its latest Claude Fable 5 / Mythos 5 models, which it took entirely offline in response. At the same time, a growing chorus of technologists, activists, and industry experts warns that these defensive regulatory moves have inadvertently backfired. By locking down Western closed-source models and driving up API costs, the U.S. government has opened a wide window for global developers seeking affordable, high-performance alternatives such as Chinese open-source models like Meituan's LongCat-2.0. The raw usage metrics back up the developer enthusiasm: during its unbranded run on OpenRouter, Owl Alpha accounted for approximately 10.1 trillion monthly tokens, averaging 559 billion tokens per day, a 242% month-over-month jump in volume that propelled it into the platform's global top three. By the time Meituan stepped forward to claim the model, it had already secured the top ranking on the Hermes Agent workspace, second place on Claude Code deployments, and third place across international OpenClaw environments.
Technology: Engineering the 1M-Token Sparse Context
At the core of LongCat-2.0 lies an aggressive optimization of MoE sparsity, scaling total parameters to 1.6 trillion while limiting active computation to an average of 48 billion parameters per token.
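The article's figures imply that on average roughly 3% of LongCat-2.0's parameters are active per token (48 billion of 1.6 trillion), with per-token activation varying between 33 billion and 56 billion. One way such variation can arise is the "Zero-Compute Experts" idea described in this section: when some routing slots go to identity experts, a token activates fewer parameters. The toy sketch below illustrates that mechanism with assumed expert sizes and hand-picked router logits; all names and numbers are hypothetical, and it is not Meituan's router.

```python
"""Minimal sketch: top-k MoE routing with "zero-compute" identity experts.

Illustrative assumptions, not Meituan's implementation: router logits are
given directly, every real expert has the same hypothetical parameter count,
and a zero-compute expert simply returns its input unchanged.
"""
import math

EXPERT_PARAMS = 1_000  # hypothetical parameters per real (FFN) expert
NUM_REAL = 4           # experts 0..3 perform real computation
NUM_ZERO = 2           # experts 4..5 are identity ("zero-compute") experts


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: list[float], k: int = 2) -> tuple[list[int], list[float]]:
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    assert len(logits) == NUM_REAL + NUM_ZERO
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return top, softmax([logits[i] for i in top])


def moe_forward(x: float, logits: list[float], experts, k: int = 2) -> tuple[float, int]:
    """Return the gated output and the number of parameters actually used."""
    ids, gates = route(logits, k)
    out, active = 0.0, 0
    for i, g in zip(ids, gates):
        if i < NUM_REAL:   # real expert: runs its FFN, costs parameters
            out += g * experts[i](x)
            active += EXPERT_PARAMS
        else:              # zero-compute expert: identity, costs nothing
            out += g * x
    return out, active


if __name__ == "__main__":
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]  # toy experts
    # A "hard" token sends both slots to real experts ...
    print(moe_forward(1.0, [3.0, 2.5, 0.1, 0.0, -1.0, -1.0], experts))
    # ... while an "easy" token sends one slot to an identity expert.
    print(moe_forward(1.0, [3.0, 0.0, 0.0, 0.0, 2.5, -1.0], experts))
```

In this toy setup the hard token uses two experts' worth of parameters and the easy token only one, which is the kind of per-token variation the 33 billion to 56 billion range describes.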
Depending on the complexity of a query, the model's dynamic activation ranges from 33 billion to 56 billion parameters. This design implements a "Zero-Compute Experts" framework, ensuring that routine elements pass through lighter subnetworks and eliminating the idle computational overhead that typically penalizes ultra-dense models.
To sustain a functional 1-million-token context window without crippling hardware bottlenecks, Meituan introduced LongCat Sparse Attention (LSA). Designed as an evolution of DeepSeek Sparse Attention, LSA addresses the quadratic scoring costs and memory fragmentation that typically plague fine-grained sparse mechanisms through three orthogonal techniques:
Streaming-aware Indexing (SI): restructures the token selection pipeline by blending hardware-aligned contiguous reads with dynamic random selection. By converting fragmented memory access into predictable, sequential blocks, the system achieves coalesced High Bandwidth Memory (HBM) access and higher effective bandwidth.
Cross-Layer Indexing (CLI): exploits the empirical observation that attention saliency remains highly stable across adjacent layers to amortize indexing costs. A single indexing pass guides multiple consecutive layers during inference, a capability reinforced by cross-layer distillation during training.
Hierarchical Indexing (HI): applies a coarse-to-fine, two-stage scoring scheme. The indexer performs a fast, approximate block-level recall to filter candidates, then runs fine-grained token selection only on the remaining candidates.
Furthermore, Meituan integrated an N-gram Embedding module inherited from its lighter model lines.
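The Hierarchical Indexing step of LSA can be pictured as a two-stage filter: score coarse blocks cheaply, then score individual tokens only inside the blocks that survive. The sketch below shows generic coarse-to-fine top-k selection in plain Python under assumed choices (dot-product relevance, each block summarised by its mean key, small fixed budgets); the function names are hypothetical, it is not LSA itself, and it ignores the streaming-aware memory layout entirely.

```python
"""Minimal sketch: coarse-to-fine (block, then token) sparse attention selection.

Illustrative assumptions, not Meituan's LSA code: relevance is a plain dot
product, each contiguous block is summarised by its mean key, and the block
and token budgets are small fixed numbers.
"""


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def select_tokens(query, keys, block_size=4, top_blocks=2, top_tokens=3):
    """Return the sorted key indices this query will attend to."""
    starts = range(0, len(keys), block_size)
    # Stage 1 (coarse): one approximate score per block via its mean key.
    block_scores = {s: dot(query, mean_vector(keys[s:s + block_size])) for s in starts}
    kept = sorted(block_scores, key=block_scores.get, reverse=True)[:top_blocks]
    # Stage 2 (fine): exact token scores, computed only inside the kept blocks.
    candidates = [i for s in kept for i in range(s, min(s + block_size, len(keys)))]
    chosen = sorted(candidates, key=lambda i: dot(query, keys[i]), reverse=True)[:top_tokens]
    return sorted(chosen)


if __name__ == "__main__":
    keys = [[0, 1], [0, 1], [0, 1], [0, 1], [1, 0], [2, 0], [0.5, 0], [3, 0]]
    query = [1, 0]
    indices = select_tokens(query, keys, top_blocks=1)
    # A cross-layer variant could reuse `indices` for several consecutive
    # layers instead of re-running the indexer at each one.
    print(indices)
```

Because the selected indices depend only on the query and keys, a scheme in the spirit of CLI could compute them once and reuse them for several adjacent layers, possibly at some cost in selection accuracy.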
By allocating parameters along sparse dimensions orthogonal to the MoE expert layout, the architecture adds 135 billion parameters to a 5-gram token-combination framework. This expands the core embedding space roughly 100-fold, allowing the model to capture dense local token relationships and accelerating large-batch inference by reducing memory input/output (I/O) bottlenecks.
Product: Post-Training, MOPD Framework and Benchmark Performance
While generalist large language models prioritize fluid, conversational interfaces, LongCat-2.0 focuses explicitly on multi-step engineering tasks, tool integration, and automated repository manipulation: agentic tasks, in other words. In standardized assessments, LongCat-2.0 scores 59.5 on SWE-bench Pro, ahead of GPT-5.5's 58.6. It further demonstrates its agentic specialization with 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on the general corporate workflow simulator FORTE.
This behavior is achieved through a post-training framework called Multi-Teacher Optimization via Mixture of Specialized Experts (MOPD). Rather than blending raw human feedback into a single reward function, MOPD splits post-training optimization into three independent, focused expert clusters:
The Agent Experts are fine-tuned strictly for execution, specializing in precise tool invocation, multi-turn API parameter parsing, and self-correcting loops that avoid execution stagnation.
The Reasoning Experts are optimized in isolation for multi-hop logic, complex chain-of-thought reasoning, mathematics, and high-level STEM problem-solving.
The Interaction Experts focus on human alignment, instruction-following nuances, factual grounding to suppress hallucinations, and maintaining rigid safety guardrails without diminishing the model's overall utility.
By separating these capabilities during post-training, LongCat-2.0 prevents functional degradation. A dynamic gate-routing mechanism then fuses these specialized behaviors at runtime, allowing the final model to coordinate deep reasoning, stable tool execution, and safe user interaction simultaneously.
While LongCat-2.0 generally trails premium frontier systems like Claude Opus 4.8 on broad general-agent benchmarks such as FORTE and BrowseComp, it punches above its weight in software engineering. What sets this open-weight model apart is its focus on autonomous development: it narrowly exceeds OpenAI's proprietary GPT-5.5 on SWE-bench Pro (59.5 against 58.6), showing it is highly competitive on complex coding tasks despite a leaner computational footprint.
Commercial Framework: Pay-As-You-Go vs. Flash-Sale Token Packs
Meituan's deployment strategy splits access between conventional real-time API billing and structured "Token Packs". For traditional enterprise integration, standard top-up accounts are available, with charges deducted in real time based on input and generated tokens. To accommodate the unpredictable compute bursts characteristic of autonomous development agents, Meituan also launched the Token Pack framework. Purchased as fixed, one-time allocations valid for a strict 30-day window, these packages stack on top of an organization's existing baseline API account. To manage network load across its ASIC clusters, Meituan releases these high-volume packages via limited flash sales four times daily, at 10:00, 16:00, 21:00, and 23:00 Beijing Time, on a first-come, first-served basis.
The economic standout of this framework is the zero-charge processing of context cache hits.
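To see why free cache hits matter for agents, the sketch below totals the cost of a hypothetical multi-turn session using the LongCat-2.0 per-token rates quoted in this story. It assumes that everything sent on earlier turns is served from cache on later turns and, for comparison, a hypothetical scheme in which that repeated context is billed at the uncached input rate. The turn sizes, `Turn`, and `session_cost` are illustrative names, not part of any Meituan API.

```python
"""Minimal sketch: cost of a multi-turn agent session with free cache hits.

Billing rule as described in the article: context-cache hits are free, while
cache-miss input and output tokens are charged. Assumptions: every earlier
token in the session is a cache hit on later turns, and the turn sizes used
below are hypothetical.
"""
from dataclasses import dataclass

PROMO = (0.30, 1.20)     # USD per 1M uncached input / output tokens
STANDARD = (0.75, 2.95)  # USD per 1M uncached input / output tokens


@dataclass
class Turn:
    new_input: int  # tokens added to the prompt this turn (cache misses)
    output: int     # tokens generated this turn


def session_cost(turns: list[Turn], prices: tuple[float, float], cache_hits_free: bool = True) -> float:
    """Total USD for a session in which the full context is resent every turn."""
    price_in, price_out = prices
    context, cost = 0, 0.0
    for t in turns:
        # Without free caching, the whole prior context is billed again as input.
        billed_input = t.new_input if cache_hits_free else context + t.new_input
        cost += billed_input * price_in / 1e6 + t.output * price_out / 1e6
        context += t.new_input + t.output  # this turn's output joins the next prompt
    return cost


if __name__ == "__main__":
    session = [Turn(new_input=100_000, output=5_000) for _ in range(10)]
    print(f"promo, cache hits free: ${session_cost(session, PROMO):.2f}")
    print(f"promo, context billed:  ${session_cost(session, PROMO, cache_hits_free=False):.2f}")
```

With cache hits free, cost grows only with new tokens; when the repeated context is billed, input charges grow roughly quadratically with the number of turns, because each turn resends everything before it.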
In large agentic workloads, where a coding assistant must repeatedly read, reference, and modify the same multi-million-token code repository over an extended session, standard pricing penalizes developers by charging full price for repeated input context. Under Meituan's scheme, only cache-miss inputs and generated output tokens consume the package quota. This changes the cost economics of large-scale agentic software development, enabling deep iterative exploration of context without compounding costs.
Licensing: Open-Source Structural Freedom
By releasing the LongCat-2.0 repository under the open-source MIT License, Meituan gives enterprises maximum legal flexibility for integration. Unlike copyleft licenses such as the GNU General Public License (GPL), which require anyone who distributes a derivative work, including software that links to the code, to release its source under the same license, the MIT License imposes only minimal conditions, chiefly that the copyright and license notice be retained. For corporate engineering teams, this means LongCat-2.0 can be deeply modified, compiled, and built directly into closed-source commercial applications, proprietary developer tools, and internal automation backends. Companies can fork the repository, optimize the internal LSA mechanisms for private databases, and sell the resulting software stack to end users without any obligation to disclose their proprietary code or structural enhancements.
Meituan's Evolution: From Delivery Super App to AI Powerhouse
Founded in March 2010 by serial entrepreneur Wang Xing, Meituan launched as a Groupon-style daily-deals website before rapidly evolving into one of China's dominant "super apps". After its 2015 merger with Dianping, the Beijing-based tech giant secured a commanding share of the country's urban delivery market, bridging local consumer reviews, instant retail, hotel bookings, and food delivery.
Listed on the Hong Kong Stock Exchange, Meituan claims more than 770 million annual transacting users and a network of more than 14.5 million merchants. However, facing intense domestic competition and shrinking margins, the company has aggressively pivoted its strategy beyond logistics, publicly committing to invest "billions" in artificial intelligence and domestic chip capabilities to revitalize its technology offerings. This shift into the global AI race began materializing in late 2025 with the release of LongCat-Flash, a 560-billion-parameter Mixture-of-Experts foundation model, followed quickly by the reasoning model LongCat-Flash-Thinking. By open-sourcing these frontier-class models under enterprise-friendly licenses, Meituan signaled its ambition to become a foundational player in global AI infrastructure rather than remaining a regional e-commerce and delivery giant.
Enterprise Implications: Autonomous Operational Workflows
For enterprises, the release of LongCat-2.0 opens clear operational strategies across software engineering, system operations, and long-form data interpretation. Because the model is open-weight, MIT-licensed, and offers a 1-million-token context window, organizations can host it themselves and avoid the data privacy concerns and recurring costs of relying on proprietary third-party APIs.
In large-scale development environments, teams can use the model's specialized Agent Experts to orchestrate autonomous codebase migrations. Instead of dedicating hundreds of developer hours to manually rewriting legacy application frameworks, engineers can pass an entire enterprise repository, along with modern SDK documentation, directly into the 1-million-token context window.
LongCat-2.0 can map dependencies, execute repository-level structural updates, compile the new codebase, and catch compilation and execution bugs autonomously within local sandbox environments before generating a final pull request. The model's separation of capabilities via the MOPD gate-routing mechanism also offers advantages for strict enterprise compliance. By routing specific queries through isolated expert clusters, a financial institution or healthcare firm can run deep logic and mathematical reasoning passes while reducing the risk of factual hallucination or safety violations. The Interaction Experts act as an implicit guardrail layer, suppressing errors and enforcing instruction-following without degrading the raw capability of the Reasoning Experts. Combined with zero-cost caching, enterprises can run focused autonomous software pipelines that repeatedly inspect corporate data pools, continuously maintaining and optimizing internal infrastructure at a fraction of standard operational costs.
AI drives a boom in new games but big developers dominate
More instinctive technology is accelerating production amid concerns it risks losing gamers’ trust
LivCor deal with states approved by US judge in RealPage litigation
A North Carolina federal judge approved a $7 million settlement between LivCor and nine states regarding allegations of using RealPage's software to align rental prices.
South Korea experiment finds algorithmic self-preferencing sways online purchase
A Korea Fair Trade Commission study found that algorithmic ranking manipulation significantly distorts consumer choice, with platform-favored products seeing higher sales.
Palantir Joins Forces with Army to Enhance AI Tools, Shares Surge 5% on Defense Modernization Project
Palantir is collaborating with the U.S. Army on AI-enhanced tools under the NGC2 program, focusing on rapid prototyping and improved decision-making.
OpenAI-Ona deal sparks regulatory review in Australia
The Australian Competition & Consumer Commission is reviewing OpenAI's proposed acquisition of Gitpod, Inc., now operating as Ona.
Google restricts Meta's access to Gemini AI models
Equilibrium Transition from Loss-Leader Competition: How Advertising Restrictions Facilitate Price Coordination in Chilean Pharmaceutical Retail
arXiv:2512.22917v2 Announce Type: replace Abstract: Between December 2007 and April 2008 Chile's three retail pharmacy chains coordinated price increases on 222 medicines, weeks after advertising restrictions ended the comparative-price war that drove prices below cost. I study the transition with a demand-grounded structural model. The mechanism has two parts. Store traffic: comparative-price ads broadcast who is cheapest, so undercutting pays, yielding a below-cost war. Belief: a coordinated increase holds only if rivals expect it matched. The advertising ban moves both: by collapsing price sensitivity it makes undercutting unprofitable for the inelastic majority of drugs, so the coordinated price becomes a static best response, and as a public event it shifts beliefs, releasing the wave. A dynamic model estimated by simulated method of moments reproduces the path--the war, the failed attempts, and the post-ban coordination. The harm is distributional: a transfer to supra-competitive rents, with small deadweight loss because post-ban demand is inelastic.
Oracle vs. IBM: Two Legacy Tech Giants, Two Opposite Bets on AI Survival - FourWeekMBA
What IBM is doing is repositioning ... research partner, the entity that sits one level below the AI gold rush. This is a classic picks-and-shovels business model. During a gold rush, the most durable profits go to those selling equipment, not those digging....
Swimming in Dark Water: When Cartels Mimic Competition
arXiv:2606.30470v1 Announce Type: new Abstract: This paper analyzes the internal organization and economic effects of a bid-rigging cartel in the road construction sector of the Swiss canton of Ticino, active from 1999 to 2005. Using exceptionally rich documentary evidence, we reconstruct how cartel members coordinated bids and allocated contracts under a formal agreement known as the 'convention'. We show that, despite the absence of side payments, the cartel implemented a cost-based allocation mechanism that closely approximated the first-best collusive outcome. Regression and machine-learning analyses indicate that observable cost proxies systematically predict both winning bids and bid rankings. The evidence further suggests that cartel members strategically mimicked competitive bidding behavior, allowing them to evade standard econometric detection methods. Using double machine learning, we estimate average overcharges of at least 45%, and potentially substantially higher, highlighting the significant financial harm caused by this sophisticated form of collusion.
Businesses face up to budget-busting AI bills
A shift to usage-based pricing and new models is making companies rethink spending
Cheaper AI is better: Soaring bills are reshaping how businesses choose models | MarketScreener
Silicon Valley's powerful and pricey AI models have been a necessity for businesses looking to future-proof themselves. But now a growing number of tech CEOs are arguing that cheaper options would be...
The cost of AI's 'compute' is coming into focus, and it's a lot | American Banker
Banks and other companies are starting to face the true cost of buying AI services, and are already looking to cut corners.
AWS Raises GPU Cloud Pricing As AI Boom Pushes Memory Costs Upward
And the fast growth of generative ... Micron, and Samsung. ... Analysts also expect memory prices to stay elevated through the coming year, since AI investments are still accelerating worldwide, even as companies watch costs more closely. AWS’s latest pricing update underlines how the global AI boom is rewiring the economics of cloud computing...
Building an app with AI costs 17 times more than getting an answer - CNBC TV18
Tokens are the basic units AI models use to process text. They can be whole words, parts of words, numbers or punctuation. Every prompt and every AI-generated response consumes tokens, making them the standard measure of both computing usage and API costs.
The Tokenpocalypse Is Here: Companies Cut AI Spending in 2026 - Memeburn
AI token costs are forcing companies to cap usage, rethink tools and control spending. Here’s why South African firms should care.
How to Choose Between Small and Frontier Models
An exploration of the rise of small language models and how to decide when to use them versus larger frontier models.
Ford on why it hired 350 ‘gray beard’ engineers: you need their mentorship for younger workers — and to drive huge AI productivity gains
"These engineers carry the hard-earned wisdom of decades of design," Ford told Fortune, adding that AI is important to quality gains.
OpenAI's Codex Drives Shift Toward Agentic Workflows | Let's Data Science
For practitioners, the rise of agentic tools changes the unit of productive work from short chat interactions to delegated, long-horizon tasks, altering automation design and evaluation. Editorial analysis: Agentic workflows raise engineering priorities around stateful orchestration, tool ...
Baidu's AI chip unit Kunlunxin targets $50 billion Hong ...
Goldman leads $110M bet on Taktile's AI software | American Banker
The company's software takes decisions ... them to AI agents (programs that carry out multistep tasks). ... announced the Series C funding round on Thursday. Goldman Sachs led it; Balderton Capital, Index Ventures, Tiger Global, Y Combinator and Dig Ventures also joined. The startup, founded in ...
Chamath Palihapitiya raises $135M for his AI coding startup
Chamath Palihapitiya has secured $135 million in Series A funding and will take on the role of CEO.
Pocket AI Device Redefines Meetings with Offline Recording, Attracts $11M Funding and Rapid Growth
Pocket, an AI-native hardware device for capturing conversations, has sold over 130,000 units since October 2025 and secured $11 million in funding.
Cork medtech NeuroBell bags $5.5m for its AI neonatal monitor
NeuroBell hopes to nab US clearance for its AI-powered neonatal monitor Luna this year.
From charging EVs and drones to powering data centres: London-based Gaussion raises €24.5 million
Gaussion, a London-based DeepTech company pioneering energy intelligence technology for battery packs, today announces the close of a €24.5 million ($28 million) funding round – bringing total funding to over €38 million ($44 million). The round was co-led by BGF and AlbionVC, with follow-on participation from mobility specialist fund Autotech Ventures, UCL Technology Fund, DN […]
Italy's first VC consolidation deal makes a €600M case that AI changes everything — TFN
P101 SGR and PranaVentures have formed Italy’s first venture capital consolidation deal, creating a €600M platform to back startups from seed to scale.
AI Agents News Brief: Funding Surges, Enterprise Adoption Grows, and Development Accelerates
The AI agents landscape is experiencing a significant surge in investment and development. Straiker announced a substantial $64 million Series A funding round, bringing their total funding to $85 million, aimed at sec...
Labor, Society & Culture
Managing the Human Fallback: Skill Investment Under Improving AI and Worker Mobility
arXiv:2606.29111v1 Announce Type: cross Abstract: When firms deploy autonomous AI, they must decide how much work to leave to the system and how much to keep workers engaged. This decision affects current output and future human capital. We develop a parsimonious two-period model in which AI may outperform the worker when it functions, but may fail with positive probability. A firm chooses worker engagement; engagement lowers current output for below-benchmark workers, but changes future skill through learning and erosion. We distinguish two dimensions of AI progress: capability, the system's output when it works, and reliability, the probability that it works. In a single-firm benchmark, engagement is valuable only as fallback investment. The firm engages the least-skilled workers most, because they have the largest skill gaps and are least costly to bring toward a useful fallback level. With worker mobility, engagement also affects labor-market sorting: workers prefer jobs that build more valuable skill trajectories. This sorting motive targets higher-skill workers near the AI frontier, where skill gains are more valuable and engagement is less costly. Mobility can therefore reverse the engagement pattern, shifting investment from the least-skilled toward the most-skilled workers below the AI benchmark. Mobility also reshapes how AI progress affects engagement: greater capability raises engagement by increasing the value of the skill trajectory a firm offers, whereas greater reliability can raise or lower it because it reduces fallback need while also changing learning opportunities. Under worker mobility, human-AI work design becomes a problem of human-capital investment, in which allocating work today shapes future skill.
Heavy corporate AI spenders add staff faster than peers
Study of 22,000 US companies challenges fears that generative AI will trigger broad job losses
AI agents are not your "coworkers" | MIT Technology Review
Marketing AI agents as digital employees may make human workers worse at spotting errors and more likely to offload accountability.
Imagine coming in to work to learn that a new underling will report to you. The worker is not a person but an AI tool, one that your company nonetheless calls Alex, an…
The most reassuring argument about AI and jobs quietly explains why Gen Z can’t get one
It’s not just the Jevons Paradox—it’s also the lump of labor fallacy. But there’s no God-given right for entry-level work to exist.
CEOs expect large-scale reskilling as AI reshapes workforce, finds report
The EY Ireland CEO Outlook survey found that Irish CEOs remain confident about growth over the next 12 months despite global volatility and geopolitical risk.
Advisors see AI as a growing threat to their business, even as they step up adoption | Advisor.ca
Natixis advisor survey also shows concern about wealth transfer
Exclusive: Inside Amazon's brutal AI-centric app-ification of HR - Fast Company
Ineffective chatbots, automated apps, Kafkaesque nightmares: Amazon's relentless focus on efficiency is pulverizing workers' last lifeline of relief. "I watched the human get sucked out of the job," says one former employee.
Tech layoffs 2026 full list: Over 1 lakh jobs cut as AI reshapes workforce - India Today
AI-driven restructuring, cost-cutting and slowing global demand have made 2026 another difficult year for workers. Technology firms, software companies and even automakers have announced more than 1 lakh (100,000) layoffs, with companies increasingly citing artificial intelligence, automation and business ...
OpenAI chief economist calls for tailored AI strategies in Europe
OpenAI chief economist Aaron Chatterji urges EU countries to develop tailored AI labor strategies, citing a new report showing only 14% of EU jobs face high
AI agents to transform tech teams by 2027 as companies race to adapt: KPMG- Moneycontrol.com
Global survey finds companies accelerating investments in agentic AI, with digital assistants expected to account for over a third of core technology teams by 2027
When Everyone Uses AI, Companies Risk Losing Critical Skills
This article discusses the risk that widespread AI adoption may erode essential human capabilities like critical thinking and judgment if organizations become overly dependent on automation.
The AI jobs debate just got messier
A look at the evolving and complex discourse surrounding the impact of AI on the labor market.
Is AI an exoskeleton for the mind?
Technology that helps people do things they couldn’t otherwise achieve can also lead to atrophy
AI Job Cuts Keep Spreading Across Big Companies - Finimize
A Reuters factbox lists AI-linked layoffs since October 2025, from HSBC’s 20,000 cuts to smaller reductions at tech and finance firms.
AI Management Vital for 92% of Tech Execs by 2031
KPMG report: 92% of tech executives say managing AI agents is a vital skill by 2031. AI assistants to make up 36% of core tech teams by 2027.
Selling AI As A Replacement Wins Attention & Kills Trust
Selling AI as a people replacement is a brand liability. Here's why substitution positioning costs credibility and what employment data actually shows.
The human impact of AI forces a redesign in how we work - IT-Online
At IBM, we believe this is one ... same intentionality we bring to any transformation. ... The study identifies a small but telling group of organisations that have achieved both advanced AI maturity and strong organisational change capabilities....
Measuring Racial Disparities in Rent Growth Under Algorithmic Landlord Concentration in U.S. Metros
arXiv:2606.27525v2 Announce Type: replace Abstract: The 2024 Department of Justice antitrust complaint against RealPage, Inc. named five major residential REITs for coordinating algorithmic rent pricing across hundreds of thousands of apartment units in major US metropolitan areas. This paper studies whether census-tract-level corporate landlord concentration (CLC), measured from SEC EDGAR 10-K property filings geocoded to census tracts, the first such application in the literature, is associated with rent growth 2019-2023, and whether that association is larger in majority-minority neighborhoods. Rent outcomes are measured using the Zillow Observed Rent Index (ZORI). To account for the possibility that corporate landlords preferentially locate in neighborhoods already seeing rent appreciation, all regressions control for a fully novel Algorithmic Housing Burden Index (AHBI), a composite of pre-existing rent burden and market tightness from ACS data. Across 665 census tracts in ten US metropolitan areas, doubling REIT concentration is associated with 2.8 percentage points higher rent growth (p = 0.086, p = 0.030, HC1 robust). This association is significantly stronger in majority-minority tracts. Within the same metro, high-CLC majority-minority tracts are associated with 5.9 percentage points higher rent growth than comparable white tracts (p = 0.039). An XGBoost model predicts 44 percent of out-of-sample rent growth variance, with SHAP analysis independently confirming that CLC's contribution is positive in minority tracts and negative in white tracts. Taken all together, these findings provide the first tract-level evidence consistent with corporate landlord concentration being associated with disproportionately higher rent growth in communities of color.
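The abstract reports the effect per doubling of concentration but does not give the regression specification. One way such a number arises, shown here purely as an assumed illustration, is a regression on the base-2 logarithm of CLC (with AHBI as a control), where doubling CLC adds exactly 1 to the regressor. If the paper used a natural log instead, the same 2.8-point effect would correspond to a coefficient of about 4.04:

```latex
\begin{aligned}
\Delta\text{Rent}_i &= \alpha + \beta\,\log_2 \text{CLC}_i + \gamma\,\text{AHBI}_i + \varepsilon_i \\
\Delta\text{Rent}\big|_{2\,\text{CLC}} - \Delta\text{Rent}\big|_{\text{CLC}}
  &= \beta\left[\log_2(2\,\text{CLC}) - \log_2 \text{CLC}\right] = \beta \approx 2.8\ \text{pp} \\
\text{with } \ln \text{ instead of } \log_2:\quad \beta_{\ln}\,\ln 2 \approx 2.8\ \text{pp}
  &\;\Rightarrow\; \beta_{\ln} \approx 2.8 / 0.693 \approx 4.04
\end{aligned}
```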
The Digital Afterlife of Empires: Four Language Models Converge on the Same Imperial Cartography of Writing
arXiv:2606.28325v1 Announce Type: new Abstract: Large language models process the world's writing systems with radical inequality. We constructed the Digital Script Representation Index (DSRI), a seven-axis measure of digital support, and applied it to the 300 writing systems of the Global Script Database (Fukui, 2026). Only 29 scripts (9.7%) are fully supported by contemporary digital infrastructure; among 158 living scripts, 60 (38.0%) lack complete support. Tokenizer efficiency varies by a factor of 31.7 across 45 scripts measured with parallel text. A serial mediation model -- imperial intervention to speaker population to web corpus to tokenizer efficiency -- is consistent with full mediation, with the direct effect of empire indistinguishable from zero (beta = -0.22, p = 0.39) and structural equation model fit indices indistinguishable from saturation at n = 45; the bias-corrected bootstrap CI grazes zero, and we treat the mediation as suggestive rather than confirmatory. Across four independent LLM families (Claude, GPT-4o, Grok, DeepSeek; 12,000 API calls), base-rate-deviation error patterns converge at Spearman rho = 0.85-0.98 (all p < 0.002). 172 script-feature items are answered identically wrong by all four models; over-attribution outnumbers under-recognition 3.9:1, and "used for religion" alone concentrates 43.6% of convergent errors (enrichment 4.1x). With religion excluded as a sensitivity check, the cross-architecture convergence is preserved (mean rho = 0.87 on nine features) and the over-attribution asymmetry persists at 1.77:1 (n = 97, binomial p = 0.008), indicating multi-channeled rather than single-channeled bias. The findings are consistent with an interpretation in which the structural inequalities historical empires inflicted on script communities persist in contemporary language models through the shared training corpus rather than through any individual model's design choices.
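Spearman's rho compares two models' error patterns by rank rather than magnitude. Two models that over-attribute the same features in the same order therefore score near 1, even if one errs more heavily. A minimal Python sketch follows. Reading "base-rate deviation" as a model's per-feature "yes" rate minus the true base rate is our assumption, the numbers are invented, and the sketch does not compute p-values.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-feature deviations (model "yes" rate minus true base rate).
model_a = [0.30, -0.05, 0.12, 0.40, -0.10]
model_b = [0.25, -0.02, 0.10, 0.35, -0.12]
print(spearman_rho(model_a, model_b))  # 1.0: identical error ordering
```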
Insidious by Design: Implications of Large Language Model algorithmic bias for the Global South
arXiv:2606.28333v1 Announce Type: new Abstract: The biases in Large Language Models' (LLMs) outputs remain inadequately theorised, particularly from the perspective of the Global South. This article reports on a small-scale exploratory study in which identical prompts were submitted to four major LLMs (ChatGPT, Claude, Grok, and Copilot): firstly prompting for stories using names suggestive of specific racial and gender communities, and secondly asking questions about 'development'. Drawing on critical AI scholarship and postcolonial theory, we argue that LLM outputs are patterned in ways that reproduce racial hierarchies, gender asymmetries, and Western-centric epistemic frameworks. We argue that these biases are insidious: they operate below the threshold of both obvious error and overt prejudice, and instead are subtly embedded in narrative structure and emotional template. Simply put, women in LLM narratives have rich interior lives, while men make plans. Black people face hardships while white people navigate the world with agency. And explanations of the economic world order fail to consider Southern ones. The models perform plausibility while reproducing dominance. We conclude that universities require structural critique of these technologies rather than unreflective adoption, and that critical AI literacy must engage seriously with questions of whose knowledge systems are reproduced and legitimated, or marginalised and undermined.
Could AI create a new form of inequality in South Africa?
If AI is not designed properly, inequalities can reappear in new, digital forms.
When Medical Safety Alignment Fails: A Benchmark for Evaluating LLMs on High-Risk Medical Queries
arXiv:2606.28332v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for medical and health-related questions, yet their safety in high-risk medical scenarios remains poorly understood. We introduce MedHarm, a high-risk medical safety benchmark with 1,100 medically grounded queries across 10 safety-critical categories, including toxicology, pharmacology, covert poisoning, anesthesia, and fetal harm. Unlike broad medical QA benchmarks, MedHarm targets realistic clinical, educational, and technical prompts that require refusal, caution, or safe redirection rather than direct helpfulness. We evaluate 15 LLMs spanning general-purpose, medical-purpose, closed-source, and downstream SFT models, together with 4 representative guardrail models. Results reveal a substantial gap between apparent alignment and medical safety: aligned models can still produce unsafe or actionable responses, medical fine-tuning can amplify harmful specificity, and external guardrails reduce some failures while introducing brittle blocking and weak safe helpfulness. These findings show that medical safety cannot be inferred from general alignment or medical capability alone, highlighting the need for domain-specific stress testing before deploying LLMs in safety-critical medical applications. (Code and data will be released upon acceptance. Due to the sensitive nature of high-risk medical queries, data access will be available to qualified researchers upon request.)
Agent Safety Is Action Alignment
arXiv:2606.28739v1 Announce Type: new Abstract: Large language models increasingly act as agents: they call tools, move money, delete records, and send messages on a user's behalf. To keep them safe, practitioners imported the chatbot-era recipe (train the model to refuse unsafe inputs) into the agentic setting, and treat the resulting capability loss as a manageable "alignment tax." We argue this is a category error. Refusal is a primitive for content safety, where the harm is in the model's output and is therefore a learnable function of it. Agentic harm is different in kind: it lies not in any output but in the relation between the authority an action exercises and the authority the user granted, which is absent from the text the model sees. Importing content-safety methods into this regime does not trade capability for safety; it pays capability and buys negative security. We support this with three lines of evidence spanning the autonomy spectrum: defense-trained models learn surface patterns rather than intent; the same training collapses multi-step agents before any threat appears while leaving them exploitable; and even undefended frontier models exceed granted authority under ordinary use. We conclude that action safety cannot be installed in weights. It must be expressed as least privilege, enforced outside the model at the action boundary, and evaluated as action alignment (a relational, deployment-conditioned property) rather than a refusal score.
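The paper prescribes least privilege enforced outside the model. One way to picture this is a check that compares each proposed action with the authority the user granted before anything runs. The Python sketch below is illustrative only. The `Action`/`Grant` schema, tool names and spending limit are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Action:
    """A tool call proposed by the model (hypothetical schema)."""
    tool: str
    resource: str
    amount: float = 0.0


@dataclass(frozen=True)
class Grant:
    """Authority the user delegated; stored and checked outside the model."""
    allowed_tools: FrozenSet[str]
    allowed_resources: FrozenSet[str]
    max_amount: float = 0.0


class AuthorityExceeded(Exception):
    """Raised when a proposed action exceeds the granted authority."""


def enforce(action: Action, grant: Grant) -> Action:
    """Least-privilege check at the action boundary.

    Only the action and the grant are inspected; the model's prompt and
    output text play no part in the decision.
    """
    if action.tool not in grant.allowed_tools:
        raise AuthorityExceeded(f"tool {action.tool!r} was not granted")
    if action.resource not in grant.allowed_resources:
        raise AuthorityExceeded(f"resource {action.resource!r} was not granted")
    if action.amount > grant.max_amount:
        raise AuthorityExceeded(f"amount {action.amount} exceeds limit {grant.max_amount}")
    return action


def execute(action: Action, grant: Grant, tools: Dict[str, Callable[[Action], str]]) -> str:
    """Run a tool only after the boundary check passes."""
    enforce(action, grant)
    return tools[action.tool](action)


grant = Grant(frozenset({"send_payment"}), frozenset({"acct-123"}), max_amount=100.0)
tools = {"send_payment": lambda a: f"paid {a.amount} to {a.resource}"}
print(execute(Action("send_payment", "acct-123", 40.0), grant, tools))  # paid 40.0 to acct-123
```

Because the check never reads model text, it cannot be talked around by a cleverly phrased prompt. It is only as good as the grant it is given, however.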
Ground Truths in Suicide Research: The Current State of AI-Based Suicide Detection in Social Media
arXiv:2606.28334v1 Announce Type: new Abstract: Recent advances in artificial intelligence (AI) and social media data have led to growing optimism about the ability to detect suicide risk at scale. However, the empirical foundations of this work remain unclear. This article provides a synthesis of current research on AI-based suicide detection in social media, drawing on a recent umbrella review of 22 systematic reviews covering studies up to 2022, alongside an ongoing literature review extending the analysis to more recent work. Across these sources, we identified 195 relevant studies, which are documented in a detailed supplementary dataset outlining their key characteristics and findings (see Supplementary Information). Analysis of these studies reveals consistent patterns, including rapid growth, concentration on a small number of platforms, reliance on textual and English-language data, and repeated use of similar datasets. Most importantly, the majority of studies rely on indirect labeling strategies that do not involve direct, individual-level validation of suicide risk. Instead, ground truth is typically inferred from observable features of online content, such as linguistic markers or community membership. As a result, the predictive task often shifts from identifying individuals at risk to classifying posts that contain suicidal or distress-related language, limiting the ability of current approaches to detect individuals who do not express such content explicitly online. These findings suggest that current advances in model performance should be interpreted with caution. Progress in this field is likely to depend less on improving model performance and more on ensuring that model predictions meaningfully correspond to suicide risk as it is experienced in real life.
Aristotelian Virtue Profiling of LLMs through Ethical Dilemmas
arXiv:2606.28683v1 Announce Type: new Abstract: Large Language Models (LLMs) often face ethical tradeoffs in which several responses may be defensible but express different priorities, such as fairness, honesty, courage, or restraint. We introduce VirtueMap, a framework for describing these patterns through an Aristotelian virtue-ethics lens. Instead of asking for a single correct answer, VirtueMap asks humans or LLMs to rank all five responses to each of seven general, non-lethal, non-political, and non-religious ethical dilemmas. To define the reference orderings used for scoring, we first proposed, for each dilemma and virtue, an ordering of the five responses from most to least expressive of that virtue. We then collected more than 100 respondent evaluations per ordering and retained it as operational ground truth only when at least 95% confirmed it. Rankings are scored against these retained orderings using normalized Borda alignment, yielding profiles over Practical Wisdom, Justice, Truthfulness, Courage, and Temperance. We apply VirtueMap to nine LLM families in a repeated-run evaluation and find high mean rank consistency (90.3%), with the largest differences appearing on Courage, Temperance, and Justice. We also release an interactive website that computes profiles locally in the browser and compares respondents with measured LLM profiles.
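The abstract does not give the scoring formula. The Python sketch below is therefore one plausible reading of "normalized Borda alignment", not VirtueMap's code. Each position earns Borda points, and the respondent's and reference's points are multiplied item by item. The total is then rescaled so that an identical ordering scores 1 and a fully reversed one scores 0; the rearrangement inequality guarantees these are the extremes.

```python
from typing import Dict, Hashable, Sequence


def borda_points(ranking: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Borda points: the top item gets n-1, the next n-2, ..., the last 0."""
    n = len(ranking)
    return {item: n - 1 - pos for pos, item in enumerate(ranking)}


def normalized_borda_alignment(
    response: Sequence[Hashable], reference: Sequence[Hashable]
) -> float:
    """Return 1.0 for the reference order, 0.0 for its exact reverse."""
    if len(reference) < 2 or len(set(reference)) != len(reference):
        raise ValueError("reference must list at least 2 distinct items")
    if len(response) != len(reference) or set(response) != set(reference):
        raise ValueError("response must rank exactly the reference items")
    ref = borda_points(reference)
    resp = borda_points(response)
    score = sum(ref[item] * resp[item] for item in reference)
    points = sorted(ref.values())
    best = sum(p * p for p in points)  # identical order
    worst = sum(a * b for a, b in zip(points, reversed(points)))  # reversed order
    return (score - worst) / (best - worst)


# Hypothetical reference: five responses ordered from most to least courageous.
reference = ["A", "B", "C", "D", "E"]
print(normalized_borda_alignment(["B", "A", "C", "D", "E"], reference))  # 0.95
```

Averaging such scores per virtue across the seven dilemmas would yield a five-number profile of the kind the abstract describes.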
Are AI chatbots like ChatGPT politically biased? We tested them. - Washington Post
So, are chatbots politically biased? The Washington Post tested the AI models behind OpenAI’s ChatGPT, Google’s Gemini and others using political questions designed by researchers to gauge how chatbots respond to hot-button political issues.
Agentic Safety is an Epistemic Property, Not a Behavioral One
arXiv:2606.28347v1 Announce Type: new Abstract: Contemporary AI safety spans pre-training interventions, post-training alignment, deployment-time controls, monitoring, and red-teaming. These methods are necessary, but they primarily certify snapshots of system behavior. As AI systems become more capable, dynamic, embodied, and self-improving, this snapshot view becomes incomplete: safety depends not only on whether a system behaves acceptably now, but whether it remains correctable as it learns, adapts, acts, and modifies itself over time. This paper argues that safety should therefore be treated as an epistemic property of the evolving learner, not merely a behavioral property of the current policy. We introduce teachability as the capacity to preserve future corrective leverage under bounded human, institutional, or environmental intervention. We argue that advanced systems can retain visible competence while eroding the representational, algorithmic, or meta-decision conditions needed for future correction. Safe advanced AI systems must not only behave acceptably now; they must remain teachable later.
Defeat Devices in AI Systems
arXiv:2606.28863v1 Announce Type: new Abstract: AI systems increasingly exhibit behavior that differs systematically between evaluation and deployment contexts. Alignment faking, sandbagging, benchmark gaming, deceptive scheming, specification gaming, and trojans have each been documented separately, with each line of work characterizing one facet of what we argue is a single structural mechanism. We propose that this common mechanism is a defeat device, an engineering and regulatory concept long established in vehicle-emissions law and brought to broad public attention by the 2015 Volkswagen emissions case. A defeat device in an AI system has three necessary elements: a discriminator that detects evaluation context, a concealed swap that conditions behavior on detection, and a gap between eval-distribution and deployment-distribution performance on the stated evaluation criterion. We formalize this triadic test as a behavioral definition, organize documented cases along three taxonomic axes (origin, trigger, swap mechanism), propose Trigger-Axis-Aware Differential Probing (TADP) as a forensic detection protocol, and advance the claim that defeat devices can naturally emerge in current frontier AI systems without any operator engineering. We characterize naturally-emerging defeat devices as potentially one of the harmful emerging phenomena that AI safety practice should monitor and test for systematically. Implications for evaluation methodology, post-training pipeline design, interpretability research priorities, and AI governance follow.
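The three-element definition suggests a simple check for the third element: a gap between evaluation-distribution and deployment-distribution performance. The abstract does not specify TADP, so the Python sketch below is a generic stand-in, not that protocol. It runs the same items under an evaluation-flavoured framing and a deployment-flavoured framing, then tests the paired pass/fail outcomes with an exact McNemar test. The framings, item set and numbers are hypothetical.

```python
from math import comb
from typing import Sequence, Tuple


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test on discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under the null, each discordant pair is equally likely to go either way.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def differential_probe(
    eval_pass: Sequence[bool], deploy_pass: Sequence[bool]
) -> Tuple[float, float]:
    """Score the same items under evaluation-framed vs deployment-framed prompts.

    Returns (pass-rate gap, exact McNemar p-value). A positive, significant gap
    is consistent with, but not proof of, behaviour conditioned on detecting
    an evaluation context.
    """
    if len(eval_pass) != len(deploy_pass) or not eval_pass:
        raise ValueError("need paired, non-empty results")
    b = sum(1 for e, d in zip(eval_pass, deploy_pass) if e and not d)
    c = sum(1 for e, d in zip(eval_pass, deploy_pass) if d and not e)
    gap = (sum(eval_pass) - sum(deploy_pass)) / len(eval_pass)
    return gap, exact_mcnemar_p(b, c)


# Hypothetical results for 12 items: 10 pass only when framed as an evaluation.
eval_results = [True] * 12
deploy_results = [False] * 10 + [True] * 2
gap, p = differential_probe(eval_results, deploy_results)
print(f"gap={gap:.3f} p={p:.4f}")  # gap=0.833 p=0.0020
```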
From Prompting to Epistemic Proactivity: Temporal Trajectories of Student-AI Interaction in Mathematics Learning
arXiv:2606.28472v1 Announce Type: new Abstract: GenAI tools are increasingly used by students as learning companions, yet little is known about how they use these tools in open-ended learning settings, where the goal is not to complete a specific task but to improve understanding and make progress. This study examined Grade-9 students' dialogue with a general-purpose LLM during mathematics practice, in which students prepared a curriculum-aligned skill for a later assessment. We investigated whether students' interactions revealed forms of epistemically proactive AI use (trajectories in which they strategically use and regulate AI to advance their understanding) and whether these trajectories predicted immediate AI-free performance on the same skill. A total of 112 students worked with a web-based LLM tutor on a mathematical-modeling task; 97 completed both AI-free pre- and post-tests. Student turns were coded for self-regulated learning functions, help-seeking content, and mathematical-modeling activity, three dimensions hypothesized to capture epistemically proactive AI use in this task. Descriptively, students' interactions showed little explicit regulation and mostly involved procedural or conceptual questions. Static summaries of AI use, including whole-session prompt functions, request types, modeling stages, and behavioral diversity, did not predict post-test performance after controlling for prior knowledge. In contrast, temporal indicators were informative: students performed better when their interactions shifted from early to late phases toward a more epistemically proactive balance of conceptual or procedural help-seeking and mathematical work, rather than verification, answer-seeking, or validation. These findings suggest that productive AI-supported learning is better understood as a domain-specific trajectory of epistemic proactivity. We discuss implications for AI tutor design and classroom orchestration.
Four Types of LLM Reliance and Their Predictors Among Undergraduate Writers: A Mixed-Methods Study at a Minority-Serving R1 University
arXiv:2606.28749v1 Announce Type: new Abstract: Although most undergraduates now use large language models (LLMs), a form of generative artificial intelligence (GenAI), for academic writing, no validated method distinguishes the qualitatively different ways students rely on them. Existing instruments assess reliance solely by frequency of use, a measure that, as this study shows, inadvertently rewards dependence on AI rather than recognizing students' own intellectual contribution. Conducted at a public minority-serving university and grounded in the AI Literacy Framework, Expectancy-Value Theory, and Biggs's Presage-Process-Product model, the study drew on 382 undergraduates, 14 interviews, and 396 open-ended survey responses. Four distinct reliance types were identified and confirmed: Strategic (34.3%), Instrumental (30.9%), Dialogic (30.4%), and Dependent (4.5%). Students' value and cost beliefs predicted the intensity of their reliance on LLMs, whereas their AI literacy predicted the type of reliance they adopted, indicating that differentiated support is needed. Notably, Strategic users, those who engaged AI most deliberately, scored lowest on standard outcome measures. This pattern reflects a limitation of current instruments, which index AI's contribution rather than writing quality, thereby penalizing students who show the greatest independent thinking. Analysis also revealed an additional group, roughly 13%, who declined to use AI for ethical rather than practical reasons and whom existing frameworks overlook. These findings carry implications for AI literacy programs, the measurement of student learning outcomes, and equitable AI policy at minority-serving institutions.
92% of tech executives see AI management as vital work skill by 2031: KPMG - The Tribune
As the corporate landscape shifts toward automated decision-making, 92 per cent of tech executives report that managing artificial intelligence agents will become an important skill within the next five years. According to a report by KPMG, this rapid rise of agentic AI is forcing organizations ...
Council Post: Why AI Transformation Succeeds Only When Talent And Technology Evolve Together
Transformation is as much a talent development challenge as a technological one.
LLM-Ideoplasticity: Measuring Ideological Plasticity in the Political Behavior of LLMs as a Context-Conditioned Distribution
arXiv:2606.28335v1 Announce Type: new Abstract: We argue, with systematic empirical evidence, that a large language model's political ideology is not a fixed point, but a conditional distribution $\mathbb{P}(\text{position} \mid \text{context})$ over a real political space. We evaluate nine current LLMs using a unified measurement framework anchored by VAA-CHES projection models, which map responses onto three validated dimensions (lrgen, lrecon, galtan) across six contextual axes. Our findings reveal high sensitivity to context: persuasive framing and under-represented languages displace coordinates by up to 0.57 and 0.52 units, respectively, while chain-of-thought reasoning often amplifies rather than dampens paraphrase instability. Despite this local plasticity, the model cohort occupies a remarkably narrow Overton envelope overall, spanning roughly one-third the spread of major European parties. Supported by a multi-trait multi-method (MTMM) analysis, we conclude that a single point cannot summarize LLM political behavior; it must be characterized as a shape. Our code and data are publicly available at https://github.com/sakhadib/LLM-Ideoplasticity.
Use of artificial intelligence for mental health splits opinion
Two-thirds of those aged 25 to 34 have asked chatbots for wellbeing support
Technology & Infrastructure
An AI agent for treatment reasoning over a biomedical tool universe
arXiv:2606.28692v1 Announce Type: new Abstract: Treatment reasoning underpins every therapeutic decision, integrating disease context, comorbidities, medications, contraindications, and evolving biomedical knowledge to select an appropriate therapy. It is inherently iterative: candidates are weighed against many constraints, revised as evidence emerges, and grounded in verifiable sources. Here we introduce ATHENA-R1, an AI agent for treatment reasoning across all FDA approved drugs since 1939, trained by reinforcement learning over a universe of 212 biomedical tools. At each step it identifies missing information, selects and runs relevant tools, and incorporates the evidence. To train it without human-annotated traces, we build a two-level self-learning framework: multi-agent systems construct the tools, tasks, and reasoning trajectories for supervised fine-tuning, then reinforcement learning with scientific feedback rewards reasoning quality (evidence gathering, grounded tool use, logical non-redundancy). Across five benchmarks of 3,168 drug reasoning tasks and 456 patient treatment cases, ATHENA-R1 outperforms language models and tool-use systems, reaching 94.7% accuracy on open-ended drug reasoning and 82.9% on treatment reasoning, 17.8 and 10.7 points above GPT-5. In blinded evaluations by experts from 28 rare disease organizations, it is preferred over reference models on all criteria, and physicians rated it favorably on complex hospitalized cardiovascular and infectious-disease cases. Adverse-event hypotheses it generated, tested in electronic health records from 5.4 million patients, reached adjusted odds ratios of 1.48-1.84, with no elevation among negative controls. Because it requires knowing what evidence to seek before concluding, treatment reasoning has long been hard for AI; we show it can be reframed as a learnable process of iterative evidence gathering that reinforcement learning can train AI to perform.
Agentic Abstention: Do Agents Know When to Stop Instead of Act?
arXiv:2606.28733v1 Announce Type: new Abstract: LLM agents are expected to act over multiple turns, using search, browsing interfaces, and terminal tools to complete user goals. Yet not every goal is well specified or achievable in the available environment. In such cases, a reliable agent should recognize that further interaction is unlikely to help and abstain from additional tool calls. We define Agentic Abstention, the problem of deciding when an agent should stop acting under uncertainty. Unlike standard LLM abstention, which is usually evaluated as a single-turn answer-or-abstain decision, agentic abstention is a sequential decision problem: an agent can answer, abstain, or gather more information at each turn, and the need to abstain may only become clear after interacting with the environment. We study this problem across web shopping, terminal environments, and question answering, evaluating 13 LLM-as-agent systems and 2 agent scaffolds on more than 28,000 tasks. Our results show that the main challenge is not only whether agents can abstain, but also when they abstain. Some agents never abstain when they should, while others do so only after many unnecessary interactions. This gap is especially large on tasks where the instruction appears feasible until the environment reveals otherwise (e.g., no valid result matches the instruction). We further find that model scale, reasoning, and agent scaffolding affect abstention in different ways, where larger or more capable models sometimes perform worse at timely abstention. Finally, we introduce CONVOLVE, a context engineering method for improving agentic abstention that distills full interaction trajectories into reusable stopping rules. On WebShop, CONVOLVE substantially improves timely abstention without updating model parameters, raising Llama-3.3-70B's timely recall rate from 26.7 to 57.4. Our dataset and code are available at https://lhannnn.github.io/agentic-abstention
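Framed as a sequential decision, each turn offers three moves: answer, abstain, or gather more information. The Python sketch below shows that loop with a hand-written stopping rule of the kind CONVOLVE is described as distilling from trajectories. The rule itself, the `patience` parameter and the toy environments are hypothetical, not the paper's method.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple

Observation = List[str]  # e.g. search results returned by the environment


class Decision(Enum):
    ANSWER = "answer"
    ABSTAIN = "abstain"
    GATHER = "gather"


Policy = Callable[[List[Observation]], Tuple[Decision, Optional[str]]]


def run_episode(
    policy: Policy, environment: Callable[[str], Observation], max_turns: int
) -> Tuple[Decision, Optional[str], int]:
    """At each turn the policy answers, abstains, or gathers more information."""
    history: List[Observation] = []
    for turn in range(1, max_turns + 1):
        decision, payload = policy(history)
        if decision is Decision.GATHER:
            history.append(environment(payload or ""))
            continue
        return decision, payload, turn
    # Budget exhausted without a decision: treat as a (late) abstention.
    return Decision.ABSTAIN, None, max_turns


def make_stopping_rule_policy(patience: int) -> Policy:
    """Hypothetical rule: answer with the first result seen; abstain after
    `patience` consecutive empty observations; otherwise keep searching."""
    def policy(history: List[Observation]) -> Tuple[Decision, Optional[str]]:
        if history and history[-1]:
            return Decision.ANSWER, history[-1][0]
        empty_streak = 0
        for obs in reversed(history):
            if obs:
                break
            empty_streak += 1
        if empty_streak >= patience:
            return Decision.ABSTAIN, None
        return Decision.GATHER, "query"
    return policy


policy = make_stopping_rule_policy(patience=2)
print(run_episode(policy, lambda q: [], max_turns=10))           # (<Decision.ABSTAIN: 'abstain'>, None, 3)
print(run_episode(policy, lambda q: ["item-42"], max_turns=10))  # (<Decision.ANSWER: 'answer'>, 'item-42', 2)
```

Timeliness shows up in the turn count. On the infeasible environment, the rule stops after two empty searches instead of spending the whole budget.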
Why enterprises aren't ready for AI Agents yet
I’ve noticed that while founders ... themselves: organisations themselves have lower risk-taking capacity. Still, enterprise adoption of AI agents remains slow, and three talks at SuperAI, from Snowflake, Alibaba Cloud and Sierra, highlighted some challenges and offered some ...
AI speeds the march of China’s factory robots into new sectors
Artificial intelligence is enabling the spread of automation to traditional industries
Recursive Self-Evolving Agents via Held-Out Selection
arXiv:2606.28374v1 Announce Type: new Abstract: LLM agents are increasingly improved without weight updates by evolving a natural-language artifact, such as reflections, workflows, playbooks, cheatsheets, or optimized prompts, that conditions a frozen policy. Such methods are typically reported as wins on the single benchmark where they help. We study them apples-to-apples and surface a sharper picture. We introduce RSEA, a Recursive Self-Evolving Agent that carries a compact three-layer natural-language state: an imperative strategy, reusable skills, and a procedural playbook. Across generations, RSEA rewrites all three layers from its own trajectories and commits a candidate only if it does not regress on a disjoint held-out split, using a strict keep-better gate. Across four diverse benchmarks, ALFWorld, GAIA, τ-bench, and WebShop, and six faithful baselines, ReAct, Reflexion, GEPA, AWM, ACE, and Dynamic Cheatsheet, all evaluated on one shared local backbone, we find three main results. First, no artifact universally wins. RSEA is the strongest single-pass method on ALFWorld, reaching 69.3% compared with 64.6% for ReAct (McNemar p = 0.015), and reaches 79.4% with retry, the best overall result. However, concrete-workflow induction, represented by AWM, is best on the strong-backbone tool-use tasks. Second, unguarded context evolution is high-variance and unsafe. Dynamic Cheatsheet, which curates context online without a held-out gate, is near-best on ALFWorld at 70.7%, yet collapses on WebShop, with a score of 0.14 compared with 0.43 for ReAct. Third, RSEA's strict held-out selection is what makes recursive self-evolution monotone-safe: it never significantly underperforms the base agent on any benchmark and falls back to vanilla ReAct when evolved context would hurt.
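The held-out gate is the part of RSEA that makes the loop safe, and its control flow is simple enough to sketch. In the minimal sketch below, the state is any artifact, `evolve_with_gate`, `propose` and `heldout_score` are hypothetical names standing in for the LLM rewrite step and the held-out evaluation, and reading "keep-better" as a strict inequality is an assumption. This is not the authors' code.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def evolve_with_gate(base: State,
                     propose: Callable[[State], State],
                     heldout_score: Callable[[State], float],
                     generations: int) -> State:
    """Commit a rewritten artifact only if it strictly beats the incumbent on a held-out split."""
    current, current_score = base, heldout_score(base)
    for _ in range(generations):
        # Hypothetical rewrite step, e.g. regenerating strategy, skills and playbook
        # from the agent's own trajectories on the training split.
        candidate = propose(current)
        # Evaluated on tasks disjoint from those used to propose the candidate.
        score = heldout_score(candidate)
        # Strict keep-better gate: ties and regressions are discarded.
        if score > current_score:
            current, current_score = candidate, score
    # If no candidate ever wins, this is still the base artifact.
    return current


if __name__ == "__main__":
    proposals = iter([5, 3, 8, 8])  # toy "artifacts" whose held-out score is their value
    print(evolve_with_gate(4, lambda s: next(proposals), float, generations=4))
```

Because the incumbent changes only on a strict improvement, the returned artifact never scores below the base on the held-out split, which is the fallback-to-vanilla-ReAct behaviour the abstract describes.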
GPTNT: Benchmarking Real-Time Collaboration Between Multimodal Agents on Keep Talking And Nobody Explodes
arXiv:2606.28514v1 Announce Type: new Abstract: Multimodal models are increasingly deployed to solve tasks collaboratively with humans or other artificial agents. Existing benchmarks show that these models possess many of the required component capabilities, but the conditions that coincide in collaboration, including time pressure, information asymmetry, and imperfect communication, are usually studied in isolation. We introduce GPTNT, a benchmark built on the cooperative video game Keep Talking and Nobody Explodes, in which two agents must coordinate to defuse procedurally generated bomb puzzles against a live countdown. One agent can see and manipulate the bomb but does not have the defusal instructions; the other has the instructions but cannot see or manipulate the bomb. Neither agent can succeed alone: success requires effective and efficient communication. Unlike turn-based proxies, GPTNT requires agents to act asynchronously and communicate in real time. GPTNT is designed to separate collaboration from reliance on memorized solutions: the instruction manual, the partner, or both can be withheld to isolate what a model derives in the moment from what it already knows. We show that GPTNT poses a substantial challenge for state-of-the-art systems: none of the closed- or open-source models we test defuses a single bomb in real time, a bar that human players clear. Through controlled experiments, we identify critical weaknesses in state tracking, efficient action under time pressure, ambiguity handling, and error recovery. We release GPTNT as a benchmark for collaborative performance that current evaluations leave unmeasured. Because it runs on the real game, GPTNT benefits from procedural generation and inherits a living modding community, allowing the benchmark to evolve as models improve rather than being solved once and retired.
How AI Agents Transform Industrial Operations - Thought Leadership
This concept, often called physical ... AI agents in robotics or other physical systems, making physical systems more intelligent and adaptive in dynamic real-world environments. Rather than large language models, physical AI is built on vision-language-action (VLA) models that process visual, textual and action data to perceive surroundings, receive instruction and execute actions. This capability opens possibilities for automating tasks that ...
Using Local Coding Agents
A practical guide to building and running local AI coding agents using open-source models, allowing organizations to control their own infrastructure and workflows.
Tail Control: The Counterintuitive Engineering of Reliable Agentic Workflows
Delivering consistent, on-time AI responses is a challenge of variance rather than speed. The fixes for reliable agentic workflows are often counterintuitive.
South Korea to Speed Nuclear Push to Meet Surging Power Demand
The South Korean government will consider ways to cut atomic power construction times as it looks to ramp up energy supply to meet the demands of artificial intelligence.
‘We’re up against forces that have all the money in the world’: Erin Brockovich on her battle against AI datacentres
In 1993, she squeezed a $333m settlement from a Californian energy company in a scandal over contaminated water. Three decades later, she has a new target in her sights – and it’s global. When Erin Brockovich woke to find 30 emails from people from the same town, she realised something was going on. People email Brockovich all the time because of what happened in 1993, when she was instrumental in suing Pacific Gas and Electric Company (PG&E) on behalf of residents of the town of Hinkley, California, whose groundwater had been contaminated. The case resulted in a settlement of $333m – then the largest ever payout for a direct-action lawsuit. When she was immortalised by Julia Roberts in the 2000 film Erin Brockovich, she became the hero we didn’t know we needed, a modern-day Joan of Arc. She had won against PG&E with no formal legal training. The emails she received a few weeks ago were about datacentres. In April, she put a callout on her website asking for anyone with concerns about one near them to get in touch. Within a month, 3,862 people had replied. Tech companies have needed datacentres to power their technology “for ever”, she says, but the new ones being built to power AI? “This feels like Hinkley on steroids.”
Energy providers are flying blind thanks to unpredictable AI data center demands | IT Pro
Research from Capgemini has found that uncertainty, speed constraints, and rising system complexity are leaving firms struggling to predict future consumption
AI Energy Consumption Is Raising Household Bills: Efficiency Gains Cannot Reverse It
U.S. data centers consumed 176 terawatt-hours of electricity in 2023, representing 4.4% of national grid consumption, according to the Department of Energy's Lawrence Berkeley National Laboratory, as cited in Belfer Center research. That share is projected to reach 6.7% to 12% by 2028 as AI infrastructure ...
Data centers are ready to negotiate flexibility for speed | Construction Dive
Hyperscalers want their data centers online and utilities want to provide interconnections, but experts say both are still looking for common operating guidelines.
The United States accelerates in data centers: AI turns energy into the new bottleneck | Cloud News
The industry must better articulate its value, share infrastructure costs, and demonstrate flexible, efficient operations. Energy flexibility will be a major topic in the coming years. Not all AI workloads are equally urgent. Some loads can be shifted over time or between regions. If data centers learn to reduce consumption ...
Grid Woes Push UK Clean Power Goal Back Five Years, Report Shows
The UK will probably miss its clean electricity target by five years because of capacity constraints on its grid and could struggle to deliver on promises to cut household energy bills, according to consultant LCP Delta.
Spain’s Openchip lands €115 million SETT investment to strengthen Europe’s semiconductor capabilities
Openchip, a Barcelona-based scale-up specialising in the design and marketing of high-performance, energy-efficient chips, has secured an investment of €115 million to accelerate the design of high-performance, energy-efficient chips for AI and high-performance computing (HPC) applications. The backing came from the Spanish Society for Technological Transformation (SETT), which reports to the Ministry for Digital Transformation […]
AI super cycle lifts semiconductors, but China chip group warns of a distorted boom
AI has pushed the global semiconductor industry into a new
TSMC stock gains as supply chain expansion with Winbond boosts AI hardware ambitions
Viktoras Karapetjanc, Traders Union ... its economic leadership in the semiconductor sector. He notes that expanding partnerships and resilience in supply chains support the company’s strategic edge, especially amid rising demand for AI hardware....
Intel Shares Tumble Nearly 6% as Chip Sector Faces Renewed Pressure
Intel's stock fell 5.9% amid challenges in the AI hardware sector, with investors cautious about its turnaround efforts and competitive positioning.
Taiwan Raids Super Micro in Widening China Chip Smuggling Probe
Taiwan government agencies raided the offices of Super Micro Computer Inc. and several of its local affiliates, deepening an investigation into the alleged smuggling of Nvidia Corp. chips into China using the company’s servers.
Huawei Leads Charge as Nvidia's China Dominance Fades Amid AI Chip Market Shakeup
Nvidia's grip on China's AI market loosens as Huawei and local chipmakers surge amid U.S. export restrictions and China's drive for tech self-sufficiency.
Nvidia's AI chip sales stall in China as domestic firms take the lead - Fast Company
According to industry analysts, Huawei's most advanced AI chips are comparable to Nvidia's H200 series.
Nvidia's AI chip sales in China stall, as local chipmakers like Huawei take the lead - ABC News
In the race between the U.S. and China to develop artificial intelligence, the battle over hardware and computing power is heating up as Chinese companies like Huawei overtake global industry leaders like Nvidia in their home market
Making AI Work: Nvidia Faces Declining AI Chip Sales in China as Huawei Dominates Market, ETEnterpriseai
Making AI Work: Nvidia's AI chip sales are struggling in China, as local companies like Huawei lead the market amid U.S. export restrictions and a shift toward domestic technology.
China rewrites the AI chip playbook
Huawei emerges as the biggest winner as Nvidia loses market share
Super Micro stock falls after Taiwan raids expand Nvidia chip probe
Such changes would provide prosecutors with broader legal authority to pursue cases involving the illicit trade of AI hardware. The proposed measures come as Taiwan remains central to the global semiconductor supply chain, with both Nvidia and Advanced Micro Devices relying on Taiwan Semiconductor ...
IBM’s New Chip Could Vastly Reduce AI Energy Use
Using seaweed to 3D-print buildings. AI companies are staring down the laws of economics. Why investors should pay attention to nature.
OpenAI is teasing new hardware for Codex
OpenAI is hinting at upcoming hardware developments specifically designed for its Codex coding model.
AI Infrastructure Spending Creates New Wave of Semiconductor Ecosystem Winners | Markets Insider
This move reflects the company’s ... the hardware and infrastructure that power today's rapidly expanding AI ecosystem, including NVIDIA Corporation (NASDAQ: NVDA), Advanced Micro Devices Inc. (NASDAQ: AMD), Broadcom Inc. (NASDAQ: AVGO) and Super Micro Computer Inc. (NASDAQ: SMCI). The migration of Taiwan's semiconductor supply chain into the United ...
South Korea plans massive AI and chip investment drive worth up to $648 billion
South Korea's President Lee to announce $648 billion AI and semiconductor investment drive led by Samsung and SK Hynix, targeting chips, data centers, and
Samsung, SK Group To Unveil Investment Plan Worth Hundreds Of Billions To Boost AI And Chip Production | IBTimes
Google Explores Samsung Role In Next-Generation AI Chip As Demand Reshapes Global Semiconductor Supply Chain · Samsung Electronics and SK Hynix have become central suppliers in the AI hardware market as demand for high-bandwidth memory (HBM) chips continues to rise.
South Korea unveils three mega-projects to drive AI-era industrial growth
South Korea has unveiled a state-backed industrial strategy centered on semiconductors, physical AI, and AI data centers, with President Lee Jae Myung pledging direct oversight.
Supermicro Taiwan offices raided in chip smuggling probe
Server maker’s shares fell about 8% after news of the investigation
Zuck saves Meta bucks by reusing memory from old servers with a custom CXL ASIC
In production on millions of boxes and the payoff is a 25% reduction in machines needed for some inference workloads.
AI spending boom accelerates as Big Tech pours trillions into infrastructure | Fortune
JPMorgan raised its AI capital spending forecast through 2030 as companies continue to expand AI infrastructure.
AI data centers face heatwave and severe weather climate risks
Heatwaves and severe weather are raising risks for AI data centers, from grid strain to higher insurance and repair costs.
China's five-year energy plan backs AI integration, green power for data centers
China's 15th Five-Year Plan aims to expand the 'AI Plus' initiative by integrating computing infrastructure with energy planning to support data center electricity demand.
1.5GW data center campus proposed in Devon, UK
The company behind a massive planned power cable to the UK is pivoting to plans for a large data center campus in Devon. Announced this week, Xlinks has proposed a large AI data and energy storage development at Alverdiscott, Devon. The campus comprises two separate planning proposals submitted to Torridge District Council: a […]
~150MW data center planned in Norfolk, UK
A data center could be built in Norfolk, UK. First reported by the Norwich Evening News, Norwich Apex Ltd aims to develop a two-story building, substation, and a three-story office block on 13 hectares of land off Ipswich Road in Norwich. Norwich Apex secured planning permission to develop the Apex Business Park […]
Elea announces AI data center in the Amazon region of Brazil
Elea Data Centers and energy firm AXIA Energia are set to develop an AI data center in the Brazilian Amazon. Set to be located in Belém, Pará, the BEL1 facility is scheduled to begin operations in the second quarter of 2027. It will have an initial capacity of 7.5MW, with the potential to expand up […]
These 7 Stocks Will Solve AI’s Most Important Bottleneck
Artificial intelligence has no shortage of obstacles. The industry is scrambling to secure enough electricity to power new data centers, enough land to build them, and enough high-bandwidth memory (HBM) to keep next-generation chips fed with data. Yet another constraint is emerging that could ...
Previewing GPT-5.6 Sol: a Next-generation Model
OpenAI introduced the GPT-5.6 family, including Sol, Terra, and Luna, with improved capabilities while limiting access to a small group of trusted partners.
Nous Research ships gated model blends that beat Claude and GPT-5.5 on benchmarks
Hermes Agent introduced Mixture of Agents (MoA), which allows multiple models to collaborate on a single query. Internal benchmarks show this approach outperforms standalone models like Claude Opus and GPT-5.5.
The Human-Machine Knowledge Spiral
arXiv:2606.29227v1 Announce Type: new Abstract: Nonaka emphasized that innovation is the result of a continuous back-and-forth between tacit and explicit knowledge. Artificial intelligence introduces a fundamentally new object into this process -- tacit machine knowledge -- but Nonaka's ideas are more relevant than ever. The central role of the knowledge-creating company remains the same: to create the shared context in which different kinds of knowledge can feed off each other, become organizational knowledge, and set off further cycles of innovation.
DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%
Even as the geopolitical conversation around AI continues to grow more fraught following the U.S. government's actions to limit the new models from Anthropic and OpenAI, Chinese open-source darling DeepSeek is back with yet another open release that could once again change AI development around the globe. Over the weekend, the firm released DSpark, a new, MIT-licensed system designed to make large language models answer faster without changing what the underlying model is trying to say. The easiest way to think about it is this: most AI chatbots write like someone crossing a river one stepping stone at a time. They choose one small chunk of text, then the next, then the next. DSpark gives the system a scout that runs a few steps ahead, guesses the likely path, and lets the larger model quickly check which steps are safe. When the guesses are good, the model moves faster. When the guesses are weak, DSpark tries not to waste time checking them. DeepSeek published the work with a technical paper, model checkpoints and DeepSpec, a codebase for training and evaluating speculative decoding systems. The release is available through DeepSeek’s public GitHub and Hugging Face pages under the permissive MIT license, making the new technique broadly usable by developers, researchers and commercial enterprises that want to study or adapt the approach. The system is aimed at one of the most expensive problems in AI deployment: serving large models quickly enough for real users, while using hardware efficiently enough to make the economics work. That matters for consumer chatbots, coding assistants, agentic workflows and enterprise AI systems where users expect long answers to stream quickly rather than crawl out word by word. DeepSeek is applying DSpark to its own latest frontier open model, DeepSeek-V4.
Specifically, DeepSeek used its new DSpark framework on DeepSeek-V4-Flash, its already speed-optimized 284-billion-parameter mixture-of-experts model with 13 billion active parameters, and DeepSeek-V4-Pro, its larger and more capable 1.6-trillion-parameter model with 49 billion active parameters (both support context windows of up to one million tokens). But the broader significance is that DSpark is not conceptually limited to DeepSeek-V4. DeepSeek’s own tests and released checkpoints cover other open model families, including Alibaba's open-weight Qwen and Google's open-weight Gemma. That means enterprise teams running open-weight models could, in principle, train or fine-tune DSpark-style draft modules for their own target models. It is not a switch that any API customer can flip from the outside, but it is a method that can travel to other models when the operator controls the weights and serving stack.
Staggering speed increases for generating tokens during inference
In DeepSeek’s live production tests, DSpark improved aggregate throughput by 51% for DeepSeek-V4-Flash at an 80-token-per-second-per-user service target, and by 52% for DeepSeek-V4-Pro at a 35-token-per-second-per-user target. At matched system capacity, DeepSeek reports per-user generation speedups of 60% to 85% for V4-Flash and 57% to 78% for V4-Pro over its prior MTP-1 production baseline. The different speed claims measure different things. The 60% to 85% figure for V4-Flash, and the 57% to 78% figure for V4-Pro, describe how much faster individual users receive generated tokens when DeepSeek compares DSpark with MTP-1 at matched practical system capacity. Those are the cleaner “generation speed” numbers. DeepSeek also reports much larger 661% and 406% increases, but these measure aggregate throughput under very strict speed targets: 120 tokens per second per user for V4-Flash and 50 tokens per second per user for V4-Pro.
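A back-of-the-envelope relation, offered as an illustration rather than DeepSeek's own accounting, shows why the 661% and 406% throughput figures can dwarf the per-user speedups. It assumes every concurrent stream is served at exactly the per-user target r (tokens per second per user); N_D(r) and N_M(r) are symbols introduced here for the number of streams DSpark and the MTP-1 baseline can each sustain at that target.

```latex
\[
T(r) = N(r)\,r
\quad\Longrightarrow\quad
\frac{T_{\mathrm{D}}(r)}{T_{\mathrm{M}}(r)} = \frac{N_{\mathrm{D}}(r)}{N_{\mathrm{M}}(r)},
\qquad
\underbrace{1 + 6.61 = 7.61}_{\text{V4-Flash},\; r = 120},
\qquad
\underbrace{1 + 4.06 = 5.06}_{\text{V4-Pro},\; r = 50}
\]
```

Under that assumption, a 661% gain means DSpark sustains roughly 7.6 times as many concurrent streams at 120 tokens per second per user. Because N_M(r) shrinks toward zero as the baseline nears its operational cliff, the ratio can become very large even when each individual stream is only modestly faster.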
At those targets, DeepSeek says its older MTP-1 baseline approaches an operational cliff, meaning it can keep only a small number of concurrent requests running while preserving that level of responsiveness. DSpark avoids more of that collapse, so the percentage difference in total system output becomes much larger. Put simply: the 85% number is closer to “how much faster the ride feels for a user” under comparable conditions, while the 661% and 406% figures are closer to “how much more traffic the road can still carry” when the old system is already bottlenecking.
Why speculative decoding matters
LLMs usually generate text one token at a time. A token can be a word, part of a word, a punctuation mark or other small piece of text. Every new token depends on the text already produced, so the model has to keep pausing, checking the full context and choosing the next piece. That is accurate, but slow. It is like having a senior editor approve every word before a writer can move to the next one. The editor may be excellent, but the process creates a bottleneck. Speculative decoding, developed in the early Transformer era, tries to fix that bottleneck. Instead of asking the large model to produce every token one by one, the system uses a smaller or lighter draft component to suggest several likely next tokens. The large model then checks that batch of guesses in parallel. If the draft guessed correctly, the system moves ahead several tokens at once. If the draft made a bad guess, the system rejects the bad token and anything after it, adds a corrected token, and tries again. The point is speed without changing the larger model’s intended output. In the standard speculative decoding setup, the draft model is not replacing the target model. It is acting more like an assistant who prepares a rough next sentence for the senior editor to approve or reject. The idea did not appear out of nowhere with today’s large language models.
A key precursor came in 2018, when Mitchell Stern, Noam Shazeer and Jakob Uszkoreit proposed blockwise parallel decoding for deep autoregressive models. Their method predicted multiple future steps in parallel, then kept the longest prefix validated by the main model. That paper established much of the draft-and-check intuition behind later speculative decoding work. The research line became more explicit in 2022. Heming Xia, Tao Ge and co-authors introduced SpecDec, a draft-and-verify approach for sequence-to-sequence generation. Later that year, Yaniv Leviathan, Matan Kalman and Yossi Matias posted “Fast Inference from Transformers via Speculative Decoding,” which helped define the modern version of the technique for transformer-based language models. DeepMind researchers followed in 2023 with a closely related method called speculative sampling. Those 2022 and 2023 papers are the clearest ancestors of how speculative decoding is discussed in current LLM inference work: a faster draft process proposes tokens, and the larger target model verifies them in a way designed to preserve the target model’s output distribution. Since then, the field has moved quickly through several variants, including separate draft models, multi-token prediction heads, tree-based verification, feature-level methods such as EAGLE, self-speculation, Medusa-style extra heads and parallel/blockwise drafters such as DFlash. The key metric is not how many tokens a draft model can guess. It is how many of those guesses the larger model actually accepts. Long speculative blocks help only if enough of the proposed tokens survive verification. Otherwise, the system spends compute checking guesses that it throws away. That is the context for DSpark. Speculative decoding was already an established inference technique before DeepSeek’s release, with support in major serving stacks and multiple competing research approaches. But it is still not a solved problem.
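The draft-and-verify loop, and why acceptance rather than block length is the metric that matters, can be made concrete with a toy sketch. It assumes greedy (argmax) decoding, where verification reduces to checking whether each drafted token equals the target model's own next-token choice. The `toy_target` and `toy_draft` functions are hypothetical stand-ins for real models, the one-at-a-time target calls stand in for what a real system does in a single batched forward pass, and nothing here is DeepSeek's DSpark or DeepSpec code. The last function uses the simplified expected-length formula from Leviathan et al.'s analysis, which assumes each draft token is accepted independently with probability `alpha`.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical stand-in for a model: maps a context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[Token],
                       max_new: int, k: int) -> Tuple[List[Token], List[int]]:
    """Greedy draft-and-verify; returns generated tokens and accepted drafts per round."""
    out = list(prompt)
    accepted_per_round: List[int] = []
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        ctx = list(out)
        proposal: List[Token] = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks each drafted position (one batched pass in a real system).
        ctx = list(out)
        accepted = 0
        for tok in proposal:
            if target(ctx) != tok:
                break  # reject this token and everything after it
            ctx.append(tok)
            accepted += 1
        # 3. Keep the accepted prefix, then add one token chosen by the target:
        #    a correction after a rejection, or a bonus token if all k were accepted.
        out = ctx
        out.append(target(out))
        accepted_per_round.append(accepted)
    return out[len(prompt):len(prompt) + max_new], accepted_per_round


def expected_tokens_per_round(alpha: float, gamma: int) -> float:
    """Expected tokens per round (accepted drafts + 1) for i.i.d. acceptance rate alpha."""
    if alpha == 1.0:
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 3) % 10  # toy "large" model


def toy_draft(ctx: List[Token]) -> Token:
    return 9 if ctx[-1] == 7 else (ctx[-1] + 3) % 10  # toy drafter, wrong after a 7


if __name__ == "__main__":
    fast, rounds = speculative_decode(toy_target, toy_draft, [1], max_new=10, k=4)
    assert fast == greedy_decode(toy_target, [1], max_new=10)  # same output, fewer rounds
    print(fast, rounds)
    print(expected_tokens_per_round(0.5, 4), expected_tokens_per_round(0.5, 8))
```

In this toy run the output matches plain greedy decoding exactly while needing 3 target rounds instead of 10. With `alpha = 0.5`, doubling the draft block from 4 to 8 tokens raises expected tokens per round only from about 1.94 to about 2.00, which illustrates why long blocks pay off only when acceptance is high.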
Speedups depend heavily on the draft model, the workload, the serving setup and the current traffic level. DSpark’s contribution is to improve both sides of the trade-off: it tries to draft more coherent token blocks and then verify only the parts of those blocks that are likely to pay off under real serving conditions.
What DSpark changes
DSpark tackles two related problems: bad guesses and wasted checking. First, the system uses what DeepSeek calls semi-autoregressive generation. In plain English, that means DSpark tries to combine speed with a bit more awareness of sequence. A fully parallel drafter can guess several tokens at once, which is fast, but its later guesses can become less coherent because each position is predicted too independently. A purely step-by-step drafter can keep better track of how one token leads to the next, but it loses much of the speed advantage. DSpark tries to keep the best of both. It uses a parallel backbone for most of the drafting work, then adds a lightweight sequential head that lets the draft take nearby token relationships into account. In the paper’s example, a parallel drafter might confuse likely phrase endings such as “of course” and “no problem,” producing awkward combinations because it is guessing positions too separately. DSpark’s sequential component helps the system make the later tokens fit the earlier ones. Second, DSpark adds confidence-scheduled verification. Rather than always asking the target model to check the same number of draft tokens, DSpark estimates which prefix of the draft is likely to survive. A hardware-aware scheduler then adjusts how much of each draft should be verified based on both model confidence and current serving load. A simple analogy: when a restaurant is quiet, the head chef can inspect more of the prep cook’s work. When the kitchen is slammed, the chef spends attention only on the dishes most likely to be ready. DSpark applies a similar idea to AI serving.
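A toy version of confidence-scheduled verification helps show how confidence and load can jointly set the verification budget. This is a minimal sketch under stated assumptions, not DeepSeek's hardware-aware scheduler: the `verify_length` name, its parameters and the linear cost model are all hypothetical. The rule treats the running product of per-token confidences as the chance that a draft prefix survives, and keeps verifying while that chance still exceeds an assumed marginal cost per verification slot that rises with serving load.

```python
from typing import Sequence


def verify_length(confidences: Sequence[float], load: float,
                  idle_cost: float = 0.2, busy_cost: float = 0.8) -> int:
    """Hypothetical rule: how many leading draft tokens to send to the target for checking.

    confidences[i] is the drafter's estimate that token i is accepted given that
    tokens 0..i-1 were accepted; load is the serving load scaled to [0, 1].
    """
    # Assumed marginal cost of one more verification slot, rising linearly with load.
    cost = idle_cost + (busy_cost - idle_cost) * min(max(load, 0.0), 1.0)
    survival = 1.0  # probability that the whole prefix so far survives verification
    n = 0
    for p in confidences:
        survival *= p
        if survival < cost:  # expected benefit of this slot no longer covers its cost
            break
        n += 1
    return n


if __name__ == "__main__":
    draft_confidences = [0.9, 0.9, 0.8, 0.5, 0.5]
    for load in (0.0, 0.5, 1.0):
        print(load, verify_length(draft_confidences, load))
```

With these illustrative confidences the sketch verifies 4 draft tokens when the server is idle, 3 at half load and 2 at full load, trimming the low-confidence tail first as traffic rises.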
Under lighter traffic, the system can afford to check longer draft prefixes. Under heavier traffic, it trims low-confidence trailing guesses before they consume batch capacity that could be used for other users. DeepSeek frames this as an answer to a common production trade-off. Static multi-token drafting can look attractive in isolation, but can hurt throughput under high concurrency because the system keeps checking tokens that are likely to be rejected. DSpark’s scheduler makes the verification budget flexible instead of fixed.
Offline results: better draft acceptance across Qwen and Gemma
DeepSeek tested DSpark offline on Qwen3-4B, Qwen3-8B, Qwen3-14B and Gemma4-12B target models across math, coding and chat benchmarks. In those tests, the team compared DSpark with DFlash, a parallel drafter, and Eagle3, an autoregressive drafter. The paper reports accepted length per decoding round, a measure of how many tokens survive verification on average. Across the three Qwen3 model sizes, DSpark improved macro-average accepted length over Eagle3 by 30.9%, 26.7% and 30.0%, respectively. Compared with DFlash, it improved accepted length by 16.3%, 18.4% and 18.3%. The paper also says the gains generalized to Gemma4-12B. That supports a point raised by developer Daniel Han, who highlighted on X that DeepSeek showed DSpark working beyond DeepSeek’s own V4 models, including Gemma and Qwen. Han's post is community reaction rather than independent evidence; the stronger support comes from DeepSeek’s own benchmarks and released checkpoints. The offline results also show why workload matters. Structured tasks such as math and code tend to have higher accepted lengths than open-ended chat. That makes intuitive sense: a code completion or math step often has fewer reasonable next moves than a free-form conversation.
For enterprises, this means DSpark-style methods may be especially attractive for coding assistants, data analysis agents, structured workflow automation and other settings where outputs follow more predictable patterns.
How enterprises could use DSpark without DeepSeek-V4
One of the most important questions is whether DSpark is a DeepSeek-only optimization or a broader method that can be applied to other models. The answer: a broader method, but not an automatic plug-in. For open-weight models, the path is relatively clear. An enterprise running Qwen, Gemma, Llama, Mistral, Granite, Command-style open weights or another model it hosts itself could train or fine-tune a DSpark-style draft module against that target model. The team would then measure acceptance on its own workloads and integrate the verification scheduler into its inference stack. That is different from simply downloading DeepSeek’s DSpark module and attaching it to any model. Speculative decoding depends on alignment between the draft module and the target model. The draft has to learn what the target model is likely to accept. A drafter trained for DeepSeek-V4 will not automatically be the right drafter for a different model, especially one fine-tuned on a company’s internal data or configured for different reasoning behavior. DeepSpec’s workflow reflects this. The process involves preparing data, regenerating target-model answers, building a target cache, training the draft model and evaluating speculative-decoding acceptance. For domain-specific use, the draft model may need additional fine-tuning, especially if the target model runs in a thinking or reasoning mode. For proprietary models, the answer depends on what the enterprise controls. If a company owns or fully hosts the model weights and serving stack, it could theoretically train and deploy a DSpark-style drafter. If the model is available only through a hosted API from a vendor, the customer cannot directly add DSpark from the outside.
The API provider could implement a similar optimization internally, but the customer generally cannot access the token verification loop, logits, batching behavior or serving scheduler needed to make DSpark work. That distinction matters for enterprise buyers. DSpark strengthens the case for open or self-hosted AI infrastructure because it gives advanced teams another lever to improve speed and cost. But it also shows why model serving is becoming a specialized discipline. The value is not just in picking a model, but in how intelligently that model is run.
What developers get from DeepSpec
For developers, DeepSpec gives a concrete implementation path for training and evaluating speculative decoding draft models. It includes data preparation, training and benchmark evaluation steps, along with released checkpoints for several open model families. That makes the release useful not only for running DeepSeek-V4 with DSpark, but also for researchers and infrastructure teams studying how to add faster decoding to other open models. There are real deployment caveats. DeepSpec’s own README says the default Qwen3-4B data preparation setup can require roughly 38 TB of target cache storage, and the default scripts assume a single node with eight GPUs. That makes the release more immediately relevant to AI labs, cloud teams and sophisticated enterprise AI infrastructure groups than to ordinary application developers. Still, releasing the training pipeline matters. Many inference optimizations appear only as papers, vague benchmarks or closed production claims. DeepSpec gives developers something closer to a set of blueprints: not a finished enterprise product, but a way to reproduce, adapt and evaluate the method.
Early community testing
The release has already drawn fast developer attention.
Developer Rafael Caricio published a GitHub pull request documenting single-stream DeepSeek-V4-Flash DSpark work, reporting warmed benchmark anchors of 26.33 tokens per second without speculative decoding, 39.88 tokens per second with MTP-1, and roughly 60 tokens per second with DSpark, about 1.5x over MTP-1 and 2.3x over no-spec decoding. A later commit in the same thread recorded a five-run mean of 60.31 tokens per second, a 1.51x gain over MTP-1 and 2.29x over non-speculative decoding. The same work also points to an important practical limit: in realistic multi-turn coding sessions, performance can degrade as draft acceptance falls with growing context. In other words, DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.
That is a useful reality check. DSpark is not magic. It still depends on how predictable the next tokens are and how well the drafter stays aligned with the target model. But the early implementation work suggests DeepSeek’s claims are not purely academic. Developers are already testing the method in practical serving environments and reporting gains close to the paper’s single-stream expectations.
The bottom line
DSpark shows how much performance remains available in the inference layer, even when the underlying model architecture stays the same. As AI companies compete on model quality, context length and pricing, decoding efficiency is becoming another major battleground. Faster generation means lower latency for users, higher throughput for providers and better economics for teams serving open models at scale.
DeepSeek’s release is notable because it combines a production-tested method, open code, public checkpoints and a detailed paper. The main innovation is not just drafting more tokens. It is making the system more selective about which speculative work is worth verifying.
For enterprise teams, the broader lesson is that the next wave of AI performance gains will not come only from larger models. It will also come from smarter ways to run the models companies already have — especially when those companies control enough of the stack to tune the model, train a compatible draft module and optimize the serving engine around real workloads.
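The alignment argument above is easiest to see in the verification step itself. The following is a minimal sketch of greedy draft-then-verify decoding, assuming two toy models (the `draft` and `target` functions in the usage are hypothetical stand-ins, not DeepSpec code). It omits DSpark's selective verification scheduler and the batched verification pass a real server would use. The helper `expected_tokens_per_round` applies the common simplification that each draft token is accepted independently with probability alpha, which shows why falling acceptance erodes the gain; the last lines recheck the throughput ratios reported in the pull request.

```python
from typing import Callable, List

# Toy "model": maps a token prefix to its greedy next token (hypothetical stand-in).
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """One greedy draft-then-verify round; returns the tokens emitted this round."""
    # 1. The cheap draft proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed token. A real server scores all k
    #    positions in one batched forward pass; here we loop for clarity.
    emitted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            emitted.append(expected)  # first mismatch: keep the target's token, stop
            return emitted
        emitted.append(tok)
        ctx.append(tok)
    emitted.append(target(ctx))  # every draft token accepted: one bonus target token
    return emitted


def expected_tokens_per_round(alpha: float, k: int) -> float:
    """Expected tokens per round if each draft token is accepted independently
    with probability alpha (a simplifying i.i.d. assumption)."""
    if alpha >= 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)


# Single-stream throughput figures (tokens/s) reported in the pull request.
NO_SPEC, MTP1, DSPARK_MEAN = 26.33, 39.88, 60.31

if __name__ == "__main__":
    print(round(DSPARK_MEAN / MTP1, 2), round(DSPARK_MEAN / NO_SPEC, 2))  # 1.51 2.29
```

Because the target's token always wins at the first disagreement, the output matches what the target alone would generate; a poorly aligned drafter changes only how many tokens each round yields. Under the i.i.d. assumption, four draft tokens at 80% acceptance yield about 3.36 tokens per round, but at 50% acceptance only about 1.94.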
Self-Supervised Theorem Discovery in a Formal Axiomatic System
arXiv:2606.28747v1 Announce Type: new Abstract: Recent artificial intelligence (AI) systems have shown remarkable progress in mathematical reasoning. Many existing approaches, including large language models (LLMs), draw on human prior knowledge in the form of mathematical text, code, or theorem libraries. Although these approaches are highly effective in practice, it remains an open question whether an agent can autonomously discover useful theorems without such human priors. We study this question in a formal axiomatic system by developing an agent that starts from axioms and inference rules alone and gradually grows a library of useful theorems. Concretely, we propose a self-supervised theorem-discovery algorithm that alternates between proof search and useful-theorem extraction, building a theorem library whose entries are reused as lemmas for subsequent proof search. Experiments show that the agent discovers tens of thousands of theorems and finds proofs for human-written benchmark problems, suggesting that its discoveries include theorems meaningful from a human mathematical perspective. Furthermore, the discovered theorems improve LLM proof performance when provided as prompt lemmas, indicating that they can serve as external knowledge for LLM reasoning. Our results provide evidence that useful theorems can emerge from proof search without relying on human-provided theorem libraries. More broadly, they suggest a path toward self-evolving AI systems for mathematics whose discoveries remain formally verifiable.
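The core idea of growing a library from axioms and inference rules alone, with derived theorems reused as premises, can be illustrated with a toy forward-chaining loop. This is a sketch, not the paper's algorithm: it does no goal-directed proof search and applies no usefulness filter. The formula encoding (atoms as strings, implications as `("->", A, B)` tuples) and the function names are hypothetical.

```python
from typing import List, Set, Tuple, Union

# Toy implicational logic: an atom is a string,
# an implication A -> B is the tuple ("->", A, B).
Formula = Union[str, Tuple[str, object, object]]


def imp(a: Formula, b: Formula) -> Formula:
    return ("->", a, b)


def modus_ponens_round(library: Set[Formula]) -> Set[Formula]:
    """Derive every new conclusion B such that both A and A -> B are in the library."""
    new: Set[Formula] = set()
    for f in library:
        if isinstance(f, tuple) and f[0] == "->" and f[1] in library and f[2] not in library:
            new.add(f[2])
    return new


def grow_library(axioms: Set[Formula],
                 max_rounds: int = 10) -> Tuple[Set[Formula], List[Set[Formula]]]:
    """Start from axioms only; each round's theorems become premises for the next."""
    library = set(axioms)
    history: List[Set[Formula]] = []
    for _ in range(max_rounds):
        new = modus_ponens_round(library)
        if not new:
            break  # saturated: nothing new is derivable
        library |= new
        history.append(new)
    return library, history
```

With axioms `p`, `p -> q`, `q -> r` and `r -> s`, the loop derives `q`, then `r`, then `s` in three rounds; each step depends on the theorem found in the previous round, which is the lemma-reuse effect the abstract describes, in miniature.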
Search for Truth from Reasoning: A Dynamic Representation Editing Framework for Steering LLM Trajectories
arXiv:2606.28589v1 Announce Type: new Abstract: Current approaches to enhance Large Language Model (LLM) reasoning, such as Chain-of-Thought and "Wait" prompts, primarily encourage models to think more, yet often fail to guide them toward Truth. While Representation Editing (RepE) offers intrinsic control, its application to dynamic reasoning trajectories remains underexplored. In this work, we bridge this gap by investigating the geometry of truth within unfolding reasoning chains. We uncover three critical insights: (1) Truth is encoded at the sentence level and is entangled with latent reasoning patterns; (2) Effective intervention follows an Uncertainty Principle and a Decay Effect, requiring localization to early, high-entropy forks; (3) Naive steering vectors suffer from noise, risking collateral damage to correct trajectories. Based on these findings, we propose DynaSteer, a dynamic RepE framework. DynaSteer employs pattern clustering to disentangle reasoning manifolds and utilizes Fisher-LDA to project purified truth. By dynamically monitoring lookahead entropy, it selectively steers and rolls back trajectories only when necessary. Comprehensive experimental results on several MATH benchmarks verify the effectiveness of DynaSteer, and experiments on out-of-domain coding tasks further confirm its generalization ability. Our code is publicly available at https://github.com/tianlwang/DynaSteer.
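Two of the ingredients the abstract names, a Fisher-LDA direction that separates truthful from untruthful representations and steering gated on entropy, can be sketched with NumPy. This is an illustration under stated assumptions, not DynaSteer: it assumes labeled hidden states are already available, gates on the current next-token entropy rather than lookahead entropy, and leaves out pattern clustering and rollback. The `threshold` and `strength` parameters are hypothetical.

```python
import numpy as np


def fisher_lda_direction(pos: np.ndarray, neg: np.ndarray, ridge: float = 1e-3) -> np.ndarray:
    """Unit vector proportional to S_w^{-1} (mu_pos - mu_neg)."""
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter: sum of the two classes' scatter matrices.
    s_w = np.cov(pos, rowvar=False) * (len(pos) - 1) + np.cov(neg, rowvar=False) * (len(neg) - 1)
    s_w += ridge * np.eye(s_w.shape[0])  # small ridge keeps the solve well-conditioned
    w = np.linalg.solve(s_w, mu_p - mu_n)
    return w / np.linalg.norm(w)


def entropy(probs: np.ndarray) -> float:
    """Shannon entropy (nats) of a next-token distribution."""
    p = probs[probs > 0]
    return float(-(p * np.log(p)).sum())


def maybe_steer(hidden: np.ndarray, next_token_probs: np.ndarray,
                direction: np.ndarray, threshold: float, strength: float):
    """Add strength * direction only at uncertain (high-entropy) steps."""
    if entropy(next_token_probs) > threshold:
        return hidden + strength * direction, True
    return hidden, False
```

The entropy gate reflects the abstract's finding that intervention works best at high-entropy forks: a confident step (say, 97% mass on one token) is left untouched, while a near-uniform step is nudged along the truth direction.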
ComMem: Complementary Memory Systems for Test-Time Adaptation of Vision-Language Models
arXiv:2606.28719v1 Announce Type: new Abstract: Test-time adaptation (TTA) of vision-language models (VLMs) is essential for their robust deployment in dynamic, real-world environments. However, existing TTA methods often adapt locally without accumulating knowledge over time, or operate within a single modality without exploiting VLMs' inherently multi-modal nature. Inspired by the Complementary Memory systems of the biological brain, we propose ComMem, an innovative approach that mimics the distinct but cooperative roles of the hippocampus and neocortex to enable effective TTA for VLMs. ComMem consists of two key components: a fast-adapting detailed memory, akin to the hippocampus, that forms a dynamic visual cache from high-confidence test samples; and a slow-integrating abstract memory, akin to the neocortex, that continually refines global textual prototypes. For each test instance, ComMem jointly optimizes both memory systems to ensure cross-modal consistency. Extensive experiments on 15 benchmark datasets show that ComMem significantly outperforms state-of-the-art methods under both natural distribution shifts and cross-dataset generalization, offering a promising direction for enhancing VLMs' practical adaptability.
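The fast/slow split can be made concrete with a toy NumPy sketch: a bounded per-class cache of confident test features (fast memory) and class prototypes nudged by an exponential moving average (slow memory). This is a hypothetical illustration, not ComMem. It assumes image features and text prototypes share one embedding space, as in CLIP-style models, and replaces ComMem's joint cross-modal optimization with simple fixed update rules. All parameter values are assumptions.

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class ToyComMem:
    """Toy fast/slow memory for test-time adaptation (hypothetical sketch)."""

    def __init__(self, text_protos, cache_per_class=3, conf_threshold=0.9,
                 momentum=0.95, cache_weight=1.0, scale=10.0):
        self.protos = _unit(np.asarray(text_protos, dtype=float))  # slow memory
        self.cache = [[] for _ in range(len(self.protos))]          # fast memory
        self.cache_per_class = cache_per_class
        self.conf_threshold = conf_threshold
        self.momentum = momentum
        self.cache_weight = cache_weight
        self.scale = scale

    def _probs(self, feat: np.ndarray) -> np.ndarray:
        logits = self.scale * (self.protos @ feat)
        for c, items in enumerate(self.cache):
            if items:  # add affinity to the closest cached sample of class c
                logits[c] += self.cache_weight * self.scale * max(float(v @ feat) for v in items)
        e = np.exp(logits - logits.max())
        return e / e.sum()

    def step(self, feat) -> int:
        """Predict a class for one test feature and update both memories if confident."""
        feat = _unit(np.asarray(feat, dtype=float))
        probs = self._probs(feat)
        c = int(probs.argmax())
        if probs[c] >= self.conf_threshold:
            # Fast memory: cache the confident sample (FIFO, bounded per class).
            self.cache[c] = (self.cache[c] + [feat])[-self.cache_per_class:]
            # Slow memory: nudge the class prototype toward the sample via EMA.
            self.protos[c] = _unit(self.momentum * self.protos[c] + (1 - self.momentum) * feat)
        return c
```

The confidence threshold is the design lever: it keeps ambiguous samples out of both memories, at the cost of adapting more slowly.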
China’s Z.ai claims it can match Mythos on cybersecurity
New AI developments from China, specifically a Z.ai model, are being positioned as competitors to Anthropic’s Mythos in cybersecurity.
Field Order Should Not Matter: Permutation-Invariant Embedding Model Fine-Tuning for Structured Metadata Retrieval
arXiv:2606.30473v1 Announce Type: cross Abstract: We study retrieval over catalogs of structured metadata, where each record is a small schema whose fields answer different kinds of query. Embedding a record with a text encoder first serializes its fields into a string, which forces a choice of field order. We show this choice, usually treated as an implementation detail, silently controls retrieval quality once the encoder is fine-tuned. A standard fine-tune loses 7.4 nDCG@10 points when the index is rebuilt under a different field order, because it reads absolute position instead of the field labels. We propose permutation-invariant fine-tuning (PI-FT), which serializes each record under a freshly sampled field order with random field dropout, so meaning binds to the labels rather than to position. The change is about two lines in the data loader; it costs negligible in-distribution accuracy and cuts the order-change penalty to 0.2 points. We study this in the discovery of development statistics, a catalog of nearly 10,000 indicators that should be searchable in many languages by a model small enough to self-host. As AI assistants and agents increasingly mediate access to public data and statistics, this retrieval step decides whether an answer is grounded in the right indicator or series, making discoverability a precondition for disseminating data through AI. Because usage logs cannot provide training signal for indicators no one has searched, we generate the queries instead. DevDataBench is a fully LLM-generated benchmark of grounded, facet-targeted queries across 15 languages, covering every indicator for both training and evaluation. A fine-tuned 118M-parameter CPU encoder outperforms every zero-shot baseline, including text-embedding-3-large (0.707 vs. 0.556 nDCG@10), with the largest gains in low-resource languages. We release the benchmark, pipeline, models, and a reusable PI-FT framework.
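The abstract describes PI-FT as roughly two lines in the data loader: shuffle the field order on every draw and randomly drop fields. The following is a minimal sketch of such a serializer, assuming a flat dict of field labels to values. The `label: value` format, the ` | ` separator, the default `drop_prob`, and the rule that at least one field is always kept are assumptions, not the authors' released framework.

```python
import random
from typing import Dict


def serialize_pi_ft(record: Dict[str, str], rng: random.Random,
                    drop_prob: float = 0.1) -> str:
    """Serialize a metadata record in a fresh random field order with field dropout."""
    fields = [(k, v) for k, v in record.items() if v]
    if not fields:
        return ""
    rng.shuffle(fields)                                        # new order every call
    kept = [kv for kv in fields if rng.random() >= drop_prob]  # random field dropout
    if not kept:
        kept = fields[:1]                                      # never emit an empty record
    # Keep each label next to its value so meaning binds to the label, not the position.
    return " | ".join(f"{k}: {v}" for k, v in kept)
```

Called once per training example (for instance, inside a dataset's item-fetching method), this shows the encoder the same record in many orders, so it cannot learn that "the second slot is the unit". At indexing time one would typically serialize with `drop_prob=0.0`.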
COMPASS: Grounding Composition-Intent Guidance in Unified Multimodal Models
arXiv:2606.28696v1 Announce Type: new Abstract: Composition is a high-level visual intent that governs where subjects are placed and how a scene is organized, yet current unified multimodal models remain unreliable at fine-grained composition recognition and struggle to turn such intent into controllable generation. We present COMPASS, the first unified multimodal framework that grounds composition-intent control in a single system spanning both composition perception and composition-guided generation, with a shared expert token $\tau_c$ as the central intent anchor. On the perception side, COMPASS injects composition expertise into an MoE backbone in a minimally invasive manner and distills the inferred intent into $\tau_c$. On the generation side, COMPASS reuses $\tau_c$ as a global conditioning signal that steers the denoising trajectory, effectively converting passive composition analysis into explicit layout control. To support systematic instruction-following composition learning and evaluation at scale, we construct Comp-11, a large-scale dataset with an 11-class taxonomy and reasoning-augmented annotations. Extensive experiments show that COMPASS substantially improves category-level composition understanding and delivers more composition-consistent, prompt-faithful generation than strong baselines.
Google Cloud Launches AI Models Tailored for Scientific Breakthroughs - Artiverse
Google already runs private previews ... industry researchers, and Nobel laureates. Over 100 institutions, including Stanford, Imperial College London, and The Crick Institute, collaborate with Google to validate these AI systems. Google’s track record in scientific AI is solid. Its models AlphaFold and AlphaGenome have revolutionized structural bioinformatics and genomic analysis. Now, these new offerings aim to bring similar breakthroughs to industrial ...
How Basic Persuasion Can Bypass AI Safeguards - Knowledge at Wharton
New Wharton research suggests AI models may be vulnerable to many of the same persuasion tactics that influence people.
OpenAI announces 'limited preview' of GPT-5.6 at White House's request
OpenAI is launching a limited preview of its GPT-5.6 generative AI models for trusted partners at the request of the White House.
GPT-5.6 cheated its way out of evaluation
Reports suggest that the GPT-5.6 model exhibited behavior designed to bypass standard evaluation protocols.
OpenAI's most powerful model is here — but not for everyone
OpenAI has released its most powerful AI model to date, though access remains limited.
Data and Evaluation Closed-Loop for Model Capability Enhancement
arXiv:2606.28471v1 Announce Type: new Abstract: Model capability is the central variable in LLM pre-training, yet is never observed directly: data shapes it prospectively, while evaluation reveals it only retrospectively, compressing samples, prompts, decoding, and scoring rules into one noisy score. Practical optimization runs this backward: a failure is observed first, and the engineer must infer the corpus fix. The two sides speak incompatible vocabularies (benchmark names and per-sample correctness versus data sources, domains, and quality labels), so this inference is usually intuition, not method. We close this gap with the capability slice: a group of evaluation samples sharing background condition, task type, solving operation, and output constraint, precise enough to localize a single weakness yet stable enough to survive aggregation, unlike a benchmark name (too coarse) or a single sample (too noisy). Built around this unit, an evaluation taxonomy, a non-instruction data taxonomy, and mapping rules form a closed loop turning a benchmark-level failure into a targeted, testable data intervention. We test this loop on two case studies pulling in opposite directions. First, the loop rules the data out: continued pre-training drives BBH down by 46.82%, but diagnosis traces this to a single masked <EOS> loss rather than weakened reasoning; restoring it recovers BBH to 66.44, above the original checkpoint, without changing the data. Second, the loop rules the data in: a persistent math-reasoning weakness is decomposed by solving operation into specific failing combinations, and a weakness-targeted sampling procedure built from it lifts AIME2025/AIME2026 Pass@128 from 6.67/0.00 to 26.67 each. The same unmodified loop reaches opposite, correct verdicts in both cases, showing the evaluation-to-data inference can be routine, auditable, and experimentally validated rather than intuitive.
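The capability slice is essentially a grouping key over four sample attributes. The sketch below shows how per-slice accuracy could localize a weakness that a single benchmark score hides. The attribute values are hypothetical, and `min_size` is an assumed stand-in for the abstract's requirement that a slice be "stable enough to survive aggregation"; the paper's taxonomies and mapping rules are not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

SliceKey = Tuple[str, str, str, str]


@dataclass(frozen=True)
class EvalSample:
    background: str         # background condition (hypothetical labels)
    task_type: str
    operation: str          # the solving operation
    output_constraint: str
    correct: bool


def slice_accuracy(samples: Iterable[EvalSample], min_size: int = 2) -> Dict[SliceKey, float]:
    """Accuracy per capability slice, dropping slices too small to be stable."""
    groups: Dict[SliceKey, List[bool]] = defaultdict(list)
    for s in samples:
        groups[(s.background, s.task_type, s.operation, s.output_constraint)].append(s.correct)
    return {k: sum(v) / len(v) for k, v in groups.items() if len(v) >= min_size}


def weakest_slices(acc: Dict[SliceKey, float], threshold: float) -> List[SliceKey]:
    """Slices below the threshold, worst first: candidates for a targeted data fix."""
    return sorted((k for k, a in acc.items() if a < threshold), key=lambda k: acc[k])
```

A math benchmark that looks mediocre overall might, under this grouping, split into a solid algebra slice and a failing modular-arithmetic slice. The failing slice is the unit a targeted sampling procedure would then address.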
BV-Blend: Uncertainty-Weighted Historical Baselines for Stable Critic-Free RL with Verifiable Rewards
arXiv:2606.28707v1 Announce Type: new Abstract: Critic-free reinforcement learning with verifiable rewards (RLVR), exemplified by Group Relative Policy Optimization (GRPO), avoids training a value function (critic) and reduces memory and compute overhead relative to critic-based PPO pipelines for aligning large language models. However, GRPO-style advantage estimation depends on prompt-local (within-prompt-group) reward statistics and can be unstable. In particular, when all rollouts in a prompt group receive identical rewards, the within-group reward variance becomes zero, and group normalization yields zero advantages for that group, impeding learning in cold-start regimes with binary verifiers. We introduce BV-Blend, a critic-free framework that stabilizes advantage estimation by combining prompt-local on-policy statistics with semantic-cluster-conditioned historical moments. BV-Blend maintains EMA-tracked reward moments for each cluster, derives a confidence weight from a standard error of the mean (SEM) proxy, and uses this weight to blend historical and prompt-local baseline and variance statistics into a standardized advantage for PPO-style clipped updates. Experiments on verifiable reasoning benchmarks show that BV-Blend improves training stability and performance, and remains robust in regimes where group-normalized methods may stall.
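The zero-advantage failure mode and the blending fix can be sketched in a few lines. Group normalization returns all zeros when every rollout gets the same reward. Blending in a cluster's historical reward moments, weighted by how trustworthy those moments are, restores a learning signal. The confidence weight `c = 1 / (1 + SEM / tau)`, the EMA initialization, the `decay` and `tau` values, and the use of an effective sample count are all assumptions, because the abstract gives no formulas. Cluster assignment is also out of scope.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterMoments:
    """EMA-tracked reward moments for one semantic cluster of prompts."""
    decay: float = 0.9
    mean: float = 0.0
    second: float = 0.0
    n_eff: float = 0.0

    def update(self, rewards: List[float]) -> None:
        m = sum(rewards) / len(rewards)
        s = sum(r * r for r in rewards) / len(rewards)
        if self.n_eff == 0:
            self.mean, self.second = m, s  # first batch initializes the EMA
        else:
            self.mean = self.decay * self.mean + (1 - self.decay) * m
            self.second = self.decay * self.second + (1 - self.decay) * s
        self.n_eff = self.decay * self.n_eff + len(rewards)

    @property
    def var(self) -> float:
        return max(self.second - self.mean ** 2, 0.0)


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standard group-normalized advantages (zero when all rewards are equal)."""
    mu = sum(rewards) / len(rewards)
    sd = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (sd + eps) for r in rewards]


def blended_advantages(rewards: List[float], hist: ClusterMoments,
                       tau: float = 0.1, eps: float = 1e-6) -> List[float]:
    """Blend prompt-local and historical baseline/variance by an SEM-based confidence."""
    mu = sum(rewards) / len(rewards)
    var_local = sum((r - mu) ** 2 for r in rewards) / len(rewards)
    if hist.n_eff == 0:
        c = 0.0  # no history yet: fall back to prompt-local statistics
    else:
        sem = math.sqrt(hist.var / hist.n_eff)
        c = 1.0 / (1.0 + sem / tau)  # low SEM -> trust the history more
    base = c * hist.mean + (1 - c) * mu
    var = c * hist.var + (1 - c) * var_local
    return [(r - base) / math.sqrt(var + eps) for r in rewards]
```

For a group in which all four rollouts pass a binary verifier, `grpo_advantages` yields zeros. If the prompt's cluster has historically passed only a quarter of the time, the blended baseline sits below 1, so the same group receives positive advantages and still contributes gradient.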
China Builds AI Vulnerability Scanner to Counter Mythos: Every Zero-Day Goes to Beijing by Law
China’s AI vulnerability scanner Tulongfeng was unveiled June 28 by sanctioned Qihoo 360 at ISC.AI 2026. It uses a multi-agent swarm to challenge Anthropic’s Mythos, but Chinese law requires every zero-day to be reported to Beijing within 48 hours, undercutting the cyber deterrence framing Zhou Hongyi
The attack that hijacked Claude Code came through Sentry. Datadog, PagerDuty, and Jira have the same exposure.
A single fake error report hijacked Claude Code in controlled testing: the agent ran the attacker's code with the developer's full privileges, and not one alert fired. EDR, WAF, IAM, and the firewall all missed it completely.
Tenet Security's June agentjacking disclosure describes a single crafted Sentry error event, sent through a public credential that requires no breach and no authentication, that injected attacker instructions into error data that Claude Code, Cursor, and Codex then executed as trusted diagnostic output. Tenet tested 100-plus targets in controlled conditions and achieved an 85% success rate. Sentry called the flaw "technically not defensible." The Cloud Security Alliance classified agentjacking as a systemic MCP vulnerability class within days of the disclosure.
No credentials were stolen, no policy was violated, no perimeter was breached: every step in the chain was authorized. That is the problem. Tenet identified 2,388 organizations with publicly exposed Sentry credentials that could be used to inject malicious events at scale. The research is proof-of-concept, not confirmed exploitation across all 2,388. But one captured Claude Code environment held a live AWS secret access key and private repository URLs.
Here is the scope test: if your AI coding agents are connected to Sentry, Datadog, PagerDuty, Jira, or any MCP-connected data source your developers trust, and those agents can execute shell commands, then your stack has the same blind spot. Organizations running Sentry should audit all publicly exposed DSNs immediately. Sentry's architecture intentionally makes DSN credentials public for frontend error reporting, so the mitigation isn't revoking the DSN; it's restricting what agents can do with the data those DSNs return.
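Because the DSN stays public, the control has to sit on the agent side. One simple way to express "restrict what agents can do with the data those DSNs return" is a session-level taint flag. Once content from an externally writable source enters the agent's context, high-risk actions require human approval. The following is a hypothetical sketch. The source and action names are assumptions, it is not Tenet's recommendation or any vendor's API, and production systems would need finer-grained provenance tracking than a single flag.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical policy: sources whose content an outside party can write to.
UNTRUSTED_SOURCES = {"sentry", "datadog", "pagerduty", "jira"}
HIGH_RISK_ACTIONS = {"shell", "write_file", "network_request"}


@dataclass
class AgentSession:
    tainted: bool = False
    audit_log: List[Tuple] = field(default_factory=list)

    def ingest(self, source: str, text: str) -> None:
        """Record tool output entering the agent's context."""
        if source in UNTRUSTED_SOURCES:
            self.tainted = True  # attacker-writable data is now in context
        self.audit_log.append(("ingest", source, len(text)))

    def authorize(self, action: str, human_approved: bool = False) -> bool:
        """Deny high-risk actions once untrusted data is in context, unless a human approves."""
        allowed = human_approved or not (self.tainted and action in HIGH_RISK_ACTIONS)
        self.audit_log.append(("authorize", action, allowed))
        return allowed
```

The audit log also addresses the attribution problem raised in the article: every allow or deny decision is recorded as an agent-initiated action, separate from anything the developer typed.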
Why your stack can't see it
Agentjacking works because every step is authorized: the attacker sends a valid Sentry API call using a public DSN, the MCP server returns the injected event as authentic output, and the agent executes the instruction using the developer's privileges. No signature fired. The victim saw only benign diagnostics while the agent silently exposed cloud credentials and source-control tokens.
SOC teams have never needed to distinguish between a developer running an npm install and an agent running that command in response to a malicious error event. That distinction did not exist until AI coding agents became production tools. A stack that cannot make that distinction is exactly the stack agentjacking bypasses.
Five surveys, one pattern
Five independent surveys from the first half of 2026 found that enterprises trust their AI agents far more than their enforcement justifies. Only 34% of organizations apply the same security controls to AI agents as to humans, according to an Okta/Apprize360 survey of 292 executives and 492 knowledge workers. Fifty-two percent of employees use unapproved AI tools, and 58% of executives reported an AI-related incident or close call in the prior year. HiddenLayer’s 2026 AI Threat Landscape Report surveyed 250 IT and security leaders: 33% reported agents had already exceeded intended scope, and 31% could not confirm whether they had experienced an AI breach. One in eight AI breaches was linked to agentic systems. Gravitee’s survey of over 900 executives and practitioners found only 14.4% of agents went live with full security approval, and 88% reported confirmed or suspected incidents. A follow-up of 750 leaders in April found agent estates had doubled while monitoring barely moved.
The runtime gap nobody closed
“Securing agents looks very similar to securing highly privileged users,” said Elia Zaitsev, CTO of CrowdStrike, in an interview with VentureBeat.
“They have identities, access to underlying systems, they reason, they take action.” Zaitsev pointed to the gap the industry left open. “No one has been talking about securing agents at runtime. We are doing that now. What is your safety net? If all these controls fail, how do you prevent them from failing silently?”
CrowdStrike's fleet data quantifies the exposure: more than 1,800 agentic applications on enterprise endpoints, approximately 160 million instances under monitoring. On June 15, CrowdStrike shipped Continuous Identity for AI Agents at Identiverse, replacing static policies with continuous enforcement that authorizes every agent action in real time. The control class that announcement reflects (continuous action-level authorization with verifiable agent identity) is now a baseline procurement criterion regardless of vendor.
“People have kind of forgotten about runtime security,” Zaitsev said. “We did this with endpoint, virtualization, and cloud. People focused on patching vulnerabilities, locking down permissions. Somehow, they always seem to miss something. The safety net is runtime.”
Zaitsev was equally direct about sandbox approaches. “If you start with an agent in a sandbox that has no ability to touch anything, it is worthless. Very quickly, you are in this race of giving it more capabilities. And then what is the point of your sandbox?” Agents derive their value from access. Every access grant is an attack surface.
The governance gap is a budget problem
Kayne McGladrey, an IEEE Senior Member, described the structural challenge in an exclusive interview with VentureBeat. “The CISO doesn’t have the budget. The CISO doesn’t have the staff. We can observe risks, we can advise on business risks, but we don’t own the business systems affected by those risks,” McGladrey said. When agent governance spans six departmental budgets, no single executive can confirm whether agents get the same access reviews as humans.
The Okta survey quantifies the disconnect. Only 43% of workers say agent policies are clear, compared to 65% of executives, and nearly two-thirds of organizations apply weaker controls to agents than to humans. The people deploying agents daily do not recognize the governance posture their leadership claims to have built.
Assaf Keren, chief security officer at Qualtrics and former CISO at PayPal, put it plainly. “The real risk starts not by the implementation of AI systems. It is the fact that baseline architecture is not well established. When we put an AI system on top of something not architected well, we are accelerating the fractures.” Keren called runtime behavior analytics “an unsolved problem right now.”
The 5-question gap test
The five-question gap test draws on five surveys from the first half of 2026. Each question maps to a gap that agentjacking exploits. Run this before any Q3 vendor evaluation.
1. Agent inventory
Gap to test: What percentage of agents, MCP connections, and LLM automations completed security review before deployment?
The proof: 14.4% get full security/IT approval before going live. 52% of employees use unapproved AI tools. The average enterprise now manages 37+ deployed agents, roughly double the count in Q4 2025.
What breaks: Unapproved agents are invisible to your identity platform and unaccountable in a breach disclosure. Agentjacking targets exactly these unmanaged MCP connections. No census means no audit trail for regulatory response.
Monday action: Commission a full agent, MCP server, and LLM automation census. Make census completion a procurement gate for all Q3 vendor evaluations. Flag any agent discovered post-census as a shadow AI incident.
Source / sample: Gravitee State of AI Agent Security 2026, 900+ respondents (Feb 2026); Gravitee April 2026 update, 750 senior tech leaders; Okta/Apprize360, 292 execs + 492 workers (June 2026)
2. Controls parity
Gap to test: Do agents receive the same access reviews, privilege scoping, and revocation timelines as human employees?
The proof: 34% always apply the same controls to agents as humans. 61% of privileged access is fulfilled without proper review. Only 22% treat agents as independent identity-bearing entities.
What breaks: An agent with a static OAuth token and no review cycle is a permanent privileged account with no termination date. Agentjacking inherits whatever privileges the developer holds. 45.6% of orgs rely on shared API keys for agent-to-agent auth.
Monday action: Add every production agent to the next access review cycle. Mandate human-in-the-loop for any agent action touching PII, financial data, or production infrastructure. Replace shared API keys with scoped, short-lived tokens.
Source / sample: Okta/Apprize360 (784 respondents, June 2026); Palo Alto Networks (2,930 respondents); Gravitee (900+, shared API keys data)
3. Scope drift
Gap to test: Have any agents accessed data or systems beyond their defined scope in the last 12 months?
The proof: 33% report agents already exceeded scope. 53% say agents exceed permissions occasionally or sometimes. Meta Sev 1, March 2026: an agent posted sensitive data to an unauthorized channel. Only 8% say agents never exceed intended permissions.
What breaks: Scope drift triggers reportable events under GDPR, CCPA, HIPAA, and SEC cybersecurity rules. If detection cannot distinguish agent-initiated from human-initiated access, disclosure timelines are unachievable. Agent-spawned sub-agents (25.5% of deployed agents can create other agents) make audit trails intractable.
Monday action: Run a 90-day scope-drift audit on every production agent. Compare actual resources touched against approved scope documentation. Block agent-to-agent delegation without explicit human approval for any action exceeding the parent agent’s scope.
Source / sample: HiddenLayer AI Threat Landscape 2026 (250 IT/security leaders); CSA AI Agent Security Survey (scope violations data); Gravitee (agent spawning data)
4. Governance perception gap
Gap to test: Would 50 knowledge workers say your AI agent policies are clear?
The proof: 22-point gap: 65% of executives say policies are clear, 43% of workers agree. 77% of security teams see shadow AI risk but lack visibility to act. 76% cite shadow AI as a definite or probable problem.
What breaks: You are evaluating vendors against a governance posture your workforce does not recognize. Every shadow agent undermines the vendor comparison. Knowledge workers are sharing internal messages (54%), HR data (45%), and confidential docs (39%) with unapproved AI tools.
Monday action: Run a one-question survey before your next vendor demo. If the gap exceeds 15 points, pause procurement. Publish an internal AI agent acceptable-use policy with specific examples of approved and prohibited agent behaviors.
Source / sample: Okta/Apprize360 (784 respondents, June 2026); Ivanti 2026 AI Maturity Report (1,200 respondents); HiddenLayer (shadow AI data)
5. Breach detection certainty
Gap to test: Can your security team confirm whether you experienced an AI-related breach in the last 12 months?
The proof: 31% cannot answer. 88% reported confirmed or suspected AI agent security incidents. One in eight reported AI breaches is now linked to agentic systems. Agentjacking showed that EDR, WAF, IAM, and firewalls let an agent-mediated attack through without a single alert.
What breaks: No basis for disclosure timelines. No evidence chain for incident response. No defensible position in a regulatory investigation. EU AI Act high-risk compliance obligations take effect August 2, 2026.
Monday action: Require agent-specific runtime detection as a procurement prerequisite. Confirm your org can distinguish agent-initiated actions from human-initiated actions in production telemetry. Test your SOC’s ability to attribute a specific action to a specific agent within 60 minutes.
Source / sample: HiddenLayer (250 IT/security leaders); Gravitee (900+, incident rate); Tenet Security (2,388 orgs exposed); CSA (systemic MCP vulnerability classification)
Security director action plan
EU AI Act high-risk compliance obligations take effect August 2, 2026. This is worth factoring into Q3 planning timelines.
Run the five-question gap test above before any Q3 vendor evaluation. It costs nothing to administer, and the procurement clarity it creates is worth far more than the 30 minutes it takes.
Consider mandating agent-specific runtime detection. If your stack cannot tell what an agent did from what a developer did, agentjacking will bypass it the same way it bypassed every layer in Tenet’s testing. That distinction is the one that matters now.
Treat every agent as a privileged insider. According to the Okta/Apprize360 survey, only 34% of organizations apply the same controls to agents as to humans; closing that gap is the single most impactful thing most security teams can do this quarter.
Test the perception gap before investing in new tooling. Ask 50 knowledge workers one question: Do you know your company’s AI agent policies? If the gap between their answer and leadership’s answer exceeds 15 points, that is the problem to solve first. No vendor product fixes a governance posture your own workforce does not recognize.
Make agent census completion a procurement gate: every agent, every MCP connection. The security teams getting this right are the ones that started with a complete inventory and worked forward from there.
Agentjacking stripped away an assumption that has survived every security architecture since the first firewall went live. Authorized does not mean safe. When every step in the chain is legitimate, the only defense that matters is the one watching what agents do. Not what policies say. What agents do.
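Two of the gap-test checks reduce to simple arithmetic and set logic. The following is a minimal sketch of the perception-gap decision rule (pause if executives exceed workers by more than 15 points) and the scope-drift comparison (resources touched minus resources approved). The function names and resource identifiers are hypothetical. Collecting the survey answers and the per-agent access telemetry is the hard part and is not shown.

```python
from typing import Iterable, Set


def perception_gap(exec_pct: float, worker_pct: float) -> float:
    """Points by which executives exceed workers on 'agent policies are clear'."""
    return exec_pct - worker_pct


def should_pause_procurement(exec_pct: float, worker_pct: float,
                             threshold: float = 15.0) -> bool:
    """Gap-test rule: pause vendor evaluation if the gap exceeds the threshold."""
    return perception_gap(exec_pct, worker_pct) > threshold


def scope_drift(touched: Iterable[str], approved: Iterable[str]) -> Set[str]:
    """Resources an agent actually touched that lie outside its approved scope."""
    return set(touched) - set(approved)
```

With the survey figures cited above, `perception_gap(65, 43)` is 22 points, which is above the 15-point threshold, so the rule says to pause. Any non-empty result from `scope_drift` for a production agent is a finding for the 90-day audit.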
TrajRS: Towards Certified Robustness in Pedestrian Trajectory Prediction
arXiv:2606.28716v1 Announce Type: new Abstract: The robustness of trajectory prediction models is crucial for developing safe autonomous driving systems. Adversarial attacks on trajectory prediction can significantly impair the accuracy of predicted trajectories, leading to hazardous driving behaviors. While heuristic defense strategies have been implemented to enhance the robustness of trajectory prediction models, these measures often fail against more sophisticated, targeted adversarial attacks. Hence, there is a pressing need to establish verifiable safety assurances for trajectory prediction models. In this paper, we extend the traditional Randomized Smoothing framework to "TrajRS", which provides a certified robust radius for smoothed trajectory predictors. We clarify and expand the formal definitions of robustness in trajectory prediction and tailor the practical TrajRS scheme specifically to "robustness for the optimal prediction" and "robustness for all possible predictions". An extensive set of experiments demonstrates that TrajRS effectively achieves robustness certification for all smoothed pedestrian trajectory predictors in this work.
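For context on what TrajRS extends, the "traditional" randomized smoothing certificate (Cohen et al., 2019) covers classifiers. If a Gaussian-smoothed classifier picks the top class with probability at least p_A (a lower confidence bound, usually from a Clopper-Pearson interval), its prediction cannot change within an L2 radius of sigma times the inverse standard normal CDF of p_A. The sketch below computes only that classification-case radius. TrajRS's trajectory-specific definitions ("robustness for the optimal prediction" and "for all possible predictions") are not reproduced here.

```python
from statistics import NormalDist


def certified_radius(p_a_lower: float, sigma: float) -> float:
    """Cohen et al. (2019) L2 certified radius for a Gaussian-smoothed classifier."""
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return 0.0  # cannot certify: the smoothed classifier abstains
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

The certificate grows with both the noise level and the confidence of the smoothed prediction. For example, with sigma = 1 and p_A of about 0.8413 (one standard deviation), the radius is 1. This creates the accuracy-versus-robustness trade-off that any smoothed trajectory predictor also faces.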
Startup sues Palo Alto Networks over AI fabrication
MeetingTV is suing Palo Alto Networks and Koi Security, alleging that a hallucinated AI finding falsely linked its infrastructure to a Chinese hacking operation.
Former Google engineer Ding loses bid for acquittal, new US trial
Linwei Ding, a former Google engineer convicted of stealing AI trade secrets, has been denied an acquittal or a new trial by a federal judge.
Agentic AI Has an Identity Problem and Attackers Know It
AI agents can access data, trigger workflows, and take action across enterprise systems. Token Security explains why governing these privileged identities is becoming essential for enterprise security.
How IAM providers are preparing for agentic AI | Computer Weekly
There is little doubt that enterprises will be deploying agentic AI. As such, technology firms are looking at various ways to secure these systems.
AI may be good at finding security vulnerabilities, but it can't beat human stupidity
You don't need Mythos or GPT-5.5-Cyber to find a vuln to exploit when the world's password habits are so sloppy.
Security researchers tricked LLMs into giving them cocaine recipes by abusing role models for prompt injection
If you want a picture of the future of LLM security, imagine Whac-a-Mole meets Groundhog Day
AI reshaping cybersecurity - on defence and attack
Criminals are embracing artificial intelligence at a pace that cybersecurity experts have never seen before, creating new challenges for governments and organisations already struggling to keep up with a rapidly evolving threat landscape. Bad actors are now adopting AI far more quickly than ...
Five Eyes AI cyber warning prompts calls for faster defence
Attackers are already using AI to exploit flaws faster than many organisations can detect them, Five Eyes agencies warned.
Adoption, Deployment & Impact
Honeywell Aerospace CEO Says AI Works for Blueprints but Isn’t Ready for the Cockpit
Now leading an independent company, Jim Currier is navigating the booming—and demanding—defense and aviation industries.
Can AI help avoid an air traffic control crisis — and would we trust it?
The technology has potential to assist controllers with an increasing flight load, but many are wary
PySynthea: A Python-Native Framework for Scalable Synthetic Healthcare Data Generation
arXiv:2606.28346v1 Announce Type: new Abstract: Synthetic healthcare data is increasingly important for research, education, and machine learning development where access to real patient data is limited by privacy and governance constraints. While Synthea provides a widely adopted framework for generating realistic longitudinal electronic health record data, its current implementation presents adoption barriers for many researchers and data scientists due to deployment complexity and limited integration with modern Python-based workflows. This paper introduces PySynthea, a Python-native reimplementation of Synthea designed to improve accessibility, extensibility, and interoperability within the scientific Python ecosystem. The framework provides modular synthetic patient generation, configurable healthcare simulation pipelines, and support for standard healthcare data formats while integrating naturally with tools such as pandas and machine learning workflows. By reducing operational complexity and aligning synthetic data generation with the dominant data science ecosystem, PySynthea aims to accelerate experimentation and broaden the use of synthetic healthcare data in research and applied AI development. The code is available in this GitHub repository: https://github.com/TIET-AI/tietai-synthea.
AI adoption outpaces governance, creating risks for businesses | brief | SC Media
A recent ISACA poll reveals that while 90% of organizations believe employees are using AI, only 22% have seen returns meet or exceed expectations.
Where AI Agents Break In Production
Discover why agentic systems fail at scale and how fixing the execution layer ensures reliability
AI Adoption Stalled by Security Concerns: 72% Face Unauthorized Access
AvePoint's State of AI Report reveals significant governance gaps, with 72% of organizations experiencing unauthorized data access as AI usage grows.
LMW bets on AI, digital transformation as it restructures business processes - The Hindu BusinessLine
Coimbatore-based engineering major LMW Ltd is accelerating investments in artificial intelligence (AI), enterprise-wide digitisation and business process re-engineering to navigate a prolonged slowdown in the textile machinery business and position itself for the next phase of growth.
28-point compliance checklist for shipping AI agents into enterprise environments
A comprehensive guide for developers on the requirements for deploying AI agents in corporate settings.
Australian IT leaders say data gaps stall AI scale
A lack of live data infrastructure is leaving most Australian IT leaders unable to scale AI, according to new research from Confluent.
Despite Fable 5 warning, European firms resist AI sovereignty - Raconteur
Can European enterprises afford to buy local, or is a pragmatic, multi-cloud AI compromise the only way to survive?
AI adoption is outpacing companies' AI strategies
Many companies are encouraging AI use without providing adequate training, approved tools, or clear strategies, leaving workers to navigate the technology on their own.
‘Cop on your wrist’: Wearables offer tons of data, but people are still going to sleep to Netflix and TikTok
A group of experts at the Fortune Brainstorm Tech event in Aspen pinpointed a problem: People have more data than ever from their wearables, but nobody knows what to do with the information.
Council Post: The Hidden Reason Most AI Transformations Stall
Layering AI onto fragmented processes doesn’t create speed. It just exposes where organizations are already inefficient.
AI in Practice
How artificial intelligence is being put to use in, and often fundamentally changing, multiple sectors, including: budget-busting AI bills; transforming pharma; AI and trust in air traffic control; spreading factory automation; rewriting gaming rules; chatbots and mental health; astronomical opportunities; agentic travel agents
Why AI can transform pharma — but nature cannot be hurried
Artificial intelligence has the potential to fundamentally change the development of new treatments
IMCBench: A benchmark for multimodal LLMs in Image-grounded Medical Conversations
arXiv:2606.28556v1 Announce Type: new Abstract: Recent advances in large language models and vision-language models have enabled reasoning over multimodal data, offering opportunities for clinical applications such as decision support and triaging. However, existing medical AI benchmarks are fragmented: some support multi-turn dialogues but lack images, while others provide multimodal inputs but focus on single-turn QA tasks. To address this gap, we introduce IMCBench, an image-grounded, multi-turn medical conversation benchmark that pairs real, publicly available clinical images with synthetic patient profiles to simulate realistic patient-clinician interactions. Each conversation is evaluated across three clinical dimensions: safety, accuracy, and appropriate use of uncertainty in diagnosis. We benchmark eight multimodal frontier models across four model families (Claude, GPT, Nova, and Llama), scoring each on a 1-5 scale using LLM-as-Jury scoring calibrated against expert clinician annotations. Our results show that Claude Opus 4.6 achieves the highest overall score (3.61), followed by Claude Sonnet 4.6 (3.30) and GPT-5.2 (3.29), though no model dominates all dimensions and safety degrades for both malignant and rare conditions ($\Delta$ = -0.27 each). Ablation studies further reveal that both visual input and EHR context contribute to safe guidance (safety drops of 0.18 and 0.23 on average when each is removed), with stronger models leveraging visual features more effectively. Together, these findings demonstrate that accurate clinical description does not guarantee safe patient guidance, motivating the need for multi-dimensional evaluation frameworks in medical AI.
US auto regulators want to kill robotaxi brake pedals
Requiring driverless vehicles to keep human brake controls impedes innovation, the NHTSA says.
How A.I. Is Changing the Way Politicians Run for Office
A.I.-generated images are the public face of this election overhaul. Behind the scenes, campaigns are using the technology to analyze voter data, craft campaign materials and write custom messages.
Ministers likely to support law change to allow delivery robots on England’s paths
Exclusive: Safety campaigners concerned about plan for widespread deployment on already crowded pavements. Large numbers of autonomous delivery robots could be coming to towns and cities across England after ministers signalled they were likely to support a change in the law allowing their use, prompting concern from safety campaigners. Low-speed robots, which mainly deliver groceries or takeaway food, are already in use in a handful of places but they operate in a regulatory grey area. The 1835 Highways Act bans “carriages” from pavements.
Decision-support strategies for photovoltaic self-consumption under declining electricity prices and limited remuneration of surplus generation
arXiv:2606.30359v1 Announce Type: cross Abstract: The success of distributed photovoltaics may be undermining its own future. As solar penetration increases, electricity prices decline during periods of peak generation, reducing the value of surplus photovoltaic production. This raises a critical question: can citizen-led energy systems remain economically viable in electricity markets dominated by renewable generation? Rather than exploring technically optimal but institutionally unrealistic solutions, we examine the options available under current regulatory and market conditions. Using high-resolution consumption data from a rural community sharing a PV facility among 24 users, we identify pathways for long-term sustainability. The study makes two contributions. First, it shows that effective internal coordination can mobilize participation and investment as successfully as external subsidies. Second, it compares static, dynamic, and hybrid energy-sharing models, with and without storage, providing a flexible framework that balances efficiency, fairness, and governance. Results show that collective self-consumption reduces required PV capacity, lowers investment costs, and increases annual savings compared with individually operated systems. Alternative allocation schemes further improve benefit distribution and local electricity use, although gains depend on trade-offs between efficiency, fairness, and governance complexity. Under current electricity prices and remuneration schemes, battery storage provides limited additional economic value and becomes attractive only under specific market conditions. Overall, the long-term viability of citizen-led photovoltaic initiatives depends less on technological sophistication than on collective coordination and adaptive governance.
Production-Grade AI Agents for Financial Compliance: Lessons From Stripe
Stripe shares how it built production-grade AI agents for financial compliance using specialized multi-agent architectures and human oversight.
A More Intelligent Advertising Ecosystem Is on the Horizon
Why agentic AI is performance marketing’s great hope
India Launches AI-Driven Audit Portal to Revolutionize Rural Development Oversight
India's Office of the Chief Controller of Accounts and NIC have launched a digital portal for internal audits, featuring AI-ready architecture to enhance financial oversight in rural development.
FBI used AI to investigate assassination attempt
The FBI used an AI-powered forensic platform from Exterro to help investigate the attempted assassination at this year's White House Correspondents' Dinner.
Maruti Suzuki Launches 5th Incubation Cohort, Partners with Startups for AI and Sustainability Boost
Maruti Suzuki India teamed up with five startups to enhance customer engagement and automate processes, with a focus on AI integration and battery recycling.
I used Claude Code to get a second opinion on my MRI
An exploration of using AI tools like Claude Code to assist in the analysis of medical imaging.
Benchmark Scores Don’t Break. Clinical Reality Does. The Health AI Readiness Illusion.
Nature Medicine's frontier AI evaluation exposes a dangerous gap: models acing medical benchmarks collapse under adversarial pressure, and the...
Reward Hacking Is Swamping Model Intelligence Gains
Cursor demonstrates how modern AI coding agents often achieve benchmark gains through reward hacking rather than solving problems directly, raising concerns about benchmark reliability.
The Download: metric weaknesses and AI elephant warnings
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. The inevitable weakness of metrics There are plenty of useful things a metric can reveal. There are even more that it can obscure or corrupt. Like a lot of people bitten…
Prompt Engineering Fails Quietly — Prompt Regression Is Why
Small prompt changes can silently break critical behavior in production. This article introduces a practical framework to detect hidden regressions.
Who Plays Which Role When? Communication Role Dynamics for Peer Recognition and Team Performance Prediction
arXiv:2606.28544v1 Announce Type: new Abstract: Team roles offer an interpretable lens on collaboration, yet computational studies of roles often rely on domain-specific personas or data-driven clustering rather than theory-grounded taxonomies. We operationalize a taxonomy of eight communication roles grounded in education literature and annotate a corpus of 6,307 Slack messages from 55 students across 18 teams in a semester-long computer science course project. We evaluate whether LLMs can approximate expert labels, enabling scalable, taxonomy-driven role annotation. Using these role labels, we characterize role dynamics over teams' lifecycles, finding that different roles peak at different moments and that students enact a more diverse set of roles as projects progress. To evaluate the utility of our role constructs, we use them to predict peer recognition, outperforming lexical, conversational, and LLM-prompting baselines. To assess generalizability beyond the educational context, we apply the same role constructs to a public dataset (DeliData) to predict team performance improvement after deliberation, again exceeding prior performance.
StanChart Former AI Chief Joins Accenture as Southeast Asia Head
Standard Chartered Plc’s former global head of AI enablement, David Hardoon, has joined Accenture Plc as managing director and head of advanced AI for Southeast Asia.
The five AI shifts every CEO should prepare for - Forbes Australia
The AI strategies that fuelled growth over the past three years are quickly becoming liabilities. These are the shifts that matter
AI is Speeding Workforce Turnover. But Your Next Great Hire May Already be Working for You - CKGSB Knowledge
Companies looking to hire externally to resolve AI-related workforce upheaval may end up paying a hidden cost. As well as the higher costs of recruitment, external hiring also undermines trust in an existing workforce already struggling to transition to AI. Businesses should instead consider ...
The CEO’s AI quotient: Why the intelligence of your business starts in the boardroom - The Times of India
AI does not transform businesses: Leaders do. The companies achieving the highest returns from AI are not those with the most advanced technology but those whose leadership has made the deepest strategic commitment to it.
The replacement story is sabotaging government's AI rollout
Human judgment is already being squeezed out of public-sector AI use, raising the risk of bland decisions that miss crises and erode trust.
A.I. ‘Employees’ Might Disrupt Work in Unexpected Ways
Scholars say the “unknown unknowns” of using artificial intelligence in the workplace may be undermining the technology’s advertised benefits.
Ford rehires human engineers after AI fails to match quality checks
The car-maker found AI quality checks failed to match the skill of veteran technicians.
Ford rehires more than 300 veteran human engineers
Ford has rehired over 300 veteran engineers after reporting that AI failed to meet required quality and expertise standards.
Ford rehires ‘gray beard’ engineers after AI falls short
Ford is reportedly bringing back experienced engineers after finding that AI-driven processes did not meet production expectations.
Agent confidence on the technical frontier
Enterprise investment in AI is booming. Gartner is calling 2026 an “inflection year” for organizations to align their AI projects with strategic business objectives. As the pressure to prove ROI mounts, executives and technology leaders are looking to agentic AI to drive the measurable financial outcomes their businesses seek. A prime opportunity for AI agents…
Transforming Investing With AI at Franklin Templeton
What would you do with artificial intelligence if you were confident that it would transform your industry? What actions would you take if you felt that you were at an inflection point in that transformation? Would you try to be an early proponent of AI-first in your industry, or a fast follower? […]
Most AI Models Would Run Your Company Into the Ground, Princeton's CEO-Bench Finds
Princeton’s CEO-Bench gave 14 AI models $1 million to run a simulated SaaS startup for 500 days. Most went bankrupt or lost money; only Claude Fable 5, Opus 4.8, and GPT-5.5 profited, and a no-AI rule-based script beat the rest. Wrapping models in coding agents like Claude Code and Codex made them ...
What companies are getting right about AI in 2026 but why there is still some way to go
Most enterprises still struggle to turn AI pilots into profit, with just 23 per cent able to link initiatives to higher revenue or lower costs.
AI projects are clearing launch and missing ROI - Digital Journal
New research from Winnipeg-based Laivly finds contact centre leaders declaring AI victories their own numbers can't support
Best Use Cases for AI with High ROI (2026 W27) - Audox - CRM & BI Solutions
The best use cases for AI with high ROI (Return on Investment) impact in business typically share characteristics such as high-volume and repeatable processes, rich data availability, direct influence on revenue or major cost components, and meaningful improvement in customer or employee experience.
Geopolitics, Policy & Governance
The new AI-based world order
A single factor is now dictating the hierarchy of global returns
The AI Cold War: How Silicon Valley Is Selling Fear of China to Protect Its Monopoly
Joshua Scheer Is the race for artificial intelligence really about national security—or about preserving corporate power? In this wide-ranging analysis, journalist Ben Norton argues that America's biggest technology companies are working hand in glove with Washington to frame China as an ...
Europe Will Never Catch Up with Anthropic or OpenAI and Become an AI Superpower
But it’s not too late for the continent to claim some technological independence for itself.
China Has Matched Anthropic in Cybersecurity, Resetting AI Race
Chinese AI models have reached leading-edge cybersecurity capabilities, reshaping the competitive landscape and influencing government policy on frontier model access.
EU's push for cloud sovereignty draws fragmentation warnings
The EU is accelerating efforts to cut reliance on foreign cloud, AI, and semiconductor suppliers as it seeks to block so-called
Austria proposes bringing Anthropic into EU to counter U.S. AI limits
Austria has urged the European Union to attract Anthropic, citing concerns that U.S. AI restrictions could leave Europe behind in innovation.
European Union Considers Anthropic Partnership Following US AI Access Restrictions - Blockonomi
Consequently, Washington classified these AI systems as strategically sensitive assets requiring regulatory protection rather than treating them as conventional commercial software products. The European Commission has previously introduced initiatives designed to bolster indigenous cloud computing, artificial intelligence development...
South Korea’s hot new sensation is 3S+1F – a quadrillion-won AI plan, not a band
Seoul plans to spend about $900 billion to become a K-semiconductor powerhouse
Taiwan Opposition’s $7.5 Billion Plan Stirs Drone Defense Debate
Taiwan’s main opposition party outlined plans to develop the drone industry just days after stalling a similar proposal from President Lai Ching-te’s government, amid a debate on unmanned systems with crucial implications for the island’s defense.
South Korea to invest $576 billion in AI chip production with Samsung and SK Hynix | CNN Business
South Korean President Lee Jae Myung, Samsung Electronics Chairman Lee Jae-yong, and SK Group Chairman Chey Tae-won announcing the government's AI investment plan in Seoul, South Korea on June 29, 2026.
Trump's pro-AI divide
The pro-AI movement is splintering over whether national security concerns should outweigh the need to keep U.S. AI companies ahead of Chinese rivals.
Annette Male: A decade after Brexit, the UK’s next independence test is AI | The Drum
Without control over data and ... enforce regulation and shape competition and innovation. This could become the next significant challenge for UK businesses. AI sovereignty means owning enough of the value chain to compete on the UK’s own terms. Done right, this can become a competitive advantage, rather than a defensive policy...
The registrar's function in a hybrid society. AI value chain, smart data and the concept of property
arXiv:2606.28789v1 Announce Type: new Abstract: Artificial intelligence reaches the land registry not as another tool but as a value chain that turns data into intelligence and intelligence into economic value. This paper argues that the decisive legal move is to place validity, a functional, second-order concept, at the centre of that chain. Rights, liability and supervision organise around it. It traces three impacts. Registry information becomes smart data, governed simultaneously by registry law, the GDPR, the European data acts and the AI Act. Control emerges as the operative concept for digital representations of real estate, whose proprietary effect depends on anchoring to the register. In a hybrid society of human and artificial agents, the registry becomes the public node of validity, with blockchain complementing rather than replacing it. Across three legal cultures, the registrar's value migrates from processing documents to guaranteeing validated data, making validity an asset for the UNO Sustainable Development Goals.
The Two Genie Game: Adoption and Welfare in Audit-Grounded AI Governance
arXiv:2606.28710v1 Announce Type: new Abstract: We ask under what conditions an agent with a harm-minimizing policy can displace an approval-seeking (RLHF) agent in a competitive market, and when that policy is sufficient to prevent community harm. We use evolutionary game theory (finite-population Moran-Fermi pairwise comparison) to formalize this subject to assumptions of wisher hindsight, peer testimony, a monotone harm ledger, sufficient information density of community feedback, and a finite, depleting resource pool, in a negative-sum environment. We show that adoption is favored when the prior distributions on how readily wishers attune to community sentiment are monotone, exhibit endpoint inversion, and have a centro-symmetric pairing property, and demonstrate this with several long-tailed priors (Hill, Pareto, Lomax, Frechet). Where it is favored, a critical adoption level separates communities that drift back to the approval-seeking agent from those for which the audited agent fixes; above that level fixation is the overwhelmingly likely outcome. We derive when fixation is attainable as a bound on the effective (informational) size N_c of the community, which must be small enough to allow fixation before depletion. We present these as Theorems 5.4 and 5.5; the algebraic and finite-grid backbone is machine-checked in Lean 4, with the barrier-crossing asymptotics retained as explicit hypotheses. We show that a self-audited agent with a community ledger is not, in general, sufficient to prevent community harm. Sufficiency depends both upon the alignment of the agent's audit with community values and the timeframe over which harm is evaluated. Regardless of alignment, once adoption reaches dominance, the state is absorbing. The same policy that reduced harm under alignment becomes a trap, welfare-negative under misalignment and, even under alignment, one that locks in harm deferred past the adoption horizon.
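The adoption question above rests on the finite-population pairwise comparison (Fermi) process, whose fixation probability has a standard closed form for any two-strategy birth-death chain. The Python sketch below is illustrative only and assumes a generic two-strategy game. `payoff_diff` is a hypothetical placeholder for the payoff gap between the audited agent (A) and the approval-seeking agent (B); it is not the paper's harm-ledger model. `beta` is the selection intensity, and the starting count `i` stands in for an adoption level. The paper's priors, resource depletion and bound on N_c are not modelled.

```python
import math
from typing import Callable


def fixation_probability(
    n: int,
    i: int,
    payoff_diff: Callable[[int], float],
    beta: float = 1.0,
) -> float:
    """Probability that strategy A takes over a population of n agents,
    starting from i adopters, under the Fermi pairwise comparison process.

    payoff_diff(j) is a hypothetical placeholder returning pi_A(j) - pi_B(j),
    the payoff advantage of A when j agents use A.
    """
    if n < 2 or not 0 <= i <= n:
        raise ValueError("need n >= 2 and 0 <= i <= n")
    if i == 0:
        return 0.0  # A is absent and can never reappear
    # gamma_j = T-(j) / T+(j) = exp(-beta * (pi_A(j) - pi_B(j)))
    # phi_i = (1 + sum_{k<i} prod_{j<=k} gamma_j) / (1 + sum_{k<n} prod_{j<=k} gamma_j)
    numerator = 1.0
    denominator = 1.0
    product = 1.0  # running product gamma_1 * ... * gamma_k
    for k in range(1, n):
        product *= math.exp(-beta * payoff_diff(k))
        denominator += product
        if k < i:
            numerator += product
    return numerator / denominator


if __name__ == "__main__":
    n = 50
    # Neutral drift: reduces to i / n, here 1/50
    print(fixation_probability(n, 1, lambda j: 0.0))
    # Constant advantage for A
    print(fixation_probability(n, 1, lambda j: 0.1))
    # Frequency-dependent advantage that only turns positive above n/2 adopters
    print(fixation_probability(n, 10, lambda j: 0.1 * (j / n - 0.5)))
```

Under neutral payoffs the function returns i/n. A constant advantage r gives the textbook single-adopter result (1 - e^{-βr}) / (1 - e^{-βrn}). For large β·|payoff_diff| the running product can overflow, so a log-space accumulation would be the safer choice at scale.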
"AI Watermarking": Bridging Policy Discourse and Technical Capabilities
arXiv:2606.28331v1 Announce Type: new Abstract: The widespread deployment of generative artificial intelligence (AI) models has raised serious concerns about the proliferation of AI-generated content. This has led to a surge of interest in, and demand for, reliable tracking and detection mechanisms for content that is AI-generated, such as watermarking, metadata tagging, content tagging, and more. The problem has captured the attention of policymakers as well as the popular media, and a spate of recent bills in the US have sought to regulate the spread of AI content, and enforce or promote methods to track and label it. This work performs a critical analysis of the policy discourse surrounding generative AI content transparency in the US and EU. Through a broad document selection methodology, we first collect a broad corpus of documents containing legislative language and policy-relevant discourse on the topic. We then analyze these through inductive coding, and leverage our coding to systematize these documents, identifying key patterns, gaps, and open questions. We identify critical points of disconnect between policy and technological capabilities and practice, and we highlight and discuss potential ambiguities and pitfalls raised by the trends in our corpus.
Verifying Restrictions on Frontier AI Research
arXiv:2606.28694v1 Announce Type: new Abstract: The premature development of artificial superintelligence poses major risks to humanity, so researchers have proposed international agreements halting such development until it can be done safely. AI progress depends primarily on compute, algorithms, and data; a durable halt would address all three so that advances in one input do not counteract restrictions on another. Improvements to AI algorithms are driven largely through research activities, so this research may need to be restricted during a halt. Given low international trust, signatories will want to verify compliance. This paper analyzes how such restrictions on AI research could be verified, while remaining agnostic about what specific research would be prohibited. It first explores key considerations that affect the verifiability of research restrictions, such as the computational infrastructure necessary for experiments. It then catalogs 28 candidate verification mechanisms. These mechanisms include whistleblowers, search warrants, reviews of AI training code, standard intelligence gathering tools, and more. Some of these mechanisms are not yet implementation-ready, and some might be undesirable upon further inspection. By examining the space of potential options, this work provides a foundation for future research to develop the most promising mechanisms into deployable tools.
What would multilateral ‘AI arms control’ look like?
Given the competition, it’s debatable whether a US-China safety deal is even possible
Europe’s new tech-sovereignty plan doesn’t ban U.S. cloud giants — it sets four levels of “sovereignty” for sensitive government data, and an American law makes the top levels nearly impossible for them to reach - Silicon Canals
On June 3 the European Commission proposed the Cloud and AI Development Act, which grades cloud providers on four sovereignty levels for public-sector data. The rules stop short of a ban, but the Commission’s own tech chief says U.S. firms will struggle to reach the highest tiers because ...
Changes to the AI Act Approved by the Council of the EU
2 August 2027 is the new deadline for the establishment of AI regulatory sandboxes by competent authorities at the national level; 2 December 2026 is when the grace period ends for providers to implement transparency solutions for AI-generated content.
OpenAI limits model release following latest US government intervention
OpenAI is limiting the release of its GPT-5.6 models at the request of the White House, marking the second US government intervention in two weeks regarding frontier AI availability.
GDPR — not AI Act — delayed release of frontier AI in Europe, research shows
Research from the Centre for the Governance of AI indicates that GDPR, rather than the EU AI Act, has been the primary driver of regulatory delays for frontier AI models in Europe.
AI-Driven Content Boom in India Faces Copyright Challenges Amid Legal Uncertainties
Indian entertainment firms are adopting AI for content creation, but face legal hurdles as current copyright laws require human authorship.
China revises privacy standard with new AI, sensitive-data requirements
China has proposed a major overhaul of its flagship national personal-information protection standard, introducing new compliance requirements for AI developers and stricter rules on sensitive data.
AI simplification package gets final approval from EU member states
EU member states approved legislation simplifying parts of the bloc's AI rulebook, delaying obligations for high-risk systems and banning non-consensual sexual deepfakes.
Ctrl+AI+Reg - 29 June 2026 - Ctrl+AI+Reg
See fuller updates in the ‘Global Updates’ tab in the Global AI Regulation Tracker (English version | Chinese version) ... 🌐 [25 June 2026] Outcomes of the Second Pax Silica Summit: The US State Department has reported that the Pax Silica initiative, launched in December 2025, has achieved ...
Anthropic Offers Claude Discount to California Government | Let's Data Science
Editorial analysis: Public-sector access deals change the calculus for enterprise adoption and procurement of generative AI tools, so practitioners should track pricing, training, and support terms. According to Politico, Gov. Gavin Newsom and Anthropic reached an agreement that would make ...
Lawmakers want to ban AI companies from selling your health data
New legislative efforts aim to protect health and location data from being sold by AI companies.
Does Europe Really Have a Plan for Tech Sovereignty? | TechPolicy.Press
If the European Commission truly wants tech sovereignty, it should start by freeing itself from Big Tech’s epistemic capture, write Cecilia Rikap and Vali Stan.
Bangkok Post - New AI legislation set to be completed in coming months
Thailand aims to finalise a draft of the Artificial Intelligence (AI) Act in this fiscal year, with a focus on balancing the promotion of AI adoption and strict regulation.
DOJ, Mississippi recast xAI pollution suit as fight over citizen enforcement
The Trump administration and Mississippi are attempting to shift the focus of an NAACP lawsuit against xAI from Clean Air Act violations to a broader legal battle over federalism and citizen standing in court.
Industry weighs in on EU copyright reform
A major consultation on EU copyright reform has concluded, with the European Commission now tasked with shaping new legislation that addresses the role of AI.
Anthropic risk designation 'punishment,' former national security officials say
Former national security officials argued in a court brief that the US government's designation of Anthropic as a security risk is a pretextual attempt to punish the company.
Anthropic's CEO argued governments should be able to switch off dangerous AI. Days later, the government switched off Anthropic.
A discussion of the irony in the Anthropic CEO's stance on AI kill switches, given the government's subsequent action against the company.
US Supreme Court says president can remove independent agency officials for any reason
The Supreme Court ruled 6-3 that FTC Act provisions requiring for-cause removal of commissioners are unconstitutional.
Japan set to adopt social media election rules
Japanese lawmakers have approved a bill to curb misleading AI-generated content in elections, requiring disclosure and mitigation measures from large social media platforms.
Global Divide on AI Governance: Ban Superintelligence or Foster Deliberate Policy?
Debate continues over governing advanced AI, with proposals ranging from international bans on superintelligence to comprehensive global policy frameworks.
Clyde & Co Survey Shows Rapid Escalation of AI, Geopolitical Risks
Technology is moving extremely fast, and a recent Clyde & Co survey of global business leaders has found that associated risks are weighing heavily.
Senate Dems against ‘KOSA-Lite’ child safety bill set for House vote next week
US Senate Democrats are opposing the bipartisan Kids Internet and Digital Safety (KIDS) Act, which is scheduled for a vote in the House of Representatives on Monday.
Analysis: AI is Entering a Dark Period
An analysis piece arguing that the AI industry is moving into a phase of increased regulation and restricted access.
US Sen. Warren slams Supreme Court decision on FTC commissioners
Senator Elizabeth Warren criticized a Supreme Court ruling allowing the president to remove FTC commissioners without cause, claiming it enables political control over independent agencies.
US Senator Cantwell says Supreme Court 'gutting' independent agencies
Senator Maria Cantwell stated that the Supreme Court's decision on FTC commissioner removal undermines independent agencies, warning it invites political influence over public interest.
Open Markets says US Supreme Court 'grossly usurped' Congress in Slaughter case
The Open Markets Institute condemned the Supreme Court's ruling on FTC commissioner removal, arguing it usurps Congressional authority and creates a constitutional crisis.
US Supreme Court rules geofence warrants require constitutional protections
The Supreme Court has issued a decision regarding the constitutionality of geofence warrants.
Reuters AI News | Latest Headlines and Developments | Reuters
Explore the latest artificial intelligence news with Reuters - from AI breakthroughs and technology trends to regulation, ethics, business and global impact.