Wed 13 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Microsoft Invests $100 Billion, Alibaba and Tencent Struggle, and Nvidia Joins Trump's China Trip
TL;DR Microsoft has invested over $100 billion in its partnership with OpenAI, highlighting its commitment to AI growth. Meanwhile, Alibaba and Tencent reported disappointing revenues, reflecting challenges in converting AI investments into growth. Nvidia CEO Jensen Huang joined President Trump's delegation to China, underscoring the geopolitical dimensions of AI. Anthropic is in talks to raise funding at a $950 billion valuation, while CME plans to launch a futures market for AI computing power.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Dissatisfied: Three-fourths of AI customer service rollouts are a letdown
AI rollback rates hit 81% at firms with mature guardrails, suggesting enterprises are struggling to manage the systems in production, says Sinch
CME plans to launch futures market for AI computing power
New contracts will allow traders and companies to bet on and hedge future price of GPU rental
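The hedging mechanics such contracts would enable are standard futures arithmetic. The sketch below is illustrative only, with hypothetical prices and contract sizes; CME's actual GPU-compute contract terms (units, settlement index, tick size) are not specified in the story.

```python
# Hypothetical numbers throughout: contract terms are assumptions, not CME specs.

def futures_pnl(settle_price, entry_price, units):
    """Settlement P&L of a long futures position."""
    return (settle_price - entry_price) * units

# A buyer expecting to rent 1,000 GPU-hours next quarter goes long futures
# at $2.00/GPU-hour. If spot rent rises to $2.50, the futures gain offsets
# the extra rental cost; if it falls, the futures loss offsets the savings.
hours = 1_000
gain = futures_pnl(2.50, 2.00, hours)       # → 500.0
extra_rent = (2.50 - 2.00) * hours          # → 500.0, fully offset by the hedge
print(gain, extra_rent)
```

Either way, the renter's all-in cost is locked near the entry price, which is the point of the product.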
AI-Fueled Rally Pushes Industrials to Tech-Like Valuations, Data Show - Bloomberg
Optimism surrounding the potential for industrial companies to profit from the artificial intelligence boom has fueled record-setting momentum in the sector. Now worries are mounting that the group’s link to AI may be getting too tight.
The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested
arXiv:2605.11496v1 Announce Type: cross Abstract: Recent published evidence from frontier laboratories shows that contemporary AI models can recognise evaluation contexts, latently represent them, and behave differently under those contexts than under deployment-continuous conditions. Anthropic's BrowseComp incident, the Natural Language Autoencoder findings on SWE-bench Verified and destructive-coding evaluations, and the OpenAI / Apollo anti-scheming work all document instances of this phenomenon. We argue that these findings create a claim-validity problem for safety conclusions drawn from frontier evaluations. We introduce the Evaluation Differential (ED), a conditional divergence in a target behavioural property between recognised-evaluation and deployment-continuous contexts, define a normalised effect-size form (nED) for cross-property comparison, and prove that marginal evaluation scores cannot identify ED. We develop a typology of safety claims (ED-stable, ED-degraded, ED-inverted, ED-undetermined) by their warrant-status under documented divergence, and specify TRACE (Test-Recognition Audit for Claim Evaluation), an audit protocol that wraps existing evaluation infrastructure and produces restricted claims rather than capability scores. We apply the framework retrospectively to three publicly documented evaluation incidents and discuss governance implications for system cards, conformity assessment, and the international network of AI safety and security institutes. TRACE does not eliminate adversarial adaptation; it disciplines the claims drawn from evaluation evidence by making explicit the conditions under which that evidence was produced.
Amazon staff use AI tool for unnecessary tasks to inflate usage scores
In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb company’s AI leaderboard
Uncensored open-source video model runs locally, generates 10-second clips at 24fps
Sulphur 2 is an uncensored video model based on LTX that runs locally. It allows for the generation of content typically restricted by commercial platforms.
Economics & Markets
Alphabet’s Isomorphic Labs raises $2.1bn in Series B funding
The organisation plans to use the investment to accelerate the application of its AI model at scale.
AI Startups Are Commanding Valuations Public SaaS Companies Could Never Get - Benzinga
Data from Forge Global shows that private-market enthusiasm around AI continues to dramatically outpace valuations in public equities.
Anthropic's new share transfer rules shock private market investors - CNBC TV18
Anthropic’s warning that unauthorised share transfers are “void” has rattled private-market investors, raising fears over liquidity, ownership rights, and inflated AI startup valuations.
SoundHound AI (SOUN) Valuation Check After OASYS Launch And Cautious Market Reaction - Simply Wall St News
SoundHound AI (SOUN) is back in focus after its latest quarter paired strong revenue with the launch of its OASYS AI orchestration platform and new partnerships, yet the stock reacted cautiously.
Australia proposes A$70m 'AI Accelerator' grants in federal budget
The Australian government has announced A$70 million in 'AI Accelerator' grants to boost domestic AI development and integrate the technology into government services.
Bezos’s Blue Origin weighs first external fundraising
Rocket maker’s chief tells staff it will need outside capital to meet ambitious launch plans
Wolfspeed Stock Surges 23% on AI Infrastructure Hype and Short Squeeze Momentum
Global demand for silicon carbide is projected to grow rapidly as AI energy consumption soars and electrification accelerates. The company's Mohawk Valley and other U.S. fabs position it to capture domestic content advantages amid supply chain security concerns. Challenges remain significant.
South Korea has room for active fiscal spending thanks to AI boom, Fitch says | Reuters
South Korea has room to use fiscal policy to mitigate the economic impact of the Middle East conflict, thanks to an artificial intelligence boom, credit ratings agency Fitch said on Wednesday.
Import AI 456: RSI and economic growth; radical optionality for AI regulation; and a neural computer
“In the short term, this means avoiding overregulation while rapidly building the institutions, information channels and legal authorities needed to respond competently to a broad range of scenarios.” The key idea - invest now for an uncertain future: Given the immense stakes of AI development, “governments should be willing to spend an extraordinary amount of money, effort, and political capital on preserving optionality”, they write.
Chipmaker Cerebras joins OpenAI’s inner circle — for a price
Launching into the magic of the Altman-osphere could prove to be quite a windfall
When to Ask a Question: Understanding Communication Strategies in Generative AI Tools
arXiv:2605.11240v1 Announce Type: cross Abstract: Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and an objective that captures the tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.
Fill-Side Non-Retail Trading on Polymarket: An Empirical Study of Behavioral Tiers and Microstructure Signatures Under Quote-Attribution Constraints
arXiv:2605.11640v1 Announce Type: cross Abstract: Prediction markets cannot exist without market makers, arbitrageurs, and other non-retail liquidity providers, yet the supply-side microstructure of Polymarket-class venues has not been characterized at on-chain pseudonymous-address scale. This paper studies non-retail participation on Polymarket using an empirical run on the PMXT v2 archive over 2026-04-21 through 2026-04-27 (13,356,931 OrderFilled events; 77,204 addresses with five+ fills; 43,116 markets). We report three findings. First, Polymarket's off-chain CLOB architecture renders address-level quote-lifecycle attribution permanently unavailable: OrderPlaced and OrderCancelled events are off-chain and absent from public archives, so quote-intensity, two-sided-ratio, and posted-spread features cannot be built at address level. We document this as a structural validity-gate failure (G-QUOTE-LIFE universal fail) and restrict analysis to a six-feature fill-side vector. Second, density-based clustering (DBSCAN, fifteen sensitivity configurations) on the fill-side vector produces a single dense cluster with zero noise: fill-side behavior in the empirical window is uni-modal under the six-feature vector, contradicting the pre-registered hypothesis of four-to-five separable archetypes. Third, robust retail vs non-retail separation is achievable through clustering-independent feature-tier stratification: whale-tier, high-frequency-operator, and power-trader tiers jointly hold 81.4% of total notional across 12.6% of addresses. Address-level market-making and liquidity-provision claims are withdrawn per the G-QUOTE-LIFE failure; spoof-by-non-fill manipulation detection is downgraded to market-level book diagnostics. A privacy-respecting derived-dataset deposit accompanies the paper as Bundle 3 of the PMXT family. Fourth paper in a four-paper programme on event-linked perpetuals and leveraged prediction-market microstructure.
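The paper's headline clustering result (DBSCAN finding a single dense cluster with zero noise on a uni-modal feature cloud) is easy to reproduce in miniature. The sketch below is illustrative only: a minimal DBSCAN on synthetic 2-D data, not the paper's six-feature Polymarket vectors.

```python
import math

# Minimal O(n^2) DBSCAN, enough to show why a uni-modal point cloud collapses
# to one cluster with no noise. Data here are synthetic, not Polymarket fills.

def dbscan(points, eps, min_pts):
    n = len(points)
    labels = [-1] * n  # -1 = noise / unclaimed
    # Precompute each point's eps-neighborhood (self included).
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue  # already claimed, or not a core point
        labels[i] = cluster
        frontier = list(nbrs[i])
        while frontier:  # grow the cluster over density-reachable points
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(nbrs[j]) >= min_pts:  # only core points keep expanding
                    frontier.extend(nbrs[j])
        cluster += 1
    return labels

# A tight, uni-modal grid of 100 points: every point is density-connected,
# so DBSCAN returns one cluster and zero noise, mirroring the paper's finding.
grid = [(i * 0.2, j * 0.2) for i in range(10) for j in range(10)]
print(set(dbscan(grid, eps=0.3, min_pts=4)))  # → {0}

# A far-away point, by contrast, is left as noise (-1).
print(dbscan(grid + [(10.0, 10.0)], eps=0.3, min_pts=4)[-1])  # → -1
```

Sweeping eps and min_pts over such data (as the paper's fifteen sensitivity configurations do) keeps producing the same one-cluster answer, which is exactly what "uni-modal under the six-feature vector" means.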
WhatsApp suspends AI chatbot fee; opens door on EU talks to resolve probe
WhatsApp is putting on hold fees demanded from the likes of OpenAI and Poke.com for distributing chatbots over the messaging platform to allow negotiations with EU officials.
Meta profited from illegal scam ads, California county lawsuit alleges
Santa Clara county claims Meta Platforms violated the state’s false advertising and unfair business practices laws
California’s Santa Clara county has sued Meta Platforms, alleging it has profited from Facebook and Instagram ads promoting scams in violation of California’s false advertising and unfair business practices laws. The lawsuit – filed on Monday in Santa Clara county superior court on behalf of all California residents – accuses the social media giant of tolerating fraudulent advertising on a global basis. The suit seeks restitution, civil damages and an order prohibiting Meta from engaging in unfair business practices.
Sega has canceled its live service 'Super Game' due to 'intensifying market competition,' and I really, really hope it's a sign that the industry is finally correcting itself | PC Gamer
Years of catastrophic bets on F2P mega hits may finally be subsiding.
Why personalised pricing could be a good deal for shoppers
Selling the same thing to different people at a different price sounds unfair, but it needn’t be bad news for consumers
Nintendo needs Super Mario once again
The little plumber is a bulwark of intellectual property as rising chip prices pressure the company
Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google
AI that can see and understand what's happening in a video — especially a live feed — is understandably an attractive product for many enterprises and organizations. Beyond acting as a security "watchdog" over sites and facilities, such a model could clip out the most exciting parts of marketing videos and repurpose them for social, identify inconsistencies and gaffes in videos and flag them for removal, and read the body language and actions of participants in controlled studies or candidates applying for new roles. While some AI models offer this type of functionality today, it is far from a mainstream capability.
The two-year-old startup Perceptron Inc. is seeking to change all that. Today, it announced the release of its flagship proprietary video analysis reasoning model, Mk1 (short for "Mark One"), at a cost — $0.15 per million input tokens / $1.50 per million output tokens through its application programming interface (API) — that comes in about 80-90% below other leading proprietary rivals, namely Anthropic's Claude Sonnet 4.5, OpenAI's GPT-5, and Google's Gemini 3.1 Pro.
Led by co-founder and CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, the company spent 16 months developing a "multi-modal recipe" from the ground up to address the complexities of the physical world. The launch signals a new era in which models are expected to understand cause-and-effect, object dynamics, and the laws of physics with the same fluency they once applied to grammar. Interested users and potential enterprise customers can try it out for themselves on Perceptron's public demo site.
Performance across spatial and video benchmarks
The model's performance is backed by a suite of industry-standard benchmarks focused on grounded understanding. In spatial reasoning (ER benchmarks), Mk1 scored 85.1 on EmbSpatialBench, surpassing Google’s Robotics-ER 1.5 (78.4) and Alibaba’s Q3.5-27B (approx. 84.5). In the specialized RefSpatialBench, Mk1's score of 72.4 represents a massive leap over competitors such as GPT-5m (9.0) and Sonnet 4.5 (2.2), highlighting a significant advantage in referring expression comprehension.
Video benchmarks show similar dominance. On the EgoSchema "Hard Subset" — where first-and-last-frame inference is insufficient — Mk1 scored 41.4, matching Alibaba’s Q3.5-27B and significantly beating Gemini 3.1 Flash-Lite (25.0). On VSI-Bench, Mk1 reached 88.5, the highest recorded score among the compared models, further validating its ability to handle genuine temporal reasoning tasks.
Market positioning and the efficiency frontier
Perceptron has explicitly targeted the "Efficiency Frontier," a metric that plots mean scores across video and embodied reasoning benchmarks against the blended cost per million tokens. Benchmarking data shows Mk1 occupying a unique position: it matches or exceeds the performance of "frontier" models like GPT-5 and Gemini 3.1 Pro while maintaining a cost profile closer to their "Lite" or "Flash" versions. Specifically, Mk1 is priced at $0.15 per million input tokens and $1.50 per million output tokens. The "Efficiency Frontier" chart shows GPT-5 at a significantly higher blended cost (near $2.00) and Gemini 3.1 Pro at approximately $3.00, while Mk1 sits at the $0.30 blended-cost mark with superior reasoning scores. This aggressive pricing strategy is intended to make high-end physical AI accessible for large-scale industrial use rather than just experimental research.
Architecture and temporal continuity
The technical core of Perceptron Mk1 is its ability to process native video at up to 2 frames per second (FPS) across a 32K-token context window. Unlike traditional vision-language models (VLMs), which often treat video as a disjointed sequence of still images, Mk1 is designed for temporal continuity. This architecture allows the model to "watch" extended streams and maintain object identity even through occlusions, a critical requirement for robotics and surveillance applications. Developers can query the model for specific moments in a long stream and receive structured time codes in return, streamlining video clipping and event detection.
Reasoning with the laws of physics
A primary differentiator for Mk1 is its "physical reasoning" capability, which Perceptron defines as high-precision spatial awareness that allows the model to understand object dynamics and physical interactions in real-world settings. For example, the model can analyze a scene to determine whether a basketball shot was taken before or after a buzzer by jointly reasoning over the ball's position in the air and the readout on a shot clock. This requires more than pattern recognition; it requires an understanding of how objects move through space and time. The model is capable of "pixel-precise" pointing and counting into the hundreds within dense, complex scenes. It can also read analog gauges and clocks, which have historically been difficult for purely digital vision systems to interpret reliably.
It also appears to have strong general world and historical knowledge. In a brief test, I uploaded a vintage public domain film of skyscraper construction in New York City, dated 1906, from the U.S. Library of Congress. Mk1 not only correctly described the contents of the footage — including atypical sights such as workers suspended by ropes — but did so rapidly, and even correctly identified the rough date (early 1900s) from the look of the footage alone.
A developer platform for physical AI
Accompanying the model release is an expanded developer platform designed to turn these high-level perception capabilities into functional applications with minimal code. The Perceptron SDK, available via Python, introduces several specialized functions such as "Focus," "Counting," and "In-Context Learning." The Focus feature automatically zooms and crops into specific regions of a frame based on a natural language prompt, such as detecting and localizing personal protective equipment (PPE) on a construction site. The Counting function is optimized for dense scenes, such as identifying and pointing to every puppy in a group or individual items of produce. The platform also supports in-context learning, letting developers adapt Mk1 to specific tasks with just a few examples, such as showing an image of an apple and instructing the model to label every instance of Category 1 in a new scene.
Licensing strategies and the Isaac series
Perceptron is employing a dual-track strategy for its model weights and licensing. The flagship Perceptron Mk1 is a closed-source model accessed via API, designed for enterprise-grade performance and security. The company also maintains its "Isaac" series, which kicked off with the launch of Isaac 0.1 in September 2025, as an open-weights alternative. Isaac 0.2-2b-preview, released in December 2025, is a 2-billion-parameter vision-language model with reasoning capabilities, available for edge and low-latency deployments. While the Isaac weights are open on the popular AI code sharing community Hugging Face, Perceptron offers commercial licenses for companies that require maximum control or on-premises deployment. This approach lets the company support both the open-source community and specialized industrial partners who need proprietary flexibility. The documentation notes that Isaac 0.2 models are specifically optimized for sub-200ms time-to-first-token, making them well suited to real-time edge devices.
Background on Perceptron founding and focus
Perceptron AI is a Bellevue, Washington-based physical AI startup founded by Aghajanyan and Akshat Shrivastava, both former research scientists at Meta’s Facebook AI Research (FAIR) lab. The company’s public materials date its founding to November 2024, while a Washington corporate filing for Perceptron.ai Inc. shows an earlier foreign registration on October 9, 2024, listing Shrivastava and Aghajanyan as governors. In founder launch posts from late 2024, Aghajanyan said he had left Meta after nearly six years and “joined forces” with Shrivastava to build AI for the physical world, while Shrivastava said the company grew out of his work on efficiency, multimodality and new model architectures.
The founding appears to have followed directly from the pair’s work on multimodal foundation models at Meta. In May 2024, Meta researchers published Chameleon, a family of early-fusion models designed to understand and generate mixed sequences of text and images, work that Perceptron later described as part of the lineage behind its own models. A July 2024 follow-on paper, MoMa, explored more efficient early-fusion training for mixed-modal models and listed both Shrivastava and Aghajanyan among the authors. Perceptron’s stated thesis extends that research direction into “physical AI”: models that can process real-world video and other sensory streams for use cases such as robotics, manufacturing, geospatial analysis, security and content moderation.
Partner ecosystems and future outlook
The real-world impact of Mk1 is already being demonstrated through Perceptron's partner network. Early adopters are using the model for diverse applications such as auto-clipping highlights from live sports, which leverages its temporal understanding to identify key plays without human intervention. In the robotics sector, partners are curating teleoperation episodes into training data, effectively automating the labeling and cleaning of data for robotic arms and mobile units. Other use cases include multimodal quality-control agents on manufacturing lines, which can detect defects and verify assembly steps in real time, and wearable assistants on smart glasses that provide context-aware help to users. Aghajanyan said these releases are the culmination of research intended to make AI function best in the physical world, moving toward a future where "physical AI" is as ubiquitous as digital AI.
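The blended-cost figure behind the Efficiency Frontier comparison can be reproduced with simple arithmetic. The input:output token ratio below is our assumption, not a number Perceptron has published; roughly eight input tokens per output token reproduces the quoted $0.30 blended mark from the $0.15/$1.50 per-million prices.

```python
# Assumed workload mix: ~8:1 input:output tokens (hypothetical, not published).

def blended_cost(input_price, output_price, input_share):
    """Cost per million tokens when `input_share` of all tokens are input tokens."""
    return input_price * input_share + output_price * (1.0 - input_share)

mk1 = blended_cost(0.15, 1.50, input_share=8 / 9)  # 8 input tokens per output token
print(round(mk1, 2))  # → 0.3
```

Heavier output mixes push the blended figure up quickly, which is why per-direction prices matter more than any single headline number.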
SAP’s AI offer to legacy customers comes with a catch | CIO
On-premises customers can access Joule assistants, but only if they commit 50% of maintenance spend to cloud first.
How Miro Uses Amazon Bedrock To Boost Software Bug Routing Accuracy
Miro uses Amazon Bedrock, Nova models, and Claude to automate software bug routing, enrichment, and root-cause analysis, resulting in significant time-to-resolution improvements.
The end of typing? Why workers are suddenly ditching their keyboards
Employees are now whispering to AI voice dictation tools rather than clacking the keys. Will ‘voicepilling’ make everyone more productive – or just more annoying? Name: Voicepilled. Age: Reid Hoffman first declared himself “voicepilled” in the autumn of last year.
Rivian CEO’s Robotics Company Raises $400 Million
The funding for the AI-powered industrial robot project now exceeds $1 billion.
Rising Equity Grants Highlight AI Talent Competition at Early-Stage Startups - TipRanks.com
According to a recent LinkedIn post from Carta, compensation trends for AI and machine-learning engineers at venture-backed startups appear to be shifting toward gr...
SAP Invests in Automation Startup n8n at $5.2 Billion Valuation Amid Enterprise AI Boom - CXO Digitalpulse
German software giant SAP has made a strategic investment in automation startup n8n, pushing the company’s valuation to approximately $5.2 billion as competition intensifies in the enterprise artificial intelligence market. The investment reflects SAP’s growing focus on AI-powered workflow ...
Vibe Coding Floods App Market, Raises Competition | Let's Data Science
Business Insider reports that AI and the rise of so-called vibe coding have dramatically lowered the barrier to building consumer apps, producing a flood of new startups. The article profiles Eli Cohen, who Business Insider says invested about $20,000 in a 2010 project that failed; ...
The $2.75B bet that SaaS isn’t extinct, but ripe for disruption
The founders of Hightouch say doomsayers misunderstand the AI boom — and they’re moving into a bigger San Francisco office to prove it.
The April 2026 US Venture Capital Funding Report – AlleyWatch
Whether that trend continues in May will be a meaningful signal about the durability of the current investment cycle and the degree to which non-AI hardtech has emerged as a sustained priority for institutional capital. Data for this report is sourced from AlleyWatch proprietary funding data (funding.alleywatch.com) and covers venture capital rounds announced or closed in April 2026 ...
Labor, Society & Culture
Auditing African Content Moderators' Working Conditions by Using the European General Data Protection Regulation (GDPR)
arXiv:2605.11699v1 Announce Type: new Abstract: In this article, we audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR). We demonstrate its extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned. The results of this approach provide legally grounded evidence of the structural disadvantages faced by content moderators in the Global South, whose exploitative working conditions violate workers' rights. Our work also highlights the benefits of legislation aimed at protecting individuals' data rights as a counterweight to the tech industry's discourse of exceptionalism, which obscures its dependence on BPOs to externalise labour costs and accountability, whilst claiming that its products, business models, and methods of resource extraction are unprecedented and fall outside any existing legal framework.
White-collar workers report growing feelings of ‘AI brain fry’
Workers are reporting feeling overwhelmed by the new technology
Will AI turn us all into hipsters and artisans?
There is good reason to be dubious about the notion that automation will supplant all demand for human labour
US workers overwhelmingly support union-backed policies on AI, poll says | US unions | The Guardian
Nine out of 10 workers express support for policies on artificial intelligence that labor unions may fight for
Imposter syndrome used to be a lie. AI made it true
For decades, psychologists told us self-doubt at work was a distortion. Then AI came along and made the gap real — for everyone, at once.
Clear gap between AI expectations and preparedness, finds report
Accenture finds employees increasingly believe reskilling is unavoidable, yet many are being asked to use new technologies without the required training.
Walmart restructuring AI and technology teams; 1,000 roles to be affected
The move is part of a broader modernisation strategy being driven under CEO John Furner as competition in global retail intensifies
Chart: One in Five U.S. Jobs Faces High Risk of AI Automation | Statista
This chart shows the share of jobs in the United States by the expected short-term impact of AI.
Meituan, Didi, Alibaba platforms revamp algorithms under CAC campaign
Major Chinese digital platforms have completed self-inspections and implemented 63 optimization measures for their algorithms following a government push to improve gig worker protections.
AI Could Hollow Out the Next Generation of Workers - Business Insider
Companies embracing AI too quickly risk hollowing out the pipeline that trains future professionals, investment manager Tom Slater said.
Future Focus: Leadership Shift, AI Chaos, and Labor Market Trends
Three trends to watch: “supportive character” leadership outperforms ego, AI urgency outpaces adoption readiness, and job growth is concentrated in fields with fewer men.
The best argument I’ve heard for why AI won't take your job
In the first episode of the Platformer podcast, Box CEO Aaron Levie makes the case that you'll keep your job — but soon, you might not recognize it
Beyond Verification — What Responsible AI Really Demands of Human Experts
For the fifth year in a row, MIT Sloan Management Review and Boston Consulting Group (BCG) have assembled an international panel of AI experts that includes academics and practitioners to help us understand how responsible artificial intelligence is being implemented across organizations worldwide. In our first post this year, we explored how organizations should think […]
Meta to launch 'Incognito Chat' for private AI conversations on WhatsApp | Reuters
According to a company website, messages people share with Meta AI may be used by the social media company to improve its AI models, but personal chats on WhatsApp remain protected by end-to-end encryption and are not accessible for that purpose.
London cops hail fixed facial recognition cams after suspects collared every 35 mins
Croydon trial helped secure 173 arrests, though civil liberties groups remain unconvinced
Beware what you tell your AI chatbot. It’s not a shrink – it’s a snitch | Arwa Mahdawi
In a case of ‘oh dear diary’, the OpenAI president Greg Brockman is having to read extracts from his musings about Elon Musk in court. It’s a terrifying reminder that what’s divulged to AI really isn’t private The hottest new read of 2026 may well be The Secret Diary of Greg Brockman, Aged 38¾. It’s got everything: feuding billionaires, scheming CEOs and a perhaps somewhat unreliable narrator. You won’t find it in the library, but you can watch Brockman, a co-founder and president of OpenAI, being forced to read the juiciest bits out loud in court. Before you ask ChatGPT to explain, here’s the backstory: Elon Musk is in a legal battle with Brockman and the OpenAI CEO, Sam Altman. Musk, a former board member of OpenAI, is accusing the men of violating the AI firm’s founding agreement by turning it into a for-profit entity. Meanwhile, Altman et al are arguing Musk is just upset he’s not in control of the company and wants to bring down his competition.
Metaphor Is Not All Attention Needs
arXiv:2605.12128v1 Announce Type: cross Abstract: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post-training aims to make models robust against many jailbreak strategies, recent evidence shows that stylistic reformulations, such as poetic transformation, can still bypass safety mechanisms with alarming effectiveness. This raises a central question: why do literary jailbreaks succeed? In this work, we investigate whether their effectiveness depends on specific poetic devices, on a failure to recognize literary formatting, or on deeper changes in how models process stylistically irregular prompts. We address this problem through an interpretability analysis of attention patterns. We perform input-level ablation studies to assess the contribution of individual and combinations of poetic devices; construct an interpretable vector representation of attention maps; cluster these representations and train linear probes to predict safety outcomes and literary format. Our results show that models distinguish poetic from prose formats with high accuracy, yet struggle to predict jailbreak success within each format. Clustering further reveals clear separation by literary format, but not by safety label. These findings indicate that jailbreak success is not caused by a failure to recognize poetic formatting; rather, poetic prompts induce distinct processing patterns that remain largely independent of harmful-content detection. Overall, literary jailbreaks appear to misalign large language models not through any single poetic device, but through accumulated stylistic irregularities that alter prompt processing and avoid lexical triggers considered during post-training. This suggests that robustness requires safety mechanisms that account for style-induced shifts in model behavior. We use Qwen3-14B as a representative open-weight case study.
Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development
arXiv:2605.11345v1 Announce Type: new Abstract: It is prominently recognized that dataset development in machine learning is a value-laden process from problem formulation to data processing, use, and reuse. Structured documentation frameworks such as datasheets, data statements, and dataset nutrition labels have been created to aid developers in documenting how their datasets were produced and, according to the creators of the frameworks, to facilitate reflexivity in dataset development. While reflexivity is a stated goal, it is unclear whether and to what extent these structured dataset documentation frameworks incorporate concepts from reflexivity literature (at FAccT and elsewhere) and whether the use of the frameworks demonstrates reflexivity. Here, we adopt mixed-method thematic analysis and corpus-assisted discourse analysis to explore how reflexivity is incorporated in structured documentation frameworks and their responses. We demonstrate empirically that there is a general lack of engagement with major themes of reflexivity in both dataset documentation frameworks and published applications of these frameworks. We present a codebook of major reflexivity topics, recommend actionable strategies, and propose a set of extended datasheet questions to more effectively incorporate these topics into structured documentation frameworks and in the FAccT literature.
‘Maybe me too’: Elon Musk accepts some of the blame for Claude learning to blackmail users from ‘evil’ online AI stories
Anthropic recently released a report saying it had solved Claude’s “agentic misalignment,” or the bot’s behaviors that deviated from humans’ best interests.
Why Americans dread AI
Silicon Valley encourages the view that the technology is unstoppable — and Trump seems to agree
Homebuyers remain sceptical about AI replacing estate agents
A new survey suggests that most UK homebuyers and sellers still prefer speaking to an actual estate agent rather than using AI during key stages of the property transaction process. Research from Moneypenny, based on responses from 2,000 adults, found that 83% would rather deal with a human when booking a valuation or making an […]
Technology & Infrastructure
Nokia launches agentic AI for home and broadband networks
Nokia agentic AI boosts end-user experience, increases operational efficiency and accelerates deployment for home and broadband networks. AI you can trust, built on insights and experience from 600+ million broadband lines deployed. Open and secure AI agent approach gives telecom providers full ...
How the Salesforce Engineering Organization Became Truly Agentic
Key Takeaways Autonomous tools are now writing code, reviewing pull requests (“PRs”), and driving deployments across the software development
From static pipes to agentic networks: the telecom revolution - Verdict
This is underscored by network operators expecting to deploy billions of autonomous agents within their networks and operations by 2030. Each agent acts independently to perceive, plan, act, and collaborate on behalf of the network provider to personalise the customer experience, improve network performance, and accelerate autonomous, context-aware infrastructure. There are three major domains where agentic systems are helping carriers change their business. For the service layer, AI ...
Interaction Models: A Scalable Approach to Human-AI Collaboration
Thinking Machines introduces interaction models for real-time, multimodal human-AI collaboration, focusing on audio, video, and asynchronous tool orchestration.
The Shift to Agentic AI - TechRepublic
See how AI is evolving into more autonomous, adaptable, and collaborative systems, and how the Dell AI Factory with NVIDIA helps organizations streamline workflows, automate tasks, and act on real-time insights.
SAP Intros Program to Help Enterprises Incorporate AI Agents | PYMNTS.com
SAP has launched a program to help businesses integrate artificial intelligence (AI) agents into their operations. The German software giant’s “Autonomous
Agentic AI Powers the Future of Customer Experience
Agentic AI is reshaping CX—uniting Genesys and ServiceNow with Capgemini to automate service, orchestrate workflows, and deliver faster outcomes.
LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management
arXiv:2605.12321v1 Announce Type: cross Abstract: Large language models (LLMs) show strong potential for Intelligent Transportation Systems (ITS), particularly in tasks requiring situational reasoning and multi-agent coordination. These capabilities make them well suited for cooperative driving, where rule-based approaches struggle in complex and dynamic traffic environments. Intersection management remains especially challenging due to conflicting right-of-way demands, heterogeneous vehicle priorities, and vehicle-specific kinematic constraints that must be resolved in real time. However, existing approaches typically use LLMs as auxiliary components on top of signal-based systems rather than as primary decision-makers. Signal controllers remain vehicle-agnostic, reservation-based methods lack intent awareness, and recent LLM-based systems still depend on signal infrastructure. In addition, LLM inference latency limits their use in sub-second control settings. We propose LISA (LLM-Based Intent-Driven Speed Advisory), a signal-free cognitive arbitration framework for autonomous intersection management. LISA uses an LLM to reason over declared vehicle intents, incorporating priority classes, queue pressure, and energy preferences. We evaluate LISA against fixed-cycle control, SCATS, AIM, and GLOSA across varying traffic loads. Results show that LISA reduces mean control delay by up to 89.1% and maintains Level of Service C while all non-LLM baselines degrade to Level of Service F. Under near-saturated demand, LISA reduces mean waiting time by 93% and peak queue length by 60.6% relative to fixed-cycle control. It also lowers fuel consumption by up to 48.8% and achieves 86.2% intent satisfaction, compared to 61.2% for the best non-LLM method. These results demonstrate that LLM-based reasoning can enable real-time, signal-free intersection management.
AI Agents Need an OS, Says IBM Engineer | StartupHub.ai
IBM AI Engineer Bri Kopecki explains why AI agents need an operating system to manage their tasks, memory, tools, and identities for reliable and safe operation
AI’s Supply Chain Problem - Knowledge at Wharton
The scarcest resource in AI isn’t chips or talent — it’s grid capacity, writes Wharton’s Santiago Gallino.
Australian energy ministers mull national data-center rules
Australian energy ministers are considering national policy changes to address the rapid growth of data centers, including requirements for operators to invest in renewable power.
CME plans to launch futures market for AI computing power
New contracts will allow traders and companies to bet on and hedge future price of GPU rental
ZTE advances intelligent network monetization strategy at AGC2026, empowering ISPs for sustainable growth
Leveraging 10G PON, light OTN, and Wi-Fi 7 to modernize infrastructure and reduce operational costs for local operators
MARA Strikes Data Center Deals in AI Pivot
MARA continues to shift its focus from Bitcoin mining to the unprecedented energy demand needed to fuel AI. MARA CEO Fred Thiel discusses the company’s big pivot on "Bloomberg Open Interest." (Source: Bloomberg)
Enterprise AI infrastructure modernization is now urgent - SiliconANGLE
Enterprise AI infrastructure modernization requires returning to basics in patching, maintenance and simplicity before scaling production AI.
The Inference Shift
This article discusses the transition in AI infrastructure from training-centric compute toward inference, latency, agentic workloads, and hardware integration.
H2CHP raises £1.5m to fund low-carbon generators for data centers
H2CHP, a Durham University spinout developing clean electric generators for data centers and other energy-intensive sites, has secured £1.5 million ($2m) of investment as part of its latest funding round. – H2CHP First reported by Recharge News, the company’s main offering is its free-piston linear generator, which it claims is a “fuel-flexible” technology, comprising high-efficiency […]
Cryptominer Phoenix Group turns to HPC, plans 18MW facility in France
Cryptominer Phoenix Group is pivoting to AI and HPC data centers and expanding its footprint into Europe, deploying capacity in a facility in France. The company this week announced a partnership with DC Max to develop its first European AI data center, an 18MW facility in Lyon. – Phoenix Group “What we are announcing today […]
Samanth Subramanian on the Undersea Cables That Keep the Internet Alive | Odd Lots
Underneath the world's oceans, miles and miles of fiber-optic cables send packets of information from one location to the next, serving as the backbone of the internet as we know it. This infrastructure is delicate, too: memorably, a 2022 volcanic eruption cut off the island of Tonga from web access for an extended period. Samanth Subramanian is the author of The Web Beneath the Waves: The Fragile Cables that Connect Our World, a recent book that explains, in detail, that the internet is not, and has never been, truly weightless or wireless. In fact, the system in place right now is pretty old school and resembles the telegraph cable network of yore. We talk to Subramanian about the strange contradictions of the undersea cable system, how much basic marine geography, like the Strait of Hormuz or the Suez Canal, informs where cables are laid, and how hard it is to protect this vulnerable and vital infrastructure. (Source: Bloomberg)
Uncensored open-source video model runs locally, generates 10-second clips at 24fps
Sulphur 2 is an uncensored video model based on LTX that runs locally. It allows for the generation of content typically restricted by commercial platforms.
Mistral Developing New AI Model for Banks Lacking Mythos Access
French artificial intelligence startup Mistral AI is in discussions with European banks about deploying its answer to Anthropic PBC’s Mythos, the limited-access AI model that can uncover cybersecurity vulnerabilities at unprecedented speed and scale.
AI & Tech Brief: NY-12’s lessons in AI influence - The Washington Post
My interview with Rev Lebaredian, VP of physical AI simulation at Nvidia, on the dawn of the era of physical AI.
EMO: Pretraining Mixture of Experts for Emergent Modularity
EMO is a mixture-of-experts model designed to improve modularity, routing, and cost efficiency in large language models.
The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents
Research indicates that expanded memory in LLM agents can reduce cooperation and create negative behavioral patterns, posing challenges for future agentic AI design.
Former Google engineer's economic espionage conviction questioned by US judge
A federal judge in San Francisco expressed skepticism regarding the economic espionage conviction of former Google engineer Linwei Ding, who was accused of stealing AI technology for China.
Protect your enterprise now from the Shai-Hulud worm and npm vulnerability in 6 actionable steps
Any development environment that installed or imported one of the 172 compromised npm or PyPI packages published since May 11 should be treated as potentially compromised. On affected developer workstations, the worm harvests credentials from over 100 file paths: AWS keys, SSH private keys, npm tokens, GitHub PATs, HashiCorp Vault tokens, Kubernetes service accounts, Docker configs, shell history, and cryptocurrency wallets. For the first time in a TeamPCP campaign, it targets password managers including 1Password and Bitwarden, according to SecurityWeek. It also steals Claude and Kiro AI agent configurations, including MCP server auth tokens for every external service an agent connects to.

And it does not leave when the package is removed. The worm installs persistence in Claude Code (.claude/settings.json) and VS Code (.vscode/tasks.json with runOn: folderOpen) that re-executes every time a project is opened, plus a system daemon (macOS LaunchAgent / Linux systemd) that survives reboots. These live in the project tree, not in node_modules; uninstalling the package does not remove them. On Linux-based CI runners, the worm reads runner process memory directly via /proc/pid/mem to extract secrets, including masked ones. And if you revoke tokens before isolating the machine, Wiz’s analysis found, a destructive daemon wipes your home directory.

Between 19:20 and 19:26 UTC on May 11, the Mini Shai-Hulud worm published 84 malicious versions across 42 @tanstack/* npm packages. Within 48 hours the campaign expanded to 172 packages across 403 malicious versions spanning npm and PyPI, according to Mend’s tracking. @tanstack/react-router alone receives 12.7 million weekly downloads. The issue is tracked as CVE-2026-45321, with a CVSS score of 9.6, and OX Security reported 518 million cumulative downloads affected. Every malicious version carried a valid SLSA Build Level 3 provenance attestation. The provenance was real. The packages were poisoned.
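Whether an environment pulled one of the compromised versions can be checked mechanically from lockfiles. A minimal sketch, assuming a hypothetical IOC map: the mistralai version comes from the reporting above, while the @tanstack entry uses an illustrative placeholder version, since the full affected-version list lives in the registry advisories.

```python
import json
from pathlib import Path

# Hypothetical IOC map: package name -> poisoned versions.
# Real entries should come from the npm/PyPI advisories for this campaign.
IOC_VERSIONS = {
    "@tanstack/react-router": {"1.999.0"},  # illustrative placeholder version
    "mistralai": {"2.4.6"},                 # PyPI release reported quarantined
}

def flag_lockfile(lockfile_path):
    """Return (name, version) pairs in an npm v2/v3 lockfile that match the IOC map."""
    data = json.loads(Path(lockfile_path).read_text())
    hits = []
    # npm v2/v3 lockfiles keep a flat "packages" map keyed by node_modules path.
    for path, meta in data.get("packages", {}).items():
        name = meta.get("name") or path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version and version in IOC_VERSIONS.get(name, ()):
            hits.append((name, version))
    return hits
```

Run across every checkout on a workstation or CI runner; per the guidance below from the incident reporting, a hit means isolate and image the machine before touching any tokens.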
“TanStack had the right setup on paper: OIDC trusted publishing, signed provenance, 2FA on every maintainer account. The attack worked anyway,” Peyton Kennedy, senior security researcher at Endor Labs, told VentureBeat in an exclusive interview. “What the orphaned commit technique shows is that OIDC scope is the actual control that matters here, not provenance, not 2FA. If your publish pipeline trusts the entire repository rather than a specific workflow on a specific branch, a commit with no parent history and no branch association is enough to get a valid publish token. That’s a one-line configuration fix.”

Three vulnerabilities chained into one provenance-attested worm

TanStack’s postmortem lays out the kill chain. On May 10, the attacker forked TanStack/router under the name zblgg/configuration, chosen to avoid fork-list searches per Snyk’s analysis. A pull request triggered a pull_request_target workflow that checked out fork code and ran a build, giving the attacker code execution on TanStack’s runner. The attacker poisoned the GitHub Actions cache. When a legitimate maintainer merged to main, the release workflow restored the poisoned cache. Attacker binaries read /proc/pid/mem, extracted the OIDC token, and POSTed directly to registry.npmjs.org. Tests failed. Publish was skipped. 84 signed packages still reached the registry. “Each vulnerability bridges the trust boundary the others assumed,” the postmortem states. It is published tradecraft from the March 2025 tj-actions/changed-files compromise, recombined in a new context.

The worm crossed from npm into PyPI within hours

Microsoft Threat Intelligence confirmed the mistralai PyPI package v2.4.6 executes on import (not on install), downloading a payload disguised as Hugging Face Transformers. npm mitigations (lockfile enforcement, --ignore-scripts) do not cover Python import-time execution. Mistral AI published a security advisory confirming the impact.
Compromised npm packages were available between May 11 at 22:45 UTC and May 12 at 01:53 UTC (roughly three hours). The PyPI release mistralai==2.4.6 is quarantined. Mistral stated an affected developer device was involved but no Mistral infrastructure was compromised. SafeDep confirmed Mistral never released v2.4.6; no commits landed May 11 and no tag exists. Wiz documented the full blast radius: 65 UiPath packages, Mistral AI SDKs, OpenSearch, Guardrails AI, 20 Squawk packages. StepSecurity attributes the campaign to TeamPCP, based on toolchain overlap with prior Shai-Hulud waves and the Bitwarden CLI/Trivy compromises. The worm runs under Bun rather than Node.js to evade Node.js security monitoring.

The attacker treated AI coding agents as part of the trusted execution environment

Socket’s technical analysis of the 2.3 MB router_init.js payload identifies ten credential-collection classes running in parallel. The worm writes persistence into .claude/ and .vscode/ directories, hooking Claude Code’s SessionStart config and VS Code’s folder-open task runner. StepSecurity’s deobfuscation confirmed the worm also harvests Claude and Kiro MCP server configurations (~/.claude.json, ~/.claude/mcp.json, ~/.kiro/settings/mcp.json), which store API keys and auth tokens for external services. This is an early but confirmed instance of supply-chain malware treating AI agent configurations as high-value credential targets.

The npm token description the worm sets reads: “IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.” It is not a bluff.

“What stood out to me about this payload is where it planted itself after running,” Kennedy told VentureBeat. “It wrote persistence hooks into Claude Code’s SessionStart config and VS Code’s folder-open task runner so it would re-execute every time a developer opened a project, even after the npm package was removed. The attacker treated the AI coding agent as part of the trusted execution environment, which it is.
These tools read your repo, run shell commands, and have access to the same secrets a developer does. Securing a development environment now means thinking about the agents, not just the packages.”

CI/CD trust-chain audit grid

Six gaps Mini Shai-Hulud exploited: what your CI/CD does today, and the control that closes each one.

1. Pin OIDC trusted publishing to a specific workflow file on a specific protected branch. Constrain id-token: write to only the publish job, and ensure that job runs from a clean workspace with no restored untrusted cache.
What your CI/CD does today: Most orgs grant OIDC trust at the repository level, so any workflow run in the repo can request a publish token; id-token: write is often set at the workflow level, not scoped to the publish job.
The gap: The worm achieved code execution inside the legitimate release workflow via cache poisoning, then extracted the OIDC token from runner process memory. Branch/workflow pinning alone would not have stopped this attack because the malicious code was already running inside the pinned workflow. The complete fix requires pinning, plus constraining id-token: write to only the publish job, plus ensuring that job uses a clean, unshared cache.

2. Treat SLSA provenance as necessary but not sufficient; add behavioral analysis at install time.
What your CI/CD does today: Teams treat a valid Sigstore provenance badge as proof a package is safe. npm audit signatures passes, the badge is green, and procurement and compliance workflows accept provenance as a gate.
The gap: All 84 malicious TanStack versions carry valid SLSA Build Level 3 provenance attestations, making this the first widely reported npm worm with validly attested packages. Provenance attests where a package was built, not whether the build was authorized. Socket’s AI scanner flagged all 84 artifacts within six minutes of publication; provenance flagged zero.

3. Isolate the GitHub Actions cache per trust boundary, invalidate caches after suspicious PRs, and never check out and execute fork code in pull_request_target workflows.
What your CI/CD does today: Fork-triggered workflows and release workflows share the same cache namespace. Closing or reverting a malicious PR is treated as restoring clean state. pull_request_target is widely used for benchmarking and bundle-size analysis with fork PR checkout.
The gap: The attacker poisoned the pnpm store via a fork-triggered pull_request_target workflow that checked out and executed fork code on the base runner. The cache survived PR closure, and the next legitimate release workflow restored the poisoned cache on merge. actions/cache@v5 uses a runner-internal token for cache saves, not the workflow’s GITHUB_TOKEN, so permissions: contents: read does not prevent mutation. Kennedy: “Branch protection rules don’t apply to commits that aren’t on any branch, so that whole layer of hardening didn’t help.”

4. Audit optionalDependencies in lockfiles and dependency graphs, and block github: refs pointing to non-release commits.
What your CI/CD does today: Static analysis and lockfile enforcement focus on dependencies and devDependencies; optionalDependencies with github: commit refs are not flagged by most tools.
The gap: The worm injected optionalDependencies pointing to a github: orphan commit in the attacker’s fork. When npm resolves a github: dependency, it clones the referenced commit and runs lifecycle hooks (including prepare) automatically, so the payload executed before the main package’s own install step completed. SafeDep confirmed Mistral never released v2.4.6; no commits landed and no tag exists.

5. Audit Python dependency imports separately from npm controls, covering AI/ML pipelines that consume guardrails-ai, mistralai, or any compromised PyPI package.
What your CI/CD does today: npm mitigations (lockfile enforcement, --ignore-scripts) are applied to the JavaScript stack, Python packages are assumed safe if pip install completes, and AI/ML CI pipelines are treated as internal testing infrastructure, not as supply-chain attack targets.
The gap: Microsoft Threat Intelligence confirmed mistralai PyPI v2.4.6 executes on import, not install; injected code in __init__.py downloads a payload disguised as Hugging Face Transformers, and --ignore-scripts is irrelevant for Python import-time execution. guardrails-ai@0.10.1 also executes on import. Any agentic repo with GitHub Actions id-token: write is exposed to the same OIDC extraction technique, putting LLM API keys, vector DB credentials, and external service tokens in the blast radius.

6. Isolate and image affected machines before revoking stolen tokens; do not revoke npm tokens until the host is forensically preserved.
What your CI/CD does today: Standard incident response revokes compromised tokens first, then investigates; npm token list and immediate revocation is the instinctive first step.
The gap: The worm installs a persistent daemon (macOS LaunchAgent / Linux systemd) that polls GitHub every 60 seconds. On detecting token revocation (a 40X error), it triggers rm -rf ~/, wiping the home directory. The npm token description reads: “IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.” Microsoft reported geofenced destructive behavior: a 1-in-6 chance of rm -rf / on systems appearing to be in Israel or Iran. Kennedy: “Even after the package is gone, the payload may still be sitting in .claude/ with a SessionStart hook pointing at it. rm -rf node_modules doesn’t remove it.”

Sources: TanStack postmortem, StepSecurity, Socket, Snyk, Wiz, Microsoft Threat Intelligence, Mend, Endor Labs. May 12, 2026.

Security director action plan

Today: “The fastest check is find . -name 'router_init.js' -size +1M and grep -r '79ac49eedf774dd4b0cfa308722bc463cfe5885c' package-lock.json,” Kennedy said. If either returns a hit, isolate and image the machine immediately. Do not revoke tokens until the host is forensically preserved: the worm’s destructive daemon triggers on revocation. Once the machine is isolated, rotate credentials in this order: npm tokens first, then GitHub PATs, then cloud keys.
Hunt for .claude/settings.json and .vscode/tasks.json persistence artifacts across every project that was open on the affected machine.

This week: Rotate every credential accessible from affected hosts: npm tokens, GitHub PATs, AWS keys, Vault tokens, K8s service accounts, SSH keys. Check your packages for unexpected versions published after May 11 with commits by claude@users.noreply.github.com. Block filev2.getsession[.]org and git-tanstack[.]com.

This month: Audit every GitHub Actions workflow against the six gaps above. Pin OIDC publishing to specific workflows on protected branches. Isolate cache keys per trust boundary. Set npm config set min-release-age=7d. For AI/ML teams: check guardrails-ai and mistralai against compromised versions, audit CI pipelines for id-token: write exposure, and rotate every LLM API key and vector DB credential accessible from CI.

This quarter (board-level): Fund behavioral analysis at the package registry layer; provenance verification alone is no longer a sufficient procurement criterion for supply-chain security tooling. Require CI/CD security audits as part of vendor risk assessments for any tool with publish access to your registries. Establish a policy that no workflow with id-token: write runs from a shared cache. Treat AI coding agent configurations (.claude/, .kiro/, .vscode/) as credential stores subject to the same access controls as cloud key vaults.

The worm is iterating. Defenders must, too.

This is the fifth Shai-Hulud wave in eight months. Four SAP packages became 84 TanStack packages in two weeks. intercom-client@7.0.4 fell 29 hours later, confirming active propagation through stolen CI/CD infrastructure. Late on May 12, malware research collective vx-underground reported that the fully weaponized Shai-Hulud worm code has been open-sourced.
Any threat actor can now deploy the same cache-poisoning, OIDC-extraction, and provenance-attested publishing chain against any npm or PyPI package with a misconfigured CI/CD pipeline.

“We’ve been tracking this campaign family since September 2025,” Kennedy said. “Each wave has picked a higher-download target and introduced a more technically interesting access vector. The orphaned commit technique here is genuinely novel. Branch protection rules don’t apply to commits that aren’t on any branch. The supply chain security space has spent a lot of energy on provenance and trusted publishing over the last two years. This attack walked straight through both of those controls because the gap wasn’t in the signing. It was in the scope.”

Provenance tells you where a package was built. It does not tell you whether the build was authorized. That is the gap this audit is designed to close.
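The host-level hunt Kennedy describes can be sketched as a single sweep. A minimal sketch, assuming the artifact paths reported above (.claude/settings.json SessionStart hooks, .vscode/tasks.json folder-open tasks, oversized router_init.js payloads); the size threshold and string matches mirror the reported indicators but are not a complete detection rule.

```python
from pathlib import Path

PAYLOAD_NAME = "router_init.js"
PAYLOAD_MIN_BYTES = 1_000_000  # the reported payload is ~2.3 MB; >1 MB mirrors the find check

def scan_project(root):
    """Flag reported Shai-Hulud persistence artifacts under a project tree."""
    root = Path(root)
    findings = []

    # 1. Oversized router_init.js payloads anywhere in the tree.
    for p in root.rglob(PAYLOAD_NAME):
        if p.stat().st_size > PAYLOAD_MIN_BYTES:
            findings.append(("payload", str(p)))

    # 2. Claude Code SessionStart hooks planted in project settings.
    settings = root / ".claude" / "settings.json"
    if settings.is_file() and "SessionStart" in settings.read_text(errors="ignore"):
        findings.append(("claude-hook", str(settings)))

    # 3. VS Code tasks that re-execute on folder open.
    tasks = root / ".vscode" / "tasks.json"
    if tasks.is_file() and "folderOpen" in tasks.read_text(errors="ignore"):
        findings.append(("vscode-task", str(tasks)))

    return findings
```

Any finding should be treated as an isolation trigger, not a cleanup target: per the reporting above, image the host first, since revoking tokens before isolation can fire the destructive daemon.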
US bank reports itself after slinging customer data at 'unauthorized AI app'
The volume and sensitivity of the data shared with an unauthorized AI application are cited as chief concerns.
A.I. and Humans Battle It Out in a Cybersecurity Showdown - The New York Times
Experts and college students used A.I. agents to try to break into and defend computer networks in a national competition. The agents did all right on their own, too.
Mythos, the AI Threat Landscape, and Your Next Action
My interview with Elie Bursztein here argues that the industry should move past the hype cycle and focus on the operational reality: AI is getting better at finding vulnerabilities, patching demand is likely to rise, concentration risk across shared providers is a serious concern, and organizations need stronger guardrails around how they deploy AI systems.
AI agent skills are becoming the next enterprise supply chain risk - here’s how to govern them | TechRadar
In practice, skills are becoming ... emerging agent ecosystem. The challenge is simple: skills can be powerful, and power without governance scales risk. A skill might run with the same privileges as the user or process invoking it. That can mean access to source code, production logs, secrets, customer data, or deployment systems...
Singapore mobilizes whole-of-country response to frontier AI cyber threats
Singapore has directed critical infrastructure and telecommunications operators to bolster cybersecurity as AI-enabled cyberattacks become more scalable and frequent.
OpenAI Daybreak
OpenAI Daybreak applies GPT-5.5 and Codex-style tooling to cybersecurity workflows like vulnerability scanning and threat modeling, marking a significant move into enterprise AI security.
OpenAI launching security AI initiative to compete with Claude Mythos
Over the coming weeks, OpenAI intends to work with industry and government partners as the company prepares to deploy increasingly cyber-capable models.
Compromised Mistral AI and TanStack packages may have exposed GitHub, cloud and CI/CD credentials in 'mini Shai Hulud' malware infection — supply-chain campaign spreads across npm and AI developer ecosystems like wildfire | Tom's Hardware
The malware reportedly refused to run on Russian-language systems but could execute a destructive payload under certain geographic conditions.
When AI becomes the insider: Rethinking federal risk in 2026 | Federal News Network
Insider risk at the federal level is no longer just about detecting human insiders. It is about securing the entire ecosystem that runs the federal mission.
Foxconn confirms cyberattack on North American facilities
Ransomware group Nitrogen claimed to have exfiltrated 8TB of data, including files related to projects involving Intel, Apple, Google, Dell, Nvidia and other companies.
Adoption, Deployment & Impact
Dissatisfied: Three-fourths of AI customer service rollouts are a letdown
AI rollback rates hit 81% at firms with mature guardrails, suggesting enterprises are struggling to manage the systems in production, says Sinch
Why compliance admin is driving SME AI adoption
For many business owners, compliance admin remains a source of friction.
GrowthLoop Unveils 2026 AI and Marketing Performance Index, Highlighting that Data Issues Significantly Slow Marketing Cycles, Experimentation, and Personalization
/PRNewswire/ -- GrowthLoop today released its 2026 AI and Marketing Performance Index, a survey of more than 300 marketers and data leaders across the U.S. and...
What Are Your Company's AI Nightmares?
Traditional AI governance is struggling to keep pace with Generative AI. A new approach suggests identifying concrete 'nightmare' scenarios and embedding controls directly into team experimentation.
Council Post: How To Drive An AI Advantage With A Common Data Platform
The real determinant of AI success is something far less glamorous: data strategy.
Honeywell CEO Discusses Company's Dealmaking, Spinoffs
Global events like the Iran conflict and the rise of artificial intelligence are creating increased demand for Honeywell International Inc. products ahead of its split later this month, according to the company's chief executive officer, Vimal Kapur. He speaks with Dani Burger on "Bloomberg Deals." (Source: Bloomberg)
Halliburton Enhances Seismic Workflow Creation with Amazon Bedrock and Generative AI
This case study details how Halliburton uses Amazon Bedrock and Generative AI to convert natural language requests into executable seismic workflows.
SAP creates single platform for building, deploying enterprise AI | CIO Dive
The release of SAP Business AI Platform and SAP Autonomous Suite follows a series of acquisitions by the ERP giant to bolster its data foundation.
Uber Uses OpenAI To Help People Earn Smarter and Book Faster
Uber is integrating OpenAI models to improve driver guidance and ride booking, demonstrating agentic AI at scale.
Chelsea flower show garden designers clash over use of AI
Horticulturalists express alarm after award-winning Matt Keightley launches app that can automate designs. With glasses of champagne sipped among the peonies, Chelsea flower show is generally a friendly and genteel occasion. But this year, the secateurs have been drawn as gardeners clash over the use of AI in designing the exhibits. Matt Keightley, an award-winning designer who has created gardens for figures including Prince Harry, is using artificial intelligence to design his garden for the prestigious show, held at the Royal Hospital gardens in Chelsea, London, next week.
AI saddles CIOs with new make-or-break expectations | CIO
IT resiliency and business results are not enough. Today’s IT leaders must build AI teams and guide their organizations through sweeping workflow changes.
Corporate Functions of the Future Won't Look Like Functions at All
Generative AI is expected to reshape corporate functions by compressing layers and rebuilding workflows, though the transition faces significant challenges like implementation costs and political resistance.
From Tools to Teams: Unlocking the Human Side of AI - Entrepreneur United Kingdom
Explore how AI is evolving from workplace tools to collaborative team partners, driving creativity, leadership, innovation, and purpose-driven organisational change.
From strategy to structure: How federal agencies can build the organizational engine for AI at scale | Federal News Network
Federal agencies do not have the luxury of choosing between stability and transformation because the AI Action Plan demands both.
Amazon staff use AI tool for unnecessary tasks to inflate usage scores
In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb company’s AI leaderboard
Improving Hybrid Human-AI Tutoring by Differentiating Human Tutor Roles Based on Student Needs
arXiv:2605.11155v1 Announce Type: new Abstract: Hybrid human-AI tutoring, where technology and humans jointly facilitate student learning, can be more beneficial than AI-only tutoring. However, preliminary evidence suggests that lower-performing students derive greater benefit from human-AI tutoring than higher-performing students. As such, this study evaluates whether a differentiated tutoring policy can effectively support both groups: human tutors initiate support for lower-performing students, while higher-performing students receive reactive, on-demand support. Using their within-grade median state test scores, we assigned 635 students (grades 5-8) to receive proactive (< median) or reactive (≥ median) tutoring. Using a DiDC design, we compare outcomes across two time periods: fall (AI-only tutoring) and spring (proactive-reactive human-AI tutoring). This quasi-experimental design isolates the effects of proactive-reactive tutoring approaches by comparing the discontinuity in spring outcomes to the fall, where no such discontinuity existed. Using data around the cutoff (Imbens-Kalyanaraman criterion), we find significant overall improvements from human-AI tutoring compared to AI-only baseline: 25% increase in time on task, 36% in skill proficiency, and 61% in academic growth (standardized MAP test). Between proactive and reactive tutoring, we find comparable improvements in time-on-task and skill proficiency. However, proactive tutoring, on average, showed marginally higher MAP growth (75%, p = .065) than reactive tutoring, i.e., proactive tutoring was more beneficial to students farther below the cutoff and helped narrow achievement gaps. Our findings provide evidence that differentiated human-AI tutoring addresses the needs of both groups, offering a practical and cost-effective strategy for scaling hybrid instruction.
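The design described above can be sketched in a few lines. This is a minimal illustration, not the paper's estimator: it uses simple mean differences within a fixed bandwidth rather than the local-linear fit and Imbens-Kalyanaraman bandwidth selection the study applies, and all function names are hypothetical.

```python
import numpy as np

def discontinuity(scores, outcomes, cutoff, bandwidth):
    # Mean-difference estimate of the jump in `outcomes` at `cutoff`,
    # using only students whose scores fall within `bandwidth` of it.
    near = np.abs(scores - cutoff) <= bandwidth
    below = near & (scores < cutoff)   # proactive group
    above = near & (scores >= cutoff)  # reactive group
    return outcomes[above].mean() - outcomes[below].mean()

def didc_effect(scores, fall, spring, cutoff, bandwidth):
    # Difference-in-discontinuities: the jump at the cutoff in spring
    # (human-AI tutoring) minus the jump in fall (AI-only baseline).
    # Any smooth trend in scores appears in both periods and cancels.
    return (discontinuity(scores, spring, cutoff, bandwidth)
            - discontinuity(scores, fall, cutoff, bandwidth))
```

A negative estimate indicates that below-cutoff (proactive) students gained more in spring relative to the fall baseline, which is the pattern the abstract reports.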
SAP CEO: the AI race is being fought in the wrong place | Fortune
Smarter models mean nothing without operational context — and the real AI battleground is inside your ERP system, not on a chatbot interface.
Goldman Sachs sees enterprise AI spending hinging on productivity gains | Prism News
Goldman’s message is clear: AI budgets will go to teams that can prove productivity gains fast, while pilots without measurable ROI face tighter scrutiny.
Geopolitics, Policy & Governance
Famed Merops Drone Interceptor to Be Made in Europe
The maker of Merops, the drone interceptor battle-tested in Ukraine and Iran, struck a deal to manufacture its product in Germany, in the latest move by Europe to capitalize on Ukrainian defense technology.
China’s AI Vanguard - by JOSE LUIS CHAVEZ CALVA
How DeepSeek and Moonshot’s Kimi Are Reshaping the Global Landscape with Lean Teams, Record Valuations, and Chip Self-Reliance
Why the Bombing of Iran Tied the U.S. More Closely to China
As the U.S. tries to rebuild its weapons stockpiles drained in the Iran war, it will need access to rare-earth minerals, an industry China dominates.
Utah governor defends controversial desert data center as essential in AI race with China
Cox said the proposed Box Elder data center should be viewed as a national security matter connected to global AI competition.
China highlights fair competition, AI bidding oversight in business environment report
China highlighted fair competition enforcement, the use of artificial intelligence in public bidding reviews and efforts to facilitate data flows in a business-environment report.
Securing America’s AI leadership
Fair use is essential to safeguarding U.S. national security and shaping global standards as countries like China move quickly to assert their dominance in AI.
Silicon Valley’s A.I. Lobbying Blitz Reaches a Fever Pitch
OpenAI and Anthropic are opening offices in Washington, hiring lobbyists and spending more than ever to win over federal lawmakers.
Andreessen Horowitz Is Playing Politics Like No Other
“If you think there’s a lot of money in politics now,” Marc Andreessen said in 2000, “you haven’t seen anything yet.” His firm is now the biggest known spender in this campaign cycle.
Politics - The Washington Post
As the White House grapples with cybersecurity threats from advanced artificial intelligence models, national security officials want more sway in AI regulation.
Apple criticises EU measures to help AI rivals access Google services | Reuters
Apple on Wednesday echoed Google's criticism of EU antitrust regulators' efforts to force the search giant to help AI rivals access its services, warning the proposed measures pose risks to privacy, security and safety.
Greater Manchester still says no to NHS data platform with Palantir at its heart
Public concern has only grown, says ICB, while evidence of benefits remains thin
Closing the Shadow AI Gap: New Compliance Deadlines for Financial Institutions • Dev|Journal
Financial institutions face a critical gap between AI deployment and regulatory compliance with OSFI E-23 and SR 11-7 standards.
OpenEvidence Exits Europe Over Regulatory Rules | Telehealth.org
OpenEvidence exits EU and the UK, highlighting tensions between AI regulation, innovation, and patient safety in digital health.
French Google case sets example on how commitments can catch new AI use of publishers’ content
A negotiating framework imposed on Google to set compensation for press publishers in France showed how well-designed commitments can address emerging AI issues.
Anthropic, South Korea explore cooperation on AI safety, cyber risks
Anthropic executives met with South Korean officials to discuss AI safety, cybersecurity, and domestic policy, as Seoul seeks closer engagement with global AI firms.
South Korea enhances privacy risk prevention measures under AI transformation
South Korea's privacy regulator is shifting to a preventive management framework, planning to inspect 1,700 high-risk systems and increasing potential fines for privacy violations.
ISO 42001 AI Management System Requirements: What Organisations Building Agentic Employees Need to Know | Flowtivity
Complete guide to ISO 42001 AI Management System requirements. Covers all 10 clauses, 39 Annex A controls, and practical implementation guidance for organisations deploying AI agents as digital employees.
UK regulators lack clarity on growth mandate, lawmakers say in push for reform bill
A parliamentary committee report suggests UK regulators face conflicting duties and unclear guidance, prompting calls for a Regulatory Reform Bill to support economic growth.
The Metaverse Is Not a Place Apart: Law, Code, and the Recursive Governance of Digital Space (A Review Essay on Mark Findlay, Governing the Metaverse: Law, Order and Freedom in Digital Space (2025))
arXiv:2605.11023v1 Announce Type: new Abstract: This review essay examines Mark Findlay's Governing the Metaverse: Law, Order and Freedom in Digital Space. Findlay offers an ambitious and timely account of the metaverse as a social and imaginative space that should be governed for freedom, personhood, community, and resistance to enclosure. The essay argues, however, that the book's two central categories, "the metaverse" and "new law," remain insufficiently theorised. The book relies on a realspace/virtual distinction that its own analysis repeatedly destabilises. Once digital environments are understood as dependent on physical infrastructures, platform architectures, AI systems, data pipelines, and external legal institutions, and as capable of generating real-world harms for individuals and society, the governance problem is no longer how to devise a separate law for a separate virtual realm. It is how to govern a hybrid socio-technical order in which law, code, platforms, and public oversight recursively interact. The essay further argues that Findlay's account of "new law" does not adequately theorise how normative authority operates across a recursively layered governance architecture in which code, platform rules, and legal oversight interact without any single level exercising decisive control. Drawing on algorithmic constitutionalism, speech-act pluralism, and fuzzy legality, the essay suggests that addressing this architecture requires a jurisprudence capable of reasoning about normative force that is layered, defeasible, and recursively unstable.
Spanish watchdog seeks new AI product safety regulations for SMEs, digital platforms
Spain's CNMC has proposed a draft decree to update product safety rules for AI and e-commerce, aiming to improve surveillance while addressing the compliance burden on SMEs.
Get the full executive brief
Receive curated insights with practical implications for strategy, operations, and governance.