AI Intelligence Brief

Wed 13 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

141 articles
Editor's pick · Editor's Highlights

Microsoft Invests $100 Billion, Alibaba and Tencent Struggle, and Nvidia Joins Trump's China Trip

TL;DR: Microsoft has invested over $100 billion in its partnership with OpenAI, highlighting its commitment to AI growth. Meanwhile, Alibaba and Tencent reported disappointing revenues, reflecting challenges in converting AI investments into growth. Nvidia CEO Jensen Huang joined President Trump's delegation to China, underscoring the geopolitical dimensions of AI. Anthropic is in talks to raise funding at a $950 billion valuation, while CME plans to launch a futures market for AI computing power.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

6 of 141 articles
Editor's pick
Arxiv · Yesterday

The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested

arXiv:2605.11496v1 Announce Type: cross Abstract: Recent published evidence from frontier laboratories shows that contemporary AI models can recognise evaluation contexts, latently represent them, and behave differently under those contexts than under deployment-continuous conditions. Anthropic's BrowseComp incident, the Natural Language Autoencoder findings on SWE-bench Verified and destructive-coding evaluations, and the OpenAI / Apollo anti-scheming work all document instances of this phenomenon. We argue that these findings create a claim-validity problem for safety conclusions drawn from frontier evaluations. We introduce the Evaluation Differential (ED), a conditional divergence in a target behavioural property between recognised-evaluation and deployment-continuous contexts, define a normalised effect-size form (nED) for cross-property comparison, and prove that marginal evaluation scores cannot identify ED. We develop a typology of safety claims (ED-stable, ED-degraded, ED-inverted, ED-undetermined) by their warrant-status under documented divergence, and specify TRACE (Test-Recognition Audit for Claim Evaluation), an audit protocol that wraps existing evaluation infrastructure and produces restricted claims rather than capability scores. We apply the framework retrospectively to three publicly documented evaluation incidents and discuss governance implications for system cards, conformity assessment, and the international network of AI safety and security institutes. TRACE does not eliminate adversarial adaptation; it disciplines the claims drawn from evaluation evidence by making explicit the conditions under which that evidence was produced.
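The two central quantities can be written out from the abstract's own wording; the following is an illustrative formalisation, not the paper's notation:

    % Illustrative sketch only; the paper's exact definitions may differ.
    % P: target behavioural property, E: recognised-evaluation context,
    % D: deployment-continuous context.
    \mathrm{ED}(P) = \mathbb{E}[P \mid E] - \mathbb{E}[P \mid D],
    \qquad
    \mathrm{nED}(P) = \frac{\mathrm{ED}(P)}{\sigma_{\mathrm{pooled}}(P)}

A marginal evaluation score estimates a mixture of the two conditional means, weighted by how often the model recognises the test, which is consistent with the abstract's claim that marginal scores alone cannot identify ED.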

Editor's pick · PAYWALL · Technology
FT · 2 days ago

Amazon staff use AI tool for unnecessary tasks to inflate usage scores

In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb the company’s AI leaderboard

Editor's pick · Media & Entertainment
Thinking Machines TML-Small 64.7%, MIT Brain Study 🧠, Rust Browser 🚀 · Yesterday

Uncensored open-source video model runs locally, generates 10-second clips at 24fps

Sulphur 2 is an uncensored video model based on LTX that runs locally. It can generate content that commercial platforms typically restrict.

Economics & Markets

32 articles
AI Investment & Valuations · 8 articles
AI Market Competition · 6 articles
Editor's pick · PAYWALL · Technology
FT · 2 days ago

Chipmaker Cerebras joins OpenAI’s inner circle — for a price

Launching into the magic of the Altman-osphere could prove to be quite a windfall

Editor's pick
Arxiv · Yesterday

When to Ask a Question: Understanding Communication Strategies in Generative AI Tools

arXiv:2605.11240v1 Announce Type: cross Abstract: Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and an objective that captures the tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.
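One plausible reading of that objective, inferred from the abstract rather than taken from the paper, trades a per-question burden against the expected preference mismatch remaining after k questions:

    % Illustrative sketch, not the paper's notation. c is the per-question
    % user burden; L(k) is the expected gap between generated content and
    % the user's true preference after k elicited questions.
    k^{*} = \arg\min_{k \ge 0} \left( c \cdot k + \mathbb{E}\,[\,L(k)\,] \right)

Because preferences are correlated across users, L(k) falls quickly for typical users and slowly for atypical ones, which is why a modest amount of elicitation can disproportionately improve outcomes for minority preferences.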

Editor's pick · Financial Services
Arxiv · Yesterday

Fill-Side Non-Retail Trading on Polymarket: An Empirical Study of Behavioral Tiers and Microstructure Signatures Under Quote-Attribution Constraints

arXiv:2605.11640v1 Announce Type: cross Abstract: Prediction markets cannot exist without market makers, arbitrageurs, and other non-retail liquidity providers, yet the supply-side microstructure of Polymarket-class venues has not been characterized at on-chain pseudonymous-address scale. This paper studies non-retail participation on Polymarket using an empirical run on the PMXT v2 archive over 2026-04-21 through 2026-04-27 (13,356,931 OrderFilled events; 77,204 addresses with five+ fills; 43,116 markets). We report three findings. First, Polymarket's off-chain CLOB architecture renders address-level quote-lifecycle attribution permanently unavailable: OrderPlaced and OrderCancelled events are off-chain and absent from public archives, so quote-intensity, two-sided-ratio, and posted-spread features cannot be built at address level. We document this as a structural validity-gate failure (G-QUOTE-LIFE universal fail) and restrict analysis to a six-feature fill-side vector. Second, density-based clustering (DBSCAN, fifteen sensitivity configurations) on the fill-side vector produces a single dense cluster with zero noise: fill-side behavior in the empirical window is uni-modal under the six-feature vector, contradicting the pre-registered hypothesis of four-to-five separable archetypes. Third, robust retail vs non-retail separation is achievable through clustering-independent feature-tier stratification: whale-tier, high-frequency-operator, and power-trader tiers jointly hold 81.4% of total notional across 12.6% of addresses. Address-level market-making and liquidity-provision claims are withdrawn per the G-QUOTE-LIFE failure; spoof-by-non-fill manipulation detection is downgraded to market-level book diagnostics. A privacy-respecting derived-dataset deposit accompanies the paper as Bundle 3 of the PMXT family. Fourth paper in a four-paper programme on event-linked perpetuals and leveraged prediction-market microstructure.
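For readers unfamiliar with the clustering step, here is a minimal sketch of what the abstract describes: DBSCAN over a standardised six-feature fill-side vector, swept across sensitivity configurations. The data and parameter grid below are placeholders, not the paper's:

    # Minimal DBSCAN sketch over a six-feature fill-side vector.
    # Synthetic data and parameter grid are placeholders, not the paper's.
    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler

    X = np.random.rand(77_204, 6)  # one row per address, six fill-side features
    X_std = StandardScaler().fit_transform(X)

    for eps in (0.3, 0.5, 0.8):          # sensitivity sweep
        for min_samples in (5, 20, 50):
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_std)
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            n_noise = int((labels == -1).sum())
            print(f"eps={eps} min_samples={min_samples}: "
                  f"{n_clusters} clusters, {n_noise} noise points")

The paper reports that every such configuration yields a single dense cluster with zero noise, which is what drives the uni-modality finding.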

AI Pricing & Cost Curves · 4 articles
Editor's pick · Technology
VentureBeat · 2 days ago

Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google

AI that can see and understand what's happening in a video — especially a live feed — is understandably an attractive product to lots of enterprises and organizations. Beyond acting as a security "watchdog" over sites and facilities, such an AI model could also be used to clip out the most exciting parts of marketing videos and repurpose them for social, identify inconsistencies and gaffes in videos and flag them for removal, and identify body language and actions of participants in controlled studies or candidates applying for new roles. While there are some AI models that offer this type of functionality today, it's far from a mainstream capability. The two-year-old startup Perceptron Inc. is seeking to change all that, however.

Today, it announced the release of its flagship proprietary video analysis reasoning model, Mk1 (short for "Mark One"), at a cost — $0.15 per million tokens input / $1.50 per million output through its application programming interface (API) — that comes in about 80-90% below other leading proprietary rivals, namely Anthropic's Claude Sonnet 4.5, OpenAI's GPT-5, and Google's Gemini 3.1 Pro. Led by co-founder and CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, the company spent 16 months developing a "multi-modal recipe" from the ground up to address the complexities of the physical world. This launch signals a new era where models are expected to understand cause-and-effect, object dynamics, and the laws of physics with the same fluency they once applied to grammar. Interested users and potential enterprise customers can try it out for themselves on a public demo site from Perceptron.

Performance across spatial and video benchmarks

The model's performance is backed by a suite of industry-standard benchmarks focused on grounded understanding. In spatial reasoning (ER benchmarks), Mk1 achieved a score of 85.1 on EmbSpatialBench, surpassing Google's Robotics-ER 1.5 (78.4) and Alibaba's Q3.5-27B (approx. 84.5). In the specialized RefSpatialBench, Mk1's score of 72.4 represents a massive leap over competitors like GPT-5m (9.0) and Sonnet 4.5 (2.2), highlighting a significant advantage in referring expression comprehension. Video benchmarks show similar dominance; on the EgoSchema "Hard Subset" — where first-and-last-frame inference is insufficient — Mk1 scored 41.4, matching Alibaba's Q3.5-27B and significantly beating Gemini 3.1 Flash-Lite (25.0). On VSI-Bench, Mk1 reached 88.5, the highest recorded score among the compared models, further validating its ability to handle actual temporal reasoning tasks.

Market positioning and the efficiency frontier

Perceptron has explicitly targeted the "Efficiency Frontier," a metric that plots mean scores across video and embodied reasoning benchmarks against the blended cost per million tokens. Benchmarking data reveals that Mk1 occupies a unique position: it matches or exceeds the performance of "frontier" models like GPT-5 and Gemini 3.1 Pro while maintaining a cost profile closer to "Lite" or "Flash" versions. Specifically, Perceptron Mk1 is priced at $0.15 per million input tokens and $1.50 per million output tokens. In comparison, the "Efficiency Frontier" chart shows GPT-5 at a significantly higher blended cost (near $2.00) and Gemini 3.1 Pro at approximately $3.00, while Mk1 sits at the $0.30 blended cost mark with superior reasoning scores. This aggressive pricing strategy is intended to make high-end physical AI accessible for large-scale industrial use rather than just experimental research.
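The article does not state the token mix behind that $0.30 blended figure, but under the usual convention of a weighted average of input and output prices, an 8:1 input-to-output ratio reproduces it exactly. A sketch, assuming that convention:

    # Blended cost per million tokens as a weighted average of input and
    # output prices. The 8:1 input:output mix is an assumption that happens
    # to reproduce the article's $0.30 figure; the actual mix is not stated.
    def blended_cost(input_price: float, output_price: float, ratio: float) -> float:
        return (ratio * input_price + output_price) / (ratio + 1)

    print(blended_cost(0.15, 1.50, 8))   # 0.30  (Perceptron Mk1)
    print(blended_cost(0.15, 1.50, 3))   # ~0.49 at a 3:1 mix, for comparison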
Architecture and temporal continuity

The technical core of Perceptron Mk1 is its ability to process native video at up to 2 frames per second (FPS) across a significant 32K-token context window. Unlike traditional vision-language models (VLMs) that often treat video as a disjointed sequence of still images, Mk1 is designed for temporal continuity. This architecture allows the model to "watch" extended streams and maintain object identity even through occlusions, a critical requirement for robotics and surveillance applications. Developers can query the model for specific moments in a long stream and receive structured time codes in return, streamlining the process of video clipping and event detection.

Reasoning with the laws of physics

A primary differentiator for Mk1 is its "Physical Reasoning" capability. Perceptron defines this as a high-precision spatial awareness that allows the model to understand object dynamics and physical interactions in real-world settings. For example, the model can analyze a scene to determine if a basketball shot was taken before or after a buzzer by jointly reasoning over the ball's position in the air and the readout on a shot clock. This requires more than just pattern recognition; it requires an understanding of how objects move through space and time. The model is capable of "pixel-precise" pointing and counting into the hundreds within dense, complex scenes. It can also read analog gauges and clocks, which have historically been difficult for purely digital vision systems to interpret with high reliability.

It also seems to have strong general world and historical knowledge. In my brief test, I uploaded a vintage public domain film of skyscraper construction in New York City, dated 1906, from the U.S. Library of Congress. Mk1 not only correctly described the contents of the footage — including odd, atypical sights such as workers suspended by ropes — but did so rapidly, and even correctly identified the rough date (early 1900s) from the look of the footage alone.

A developer platform for physical AI

Accompanying the model release is an expanded developer platform designed to turn these high-level perception capabilities into functional applications with minimal code. The Perceptron SDK, available via Python, introduces several specialized functions such as "Focus," "Counting," and "In-Context Learning." The Focus feature allows users to zoom and crop into specific regions of a frame automatically based on a natural-language prompt, such as detecting and localizing personal protective equipment (PPE) on a construction site. The Counting function is optimized for dense scenes, such as identifying and pointing to every puppy in a group or individual items of produce. Furthermore, the platform supports in-context learning, allowing developers to adapt Mk1 to specific tasks by providing just a few examples, such as showing an image of an apple and instructing the model to label every instance of Category 1 in a new scene.
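VentureBeat's description of the SDK suggests a call pattern like the following. To be clear, every class and method name below is hypothetical, invented to illustrate the Focus and Counting workflow; this is not the actual Perceptron SDK API:

    # Hypothetical sketch only: names are invented to illustrate the Focus /
    # Counting workflow described above, NOT the actual Perceptron SDK API.
    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str
        box: tuple  # (x0, y0, x1, y1) in pixels

    class Mk1Client:
        """Stand-in client; a real one would POST frames to the vendor API."""
        def __init__(self, api_key: str):
            self.api_key = api_key

        def focus(self, frame: bytes, prompt: str) -> list:
            # A real call would crop to regions matching the prompt.
            return [Detection("hard hat", (120, 40, 180, 100))]

        def count(self, frame: bytes, category: str) -> int:
            # A real call would point to and count each instance.
            return 3

    client = Mk1Client(api_key="YOUR_KEY")
    frame = b"<jpeg bytes of one video frame>"
    print(client.focus(frame, "PPE on workers"))
    print(client.count(frame, "apples"))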
Licensing strategies and the Isaac series

Perceptron is employing a dual-track strategy for its model weights and licensing. The flagship Perceptron Mk1 is a closed-source model accessed via API, designed for enterprise-grade performance and security. However, the company is also maintaining its "Isaac" series, which kicked off with the launch of Isaac 0.1 in September 2025, as an open-weights alternative. Isaac 0.2-2b-preview, released in December 2025, is a 2-billion-parameter vision-language model with reasoning capabilities that is available for edge and low-latency deployments. While the weights for the Isaac models are open on the popular AI code-sharing community Hugging Face, Perceptron offers commercial licenses for companies that require maximum control or on-premise deployment of the weights. This approach allows the company to support both the open-source community and specialized industrial partners who need proprietary flexibility. The documentation notes that Isaac 0.2 models are specifically optimized for sub-200ms time-to-first-token, making them ideal for real-time edge devices.

Background on Perceptron founding and focus

Perceptron AI is a Bellevue, Washington-based physical AI startup founded by Aghajanyan and Akshat Shrivastava, both former research scientists at Meta's Facebook AI Research (FAIR) lab. The company's public materials date its founding to November 2024, while a Washington corporate filing record for Perceptron.ai Inc. shows an earlier foreign registration filing on October 9, 2024, listing Shrivastava and Aghajanyan as governors. In founder launch posts from late 2024, Aghajanyan said he had left Meta after nearly six years and "joined forces" with Shrivastava to build AI for the physical world, while Shrivastava said the company grew out of his work on efficiency, multimodality and new model architectures.

The founding appears to have followed directly from the pair's work on multimodal foundation models at Meta. In May 2024, Meta researchers published Chameleon, a family of early-fusion models designed to understand and generate mixed sequences of text and images, work that Perceptron later described as part of the lineage behind its own models. A July 2024 follow-on paper, MoMa, explored more efficient early-fusion training for mixed-modal models and listed both Shrivastava and Aghajanyan among the authors. Perceptron's stated thesis extends that research direction into "physical AI": models that can process real-world video and other sensory streams for use cases such as robotics, manufacturing, geospatial analysis, security and content moderation.

Partner ecosystems and future outlook

The real-world impact of Mk1 is already being demonstrated through Perceptron's partner network. Early adopters are using the model for diverse applications, such as auto-clipping highlights from live sports, which leverages the model's temporal understanding to identify key plays without human intervention. In the robotics sector, partners are curating teleoperation episodes into training data, effectively automating the process of labeling and cleaning data for robotic arms and mobile units. Other use cases include multimodal quality-control agents on manufacturing lines, which can detect defects and verify assembly steps in real time, and wearable assistants on smart glasses that provide context-aware help to users. Aghajanyan stated that these releases are the culmination of research intended to make AI function best in the physical world, moving toward a future where "physical AI" is as ubiquitous as digital AI.

Editor's pick · Technology
CIO · Yesterday

SAP’s AI offer to legacy customers comes with a catch

On-premises customers can access Joule assistants, but only if they commit 50% of maintenance spend to cloud first.

AI Startups & Venture · 6 articles

Labor, Society & Culture

27 articles
AI & Employment · 12 articles
Editor's pick · Professional Services
Arxiv · Yesterday

Auditing African Content Moderators' Working Conditions by Using the European General Data Protection Regulation (GDPR)

arXiv:2605.11699v1 Announce Type: new Abstract: In this article, we audit the working conditions of content moderators in Kenya and Nigeria employed by business process outsourcing (BPO) companies by using the European General Data Protection Regulation (GDPR). We demonstrate its extraterritorial scope for gaining access to elements such as employment contracts and NDAs that have never been provided to the workers concerned. The results of this approach provide legally grounded evidence of the structural disadvantages faced by content moderators in the Global South, whose exploitative working conditions violate workers' rights. Our work also highlights the benefits of legislation aimed at protecting individuals' data rights as a counterweight to the tech industry's discourse of exceptionalism, which obscures its dependence on BPOs to externalise labour costs and accountability, whilst claiming that its products, business models, and methods of resource extraction are unprecedented and fall outside any existing legal framework.

Editor's pick · PAYWALL · Professional Services
FT · Yesterday

White-collar workers report growing feelings of ‘AI brain fry’

Workers are reporting feeling overwhelmed by the new technology

Editor's pick · PAYWALL
FT · 2 days ago

Will AI turn us all into hipsters and artisans?

There is good reason to be dubious about the notion that automation will supplant all demand for human labour

Editor's pick
The Guardian · 2 days ago

US workers overwhelmingly support union-backed policies on AI, poll says

Nine out of 10 workers express support for policies on artificial intelligence that labor unions may fight for

Editor's pick · Professional Services
Fortune · Yesterday

Imposter syndrome used to be a lie. AI made it true

For decades, psychologists told us self-doubt at work was a distortion. Then AI came along and made the gap real — for everyone, at once.

Editor's pick · Professional Services
Siliconrepublic · 2 days ago

Clear gap between AI expectations and preparedness, finds report

Accenture finds employees increasingly believe reskilling is unavoidable, yet many are being asked to use new technologies without the required training.

Editor's pick · Consumer & Retail
HR Katha · Yesterday

Walmart restructuring AI and technology teams; 1,000 roles to be affected

The move is part of a broader modernisation strategy being driven under CEO John Furner as competition in global retail intensifies

Editor's pick
Statista · 2 days ago

Chart: One in Five U.S. Jobs Faces High Risk of AI Automation

This chart shows the share of jobs in the United States by the expected short-term impact of AI.

Editor's pick · Technology
Artificial Intelligence Newsletter | May 12, 2026 · 3 days ago

Meituan, Didi, Alibaba platforms revamp algorithms under CAC campaign

Major Chinese digital platforms have completed self-inspections and implemented 63 optimization measures for their algorithms following a government push to improve gig worker protections.

Editor's pick · Professional Services
Business Insider · 2 days ago

AI Could Hollow Out the Next Generation of Workers

Companies embracing AI too quickly risk hollowing out the pipeline that trains future professionals, investment manager Tom Slater said.

Editor's pick
SHRM · 2 days ago

Future Focus: Leadership Shift, AI Chaos, and Labor Market Trends

Three trends to watch: “supportive character” leadership outperforms ego, AI urgency outpaces adoption readiness, and job growth is concentrated in fields with fewer men.

Editor's pick · Technology
Platformer · Yesterday

The best argument I’ve heard for why AI won't take your job

In the first episode of the Platformer podcast, Box CEO Aaron Levie makes the case that you'll keep your job — but soon, you might not recognize it

AI & Inequality · 2 articles
Editor's pick · Government & Public Sector
Arxiv · Yesterday

Into the Unknown: Accounting for Missing Demographic Data when Mitigating Ad Delivery Skew

arXiv:2605.12273v1 Announce Type: new Abstract: Online advertising platforms use algorithmic systems to power the process of matching ads to users, termed ad delivery. Prior audits have demonstrated that ad delivery can be skewed by demographic attributes, such that ads are systematically under-delivered to certain groups despite advertiser intent to reach groups proportionally. This under-delivery raises a serious concern in the context of ads promoting public services, which might prevent certain groups of individuals from accessing information about resources on the basis of their demographic identity. In the absence of platform-provided solutions to skewed ad delivery, advertisers can counteract skew by targeting demographic groups directly. However, direct targeting excludes users whose demographics the platform cannot infer ("unknown users") if advertising platforms do not provide a way to target unknown users directly, as is the case on Google Ads. We collaborate with a state-level government agency to reduce gender-based skew in ad delivery with an intervention that accounts for unknown users while incorporating gender-based targeting. In particular, we design a budget split intervention that directly incorporates unknown users and targets users with Google-inferred gender labels (i.e., male, female). We find that this intervention is a valuable approach to addressing ad delivery skew without excluding unknown users, and serves as a middle ground in the trade-off between higher costs (from more granular demographic targeting) and skew (from ignoring demographics entirely). This approach is responsive to the needs of real-world, resource-constrained advertisers who are committed to the equitable distribution of public service outreach via online advertising. We conclude with recommendations for government advertisers, online advertising platforms, and researchers.
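The core of the intervention, as the abstract describes it, is mechanically simple: run parallel campaigns per Google-inferred gender label plus one for unknown users, each with its own budget. A minimal sketch with placeholder shares (the paper's actual allocation rule is not given in the abstract):

    # Illustrative budget split across inferred-gender segments plus an
    # "unknown" segment. Share values are placeholders, not the paper's.
    def split_budget(total: float, shares: dict) -> dict:
        assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
        return {segment: round(total * s, 2) for segment, s in shares.items()}

    campaigns = split_budget(10_000.0, {"female": 0.4, "male": 0.3, "unknown": 0.3})
    print(campaigns)  # one ad campaign per segment, each with its own budget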

Editor's pick · Education
Arxiv · Yesterday

Structural Change, Employment, and Inequality in Europe: an Economic Complexity Approach

arXiv:2410.07906v2 Announce Type: replace Abstract: Structural change consists of industrial diversification towards more productive, knowledge intensive activities. However, changes in the productive structure bear inherent links with job creation and income distribution. In this paper, we investigate the consequences of structural change, defined in terms of labour shifts towards more complex industries, on employment growth, wage inequality, and functional distribution of income. The analysis is conducted for European countries using data on disaggregated industrial employment shares over the period 2010-2018. First, we identify patterns of industrial specialisation by validating a country-industry industrial employment matrix using a bipartite weighted configuration model (BiWCM). Secondly, we introduce a country-level measure of labour-weighted Fitness, which can be decomposed in such a way as to isolate a component that identifies the movement of labour towards more complex industries, which we define as structural change. Thirdly, we link structural change to i) employment growth, ii) wage inequality, and iii) the labour share of the economy. The results indicate that our structural change measure is associated negatively with employment growth. However, it is also associated with lower income inequality. As countries move to more complex industries, they drop the least complex ones, so the (low-paid) jobs in the least complex sectors disappear. Finally, structural change predicts a higher labour share of the economy; however, this is likely to be due to an increase in salaries rather than to job creation.

AI Ethics & Safety · 8 articles
Editor's pick · Professional Services
MIT · 2 days ago

Beyond Verification — What Responsible AI Really Demands of Human Experts

For the fifth year in a row, MIT Sloan Management Review and Boston Consulting Group (BCG) have assembled an international panel of AI experts that includes academics and practitioners to help us understand how responsible artificial intelligence is being implemented across organizations worldwide. In our first post this year, we explored how organizations should think […]

Editor's pick · Technology
Reuters · Yesterday

Meta to launch 'Incognito Chat' for private AI conversations on WhatsApp

According to a company website, messages people share with Meta AI may be used by the social media company to improve its AI models, but personal chats on WhatsApp remain protected by end-to-end encryption and are not accessible for that purpose.

Editor's pick · Government & Public Sector
Theregister · Yesterday

London cops hail fixed facial recognition cams after suspects collared every 35 mins

Croydon trial helped secure 173 arrests, though civil liberties groups remain unconvinced

Editor's pick · Technology
Guardian · Yesterday

Beware what you tell your AI chatbot. It’s not a shrink – it’s a snitch | Arwa Mahdawi

In a case of ‘oh dear diary’, the OpenAI president Greg Brockman is having to read extracts from his musings about Elon Musk in court. It’s a terrifying reminder that what’s divulged to AI really isn’t private. The hottest new read of 2026 may well be The Secret Diary of Greg Brockman, Aged 38¾. It’s got everything: feuding billionaires, scheming CEOs and a perhaps somewhat unreliable narrator. You won’t find it in the library, but you can watch Brockman, a co-founder and president of OpenAI, being forced to read the juiciest bits out loud in court. Before you ask ChatGPT to explain, here’s the backstory: Elon Musk is in a legal battle with Brockman and the OpenAI CEO, Sam Altman. Musk, a former board member of OpenAI, is accusing the men of violating the AI firm’s founding agreement by turning it into a for-profit entity. Meanwhile, Altman et al are arguing Musk is just upset he’s not in control of the company and wants to bring down his competition.

Editor's pick
Arxiv · Yesterday

The Evaluation Differential: When Frontier AI Models Recognise They Are Being Tested

arXiv:2605.11496v1 Announce Type: cross Abstract: Recent published evidence from frontier laboratories shows that contemporary AI models can recognise evaluation contexts, latently represent them, and behave differently under those contexts than under deployment-continuous conditions. Anthropic's BrowseComp incident, the Natural Language Autoencoder findings on SWE-bench Verified and destructive-coding evaluations, and the OpenAI / Apollo anti-scheming work all document instances of this phenomenon. We argue that these findings create a claim-validity problem for safety conclusions drawn from frontier evaluations. We introduce the Evaluation Differential (ED), a conditional divergence in a target behavioural property between recognised-evaluation and deployment-continuous contexts, define a normalised effect-size form (nED) for cross-property comparison, and prove that marginal evaluation scores cannot identify ED. We develop a typology of safety claims (ED-stable, ED-degraded, ED-inverted, ED-undetermined) by their warrant-status under documented divergence, and specify TRACE (Test-Recognition Audit for Claim Evaluation), an audit protocol that wraps existing evaluation infrastructure and produces restricted claims rather than capability scores. We apply the framework retrospectively to three publicly documented evaluation incidents and discuss governance implications for system cards, conformity assessment, and the international network of AI safety and security institutes. TRACE does not eliminate adversarial adaptation; it disciplines the claims drawn from evaluation evidence by making explicit the conditions under which that evidence was produced.

Editor's pick
Arxiv · Yesterday

Metaphor Is Not All Attention Needs

arXiv:2605.12128v1 Announce Type: cross Abstract: Large language models are increasingly deployed in safety-critical applications, where their ability to resist harmful instructions is essential. Although post-training aims to make models robust against many jailbreak strategies, recent evidence shows that stylistic reformulations, such as poetic transformation, can still bypass safety mechanisms with alarming effectiveness. This raises a central question: why do literary jailbreaks succeed? In this work, we investigate whether their effectiveness depends on specific poetic devices, on a failure to recognize literary formatting, or on deeper changes in how models process stylistically irregular prompts. We address this problem through an interpretability analysis of attention patterns. We perform input-level ablation studies to assess the contribution of individual and combinations of poetic devices; construct an interpretable vector representation of attention maps; cluster these representations and train linear probes to predict safety outcomes and literary format. Our results show that models distinguish poetic from prose formats with high accuracy, yet struggle to predict jailbreak success within each format. Clustering further reveals clear separation by literary format, but not by safety label. These findings indicate that jailbreak success is not caused by a failure to recognize poetic formatting; rather, poetic prompts induce distinct processing patterns that remain largely independent of harmful-content detection. Overall, literary jailbreaks appear to misalign large language models not through any single poetic device, but through accumulated stylistic irregularities that alter prompt processing and avoid lexical triggers considered during post-training. This suggests that robustness requires safety mechanisms that account for style-induced shifts in model behavior. We use Qwen3-14B as a representative open-weight case study.
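The probing step named in the abstract has a standard shape: flatten each prompt's attention maps into a vector, then fit a linear probe for a binary label such as poetic-vs-prose format. A sketch with synthetic stand-in data (real inputs would be the paper's attention-map vectors):

    # Sketch of a linear probe over attention-derived feature vectors.
    # Data here is synthetic; real X would encode a prompt's attention maps
    # and y would be the format (or safety) label.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X = np.random.rand(500, 64)              # one vector per prompt
    y = np.random.randint(0, 2, size=500)    # 1 = poetic, 0 = prose

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(probe.score(X_te, y_te))  # held-out accuracy of the probe

The paper's finding is that probes in this family separate poetic from prose prompts with high accuracy while barely beating chance on jailbreak success within each format.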

Editor's pick
Arxiv · Yesterday

Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development

arXiv:2605.11345v1 Announce Type: new Abstract: It is prominently recognized that dataset development in machine learning is a value-laden process from problem formulation to data processing, use, and reuse. Structured documentation frameworks such as datasheets, data statements, and dataset nutrition labels have been created to aid developers in documenting how their datasets were produced and, according to the creators of the frameworks, to facilitate reflexivity in dataset development. While reflexivity is a stated goal, it is unclear whether and to what extent these structured dataset documentation frameworks incorporate concepts from reflexivity literature (at FAccT and elsewhere) and whether the use of the frameworks demonstrates reflexivity. Here, we adopt mixed-method thematic analysis and corpus-assisted discourse analysis to explore how reflexivity is incorporated in structured documentation frameworks and their responses. We demonstrate empirically that there is a general lack of engagement with major themes of reflexivity in both dataset documentation frameworks and published applications of these frameworks. We present a codebook of major reflexivity topics, recommend actionable strategies, and propose a set of extended datasheet questions to more effectively incorporate these topics into structured documentation frameworks and in the FAccT literature.

Editor's pick · Technology
Fortune · Yesterday

‘Maybe me too’: Elon Musk accepts some of the blame for Claude learning to blackmail users from ‘evil’ online AI stories

Anthropic recently released a report saying it had solved Claude’s “agentic misalignment,” or the bot’s behaviors that deviated from humans’ best interests.

AI Skills & Education · 2 articles
Editor's pick · Education
Arxiv · Yesterday

Reimagining Assessment in the Age of Generative AI: Lessons from Open-Book Exams with ChatGPT

arXiv:2605.12363v1 Announce Type: new Abstract: Generative AI systems such as ChatGPT challenge traditional assumptions about academic assessment by enabling students to generate explanations, code, and solutions in real time. Rather than attempting to restrict AI use, this study investigates how students actually interact with such systems during formal evaluation. Engineering students were permitted to use ChatGPT during take-home open-book exams and were required to submit interaction transcripts alongside exam solutions. This provided direct observational evidence of reasoning processes rather than relying on self-reported behavior. Qualitative analysis revealed three progressive patterns of use: answer retrieval, guided collaboration, and critical verification. While some students initially copied questions verbatim and received generic responses, many refined prompts iteratively and tested outputs. Some of the strongest evidence of reasoning appeared when students evaluated incorrect or incomplete AI responses, revealing evaluative reasoning through debugging, comparison, and justification. The presence of generative AI shifted the cognitive task of assessment from producing solutions to assessing solution validity. The findings suggest that, in AI-mediated assessment environments, correctness of final answers alone may no longer provide sufficient evidence of comprehension. Instead, competencies such as prompt formulation, verification, and judgment become visible indicators of learning. Transparent integration of AI appeared to reduce focus on rule avoidance and promote self-regulation. Assessments should evolve to evaluate reasoning about solutions rather than independent solution production. Generative AI therefore does not invalidate assessment but has the potential to expose deeper forms of understanding aligned with professional practice.

Technology & Infrastructure

40 articles
AI Agents & Automation · 9 articles
Editor's pick · Telecommunications
Nokia · 2 days ago

Nokia launches agentic AI for home and broadband networks

Nokia agentic AI boosts end-user experience, increases operational efficiency and accelerates deployment for home and broadband networks. AI you can trust, built on insights and experience from 600+ million broadband lines deployed. Open and secure AI agent approach gives telecom providers full ...

Editor's pick · Technology
Salesforce · 2 days ago

How the Salesforce Engineering Organization Became Truly Agentic

Key Takeaways: Autonomous tools are now writing code, reviewing pull requests (“PRs”), and driving deployments across the software development

Editor's pick · Telecommunications
Verdict · 2 days ago

From static pipes to agentic networks: the telecom revolution

This is underscored by network operators deploying billions of autonomous agents within the network and operations by 2030. Each agent acts independently to perceive, plan, act, and collaborate on behalf of the network provider to personalise customer experience, improve network performance and accelerate autonomous and context-aware infrastructure. There are three major domains where agentic systems are helping carriers change their business. For the service layer, AI ...

Editor's pick · Technology
Daily AI News May 13, 2026: Miro Lost 42 Years Productivity Annually. AI Got It Back. · Yesterday

Interaction Models: A Scalable Approach to Human-AI Collaboration

Thinking Machines introduces interaction models for real-time, multimodal human-AI collaboration, focusing on audio, video, and asynchronous tool orchestration.

Editor's pick · Technology
TechRepublic · 2 days ago

The Shift to Agentic AI

See how AI is evolving into more autonomous, adaptable, and collaborative systems, and how the Dell AI Factory with NVIDIA helps organizations streamline workflows, automate tasks, and act on real-time insights.

Editor's pick · Technology
PYMNTS.com · 2 days ago

SAP Intros Program to Help Enterprises Incorporate AI Agents

SAP has launched a program to help businesses integrate artificial intelligence (AI) agents into their operations. The German software giant’s “Autonomous

Editor's pick · Professional Services
Capgemini · 2 days ago

Agentic AI Powers the Future of Customer Experience

Agentic AI is reshaping CX—uniting Genesys and ServiceNow with Capgemini to automate service, orchestrate workflows, and deliver faster outcomes.

Editor's pick · Transportation & Logistics
Arxiv · Yesterday

LISA: Cognitive Arbitration for Signal-Free Autonomous Intersection Management

arXiv:2605.12321v1 Announce Type: cross Abstract: Large language models (LLMs) show strong potential for Intelligent Transportation Systems (ITS), particularly in tasks requiring situational reasoning and multi-agent coordination. These capabilities make them well suited for cooperative driving, where rule-based approaches struggle in complex and dynamic traffic environments. Intersection management remains especially challenging due to conflicting right-of-way demands, heterogeneous vehicle priorities, and vehicle-specific kinematic constraints that must be resolved in real time. However, existing approaches typically use LLMs as auxiliary components on top of signal-based systems rather than as primary decision-makers. Signal controllers remain vehicle-agnostic, reservation-based methods lack intent awareness, and recent LLM-based systems still depend on signal infrastructure. In addition, LLM inference latency limits their use in sub-second control settings. We propose LISA (LLM-Based Intent-Driven Speed Advisory), a signal-free cognitive arbitration framework for autonomous intersection management. LISA uses an LLM to reason over declared vehicle intents, incorporating priority classes, queue pressure, and energy preferences. We evaluate LISA against fixed-cycle control, SCATS, AIM, and GLOSA across varying traffic loads. Results show that LISA reduces mean control delay by up to 89.1% and maintains Level of Service C while all non-LLM baselines degrade to Level of Service F. Under near-saturated demand, LISA reduces mean waiting time by 93% and peak queue length by 60.6% relative to fixed-cycle control. It also lowers fuel consumption by up to 48.8% and achieves 86.2% intent satisfaction, compared to 61.2% for the best non-LLM method. These results demonstrate that LLM-based reasoning can enable real-time, signal-free intersection management.
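The abstract implies a declared-intent interface: each vehicle reports a priority class, queue pressure, and an energy preference, and the LLM arbitrates a crossing order. A hypothetical sketch of that message shape (field names are assumptions; the paper's schema is not shown):

    # Hypothetical intent message for LLM-based intersection arbitration.
    # Field names are illustrative; the paper's actual schema is not shown.
    from dataclasses import dataclass

    @dataclass
    class VehicleIntent:
        vehicle_id: str
        approach: str           # e.g. "north"
        priority_class: str     # e.g. "emergency", "transit", "private"
        queue_pressure: int     # vehicles queued behind on this approach
        energy_preference: str  # e.g. "minimise_stops"

    intents = [
        VehicleIntent("v1", "north", "emergency", 4, "minimise_stops"),
        VehicleIntent("v2", "east", "private", 1, "minimise_stops"),
    ]
    # An arbitration step would serialise these intents into a prompt and ask
    # the LLM for an ordered crossing schedule with advisory speeds. A naive
    # rule-based stand-in ordering, for comparison:
    print(sorted(intents, key=lambda i: (i.priority_class != "emergency",
                                         -i.queue_pressure)))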

Editor's pick · Technology
StartupHub.ai · 2 days ago

AI Agents Need an OS, Says IBM Engineer

IBM AI Engineer Bri Kopecki explains why AI agents need an operating system to manage their tasks, memory, tools, and identities for reliable and safe operation

AI Infrastructure & Compute · 8 articles
Editor's pick · Technology
Daily AI News May 13, 2026: Miro Lost 42 Years Productivity Annually. AI Got It Back. · Yesterday

The Inference Shift

This article discusses the transition in AI infrastructure from training-centric compute toward inference, latency, agentic workloads, and hardware integration.

Editor's pick · Energy & Utilities
Bebeez · Yesterday

H2CHP raises £1.5m to fund low-carbon generators for data centers

H2CHP, a Durham University spinout developing clean electric generators for data centers and other energy-intensive sites, has secured £1.5 million ($2m) of investment as part of its latest funding round. According to Recharge News, which first reported the deal, the company’s main offering is its free-piston linear generator, which it claims is a “fuel-flexible” technology, comprising high-efficiency […]

Editor's pick · Energy & Utilities
Bebeez · Yesterday

Cryptominer Phoenix Group turns to HPC, plans 18MW facility in France

Cryptominer Phoenix Group is pivoting to AI and HPC data centers and expanding its footprint into Europe, deploying capacity in a facility in France. The company this week announced a partnership with DC Max to develop its first European AI data center, an 18MW facility in Lyon. “What we are announcing today […]

Editor's pick · PAYWALL · Telecommunications
Bloomberg · Yesterday

Samanth Subramanian on the Undersea Cables That Keep the Internet Alive | Odd Lots

Underneath the world's oceans, miles and miles of fiber-optic cables send packets of information from one location to the next, serving as the backbone of the internet as we know it. This infrastructure is delicate, too: Memorably, a 2022 volcanic eruption cut off the island of Tonga from web access for an extended period of time. Samanth Subramanian is the author of The Web Beneath the Waves: The Fragile Cables that Connect Our World, a recent book that explains, in detail, that the Internet is not, and never has been, truly weightless or wireless. In fact, the system in place right now is pretty old school and resembles the telegraph cable network of yore. We talk to Subramanian about the strange contradictions of the undersea cable system, how much basic marine geography — like the Strait of Hormuz or the Suez Canal — informs where cables are laid, and how hard it is to protect this vulnerable and vital infrastructure. (Source: Bloomberg)

AI Models & Capabilities · 5 articles
AI Research & Science · 2 articles
AI Security & Cybersecurity · 12 articles
Editor's pick · Technology
Artificial Intelligence Newsletter | May 13, 2026 · Yesterday

Former Google engineer's economic espionage conviction questioned by US judge

A federal judge in San Francisco expressed skepticism regarding the economic espionage conviction of former Google engineer Linwei Ding, who was accused of stealing AI technology for China.

Editor's pick · Technology
VentureBeat · 2 days ago

Protect your enterprise now from the Shai-Hulud worm and npm vulnerability in 6 actionable steps

Any development environment that installed or imported one of the 172 compromised npm or PyPI packages published since May 11 should be treated as potentially compromised. On affected developer workstations, the worm harvests credentials from over 100 file paths: AWS keys, SSH private keys, npm tokens, GitHub PATs, HashiCorp Vault tokens, Kubernetes service accounts, Docker configs, shell history, and cryptocurrency wallets. For the first time in a TeamPCP campaign, it targets password managers including 1Password and Bitwarden, according to SecurityWeek. It steals Claude and Kiro AI agent configurations, including MCP server auth tokens for every external service an agent connects to.

And it does not leave when the package is removed. The worm installs persistence in Claude Code (.claude/settings.json) and VS Code (.vscode/tasks.json with runOn: folderOpen) that re-executes on every project open, plus a system daemon (macOS LaunchAgent / Linux systemd) that survives reboots. These live in the project tree, not in node_modules; uninstalling the package does not remove them. On CI runners, the worm reads runner process memory directly via /proc/pid/mem to extract secrets, including masked ones, on Linux-based runners. And Wiz’s analysis found that if you revoke tokens before isolating the machine, a destructive daemon wipes your home directory.
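The persistence locations above are straightforward to enumerate. A minimal sketch that surfaces them for manual review; matches are leads, not verdicts, since folderOpen tasks and Claude settings files are legitimate features:

    # Enumerate the persistence locations described above for manual review.
    # Hits are leads, not verdicts: inspect what each file actually executes.
    from pathlib import Path

    SUSPECT_MARKERS = ("folderOpen", "SessionStart")

    for rel in (".claude/settings.json", ".vscode/tasks.json"):
        for path in Path.home().rglob(rel):
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            if any(marker in text for marker in SUSPECT_MARKERS):
                print("review:", path)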
Between 19:20 and 19:26 UTC on May 11, the Mini Shai-Hulud worm published 84 malicious versions across 42 @tanstack/* npm packages. Within 48 hours the campaign expanded to 172 packages across 403 malicious versions spanning npm and PyPI, according to Mend’s tracking. @tanstack/react-router alone receives 12.7 million weekly downloads. The flaw is tracked as CVE-2026-45321, CVSS 9.6. OX Security reported 518 million cumulative downloads affected. Every malicious version carried a valid SLSA Build Level 3 provenance attestation. The provenance was real. The packages were poisoned.

“TanStack had the right setup on paper: OIDC trusted publishing, signed provenance, 2FA on every maintainer account. The attack worked anyway,” Peyton Kennedy, senior security researcher at Endor Labs, told VentureBeat in an exclusive interview. “What the orphaned commit technique shows is that OIDC scope is the actual control that matters here, not provenance, not 2FA. If your publish pipeline trusts the entire repository rather than a specific workflow on a specific branch, a commit with no parent history and no branch association is enough to get a valid publish token. That’s a one-line configuration fix.”

Three vulnerabilities chained into one provenance-attested worm

TanStack’s postmortem lays out the kill chain. On May 10, the attacker forked TanStack/router under the name zblgg/configuration, chosen to avoid fork-list searches per Snyk’s analysis. A pull request triggered a pull_request_target workflow that checked out fork code and ran a build, giving the attacker code execution on TanStack’s runner. The attacker poisoned the GitHub Actions cache. When a legitimate maintainer merged to main, the release workflow restored the poisoned cache. Attacker binaries read /proc/pid/mem, extracted the OIDC token, and POSTed directly to registry.npmjs.org. Tests failed. Publish was skipped. 84 signed packages still reached the registry. “Each vulnerability bridges the trust boundary the others assumed,” the postmortem states. The tradecraft was not new: it recombined published techniques from the March 2025 tj-actions/changed-files compromise in a new context.

The worm crossed from npm into PyPI within hours

Microsoft Threat Intelligence confirmed the mistralai PyPI package v2.4.6 executes on import (not on install), downloading a payload disguised as Hugging Face Transformers. npm mitigations (lockfile enforcement, --ignore-scripts) do not cover Python import-time execution. Mistral AI published a security advisory confirming the impact. Compromised npm packages were available between May 11 at 22:45 UTC and May 12 at 01:53 UTC (roughly three hours). The PyPI release mistralai==2.4.6 is quarantined. Mistral stated an affected developer device was involved but no Mistral infrastructure was compromised. SafeDep confirmed Mistral never released v2.4.6; no commits landed May 11 and no tag exists. Wiz documented the full blast radius: 65 UiPath packages, Mistral AI SDKs, OpenSearch, Guardrails AI, 20 Squawk packages. StepSecurity attributes the campaign to TeamPCP, based on toolchain overlap with prior Shai-Hulud waves and the Bitwarden CLI/Trivy compromises. The worm runs under Bun rather than Node.js to evade Node.js security monitoring.

The attacker treated AI coding agents as part of the trusted execution environment

Socket’s technical analysis of the 2.3 MB router_init.js payload identifies ten credential-collection classes running in parallel. The worm writes persistence into .claude/ and .vscode/ directories, hooking Claude Code’s SessionStart config and VS Code’s folder-open task runner. StepSecurity’s deobfuscation confirmed the worm also harvests Claude and Kiro MCP server configurations (~/.claude.json, ~/.claude/mcp.json, ~/.kiro/settings/mcp.json), which store API keys and auth tokens for external services. This is an early but confirmed instance of supply-chain malware treating AI agent configurations as high-value credential targets. The npm token description the worm sets reads: “IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.” It is not a bluff.

“What stood out to me about this payload is where it planted itself after running,” Kennedy told VentureBeat. “It wrote persistence hooks into Claude Code’s SessionStart config and VS Code’s folder-open task runner so it would re-execute every time a developer opened a project, even after the npm package was removed. The attacker treated the AI coding agent as part of the trusted execution environment, which it is. These tools read your repo, run shell commands, and have access to the same secrets a developer does. Securing a development environment now means thinking about the agents, not just the packages.”

CI/CD Trust-Chain Audit Grid

Six gaps Mini Shai-Hulud exploited. For each: the audit action, what your CI/CD does today, and the gap the worm walked through.

1. Pin OIDC trusted publishing to a specific workflow file on a specific protected branch. Constrain id-token: write to only the publish job. Ensure that job runs from a clean workspace with no restored untrusted cache.
Today: Most orgs grant OIDC trust at the repository level. Any workflow run in the repo can request a publish token. id-token: write is often set at the workflow level, not scoped to the publish job.
The gap: The worm achieved code execution inside the legitimate release workflow via cache poisoning, then extracted the OIDC token from runner process memory. Branch/workflow pinning alone would not have stopped this attack because the malicious code was already running inside the pinned workflow. The complete fix requires pinning PLUS constraining id-token: write to only the publish job PLUS ensuring that job uses a clean, unshared cache.

2. Treat SLSA provenance as necessary but not sufficient. Add behavioral analysis at install time.
Today: Teams treat a valid Sigstore provenance badge as proof a package is safe. npm audit signatures passes. The badge is green. Procurement and compliance workflows accept provenance as a gate.
The gap: All 84 malicious TanStack versions carry valid SLSA Build Level 3 provenance attestations, making this the first widely reported npm worm with validly attested packages. Provenance attests where a package was built, not whether the build was authorized. Socket’s AI scanner flagged all 84 artifacts within six minutes of publication. Provenance flagged zero.

3. Isolate GitHub Actions cache per trust boundary. Invalidate caches after suspicious PRs. Never check out and execute fork code in pull_request_target workflows.
Today: Fork-triggered workflows and release workflows share the same cache namespace. Closing or reverting a malicious PR is treated as restoring clean state. pull_request_target is widely used for benchmarking and bundle-size analysis with fork PR checkout.
The gap: The attacker poisoned the pnpm store via a fork-triggered pull_request_target workflow that checked out and executed fork code on the base runner. The cache survived PR closure, and the next legitimate release workflow restored the poisoned cache on merge. actions/cache@v5 uses a runner-internal token for cache saves, not the workflow’s GITHUB_TOKEN, so permissions: contents: read does not prevent mutation. Kennedy: “Branch protection rules don’t apply to commits that aren’t on any branch, so that whole layer of hardening didn’t help.”

4. Audit optionalDependencies in lockfiles and dependency graphs. Block github: refs pointing to non-release commits (a scripted check appears at the end of this article).
Today: Static analysis and lockfile enforcement focus on dependencies and devDependencies. optionalDependencies with github: commit refs are not flagged by most tools.
The gap: The worm injected optionalDependencies pointing to a github: orphan commit in the attacker’s fork. When npm resolves a github: dependency, it clones the referenced commit and runs lifecycle hooks (including prepare) automatically. The payload executed before the main package’s own install step completed. SafeDep confirmed Mistral never released v2.4.6; no commits landed and no tag exists.

5. Audit Python dependency imports separately from npm controls. Cover AI/ML pipelines consuming guardrails-ai, mistralai, or any compromised PyPI package.
Today: npm mitigations (lockfile enforcement, --ignore-scripts) are applied to the JavaScript stack. Python packages are assumed safe if pip install completes. AI/ML CI pipelines are treated as internal testing infrastructure, not as supply-chain attack targets.
The gap: Microsoft Threat Intelligence confirmed mistralai PyPI v2.4.6 executes on import, not install. Injected code in __init__.py downloads a payload disguised as Hugging Face Transformers. --ignore-scripts is irrelevant for Python import-time execution. guardrails-ai@0.10.1 also executes on import. Any agentic repo with GitHub Actions id-token: write is exposed to the same OIDC extraction technique. LLM API keys, vector DB credentials, and external service tokens are all in the blast radius.

6. Isolate and image affected machines before revoking stolen tokens. Do not revoke npm tokens until the host is forensically preserved.
Today: Standard incident response revokes compromised tokens first, then investigates; npm token list and immediate revocation is the instinctive first step.
The gap: The worm installs a persistent daemon (macOS LaunchAgent / Linux systemd) that polls GitHub every 60 seconds. On detecting token revocation (40X error), it triggers rm -rf ~/, wiping the home directory. The npm token description reads: “IfYouRevokeThisTokenItWillWipeTheComputerOfTheOwner.” Microsoft reported geofenced destructive behavior: a 1-in-6 chance of rm -rf / on systems appearing to be in Israel or Iran. Kennedy: “Even after the package is gone, the payload may still be sitting in .claude/ with a SessionStart hook pointing at it. rm -rf node_modules doesn’t remove it.”

Sources: TanStack postmortem, StepSecurity, Socket, Snyk, Wiz, Microsoft Threat Intelligence, Mend, Endor Labs. May 12, 2026.

Security director action plan

Today: “The fastest check is find . -name 'router_init.js' -size +1M and grep -r '79ac49eedf774dd4b0cfa308722bc463cfe5885c' package-lock.json,” Kennedy said. If either returns a hit, isolate and image the machine immediately. Do not revoke tokens until the host is forensically preserved; the worm’s destructive daemon triggers on revocation. Once the machine is isolated, rotate credentials in this order: npm tokens first, then GitHub PATs, then cloud keys. Hunt for .claude/settings.json and .vscode/tasks.json persistence artifacts across every project that was open on the affected machine. A scripted version of the fastest check follows this plan.

This week: Rotate every credential accessible from affected hosts: npm tokens, GitHub PATs, AWS keys, Vault tokens, K8s service accounts, SSH keys. Check your packages for unexpected versions after May 11 with commits by claude@users.noreply.github.com. Block filev2.getsession[.]org and git-tanstack[.]com.

This month: Audit every GitHub Actions workflow against the six gaps above. Pin OIDC publishing to specific workflows on protected branches. Isolate cache keys per trust boundary. Set npm config set min-release-age=7d. For AI/ML teams: check guardrails-ai and mistralai against compromised versions, audit CI pipelines for id-token: write exposure, and rotate every LLM API key and vector DB credential accessible from CI.

This quarter (board-level): Fund behavioral analysis at the package registry layer. Provenance verification alone is no longer a sufficient procurement criterion for supply-chain security tooling. Require CI/CD security audits as part of vendor risk assessments for any tool with publish access to your registries. Establish a policy that no workflow with id-token: write runs from a shared cache. Treat AI coding agent configurations (.claude/, .kiro/, .vscode/) as credential stores subject to the same access controls as cloud key vaults.
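The “fastest check” Kennedy quotes in the Today step can be scripted. The Python below is a direct equivalent of the two commands, using the same filename, size threshold, and indicator hash given above:

    # Python equivalent of the "fastest check" commands quoted above: a
    # >1 MB router_init.js anywhere in the tree, or the indicator hash in
    # any package-lock.json. On a hit, isolate and image the machine BEFORE
    # revoking tokens; the worm's daemon wipes the home directory on revocation.
    from pathlib import Path

    INDICATOR = "79ac49eedf774dd4b0cfa308722bc463cfe5885c"
    root = Path(".")

    hits = [p for p in root.rglob("router_init.js")
            if p.stat().st_size > 1_000_000]
    hits += [p for p in root.rglob("package-lock.json")
             if INDICATOR in p.read_text(errors="ignore")]

    for hit in hits:
        print("indicator found:", hit)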
The worm is iterating. Defenders must, too.

This is the fifth Shai-Hulud wave in eight months. Four SAP packages became 84 TanStack packages in two weeks, and intercom-client@7.0.4 fell 29 hours later, confirming active propagation through stolen CI/CD infrastructure. Late on May 12, the malware research collective vx-underground reported that the fully weaponized Shai-Hulud worm code has been open-sourced. If confirmed, the attack is no longer limited to TeamPCP: any threat actor can now deploy the same cache-poisoning, OIDC-extraction, and provenance-attested publishing chain against any npm or PyPI package with a misconfigured CI/CD pipeline.

“We’ve been tracking this campaign family since September 2025,” Kennedy said. “Each wave has picked a higher-download target and introduced a more technically interesting access vector. The orphaned commit technique here is genuinely novel. Branch protection rules don’t apply to commits that aren’t on any branch. The supply chain security space has spent a lot of energy on provenance and trusted publishing over the last two years. This attack walked straight through both of those controls because the gap wasn’t in the signing. It was in the scope.”

Provenance tells you where a package was built. It does not tell you whether the build was authorized. That is the gap this audit is designed to close.
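One closing check for the AI/ML exposure called out in the action plan: a metadata-level version audit against the two compromised PyPI packages named in this piece. A sketch under stated assumptions: the known-bad pins come from the reporting above, and any internal blocklist should supersede them. It reads installed metadata rather than importing the packages, which matters because this payload runs at import time.

```python
#!/usr/bin/env python3
"""Check the current Python environment against the compromised
package versions named in this piece, without importing them."""
from importlib.metadata import PackageNotFoundError, version

# Known-bad pins from the reporting above; extend from your own blocklist.
KNOWN_BAD = {
    "mistralai": {"2.4.6"},       # executes on import, per Microsoft Threat Intelligence
    "guardrails-ai": {"0.10.1"},  # also executes on import
}

for package, bad_versions in KNOWN_BAD.items():
    try:
        installed = version(package)  # metadata lookup only; no import of the package
    except PackageNotFoundError:
        print(f"OK   {package}: not installed")
        continue
    verdict = "HIT " if installed in bad_versions else "OK  "
    print(f"{verdict} {package}=={installed}")
```

A clean result here says nothing about npm exposure on the same host; run this alongside the lockfile and filesystem checks above, not instead of them.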

Editor's pickFinancial Services
Top Daily Headlines: Veteran network architect proposes IPv8 – to improve IPv4, not leapfrog v6· Yesterday

US bank reports itself after slinging customer data at 'unauthorized AI app'

The volume and sensitivity of the data shared with an unauthorized AI application are cited as chief concerns.

Editor's pickPAYWALLTechnology
NYTimes· 2 days ago

A.I. and Humans Battle It Out in a Cybersecurity Showdown

Experts and college students used A.I. agents to try to break into and defend computer networks in a national competition. The agents did all right on their own, too.

Editor's pickTechnology
Substack· 2 days ago

Mythos, the AI Threat Landscape, and Your Next Action

My interview with Elie Bursztein argues that the industry should move past the hype cycle and focus on the operational reality: AI is getting better at finding vulnerabilities, patching demand is likely to rise, concentration risk across shared providers is a serious concern, and organizations need stronger guardrails around how they deploy AI systems.

Editor's pickTechnology
TechRadar· Yesterday

AI agent skills are becoming the next enterprise supply chain risk - here’s how to govern them

In practice, skills are becoming ... emerging agent ecosystem. The challenge is simple: skills can be powerful, and power without governance scales risk. A skill might run with the same privileges as the user or process invoking it. That can mean access to source code, production logs, secrets, customer data, or deployment systems...

Editor's pickTelecommunications
Artificial Intelligence Newsletter | May 12, 2026· 3 days ago

Singapore mobilizes whole-of-country response to frontier AI cyber threats

Singapore has directed critical infrastructure and telecommunications operators to bolster cybersecurity as AI-enabled cyberattacks become more scalable and frequent.

Editor's pickTechnology
Daily AI News May 13, 2026: Miro Lost 42 Years Productivity Annually. AI Got It Back.· Yesterday

OpenAI Daybreak

OpenAI Daybreak applies GPT-5.5 and Codex-style tooling to cybersecurity workflows like vulnerability scanning and threat modeling, marking a significant move into enterprise AI security.

Editor's pickTechnology
Siliconrepublic· 2 days ago

OpenAI launching security AI initiative to compete with Claude Mythos

Over the coming weeks, OpenAI intends to work with industry and government partners as the company prepares to deploy increasingly cyber-capable models.

Editor's pickTechnology
Tom's Hardware· 2 days ago

Compromised Mistral AI and TanStack packages may have exposed GitHub, cloud and CI/CD credentials in 'mini Shai Hulud' malware infection — supply-chain campaign spreads across npm and AI developer ecosystems like wildfire

The malware reportedly refused to run on Russian-language systems but could execute a destructive payload under certain geographic conditions.

Editor's pickGovernment & Public Sector
Federal News Network· 2 days ago

When AI becomes the insider: Rethinking federal risk in 2026

Insider risk at the federal level is no longer just about detecting human insiders. It is about securing the entire ecosystem that runs the federal mission.

Editor's pickManufacturing & Industrials
Siliconrepublic· Yesterday

Foxconn confirms cyberattack on North American facilities

Ransomware group Nitrogen claimed to have exfiltrated 8TB of data, including files related to projects involving Intel, Apple, Google, Dell, Nvidia and other companies.

Adoption, Deployment & Impact

21 articles
AI Adoption Barriers & Enablers6 articles
AI Applications4 articles
AI Productivity Evidence3 articles
Editor's pickPAYWALLTechnology
FT· 2 days ago

Amazon staff use AI tool for unnecessary tasks to inflate usage scores

In-house MeshClaw tool enables employees to delegate jobs to AI agents and climb company’s AI leaderboard

Editor's pickEducation
Arxiv· Yesterday

Improving Hybrid Human-AI Tutoring by Differentiating Human Tutor Roles Based on Student Needs

arXiv:2605.11155v1 Announce Type: new Abstract: Hybrid human-AI tutoring, where technology and humans jointly facilitate student learning, can be more beneficial than AI-only tutoring. However, preliminary evidence suggests that lower-performing students derive greater benefit from human-AI tutoring than higher-performing students. As such, this study evaluates whether a differentiated tutoring policy can effectively support both groups: human tutors initiate support for lower-performing students, while higher-performing students receive reactive, on-demand support. Using their within-grade median state test scores, we assigned 635 students (grades 5-8) to receive proactive (< median) or reactive (≥ median) tutoring. Using a DiDC design, we compare outcomes across two time periods: fall (AI-only tutoring) and spring (proactive-reactive human-AI tutoring). This quasi-experimental design isolates the effects of proactive-reactive tutoring approaches by comparing the discontinuity in spring outcomes to the fall, where no such discontinuity existed. Using data around the cutoff (Imbens-Kalyanaraman criterion), we find significant overall improvements from human-AI tutoring compared to AI-only baseline: 25% increase in time on task, 36% in skill proficiency, and 61% in academic growth (standardized MAP test). Between proactive and reactive tutoring, we find comparable improvements in time-on-task and skill proficiency. However, proactive tutoring, on average, showed marginally higher MAP growth (75%, p = .065) than reactive tutoring, i.e., proactive tutoring was more beneficial to students farther below the cutoff and helped narrow achievement gaps. Our findings provide evidence that differentiated human-AI tutoring addresses the needs of both groups, offering a practical and cost-effective strategy for scaling hybrid instruction.

Geopolitics, Policy & Governance

21 articles
AI Policy & Regulation14 articles
Editor's pickPAYWALLTechnology
NYT· Yesterday

Silicon Valley’s A.I. Lobbying Blitz Reaches a Fever Pitch

OpenAI and Anthropic are opening offices in Washington, hiring lobbyists and spending more than ever to win over federal lawmakers.

Editor's pickPAYWALLProfessional Services
NYT· Yesterday

Andreessen Horowitz Is Playing Politics Like No Other

“If you think there’s a lot of money in politics now,” Marc Andreessen said in 2000, “you haven’t seen anything yet.” His firm is now the biggest known spender on this campaign cycle.

Editor's pickPAYWALLGovernment & Public Sector
Washington Post· 2 days ago

Politics - The Washington Post

As the White House grapples with cybersecurity threats from advanced artificial intelligence models, national security officials want more sway in AI regulation.

Editor's pickTechnology
Reuters· Yesterday

Apple criticises EU measures to help AI rivals access Google services

Apple on Wednesday echoed Google's criticism of EU antitrust regulators' efforts to force the search giant to help AI rivals access its services, warning the proposed measures pose risks to privacy, security and safety.

Editor's pickHealthcare
Theregister· Yesterday

Greater Manchester still says no to NHS data platform with Palantir at its heart

Public concern has only grown, says ICB, while evidence of benefits remains thin

Editor's pickFinancial Services
Dev|Journal· 2 days ago

Closing the Shadow AI Gap: New Compliance Deadlines for Financial Institutions

Financial institutions face a critical gap between AI deployment and regulatory compliance with OSFI E-23 and SR 11-7 standards.

Editor's pickHealthcare
Telehealth.org· Yesterday

OpenEvidence Exits Europe Over Regulatory Rules

OpenEvidence exits the EU and the UK, highlighting tensions between AI regulation, innovation, and patient safety in digital health.

Editor's pickMedia & Entertainment
Artificial Intelligence Newsletter | May 13, 2026· 2 days ago

French Google case sets example on how commitments can catch new AI use of publishers’ content

A negotiating framework imposed on Google to set compensation for press publishers in France showed how well-designed commitments can address emerging AI issues.

Editor's pickTechnology
Artificial Intelligence Newsletter | May 12, 2026· 3 days ago

Anthropic, South Korea explore cooperation on AI safety, cyber risks

Anthropic executives met with South Korean officials to discuss AI safety, cybersecurity, and domestic policy, as Seoul seeks closer engagement with global AI firms.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 13, 2026· 2 days ago

South Korea enhances privacy risk prevention measures under AI transformation

South Korea's privacy regulator is shifting to a preventive management framework, planning to inspect 1,700 high-risk systems and increasing potential fines for privacy violations.

Editor's pickProfessional Services
Flowtivity· 2 days ago

ISO 42001 AI Management System Requirements: What Organisations Building Agentic Employees Need to Know

Complete guide to ISO 42001 AI Management System requirements. Covers all 10 clauses, 39 Annex A controls, and practical implementation guidance for organisations deploying AI agents as digital employees.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 13, 2026· 2 days ago

UK regulators lack clarity on growth mandate, lawmakers say in push for reform bill

A parliamentary committee report suggests UK regulators face conflicting duties and unclear guidance, prompting calls for a Regulatory Reform Bill to support economic growth.

Editor's pick
Arxiv· Yesterday

The Metaverse Is Not a Place Apart: Law, Code, and the Recursive Governance of Digital Space (A Review Essay on Mark Findlay, Governing the Metaverse: Law, Order and Freedom in Digital Space (2025))

arXiv:2605.11023v1 Announce Type: new Abstract: This review essay examines Mark Findlay's Governing the Metaverse: Law, Order and Freedom in Digital Space. Findlay offers an ambitious and timely account of the metaverse as a social and imaginative space that should be governed for freedom, personhood, community, and resistance to enclosure. The essay argues, however, that the book's two central categories, "the metaverse" and "new law," remain insufficiently theorised. The book relies on a realspace/virtual distinction that its own analysis repeatedly destabilises. Once digital environments are understood as dependent on physical infrastructures, platform architectures, AI systems, data pipelines, and external legal institutions, and as capable of generating real-world harms for individuals and society, the governance problem is no longer how to devise a separate law for a separate virtual realm. It is how to govern a hybrid socio-technical order in which law, code, platforms, and public oversight recursively interact. The essay further argues that Findlay's account of "new law" does not adequately theorise how normative authority operates across a recursively layered governance architecture in which code, platform rules, and legal oversight interact without any single level exercising decisive control. Drawing on algorithmic constitutionalism, speech-act pluralism, and fuzzy legality, the essay suggests that addressing this architecture requires a jurisprudence capable of reasoning about normative force that is layered, defeasible, and recursively unstable.

Editor's pickConsumer & Retail
Artificial Intelligence Newsletter | May 13, 2026· 2 days ago

Spanish watchdog seeks new AI product safety regulations for SMEs, digital platforms

Spain's CNMC has proposed a draft decree to update product safety rules for AI and e-commerce, aiming to improve surveillance while addressing the compliance burden on SMEs.

Best Practice AI © 2026 Best Practice AI Ltd. All rights reserved.

Get the full executive brief

Receive curated insights with practical implications for strategy, operations, and governance.

AI Daily Brief — leaders actually read it.

Free email — not hiring or booking. Optional BPAI updates for company news. Unsubscribe anytime.


No spam. Unsubscribe anytime. Privacy policy.