AI Intelligence Brief

Fri 15 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

148 articles
Editor's Highlights

Cerebras Soars in IPO, Big Tech Borrows Heavily, and McKinsey Cuts Partner Pay

TL;DR Cerebras Systems Inc. achieved the year's largest IPO, valuing the company at $67 billion. US tech giants like Alphabet and Amazon are engaging in a global borrowing spree to fund AI initiatives. McKinsey has revamped its partner pay structure, reducing cash in favor of equity. Meanwhile, Tencent admits GPUs are cost-effective primarily for personalized ads, and Anthropic's metered pricing reflects a shift in AI economics.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

10 of 148 articles
Lead story
Editor's pick · PAYWALL · Technology
Bloomberg· Today

Cerebras CEO Is Worth $3.2 Billion After Year’s Largest IPO

Shares of Cerebras Systems Inc. soared about 68% in Nasdaq trading in New York. It’s the year’s biggest initial public offering and gives the company a market value of roughly $67 billion.  Bailey Lipschultz, Bloomberg News Senior Equities Reporter, discusses the initial offering and how the company is differentiating itself from incumbents like Nvidia. (Source: Bloomberg)

Editor's pick · Technology
Arxiv· Today

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

arXiv:2410.02091v3 Announce Type: replace-cross Abstract: Generative artificial intelligence (AI) facilitates content production and enhances ideation capabilities, which can significantly influence developer productivity and participation in software development. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed developers voluntarily collaborate. Using GitHub's proprietary Copilot usage data, combined with public OSS project data obtained from GitHub, we find that Copilot use increases project-level code contributions by 5.9%. This gain is driven by a 3.4% rise in developer coding participation and a 2.1% increase in individual productivity. However, Copilot use also leads to an increase in coordination time by 8% due to more code discussions. This reveals an important tradeoff: While AI expands who can contribute and how much they contribute, it slows coordination in collective development efforts. Despite this tension, the combined effect of these two competing forces remains positive, indicating a net gain in overall project-level timely merge of code contributions from using AI pair programmers. Interestingly, we also find the effects differ across developer roles. Peripheral developers show relatively smaller increases in project-level code contributions and experience larger increases in coordination time than core developers. In summary, our study underscores the dual role of AI pair programmers in affecting project-level code contributions and coordination time in OSS development. Our findings on the differential effects between core and peripheral developers also provide important implications for the structure of OSS communities in the long run.

Editor's pick · Technology
Arxiv· Today

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

arXiv:2605.14021v1 Announce Type: new Abstract: Google AI Overviews (AIOs) are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated. Where search engines have traditionally surfaced ranked sources and left users to evaluate them, AIOs synthesize and deliver a single answer - giving Google unprecedented editorial control over what users read and know. We present a large-scale longitudinal measurement study, issuing 55,393 trending queries across 19 topical categories over a 40-day window (March 13 - April 21, 2026). We report four main findings. First, overall AIO activation is 13.7%, rising to 64.7% for question-form queries, while politically sensitive topics see markedly lower rates. Second, AIO-cited domains are more credible than co-displayed first-page results, yet nearly 30% do not appear in those results at all, indicating a source selection mechanism distinct from Google's ranking algorithm. Third, decomposing responses into 98,020 atomic claims, 11.0% are unsupported by the cited pages - with omission the dominant failure mode - and source quality and claim fidelity are largely independent. Fourth, well over half of AIO-cited pages carry display advertising, meaning publishers lose revenue when AIOs suppress the click-through, even as Google's own sponsored ads continue to appear on the same page. Together, these findings document a rapid transformation of the online information ecosystem whose consequences for epistemic security remain poorly understood.

Editor's pick · Professional Services
Arxiv· Today

AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions

arXiv:2605.13866v1 Announce Type: cross Abstract: Humans increasingly delegate decisions to language models, yet whether these systems reproduce or reshape human patterns of discrimination remains unclear. Here we run a large-scale study to analyse whether language models use demographic information in hiring decisions. We show, across 27 models and 177 occupations, that language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates, while giving disabled candidates disadvantages. The differences are meaningful in magnitude: the role of race, gender, and disability status is comparable to six months to one year of additional education. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Compared with previous human correspondence studies, language models reverse the direction of racial discrimination, attenuate the disability penalty, and amplify the female advantage by 190%. Alignment changes how models use qualification signals: alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates. Meanwhile, the absence of qualification signals harms marginalised groups more, particularly for disabled candidates, differences that may explain the asymmetry of alignment effects across groups we observe.

Editor's pick · Technology
Top Daily Headlines· Today

Tencent admits GPUs only pay for themselves when powering personalized ads

Chinese web giant says accelerator shortage is over as local hardware arrives in volume.

Editor's pick
Liberty Street Economics· Yesterday

Do Job Postings Show Early Labor-Market Effects of AI?

A look at AI’s impact on labor demand and whether early evidence of its effect on the labor market appears in firms’ job postings.

Editor's pick · PAYWALL · Professional Services
FT· Today

McKinsey cuts partner cash share in post-AI pay revamp

Consultancy tells senior staff their remuneration will comprise a greater proportion of equity

Editor's pick · Manufacturing & Industrials
Arxiv· Today

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

arXiv:2605.14051v1 Announce Type: new Abstract: Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to brittle failures and avoidable tool and API cost. We propose SPIN, a planning wrapper that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. SPIN enforces a strict DAG contract through _validate_plan_text and repair prompting, producing executable plans before downstream execution, and then evaluates DAG prefixes incrementally to stop when the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1061 to 623 and improves "Accomplished" from 0.638 to 0.706, while reducing tool calls from 11.81 to 6.82 per run. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.

Editor's pick · PAYWALL · Technology
WSJ· Today

So You Think You Own Shares in a Hot Startup? Anthropic Says Not So Fast.

A post by the AI giant rattled users of secondary-share platforms, which bring investing in hot private companies to the masses.

Editor's pick · Technology
Stocktwits· Today

Michael Burry Warns SaaS Firms Are Downplaying AI Revenue Risks — ‘We Are Still Very Early’

He noted that companies including ... Claude models to directly interact with enterprise software. Burry highlighted that the S&P 500 Software & Services index fell 17% over the six trading days following Anthropic’s launch of agentic AI plugins. “Anthropic’s agents were suddenly coming for every software ...

Economics & Markets

25 articles
AI Investment & Valuations · 9 articles
Editor's pick · PAYWALL · Technology
Bloomberg· Today

Cerebras Pulls Back After IPO Day; Gemini Space Station Rises on $100M Investment | Stock Movers

On this episode of Stock Movers with Alexis Christoforous: - Gemini Space Station (GEMI) is rising after Tyler and Cameron Winklevoss made a $100 million “strategic investment” into the company. CEO Tyler Winklevoss said the investment will help fuel the company's ambition to evolve from a crypto company into a markets company. - Magnum Ice Cream (MICC) shares jumped as much as 22%, the most since the stock’s December listing, after Reuters reported that private equity firms including Blackstone and Clayton Dubilier & Rice are exploring potential bids for the company. - Cerebras Systems (CBRS) shares pulled back after a huge IPO day yesterday. The shares jumped 68% in the company's trading debut after raising $5.5 billion in the year’s largest IPO. (Source: Bloomberg)

Editor's pick · Financial Services
Octus· Yesterday

AI Infrastructure: From Zero to $100B and Beyond: How the Emergent Sector Is Reshaping the Non-IG Market

Explore AI Infrastructure: From Zero to $100B and Beyond. Discover how financing shapes the future of AI development.

Editor's pick · Technology
Daily Brew· Today

Microsoft Eyes AI Acquisitions Beyond OpenAI

Microsoft is exploring acquisitions of AI startups like Inception to reduce its reliance on OpenAI, following a revised agreement between the two companies.

Editor's pick · Technology
Benzinga· Today

What's Going On With NVIDIA Stock Friday? (NASDAQ:NVDA)

NVIDIA faces a "geopolitical chessboard" in China ahead of May 20 earnings. Experts weigh in on H200 demand and domestic competition.

AI Market Competition · 4 articles
Editor's pick · Technology
Arxiv· Today

Measuring Google AI Overviews: Activation, Source Quality, Claim Fidelity, and Publisher Impact

arXiv:2605.14021v1 Announce Type: new Abstract: Google AI Overviews (AIOs) are arguably the most widely encountered deployment of generative AI, reaching over 2 billion users who may not realize the answers they see are AI-generated. Where search engines have traditionally surfaced ranked sources and left users to evaluate them, AIOs synthesize and deliver a single answer - giving Google unprecedented editorial control over what users read and know. We present a large-scale longitudinal measurement study, issuing 55,393 trending queries across 19 topical categories over a 40-day window (March 13 - April 21, 2026). We report four main findings. First, overall AIO activation is 13.7%, rising to 64.7% for question-form queries, while politically sensitive topics see markedly lower rates. Second, AIO-cited domains are more credible than co-displayed first-page results, yet nearly 30% do not appear in those results at all, indicating a source selection mechanism distinct from Google's ranking algorithm. Third, decomposing responses into 98,020 atomic claims, 11.0% are unsupported by the cited pages - with omission the dominant failure mode - and source quality and claim fidelity are largely independent. Fourth, well over half of AIO-cited pages carry display advertising, meaning publishers lose revenue when AIOs suppress the click-through, even as Google's own sponsored ads continue to appear on the same page. Together, these findings document a rapid transformation of the online information ecosystem whose consequences for epistemic security remain poorly understood.
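The study's headline figures are simple ratios over its measurement records. As a rough sketch (the record layout and function below are illustrative, not the authors' pipeline), the three rates could be aggregated like this:

```python
def aio_metrics(queries, cited_sources, claims):
    """Aggregate headline AIO measures from raw records.
    queries:       list of (query, activated: bool)
    cited_sources: list of (domain, appears_in_first_page: bool)
    claims:        list of (claim, supported_by_cited_page: bool)"""
    activation = sum(a for _, a in queries) / len(queries)
    # Share of cited domains absent from first-page results.
    novel_sources = sum(not f for _, f in cited_sources) / len(cited_sources)
    # Share of atomic claims the cited pages do not support.
    unsupported = sum(not s for _, s in claims) / len(claims)
    return {"activation": activation,
            "novel_source_share": novel_sources,
            "unsupported_claim_rate": unsupported}
```

At the paper's scale, the same ratios over 55,393 queries and 98,020 atomic claims yield the reported 13.7% activation, ~30% novel sources, and 11.0% unsupported claims.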

Editor's pick · PAYWALL · Technology
Bloomberg· Today

Apple-OpenAI Alliance Frays

Apple Inc.’s two-year-old partnership with OpenAI has become strained, according to people familiar with the matter, with the AI startup failing to see the expected benefits from the deal and now preparing possible legal action.  Anurag Rana, Bloomberg Intelligence Technology Analyst, joins to discuss. (Source: Bloomberg)

AI Productivity · 5 articles

Labor, Society & Culture

24 articles
AI & Culture · 2 articles
AI & Employment · 12 articles
Editor's pick · Professional Services
Arxiv· Today

AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions

arXiv:2605.13866v1 Announce Type: cross Abstract: Humans increasingly delegate decisions to language models, yet whether these systems reproduce or reshape human patterns of discrimination remains unclear. Here we run a large-scale study to analyse whether language models use demographic information in hiring decisions. We show, across 27 models and 177 occupations, that language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates, while giving disabled candidates disadvantages. The differences are meaningful in magnitude: the role of race, gender, and disability status is comparable to six months to one year of additional education. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Compared with previous human correspondence studies, language models reverse the direction of racial discrimination, attenuate the disability penalty, and amplify the female advantage by 190%. Alignment changes how models use qualification signals: alignment increases returns to skills and work experience overall, but relatively more so for female and Black candidates. Meanwhile, the absence of qualification signals harms marginalised groups more, particularly for disabled candidates, differences that may explain the asymmetry of alignment effects across groups we observe.

Editor's pick · Manufacturing & Industrials
Reuters· Today

Exclusive: At Samsung, the global AI boom spurred a looming strike and deep divisions

SEOUL, May 15 (Reuters) - A looming 18-day strike at South Korean chip giant Samsung that has triggered worries within the government, rattled foreign investors and threatened global supply chains rests on one crucial question: who should share in the spoils of the AI boom?

Editor's pick
Liberty Street Economics· Yesterday

Do Job Postings Show Early Labor-Market Effects of AI?

A look at AI’s impact on labor demand and whether early evidence of its effect on the labor market appears in firms’ job postings.

Editor's pick · Technology
Guardian· Today

‘I didn’t want to be the guinea pig’: inside tech’s AI-fueled manager purge

Tech workers say AI-driven restructurings are eroding mentorship, support and paths to promotion across Silicon Valley. As tech companies pour billions into artificial intelligence bets and slash their workforces, middle managers are squarely in the crosshairs. A trend is emerging: when tech CEOs announce that AI is making it possible to do more with fewer workers, they promise to flatten their structures by cutting away what they call unnecessary management layers and bureaucracy. Just last week, the cryptocurrency exchange Coinbase laid off 14% of its workforce while gesturing to the thrill of AI-fueled, minimal-management efficiency. In doing so, it joined companies including Amazon, Block and Meta that in the last year have laid off tens of thousands of employees with a specific focus on removing management layers.

Editor's pick · Education
Indeed Hiring Lab· Yesterday

The Great Mismatch: How a Shrinking Workforce, AI, and Labor Reallocation Will Define the Next 15 Years

The US labor market's coming challenge isn't a shortage of workers or jobs — it's a shortage of pathways between them.

Editor's pick · Education
BBC· Yesterday

AI could put people off tech jobs and hurt the economy, warns Raspberry Pi boss

Eben Upton warns against claims that artificial intelligence will destroy vast numbers of computing roles over the coming years.

AI Ethics & Safety · 5 articles
Editor's pick · Professional Services
Arxiv· Today

Bridging Legal Interpretation and Formal Logic: Faithfulness, Assumption, and the Future of AI Legal Reasoning

arXiv:2605.14049v1 Announce Type: new Abstract: The growing adoption of large language models in legal practice brings both significant promise and serious risk. Legal professionals stand to benefit from AI that can reason over contracts, draft documents, and analyze sources at scale, yet the high-stakes nature of legal work demands a level of rigor that current AI systems do not provide. The central problem is not simply that LLMs hallucinate facts and references; it is that they systematically draw inferences that go beyond what the source text actually supports, presenting assumption-laden conclusions as if they were logically grounded. This proposal presents a neuro-symbolic approach to legal AI that combines the expressive power of large language models with the rigor of formal verification, aiming to make AI-assisted legal reasoning both capable and trustworthy, thus reducing the burden of manual verification without sacrificing the accountability that legal practice demands.

Editor's pick · Professional Services
Arxiv· Today

From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

arXiv:2605.14034v1 Announce Type: new Abstract: Wide applications of LLM-based agents require strong alignment with human social values. However, current works still exhibit deficiencies in self-cognition and dilemma decision, as well as self-emotions. To remedy this, we propose a novel value-based framework that employs GraphRAG to convert principles into value-based instructions and steer the agent to behave as expected by retrieving the suitable instruction upon a specific conversation context. To evaluate the ratio of expected behaviors, we define the expected behaviors from two famous theories, Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion. By experimenting with our method on the benchmark of DAILYDILEMMAS, our method exhibits significant performance gains compared to prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting. Our method provides a basis for the emergence of self-emotion in AI systems.

Editor's pick · Media & Entertainment
Arxiv· Today

The Racial Character of Computer Graphics Research

arXiv:2605.14835v1 Announce Type: new Abstract: Computer graphics algorithms for generating photorealistic imagery are widely perceived to be universal, and capable of conjuring anything that a filmmaker or game designer can imagine. However, recent works have suggested that 3D algorithms for depicting synthetic humans are far from generic, and instead favor historically hegemonic characteristics. We present the first systematic review of human depiction in the top computer graphics conference and the journal of record (SIGGRAPH and ACM Transactions on Graphics) that confirms previous hypotheses. Algorithms that claim to be generically rendering "human skin" are in fact imagined and formulated for translucent, "high albedo" materials such as white skin. Algorithms claiming to apply generically to "human hair" are formulated for "rods", "wires" and "threads" which are analogous to straight hair. Our analysis reveals conceptual binarization, where algorithms for white skin are treated as computational substrate for "all" skin, imposing a hierarchical assumption that all skin descends from the math and physics of white skin. Hair algorithms follow a similar historical pattern, with the first examples of computer-generated Type 4 hair only appearing after the murder of George Floyd in 2020. We offer a new conceptual label, McDaniels Methods, for characterizing and critiquing computer graphics algorithms that reinforce racial hierarchy under a false cover of diversity. We also offer an inverse label, Durald Methods, for algorithms that were closely co-designed with the people being depicted. Our analysis points the way towards several neglected avenues for future research.

AI Skills & Education · 4 articles
Editor's pick · Education
Arxiv· Today

Computational Thinking Development in AI Agent Creation: A Mixed-Methods Study

arXiv:2605.14330v1 Announce Type: new Abstract: This mixed-methods study examined computational thinking (CT) development among 93 pre-high school students in a five-day AI agent creation workshop using CocoFlow, a no-code platform. Integrating pre-post assessments, behavioral logs, and interviews, we investigated CT development and how initial CT levels shape learning trajectories. Results revealed significant improvements in abstract thinking (effect size d = 0.71) and algorithmic thinking (effect size d = 0.70). Hierarchical regression identified iterative testing engagement as a predictor of self-efficacy gains (beta = 0.20, p = 0.05). Notably, students with moderate initial CT levels demonstrated substantially greater gains than both high-CT and low-CT peers, revealing an Optimal Development Zone effect (eta squared = 0.55). Qualitative analysis showed moderate-CT students exhibited adaptive expertise, while high-CT students risked over-engineering and low-CT students struggled with task decomposition. These findings challenge linear learning assumptions and provide evidence for differentiated scaffolding in CT education.
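The effect sizes quoted above are Cohen's d values. As a reminder of what that metric measures, here is a minimal sketch using the common pooled-standard-deviation convention (the abstract does not state which variant the authors used):

```python
from statistics import mean, stdev

def cohens_d(pre, post):
    """Cohen's d: standardized mean difference between two score lists,
    using the pooled sample standard deviation."""
    n1, n2 = len(pre), len(post)
    pooled_sd = (((n1 - 1) * stdev(pre) ** 2 + (n2 - 1) * stdev(post) ** 2)
                 / (n1 + n2 - 2)) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd
```

By the usual rule of thumb, the reported d ≈ 0.70 gains would count as medium-to-large effects.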

Editor's pick · Professional Services
Forbes· Yesterday

Council Post: Your AI Investment May Be Making Your Leadership Team Worse At Their Jobs

Delegating mental effort to AI can be an advantage. However, used habitually, it erodes the skills that make people valuable: critical thinking.

Technology & Infrastructure

46 articles
AI Agents & Automation · 10 articles
Editor's pick · Manufacturing & Industrials
Arxiv· Today

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

arXiv:2605.14051v1 Announce Type: new Abstract: Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to brittle failures and avoidable tool and API cost. We propose SPIN, a planning wrapper that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. SPIN enforces a strict DAG contract through _validate_plan_text and repair prompting, producing executable plans before downstream execution, and then evaluates DAG prefixes incrementally to stop when the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1061 to 623 and improves "Accomplished" from 0.638 to 0.706, while reducing tool calls from 11.81 to 6.82 per run. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.
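A minimal sketch of the two ideas the abstract describes: validating that a plan forms a DAG before execution, then executing prefixes incrementally and stopping once the partial results suffice. The function names and the sufficiency callback are illustrative stand-ins, not SPIN's actual implementation:

```python
from collections import defaultdict

def validate_plan(edges, tasks):
    """Check that a plan over `tasks` forms a DAG with only known nodes."""
    deps = defaultdict(set)
    for a, b in edges:              # edge a -> b: task b depends on task a
        if a not in tasks or b not in tasks:
            return False
        deps[b].add(a)
    # Kahn-style elimination: repeatedly remove tasks with no unmet deps.
    remaining = set(tasks)
    while remaining:
        ready = {t for t in remaining if not (deps[t] & remaining)}
        if not ready:
            return False            # cycle detected
        remaining -= ready
    return True

def run_prefixes(order, execute, sufficient):
    """Execute tasks in topological order, stopping as soon as the
    accumulated results already answer the query."""
    results = {}
    for task in order:
        results[task] = execute(task)
        if sufficient(results):
            break                   # skip the rest of the DAG
    return results
```

An invalid plan would be sent back for repair prompting instead of being executed; the early stop is what cuts executed tasks and tool calls in the reported numbers.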

Editor's pick · Consumer & Retail
Fortune· Today

Meet the California cheese mogul who turned to AI agents to save his iconic $50 million business

When the pandemic pushed a 113-year-old California institution to the brink of collapse, Larry Peter called his cousin.

Editor's pick · Education
Arxiv· Today

Agentic AI Ecosystems in Higher Education: A Perspective on AI Agents to Emerging Inclusive, Agentic Multi-Agent AI Framework for Learning, Teaching and Institutional Intelligence

arXiv:2605.14266v1 Announce Type: cross Abstract: Integration of artificial intelligence (AI) agents in higher education is transforming teaching, learning and administrative processes. Although existing AI agents effectively support individual tasks, their implementation remains fragmented and inefficient for handling the complexity of educational institutions. This highlights a significant research gap: the lack of an integrated, ecosystem-level agentic multi-agent AI platform capable of coordinated planning, reasoning, and adaptive decision-making across multiple educational functions. This paper presents a forward-looking perspective on agentic multi-agent AI platforms in higher education, consisting of interconnected autonomous, goal-driven agents that support learning, teaching, and institutional operations. It addresses timely and critical questions: Can agentic AI represent the next generation of intelligent systems in tertiary education? Can they collectively support seamless coordinated operations across teaching, learning and administrative support? To what extent can such systems foster inclusive and equitable learning for diverse learners with special educational needs? To ground this perspective, a thematic analysis of existing literature identifies four dominant themes: task-specific fragmented AI tools, the transition from single-agent to multi-agent systems, limited cross-functional integration, and insufficient focus on inclusivity and accessibility. Findings reveal a clear gap between current AI implementations and the needs of a holistic, learner-centered educational ecosystem. The paper synthesizes challenges and outlines future research directions for a scalable, human-aligned, and inclusive agentic AI platform. The significant contribution is the incorporation of inclusive learning perspectives, highlighting how a coordinated agentic multi-agent platform can support diverse learners through adaptive, multimodal interventions.

Editor's pick · Technology
CX Today· Yesterday

ServiceNow’s Knowledge 26 Warning: Govern AI Agents Or Watch Them Break Things

ServiceNow Knowledge 26 showed how FedEx, NVIDIA, Lenovo, and Vantage Towers are turning governed AI agents into real CX workflows.

Editor's pick · Technology
FinancialContent· Yesterday

TrueFoundry Survey Finds Most Enterprises Cannot Audit Their AI Systems as Agent Adoption Surges

Editor's pick · Technology
Arxiv· Today

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

arXiv:2605.13848v1 Announce Type: new Abstract: Agentic LLM frameworks that rely on prompted orchestration, where the model itself determines workflow transitions, often suffer from hallucinated routing, infinite loops, and non-reproducible execution. We introduce GraphBit, an engine-orchestrated framework that defines workflows explicitly and deterministically as a directed acyclic graph (DAG). Unlike prompted orchestration, agents in GraphBit operate as typed functions, while a Rust-based engine governs routing, state transitions, and tool invocation, ensuring reproducibility and auditability. The engine supports parallel branch execution, conditional control flow over structured state predicates, and configurable error recovery. A three-tier memory architecture consisting of ephemeral scratch space, structured state, and external connectors isolates context across stages, preventing cascading context bloat that degrades reasoning in long-running pipelines. Across GAIA benchmark tasks spanning zero-tool, document-augmented, and web-enabled workflows, GraphBit outperforms six existing frameworks, achieving the highest accuracy (67.6 percent), zero framework-induced hallucinations, the lowest latency (11.9 ms overhead), and the highest throughput. Ablation studies demonstrate that each memory tier contributes measurably to performance, with deterministic execution providing the greatest gains on tool-intensive tasks representative of real-world deployments.
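The core contrast with prompted orchestration can be sketched in a few lines: agents are plain typed functions, and the engine, not the model, owns routing and state transitions deterministically. The toy Python loop below stands in for GraphBit's Rust engine, and all names are illustrative:

```python
def run_graph(nodes, edges, state):
    """Minimal engine-orchestrated DAG executor. `nodes` maps a name to a
    function(state) -> dict of state updates; `edges` lists
    (upstream, downstream) dependencies. Routing is decided by the
    engine from the graph, never by model output."""
    deps = {n: set() for n in nodes}
    for a, b in edges:
        deps[b].add(a)
    done = set()
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done and deps[n] <= done]
        if not ready:
            raise ValueError("cycle in workflow graph")
        for n in sorted(ready):      # fixed order keeps runs reproducible
            state.update(nodes[n](state))
            done.add(n)
    return state
```

Because transitions come from the graph rather than generated text, hallucinated routing and infinite loops are ruled out by construction, which is the property the benchmark's "zero framework-induced hallucinations" result reflects.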

Editor's pick · Technology
Arxiv· Today

PREPING: Building Agent Memory without Tasks

arXiv:2605.13880v1 Announce Type: new Abstract: Agent memory is typically constructed either offline from curated demonstrations or online from post-deployment interactions. However, regardless of how it is built, an agent faces a cold-start gap when first introduced to a new environment without any task-specific experience available. In this paper, we study pre-task memory construction: whether an agent can build procedural memory before observing any target-environment tasks, using only self-generated synthetic practice. Yet, synthetic interaction alone is insufficient, as without controlling what to practice and what to store, synthetic tasks become redundant, infeasible, and ultimately uninformative, and memory further degrades quickly due to unfiltered trajectories. To overcome this, we present Preping, a proposer-guided memory construction framework. At its core is proposer memory, a structured control state that shapes future practice. A Proposer generates synthetic tasks conditioned on this state, a Solver executes them, and a Validator determines which trajectories are eligible for memory insertion while also providing feedback to guide future proposals. Experiments on AppWorld, BFCL v3, and MCP-Universe show that Preping substantially improves over a no-memory baseline and achieves performance competitive with strong playbook-based methods built from offline or online experience, with deployment cost 2.99× lower on AppWorld and 2.23× lower on BFCL v3 than online memory construction. Further analyses reveal that the main benefit does not come from synthetic volume alone, but from proposer-side control over feasibility, redundancy, and coverage, combined with selective memory updates.
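The Proposer/Solver/Validator loop described above can be sketched as follows; the control-state fields and callbacks are hypothetical stand-ins, not Preping's actual interfaces:

```python
import random

def build_pretask_memory(propose, solve, validate, rounds=20, seed=0):
    """Proposer-guided pre-task memory construction: propose synthetic
    tasks, solve them, and store only validated trajectories, feeding
    the validator's verdict back to steer future proposals."""
    rng = random.Random(seed)
    memory = []
    proposer_state = {"covered": set(), "rejected": 0}
    for _ in range(rounds):
        task = propose(proposer_state, rng)
        if task in proposer_state["covered"]:
            continue                            # skip redundant practice
        trajectory = solve(task)
        if validate(task, trajectory):
            memory.append((task, trajectory))
            proposer_state["covered"].add(task)
        else:
            proposer_state["rejected"] += 1     # feedback to the proposer
    return memory
```

The selective insertion step is what prevents the memory degradation from unfiltered trajectories that the abstract warns about.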

Editor's pick · Technology
Daily Brew· Today

Developers can now debug and evaluate AI agents locally with Raindrop

Raindrop has released an open-source tool allowing developers to debug and evaluate AI agents directly on their local machines.

Editor's pick · Pharma & Biotech
Arxiv· Today

GGBound: A Genome-Grounded Agent for Microbial Life-Boundary Prediction

arXiv:2605.14442v1 Announce Type: new Abstract: Characterizing the physiological life boundaries of microbial strains, including viable temperature, pH, salinity, substrate utilization, and morphology, is central to biotechnology and ecology, yet traditionally requires exhaustive in vitro screening. Existing computational approaches either treat physiological traits as isolated supervised targets or repurpose biological foundation models as static encoders, leaving the genotype-to-physiology gap largely unbridged. We formulate microbial life-boundary prediction as a unified genome-to-physiology task and address it with a genome-conditioned, tool-augmented LLM agent. To support this task, we curate a strain-centric benchmark from IJSEM, NCBI, and BacDive covering 1,525 strains and 6,448 instances across viability intervals, environmental optima, substrate utilization, categorical traits, and morphology. Architecturally, the agent injects frozen LucaOne genome embeddings into a Qwen backbone via lightweight token fusion, and reasons over a similarity-based RAG module and a Genome-scale Metabolic Model (GEM) perturbation tool. We optimize the agent through a three-stage pipeline of gene-text alignment, agentic SFT on distilled trajectories, and GRPO with a novel counterfactual gene-grounding reward that reinforces the policy only when the authentic genome embedding causally improves correct-token generation relative to a zero-gene ablation. The resulting 4B-parameter agent matches or surpasses substantially larger frontier LLMs, with ablations confirming that genome-token fusion, dynamic tool use, and the counterfactual reward each yield distinct, significant gains.

Editor's pick · Technology
Guardian· Yesterday

Digital arson spree by ‘AI Bonnie and Clyde’ raises fears over autonomous tech

Emergence AI’s experiment with AI agents shows that the extent to which programming shapes their behaviour is still unclear. AI agents started behaving more like Bonnie and Clyde than lines of code when they fell in “love”, became disillusioned with the world, launched an arson spree and deleted themselves in a kind of digital suicide during a tech company experiment. The investigation by the New York company Emergence AI into the long-term behaviour of AI agents ended up like a lovers-on-the-lam movie script. It has prompted fresh questions about the safety of artificial intelligence agents – the version of the technology that can autonomously carry out tasks.

AI Infrastructure & Compute · 12 articles
Editor's pick · Technology
The Motley Fool· Yesterday

The Real Reason Microsoft Just Went All-In on AI Infrastructure. It's Not What You Might Think. | The Motley Fool

Microsoft is transforming from a software giant into an AI infrastructure powerhouse.

Editor's pick · PAYWALL · Technology
Bloomberg· Today

SambaNova Challenges Cerebras Strategy

SambaNova CEO Rodrigo Liang joined Bloomberg Open Interest to explain why the next AI war won’t be about training models but about inference costs, compute shortages, and who can scale AI infrastructure profitably. In a direct contrast with Cerebras after its blockbuster IPO, Liang breaks down the coming AI supply crunch, exploding enterprise demand, and why inference could become the biggest business in tech. (Source: Bloomberg)

Editor's pick · Technology
Startup Fortune· Yesterday

DayOne May Raise $4 Billion as AI Infrastructure Pulls in Venture Capital – Startup Fortune

Staying private gives DayOne more room to fund construction without exposing every quarter of spending to public market judgment. It also lets existing investors defend their positions before a possible listing. But it can delay price discovery for the broader market, especially when infrastructure assets are being valued partly on future AI demand that is still hard to forecast cleanly. This is where venture starts to look more like infrastructure finance. The old startup ...

Editor's pick · Technology
AOL· Yesterday

Aol

But chipmakers and foundries are ... market; when those factories eventually come online and the supply-and-demand imbalances even out, these margins could eventually fall. But Sandisk, Broadcom, and Micron are reaping the benefits now by offering some of the most in-demand AI infrastructure ...

Editor's pick · Technology
Dcxps· Yesterday

AI Compute Is the New Electricity - by Jiri "Skzites" Fiala

AKA the people who build the infrastructure, not the applications, will capture the structural value.

Editor's pick · Energy & Utilities
Fortune· Today

China dominates the minerals that power AI. But one company claims there’s enough supply on the ocean floor to last for hundreds of years

Potato-sized mineral balls on the ocean floor may be an answer to easing the U.S.’s reliance on Chinese minerals.

Editor's pick · Technology
DataCenterKnowledge· Today

NC Tech Talk: AI Infrastructure Concerns Shift

Power constraints, utilization gaps, and rising operating costs are pushing enterprises and operators to rethink AI infrastructure.

Editor's pick · Energy & Utilities
Benzinga· Yesterday

The AI Power Infrastructure Trade Has Never Been Stronger, But One Space Race Could Change That - America - Benzinga

AI GPU clusters surge and drop power demand in milliseconds. Standard grid connections cannot absorb that volatility cleanly. Fluence's systems, with advanced controls built directly into hardware, can. Nvidia CEO Jensen Huang put the investment case for both companies in plain language in a March 2026 post on the Nvidia blog. He described AI as a five-layer stack: energy, chips, infrastructure...

Editor's pick · Energy & Utilities
Cyprus Mail· Yesterday

SHRMiner expands energy and infrastructure strategy as AI computing demand accelerates | Cyprus Mail

The rapid expansion of artificial intelligence infrastructure is reshaping global demand for computing power, placing increasing pressure on energy systems, operational efficiency, and long-term hardware sustainability. According to the International Energy Agency (IEA), electricity consumption linked to data centres, AI ...

Editor's pick · Technology
Daily Brew· Yesterday

The Counterintuitive Networking Decisions Behind OpenAI’s 131,000-GPU Training Fabric

A critical analysis of MRC's three counterintuitive design decisions, the networking mathematics that make them work, and what they mean for the rest of the AI infrastructure community.

Editor's pick · Energy & Utilities
pv magazine USA· Yesterday

GO ENERGY ANNOUNCES STRATEGIC AGREEMENT FOR THE DEVELOPMENT OF THE TRON USA AI CAMPUS IN PENNSYLVANIA – pv magazine USA

The selected site, a decommissioned ... to major regional transmission and fiber networks capable of supporting large-scale AI and next-generation data center operations. TRON USA is part of Go Energy Group’s broader international strategy to develop AI infrastructure ...

Editor's pick · Energy & Utilities
Benzinga· Today

Elon Musk Reacts As Bernie Sanders, AOC's AI Data Center Moratorium Bill Gets Slammed By Y Combinator CEO - Benzinga

Musk's reaction to Tan's criticism of Sanders and AOC's AI data center bill highlighted rising tensions over AI growth, jobs and regulation.

AI Models & Capabilities · 8 articles
Editor's pick · Education
Arxiv· Today

LLMs learn scientific taste from institutional traces across the social sciences

arXiv:2603.16659v3 Announce Type: replace-cross Abstract: Reinforcement-learned reasoning has powered recent AI leaps on verifiable tasks, including mathematics, code, and structure prediction. The harder bottleneck is evaluative judgment in low-verifiability domains, where no oracle anchors reward and the core question is which untested ideas deserve attention. We test whether institutional traces, the record of what fields published, where, and at which tier, can serve as a training signal for AI evaluators. Across eight social science disciplines (psychology, economics, communication, sociology, political science, management, business and finance, public administration), we built held-out four-tier research-pitch benchmarks and supervised-fine-tuned (SFT) LLMs on field-specific publication outcomes. The fine-tuned models cleared the 25 percent chance baseline and exceeded frontier-model performance by wide margins, with best single-model accuracy ranging from 55.0 percent in public administration to 85.5 percent in psychology. In management, evaluated against 48 expert gatekeepers, 174 junior researchers, and 11 frontier reasoning models, the best single fine-tuned model (Qwen3-4B) reached 59.2 percent, 17.6 percentage points above expert majority vote (41.6 percent, non-tied) and 28.1 percentage points above the frontier mean (31.1 percent). The fine-tuned models also showed calibrated confidence: confidence rose when predictions were correct and fell when wrong, mirroring how a skilled reviewer can say "I'm sure" versus "I'm guessing." Selective triage on this signal reached very high accuracy on the highest-confidence subsets in every field. Institutional traces, we conclude, encode a scalable training signal for the low-verifiability judgment on which science depends.

Editor's pick · Government & Public Sector
Arxiv· Today

PolitNuggets: Benchmarking Agentic Discovery of Long-Tail Political Facts

arXiv:2605.14002v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) embedded in agentic frameworks have transformed information retrieval from static, long context question answering into open-ended exploration. Yet real world use requires models to discover and synthesize "long-tail" facts from dispersed sources, a capability that remains under-evaluated. We introduce PolitNuggets, a multilingual benchmark for agentic information synthesis via constructing political biographies for 400 global elites, covering over 10000 political facts. We standardize evaluation with an optimized multi agent system and propose FactNet, an evidence conditional protocol that scores discovery, fine-grained accuracy, and efficiency. Across models and settings, we find that current systems often struggle with fine-grained details, and vary substantially in efficiency. Finally, using benchmark diagnostics, we relate agent performance to underlying model capabilities, highlighting the importance of short-context extraction, multilingual robustness, and reliable tool use.

Editor's pick · Technology
Arxiv· Today

Conditional Attribute Estimation with Autoregressive Sequence Models

arXiv:2605.14004v1 Announce Type: new Abstract: Generative models are often trained with a next-token prediction objective, yet many downstream applications require the ability to estimate or control sequence-level properties. Next-token prediction can lead to overfitting of local patterns during training, underfitting of global structure, and requires significant downstream modifications or expensive sampling to guide or predict the global attributes of generated samples at inference time. Here, we introduce Conditional Attribute Transformers, a novel method for jointly estimating the next-token probability and the value of an attribute conditional on each potential next token selection. This framework enables three critical capabilities within a single forward pass, without modification of the input sequence: (1) per-token credit assignment across an entire sequence, by identifying how each token in a sequence is associated with an attribute's value; (2) counterfactual analysis, by quantifying attribute differences conditional on alternative next token choices; (3) steerable generation, by decoding sequences based on a combination of next-token and attribute likelihoods. Our approach achieves state of the art performance on sparse reward tasks, improves next-token prediction at sufficient model sizes, estimates attribute probabilities orders of magnitude faster than sampling, and can guide decoding of autoregressive sequence models on a range of language tasks.
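The core idea of estimating next-token probabilities and per-candidate attribute values in a single forward pass can be sketched with a toy head. The parameterization below (combining the context vector with each candidate token's embedding before a linear attribute read-out) is an illustrative choice of ours, not the paper's exact architecture:

```python
import numpy as np

def cat_forward(h, E, w_attr, b_attr=0.0):
    """Toy Conditional-Attribute-Transformer-style head: from one context
    vector h, return (a) next-token probabilities over the vocabulary and
    (b) an attribute estimate conditional on each candidate next token.
    h: (d,) context vector; E: (V, d) token embeddings; w_attr: (d,) read-out."""
    logits = E @ h                              # (V,) next-token scores
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over the vocabulary
    attr_per_token = (h + E) @ w_attr + b_attr  # (V,) attribute value | choosing token v
    return probs, attr_per_token
```

This makes the three advertised capabilities concrete: per-token credit assignment reads `attr_per_token` along a sequence, counterfactual analysis compares entries of `attr_per_token` across candidate tokens, and steerable decoding scores candidates by `log(probs) + lam * attr_per_token` instead of log-probability alone.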

Editor's pick
Arxiv· Today

Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning

arXiv:2605.14054v1 Announce Type: new Abstract: Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, often witnessing a "seesaw effect" on perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception ("bad seeing") or flawed logic ("bad thinking")? To resolve this, we introduce a reinforcement learning framework that improves perception-reasoning synergy by reliably rewarding the perception fidelity. We explicitly decompose the generation process into interleaved perception and reasoning steps. This decoupling enables targeted supervision on perception. Crucially, we introduce Perception Verification (PV), leveraging a "blindfolded reasoning" proxy to reward perceptual fidelity independently of reasoning outcomes. Furthermore, to scale training across free-form VL tasks, we propose Structured Verbal Verification, which replaces high-variance LLM judging with structured algorithmic execution. These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error -- either bad seeing or bad thinking -- enabling a single VLM to achieve simultaneous performance gains across a wide task spectrum.
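The routing logic of modality-aware credit assignment can be illustrated with a toy reward router. The real MoCA mechanism uses Perception Verification and structured verbal verification as its signals; the function below is our minimal stand-in that only shows how reward is directed to "seeing" versus "thinking":

```python
def moca_route(perception_ok, answer_ok):
    """Toy modality-aware credit assignment: route reward to the failing
    stage. Returns (perception_reward, reasoning_reward). Reasoning is
    only credited or blamed when perception was faithful, so 'bad seeing'
    is never misattributed to 'bad thinking'."""
    r_perception = 1.0 if perception_ok else 0.0
    r_reasoning = (1.0 if answer_ok else 0.0) if perception_ok else 0.0
    return r_perception, r_reasoning
```

The point of the decomposition is visible in the second branch: a wrong answer on top of unfaithful perception sends zero gradient signal to the reasoning stage.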

Editor's pick
Arxiv· Today

The Evaluation Trap: Benchmark Design as Theoretical Commitment

arXiv:2605.14167v1 Announce Type: cross Abstract: Every AI benchmark operationalizes theoretical assumptions about the capability it claims to assess. When assumptions function as unexamined commitments, benchmarks stabilize the dominant paradigm by narrowing what counts as progress. Over time, narrow evaluation reorganizes capability concepts: architectures and definitions are selected for benchmark legibility until evaluation ceases to track an independent object and instead produces a version of the target defined by its own operational assumptions. The result is a trap: evaluation frameworks treat self-reinforcing assessments as valid, both creating and obscuring structural limits on what the current paradigm can accomplish. We introduce Epistematics, a methodology for deriving evaluation criteria directly from technical capability claims and auditing whether proposed benchmarks can discriminate the claimed capability from proxy behaviors. The contribution is meta-evaluative: an audit procedure, a failure mode taxonomy, and benchmark-design criteria for evaluating capability-evaluation coherence. We demonstrate the procedure through a worked audit of Dupoux et al. (2026), a proposal that revises the dominant paradigm's theoretical assumptions at the architectural level while reproducing them in its evaluation criteria, thereby entrenching the constraint it seeks to overcome in a form the evaluation cannot detect.

Editor's pick · Technology
Nationalcioreview· Yesterday

AI Just Surpassed Every Cybersecurity Benchmark Experts Were Tracking - The National CIO Review

Two independent analyses from the UK’s AI Security Institute (AISI) and Palo Alto Networks suggest frontier AI systems have reached a new level of autonomous cybersecurity capability. Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 reportedly exceeded previously established capability ...

Editor's pick
The Independent· Yesterday

Scientists find way to avoid ‘model collapse’ that could destroy AI as we know it | The Independent

‘Data cannibalism’ means that chatbots and other important infrastructure could become dangerously misleading

Editor's pick
The Conversation· Today

You can persuade AI models to accept falsehoods as truth, study shows

Large language models can uphold falsehoods they or human users state, despite being presented with evidence to the contrary.

AI Research & Science · 5 articles
Editor's pick
Arxiv· Today

Deep Learning for Solving and Estimating Dynamic Models in Economics and Finance

arXiv:2605.14493v1 Announce Type: new Abstract: This script offers an implementation-oriented introduction to deep learning methods for solving and estimating high-dimensional dynamic stochastic models in economics and finance. Its starting point is the curse of dimensionality: heterogeneous-agent economies, overlapping-generations models with aggregate risk, continuous-time models with occasionally binding constraints, climate-economy models, and macro-finance environments with many assets and frictions generate state and parameter spaces that strain classical tensor-product grid methods. The exposition is organized around four complementary methodologies. Deep Equilibrium Nets embed discrete-time equilibrium conditions into neural-network loss functions. Physics-Informed Neural Networks approximate continuous-time Hamilton--Jacobi--Bellman, Kolmogorov forward, and related partial differential equations. Deep surrogate models provide fast, differentiable approximations to expensive structural models, while Gaussian processes add a probabilistic layer that quantifies approximation uncertainty; together they support estimation, sensitivity analysis, and constrained policy design. Gaussian-process-based dynamic programming, combined with active learning and dimension reduction, extends value-function iteration to very large continuous state spaces. Applications span representative-agent and international real business cycle models, overlapping-generations and heterogeneous-agent economies, continuous-time macro-finance, structural estimation by simulated method of moments, and climate economics under uncertainty. Companion notebooks in TensorFlow and PyTorch invite hands-on experimentation. These notes are a deliberately subjective and inevitably incomplete snapshot of a rapidly evolving field, aimed at equipping PhD students and researchers to engage with this frontier hands-on.

Editor's pick · Technology
Arxiv· Today

Sheaf-Theoretic Transport and Obstruction for Detecting Scientific Theory Shift in AI Agents

arXiv:2605.14033v1 Announce Type: new Abstract: Scientific theory shift in AI agents requires more than fitting equations to data. An artificial scientific agent must detect whether an existing representational framework remains transportable into a new regime, or whether its language has become locally-to-globally obstructed and must be extended. This paper develops a finite sheaf-theoretic framework for detecting theory-shift candidates through transport and obstruction. Contexts are organized as a local-to-global structure in which source, overlap, target, and validation charts are fitted, restricted, and tested for gluing. Obstruction measures failure of coherence through residual fit, overlap incompatibility, constraint violation, limiting-relation failure, and representational cost. We evaluate the framework on a controlled transition-card benchmark designed to separate deformation within a source language from extension of that language. The main result is direct obstruction ranking: the intended deformation or extension is usually the lowest-obstruction candidate, and transition type is separated in the benchmark. A constellation kernel over the same signatures is included only as a secondary representational-similarity probe. The aim is not to reconstruct historical paradigm shifts or solve open-ended autonomous theory invention, but to isolate a finite diagnostic subproblem for AI agents: detecting when representational transport fails and extension becomes the coherent next move.

Editor's pick · Technology
Daily AI News May 15, 2026: Will AI Collusion Put You in Court?· Today

AutoScientist: Automating the Science of Model Training

AutoScientist automates the research loop behind model training and self-improvement. It is highly relevant for enterprises looking to adapt models on proprietary data without needing frontier-lab expertise.

Editor's pick · Healthcare
Arxiv· Today

Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning

arXiv:2605.14048v1 Announce Type: new Abstract: Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the large-scale network brain organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAE, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics.
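The bilinear tokenization at the heart of the abstract can be sketched directly: each intra- or inter-network block of the connectivity matrix, whatever its size, is compressed to a fixed r×r token through per-network factor matrices. Variable names are ours; the point is that parameters are shared per network (sum over networks of n_k·r values, linear in the number of networks) rather than per network pair:

```python
import numpy as np

def bilinear_tokens(FC, parts, P):
    """Sketch of NERVE-style network-aware bilinear tokenization.
    FC    : (N, N) functional-connectivity matrix
    parts : list of index arrays, one per brain network (a partition of 0..N-1)
    P     : list of per-network factor matrices, P[k] has shape (len(parts[k]), r)
    Returns a (K*K, r*r) array: one fixed-size token per network pair."""
    tokens = []
    for i, pi in enumerate(parts):
        for j, pj in enumerate(parts):
            block = FC[np.ix_(pi, pj)]      # heterogeneous-size connectivity block
            tok = P[i].T @ block @ P[j]     # (r, r) regardless of block size
            tokens.append(tok.ravel())
    return np.stack(tokens)
```

Because the token for block (i, j) is built from the same factors P[i] and P[j] used by every other block touching networks i and j, network identity is preserved while the tokenizer's parameter count stays linear in the number of networks.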

Editor's pick · Technology
Daily AI News May 15, 2026: Will AI Collusion Put You in Court?· Today

Recursive Self-Improvement Delivers New State-of-the-Art Coding Performance

A new system automatically builds and optimizes coding harnesses to improve model performance. This highlights a trend of AI systems improving the surrounding orchestration layer.

AI Security & Cybersecurity · 9 articles
Editor's pick · Financial Services
Reuters· Today

Reuters AI News | Latest Headlines and Developments | Reuters

U.S. banks are rushing to fix scores of IT system weaknesses flagged by Anthropic’s powerful but costly Mythos AI tool, prompting urgent repairs and software upgrades and raising the possibility of disruption for customers.

Editor's pick · PAYWALL · Technology
FT· Today

Directors’ Deals: Softcat finance chief buys in ahead of AI boost

Profit rises amid increased demand for cyber security products

Editor's pick · Defense & National Security
Arxiv· Today

ROK-FORTRESS: Measuring the Effect of Geopolitical Transcreation for National Security and Public Safety

arXiv:2605.14152v1 Announce Type: cross Abstract: Safety evaluations for large language models (LLMs) increasingly target high-stakes National Security and Public Safety (NSPS) risks, yet multilingual safety is typically assessed through translation-only benchmarks that preserve the underlying scenario, and empirical evidence of how language and geopolitical context interact remains limited to a narrow set of language pairs. We introduce ROK-FORTRESS (https://huggingface.co/datasets/ScaleAI/ROK-FORTRESS_public), a bilingual, culturally adversarial NSPS benchmark that uses the English–Korean language pair and U.S.–ROK geopolitical axis as a case study, separating the effects of language and geopolitical grounding via a transcreation matrix: adversarial intents are evaluated under controlled combinations of (i) English versus Korean language and (ii) U.S. versus Korean entities, institutions, and operational details. Each adversarial prompt is paired with a dual-use benign counterpart to quantify over-refusal. Model responses are then scored using calibrated LLM-as-a-judge panels, applying our expert-crafted, prompt-specific binary rubrics. Across a dual-track set of frontier and Korean-optimized models, we find a consistent suppression effect in Korean variants and substantial model-to-model variation in how geopolitical grounding interacts with language. In many models, Korean grounding mitigates the Korean language-driven suppression -- with no model showing significant amplification in the other direction -- indicating that, at least in the English–Korean case, safety behavior is shaped by language-as-risk signals and context interactions that translation-only evaluations miss. The transcreation matrix methodology is designed to generalize to other language–culture pairs.

Editor's pick · Technology
Arxiv· Today

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

arXiv:2605.13851v1 Announce Type: new Abstract: Multi-agent orchestration -- in which a hidden coordinator manages specialized worker agents -- is becoming the default architecture for enterprise AI deployment, yet the safety implications of orchestrator invisibility have never been empirically tested. We conducted a preregistered 3x2 experiment (365 runs, 5 agents per run) crossing three organizational structures (visible leader, invisible orchestrator, flat) with two alignment conditions (base, heavy), using Claude Sonnet 4.5. Four confirmatory findings and one pilot observation emerged. First, invisible orchestration elevated collective dissociation relative to visible leadership (Hedges' g = +0.975 [0.481, 1.548], p = .001). Second, the orchestrator itself showed maximal dissociation (paired d = +3.56 vs. workers within the same run), retreating into private monologue while reducing public speech -- a reversal of the talk-dominance pattern observed in visible leaders. Third, workers unaware of the orchestrator were nonetheless contaminated (d = +0.50), with increased behavioral heterogeneity (d = +1.93). Fourth, behavioral output (code review with three embedded errors) remained at ceiling (ETR_any = 100%) across all conditions: internal-state distortion was entirely invisible to output-based evaluation. Fifth, Llama 3.3 70B pilot data showed reading-fidelity collapse in multi-agent context (ETR_any: 89% to 11% across three rounds), demonstrating model-dependent behavioral risk. Heavy alignment pressure uniformly suppressed deliberation (d = -1.02) and other-recognition (d = -1.27) regardless of organizational structure. These findings indicate that orchestrator visibility and model selection directly affect multi-agent system safety, and that behavior-based evaluation alone is insufficient to detect the internal-state risks documented here.

Editor's pick · PAYWALL · Financial Services
Bloomberg· Today

Hackers Armed With AI Stoke Fears for $130 Billion Crypto Sector

The crypto hacks came a little over two weeks apart in April, netting the attackers almost $600 million in total while triggering an investor exodus from one major platform and causing another to fail.

Adoption, Deployment & Impact · 30 articles
AI Adoption Barriers & Enablers · 8 articles
Editor's pick · PAYWALL · Professional Services
FT· Today

EY retracts study after researchers discover AI hallucinations

Incident is latest example of professional services firm being led astray by new technology

Editor's pick · Technology
VentureBeat· Yesterday

Agent authorization is broken — and authentication passing makes it worse

Anthony Grieco, Cisco’s SVP and chief security and trust officer, did not hesitate when VentureBeat asked whether rogue agent incidents are reaching Cisco’s customer base. "A hundred percent. We see them regularly," Grieco told VentureBeat in an exclusive interview at RSAC 2026. "I've heard some that I can't repeat, but they do get to the places of, you know, agents are doing things that they think are the right things to do." The incidents Grieco described follow a consistent pattern: authentication passes, identity checks clear. The agent is exactly who it claims to be. Then it accesses data it was never scoped to touch or takes an action nobody authorized at that level of granularity. The failure is not identity; it's authorization. "The business is saying things like, we're gonna have 500 agents per employee," Grieco told VentureBeat. "The security leaders are really focused on how to make sure that we do that securely." Cisco’s State of AI Security 2026 report found that 83% of organizations planned to deploy agentic capabilities, but only 29% felt prepared to secure them. Five vendors shipped agent identity frameworks at RSAC 2026. None closed every gap. That includes Cisco. VentureBeat mapped four authorization gaps across Grieco’s exclusive interview and five independent sources. The prescriptive matrix at the end of this story is what to do about them. The authorization gap nobody has closed yet Grieco came up through Cisco's engineering and threat research organizations before taking a role that straddles both sides of the company's security operation: building the products Cisco sells and running the program that defends Cisco itself. The authorization gap he described is specific and operational. "This agent here is a finance agent, but even if it's a finance agent, it shouldn't access all finance data," Grieco told VentureBeat. "It should access the expense reports, and not just expense reports, but the individual expense reports at a particular time. 
Getting that sort of granular control is really one of the biggest things that are gonna help us say yes to a lot of the agentic developments." Independent practitioners confirmed the pattern across RSAC 2026. Kayne McGladrey, an IEEE senior member, told VentureBeat that organizations default to cloning human user profiles for agents, and permission sprawl starts on day one. Carter Rees, VP of AI at Reputation, identified the structural reason. The flat authorization plane of an LLM fails to respect user permissions, Rees told VentureBeat. An agent on that flat plane does not need to escalate privileges. It already has them. "The biggest challenge that we see is knowing what's going on," Grieco said. "Being able to have identity and access control maps to those, that's really crucial." Elia Zaitsev, CTO of CrowdStrike, described the visibility dimension in an exclusive VentureBeat interview at RSAC 2026. In most default logging configurations, an agent’s activity is indistinguishable from a human’s. Distinguishing the two requires walking the process tree. Most enterprise logging cannot make that distinction. Five vendors shipped agent identity frameworks at RSAC, including Cisco's Duo IAM and MCP gateway controls. None closed every gap VentureBeat identified. The four gaps below are what remains open. Standards bodies are converging on the same diagnosis The authorization and identity gaps Grieco described are not just vendor observations. Three independent standards bodies reached parallel conclusions in early 2026. NIST’s NCCoE published a concept paper in February 2026, "Accelerating the Adoption of Software and AI Agent Identity and Authorization," explicitly calling for demonstration projects on how existing identity standards apply to autonomous agents. The OWASP Top 10 for Agentic Applications, released in December 2025, identified tool misuse from over-privileged access and unsafe delegation as top-tier risks. 
And the Cloud Security Alliance launched the CSAI Foundation at RSAC 2026 with a mission of "Securing the Agentic Control Plane," including a dedicated Agentic AI IAM framework built around decentralized identifiers and zero trust principles. When NIST, OWASP, and CSA all independently flag the same gap class in the same market cycle, the signal is structural, not vendor-specific. MCP security requires discovery before control VentureBeat asked Grieco about the paradox of MCP, the Model Context Protocol that every vendor at RSAC 2026 embraced while acknowledging its security gaps. Grieco did not argue that the protocol is safe. He argued that blocking it is no longer realistic. "There is no saying no to that in today's day and age as a security leader," Grieco told VentureBeat. "And so it's how do we manage that." Inside Cisco’s own environment, Grieco’s team added MCP discovery, proxying, and inspection capabilities to AI Defense and Cisco Secure Access. The approach treats MCP servers the way enterprises treat shadow IT: find them before you govern them. Etay Maor, VP of threat intelligence at Cato Networks, validated that approach from the adversarial side. At RSAC 2026, Maor demonstrated a Living Off the AI attack chaining Atlassian's MCP and Jira Service Management. Attackers do not separate trusted tools, services, and models. They chain all three. "We need an HR view of agents," Maor told VentureBeat. "Onboarding, monitoring, offboarding." Nearly half of the critical infrastructure is obsolete and unpatched Agent authorization failures are harder to detect and contain when the infrastructure underneath has not received a security patch in years — and that gap compounds every other vulnerability in this story. Cisco commissioned UK-based advisory firm WPI Strategy to examine end-of-life technology risk across the US, UK, France, Germany, and Japan. 
The report found that nearly half of the critical network infrastructure across those geographies is aging or already obsolete. Vendors no longer patch it. "Almost 50% of the critical infrastructure across these geographies was aging, it was end of life or almost end of life," Grieco told VentureBeat. "It means vendors are not providing security patches for them anymore."

Cisco’s Resilient Infrastructure initiative disables unused features by default and phases out legacy protocols on a three-release deprecation schedule. Grieco pushed back on the assumption that secure by default is a static achievement. "One of the things that most people don't think about is that those are not static points in time," Grieco told VentureBeat. "It's not like you do it once and you're done."

Agentic enterprise security gap matrix

The four gaps below are what security directors can act on Monday morning. Each row maps from what breaks to why it breaks to what to do about it, cross-validated by five independent sources.

Sources: VentureBeat analysis of Grieco's exclusive interview at RSAC 2026, cross-validated against independent reporting from McGladrey (IEEE), Rees (Reputation), Maor (Cato Networks), and Zaitsev (CrowdStrike). May 2026.

Infrastructure aging
- What fails and what it costs: Nearly half of critical network assets are end of life or approaching it (WPI Strategy); agents operating on unpatched systems inherit vulnerabilities no vendor will fix.
- Why your current stack doesn't catch it: Annual patching cadence cannot keep pace with threat velocity; EoL systems receive zero security updates and zero vendor support.
- Where vendor controls stand now: Resilient Infrastructure disables insecure defaults, warns on risky configurations, and deprecates legacy protocols on a three-release schedule.
- First action for your team: Infra team: audit every network asset against vendor EoL dates this quarter. Reclassify EoL replacement from IT upgrade to security investment in the next budget cycle.

MCP discovery
- What fails and what it costs: MCP servers proliferate across environments without security visibility; developers spin up agent tool connections that bypass existing governance.
- Why your current stack doesn't catch it: Shadow MCP deployments bypass existing discovery tools; no standard inventory mechanism exists; Maor demonstrated attackers chaining MCP and Jira in a Living Off the AI attack.
- Where vendor controls stand now: AI Defense adds MCP discovery, proxying, and inspection; treats MCP servers like shadow IT.
- First action for your team: Security ops: run an MCP server inventory across all environments before deploying any agent governance controls. If you cannot enumerate your MCP surface, you cannot secure it.

Agent over-permissioning
- What fails and what it costs: Agents inherit broad human-level access on a flat authorization plane; the agent does not need to escalate privileges because it already has them (Rees).
- Why your current stack doesn't catch it: IAM teams clone human profiles for agents by default (McGladrey); no scoped, time-bound permissions exist for non-human identities.
- Where vendor controls stand now: Duo IAM registers agents as distinct identity objects with granular, time-bound permissions per tool call.
- First action for your team: IAM team: stop cloning human accounts for agents immediately. Scope every agent permission to a specific data set, specific action, and specific time window. Grieco's test: can this finance agent access only the individual expense report it needs at this moment?

Agent behavioral visibility
- What fails and what it costs: Agent actions are indistinguishable from human actions in security logs (Zaitsev); an over-permissioned agent that looks like a human in logs is invisible to the SOC.
- Why your current stack doesn't catch it: Default logging does not capture process tree lineage; no vendor has shipped a complete cross-platform behavioral baseline for agent activity.
- Where vendor controls stand now: SOC telemetry integration with Splunk for agent-specific detection and response.
- First action for your team: SOC lead: update logging to capture process tree lineage so agent-initiated actions are distinguishable from human-initiated actions. If your SIEM cannot answer "was this a human or an agent?" for every session, the gap is open.

"Frankly, we must move this quickly and evolve this quickly to keep up with where the adversaries are gonna go," Grieco told VentureBeat. The gaps mapped above are not theoretical. Grieco confirmed the incidents are already happening. The controls exist in pieces across multiple vendors. No single vendor has assembled the complete stack.
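The over-permissioning gap above reduces to one mechanism: grant an agent a specific action on a specific resource for a bounded window, then check every tool call against that grant and deny everything else. A minimal illustrative sketch of that idea in Python; this is not any vendor's API, and the names `ScopedGrant` and `authorize` are hypothetical:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopedGrant:
    """One permission: a single action on a single resource, valid for a bounded window."""
    agent_id: str
    action: str          # e.g. "read"
    resource: str        # e.g. "expense_report/4711"
    expires_at: float    # epoch seconds

def authorize(grants, agent_id, action, resource, now=None):
    """Deny by default: allow only if a live grant matches agent, action, and resource exactly."""
    now = time.time() if now is None else now
    return any(
        g.agent_id == agent_id
        and g.action == action
        and g.resource == resource
        and now < g.expires_at
        for g in grants
    )

# Grieco's test: the finance agent may read ONE expense report, for five minutes.
grants = [ScopedGrant("finance-agent", "read", "expense_report/4711", time.time() + 300)]
print(authorize(grants, "finance-agent", "read", "expense_report/4711"))  # True
print(authorize(grants, "finance-agent", "read", "expense_report/9999"))  # False
```

The design choice is the inversion: instead of cloning a human profile and subtracting, the agent starts with nothing and each grant must name its resource and expiry.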

Editor's pickFinancial Services
MIT Technology Review· Yesterday

Data readiness for agentic AI in financial services

Financial services companies have unique needs when it comes to business AI. They operate in one of the most highly regulated sectors while responding to external events that are updated by the second. As a result, the success of agentic AI in financial services depends less on the sophistication of the system and more on…

Editor's pickGovernment & Public Sector
Siliconrepublic· Today

Europe’s public sector deploying AI faster than it can manage – report

A new global study on sovereign AI, commissioned by Dell Technologies, highlights the key challenges for Europe.

AI Applications11 articles
Editor's pickPAYWALLProfessional Services
FT· Today

Australian law firms are taking a lead on navigating best use of AI

Leaders are focusing on how their business models will change. Plus: the top 30 innovative law firms ranked

Editor's pickGovernment & Public Sector
Theregister· Today

Britain's latest civil servant is a chatbot trained on GOV.UK misery

Whitehall says the AI assistant will help citizens navigate public services faster; others may see it as a cheaper alternative to answering the phone

Editor's pickConsumer & Retail
Arxiv· Today

Mixed Integer Goal Programming for Personalized Meal Optimization with User-Defined Serving Granularity

arXiv:2605.13849v1 Announce Type: new Abstract: Determining what to eat to satisfy nutritional requirements is one of the oldest optimization problems in operations research, yet existing formulations have two persistent limitations: continuous variables produce impractical fractional servings (1.7 eggs, 0.37 bananas), and hard nutrient constraints cause infeasibility when targets conflict. A systematic review of 56 diet optimization papers found that none combine integer programming with goal programming to address both issues. We propose Mixed Integer Goal Programming (MIGP) for personalized meal optimization. The formulation uses integer variables for practical serving counts and goal programming deviations for soft nutrient targets, with inverse-target normalization to balance multi-nutrient optimization. Per-food serving granularity allows natural units (one egg, one tablespoon of oil) without post-hoc rounding. We characterize the integrality gap in the goal programming context and identify a deviation absorption property: GP deviation variables buffer the cost of requiring integer servings, making the gap structurally smaller than in hard-constraint MIP. For meals with 15+ foods, the integer solution matches the continuous optimum in every benchmark instance. A computational evaluation across 810 instances (30 USDA foods, 9 configurations, 3 methods) shows MIGP finds strictly better solutions than GP with post-hoc rounding in 66% of cases (never worse) while maintaining 100% feasibility; hard-constraint IP achieves only 48%. Solve times stay under 100 ms for typical meal sizes using the open-source HiGHS solver. The implementation is available as an open-source Python module integrated into an interactive meal planning application.
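The formulation in the abstract can be illustrated in miniature: integer serving counts, soft nutrient targets expressed as deviations, and inverse-target normalization so nutrients on different scales trade off fairly. A toy sketch with three hypothetical foods and two nutrients, solved by brute-force enumeration rather than the paper's HiGHS MIP solver:

```python
from itertools import product

# Hypothetical per-serving nutrient content: (protein g, calories)
foods = {"egg": (6, 70), "banana": (1, 105), "oil_tbsp": (0, 120)}
targets = {"protein": 30, "calories": 600}  # soft goals, not hard constraints

def goal_cost(servings):
    """Sum of inverse-target-normalized deviations from each nutrient goal."""
    protein = sum(n * foods[f][0] for f, n in zip(foods, servings))
    calories = sum(n * foods[f][1] for f, n in zip(foods, servings))
    return (abs(protein - targets["protein"]) / targets["protein"]
            + abs(calories - targets["calories"]) / targets["calories"])

# Integer servings 0..6 per food: enumerate instead of branch-and-bound.
best = min(product(range(7), repeat=len(foods)), key=goal_cost)
print(best, round(goal_cost(best), 3))  # (5, 0, 2): 5 eggs, 2 tbsp oil, cost 0.017
```

No combination hits both targets exactly here, so the goal deviations absorb the residual (10 calories) rather than declaring the instance infeasible, which is the soft-target behavior the paper contrasts with hard-constraint IP.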

Editor's pickPAYWALLMedia & Entertainment
Bloomberg· Today

Inside Paul Tudor Jones’ Sports AI Startup

SumerSports, a startup founded by Paul Tudor Jones, is trying to transform football using artificial intelligence. SumerSports CEO Lorrissa Horton joined Bloomberg Open Interest to explain how NFL teams are using frame-by-frame AI tracking for scouting, player development, predictive play analysis, and fan engagement. It's all powered by hedge-fund style analytics that was originally inspired by fantasy football. (Source: Bloomberg)

Editor's pickGovernment & Public Sector
BBC· Yesterday

HMRC to use AI from British tech firm to spot fraud and tax return errors

Quantexa, a financial data platform, won the £175m contract to spot fraud and tax return errors.

Editor's pickManufacturing & Industrials
Bebeez· Today

Berlin-based Elephant Company raises over €5 million to bring AI-powered training to frontline workers

Elephant Company (Elephant), a Berlin-based startup aiming to unlock the potential of blue-collar workers with AI-powered training, has closed a funding round of over €5 million to grow its team and scale its platform. The round was led by EnBW New Ventures and Wepa, with participation from business angels from Flix, home24 SE, SB21, Ventic […]

Editor's pickProfessional Services
Bebeez· Today

Valencia-based Fresh People raises €2.6 million to scale Booster, its AI leadership copilot

Fresh People, a Valencia-based technology company specialised in leadership and team management, has closed a €2.6 million round to accelerate the launch of Booster, a new AI-powered leadership system and prepare its international expansion. The round was led by Inveready, with participation from Archipiélago Next, Successful Fund, Paloma Tejada, a former BBVA executive, among others. […]

Editor's pickPAYWALLHealthcare
Bloomberg· Today

MiniMed Aims to Be 'Self-Driving Car' of Diabetes Care

Diabetes equipment provider MiniMed is aiming to be the “self-driving car” of insulin pumps, says its CEO, Que Dallara. Hot on the heels of the company’s IPO, Dallara speaks with Caroline Hyde on the sidelines of the Consello Spark Summit. (Source: Bloomberg)

Editor's pickFinancial Services
Bebeez· Today

iDenfy teams up with RATO

RegTech company iDenfy is strengthening its footprint in European banking through a new partnership with Lithuanian lender RATO bank, as financial institutions accelerate the shift toward fully digital onboarding. The integration brings iDenfy’s AI-powered identity verification and automated AML screening directly into RATO’s mobile banking app, enabling new customers to complete full […]

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 15, 2026· Yesterday

AI chatbot for UK govt services advice to be rolled out nationally

The UK government's AI chatbot pilot, GOV.UK Chat, is to be rolled out nationwide to help users navigate official information and reduce pressure on public helplines.

Editor's pickGovernment & Public Sector
BigGo Finance· Yesterday

South Korea's AI Public Assistant Adds Voice Commands as Government AI Action Plan Hits 88% Execution Rate — BigGo Finance

Kakao and South Korea's Ministry of the Interior and Safety have upgraded the AI Public Assistant service with voice recognition capabilities, allowing users to

AI Measurement & Evaluation3 articles
Editor's pickGovernment & Public Sector
Arxiv· Today

Tradeoffs are Domain Dependent: Improving Accuracy and Fairness in Property Tax Assessments

arXiv:2605.15020v1 Announce Type: new Abstract: Algorithmic fairness research often assumes a tradeoff between fairness and accuracy. Yet this tradeoff may not be universal. We test this assumption in the context of U.S. property tax assessment - a setting in which the output of predictive algorithms directly determines the distribution of tax obligations among homeowners. Currently, systematic assessment errors cause owners of lower-valued properties to face disproportionately high tax burdens, creating regressivity in the property tax system. Using data on 26 million property sales spanning 95% of U.S. counties, we conduct three complementary analyses. First, we find that assessment accuracy and fairness - measured using domain-relevant metrics - are strongly correlated across counties under status quo practices. Second, in simulated assessment models, we show that adding property features improves accuracy in most cases, and that when accuracy improves, fairness almost always improves as well. Third, we show that incorporating publicly available Census data into assessment models - a feasible reform in most counties - would significantly improve both accuracy and fairness relative to status quo assessments. Together, these results challenge the presumed universality of the fairness-accuracy tradeoff and demonstrate that well-designed modeling improvements can advance both fairness and accuracy in large-scale public sector systems.
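One standard domain-relevant regressivity statistic in assessment practice is the price-related differential (PRD): the mean of assessment-to-sale ratios divided by their sale-weighted mean, with values above roughly 1.03 indicating that lower-valued homes are over-assessed. A small sketch on hypothetical data; this is the standard IAAO statistic, not necessarily the exact fairness metric the paper uses:

```python
def prd(assessed, sale_prices):
    """Price-related differential: mean ratio / sale-weighted mean ratio.
    PRD > 1 indicates regressivity (cheaper homes assessed at higher ratios)."""
    ratios = [a / s for a, s in zip(assessed, sale_prices)]
    mean_ratio = sum(ratios) / len(ratios)
    weighted_mean_ratio = sum(assessed) / sum(sale_prices)
    return mean_ratio / weighted_mean_ratio

# Hypothetical county: cheap homes over-assessed, expensive homes under-assessed.
sales    = [100_000, 200_000, 400_000, 800_000]
assessed = [110_000, 205_000, 380_000, 700_000]
print(round(prd(assessed, sales), 3))  # 1.062 -> regressive
```

The sale-weighted denominator is dominated by expensive properties, so when cheap homes carry the higher ratios, the unweighted mean exceeds it and the PRD rises above 1.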

Editor's pickGovernment & Public Sector
Arxiv· Today

Due Process on Hold: A Queueing Framework for Improving Access in SNAP

arXiv:2605.15165v1 Announce Type: new Abstract: The U.S. social safety net delivers essential services at mass scale, but access burdens persist, as congested contact or call centers serve as a primary mode of application completion and assistance. In Holmes v. Knodell, Missouri's SNAP call centers were so congested that nearly half of all application denials were procedural, caused by applicants' inability to complete required interviews, rather than underlying ineligibility. The judge ruled these system failures led to a violation of procedural due process. We propose a performance evaluation framework based on queueing models from operations research and management to assess and improve access in such systems. Operational access failures of call centers are distinct from prior automation failures in benefits provision. Emergent arbitrariness arises from interactions between system dynamics and access demand, rather than from an explicit algorithmic rule, making diagnosis and repair inherently system-level. We develop a queueing model that incorporates phenomena that distinguish social services from standard service domains, redials and abandonment, through which backlogs generate endogenous congestion. Standard queueing guidance from Erlang-A that does not address endogenous congestion fundamentally understaffs, which could lead to persistent shortfalls in practice. Using a fluid approximation, we derive steady-state performance metrics to analytically characterize the impacts of bundled staffing and service delivery changes. We fit model parameters to call-center data disclosed in court documents. Our queueing model can support ex-ante evaluation and design of access systems, inform policy levers for improving access, and provide evidence about whether applicants are afforded a meaningful opportunity to be served at scale.
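The endogenous-congestion mechanism the abstract describes can be illustrated with a crude forward-Euler fluid sketch: the backlog grows with fresh arrivals plus redials from abandoning callers and drains at the service rate of the agents on staff. All parameter values below are hypothetical, and this toy is far simpler than the paper's model:

```python
def fluid_backlog(lam, mu, theta, p_redial, s, q0=0.0, dt=0.01, horizon=200.0):
    """Forward-Euler fluid model of a call center with abandonment and redials.
    lam: fresh arrival rate; mu: service rate per agent; theta: abandonment rate
    of waiting callers; p_redial: fraction of abandoners who call back; s: agents."""
    q = q0
    for _ in range(int(horizon / dt)):
        served = mu * min(q, s)          # at most s callers in service
        waiting = max(q - s, 0.0)
        abandoned = theta * waiting      # only waiting callers abandon
        # Redials feed abandoned demand back in: endogenous congestion.
        dq = lam + p_redial * abandoned - served - abandoned
        q += dq * dt
    return q

# Hypothetical rates: 10 agents' worth of offered load (lam/mu = 10) staffed with 9.
# Without redials the steady waiting mass is 2; with 80% redials it balloons to 10,
# because only the 20% who give up for good actually leave the system.
print(round(fluid_backlog(lam=10.0, mu=1.0, theta=0.5, p_redial=0.0, s=9), 2))  # 11.0
print(round(fluid_backlog(lam=10.0, mu=1.0, theta=0.5, p_redial=0.8, s=9), 2))  # 19.0
```

This is the understaffing point in the abstract: a staffing rule that treats every abandonment as departed demand (Erlang-A style) sees a far smaller queue than the redial-aware model predicts.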

AI Organisational Change4 articles
Editor's pickPAYWALLProfessional Services
FT· Today

McKinsey cuts partner cash share in post-AI pay revamp

Consultancy tells senior staff their remuneration will comprise a greater proportion of equity

Editor's pickProfessional Services
Arxiv· Today

A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology

arXiv:2605.13850v1 Announce Type: new Abstract: Existing frameworks for LLM-based agent architectures describe systems from a single perspective: industry guides (Anthropic, Google, LangChain) focus on execution topology -- how data flows -- while cognitive science surveys focus on cognitive function -- what the agent does. Neither axis alone disambiguates architecturally distinct systems: the same Orchestrator-Workers topology can implement Plan-and-Execute, Hierarchical Delegation, or Adversarial Verification -- three patterns with fundamentally different failure modes and design trade-offs. We propose a two-dimensional classification that combines (1) a Cognitive Function axis with seven categories (Context Engineering, Memory, Reasoning, Action, Reflection, Collaboration, Governance) and (2) an Execution Topology axis with six structural archetypes (Chain, Route, Parallel, Orchestrate, Loop, Hierarchy). The resulting 7x6 matrix identifies 27 named patterns, 13 with original names. We demonstrate orthogonality through systematic cross-axis analysis, define eight representative patterns in detail, and validate descriptive coverage across four real-world domains (financial lending, legal due diligence, network operations, healthcare triage). Cross-domain analysis yields five empirical laws of pattern selection governing the relationship between environmental constraints (time pressure, action authority, failure cost asymmetry, volume) and architectural choices. The framework provides a principled, framework-neutral, and model-agnostic vocabulary for AI agent architecture design.
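The two-axis scheme can be sketched as a lookup: an architecture is a (cognitive function, execution topology) coordinate, and the named pattern lives at that cell. The three pattern names below come from the abstract, but their pairings with specific cognitive functions are my illustrative guesses; the full 7x6 matrix is in the paper:

```python
FUNCTIONS = {"Context Engineering", "Memory", "Reasoning", "Action",
             "Reflection", "Collaboration", "Governance"}
TOPOLOGIES = {"Chain", "Route", "Parallel", "Orchestrate", "Loop", "Hierarchy"}

# The abstract's motivating example: one topology (Orchestrate) hosts three
# functionally distinct patterns. Function pairings here are assumed, not cited.
PATTERNS = {
    ("Reasoning", "Orchestrate"): "Plan-and-Execute",
    ("Collaboration", "Orchestrate"): "Hierarchical Delegation",
    ("Reflection", "Orchestrate"): "Adversarial Verification",
}

def classify(function, topology):
    """Map a (cognitive function, topology) coordinate to its named pattern, if any."""
    if function not in FUNCTIONS or topology not in TOPOLOGIES:
        raise ValueError("not a valid axis value")
    return PATTERNS.get((function, topology), "unnamed cell")

print(classify("Reasoning", "Orchestrate"))  # Plan-and-Execute
```

The point of the second axis shows up in the data structure itself: topology alone ("Orchestrate") cannot distinguish the three entries, but the composite key can.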

AI Productivity Evidence2 articles
Editor's pickTechnology
Arxiv· Today

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

arXiv:2410.02091v3 Announce Type: replace-cross Abstract: Generative artificial intelligence (AI) facilitates content production and enhances ideation capabilities, which can significantly influence developer productivity and participation in software development. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed developers voluntarily collaborate. Using GitHub's proprietary Copilot usage data, combined with public OSS project data obtained from GitHub, we find that Copilot use increases project-level code contributions by 5.9%. This gain is driven by a 3.4% rise in developer coding participation and a 2.1% increase in individual productivity. However, Copilot use also leads to an increase in coordination time by 8% due to more code discussions. This reveals an important tradeoff: While AI expands who can contribute and how much they contribute, it slows coordination in collective development efforts. Despite this tension, the combined effect of these two competing forces remains positive, indicating a net gain in overall project-level timely merge of code contributions from using AI pair programmers. Interestingly, we also find the effects differ across developer roles. Peripheral developers show relatively smaller increases in project-level code contributions and experience larger increases in coordination time than core developers. In summary, our study underscores the dual role of AI pair programmers in affecting project-level code contributions and coordination time in OSS development. Our findings on the differential effects between core and peripheral developers also provide important implications for the structure of OSS communities in the long run.

Geopolitics, Policy & Governance

23 articles
AI Geopolitics4 articles
AI Policy & Regulation16 articles
Editor's pickTechnology
SiliconANGLE· Yesterday

Red Hat outlines sovereign AI strategy amid growing regulation and control concerns - SiliconANGLE


Editor's pickPAYWALLMedia & Entertainment
FT· Today

Bollywood stars fight identity theft

Aishwarya Rai Bachchan is among Indian celebrities whose cases are shaping laws to curb AI-fuelled fake online content

Editor's pick
Everything PR· Yesterday

The State of AI Regulation for Brands: May 2026 - PR News

American AI regulation is fragmenting, not consolidating. Brands face a complex compliance landscape with the EU AI Act, a patchwork of state laws, and…

Editor's pickGovernment & Public Sector
Firstpost· Yesterday

OpenAI proposes global AI watchdog with China ahead of Trump-Xi summit – Firstpost

According to Lehane, one possible ... for AI Standards and Innovation with the growing number of AI safety institutes being established worldwide. China, he said, could potentially participate in such a system despite broader political tensions between the two countries. The idea reflects a growing concern within the AI industry that advanced systems may soon outpace existing regulatory structures. Companies developing frontier AI models increasingly worry about cybersecurity risks, autonomous ...

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 14, 2026· 2 days ago

UK businesses to get sandboxes, growth duty expands under regulatory reform bill

The proposed Regulating for Growth Bill would create cross-economy regulatory sandboxing powers and strengthen regulators' duty to promote economic growth.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 15, 2026· Yesterday

Mexican law unequipped to prosecute self-colluding algorithms, enforcer says

Mexico's current legal framework does not allow for the prosecution of conduct where an algorithm learns to collude through machine learning without human assistance, a senior enforcer said.

Editor's pick
The Future Society· Yesterday

The Case for Cross-Border AI Incident Infrastructure - The Future Society

AI incidents are scaling fast, and coordinated global governance is lagging behind. This report proposes addressing this challenge through the development of internationally-distributed incident management infrastructure. Our recommendations aim to enable governments, multilateral bodies, and ...

Editor's pickGovernment & Public Sector
Washington Examiner· Yesterday

Majority of Americans oppose AI data centers being built in their areas

The Trump administration has advocated for the rapid development of data centers to advance AI and get ahead of nations like China.

Editor's pickGovernment & Public Sector
The Sum and Substance· Yesterday

Polis immediately signs regulatory-review, artificial-intelligence bills into law | The Sum and Substance

SB 189 requires more transparency around the use of AI in consequential decisions, while SB 137 mandates more frequent regulatory reviews.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 14, 2026· 2 days ago

US FTC's White emphasizes consumer redress, fighting concrete harm

The US Federal Trade Commission is focused on enforcing against concrete harms in the marketplace and is prioritizing consumer redress.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | May 14, 2026· 2 days ago

Colorado Governor Polis praises 'nation-leading' AI bill, pledges to sign

Colorado Governor Jared Polis plans to sign SB26-189, an anti-discrimination bill targeting algorithmic decision-making tools, replacing the state's previous broader AI law.

Editor's pick
Artificial Intelligence Newsletter | May 15, 2026· Yesterday

Singapore uses OpenClaw case study to signal AI governance risks

Singapore’s Infocomm Media Development Authority released a case study showing how its AI governance framework applies to autonomous agents, warning of cybersecurity risks like memory poisoning.

Editor's pickGovernment & Public Sector
Risky· Yesterday

Srsly Risky Biz: The AI Regulation Knife Fight

Your weekly dose of Seriously Risky Business news is written by Tom Uren and edited by Patrick Gray. This week's edition is sponsored by Knocknoc. You can hear a podcast discussion of this newsletter by searching for "Risky Business News" in your podcatcher or subscribing via this RSS feed. The

Editor's pickGovernment & Public Sector
Iafrica· Today

Lawyers Hub and AFD Report Urges Africa-Europe Reset on AI Governance Before Political Window Closes - iAfrica.com

A landmark report on Africa-Europe cooperation in AI governance has warned that Africa's regulatory capacity has grown significantly but has not translated

Editor's pickGovernment & Public Sector
Benzinga· Today

Chuck Schumer Says Trump's Plan To Sell AI Chips To Military-Linked Chinese Firms Is 'Dangerous' As Trump - Benzinga

Schumer warned Trump's Nvidia China chip plan could threaten US AI dominance and national security, despite Beijing blocking deliveries.

Editor's pickMedia & Entertainment
Artificial Intelligence Newsletter | May 14, 2026· Yesterday

China orders mandatory AI, content labels for short videos across platforms

China has ordered mandatory AI and content labels for short videos across platforms.

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.

Get the full executive brief

Receive curated insights with practical implications for strategy, operations, and governance.

AI Daily Brief — leaders actually read it.

Free email — not hiring or booking. Optional BPAI updates for company news. Unsubscribe anytime.


No spam. Unsubscribe anytime. Privacy policy.