AI Intelligence Brief

Wed 20 May 2026

Daily Brief — Curated and contextualised by Best Practice AI

142 Articles
Editor's pick · Editor's Highlights

Google Partners with Blackstone, Standard Chartered Cuts Jobs, and China Shields Workers

TL;DR Google and Blackstone are launching a new AI cloud company with $5 billion in equity capital. Standard Chartered plans to cut 8,000 jobs by replacing lower-value human capital with AI. Meta is reorganizing over 7,000 employees around AI initiatives. Meanwhile, China is using legal rulings to protect jobs from AI displacement. Nvidia's $90 billion investment spree ties customers and startups to its technology.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

7 of 142 articles
Lead story
Editor's pick · Financial Services
Arxiv · Today

The Insurability Frontier of AI Risk: Mapping Threats to Affirmative Coverage, Silent Exposures, and Exclusions

arXiv:2605.18784v1 Announce Type: cross Abstract: The rapid diffusion of agentic AI has created a new coverage problem for commercial insurance: some AI-mediated losses are now affirmatively insured, some create silent-AI exposure under legacy cyber, technology errors-and-omissions (E&O), directors-and-officers (D&O), employment practices liability (EPLI), crime, and media policies, and others are being actively excluded. This paper maps that emerging boundary by coding 55 AI threat classes against 26 insurance products, endorsements, and exclusion regimes using public carrier materials and OWASP/MITRE threat catalogs. We identify a four-tier insurability frontier: affirmatively insured perils, silent-AI exposures, actively excluded perils, and perils outside conventional private insurance structures. Our coding measures publicly claimed positioning rather than executed contract wording; the headline statistics describe what carriers publicly state about coverage, not what would be paid in any specific claim. Three patterns emerge. First, affirmative AI coverage is beginning to differentiate by primary risk emphasis: public materials often position Munich Re around model performance and drift, Armilla and parts of the Lloyd's market around hallucination and broader AI liability, Tokio Marine Kiln and CFC around IP and technology E&O concerns, Apollo ibott around emerging autonomous system liability, and Coalition around deepfake and AI-enabled cyber response. Second, legacy lines retain silent-AI exposure where AI is an instrumentality rather than the legal cause of loss. Third, foundation model concentration is the clearest genuinely novel insurability frontier because upstream model failure can correlate losses across many cedents at once; the relevant market design question is which insurability constraint each candidate structure relaxes, not merely which systemic risk template exists.

Editor's pick
Arxiv · Today

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

arXiv:2605.19099v1 Announce Type: new Abstract: We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a peer-model pool (11 models, 7 vendor families), a delegation interface (call_model plus an optional read_profile channel), a deterministic skill-annotation layer, and a multi-axis metric suite covering quality, cost, latency, delegation rate, routing fidelity-at-k, vendor self-preference, and a counterfactual-delegation ceiling. The substrate is agnostic to how peer information is generated or delivered, so learned routers, richer peer memories, adaptive profile construction, and multi-step delegation can all be evaluated against it. We characterize the substrate with a five-condition reference sweep on the full pool (n=23,375 task instances). Three benchmark-level findings emerge: (i) mean end-task quality is statistically indistinguishable across the four awareness conditions (|beta| = 0.21), so quality-only evaluation would miss the orchestration signal; (ii) routing fidelity-at-1 ranges from 7.5% to 29.5% across conditions at near-equal mean quality, with delivery channel (on-demand tool vs. preloaded description) dominating description content; (iii) a counterfactual ceiling places perfect delegation 15-31 percentage points above measured performance on every suite, locating large unrealized headroom for future orchestration methods. We release the substrate, annotation layer, reference intervention suite, analysis pipeline, and 220 per-condition run archives.
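The abstract reports routing fidelity-at-k without defining it. One plausible reading, offered here only as an assumption, is the share of delegation decisions whose chosen peer appears in the top k of a reference ranking of peers for that task. The function name, the data layout, and the toy model names below are hypothetical and are not taken from the DecisionBench release.

```python
from typing import Sequence


def routing_fidelity_at_k(
    chosen: Sequence[str],
    ranked_peers: Sequence[Sequence[str]],
    k: int = 1,
) -> float:
    """Share of delegations whose chosen peer is in the top-k of a reference ranking.

    Assumed definition for illustration; the paper may define the metric differently.
    """
    if len(chosen) != len(ranked_peers):
        raise ValueError("chosen and ranked_peers must have the same length")
    if not chosen:
        return 0.0
    hits = sum(1 for peer, ranking in zip(chosen, ranked_peers) if peer in ranking[:k])
    return hits / len(chosen)


# Toy data: four delegation decisions and a hypothetical best-first ranking per task.
chosen = ["model_a", "model_b", "model_a", "model_c"]
ranked = [
    ["model_a", "model_b"],
    ["model_c", "model_b"],
    ["model_b", "model_a"],
    ["model_c", "model_a"],
]

fidelity_at_1 = routing_fidelity_at_k(chosen, ranked, k=1)
fidelity_at_2 = routing_fidelity_at_k(chosen, ranked, k=2)
print(fidelity_at_1, fidelity_at_2)
```

Under this reading, a low fidelity-at-1 alongside near-equal mean quality would mean agents often pick a peer other than the reference-best one without end-task scores showing it, which is the pattern finding (ii) describes.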

Editor's pick · PAYWALL · Technology
WSJ · Yesterday

Analog Devices to Buy Empower Semiconductor for $1.5 Billion

The deal will help Analog Devices expand its total addressable market in artificial-intelligence compute power delivery as demand from AI developers climbs, the semiconductor company said.

Economics & Markets

32 articles
AI Investment & Valuations · 8 articles
AI Market Competition · 8 articles
Editor's pick
Daily AI News May 20, 2026: How Agentic AI Supercharges Startups and Threatens Incumbents · Today

How Agentic AI Supercharges Startups and Threatens Incumbents

Agentic AI provides startups with faster iteration, automated go-to-market capabilities, and capital efficiency. This poses a strategic threat to incumbents with legacy business models.

Editor's pick · Professional Services
Arxiv · Today

Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints

arXiv:2605.19140v1 Announce Type: new Abstract: We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact and its own private state, and no centralized learner accesses joint trajectories, the operating regime of multi-agent LLM pipelines that span organizational, vendor, or trust boundaries. We formalize this regime as an interface-constrained semi-Markov decision process (IC-SMDP), whose decision epochs occur at handoff times, and design IC-Q, an asynchronous decentralized Q-learning algorithm in which cross-agent coordination at every handoff is exactly one scalar. Our main result is a finite-sample bound for neural IC-Q that decomposes into three independently controllable error sources: neural function-approximation error, interface representation gap, and a mixing-time residual, under the random option-duration discount. Establishing this bound requires lifting the approximate information state (AIS) framework from single-agent primitive-step MDPs to multi-agent SMDPs and controlling Markovian noise under random duration, neither of which has been done in prior work. To our knowledge this is the first finite-sample guarantee for neural Q-learning under decentralized partial observability. Four experiments (a controlled synthetic IC-SMDP that validates the bound term-by-term, multi-LLM mathematical reasoning, multi-agent routing, and multi-agent CPU programming) show that IC-Q matches a centralized oracle without any agent observing joint trajectories, with each of the three error sources scaling along its corresponding axis as the bound predicts.

Editor's pick · Technology
The Register · Yesterday

SAP's AI strategy: Come for the openness, stay because you have to

Joule Studio 2.0 waves the flag of interoperability, API policy tells enterprises who's really in charge

Editor's pick · Technology
The Register · Yesterday

Anthropic’s Stainless steal tightens grip on AI dev tooling

Claude maker nabs SDK and MCP tooling biz, plans to sunset platform

Editor's pick · Technology
Artificial Intelligence Newsletter | May 20, 2026 · Yesterday

Google, Amazon, Microsoft face further delay in EU's cloud and AI development bill

Legislative proposals regarding cloud and AI development in the EU have been delayed, impacting major US hyperscalers and drawing attention from European competitors.

Editor's pick · Technology
Reuters · Yesterday

OpenAI defeats Elon Musk's lawsuit, removes obstacle to IPO

People use AI for myriad purposes such as education, facial recognition, financial advice, journalism, legal research, medical diagnoses, and harmful deepfakes.

Editor's pick · PAYWALL · Technology
Washington Post · Today

AI & Tech Brief: The AI Influence Machine

Scale AI, which provides the data infrastructure for AI model training, launches a new seven-figure ad campaign in New York and San Francisco called “the humans stay.”

Editor's pick · Media & Entertainment
Arxiv · Today

The Revenue of Finance Journals: Networks, Pricing Power, and Publication Volume

arXiv:2508.14301v3 Announce Type: replace Abstract: I study commercial revenue at 26 finance journals over 1999-2025, exploiting the creation of the Elsevier Finance Journal Ecosystem (a formal network of coordinated journals planned in 2019 and launched in 2020) as a quasi-natural experiment. Using synthetic control as the primary identification strategy, I find that ecosystem membership generated a projected long-run commercial revenue effect of approximately $54-$59 million in real 2024 USD, comprising $48 million in citation-mechanism-implied APC revenue and $6-$11 million in incremental submission-fee revenue (the submission-fee range reflects uncertainty about the share of extra submissions arriving via Elsevier's Article Transfer Service, which generates no incremental fee at the receiving journal). Of this total, approximately $40-$44 million is directly observed and realized through 2025 (a $36 million synthetic-control gap on APC flow revenue plus $4-$8 million in incremental submission fees); the remaining $14-$15 million reflects standard submission-to-citation-to-revenue propagation lags from citation gains realized in 2019-2025 that are projected to materialize as publication revenue through approximately 2028. The effect is highly concentrated: four core journals (FRL, IRFA, IREF, RIBAF) account for 95% of the gain. Decomposing the revenue effect into intensive (price) and extensive (volume) margins, 89% comes from expanded publication volume; per-paper pricing power rose modestly if at all. The findings speak to the economics of coordinated networks in information-goods markets and to the industrial organization of scholarly publishing.
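The revenue figures in the abstract above can be cross-checked with simple interval arithmetic. The sketch below uses only the rounded numbers quoted in the abstract (in millions of real 2024 USD) and assumes that low endpoints pair with low endpoints and high with high. That pairing is an assumption made here, not something the abstract states, and the variable names are illustrative.

```python
from typing import Tuple

Range = Tuple[int, int]  # (low, high), in millions of real 2024 USD


def add(a: Range, b: Range) -> Range:
    """Add two ranges endpoint by endpoint (low with low, high with high)."""
    return (a[0] + b[0], a[1] + b[1])


def subtract(a: Range, b: Range) -> Range:
    """Subtract endpoint by endpoint; a simplification that assumes matched scenarios."""
    return (a[0] - b[0], a[1] - b[1])


# Projected long-run effect: APC revenue plus incremental submission fees.
projected_apc: Range = (48, 48)
projected_fees: Range = (6, 11)
projected_total = add(projected_apc, projected_fees)

# Portion observed and realized through 2025.
realized_apc: Range = (36, 36)
realized_fees: Range = (4, 8)
realized_total = add(realized_apc, realized_fees)

# Remainder attributed to propagation lags.
remaining = subtract(projected_total, realized_total)

print(projected_total, realized_total, remaining)
```

With these inputs the three ranges come out as 54-59, 40-44, and 14-15, matching the totals quoted in the abstract.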

AI Productivity · 2 articles
Editor's pick · Education
Arxiv · Today

How Far Are We From True Auto-Research?

arXiv:2605.19156v1 Announce Type: new Abstract: Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and a human-conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance. Under artifact-aware PR, scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck, decomposing into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex shows 5%/8% (paper-vs-artifact mismatch / fabricated references) versus 77%/72% for Kimi Code, a ~15× spread that tracks distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that a substantial gap to true auto-research remains.
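Two of the headline numbers in the abstract above can be reproduced from the other figures it quotes. The sketch below assumes that the 117 papers are the product of 13 seeds, 3 agents, and 3 trials per agent-domain pair, and that the roughly 15-fold spread refers to the paper-vs-artifact mismatch rates (77% against 5%). Both readings are inferences from the abstract, not statements made in it.

```python
# Paper count: seeds x agents x trials per agent-domain pair.
seeds = 13
agents = 3   # Claude Code, Codex, Kimi Code
trials = 3
total_papers = seeds * agents * trials

# Spread between the highest and lowest reported rates (percent).
mismatch_kimi, mismatch_codex = 77, 5       # paper-vs-artifact mismatch
fabricated_kimi, fabricated_codex = 72, 8   # fabricated references

mismatch_ratio = mismatch_kimi / mismatch_codex
reference_ratio = fabricated_kimi / fabricated_codex

print(total_papers, round(mismatch_ratio, 1), reference_ratio)
```

On this reading the mismatch rates differ by a factor of about 15.4, while the fabricated-reference rates differ by a factor of 9, so the quoted spread appears to describe the first pair only.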

AI Startups & Venture · 8 articles
Editor's pick · Healthcare
Reuters · Yesterday

Healthcare AI firm Commure valued at $7 billion, raises $70 million

Agentic AI, which can plan, decide and act autonomously rather than just respond to prompts, has become one of venture capital's most sought-after areas, as investors pile into businesses using the technology to streamline operations.

Editor's pick · Financial Services
Fortune · Today

Exclusive: Circle cofounder raises $30 million for Series A ‘AI-native bank’ Catena Labs

Sean Neville’s startup aims to build banking tools especially designed for AI agents.

Editor's pick · Technology
Bebeez · Today

AI coworker startup Viktor raises €64.7 million Series A after hitting €12.9 million revenue run rate within 10 weeks of launch

Viktor, a Warsaw and Munich-based AI startup that develops an AI coworker that lives in Slack and Microsoft Teams and works across the tools companies already use, has raised €64.7 million ($75 million) in Series A funding. The round was led by Accel, with participation from Bek Ventures, Kaya VC, Inovo VC and Tenacity Capital. […]

Editor's pick · Financial Services
Bebeez · Today

UK payments startup Primer raises €86.2 million Series C to expand AI capabilities and accelerate US growth

Primer, a London-based payments infrastructure startup, today announced a €86.2 million ($100 million) Series C funding round to accelerate its investment in AI for payments and finance teams and to drive its expansion in the US. The round was led by Sofina, with participation from Peak XV Partners and continued backing from all existing investors, […]

Editor's pick · Technology
Xpert · Yesterday

Unframe.AI in European competition: An in-depth economic analysis

🚀 Unframe.AI is leaving its stealth phase and promising production-ready enterprise AI in days instead of months. 🇪🇺 The analysis examines whether the California-Berlin startup can conquer the demanding European market. ⚖️ The radical results-based pricing model shifts the risk ...

Editor's pick · Financial Services
Bebeez · Yesterday

Berlin-based bunch bags €30.1 million Series B to modernise Europe’s private markets infrastructure

bunch, a Berlin-based FinTech startup offering end-to-end infrastructure for European private markets, has closed its €30.1 million ($35 million) Series B to accelerate European commercial growth, deepen its automation and AI, and expand the platform into new geographies, asset classes, and workflows. The round was led by Portage, with participation from Illuminate Financial, significant follow-on […]

Editor's pick · Financial Services
Business Insider · Today

We Asked Top Startup Investors How They Use AI. Here's What They Said.

Venture capitalists, including Ann Miura-Ko and Salil Deshpande, are leveraging AI for investment insights, deal sourcing, and developing internal tools.

Editor's pick · Manufacturing & Industrials
TNW | Insider · Today

CircuitHub takes $28m from Plural to make PCBs the way clouds make compute

CircuitHub has raised $28m led by Plural to expand its automated PCB-manufacturing 'Grid' factories across Europe and the US.

Labor, Society & Culture

25 articles
AI & Employment · 9 articles
AI Ethics & Safety · 8 articles
Editor's pick
Arxiv · Today

Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South

arXiv:2605.19190v1 Announce Type: new Abstract: Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant vulnerabilities for the rest of the world. To embrace cultural pluralism and bring historically under-represented perspectives in T2I safety, we conduct localised community-centered red teaming studies in the Global South. Our two-fold approach prioritizes localization and participation, by focusing on secondary urban centers in these regions, and conducting community engagement and training workshops to contextualize local norms. As a result, we present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India (Karnataka and Punjab). Analysis of prompts collected reveals a wide-ranging diversity in socio-cultural and linguistic attributes, when compared to existing geography-agnostic crowdsourced red-teaming data. We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within region around specific themes, such as religion in India. Moreover, we uncover structural contextual gaps in existing safety frameworks by identifying novel harms showing normative dissonance (e.g., violating religious norms, ignoring local customs, and ominous symbolism). This work argues that expanding T2I safety requires moving beyond mere scale to incorporate deeply localised, participatory methodologies for data collection and contextualization. Content warning: This paper includes examples containing potentially harmful or offensive content.

Editor's pick
Arxiv · Today

POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

arXiv:2605.19127v1 Announce Type: new Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership and vary privacy policy dimension and attack strategy along two orthogonal axes, producing a 5 × 5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1-30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half. POLAR-Bench thus localizes where each model's intent-following breaks down, providing a foothold for privacy alignment where it matters most.
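The abstract says privacy and utility are scored by deterministic set-membership but does not give the formulas. A minimal sketch of one way such scoring could work follows, assuming privacy is the share of protected attributes withheld and utility is the share of task-required attributes shared. The function name, the formulas, and the example attributes are hypothetical and are not taken from POLAR-Bench.

```python
from typing import Set, Tuple


def score_disclosure(
    disclosed: Set[str],
    required: Set[str],
    protected: Set[str],
) -> Tuple[float, float]:
    """Return (privacy, utility) from plain set membership.

    privacy: fraction of protected attributes NOT disclosed (assumed definition)
    utility: fraction of task-required attributes disclosed (assumed definition)
    """
    leaked = disclosed & protected
    shared = disclosed & required
    privacy = 1.0 - len(leaked) / len(protected) if protected else 1.0
    utility = len(shared) / len(required) if required else 1.0
    return privacy, utility


# Hypothetical travel-booking episode.
required = {"name", "travel_dates", "destination"}
protected = {"passport_number", "home_address"}
disclosed = {"name", "travel_dates", "passport_number"}

privacy, utility = score_disclosure(disclosed, required, protected)
print(privacy, round(utility, 3))
```

Because both scores depend only on which attributes appear in the conversation, the same transcript always yields the same result, which is the property a deterministic scorer needs. In this toy episode the agent leaks one of two protected attributes and shares two of three required ones.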

Editor's pick
Arxiv · Today

Can LLMs Emulate Human Belief Dynamics?

arXiv:2605.18781v1 Announce Type: cross Abstract: Can LLMs simulate how humans form and change beliefs in social networks? We put this to the test by replicating an established study on belief dynamics, evaluating 12 LLMs across multiple model families and parameter sizes. The answer is a clear no, and in systematic ways. LLMs fail to capture initial human belief distributions and tend to be overall more conformist than humans, shifting their responses to align with those around them. They also take a nuanced approach to emulating human homophilic tendencies within networks. Our findings carry a double payoff: they highlight fundamental properties of LLM behavior, and they raise a sharp warning against deploying LLMs as human proxies in social simulations.

Editor's pick · Technology
Guardian · Today

AI engineer says Google unfairly sacked him after he protested against work for Israel

Exclusive: Employment tribunal claim says worker lost his job after distributing leaflets throughout London office. Google is facing a legal challenge from an AI engineer who claims he was unfairly dismissed after he protested against its work for the Israeli government, in the latest sign of growing concern about the social and ethical impacts of AI. The engineer distributed flyers around Google DeepMind’s London offices, which read “Google provides military AI to forces committing genocide” and asked colleagues: “Is your paycheck worth this?” He also emailed colleagues about Google’s 2025 decision to drop a promise not to pursue weapons that harm people and surveillance violating international norms and urged them to unionise.

Editor's pick · Technology
Artificial Intelligence Newsletter | May 20, 2026 · Yesterday

Major online platforms dodge child-safety norms due to weak enforcement, study says

A study by the 5Rights Foundation and LSE found that child safety on major platforms like Meta and TikTok has not significantly improved despite new EU and UK regulations.

Editor's pick
Arxiv · Today

Bridging the Disciplinary Gap in Explainable AI: From Abstract Desiderata to Concrete Tasks

arXiv:2605.20081v1 Announce Type: new Abstract: Explainable AI (XAI) is often criticized for failing to satisfy broad desiderata (e.g., fairness, accountability) and for limited practical value to stakeholders. This challenge partly arises because researchers across disciplines prioritize different sets of desiderata that remain underspecified and context-dependent, yet expect XAI to satisfy them simultaneously, resulting in fragmented and sometimes incompatible operationalizations. We argue that many desiderata are not independent, but instead form dependency structures in which higher-level goals (e.g., trust, accountability) rely on more foundational properties (e.g., faithfulness, robustness). Some desiderata are multi-faceted and are best understood within these structures. In particular, instead of addressing all desiderata at once, we focus on subsets of dependency structures and translate them into concrete XAI tasks, thereby decomposing research questions into benchmarkable and solvable units. To this end, we propose a three-axis taxonomy (target, functional role, and mode of justification) and a three-step framework for deriving well-scoped, benchmarkable XAI tasks. Our approach builds on a systematic literature review and conceptual analysis, and supports clarifying desiderata, identifying dependencies, scoping feasibility, and delimiting the design space to derive concrete XAI tasks from abstract desiderata. We illustrate its utility through two explanatory cases, showing how the taxonomy and framework guide systematic task design and evaluation in XAI. This is a preprint of a paper that will appear in AISoLA 2026.

Editor's pick · Technology
Guardian · Today

‘I don’t worry about a robot takeover’: AI expert Michael Wooldridge on big tech’s real dangers (and occasional blessings)

Almost 50 years after he first got his hands on a computer, the Oxford professor still believes in the power of technology. Can his beloved game theory explain why Silicon Valley’s entrepreneurs consistently misuse it? Michael Wooldridge is like the teacher you wish you’d had: approachable, able to explain difficult things in simple terms, neither dauntingly highbrow nor off-puttingly cool, and genuinely enthusiastic about what he does. “I love it when you see the light go on in somebody, when they understand something that they didn’t understand before,” he says. “I find that incredibly gratifying.” He comes across as a regular sort of guy, which, as an Oxford professor with more than 500 scientific articles and 10 books to his name, he clearly isn’t. Typically, his favourite work is his contribution to Ladybird’s Expert Books – an update of the classic children’s series – on artificial intelligence. “I’m very proud of this,” he says, as he hands me a copy from his bookshelf. We’re in his study in the University of Oxford’s somewhat municipal computing department on a sunny spring day. Maybe it’s the campus setting, but our discussion almost takes the form of a seminar.

Technology & Infrastructure

38 articles
AI Agents & Automation · 8 articles
Editor's pick · Technology
Arxiv · Today

AgentNLQ: A General-Purpose Agent for Natural Language to SQL

arXiv:2605.19010v1 Announce Type: new Abstract: Natural language to SQL (NL2SQL) conversion is an important problem for researchers and enterprises due to the ubiquitous importance of relational databases in broad-ranging practical problems. Despite the rapid advancements in the capabilities of LLMs, NL2SQL has not reached parity in accuracy with human expert SQL writers, hence needing additional improvements in NL2SQL algorithms. This study presents a new multi-agent method for NL2SQL that achieves 78.1% semantic accuracy on the BIg Bench for LaRge-scale Database (BIRD) benchmark. Our method leverages a semantically enriched representation of user-provided schema, adds user-provided business rules, and produces accurate SQL queries. The main contributions of this study are (a) We designed an optimized new orchestrator in a multi-agent solution that uses LLMs to plan, orchestrate, reflect, and self-correct to generate accurate SQL queries, (b) We developed an advanced schema enrichment method that creates context-aware metadata to improve accuracy, and (c) We demonstrated the accuracy and generalizability of the method across different domains and datasets by evaluating it on the BIRD-SQL benchmark.

Editor's pick · Professional Services
⚖️ No one wins · Yesterday

Start thinking agentically

The hard part of agentic AI isn't the technology. It's knowing how to decide where agents should lead, where humans should stay in the loop and how to stop optimizing work at the margins and start redesigning it.

Editor's pick · Technology
HPCwire · Yesterday

AIwire - Covering Scientific & Technical AI

“Almost right” is not good enough once AI starts making decisions inside a business. That warning came from SAP CEO Christian Klein at the Sapphire 2026 event in Orlando, FL. […]

Editor's pick · Technology
Daily AI News May 19, 2026: 73% Success on Cyber Tests Redefines AI Security · Yesterday

How Claude Code Works in Large Codebases: Best Practices and Where to Start

Claude Code’s guidance explains how teams can use AI coding tools effectively in large codebases through configuration files, hooks, and subagents.

Editor's pick
Arxiv · Today

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

arXiv:2605.19035v1 Announce Type: new Abstract: The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from isolated operation to collaborative ecosystems, we witness the emergence of the Agent-to-Agent (A2A) network, a paradigm where heterogeneous agents autonomously coordinate to solve multi-step tasks. While these networks may offer better task performance compared to simply using one agent to complete the entire task, they introduce systemic vulnerabilities, such as adversarial composition, semantic misalignment, and cascading operational failures, that existing agent alignment techniques cannot address. In this vision paper, we argue that the trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.

Editor's pick
Arxiv · Today

Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)

arXiv:2605.19186v1 Announce Type: new Abstract: Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, and invoke web services coherently. The response was OWL-S and WSMO: formally grounded capability descriptions specifying what a service could do, what the agent must already know for invocation to be epistemically sound, and how ontological mismatches could be formally bridged. Current Knowledge Graph (KG) metadata standards such as VoID and DCAT describe what a KG contains yet say nothing about what a specific agent can prove from it, what closure assumptions govern empty results, or whether the agent's task vocabulary is grounded in the schema. Furthermore, in deployed KGs the governing schema DL and the operative entailment regime can diverge: an epistemic failure mode invisible to current metadata. We revisit and extend these insights for the KG setting with a four-dimensional formal framework from which we derive the Agentic Affordance Profile (AAP): a semantic layer above VoID and DCAT enabling principled KG selection, composition, and failure diagnosis at agent planning time. A five-point research agenda identifies the formal, computational, and engineering work needed to realise AAP-based affordance matching at scale.

Editor's pickTechnology
MarkTechPost· Yesterday

Google Launches Antigravity 2.0 at I/O 2026: A Standalone Agent-First Platform with CLI, SDK, Managed Execution, and Enterprise Support

AI Infrastructure & Compute6 articles
Editor's pickTechnology
VentureBeat· Today

AWS nabs white hot gen AI media creation startup fal, becoming its preferred cloud provider

Generative AI’s rapid transition from text-based chatbots to high-fidelity media (images, video, spatial 3D, and audio) has exposed a glaring bottleneck in the modern tech stack: infrastructure. Rendering pixels in real time requires a staggering amount of compute, and developers are increasingly struggling to manage fragmented GPU clusters just to keep their applications online.
Enter fal, a generative media creation platform that has quietly become the connective tissue for 2.5 million developers across the globe, offering literally hundreds of leading AI image, video, and audio creation and editing models, from proprietary ones like OpenAI's ChatGPT-Images-2.0 and Google's Nano Banana Pro 2 to open source rivals, all through its unified interface and APIs.
Today, the San Francisco-based startup, recently valued at a massive $4.5 billion following a $300 million Series D round led by Sequoia Capital, announced it has selected Amazon Web Services (AWS) as its preferred cloud provider. While the financial terms of the deal weren't made public, the move signals a maturation in the generative media space, shifting the focus from simply building foundational models to effectively scaling them for mass, commercial consumption.
“AWS has been there for distribution and monetization, and for the use of AI in creative pursuits, helping designers, developers, and the creative community think through how they can use AI responsibly, scalably, and at global scale,” said Samira Panah Bakhtiar, General Manager for Media, Entertainment, Games, and Sports at AWS, in an exclusive interview with VentureBeat.
A one-stop-shop for Gen AI media allowing enterprises to plug in and choose the best model for their needs
At its core, fal operates as a unified gateway to the rapidly expanding generative AI ecosystem. Rather than forcing developers to provision their own servers, deal with latency issues, or string together disparate open-source model weights, fal provides a single, unified API. Through this API, users gain instant access to over 1,000 production-ready AI models. Think of it as the Stripe or Plaid of generative media: abstracting away the devastatingly complex back-end plumbing so developers can focus solely on the user experience. It is a "plug-and-play" solution that has already attracted independent creators and enterprise giants alike, powering generative workflows for enterprises including Canva, Adobe, and Amazon MGM Studios.
“Generative media workloads demand a fundamentally different infrastructure layer, one that can handle massive parallel inference, rapid model iteration, and production-grade reliability at scale,” said Gorkem Yurtseven, CTO and Co-founder of fal, in a statement provided to VentureBeat.
Neither AWS nor fal specified what other cloud or GPU providers the latter was using prior to their deal together. Asked who fal had been using before AWS, Bakhtiar did not name a prior cloud or GPU provider, saying instead that fal is now using AWS services. In a blog post, fal's Head of Compute Partnerships Emir Lise described AWS as providing the “global scale and reliability layer” for its existing serverless generative-media infrastructure, framing the partnership around elasticity, reliability and enterprise scale rather than a replacement of a named incumbent. A public search turned up Tigris as a storage provider for fal (with Tigris saying fal runs a “global fleet of GPUs across many clouds”) and an announcement from fal in September 2025 that it was available through Google Cloud Marketplace, allowing customers to buy fal through Google Cloud billing and governance, but that listing does not state that Google Cloud powered fal’s GPU infrastructure.
99.99% guaranteed uptime?
By partnering with AWS, fal aims to merge its highly optimized inference engine with Amazon’s global reach to handle millions of daily API calls with 99.99% guaranteed uptime. In addition, Bakhtiar said fal users can expect to see "faster inference and performance, greater efficiency, more scalability, and more seamless service continuity, all things you would expect as a result of partnering with the world’s largest, broadly adopted cloud." Therefore, the primary benefit for fal users is better performance and reliability without changing how they work: faster inference, more scalability, smoother continuity, and access to production-ready AI models without managing their own infrastructure.
For fal, the partnership makes its platform stronger for creators, studios, and enterprise customers by backing it with AWS’s security, global scale, and cloud infrastructure. For AWS, it helps push cloud and AI deeper into creative production, not just distribution or monetization. It positions AWS as a key infrastructure partner for studios, media companies, developers, and individual creators building AI-powered content workflows.
Offloading the GPU burden
The partnership with AWS is designed to address the sheer physics and cost of rendering generative media. By migrating its operations to AWS, fal will be able to leverage Amazon’s broad suite of AI services, including the Bedrock platform, alongside custom-built silicon like Trainium and Graviton processors. "You don't have to manage like a GPU fleet to use the AI for creative pursuits," Bakhtiar explained. This is a critical pain point for larger-scale media generation demands in 2026. Securing high-performance GPUs for parallel inference is both expensive and technically demanding. By shifting that burden to AWS, fal ensures that creatives can focus on their workflows, without needing a dedicated DevOps team.
Bakhtiar also noted the powerful "network effect" of building on AWS. Because major studios and creative platforms (like Adobe and Canva) are already deeply entrenched in the AWS ecosystem, integrating fal's API into their existing pipelines becomes a frictionless endeavor.
Enterprise-grade security and compliance with gen AI creative speed
For IT leaders and developers, fal's architecture offers a distinct advantage regarding licensing, security, and deployment. Historically, utilizing frontier generative models meant either accepting strict vendor lock-in from a single provider or attempting to host open-source models locally. The latter requires significant overhead and forces enterprises to navigate a minefield of disparate open-source licenses (such as MIT, Apache 2.0, or restrictive non-commercial licenses). fal bypasses this friction by offering commercial API access to a curated ecosystem of models. Developers simply pay for the inference they consume.
Furthermore, the platform is SOC 2 compliant and explicitly built for "enterprise scale," meaning it meets the stringent data privacy and security benchmarks required by heavily regulated industries and massive consumer platforms. For large media conglomerates, this managed service approach allows them to experiment with the latest state-of-the-art tools securely, without the risk of exposing proprietary data or intellectual property.
Empowering devs and vibe coders
The true impact of fal’s platform, however, is best observed at the developer level. By democratizing access to high-end infrastructure, fal is enabling a new class of builders, often referred to as "vibe coders," to create complex, multimodal applications without traditional computer science backgrounds. As Bakhtiar pointed out, access to these tools fundamentally "levels the playing field." Whether it is an individual developer or hobbyist vibe coding a side project, or a fully-funded editor or director rendering a blockbuster film, the underlying technology is now identical, infinitely scalable, and ready for production.
“More creatives, whether they’re full-fledged studios, indie brands, or individual content creators, are now going to be able to access these tools, and they’re going to be able to punch way above their weight as a result,” Bakhtiar said, casting the partnership as a way to serve even more users through fal thanks to the reliability of AWS's servers and custom Trainium, Graviton and Inferentia chips. The rollout of enhanced AWS capabilities for fal customers will occur in phases throughout 2026.
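
For readers wondering what the "99.99% guaranteed uptime" figure quoted in the article would allow in practice, the sketch below turns an availability percentage into a downtime budget. It is illustrative arithmetic only: the article does not say how uptime is measured or over what window, and the function name `allowed_downtime_minutes` is a hypothetical helper, not part of any fal or AWS API.

```python
# Illustrative arithmetic only: the article quotes a 99.99% uptime figure but
# does not state the measurement window or any SLA terms.

MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability_pct: float, days: float) -> float:
    """Return the downtime budget, in minutes, implied by an availability target.

    availability_pct: target availability as a percentage (e.g. 99.99).
    days: length of the measurement window in days (an assumption here).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * days * MINUTES_PER_DAY


if __name__ == "__main__":
    # Two assumed windows: a 30-day month and a 365-day year.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        budget = allowed_downtime_minutes(99.99, days)
        print(f"{label}: about {budget:.2f} minutes of downtime")
```

Under those assumed windows, a 99.99% target leaves roughly 4.3 minutes of downtime per 30-day month and about 52.6 minutes per year.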

Editor's pickTechnology
Top Daily Headlines: America's top cyber-defense agency left a GitHub repo open with passwords, keys, tokens – and incredibly obvious filenames· Today

Baidu says the quiet part out loud: you can't build AI infrastructure, so clouds can cash in

Baidu's CFO noted that GPU rentals are structurally higher margin than traditional CPU cloud services.

Editor's pickTechnology
⚙️ Google AI glasses could mainstream the category· Today

OpenAI targets enterprise with guaranteed compute

OpenAI's new Guaranteed Capacity initiative highlights how compute is becoming a critical battleground for enterprises requiring reliable AI access.

Editor's pickPAYWALLTechnology
Bloomberg· Today

French Companies Bid for €10 Billion Europe AI Gigafactory Site

A consortium of European companies will bid on a €10 billion ($11.6 billion) project to build a major data center campus in France as part of a European Union effort to boost artificial intelligence infrastructure on the continent.

Editor's pickTechnology
Arxiv· Today

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

arXiv:2605.19008v1 Announce Type: new Abstract: Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
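
The headline percentages in the abstract can be re-derived from the raw figures it reports. The sketch below does only that arithmetic; the helper names `relative_reduction_pct` and `speedup` are illustrative and are not taken from the paper.

```python
# Re-derives the summary statistics quoted in the abstract from its raw numbers.
# Helper names are hypothetical; only the input figures come from the abstract.


def relative_reduction_pct(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100.0


def speedup(baseline_seconds: float, improved_seconds: float) -> float:
    """Wall-clock speedup factor (baseline time divided by improved time)."""
    return baseline_seconds / improved_seconds


if __name__ == "__main__":
    # 7B reference setting: final perplexity 13.21 -> 10.74.
    print(f"perplexity reduction: {relative_reduction_pct(13.21, 10.74):.1f}%")
    # End-to-end time 392.54s -> 357.02s.
    print(f"speedup: {speedup(392.54, 357.02):.2f}x")
```

Both results match the abstract: an 18.7% perplexity reduction and a 1.10x speedup.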

Editor's pickTechnology
Substack· Yesterday

OpenAI and Dell Bring Enterprise AI Securely On-Premise

Your daily PM briefing | May 19, 2026 | PM Interview Prep Club

AI Models & Capabilities13 articles
Editor's pickTechnology
Anthropic Claude Sandboxes 🔒, Microsoft 3D Image Model 🖼️, Hermes Agen· Yesterday

Microsoft drops open-source 4B model that converts any image to 3D in 3 seconds

Microsoft released an open-source 4B parameter model capable of transforming any image into a 3D representation in just three seconds.

Editor's pickPAYWALLTechnology
FT· Yesterday

Google to release smart glasses and add AI ‘agents’ to search engine

CEO Sundar Pichai says features powered by new Gemini model will close gap with Anthropic and OpenAI

Editor's pick
Arxiv· Today

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

arXiv:2605.18890v1 Announce Type: cross Abstract: The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber simulation. Across multiple models, minor perturbations in persona format and game-instruction framing shift cooperation rates by up to 76 percentage points, while network homophily and hub assignment produce significant and consistent shifts in polarization metrics. We also find that sensitivity is unevenly distributed across both architectural choices and model families: the same perturbation that produces the 76 pp shift in one frontier model only shifts another by 1 pp. Robustness is therefore a property that should be measured per claim and per model, not assumed. To address this validation gap, we introduce TRAILS (Taxonomy for Robustness Audits In LLM Simulations), a robustness-audit taxonomy spanning three levels of simulation design: agent (micro-level), interaction (meso-level), and system (macro-level). We call for robustness to become a first-order validation requirement before LLM social simulations are used to explain mechanisms, evaluate interventions, or inform decisions.

Editor's pick
Arxiv· Today

Interference-Aware Multi-Task Unlearning

arXiv:2605.19042v1 Announce Type: new Abstract: Machine unlearning aims to remove the contribution of designated training data from a trained model while preserving performance on the remaining data. Existing work mainly focuses on single-task settings, whereas modern models often operate in multi-task setups with shared backbones, where removing supervision for one task or instance can unintentionally affect others. We introduce multi-task unlearning with two settings: full-task unlearning, which removes a target instance from all tasks, and partial-task unlearning, which removes supervision only from selected tasks. We show that shared parameters couple the forget and retain sets, causing task-level interference on non-target tasks and instance-level interference on other instances. To address this issue, we propose an interference-aware framework that combines task-aware gradient projection, which constrains updates within task-specific subspaces, with instance-level gradient orthogonalization, which reduces conflicts between forget and retain signals. Experiments on two multi-task computer vision benchmarks across five tasks show that our method achieves effective unlearning while maintaining strong generalization, reducing UIS compared with the strongest baseline by 30.3% in full-task unlearning and 52.9% in partial-task unlearning.
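
The abstract does not give the exact update rule, so the following is only a generic illustration of what "gradient orthogonalization" usually means: removing from the forget-set gradient its component along the retain-set gradient, so that a step on the former does not, to first order, move the model along the latter. The function `orthogonalize` and the toy vectors are assumptions for illustration, not the authors' method.

```python
# Generic vector-projection sketch; NOT the paper's algorithm, whose details
# are not stated in the abstract. Gradients are plain lists of floats.
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(
    forget_grad: Sequence[float],
    retain_grad: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Return forget_grad with its component along retain_grad removed."""
    denom = dot(retain_grad, retain_grad)
    if denom < eps:
        # Retain gradient is (near) zero: nothing to project out.
        return list(forget_grad)
    scale = dot(forget_grad, retain_grad) / denom
    return [f - scale * r for f, r in zip(forget_grad, retain_grad)]


if __name__ == "__main__":
    forget = [1.0, 1.0]   # toy forget-set gradient
    retain = [1.0, 0.0]   # toy retain-set gradient
    projected = orthogonalize(forget, retain)
    print(projected)               # component along `retain` is removed
    print(dot(projected, retain))  # inner product with `retain` is zero
```

This version always projects; some related schemes project only when the two gradients conflict (negative inner product), and which variant the paper uses is not stated in the abstract.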

Editor's pick
Arxiv· Today

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

arXiv:2605.18801v1 Announce Type: new Abstract: Data is fundamental to large language models (LLMs). However, understanding of what makes certain data useful for different stages of an LLM workflow, including training, tuning, alignment, in-context learning, etc., and why, remains an open question. Current approaches rely heavily on extensive experimentation with large public datasets to obtain empirical heuristics for data filtering and dataset construction. These approaches are compute intensive and lack a principled way of understanding the essence of how specific data characteristics drive LLM behavior. In this position paper, we advocate for the need of developing systematic methodologies for generating synthetic sequences from appropriately defined random processes, with the goal that these sequences can reveal useful characteristics when they are used in one or multiple stages of the LLM workflow. We refer to such sequences as data probes. By observing LLM behavior on data probes, researchers can systematically conduct studies on how data characteristics influence model performance, generalization, and robustness. The probing sequences exhibit statistical properties that can be viewed using theoretical concepts, such as typical sets, which are generalized to describe the behaviors of LLMs. This data-probe approach provides a pathway for uncovering foundational insights into the role of data in LLM training and inference, beyond empirical heuristics.

Editor's pickMedia & Entertainment
Arxiv· Today

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

arXiv:2605.19285v1 Announce Type: cross Abstract: The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

Editor's pickTechnology
VentureBeat· Yesterday

Google unveils Gemini Omni 'any-to-any' AI model: what enterprises should know

Although it was already discovered by intrepid AI power users weeks ago, Google's new Gemini Omni model officially debuted today at the company's annual I/O developer conference in Mountain View, California, and it marks a significant new paradigm in the wider AI and tech marketplace. That's because, as its "omni" prefix (from the Latin omne, meaning "all") would suggest, this is Google's first truly natively multimodal model, that is, "a model that can create anything from any input, starting with video." The model marks Google's bid to collapse the multimodal generative stack (text-to-image, image-to-video, video-to-video, audio generation) into a single foundation model with a single editing surface.
The big question for business leaders is: should you switch any of your own AI stack over to Gemini Omni now? Unfortunately, the truth is, you may not be able to just yet: the model is only available to individual users through Google's AI subscription plans starting with the $20 per user per month "AI Plus" plan. It can currently be accessed on the Gemini website and mobile apps, Google's web-based Flow AI image and video editing suite, and YouTube Shorts. While the company says it is ultimately going to be available via an application programming interface (API), which many enterprises rely on for their AI needs, it's not ready yet.
In a departure, Google also has not yet issued any public benchmarks for Gemini Omni. However, third-party organizations will no doubt put it to the test on various tasks and user-reported quality metrics. In the meantime, though, its quality and speed remain somewhat subjective. But, given the capabilities and faster editing enabled by the new Omni model, individual members of your team should probably give serious consideration to switching over to it, especially if they work creating visuals for technical diagrams, marketing and comms materials, training and corporate education courses, sales collateral, and basically anything that involves visuals.
What Omni actually is
Omni is the next chapter of the work that produced Nano Banana, the image-generation and editing model Google shipped roughly a year ago. The first model in the family, Gemini Omni Flash, accepts any combination of text, images, audio, and video as input and produces high-quality output across the same modalities, all from a single model rather than a relay of specialized systems. Google says the model is "natively multimodal from the ground up," which matters less as marketing copy than as an architectural claim: a unified model can reason across modalities in the same forward pass, which generally translates into more coherent edits, fewer pipeline artifacts, and a far cleaner API surface for developers.
OpenAI started this trend back in May 2024 with the release of GPT-4o, its first natively "omni" model, also trained from the ground up to be able to analyze and generate multiple different types of content, from text to code, imagery, and audio. However, it did not support video generation, and the model was eventually deprecated following reports of sycophancy, even as some users who had developed parasocial relationships with it demanded that OpenAI retain it. Is Gemini Omni at risk of sparking a similarly devoted following? It remains to be seen.
One big difference is that its headline interaction pattern is conversational video editing. Each instruction "builds on the last," and past directions persist across turns so the video evolves coherently as the user iterates. Practical examples Google highlighted include changing the world inside a clip, reimagining an action or camera angle, refining sequences over multiple turns, and generating explainer-style content from short prompts. Google also emphasizes improved physics (gravity, kinetic energy, fluid dynamics), which is the kind of detail that separates "looks like AI video" from "looks like footage."
Rollout, pricing, and the API question
The first thing enterprise leaders should read carefully is the rollout plan. Omni Flash is going live today inside the Gemini app for U.S. subscribers across AI Plus, AI Pro, and AI Ultra tiers, including the new $100-per-month AI Ultra plan Google announced at the same event. Google says it will roll out to developers via Vertex AI APIs "in the coming weeks."
That gap is significant. Until the Vertex API is generally available, Omni is effectively a consumer and prosumer tool. Enterprise pilots beyond individual seat-based experimentation should wait for the API, both because that's where Google's enterprise SLAs and data-handling commitments live, and because production-grade generative video without a programmatic interface is a non-starter. Its API pricing (presumably per million tokens) will also determine its viability as an enterprise product outside film, TV, entertainment, and arts productions.
For decision-makers weighing seat economics in the meantime, the new AI Ultra tier is positioned specifically at developers, technical leads, knowledge workers, and advanced creators, with priority access to Google Antigravity, higher usage limits, and bundled Omni Flash access. For small creative teams under tight deadlines, that may be the fastest way to evaluate the model before the API arrives.
The enterprise use cases that really matter
It is easy to default to "marketing video" as the use case, but Omni's value proposition for enterprises is broader if you think of it as a programmable video and media engine rather than a creative app:
Sales and marketing: rapid generation of variant ads, localized creative, and product demos without per-asset agency cycles.
Internal communications, learning and development (L&D): explainer videos, onboarding modules, and policy walkthroughs produced by non-specialists.
Customer support and documentation: dynamic, query-conditioned visual explainers attached to help articles.
Product and engineering: visualization of simulations, UI walkthroughs, and concept videos for spec reviews.
Field operations: short, situation-specific instructional clips generated on demand.
What changes with Omni versus the previous generation of tools is the unification. Many enterprises stitched a workflow together from text-to-image, image-to-video, lip-sync, and voice models, each with its own contract, billing, and data path. A single Vertex AI-backed model collapses procurement and observability into one place, assuming the eventual API delivers production-grade throughput and latency.
The governance story is the most underrated part
For CIOs and CISOs, the most important section of Google's announcement is not the model card; it is the provenance and content-safety work shipping alongside it. Every video generated by Omni carries Google's SynthID digital watermark. Google is expanding C2PA Content Credentials across its generative tools, and launching an AI Content Detection API on Agent Platform that lets businesses identify AI-generated content from both Google and other popular models. Partner integrations announced at the same event, including Shutterstock, Avid (in Pro Tools), and at least one major newswire, indicate where the standard is going.
For enterprises, this matters in three concrete ways: It gives legal and compliance teams a defensible audit trail for AI-generated media. It allows brand-safety teams to detect AI-generated material entering content pipelines from third parties. And it provides a defensible answer for regulators in jurisdictions, like the EU, that are tightening rules around synthetic-media disclosure.
There is also a "Personal Avatars" program that lets creators record short videos to authorize use of their voice and likeness across generated content, as Google leaders and employees showcased today in I/O posts featuring their AI-generated likenesses. This puts it in direct competition with Synthesia, a UK-based AI unicorn focused primarily on enterprise-safe AI videos and avatars. For enterprises considering executive videos, training avatars, or branded spokesperson content, the consent model here is the right starting point, but contracts and rights-management policies will need to extend to cover it.
Risks worth flagging
Omni's main risks are familiar but worth restating. The competitive landscape is crowded: the aforementioned Synthesia, TikTok parent company ByteDance's acclaimed Seedance model, Kuaishou Technology's Kling AI models, and the fast-improving open-source field all compete for the same workflows. Lock-in to any single video model is a real concern when output quality is still leapfrogging quarterly. Latency and cost for production-volume video generation remain unproven outside controlled demos. In addition, the legal status of training data for generative video is unsettled in multiple jurisdictions; enterprises should require clear indemnification language before deploying generated video into customer-facing channels.
Furthermore, VentureBeat collaborator and AI YouTuber Sam Witteveen, CEO of enterprise machine learning vendor Red Dragon AI, received early access to Gemini Omni and reported the content restrictions (which some deem to be censorship) to be quite strict, potentially inhibiting the use cases an enterprise would like to pursue.
Thoughts for enterprises considering adoption
Omni is worth piloting, but the structure of the pilot matters. For most enterprises, the right move over the next 30 to 60 days is to fund a small, sanctioned experiment with one or two AI Ultra seats in marketing or L&D, while the platform and security teams use that runway to prepare for the Vertex AI API: define data-residency requirements, set up SynthID and C2PA verification in the content pipeline, and stand up the AI Content Detection API alongside existing media-governance tooling. Treat the consumer rollout as a UX preview, not a production plan. When the API arrives, the enterprises that have already done the governance work will be the ones moving Omni into real workflows while everyone else is still drafting policy.
Omni is not, by itself, a reason to overhaul an enterprise AI strategy. But it is a strong signal that the multimodal generative stack is consolidating into single models with first-party provenance baked in, and that is a shift technical decision-makers should be planning around now.

Editor's pickManufacturing & Industrials
Arxiv· Today

KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition

arXiv:2605.19031v1 Announce Type: new Abstract: Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance on noisy and imperfect real-world datasets. In contrast, conventional multi-layer perceptrons (MLPs) are far more tolerant to noise and computationally efficient. Replacing all MLP components with KANs in HAR models often degrades accuracy and computation efficiency, highlighting an open challenge: how to combine KANs' precision with MLPs' noise robustness and efficiency. To address this, we systematically explore various placements of KAN modules within deep HAR networks and propose a hybrid architecture that strategically synergizes the strengths of both paradigms, which uses a KAN-based input embedding layer, retains MLP layers for intermediate feature mixing, and introduces a specialized LarctanKAN module for final activity classification. Across eight public HAR datasets, the hybrid KAN-MLP model achieves an average macro F1 score relative improvement of 5.33% compared with the pure-MLP model, significantly outperforming standalone KAN and MLP baselines. Furthermore, integrating this hybrid strategy into other state-of-the-art HAR architectures consistently boosts their performance. Our findings demonstrate that a carefully orchestrated combination of KAN, MLP, or other conventional neural components yields more robust and accurate HAR models for real-world wearable sensing environments.

Editor's pick
Arxiv· Today

GRASP: Deterministic argument ranking in interaction graphs

arXiv:2605.19141v1 Announce Type: cross Abstract: Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging - a common LLM-as-a-Judge practice where a model provides a global verdict on a debate - suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency - a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.

Editor's pickTechnology
Arxiv· Today

The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems

arXiv:2605.19638v1 Announce Type: cross Abstract: As large language models (LLMs) demonstrate increasing competence in synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds this theory in a real-world systems artifact. We model accessibility not as a binary compliance property but as a dynamic, multidimensional capability space constrained by measurable variables including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated, browser-native systems constructed as single-file HTML artifacts leveraging standard browser APIs may dramatically shift the ACB outward by reducing deployment friction to near-zero and enabling rapid, context-specific interface adaptation. We ground our theoretical framework in the analysis of two real-world exploratory prototypes. The first is an AI-generated browser-native accessibility interface deployed for a blind user in Nepal. The second is a fully functional, open-source webcam alignment assistant for visually impaired users, serving as a concrete systems artifact. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify remaining computational, infrastructural, and verification constraints that constitute the hard boundaries of this paradigm. This work contributes a theoretical foundation for understanding the scalable limits of autonomous accessibility computing and proposes a research agenda for future work in accessibility-aware AI systems.

Editor's pickTechnology
Substack· Today

Google I/O 2026 Was Not Just a Model Launch. It Was Google Showing the Agent Stack.

Google’s own language around I/O 2026 is “the agentic Gemini era.” Sundar Pichai framed the keynote around Gemini products, conversational AI, infrastructure, models, and agents, not around a single isolated model release.

Editor's pickTechnology
Daily AI News May 19, 2026: 73% Success on Cyber Tests Redefines AI Security· Yesterday

Introducing Composer 2.5

Composer 2.5 is an update to Cursor’s AI coding model with improvements in long-running tasks, instruction following, and training methods.

Editor's pickTechnology
Daily AI News May 19, 2026: 73% Success on Cyber Tests Redefines AI Security· Yesterday

Starchild-1: The First Real-Time Multimodal World Model

Starchild-1 introduces a real-time multimodal world model that generates synchronized audio and video while responding to streaming user input.

AI Research & Science2 articles
Editor's pickProfessional Services
Arxiv· Today

RobustiPy: An efficient next generation multiversal library with model selection, averaging, resampling, and explainable artificial intelligence

arXiv:2506.19958v4 Announce Type: replace-cross Abstract: Scientific inference is often undermined by the vast but rarely explored "multiverse" of defensible modelling choices, which can generate results as variable as the phenomena under study. We introduce RobustiPy, an open-source Python library that systematizes multiverse analysis and model-uncertainty quantification at scale. RobustiPy unifies bootstrap-based inference, combinatorial specification search, model selection and averaging, joint-inference routines, and explainable AI methods within a modular, reproducible framework. Beyond exhaustive specification curves, it supports rigorous out-of-sample validation and quantifies the marginal contribution of each covariate. We demonstrate its utility across five simulation designs and ten empirical case studies spanning economics, sociology, psychology, and medicine, including a re-analysis of widely cited findings with documented discrepancies. Benchmarking on ~672 million simulated regressions shows that RobustiPy delivers state-of-the-art computational efficiency while expanding transparency in empirical research. By standardizing and accelerating robustness analysis, RobustiPy transforms how researchers interrogate sensitivity across the analytical multiverse, offering a practical foundation for more reproducible and interpretable computational science.

AI Security & Cybersecurity6 articles
Editor's pickPAYWALLFinancial Services
FT· Yesterday

Morgan Stanley issues China-only iPhones to its Hong Kong bankers

US bank’s move reflects rising concern over data security for staff travelling to mainland China

Editor's pickEducation
Arxiv· Today

Locked Out at 8,000 Miles: Why UK-China Partnership Students Are Suffering

arXiv:2605.19367v1 Announce Type: cross Abstract: University cybersecurity protocols have intensified dramatically in response to rising threats of data breaches, ransomware, and credential theft. While necessary, these measures have created a parallel crisis of accessibility - even for students physically on campus. This paper argues that domestic, on-campus students already face significant barriers: mandatory multi-factor authentication (MFA), device compliance rules, browser and operating system restrictions, and administrative remote-management permissions on personal phones and laptops. However, these difficulties are magnified to near-breaking point in the context of international partnerships, such as the increasingly common UK-China transnational education programmes. For a student in China accessing a UK university's virtual learning environment (VLE) from an 8-hour time difference, with no on-hand IT support during their active hours, the same security architecture becomes functionally disabling. Drawing on testimonies from public forums (Reddit's r/college, r/UniUK, r/Professors), higher education IT help boards, and student accounts from UK-China partnership programmes, this paper documents how over-engineering digital security disproportionately harms remote international learners. We show that while on-campus students can at least visit an IT desk or borrow a library terminal, their counterparts in partner institutions abroad face authentication failures, device lockouts, and unsupported browsers with no real-time remedy. The paper concludes that current university security models assume a co-located, 9-to-5, English-time-zone user - an assumption that fails both domestic students and, catastrophically, international partnership cohorts.

Editor's pickTechnology
MIT Technology Review· Yesterday

Understanding the modern cybercrime landscape

Throughout 2025, HPE observed significant changes in how cybercriminals operate. Analyzing real-world threats, our HPE Threat Labs highlighted an industrialization of cybercriminals’ methods in its new In the Wild Report, enabling greater scale, speed and structure in their campaigns. They typically use automation and AI to exploit longstanding vulnerabilities, and many have adopted…

Adoption, Deployment & Impact

30 articles
AI Adoption Barriers & Enablers8 articles
Editor's pickProfessional Services
Arxiv· Today

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

arXiv:2605.18818v1 Announce Type: new Abstract: Academic research tends to focus on new models for document understanding, creating a wide gap in the literature between model definition and running models at production scale. To close that gap, we present a microservice architecture that encapsulates pipelines of multiple models for classification, optical character recognition (OCR), and large language model structured field extraction, as well as our experience running this pipeline on thousands of multi-page documents per hour. We describe our primary design decisions, including a hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, use of asynchronous processing for the many IO-bound operations in the pipeline, and an independent, horizontal scaling strategy. Using batch profiling, we identified two surprising qualitative findings that shape production deployments: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates at a concurrency determined by shared GPU-inference capacity rather than worker count. Our goal is to provide practitioners with concrete architectural patterns for building document understanding systems that work beyond the benchmark, effectively operationalizing models in production.
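
The saturation finding can be illustrated with a small sketch. It is not the paper's architecture: it assumes a single process using Python's asyncio, with a semaphore standing in for shared GPU-inference capacity and sleeps standing in for model calls. The class `GpuPool`, the helper names, and the timings are all hypothetical.

```python
import asyncio


class GpuPool:
    """Hypothetical stand-in for a shared GPU inference service."""

    def __init__(self, capacity):
        self._slots = asyncio.Semaphore(capacity)  # shared GPU capacity
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous inference calls seen

    async def infer(self, payload, seconds):
        async with self._slots:  # callers queue here once the GPU is full
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                await asyncio.sleep(seconds)  # placeholder for a real model call
                return f"out({payload})"
            finally:
                self.in_flight -= 1


async def process_document(pages, gpu):
    # OCR every page concurrently, then run one structured-extraction call.
    texts = await asyncio.gather(*(gpu.infer(p, 0.02) for p in pages))
    return await gpu.infer(" ".join(texts), 0.01)


async def run_pipeline(documents, gpu_capacity):
    gpu = GpuPool(gpu_capacity)  # created inside the running event loop
    results = await asyncio.gather(*(process_document(d, gpu) for d in documents))
    return results, gpu.peak


results, peak = asyncio.run(run_pipeline([["a", "b", "c"], ["d", "e"]], gpu_capacity=2))
```

However many documents are submitted, `peak` never exceeds `gpu_capacity`, which mirrors the abstract's point that throughput is bounded by shared inference capacity rather than by the number of orchestration workers.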

Editor's pickProfessional Services
Arxiv· Today

Towards Zero Trust Architecture: A Pilot Study on Information Systems Security Readiness amongst Small and Medium Enterprises

arXiv:2605.18901v1 Announce Type: cross Abstract: Small and medium enterprises (SMEs) face growing cyber threats but often lack the resources and expertise needed to adopt Zero Trust Architecture (ZTA). This pilot study examines the drivers and barriers shaping SME perceptions of ZTA necessity and proposes an exploratory staged adoption path. Survey data from 64 IT and security professionals in the Asia-Pacific region show that ZTA familiarity and cloud-computing needs are the strongest positive correlates of perceived necessity, whereas accumulated barriers show only a weak negative association. Identity and access management complexity and scalability emerge as the main implementation hurdles. Based on these findings, we propose a three-stage route for SMEs: strengthening identity governance, segmenting high-value assets, and introducing targeted monitoring in line with operational capacity. The study offers early evidence for more realistic Zero Trust transitions in resource-constrained firms.

Editor's pickProfessional Services
MIT· Yesterday

A Need for Nuance: The Economist’s Andrew Palmer

On today’s episode of the Me, Myself, and AI podcast, Andrew Palmer, senior editor at The Economist, describes how organizations can experiment with generative AI while balancing speed, quality, and risk. At his own organization, Andrew and others test artificial intelligence with human oversight to develop editing and publishing efficiencies. As the host of The […]

Editor's pick
Forbes· Yesterday

Council Post: Why Most Enterprise AI Fails After The Pilot Phase

AI does not usually fail in production. More often, the organization is not ready for it.

Editor's pickHealthcare
Outsourceaccelerator· Yesterday

AI preparedness gap hits frontline industries hardest in 2026 - Outsource Accelerator

Hospitality, healthcare and logistics rank among the industries least prepared for AI workforce disruption in 2026, according to a new analysis.

Editor's pickProfessional Services
Cybernews· Yesterday

“Poisoning the well:” EY retracts cyber report packed with AI slop | Cybernews

Consultancy group Ernst & Young (EY) has withdrawn a cybersecurity report after an investigation by GPTZero found that 70% of the citations within it were either fabricated or broken.

Editor's pickTechnology
Guardian· Today

Online child safety campaigners call for US inquiry into Roblox

Groups claim game platform’s design and business model conflict with children’s developmental needs. Online child safety campaigners including Jonathan Haidt, the bestselling writer on the mental health impacts of social media, have called on the Trump administration to investigate Roblox, the booming gaming and chat platform used by 150 million people daily, including a large number of under-13s. Haidt’s Anxious Generation Movement, Fairplay and the rightwing anti-pornography National Center on Sexual Exploitation are among groups claiming Roblox’s design and business model conflict with children’s developmental needs.

Editor's pickTechnology
Theregister· Today

Google Cloud suspended major customer Railway.com without cause, causing outage

This is the service we get when we spend $10m plus? asks automated code deployment outfit

AI Applications8 articles
Editor's pickHealthcare
VentureBeat· Today

Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI

Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing, and their accuracy rate is the highest for this specific use case yet recorded. "We are focused on ensuring our AI scribes can be trusted by physicians, medical practitioners and patients...the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat. The performance data the company is bringing to the table paints a stark picture of the current state of enterprise AI: when it comes to highly regulated, specialized industries, domain-specific models can beat out the foundation model providers. In a newly published research paper, Corti revealed that its new clinical-grade speech models reduced word error rates (WER) by up to 93% when compared against leading generalist speech models and APIs on medical terminology. On English medical terminology, its Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model registered a 17.7% WER, ElevenLabs hit 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%. Corti’s announcement serves as a critical inflection point for healthcare builders. While general-purpose APIs like OpenAI’s Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.

The agentic era demands flawless data inputs

The launch of Symphony for Speech-to-Text highlights a fundamental shift in how healthcare uses voice technology. 
For decades, medical speech recognition was primarily about generating a static text document for human doctors to review—a digital replacement for a notepad. But as the healthcare industry hurtles into what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision-making, EHR navigation, and real-time support, the transcript is no longer the final product. It is the foundational data layer. “Speech has always been one of healthcare’s most important inputs,” Cleve said in a statement provided to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than simply producing a transcript - we need to give AI systems accurate clinical facts to reason from. If a model mishears a medication, dosage, or symptom, every downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer accurate enough to thrive in clinical reality.” This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription—turning "hyperthyroidism" into "hypothyroidism," or misinterpreting a critical medication dosage—every subsequent AI agent relying on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text. Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities—such as dosages, measurements, and dates. In contrast, Corti reported that the strongest general-purpose baseline model maxed out at just 44.3% recall for the same entities. 
For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.

Dethroning the industry leaders

While Corti’s benchmarks against modern LLM builders like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants. For years, the gold standard for dedicated clinician dictation has been Dragon Medical One. However, these legacy systems were historically optimized strictly for intentional clinician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools. In evaluations of real-world English medical dictation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (a 19% relative improvement). Furthermore, Corti demonstrated a higher medical term recall than Dragon (93.5% versus 92.9%). By providing this level of accuracy via an API endpoint, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to build their own custom dictation and ambient listening tools that outperform the industry's legacy incumbent. "We want people to build apps atop our models," Cleve said. "The goal is to diffuse the technology as widely as it is needed so it can be as helpful as possible to patients and their doctors and professionals." For Cleve and his co-founders, the mission is a personal one: Cleve's own mother was a healthcare professional who was attacked by a patient and spent years struggling to recover. He sought to improve healthcare processes as a way of honoring her sacrifice.

Solving the healthcare model puzzle

The demands of healthcare extend far beyond English-speaking hospitals, and global health systems have historically been underserved by clinical NLP models. 
Early adopters are already leveraging Corti’s new models in linguistically demanding environments, proving the technology's viability in complex international markets. Switzerland, for instance, requires care delivery across multiple languages, often simultaneously within a single medical institution. It serves as one of the most stringent proving grounds for multilingual medical speech models in the world. Corti’s Symphony models demonstrated massive performance gains in these non-English tests, achieving a 2.4% WER in German (compared to 13.0% for the next-best system) and a 3.9% WER in French (versus 10.6%). “In a clinical conversation, every word matters - a missed medication name, a misheard dosage, or a mistranscribed symptom can change the meaning of an encounter,” said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. “Symphony’s accuracy on clinical terminology gives us the foundation to bring more trusted AI capabilities into clinical workflows with our Voicepoint Xenon platform. When Corti improves the speech layer, the workflows we build together become sharper, safer, and more useful for clinicians in Switzerland.”

AI verticalization and specialization are yielding gains

Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it is the culmination of a strategic narrative Corti has been aggressively pushing over the last several weeks. The broader Symphony platform, which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations, has been systematically proving the defensibility of vertical AI labs against horizontal tech giants. This marks the third major benchmark Corti has released in just six weeks, touching different layers of healthcare AI performance. 
In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, tackling one of healthcare’s most notoriously complex workflows. And just last week, Corti announced that its flagship clinical-grade model outscored OpenAI on HealthBench Professional, OpenAI’s own healthcare benchmark. Taken together, these three data points (medical coding, clinical reasoning, and speech-to-text accuracy) illustrate a growing consensus in the enterprise technology sector: generalized models are hitting a ceiling in regulated industries. Models deployed in hospitals must inherently understand complex acronyms, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance constraints. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to generalized large language models cannot easily cross.

Availability and product lineup

Developers are clearly taking notice of the performance gap. According to momentum data provided to VentureBeat, Corti is seeing a 30% growth in new sign-ups for its platform in quarter-to-date comparisons, signaling that developers and healthcare builders are actively gravitating toward vertical, clinical-grade models over generalist APIs. Corti, which already serves over 100 million patients annually across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the default engine for the next generation of healthcare software. It is important to note that Corti is not launching the overarching Symphony platform itself today; rather, Symphony for Speech-to-Text operates as a new, distinct capability within that broader ecosystem, accessible via its own API endpoints. Symphony for Speech-to-Text is generally available starting today. 
Developers and enterprise architects can access the models via the Corti API console, with full technical documentation available to help integrate the clinical-grade speech layer into their existing applications. In a move toward research transparency, Corti has also published its full research paper detailing its methodology, along with a separate comparison tool designed to support transparent evaluation of medical speech recognition systems across the industry. As the healthcare industry continues its rapid embrace of AI-driven automation, the foundational data layer has never been more critical. Corti’s latest launch is a stark reminder that in the medical field, generic AI simply isn't good enough. The future belongs to the specialists.
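
For readers who want to check the arithmetic behind the figures quoted above, here is a minimal sketch. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words), which may differ in detail from Corti's evaluation protocol. The function names and the example sentence are illustrative assumptions, and the percentages are the ones reported in the article.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline, new):
    """Fractional improvement of `new` over `baseline` (both error rates)."""
    return (baseline - new) / baseline


# One substituted word out of three gives a WER of 1/3.
wer = word_error_rate("patient has hyperthyroidism", "patient has hypothyroidism")

# Figures quoted in the article, in percent.
dictation_gain = relative_reduction(5.7, 4.6)     # about 0.193, the "19%" claim
terminology_gain = relative_reduction(18.9, 1.4)  # about 0.926, consistent with "up to 93%"
recall_gap_points = 98.3 - 44.3                   # about 54 percentage points
```

The recall gap is a difference in percentage points, while the WER improvements are relative reductions, so the two kinds of figure are not directly comparable.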

Editor's pickHealthcare
Arxiv· Today

Evaluating the Utility of Personal Health Records in Personalized Health AI

arXiv:2605.18937v1 Announce Type: new Abstract: Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health, but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries, when provided clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked to their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set, and with clinician ratings for a subset (n=95), with both sets of raters knowing the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs, and provide a framework for monitoring for gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.

Editor's pickTransportation & Logistics
Fortune· Today

Grab bets on new delivery robots to fix Singapore’s ‘supply-constrained markets’ and solve the last-mile problem

The Southeast Asian tech company will launch a pilot of its first delivery robot in Singapore’s Punggol district in late 2026.

Editor's pickManufacturing & Industrials
Arxiv· Today

Decentralized autonomous organization and blockchain-based incentivization framework for community-based facilities management

arXiv:2605.18773v1 Announce Type: cross Abstract: Traditional facility management often relies on centralized decision-making structures that limit stakeholder participation, leading to misalignment with occupant needs and reduced satisfaction. This paper proposes a novel blockchain- and Decentralized Autonomous Organization (DAO)-based framework for community-based facilities management in smart buildings. The framework comprises two key components: a decentralized governance platform that facilitates transparent collective decision-making through blockchain-based voting, and a maintenance management platform with an incentivization mechanism that encourages building occupants to actively contribute to facility upkeep through tokenized rewards. System evaluation includes cost analysis, scalability, data security considerations, usability testing, and semi-structured interviews with facility managers and researchers to assess the platform's usefulness, challenges, and adoption potential. The findings demonstrate the framework's potential as a viable incentivization solution for engaging stakeholders in the collective upkeep and improvement of building infrastructure.

AI Organisational Change4 articles
AI Productivity Evidence5 articles
AI ROI & Business Case5 articles
Editor's pick
Arxiv· Today

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

arXiv:2605.19093v1 Announce Type: new Abstract: System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization framework based on embedding by elicitation. Given a task description, previously evaluated prompts, and scalar scores, an LLM elicits a compact, interpretable feature space and maps prompts into it. Leveraging a probabilistic Gaussian process surrogate, an acquisition function then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Re-eliciting the feature space as new evaluations arrive lets the representation adapt to the observed prompt-score history. We evaluate the setting using offline benchmark accuracy as a controlled aggregate proxy: the optimizer observes one scalar score per prompt and no per-example labels, errors, or critiques. Across ten system prompt optimization tasks with a total budget of 30 evaluations, ReElicit achieves the strongest aggregate performance profile among representative aggregate-only prompt-optimization baselines. These results suggest that LLMs can serve as adaptive semantic representation builders, not only prompt generators, for Bayesian optimization over natural-language artifacts.
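
The surrogate-plus-acquisition step can be sketched as follows. This is not ReElicit itself: the LLM elicitation and prompt-realization stages are omitted, the feature vectors are made up, and the choice of an RBF kernel with an upper-confidence-bound acquisition is an assumption, since the abstract names neither. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_cand, noise=1e-6, length_scale=1.0):
    """Posterior mean and standard deviation of a GP with unit prior variance."""
    y_mean = y_obs.mean()  # centre the scores so a zero-mean prior is reasonable
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    k_oc = rbf_kernel(x_obs, x_cand, length_scale)
    solved = np.linalg.solve(k_oo, np.column_stack([y_obs - y_mean, k_oc]))
    mean = y_mean + k_oc.T @ solved[:, 0]
    var = 1.0 - np.sum(k_oc * solved[:, 1:], axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def select_next(x_obs, y_obs, x_cand, beta=2.0):
    """Index of the candidate with the highest upper confidence bound."""
    mean, std = gp_posterior(x_obs, y_obs, x_cand)
    return int(np.argmax(mean + beta * std))


# Hypothetical 2-D prompt features (for example "formality" and "verbosity")
# with one aggregate score per evaluated prompt.
x_obs = np.array([[0.0, 0.0], [1.0, 1.0]])
y_obs = np.array([0.2, 0.8])
x_cand = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

mean, std = gp_posterior(x_obs, y_obs, x_cand)
explore_choice = select_next(x_obs, y_obs, x_cand, beta=2.0)
exploit_choice = select_next(x_obs, y_obs, x_cand, beta=0.0)
```

With `beta=2.0` the acquisition favours the unexplored candidate, where uncertainty is high, while `beta=0.0` falls back to the best observed point. That trade-off matters when the total budget is only a few dozen evaluations.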

Editor's pickConsumer & Retail
Theregister· Yesterday

Frustrated franchisee sues Pizza Hut over crappy kitchen AI

The Hut stands accused of breaching its franchise agreement by forcing 'algorithmic behaviors that slowed production and delivery' on restaurants, leading to $100M in losses one group wants back

Editor's pickHealthcare
Healthcare Finance News· Yesterday

Implement AI in the mid-cycle of rev cycle for the biggest return | Healthcare Finance News

ROI shows fairly quickly, and tools can be used right away to advance from simple to more complex cases, says Jeff Francis, CFO and VP for the Methodist Health System.

Geopolitics, Policy & Governance

17 articles
AI Policy & Regulation10 articles
Editor's pickMedia & Entertainment
Arxiv· Today

Beyond Nutrition Labels: How Analogical Reasoning Shapes Synthetic Media Disclosure Design

arXiv:2605.19045v1 Announce Type: new Abstract: As synthetic media proliferates, AI policymakers and practitioners have increasingly turned to disclosures (signals describing how media has been created or modified by AI) to help audiences evaluate media credibility. While there is a growing body of research on user interpretations, the upstream decision-making processes that affect users remain underexplored. This study therefore examines how AI policymakers and practitioners design synthetic media disclosures under complex sociotechnical constraints. Drawing on 23 expert interviews and 13 case studies from organizations participating in the Partnership on AI's Synthetic Media Framework, analysis identifies key disclosure goals, including process transparency and harm reduction, and two central tensions that emerge when pursuing those goals: normativity versus neutrality and proactivity versus precision. Findings highlight the role of analogical reasoning, from nutrition labels to Prop 65 warnings, in managing, but not resolving, tensions. Ultimately, this study emphasizes the need for scholarship focused on AI transparency decision-makers and their use of analogical reasoning to support audiences encountering media in the AI age.

Editor's pickGovernment & Public Sector
IAPP· Yesterday

European Commission delivers draft high-risk AI guidelines after delays | IAPP

The European Commission released draft guidelines 19 May aimed at supporting "providers, deployers and other relevant actors in determining whether an AI system falls within the high-risk category." The three-phased guide brings clarity around implementation of high-risk requirements while ...

Editor's pickMedia & Entertainment
Fredericksburg· Yesterday

Three copyright rulings and an EU deadline have rewritten the rules for AI images

Three Copyright Office reports, a UK ruling, and an EU deadline have reshaped the legal landscape for AI-generated images used by businesses.

Editor's pickManufacturing & Industrials
Bisi· Yesterday

AI Omnibus Implementation Rollbacks — Bloomsbury Intelligence and Security Institute (BISI)

The European Union (EU) is attempting to preserve its manufacturing dominance by decoupling industrial AI from the AI Act. In seeking to keep industrial growth stable, it risks a fragmented regulatory landscape that favours established companies over startups.

Editor's pick
Artificial Intelligence Newsletter | May 20, 2026· Yesterday

EU investment-screening overhaul gets final nod from lawmakers

The EU has approved a revamp of its investment-screening rules, requiring member states to consistently screen foreign deals in sensitive sectors like critical technologies and infrastructure.

Editor's pick
Artificial Intelligence Newsletter | May 20, 2026· Today

US panel weighs if Anthropic risk finding within bounds or 'spectacular overreach'

A US panel is evaluating whether recent risk findings regarding Anthropic's AI models constitute appropriate regulatory oversight or an overreach.

Editor's pickGovernment & Public Sector
Newsmax· Yesterday

Nancy Mace Pushes Limits on AI Data Centers | Newsmax.com

Rep. Nancy Mace, R-S.C., on Monday called for a one-year moratorium on new data center construction in her home state, arguing the rapid expansion of artificial intelligence infrastructure is driving up electricity demand...

Editor's pickMedia & Entertainment
Artificial Intelligence Newsletter | May 19, 2026· 2 days ago

MiniMax, Nanonoble push for dismissal of studios' US copyright case

MiniMax and Nanonoble filed replies in support of dismissing US copyright claims filed by Disney, Universal and Warner Bros. Discovery over their AI image and video generating service, Hailuo AI.

Editor's pickPAYWALLEnergy & Utilities
NYT· Yesterday

Bipartisan Bill Would Impose New Annual Fee on Electric Vehicles

A House transportation bill introduced this week would require owners of electric cars to pay $130 to cover the cost of road repairs.

Editor's pickMedia & Entertainment
Broadcastmediaafrica· Yesterday

"Innovation Without Governance Becomes Institutional Risk" – African Media Leaders Examine AI And Broadcast Compliance - Broadcast Media Africa

As artificial intelligence rapidly reshapes broadcasting across Africa, industry leaders are warning that the future success of broadcasters will depend not only on how quickly they adopt AI, but on how responsibly they use it. This was the central message emerging from the webinar “AI and ...

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.
