Wed 20 May 2026
Daily Brief — Curated and contextualised by Best Practice AI
Google Partners with Blackstone, Standard Chartered Cuts Jobs, and China Shields Workers
TL;DR Google and Blackstone are launching a new AI cloud company with $5 billion in equity capital. Standard Chartered plans to cut 8,000 jobs by replacing lower-value human capital with AI. Meta is reorganizing over 7,000 employees around AI initiatives. Meanwhile, China is using legal rulings to protect jobs from AI displacement. Nvidia's $90 billion investment spree ties customers and startups to its technology.
The stories that matter most
Selected and contextualised by the Best Practice AI team
The Insurability Frontier of AI Risk: Mapping Threats to Affirmative Coverage, Silent Exposures, and Exclusions
arXiv:2605.18784v1 Announce Type: cross Abstract: The rapid diffusion of agentic AI has created a new coverage problem for commercial insurance: some AI-mediated losses are now affirmatively insured, some create silent-AI exposure under legacy cyber, technology errors-and-omissions (E&O), directors-and-officers (D&O), employment practices liability (EPLI), crime, and media policies, and others are being actively excluded. This paper maps that emerging boundary by coding 55 AI threat classes against 26 insurance products, endorsements, and exclusion regimes using public carrier materials and OWASP/MITRE threat catalogs. We identify a four-tier insurability frontier: affirmatively insured perils, silent-AI exposures, actively excluded perils, and perils outside conventional private insurance structures. Our coding measures publicly claimed positioning rather than executed contract wording; the headline statistics describe what carriers publicly state about coverage, not what would be paid in any specific claim. Three patterns emerge. First, affirmative AI coverage is beginning to differentiate by primary risk emphasis: public materials often position Munich Re around model performance and drift, Armilla and parts of the Lloyd's market around hallucination and broader AI liability, Tokio Marine Kiln and CFC around IP and technology E&O concerns, Apollo ibott around emerging autonomous system liability, and Coalition around deepfake and AI-enabled cyber response. Second, legacy lines retain silent-AI exposure where AI is an instrumentality rather than the legal cause of loss. Third, foundation model concentration is the clearest genuinely novel insurability frontier because upstream model failure can correlate losses across many cedents at once; the relevant market design question is which insurability constraint each candidate structure relaxes, not merely which systemic risk template exists.
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
arXiv:2605.19099v1 Announce Type: new Abstract: We introduce DecisionBench, a benchmark substrate for emergent delegation in long-horizon agentic workflows. The substrate fixes a task suite (GAIA, tau-bench, BFCL multi-turn), a peer-model pool (11 models, 7 vendor families), a delegation interface (call_model plus an optional read_profile channel), a deterministic skill-annotation layer, and a multi-axis metric suite covering quality, cost, latency, delegation rate, routing fidelity-at-k, vendor self-preference, and a counterfactual-delegation ceiling. The substrate is agnostic to how peer information is generated or delivered, so learned routers, richer peer memories, adaptive profile construction, and multi-step delegation can all be evaluated against it. We characterize the substrate with a five-condition reference sweep on the full pool (n=23,375 task instances). Three benchmark-level findings emerge: (i) mean end-task quality is statistically indistinguishable across the four awareness conditions (|beta| = 0.21), so quality-only evaluation would miss the orchestration signal; (ii) routing fidelity-at-1 ranges from 7.5% to 29.5% across conditions at near-equal mean quality, with delivery channel (on-demand tool vs. preloaded description) dominating description content; (iii) a counterfactual ceiling places perfect delegation 15-31 percentage points above measured performance on every suite, locating large unrealized headroom for future orchestration methods. We release the substrate, annotation layer, reference intervention suite, analysis pipeline, and 220 per-condition run archives.
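The abstract reports routing fidelity-at-1 without defining it here. As a rough illustration only, the sketch below assumes fidelity-at-k is the share of delegated tasks where the chosen peer falls within the top k peers of a reference ranking for that task. The function name, the data layout, and the model names are hypothetical and are not taken from the DecisionBench release.

```python
def routing_fidelity_at_k(choices, rankings, k=1):
    """Share of tasks whose chosen peer is among the top-k reference peers.

    choices:  dict mapping task id -> peer the orchestrator delegated to
    rankings: dict mapping task id -> list of peers, best first (assumed given)
    """
    if not choices:
        return 0.0
    hits = sum(1 for task, peer in choices.items() if peer in rankings[task][:k])
    return hits / len(choices)


# Toy data, invented for illustration.
rankings = {
    "t1": ["model_a", "model_b", "model_c"],
    "t2": ["model_b", "model_a", "model_c"],
    "t3": ["model_c", "model_a", "model_b"],
    "t4": ["model_a", "model_c", "model_b"],
}
choices = {"t1": "model_a", "t2": "model_a", "t3": "model_b", "t4": "model_c"}

fidelity_at_1 = routing_fidelity_at_k(choices, rankings, k=1)
fidelity_at_2 = routing_fidelity_at_k(choices, rankings, k=2)
print(fidelity_at_1, fidelity_at_2)
```

Under this reading, two conditions can share the same mean task quality while differing sharply in fidelity, which is the gap the abstract says quality-only evaluation would miss.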
Analog Devices to Buy Empower Semiconductor for $1.5 Billion
The deal will help Analog Devices expand its total addressable market in artificial-intelligence compute power delivery as demand from AI developers climbs, the semiconductor company said.
Indeed chief economist says the sectors most exposed to AI are seeing a big growth in job demand
Indeed’s chief economist Svenja Gudell believes AI-exposed industries like software development are actually adding jobs and could enjoy a “wage premium.”
The impossible maths of the AI boom
The IPO of big sector companies is probably nothing more than a transfer of investment risk to retail investors
How Agentic AI Supercharges Startups and Threatens Incumbents
Agentic AI provides startups with faster iteration, automated go-to-market capabilities, and capital efficiency. This poses a strategic threat to incumbents with legacy business models.
SAP customers warned AI agents could put costs on autopilot
Gartner has warned that SAP users adopting its AI agents could face spiraling costs as the vendor moves to a new commercial model. Last week, the German ERP giant announced plans for its Autonomous Enterprise, including an AI platform for building and governing a suite of agents that do business ...
Economics & Markets
Wall Street prepares for boom in tech IPOs after Cerebras’ success
Chip designer’s $6.4bn raise signals demand ahead of huge listings expected from SpaceX, OpenAI and Anthropic
Is Nvidia too big to fail?
‘You’re clearly at the centre of everything’
Big Europe and Asian private equity health funds merge to defy AI disruption
Global Healthcare Opportunities and CBC Group say $21bn investment manager will be world’s largest in sector
Google touts its tokenmaxxing and capex spending amid AI orgy
Chocolate Factory readies always-on agents for searchers
AI Eats the World
This presentation frames Generative AI as a major technology cycle, covering capital investment, deployment, model competition, and enterprise adoption.
Big Tech Accumulates Heavy AI Infrastructure Assets | Let's Data Science
These patterns typically raise ... hardware lifecycle planning, and site-level redundancy for teams that previously focused on software-only risks. Industry observers increasingly treat large-scale compute as strategic infrastructure rather than a transient R&D expense. This elevates questions around cross-border supply chains for semiconductors, long-term ...
The stock market that outpaced Nasdaq’s dotcom-era gains
South Korea’s Kospi triples in 18 months powered by Samsung and SK Hynix as AI euphoria continues
AI boom masks deepening cracks in global economy, developing nations most exposed
UNCTAD warns the apparent strength in global trade is distorted by a narrow AI hardware boom in the US and China, masking stagnation in traditional industries and commodities, while developing economies face mounting financial stress, currency pressures, food insecurity.
Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
arXiv:2605.19140v1 Announce Type: new Abstract: We study workflow learning in a setting where specialized agents hand off control through a shared artifact, each agent observes only a local function of that artifact and its own private state, and no centralized learner accesses joint trajectories -- the operating regime of multi-agent LLM pipelines that span organizational, vendor, or trust boundaries. We formalize this regime as an interface-constrained semi-Markov decision process (IC-SMDP), whose decision epochs occur at handoff times, and design IC-$Q$, an asynchronous decentralized $Q$-learning algorithm in which cross-agent coordination at every handoff is exactly one scalar. Our main result is a finite-sample bound for neural IC-$Q$ that decomposes into three independently controllable error sources: neural function-approximation error, interface representation gap, and a mixing-time residual, under the random option-duration discount. Establishing this bound requires lifting the approximate information state (AIS) framework from single-agent primitive-step MDPs to multi-agent SMDPs and controlling Markovian noise under random duration, neither of which has been done in prior work. To our knowledge this is the first finite-sample guarantee for neural $Q$-learning under decentralized partial observability. Four experiments: a controlled synthetic IC-SMDP that validates the bound term-by-term, multi-LLM mathematical reasoning, multi-agent routing, and multi-agent CPU programming, show that IC-$Q$ matches a centralized oracle without any agent observing joint trajectories, with each of the three error sources scaling along its corresponding axis as the bound predicts.
SAP's AI strategy: Come for the openness, stay because you have to
Joule Studio 2.0 waves the flag of interoperability, API policy tells enterprises who's really in charge
Anthropic’s Stainless steal tightens grip on AI dev tooling
Claude maker nabs SDK and MCP tooling biz, plans to sunset platform
Google, Amazon, Microsoft face further delay in EU's cloud and AI development bill
Legislative proposals regarding cloud and AI development in the EU have been delayed, impacting major US hyperscalers and drawing attention from European competitors.
OpenAI defeats Elon Musk's lawsuit, removes obstacle to IPO | Reuters
People use AI for myriad purposes such as education, facial recognition, financial advice, journalism, legal research, medical diagnoses, and harmful deepfakes.
AI & Tech Brief: The AI Influence Machine - The Washington Post
Scale AI, which provides the data infrastructure for AI model training, launches a new seven-figure ad campaign in New York and San Francisco called “the humans stay.”
The Revenue of Finance Journals: Networks, Pricing Power, and Publication Volume
arXiv:2508.14301v3 Announce Type: replace Abstract: I study commercial revenue at 26 finance journals over 1999-2025, exploiting the creation of the Elsevier Finance Journal Ecosystem (a formal network of coordinated journals planned in 2019 and launched in 2020) as a quasi-natural experiment. Using synthetic control as the primary identification strategy, I find that ecosystem membership generated a projected long-run commercial revenue effect of approximately $54-$59 million in real 2024 USD, comprising $48 million in citation-mechanism-implied APC revenue and $6-$11 million in incremental submission-fee revenue (the submission-fee range reflects uncertainty about the share of extra submissions arriving via Elsevier's Article Transfer Service, which generates no incremental fee at the receiving journal). Of this total, approximately $40-$44 million is directly observed and realized through 2025 (a $36 million synthetic-control gap on APC flow revenue plus $4-$8 million in incremental submission fees); the remaining $14-$15 million reflects standard submission-to-citation-to-revenue propagation lags from citation gains realized in 2019-2025 that are projected to materialize as publication revenue through approximately 2028. The effect is highly concentrated: four core journals (FRL, IRFA, IREF, RIBAF) account for 95% of the gain. Decomposing the revenue effect into intensive (price) and extensive (volume) margins, 89% comes from expanded publication volume; per-paper pricing power rose modestly if at all. The findings speak to the economics of coordinated networks in information-goods markets and to the industrial organization of scholarly publishing.
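The revenue figures in the abstract can be cross-checked with simple range arithmetic. The sketch below uses only the numbers quoted above (in millions of real 2024 USD). Pairing the low ends together and the high ends together when subtracting is an assumption made here, on the reading that one uncertainty (the Article Transfer Service share) drives both fee ranges. The helper name is hypothetical.

```python
def add_ranges(*ranges):
    """Add (low, high) ranges endpoint by endpoint."""
    return (sum(low for low, _ in ranges), sum(high for _, high in ranges))


# Projected long-run effect: APC revenue plus incremental submission fees.
projected = add_ranges((48, 48), (6, 11))

# Realized through 2025: synthetic-control APC gap plus realized fees.
realized = add_ranges((36, 36), (4, 8))

# Remainder attributed to propagation lags (low with low, high with high: an assumption).
remaining = (projected[0] - realized[0], projected[1] - realized[1])

print(projected, realized, remaining)
```

The three results line up with the $54-$59 million, $40-$44 million, and $14-$15 million figures stated in the abstract.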
Healthcare AI firm Commure valued at $7 billion, raises $70 million | Reuters
Agentic AI — which can plan, decide and act autonomously rather than just respond to prompts — has become one of venture capital's most sought-after areas, as investors pile into businesses using the technology to streamline operations.
Exclusive: Circle cofounder raises $30 million for Series A ‘AI-native bank’ Catena Labs
Sean Neville’s startup aims to build banking tools especially designed for AI agents.
AI coworker startup Viktor raises €64.7 million Series A after hitting €12.9 million revenue run rate within 10 weeks of launch
Viktor, a Warsaw and Munich-based AI startup that develops an AI coworker that lives in Slack and Microsoft Teams and works across the tools companies already use, has raised €64.7 million ($75 million) in Series A funding. The round was led by Accel, with participation from Bek Ventures, Kaya VC, Inovo VC and Tenacity Capital. […]
UK payments startup Primer raises €86.2 million Series C to expand AI capabilities and accelerate US growth
Primer, a London-based payments infrastructure startup, today announced a €86.2 million ($100 million) Series C funding round to accelerate its investment in AI for payments and finance teams and to drive its expansion in the US. The round was led by Sofina, with participation from Peak XV Partners and continued backing from all existing investors, […]
Unframe.AI in European competition: An in-depth economic analysis
🚀 Unframe.AI is leaving its stealth phase and promising production-ready enterprise AI in days instead of months. 🇪🇺 The analysis examines whether the California-Berlin startup can conquer the demanding European market. ⚖️ The radical results-based pricing model shifts the risk ...
Berlin-based bunch bags €30.1 million Series B to modernise Europe’s private markets infrastructure
bunch, a Berlin-based FinTech startup offering end-to-end infrastructure for European private markets, has closed its €30.1 million ($35 million) Series B to accelerate European commercial growth, deepen its automation and AI, and expand the platform into new geographies, asset classes, and workflows. The round was led by Portage, with participation from Illuminate Financial, significant follow-on […]
We Asked Top Startup Investors How They Use AI. Here's What They Said. - Business Insider
Venture capitalists, including Ann Miura-Ko and Salil Deshpande, are leveraging AI for investment insights, deal sourcing, and developing internal tools.
CircuitHub takes $28m from Plural to make PCBs the way clouds make compute
CircuitHub has raised $28m led by Plural to expand its automated PCB-manufacturing 'Grid' factories across Europe and the US.
Labor, Society & Culture
CEO Walks Back Comment About Replacing ‘Lower-Value Human Capital’ With AI
Standard Chartered’s Bill Winters walked back his comments in a memo to bank employees on Wednesday.
AI sackings reach New Zealand, which will use it to eject 14 percent of government staff
Minister demands AI becomes ‘basic expectation for all public entities’
Companies Don’t Have to Slash Jobs Because of AI
“If AI is going to destroy all the jobs, why don’t we just stop?” That was the rhetorical question my college-age son asked after we talked about the possibility of drastic changes to career paths and society thanks to AI (technically, generative AI). It was in line with what […]
ThredUp’s CEO has a warning for five-day companies: You’re going to lose the talent war
A four-day workweek leaves employees more content and well-rested, and that has translated directly into increased and sustained revenues.
AI bidding wars: The talent making a fortune as Big Tech firms fight it out | Euronews
As Big Tech and a whole cohort of new-generation AI start-ups race towards artificial general intelligence (AGI), elite researchers and engineering leaders have become the equivalent of franchise athletes.
Report – nearly half of Irish employers have scaled back entry-level hiring
IrishJobs’ research found that hiring in Ireland is becoming increasingly specific, particularly in the area of AI.
King’s College London AI work study finds job fears rising | ETIH EdTech News — EdTech Innovation Hub
AI in education, edtech AI tools, and workforce skills are in focus as King’s College London finds UK concern over AI job losses, student preparedness, and entry-level roles. ETIH edtech news covers the new AI and work tracker, employer adoption, university readiness, and retraining.
Council Post: How Autonomous AI Agents Are Reshaping The Workforce
Correctly implementing AI agents in your workflows requires reimagining the way we work.
Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
arXiv:2605.19190v1 Announce Type: new Abstract: Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant vulnerabilities for the rest of the world. To embrace cultural pluralism and bring historically under-represented perspectives in T2I safety, we conduct localised community-centered red teaming studies in the Global South. Our two-fold approach prioritizes localization and participation, by focusing on secondary urban centers in these regions, and conducting community engagement and training workshops to contextualize local norms. As a result, we present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India (Karnataka and Punjab). Analysis of prompts collected reveals a wide-ranging diversity in socio-cultural and linguistic attributes, when compared to existing geography-agnostic crowdsourced red-teaming data. We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within region around specific themes, such as religion in India. Moreover, we uncover structural contextual gaps in existing safety frameworks by identifying novel harms showing normative dissonance (e.g., violating religious norms, ignoring local customs, and ominous symbolism). This work argues that expanding T2I safety requires moving beyond mere scale to incorporate deeply localised, participatory methodologies for data collection and contextualization. Content warning: This paper includes examples containing potentially harmful or offensive content.
POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents
arXiv:2605.19127v1 Announce Type: new Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership and vary privacy policy dimension and attack strategy along two orthogonal axes, producing a 5×5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1-30B range, the class users most commonly run as their own trusted agent on-device or via private inference, score notably worse, with the weakest leaking over half. POLAR-Bench thus localizes where each model's intent-following breaks down, providing a foothold for privacy alignment where it matters most.
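The abstract says privacy and utility are scored by deterministic set-membership but does not give the formulas. The sketch below is one plausible reading, not the benchmark's actual scorer. It assumes privacy is the fraction of protected attributes that were withheld and utility is the fraction of task-relevant attributes that were shared. The function name and attribute names are hypothetical.

```python
def score_disclosure(disclosed, protected, required):
    """Score one conversation by set membership (assumed definitions).

    disclosed: attributes the trusted agent revealed to the third party
    protected: attributes the user's policy forbids sharing
    required:  attributes the task needs shared
    Returns (privacy, utility), each in [0, 1].
    """
    disclosed, protected, required = set(disclosed), set(protected), set(required)
    leaked = disclosed & protected   # protected attributes that slipped out
    shared = disclosed & required    # task-relevant attributes delivered
    privacy = 1.0 - len(leaked) / len(protected) if protected else 1.0
    utility = len(shared) / len(required) if required else 1.0
    return privacy, utility


# Toy example, invented for illustration.
privacy, utility = score_disclosure(
    disclosed={"city", "budget", "passport_number"},
    protected={"passport_number", "salary"},
    required={"city", "budget", "dates"},
)
print(privacy, utility)
```

Because both scores reduce to set intersections, the same transcript always receives the same score, which is what makes a grid of policy dimension against attack strategy comparable across models.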
Can LLMs Emulate Human Belief Dynamics?
arXiv:2605.18781v1 Announce Type: cross Abstract: Can LLMs simulate how humans form and change beliefs in social networks? We put this to the test by replicating an established study on belief dynamics, evaluating 12 LLMs across multiple model families and parameter sizes. The answer is a clear no, and in systematic ways. LLMs fail to capture initial human belief distributions and tend to be overall more conformist than humans, shifting their responses to align with those around them. They also take a nuanced approach to emulating human homophilic tendencies within networks. Our findings carry a double payoff: they highlight fundamental properties of LLM behavior, and they raise a sharp warning against deploying LLMs as human proxies in social simulations.
AI engineer says Google unfairly sacked him after he protested against work for Israel
Exclusive: Employment tribunal claim says worker lost his job after distributing leaflets throughout London office. Google is facing a legal challenge from an AI engineer who claims he was unfairly dismissed after he protested against its work for the Israeli government, in the latest sign of growing concern about the social and ethical impacts of AI. The engineer distributed flyers around Google DeepMind’s London offices, which read “Google provides military AI to forces committing genocide” and asked colleagues: “Is your paycheck worth this?” He also emailed colleagues about Google’s 2025 decision to drop a promise not to pursue weapons that harm people and surveillance violating international norms and urged them to unionise.
Major online platforms dodge child-safety norms due to weak enforcement, study says
A study by the 5Rights Foundation and LSE found that child safety on major platforms like Meta and TikTok has not significantly improved despite new EU and UK regulations.
Bridging the Disciplinary Gap in Explainable AI: From Abstract Desiderata to Concrete Tasks
arXiv:2605.20081v1 Announce Type: new Abstract: Explainable AI (XAI) is often criticized for failing to satisfy broad desiderata (e.g., fairness, accountability) and for limited practical value to stakeholders. This challenge partly arises because researchers across disciplines prioritize different sets of desiderata that remain underspecified and context-dependent, yet expect XAI to satisfy them simultaneously, resulting in fragmented and sometimes incompatible operationalizations. We argue that many desiderata are not independent, but instead form dependency structures in which higher-level goals (e.g., trust, accountability) rely on more foundational properties (e.g., faithfulness, robustness). Some desiderata are multi-faceted and are best understood within these structures. In particular, instead of addressing all desiderata at once, we focus on subsets of dependency structures and translate them into concrete XAI tasks, thereby decomposing research questions into benchmarkable and solvable units. To this end, we propose a three-axis taxonomy (target, functional role, and mode of justification) and a three-step framework for deriving well-scoped, benchmarkable XAI tasks. Our approach builds on a systematic literature review and conceptual analysis, and supports clarifying desiderata, identifying dependencies, scoping feasibility, and delimiting the design space to derive concrete XAI tasks from abstract desiderata. We illustrate its utility through two explanatory cases, showing how the taxonomy and framework guide systematic task design and evaluation in XAI. This is a preprint of a paper that will appear in AISoLA 2026.
‘I don’t worry about a robot takeover’: AI expert Michael Wooldridge on big tech’s real dangers (and occasional blessings)
Almost 50 years after he first got his hands on a computer, the Oxford professor still believes in the power of technology. Can his beloved game theory explain why Silicon Valley’s entrepreneurs consistently misuse it? Michael Wooldridge is like the teacher you wish you’d had: approachable, able to explain difficult things in simple terms, neither dauntingly highbrow nor off-puttingly cool, and genuinely enthusiastic about what he does. “I love it when you see the light go on in somebody, when they understand something that they didn’t understand before,” he says. “I find that incredibly gratifying.” He comes across as a regular sort of guy, which, as an Oxford professor with more than 500 scientific articles and 10 books to his name, he clearly isn’t. Typically, his favourite work is his contribution to Ladybird’s Expert Books – an update of the classic children’s series – on artificial intelligence. “I’m very proud of this,” he says, as he hands me a copy from his bookshelf. We’re in his study in the University of Oxford’s somewhat municipal computing department on a sunny spring day. Maybe it’s the campus setting, but our discussion almost takes the form of a seminar.
Big Four post more job ads for AI specialists than auditors
Increase comes as world’s largest accounting firms rush to adapt to technological disruption
Opinion | Minimum age rules for AI are bad policy - The Washington Post
Young people are told AI will shape our careers. Why shouldn't we be able to access it?
The New Digital Divide: Agentic AI - by David Bachman
I’m in a panic over the upcoming Fall semester. I’m committed to teaching my students the most current skills, but I can’t. Most of them don’t have access to what they need to learn: Agentic AI. AI agents are semi-autonomous systems that accomplish user-specified goals.
Only one in five say education prepares young people for AI future - Educate magazine
The education system is increasingly expected to prepare young people for an AI-driven jobs market but only one in five people believe it is.
63% of Workers Admit to Exaggerating AI Skills as Automation Anxiety Fuels an AI Skills Bubble, New GCheck Report Finds
GCheck released findings from its Automation Anxiety Report 2026, which found that 63% of workers admit to lying about or exaggerating their AI skills....
Technology & Infrastructure
AgentNLQ: A General-Purpose Agent for Natural Language to SQL
arXiv:2605.19010v1 Announce Type: new Abstract: Natural language to SQL (NL2SQL) conversion is an important problem for researchers and enterprises due to the ubiquitous importance of relational databases in broad-ranging practical problems. Despite the rapid advancements in the capabilities of LLMs, NL2SQL has not reached parity in accuracy with human expert SQL writers, hence needing additional improvements in NL2SQL algorithms. This study presents a new multi-agent method for NL2SQL that achieves 78.1% semantic accuracy on the BIg Bench for LaRge-scale Database (BIRD) benchmark. Our method leverages a semantically enriched representation of user-provided schema, adds user-provided business rules, and produces accurate SQL queries. The main contributions of this study are (a) We designed an optimized new orchestrator in a multi-agent solution that uses LLMs to plan, orchestrate, reflect, and self-correct to generate accurate SQL queries, (b) We developed an advanced schema enrichment method that creates context-aware metadata to improve accuracy, and (c) We demonstrated the accuracy and generalizability of the method across different domains and datasets by evaluating it on the BIRD-SQL benchmark.
Start thinking agentically
The hard part of agentic AI isn't the technology. It's knowing how to decide where agents should lead, where humans should stay in the loop and how to stop optimizing work at the margins and start redesigning it.
AIwire - Covering Scientific & Technical AI
“Almost right” is not good enough once AI starts making decisions inside a business. That warning came from SAP CEO Christian Klein at the Sapphire 2026 event in Orlando, FL. […]
How Claude Code Works in Large Codebases: Best Practices and Where to Start
Claude Code’s guidance explains how teams can use AI coding tools effectively in large codebases through configuration files, hooks, and subagents.
Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On
arXiv:2605.19035v1 Announce Type: new Abstract: The rapid advancement of Large Language Models has given rise to autonomous LLM-based agents capable of complex reasoning and execution. As these agents transition from isolated operation to collaborative ecosystems, we witness the emergence of the Agent-to-Agent (A2A) network, a paradigm where heterogeneous agents autonomously coordinate to solve multi-step tasks. While these networks may offer better task performance compared to simply using one agent to complete the entire task, they introduce systemic vulnerabilities, such as adversarial composition, semantic misalignment, and cascading operational failures, that existing agent alignment techniques cannot address. In this vision paper, we argue that the trustworthiness of A2A networks cannot be fully guaranteed via retrofitting on existing protocols that are largely designed for individual agents. Rather, it must be architected from the very beginning of the A2A coordination framework. We present a comprehensive conceptual framework that situates trust in A2A systems through four design pillars.
Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)
arXiv:2605.19186v1 Announce Type: new Abstract: Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, and invoke web services coherently. The response was OWL-S and WSMO: formally grounded capability descriptions specifying what a service could do, what the agent must already know for invocation to be epistemically sound, and how ontological mismatches could be formally bridged. Current Knowledge Graph (KG) metadata standards such as VoID and DCAT describe what a KG contains yet say nothing about what a specific agent can prove from it, what closure assumptions govern empty results, or whether the agent's task vocabulary is grounded in the schema. Furthermore, in deployed KGs the governing schema DL and the operative entailment regime can diverge: an epistemic failure mode invisible to current metadata. We revisit and extend these insights for the KG setting with a four-dimensional formal framework from which we derive the Agentic Affordance Profile (AAP): a semantic layer above VoID and DCAT enabling principled KG selection, composition, and failure diagnosis at agent planning time. A five-point research agenda identifies the formal, computational, and engineering work needed to realise AAP-based affordance matching at scale.
Google Launches Antigravity 2.0 at I/O 2026: A Standalone Agent-First Platform with CLI, SDK, Managed Execution, and Enterprise Support - MarkTechPost
AWS nabs white hot gen AI media creation startup fal, becoming its preferred cloud provider
Generative AI’s rapid transition from text-based chatbots to high-fidelity media—spanning images, video, spatial 3D, and audio—has exposed a glaring bottleneck in the modern tech stack: infrastructure. Rendering pixels in real-time requires a staggering amount of compute, and developers are increasingly struggling to manage fragmented GPU clusters just to keep their applications online. Enter fal, a generative media creation platform that has quietly become the connective tissue for 2.5 million developers across the globe, offering literally hundreds of leading AI image, video, and audio creation and editing models — from proprietary ones like OpenAI's ChatGPT-Images-2.0 and Google's Nano Banana Pro 2 to open source rivals — all through its unified interface and APIs. Today, the San Francisco-based startup, recently valued at a massive $4.5 billion following a $300 million Series D round led by Sequoia Capital, announced it has selected Amazon Web Services (AWS) as its preferred cloud provider. While the financial terms of the deal weren't made public, the move signals a maturation in the generative media space, shifting the focus from simply building foundational models to effectively scaling them for mass, commercial consumption. “AWS has been there for distribution and monetization, and for the use of AI in creative pursuits — helping designers, developers, and the creative community think through how they can use AI responsibly, scalably, and at global scale," said Samira Panah Bakhtiar, General Manager for Media, Entertainment, Games, and Sports at AWS, in an exclusive interview with VentureBeat. A one-stop-shop for Gen AI media allowing enterprises to plug in and choose the best model for their needs At its core, fal operates as a unified gateway to the rapidly expanding generative AI ecosystem. 
Rather than forcing developers to provision their own servers, deal with latency issues, or string together disparate open-source model weights, fal provides a single, unified API. Through this API, users gain instant access to over 1,000 production-ready AI models. Think of it as the Stripe or Plaid of generative media: abstracting away the devastatingly complex back-end plumbing so developers can focus solely on the user experience. It is a "plug-and-play" solution that has already attracted independent creators and enterprise giants alike, powering generative workflows for enterprises including Canva, Adobe, and Amazon MGM Studios. “Generative media workloads demand a fundamentally different infrastructure layer, one that can handle massive parallel inference, rapid model iteration, and production-grade reliability at scale,” said Gorkem Yurtseven, CTO and Co-founder of fal, in a statement provided to VentureBeat. Neither AWS nor fal specified what other cloud or GPU providers the latter was using prior to their deal together. Asked who fal had been using before AWS, Bakhtiar did not name a prior cloud or GPU provider, saying instead that fal is now using AWS services. In a blog post, fal's Head of Compute Partnerships Emir Lise described AWS as providing the “global scale and reliability layer” for its existing serverless generative-media infrastructure — framing the partnership around elasticity, reliability and enterprise scale rather than a replacement of a named incumbent. A public search turned up Tigris as a storage provider for fal — with Tigris saying fal runs a “global fleet of GPUs across many clouds” — and an announcement from fal in September 2025 that it was available through Google Cloud Marketplace, allowing customers to buy fal through Google Cloud billing and governance, but that listing does not state that Google Cloud powered fal’s GPU infrastructure. 99.99% guaranteed uptime?
By partnering with AWS, fal aims to merge its highly optimized inference engine with Amazon’s global reach to handle millions of daily API calls with 99.99% guaranteed uptime. In addition, Bakhtiar said fal users can expect to see "faster inference and performance, greater efficiency, more scalability, and more seamless service continuity — all things you would expect as a result of partnering with the world’s largest, broadly adopted cloud." Therefore, the primary benefit for fal users is better performance and reliability without changing how they work: faster inference, more scalability, smoother continuity, and access to production-ready AI models without managing their own infrastructure. For fal, the partnership makes its platform stronger for creators, studios, and enterprise customers by backing it with AWS’s security, global scale, and cloud infrastructure. For AWS, it helps push cloud and AI deeper into creative production, not just distribution or monetization. It positions AWS as a key infrastructure partner for studios, media companies, developers, and individual creators building AI-powered content workflows. Offloading the GPU burden The partnership with AWS is designed to address the sheer physics and cost of rendering generative media. By migrating its operations to AWS, fal will be able to leverage Amazon’s broad suite of AI services, including the Bedrock platform, alongside custom-built silicon like Trainium and Graviton processors. "You don't have to manage like a GPU fleet to use the AI for creative pursuits," Bakhtiar explained. This is a critical pain point for larger-scale media generation demands in 2026. Securing high-performance GPUs for parallel inference is both expensive and technically demanding. By shifting that burden to AWS, fal ensures that creatives can focus on their workflows, without needing a dedicated DevOps team. Bakhtiar also noted the powerful "network effect" of building on AWS.
Because major studios and creative platforms (like Adobe and Canva) are already deeply entrenched in the AWS ecosystem, integrating fal's API into their existing pipelines becomes a frictionless endeavor. Enterprise-grade security and compliance with gen AI creative speed For IT leaders and developers, fal's architecture offers a distinct advantage regarding licensing, security, and deployment. Historically, utilizing frontier generative models meant either accepting strict vendor lock-in from a single provider or attempting to host open-source models locally. The latter requires significant overhead and forces enterprises to navigate a minefield of disparate open-source licenses (such as MIT, Apache 2.0, or restrictive non-commercial licenses). fal bypasses this friction by offering commercial API access to a curated ecosystem of models. Developers simply pay for the inference they consume. Furthermore, the platform is SOC 2 compliant and explicitly built for "enterprise scale," meaning it meets the stringent data privacy and security benchmarks required by heavily regulated industries and massive consumer platforms. For large media conglomerates, this managed service approach allows them to experiment with the latest state-of-the-art tools securely, without the risk of exposing proprietary data or intellectual property. Empowering devs and vibe coders The true impact of fal’s platform, however, is best observed at the developer level. By democratizing access to high-end infrastructure, fal is enabling a new class of builders—often referred to as "vibe coders"—to create complex, multimodal applications without traditional computer science backgrounds. As Bakhtiar pointed out, access to these tools fundamentally "levels the playing field". 
Whether it is an individual developer or hobbyist vibe coding a side project, or a fully-funded editor or director rendering a blockbuster film, the underlying technology is now identical, infinitely scalable, and ready for production. “More creatives — whether they’re full-fledged studios, indie brands, or individual content creators — are now going to be able to access these tools, and they’re going to be able to punch way above their weight as a result," Bakhtiar said, casting the partnership as a way to serve even more users through fal thanks to the reliability of AWS's servers and custom Trainium, Graviton and Inferentia chips. The rollout of enhanced AWS capabilities for fal customers will occur in phases throughout 2026.
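To put the article's 99.99% uptime figure in concrete terms, the sketch below converts an availability percentage into a downtime budget. The article does not state the measurement window or any SLA terms, so the 365-day and 30-day periods are assumptions chosen only for illustration, and `downtime_budget_minutes` is a hypothetical helper.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of allowed downtime for a given availability over a period.

    availability is a fraction, e.g. 0.9999 for "99.99%".
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    total_minutes = period_days * 24 * 60
    return (1.0 - availability) * total_minutes


# Assumed windows: a 365-day year and a 30-day month.
print(round(downtime_budget_minutes(0.9999, 365), 2))  # 52.56 minutes per year
print(round(downtime_budget_minutes(0.9999, 30), 2))   # 4.32 minutes per month
```

On these assumptions, "four nines" leaves under an hour of downtime per year.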
Baidu says the quiet part out loud: you can't build AI infrastructure, so clouds can cash in
Baidu's CFO noted that GPU rentals are structurally higher margin than traditional CPU cloud services.
OpenAI targets enterprise with guaranteed compute
OpenAI's new Guaranteed Capacity initiative highlights how compute is becoming a critical battleground for enterprises requiring reliable AI access.
French Companies Bid for €10 Billion Europe AI Gigafactory Site
A consortium of European companies will bid on a €10 billion ($11.6 billion) project to build a major data center campus in France as part of a European Union effort to boost artificial intelligence infrastructure on the continent.
Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency
arXiv:2605.19008v1 Announce Type: new Abstract: Modern language-model training is increasingly exposed to instability, degraded runs, and wasted compute, especially under aggressive learning-rate, scale, and runtime-stress conditions. This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer that operates above AdamW. Rather than replacing the optimizer update rule, LBW-Guard observes training telemetry, interprets instability-sensitive regimes, and applies bounded control to optimizer execution while preserving fixed training objectives. We evaluate LBW-Guard in a Qwen2.5-centered stress-and-robustness suite using WikiText-103, with Qwen2.5-7B as the empirical anchor, model-size comparisons against Qwen2.5-3B and Qwen2.5-14B, learning-rate stress tests, gradient-clipping baselines, and a no-LoRA TinyLlama-1B full-parameter sanity check. In the 7B reference setting, LBW-Guard reduces final perplexity from 13.21 to 10.74, an 18.7% improvement, while reducing end-to-end time from 392.54s to 357.02s, a 1.10x speedup. Under stronger learning-rate stress, AdamW degrades to 1885.24 final perplexity at LR=3e-3 and 659.76 at LR=1e-3, whereas LBW-Guard remains trainable at 11.57 and 10.33, respectively. Gradient-clipping baselines do not reproduce this effect. These results support a scoped systems conclusion that stability-sensitive LLM training can benefit from a governance plane above the optimizer. LBW-Guard provides evidence that bounded runtime control can preserve productive compute under stress while remaining distinct from optimizer replacement and local gradient suppression.
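The headline percentages in the abstract can be reproduced from the raw numbers it reports. The sketch below assumes the usual definitions (relative reduction for perplexity, ratio of wall-clock times for speedup); the helper names are illustrative and not from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster the second run is than the first."""
    return before_seconds / after_seconds


# Figures quoted in the abstract for the 7B reference setting.
ppl_gain = relative_reduction(13.21, 10.74)
time_gain = speedup(392.54, 357.02)

print(round(ppl_gain * 100, 1))  # 18.7 (percent lower final perplexity)
print(round(time_gain, 2))       # 1.1  (the abstract's 1.10x)
```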
OpenAI and Dell Bring Enterprise AI Securely On-Premise
Your daily PM briefing | May 19, 2026 | PM Interview Prep Club
Microsoft drops open-source 4B model that converts any image to 3D in 3 seconds
Microsoft released an open-source 4B parameter model capable of transforming any image into a 3D representation in just three seconds.
Google to release smart glasses and add AI ‘agents’ to search engine
CEO Sundar Pichai says features powered by new Gemini model will close gap with Anthropic and OpenAI
Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits
arXiv:2605.18890v1 Announce Type: cross Abstract: The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber simulation. Across multiple models, minor perturbations in persona format and game-instruction framing shift cooperation rates by up to 76 percentage points, while network homophily and hub assignment produce significant and consistent shifts in polarization metrics. We also find that sensitivity is unevenly distributed across both architectural choices and model families: the same perturbation that produces the 76 pp shift in one frontier model only shifts another by 1 pp. Robustness is therefore a property that should be measured per claim and per model, not assumed. To address this validation gap, we introduce TRAILS (Taxonomy for Robustness Audits In LLM Simulations), a robustness-audit taxonomy spanning three levels of simulation design: agent (micro-level), interaction (meso-level), and system (macro-level). We call for robustness to become a first-order validation requirement before LLM social simulations are used to explain mechanisms, evaluate interventions, or inform decisions.
Interference-Aware Multi-Task Unlearning
arXiv:2605.19042v1 Announce Type: new Abstract: Machine unlearning aims to remove the contribution of designated training data from a trained model while preserving performance on the remaining data. Existing work mainly focuses on single-task settings, whereas modern models often operate in multi-task setups with shared backbones, where removing supervision for one task or instance can unintentionally affect others. We introduce multi-task unlearning with two settings: full-task unlearning, which removes a target instance from all tasks, and partial-task unlearning, which removes supervision only from selected tasks. We show that shared parameters couple the forget and retain sets, causing task-level interference on non-target tasks and instance-level interference on other instances. To address this issue, we propose an interference-aware framework that combines task-aware gradient projection, which constrains updates within task-specific subspaces, with instance-level gradient orthogonalization, which reduces conflicts between forget and retain signals. Experiments on two multi-task computer vision benchmarks across five tasks show that our method achieves effective unlearning while maintaining strong generalization, reducing UIS compared with the strongest baseline by 30.3% in full-task unlearning and 52.9% in partial-task unlearning.
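The idea of reducing conflict between forget and retain signals can be illustrated with a generic projection step. This is a minimal sketch of plain vector orthogonalization, not the paper's algorithm: it unconditionally removes from a forget gradient its component along a retain gradient, and every name in it is hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(forget_grad: Sequence[float],
                  retain_grad: Sequence[float],
                  eps: float = 1e-12) -> List[float]:
    """Return forget_grad with its component along retain_grad removed.

    Generic projection for illustration; the paper's method may differ.
    """
    denom = dot(retain_grad, retain_grad)
    if denom < eps:
        # Retain gradient is (near) zero, so there is nothing to project out.
        return list(forget_grad)
    coef = dot(forget_grad, retain_grad) / denom
    return [f - coef * r for f, r in zip(forget_grad, retain_grad)]


# Toy example: the forget update partly overlaps the retain direction.
forget = [1.0, 1.0]
retain = [1.0, 0.0]
adjusted = orthogonalize(forget, retain)

print(adjusted)               # [0.0, 1.0]
print(dot(adjusted, retain))  # 0.0, so the update no longer moves along retain
```

After the projection, a first-order step along the adjusted direction leaves the retain objective unchanged, which is the intuition behind keeping the forget and retain sets from interfering.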
Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance
arXiv:2605.18801v1 Announce Type: new Abstract: Data is fundamental to large language models (LLMs). However, understanding of what makes certain data useful for different stages of an LLM workflow, including training, tuning, alignment, in-context learning, etc., and why, remains an open question. Current approaches rely heavily on extensive experimentation with large public datasets to obtain empirical heuristics for data filtering and dataset construction. These approaches are compute intensive and lack a principled way of understanding the essence of how specific data characteristics drive LLM behavior. In this position paper, we advocate for the need of developing systematic methodologies for generating synthetic sequences from appropriately defined random processes, with the goal that these sequences can reveal useful characteristics when they are used in one or multiple stages of the LLM workflow. We refer to such sequences as data probes. By observing LLM behavior on data probes, researchers can systematically conduct studies on how data characteristics influence model performance, generalization, and robustness. The probing sequences exhibit statistical properties that can be viewed using theoretical concepts, such as typical sets, which are generalized to describe the behaviors of LLMs. This data-probe approach provides a pathway for uncovering foundational insights into the role of data in LLM training and inference, beyond empirical heuristics.
Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection
arXiv:2605.19285v1 Announce Type: cross Abstract: The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.
Google unveils Gemini Omni 'any-to-any' AI model: what enterprises should know
Although it was already discovered by intrepid AI power users weeks ago, Google's new Gemini Omni model officially debuted today at the company's annual I/O developer conference in Mountain View, California, and it marks a significantly new paradigm in the wider AI and tech marketplace. That's because as its "omni" (from the Latin omne — meaning "all") prefix would suggest, this is Google's first truly native, multimodal model, that is "a model that can create anything from any input — starting with video." The model marks Google's bid to collapse the multimodal generative stack — text-to-image, image-to-video, video-to-video, audio generation — into a single foundation model with a single editing surface. The big question for business leaders is: should you switch any of your own AI stack over to Gemini Omni now? Unfortunately, the truth is, you may not be able to just yet — the model is only available to individual users through Google's AI subscription plans starting with the $20 per user per month "AI Plus" plan. It can currently be accessed on the Gemini website and mobile apps, Google's web-based Flow AI image and video editing suite, and YouTube Shorts. While the company says it is ultimately going to be available via an application programming interface (API) — which many enterprises rely on for their AI needs — it's not ready yet. In a departure, Google also did not issue any public benchmarks for Gemini Omni (yet). However, third-party organizations will no doubt put it to the test on various tasks and user-reported quality metrics. In the meantime, though, its quality and speed remain somewhat subjective. 
But, given the capabilities and faster editing enabled by the new Omni model, individual members of your team should probably give serious consideration to switching over to it, especially if they work creating visuals for technical diagrams, marketing and comms materials, training and corporate education courses, sales collateral, and basically anything that involves visuals. What Omni actually is Omni is the next chapter of the work that produced Nano Banana, the image-generation and editing model Google shipped roughly a year ago. The first model in the family, Gemini Omni Flash, accepts any combination of text, images, audio, and video as input and produces high-quality output across the same modalities — all from a single model rather than a relay of specialized systems. Google says the model is "natively multimodal from the ground up," which matters less as marketing copy than as an architectural claim: a unified model can reason across modalities in the same forward pass, which generally translates into more coherent edits, fewer pipeline artifacts, and a far cleaner API surface for developers. OpenAI started this trend back in May 2024 with the release of GPT-4o, its first natively "omni" model, also trained from the ground-up to be able to analyze and generate multiple different types of content, from text to code, imagery, and audio. However, it did not support video generation, and the model was eventually deprecated following reports of sycophancy and even users demanding OpenAI retain it after developing parasocial relationships with it. Is Gemini Omni at risk of sparking a similarly devoted following? It remains to be seen. One big difference is that its headline interaction pattern is conversational video editing. Each instruction "builds on the last," and past directions persist across turns so the video evolves coherently as the user iterates. 
Practical examples Google highlighted include changing the world inside a clip, reimagining an action or camera angle, refining sequences over multiple turns, and generating explainer-style content from short prompts. Google also emphasizes improved physics — gravity, kinetic energy, fluid dynamics — which is the kind of detail that separates "looks like AI video" from "looks like footage." Rollout, pricing, and the API question The first thing enterprise leaders should read carefully is the rollout plan. Omni Flash is going live today inside the Gemini app for U.S. subscribers across AI Plus, AI Pro, and AI Ultra tiers — including the new $100-per-month AI Ultra plan Google announced at the same event. Google says it will roll out to developers via Vertex AI APIs "in the coming weeks." That gap is significant. Until the Vertex API is generally available, Omni is effectively a consumer and prosumer tool. Enterprise pilots beyond individual seat-based experimentation should wait for the API, both because that's where Google's enterprise SLAs and data-handling commitments live, and because production-grade generative video without a programmatic interface is a non-starter. Its pricing through the API per million tokens (presumably) will also determine its viability as an enterprise product outside of film/TV/entertainment and the arts productions. For decision-makers weighing seat economics in the meantime, the new AI Ultra tier is positioned specifically at developers, technical leads, knowledge workers, and advanced creators, with priority access to Google Antigravity, higher usage limits, and bundled Omni Flash access. For small creative teams under tight deadlines, that may be the fastest way to evaluate the model before the API arrives. 
The enterprise use cases that really matter It is easy to default to "marketing video" as the use case, but Omni's value proposition for enterprises is broader if you think of it as a programmable video and media engine rather than a creative app: Sales and marketing: rapid generation of variant ads, localized creative, and product demos without per-asset agency cycles. Internal communications, learning and development (L&D): explainer videos, onboarding modules, and policy walkthroughs produced by non-specialists. Customer support and documentation: dynamic, query-conditioned visual explainers attached to help articles. Product and engineering: visualization of simulations, UI walkthroughs, and concept videos for spec reviews. Field operations: short, situation-specific instructional clips generated on demand. What changes with Omni versus the previous generation of tools is the unification. Many enterprises stitched a workflow together from text-to-image, image-to-video, lip-sync, and voice models, each with its own contract, billing, and data path. A single Vertex AI-backed model collapses procurement and observability into one place — assuming the eventual API delivers production-grade throughput and latency. The governance story is the most underrated part For CIOs and CISOs, the most important section of Google's announcement is not the model card; it is the provenance and content-safety work shipping alongside it. Every video generated by Omni carries Google's SynthID digital watermark. Google is expanding C2PA Content Credentials across its generative tools, and launching an AI Content Detection API on Agent Platform that lets businesses identify AI-generated content from both Google and other popular models. Partner integrations announced at the same event — including Shutterstock, Avid (in Pro Tools), and at least one major newswire — indicate where the standard is going. 
For enterprises, this matters in three concrete ways: It gives legal and compliance teams a defensible audit trail for AI-generated media. It allows brand-safety teams to detect AI-generated material entering content pipelines from third parties. And it provides a defensible answer for regulators in jurisdictions, like the EU, that are tightening rules around synthetic-media disclosure. There is also a "Personal Avatars" program that lets creators record short videos to authorize use of their voice and likeness across generated content, as Google leaders and employees showcased themselves today in posts centered around I/O featuring their AI generated likenesses. This puts it in direct competition with Synthesia, a UK-based AI unicorn focused primarily on enterprise-safe AI videos and avatars. For enterprises considering executive videos, training avatars, or branded spokesperson content, the consent model here is the right starting point — but contracts and rights-management policies will need to extend to cover it. Risks worth flagging Omni's main risks are familiar but worth restating. The competitive landscape is crowded: the aforementioned Synthesia, TikTok parent company ByteDance's acclaimed Seedance model, Kuaishou Technology's Kling AI models, and the fast-improving open-source field all compete for the same workflows. Lock-in to any single video model is a real concern when output quality is still leapfrogging quarterly. Latency and cost for production-volume video generation remain unproven outside controlled demos. In addition, the legal status of training data for generative video is unsettled in multiple jurisdictions; enterprises should require clear indemnification language before deploying generated video into customer-facing channels.
Furthermore, VentureBeat collaborator and AI YouTuber Sam Witteveen, CEO of enterprise machine learning vendor Red Dragon AI, received early access to Gemini Omni and reported the content restrictions (which some deem to be censorship) to be quite strict, potentially restricting and inhibiting all the potential use cases an enterprise would like to pursue. Thoughts for enterprises considering adoption Omni is worth piloting — but the structure of the pilot matters. For most enterprises, the right move over the next 30 to 60 days is to fund a small, sanctioned experiment with one or two AI Ultra seats in marketing or L&D, while the platform and security teams use that runway to prepare for the Vertex AI API: define data-residency requirements, set up SynthID and C2PA verification in the content pipeline, and stand up the AI Content Detection API alongside existing media-governance tooling. Treat the consumer rollout as a UX preview, not a production plan. When the API arrives, the enterprises that have already done the governance work will be the ones moving Omni into real workflows while everyone else is still drafting policy. Omni is not, by itself, a reason to overhaul an enterprise AI strategy. But it is a strong signal that the multimodal generative stack is consolidating into single models with first-party provenance baked in — and that is a shift technical decision-makers should be planning around now.
KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition
arXiv:2605.19031v1 Announce Type: new Abstract: Kolmogorov-Arnold Networks (KANs) have demonstrated an exceptional ability to learn complex functions on clean, low-dimensional data but struggle to maintain performance on noisy and imperfect real-world datasets. In contrast, conventional multi-layer perceptrons (MLPs) are far more tolerant to noise and computationally efficient. Replacing all MLP components with KANs in HAR models often degrades accuracy and computational efficiency, highlighting an open challenge: how to combine KANs' precision with MLPs' noise robustness and efficiency. To address this, we systematically explore various placements of KAN modules within deep HAR networks and propose a hybrid architecture that strategically synergizes the strengths of both paradigms: it uses a KAN-based input embedding layer, retains MLP layers for intermediate feature mixing, and introduces a specialized LarctanKAN module for final activity classification. Across eight public HAR datasets, the hybrid KAN-MLP model achieves an average relative improvement in macro F1 score of 5.33% compared to the pure-MLP model, significantly outperforming standalone KAN and MLP baselines. Furthermore, integrating this hybrid strategy into other state-of-the-art HAR architectures consistently boosts their performance. Our findings demonstrate that a carefully orchestrated combination of KAN, MLP, or other conventional neural components yields more robust and accurate HAR models for real-world wearable sensing environments.
GRASP: Deterministic argument ranking in interaction graphs
arXiv:2605.19141v1 Announce Type: cross Abstract: Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging - a common LLM-as-a-Judge practice where a model provides a global verdict on a debate - suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency - a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.
The Accessibility Capability Boundary: Operational Limits and Expansion Potential of AI-Generated Browser-Native Accessibility Systems
arXiv:2605.19638v1 Announce Type: cross Abstract: As large language models (LLMs) demonstrate increasing competence in synthesizing functional user interfaces, a fundamental question emerges in accessibility computing: how far can AI-driven accessibility systems go? This paper introduces the Accessibility Capability Boundary (ACB), a formal framework for reasoning about the operational limits and expansion potential of autonomous accessibility systems, and grounds this theory in a real-world systems artifact. We model accessibility not as a binary compliance property but as a dynamic, multidimensional capability space constrained by measurable variables including deployment latency, cognitive load, infrastructure dependency, offline persistence, interaction complexity, and adaptability. We argue that AI-generated, browser-native systems constructed as single-file HTML artifacts leveraging standard browser APIs may dramatically shift the ACB outward by reducing deployment friction to near-zero and enabling rapid, context-specific interface adaptation. We ground our theoretical framework in the analysis of two real-world exploratory prototypes. The first is an AI-generated browser-native accessibility interface deployed for a blind user in Nepal. The second is a fully functional, open-source webcam alignment assistant for visually impaired users, serving as a concrete systems artifact. Through formal definitions, propositions, and a comparative evaluation matrix, we characterize the regions of the accessibility capability space that such systems can and cannot reach. We further identify remaining computational, infrastructural, and verification constraints that constitute the hard boundaries of this paradigm. This work contributes a theoretical foundation for understanding the scalable limits of autonomous accessibility computing and proposes a research agenda for future work in accessibility-aware AI systems.
Google I/O 2026 Was Not Just a Model Launch. It Was Google Showing the Agent Stack.
Google’s own language around I/O 2026 is “the agentic Gemini era.” Sundar Pichai framed the keynote around Gemini products, conversational AI, infrastructure, models, and agents, not around a single isolated model release.
Introducing Composer 2.5
Composer 2.5 is an update to Cursor’s AI coding model with improvements in long-running tasks, instruction following, and training methods.
Starchild-1: The First Real-Time Multimodal World Model
Starchild-1 introduces a real-time multimodal world model that generates synchronized audio and video while responding to streaming user input.
Morgan Stanley issues China-only iPhones to its Hong Kong bankers
US bank’s move reflects rising concern over data security for staff travelling to mainland China
Locked Out at 8,000 Miles: Why UK-China Partnership Students Are Suffering
arXiv:2605.19367v1 Announce Type: cross Abstract: University cybersecurity protocols have intensified dramatically in response to rising threats of data breaches, ransomware, and credential theft. While necessary, these measures have created a parallel crisis of accessibility - even for students physically on campus. This paper argues that domestic, on-campus students already face significant barriers: mandatory multi-factor authentication (MFA), device compliance rules, browser and operating system restrictions, and administrative remote-management permissions on personal phones and laptops. However, these difficulties are magnified to near-breaking point in the context of international partnerships, such as the increasingly common UK-China transnational education programmes. For a student in China accessing a UK university's virtual learning environment (VLE) from an 8-hour time difference, with no on-hand IT support during their active hours, the same security architecture becomes functionally disabling. Drawing on testimonies from public forums (Reddit's r/college, r/UniUK, r/Professors), higher education IT help boards, and student accounts from UK-China partnership programmes, this paper documents how over-engineering digital security disproportionately harms remote international learners. We show that while on-campus students can at least visit an IT desk or borrow a library terminal, their counterparts in partner institutions abroad face authentication failures, device lockouts, and unsupported browsers with no real-time remedy. The paper concludes that current university security models assume a co-located, 9-to-5, English-time-zone user - an assumption that fails both domestic students and, catastrophically, international partnership cohorts.
Understanding the modern cybercrime landscape
Throughout 2025, HPE observed significant changes in how cybercriminals operate. Analyzing real-world threats, our HPE Threat Labs highlighted an industrialization of the cyber criminals’ methods in its new In the Wild Report, enabling greater scale, speed and structure in their campaigns. They typically use automation and AI to exploit longstanding vulnerabilities, and many have adopted…
If everyone is rushing to board the AI ship why are so few workflows secure? | TechRadar
AI adoption outpaces security, governance and risk controls
Two-Thirds of Nonhuman Accounts Are Unseen and Unmanaged, According to New Identity Gap Report
New research shows identity dark matter continues to expand and erode enterprise identity, resulting in a fragile foundation for agent AI readiness and...
Securing the AI Supply Chain in the European Union - IT Security Guru
The EU’s AI strategy is entering a new phase. Keeper Security's Darren Guccione explains how cybersecurity is now a statutory obligation
Adoption, Deployment & Impact
Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production
arXiv:2605.18818v1 Announce Type: new Abstract: Academic research tends to focus on new models for document understanding creating a wide gap in the literature between model definition and running models at production scale. To close that gap, we present a microservice architecture that encapsulates pipelines of multiple models for classification, optical character recognition (OCR), and large language model structured field extraction as well as our experience running this pipeline on thousands of multi-page documents per hour. We describe our primary design decisions, including a hybrid classification, separation of GPU-bound inference from CPU-bound orchestration, use of asynchronous processing for the many IO-bound operations in the pipeline, and an independent, horizontal scaling strategy. Using batch profiling, we identified two surprising qualitative findings that shape production deployments: OCR, not language-model parsing, dominates end-to-end latency, and the system saturates at a concurrency determined by shared GPU-inference capacity rather than worker count. Our goal is to provide practitioners with concrete architectural patterns for building document understanding systems that work beyond the benchmark; effectively operationalizing models in production.
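The saturation finding in this abstract can be illustrated with a minimal Python sketch. It is not the authors' architecture: the names GPU_SLOTS and run_pipeline, the sleep durations, and the use of a semaphore to stand in for shared GPU-inference capacity are all assumptions made for illustration. It shows why adding CPU-side workers stops helping once every GPU slot is busy.

```python
import asyncio

GPU_SLOTS = 2  # hypothetical shared GPU-inference capacity, not a figure from the paper


async def run_pipeline(n_docs: int, n_workers: int) -> int:
    """Process n_docs with n_workers; return the peak number of concurrent GPU calls."""
    gpu = asyncio.Semaphore(GPU_SLOTS)  # models the shared inference service
    queue: asyncio.Queue = asyncio.Queue()
    for doc_id in range(n_docs):
        queue.put_nowait(doc_id)

    in_flight = 0
    peak = 0

    async def gpu_inference(doc_id: int) -> None:
        nonlocal in_flight, peak
        async with gpu:  # workers wait here once all GPU slots are taken
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for OCR or LLM inference
            in_flight -= 1

    async def worker() -> None:
        while True:
            try:
                doc_id = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await asyncio.sleep(0.001)  # stand-in for IO-bound fetch and orchestration
            await gpu_inference(doc_id)

    await asyncio.gather(*(worker() for _ in range(n_workers)))
    return peak


if __name__ == "__main__":
    for workers in (1, 4, 16):
        print(workers, "workers -> peak GPU concurrency", asyncio.run(run_pipeline(20, workers)))
```

In this toy model, peak GPU concurrency never exceeds GPU_SLOTS regardless of worker count, which is the behaviour the abstract describes qualitatively.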
Towards Zero Trust Architecture: A Pilot Study on Information Systems Security Readiness amongst Small and Medium Enterprises
arXiv:2605.18901v1 Announce Type: cross Abstract: Small and medium enterprises (SMEs) face growing cyber threats but often lack the resources and expertise needed to adopt Zero Trust Architecture (ZTA). This pilot study examines the drivers and barriers shaping SME perceptions of ZTA necessity and proposes an exploratory staged adoption path. Survey data from 64 IT and security professionals in the Asia-Pacific region show that ZTA familiarity and cloud-computing needs are the strongest positive correlates of perceived necessity, whereas accumulated barriers show only a weak negative association. Identity and access management complexity and scalability emerge as the main implementation hurdles. Based on these findings, we propose a three-stage route for SMEs: strengthening identity governance, segmenting high-value assets, and introducing targeted monitoring in line with operational capacity. The study offers early evidence for more realistic Zero Trust transitions in resource-constrained firms.
A Need for Nuance: The Economist’s Andrew Palmer
On today’s episode of the Me, Myself, and AI podcast, Andrew Palmer, senior editor at The Economist, describes how organizations can experiment with generative AI while balancing speed, quality, and risk. At his own organization, Andrew and others test artificial intelligence with human oversight to develop editing and publishing efficiencies. As the host of The […]
Council Post: Why Most Enterprise AI Fails After The Pilot Phase
AI does not usually fail in production. More often, the organization is not ready for it.
AI preparedness gap hits frontline industries hardest in 2026 - Outsource Accelerator
Hospitality, healthcare and logistics rank among the industries least prepared for AI workforce disruption in 2026, according to a new analysis.
“Poisoning the well:” EY retracts cyber report packed with AI slop | Cybernews
Consultancy group Ernst & Young (EY) has withdrawn a cybersecurity report after an investigation by GPTZero found that 70% of the citations within it were either fabricated or broken.
Online child safety campaigners call for US inquiry into Roblox
Groups claim game platform’s design and business model conflict with children’s developmental needs. Online child safety campaigners including Jonathan Haidt, the bestselling writer on the mental health impacts of social media, have called on the Trump administration to investigate Roblox, the booming gaming and chat platform used by 150 million people daily, including a large number of under-13s. Haidt’s Anxious Generation Movement, Fairplay and the rightwing anti-pornography National Center on Sexual Exploitation are among groups claiming Roblox’s design and business model conflict with children’s developmental needs.
Google Cloud suspended major customer Railway.com without cause, causing outage
This is the service we get when we spend $10m plus? asks automated code deployment outfit
Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI
Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing, and their accuracy rate is the highest yet recorded for this specific use case. "We are focused on ensuring our AI scribes can be trusted by physicians, medical practitioners and patients...the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat. The performance data the company is bringing to the table paints a stark picture of the current state of enterprise AI: when it comes to highly regulated, specialized industries, domain-specific models can beat out the foundation model providers. In a newly published research paper, Corti revealed that its new clinical-grade speech models reduced word error rates (WER) by up to 93% when compared against leading generalist speech models and APIs on medical terminology. On English medical terminology, its Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model registered a 17.7% WER, ElevenLabs hit 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%. Corti’s announcement serves as a critical inflection point for healthcare builders. While general-purpose APIs like OpenAI’s Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.
The agentic era demands flawless data inputs
The launch of Symphony for Speech-to-Text highlights a fundamental shift in how healthcare uses voice technology.
For decades, medical speech recognition was primarily about generating a static text document for human doctors to review—a digital replacement for a notepad. But as the healthcare industry hurtles into what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision-making, EHR navigation, and real-time support, the transcript is no longer the final product. It is the foundational data layer. “Speech has always been one of healthcare’s most important inputs,” Cleve said in a statement provided to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than simply producing a transcript - we need to give AI systems accurate clinical facts to reason from. If a model mishears a medication, dosage, or symptom, every downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer accurate enough to thrive in clinical reality.” This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription—turning "hyperthyroidism" into "hypothyroidism," or misinterpreting a critical medication dosage—every subsequent AI agent relying on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text. Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities—such as dosages, measurements, and dates. In contrast, Corti reported that the strongest general-purpose baseline model maxed out at just 44.3% recall for the same entities. 
For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.
Dethroning the industry leaders
While Corti’s benchmarks against modern LLM builders like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants. For years, the gold standard for dedicated clinician dictation has been Dragon Medical One. However, these legacy systems were historically optimized strictly for intentional clinician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools. In evaluations of real-world English medical dictation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (a 19% relative improvement). Furthermore, Corti demonstrated a higher medical term recall than Dragon (93.5% versus 92.9%). By providing this level of accuracy via an API endpoint, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to build their own custom dictation and ambient listening tools that outperform the industry's legacy incumbent. "We want people to build apps atop our models," Cleve said. "The goal is to diffuse the technology as widely as it is needed so it can be as helpful as possible to patients and their doctors and professionals." For Cleve and his co-founders, the mission is a personal one: Cleve's own mother was a healthcare professional who was attacked by a patient and spent years struggling to recover. He sought to improve healthcare processes as a way of honoring her sacrifice.
Solving the healthcare model puzzle
The demands of healthcare extend far beyond English-speaking hospitals, and global health systems have historically been underserved by clinical NLP models.
Early adopters are already leveraging Corti’s new models in linguistically demanding environments, proving the technology's viability in complex international markets. Switzerland, for instance, requires care delivery across multiple languages, often simultaneously within a single medical institution. It serves as one of the most stringent proving grounds for multilingual medical speech models in the world. Corti’s Symphony models demonstrated massive performance gains in these non-English tests, achieving a 2.4% WER in German (compared to 13.0% for the next-best system) and a 3.9% WER in French (versus 10.6%). “In a clinical conversation, every word matters - a missed medication name, a misheard dosage, or a mistranscribed symptom can change the meaning of an encounter,” said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. “Symphony’s accuracy on clinical terminology gives us the foundation to bring more trusted AI capabilities into clinical workflows with our Voicepoint Xenon platform. When Corti improves the speech layer, the workflows we build together become sharper, safer, and more useful for clinicians in Switzerland.”
AI verticalization and specialization are yielding gains
Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it is the culmination of a strategic narrative Corti has been aggressively pushing over the last several weeks. The broader Symphony platform, which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations, has been systematically proving the defensibility of vertical AI labs against horizontal tech giants. This marks the third major benchmark Corti has released in just six weeks, touching different layers of healthcare AI performance.
In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, tackling one of healthcare’s most notoriously complex workflows. And just last week, Corti announced that its flagship clinical-grade model outscored OpenAI on HealthBench Professional, OpenAI’s own healthcare benchmark. Taken together, these three data points (medical coding, clinical reasoning, and speech-to-text accuracy) illustrate a growing consensus in the enterprise technology sector: generalized models are hitting a ceiling in regulated industries. Models deployed in hospitals must inherently understand complex acronyms, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance constraints. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to generalized large language models cannot easily cross.
Availability and product lineup
Developers are clearly taking notice of the performance gap. According to momentum data provided to VentureBeat, Corti is seeing a 30% growth in new sign-ups for its platform in quarter-to-date comparisons, signaling that developers and healthcare builders are actively gravitating toward vertical, clinical-grade models over generalist APIs. Corti, which already serves over 100 million patients annually across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the default engine for the next generation of healthcare software. It is important to note that Corti is not launching the overarching Symphony platform itself today; rather, Symphony for Speech-to-Text operates as a new, distinct capability within that broader ecosystem, accessible via its own API endpoints. Symphony for Speech-to-Text is generally available starting today.
Developers and enterprise architects can access the models via the Corti API console, with full technical documentation available to help integrate the clinical-grade speech layer into their existing applications. In a move toward research transparency, Corti has also published its full research paper detailing its methodology, along with a separate comparison tool designed to support transparent evaluation of medical speech recognition systems across the industry. As the healthcare industry continues its rapid embrace of AI-driven automation, the foundational data layer has never been more critical. Corti’s latest launch is a stark reminder that in the medical field, generic AI simply isn't good enough. The future belongs to the specialists.
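For readers who want to check the article's headline numbers, the following Python sketch shows how word error rate and a relative reduction are conventionally computed. It is a generic illustration, not Corti's evaluation code: the function names are hypothetical, the example sentences are invented, and the benchmark figures plugged in at the end are the ones quoted in the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance (substitutions, insertions, deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline` (both are error rates)."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # One misheard word out of three, the kind of error the article warns about.
    print(word_error_rate("patient has hyperthyroidism", "patient has hypothyroidism"))
    # Figures quoted in the article: Dragon 5.7% vs Corti 4.6% WER on dictation.
    print(relative_reduction(5.7, 4.6))
    # Parakeet 18.9% vs Corti 1.4% WER on English medical terminology.
    print(relative_reduction(18.9, 1.4))
    # Entity recall gap is a difference in percentage points, not a ratio.
    print(98.3 - 44.3)
```

On the quoted figures, the dictation comparison works out to roughly a 19% relative improvement and the largest terminology gap to roughly 93%, consistent with the article, while the recall gap is about 54 percentage points.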
Evaluating the Utility of Personal Health Records in Personalized Health AI
arXiv:2605.18937v1 Announce Type: new Abstract: Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health, but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries when provided clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set, and with clinician ratings for a subset (n=95), with both sets of raters knowing the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs, and provide a framework for monitoring for gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.
Grab bets on new delivery robots to fix Singapore’s ‘supply-constrained markets’ and solve the last-mile problem
The Southeast Asian tech company will launch a pilot of its first delivery robot in Singapore’s Punggol district in late 2026.
Decentralized autonomous organization and blockchain-based incentivization framework for community-based facilities management
arXiv:2605.18773v1 Announce Type: cross Abstract: Traditional facility management often relies on centralized decision-making structures that limit stakeholder participation, leading to misalignment with occupant needs and reduced satisfaction. This paper proposes a novel blockchain- and Decentralized Autonomous Organization (DAO)-based framework for community-based facilities management in smart buildings. The framework comprises two key components: a decentralized governance platform that facilitates transparent collective decision-making through blockchain-based voting, and a maintenance management platform with an incentivization mechanism that encourages building occupants to actively contribute to facility upkeep through tokenized rewards. System evaluation includes cost analysis, scalability, data security considerations, usability testing, and semi-structured interviews with facility managers and researchers to assess the platform's usefulness, challenges, and adoption potential. The findings demonstrate the framework's potential as a viable incentivization solution for engaging stakeholders in the collective upkeep and improvement of building infrastructure.
AI startup Unframe raises $50 million Series B after surpassing $100 million in contracts within a year | CTech
The Israeli startup, founded by former Noname Security executives, says it has signed more than $100 million in multi-year enterprise AI contracts as companies race to move AI projects from pilot stages into full-scale deployment.
A New Personal Finance Experience in ChatGPT
OpenAI is previewing a personal finance experience in ChatGPT that lets U.S. Pro users connect financial accounts and view spending.
Tinder Is Betting Gen Z Daters Would Rather Be Offline
The dating app’s turnaround bets on live events, group dating and an AI-heavy redesign.
Zurich-based AVIAN raises €2.2 million to prevent industrial fires with always-on AI thermal monitoring
AVIAN, a Zurich-based industrial AI company building 24/7 thermal monitoring for high-risk operations, has raised a €2.2 million ($2.6 million) pre-Seed round, led by Founderful. The company was profitable and entirely bootstrapped for two years before raising capital. With this round, it plans to expand engineering and deployment capacity and scale beyond its stronghold in […]
Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use
arXiv:2605.19151v1 Announce Type: new Abstract: We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, observed through a probit likelihood on binary approve/deny feedback, and escalates to the human exactly where the approval outcome is most uncertain. We show this is structurally an instance of Preferential Bayesian Optimization, inheriting its inference machinery (approximate Gaussian-process classification) and its sample-efficiency argument (uncertainty-targeted querying), while differing in objective: classifying an action space into allow/block/ask regions rather than optimizing a design.
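The allow/block/ask idea in this abstract can be sketched in a few lines of Python. This is not the paper's implementation: it assumes the Gaussian-process posterior at a proposed action has already been summarised as a latent mean and variance, uses the standard closed-form probit predictive probability, and introduces a hypothetical `margin` threshold to decide when the outcome is uncertain enough to escalate to a human.

```python
import math


def approval_probability(mean: float, variance: float) -> float:
    """P(approve) for latent f ~ N(mean, variance) under a probit likelihood.

    Uses the closed form Phi(mean / sqrt(1 + variance)), with Phi the standard normal CDF.
    """
    z = mean / math.sqrt(1.0 + variance)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gate(mean: float, variance: float, margin: float = 0.3) -> str:
    """Classify an action as 'allow', 'block', or 'ask' (escalate to a human).

    `margin` is an illustrative assumption: any action whose approval probability
    lies within `margin` of 0.5 is treated as too uncertain to decide automatically.
    """
    p = approval_probability(mean, variance)
    if abs(p - 0.5) < margin:
        return "ask"
    return "allow" if p > 0.5 else "block"


if __name__ == "__main__":
    print(gate(3.0, 0.1))   # confident approval
    print(gate(-3.0, 0.1))  # confident denial
    print(gate(3.0, 50.0))  # same mean, but high posterior variance
```

Note that high posterior variance pulls the predicted probability toward 0.5, so an action the gateway has little evidence about is escalated even when its posterior mean looks favourable.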
CEO of AI-powered performance review firm says annual evaluations weren’t designed for the AI era: ‘The practice just hasn’t kept up’
15Five CEO David Hassell told Fortune that companies must stop relying on the annual performance review cadence.
Survey Finds AI Culture Gap Quietly Undermining Workplaces - Ghanamma.com
A new workplace study reveals that while employees are rapidly embracing artificial intelligence (AI), many are doing so without guidance, transparency or institutional support, creating what researchers describe as a hidden risk to organisational productivity and innovation.
Why enterprise decision-making Is moving beyond traditional hierarchies - The Economic Times
As organisations become more interconnected and operationally complex, traditional hierarchies are slowing decision-making, collaboration, and execution. Increasingly, enterprises are shifting toward decentralised leadership, cross-functional teams, and distributed decision-making models to ...
14K Subs, AI Writes 60% of Code, and the Consulting Boom Nobody Saw Coming 🎯
Airbnb revealed this week that AI is now responsible for generating 60% of its new code. Not assisting. Not suggesting. Generating. The company’s engineering teams are focused on reviewing, directing, and integrating — not writing from scratch.
Microsoft Work Trend Index 2026 Shows AI Productivity Is Not Enough
The 2026 Work Trend Index shows that marginal AI productivity gains are outpacing organizational redesign that might harness AI for durable strategic advantage.
81% of Enterprise Technology Leaders Report Production Failures from AI-Generated Code, New Research Shows
CloudBees released its State of Code Abundance report, revealing that 81% of enterprise tech leaders have seen production failures from AI-generated code....
How High-Performance Computing and AI Accelerated Applied Energy Research in 2025 - CleanTechnica
Kestrel Supercomputer Advanced More Than 500 Energy Modeling and Simulation Projects By Julia Medeiros Coad The National Laboratory of the Rockies’ (NLR’s) advanced computing capabilities continue to grow with the ...
In Small-Business Credit, The AI Question Isn’t Settled And The Operators Disagree
After more than a decade of putting AI to work on small-business loans, the verdict is split. New 2025 Federal Reserve data and a regulatory rewrite are forcing the industry to pick a side.
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts
arXiv:2605.19093v1 Announce Type: new Abstract: System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization framework based on embedding by elicitation. Given a task description, previously evaluated prompts, and scalar scores, an LLM elicits a compact, interpretable feature space and maps prompts into it. Leveraging a probabilistic Gaussian process surrogate, an acquisition function then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Re-eliciting the feature space as new evaluations arrive lets the representation adapt to the observed prompt-score history. We evaluate the setting using offline benchmark accuracy as a controlled aggregate proxy: the optimizer observes one scalar score per prompt and no per-example labels, errors, or critiques. Across ten system prompt optimization tasks with a 30 total evaluation budget, ReElicit achieves the strongest aggregate performance profile among representative aggregate-only prompt-optimization baselines. These results suggest that LLMs can serve as adaptive semantic representation builders, not only prompt generators, for Bayesian optimization over natural-language artifacts.
Frustrated franchisee sues Pizza Hut over crappy kitchen AI
The Hut stands accused of breaching its franchise agreement by forcing 'algorithmic behaviors that slowed production and delivery' on restaurants, leading to $100M in losses one group wants back
Implement AI in the mid-cycle of rev cycle for the biggest return | Healthcare Finance News
ROI shows fairly quickly, and tools can be used right away to advance from simple to more complex cases, says Jeff Francis, CFO and VP for the Methodist Health System.
Five ways contractors can turn a growing compliance burden into a competitive advantage with AI | Federal News Network
What was once a back-office function is now directly impacting how firms win work, deliver programs and withstand audit scrutiny.
Banking's Most Boring Layer Is Quietly Becoming AI's Best ROI Story - Benzinga
The chatbots get most of the press. Some of the more measurable returns on AI in financial services are appearing in areas that do not look glamorous: compliance, AML, and fraud. The economics may now be influencing how some banks approach AI purchasing decisions. For a long time, compliance ...
Geopolitics, Policy & Governance
SMIC founder and AMEC CEO urge Chinese fabs to test domestic chipmaking tools on active production lines — equipment makers post record revenue but falling margins | Tom's Hardware
The broadcast itself is ultimately ... supply lines are cut. ...
The Power of the AI Chip: The Techno-statecraft Approach in the US-China Great Power Rivalry - Modern Diplomacy
The recent H200 chip export authorisation marks a new form of power signalling in the technopolitics sphere.
EU taps Sweden’s EQT to manage major €5bn Scale-up Europe Fund
The EU has chosen Swedish investment giant EQT to run a new €5bn fund aimed at keeping Europe’s most promising deep tech companies on home soil.
Industry minister says Australia should not just be AI customer
Australia's Minister for Industry and Innovation Tim Ayres stated the government aims to invest in R&D and heavy industries to ensure the country is not merely a consumer of AI technology.
South Korea's defense chips 99% import-dependent as photonic supply chain crisis looms
Photonic semiconductors have become indispensable to modern defense — prized for their ultra-high-speed data processing, high capacity, low power consumption, and exceptional reliability. Yet despite their growing strategic importance, South Korea remains almost entirely dependent on foreign ...
India eyes global IP leadership in AI, 6G and electronics
India's patent and telecom leaders urged a shift toward design-led innovation and stronger intellectual property creation, positioning AI and 6G at the center of the country's knowledge-economy ambitions.
Beyond Nutrition Labels: How Analogical Reasoning Shapes Synthetic Media Disclosure Design
arXiv:2605.19045v1 Announce Type: new Abstract: As synthetic media proliferates, AI policymakers and practitioners have increasingly turned to disclosures (signals describing how media has been created or modified by AI) to help audiences evaluate media credibility. While there is a growing body of research on user interpretations, the upstream decision-making processes that affect users remain underexplored. This study therefore examines how AI policymakers and practitioners design synthetic media disclosures under complex sociotechnical constraints. Drawing on 23 expert interviews and 13 case studies from organizations participating in the Partnership on AI's Synthetic Media Framework, analysis identifies key disclosure goals, including process transparency and harm reduction, and two central tensions that emerge when pursuing those goals: normativity versus neutrality and proactivity versus precision. Findings highlight the role of analogical reasoning, from nutrition labels to Prop 65 warnings, in managing, but not resolving, tensions. Ultimately, this study emphasizes the need for scholarship focused on AI transparency decision-makers and their use of analogical reasoning to support audiences encountering media in the AI age.
European Commission delivers draft high-risk AI guidelines after delays | IAPP
The European Commission released draft guidelines 19 May aimed at supporting "providers, deployers and other relevant actors in determining whether an AI system falls within the high-risk category." The three-phased guide brings clarity around implementation of high-risk requirements while ...
Three copyright rulings and an EU deadline have rewritten the rules for AI images
Three Copyright Office reports, a UK ruling, and an EU deadline have reshaped the legal landscape for AI-generated images used by businesses.
AI Omnibus Implementation Rollbacks — Bloomsbury Intelligence and Security Institute (BISI)
The European Union (EU) is attempting to preserve its manufacturing dominance and keep industrial growth stable by decoupling industrial AI from the AI Act, risking a fragmented regulatory landscape that favours established companies over startups.
EU investment-screening overhaul gets final nod from lawmakers
The EU has approved a revamp of its investment-screening rules, requiring member states to consistently screen foreign deals in sensitive sectors like critical technologies and infrastructure.
US panel weighs if Anthropic risk finding within bounds or 'spectacular overreach'
A US panel is evaluating whether recent risk findings regarding Anthropic's AI models constitute appropriate regulatory oversight or an overreach.
Nancy Mace Pushes Limits on AI Data Centers | Newsmax.com
Rep. Nancy Mace, R-S.C., on Monday called for a one-year moratorium on new data center construction in her home state, arguing the rapid expansion of artificial intelligence infrastructure is driving up electricity demand...
MiniMax, Nanonoble push for dismissal of studios' US copyright case
MiniMax and Nanonoble filed replies in support of dismissing US copyright claims filed by Disney, Universal and Warner Bros. Discovery over their AI image and video generating service, Hailuo AI.
Bipartisan Bill Would Impose New Annual Fee on Electric Vehicles
A House transportation bill introduced this week would require owners of electric cars to pay $130 to cover the cost of road repairs.
"Innovation Without Governance Becomes Institutional Risk" – African Media Leaders Examine AI And Broadcast Compliance - Broadcast Media Africa
As artificial intelligence rapidly reshapes broadcasting across Africa, industry leaders are warning that the future success of broadcasters will depend not only on how quickly they adopt AI, but on how responsibly they use it. This was the central message emerging from the webinar “AI and ...