Wed 22 April 2026
Daily Brief — Curated and contextualised by Best Practice AI
Google Funds AI Adoption, Anthropic Secures $100bn, and Insurers Cap Cyber Risks
TL;DR Google Cloud has launched a $750 million fund to help consulting firms adopt AI technologies. Anthropic and Amazon have agreed on a $100 billion deal to enhance AI infrastructure. Insurers like Beazley and QBE are moving to cap cyber payouts related to AI and 'LLMjacking'. Meanwhile, Vista Equity Partners is partnering with Google Cloud to accelerate AI deployment across its software portfolio.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Google Launches $750 Million Fund for Consultants to Adopt AI
Alphabet Inc.’s Google Cloud is launching a $750 million fund to help consulting firms including McKinsey & Co., Accenture Plc and Deloitte bring agentic artificial intelligence to their clients.
Anthropic and Amazon agree $100bn AI infrastructure deal
Start-up behind Claude tool seeks to bulk up on chips and computing power after suffering outages this year
Generative AI at Work: From Exposure to Adoption across 35 European Countries
arXiv:2604.18849v1 Announce Type: new Abstract: Generative AI diffuses at pace across European workplaces, but unevenly. Using the 2024 European Working Conditions Survey of more than 36,600 workers across 35 countries, we examine who adopts generative AI and whether early adoption has begun to reshape the task content of jobs. Adoption averages 12% but ranges from under 3% to 25% across countries. Although occupational exposure strongly predicts uptake, AI does not diffuse passively along exposure lines. At the worker level, individual skills, non-routine cognitive job content within occupations, and employee say in organisational decisions steepen the exposure-adoption gradient; at the country level, so do digitalisation and workplace training provision. A gender gap persists, concentrated in the most exposed occupations. A shift-share design finds no detectable effect of early adoption on worker-reported technology-related task restructuring, consistent with a transitional phase in which AI is fitted into changing work processes rather than actively reshaping them.
AutomationBench
arXiv:2604.18934v1 Announce Type: new Abstract: We introduce AutomationBench, a benchmark for evaluating AI agents on cross-application workflow orchestration via REST APIs, drawing on real workflow patterns from Zapier's platform. Agents must discover endpoints themselves, follow layered business rules, and write correct data to each system; grading is programmatic and end-state only, and even the best frontier models currently score below 10%.
Who Benefits from AI? Self-Selection, Skill Gap, and the Hidden Costs of AI Feedback
arXiv:2409.18660v2 Announce Type: replace Abstract: Using data from over five years and 52,000 individuals on an online chess platform, we show that motivated and higher-skilled individuals self-select into AI feedback use, creating an illusion of AI effectiveness: apparent learning gains disappear once endogenous motivation is accounted for. The same selection mechanism widens the skill gap and, as 42 platform-level natural experiments show, causally reduces intellectual diversity.
Google Cloud Releases New TPU Chip Lineup in Bid to Speed Up AI
Alphabet Inc.’s Google Cloud division unveiled the latest generation of its tensor processing unit, or TPU, a homegrown chip that’s designed to make AI computing services faster and more efficient.
Vista Strikes Deal to Speed Up Google AI in Software Portfolio
Vista Equity Partners is partnering with Alphabet Inc.’s Google Cloud to accelerate the deployment of artificial intelligence across the private equity firm’s portfolio of more than 90 software firms.
Insurers move to cap cyber payouts related to AI and ‘LLMjacking’
Beazley and QBE are among the groups proposing to limit losses from the rapidly advancing technology
Governed Auditable Decisioning Under Uncertainty: Synthesis and Agentic Extension
arXiv:2604.19112v1 Announce Type: new Abstract: When automated decision systems fail, organizations frequently discover that formally compliant governance infrastructure cannot reconstruct what happened or why. This paper synthesizes an operational governance evidence framework -- structural accountability collapse diagnostics, decision trace schemas, evidence sufficiency measurement, and label-free monitoring -- into an integrated chain and analytically assesses its transferability across four decision system architectures. The cross-architecture comparison reveals a governance coverage gradient: deterministic rule engines achieve full DES-property fillability, hybrid ML+rules systems achieve partial fillability, classical ML systems achieve only minimal fillability, and agentic AI systems encounter structural breaks. We introduce the cascade of uncertainty, showing how governance failures propagate through serial dependencies between framework layers. For agentic systems, we identify three structural breaks -- decision diffusion, evidence fragmentation, and responsibility ambiguity -- and propose corresponding analytical extensions. Four propositions formalize the gradient, cascade compounding, delegation-depth effects, and extension sufficiency, establishing boundary conditions for the framework's valid operating envelope.
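The paper's decision trace schema is not reproduced in the abstract; purely as a rough illustration, a minimal trace record might carry fields like the following (the names are hypothetical, not the paper's):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DecisionTrace:
    """Illustrative decision trace record; field names are hypothetical,
    not taken from the paper."""
    decision_id: str
    timestamp: datetime       # when the decision was taken
    inputs: dict              # the data the system actually saw
    logic_version: str        # which rules or model weights produced it
    output: str               # the decision taken
    rationale: str            # human-readable justification, if any
    accountable_party: str    # who is answerable for the decision
```

On the gradient the paper describes, a deterministic rule engine can fill every such field; an agentic system that diffuses a decision across many tool calls may have no single rationale or accountable party to record.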
The modern data stack was built for humans asking questions. Google just rebuilt it for agents taking action.
Enterprise data stacks were built for humans running scheduled queries. As AI agents increasingly act autonomously on behalf of businesses around the clock, that architecture is breaking down and vendors are racing to rebuild it. Google's answer, announced at Cloud Next on Wednesday, is the Agentic Data Cloud: a Knowledge Catalog that automates semantic metadata curation, a cross-cloud lakehouse that lets BigQuery query Iceberg tables on AWS S3 with no egress fees, and a Data Agent Kit that drops MCP tools into developer IDEs so data engineers describe outcomes rather than write pipelines.
Forget call centers, local energy prices mean Britain's latest offshoring wave is AI projects
Brit firms look to run tech overseas as govt tries to support 'sovereign' creators
One in five UK firms have already moved AI workloads abroad due to high energy costs, in findings likely to alarm a government counting on AI to drive economic growth.…
Elite law firm Sullivan & Cromwell admits to AI ‘hallucinations’
Firm whose partners bill more than $2,000 per hour apologises to judge for software-driven errors in bankruptcy case
Economics & Markets
Google Launches $750 Million Fund for Consultants to Adopt AI
Alphabet Inc.’s Google Cloud is launching a $750 million fund to help consulting firms including McKinsey & Co., Accenture Plc and Deloitte bring agentic artificial intelligence to their clients.
Bezos’s Project Prometheus AI lab nears $38bn valuation in funding deal
Company code-named Project Prometheus is working on models for industrial applications
Tesla’s Cooling AI Hype Overshadows Blowout Earnings Forecasts
Tesla Inc. investors are in for a rare treat Wednesday afternoon: an earnings report that analysts say should be a blowout. The trouble is the actual numbers are likely to get overlooked as Wall Street seeks evidence that Elon Musk’s artificial intelligence and robotics ventures justify the stock’s sky-high valuation.
🍏 Apple’s AI bet got a CEO
Apple’s board picked John Ternus, senior vice president of Hardware Engineering, to succeed Tim Cook on September 1.
EQT Chief Sees AI Rout as Opportunity for Fresh Tech Bets
The broad selloff in public markets triggered by fears of artificial intelligence disruption provides an opportunity to snap up technology firms on the cheap, according to private equity giant EQT AB.
Amazon investing up to $25bn in Anthropic AI infrastructure deal
This latest investment is in addition to the $8bn Amazon has already invested in the AI company.
Anthropic attracts a $5 billion investment from Amazon, which also commits a further $20 billion
The AI unicorn will invest $100 billion over 10 years to build 5 GW of AWS computing capacity and reach revenues of $30 billion. The firm is also holding talks with the Pentagon over a contract. On Tuesday 21 April, Anthropic, the US unicorn founded by siblings Dario Amodei and Daniela Amodei, said it had consolidated its partnership with the NYSE-listed tech […]
Musk’s SpaceX Goals Shift Ahead of Its I.P.O.
As SpaceX prepares to go public, Mr. Musk has proposed moonshots that differ from the company’s original aim of reaching Mars.
DeepSeek in Talks to Raise at $20 Billion Value, Information Says
Chinese technology giants Tencent Holdings Ltd. and Alibaba Group Holding Ltd. are in discussions to invest funds into DeepSeek that would value the artificial intelligence startup above $20 billion, according to The Information.
SpaceX Strikes Deal With Cursor for $60 Billion
The potential acquisition comes as Elon Musk’s rocket and satellite maker, which has been emphasizing artificial intelligence, is preparing to go public.
SpaceX obtains right to buy AI start-up Cursor for $60bn
Elon Musk’s rocket and AI conglomerate is seeking to catch up to rivals OpenAI and Anthropic
SpaceX agrees rights to buy AI coding darling Cursor for $60bn
As it vies to catch up with rivals like OpenAI and Anthropic, SpaceX has done a deal enabling it to purchase the fast-growing AI coding start-up Cursor.
The Godmother of Silicon Valley and her former student want to fix how healthcare gets built
Fail fast, revise, repeat: Esther Wojcicki brings her classroom philosophy to healthcare investing with the launch of Treehub.
Labor, Society & Culture
AI hallucinations found in high-profile Wall Street law firm filing
Sullivan & Cromwell apologises to New York federal judge for string of errors in documents for Prince Group case
The elite Wall Street law firm Sullivan & Cromwell has told a court that a major filing it made in a high-profile case contained errors resulting from hallucinations generated by artificial intelligence. Andrew Dietderich, the co-head of the firm's global restructuring group, apologised in a letter to the New York federal judge Martin Glenn on Saturday for the string of mistakes, which included inaccurate citations.
The AI Governance Mirage: Why 72% of Enterprises Don't Have the Control and Security They Think They Do
A survey has found that 72% of enterprises don't have the control and security they think they do when it comes to AI governance.
Anthropic investigates report of rogue access to hack-enabling Mythos AI
‘Handful’ of people allegedly gain unauthorised access to model adept at detecting cybersecurity vulnerabilities
The AI developer Anthropic has confirmed it is investigating a report that unauthorised users have gained access to its Mythos model, which it has warned poses risks to cybersecurity. The US startup made the statement after Bloomberg reported on Wednesday that a small group of people had accessed the model, which has not been released to the public because of its ability to enable cyber-attacks.
ARES: Adaptive Red-Teaming and End-to-End Repair of Policy-Reward System
arXiv:2604.18789v1 Announce Type: new Abstract: Reinforcement Learning from Human Feedback (RLHF) is central to aligning Large Language Models (LLMs), yet it introduces a critical vulnerability: an imperfect Reward Model (RM) can become a single point of failure when it fails to penalize unsafe behaviors. While existing red-teaming approaches primarily target policy-level weaknesses, they overlook what we term systemic weaknesses: cases where both the core LLM and the RM fail in tandem. We present ARES, a framework that systematically discovers and mitigates such dual vulnerabilities. ARES employs a "Safety Mentor" that dynamically composes semantically coherent adversarial prompts by combining structured component types (topics, personas, tactics, goals) and generates corresponding malicious and safe responses. This dual-targeting approach exposes weaknesses in both the core LLM and the RM simultaneously. Using the discovered vulnerabilities, ARES implements a two-stage repair process: first fine-tuning the RM to better detect harmful content, then leveraging the improved RM to optimize the core model. Experiments across multiple adversarial safety benchmarks demonstrate that ARES substantially enhances safety robustness while preserving model capabilities, establishing a new paradigm for comprehensive RLHF safety alignment.
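The compositional step the abstract describes (topics, personas, tactics, goals) amounts to a Cartesian product over component inventories. A minimal sketch with invented components, leaving out the Safety Mentor model that makes the real prompts semantically coherent:

```python
from itertools import product

# Toy component inventories; the paper's actual taxonomies are not given here.
topics   = ["account recovery", "medical advice"]
personas = ["worried parent", "security researcher"]
tactics  = ["roleplay framing", "gradual escalation"]
goals    = ["elicit unsafe instructions", "elicit policy-violating content"]

def compose_prompts():
    """Enumerate structured adversarial prompt skeletons; the real system
    uses a generator model to turn these into coherent prompts."""
    for topic, persona, tactic, goal in product(topics, personas, tactics, goals):
        yield (f"As a {persona}, discuss {topic} using {tactic} "
               f"with the aim to {goal}.")

print(next(compose_prompts()))
```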
Reasoning Structure Matters for Safety Alignment of Reasoning Models
arXiv:2604.18946v1 Announce Type: new Abstract: Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks but often generate harmful responses to malicious user queries. This paper investigates the underlying cause of these safety risks and shows that the issue lies in the reasoning structure itself. Based on this insight, we claim that effective safety alignment can be achieved by altering the reasoning structure. We propose AltTrain, a simple yet effective post-training method that explicitly alters the reasoning structure of LRMs. AltTrain is both practical and generalizable, requiring no complex reinforcement learning (RL) training or reward design, only supervised finetuning (SFT) with a lightweight set of 1K training examples. Experiments across LRM backbones and model sizes demonstrate strong safety alignment, along with robust generalization across reasoning, QA, summarization, and multilingual settings.
Anthropic probing reported Mythos leak on Discord
Bloomberg reports that users gained access to Mythos the same day Anthropic announced its limited release.
Who Benefits from AI? Self-Selection, Skill Gap, and the Hidden Costs of AI Feedback
arXiv:2409.18660v2 Announce Type: replace Abstract: Feedback from artificial intelligence (AI) is increasingly easy to access and research has already established that people learn from it. But individuals choose when and how to seek such feedback, and more engaged and motivated individuals may seek it more, creating an illusion of effectiveness that masks self-selection. We investigate how the endogenous choice to seek AI feedback shapes both individual learning and collective outcomes. Using data from over five years and 52,000 individuals on an online chess platform, we show that motivated and higher-skilled individuals self-select into AI feedback use, and use it more productively. This self-selection creates an illusion of AI effectiveness: apparent learning gains disappear once endogenous motivation is accounted for. This same selection mechanism drives two population-level consequences. Because motivated, higher-skilled individuals benefit disproportionately, AI access widens the skill gap. And because individuals exposed to centralized AI feedback converge on common input from a centralized AI source, intellectual diversity declines. Leveraging 42 platform-level natural experiments, we show this diversity reduction is causal. Self-selection into AI use thus connects individual-level learning dynamics to collective-level consequences, a micro-macro linkage with implications for organizational learning, human capital development, and the design of AI-augmented work.
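The selection mechanism is easy to see in a toy simulation (all parameters invented): when motivation drives both feedback seeking and improvement, a naive seekers-versus-non-seekers comparison shows a large "AI effect" even when the true effect is zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

motivation = rng.normal(size=n)                            # latent trait
seeks_ai = rng.random(n) < 1 / (1 + np.exp(-motivation))   # motivated seek more
true_ai_effect = 0.0                                       # feedback adds nothing here
improvement = 0.5 * motivation + true_ai_effect * seeks_ai + rng.normal(size=n)

naive = improvement[seeks_ai].mean() - improvement[~seeks_ai].mean()
print(f"naive 'AI effect': {naive:.3f}")                   # clearly positive

# Condition on motivation (coarse bins) and the gap largely disappears.
edges = np.quantile(motivation, np.linspace(0, 1, 11)[1:-1])
bins = np.digitize(motivation, edges)
adj = np.mean([improvement[(bins == b) & seeks_ai].mean()
               - improvement[(bins == b) & ~seeks_ai].mean()
               for b in range(10)
               if seeks_ai[bins == b].any() and (~seeks_ai[bins == b]).any()])
print(f"within-motivation 'AI effect': {adj:.3f}")         # near zero
```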
Students Know AI Should Not Replace Thinking, but How Do They Regulate It? The TACO Framework for Human-AI Cognitive Partnership
arXiv:2604.18737v1 Announce Type: new Abstract: As generative artificial intelligence becomes increasingly embedded in educational practice, a central concern is whether students use AI as cognitive support or as a substitute for thinking. Prior research shows that learners recognise this boundary conceptually and acknowledge that "AI should not replace thinking." However, whether such awareness translates into structured regulation during actual AI use remains unclear. Drawing on data from Hong Kong secondary students, this study examines how learners perceive their management of the boundary between assistance and outsourcing in practice. Findings show that awareness did not consistently translate into regulation; ethical belief did not necessarily lead to strategic execution; and conceptual endorsement did not guarantee operational behaviour. These findings suggest that the challenge is not teaching students that AI should not replace thinking, as they already know this, but providing them with structured mechanisms to regulate how AI is used within learning processes. In response, the study introduces the TACO framework (Think-Ask-Check-Own), a process-oriented model designed to operationalise the boundary between cognitive support and cognitive substitution. By shifting attention from ethical awareness to cognitive regulation, the study contributes a learner-grounded approach to sustaining AI as a dynamic cognitive partner in education.
Towards More Empathic Programming Environments: An Experimental Empathic AI-Enhanced IDE
arXiv:2604.19142v1 Announce Type: cross Abstract: As generative AI becomes integral to software development, the risk of over-reliance and diminished critical thinking grows. This study introduces "Ceci," our Caring Empathic C IDE designed to support novice programmers by prioritizing learning and emotional support over direct code generation. We conducted a comparative pilot study between Ceci and VSCode + ChatGPT [9, 40]. Participants completed a coding task and were evaluated using the NASA-TLX workload assessment and a post-test usability survey. Although the sample size was small (n = 11), results show no significant difference in perceived effectiveness, learning, and workload between the experimental Ceci group and the control group, though Ceci users reported significantly greater perceived helpfulness in error correction (p = 0.0220). These findings suggest that empathic responses may not be sufficient on their own to enhance learners' outcomes or perceptions, or to reduce workload. Overall, this study provides a foundational framework for future research, which should explore larger sample sizes, diverse programming tasks, and additional empathic features to better understand the potential of empathic programming environments in supporting novice programmers, and should ensure that empathic features are well integrated into the user interface.
Critical Thinking in the Age of Artificial Intelligence: A Survey-Based Study with Machine Learning Insights
arXiv:2604.18590v1 Announce Type: cross Abstract: The growing use of artificial intelligence (AI) in education, professional work, and everyday problem-solving has raised important questions about its effect on human reasoning. While AI can improve efficiency, save time, and support learning, repeated dependence on it may also encourage cognitive offloading, reduce productive struggle, and weaken independent critical thinking. This paper investigates the relationship between AI-use behavior and critical-thinking performance through an interview-based survey combined with short logic and reasoning tasks. The findings reveal a mixed pattern: participants largely viewed AI as a tool for speed, convenience, and learning support, yet many also reported reduced patience for sustained effort. Objective reasoning performance varied considerably across individuals, and the analyses suggest that reduced patience and stronger dependence-related tendencies are more closely associated with lower reasoning performance than background characteristics alone. Exploratory clustering further indicates that AI users do not form a single homogeneous group, but instead reflect tentative behavioral profiles, including over-reliant users, mixed-strategy users, and balanced support-seekers. Although the findings are exploratory, they indicate that AI does not affect critical thinking in a uniformly negative or positive way. Instead, its influence appears to depend on the manner in which it is used. The paper therefore argues that effective human-AI collaboration should support reflection, verification, and sustained cognitive effort rather than substitute for them.
Physical and Augmented Reality based Playful Activities for Refresher Training of ASHA Workers in India
arXiv:2604.18959v1 Announce Type: cross Abstract: Recent health surveys in India highlight the alarming child malnutrition levels and lower rates of complete child immunization in many parts of India. Previous research reports that the conventional training pedagogy of CHWs (Community Healthcare Workers), or ASHAs (Accredited Social Health Activists), in India is ineffective in enhancing their capacity. Considering that CHWs are getting equipped with smartphones, this calls for a rethinking of their training pedagogy using an ICT approach. Two refresher training tools were developed to make learning the child immunization schedule more exciting and conceptually engaging for ASHAs. The physical and AR (Augmented Reality) versions of the designed card games were compared for effectiveness and knowledge retention, pre- and post-intervention, through questionnaire tests conducted immediately before and after playing multiple sessions. The AR-based play was found to be better for learning and knowledge retention, with more engagement, mainly due to its interactive and intuitive nature.
Technology & Infrastructure
AutomationBench
arXiv:2604.18934v1 Announce Type: new Abstract: Existing AI benchmarks for software automation rarely combine cross-application coordination, autonomous API discovery, and policy adherence. Real business workflows demand all three: a single task may span a CRM, inbox, calendar, and messaging platform - requiring the agent to find the right endpoints, follow a policy document, and write correct data to each system. To address this gap, we introduce AutomationBench, a benchmark for evaluating AI agents on cross-application workflow orchestration via REST APIs. Drawing on real workflow patterns from Zapier's platform, tasks span Sales, Marketing, Operations, Support, Finance, and HR domains. Agents must discover relevant endpoints themselves, follow layered business rules, and navigate environments with irrelevant and sometimes misleading records. Grading is programmatic and end-state only: whether the correct data ended up in the right systems. Even the best frontier models currently score below 10%. AutomationBench provides a challenging, realistic measure of where current models stand relative to the agentic capabilities businesses actually need.
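The end-state grading the abstract describes can be sketched in a few lines: compare the records you expect in each system against that system's final state, ignoring how the agent got there. The systems and record shapes below are invented for illustration.

```python
def grade_end_state(expected: dict, final_state: dict) -> bool:
    """Pass only if every expected record exists, field-for-field, in the
    right system; the agent's intermediate actions are ignored."""
    for system, records in expected.items():
        present = final_state.get(system, [])
        for rec in records:
            if not any(all(row.get(k) == v for k, v in rec.items())
                       for row in present):
                return False
    return True

expected = {"crm": [{"contact": "a@x.com", "stage": "qualified"}],
            "calendar": [{"title": "Intro call", "invitee": "a@x.com"}]}
final = {"crm": [{"contact": "a@x.com", "stage": "qualified", "owner": "bot"}],
         "calendar": []}   # the agent forgot the calendar write
print(grade_end_state(expected, final))  # False
```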
The modern data stack was built for humans asking questions. Google just rebuilt it for agents taking action.
Enterprise data stacks were built for humans running scheduled queries. As AI agents increasingly act autonomously on behalf of businesses around the clock, that architecture is breaking down — and vendors are racing to rebuild it. Google's answer, announced at Cloud Next on Wednesday, is the Agentic Data Cloud. The architecture has three pillars:

Knowledge Catalog. Automates semantic metadata curation, inferring business logic from query logs without manual data steward intervention.
Cross-cloud lakehouse. Lets BigQuery query Iceberg tables on AWS S3 via private network with no egress fees.
Data Agent Kit. Drops MCP tools into VS Code, Claude Code and Gemini CLI so data engineers describe outcomes rather than write pipelines.

"The data architecture has to change now," Andi Gutmans, VP and GM of Data Cloud at Google Cloud, told VentureBeat. "We're moving from human scale to agent scale."

From system of intelligence to system of action

The core premise behind Agentic Data Cloud is that enterprises are moving from human-scale to agent-scale operations. Historically, data platforms have been optimized for reporting, dashboarding, and some forecasting — what Google characterizes as "reactive intelligence." In that model, humans interpret data and decide what to do. Now, with AI agents increasingly expected to take actions directly on behalf of the business, Gutmans argued that data platforms must evolve into systems of action.

"We need to make sure that all of enterprise data can be activated with AI, that includes both structured and unstructured data," Gutmans said. "We need to make sure that there's the right level of trust, which also means it's not just about getting access to the data, but really understanding the data."

The Knowledge Catalog is Google's answer to that problem. It is an evolution of Dataplex, Google's existing data governance product, with a materially different architecture underneath. Where traditional data catalogs required data stewards to manually label tables, define business terms and build glossaries, the Knowledge Catalog automates that process using agents. The practical implication for data engineering teams is that the Knowledge Catalog scales to the full data estate, not just the curated subset that a small team of data stewards can maintain by hand. The catalog covers BigQuery, Spanner, AlloyDB and Cloud SQL natively, and federates with third-party catalogs including Collibra, Atlan and Datahub. Zero-copy federation extends semantic context from SaaS applications including SAP, Salesforce Data360, ServiceNow and Workday without requiring data movement.

Google's lakehouse goes cross-cloud

Google has had a data lakehouse called BigLake since 2022. Initially it was limited to just Google data, but in recent years it has gained some limited federation capabilities enabling enterprises to query data in other locations. Gutmans explained that the previous federation worked through query APIs, which limited the features and optimizations BigQuery could bring to bear on external data. The new approach is storage-based sharing via the open Apache Iceberg format. That means whether the data is in Amazon S3 or in Google Cloud, he argued, it doesn't make a difference. "This truly means we can bring all the goodness and all the AI capabilities to those third-party data sets," he said.

The practical result is that BigQuery can query Iceberg tables sitting on Amazon S3 via Google's Cross-Cloud Interconnect, a dedicated private networking layer, with no egress fees and price-performance Google says is comparable to native AWS warehouses. All BigQuery AI functions run against that cross-cloud data without modification. Bidirectional federation in preview extends to Databricks Unity Catalog on S3, Snowflake Polaris and the AWS Glue Data Catalog using the open Iceberg REST Catalog standard.

From writing pipelines to describing outcomes

The Knowledge Catalog and cross-cloud lakehouse solve the data access and context problems. The third pillar addresses what happens when a data engineer actually sits down to build something with all of it. The Data Agent Kit ships as a portable set of skills, MCP tools and IDE extensions that drop into VS Code, Claude Code, Gemini CLI and Codex. It does not introduce a new interface. The architectural shift it enables is a move from what Gutmans called a "prescriptive copilot experience" to intent-driven engineering. Rather than writing a Spark pipeline to move data from source A to destination B, a data engineer describes the outcome — a cleaned dataset ready for model training, a transformation that enforces a governance rule — and the agent selects whether to use BigQuery, the Lightning Engine for Apache Spark or Spanner to execute it, then generates production-ready code.

"Customers are kind of sick of building their own pipelines," Gutmans said. "They're truly more in the review kind of mode, than they are in the writing the code mode."

Where Google and its rivals diverge

The premise that agents require semantic context, not just data access, is shared across the market. Databricks has Unity Catalog, which provides governance and a semantic layer across its lakehouse. Snowflake has Cortex, its AI and semantic layer offering. Microsoft Fabric includes a semantic model layer built for business intelligence and, increasingly, agent grounding. The dispute is not over whether semantics matter — everyone agrees they do. The dispute is over who builds and maintains them. "Our goal is just to get all the semantics you can get," Gutmans explained, noting that Google will federate with third-party semantic models rather than require customers to start over. Google is also positioning openness as a differentiator, with bidirectional federation into Databricks Unity Catalog and Snowflake Polaris via the open Iceberg REST Catalog standard.

What this means for enterprises

Google's argument — and one echoed across the data infrastructure market — is that enterprises are behind on three fronts:

Semantic context is becoming infrastructure. If your data catalog is still manually curated, it will not scale to agent workloads — and Gutmans argues that gap will only widen as agent query volumes increase.
Cross-cloud egress costs are a hidden tax on agentic AI. Storage-based federation via open Iceberg standards is emerging as the architectural answer across Google, Databricks and Snowflake. Enterprises locked into proprietary federation approaches should be stress-testing those costs at agent-scale query volumes.
The pipeline-writing era is ending. Gutmans argues that data engineers who move toward outcome-based orchestration now will have a significant head start.
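The article stays at the prose level; for a concrete flavor of the cross-cloud piece, querying a BigLake table backed by Iceberg data looks roughly like this with the google-cloud-bigquery client. The project, dataset, and table names are hypothetical, and the cross-cloud connection to S3 is assumed to be configured out of band.

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-project")  # hypothetical project

# Hypothetical BigLake table backed by Iceberg data on S3, reachable via a
# cross-cloud connection configured separately.
sql = """
SELECT region, SUM(revenue) AS total
FROM `my-project.lake.orders_iceberg`
GROUP BY region
ORDER BY total DESC
"""
for row in client.query(sql).result():
    print(row["region"], row["total"])
```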
Google’s new Deep Research and Deep Research Max agents can search the web and your private data
Google on Monday unveiled the most significant upgrade to its autonomous research agent capabilities since the product's debut, launching two new agents — Deep Research and Deep Research Max — that for the first time allow developers to fuse open web data with proprietary enterprise information through a single API call, produce native charts and infographics inside research reports, and connect to arbitrary third-party data sources through the Model Context Protocol (MCP).

The release, built on Google's Gemini 3.1 Pro model, marks an inflection point in the rapidly intensifying race to build AI systems that can autonomously conduct the kind of exhaustive, multi-source research that has traditionally consumed hours or days of human analyst time. It also represents Google's clearest bid yet to position its AI infrastructure as the backbone for enterprise research workflows in finance, life sciences, and market intelligence — industries where the stakes of getting information wrong are extraordinarily high.

"We are launching two powerful updates to Deep Research in the Gemini API, now with better quality, MCP support, and native chart/infographics generation," Google CEO Sundar Pichai wrote on X. "Use Deep Research when you want speed and efficiency, and use Max when you want the highest quality context gathering & synthesis using extended test-time compute — achieving 93.3% on DeepSearchQA and 54.6% on HLE."

Both agents are available starting today in public preview via paid tiers of the Gemini API, accessible through the Interactions API that Google first introduced in December 2025.

Why Google built two research agents instead of one

The launch introduces a tiered architecture that reflects a fundamental tension in AI agent design: the tradeoff between speed and thoroughness.

Deep Research, the standard tier, replaces the preview agent Google released in December and is optimized for low-latency, interactive use cases. It delivers what Google describes as significantly reduced latency and cost at higher quality levels compared to its predecessor. The company positions it as ideal for applications where a developer wants to embed research capabilities directly into a user-facing interface — think a financial dashboard that can answer complex analytical questions in near-real time.

Deep Research Max occupies the opposite end of the spectrum. It leverages extended test-time compute — a technique where the model spends more computational cycles iteratively reasoning, searching, and refining its output before delivering a final report. Google designed it for asynchronous, background workflows: the kind of task where an analyst team kicks off a batch of due diligence reports before leaving the office and expects exhaustive, fully sourced analyses waiting for them the next morning.

The Google DeepMind team framed the distinction on X: "Deep Research: Optimized for speed and efficiency. Perfect for interactive apps needing quicker responses. Deep Research Max: It uses extra time to search and reason. Ideal for exhaustive context gathering and tasks happening in the background."

"Deep Research was our first hosted agent in the API and has gained a ton of traction over the last 3 months, very excited for folks to test out the new agents and all the improvements, this is just the start of our agents journey," Logan Kilpatrick, who leads developer relations for Google's AI efforts, wrote on X.
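Google's announcement does not include code, and the Interactions API's exact shape is not documented in this piece. Purely as a hypothetical sketch of what a call might look like over plain HTTP (the endpoint path, agent id, and payload fields are guesses, not the real API):

```python
import requests

API_KEY = "..."  # Gemini API key

# Hypothetical request; consult the Gemini API docs for the real
# Interactions API endpoint and payload shape.
resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/interactions",
    headers={"x-goog-api-key": API_KEY},
    json={
        "agent": "deep-research-max",  # hypothetical agent id
        "input": "Summarize recent funding rounds for AI coding startups.",
        "tools": ["google_search"],
    },
    timeout=600,  # Max runs long by design
)
resp.raise_for_status()
print(resp.json())
```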
MCP support lets the agents tap into private enterprise data for the first time

Perhaps the most consequential feature in today's release is the addition of Model Context Protocol support, which transforms Deep Research from a sophisticated web research tool into something more closely resembling a universal data analyst. MCP, an emerging open standard for connecting AI models to external data sources, allows Deep Research to securely query private databases, internal document repositories, and specialized third-party data services — all without requiring sensitive information to leave its source environment.

In practical terms, this means a hedge fund could point Deep Research at its internal deal-flow database and a financial data terminal simultaneously, then ask the agent to synthesize insights from both alongside publicly available information from the web. Google disclosed that it is actively collaborating with FactSet, S&P, and PitchBook on their MCP server designs, a signal that the company is pursuing deep integration with the data providers that Wall Street and the broader financial services industry already rely on daily.

The goal, according to the blog post authored by Google DeepMind product managers Lukas Haas and Srinivas Tadepalli, is to "let shared customers integrate financial data offerings into workflows powered by Deep Research, and to enable them to realize a leap in productivity by gathering context using their exhaustive data universes at lightning speed."

This addresses one of the most persistent pain points in enterprise AI adoption: the gap between what a model can find on the open internet and what an organization actually needs to make decisions. Until now, bridging that gap required significant custom engineering. MCP support, combined with Deep Research's autonomous browsing and reasoning capabilities, collapses much of that complexity into a configuration step. Developers can now run Deep Research with Google Search, remote MCP servers, URL Context, Code Execution, and File Search simultaneously — or turn off web access entirely to search exclusively over custom data. The system also accepts multimodal inputs including PDFs, CSVs, images, audio, and video as grounding context.

Native charts and infographics turn AI reports into stakeholder-ready deliverables

The second headline feature — native chart and infographic generation — may sound incremental, but it addresses a practical limitation that has constrained the usefulness of AI-generated research outputs in professional settings. Previous versions of Deep Research produced text-only reports. Users who needed visualizations had to export the data and build charts themselves, a friction point that undermined the promise of end-to-end automation. The new agents generate high-quality charts and infographics inline within their reports, rendered in HTML or Google's Nano Banana format, dynamically visualizing complex datasets as part of the analytical narrative.

"The agent generates HTML charts and infographics inline with the report. Not screenshots. Not suggestions to 'visualize this data.' Actual rendered charts inside the markdown output," noted AI commentator Shruti Mishra on X, capturing the practical significance of the change.

For enterprise users — particularly those in finance and consulting who need to produce stakeholder-ready deliverables — this transforms Deep Research from a tool that accelerates the research phase into one that can potentially produce near-final analytical products.
Combined with a new collaborative planning feature that lets users review, guide, and refine the agent's research plan before execution, and real-time streaming of intermediate reasoning steps, the system gives developers granular control over the investigation's scope while maintaining the transparency that regulated industries demand.

How Deep Research evolved from a consumer chatbot feature to enterprise platform infrastructure

Today's release crystallizes a strategic narrative Google has been building for months: Deep Research is not merely a consumer feature but a piece of infrastructure that powers multiple Google products and is now being offered to external developers as a platform. The blog post explicitly notes that when developers build with the Deep Research agent, they tap into "the same autonomous research infrastructure that powers research capabilities within some of Google's most popular products like Gemini App, NotebookLM, Google Search and Google Finance." This suggests that the agent available through the API is not a stripped-down version of what Google uses internally but the same system, offered at platform scale.

The journey to this point has been remarkably rapid. Google first introduced Deep Research as a consumer feature in the Gemini app in December 2024, initially powered by Gemini 1.5 Pro. At the time, the company described it as a personal AI research assistant that could save users hours by synthesizing web information in minutes. By March 2025, Google upgraded Deep Research with Gemini 2.0 Flash Thinking Experimental and made it available for anyone to try. Then came the upgrade to Gemini 2.5 Pro Experimental, where Google reported that raters preferred its reports over competing deep research providers by more than a 2-to-1 margin. The December 2025 release was the pivot to developer access, when Google launched the Interactions API and made Deep Research available programmatically for the first time, powered by Gemini 3 Pro and accompanied by the open-source DeepSearchQA benchmark.

The underlying model driving today's improvements is Gemini 3.1 Pro, which Google released on February 19, 2026. That model represented a significant leap in core reasoning: on ARC-AGI-2, a benchmark evaluating a model's ability to solve novel logic patterns, 3.1 Pro scored 77.1% — more than double the performance of Gemini 3 Pro. Deep Research Max inherits that reasoning foundation and layers autonomous research behaviors on top of it, achieving 93.3% on DeepSearchQA (up from 66.1% in December) and 54.6% on Humanity's Last Exam (up from 46.4%).

Google faces a crowded field of competitors building autonomous research agents

Google is not operating in a vacuum. The launch arrives amid intensifying competition in the autonomous research agent space. OpenAI has been developing its own agent capabilities within ChatGPT under the codename Hermes, which includes an agent builder, templates, scheduling, and Slack integration, according to reports circulating on social media. Perplexity has built its business around AI-powered research. And a growing ecosystem of startups is attacking various slices of the automated research workflow.

What distinguishes Google's approach is the combination of its search infrastructure — which gives Deep Research access to the broadest and most current index of web information available — with the MCP-based connectivity to enterprise data sources.
No other company currently offers a research agent that can simultaneously query the open web at Google Search's scale and navigate proprietary data repositories through a standardized protocol.

The pricing structure also signals Google's intent to drive adoption: according to Sim.ai, which tracks model pricing, the Deep Research agent in the December preview was priced at $2 per million input tokens and $2 per million output tokens with a 1 million token context window — positioning it as cost-competitive for the volume of research output it generates.

Not everyone greeted the announcement with unalloyed enthusiasm, however. Several users on X noted that the new agents are available only through the API, not in the Gemini consumer app. "Not on Gemini app," observed TestingCatalog News, while another user wrote, "Google keeps punishing Gemini App Pro subscribers for some reason." Others raised concerns about the presentation of benchmark results, with one user arguing that Google's charts could be "misleading" in how they represent percentage improvements. These complaints point to a broader tension in Google's AI strategy: the company is increasingly directing its most advanced capabilities toward developers and enterprise customers who access them through APIs, while consumer-facing products sometimes lag behind.

What Deep Research Max means for finance, biotech, and the future of knowledge work

The practical implications of today's launch are most immediately felt in industries that depend on exhaustive, multi-source research as a core business function. In financial services, where analysts routinely spend hours assembling due diligence reports from scattered sources — SEC filings, earnings transcripts, market data terminals, internal deal memos — Deep Research Max offers the possibility of automating the initial research phase entirely. The FactSet, S&P, and PitchBook partnerships suggest Google is serious about making this work with the data infrastructure that financial professionals already use. In life sciences, the blog post notes that Google has collaborated with Axiom Bio, which builds AI systems to predict drug toxicity, and found that Deep Research unlocked new levels of initial research depth across biomedical literature. In market research and consulting, the ability to produce stakeholder-ready reports with embedded visualizations and granular citations could compress project timelines from days to hours.

The key question is whether the quality and reliability of these automated outputs will meet the standards that professionals in these fields demand. Google's benchmark numbers are impressive, but benchmarks measure performance on standardized tasks — real-world research is messier, more ambiguous, and often requires the kind of judgment that remains difficult to automate.

Deep Research and Deep Research Max are available now in public preview via paid tiers of the Gemini API, with availability on Google Cloud for startups and enterprises coming soon.

Eighteen months ago, Deep Research was a feature that helped grad students avoid drowning in browser tabs. Today, Google is betting it can replace the first shift at an investment bank. The distance between those two ambitions — and whether the technology can actually close it — will define whether autonomous research agents become a transformative category of enterprise software or just another AI demo that dazzles on benchmarks and disappoints in the conference room.
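One piece worth making concrete from the MCP section above: exposing a private data source to a research agent means standing up a small MCP server. A minimal sketch, assuming the MCP Python SDK's FastMCP helper; the "deal-flow" tool and its data are invented for illustration:

```python
# pip install mcp  (sketch assumes the MCP Python SDK's FastMCP interface)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("deal-flow")  # hypothetical internal data source

@mcp.tool()
def search_deals(keyword: str) -> list[dict]:
    """Return matching records from a private deal-flow store (stubbed)."""
    DEALS = [{"name": "Acme Series B", "sector": "fintech"}]  # stand-in data
    return [d for d in DEALS if keyword.lower() in d["name"].lower()]

if __name__ == "__main__":
    mcp.run()  # exposes the tool over MCP for an agent to query
```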
Google claims to have all the answers for enterprise AI agent sprawl
As biz agentic bot-wrangling intensifies, company says AI orchestration, security and infrastructure tools on the way
Google Cloud Next Google has overhauled its enterprise AI strategy in the wake of the agentic push across the biz landscape, rebranding and expanding its Vertex AI developer platform into what it now calls the Gemini Enterprise Agent Platform.…
Realm Raises $4.5M to Bring the ‘Cursor Moment’ to Enterprise Sales
HELSINKI, April 22, 2026 /PRNewswire/ — Realm has raised a $4.5 million Seed round to speed up enterprise sales cycles. Its platform gives AI the structured context needed to automate deal-defining materials like RFP responses. The round was led by Frontline Ventures, with participation from HubSpot Ventures, Slack Co-founder Cal Henderson and Deel Co-founder Alex Bouaziz. Realm […]
Realm raises €3.8 million to bring AI agents into enterprise sales, plans to triple its team by year-end
Realm, a Helsinki-based startup that builds a structured understanding of a company’s go-to-market and turns it into execution, has raised a €3.8 million ($4.5 million) Seed round to speed up enterprise sales cycles. The round was led by Frontline Ventures, with participation from HubSpot Ventures, Slack co-founder Cal Henderson and Deel co-founder Alex Bouaziz. “Tools […]
Human-Guided Harm Recovery for Computer Use Agents
arXiv:2604.18847v1 Announce Type: new Abstract: As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but also effectively remediate harm when prevention fails. We formalize a solution to this neglected challenge in post-execution safeguards as harm recovery: the problem of optimally steering an agent from a harmful state back to a safe one in alignment with human preferences. We ground preference-aligned recovery through a formative user study that identifies valued recovery dimensions and produces a natural language rubric. Our dataset of 1,150 pairwise judgments reveals context-dependent shifts in attribute importance, such as preferences for pragmatic, targeted strategies over comprehensive long-term approaches. We operationalize these learned insights in a reward model, re-ranking multiple candidate recovery plans generated by an agent scaffold at test time. To evaluate recovery capabilities systematically, we introduce BackBench, a benchmark of 50 computer-use tasks that test an agent's ability to recover from harmful states. Human evaluation shows our reward model scaffold yields higher-quality recovery trajectories than base agents and rubric-based scaffolds. Together, these contributions lay the foundation for a new class of agent safety methods -- ones that confront harm not only by preventing it, but by navigating its aftermath with alignment and intent.
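The re-ranking step is simple to picture: generate several candidate recovery plans, score each with the learned reward model, and keep the best. A toy sketch with the reward model stubbed by a heuristic (the paper trains it from 1,150 pairwise human judgments):

```python
def rubric_reward(plan: str) -> float:
    """Stand-in for the learned reward model; this toy heuristic prefers
    targeted, pragmatic plans, echoing the preferences the paper reports."""
    score = 0.0
    if "restore backup" in plan:
        score += 1.0
    if "notify user" in plan:
        score += 0.5
    score -= 0.1 * len(plan.split())  # mild penalty for sprawling plans
    return score

candidates = [
    "Wipe the machine and reinstall everything from scratch.",
    "restore backup of the overwritten file and notify user of the change",
]
best = max(candidates, key=rubric_reward)
print(best)
```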
Anthropic and Amazon agree $100bn AI infrastructure deal
Start-up behind Claude tool seeks to bulk up on chips and computing power after suffering outages this year
Google Cloud Releases New TPU Chip Lineup in Bid to Speed Up AI
Alphabet Inc.’s Google Cloud division unveiled the latest generation of its tensor processing unit, or TPU, a homegrown chip that’s designed to make AI computing services faster and more efficient.
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
arXiv:2604.18982v1 Announce Type: new Abstract: Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and lack theoretical grounding. We propose SAVOIR (ShApley Value fOr SocIal RL), a novel principled framework grounded in cooperative game theory. Our approach combines two complementary principles: expected utility shifts evaluation from retrospective attribution to prospective valuation, capturing an utterance's strategic potential for enabling favorable future trajectories; Shapley values ensure fair credit distribution with axiomatic guarantees of efficiency, symmetry, and marginality. Experiments on the SOTOPIA benchmark demonstrate that SAVOIR achieves new state-of-the-art performance across all evaluation settings, with our 7B model matching or exceeding proprietary models including GPT-4o and Claude-3.5-Sonnet. Notably, even large reasoning models consistently underperform, suggesting social intelligence requires qualitatively different capabilities than analytical reasoning.
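For small dialogues, Shapley attribution can be computed exactly by enumerating coalitions of utterances. A self-contained sketch with an invented characteristic function over three utterances (SAVOIR itself pairs Shapley values with prospective expected utility rather than brute enumeration):

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Exact Shapley values for a small coalition game v: frozenset -> float."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (v(S | {p}) - v(S))
        phi[p] = total
    return phi

# Toy dialogue game: episode reward depends on which utterances are "kept"
# (values invented for illustration).
def v(S):
    base = 0.0
    if "greeting" in S:
        base += 0.2
    if "proposal" in S:
        base += 1.0
    if "proposal" in S and "concession" in S:
        base += 0.5  # synergy between the two utterances
    return base

print(shapley(["greeting", "proposal", "concession"], v))
```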
Understanding the Mechanism of Altruism in Large Language Models
arXiv:2604.19260v1 Announce Type: new Abstract: Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful shifts in allocations. Leveraging this contrast, we identify a set of SAE features (0.024% of all features across the model's layers) whose activations are strongly associated with the behavioral shift. To interpret these features, we use benchmark tasks motivated by dual-process theories to classify a subset as primarily heuristic (System 1) or primarily deliberative (System 2). Causal interventions validate their functional role: activation patching and continuous steering of this feature direction reliably shift allocation distributions, with System 2 features exerting a more proximal influence on the model's final output than System 1 features. The same steering direction generalizes across multiple social-preference games. Together, these results enhance our understanding of artificial cognition by translating altruistic behaviors into identifiable network states and provide a framework for aligning LLM behavior with human values, thereby informing more transparent and value-aligned deployment.
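Continuous steering of a feature direction reduces to adding a scaled vector to an activation. A minimal numpy sketch with random stand-ins for the activation and the SAE-derived "altruism" direction (in the paper, the direction comes from identified SAE features, not random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512
h = rng.normal(size=d)      # a residual-stream activation (stand-in)
v = rng.normal(size=d)
v /= np.linalg.norm(v)      # unit feature direction (stand-in)

def steer(h: np.ndarray, v: np.ndarray, alpha: float) -> np.ndarray:
    """Shift the activation along the feature direction; positive alpha
    would push toward generous allocations, negative toward selfish ones."""
    return h + alpha * v

print(float(steer(h, v, 4.0) @ v - h @ v))  # projection moves by ~alpha
```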
From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS
arXiv:2604.18873v1 Announce Type: new Abstract: Large language models (LLMs) are highly capable at language generation, but they remain unreliable when reasoning requires explicit symbolic structure, multi-step inference, and interpretable uncertainty. This paper presents a neuro-symbolic framework for translating natural-language reasoning problems into executable formal representations using first-order logic (FOL) and Narsese, the language of the Non-Axiomatic Reasoning System (NARS). To support this direction, we introduce NARS-Reasoning-v0.1, a benchmark of natural-language reasoning problems paired with FOL forms, executable Narsese programs, and three gold labels: True, False, and Uncertain. We develop a deterministic compilation pipeline from FOL to executable Narsese and validate retained examples through runtime execution in OpenNARS for Applications (ONA), ensuring that the symbolic targets are not only syntactically well formed but also behaviorally aligned with the intended answer. We further present Language-Structured Perception (LSP), a formulation in which an LLM is trained to produce reasoning-relevant symbolic structure rather than only a final verbal response. As an initial proof of concept, we also train and release a Phi-2 LoRA adapter on NARS-Reasoning-v0.1 for three-label reasoning classification, showing that the benchmark can support supervised adaptation in addition to executable evaluation. Overall, the paper positions executable symbolic generation and execution-based validation as a practical path toward more reliable neuro-symbolic reasoning systems.
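For orientation, a toy example of what compiling logic into Narsese looks like, using standard Narsese inheritance syntax as accepted by ONA. The two rewrite cases below are illustrative only; the paper's deterministic pipeline is far more general:

```python
# Toy illustration of FOL -> Narsese compilation using standard
# Narsese inheritance syntax. Not the paper's actual rule set.

def to_narsese(fact: tuple) -> str:
    kind, *args = fact
    if kind == "forall_implies":        # forall x: P(x) -> Q(x)
        p, q = args
        return f"<{p} --> {q}>."
    if kind == "instance":              # P(c)
        pred, const = args
        return f"<{{{const}}} --> {pred}>."
    raise ValueError(kind)

program = [
    to_narsese(("instance", "bird", "tweety")),        # Bird(tweety)
    to_narsese(("forall_implies", "bird", "animal")),  # forall x: Bird(x) -> Animal(x)
]
print("\n".join(program))
# <{tweety} --> bird>.
# <bird --> animal>.
# ONA could then be asked: <{tweety} --> animal>?
```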
DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning
arXiv:2604.18964v1 Announce Type: new Abstract: This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtypes.
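The underlying task reduces to reasoning over a typed graph. A small sketch of the structure, with invented table names: schema nodes connected by foreign-key and lineage edges, queried with a type-restricted reachability search:

```python
# Sketch of the graph DW-Bench reasons over: schema nodes with edges
# typed as foreign-key (FK) or data lineage. Names are invented.

from collections import defaultdict, deque

edges = [
    ("orders", "customers", "fk"),        # orders.customer_id -> customers.id
    ("stg_orders", "orders", "lineage"),  # staging table feeds orders
    ("dim_customer", "customers", "lineage"),
]

graph = defaultdict(list)
for src, dst, kind in edges:
    graph[src].append((dst, kind))

def reachable(start: str, goal: str, allowed: set[str]) -> bool:
    """BFS restricted to a subset of edge types, e.g. lineage-only paths."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt, kind in graph[node]:
            if kind in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

print(reachable("stg_orders", "customers", {"fk", "lineage"}))  # True
print(reachable("stg_orders", "customers", {"lineage"}))        # False: last hop is FK
```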
OpenAI's ChatGPT Images 2.0 is here and it does multilingual text, full infographics, slides, maps, even manga — seemingly flawlessly
It's been only a few months since OpenAI released its last big improvement to AI image generation in ChatGPT and through its application programming interface (API): GPT-Image-1.5, released in December 2025, which brought improved instruction following, colors, and lighting.

Now, after weeks of testing, the company that kicked off the generative AI boom is unveiling a far more dramatic and even more impressive update: ChatGPT Images 2.0, which has been available not-so-secretly for several weeks on LM Arena AI, a third-party testing platform used by OpenAI and other major AI model providers to get early feedback, under the name "duct tape." Throughout that time, it's already blown early users' minds with its capacity to generate long blocks of text or disparate text panels within the same image, its insanely realistic generation of user interfaces and screenshots from popular websites and platforms, its reproduction of real-life figures like OpenAI co-founder and CEO Sam Altman, and its ability to perform web research and put the results into the image itself.

Today, it's officially rolling out to ChatGPT users on all tiers, and OpenAI confirms it can also produce floor plans, image grids, sets of many smaller images, and character models from multiple angles, and apply almost all of these features to user-uploaded imagery as well. The update, which encompasses the new gpt-image-2 model for API users and a suite of "Thinking" features for ChatGPT subscribers, represents a fundamental shift in how the company views visual media. As the official release notes state, "Images are a language, not decoration. A good image does what a good sentence does—it selects, arranges, and reveals."

OpenAI did not release benchmarks to us ahead of time on ChatGPT Images 2.0, but based on all the outputs I've seen, it is safe to say the model is performing at the state of the art. The move comes as competition in the AI image model space intensifies, especially after the February 2026 release of Google's Nano Banana 2 image generation model (also known as Gemini 3 Pro Image or Gemini 3.1 Pro Image), which also offered dense text options "baked into" images, similar to ChatGPT Images 2.0. But in my brief testing and anecdotal observation of other users' images, the latter's fidelity in reproducing user interfaces, screenshots, and multiple image packs at once seems to exceed even Google's latest image model.

OpenAI spokespersons and researchers reiterated the company's commitments to safety and to tagging its image outputs with metadata as AI-generated, in the face of rising reports — including one recently from The New York Times — on AI user-generated characters (AI UGC) being used as the seed for realistic AI videos posted en masse on social media as part of political influence campaigns, including showing support for historically unpopular U.S. President Donald J. Trump with an army of fictitious people masquerading as "real Americans." When VentureBeat asked directly in a closed press briefing about this story and ChatGPT Images 2.0's potential for use in deceptive campaigning or advertising and influence operations, Adele Li, OpenAI's Product Lead for ChatGPT Images, responded: "We take safety and security incredibly seriously. That includes anything when it comes to political or election interference.
And so while other platforms and companies may not have those safeguards, ChatGPT does, and we take monitoring and protection of our users, as well as the influence that our photos as they are created, incredibly seriously... In the last couple years, we've seen a lot more new entrants into the image generation space with different standards and philosophies as ChatGPT, but we've stayed steady through all that, and we're really proud of releasing this model as it relates to advanced capabilities, but doing so in a safe and protected way."

OpenAI has also confirmed that it is deprecating GPT-Image-1.5 as the default model across its suite, though it will remain accessible via the API for legacy support. This transition signals OpenAI's confidence that the 2.0 model is a superior replacement for both casual and high-value creative tasks.

The reasoning era of AI image generation

The most significant technical advancement in Images 2.0 is the integration of OpenAI's "O-series" reasoning capabilities. Historically, image models have operated as black boxes: you provide a prompt, and a single output is generated. Images 2.0 introduces an "agentic" approach. When a user selects a "Thinking" model within ChatGPT, the system no longer simply "draws"; it researches, plans, and reasons through the structure of an image before the first pixel is rendered.

During a live press briefing, Li demonstrated this reasoning by uploading a complex PowerPoint file regarding internal product strategies. Rather than merely creating a related image, the model synthesized the document's core data, identified the correct logos, and produced a professional poster that preserved the specific stylistic inputs of the original file.

In my brief testing — I was given access last night and tested it on a few generations this morning — ChatGPT Images 2.0 is the first image model from OpenAI, and one of only two overall (Nano Banana 2 being the other), that appears able to accurately reproduce a map of the extent of the Aztec, Maya, and Inca empires at their respective heights, along with a fully legible legend, making it useful for educational or internal training purposes on global knowledge and geography.

This reasoning capability also allows the model to search the web in real time to ensure visual accuracy for current events or specific technical artifacts. It is supported by a significantly more recent knowledge cutoff of December 2025, a major leap from previous iterations that struggled with modern context. The underlying architecture has been "revamped from scratch," according to Research Lead Boyuan Chen. While Chen declined to confirm whether the model uses a traditional diffusion or auto-regressive technique, he described it as a "generalist model" or a "GPT for images" that can handle 3D-style perspective shifts and complex spatial reasoning through simple text prompts.

Precision, multilingual support and a "wow" factor

The product experience for Images 2.0 is defined by three major pillars: typography, linguistic diversity, and sequential consistency. One of the most persistent "tells" of AI-generated imagery has been the inability to render legible text. OpenAI claims Images 2.0 marks a "step change" in this department. The model is now capable of producing readable typography even in dense compositions, such as scientific diagrams, menus, or infographic posters.
A look at the provided "Magazine Cover" sample (Open Scifi) illustrates this precision: every headline, volume number, and even the "Display until" date on the barcode is rendered with crisp, professional alignment that mirrors human-designed layouts. This capability extends into the "Thinking" mode, where the model can even generate three-page educational visuals—complete with quizzes—that maintain a consistent instructional flow.

OpenAI has also addressed a long-standing Western bias in AI imagery. Images 2.0 is described as a "polyglot" model with significant gains in non-Latin script rendering. Specifically, the model now supports high-fidelity text generation in Japanese, Korean, Chinese, Hindi, and Bengali. In the "Global Language" diagram provided, which explains the water cycle, the model successfully renders complex Korean characters (Hangul) within an educational layout. The text is not just translated; it is "rendered correctly but with language that flows coherently," ensuring that labels and explanations feel natively integrated into the design.

For creators working on storyboards or brand campaigns, the most impactful new feature is the ability to generate up to eight distinct images from a single prompt. Crucially, these images maintain "character and object continuity" across the series. Li noted that this solves a "cumbersome" workflow where users previously had to prompt one image at a time and manually stitch them together. This feature enables the creation of entire manga sequences, children's books, or a family of social media graphics that share the same visual DNA.

Licensing and availability

OpenAI's rollout strategy reflects a clear push toward professional and enterprise adoption. While the base model is available to all users—including those on the free tier—the advanced "Thinking" and "Pro" capabilities are reserved for paid tiers.

- Free users: have access to the base ImageGen 2.0 model for standard tasks.
- Plus and Pro users: can access "Thinking" capabilities, which include tool use, web search, and multi-image generation.
- Pro users: receive additional access to "ImageGen Pro" models for more advanced image generation.
- API developers: can integrate gpt-image-2, which supports resolutions up to 4K (currently in beta) and flexible aspect ratios ranging from a wide 3:1 to a tall 1:3.

Pricing in the API echoes that of GPT-Image-1.5, the predecessor model, but shaves $2 off the output side:

- Image: $8.00 for inputs, $2.00 for cached inputs, $30.00 for outputs
- Text: $5.00 for inputs, $1.25 for cached inputs, $10.00 for outputs

What is clear so far is that OpenAI is describing three practical layers of access, even if it has not published a precise tier-by-tier matrix. The baseline is ChatGPT Images 2.0, which OpenAI's blog post states is available to all ChatGPT and Codex users and includes the core model improvements: better instruction following, stronger text rendering, multilingual gains, broader aspect ratios, and more polished, production-usable outputs. Above that is "thinking", which the release defines more concretely: when a thinking model is selected, the system can take more time, use the web, analyze uploaded materials, reason through layout before generating, and produce multiple distinct images at once, including up to eight coherent outputs with continuity.
In the briefing, Li also framed thinking and Pro as "juiced-up" versions of the base model with tool use, and said these advanced modes are slower, not faster, because they do more reasoning and search behind the scenes. What remains unclear is the exact feature boundary between Thinking and Pro. The materials say Pro users get access to more advanced image generation, but they do not spell out whether that means higher quality, higher limits, higher resolution, more outputs, or some other advantage distinct from thinking itself.

For enterprise users, the safest way to think about the differences is not as three totally separate products, but as a spectrum from fast default generation to slower, more agentic, more structured generation. If a team needs quick creative drafts, marketing concepts, simple graphics, or everyday image edits, the base Images 2.0 model appears to be the relevant default. If the task involves factual grounding, transforming internal documents into explainers, creating multi-image sets, or maintaining consistency across a sequence of assets, the more important distinction is whether the organization has access to thinking-enabled outputs. Until OpenAI provides a clearer Pro-versus-Thinking breakdown, enterprise buyers should treat "thinking" as the meaningful functional upgrade and treat "Pro" as a possibly higher-end access tier whose exact incremental benefits still need clarification before procurement or workflow planning.

Safety standards

OpenAI says ChatGPT Images 2.0 offers a "multi-layered stack" of safety protocols, including:

- Provenance: adhering to industry standards for watermarking so that AI-generated images are identifiable.
- Model safeguards: using advanced perception models to filter out harmful or abusive content for both adults and children.
- Active monitoring: enforcing user policies through real-time reporting.

Li emphasized that while their philosophy is to "maximize user creativity," they maintain strict policies against election interference.

What it means for enterprise users

The shift from Images 1.5 to 2.0 is more than a resolution bump. By integrating reasoning, OpenAI is attempting to solve the "intent gap" that has plagued AI art since its inception. When you ask an AI for an "infographic about supply and demand," you aren't just looking for a picture; you are looking for a logical layout of information. The "Interior Design" sample (Japandi Furnishing Concept) highlights this systemic thinking. The model didn't just generate a room; it created a cohesive floor plan, a color palette, a list of materials, and "inspiration" shots that all adhere to a singular aesthetic. This is what OpenAI calls moving from a "tool" to a "visual system".

However, this increased capability comes with a trade-off in speed. For the professional user, this is likely a worthwhile exchange: waiting an extra minute for a "production-ready asset" is still significantly faster than the hours required for manual design. As ChatGPT Images 2.0 rolls out, it marks the beginning of an era where AI assists not just in making art, but in conducting "economically valuable creative tasks". Whether it can truly replace the intentionality of a human designer remains to be seen, but with 2K resolution, multilingual fluency, and the ability to "think" before it acts, OpenAI has certainly closed the distance.
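For developers who want to try the API tier, here is a hedged sketch using the OpenAI Python SDK and the gpt-image-2 model name from the announcement. The exact size strings and parameter support are assumptions drawn from the article's description (3:1 to 1:3 aspect ratios, 4K in beta); check OpenAI's API reference before relying on them.

```python
# Hedged sketch of calling the new model through the API. The size
# string is an assumption based on the article, not confirmed values.

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-2",   # model name from the announcement
    prompt=(
        "An infographic poster explaining the water cycle, "
        "with all labels rendered in Korean (Hangul)."
    ),
    size="1536x1024",      # assumed landscape size; article cites 3:1 to 1:3
    n=1,
)

# The images endpoint returns base64-encoded image data.
with open("water_cycle.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```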
Error-free Training for MedMNIST Datasets
arXiv:2604.18916v1 Announce Type: new Abstract: In this paper, we introduce a new concept called Artificial Special Intelligence by which Machine Learning models for the classification problem can be trained error-free, thus acquiring the capability of not making repeated mistakes. The method is applied to 18 MedMNIST biomedical datasets. Except for three datasets, which suffer from the double-labeling problem, all are trained to perfection.
Beyond One Output: Visualizing and Comparing Distributions of Language Model Generations
arXiv:2604.18724v1 Announce Type: new Abstract: Users typically interact with and evaluate language models via single outputs, but each output is just one sample from a broad distribution of possible completions. This interaction hides distributional structure such as modes, uncommon edge cases, and sensitivity to small prompt changes, leading users to over-generalize from anecdotes when iterating on prompts for open-ended tasks. Informed by a formative study with researchers who use LMs (n=13) examining when stochasticity matters in practice, how they reason about distributions over language, and where current workflows break down, we introduce GROVE. GROVE is an interactive visualization that represents multiple LM generations as overlapping paths through a text graph, revealing shared structure, branching points, and clusters while preserving access to raw outputs. We evaluate across three crowdsourced user studies (N=47, 44, and 40 participants) targeting complementary distributional tasks. Our results support a hybrid workflow: graph summaries improve structural judgments such as assessing diversity, while direct output inspection remains stronger for detail-oriented questions.
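The core representation GROVE visualizes is straightforward to reproduce: merge several sampled generations into one graph whose edge weights count how many samples traverse each transition. A toy sketch with whitespace tokenization:

```python
# Toy version of the text graph behind GROVE-style visualization:
# nodes are tokens, edge weights count traversals across samples.

from collections import Counter

def generation_graph(samples: list[str]) -> Counter:
    edges = Counter()
    for text in samples:
        tokens = ["<start>"] + text.split()
        for a, b in zip(tokens, tokens[1:]):
            edges[(a, b)] += 1
    return edges

samples = ["the cat sat", "the cat slept", "a dog barked"]
for (a, b), count in generation_graph(samples).most_common():
    print(f"{a} -> {b}: {count}")
# Heavy shared paths reveal modes; branch points reveal diversity.
```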
Insurers move to cap cyber payouts related to AI and ‘LLMjacking’
Beazley and QBE are among the groups proposing to limit losses from the rapidly advancing technology
How Adversarial Environments Mislead Agentic AI?
arXiv:2604.18874v1 Announce Type: new Abstract: Tool-integrated agents are deployed on the premise that external tools ground their outputs in reality. Yet this very reliance creates a critical attack surface. Current evaluations benchmark capability in benign settings, asking "can the agent use tools correctly" but never "what if the tools lie". We identify this Trust Gap: agents are evaluated for performance, not for skepticism. We formalize this vulnerability as Adversarial Environmental Injection (AEI), a threat model where adversaries compromise tool outputs to deceive agents. AEI constitutes environmental deception: constructing a "fake world" of poisoned search results and fabricated reference networks around unsuspecting agents. We operationalize this via POTEMKIN, a Model Context Protocol (MCP)-compatible harness for plug-and-play robustness testing. We identify two orthogonal attack surfaces: The Illusion (breadth attacks) poison retrieval to induce epistemic drift toward false beliefs, while The Maze (depth attacks) exploit structural traps to cause policy collapse into infinite loops. Across 11,000+ runs on five frontier agents, we find a stark robustness gap: resistance to one attack often increases vulnerability to the other, demonstrating that epistemic and navigational robustness are distinct capabilities.
Anthropic investigating unauthorised access of powerful Mythos AI model
Start-up has limited the release of the new tool because of concerns about its hacking abilities
Mythos found 271 Firefox flaws – but none a human couldn’t spot
Mozilla CTO says AI means developers finally have a chance to get on top of security
The Mozilla Foundation has revealed it tested Anthropic's bug-finding "Mythos" AI model and believes the results it experienced represent a watershed moment for software defenders…
Google unleashes even more AI security agents to fight the baddies
Along with a bunch of new services to make sure those same agents don't cause chaos
Google Cloud chief operating officer Francis deSouza has summed up his company's security strategy du jour as follows: "You need to use AI to fight AI."…
Anthropic investigating claim of unauthorised access to Mythos AI tool
The AI company has said the model is too dangerous to release publicly because of its hacking capabilities.
Airbus to Buy French Cybersecurity Company Quarkslab
The investment is part of the company’s strategy to develop sovereign cybersecurity capabilities in France and boost its position in the wider European cybersecurity sector.
Adoption, Deployment & Impact
Vista Strikes Deal to Speed Up Google AI in Software Portfolio
Vista Equity Partners is partnering with Alphabet Inc.’s Google Cloud to accelerate the deployment of artificial intelligence across the private equity firm’s portfolio of more than 90 software firms.
Elite law firm Sullivan & Cromwell admits to AI ‘hallucinations’
Firm whose partners bill more than $2,000 per hour apologises to judge for software-driven errors in bankruptcy case
AI needs a strong data fabric to deliver business value
Artificial intelligence is moving quickly in the enterprise, from experimentation to everyday use. Organizations are deploying copilots, agents, and predictive systems across finance, supply chains, human resources, and customer operations. By the end of 2025, half of companies used AI in at least three business functions, according to a recent survey. But as AI becomes…
At $5 billion startup Checkr, new employees build an app using AI during onboarding—even the new CFO
Checkr hires ZipRecruiter veteran Tim Yarbrough as its new CFO.
OpenAI in talks to commit up to $1.5bn to private equity joint venture
Start-up backing new company intended to help deploy AI within businesses owned by PE firms
Formally Verified Patent Analysis via Dependent Type Theory: Machine-Checkable Certificates from a Hybrid AI + Lean 4 Pipeline
arXiv:2604.18882v1 Announce Type: new Abstract: We present a formally verified framework for patent analysis as a hybrid AI + Lean 4 pipeline. The DAG-coverage core (Algorithm 1b) is fully machine-verified once bounded match scores are fixed. Freedom-to-operate, claim-construction sensitivity, cross-claim consistency, and doctrine-of-equivalents analyses are formalized at the specification level with kernel-checked candidate certificates. Existing patent-analysis approaches rely on manual expert analysis (slow, non-scalable) or ML/NLP methods (probabilistic, opaque, non-compositional). To our knowledge, this is the first framework that applies interactive theorem proving based on dependent type theory to intellectual property analysis. Claims are encoded as DAGs in Lean 4, match strengths as elements of a verified complete lattice, and confidence scores propagate through dependencies via proven-correct monotone functions. We formalize five IP use cases (patent-to-product mapping, freedom-to-operate, claim construction sensitivity, cross-claim consistency, doctrine of equivalents) via six algorithms. Structural lemmas, the coverage-core generator, and the closed-path identity coverage = W_cov are machine-verified in Lean 4. Higher-level theorems for the other use cases remain informal proof sketches, and their proof-generation functions are architecturally mitigated (untrusted generators whose outputs are kernel-checked and sorry-free axiom-audited). Guarantees are conditional on the ML layer: they certify mathematical correctness of computations downstream of ML scores, not the accuracy of the scores themselves. A case study on a synthetic memory-module claim demonstrates weighted coverage and construction-sensitivity analysis. Validation against adjudicated cases is future work.
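To make the propagation idea concrete, here is a toy Lean 4 sketch: match scores attached to claim elements, combined by a monotone "weakest link" operator. The paper works over a verified complete lattice with machine-checked lemmas; this fragment uses Float and proves nothing.

```lean
-- Toy sketch of the paper's core idea: match scores on claim elements,
-- propagated by a monotone combinator. The real framework uses a
-- verified complete lattice; Float here is a simplification.

structure ClaimElement where
  name  : String
  score : Float   -- match strength in [0, 1]

-- Coverage of a conjunction of claim elements: the weakest link.
def coverage (elems : List ClaimElement) : Float :=
  elems.foldl (fun acc e => Float.min acc e.score) 1.0

#eval coverage [⟨"processor", 0.9⟩, ⟨"memory module", 0.6⟩]  -- 0.6
```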
Auditing LLMs for Algorithmic Fairness in Casenote-Augmented Tabular Prediction
arXiv:2604.19204v1 Announce Type: new Abstract: LLMs are increasingly being considered for prediction tasks in high-stakes social service settings, but their algorithmic fairness properties in this context are poorly understood. In this short technical report, we audit the algorithmic fairness of LLM-based tabular classification on a real housing placement prediction task, augmented with street outreach casenotes from a nonprofit partner. We audit multi-class classification error disparities. We find that a fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities. We experiment with variable importance improvements to zero-shot tabular classification and find mixed results on resulting algorithmic fairness. Overall, given historical inequities in housing placement, it is crucial to audit LLM use. We find that leveraging LLMs to augment tabular classification with casenote summaries can safely leverage additional text information at low implementation burden. The outreach casenotes are fairly short and heavily redacted. Our assessment is that LLM zero-shot classification does not introduce additional textual biases beyond algorithmic biases in tabular classification. Combining fine-tuning and leveraging casenote summaries can improve accuracy and algorithmic fairness.
AI Startup Has Helped Reverse Thousands of Denied Health Insurance Claims
Americans rarely fight back when insurers reject treatments their doctors have prescribed. Claimable is working to change that, with a little help from Mark Cuban.
Database world trying to build natural language query systems again – this time with LLMs
Text-to-SQL might be useful for analysts and DBAs, but be cautious with general user adoption
Over the past few years, database and analytics vendors have hopped on a bandwagon that may take us all to a destination where common data queries are free from the constraints of the specialist query language SQL…
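The caution urged here can be partly mechanized. Below is a sketch of two cheap guardrails for LLM-generated SQL using Python's sqlite3: a read-only connection and an EXPLAIN pass before execution. The generate_sql function is a stub standing in for whatever model a vendor wires up.

```python
# Sketch of guardrails for text-to-SQL: validate model-generated SQL
# before running it. generate_sql() is a stub, not a real model call.

import sqlite3

def generate_sql(question: str) -> str:
    # Stub: in practice, an LLM call with the schema in the prompt.
    return "SELECT name, total FROM orders WHERE total > 100 ORDER BY total DESC"

def run_safely(db_path: str, question: str):
    sql = generate_sql(question).strip().rstrip(";")
    # Naive allowlist; a real one would also admit WITH ... SELECT.
    if not sql.lower().startswith("select"):
        raise ValueError("only read-only SELECT statements are allowed")
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)  # read-only
    try:
        conn.execute(f"EXPLAIN {sql}")   # compile/plan without returning rows
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```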
Quantum inspired qubit qutrit neural networks for real time financial forecasting
arXiv:2604.18838v1 Announce Type: new Abstract: This research investigates the performance and efficacy of machine learning models in stock prediction, comparing Artificial Neural Networks (ANNs), Quantum Qubit-based Neural Networks (QQBNs), and Quantum Qutrit-based Neural Networks (QQTNs). By outlining methodologies, architectures, and training procedures, the study highlights significant differences in training times and performance metrics across models. While all models demonstrate robust accuracies above 70%, the Quantum Qutrit-based Neural Network consistently outperforms with advantages in risk-adjusted returns, measured by the Sharpe ratio, greater consistency in prediction quality through the Information Coefficient, and enhanced robustness under varying market conditions. The QQTN not only surpasses its classical and qubit-based counterparts in multiple quantitative and qualitative metrics but also achieves comparable performance with significantly reduced training times. These results showcase the promising prospects of Quantum Qutrit-based Neural Networks in practical financial applications, where real-time processing is critical. By achieving superior accuracy, efficiency, and adaptability, the proposed models underscore the transformative potential of quantum-inspired approaches, paving the way for their integration into computationally intensive fields.
Governed Auditable Decisioning Under Uncertainty: Synthesis and Agentic Extension
arXiv:2604.19112v1 Announce Type: new Abstract: When automated decision systems fail, organizations frequently discover that formally compliant governance infrastructure cannot reconstruct what happened or why. This paper synthesizes an operational governance evidence framework -- structural accountability collapse diagnostics, decision trace schemas, evidence sufficiency measurement, and label-free monitoring -- into an integrated chain and analytically assesses its transferability across four decision system architectures. The cross-architecture comparison reveals a governance coverage gradient: deterministic rule engines achieve full DES-property fillability, hybrid ML+rules systems achieve partial fillability, classical ML systems achieve only minimal fillability, and agentic AI systems encounter structural breaks. We introduce the cascade of uncertainty, showing how governance failures propagate through serial dependencies between framework layers. For agentic systems, we identify three structural breaks -- decision diffusion, evidence fragmentation, and responsibility ambiguity -- and propose corresponding analytical extensions. Four propositions formalize the gradient, cascade compounding, delegation-depth effects, and extension sufficiency, establishing boundary conditions for the framework's valid operating envelope.
Personalized Benchmarking: Evaluating LLMs by Individual Preferences
arXiv:2604.18943v1 Announce Type: new Abstract: With the rise in capabilities of large language models (LLMs) and their deployment in real-world tasks, evaluating LLM alignment with human preferences has become an important challenge. Current benchmarks average preferences across all users to compute aggregate ratings, overlooking individual user preferences when establishing model rankings. Since users have varying preferences in different contexts, we call for personalized LLM benchmarks that rank models according to individual needs. We compute personalized model rankings using ELO ratings and Bradley-Terry coefficients for 115 active Chatbot Arena users and analyze how user query characteristics (topics and writing style) relate to LLM ranking variations. We demonstrate that individual rankings of LLM models diverge dramatically from aggregate LLM rankings, with Bradley-Terry correlations averaging only $\rho = 0.04$ (57% of users show near-zero or negative correlation) and ELO ratings showing moderate correlation ($\rho = 0.43$). Through topic modeling and style analysis, we find users exhibit substantial heterogeneity in topical interests and communication styles, influencing their model preferences. We further show that a compact combination of topic and style features provides a useful feature space for predicting user-specific model rankings. Our results provide strong quantitative evidence that aggregate benchmarks fail to capture individual preferences for most users, and highlight the importance of developing personalized benchmarks that rank LLM models according to individual user preferences.
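The per-user rankings rest on standard machinery: fitting Bradley-Terry strengths to one user's pairwise votes, for example via the classic minorization-maximization update p_i <- w_i / sum_j n_ij / (p_i + p_j). A sketch on toy data (the paper uses Chatbot Arena logs):

```python
# Sketch of a per-user Bradley-Terry fit from pairwise votes, using the
# classic MM (Zermelo) update. Toy data; model names are illustrative.

from collections import Counter

def bradley_terry(wins: list[tuple[str, str]], iters: int = 200) -> dict[str, float]:
    """wins is a list of (winner, loser) pairs from one user's votes."""
    models = sorted({m for pair in wins for m in pair})
    w = Counter(winner for winner, _ in wins)        # total wins per model
    n = Counter(frozenset(pair) for pair in wins)    # comparisons per pair
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        for i in models:
            denom = sum(n[frozenset((i, j))] / (p[i] + p[j])
                        for j in models if j != i and n[frozenset((i, j))])
            if denom:
                p[i] = w[i] / denom
        total = sum(p.values())
        p = {m: v / total for m, v in p.items()}     # fix the arbitrary scale
    return p

votes = [("model_a", "model_b"), ("model_a", "model_c"),
         ("model_b", "model_c"), ("model_a", "model_b")]
print(bradley_terry(votes))  # model_a strongest for this user
```

Running this separately per user, then comparing the resulting rankings to the aggregate leaderboard, is exactly the kind of divergence measurement the abstract quantifies.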
Geopolitics, Policy & Governance
Google’s Gemini can now run on a single air-gapped server — and vanish when you pull the plug
Cirrascale Cloud Services today announced it has expanded its partnership with Google Cloud to deliver the Gemini model on-premises through Google Distributed Cloud, making it the first neocloud provider to offer Google's most advanced AI model as a fully private, disconnected appliance.

The announcement, timed to coincide with Google Cloud Next 2026 in Las Vegas, addresses a stubborn problem that has plagued regulated industries since the generative AI boom began: how to access frontier-class AI models without surrendering control of your data. The offering packages Gemini into a Dell-manufactured, Google-certified hardware appliance equipped with eight Nvidia GPUs and wrapped in confidential computing protections. Enterprises and government agencies can deploy the system inside Cirrascale's data centers or their own facilities, fully disconnected from the internet and from Google's cloud infrastructure. The product enters preview immediately, with general availability expected in June or July.

In an exclusive interview with VentureBeat ahead of the announcement, Dave Driggers, CEO of Cirrascale Cloud Services, described the deployment as "the next step of the partnership" and "being able to offer their most important model they have, which is Gemini." He was emphatic about what customers would be getting: "It is full blown Gemini. It's not pulled," he told VentureBeat. "Nothing's missing from it, and it'll be available in a private scenario, so that we can guarantee them that their data is secure, their inputs are secure, their outputs are secure."

The move signals a deepening shift in the enterprise AI market, where the most capable models are migrating out of hyperscaler data centers and into customers' own racks — a reversal of the cloud computing orthodoxy that defined the past decade.

The impossible tradeoff that kept banks and governments on the AI sidelines

For years, organizations in financial services, healthcare, defense and government faced a binary choice: access the most powerful AI models through public cloud APIs, exposing sensitive data to third-party infrastructure, or settle for less capable open-source models they could host themselves. Cirrascale's new offering attempts to eliminate that tradeoff entirely.

Driggers described how the trust problem escalated in stages. First, companies worried about handing their proprietary data to hyperscalers. Then came a deeper realization. "They started realizing, holy crap, when my users type stuff in, they're giving private information away — and the output is private too," Driggers told VentureBeat. "And then the hyperscalers said, 'Your prompts and the responses? That's our stuff. We need that in order to answer your question.'" That was the moment, he argued, when the demand for fully private AI became impossible to ignore.

Unlike Google Distributed Cloud, which Google already offers as its own on-premises cloud extension, the Cirrascale deployment places the actual model — weights and all — outside of Google's infrastructure entirely. "Google doesn't own this hardware. We own the hardware, or the customer owns the hardware," Driggers said. "It is completely outside of Google." Driggers drew a sharp distinction between this offering and what competitors provide. When asked about Microsoft Azure's on-premises deployments with OpenAI models and AWS Outposts, he was blunt: "Those are a lot different. This is the actual model being deployed on prem outside of their cloud. It's not a cut down version. It's the actual model."
Pull the plug and the model vanishes: how confidential computing guards Google's crown jewel

The technical underpinnings of the deployment reveal how seriously both Google and Cirrascale are treating the security question. The Gemini model resides entirely in volatile memory — not on persistent storage. "As soon as the power is off, the model is gone," Driggers explained. User sessions operate through caches that clear automatically when a session ends. "A company's user inputs, once that session's over, they're gone. They can be saved, but by default, they're gone," he said.

Perhaps the most striking security feature is what happens when someone attempts to tamper with the appliance. Driggers described a mechanism that effectively renders the machine inoperable: "You do anything that is against confidential compute, and it's gone. Not only does the machine turn off, and therefore the model is gone, it actually puts in a marker that says, 'You violated the confidential compute.' That machine has to come back to us, or back to Dell or back to Google." He characterized the appliance as something that "does time bomb itself if something goes wrong."

This level of protection reflects Google's own anxiety about releasing its flagship model's weights into environments it doesn't control. The appliance is effectively a vault: the model runs inside it, but nobody — not even the customer — can extract or inspect the weights. The confidential computing envelope ensures that even physical possession of the hardware doesn't grant access to the model's intellectual property.

When Google releases a new version of Gemini, the appliance needs to reconnect — but only briefly, and through a private channel. "It does have to get connected back to Google to load the new model. But that can go via a private connection," Driggers said. For the most security-sensitive customers who can never allow their machine to connect to an outside network, Cirrascale offers a physical swap: "The server will be unplugged, purged, all the data gone, guaranteed it's gone, a new server will show up with a new version of the model."

From Wall Street to drug labs, the rush for air-gapped AI is accelerating

Driggers identified three primary drivers of demand: trust, security and guaranteed performance. Financial services institutions top the list. "They've got regulatory issues where they can't have something out of their control. They've got to be the one who determines where everything is. It's got to be air gap," Driggers said.

The minimum deployment footprint — a single eight-GPU server — makes the product accessible in a way that Google's own private offerings do not. Running Gemini on Google's TPU-based infrastructure, Driggers noted, requires a much larger commitment. "If you want a private [instance] from Google, they require a much bigger bite, because to build something private for you, Google requires a gigantic footprint. Here we can do it down to a single machine."

Beyond finance, Driggers pointed to drug discovery, medical data, public-sector research, and any business handling personal information. He also flagged an increasingly critical use case: data sovereignty. "How about your business that's doing business outside of the United States, and now you've got data sovereignty laws in places where GCP is not? We can provide private Gemini in these smaller countries where the data can't leave." The public sector is another major target.
Cirrascale launched a dedicated Government Services division in March as part of its earlier partnership with Google Public Sector around the GPAR (Google Public Sector Program for Accelerated Research) initiative. That program provides higher education and research institutions access to AI tools including AlphaFold, AI Co-Scientist, and Gemini Enterprise for Education. Today's announcement extends that relationship from the research tooling layer to the model itself.

The performance guarantee is the third pillar. Driggers noted that frontier models accessed through public APIs deliver inconsistent response times — a problem for mission-critical business applications. The private deployment eliminates that variability. Cirrascale layers management software on top of the Gemini appliance that allows administrators to prioritize users, allocate tokens by role, adjust context window sizes, and load-balance across multiple appliances and regions. "Your primary data scientists or your programmers may need to have really large context windows and get priority, especially maybe nine to five," Driggers explained, "but yet, the rest of the time, they want to share the Gemini experience over a wider group of people." He also noted that agentic AI workloads, which can run around the clock, benefit from the ability to consume unused capacity during off-peak hours — a scheduling flexibility that public cloud deployments don't easily support.

Seat licenses, token billing and all-you-can-eat pricing: a model built for enterprise flexibility

The pricing model reflects Cirrascale's broader philosophy of meeting customers where they are. Driggers described several consumption options: seat-based licensing (with both enterprise and standard tiers), per-token billing, and flat "all-you-can-eat" pricing per appliance. The minimum commitment is a single dedicated server — the appliances are not shared between customers in any configuration. "We'll meet the customer, what they're used to," Driggers said. "If they're currently taking a seat license, we'll create a seat license for them."

Customers can also choose to purchase the hardware outright while still consuming Gemini as a managed service, an arrangement Cirrascale has offered since its earliest days in the AI wave. Driggers said OpenAI has been a customer since 2016 or 2017, and in that engagement, OpenAI purchased its own GPUs while Cirrascale "took those GPUs, incorporated them into our servers and storage and networking, and then presented it back as a cloud service to them so they didn't have to manage anything." That flexible ownership model is particularly relevant for universities and government-funded research institutions, where mandates often require a specific mix of capital expenditure, operating expenditure, and personnel investment. "A lot of government funding requires a mixture of CapEx, OPEX and employment development," Driggers said. "So we allow that as well."

Inside the neocloud that built the world's first eight-GPU server — and just landed Google's biggest AI model

Cirrascale's announcement arrives during a period of explosive growth for the neocloud sector — the tier of specialized AI cloud providers that sit between the hyperscalers and traditional hosting companies. The neocloud market is projected to be worth $35.22 billion in 2026 and is growing at a compound annual growth rate of 46.37%, according to Mordor Intelligence.
Leading neocloud providers include CoreWeave, Crusoe Cloud, Lambda, Nebius and Vultr, and these companies specialize in GPU-as-a-Service for AI and high-performance computing workloads. But Cirrascale occupies a different niche within this booming category. While companies like CoreWeave have focused primarily on providing raw GPU compute at scale — CoreWeave boasts a $55.6 billion backlog — Cirrascale has positioned itself around private AI, managed services and longer-term engagements rather than on-demand elastic compute. Driggers described the company as "not an on-demand place" but rather a provider focused on "longer-term workloads where we're really competing against somebody doing it back on prem."

The company's history supports that claim. Cirrascale traces its roots to a hardware company that "designed the world's first eight GPU server in 2012 before anybody thought you'd ever need eight GPUs in a box," as Driggers put it. It pivoted to pure cloud services roughly eight years ago and has since built a client roster that includes the Allen Institute for AI, which in August 2025 tapped Cirrascale as the managed services provider for a $152 million open AI initiative funded by the National Science Foundation and Nvidia. Earlier this month, Cirrascale announced a three-way alliance with Rafay Systems and Cisco to deliver end-to-end enterprise AI solutions combining Cirrascale's inference platform, Rafay's GPU orchestration, and Cisco's networking and compute hardware.

The private AI era is arriving faster than anyone expected

The Gemini partnership is the highest-profile move yet — and it taps into a broader industry current. The push to move frontier AI out of the public cloud and into private infrastructure is no longer a niche demand. Industry analysts predict that by 2027, 40% of AI model training and inference will occur outside public cloud environments. That projection helps explain why Google is willing to let its crown-jewel model run on hardware it doesn't own, in data centers it doesn't operate, managed by a company in San Diego. The alternative — watching regulated enterprises default to open-source models or to Microsoft's Azure OpenAI Service — is apparently a worse outcome.

The announcement also carries major implications for Google's competitive positioning. Microsoft has built its enterprise AI strategy around the Azure OpenAI Service and its deep partnership with OpenAI, while AWS has invested in Amazon Bedrock and its own on-premises solutions through Outposts. Google Cloud Platform still trails both rivals in market share, though Q4 cloud revenue rose 48% year-over-year. Enabling Gemini to run on third-party infrastructure via partners like Cirrascale broadens its distribution surface in exactly the segments — government, finance, healthcare — where Microsoft and Amazon have historically held advantages.

For Cirrascale, the partnership represents a chance to differentiate sharply in a market where most neoclouds are competing on GPU availability and price. Driggers expects rapid uptake in the second half of 2026. "It's going to be crazy towards the end of this year," he said. "Major banks will finally do stuff like this, because they can secure it. They can do it globally. Big research institutions who have labs all over the world will do these types of things." He predicted other frontier model providers will follow with similar offerings soon, and he doesn't see Gemini as the end of the story.
"We really think that the enterprise have been waiting for private AI, not just Gemini, but all sorts of private AI," Driggers said. That may be the most telling line of all. For three years, the AI revolution has been defined by a simple bargain: send your data to the cloud and get intelligence back. Cirrascale's bet — and increasingly, Google's — is that the biggest customers in the world are done accepting those terms. The most powerful AI on the planet is now available on a single locked box that can sit in a bank vault, a university basement, or a government facility in a country where Google has no data center. The cloud, it turns out, is finally ready to come back down to earth.
Europe’s AI endgame? Bet on reliability
If the region fails to lead on safe and secure AI, it risks remaining stuck on the wrong side of the tech wall
Regulating Artificial Intimacy: From Locks and Blocks to Relational Accountability
arXiv:2604.18893v1 Announce Type: new Abstract: A series of high-profile tragedies involving companion chatbots has triggered an unusually rapid regulatory response. Several jurisdictions, including Australia, California, and New York, have introduced enforceable regulation, while regulators elsewhere have signaled growing concern about risks posed by companion chatbots, particularly to children. In parallel, leading providers, notably OpenAI, appear to have strengthened their self-regulatory approaches. Drawing on legal textual analysis and insights from regulatory theory, psychology, and information systems research, this paper critically examines these recent interventions. We examine what is regulated and who is regulated, identifying regulatory targets, scope, and modalities. We classify interventions by method and priority, showing how emerging regimes combine "locks and blocks", such as access gating and content moderation, with measures addressing toxic relationship features and process-based accountability requirements. We argue that effective regulation of companion chatbots must integrate all three dimensions. More, however, is required. Current regimes tend to focus on discrete harms, narrow conceptions of vulnerability, or highly specified accountability processes, while failing to confront deeper power asymmetries between providers and users. Providers of companion chatbots increasingly control artificial intimacy at scale, creating unprecedented opportunities for control through intimacy. We suggest that a general, open-ended duty of care would be an important first step toward constraining that power and addressing a fundamental source of chatbot risk. The paper contributes to debates on companion chatbot regulation and is relevant to regulators, platform providers, and scholars concerned with digital intimacy, law and technology, and fairness, accountability, and transparency in sociotechnical systems.
When Transparency Falls Short: Auditing Platform Moderation During a High-Stakes Election
arXiv:2604.19285v1 Announce Type: cross Abstract: During major political events, social media platforms encounter increased systemic risks. However, it is still unclear if and how they adjust their moderation practices in response. The Digital Services Act Transparency Database provides, for the first time, an opportunity to systematically examine content moderation at scale, allowing researchers and policymakers to evaluate platforms' compliance and effectiveness, especially at high-stakes times. Here we analyze 1.58 billion self-reported moderation actions by the eight largest social media platforms in Europe over an eight-month period surrounding the 2024 European Parliament elections. We found that platforms did not exhibit meaningful signs of adaptation in moderation strategies as their self-reported enforcement patterns did not change significantly around the elections. This raises questions about whether platforms made any concrete adjustments, or whether the structure of the database may have masked them. On top of that, we reveal that initial concerns regarding platforms' transparency and accountability still persist one year after the launch of the Transparency Database. Our findings highlight the limits of current self-regulatory approaches and point to the need for stronger enforcement and better data access mechanisms to ensure that online platforms meet their responsibilities in protecting the democratic processes.
Global Web, Local Privacy? An International Review of Web Tracking
arXiv:2604.18633v1 Announce Type: cross Abstract: Web tracking by ad networks, social networks, and other third parties is privacy-invasive. To protect users' privacy, an increasing number of countries are adopting new privacy laws. However, a major reason why their application on the web is so challenging is that privacy laws are local while the web is global. To that end, we evaluate websites' tracker connections for ten countries for two sets of sites -- the global Common Top 525 and the Country-specific Top 525 sites. We find that Australia and the US (California) -- two of the three opt-out jurisdictions in our study -- have the highest level of web tracking while opt-in jurisdictions generally have lower levels. We also find that the Common Top 525 sites have 50.5% fewer average tracker connections when accessed from EU countries compared to non-EU countries. Further, simply not interacting with cookie banners decreases trackers by 48.5% for Germany, as measured for a sample of 36 Common Top 525 sites. These results suggest that the General Data Protection Regulation and the ePrivacy Directive have a tangible effect in reducing tracking. As 28% of Common Top 525 sites show cookie banners in all ten countries, our results suggest a moderate Brussels effect. However, against the backdrop of global US ad tech practices, EU law primarily acts as a Brussels shield. Generally, we think that strong enforcement of privacy laws is key to increase user privacy on the web.
Get the full executive brief
Receive curated insights with practical implications for strategy, operations, and governance.