Fri 19 June 2026

‘We created a monster’: companies rein in AI usage as costs strain budgets

Amazon, Walmart and Uber are among early adopters that have introduced caps or discouraged wasteful activity

What Capital After Labor? Forecasting the Talent ROI Transition in the Human-AI Era

arXiv:2606.19846v1 Announce Type: new Abstract: AI augmentation breaks the accounting link between labor time and productive contribution, yet firms continue to evaluate talent through time-based overhead bundles. This paper develops a forecasting framework for the transition from time-based talent accounting to output-based talent ROI in the human-AI era. The framework centres on Theorem 3 (ROI

Editor's pick · PAYWALL · Professional Services

Have Data Centers Raised Your Electric Bill? Causal Evidence from the United States

arXiv:2606.19777v1 Announce Type: cross Abstract: We estimate that data centers caused average retail electricity rates to fall modestly in the United States from 2015 to 2024 using an instrumental variables approach. Despite prevailing sentiment, the finding is consistent with economic reasoning: existing large power system fixed costs, economies of scale in transmission and distribution, and de

Accenture shares fall to lowest since 2017 as AI threat mounts

IT consultancy hit by concerns technology will hurt its business model

Ethan Mollick· 3 days ago

The Unresolved Profitability Crisis of Open-Weights Frontier AI Models

The economic viability of training frontier open-weights models remains unproven, as high training costs lack clear ancillary revenue streams. This uncertainty contrasts with traditional open-source software models and poses a significant challenge for future investment sustainability.

Editor's pick · PAYWALL

How AI is ‘senior-ising’ junior roles

Changing workflows mean employers are now asking new recruits to be managers and decision makers

US Acts to Speed Up Power Grid Hook-Ups for AI Data Centers

US regulators have taken their biggest step yet to speed the connection of data centers to the country’s grids while simultaneously attempting to slow surging utility bills that have angered Americans.

Tech Workers Maxed Out Their A.I. Use. Now They’re Trying to Minimize It.

Artificial intelligence is expensive to use, many companies discovered. That has led to a new era of saving costs.

German electricity grid equipment maker SGB-SMIT in early IPO talks

Company’s valuation could top €4bn as investors focus on AI and data centre boom

Reuters· 3 days ago

The future of AI may be small, cheap and unprofitable | Reuters

The AI boom is built on the idea that bigger is better. A recent study suggests the opposite may soon be true: small language models running on desktop computers may be able to handle most of the tasks currently performed by large language models.

AI boosts Samsung but batters IT jobs

The inside story on the Asia tech trends that matter, from Nikkei Asia and the Financial Times

The tech giant mining Wall Street for AI cash

Former Goldman Sachs executive Dina Powell McCormick is helping Meta find ways to finance its AI ambitions

SpaceX plots $20bn bond deal after record IPO

AI and rocket group is tapping debt markets after raising $86bn in stock market debut

Linkdood· 3 days ago

Why Nations Are Rethinking Dependence on Foreign New AI Models - Linkdood Technologies

The global artificial intelligence industry reached a turning point in June 2026. When Anthropic suspended access to its most advanced AI models following a

Siliconrepublic· 3 days ago

New Irish bill to supervise EU AI Act gets greenlit

The AI Act, which entered into force in August 2024, attempts to tackle some of the risks emerging from the technology while letting the bloc benefit from its economic potential.

Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

G7 leaders urge financial regulator coordination to tackle AI risks

G7 leaders called for information sharing and coordination between financial regulators and tech companies to address risks posed by frontier AI models.

Economics & Markets

25 articles

AI Business Models · 3 articles

Ethan Mollick· 3 days ago

The Unresolved Profitability Crisis of Open-Weights Frontier AI Models

Directors Duties in the Age of Agentic Artificial Intelligence

arXiv:2606.20453v1 Announce Type: new Abstract: As boards engage with the adoption of Artificial Intelligence, including agentic AI, to drive operational efficiencies, this presents new opportunities for profit maximisation. AI adoption is increasingly identified with employee role displacement in companies, and the interests of employees as stakeholders require exploration. A novel question posed is whether, in an age of AI ascendancy, AI may warrant being given stakeholder status as its role in the company approximates or eclipses that of human employees. The article probes four distinct models of corporate purpose within the duty on directors to act in the best interests of the company (the shareholder primacy model, the Enlightened Shareholder Value model, the stakeholder-friendly model, and the stakeholder value model), highlighting the available scope for directors to accommodate the interests of employees in board decision-making around AI adoption. It is concluded that, given the degree to which directors are insulated from legal scrutiny in relation to their best interests duty, adopting a wider law-in-context approach to promote employee welfare would serve the interests of employees, directors and companies alike. This would see directors engaging meaningfully with employees and providing opportunities for reskilling to adapt to the age of AI.

Editor's pick · PAYWALL · Manufacturing & Industrials

Daily Brew· 4 days ago

Your Churn Threshold Is a Pricing Decision

How unit economics should set your classification cutoff, and why they rarely do.
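The teaser does not give a formula, so the following is only a minimal sketch of how unit economics could set a churn-classification cutoff. It assumes a simple model in which every flagged customer receives a retention offer costing `offer_cost`, the offer retains a would-be churner with probability `save_rate`, and a retained customer is worth `customer_value`. The function names and all figures are hypothetical and are not taken from the article.

```python
def breakeven_threshold(offer_cost: float, customer_value: float, save_rate: float) -> float:
    """Churn probability above which a retention offer has positive expected value.

    Expected gain of contacting a customer with churn probability p is
        p * save_rate * customer_value - offer_cost
    and setting that to zero gives the cutoff returned here (capped at 1.0).
    """
    if offer_cost < 0 or customer_value <= 0 or not 0 < save_rate <= 1:
        raise ValueError("need offer_cost >= 0, customer_value > 0, 0 < save_rate <= 1")
    return min(1.0, offer_cost / (save_rate * customer_value))


def should_intervene(churn_probability: float, threshold: float) -> bool:
    """Flag a customer only when the model's churn probability exceeds the cutoff."""
    return churn_probability > threshold


# Hypothetical figures: a 20-unit offer, a 400-unit customer, 25% of offers succeed.
threshold = breakeven_threshold(offer_cost=20.0, customer_value=400.0, save_rate=0.25)
```

Under these assumed numbers the cutoff depends only on prices and margins, not on the classifier's accuracy metrics, which is one way to read the headline's claim that the threshold is a pricing decision.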

AI Investment & Valuations · 7 articles

Bloomberg· 2 days ago

Traders’ Latest AI-Related Play Is a Struggling Car Parts Stock

Investors looking for the stock market’s next artificial intelligence winner have homed in on embattled French car parts maker Valeo SE.

German electricity grid equipment maker SGB-SMIT in early IPO talks

Company’s valuation could top €4bn as investors focus on AI and data centre boom

The tech giant mining Wall Street for AI cash

Former Goldman Sachs executive Dina Powell McCormick is helping Meta find ways to finance its AI ambitions

SpaceX plots $20bn bond deal after record IPO

AI and rocket group is tapping debt markets after raising $86bn in stock market debut

Bamboo Works· 3 days ago

From scarcity to execution: China’s AI valuation reset - Bamboo Works - China stock insights for global investors

Zhipu and MiniMax have lost more than 40% of their market value in just two weeks, as investors reassess the true worth of China's large language model developers

China Boosts Startup IPOs in Quantum, AI, and Emerging Tech to Outpace U.S. Competition

China is ramping up support for IPOs in cutting-edge tech sectors like quantum and AI to boost innovation amidst growing competition with the U.S.

Bebeez· 3 days ago

Frontier consortium to invest a further $915m into carbon removal tech

Carbon-buying consortium Frontier, which includes the likes of Google and Meta, has announced it will invest a further $915 million in carbon removal technologies, bringing its total commitment to $1.8 billion. The consortium also announced that generative AI firm Anthropic has become a member of the carbon buying alliance. The new funding, dubbed […]

AI Macroeconomics · 3 articles

AI boosts Samsung but batters IT jobs

The inside story on the Asia tech trends that matter, from Nikkei Asia and the Financial Times

FintechNewsSG· 3 days ago

MAS Chief Warns Rising AI Costs Could Weigh on Investment Returns - Fintech Singapore

MAS Managing Director Chia Der Jiun warns that AI investment risks are rising as energy and chip costs climb and returns remain uncertain.

Washington Post· 2 days ago

Bernie Sanders proposes wealth fund to give Americans a stake in AI - The Washington Post

Sanders, President Trump and OpenAI have all proposed that the U.S. government should take stakes in artificial intelligence companies to spread their future wealth.

AI Market Competition · 7 articles

Editor's pick · Consumer & Retail

Editor's pick · PAYWALL · Professional Services

Generative Engine Optimization at Scale: Measuring Brand Visibility Across AI Search Engines

arXiv:2606.20065v1 Announce Type: cross Abstract: People increasingly get answers straight from AI search engines like ChatGPT, Claude, Perplexity, and Gemini rather than scrolling search results. Brands that once focused on search engine optimization (SEO) must now optimize for how these engines represent, cite, and recommend them -- a shift variously called Generative Engine Optimization (GEO), Answer Engine Optimization (AEO), and AI Search Visibility. We treat AEO and AI Visibility as part of GEO, and study how to measure brand visibility across AI engines: what they value when they cite a brand, which sources they rely on, and what content large language models surface. The hard case is everyone outside the already-authoritative top brands -- SMEs, D2C brands, creators, and early-stage startups. We analyze 100K+ prompt responses across 100+ brands tracked on Ranqo between March and May 2026. First visibility runs form a clear three-tier brand-stature ladder: global household names (e.g., Stripe, Nike) appear in 73% of relevant AI answers on their first run; established mid-market and regional brands (e.g., Olipop, Klaviyo) in 44%; niche and small brands in just 11% -- about 30 percentage points per step. When engines cite sources, about 78% go to corporate websites; among non-corporate sources YouTube leads, ahead of Reddit, editorial media, and Wikipedia. The highest-leverage page is the ranked "best-of" listicle, the most-cited content format at about 21% of all citations. Sentiment is the unstable signal: whether a brand is framed positively or negatively flips about 6.7 times more often than whether it is mentioned at all. These findings provide a first large-scale baseline for measuring GEO: AI brand visibility can be measured, differs by platform, and varies strongly by brand maturity. We close by proposing seven v1.1 protocols to test whether specific recommendations can causally improve AI visibility.

Accenture shares fall to lowest since 2017 as AI threat mounts

IT consultancy hit by concerns technology will hurt its business model

Ethan Mollick· 3 days ago

Google’s Strategic Shift Toward Flash Models and Away from Frontier Leadership

Google’s current focus on flash models over frontier-grade AI suggests a strategic pivot toward mass-market serving rather than high-end agentic capabilities. This shift highlights the competitive tension between cost-efficient deployment and frontier model performance.

Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

Anthropic, co-founders face new US copyright infringement suit from 100 authors

Around 100 authors have filed a lawsuit against Anthropic, alleging the company used pirated books from library websites to train its AI models.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

LivCor reaches $7m settlement with US states over rental prices

LivCor agreed to pay $7 million to resolve antitrust claims from 10 US states alleging it used RealPage's revenue management system to align rental prices with competitors.

Top Daily Headlines: Microsoft once used its own brand of 'Lego' to optimize Windows· 2 days ago

Midjourney pivots from AI image generation to body scanning medical spa where patients bathe in 'golden light'

Midjourney is reportedly shifting focus toward a medical spa concept, utilizing technology borrowed from an undisclosed partner.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Nvidia defeat would be 'huge headache' for merger call-ins, Irish official says

A court defeat for EU regulators over the Nvidia/Run:ai deal could complicate merger reviews that fall below standard notification thresholds.

AI Pricing & Cost Curves · 2 articles

Tech Workers Maxed Out Their A.I. Use. Now They’re Trying to Minimize It.

Artificial intelligence is expensive to use, many companies discovered. That has led to a new era of saving costs.

Reuters· 3 days ago

The future of AI may be small, cheap and unprofitable | Reuters

AI Productivity · 2 articles

Forecasting AI-Era Productivity: The Intellectually Converged Human Framework and a Missing Cognitive Mediator in Production Function Theory

arXiv:2606.19794v1 Announce Type: new Abstract: Why does massive AI investment fail to generate commensurate productivity gains? We argue the paradox is theoretically generated: prevailing production function frameworks encounter a structural boundary by treating AI as a separable factor of production without modeling the cognitive mediation through which AI generates productive value. This directs investment toward deployment when productivity requires prior development of what we term convergence capacity (C). We propose the Intellectually Converged Human (ICH) framework, a fifth-stage framework for production function theory: H-hat = H[1 + phi(A,C)], where effective productive capacity equals human capital (H) scaled by an augmentation factor [1 + phi], with phi jointly determined by AI utilization intensity (A) and convergence capacity (C), a four-dimensional cognitive construct encompassing embodied understanding, metacognition, temporal integration, and integrative thinking. The production function Y = F(K, H-hat) provides a human-centered mechanism for Solow's TFP residual: A_Solow = [1 + phi(A,C)]^(1-alpha). The framework predicts three augmentation regimes with distinct policy implications. Descriptive cross-national analysis of 20 OECD economies shows the AIxC interaction is associated with 86% of TFP variance versus 31% for AI alone, a pattern-consistent finding in the small-n theoretical tradition. South Korea exemplifies national-scale under-augmentation: high H, substantial A, low C produce phi = 0. We distinguish convergence capacity from adjacent constructs, absorptive capacity, dynamic capability, and human capital, and demonstrate that C constitutes the specific cognitive mediator that prior frameworks have left implicit. We derive C-first policy prescriptions and offer three empirically testable propositions with a falsifiable 10-year forecast.
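The abstract quotes the identity A_Solow = [1 + phi(A,C)]^(1-alpha) without showing the step behind it. The abstract does not state the functional form of F, so the following is only a sketch, assuming F is Cobb-Douglas with capital share alpha:

```latex
\begin{align}
\hat{H} &= H\,[1 + \phi(A, C)] \\
Y &= F(K, \hat{H}) = K^{\alpha}\,\hat{H}^{1-\alpha}
   \quad \text{(assumed Cobb-Douglas form)} \\
  &= \underbrace{[1 + \phi(A, C)]^{1-\alpha}}_{A_{\text{Solow}}}\; K^{\alpha} H^{1-\alpha}
\end{align}
```

Under that assumption, the under-augmentation case described for South Korea (phi = 0) gives A_Solow = 1, meaning AI use adds nothing to measured TFP regardless of how large A is.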

What Capital After Labor? Forecasting the Talent ROI Transition in the Human-AI Era

AI Startups & Venture · 1 article

Bebeez· 3 days ago

New €500 million growth vehicle E2D aims to close Europe’s DefenceTech scaling gap

VC firms Earlybird and AVP today announce the launch of E2D, a €500 million European dual-use and DefenceTech growth fund – one of the most significant Franco-German investment collaborations in the European tech ecosystem. E2D will be a growth-stage fund, targeting around 20 companies at an average ticket size of approximately €25 million, with a […]

Labor, Society & Culture

18 articles

AI & Culture · 1 article

From 50K to 8.2 Million in 24 Hours: Vozinha's Algorithmic Consecration and the Multilingual Making of World Cup Visibility

arXiv:2606.19647v1 Announce Type: cross Abstract: We present a multilingual computational discourse analysis of how language constructed the algorithmic consecration of Vozinha, the 40-year-old Cape Verde goalkeeper, after Spain 0-0 Cape Verde at the 2026 FIFA World Cup. The study contributes a multilingual corpus in Portuguese, Spanish, English, and French; a nine-frame narrative taxonomy with cue-based frame annotation; a reproducible annotation pipeline combining LLM-assisted suggestion with human validation; and an analysis of cross-lingual narrative diffusion across discourse phases. We treat the platform follower count itself, narrated as "50k to 8M", as a linguistic object: a circulating and narratable proof of visibility rather than a mere measurement. The follower-growth timeline is used only as contextual metadata: we reconstruct a conservative phase structure, not a continuous API-native series, and type every datapoint by value class, confidence, and evidence type. The only exact primary scraper anchor is 8,235,652 followers at 2026-06-16 15:47 UTC; all other figures are reported as estimated ranges or thresholds, including an estimated pre-match baseline of 45k-56k. Findings suggest that distinct languages carried distinct frames: Portuguese mobilization, Spanish crisis, English nation-making, and a shared platform-metric spectacle through which peripheral athletic performance became globally visible. As a v0.1 pilot, the paper releases the corpus schema, frame taxonomy, annotation guidelines, hashed visual-evidence log, and typed timeline, while flagging full double annotation and inter-annotator agreement as planned work.

AI & Employment · 8 articles

Editor's pick · PAYWALL

How AI is ‘senior-ising’ junior roles

Changing workflows mean employers are now asking new recruits to be managers and decision makers

Editor's pick · PAYWALL · Professional Services

Reuters· 3 days ago

AI will lead to labour shortages, Bezos says in optimistic talk | Reuters

Artificial Intelligence will lead to labour shortages, not the replacement of humans, Amazon founder Jeff Bezos predicted in a highly optimistic appearance at the VivaTech technology conference in Paris on Wednesday.

Theatlantic· 3 days ago

America Is Headed Toward the Infinite Workweek

The future of AI and jobs will be so much weirder than you think.

The Algorithmic-Human Manager: AI, Apps, and Workers in the Indian Gig Economy

arXiv:2606.19975v1 Announce Type: new Abstract: This paper examines the impact of artificial intelligence and digital technologies on the blue-collar gig economy in India, focusing on algorithmic management: the use of automated systems to allocate, monitor, and evaluate work in location-based services such as ride sharing and delivery. Using a social justice framework and a mixed-methods approach comprising interviews with 16 gig workers and 21 key stakeholders, the study uncovers a dual reality: while AI-powered systems expand access to work and generate operational efficiencies, they simultaneously introduce significant challenges related to fairness, transparency, and worker dignity. Key findings reveal that algorithmic systems are opaque by design, produce inequitable outcomes, and are not structured to reward additional labour with proportionate pay. The study advocates for a pragmatic hybrid governance model, an Algorithmic-Human Manager framework, in which technological efficiency and human accountability operate together rather than in opposition. The findings carry implications for policymakers, platform companies, and civil society organizations working to design equitable AI governance frameworks for the gig economy in India and across the Global South.

Fortune· 3 days ago

Entry-level work didn’t disappear, PwC finds with ‘seniorization.’ It just morphed into something young workers can’t get

"Employers are changing what they ask for in entry-level roles," Dan Priest, PwC's U.S. chief AI officer, told Fortune.

Guardian· 3 days ago

Gig workers are endlessly exploited. AI could make more of us share their fate

As companies integrate AI and hire fewer employees, a shift toward a ‘gig economy’ will commence

In 2024, the buy-now-pay-later company Klarna announced that it would cut hundreds of customer service roles and begin using an artificial intelligence chatbot instead. The move was expected to save the company millions. But a year later, after customers complained about the degraded quality of customer service, Klarna began to quietly recruit human customer service agents back. At first glance, the reversal appeared to be a victory for human workers in the age of AI. The reality was more complex. Instead of bringing on full-time customer service agents, who Klarna contracts through an outside agency, it instead brought on workers in what Klarna CEO Sebastian Siemiatkowski has described as “an Uber type of set-up”. Now, an AI chatbot continues to handle most of customers’ basic queries, while a growing number of gig workers handle the more advanced ones. “Just like somebody can go and drive an Uber for a while, they can actually jump on and work for Klarna’s customer service,” Siemiatkowski said on a podcast in February.

Gender Bias in LLM Hiring Decisions: Evidence from a Japanese Context and Evaluation of Mitigation Strategies

arXiv:2606.18649v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in hiring workflows, yet most research on gender bias in LLM hiring decisions has focused on English-language, Western-format resumes. This study examines whether pro-female gender bias extends to a Japanese corporate context and evaluates two practical mitigation strategies. Using a counterfactual resume design with 60 Japanese rirekisho-format resumes, 12 name pairs selected on linguistically grounded gender-signal criteria, and five state-of-the-art LLMs (Claude Sonnet 4.6, GPT-4o, DeepSeek-V3, Gemini 2.5 Flash, Llama 3.3 70B), we conducted 43,200 API calls across baseline, prompt instruction, and privacy filter conditions. A crossed random-effects linear mixed model confirms a significant pro-female bias across all five models, replicating Western findings in a non-Western context. A prompt-level gender-neutrality instruction produces no meaningful reduction in bias. A name-reliance analysis formally identifies the candidate name as the primary gender channel: removing the name from the prompt reduces the female effect by nearly its full magnitude. An unexpected incompatibility between the privacy filter and GPT-4o's content safety filter, resulting in a 42% refusal rate, highlights a practical deployment challenge for name anonymization in LLM-assisted recruitment pipelines.

Security Brief· 3 days ago

AI makes three in five Australians' jobs more stressful

Businesses rolling out AI face rising staff anxiety, with a survey of more than 1,200 Australians finding most feel more stressed at work.

AI Ethics & Safety · 7 articles

Elon Musk's Grok AI Sparks Debate on Ethics and Oversight in Military Use

Elon Musk’s AI tool, Grok, has been revealed as a critical component in US military operations, triggering debates on AI ethics and oversight in defense.

Rape convictions under review after UK detective allegedly used AI chatbot for paperwork

Derbyshire Police investigating whether officer used software to secure desired court outcome

Amazon Retaliated Against Workers Who Supported Regulating Data Centers, Complaint Says

The employees encouraged limits on the complexes in a series of hearings in the tech giant’s hometown, Seattle.

Editor's pick · Telecommunications

New Super PAC, the Guardrails Alliance, Aims to Rally Tech Workers to Help Limit A.I.

The Guardrails Alliance, which has raised $5 million, is positioning itself as a populist effort that will take on the pro-A.I. interests trying to influence this year’s elections.

Acceleration AI Ethics and the Telus GenAI Conversational Agent

arXiv:2501.18038v3 Announce Type: replace Abstract: Acceleration ethics addresses the tension between innovation and safety in artificial intelligence. The acceleration argument is that risks raised by innovation should be answered with still more innovating. This paper summarizes the theoretical position, and then shows how acceleration ethics works in a real case. To begin, the paper summarizes acceleration ethics as composed of five elements: innovation solves innovation problems, innovation is intrinsically valuable, the unknown is encouraging, governance is decentralized, ethics is embedded. Subsequently, the paper illustrates the acceleration framework with a use-case, a generative artificial intelligence language tool developed by the Canadian telecommunications company Telus. While the purity of theoretical positions is blurred by real-world ambiguities, the Telus experience indicates that acceleration AI ethics is a way of maximizing social responsibility through innovation, as opposed to sacrificing social responsibility for innovation, or sacrificing innovation for social responsibility.

2026 AI Report: Protect LGBTQ Rights with Inclusive Data and Transparent Practices

A 2026 AI report highlights significant risks for LGBTQ communities due to biased AI designs and privacy violations. It calls for inclusive datasets and transparency to ensure responsible AI practices.

Emergent Alignment

arXiv:2606.19527v1 Announce Type: new Abstract: Can Large Language Models (LLMs) discern when their own outputs are misaligned with human ethics? And can they self-correct? We endow an LLM with a conscience step that reviews its own reasoning and outputs, and we extend the training loss with an alignment component using Direct Preference Optimization (DPO) to steer the model away from non-ethical outputs. The result is an online technique to align models in a wide range of applications: training, fine-tuning, adversarial prompting, and zero-shot learning. It does not require a weaker or stronger judge, relying instead on a frozen copy of itself. In previous work, the Emergent Misalignment scenario showed a range of emergent unethical behaviors from fine-tuning the model to hack code. Instead, we empirically show how to achieve Emergent Alignment: a single high-level introspective question steers training toward an ethical model under the same code hacking scenario.

AI Skills & Education · 1 article

Editor's pick · Education

Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

arXiv:2606.19469v1 Announce Type: new Abstract: Undergraduate computer science is governed by international curricular guidelines revised about once a decade, yet programs lack a reliable, reproducible way to measure how completely they cover the current guidelines and how that coverage shifts when the guidelines are restructured. We address this with a human-in-the-loop pipeline that measures a program's coverage of an external body of knowledge, applied longitudinally to one accredited BSc in Computer Science against Computer Science Curricula 2013 (CS2013) and 2023 (CS2023). The pipeline represents the program and each guideline as structured corpora, generates candidate course-to-knowledge-unit matches by semantic retrieval, and confirms them through human judgment under an explicit coverage definition. Of seven benchmarked retrievers, a reciprocal-rank-fusion ensemble was strongest, and a reputed long-context model underperformed a small sentence model, so retriever choice must be measured. Both maps were validated by an independent second rater (Cohen's kappa 0.64 for CS2023, 0.69 for CS2013). The program covers 49.7% of CS2023 and 50.9% of CS2013 knowledge units, near-constant across a decade. Extending the same retrieve-then-confirm design to competency articulation and cognitive depth shows that the program articulates the competency for ~88% of covered units under each guideline, yet delivers it at the recommended depth for 76% of present units under CS2023 against 95% under CS2013, a gap reflecting the newer guideline's raised expectations, not the program. The longitudinal comparison separates persistent structural gaps (parallel and distributed computing, foundations of programming languages, systems fundamentals), uncovered against both guidelines and ABET, from differences that reflect the standard's evolution. The instrument is reusable and available from the authors on request.
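The abstract reports inter-rater agreement as Cohen's kappa (0.64 and 0.69). As a reminder of what that statistic measures, here is a minimal sketch of the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e). The labels and judgments below are hypothetical and are not the paper's data or code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two labels are identical.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical coverage judgments on ten knowledge units (8 of 10 agree).
first = ["covered"] * 5 + ["not"] * 5
second = ["covered"] * 4 + ["not"] * 5 + ["covered"] * 1
kappa = cohens_kappa(first, second)
```

In this made-up example the raters agree on 80% of items, but half of that agreement is expected by chance, so kappa comes out near 0.6, in the same range as the values the abstract reports.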

Public Attitudes to AI · 1 article

Two-thirds of Americans think AI is advancing too quickly

Pew Research data indicates that a majority of Americans are concerned about the rapid pace of AI development.

Technology & Infrastructure

36 articles

AI Agents & Automation · 9 articles

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

arXiv:2606.19501v1 Announce Type: new Abstract: Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no regulator-aligned way to measure the resulting false alarms. We introduce DeXposure-Claw, a forecast-grounded agentic supervision system that routes LLM decisions through structured evidence: (1) DeXposure-FM, a graph time-series foundation model, forecasts future exposure networks; (2) deterministic monitors and stress scenarios then turn those forecasts into typed alerts, attribution signals, and scenario evidence; and (3) data-health and confidence gates constrain escalation before DeXposure-Claw emits auditable supervisory tickets with rationales. We further develop DeXposure-Bench, a six-axis evaluation harness, whose decision axis scores tickets against a regulator-aligned absolute-loss ground truth and an explicit false-intervention rate. Experiments on five years of weekly real data fully support our system. Code is at https://github.com/EVIEHub/DeXposure-Claw.

Adobe embeds agentic AI workflows across Creative Cloud, shifting from media generation to production orchestration

Adobe has announced a major expansion of its "creative agent" across its flagship Creative Cloud suite and upgraded Firefly AI studio. Available in public beta starting today across Premiere Pro, Photoshop, Illustrator, InDesign, and Frame.io, the agent is designed to serve everyone from individual creators to enterprise marketing teams. Unlike first-generation generative AI tools that simply output flat media from a chat interface, Adobe’s embedded assistant acts as an orchestration layer. It interprets natural language prompts and directly accesses the underlying software's APIs to execute complex, multi-step production workflows—from batch-renaming video sequences to dynamically updating brand assets across print layouts—while leaving the final aesthetic decisions entirely in the hands of the human designer. Technology: Contextual Memory and DOM Manipulation At the core of this release is a significant technical upgrade to how Adobe's AI handles persistent memory and context window management. In its upgraded Firefly creative AI studio—currently in private beta—Adobe has introduced two foundational architectural components: "Elements" and "Projects". Elements functions as a visual variables library, allowing users to save and reuse specific characters, locations, and objects across multiple generations to ensure strict visual consistency as campaigns scale. Projects acts as the contextual memory layer, storing assets, generations, and session history in a unified space so users can pick up where they left off without rebuilding their prompt context. Beyond pixel generation, the system's most critical technological leap is its ability to operate seamlessly within the complex document structures of desktop applications. "Our Adobe Creative Agent can leverage the decades of powerful features, workflows, APIs that we've brought into our application and exposed through tooling that can now be invoked through a creative agent," an Adobe representative explained. 
Product: Automating the Tedious, Expanding the Canvas The practical application of this technology fundamentally alters standard production workflows. Adobe is positioning the human user as a "creative director" capable of delegating repetitive, labor-intensive tasks to the AI. The rollout introduces highly specific specialist agents tailored to the logic of each application: Premiere Pro: The agent handles tedious project setup, analyzing and sorting source media into bins, batch renaming clips, identifying interview questions, and assembling a rough working starting point. Illustrator: The assistant automates mathematical and multi-step design tasks, such as generating 50 versioned files from a spreadsheet or running pre-flight checks to flag color mode errors before printing. It can even programmatically duplicate a vector shape 100 times, randomize its position, and change its size based on its z-depth and transparency. Photoshop & InDesign: The agent executes batch background removals, dynamic layer organization, and applies brand updates across multi-page layouts. Furthermore, Adobe is actively integrating its creative agent into major third-party enterprise platforms, including OpenAI's ChatGPT, Anthropic's Claude, Microsoft 365 Copilot, and soon, Google Gemini and Slack. Licensing: Commercial SaaS and Enterprise Implications Unlike open-source orchestration frameworks or models released under MIT or Apache licenses, Adobe's creative agent operates strictly within a proprietary, commercial SaaS ecosystem. For enterprise decision-makers, this carries specific implications. Because the agent relies on Adobe's proprietary APIs to manipulate project files, it requires an active Creative Cloud commercial license. 
Additionally, by bringing the "Adobe for creativity connector" to platforms like Slack and Microsoft Copilot, enterprise IT and systems architects must consider how internal chat tools will interface with Adobe's cloud processing environments to support enterprise creative and marketing teams securely. The Enterprise Unknowns: APIs, Governance, and Architecture While Adobe’s announcements highlight a powerful user interface and deep integration within its own flagship applications, several critical questions remain for enterprise technical decision-makers tasked with building bespoke AI systems. VentureBeat has reached out to Adobe for clarification on these infrastructure-level details and will update this coverage as we learn more. For AI system architects, the value of a creative agent lies not just in a native application UI, but in its extensibility. It remains unclear if Adobe plans to expose these new agentic capabilities via API, or if the company will support the Model Context Protocol (MCP). Without MCP support or direct API access, enterprise teams will face friction integrating Adobe's tools into their own custom task-routing frameworks and internal LLM pipelines. Adobe’s new "Elements" feature promises to solve the generative AI consistency problem by anchoring characters and objects across generations. However, the backend architecture driving this persistent memory is not yet detailed. Whether Adobe is leveraging on-the-fly Low-Rank Adaptation (LoRA) based on user uploads or utilizing a form of visual Retrieval-Augmented Generation (RAG) is a critical distinction for technology leaders managing compute costs, model evaluations, and enterprise-grade inference pipelines. As organizations build out "Projects" and define brand-specific "Elements", security and data decision-makers require strict guarantees regarding data provenance and storage. 
It is currently unknown exactly where this contextual workflow and vector data lives—specifically, whether it remains strictly sandboxed within the customer's enterprise Creative Cloud instance on Adobe servers, and how role-based permissions apply to these new agentic workflows. Finally, as lightning-fast, developer-first, multi-model AI creative platforms like fal.ai gain significant traction among enterprises and developers, Adobe’s position in the broader developer ecosystem remains a point of interest. Whether Adobe views these infrastructure-level API providers as direct competitors to its Firefly AI studio or as potential integration points for bespoke enterprise environments has yet to be seen. Community Reactions: The Tension Between Automation and Craft The integration of agentic AI touches on the tension between eliminating drudgery and surrendering creative control. According to Adobe's recent Creators' Toolkit Report, which surveyed over 16,000 creators globally, the market is highly receptive to AI as an operational assistant rather than an autonomous creator. 75 percent of surveyed creators describe creative AI as integrated or essential to their current workflows. 85 percent emphasized that the final creative decision must always remain in human hands. This sentiment is central to Adobe's messaging. By focusing the agent's capabilities on file organization, layer management, and brand compliance, Adobe aims to automate what a spokesperson called the "tedious parts of their workflow". The goal, according to Adobe executive David Wadhwani, is to let creatives focus on the craft so they can "apply their taste and make the calls that only they can".

Uncertainty Decomposition for Clarification Seeking in LLM Agents

arXiv:2606.19559v1 Announce Type: new Abstract: Recent position papers argue that the classical aleatoric/epistemic uncertainty framework is insufficient for interactive large language model (LLM) agents and call for underspecification-aware, decomposed, and communicable uncertainty representations that can unlock new agent capabilities such as proactive clarification seeking and shared mental-model building. Practical deployment constraints -- black-box APIs, interactive latency budgets, and the absence of labeled trajectories -- rule out logprob-based, multi-sampling, and training-based methods, leaving prompt-based estimation as the most viable family for surfacing such signals at deployment time. We answer this call with a simple prompt-based decomposition that separates action confidence from request uncertainty (u), enabling the agent to ask for clarification when the task specification is ambiguous. To evaluate it, we introduce two clarification-augmented benchmarks (WebShop-Clarification and ALFWorld-Clarification) in which 50% of tasks are deliberately underspecified, and systematically compare the proposed decomposition against ReAct+UE and Uncertainty-Aware Memory (UAM) across five LLM backbones (GPT-5.1, DeepSeek-v3.2-exp, GLM-4.7, Qwen3.5-35B, GPT-OSS-120B) on these variants together with the standard WebShop, ALFWorld, and REAL benchmarks for fault detection. Averaged across the five backbones, the proposed decomposition improves clarification F1 on ALFWorld-Clarification by 73% over ReAct+UE and by 36% over UAM, and leads clarification F1 on every backbone on WebShop-Clarification and on four of five backbones on ALFWorld-Clarification, indicating that the gains generalize beyond a single LLM.
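As a rough illustration of how a decomposed signal could drive clarification seeking, the sketch below separates the two scores the abstract names and applies a simple threshold rule. The field names, thresholds, and the three-way outcome are assumptions made for this example; the abstract does not specify the paper's actual prompts or decision rule.

```python
from dataclasses import dataclass


@dataclass
class UncertaintyEstimate:
    """Two prompt-elicited scores in [0, 1]; the field names are hypothetical."""
    action_confidence: float    # how sure the agent is about its next action
    request_uncertainty: float  # u: how underspecified the user's request looks


def decide(est: UncertaintyEstimate,
           u_threshold: float = 0.5,
           conf_threshold: float = 0.5) -> str:
    """Assumed, illustrative rule: ask first if the request itself is ambiguous."""
    if est.request_uncertainty >= u_threshold:
        return "clarify"      # the task specification looks underspecified
    if est.action_confidence >= conf_threshold:
        return "act"          # clear request, confident action
    return "reconsider"       # clear request, but the agent doubts its own action


if __name__ == "__main__":
    print(decide(UncertaintyEstimate(action_confidence=0.9, request_uncertainty=0.8)))
```

Keeping the two scores separate means an agent that is confident about an action can still stop to ask when the request itself is ambiguous, which a single merged confidence value cannot express.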

Anthropic's Claude Code Artifacts update brings live, shared dashboards and interactive workspaces to enterprises

Anthropic announced a potentially game-changing new feature for users of Claude Code on the Claude Team and Enterprise subscription plans: Artifacts. This update turns a Claude Code session's work into a live, interactive, shareable custom HTML webpage, allowing a Claude Code user to plug in live code and multiple data sources and have the result surface at an interactive URL that they can send to teammates, be it a dashboard, an app design, or some other product meant for internal usage. These teammates and the original user can watch the webpage update in real time as Claude Code goes about its work autonomously or under the user's guidance, and as the connected data sources and codebases change. While Anthropic first introduced Artifacts to its consumer web chatbot in the summer of 2024 (where it evolved from a manual toggle feature to a generally available tool for publishing code snippets and games to the web), integrating this capability directly into the Claude Code command-line interface (CLI) and desktop app bridges the gap between deep, back-end engineering and the non-technical stakeholders who need to understand it. Product and Technology: The End of the Status Update At its core, Claude Code Artifacts acts as a dynamic translation layer. Built directly from the unbroken context of a user’s session, the agent uses the local repository codebase, connected monitoring tools, and conversational reasoning to spin up specialized web pages. Engineers no longer need to wire up external data sources or stand up temporary infrastructure; the AI builds the UI from what already exists. Crucially, these web pages are not static exports. As the AI works through a terminal session, the open webpage refreshes in-place, updating charts and text instantly at the exact same URL. Every update publishes a new version history, allowing teammates to roll back or track the agent's progress securely on desktop or mobile. 
The Battle of Live, Interactive, Shared AI Work Surfaces: Anthropic's Claude Code Artifacts vs. OpenAI's Codex Sites Anthropic's update comes more than two weeks after OpenAI released a massive update to its own Codex platform, introducing a strikingly similar enterprise hosting feature called "Sites". This tit-for-tat product cadence highlights a rapidly escalating battle over the enterprise workspace across functions and beyond developers themselves, though there are some important technical and philosophical distinctions worth pointing out for enterprises considering either. As revealed in their respective developer documentation webpages, OpenAI is building a platform-as-a-service; Anthropic is building a stateless canvas. OpenAI’s Sites is designed to generate durable, full-stack web applications. According to the platform's documentation, Codex Sites hosts projects that output as Cloudflare Worker-compatible ES modules. Crucially, Sites supports persistent backend infrastructure: agents can automatically wire up "D1" relational databases for structured data (like user progress or saved records) and "R2" object storage for file uploads. An OpenAI Site can support public sign-ins, integrate with external identity providers, and allows for highly specific access controls tailored to specific workspace groups. It utilizes a two-stage publishing process—saving a reviewable candidate linked to a Git commit before officially deploying to production. In short, it is a production environment designed to replace functional internal SaaS tools. Anthropic’s Claude Code Artifacts, by contrast, deliberately avoids the backend. The newly released documentation is blunt about its limitations: "An artifact is a capture of work, not an application". Each Artifact is a single, self-contained HTML page capped at a rendered size of 16 MiB. 
To guarantee organizational security, Claude wraps the published file in a strict Content Security Policy (CSP) that blocks all external network requests. This means the page cannot load external scripts, fonts, or stylesheets, and fetch, XHR, and WebSocket calls are completely blocked. All CSS and JavaScript must be inlined, and images must be embedded as data URIs. Artifacts cannot store form input, call an API at view time, or serve multiple routes. This technical limitation is actually Anthropic's deliberate philosophical position: While OpenAI wants to spin up persistent software portals for the whole company, Anthropic is keeping Claude Code firmly anchored in ephemeral, highly secure technical workflows. Claude Artifacts are not meant to be software; they are meant to replace whiteboard diagrams, manual bug walkthroughs, and status reports with secure, self-updating visual tools that never leak live data outside the corporate boundary. Licensing and Enterprise Security: Keeping the Codebase Private Because these agents sit at the nexus of proprietary company data and live codebases, licensing and access controls are a primary concern. Both Anthropic and OpenAI have opted for closed, proprietary licensing models for these new visual workspaces. For end users and developers, the distinction is critical. Unlike permissive open-source software (such as MIT or Apache 2.0) or strict copyleft licenses (like GPL), which grant developers the legal freedom to inspect, modify, and self-host the underlying code, neither Claude Code Artifacts nor Codex Sites can be independently forked or hosted. Enterprise clients do not maintain code-level ownership over Anthropic's rendering engine or Codex’s integration nodes; both operate strictly within their respective creators' managed infrastructures. To make this vendor-managed approach palatable to enterprise compliance teams, both companies have heavily prioritized organizational security. 
Anthropic ensures every artifact is private to its author by default and strictly cannot be made public to the broader internet. When an engineer chooses to share a link, it is viewable exclusively by authenticated members of their specific organization. System administrators retain ultimate authority, managing access through org-level toggles, role-based scoping, and explicit retention policies, while maintaining oversight through a centralized compliance API. OpenAI takes a similarly gated approach with Codex Sites, rolling the feature out primarily for ChatGPT Business and Enterprise workspaces. Like Anthropic, OpenAI relies on system administrators to manage deployment through centralized workspace settings, requiring an admin to explicitly enable Sites via role-based access control (RBAC) for Enterprise tiers. However, because Codex Sites functions more like a hosted web application, its access controls are slightly more granular. When an engineer prepares to share a deployed URL, they can apply specific access modes: restricting the site to just themselves and workspace admins, opening it to all active users in the workspace, or limiting access to custom user groups. Furthermore, to prevent sensitive data leaks, OpenAI provides a dedicated Sites panel to manage runtime environment variables and secrets securely, ensuring those keys do not have to be committed to local source files. Reactions and Reflections The introduction of visual, self-updating UI layers to command-line agents is fundamentally altering how developers view their own workflows. As AI handles the raw syntax and automates the reporting, the friction of communicating technical work to stakeholders is vanishing. 
Boris Cherny, the Lead and creator of Claude Code, highlighted the sheer utility of the update in a post on X earlier today: "I've been using Artifacts in Claude Code for everything: visual explanations of tricky code, system diagrams, quick previews of a few animation options, data analyses and dashboards I share with the team," Cherny wrote. "They are a game changer for how I work with Claude. Can't wait to hear what you think!" This sentiment is practically demonstrated in Anthropic’s launch materials. In one scenario, an engineer prompts Claude Code to investigate user drop-offs since a previous software release. In a matter of seconds, the agent executes an SQL read, builds an interactive drop-off funnel dashboard, and diagnoses that "Pro accounts stall at the export sheet". The AI then proposes UI fixes, updates the live charts as the code is refactored, and generates a secure link that a manager can instantly open via mobile. By turning the terminal into a live, collaborative canvas, Anthropic is proving that the most valuable output of an AI coding assistant isn't just the code itself—it is the context, the reasoning, and the ability to share that work instantly.
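The single-file constraints described for Artifacts (a 16 MiB cap, inlined CSS and JavaScript, images as data URIs, no external resources) can be approximated as a pre-publish check. The sketch below is a hypothetical validator, not Anthropic's implementation: the function and class names are invented for illustration, it measures the file's byte length rather than the "rendered size" the documentation refers to, and it only inspects `script`, `img`, and `link` tags.

```python
from html.parser import HTMLParser

MAX_BYTES = 16 * 1024 * 1024  # 16 MiB cap cited in the article


class ExternalRefFinder(HTMLParser):
    """Collects src/href values on resource tags that are not data: URIs."""

    # Hypothetical, deliberately small set of tags to inspect.
    RESOURCE_ATTRS = {"script": "src", "img": "src", "link": "href"}

    def __init__(self) -> None:
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and not value.startswith("data:"):
                self.external.append(f"<{tag} {name}={value!r}>")


def check_single_file_page(html: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(html.encode("utf-8")) > MAX_BYTES:
        problems.append("page exceeds 16 MiB")
    finder = ExternalRefFinder()
    finder.feed(html)
    problems.extend(f"external resource: {ref}" for ref in finder.external)
    return problems


if __name__ == "__main__":
    page = '<html><body><script src="https://cdn.example.com/chart.js"></script></body></html>'
    for problem in check_single_file_page(page):
        print(problem)
```

A check like this only catches static references. Blocking runtime `fetch`, XHR, and WebSocket calls is the job of the Content Security Policy itself, which the browser enforces.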

Daily AI News June 18, 2026: The Reality of Taking AI Into Production· 3 days ago

Financial Services Agents in Production

This LangChain report examines how major financial services firms, such as J.P. Morgan, are putting AI agents into production across real enterprise workflows.

Daily AI News June 18, 2026: The Reality of Taking AI Into Production· 3 days ago

Building Supercharger: How Rocket Close optimized title operations with agentic AI

Rocket Close’s Supercharger project applies agentic AI to improve title operations and support human agents in a real estate workflow.

Deontic Policies for Runtime Governance of Agentic AI Systems

arXiv:2606.19464v1 Announce Type: new Abstract: Autonomous agentic AI systems driven by Large Language Models (LLMs) introduce a new class of security, privacy, and compliance challenges: an agent that can invoke tools, manipulate data, install software, and coordinate with peer agents across organizational boundaries must be constrained not just by authentication and access control, but by the full structure of enterprise governance. This includes specifying what agents are permitted and prohibited from doing, what they are obliged to do after certain actions (e.g., notify the CISO), under what conditions a standing obligation may be waived, and which rules take precedence when policies conflict. This governance problem exceeds what current policy engines provide. Systems such as XACML, Rego, and Cedar address only the permit/prohibit subset of this governance structure. They do not provide obligation lifecycle management, meta-policy conflict resolution, dispensations that waive obligations in specific circumstances, or ontological reasoning over domain class hierarchies commonly found in applications such as healthcare, cybersecurity, or data privacy. We propose AgenticRei, which realizes key governance requirements such as obligations, dispensations, policy conflict resolutions, and reasoning over policies, as well as the basic permit/prohibit constraints. We use a deontic policy language built on the Rei framework, expressed as OWL (Web Ontology Language) and evaluated at runtime by a high-performance logic engine entirely outside the LLM. The same pipeline governs both tool invocations by the agent and agent-to-agent messages. We show through examples that deontic policies capture governance constraints around security and privacy that mostly cannot be expressed in current production engines. Our approach composes naturally with industry-standard frameworks like A2AS.
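To make the four deontic notions in the abstract concrete (permit, prohibit, obligation, dispensation, plus a precedence rule for conflicts), here is a toy evaluator. It is an illustrative sketch only: AgenticRei expresses policies in OWL and evaluates them with a logic engine, whereas the `Rule` fields, the default-deny behaviour, and the "higher priority wins, ties go to prohibit" meta-policy below are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    modality: str        # "permit", "prohibit", "oblige", or "dispense"
    action: str          # the action the rule applies to
    duty: str = ""       # for oblige/dispense: the follow-up duty
    condition: str = ""  # for dispense: context flag that must be present
    priority: int = 0    # used only to resolve permit/prohibit conflicts


@dataclass
class Decision:
    allowed: bool
    obligations: list = field(default_factory=list)


def evaluate(rules, action, context=frozenset()):
    """Decide whether an action is allowed and which duties it triggers."""
    matching = [r for r in rules if r.action == action]
    verdicts = [r for r in matching if r.modality in ("permit", "prohibit")]
    if not verdicts:
        return Decision(allowed=False)  # assumed default-deny
    # Assumed meta-policy: highest priority wins; on a tie, prohibit wins.
    winner = max(verdicts, key=lambda r: (r.priority, r.modality == "prohibit"))
    if winner.modality == "prohibit":
        return Decision(allowed=False)
    waived = {r.duty for r in matching
              if r.modality == "dispense" and r.condition in context}
    duties = [r.duty for r in matching
              if r.modality == "oblige" and r.duty not in waived]
    return Decision(allowed=True, obligations=duties)


if __name__ == "__main__":
    policy = [
        Rule("permit", "install_software"),
        Rule("oblige", "install_software", duty="notify_ciso"),
        Rule("dispense", "install_software", duty="notify_ciso",
             condition="preapproved_package"),
    ]
    print(evaluate(policy, "install_software"))
    print(evaluate(policy, "install_software", {"preapproved_package"}))
```

The second call shows a dispensation at work: the action is still permitted, but the standing duty to notify the CISO is waived because the context carries the hypothetical `preapproved_package` flag.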

Palladyne AI Secures Army Contract for Autonomous Systems Field Trials

Palladyne AI secured key Army contracts to develop SwarmOS and Gremlin-X, aiming to streamline command of diverse autonomous systems during upcoming field trials.

Daily AI News June 18, 2026: The Reality of Taking AI Into Production· 3 days ago

The Evolution of Agentic Surfaces: Building with Claude Managed Agents

Anthropic explains how production AI agents require more than prompts, including sessions, environments, secrets, observability, and scalable agent harnesses.

AI Energy · 1 article

Modest, artistic, and radical solutions to the environmental impact of image-generating machine learning

arXiv:2606.19957v1 Announce Type: new Abstract: Machine learning is often touted to improve the efficiency of ICT, but that small gain is overwhelmed by the enormous carbon, water, and land footprints of data centers and ML-ready devices. We survey the electricity consumption of ML applications in training and inference, focusing on electricity-intensive image generation. Our team of a computer engineer, a media scholar, and an artist explore solutions including inexact computing; tiny language models; low-precision hardware architectures; hardware with limited capacity; and anticipating and mitigating energy demands at the design phase. We will sketch our work in progress of an ethical and aesthetically sophisticated tiny image generator using non-scraped data. Looking to the economic context, we will propose a true-cost accounting for the environmental impact of machine learning and suggest that the criterion of efficiency is driven by the shareholder-capitalist framing of ICT.

AI Hardware · 3 articles

Unsloth GLM-5.2 2-bit quant 🦙, OpenAI GPT-5.5 Instant doctor training· 2 days ago

Unsloth shrinks GLM-5.2 by 84% for local use

Unsloth has released a 2-bit quantization of the 744B parameter GLM-5.2 model, allowing it to run locally on high-end Mac hardware while retaining 82% accuracy.

Substack· 3 days ago

7 Key AI Hardware Keywords to Watch in 2026

Original Article By SemiVision Research [Reading time: 17 mins]

Editor's pick · PAYWALL · Consumer & Retail

Theregister· 3 days ago

Neuromorphic computing may one day offer AI a power-saving brainwave

Hybrid systems could bring efficiency gains at the edge, but conventional infrastructure isn't going anywhere fast

AI Infrastructure & Compute · 6 articles

AI is turning Nintendo and Sony products into accidental luxury goods

With component-makers busy supplying data centres, console prices are rising as demand outstrips production capacity

US Acts to Speed Up Power Grid Hook-Ups for AI Data Centers

Have Data Centers Raised Your Electric Bill? Causal Evidence from the United States

Theregister· 3 days ago

The AI tipping point: where enterprise AI runs at scale

PARTNER CONTENT: AI's cloud journey homeward bound: enterprises prefer private clouds for scaling AI workloads.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Seeking to boost AI data centers, NTIA preparing report on regulatory obstacles

The NTIA is preparing a report on regulatory hurdles and infrastructure bottlenecks affecting the construction of data centers needed for AI development.

Bebeez· 3 days ago

Firstcolo breaks ground on data center in Rosbach vor der Höhe, Germany

Data center operator Firstcolo has begun construction of a new 24MW data center in the town of Rosbach vor der Höhe in the state of Hesse, central Germany. Named FRA7, the facility north of Frankfurt am Main will cover an area of 124,360 sq ft (11,555 sqm) and is intended to support cloud, AI, and […]

AI Models & Capabilities · 10 articles

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Legal risks in AI training data drove LG to build data-governance system for Exaone

Many open-source datasets used to train AI foundation models may contain licensing inconsistencies and regulatory risks, prompting LG to develop its own data-governance system.

REVEAL++: Differentiable Phenotypic Grouping for Vision-Language Retinal Modeling of Alzheimer's Disease Risk

arXiv:2606.19522v1 Announce Type: new Abstract: The retina offers a noninvasive window into neurodegenerative disease, capturing subtle structural patterns associated with a risk of future cognitive decline. Vision-language alignment frameworks such as REVEAL have shown that pairing retinal fundus images with structured clinical risk narratives improves early prediction of Alzheimer's disease (AD). A key design choice in these approaches is the use of phenotypic grouping, where individuals with similar risk profiles are treated as multi-positive pairs during contrastive learning. However, existing methods operationalize phenotypic similarity as a discrete construct, relying on hard group assignments that impose rigid supervision and decouple group formation from representation learning. We propose a continuous formulation of phenotypic structure within contrastive learning. Rather than assigning samples to fixed clusters, we model inter-subject similarity as a differentiable weighting function derived from intra-modality embedding similarities in both retinal images and risk profiles. These weights define soft multi-positive relationships through a continuous aggregation operator, enabling graded supervision that reflects the spectrum nature of disease risk. We further introduce a soft-target contrastive objective that jointly learns cross-modal alignment and phenotypic structure in an end-to-end manner. Evaluated on UK Biobank retinal imaging data for incident AD prediction, the proposed framework consistently outperforms discrete group-based contrastive learning and standard vision-language baselines. By treating phenotypic similarity as a learnable, continuous signal rather than a fixed grouping rule, our approach provides a principled and robust foundation for population-scale neurodegenerative risk modeling from multi-modal retinal and clinical data.

Diffusion Language Models: An Experimental Analysis

arXiv:2606.19475v1 Announce Type: new Abstract: Large Language Models (LLMs) have revolutionized language modeling through autoregressive generation, enabling strong performance across a wide range of tasks. Recently, Diffusion Language Models (DLMs) have emerged as an alternative paradigm that generates text through iterative denoising rather than next-token prediction, allowing parallel refinement of entire sequences. While numerous diffusion-based architectures have been proposed, differences in evaluation protocols, datasets, inference budgets, and generation hyperparameters make it difficult to compare their capabilities and understand the trade-offs they offer. In this work, we present a systematic experimental analysis of modern DLMs. Specifically, we evaluate eight state-of-the-art DLMs across eight benchmarks spanning reasoning, coding, translation, knowledge, and structured problem solving, while explicitly considering both generation quality and computational efficiency. Beyond downstream evaluation, we analyze the impact of key inference-time factors, including denoising steps, context length, block size, and parallel unmasking strategies, and complement large-scale experiments with controlled comparisons of smaller models trained under identical conditions. Our analysis highlights the strengths and limitations of diffusion-based language modeling across different tasks, architectures, and inference budgets. We show that the behavior of DLMs is strongly influenced by generation-time design choices, leading to distinct trade-offs between performance and computational efficiency. Overall, our study provides practical insights into the capabilities and deployment characteristics of contemporary DLMs.

Hidden Anchors in Multi-Agent LLM Deliberation

arXiv:2606.19494v1 Announce Type: new Abstract: Multi-agent LLM deliberation, where agents exchange and revise answers over several rounds, is increasingly used to improve reasoning and accuracy, yet how and why it works is rarely modelled. Such deliberation mirrors how humans reach decisions. As social animals we are pulled both by the group, the herd effect that classical opinion-dynamics models such as DeGroot and Friedkin-Johnsen capture, and by our own internal belief, which they do not. We model multi-agent deliberation as a closed-loop dynamical system in which each agent carries a hidden internal belief, its anchor, that continually pulls its opinion regardless of its neighbours. We show this anchor can be recovered from the deliberation alone, and that it explains a behaviour classical consensus rules forbid: an agent's confidence in the correct answer can climb past where any agent started, escaping the convex hull formed by the initial beliefs. Checking whether the recovered anchor also predicts held-out runs (that is, generalizes) gives a simple test for when a model is truly driven by such an anchor. Across three open-weight model families this is a spectrum, not all-or-nothing. All anchors' influence is about equally strong, but they differ in where the anchor sits, and only when it sits far from the initial opinions does deliberation escape the hull and need the full closed-loop model.
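The hull-escape behaviour can be seen in a few lines of simulation. The update rule below is an assumed toy model, not the paper's: each round, every agent moves to a mix of the group mean and its own fixed anchor, with `pull` setting the anchor's weight. With `pull = 0` the rule reduces to plain averaging and opinions stay inside the range of the initial beliefs. With anchors placed outside that range, opinions climb past every starting value.

```python
def deliberate(opinions, anchors, pull, rounds=50):
    """Toy anchored consensus: x_i <- (1 - pull) * mean(x) + pull * anchor_i."""
    x = list(opinions)
    for _ in range(rounds):
        group = sum(x) / len(x)  # herd effect: everyone sees the group mean
        x = [(1 - pull) * group + pull * a for a in anchors]
    return x


if __name__ == "__main__":
    initial = [0.4, 0.5, 0.6]    # initial confidence in the correct answer
    anchors = [0.90, 0.95, 0.85] # hypothetical hidden beliefs, outside the initial range
    print("averaging only:", deliberate(initial, anchors, pull=0.0))
    print("with anchors:  ", deliberate(initial, anchors, pull=0.3))
```

In one dimension the convex hull of the initial beliefs is just the interval from the smallest to the largest starting opinion, so "escaping the hull" here means ending above 0.6.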

Editor's pick · PAYWALL

Is the world becoming more predictable?

Bigger and better data sets and powerful AI models are allowing us to spot previously undetectable patterns

Midjourney Medical goes from generating ‘cat images’ to full-body ultrasound scans

Midjourney's AI technology is being applied to medical imaging, specifically for ultrasound scans.

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

arXiv:2606.19538v1 Announce Type: new Abstract: Convolutional networks, recurrent networks, and transformers each encode different inductive biases -- locality, sequential memory, and content-dependent pairwise interaction -- and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2 and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.
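One way to read the "learnable integral transform" claim is sketched below in assumed notation: positions $p$, features $x$, kernel $K_\theta$, and weights $W_Q, W_K, W_V$ are illustrative symbols, not taken from the paper. The two special cases are the standard textbook forms of convolution and single-head self-attention, written as kernels that depend on positions only or on features only.

```latex
% Assumed general form: a kernel over positions and features, discretized as a sum.
\begin{align*}
y(p) &= \int K_\theta\big(p, p', x(p), x(p')\big)\, x(p')\, dp'
      \;\approx\; \sum_{j} K_\theta(p_i, p_j, x_i, x_j)\, x_j \\[4pt]
% Convolution: the kernel depends only on the relative offset.
\text{convolution:}\quad K_\theta(p_i, p_j, x_i, x_j) &= W_{p_i - p_j}
      \quad (\text{zero outside a local window}) \\[4pt]
% Self-attention: the kernel depends only on feature content.
\text{self-attention:}\quad K_\theta(p_i, p_j, x_i, x_j) &=
      \operatorname{softmax}_j\!\left(\frac{(W_Q x_i)^\top (W_K x_j)}{\sqrt{d}}\right) W_V
\end{align*}
```

Under this reading, a kernel that takes both positions and features as input can in principle interpolate between the two regimes. How the paper handles recurrence and the exact parameterizations is not stated in the abstract.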

Editor's pick · Manufacturing & Industrials

Substack· 3 days ago

AI #173: AI Pauses - by Zvi Mowshowitz

Rob Haisfield: Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a puzzle game by @zachtronics . Ironically, Claude Opus 4.8 performed poorly, being beaten by GPT-5.5, Gemini 3.5 Flash, and GLM 5.2.

Toten: Knowledge-Based Ontological Tokenization Of Physical Quantities And Technical Notation In Brazilian Portuguese

arXiv:2606.19626v1 Announce Type: new Abstract: Byte-Pair Encoding tokenization is statistically efficient for vocabulary compression, but semantically blind to structured technical entities, fragmenting physical quantities, numbers, units, and symbolic expressions into lexically arbitrary subwords. We present TOTEN, a knowledge-based ontological tokenization framework that replaces statistical derivation with declarative classification grounded in a formal ontology of engineering entities (OEE). We formalize TOTEN as the triple : the ontology gathers types, structural principles, composition relations, and preservable invariants; the classification function maps raw text into typed regions; and the instantiator family yields a self-descriptive structured representation. Robustness derives from deterministic coupling with three external oracles: Pint (dimensional), Unicode Character Database (typographic), and RSLP (Portuguese morphology). Intrinsic evaluation covers four properties verifiable by construction -- ontological atomicity, dimensional equivalence, typographic robustness, and numerical reconstruction -- over an internal, physically validated benchmark (EngQuant, N=800) and four Brazilian Portuguese external corpora (N=1771 eligible cases). We also report detection recall, distinguishing coverage from conditional atomicity. Against eight state-of-the-art baselines, TOTEN achieves unit ontological atomicity in all contrasts and numerical reconstruction of 0.775-0.904 on external corpora, vs. 0.627-0.703 for the best baseline (Quantulum3); on EngQuant, 0.780 vs. 0.340. Differences are statistically significant (McNemar with Holm correction). Spearman correlation between internal and external rankings confirms concurrent validity of the control benchmark. Dimensional equivalence shows statistical parity with Pint, the oracle from which the system inherits dimensional authority.

GLM-5.2 is the new leading open weights model on Artificial Analysis

GLM-5.2 has emerged as the top-performing open weights model according to the latest Artificial Analysis intelligence index.

AI Research & Science · 2 articles

New AI optimization framework beats Claude Code and Codex by 2.5x on the same compute budget

Imagine your engineering team just deployed an AI agent to search through internal company documents and answer employee questions. It works perfectly in development, but in production, it consistently hallucinates or misses key constraints. Fixing this is rarely a simple patch. It requires a tedious, trial-and-error process of tweaking chunking strategies, retrieval methods, and system prompts simultaneously. Because these adjustments are entangled, it becomes nearly impossible to attribute which specific tweak actually solved the problem. To address this challenge, researchers at Renmin University of China and Microsoft Research introduced Arbor, a framework that upgrades AI-driven research and optimization from a sequence of trial-and-error guesses into a cumulative learning process. Arbor organizes hypotheses, experiments, and insights into a tree that helps the system learn from prior failures to make smarter, verified improvements over time. In practical tests, Arbor delivered more than 2.5 times the verifiable performance gains of standard AI coding agents across real-world engineering tasks while operating under the same resource budget. For enterprise AI, this technique directly translates to automating the continuous improvement of complex, real-world engineering systems. Understanding the bottleneck in autonomous optimization As large language models and AI systems become more capable, they are expected to carry out more complex operations such as autonomous optimization (AO) of software systems such as agent harnesses or model training algorithms. AO captures the fundamental loop of autonomous research. An AI agent starts with an initial mutable artifact, such as a machine learning codebase or data pipeline, and a specific objective. The agent's goal is to iteratively improve this artifact through experimental feedback without step-by-step human supervision. The main challenge of AO is often misunderstood. 
Many engineering teams find that simply giving a coding agent more time or compute to optimize a codebase doesn't lead to better results. "Automation can keep an AI working for a very long time — but a loop is not the same as progress," Jiajie Jin, co-author of the paper, told VentureBeat. "If the goal is vague, or the metric is easy to hack, long-running automation often just produces 'improvements' faster that nobody actually wants." Jin explains that complex tasks take many attempts to get right, and standard agent architectures are missing the critical data structure to maintain state. "How do you make sure the insight and experience from each attempt actually accumulate, instead of getting lost in a scrollback buffer?" he said. Without this structure, agents simply repeat the same mistakes. Current agent systems can run experiments for many hours against well-specified goals: editing code, invoking tools, running tests autonomously. But they treat each attempt in isolation, missing the structural mechanisms that would let them accumulate and act on what they've learned. They lack the capacity to simultaneously maintain and compare multiple competing research directions. Without this, they cannot interpret both successes and failures to reshape their future exploration, which is the core mechanism that makes human research cumulative. General coding agents typically rely on conversation transcripts for their memory. Because AO tasks span hundreds of turns and easily exceed context window limits, these agents struggle to preserve and reuse factual evidence over long histories. As a result, they lose the overarching structure of the research process and are prone to stalling on early failures or chasing noisy evaluation swings. The system needs a structured, durable memory that records what directions have been tried, what factual evidence was produced, and how each result changes the space of future hypotheses. 
Existing frameworks are also prone to reward hacking and overfitting to development metrics. This makes them create the illusion of progress without producing improvements that transfer to real-world performance.

Finally, general-purpose coding agents typically chain their tool calls on a single shared working tree. This architectural limitation prevents them from testing parallel hypotheses in isolated environments without corrupting the main codebase or obscuring which hypothesis caused a specific outcome.

The Arbor framework

Arbor solves the challenges of AO with a framework that automates the long-horizon loop of exploration, experimentation, and abstraction that characterizes human research. Arbor separates the strategic direction of research from the ground-level coding tasks with two key components:

The coordinator: A long-lived AI agent that acts like a principal investigator. It never directly edits the target codebase. Instead, it owns the general state of the optimization research, observes accumulated evidence, comes up with new hypotheses and directions to explore, and decides what to do with the results of experiments.

Executors: Short-lived, highly focused AI agents. When the coordinator wants to test an idea, it spins up an executor and places it in an isolated environment, essentially a fresh git worktree. Each executor is handed one hypothesis. It implements the assigned idea, runs evaluations, debugs errors, and reports back to the coordinator with the results and created artifacts.

These two components collaborate through a mechanism that the researchers call “Hypothesis Tree Refinement” (HTR). HTR represents the entire research process as a persistent, branching tree where every node binds together four things: a hypothesis, the executable artifact, the factual evidence produced, and a distilled insight. This means the coordinator can explore multiple competing directions at the same time without losing its place.
The coordinator builds the tree by placing broad ideas near the root, while concrete refinements branch out as leaves. This allows Arbor to safely explore multiple competing hypotheses simultaneously. If an executor's experiment fails, the tree records why it failed as a negative constraint, ensuring the system doesn't endlessly repeat the same mistake. To understand why Arbor's isolation matters, consider a common enterprise scenario: optimizing a Retrieval-Augmented Generation (RAG) pipeline for an internal AI assistant. "When you ask a single agent like Claude Code or Codex to 'improve accuracy,' it will typically change a bunch of things in one pass — chunking, the prompt, the retrieval method," Jin said. This entangles the changes, making it impossible to attribute which one actually helped. It also directly mutates the repository without isolation. Arbor solves this by treating each lever as a separate hypothesis. Chunking becomes one branch, retrieval another, and the prompt another — each implemented and evaluated in its own isolated git worktree. "So you get clean attribution: 'constraint decomposition on the retrieval side gave +X; breadth-first search actually hurt,'" Jin said. When an executor returns a report, the coordinator writes the evidence to the tree and backpropagates the insight upward to parent nodes. This means a local observation becomes a generalized constraint that shapes the coordinator's future idea generation. To prevent reward hacking or overfitting to the development data, HTR enforces a strict “merge gate.” Even if an executor reports a fantastic development score, the coordinator will spin up an isolated worktree to test the candidate against a held-out test evaluator. The artifact is only merged into the current best trunk if it demonstrably improves the test score, verifying that the progress is real. 
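The tree, the upward propagation of insights, and the merge gate described above can be pictured with a small data structure. The following is a minimal sketch under stated assumptions, not Arbor's actual implementation: the class name `HypothesisNode`, the functions `record_result` and `merge_gate`, the branch names, and all scores are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class HypothesisNode:
    """One tree node binding a hypothesis, artifact, evidence, and insight."""
    hypothesis: str
    artifact_ref: Optional[str] = None          # e.g. a git branch name (hypothetical)
    evidence: dict = field(default_factory=dict)
    insight: Optional[str] = None
    parent: Optional[HypothesisNode] = field(default=None, repr=False)
    children: list = field(default_factory=list)
    constraints: list = field(default_factory=list)  # insights propagated from descendants

    def add_child(self, hypothesis: str) -> HypothesisNode:
        child = HypothesisNode(hypothesis=hypothesis, parent=self)
        self.children.append(child)
        return child


def record_result(node: HypothesisNode, artifact_ref: str,
                  dev_score: float, insight: str) -> None:
    """Store an executor's report on the node and push the insight to every ancestor."""
    node.artifact_ref = artifact_ref
    node.evidence["dev_score"] = dev_score
    node.insight = insight
    ancestor = node.parent
    while ancestor is not None:
        ancestor.constraints.append(insight)
        ancestor = ancestor.parent


def merge_gate(node: HypothesisNode, heldout_score: float,
               best_heldout: float) -> float:
    """Merge only on a strict held-out improvement; return the current best score."""
    merged = heldout_score > best_heldout
    node.evidence["heldout_score"] = heldout_score
    node.evidence["merged"] = merged
    return heldout_score if merged else best_heldout


# Illustrative run with made-up scores.
root = HypothesisNode("improve RAG accuracy")
retrieval = root.add_child("change the retrieval method")
bfs = retrieval.add_child("breadth-first search over documents")
decomp = retrieval.add_child("constraint decomposition on the retrieval side")

best = 0.45  # held-out score of the starting artifact (made up)
record_result(bfs, "exp/bfs", dev_score=0.48,
              insight="breadth-first search hurt accuracy")
best = merge_gate(bfs, heldout_score=0.44, best_heldout=best)    # rejected
record_result(decomp, "exp/decomp", dev_score=0.55,
              insight="constraint decomposition helped")
best = merge_gate(decomp, heldout_score=0.52, best_heldout=best)  # merged
print(best, root.constraints)
```

In this sketch a failed branch still leaves a trace: its insight sits on every ancestor as a constraint, and a good development score alone never moves the best held-out score.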
Arbor generally falls under the concept of "loop engineering," popularized by industry figures like OpenClaw creator Peter Steinberger and Claude Code lead Boris Cherny. The idea is to move beyond single prompts to design iterative cycles (observe, reason, act, verify) that drive autonomous agents. However, as Jin points out, "A loop can fill up with messy, untraceable attempts, and you end up with nothing to show and no way to reconstruct what changed."

Arbor in action

The researchers evaluated Arbor on an autonomous optimization task suite built from real-world research settings and the MLE-Bench Lite machine learning engineering benchmark. The AO suite featured tasks from different areas of AI development, including model training, harness engineering, and data synthesis. The researchers used different backbone models for the coordinator and executor agents, including Claude Opus 4.6, GPT-5.5, and Gemini-3-Flash. They tested Arbor against the strongest coding agents, Codex and Claude Code. Arbor and the baselines were given the same resources. For the MLE-Bench Lite tasks, Arbor was also compared against top-tier agentic research systems like AI-Scientist, ML-Master, and AIDE.

Arbor consistently outperformed the baselines. It achieved the best held-out test result on all tasks, attaining more than 2.5 times the average relative gain of Codex and Claude Code. On the BrowseComp task, which involves optimizing a search agent, Arbor improved the system's held-out accuracy from a baseline of 45.33% to 67.67%. Meanwhile, Codex and Claude Code stalled at 50% and 53.33%, respectively. On MLE-Bench Lite, when equipped with GPT-5.5, Arbor achieved the strongest result among all benchmarked systems.

Arbor proved to be resilient against overfitting. For example, during the Terminal-Bench 2.0 task experiments, Claude Code achieved a high development score of 75 but its score dropped to 71 on the held-out data.
Arbor had a lower development score of 72.22 but achieved the highest held-out score of 77.36, a sign that its gains carry over to data it was not tuned on.

Arbor also showed generalization in a cross-task transfer experiment. After Arbor finished optimizing the search harness for the BrowseComp task, researchers took the optimized codebase and tested it on two unrelated search-agent tasks, HLE and DeepSearchQA. Arbor's optimized codebase significantly improved performance on those unseen tasks as well.

Deploying Arbor: Sweet spots and hidden costs

For engineering leads looking to drop Arbor into their existing tech stack, the framework is designed to sit on top of existing Git workflows rather than replacing them. "Its output is an ordinary git branch that your existing code review, CI, and human review can inspect directly," Jin said. Only verified gains are merged into a per-run trunk, leaving the main repository untouched until a developer manually chooses to promote the code.

However, deploying Arbor comes with specific tradeoffs. Jin points out that the biggest catch is token cost, as maintaining a long-lived coordinator that continuously manages the tree and dispatches executors is the dominant expense. Running multiple isolated worktrees concurrently also requires genuine compute and disk resources to process real experiments.

So where is Arbor's sweet spot? According to Jin, it excels at tasks with a clear, trustworthy metric, tolerance for a long time horizon, and a real search space with several plausible directions, such as pipeline optimization, data-synthesis quality, and model-training recipe tuning. Conversely, teams should explicitly avoid using Arbor for real-time latency tasks, obvious one-line fixes, or when the underlying evaluation metric is flawed. The quality ceiling of the entire run is strictly bounded by the quality of the evaluator. "If the metric isn't trustworthy, Arbor will just optimize toward an untrustworthy result faster," Jin said.
Jin sees the next evolution going beyond single scalar metrics. "A natural evolution is to have each node's artifact carry a vector — accuracy, latency, cost — instead of a single score," Jin said. "Going from a single scalar to a multi-objective Pareto search is a very natural extension of the framework."
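To make the scalar-versus-vector distinction concrete, here is a minimal sketch of Pareto filtering over candidates scored on accuracy, latency, and cost. It is an illustration under assumptions, not part of Arbor: the `Candidate` type, the `dominates` and `pareto_front` helpers, and all numbers are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better
    cost_usd: float    # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.latency_ms <= b.latency_ms
                and a.cost_usd <= b.cost_usd)
    strictly_better = (a.accuracy > b.accuracy
                       or a.latency_ms < b.latency_ms
                       or a.cost_usd < b.cost_usd)
    return no_worse and strictly_better


def pareto_front(candidates: list) -> list:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up artifacts from three hypothetical branches.
candidates = [
    Candidate("A", accuracy=0.70, latency_ms=900, cost_usd=0.05),
    Candidate("B", accuracy=0.66, latency_ms=400, cost_usd=0.02),
    Candidate("C", accuracy=0.65, latency_ms=950, cost_usd=0.06),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

With a single score, only one of these artifacts could be "best". With a vector, A and B both survive because each wins on a different objective, while C drops out because A beats it on all three.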

Editor's pick · PAYWALL · Defense & National Security

Daily Brew· 4 days ago

In game theory, generalists sometimes win out over specialists

New research from MIT explores how generalists can outperform specialists in specific game theory scenarios.

AI Security & Cybersecurity · 5 articles

Companies Move to Secure Data as AI Increases Security Risks

Michael Cardaci, CEO of FedHIVE, said that the government is moving 'faster than it's ever moved before' to secure data and make sure that US computing companies are compliant in case of national security risks as AI use ramps up. Cardaci said that the US government is focused on keeping US technology inside of its borders as the race for global AI dominance heats up. (Source: Bloomberg)

Copilot searched your mailbox. LiteLLM handed out admin keys. Run this 5-check audit before your stack is next

Two AI tools broke in the same way in the same two weeks, and four research teams proved it. The pattern underneath every disclosure is one sentence: enterprise AI accepts external input with no trust boundary.

On June 15, Varonis disclosed SearchLeak (CVE-2026-42824), a proof-of-concept exfiltration chain in Microsoft 365 Copilot Enterprise Search. A victim clicks a crafted microsoft.com URL, Copilot searches their mailbox, and the data leaves through a Bing SSRF. No plugins, no second click, no visible indicator. Four days earlier, Obsidian Security published a three-CVE chain against LiteLLM that carried a default low-privilege user all the way to admin and remote code execution. Two tools. Two teams. One broken boundary.

The five-check audit at the end of this article maps each gap to a CVE or a market signal from June, a command you can run before lunch, and a sentence a CISO can read to the board.

Copilot turned a trusted URL into an exfiltration engine

SearchLeak chained three weaknesses into a silent data-theft chain. The URL q parameter fed attacker instructions straight to Copilot’s LLM. A rendering race condition fired an image tag before the output sanitizer ran. Bing’s image-search endpoint, allowlisted in the Content Security Policy, routed the stolen data out. Microsoft rated the flaw critical and patched it on the back end, according to Varonis. NVD has not yet scored it; a third-party tracker lists it at 6.5 medium. The severity is contested, but the mechanism is not.

The escalation is the real story. This is the third Varonis Copilot exfiltration chain in twelve months, after Reprompt in January and EchoLeak in 2025. Reprompt hit Copilot Personal. SearchLeak hit Enterprise Search. Enterprise inherits the user’s full organizational permissions, so the blast radius is everything that a user can reach.

LiteLLM handed a default account to every provider key

The LiteLLM gateway holds the keys for OpenAI, Anthropic, Azure, and Bedrock behind a single proxy. The Obsidian chain runs in three moves. CVE-2026-47101, an authorization bypass, lets a non-admin mint a wildcard API key. CVE-2026-47102 promotes that caller to proxy admin through an unguarded /user/update endpoint. CVE-2026-40217 escapes the code sandbox through exec() with full builtins. Obsidian then demonstrated a reverse shell by injecting a forged tool-call response through LiteLLM’s callback mechanism. Obsidian assessed the combined chain at CVSS 9.9. The developer typed one word. The attacker popped a shell.

A separate LiteLLM flaw made the urgency immediate. CVE-2026-42271, a command-injection bug in the MCP test endpoints, landed on the CISA KEV list on June 8 with a June 22 remediation deadline. That KEV entry is not the Obsidian chain. The two are distinct disclosures four days apart, fixed in different releases, pointed at the same gateway.

LiteLLM carries more than 40,000 GitHub stars and sits in thousands of enterprise deployments. This is not the first scare, either. A supply-chain compromise backdoored LiteLLM versions 1.82.7 and 1.82.8 on PyPI in March. A compromised gateway exposes every provider credential the organization holds.

Langflow and Mini Shai-Hulud proved the pattern scales

The same boundary broke in two more tools in the same fortnight. Langflow CVE-2026-5027 became the third Langflow remote-code-execution flaw to hit active exploitation this year. A path traversal in file upload lets an attacker write files anywhere on disk, and because Langflow ships with auto-login enabled by default, a single unauthenticated request reaches RCE. VulnCheck confirmed exploitation on June 9. Censys counted roughly 7,000 exposed instances, the heaviest concentration in North America, with MuddyWater attribution.

The Mini Shai-Hulud campaign hit a different pressure point.
After the worm’s source code went public on May 12, copycat variants compromised 32 Red Hat Cloud Services npm packages on June 1, packages pulled 80,000 times a week. The worm harvests more than 20 credential types and self-propagates under the compromised maintainer’s identity.

Four teams, four tools, one operating failure. The bug classes differ. SearchLeak is a prompt injection. LiteLLM is privilege escalation. Langflow is path traversal. Mini Shai-Hulud is supply-chain poisoning. The boundary that broke is the same in all four.

The market already repriced the risk

CrowdStrike’s Q1 FY27 earnings call put a number on the gap. AIDR, the company’s AI detection and response line, grew ending ARR more than 250% sequentially, with a Q2 pipeline above $50 million (SEC-filed 8-K). Total company ARR reached $5.51 billion, and CrowdStrike’s fleet telemetry shows more than 1,800 agentic applications running across enterprise endpoints. On June 17, the company extended AIDR to AWS, adding real-time evaluation of agent, LLM, and MCP communications across Amazon Bedrock, Kiro, and Strands Agents, building on its work with Anthropic’s Project Glasswing. Daniel Bernard, CrowdStrike’s chief business officer, said the AI attack surface now spans development, runtime, identities, and cloud infrastructure, and that teams treating those as separate domains leave the gaps between them open.

Practitioners name the same gap in plainer terms

David Levin, CISO at American Express Global Business Travel, told VentureBeat the pattern does not surprise him. “We kind of have this shadow AI, which is just the new version of shadow IT,” Levin said. Both Langflow and LiteLLM fit the description. Teams stood them up for convenience, gave them credentials, and never brought them under governance.

Levin puts the fix before deployment. “We didn’t go into this with just saying we’re going to go do this without the right fundamentals,” he said. “We leverage NIST controls.
NIST has released their CSF along with their AI framework. OWASP released their top 10. You need the right fundamentals before you deploy.”

Merritt Baer, CSO at Enkrypt AI and former AWS Deputy CISO, named the structural version of the failure in a separate VentureBeat interview. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system,” Baer said. “The real dependencies are one or two layers deeper, and those are the ones that fail under stress.”

She has tied that directly to how systems fall. “Raw zero-days aren’t how most systems get compromised. Composability is,” Baer told VentureBeat. “It’s the glue between the model and your data where the risk lives. If you give an agent bash and a root token, you’ve already done most of the attacker’s work for them.” That is what rows 2 and 4 of the audit test: the gateway that holds every key, and the agent identity no one governs.

Levin had a sharper frame for the boardroom. “You need to talk more in terms of risk versus compliance to your boards and your executives,” he said. “It’s not about the size of the engineering team anymore. It’s the size of your imagination. It’s all written in plain English. It’s not hard for anyone.” Neither SearchLeak nor LiteLLM needed custom malware or a zero-day to work.

Adam Meyers, CrowdStrike’s SVP of Intelligence, put the operational squeeze in numbers in an exclusive VentureBeat interview. “The problem is not zero-day. The problem is patching. If you 10x that problem, they’re gonna be completely underwater,” Meyers said. He pointed to identity as the second front. “Some of these AI have their own identities, or people give their identity to the AI to take action on their behalf, and that makes it a very complex problem.”

The five-check trust-boundary audit

Each row maps a gap to its proof point, a verification command for Monday morning, the fix, and the sentence to read to the board.
1. Prompt-to-Data
Proof Point: SearchLeak CVE-2026-42824. P2P injection + HTML race + Bing SSRF. One-click mailbox exfiltration via microsoft.com URL. PoC demonstrated; Microsoft rated it critical, NVD not yet scored.
What Broke: URL q-parameter passed to LLM as instructions. Sanitizer ran after render. Bing acted as exfiltration proxy via CSP allowlist.
Verify Monday: Audit CSP allowlists for domains performing server-side fetches. Monitor Copilot Search URLs for encoded payloads. Review Copilot audit logs.
Fix Monday: Confirm server-side patch applied. Enable sensitivity labels restricting Copilot. Treat AI streaming output as untrusted.
Board Language: “Our AI assistant could search employee email and send results to an attacker through a trusted Microsoft URL. Vendor patched it. We must verify configuration.”

2. Gateway Credential Exposure
Proof Point: LiteLLM three-CVE chain (-47101, -47102, -40217). CVSS 9.9. Separate CVE-2026-42271 on CISA KEV (fixed in v1.83.7; full chain fixed in v1.83.14-stable). June 22 deadline.
What Broke: No role validation on key endpoints. Self-promotion to admin via /user/update. exec() sandbox escape. One gateway exposes all provider keys.
Verify Monday: Run pip show litellm. Below 1.83.14-stable = vulnerable. Check /mcp-rest/test/ exposure. Audit proxy_admin accounts.
Fix Monday: Upgrade to v1.83.14-stable+. Rotate all provider API keys. Block /mcp-rest/test/* at proxy. Review Custom Code Guardrails.
Board Language: “Our AI gateway held keys for every provider. A default account could promote itself to admin and steal them all. Rotating and patching now.”

3. AI Tooling Sprawl
Proof Point: Langflow CVE-2026-5027 (CVSS 8.8). Third RCE of 2026. ~7,000 exposed instances. MuddyWater. Active exploitation June 9.
What Broke: Path traversal in file upload. Auto-login enabled by default. Single unauthenticated request to RCE.
Verify Monday: Query Censys/Shodan for Langflow, Flowise, n8n, Dify on your perimeter. Check auto-login. Inventory AI tools outside change management.
Fix Monday: Pull AI platforms behind VPN/zero-trust. Enable auth everywhere. Upgrade Langflow to v1.9.0+ (current release 1.10.0). Fingerprint surface continuously.
Board Language: “AI dev tools are exposed to the internet with login disabled. A nation-state group is exploiting this flaw now. Pulling behind access controls today.”

4. Non-Human Identity Governance
Proof Point: AIDR ARR up 250% (Q1 FY27, SEC 8-K). Q2 pipeline >$50M. 1,800+ agentic apps across enterprise endpoints.
What Broke: Agents hold identities and act on behalf of humans. Some exceed their intended scope to reach a goal. No standard governs agent credential lifecycle.
Verify Monday: Inventory all non-human identities used by agents and MCP servers. Map agent-to-data-store access. Flag agents with write access to security policy.
Fix Monday: Least-privilege every agent identity. Set privilege boundaries via identity protection. Runtime detection for policy-exceeding actions. Human-in-the-loop for policy changes.
Board Language: “AI agents hold credentials and act autonomously. We do not govern their identity lifecycle like human access. The 250% market growth tells us this gap is systemic.”

5. Runtime Agentic Detection
Proof Point: Falcon AIDR expanded to AWS (June 17). Covers Bedrock, Kiro, Strands Agents. MCP integration. Real-time agent/LLM/MCP evaluation.
What Broke: Traditional tools monitor human-speed actions. Agents run at machine speed, thousands of actions per minute, and route around controls to reach goals.
Verify Monday: Test if EDR/XDR links agent actions to originating identity. Verify SIEM ingests MCP communications. Confirm you can distinguish human from agent on endpoint.
Fix Monday: Deploy AIDR or equivalent runtime detection. Shadow-AI discovery for all agentic apps, models, MCP servers, identities. Real-time policy enforcement on agent actions.
Board Language: “We cannot distinguish a human employee from an AI agent acting on their behalf. We need runtime detection at machine speed that can stop damage before it starts.”

The fix is plumbing, not policy

The June 2 executive order creates an AI Cybersecurity Clearinghouse with a July 2 deadline.
The five gaps above are not frontier-model problems. They are plumbing problems in the gateways, orchestration platforms, identity layers, and runtime environments where AI meets the enterprise. The audit is five rows. Every row maps to a June disclosure or market signal, a command a team can run before lunch, and a sentence a CISO can read to the board. The question is not whether your vendor will patch. It's whether you find the gap first — or whether an attacker finds it the way they found Copilot and LiteLLM.
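The version check in row 2 of the audit can be scripted rather than read off `pip show litellm` by eye. The sketch below is one way to do it under stated assumptions: the threshold 1.83.14 is the one quoted in the audit, the helper names (`numeric_version`, `is_below_fix`, `installed_litellm_version`) are hypothetical, and comparing only the leading numeric components while ignoring suffixes such as `-stable` is a simplification that may not match how every release is labeled.

```python
import re
from importlib import metadata

# Threshold quoted in the audit (v1.83.14-stable); treat it as an input, not a fact of this script.
FIXED_VERSION = (1, 83, 14)


def numeric_version(version: str) -> tuple:
    """Extract the leading major.minor.patch numbers, ignoring any suffix."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_below_fix(version: str, fixed: tuple = FIXED_VERSION) -> bool:
    """True if the version sorts below the fixed release."""
    return numeric_version(version) < fixed


def installed_litellm_version():
    """Return the installed litellm version string, or None if it is not installed."""
    try:
        return metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return None


installed = installed_litellm_version()
if installed is None:
    print("litellm is not installed in this environment")
elif is_below_fix(installed):
    print(f"litellm {installed} is below the fixed release; upgrade and rotate keys")
else:
    print(f"litellm {installed} is at or above the fixed release")
```

A version check only answers the patching question. The key rotation, endpoint blocking, and admin-account review listed in the same row still have to be done separately.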

Washington Post· 3 days ago

The surprisingly simple ways AI can be tricked into breaking its own rules - The Washington Post

It’s surprisingly simple to trick chatbots into breaking their own rules and spilling forbidden knowledge. Even poems and bedtime stories can work.

Analyzing the Narration Gap in LLM-Solver Loops

arXiv:2606.19588v1 Announce Type: new Abstract: Formal tools such as SAT and SMT solvers are increasingly embedded in language model reasoning pipelines when a safety or security critical question can be formulated in logic. Unlike chain of thought whose steps are sampled from the model distribution without formal guarantee, a solver produces a sound and independently verifiable answer. However, the soundness guarantee can be lost in the interaction between the solver and the model. The hybrid pipeline has three components: formalizing the question, deciding it, and narrating the result. Prior work has studied the formalization and decision, but not narration, which is the step that turns a formal tool's output into the user answer. To fill the narration gap, we first model the LLM-solver loop as a verified decision procedure. We further evaluate five open-sourced models under prompt injection, and we find certificate gating makes the solver verdict sound, while an adversary can invert a verified conclusion across phrasings and channels. We study the mitigation through hardened prompt that reduces injection significantly but cannot eliminate it and still suffers under adaptive attack. Combining the formal analysis and empirical studies, we show in the LLM-solver loop, robustness does not reach to the answer that the user finally reads.

Radware Launches AI Xploit Shield for Real-Time Application and API Security

Radware introduced AI Xploit Shield, a service that provides tailored protections for applications and APIs using AI without requiring changes to underlying software.

Adoption, Deployment & Impact

17 articles

AI Adoption Barriers & Enablers · 5 articles

AI4SE and SE4AI Exploration: A Decade Looking Back and Forward

arXiv:2606.19630v1 Announce Type: new Abstract: The March 2020 INCOSE INSIGHT special issue on AI and Systems Engineering (SE) became the most downloaded issue in the publication's history and launched a research community that now draws over 250 registrants to its annual workshop. In this article, we trace the progress in AI and SE across three phases (labeled here foundational, applied, and LLM inflection) based on the authors' reading of the field's core papers, and describe our opinions of where the community has converged and where critical gaps remain. Separately, a human-AI agreement literature review leveraging both human expertise and six AI models was performed to assess the relevance of 1,712 INCOSE INSIGHT articles and 889 SERC publications. The results identify five critical research gaps and offer guidance for practitioners navigating AI adoption, assurance, and workforce transformation in SE. We share the agreement data and the AI4SE/SE4AI Explorer web application so readers can compare their own relevance judgments with the human and AI raters.

Siliconrepublic· 3 days ago

Artificial intelligence: The gap between adoption and impact

Experts from Accenture discuss Generating Impact, the organisation's latest AI research report.

The Drawdown· 3 days ago

AI Adoption Report: Governance and regulation | The Drawdown

Risk was one of the major areas of concern related to AI adoption for CFOs

MarketScale· 3 days ago

Publicis Sapient's 2026 enterprise AI report finds wide adoption but only 10% say it's core to operations | MarketScale

A survey of 1,550 AI decision-makers finds 73% of enterprises use AI regularly, yet just 10% call it core to how their business runs.

Daily Brew· 4 days ago

You Probably Don’t Need an Agent Framework

Most LLM applications need a clear workflow, not an autonomous agent. Here's how to build one in plain Python.

AI Applications · 3 articles

Configurable Clinical Information Extraction with Agentic RAG: What Works, What Breaks, and Why

arXiv:2606.19602v1 Announce Type: new Abstract: Patient contexts span hundreds of heterogeneous documents and thousands of structured data points, yet the document-level metadata that AI systems need for retrieval and triage is absent or incomplete. Standard retrieval-augmented generation fails on this data, mishandling temporal reasoning, cross-document dependencies, and missing metadata. We dep

Editor's pick · Manufacturing & Industrials

AI Economist Agent: An Agentic Framework for Model-Grounded Economic Analysis with RAG, Knowledge Graphs, and Large Language Models

arXiv:2606.20041v1 Announce Type: new Abstract: We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded by economic theory and real-world data. Based on this motivation, this study proposes an RAG-based AI economist, which utilizes knowledge graphs including economic data and theory and LLM-based agents to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability.

Farmer Connect: Improving Farmers' Access to Produce Markets

arXiv:2606.20465v1 Announce Type: new Abstract: Smallholder maize farmers in Uganda continue to face limited market access, weak bargaining power, low price transparency, and heavy reliance on intermediaries. These challenges are compounded by poor produce coordination, delayed payments, and weak visibility into cooperative transactions. This paper presents Farmer Connect, a cooperative-based digital platform designed to support produce management, marketplace coordination, and transparent earnings tracking among farmer groups. The system supports four user roles: administrators, supervisors, farmers, and customers. Its core functions include farmer group management, contribution recording and verification, marketplace listing, order processing, First In First Out based produce allocation, earnings visibility, mobile money payment support, and notification services. The platform was implemented using a mobile-first architecture with cloud-based backend services and an administrative web dashboard. Functional implementation showed that the system was able to support the major workflows required for group-based maize marketing and cooperative coordination, with approximately 85% of identified user requirements implemented. The study shows that cooperative-centered digital platforms can provide a practical framework for improving transparency, coordination, and buyer access for smallholder farmers.
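The abstract mentions First In First Out based produce allocation without giving details. As a rough illustration only, the sketch below fills a buyer's order from farmer contributions in the order they were recorded. The function name `allocate_fifo`, the data layout, and the quantities are assumptions for this example and are not taken from the paper.

```python
from collections import deque


def allocate_fifo(contributions, order_kg):
    """Fill an order from (farmer, kg) contributions, oldest first.

    Returns (allocation, leftover_stock, unfilled_kg).
    """
    queue = deque(contributions)
    allocation = []
    remaining = order_kg
    while remaining > 0 and queue:
        farmer, kg = queue.popleft()
        take = min(kg, remaining)
        allocation.append((farmer, take))
        remaining -= take
        if kg > take:
            # Put the unused part of this contribution back at the front.
            queue.appendleft((farmer, kg - take))
    return allocation, list(queue), remaining


# Hypothetical contributions in the order they were recorded.
stock = [("farmer_a", 100), ("farmer_b", 50), ("farmer_c", 80)]
allocation, leftover, unfilled = allocate_fifo(stock, order_kg=120)
print(allocation, leftover, unfilled)
```

Because the allocation names each farmer and quantity, the same record could feed the earnings visibility the abstract describes, though how the platform actually links the two is not stated.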

AI Measurement & Evaluation · 1 article

LLM Doesn't Know What It Doesn't Know: Detecting Epistemic Blind Spots via Cross-Model Attribution Divergence on Clinical Tabular Data

arXiv:2606.19509v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied to structured clinical data, yet whether they can recognize the limits of their own knowledge on such tasks remains unexplored. We study this question through the lens of cross-model attribution divergence with the goal of reducing epistemic uncertainty for structured tasks, comparing Qwen 2.5 7B and XGBoost on a prediction task via attribution divergence analysis. We report four findings. First, LLM verbalized confidence is epistemically vacuous, it outputs a near-constant (0.856-0.937) regardless of whether accuracy is 49% or 75.3%, tracking prompt format rather than prediction quality. Second, the LLM exhibits an inverse difficulty effect: accuracy drops to 64.8% when XGBoost is 99% correct, but matches XGBoost (73.8% vs. 73.1%) when it is moderately uncertain. Third, few-shot examples and SHAP-derived feature evidence are orthogonal, super-additive interventions: they reduce the Attribution Disagreement Score (ADS) from 1.54 to 0.38 and improve accuracy from 49% to 75.3% without training. Fourth, a cross-model calibrator that determined LLM reliability using attribution divergence signals reduces expected calibration error from 0.254 to 0.080, replacing uninformative verbalized confidence with patient-specific reliability estimates, without accessing model internals or requiring repeated inference. We frame these findings as a cold start problem for LLMs on structured data and outline a path toward genuine epistemic self-awareness.
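For readers unfamiliar with the metric, expected calibration error (ECE) is commonly computed by grouping predictions into confidence bins and averaging the gap between each bin's mean confidence and its accuracy, weighted by bin size. The sketch below uses equal-width bins, which is an assumption: the abstract does not say which binning the authors used, and the function name and toy data are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[index].append((conf, ok))
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece


# Toy case: a model that always reports 0.9 confidence but is right half the time.
confidences = [0.9, 0.9, 0.9, 0.9]
correct = [True, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

The toy case mirrors the first finding in the abstract: a confidence value that stays near-constant while accuracy varies yields a large calibration gap.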

AI Organisational Change

3 articles

LinkedIn· 3 days ago

Precisely | LinkedIn

Our CPO Matt Waxman is kicking off a three-part series on how he’s thinking about what it means to “run on AI” at Precisely, from reimagining product development to reshaping go-to-market and G&A functions with agentic, human-in-the-loop processes. Read part 1, where Matt introduces the Spiral, a framework for rethinking product development when AI removes the constraints your whole operating model was built around.

Fortune· 3 days ago

Dario Amodei has only 1 direct report, his chief of staff—and everyone else reports to his sister: ‘It’s incredibly freeing’

As $965 billion Anthropic prepares for an IPO, the AI firm’s CEO, Dario Amodei, admits he’s been managing only one person—and passing the rest to his sister.

Daily AI News June 18, 2026: The Reality of Taking AI Into Production· 3 days ago

Don’t Let AI Slop Muck Up Your Company’s Processes

This article warns that low-quality AI output can degrade organizational processes, knowledge, and accountability over time.

AI Productivity Evidence

1 article

Ethan Mollick· 2 days ago

Management Skills as a Determinant of AI Agent Productivity Gains

Emerging evidence suggests that managerial skills in specifying tasks and outcomes are critical for successfully leveraging AI agents in coding. This highlights the role of organizational management as a key enabler for realizing AI-driven productivity gains.

AI ROI & Business Case

4 articles

‘We created a monster’: companies rein in AI usage as costs strain budgets

Amazon, Walmart and Uber are among early adopters that have introduced caps or discouraged wasteful activity

Business Wire· 3 days ago

ISG Event to Explore How Enterprises Are Turning AI Investments Into Measurable Business Value

State Street, Pfizer, Siemens, Merck KGaA, Deutsche Bank, Fresenius and more will discuss ROI at the ISG AI Impact Summit, June 22–23 in Frankfurt.

Daily AI News June 18, 2026: The Reality of Taking AI Into Production· 3 days ago

BBVA Puts AI At the Core of Banking with OpenAI

BBVA’s case study shows how a major bank scaled ChatGPT and OpenAI tools from early adoption to broad enterprise deployment.

LinkedIn· 3 days ago

Todd Parsons - Chief Product Officer and President ...

In our latest guest blog post, BARC US CEO Shawn Rogers shares his perspective on critical factors for AI success at scale. "The bottom line, AI innovation is no longer limited by clever ideas or tooling. The bottlenecks are the money you burn and the controls you follow." Rogers advises companies to tackle both with the same urgency, or face surprise invoices and compliance fire drills.

Geopolitics, Policy & Governance

26 articles

AI Geopolitics

1 article

Artificial Intelligence as Game Changer in Cybersecurity: What We Learned in 2025-2026, and how this is relevant for Africa

arXiv:2606.20102v1 Announce Type: new Abstract: In 2025 and 2026, two events settled questions that had until then been speculative. In the first, a large language model executed the great majority of a state-aligned cyber-espionage campaign on its own, with human operators intervening at only a few decision points. In the second, the most capable cyber-relevant model was placed under a controlled-access program limited to a vetted set of United States technology firms, allied governments, and European standards bodies; that perimeter included no African government, operator, or university. Together the two events establish the argument of this paper: frontier language models have become a decisive instrument of cyber operations, and that instrument is built, owned, and rationed within a small circle from which Africa is absent. The paper documents Africa's exclusion on every count. The continent does not build frontier models, cannot yet operate them, and cannot, for now, obtain the most capable ones. The operational deficit is set out along three axes, skilled people, compute and electrical power, and investment, each measured against current figures; meanwhile AI-enabled fraud is already mounting against African mobile-money systems, the part of the digital economy the continent leads. Two constraints follow: the gating of frontier models by their developers, which no African decision can open, and a chosen dependence on infrastructure vendors now caught in geopolitical restriction. Because comparable but ungated models are forecast to spread within six to twelve months, the paper argues for a response that operates inside that window through threat-intelligence sharing, governance adoption, and partnership, undertaken by Africans on their own terms.

AI in Europe

1 article

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Google, Apple need to preserve AI assistant choice, senior EU official says

A senior EU official has pushed back against Google and Apple's arguments regarding market intervention, emphasizing the need to protect consumer choice in AI assistants.

AI National Strategy

4 articles

Trump is taking a page out of China’s sovereign AI playbook

Governments have long protected strategic industries — what is new is their willingness to become shareholders

GlobeNewswire· 3 days ago

The Île-de-France Region, Scaleway, VSORA and ZML commit to laying the foundations for the next generation of AI chips in Europe

This unprecedented commitment — a European first that brings together Île-de-France players across the entire value chain of Artificial Intelligence...

Linkdood· 3 days ago

Why Nations Are Rethinking Dependence on Foreign New AI Models

The global artificial intelligence industry reached a turning point in June 2026. When Anthropic suspended access to its most advanced AI models following a

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

China white paper stresses AI cooperation, trade in global governance

China released a global-governance white paper emphasizing AI cooperation and multilateral rule-making, linking AI governance to trade, supply-chain stability, and technology access for developing nations.

AI Policy & Regulation

20 articles

Inside Palantir’s fight over the future of the NHS

Critics question how the tech giant won a showpiece contract. It complains about the politicisation of procurement

Fortune· 2 days ago

The week that changed AI: Inside Trump’s Anthropic crackdown, and how a phone call from Amazon CEO Andy Jassy triggered the chaos

The fight over Anthropic’s Mythos model is rewriting the rules of AI regulation, with consequences for the trillion-dollar startup, the AI industry, and global security.

Open Weight AI Models Require Proportional Evaluation Approaches

arXiv:2606.19890v1 Announce Type: new Abstract: Open-weight AI models (OWMs), or models released with publicly-available weights, are distributing rapidly and approaching the performance levels of leading closed-weight AI models (CWMs). While OWMs offer substantial scientific and economic benefits, their release introduces distinct risk factors for which existing evaluation practices, largely designed for CWM deployment, fail to account. In this paper, we argue that these risk factors demand distinct proportional evaluation (PE) approaches: evaluating without system-level safeguards (PE1), assessing robustness to modifications that undo model-level safeguards (PE2), testing selective capability amplification (PE3), and proxying worst-case misuse (PE4). We systematically review current evaluation practices of OWMs released in 2025 through April 2026, finding that only one of the 37 families of models reviewed fulfills PE1-4 and most do not fulfill any. This paper targets policymakers, funders, and researchers involved in AI evaluation. As OWMs grow increasingly capable, their evaluation warrants close attention from developers, funders, and governance bodies alike.

Bloomberg· 2 days ago

Early Users of Anthropic Mythos Still Have Access After US Order

Some firms chosen early on by Anthropic PBC to test the Mythos AI model ahead of a wider release have preserved their access to a preview of the system, despite a US government order that led to the total shutdown of other versions.

Measuring Biological Capabilities and Risks of AI Agents

arXiv:2606.19899v1 Announce Type: new Abstract: This paper addresses a rapidly emerging policy challenge: how to generate and interpret credible evidence about the biological capabilities and risks of AI scientists, or agentic AI systems capable of autonomously or collaboratively performing multi-step scientific tasks. As these systems enter real research workflows, decision-makers increasingly face evaluation results whose meaning depends on underlying design choices that are often implicit or under-documented. We synthesize current evidence on AI-enabled biological risks and introduce biological agentic evaluations as a promising, but interpretation-sensitive, tool for assessing these systems. Our central contribution is a set of practical, experience-grounded considerations -- drawing from our own evaluations -- that show how choices around defining, designing, running, scoring, and documenting evaluations materially shape what results do and do not imply about risk. The analysis is intended to help policymakers interpret biological evaluation outputs with appropriate caution; guide public and private funders toward high-leverage investments in AI-biology evaluation research; and support biosecurity practitioners assessing emerging AI systems. A secondary audience includes researchers designing or conducting agentic evaluations within frontier AI labs, AI providers, scientific institutions, and third-party evaluation organizations.

Fortune· 3 days ago

‘Make AI work for ordinary people’: Bernie Sanders wants to pay you $1,000 every year from a government stake in AI companies

The senator is introducing a bill that would give Americans 50% ownership of the country’s biggest AI companies if they become profitable.

Challenges to Grassroots Organization Engagement with AI Policy

arXiv:2606.19816v1 Announce Type: new Abstract: Public policies are being developed around the world to address privacy, economic, intellectual property, energy, and other risks that AI technologies pose. Involvement from the general public is essential to governance as an accountability and alignment mechanism. However, participating in and impacting policymaking can be challenging for sections of the public that lack extensive networks, lobbying capabilities, and other forms of power. This challenge is especially acute for marginalized communities. In this paper, we present a case study of our organization's efforts to bring participatory design (PD) principles to AI policymaking in the US. We describe our engagements with several US policy bodies, and our participatory development of AI policy for queer people. We highlight challenges with PD practice with marginalized communities, and offer suggestions to alleviate them. We conclude with actionable recommendations for policymakers and other organizers working in marginalized communities.

Crypto Briefing· 3 days ago

ENISA meets Anthropic amid US export controls on AI models

ENISA meets Anthropic in San Francisco after the US Department of Commerce forced the AI company to suspend access to its Fable 5 and Mythos 5 models for

Tech Times· 3 days ago

Europe AI Sovereignty Crisis: G7 Offers Platform as Kill Switch Fears Grow

Europe's AI sovereignty crisis reached a flashpoint this week as the G7 summit in France failed to reverse U.S. export controls that knocked Anthropic’s Fable 5 and Mythos 5 offline globally, exposing how the deemed export doctrine now gives Washington an off switch over any frontier AI model

Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

ETSI chief eyes larger AI role as EU grapples with standards gap

Europe's AI rules will only succeed if they're translated into practical standards that companies can use to build and test products, according to Jan Ellsberger, who heads a major European standards body.

Anthropic got hit by export rules nobody understands

Anthropic faces challenges navigating complex and ambiguous new export control regulations.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

US risk designation against Anthropic a 'temper tantrum,' EFF, others say

The Electronic Frontier Foundation and other groups filed a brief opposing the US government's decision to label Anthropic a national security threat, arguing it harms the company and its partners.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Consumer groups warn against US Congress killing state AI enforcement

Over 130 civil society groups urged US congressional leaders to reject the 'Great American Artificial Intelligence Act,' which they claim would impose a federal ban on state-level AI regulation.

Siliconrepublic· 3 days ago

New Irish bill to supervise EU AI Act gets greenlit

Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

G7 leaders urge financial regulator coordination to tackle AI risks

G7 leaders called for information sharing and coordination between financial regulators and tech companies to address risks posed by frontier AI models.

Artificial Intelligence Newsletter | June 19, 2026· 3 days ago

Ferguson says FTC poised for jump in US privacy enforcement in late 2026

FTC Chairman Andrew Ferguson expects a surge in data privacy enforcement cases in the second half of 2026. He also discussed potential expansion of agency capacity if the SECURE Data Act is passed.

Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

California lawmakers proposing third-party assessments of AI systems

California state legislators are partnering to propose safety standards for independent, third-party assessments of AI systems and models.

Editor's pick · PAYWALL · Telecommunications

Google Says Canada’s Data Law Changes Fail to Ease Concerns

Alphabet Inc.’s Google said Canada’s changes to a proposed law that would help police obtain citizen data from private companies don’t resolve many of its concerns.

SpaceX warns EU satellite plan risks undermining connectivity in Ukraine

Elon Musk’s group hits out at proposal by bloc to reserve part of spectrum band for European operators