AI Intelligence Brief

Thu 18 June 2026

Daily Brief — Curated and contextualised by Best Practice AI

100 Articles
Editor's pick · Summary

SpaceX Acquires Cursor, Meta Seeks Wall Street, and Investors Face Political Risk

TL;DR: SpaceX has acquired Anysphere, the startup behind AI coding agent Cursor, for $60 billion to enhance its enterprise AI tools. Meta is pursuing Wall Street financing for a $600 billion infrastructure push. Investors are now considering political risk as a significant factor in AI investments. Meanwhile, BE Semiconductor has raised its long-term revenue targets due to AI demand.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

8 of 100 articles
Lead story
Editor's pick · PAYWALL · Technology
FT · 3 days ago

Meta’s AI gamble: Dina Powell McCormick opens door to Wall Street

Former Goldman Sachs executive explores financing once alien to Silicon Valley for $600bn infrastructure push

Editor's pick · Financial Services
Arxiv · 3 days ago

The Illusion of Improvement: Reject Inference Strategies in Credit Scoring

arXiv:2606.18479v1 Announce Type: cross Abstract: Reject inference methods are widely used to mitigate survival bias in credit scoring, yet their effectiveness remains poorly understood. We systematically evaluate several such methods and uncover a structural failure mode: in a natural retraining cycle, models whose accuracy improves while recall collapses create an illusion of improvement that l

Editor's pick
Arxiv · 3 days ago

A Knowledge Theory of Capital: The Value of Natural and Artificial Intelligence

arXiv:2606.18288v1 Announce Type: new Abstract: This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when kn

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Anthropic Ban Forces Investor Rethink of Political Risk

Investors in the ever-hotter AI stock rally must suddenly consider a risk with the potential to be even more damaging than high valuations and big spending: Politics getting in the way.

Editor's pick · Professional Services
Arxiv · 3 days ago

CEO-Bench: Can Agents Play the Long Game?

arXiv:2606.18543v1 Announce Type: new Abstract: Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3)

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Blackstone’s AirTrunk Seeks $3 Billion Loan for Sydney Project

AirTrunk Pty., a data center operator backed by Blackstone Inc., is in talks with banks for a A$4.3 billion ($3 billion) loan to back a new project in Australia, people familiar with the matter said, as it continues its debt-fueled expansion across Asia Pacific.

Editor's pick
Arxiv · 3 days ago

Testing Centralized and Polycentric Computational Planning

arXiv:2606.19214v1 Announce Type: new Abstract: This paper presents a reproducible synthetic benchmark comparing a computational planner, an agent-based market, and a hybrid meta-market within a common simulated economy. The benchmark incorporates input-output production networks, heterogeneous firms, capacity constraints, endogenous prices, welfare metrics, structural shocks, adversarial stress

Editor's pick · PAYWALL · Technology
WSJ · 3 days ago

BE Semiconductor Raises Long-Term Revenue, Profitability Targets on AI Boost

BE Semiconductor Industries raised its long-term revenue and profitability targets, citing increased demand for AI-related products.

Economics & Markets

28 articles
AI Business Models · 2 articles
Editor's pick · Media & Entertainment
Arxiv · 3 days ago

The Market in the Model: Latent Diffusion as Neural Economy

arXiv:2606.19151v1 Announce Type: new Abstract: Valuable critique of generative image models within visual culture and the humanities has emphasized the role of datasets in shaping the images they produce. Yet, close studies of the ideological positions embedded into the mechanism of the models have been neglected, leaving them imagined as "black boxes." In a bid to expand, rather than replace, dataset critique, this paper examines the mechanisms of the latent diffusion model in terms of the problems they were brought in to solve on behalf of computer vision engineers, and the decisions each component was tasked with automating. I interpret that ensemble through the histories of its parts and the theory of vision the system inscribes into every generated image. Drawing on Impett and Offert's notion of neural exchange value, I offer this analysis to argue that the model operates as a neural economy: a contained symbolic system that abstracts social communication into commensurable vectors as it transfers the social sphere into parcels for sale. Tracing the training and generation pipelines component by component reveals what each operation displaces, and how it further entrenches the logics of platform and attention economies over social communication. The paper warns that any critique fixated exclusively on copyright and commodity defenses risks reaffirming the very fetishism the model produces, and argues instead for centering social exchange.

AI Investment & Valuations · 10 articles
Editor's pick · PAYWALL · Technology
WSJ · 3 days ago

BE Semiconductor Raises Long-Term Revenue, Profitability Targets on AI Boost

BE Semiconductor Industries raised its long-term revenue and profitability targets, citing increased demand for AI-related products.

Editor's pick · PAYWALL · Defense & National Security
Bloomberg · 3 days ago

AI Company Dream Triples Value to $3 Billion in Funding Round

Dream, an Israeli artificial intelligence company that provides AI and cybersecurity services to governments and critical infrastructure operators, has raised $260 million at a $3 billion valuation.

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Plurimi CIO Sees Value in AI Supply Chain

Plurimi Wealth Chief Investment Officer Patrick Armstrong says he sees value in the artificial intelligence supply chain, stocks such as Micron, Samsung and Nvidia. "We're underweight the Magnificent Seven — the hyperscalers — those are the companies that are spending a trillion dollars over the next 12 months on AI datacenters," Armstrong tells Bloomberg Television. "And we own the companies they're giving the money to." (Source: Bloomberg)

Editor's pick · PAYWALL · Technology
FT · 3 days ago

Meta’s AI gamble: Dina Powell McCormick opens door to Wall Street

Former Goldman Sachs executive explores financing once alien to Silicon Valley for $600bn infrastructure push

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Jeremy Grantham on How to Tell if a Bubble's About to Burst | Odd Lots

Jeremy Grantham, co-founder and long-term strategist of GMO, has a long history of calling bubbles. As he recounts in his new memoir, The Making of a Permabear: The Perils of Long-Term Investing in a Short-Term World, that includes spotting the dot-com bubble of the early 2000s, which some people see as analogous to the current excitement over AI. And when it comes to today's market, there are a lot of signs of frothiness you could point to. In this episode, we speak to Grantham about how he sees markets right now, including a watershed change for Big Tech stocks, the signs he watches out for to spot when a bubble might burst, and what really keeps him up at night. (Source: Bloomberg)

AI Market Competition · 8 articles
Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Star Google Researcher Joins OpenAI in Coup for ChatGPT Creator

One of Google’s most prominent researchers is leaving for rival OpenAI, dealing a setback to Alphabet Inc. in a multibillion-dollar race to build the world’s most powerful artificial intelligence models.

Editor's pick · Healthcare
Arxiv · 3 days ago

Strategic Feature Selection

arXiv:2606.18867v1 Announce Type: cross Abstract: When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

China AI Lab’s 170% Stock Surge Cements Winner-Loser Pair Trade

A pair trade is emerging in China’s artificial intelligence sector, with investors piling into the perceived winner and betting against its rival seen as losing ground in the race to build commercially viable AI models.

Editor's pick · Technology
Artificial Intelligence Newsletter | June 17, 2026 · 4 days ago

AI 'invisible agents' pose new competition risks, Australian minister says

Australian official Andrew Leigh identified four competition risks in agent-mediated markets, including hidden steering and personalized pricing, suggesting policy tools like audit trails.

Editor's pick · Media & Entertainment
Artificial Intelligence Newsletter | June 18, 2026 · 4 days ago

Anthropic, co-founders face new US copyright infringement suit from 100 authors

Around 100 authors have filed a lawsuit against Anthropic, alleging the company used pirated books from library websites to train its AI models.

Editor's pick · PAYWALL · Technology
FT · 4 days ago

Buying Cursor could be SpaceX’s Instagram moment

Elon Musk can learn from Mark Zuckerberg by refraining from too much meddling

Editor's pick · Media & Entertainment
Guardian · 4 days ago

Fears for Xbox as it puts its developers on the chopping block once again

After the billion-dollar company’s leaders sent staff a memo saying the brand had ‘over-extended’, game studios may be in the firing line. In March 2000, Bill Gates stood onstage at the Game Developers Conference in San Francisco and, to a packed crowd, officially announced the company’s long-anticipated video game console. “We want Xbox to be the platform of choice for the best and most creative game developers in the world,” he told attenders – and that was indeed the intention of the small, dedicated team who put together the blueprints of that first machine. The Xbox landscape seems very different 25 years later. Last week, mere days after a bullish summer showcase full of Gears of War revivals and promises of a renewed focus on Xbox’s gaming strengths, new CEO, Asha Sharma, and chief content officer, Matt Booty, wrote a memo to Xbox staff inviting them to brace for “hard truths”. “Excluding Activision Blizzard King, over the past five years, we have spent over $20bn on ongoing investments in our content, platform and hardware subsidy, but our annual revenue has declined nearly half a billion during that time. Going forward, this cannot continue,” it read.

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Korea Rejects Baemin, Coupang Settlement Bids in Antitrust Probe

South Korea’s antitrust watchdog has rejected settlement proposals from the country’s two largest food delivery platforms, Baedal Minjok and Coupang Eats, leaving both firms exposed to potentially significant fines over alleged unfair business practices.

AI Pricing & Cost Curves · 2 articles
AI Productivity · 3 articles
Editor's pick · Professional Services
Arxiv · 3 days ago

Searching for Synergy in Shared Workspace Human-AI Collaboration

arXiv:2606.18413v1 Announce Type: new Abstract: Automated AI agents are increasingly capable, yet many scientific and professional tasks require human judgment and contextual expertise. We study shared-workspace human-AI teams, where AI agents and human collaborators must coordinate responsibilities before submitting a final answer. Using the Collaborative Gym environment with DiscoveryBench tasks, we examine when adding simulated human collaborators improves performance and when process loss turns additional collaborators into coordination overhead. Across 1,482 sessions, adding relevant collaborators can lower performance when teams lack structure to coordinate their contributions. We then evaluate scaffolding that combines shared group memory with simulated human-in-the-loop (HITL) gates, where selected actions require approval from a designated simulated participant. This scaffolding yields higher mean performance, most clearly in three-person teams, with clearer responsibility signals and stronger routing of expertise to team actions. Overall, how human-AI teams coordinate and integrate expertise matters as much as the capability available to them.

Editor's pick · Manufacturing & Industrials
Daily Brew · 5 days ago

MIT’s Initiative for New Manufacturing builds momentum

MIT is advancing its manufacturing initiative, focusing on new technologies and industrial processes.

Editor's pick · Energy & Utilities
Arxiv · 3 days ago

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

arXiv:2606.18598v1 Announce Type: new Abstract: Decision making in lithium production is challenging, whether from an investor's perspective or a strategic production standpoint. Determining which mines to open and when to open them involves not only geological and price uncertainties, but also complexities around the choice of extraction method, from direct lithium extraction to hard rock mining. Prior work explored models of this problem and different methods to optimize mining decisions; these models did not account for uncertainty in pricing, uncertainty in demand, or different mining technologies to extract lithium. Incorporating different pricing models and extraction technology into these models enables more robust strategies for determining not only when and where to open a mine, but also which method of production to pursue. We frame the problem as a partially observable Markov decision process (POMDP) and solve using belief state planning methods to get optimal decision making. In our study, we show that POMDP solvers outperform human-inspired heuristics by dynamically adapting to shifting lithium price regimes (static, linear, exponential, and stochastic) through belief state planning and explicit uncertainty management. By optimally sequencing exploration, production, and technology choice, the framework achieves higher demand fulfillment and more balanced economic and environmental outcomes over the project's lifetime in all different pricing and deposit scenarios.

Labor, Society & Culture

14 articles
AI & Employment · 3 articles
Editor's pick · Professional Services
Guardian · 3 days ago

Gig workers are endlessly exploited. AI could make more of us share their fate

As companies integrate AI and hire fewer employees, a shift toward a ‘gig economy’ will commence. In 2024, the buy-now-pay-later company Klarna announced that it would cut hundreds of customer service roles and begin using an artificial intelligence chatbot instead. The move was expected to save the company millions. But a year later, after customers complained about the degraded quality of customer service, Klarna began to quietly recruit human customer service agents back. At first glance, the reversal appeared to be a victory for human workers in the age of AI. The reality was more complex. Instead of bringing on full-time customer service agents, who Klarna contracts through an outside agency, it instead brought on workers in what Klarna CEO Sebastian Siemiatkowski has described as “an Uber type of set-up”. Now, an AI chatbot continues to handle most of customers’ basic queries, while a growing number of gig workers handle the more advanced ones. “Just like somebody can go and drive an Uber for a while, they can actually jump on and work for Klarna’s customer service,” Siemiatkowski said on a podcast in February.

Editor's pick · Technology
Reuters · 3 days ago

Reuters | Breaking International News & Views

AI will lead to labour shortages, Bezos says in optimistic talk · Exclusive: Meta head of product for 'AI for work' transformation is leaving company · Tracking Fed moves · Warsh kicks off Fed chief era with sweeping review as rates remain unchanged

AI & Misinformation · 1 article
Editor's pick · Healthcare
Arxiv · 3 days ago

RELIANCE: Curating and Evaluating Reproductive Health Information on Social Media

arXiv:2606.18285v1 Announce Type: cross Abstract: Social media platforms like TikTok have become a key source of health information, with studies reporting inaccuracies in posts. As Large Language Model (LLM) providers increasingly integrate LLMs into digital platforms to fact-check content (e.g., Grok and Perplexity on X and WhatsApp, respectively) and are being used by people to fact-check information, deploying these systems in critical areas such as reproductive health without rigorous evaluation can cause serious harm. We introduce RELIANCE, an expert-annotated dataset of health information on TikTok surrounding pregnancy and postpartum queries, serving as both an analysis of the reproductive health information landscape and an evaluation of LLMs' capabilities in fact-checking this content. Our dataset comprises 409 annotated sentences from 336 videos across 56 clinician-reviewed queries, annotated by three expert clinicians in Obstetrics, Gynecology, and Internal Medicine. Our findings reveal that nearly 60% of the health information in the videos we sampled is accurate. Furthermore, LLM evaluations reveal a gap between evaluating specific claims and evaluating the entire content (15%). We believe that our methodology, dataset, and tool will support the machine learning community in improving LLMs for important domains with real-world data, extending to other platforms and languages, and helping the health community further understand the information landscape on social media. Our dataset and code are made available at https://realize-lab.github.io/RELIANCE/.

AI Ethics & Safety · 6 articles
Editor's pick
Exponentialview · 4 days ago

🔮 Is AI immune to groupthink?

Stress-testing AI councils in practice

Editor's pick · Pharma & Biotech
Arxiv · 3 days ago

SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety

arXiv:2606.18936v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce SciRisk-Bench, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling fine-grained diagnosis of where scientific models remain unsafe.

Editor's pick · PAYWALL · Technology
NYT · 3 days ago

New Super PAC, the Guardrails Alliance, Aims to Rally Tech Workers to Help Limit A.I.

The Guardrails Alliance, which has raised $5 million, is positioning itself as a populist effort that will take on the pro-A.I. interests trying to influence this year’s elections.

Editor's pick · Technology
Arxiv · 3 days ago

Digital Speech Acts Retain Control of Copyright with People, Not Platforms

arXiv:2606.19263v1 Announce Type: cross Abstract: Legal precedents protect computer code as copyrightable expression. They have enabled centralized digital platforms -- operating from corporate servers that hold all user data -- to construct private governance regimes through the interaction of copyright, contract, and technical architecture: people who create virtually all platform value must surrender effective copyright control through Terms of Service agreements as a condition of participation. In contrast, grassroots platforms consist of cryptographically-identified people operating their networked smartphones independently of any server or global resource; each person holds their own data on their own device, with no third party in possession or intermediation. Here, we define the notion of a digital speech act -- a deliberate volitional act by a person of cryptographically signing personal content with the person's private key, carried out on the person's own device -- through which the person simultaneously establishes attribution, accountability, and authorship over the signed content. We contend that (a) digital speech acts qualify for copyright protection under existing U.S. precedent: Burrow-Giles locates authorship in volitional creative choices despite mechanical or algorithmic processes, Feist supplies the minimal-creativity threshold, and persistent device storage satisfies the Copyright Act's fixation requirement; (b) the digital social contract underlying grassroots platforms preserves this copyright by design -- signed content cannot be unbundled from its signature, and the full provenance chain accumulates as content is forwarded -- so that ownership and possession coalesce in the person; and (c) copyright in digital speech acts is a prerequisite for digital sovereignty and democratic self-governance.

Editor's pick · PAYWALL · Technology
Bloomberg · 3 days ago

Circles Sold Phone Spy Tools to Repressive States, Report Says

A Bulgaria-based company sold controversial surveillance technology to governments in countries with records of repression, enabling authorities to track mobile phones and eavesdrop on private communications, according to documents obtained by Human Rights Watch.

Editor's pick · Defense & National Security
Daily Brew · 3 days ago

Elon Musk's Grok AI Sparks Debate on Ethics and Oversight in Military Use

Elon Musk's AI tool, Grok, has been identified as a component in US military operations, triggering debates on ethics and oversight in defense.

AI Skills & Education · 2 articles
Editor's pick · Education
Arxiv · 3 days ago

Confident yet Concerned: Inconsistencies in Computing Students' Attitudes on Cybersecurity

arXiv:2606.18541v1 Announce Type: cross Abstract: Today's young adults are most immersed in technology, leading in feelings of powerlessness in managing online privacy across many platforms, and particularly susceptible to phishing attacks. This raises questions about their general, wide-ranging attitudes towards and management of cybersecurity. How do young, tech-savvy adults approach cybersecurity? We seek a better understanding of their cybersecurity knowledge, attitudes and experiences, in particular in addressing deceptive online communications. We surveyed a group of 'lead users': computing university students (n = 236). By combining thematic analysis of open-ended responses with quantitative data, we provide insights into their experiences and perceptions. While students demonstrate reasonable cybersecurity awareness, their cybersecurity experiences vary, and inconsistencies exist around their practices, perceptions of responsibility, and support structures. Findings also reveal four key thematic tensions: 1) Computing students are knowledgeable yet have persistent incorrect beliefs, 2) They learn more about keeping safe from sources outside the classroom, 3) They have limited assistance and have fallen victim to cybercrime, and 4) Many are confident, yet others are concerned about their own safety and responsibility. Through cluster analysis of attitudes, we identify two groups, with one feeling less prepared, less confident, yet expressing a desire to learn more. Established measures of intentions and objective knowledge were correlated to preparedness. Self-efficacy correlated to confidence and predicted cluster membership.

Public Attitudes to AI · 2 articles

Technology & Infrastructure

30 articles
AI Agents & Automation · 5 articles
Editor's pick · Technology
Arxiv · 3 days ago

Skill-Guided Continuation Distillation for GUI Agents

arXiv:2606.18890v1 Announce Type: new Abstract: Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, i.e., states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills are extracted from both successful and failed rollouts, consisting of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30% range to over 50%, demonstrating its effectiveness and generality.

Editor's pick · Consumer & Retail
Arxiv · 3 days ago

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

arXiv:2606.18847v1 Announce Type: new Abstract: To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.

Editor's pick · Professional Services
Arxiv · 3 days ago

CEO-Bench: Can Agents Play the Long Game?

arXiv:2606.18543v1 Announce Type: new Abstract: Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3)

AI Models & Capabilities · 12 articles
Editor's pick · Healthcare
Daily Brew · 3 days ago

Midjourney Medical goes from generating ‘cat images’ to full-body ultrasound scans

Midjourney's AI technology is being applied to medical imaging, specifically for ultrasound scans.

Editor's pick · Defense & National Security
Arxiv · 3 days ago

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

arXiv:2606.18271v1 Announce Type: new Abstract: As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

Editor's pick · Technology
Ethan Mollick · 4 days ago

Qualitative Differences in Model Judgment and Creativity Under Constraint

A comparison of open-weights models reveals nuanced differences in creative judgment that standard benchmarks fail to capture. These qualitative variations impact the utility of models for complex, constrained tasks.

Editor's pick · Technology
Substack · 3 days ago

Are Small Language Models the New AI Default?

Their June 2025 paper, Small Language Models are the Future of Agentic AI (Belcak et al.), argued that the narrow, repetitive sub-tasks inside most agent pipelines don’t need a frontier model.

Editor's pick · Technology
Daily Brew · 3 days ago

GLM-5.2 is the new leading open weights model on Artificial Analysis

GLM-5.2 has emerged as the top-performing open weights model according to the latest Artificial Analysis intelligence index.

Editor's pick
Arxiv · 3 days ago

CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework

arXiv:2606.18385v1 Announce Type: new Abstract: Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1% accuracy and 56.6% CaVeScore on ScienceQA, and 55.2% accuracy and 35.7% CaVeScore on MMMU (30 subjects).

Editor's pick
Arxiv· 3 days ago

DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models

arXiv:2606.18557v1 Announce Type: new Abstract: A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.
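The rendering-robust figure quoted above is described as a worst case over four surface renderings. The abstract does not say whether that worst case is taken over aggregate accuracy or per instance, so the Python sketch below shows both readings. The function names and the toy results are hypothetical and are not taken from the DeFAb release.

```python
from typing import Dict, List


def per_rendering_accuracy(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Accuracy of one model under each surface rendering."""
    return {name: sum(flags) / len(flags) for name, flags in results.items()}


def worst_rendering_accuracy(results: Dict[str, List[bool]]) -> float:
    """Reading 1: the lowest aggregate accuracy across renderings."""
    return min(per_rendering_accuracy(results).values())


def all_renderings_accuracy(results: Dict[str, List[bool]]) -> float:
    """Reading 2: an instance counts only if it is solved under every rendering."""
    columns = list(results.values())
    n_instances = len(columns[0])
    solved = sum(all(col[i] for col in columns) for i in range(n_instances))
    return solved / n_instances


# Hypothetical per-instance correctness for four renderings of four instances.
toy_results = {
    "rendering_a": [True, True, False, True],
    "rendering_b": [True, False, False, True],
    "rendering_c": [True, True, True, True],
    "rendering_d": [True, True, False, False],
}

print(worst_rendering_accuracy(toy_results))  # 0.5
print(all_renderings_accuracy(toy_results))   # 0.25
```

The per-instance reading is the stricter of the two: it can never exceed the worst aggregate accuracy, because every instance it counts is also counted under each individual rendering.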

Editor's pick
Arxiv· 3 days ago

Generative-Model Predictive Planning for Navigation in Partially Observable Environments

arXiv:2606.18888v1 Announce Type: new Abstract: Navigation in partially observable environments presents a significant challenge for autonomous agents, requiring effective decision-making with limited sensory information in unknown environments. Belief-based methods, particularly those using neural networks to approximate the belief space, often fail to capture the inherent multimodality of belief spaces, especially in high-dimensional cases with perceptual aliasing. While generative models present a compelling alternative, they typically require substantial data or expert demonstrations and lack explicit mechanisms for long-term planning. In this paper, we introduce BeliefDiffusion, a novel framework that combines the benefits of both generation and planning. BeliefDiffusion leverages diffusion models to explicitly characterize multimodal belief distributions and utilizes Model Predictive Control (MPC) to simultaneously plan ahead. It consists of two steps: (1) Imagining plausible environment configurations based on observation history and (2) Planning efficient navigation strategies across the aggregated configurations. Through extensive experiments in synthetic map environments, we demonstrate that BeliefDiffusion significantly outperforms both model-free reinforcement learning baselines and other generative approaches in navigation success rate and path efficiency. Our results validate that explicitly incorporating multimodal belief representations into planning enables more robust navigation in partially observable settings.

Editor's pickManufacturing & Industrials
Daily AI News June 17, 2026: Is the Software Factory for Real?· 4 days ago

Qwen-Robot Suite: A Foundation Model Suite for Physical World Intelligence

Qwen-Robot Suite introduces an open foundation model stack for embodied AI, including navigation, world modeling, and manipulation models for robotics.

Editor's pickManufacturing & Industrials
Daily AI News June 17, 2026: Is the Software Factory for Real?· 4 days ago

Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models

NVIDIA's post explains how video world models can be adapted into action models for robotics and embodied AI systems.

Editor's pick
Arxiv· 3 days ago

What Must Generalist Agents Remember?

arXiv:2606.18746v1 Announce Type: new Abstract: This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.

Editor's pickTechnology
Daily Brew· 6 days ago

Running local models is good now

An exploration of the current state of running AI models locally on consumer hardware.

AI Research & Science3 articles
Editor's pickEnergy & Utilities
Arxiv· 3 days ago

Analysing drivers and interdependencies in European electricity markets using XAI

arXiv:2606.19118v1 Announce Type: cross Abstract: Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.

Editor's pick
Arxiv· 3 days ago

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

arXiv:2606.18874v1 Announce Type: new Abstract: AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.

AI Security & Cybersecurity5 articles

Adoption, Deployment & Impact

10 articles
AI Measurement & Evaluation3 articles
Editor's pick
Arxiv· 3 days ago

ForecastBench-Sim: A Simulated-World Forecasting Benchmark

arXiv:2606.18686v1 Announce Type: new Abstract: Forecasting benchmarks for general-purpose AI systems usually inherit the constraints of the real world: outcomes resolve slowly, tail events are rare, and counterfactual questions are difficult to score. We introduce ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, a turn-based strategy game modelled on the Civilization series. Forecasters receive a fixed world report (a structured snapshot of the current game state) and answer questions about hidden future states; the benchmark then continues the simulation and scores forecasts. Because the world is simulated, the same setup can generate continuous or binary forecasting questions at arbitrary time horizons, paired intervention worlds for conditional or causal questions, and resolved examples of rare or disruptive outcomes. We describe the benchmark pipeline, question families, scoring protocol, and release artifacts, and report validation slices from model evaluations and an anonymized human pilot. ForecastBench-Sim is intended to complement real-world forecasting benchmarks by providing controlled, immediately resolvable tasks for studying probabilistic reasoning under dynamic world states.
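The abstract does not name the scoring rule the benchmark uses. As an illustration of how binary forecasts could be scored once the simulation resolves them, the sketch below computes the Brier score, a standard proper scoring rule. Its use here is an assumption, not the benchmark's documented protocol, and the forecasts and outcomes are made up.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, and always answering 0.5
    scores exactly 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical probabilities for three binary questions and how they resolved.
forecasts = [0.9, 0.2, 0.5]
outcomes = [1, 0, 1]

print(round(brier_score(forecasts, outcomes), 3))  # 0.1
```

Because the simulated world resolves every question immediately, a score like this can be computed right after the rollout rather than waiting for real-world outcomes.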

Editor's pick
Ethan Mollick· 4 days ago

Developing Standardized Benchmarks for Generative AI Design and Coding Capabilities

A new benchmark evaluates AI models on their ability to generate complex, interactive 3D simulations. This highlights the gap between model capability and practical application in creative and technical workflows.

AI ROI & Business Case1 article
Editor's pickTransportation & Logistics
Arxiv· 3 days ago

ProfiLLM: Utility-Aligned Agentic User Profiling for Industrial Ride-Hailing Dispatch

arXiv:2606.18803v1 Announce Type: new Abstract: Bringing Large Language Models (LLMs) into industrial ride-hailing dispatch as semantic feature extractors over platform-scale behavioral logs is a compelling but under-explored data systems problem. Production matching pipelines remain dominated by structured numerical features, yet decisive behavioral signals (e.g., a driver's habitual aversion to certain regions) are inherently contextual and naturally expressible as LLM-generated user profiles. However, scaling such profiling to a live, millisecond-latency dispatcher faces three intertwined constraints rarely addressed together: on a platform with millions of daily orders, logs exceed any LLM's context window by orders of magnitude; most users are long-tail, with too few interactions for per-user profiling; and surface-fluent profiles do not necessarily improve downstream prediction utility. We present ProfiLLM, an agentic LLM data pipeline that operationalizes utility-aligned user profiling for production matching systems through two modules. (1) Tool-Augmented Global Knowledge Mining equips an LLM agent with 27 analytical tools to mine platform-scale data, producing reusable global knowledge, adaptive user clustering rules, and region-level supply-demand priors. (2) Utility-Aligned Profile Exploration generates multiple candidate profiles per cluster, evaluates them via a lightweight downstream utility proxy, iteratively refines the best candidates and constructs preference pairs for DPO fine-tuning. Deployed on DiDi's production dispatcher, ProfiLLM achieves up to +6.14% relative AUC improvement in outcome prediction, up to +4.35% GMV gain in dispatching simulation, and consistent improvements in a 14-day online A/B test including +0.47% GMV, +0.33% Completion Rate, and -0.82% Cancel-Before-Accept rate.

Geopolitics, Policy & Governance

18 articles
AI Geopolitics2 articles
Editor's pickTelecommunications
Arxiv· 3 days ago

Understanding the "Airport" Censorship Circumvention Ecosystem in China

arXiv:2606.18427v1 Announce Type: cross Abstract: In China, a burgeoning underground market sells citizens subscription-based censorship circumvention proxies known as "airports". We present the first systematic study of this ecosystem, combining user surveys, social media analysis, and active network measurements. We find that airports are by far the most popular off-the-shelf censorship circumvention tool in China, used by over half of our 1,667 survey respondents, who cite their ease of use, performance, and access to geo-restricted services like ChatGPT and Netflix. By scanning the Internet and scraping Telegram announcement channels, we identify 3,431 active airports built on a handful of open-source toolkits. We subscribe to 35 airports and characterize their performance, which often surpasses direct connections through the Great Firewall due to a distinctive multi-hop architecture. However, airports also pose new challenges and security risks: they accept payment through commercial services like Alipay, suffer frequent government takedowns, and are difficult for clients to configure optimally. Many airports also deploy their own distinct censorship policies. Airports are far more widely used than other circumvention tools from the academic literature, but introduce new forms of fragility and control, offering both lessons and opportunities for future circumvention research.

AI Policy & Regulation13 articles
Editor's pick
Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

California lawmakers proposing third-party assessments of AI systems

California state legislators are partnering to propose safety standards for independent, third-party assessments of AI systems and models.

Editor's pickConsumer & Retail
Reuters· 2 days ago

Reuters AI News | Latest Headlines and Developments | Reuters

Eurocommerce, the European retail association whose members include Amazon, H&M, Inditex, and Ikea, is asking EU tech chief Henna Virkkunen to exempt AI-generated advertisements from the bloc's new regulation requiring disclosure of AI use.

Editor's pickFinancial Services
Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

G7 leaders urge financial regulator coordination to tackle AI risks

G7 leaders called for information sharing and coordination between financial regulators and tech companies to address risks posed by frontier AI models.

Editor's pickGovernment & Public Sector
Arxiv· 3 days ago

"The New Era of Tech-Enabled Traceability": Tensions between the FDA's Data Governance Vision and the Lived Realities of Food Producers

arXiv:2606.18593v1 Announce Type: cross Abstract: The U.S. Food and Drug Administration (FDA)'s Food Traceability Rule requires agri-food supply chain stakeholders (stakeholders), including farmers, fishers, retail workers, and others, to maintain detailed tracking records beginning in January 2026. Through this Rule, the FDA envisions a "New Era of Tech-Enabled Traceability," in which standardized, harmonized tracking data serve as a foundational public health infrastructure, enabling more rapid identification and removal of potentially contaminated food and ultimately reducing the risk of foodborne illness. Despite this promising vision, we observe that the Rule reconfigures agri-food stakeholders into data laborers by mandating stringent data collection, formatting, and reporting requirements. In this paper, we examine the tensions and burdens that arise from such reconfiguration. Leveraging Data Feminism as an orientation to attend to how data-driven policy implementation disproportionately burdens smaller, under-resourced stakeholders who lack the infrastructural and financial capacity to comply, we analyze 1,198 public comments submitted to Regulations.gov in response to the proposed Rule. Our qualitative document analysis reveals three key tensions: (1) the individual labor, financial, and educational burdens stakeholders experience as they are reconfigured into data workers; (2) moments where data tracking becomes infeasible due to infrastructural limitations, cultural contexts, and situated production practices; and (3) instances where the Rule's intended flexibility instead introduces confusion and burden due to its ambiguity.

Editor's pickPAYWALLTechnology
NYT· 4 days ago

Anthropic Employees Accuse Trump Administration of Targeting Them

Workers at the artificial intelligence company have been puzzled and increasingly concerned by the administration’s move to limit their latest A.I. models.

Editor's pickPAYWALL
FT· 4 days ago

We have to manage the AI revolution

There must be a global agreement on how the technology is controlled

Editor's pickDefense & National Security
CEPA· 4 days ago

US AI Export Controls Cause Furor - CEPA

By banning foreigners from accessing Anthropic’s Fable 5, the US abandons its hands-off approach to artificial intelligence — and angers allies.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | June 18, 2026· 4 days ago

ETSI chief eyes larger AI role as EU grapples with standards gap

Europe's AI rules will only succeed if they're translated into practical standards that companies can use to build and test products, according to Jan Ellsberger, who heads a major European standards body.

Editor's pickTechnology
Daily Brew· 3 days ago

Anthropic got hit by export rules nobody understands

Anthropic faces challenges navigating complex and ambiguous new export control regulations.

Editor's pickGovernment & Public Sector
Theregister· 4 days ago

Estonia intends to recognize AI agents with digital IDs

I am not a number! I am a free agent (that just happens to have a number)

Editor's pick
Artificial Intelligence Newsletter | June 17, 2026· 4 days ago

EU lawmakers approve nudifier app ban in AI omnibus package

EU lawmakers have approved a ban on nudifier applications as part of an AI omnibus legislative package.

Editor's pickGovernment & Public Sector
Siliconrepublic· 3 days ago

New Irish bill to supervise EU AI Act gets greenlit

The AI Act, which entered into force in August 2024, attempts to tackle some of the risks emerging from the technology while letting the bloc benefit from its economic potential.

Editor's pickTechnology
Artificial Intelligence Newsletter | June 17, 2026· 5 days ago

Calif. child-safety bills advance as lawmakers target platform, chatbot design

A trio of children's online safety bills is moving forward in California, with lawmakers aiming to hold online platforms and AI chatbot operators accountable for harms to minors.

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.
