Thu 18 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
SpaceX Acquires Cursor, Meta Seeks Wall Street, and Investors Face Political Risk
TL;DR: SpaceX has acquired Anysphere, the startup behind AI coding agent Cursor, for $60 billion to enhance its enterprise AI tools. Meta is pursuing Wall Street financing for a $600 billion infrastructure push. Investors are now considering political risk as a significant factor in AI investments. Meanwhile, BE Semiconductor has raised its long-term revenue targets due to AI demand.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Meta’s AI gamble: Dina Powell McCormick opens door to Wall Street
Former Goldman Sachs executive explores financing once alien to Silicon Valley for $600bn infrastructure push
The Illusion of Improvement: Reject Inference Strategies in Credit Scoring
arXiv:2606.18479v1 Announce Type: cross Abstract: Reject inference methods are widely used to mitigate survival bias in credit scoring, yet their effectiveness remains poorly understood. We systematically evaluate several such methods and uncover a structural failure mode: in a natural retraining cycle, models whose accuracy improves while recall collapses create an illusion of improvement that…
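A toy calculation shows how the two metrics can move in opposite directions. The confusion-matrix counts below are invented for illustration (not taken from the paper), "default" is treated as the positive class, and the sketch ignores the sample-selection effects the paper actually studies: the retrained model simply flags far fewer applicants.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all applicants classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual defaulters that the model flags."""
    return tp / (tp + fn)


# Invented counts for 1,000 applicants, 100 of whom default ("default" = positive).
before = dict(tp=60, fp=90, fn=40, tn=810)  # earlier model flags 150 applicants
after = dict(tp=20, fp=10, fn=80, tn=890)   # retrained model flags only 30

for name, counts in [("before", before), ("after", after)]:
    print(name, accuracy(**counts), recall(counts["tp"], counts["fn"]))
```

Here accuracy rises from 0.87 to 0.91 while recall falls from 0.6 to 0.2, because the 900 non-defaulters dominate the accuracy figure.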
A Knowledge Theory of Capital: The Value of Natural and Artificial Intelligence
arXiv:2606.18288v1 Announce Type: new Abstract: This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when…
Anthropic Ban Forces Investor Rethink of Political Risk
Investors in the ever-hotter AI stock rally must suddenly consider a risk with the potential to be even more damaging than high valuations and big spending: Politics getting in the way.
CEO-Bench: Can Agents Play the Long Game?
arXiv:2606.18543v1 Announce Type: new Abstract: Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3) …
Blackstone’s AirTrunk Seeks $3 Billion Loan for Sydney Project
AirTrunk Pty., a data center operator backed by Blackstone Inc., is in talks with banks for a A$4.3 billion ($3 billion) loan to back a new project in Australia, people familiar with the matter said, as it continues its debt-fueled expansion across Asia Pacific.
Testing Centralized and Polycentric Computational Planning
arXiv:2606.19214v1 Announce Type: new Abstract: This paper presents a reproducible synthetic benchmark comparing a computational planner, an agent-based market, and a hybrid meta-market within a common simulated economy. The benchmark incorporates input-output production networks, heterogeneous firms, capacity constraints, endogenous prices, welfare metrics, structural shocks, adversarial stress…
BE Semiconductor Raises Long-Term Revenue, Profitability Targets on AI Boost
BE Semiconductor Industries raised its long-term revenue and profitability targets, citing increased demand for AI-related products.
Economics & Markets
BE Semiconductor Raises Long-Term Revenue, Profitability Targets on AI Boost
BE Semiconductor Industries raised its long-term revenue and profitability targets, citing increased demand for AI-related products.
AI Company Dream Triples Value to $3 Billion in Funding Round
Dream, an Israeli artificial intelligence company that provides AI and cybersecurity services to governments and critical infrastructure operators, has raised $260 million at a $3 billion valuation.
Plurimi CIO Sees Value in AI Supply Chain
Plurimi Wealth Chief Investment Officer Patrick Armstrong says he sees value in the artificial intelligence supply chain, including stocks such as Micron, Samsung and Nvidia. "We're underweight the Magnificent Seven — the hyperscalers — those are the companies that are spending a trillion dollars over the next 12 months on AI datacenters," Armstrong tells Bloomberg Television. "And we own the companies they're giving the money to." (Source: Bloomberg)
Meta’s AI gamble: Dina Powell McCormick opens door to Wall Street
Former Goldman Sachs executive explores financing once alien to Silicon Valley for $600bn infrastructure push
Jeremy Grantham on How to Tell if a Bubble's About to Burst | Odd Lots
Jeremy Grantham, co-founder and long-term strategist of GMO, has a long history of calling bubbles. As he recounts in his new memoir, The Making of a Permabear: The Perils of Long-Term Investing in a Short-Term World, that includes spotting the dot-com bubble of the early 2000s, which some people see as analogous to the current excitement over AI. And when it comes to today's market, there are a lot of signs of frothiness you could point to. In this episode, we speak to Grantham about how he sees markets right now, including a watershed change for Big Tech stocks, the signs he watches out for to spot when a bubble might burst, and what really keeps him up at night. (Source: Bloomberg)
Lenovo Raises $2 Billion From Seven-Year Convertible Bonds
Lenovo Group Ltd. fetched $2 billion from the sale of seven-year convertible bonds, two years after it raised a similar amount from Saudi Arabia’s sovereign wealth fund.
From scarcity to execution: China’s AI valuation reset (Bamboo Works)
Zhipu and MiniMax have lost more than 40% of their market value in just two weeks, as investors reassess the true worth of China's large language model developers.
HPE Expands Quantum Computing Partnerships Amid Insider Stock Sales and Analyst Optimism
HPE is advancing hybrid quantum computing by integrating its Cray-based HPC with diverse quantum technologies, partnering with firms like Intel and Quantinuum.
Anthropic Ban Forces Investor Rethink of Political Risk
Investors in the ever-hotter AI stock rally must suddenly consider a risk with the potential to be even more damaging than high valuations and big spending: Politics getting in the way.
China Boosts Startup IPOs in Quantum, AI, and Emerging Tech to Outpace U.S. Competition
China is ramping up support for IPOs in cutting-edge tech sectors like quantum and AI to boost innovation amidst growing competition with the U.S.
A Knowledge Theory of Capital: The Value of Natural and Artificial Intelligence
arXiv:2606.18288v1 Announce Type: new Abstract: This volume develops a knowledge theory of capital for economies in which productive capacity increasingly resides in software, data, models, routines, expertise, platforms, organizations, commons, and public epistemic infrastructure. Beginning from Adam Smith's theory of labour, stock, specialization, and market extent, it asks what changes when…
Testing Centralized and Polycentric Computational Planning
arXiv:2606.19214v1 Announce Type: new Abstract: This paper presents a reproducible synthetic benchmark comparing a computational planner, an agent-based market, and a hybrid meta-market within a common simulated economy. The benchmark incorporates input-output production networks, heterogeneous firms, capacity constraints, endogenous prices, welfare metrics, structural shocks, adversarial stress…
Star Google Researcher Joins OpenAI in Coup for ChatGPT Creator
One of Google’s most prominent researchers is leaving for rival OpenAI, dealing a setback to Alphabet Inc. in a multibillion-dollar race to build the world’s most powerful artificial intelligence models.
Strategic Feature Selection
arXiv:2606.18867v1 Announce Type: cross Abstract: When algorithmic predictors inform resource allocation in high-stakes domains such as healthcare, these predictors must account for strategic manipulation of input features. The typical solution is to redesign the predictor itself to explicitly account for strategic interactions. In practice, however, decision makers are often constrained to adjusting coarser levers within existing prediction pipelines. For example, healthcare organizations often select which features to exclude based on perceived manipulability, while using standard regularization procedures to shrink the coefficients of retained features. In this work, we initiate a formal study of strategic classification through feature selection and its interaction with ridge regularization. Our main finding is that excluding individual features based on their manipulability alone is generally suboptimal. We provide a fine-grained characterization of the performance of a feature subset under optimal regularization, yielding new insights for policy design. Motivated by this characterization, we develop a practical algorithm for jointly choosing the feature set and the level of ridge regularization. Through a real-world case study on a healthcare payments benchmark, we illustrate how our algorithm can guide the design of coarse policy levers in practice. Our results provide a principled, practical framework for mitigating the effects of strategic behavior in algorithmic decision-making systems.
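A joint search over feature subsets and ridge penalties can be sketched with NumPy. The manipulation model here is an assumption for illustration, not the paper's: at evaluation time, agents shift every manipulable feature the predictor keeps by a fixed budget in whichever direction raises their score, and each (subset, penalty) pair is scored by validation error after that shift.

```python
import itertools

import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def manipulated(X, w, manipulable, budget):
    """Toy strategic response (an assumption, not the paper's model): every
    agent shifts each manipulable feature by `budget` in the direction that
    raises its predicted score."""
    X_shift = X.copy()
    for j, is_manip in enumerate(manipulable):
        if is_manip:
            X_shift[:, j] += budget * np.sign(w[j])
    return X_shift


def select(X_tr, y_tr, X_val, y_val, manipulable, lams, budget=1.0):
    """Jointly pick a feature subset and ridge penalty by exhaustive search,
    scoring each pair by validation MSE after the toy manipulation."""
    d = X_tr.shape[1]
    best = (np.inf, None, None)
    for k in range(1, d + 1):
        for subset in itertools.combinations(range(d), k):
            cols = list(subset)
            manip = [manipulable[j] for j in cols]
            for lam in lams:
                w = fit_ridge(X_tr[:, cols], y_tr, lam)
                X_eval = manipulated(X_val[:, cols], w, manip, budget)
                mse = float(np.mean((X_eval @ w - y_val) ** 2))
                if mse < best[0]:
                    best = (mse, cols, lam)
    return best


# Usage on synthetic data; feature 1 is assumed to be the manipulable one.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, 0.5, 0.2]) + 0.1 * rng.normal(size=500)
mse, cols, lam = select(X[:400], y[:400], X[400:], y[400:],
                        manipulable=[False, True, False],
                        lams=[0.0, 1.0, 10.0, 100.0])
```

The exhaustive search visits 2^d - 1 subsets, so it only suits a handful of features; the paper's algorithm and its characterization under optimal regularization are not reproduced here.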
China AI Lab’s 170% Stock Surge Cements Winner-Loser Pair Trade
A pair trade is emerging in China’s artificial intelligence sector, with investors piling into the perceived winner and betting against its rival seen as losing ground in the race to build commercially viable AI models.
AI 'invisible agents' pose new competition risks, Australian minister says
Australian official Andrew Leigh identified four competition risks in agent-mediated markets, including hidden steering and personalized pricing, suggesting policy tools like audit trails.
Anthropic, co-founders face new US copyright infringement suit from 100 authors
Around 100 authors have filed a lawsuit against Anthropic, alleging the company used pirated books from library websites to train its AI models.
Buying Cursor could be SpaceX’s Instagram moment
Elon Musk can learn from Mark Zuckerberg by refraining from too much meddling
Fears for Xbox as it puts its developers on the chopping block once again
After the billion-dollar company’s leaders sent staff a memo saying the brand had ‘over-extended’, game studios may be in the firing line. In March 2000, Bill Gates stood onstage at the Game Developers Conference in San Francisco and, to a packed crowd, officially announced the company’s long-anticipated video game console. “We want Xbox to be the platform of choice for the best and most creative game developers in the world,” he told attenders – and that was indeed the intention of the small, dedicated team who put together the blueprints of that first machine. The Xbox landscape seems very different 25 years later. Last week, mere days after a bullish summer showcase full of Gears of War revivals and promises of a renewed focus on Xbox’s gaming strengths, new CEO, Asha Sharma, and chief content officer, Matt Booty, wrote a memo to Xbox staff inviting them to brace for “hard truths”. “Excluding Activision Blizzard King, over the past five years, we have spent over $20bn on ongoing investments in our content, platform and hardware subsidy, but our annual revenue has declined nearly half a billion during that time. Going forward, this cannot continue,” it read.
Korea Rejects Baemin, Coupang Settlement Bids in Antitrust Probe
South Korea’s antitrust watchdog has rejected settlement proposals from the country’s two largest food delivery platforms, Baedal Minjok and Coupang Eats, leaving both firms exposed to potentially significant fines over alleged unfair business practices.
Searching for Synergy in Shared Workspace Human-AI Collaboration
arXiv:2606.18413v1 Announce Type: new Abstract: Automated AI agents are increasingly capable, yet many scientific and professional tasks require human judgment and contextual expertise. We study shared-workspace human-AI teams, where AI agents and human collaborators must coordinate responsibilities before submitting a final answer. Using the Collaborative Gym environment with DiscoveryBench tasks, we examine when adding simulated human collaborators improves performance and when process loss turns additional collaborators into coordination overhead. Across 1,482 sessions, adding relevant collaborators can lower performance when teams lack structure to coordinate their contributions. We then evaluate scaffolding that combines shared group memory with simulated human-in-the-loop (HITL) gates, where selected actions require approval from a designated simulated participant. This scaffolding yields higher mean performance, most clearly in three-person teams, with clearer responsibility signals and stronger routing of expertise to team actions. Overall, how human-AI teams coordinate and integrate expertise matters as much as the capability available to them.
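The scaffolding described above (shared group memory plus approval gates on selected actions) can be sketched in a few lines. The class, action names and approval rule below are hypothetical illustrations, not the Collaborative Gym API: gated actions only execute when the designated participant approves, and every attempt is logged to a memory that all teammates can read.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedWorkspace:
    """Hypothetical shared workspace: a group memory every member can read,
    plus a human-in-the-loop gate on selected action types."""
    approver: Callable[[str, str], bool]  # designated participant's decision
    gated_actions: set = field(default_factory=lambda: {"submit_answer"})
    memory: list = field(default_factory=list)  # (actor, action, payload, status)

    def act(self, actor: str, action: str, payload: str) -> bool:
        """Execute an action, routing gated ones through the approver first.
        Every attempt is written to shared memory so teammates can see it."""
        if action in self.gated_actions and not self.approver(actor, payload):
            self.memory.append((actor, action, payload, "rejected"))
            return False
        self.memory.append((actor, action, payload, "done"))
        return True


# Usage: a toy approver (an assumption) that only accepts answers citing evidence.
ws = SharedWorkspace(approver=lambda actor, text: "evidence:" in text)
ws.act("analyst_agent", "load_data", "survey.csv")
ws.act("analyst_agent", "submit_answer", "X rises with Y")
ws.act("analyst_agent", "submit_answer", "X rises with Y; evidence: regression table")
```

Because rejections are logged rather than silently dropped, the shared memory also records who proposed what, which is one plausible source of the "responsibility signals" the abstract mentions.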
MIT’s Initiative for New Manufacturing builds momentum
MIT is advancing its manufacturing initiative, focusing on new technologies and industrial processes.
Labor, Society & Culture
Gig workers are endlessly exploited. AI could make more of us share their fate
As companies integrate AI and hire fewer employees, a shift toward a ‘gig economy’ will commence. In 2024, the buy-now-pay-later company Klarna announced that it would cut hundreds of customer service roles and begin using an artificial intelligence chatbot instead. The move was expected to save the company millions. But a year later, after customers complained about the degraded quality of customer service, Klarna began to quietly recruit human customer service agents back. At first glance, the reversal appeared to be a victory for human workers in the age of AI. The reality was more complex. Instead of bringing on full-time customer service agents, who Klarna contracts through an outside agency, it brought on workers in what Klarna CEO Sebastian Siemiatkowski has described as “an Uber type of set-up”. Now, an AI chatbot continues to handle most of customers’ basic queries, while a growing number of gig workers handle the more advanced ones. “Just like somebody can go and drive an Uber for a while, they can actually jump on and work for Klarna’s customer service,” Siemiatkowski said on a podcast in February.
Reuters | Breaking International News & Views
AI will lead to labour shortages, Bezos says in optimistic talk · Exclusive: Meta head of product for 'AI for work' transformation is leaving company · Warsh kicks off Fed chief era with sweeping review as rates remain unchanged
🔮 Is AI immune to groupthink?
Stress-testing AI councils in practice
SciRisk-Bench: A Risk-Dimension-Aware Benchmark for AI4Science Safety
arXiv:2606.18936v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly embedded in AI for Science (AI4Science) workflows, from scientific question answering and literature analysis to laboratory planning and autonomous discovery. This progress creates an urgent need for safety benchmarks that evaluate not only scientific competence, but also whether models recognize and avoid risks in high-stakes scientific contexts. Existing AI4Science safety datasets cover several disciplines and task formats, leaving the underlying risk dimensions underspecified. We introduce SciRisk-Bench, a benchmark designed to evaluate AI4Science safety from two complementary perspectives: explicit risk dimensions and scientific disciplines. SciRisk-Bench covers 7 disciplines, 31 subdisciplines and 10 risk dimensions. In the experimental section, we evaluate both mainstream LLMs and science-oriented LLMs across risk dimensions, disciplines, and sub-disciplines, enabling fine-grained diagnosis of where scientific models remain unsafe.
New Super PAC, the Guardrails Alliance, Aims to Rally Tech Workers to Help Limit A.I.
The Guardrails Alliance, which has raised $5 million, is positioning itself as a populist effort that will take on the pro-A.I. interests trying to influence this year’s elections.
Digital Speech Acts Retain Control of Copyright with People, Not Platforms
arXiv:2606.19263v1 Announce Type: cross Abstract: Legal precedents protect computer code as copyrightable expression. They have enabled centralized digital platforms, operating from corporate servers that hold all user data, to construct private governance regimes through the interaction of copyright, contract, and technical architecture: people who create virtually all platform value must surrender effective copyright control through Terms of Service agreements as a condition of participation. In contrast, grassroots platforms consist of cryptographically-identified people operating their networked smartphones independently of any server or global resource; each person holds their own data on their own device, with no third party in possession or intermediation. Here, we define the notion of a digital speech act (a deliberate volitional act by a person of cryptographically signing personal content with the person's private key, carried out on the person's own device) through which the person simultaneously establishes attribution, accountability, and authorship over the signed content. We contend that (a) digital speech acts qualify for copyright protection under existing U.S. precedent: Burrow-Giles locates authorship in volitional creative choices despite mechanical or algorithmic processes, Feist supplies the minimal-creativity threshold, and persistent device storage satisfies the Copyright Act's fixation requirement; (b) the digital social contract underlying grassroots platforms preserves this copyright by design (signed content cannot be unbundled from its signature, and the full provenance chain accumulates as content is forwarded), so that ownership and possession coalesce in the person; and (c) copyright in digital speech acts is a prerequisite for digital sovereignty and democratic self-governance.
Circles Sold Phone Spy Tools to Repressive States, Report Says
A Bulgaria-based company sold controversial surveillance technology to governments in countries with records of repression, enabling authorities to track mobile phones and eavesdrop on private communications, according to documents obtained by Human Rights Watch.
Elon Musk's Grok AI Sparks Debate on Ethics and Oversight in Military Use
Elon Musk's AI tool, Grok, has been identified as a component in US military operations, triggering debates on ethics and oversight in defense.
Technology & Infrastructure
Skill-Guided Continuation Distillation for GUI Agents
arXiv:2606.18890v1 Announce Type: new Abstract: Improving GUI agents typically relies on behavior cloning on expert trajectories. However, as the current policy deviates from the expert policy, it inevitably encounters policy-induced off-trajectory states during closed-loop execution, i.e., states that fall outside the expert trajectories. Since expert trajectories provide no demonstrations for these unseen states, such states receive no effective supervision, leaving the policy unable to select the correct action. To close this supervision gap, we propose Skill-Guided Continuation Distillation (SGCD), an iterative self-improvement framework. SGCD first runs the plain policy without skill guidance for a few steps to reach realistic off-trajectory states. From these states, a skill-guided policy then completes the task and produces successful continuations, which are mixed with expert trajectories to supply supervision over policy-induced off-trajectory states. The skills are extracted from both successful and failed rollouts, consisting of Continuation Plans, Critical Targets, Failure Traps, and Success Criteria. On OSWorld-Verified, SGCD improves the success rate of three base models from the low-30% range to over 50%, demonstrating its effectiveness and generality.
WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents
arXiv:2606.18847v1 Announce Type: new Abstract: To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces with dialogues, actions, execution feedback, object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
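A visibility-aware memory in the spirit of ObsMem can be approximated with per-object state trails. This is a hypothetical sketch, not the paper's architecture: a state is recorded only when the object is seen or changed by the agent's own action, each entry is timestamped and tagged with its source, and a query returns the latest entry rather than assuming the world stood still.

```python
from dataclasses import dataclass, field


@dataclass
class ObserverMemory:
    """Hypothetical observer-grounded memory: object states are written only
    when the agent sees them or changes them itself, each with a timestamp."""
    trails: dict = field(default_factory=dict)  # object -> [(time, state, source)]

    def observe(self, t, visible):
        """Record the state of every object visible at time t.
        Objects out of view keep their last recorded state."""
        for obj, state in visible.items():
            self.trails.setdefault(obj, []).append((t, state, "seen"))

    def act(self, t, obj, new_state):
        """Log a state change caused by the agent's own action."""
        self.trails.setdefault(obj, []).append((t, new_state, "action"))

    def believed_state(self, obj):
        """Latest (time, state, source) entry for obj, or None if never recorded."""
        trail = self.trails.get(obj)
        return trail[-1] if trail else None


# Usage: the agent switches the lamp on; the mug is moved while out of view.
mem = ObserverMemory()
mem.observe(0, {"lamp": "off", "mug": "on_table"})
mem.act(1, "lamp", "on")
mem.observe(5, {"mug": "in_sink"})
```

A planner could compare each entry's timestamp with the current time to decide whether a remembered state is stale enough to re-check before acting.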
CEO-Bench: Can Agents Play the Long Game?
arXiv:2606.18543v1 Announce Type: new Abstract: Language model agents are becoming proficient executors at isolated, short-horizon tasks such as software engineering and customer service. Yet real-world challenges require a combination of sophisticated skills that remain largely untested in agents: (1) navigating long horizons amid uncertainty; (2) acquiring information in noisy environments; (3) …
ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
ENPIRE explores agentic robot policy self-improvement, using recursive loops and coding models to help robotic systems learn and refine real-world skills.
Palladyne AI Secures Army Contract for Autonomous Systems Field Trials
Palladyne AI secured key Army contracts to develop SwarmOS and Gremlin-X, aiming to streamline command of diverse autonomous systems during upcoming field trials.
Blackstone’s AirTrunk Seeks $3 Billion Loan for Sydney Project
AirTrunk Pty., a data center operator backed by Blackstone Inc., is in talks with banks for a A$4.3 billion ($3 billion) loan to back a new project in Australia, people familiar with the matter said, as it continues its debt-fueled expansion across Asia Pacific.
Data centre investors navigate geopolitical strife as deals boom
Sector remains hot despite some investors’ wariness about tenants such as ByteDance
Midjourney Medical goes from generating ‘cat images’ to full-body ultrasound scans
Midjourney's AI technology is being applied to medical imaging, specifically for ultrasound scans.
NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation
arXiv:2606.18271v1 Announce Type: new Abstract: As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.
Qualitative Differences in Model Judgment and Creativity Under Constraint
A comparison of open-weights models reveals nuanced differences in creative judgment that standard benchmarks fail to capture. These qualitative variations impact the utility of models for complex, constrained tasks.
Are Small Language Models the New AI Default?
A June 2025 paper, Small Language Models are the Future of Agentic AI (Belcak et al.), argued that the narrow, repetitive sub-tasks inside most agent pipelines don’t need a frontier model.
GLM-5.2 is the new leading open weights model on Artificial Analysis
GLM-5.2 has emerged as the top-performing open weights model according to the latest Artificial Analysis intelligence index.
CaVe-VLM-CoT: An Interpretable Vision-Language Model Framework
arXiv:2606.18385v1 Announce Type: new Abstract: Vision-Language Models (VLMs) remain prone to hallucinations, producing fluent but visually unfaithful outputs. Existing chain-of-thought and retrieval-augmented methods only partially address this, as they neither enforce step-level citation grounding nor route verification failures back to retrieval for correction. We present CaVe-VLM-CoT, a modular reflection-based agentic-RAG framework that enforces evidence-grounded reasoning through a five-stage closed-loop pipeline: Extractor, Retriever, Solver, Citation Injector, and Verifier, in which detected ungrounded claims trigger structured feedback to the Extractor for targeted re-retrieval. Since no existing framework jointly measures retrieval quality, step-wise citation faithfulness, and cross-modal grounding, we propose a suite of 23 component-wise metrics across all stages, anchored by CaVeScore, a composite metric weighting accuracy, citation precision and recall, attribution, and evidence grounding. Without any architectural or prompt modifications, CaVe-VLM-CoT achieves 87.1% accuracy and 56.6% CaVeScore on ScienceQA, and 55.2% accuracy and 35.7% CaVeScore on MMMU (30 subjects).
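The closed loop can be sketched generically. Everything below is a hypothetical stand-in: keyword overlap plays the retriever, a caller-supplied `solve` function plays the VLM, and the verifier only checks that each claim has a supporting passage. It shows the control flow of feeding ungrounded claims back for targeted re-retrieval, not the paper's components or its CaVeScore metric.

```python
import re


def words(text):
    """Lower-cased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def extract(question, feedback):
    """Extractor: query terms from the question plus any ungrounded claims."""
    terms = words(question)
    for claim in feedback:
        terms |= words(claim)
    return terms


def retrieve(terms, corpus):
    """Retriever: every document sharing at least one term with the query."""
    return {doc_id: text for doc_id, text in corpus.items() if words(text) & terms}


def inject_citations(claims, evidence):
    """Citation injector: pair each claim with the first evidence document that
    contains it verbatim (case-insensitive), or None if nothing supports it."""
    cited = []
    for claim in claims:
        src = next((d for d, t in evidence.items() if claim.lower() in t.lower()), None)
        cited.append((claim, src))
    return cited


def run_pipeline(question, corpus, solve, max_rounds=3):
    """Closed loop: the verifier sends uncited claims back to the extractor
    for targeted re-retrieval until every claim is cited or rounds run out."""
    cited = []
    feedback = []
    for _ in range(max_rounds):
        evidence = retrieve(extract(question, feedback), corpus)
        cited = inject_citations(solve(question, evidence), evidence)
        feedback = [claim for claim, src in cited if src is None]  # verifier
        if not feedback:
            break
    return cited


# Usage: `answer` is a fixed stand-in for the model's output.
corpus = {"d1": "Leaves are green because of chlorophyll.",
          "d2": "Photosynthesis releases oxygen."}
answer = lambda q, evidence: ["leaves are green because of chlorophyll",
                              "photosynthesis releases oxygen"]
result = run_pipeline("Why are leaves green?", corpus, answer)
```

In this usage the second claim is unsupported after the first round; its terms are fed back to the extractor, and the second round retrieves `d2` and cites it.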
DeFAb: A Verifiable Benchmark for Defeasible Abduction in Foundation Models
arXiv:2606.18557v1 Announce Type: new Abstract: A rule-based logic solver resolves every instance in our benchmark in under 50 microseconds with 100% accuracy; the best frontier language model reaches 65% at best and drops to 23.5% under rendering-robust evaluation (worst case over four surface renderings). We introduce DeFAb (Defeasible Abduction Benchmark), a dataset and generation pipeline that converts four decades of publicly funded knowledge bases into formally grounded instances for defeasible abduction: constructing hypotheses that explain anomalies by overriding defaults while preserving unrelated expectations. Because every hypothesis must pass polynomial-time checks for valid derivation, conservativity, and minimality, DeFAb makes logical rigor the instrument for measuring creativity and theoretical reasoning, scoring the disciplined construction of theory revisions rather than fluent but theory-destroying prose. The pipeline pairs taxonomic hierarchies (OpenCyc, YAGO, Wikidata) with behavioral property graphs (ConceptNet, UMLS) to produce 372,648+ instances across 33.75M materialized rules from 18 sources, in three levels with polynomial-time verifiable gold standards. Four frontier models do not reliably internalize defeasible reasoning: rendering-robust Level 2 accuracy is 7.8-23.5%; chain-of-thought variance (~36 pp) exceeds any inter-model gap; and a matched contamination control isolates a +19.4 pp Level 3 gap. We further release DeFAb-Hard (a 235-instance Level 3 difficulty variant; best model 53.3% vs 100% symbolic) and CONJURE (a kernel-verified transformative-creativity variant of 560 Lean 4/Mathlib instances whose gold answers are definitions the proof kernel did not previously contain, judge-free verifier; a pilot finds zero novel concepts). The same verifier doubles as an exact reward for preference optimization (DPO, RLVR/GRPO). Released under MIT at https://huggingface.co/datasets/PatrickAllenCooper/DeFAb.
Generative-Model Predictive Planning for Navigation in Partially Observable Environments
arXiv:2606.18888v1 Announce Type: new Abstract: Navigation in partially observable environments presents a significant challenge for autonomous agents, requiring effective decision-making with limited sensory information in unknown environments. Belief-based methods, particularly those using neural networks to approximate the belief space, often fail to capture the inherent multimodality of belief spaces, especially in high-dimensional cases with perceptual aliasing. While generative models present a compelling alternative, they typically require substantial data or expert demonstrations and lack explicit mechanisms for long-term planning. In this paper, we introduce BeliefDiffusion, a novel framework that combines the benefits of both generation and planning. BeliefDiffusion leverages diffusion models to explicitly characterize multimodal belief distributions and utilizes Model Predictive Control (MPC) to simultaneously plan ahead. It consists of two steps: (1) Imagining plausible environment configurations based on observation history and (2) Planning efficient navigation strategies across the aggregated configurations. Through extensive experiments in synthetic map environments, we demonstrate that BeliefDiffusion significantly outperforms both model-free reinforcement learning baselines and other generative approaches in navigation success rate and path efficiency. Our results validate that explicitly incorporating multimodal belief representations into planning enables more robust navigation in partially observable settings.
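The imagine-then-plan loop can be illustrated on a tiny grid. This is a hedged sketch, not BeliefDiffusion: uniform sampling from a fixed list of candidate obstacle maps stands in for the diffusion model, and a one-step receding-horizon rule with breadth-first-search cost-to-go stands in for the MPC planner. Each step keeps only candidates consistent with what has been observed, samples several plausible maps, and picks the move with the lowest average remaining path length.

```python
import random
from collections import deque

MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def in_grid(cell, width, height):
    return 0 <= cell[0] < width and 0 <= cell[1] < height


def bfs_dist(start, goal, blocked, width, height):
    """Length of the shortest 4-connected path from start to goal, or None."""
    frontier, seen = deque([(start, 0)]), {start}
    while frontier:
        cell, dist = frontier.popleft()
        if cell == goal:
            return dist
        for dx, dy in MOVES.values():
            nxt = (cell[0] + dx, cell[1] + dy)
            if in_grid(nxt, width, height) and nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None


def sample_maps(candidates, known_free, known_blocked, k, rng):
    """Stand-in for the generative model: draw k obstacle maps that agree
    with everything observed so far (raises if none are consistent)."""
    consistent = [m for m in candidates
                  if not (m & known_free) and known_blocked <= m]
    return [rng.choice(consistent) for _ in range(k)]


def plan_step(pos, goal, samples, width, height, unreachable_cost=100):
    """One receding-horizon step: choose the move with the lowest expected
    cost-to-go, averaged over the sampled maps. Call again after each move."""
    best_move, best_cost = None, float("inf")
    for name, (dx, dy) in MOVES.items():
        nxt = (pos[0] + dx, pos[1] + dy)
        if not in_grid(nxt, width, height):
            continue
        costs = []
        for blocked in samples:
            d = None if nxt in blocked else bfs_dist(nxt, goal, blocked, width, height)
            costs.append(unreachable_cost if d is None else 1 + d)
        cost = sum(costs) / len(costs)
        if cost < best_cost:
            best_move, best_cost = name, cost
    return best_move


# Usage on a 3x3 grid: the centre wall extends either up or down (unknown a priori).
rng = random.Random(0)
candidates = [frozenset({(1, 1), (1, 0)}), frozenset({(1, 1), (1, 2)})]
samples = sample_maps(candidates, known_free=set(), known_blocked={(1, 0)}, k=8, rng=rng)
move = plan_step((0, 1), (2, 1), samples, width=3, height=3)
```

Once the cell above the wall is observed as blocked, every sampled map routes around the bottom, so the planner moves down.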
Qwen-Robot Suite: A Foundation Model Suite for Physical World Intelligence
Qwen-Robot Suite introduces an open foundation model stack for embodied AI, including navigation, world modeling, and manipulation models for robotics.
Pretrained to Imagine, Fine-Tuned to Act: The Rise of World-Action Models
NVIDIA's post explains how video world models can be adapted into action models for robotics and embodied AI systems.
What Must Generalist Agents Remember?
arXiv:2606.18746v1 Announce Type: new Abstract: This paper develops a formal account of what generalist agents must store in memory in order to act near-optimally across multiple environments and goals. It shows that when two domains share an observational bottleneck but require incompatible optimal actions, any uniformly near-optimal policy must induce distinct memory distributions at that bottleneck. The result yields a separation theorem: sufficiently successful agents cannot rely only on current state observations, but must preserve domain-relevant information in memory. The paper further shows that if an agent's memory contains enough information to estimate values for related goals, then that memory can be used to approximately reconstruct the agent's local transition dynamics. Together, these results characterize memory as the substrate that supports domain disambiguation, transition-model reconstruction, and planning for generalist agents.
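The separation intuition can be seen in a two-domain toy (invented here, not the paper's construction or proof): both domains present the same observation at a bottleneck but reward different actions there. Any policy that reacts only to the current observation must take one fixed action at the bottleneck and so succeeds in at most one domain, while a policy that remembers an earlier cue succeeds in both.

```python
# Two toy domains (hypothetical): each shows a distinguishing cue first,
# then the same bottleneck observation, where the rewarded action differs.
DOMAINS = {
    "A": {"cue": "red", "bottleneck": "door", "correct": "left"},
    "B": {"cue": "blue", "bottleneck": "door", "correct": "right"},
}
ACTIONS = ("left", "right")


def run(policy_step, domain):
    """Play one two-step episode; policy_step(obs, memory) -> (action, memory)."""
    d = DOMAINS[domain]
    _, memory = policy_step(d["cue"], None)           # step 0: see the cue
    action, _ = policy_step(d["bottleneck"], memory)  # step 1: act at the bottleneck
    return action == d["correct"]


def best_memoryless_score():
    """Try every memoryless bottleneck behaviour. The bottleneck observation is
    identical in both domains, so such a policy always takes the same action."""
    best = 0
    for a in ACTIONS:
        policy = lambda obs, mem, a=a: (a, None)  # ignores and never stores memory
        best = max(best, sum(run(policy, dom) for dom in DOMAINS))
    return best


def memory_policy(obs, memory):
    """Stores the cue at step 0, then uses it to disambiguate the bottleneck."""
    if obs in ("red", "blue"):
        return None, obs
    return ("left" if memory == "red" else "right"), memory
```

Here `best_memoryless_score()` is 1 (one domain out of two), while `memory_policy` succeeds in both, because its memory carries the domain-identifying cue past the shared bottleneck.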
Running local models is good now
An exploration of the current state of running AI models locally on consumer hardware.
Analysing drivers and interdependencies in European electricity markets using XAI
arXiv:2606.19118v1 Announce Type: cross Abstract: Electricity markets are inherently complex systems characterised by strong nonlinearities, high-dimensional interactions, and increasing interdependence across regions. While deep neural networks (DNNs) have demonstrated strong predictive capabilities for electricity prices, their lack of interpretability limits their usefulness for understanding the underlying drivers of price formation. This paper addresses this gap by combining DNN models with explainable artificial intelligence (XAI) techniques to analyse the determinants of electricity prices across 39 European bidding zones. We employ SHAP (SHapley Additive exPlanations) to quantify feature contributions and apply and extend SSHAP, an aggregation framework to improve interpretability in high-dimensional settings. The analysis identifies that renewable energy sources, particularly solar, play a disproportionately important role in price formation despite their lower share in total power generation. Gas prices remain a dominant and consistent driver across electricity markets, while interconnections significantly shape price dynamics, highlighting the strong interdependence of European electricity systems. In addition, a synthetic EU-wide electricity market is constructed to explore the counterfactual scenario of a fully integrated market with a single price.
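SHAP values are estimates of Shapley values. For a model with only a few inputs they can be computed exactly by enumerating every coalition of features, which makes the attribution rule easy to inspect. The sketch below uses the baseline-substitution convention (features outside a coalition take a reference value) and a made-up linear price model with hypothetical inputs; it does not reproduce the paper's DNNs or its SSHAP aggregation, and its cost grows as 2^n in the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Usage: a made-up linear price model over (gas price, solar output, net imports).
price = lambda z: 2.0 * z[0] - 0.5 * z[1] + 0.3 * z[2]
x, baseline = [10.0, 4.0, 1.0], [5.0, 2.0, 0.0]
phi = shapley_values(price, x, baseline)
```

For a linear model each value reduces to coefficient times (x_i - baseline_i), here 10.0, -1.0 and 0.3, and the values always sum to price(x) - price(baseline).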
Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
arXiv:2606.18874v1 Announce Type: new Abstract: AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, where runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.
Akamai Launches Edge-First Security Framework to Secure AI Commerce and Drive Market Growth
Akamai introduces a pioneering edge-first security framework to enhance AI-driven commerce by integrating identity, observability, and trust with real-time decisioning.
Cyber offenses now account for around a third of all crime across Asia and South Pacific
Latest Interpol review shows how scams continue to dominate, and AI-enabled attackers prove too hot to handle for cash-strapped regions
Radware Launches AI Xploit Shield for Real-Time Application and API Security
Radware introduced AI Xploit Shield, a service that provides tailored protections for applications and APIs using AI without requiring changes to underlying software.
UK critical infrastructure hit by 200 cyber incidents in a year, agency says
Head of National Cyber Security Centre says UK in ‘ongoing contest with capable adversaries’ and AI could add to threat. The UK’s critical national infrastructure has been hit by more than 200 cyber incidents over the past year and state-linked assailants were behind three-quarters of the attacks, according to the state cybersecurity body. Richard Horne, the chief executive of the National Cyber Security Centre (NCSC), said hostile states such as Russia, China and Iran were increasingly targeting systems behind the UK’s key services.
Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers
Researchers say the federal alarm over Fable 5 was triggered by a simple "fix this code" request, not by a jailbreak.
Adoption, Deployment & Impact
Only half of US datacenter capacity planned for 2026 is actually under construction
Another fun example of AI hype and reality colliding
The Steep Learning Curve of AI Interfaces as a Barrier to Enterprise Adoption
AI interfaces remain unintuitive, requiring significant user training to overcome adoption roadblocks. This friction limits the immediate productivity gains organizations can realize from current tools.
Infrastructure security is now the #1 tech priority
Infrastructure leaders ranked security and compliance as their top AI adoption goal, ahead of speed, developer output, and cost savings.
You Probably Don’t Need an Agent Framework
Most LLM applications need a clear workflow, not an autonomous agent. Here's how to build one in plain Python.
ForecastBench-Sim: A Simulated-World Forecasting Benchmark
arXiv:2606.18686v1 Announce Type: new Abstract: Forecasting benchmarks for general-purpose AI systems usually inherit the constraints of the real world: outcomes resolve slowly, tail events are rare, and counterfactual questions are difficult to score. We introduce ForecastBench-Sim, a simulated-world forecasting benchmark built on game rollouts from Freeciv, a turn-based strategy game modelled on the Civilization series. Forecasters receive a fixed world report (a structured snapshot of the current game state) and answer questions about hidden future states; the benchmark then continues the simulation and scores forecasts. Because the world is simulated, the same setup can generate continuous or binary forecasting questions at arbitrary time horizons, paired intervention worlds for conditional or causal questions, and resolved examples of rare or disruptive outcomes. We describe the benchmark pipeline, question families, scoring protocol, and release artifacts, and report validation slices from model evaluations and an anonymized human pilot. ForecastBench-Sim is intended to complement real-world forecasting benchmarks by providing controlled, immediately resolvable tasks for studying probabilistic reasoning under dynamic world states.
Developing Standardized Benchmarks for Generative AI Design and Coding Capabilities
A new benchmark evaluates AI models on their ability to generate complex, interactive 3D simulations. This highlights the gap between model capability and practical application in creative and technical workflows.
Geopolitics, Policy & Governance
AI & Tech brief: China’s biotech dominance - The Washington Post
New bipartisan legislation would create a vast biological database primed for AI-driven drug research.
China white paper stresses AI cooperation, trade in global governance
China released a global-governance white paper emphasizing AI cooperation and multilateral rule-making, linking AI governance to trade, supply-chain stability, and technology access for developing nations.
California lawmakers proposing third-party assessments of AI systems
California state legislators are partnering to propose safety standards for independent, third-party assessments of AI systems and models.
Reuters AI News: Latest Headlines and Developments
Eurocommerce, the European retail association whose members include Amazon, H&M, Inditex, and Ikea, is asking EU tech chief Henna Virkkunen to exempt AI-generated advertisements from the bloc's new regulation requiring disclosure of AI use.
G7 leaders urge financial regulator coordination to tackle AI risks
G7 leaders called for information sharing and coordination between financial regulators and tech companies to address risks posed by frontier AI models.
"The New Era of Tech-Enabled Traceability": Tensions between the FDA's Data Governance Vision and the Lived Realities of Food Producers
arXiv:2606.18593v1 Announce Type: cross Abstract: The U.S. Food and Drug Administration (FDA)'s Food Traceability Rule requires agri-food supply chain stakeholders (including farmers, fishers, retail workers, and others) to maintain detailed tracking records beginning in January 2026. Through this Rule, the FDA envisions a "New Era of Tech-Enabled Traceability," in which standardized, harmonized tracking data serve as a foundational public health infrastructure, enabling more rapid identification and removal of potentially contaminated food and ultimately reducing the risk of foodborne illness. Despite this promising vision, we observe that the Rule reconfigures agri-food stakeholders into data laborers by mandating stringent data collection, formatting, and reporting requirements. In this paper, we examine the tensions and burdens that arise from such reconfiguration. Leveraging Data Feminism as an orientation to attend to how data-driven policy implementation disproportionately burdens smaller, under-resourced stakeholders who lack the infrastructural and financial capacity to comply, we analyze 1,198 public comments submitted to Regulations.gov in response to the proposed Rule. Our qualitative document analysis reveals three key tensions: (1) the individual labor, financial, and educational burdens stakeholders experience as they are reconfigured into data workers; (2) moments where data tracking becomes infeasible due to infrastructural limitations, cultural contexts, and situated production practices; and (3) instances where the Rule's intended flexibility instead introduces confusion and burden due to its ambiguity.
Anthropic Employees Accuse Trump Administration of Targeting Them
Workers at the artificial intelligence company have been puzzled and increasingly concerned by the administration’s move to limit their latest A.I. models.
We have to manage the AI revolution
There must be a global agreement on how the technology is controlled
US AI Export Controls Cause Furor - CEPA
By banning foreigners from accessing Anthropic’s Fable 5, the US abandons its hands-off approach to artificial intelligence — and angers allies.
ETSI chief eyes larger AI role as EU grapples with standards gap
Europe's AI rules will only succeed if they're translated into practical standards that companies can use to build and test products, according to Jan Ellsberger, who heads a major European standards body.
Anthropic got hit by export rules nobody understands
Anthropic faces challenges navigating complex and ambiguous new export control regulations.
Estonia intends to recognize AI agents with digital IDs
I am not a number! I am a free agent (that just happens to have a number)
EU lawmakers approve nudifier app ban in AI omnibus package
EU lawmakers have approved a ban on nudifier applications as part of an AI omnibus legislative package.
New Irish bill to supervise EU AI Act gets greenlit
The AI Act, which entered into force in August 2024, attempts to tackle some of the risks emerging from the technology while letting the bloc benefit from its economic potential.
Calif. child-safety bills advance as lawmakers target platform, chatbot design
A trio of children's online safety bills is moving forward in California, with lawmakers aiming to hold online platforms and AI chatbot operators accountable for harms to minors.