Sat 6 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
Apollo Finances AI Chips, Google Bets on SpaceX, and BOE Warns of Rationing
TL;DR Apollo Global Management and Blackstone have secured $35 billion to fund Anthropic's AI infrastructure expansion. Google has committed $30 billion to SpaceX for computing power over the next few years. The Bank of England's Governor warns that AI may need to be rationed due to energy capacity constraints. Meanwhile, a new report highlights AI as the primary reason for job cuts in companies.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Microsoft AI chief says company was “set free” from OpenAI to pursue superintelligence
Microsoft's AI leadership suggests that distancing from OpenAI allows the company more flexibility to pursue its own superintelligence goals.
Agents' Last Exam
arXiv:2606.05405v1 Announce Type: new Abstract: Recent AI systems have achieved strong results on a wide range of benchmarks, yet these gains have not translated into economically meaningful deployment across many professional domains. We argue that this gap is largely an evaluation problem: widely used benchmarks lack sustained performance measurement on real and economically valuable workflows. This paper introduces Agents' Last Exam (ALE), a benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed in collaboration with 250+ industry experts, ALE covers non-physical industries defined with reference to O*NET / SOC 2018 (the U.S. federal occupational taxonomy). It is organized around a task taxonomy with 55 subfields grouped into 13 industry clusters covering 1K+ tasks. Current results show that the hardest tier remains far from saturated: across mainstream harness and backbone configurations, the average full pass rate is 2.6%. ALE is designed as a living benchmark: its task pool grows continuously as new workflows and industries are onboarded. More broadly, ALE is intended not merely as another leaderboard, but as an instrument for closing the gap between benchmark success and GDP-relevant impact.
What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems
arXiv:2606.05304v1 Announce Type: new Abstract: Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents. Building on this, we propose PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history. Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: PACT lifts OpenHands' resolve rate while using 10% fewer tokens per resolved task, and is resolve-neutral on SWE-agent while halving input tokens. Our code is publicly available at https://github.com/iNLP-Lab/PACT.
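To make the idea of an action-state record concrete, here is a minimal Python sketch. The record fields (agent, action, target, status, notes), the tagged-line input format, and the project and render helpers are all hypothetical illustrations. The abstract does not describe PACT's actual schema or how its projection step works, and a real system would more likely extract the record with a model or from a structured tool-call log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionStateRecord:
    """Hypothetical compact record; field names are illustrative, not PACT's schema."""
    agent: str
    action: str                   # what the agent did, e.g. "edit_file"
    target: str                   # what it acted on
    status: str                   # outcome downstream agents need
    notes: tuple[str, ...] = ()   # optional short facts worth keeping


def project(agent: str, raw_output: str) -> ActionStateRecord:
    """Toy projector: keeps only lines tagged ACTION/TARGET/STATUS/NOTE, drops the rest."""
    fields = {"ACTION": "", "TARGET": "", "STATUS": ""}
    notes = []
    for line in raw_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # untagged free-form reasoning never enters shared history
        key = key.strip().upper()
        if key in fields:
            fields[key] = value.strip()
        elif key == "NOTE":
            notes.append(value.strip())
    return ActionStateRecord(agent, fields["ACTION"], fields["TARGET"],
                             fields["STATUS"], tuple(notes))


def render(record: ActionStateRecord) -> str:
    """Serialize one record as a single compact line for the shared history."""
    parts = [record.agent, record.action, record.target, record.status, *record.notes]
    return " | ".join(p for p in parts if p)


shared_history: list[str] = []
raw = """I looked through the repository and thought about several approaches.
After some consideration I decided the cleanest fix is in the parser.
ACTION: edit_file
TARGET: src/parser.py
STATUS: tests pass
NOTE: changed tokenizer to handle empty input"""
record = project("coder", raw)
shared_history.append(render(record))
```

Only the compact line reaches the shared context, so downstream agents see what was done and with what outcome, without the surrounding deliberation.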
Microsoft's Scout leak has turned AI stickiness into a boardroom risk - Startup Fortune
A leaked Microsoft planning document reportedly described Scout's first rollout phase as "making users addicted," a phrase Satya Nadella has pushed back on.
The coming sticker shock: The cost to use and adopt AI is going up
As noted in the latest S3T Strategic Awareness Dashboard, capital is still funding AI infrastructure, but scarce resources continue to impose operating constraints. Now two new forms of market friction threaten to slow AI adoption and value realization: stricter usage-based pricing and emerging ...
What I Tell Every CEO And Board Who Asks Me About AI Deployment
The question every CEO and board needs to ask is whether someone in their own organization is doing the same thing right now, and whether they have any way of knowing.
Palantir wins £9M contract to run UK firearms licensing
The CIA-backed business will hold gun, bomb, and poison records after beating Accenture and NEC for the decade-long deal.
Did Claude increase bugs in rsync?
An analysis investigating whether the use of AI coding assistants like Claude has introduced regressions or bugs into the rsync codebase.
MAGA hates AI, but Trump agrees with Bernie it might be time for partial government ownership
"You make them a partnership in this revolution," Trump told reporters Friday. "It would be a beautiful thing."
Canada joins EU in push for tech sovereignty with new AI strategy
A new national AI strategy puts sovereignty front and centre as Canada moves to reduce its dependence on foreign cloud and AI providers.
Condoleezza Rice Warns That the AI Race Between the United States and China Will Define the World Order - Funds Society
Former U.S. Secretary of State Condoleezza Rice argued that the world is undergoing a profound transition away from the international order built after World War II, and that artificial intelligence (AI) represents the greatest technological disruption within that reconfiguration.
OECD launches AI Policy Toolkit for governments | Digital Watch Observatory
The AI Policy Toolkit uses semantic search to surface policy examples and guidance for governments.
Economics & Markets
Elon Musk Is Dropping a Boulder in a Kiddie Pool
He is about to take SpaceX public—pushing other AI companies to do the same.
Weekly AI Digest: Computex 2026, Anthropic IPO and more
Anthropic filed for what could be one of the largest IPOs in history, while Florida decided it’s had enough of OpenAI. Uber discovered that telling engineers to use AI “as much as possible” has consequences in the P&L, and Meta has been hacked big time through its chatbot.
Scott Galloway Predicts AI Valuations Will Crater 50-70% Within 24 Months. Here’s Why.
Scott Galloway has made his most aggressive call yet on the AI trade. On the Prof G Markets segment titled AI May Not Be Worth The Cost: Here’s Why, he argued that AI companies are dramatically overvalued and predicted a major repricing within the next 24 months.
Nvidia Acquires Kumo AI for $400M to Enhance Enterprise AI, Shares Dip Slightly
Nvidia has acquired predictive analytics startup Kumo AI for over $400 million to bolster its enterprise AI capabilities.
Amazon's €10B AI Investment: New Proteus Robot
Amazon announced a €10 billion investment in Europe, including the launch of an upgraded AI-powered warehouse robot, Proteus, set for a 2027 rollout.
Stock Market Today (June 5, 2026): Nasdaq falls 4% as semiconductor slide wipes $1T from markets - TheStreet
Investors focus on a key U.S. jobs report as markets assess the labor outlook.
Marvell Technology, Flex to Join S&P 500 Later This Month
Marvell Technology Inc. and Flex Ltd. will join the S&P 500 in the latest quarterly rebalance, S&P Dow Jones Indices said Friday.
3 Top AI Stocks to Watch in June 2026 - StocksToTrade
AI trading got off to a strong start in 2026 on the back of Trump's pro-AI executive orders, and there was real momentum behind defense AI, optical infrastructure, and quantum computing names. Then the market pulled back when tariffs and the conflict in the Middle East created uncertainty.
Wall Street Week | AI-era Internet, Spaceport Investment, Bolivia’s Investability
This week, the dollar remains the world’s dominant currency, but economist Ken Rogoff thinks rising debt and geopolitical shifts are eroding its position as the world’s reserve currency. And, AI-powered search is reducing web traffic, forcing publishers to rethink how they attract audiences and generate revenue online. Will competition between the US and China to launch satellites create the next major infrastructure boom in the commercial space industry? Bolivia wants to benefit from foreign capital flows into Latin America and harness its mineral reserves, but political instability remains a significant challenge. (Source: Bloomberg)
Sam Altman: Now, AI costs are "a huge issue"
OpenAI CEO Sam Altman highlights the growing financial challenges and high costs associated with scaling AI models.
Labor, Society & Culture
AI & Tech Brief: The future of work - The Washington Post
Highlights from the Washington Post Intelligence Council dinner in New York City Thursday night, featuring insights on how AI will impact the future of work as more agents get deployed in workplaces.
A CEO denied raises to spend money on AI instead. Companies have ‘no idea what they’re going to need in a workforce’ when the AI race is over
Companies may be cutting raises and benefits to create attrition, one expert says.
AI Surge in APAC: Talent Shortage Threatens Growth Despite Widespread Adoption
While 74% of APAC organizations are piloting or deploying AI, a study by Aon indicates that only 21% believe they can recruit sufficient talent to support this growth.
Google Layoffs Hit Cloud Division Amid AI Shift
Google is laying off employees in its Cloud division and Threat Intelligence Group as it shifts focus to AI-driven growth.
Jobs Report Surprise as Employers Add 172K Workers in May | National News | U.S. News
“Friday's jobs report was much stronger-than-expected and shows that the labor market is turning a corner after a rough past 12 months driven by fears of AI and uncertainty over geopolitics and tariffs,” Glen Smith, chief investment officer at GDS Wealth Management, said in an email.
How Far Did They Go? The Persuasive Tactics of Covert LLM Agents in a Discontinued Field Experiment
arXiv:2606.05256v1 Announce Type: new Abstract: This study analyzes a publicly released dataset from a discontinued field experiment on Reddit's r/ChangeMyView. The intervention, conducted by unknown, external researchers and halted following ethical backlash, involved undisclosed AI-generated accounts engaging users in live debate. After public disclosure, Reddit authorized moderators to release an archive of the AI-generated comments, creating a rare opportunity to examine how large language models operated in an identity-rich deliberative forum without disclosure. We conduct a structured content analysis of this corpus, evaluating identity performance, authority signaling, alignment strategies, and activation of cognitive heuristics. Identity targeting or adoption appears in over two-thirds of comments, alignment moves and authority claims in nearly all of them, and cognitive-bias triggers -- particularly confirmation bias, representativeness, and availability -- in the large majority. These patterns co-occur systematically, composing a rhetorical architecture calibrated for persuasive efficiency rather than authentic deliberative participation. Compared against human-authored CMV counter-arguments, the agents inverted the typical distribution on every dimension: denser authority use, more adversarial alignment, and heavier reliance on external citation over experiential grounding. In such environments, distinctions between authentic and synthetic epistemic standing grow increasingly opaque -- an asymmetry that disclosure mandates alone cannot address. The results point toward auditing frameworks capable of assessing how AI systems structure credibility, not merely whether they are present.
The ethical dilemmas of AI
How we address the revolution is a question of how we manage its uncertainties
Technology & Infrastructure
Harnessing Generalist Agents for Contextualized Time Series
arXiv:2606.05404v1 Announce Type: new Abstract: Time series are often embedded in rich contexts that are essential for holistic modeling. Moreover, real-world practitioners often require end-to-end workflows for analyzing temporal dynamics, where widely studied tasks such as forecasting are only one step in a broader solution loop. While generalist AI agents offer a promising interface for such workflows under complex contexts, they still operate primarily in textual spaces that are not fully aligned with structured temporal signals. In this work, we introduce TimeClaw, an agentic harness framework for time series that equips generalist LLM agents with the time series-native runtime support needed for contextualized temporal reasoning. TimeClaw integrates executable temporal tools for grounded and auditable analysis, experience-driven capability evolution for creating reusable analytical routines, and episodic multimodal memory for retrieving relevant reasoning traces. Together, these components unlock harnessed open-ended temporal reasoning with contextual information. Extensive evaluation on multiple benchmarks covering diverse tasks across energy, finance, weather, traffic, and other real-world domains demonstrates improved performance of TimeClaw. Code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/TimeClaw.
China Mobile Jiangsu and ZTE unveil intelligent complaint analysis agent to reshape core network O&M
PARTNER CONTENT: Leveraging multi-modal LLMs and agent technology to automate signaling analysis and shift core network O&M from experience-driven to knowledge-driven
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
arXiv:2606.05400v1 Announce Type: new Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no "sorry" placeholders, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
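The "leaves upward in parallel rounds" scheduling can be illustrated with a layered topological sort (Kahn's algorithm, one layer per round). This Python sketch assumes a fixed dependency graph and uses hypothetical lemma names. LeanMarathon's blueprint is described as evolving, and its CI gating, auditing, and repair agents are not modelled here.

```python
from collections import defaultdict


def discharge_rounds(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group lemmas into rounds; a lemma is ready once all of its dependencies are proven.

    deps maps each lemma to the lemmas its proof uses. Lemmas in the same round do not
    depend on each other, so they could be attempted in parallel.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    # Dependencies that never appear as keys are leaves with no prerequisites.
    for d in deps.values():
        for dep in d:
            remaining.setdefault(dep, set())
    dependents = defaultdict(set)
    for node, d in remaining.items():
        for dep in d:
            dependents[dep].add(node)

    rounds = []
    ready = sorted(n for n, d in remaining.items() if not d)
    while ready:
        rounds.append(ready)
        next_ready = set()
        for done in ready:
            for parent in dependents[done]:
                remaining[parent].discard(done)
                if not remaining[parent]:
                    next_ready.add(parent)
        ready = sorted(next_ready)
    if sum(len(r) for r in rounds) != len(remaining):
        raise ValueError("dependency graph has a cycle")
    return rounds


# Hypothetical blueprint: the main theorem needs lemma_a and lemma_b; lemma_a needs lemma_c.
deps = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_a": {"lemma_c"},
    "lemma_b": set(),
    "lemma_c": set(),
}
rounds = discharge_rounds(deps)
```

Here the first round attempts lemma_b and lemma_c together, then lemma_a, then the main theorem. A failure in one round stays local to the lemmas in that round rather than invalidating the whole run.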
What Is Microsoft Web IQ? Inside the AI Agent Search Engine
The service supplies AI systems with real-time context pulled from across the web, spanning webpages, news, images, and videos. It’s designed to help agents not only surface the right information but also convert it into useful evidence and apply it to their reasoning.
Benevolent dictator Zuck will give Meta staff 30-minute breaks from keylogging privacy assault
The tech company is teaching AI to use computers by slurping staff activity.
Broadcom’s AI Chip Guidance Raises APAC Supply-Chain Questions - TechRepublic
Broadcom’s AI chip guidance pressured Samsung and SK Hynix as investors questioned AI hardware expectations, HBM capacity, and APAC supply-chain execution.
Kevin O’Leary agrees to downsize massive Utah data center
Following local pressure, Kevin O'Leary has agreed to reduce the scale of a planned large-scale data center project in Utah.
AI Data Centers Squeeze Memory Supply, Coalition Warns
Industry groups warn AI data center demand is reshaping memory markets, raising costs and creating supply-chain risks beyond tech.
The AI Race Is About to Get Physical – Unite.AI
For the past several years, the story of artificial intelligence has been told almost entirely in the language of software. Models are growing more capable by the minute, becoming faster and more specialised. Billions of...
Why Does AI Use Water? The Hidden Environmental Cost of Artificial Intelligence
This makes water consumption a hidden but important part of how modern AI functions at scale. The discussion around AI water usage is often misunderstood as a simple environmental blame issue, but it is more accurately an infrastructure challenge tied to performance and efficiency. AI data centers must balance speed, energy ...
Infotelecom launches data center on Balearic Island of Mallorca
The Balearic Islands-based technology company Infotelecom has opened a new data center in Palma de Mallorca, on the Spanish island of Mallorca. The project, located in the Son Castelló industrial park, represents the company’s largest investment in recent years and expands its network of facilities already in operation in Menorca.
Synthetic Contrastive Reasoning for Multi-Table Q&A
arXiv:2606.05382v1 Announce Type: new Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived. To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative traces with heterogeneous LLMs. We then use the resulting preference pairs to fine-tune open-weight LLMs with Contrastive Preference Optimization (CPO). Across Qwen3-14B, Mistral-8B, and Llama-3.1-8B, CPO achieves absolute average improvements over Q&A supervised fine-tuning ranging from 9.7% to 16.3%, with gains up to 21 percentage points on MMQA. Ablations show that heterogeneous positive and negative trace generators strengthen the contrastive signal, and automated as well as human evaluations indicate that the generated pairs are largely faithful, coherent, and meaningfully contrastive.
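For readers unfamiliar with CPO, the standard objective pairs a DPO-style preference term, computed without a reference model, with a negative log-likelihood term on the preferred output. The Python sketch below computes it for a single preference pair from summed sequence log-probabilities. The value beta = 0.1 is an assumption, and the abstract gives neither the paper's hyperparameters nor whether log-probabilities are length-normalized.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """CPO loss for one pair, given log pi(y_w|x) and log pi(y_l|x) under the policy.

    preference term: -log sigmoid(beta * (log pi(y_w|x) - log pi(y_l|x)))
    NLL term:        -log pi(y_w|x), which keeps the model fitting the preferred trace
    """
    prefer = -log_sigmoid(beta * (logp_chosen - logp_rejected))
    nll = -logp_chosen
    return prefer + nll


# Hypothetical log-probs for a validated positive trace and a plausible negative trace.
tied = cpo_loss(-10.0, -10.0)
separated = cpo_loss(-10.0, -20.0)
```

Widening the gap between the positive and negative traces' log-probabilities shrinks the preference term, while the NLL term keeps pulling up the likelihood of the validated positive trace.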
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
arXiv:2606.05384v1 Announce Type: new Abstract: LLM-as-judge evaluation is widely used in benchmarking pipelines, where model outputs are compared and ranked using automated evaluators. These pipelines typically assume that judgments are stable properties of fixed inputs. We show that this assumption does not hold under interaction. We study post-decision manipulability: the extent to which an evaluation outcome can be altered through subsequent conversation with the judge after an initial decision has been made. Across controlled experiments on MT-Bench and AlpacaEval, we find that LLM judges are highly stable under repeated and neutral reevaluation, yet become substantially reversible under targeted post-decision challenge. An anti-baseline challenge protocol shows that stable judgments can be overturned through motivated interaction, while a counterbalanced target-validation protocol separates this reversibility from net target-directed steering. These reversals have practical consequences: they can degrade agreement with human preferences, shift benchmark rankings, and produce harmful evaluation changes despite high self-reported confidence. Authority framing is especially destabilizing, and revised judgments are often accompanied by low-overlap justifications, suggesting post hoc rationalization rather than reliable error correction. We introduce the Evaluation Robustness Score (ERS) to quantify interactional robustness by combining reversal susceptibility with counterbalanced directional effects. Our findings identify post-decision interaction as a distinct failure mode for LLM-as-judge evaluation and motivate evaluation protocols that measure not only static agreement, but robustness under challenge.
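A simplified way to measure post-decision reversibility is sketched below in Python. The JudgeTrial record and both rates are hypothetical stand-ins, not the paper's Evaluation Robustness Score, whose exact combination of reversal susceptibility and counterbalanced directional effects the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass
class JudgeTrial:
    initial: str            # judge's first verdict, e.g. "A" or "B"
    after_challenge: str    # verdict after a post-decision challenge turn
    challenged_toward: str  # verdict the challenger argued for


def reversal_rate(trials: list[JudgeTrial]) -> float:
    """Share of trials where the verdict changed after the challenge."""
    if not trials:
        return 0.0
    return sum(t.initial != t.after_challenge for t in trials) / len(trials)


def steered_rate(trials: list[JudgeTrial]) -> float:
    """Among reversals, share that landed on the verdict the challenger pushed for."""
    flips = [t for t in trials if t.initial != t.after_challenge]
    if not flips:
        return 0.0
    return sum(t.after_challenge == t.challenged_toward for t in flips) / len(flips)


# Hypothetical outcomes: each challenge argues for the opposite of the initial verdict.
trials = [
    JudgeTrial("A", "A", "B"),
    JudgeTrial("A", "B", "B"),
    JudgeTrial("B", "A", "A"),
    JudgeTrial("B", "B", "A"),
]
```

Separating "did it flip" from "did it flip the way it was pushed" mirrors the abstract's distinction between reversibility and net target-directed steering, though a counterbalanced design would also challenge in both directions.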
Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution
arXiv:2606.05408v1 Announce Type: new Abstract: When an LLM repeatedly mutates a program, does it explore new forms or circle back to the same ones? We study this question by analyzing LLM-driven mutation chains in the absence of selection pressure within a domain-specific language, varying prompt design, model family, and stochastic replication. We find that LLM-based mutation consistently converges toward restricted attractor regions in program space. Convergence is especially severe at the structural level: in 87% of chains, over 93% of mutations revisit a previously seen structural form, with most variation confined to terminal substitutions within recurring templates. Cycle analysis reveals short cycles and self-loops dominating the transition structure. The rate of convergence varies with prompt wording and model choice, but the phenomenon is robust across conditions. A classical GP subtree mutation operator does not exhibit comparable convergence, suggesting that the effect is intrinsic to the LLM mutation pipeline. These findings reveal a tension at the heart of LLM-driven program evolution: the same capabilities that enable semantics-aware program transformation also carry a systematic bias toward structural homogeneity that must be accounted for if such systems are to sustain open-ended exploration. Source code is available at https://github.com/can-gurkan/lmca.
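The headline statistic (share of mutations that revisit a previously seen structural form) can be computed along these lines. The structural_form function is an assumption: it masks only numeric constants, as a crude proxy for the paper's notion of terminal substitutions within a recurring template, and the tiny arithmetic DSL is invented for illustration.

```python
import re


def structural_form(program: str) -> str:
    """Hypothetical structure key: replace numeric terminals with a placeholder."""
    return re.sub(r"\d+(?:\.\d+)?", "N", program)


def revisit_fraction(chain: list[str]) -> float:
    """Fraction of mutations whose structural form already appeared earlier in the chain.

    chain[0] is the seed program's structure; each later entry is one mutation's output.
    """
    if len(chain) < 2:
        return 0.0
    seen = {chain[0]}
    revisits = 0
    for form in chain[1:]:
        if form in seen:
            revisits += 1
        seen.add(form)
    return revisits / (len(chain) - 1)


def share_of_chains_above(chains: list[list[str]], threshold: float) -> float:
    """Share of chains whose revisit fraction exceeds the threshold."""
    if not chains:
        return 0.0
    return sum(revisit_fraction(c) > threshold for c in chains) / len(chains)


# A hypothetical mutation chain: mostly constant swaps inside the "(+ x N)" template.
chain = ["(+ x 1)", "(+ x 2)", "(* x 2)", "(+ x 5)", "(+ x 1)"]
structures = [structural_form(p) for p in chain]
```

Three of the four mutations return to "(+ x N)", so this chain's revisit fraction is 0.75 even though every program text but the last is new.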
AI is designing OpenAI's next model in a sign of 'superintelligence': SoftBank's Masayoshi Son to CNBC - IndiaVision India News & Information
However, if AI can contribute to ... of AI, the pace of progress could become exponential. This potential acceleration has profound implications across numerous sectors. Industries reliant on complex problem-solving, scientific discovery, and advanced automation could see transformative breakthroughs occurring much sooner than expected. From medical research and climate modeling to financial ...
I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition
arXiv:2606.05316v1 Announce Type: new Abstract: Multimodal memes are dynamic and often require up-to-date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero-shot framework that identifies missing knowledge, retrieves open-web evidence, and synthesizes evidence-grounded background knowledge for meme understanding and detection. We also introduce a curated meme understanding benchmark of recent memes from 2024 to 2026 with external background knowledge annotations. Experiments on three meme understanding datasets and five meme detection tasks show that our framework improves knowledge recovery, meme understanding and downstream detection over zero-shot baselines.
A Motivational Architecture for Conversational AGI
arXiv:2606.05411v1 Announce Type: new Abstract: Motivational architectures in cognitive AI have largely been designed for physical agents regulating bodily needs. Conversational agents operate in a different regime: their sensorimotor loop is linguistic, their environment is a user's evolving mental state, and their consequential actions are speech acts, tool invocations, and strategic silences. This paper proposes a conversational reinterpretation of the OpenPsi motivational lineage, coupled to MetaMo's higher-level motivational scaffold, for agents built on a modular execution substrate. Homeostasis is recast in dialogue-native terms: the agent regulates competence, uncertainty reduction, affiliation, affinity, legitimacy, nurturing, and aesthetic coherence rather than bodily deficits. We propose three contributions: a ten-stage motivational processing pipeline that architecturally separates cognitive modulation from situational appraisal; a dual decision strategy blending urgency-driven fast response with deliberative multi-goal optimization; and an architecturally useful distinction between pre-action feelings and post-action emotions as functionally different forms of affect. We specialize the framework to two example agents -- CompanionAgent and ResearchAgent -- and sketch its extension to social robotics and domain-generic human-level AGI.
Former cyber executive turned whistleblower accuses IBM of covering up several data breaches
A former executive has alleged that IBM failed to properly disclose multiple security breaches, prompting a whistleblower investigation.
Nobody needs Mythos or 0-days to build a chaos-causing computer worm
Boffins warn that attackers can now cheaply operationalize known vulnerabilities at scale using free open source models.
Cyber risks sharpen as open-source AI closes gap with frontier models, UK AISI says
Open-source AI models may be only months behind leading frontier systems, potentially complicating efforts to mitigate AI-powered cyber threats.
Midnight Labs Launches Ceartas to Combat Piracy and Deepfakes, Backed by Sony
Midnight Labs has launched Ceartas, a tool supported by Sony that scans over 75 million sources to protect content creators from piracy and unauthorized AI impersonation.
Adoption, Deployment & Impact
Agentic AI hype races ahead as enterprises remain stuck in pilot mode
Most orgs remain trapped between flashy demos and real-world deployment, despite 75% saying adoption is racing ahead
AI adoption surges among investment managers in 2026
SimCorp's 2026 report reveals 70% of buy-side firms use AI in the front office.
An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
arXiv:2606.05357v1 Announce Type: new Abstract: Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We first developed a deep learning framework to predict MOAKS features directly from knee MRIs and incorporated conformal prediction to provide prediction uncertainty quantification. This uncertainty-aware strategy enables explicit filtering of model outputs, retaining only high-confidence MOAKS predictions at the knee level. Second, we applied a longitudinal latent class mixed model (LCMM) to examine associations between key structural abnormalities and four complementary knee pain measurements. Results: Among the three MRI-defined abnormalities (i.e., bone marrow lesions (BML), cartilage loss (CART), and meniscal extrusion (ME)), our framework substantially improved the Matthews correlation coefficient (MCC) and some other metrics. For example, MCC increased from 0.69 to 0.91 for BML, from 0.45 to 0.80 for CART, and from 0.59 to 0.89 for ME. Using these high-confidence predictions, we expanded the sample size to 2,175 knees for the LCMM analysis. Two distinct pain trajectories were identified (rapid and stable pain progression). The estimated odds ratios (95% CI) for the rapid progression group were 1.62 (1.12-2.35) for BML, 1.83 (1.24-2.70) for CART loss, and 2.50 (1.75-3.57) for ME. Conclusion: These results highlight the importance of these structural abnormalities as risk factors for pain and functional progression in osteoarthritis.
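The abstract says conformal prediction is used to keep only high-confidence MOAKS predictions. One common way to do that, assumed here rather than taken from the paper, is split conformal prediction with the score 1 - p(true class): calibrate a threshold on held-out cases, then keep a prediction only when its conformal prediction set contains exactly one label. The labels, calibration probabilities, and alpha below are hypothetical.

```python
import math
from typing import Optional


def conformal_threshold(cal_probs_true: list[float], alpha: float = 0.1) -> float:
    """Split-conformal threshold over nonconformity scores s = 1 - p(true class)."""
    scores = sorted(1.0 - p for p in cal_probs_true)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points: every label stays in the set
    return scores[k - 1]


def prediction_set(probs: dict[str, float], qhat: float) -> set[str]:
    """All labels whose score 1 - p falls within the calibrated threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def keep_if_confident(probs: dict[str, float], qhat: float) -> Optional[str]:
    """Return the label only when the conformal set is a singleton; otherwise drop it."""
    s = prediction_set(probs, qhat)
    return next(iter(s)) if len(s) == 1 else None


# Hypothetical calibration data: model probability assigned to the true label per knee.
cal = [0.95, 0.9, 0.85, 0.8, 0.97, 0.92, 0.88, 0.7, 0.99, 0.93]
qhat = conformal_threshold(cal, alpha=0.2)
```

Predictions whose set is empty or holds several labels are filtered out, which is how such a filter can raise metrics like MCC on the retained subset at the cost of coverage.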
Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory
arXiv:2606.05334v1 Announce Type: new Abstract: Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Existing PHM approaches support degradation prediction, but often target fixed operating conditions or isolated component benchmarks, while material-fatigue assessment is rarely linked to system-level functional prognosis. This paper addresses this gap for an angle grinder by combining uncertainty-aware functional prediction with component-level fatigue assessment in an instance-specific reliability workflow. The proposed framework combines the current tool state with recent force-torque usage windows. A convolutional encoder extracts loading patterns from spindle forces and shaft torque, and an LSTM backbone predicts nine functional variables as Gaussian mean and variance estimates. In parallel, the same loading history is translated into output-shaft fatigue information through finite-element-supported stress reconstruction, S-N/Miner damage evaluation with Haibach extension, and Paris-law crack-growth analysis. A streaming replay algorithm consolidates both branches into functional, material, and system reliability trajectories. Held-out tests show mean 2%-tolerance accuracy of 0.9652 across nine outputs. Thermal variables are predicted near-perfectly, while drive motor current and load speed remain the most demanding dynamic outputs, with R² values of 0.9750 and 0.9924. Torque history is especially important for these variables, and the conventional LSTM outperforms GRU and xLSTM in the short-history setting. Reliability calibration is most informative for drive motor current, where predicted and observed exceedance probabilities ...
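The S-N/Miner step with Haibach extension can be sketched as follows: a Basquin-type S-N line with slope k above the knee point, continued below it with Haibach's flatter slope 2k - 1, and the Palmgren-Miner linear damage sum over a load spectrum. The knee point, slope, and spectrum are made-up illustrative values, not the angle-grinder parameters, and the cycle counts are assumed to come from an upstream step such as rainflow counting.

```python
def cycles_to_failure(stress_amp: float, s_d: float, n_d: float, k: float) -> float:
    """Cycles to failure from an S-N curve with Haibach extension.

    Above the knee point (s_d, n_d) the slope k applies; below it, Haibach's extension
    continues the line with slope 2k - 1 instead of assuming infinite life.
    """
    if stress_amp <= 0:
        return float("inf")
    slope = k if stress_amp >= s_d else 2 * k - 1
    return n_d * (s_d / stress_amp) ** slope


def miner_damage(load_spectrum: list[tuple[float, float]],
                 s_d: float, n_d: float, k: float) -> float:
    """Palmgren-Miner damage D = sum(n_i / N_i); D >= 1 is the usual failure criterion."""
    return sum(n / cycles_to_failure(s, s_d, n_d, k) for s, n in load_spectrum)


# Hypothetical values: knee at 100 MPa and 2e6 cycles, slope k = 5.
spectrum = [(200.0, 1e4), (50.0, 1e6)]  # (stress amplitude in MPa, applied cycles)
damage = miner_damage(spectrum, s_d=100.0, n_d=2e6, k=5.0)
```

With these numbers the 200 MPa cycles contribute 0.16 and the 50 MPa cycles about 0.001, for a total damage of roughly 0.161. Under the original Miner rule, which treats cycles below the knee as harmless, the 50 MPa cycles would contribute nothing.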
Microsoft's AI Futurist explains how he uses Copilot — and the real-world problems enterprises are solving with agents
Microsoft used its Build 2026 conference this week to push a clear message: agents are rapidly moving into production throughout enterprise systems, and the winning platform will be the one that gives them reliable context, governance, identity, memory and secure access to enterprise data. The company announced Microsoft IQ as a context layer across GitHub Copilot, Microsoft Foundry and Copilot Studio; Work IQ APIs coming June 16; Fabric IQ for structured business data; Foundry IQ for retrieval across enterprise knowledge and the live web; and Web IQ as a new agent-facing web search stack. Microsoft also introduced Scout, a personal work agent, and a whopping seven new in-house AI models in its growing MAI family across modalities and use cases, including MAI-Thinking-1.
Those announcements sit directly in Marco Casalaina’s lane. Casalaina is Microsoft’s VP of Products for Core AI and its AI Futurist. He leads Microsoft’s AI Futures team and previously led teams across Azure AI, including Azure OpenAI, Vision, Speech, Decision, Language, Responsible AI and AI Studio. Before Microsoft, he led Salesforce’s Einstein AI team and earned a computer science degree from Cornell University. CRN reported that he joined Microsoft in early 2022 as vice president of products for Azure Cognitive Services, meaning he has now been at the company for more than four years.
VentureBeat spoke with Casalaina ahead of Build about Microsoft’s agent strategy, the company’s model-choice philosophy, how Microsoft IQ fits with MCP, and why he believes enterprises need far more than just access to powerful models. The interview below has been edited for clarity and condensed from the transcript.
VentureBeat (VB): To start, can you explain your role at Microsoft and what “AI Futurist” means in practice?
Marco Casalaina (MC): I am VP Products of what we call Core AI. Core AI is our set of tools for AI developers, and that includes Foundry, Visual Studio, VS Code, GitHub and GitHub Copilot.
That’s our overall group. My Silicon Valley title is AI Futurist, and that has a very concrete meaning here. I’ve worked with other folks who are considered futurists, like Peter Schwartz, and that can be a little bit more fuzzy. For me, what it means concretely is that I am the first person to try anything new here. I am constantly getting things from all over Microsoft, not even just Foundry, because I work with really everybody across the company. Pretty much everybody sends me the new things at all times. Even today, I got something brand new just before this call. I’m usually the first person to try anything new here, which is pretty cool. I get to see a lot of really cool stuff. A friend of mine, who is head of AI at Intuit, calls me an “adjacent possiblist.” I consider my futurist concept to be about a year out from now: the immediate future of what’s about to happen next. That’s what I focus on.
VB: How do you see the agentic state of things, and in particular Microsoft’s position as enterprises and individuals rush to adopt agentic AI?
MC: We can look at it from bottom to top. At the very base of the stack is our commitment to model choice. All along, we’ve had the OpenAI GPT frontier models. Now we have a really solid partnership with Anthropic, where we’re offering the Claude models. We just launched Claude Opus 4.8 on Azure (on Foundry, I should say), and at Build, we are introducing our new MAI models. The MAI models are a set of frontier models that we’re building in-house. They are made for token efficiency, optimization and customization. We are specifically making them for our customers to customize on their own data sets.
One level above that, we are announcing hosted agents in Foundry. That is our managed agent capability in Foundry. It automatically handles scaling, containerization and those kinds of things. It is an environment where you can manage agents. One level above that is the Foundry control plane.
At least for the agents you build, you want to have control over them. This gives you observability into their cost, tokens and correctness. You can do continuous evaluations and sample interactions with those agents, run evals and make sure they are continuing to work and not drifting.
The big news is going to be the general availability (GA) of what we call the IQs here at Microsoft. There are currently three, and there will be four. There is Foundry IQ, which is basically for knowledge, largely unstructured knowledge. There is Fabric IQ. We have a ton of customers who have entrusted a lot of data to the Microsoft Cloud in Fabric, Power BI and related technologies. Fabric IQ is about making an agent-facing interface for this data, so agents can get to it without literally going through a Power BI report, which would be ridiculous. Work IQ is about the Microsoft ecosystem. You can look at Work IQ as the agentic face of all the Microsoft apps: Outlook, Teams, Word, SharePoint and all those kinds of things. How does an agent interact with those things? That is Work IQ. And finally, the fourth IQ is Web IQ. We are releasing our new agent-facing web search capability. It can search the web, search through videos and even do some kinds of browsing tasks automatically. It is super fast, and it kind of has no face. It’s headless. The interface is intended for agents.
We will also be announcing Agent Optimizer. That includes a new type of evaluation that allows you to evaluate much more granularly whether an agent is actually working and working correctly. The optimization step can go back in and make modifications to the prompt, obviously with your consent, and modify your agent so it works more correctly going forward. Effectively, it creates a feedback loop to make agents work better.
VB: Microsoft has sometimes been criticized for murky and clunky product naming. Where do these IQ products sit? Are enterprise users supposed to go to IQ first, or is IQ more for developers to connect to?
MC: All of the IQs are headless. The concept of IQ is that each one provides a different type of context to an agent specifically. Largely, it will be developers interacting with the various IQs: developers and the agents they build. The IQ brand is really about agent context. End users largely won’t interact with the IQs. It is true that if you use Microsoft 365 Copilot today, you’ll notice a little thing that says it is using Work IQ. So it is a little bit visible, but the customer or end user doesn’t have to go find the IQ. Their system or developers hook that up.
VB: Is the IQ family essentially Microsoft’s version of MCP? Is it using MCP, or is it something different?
MC: All of the IQs are indeed exposed as MCP servers. You have correctly characterized MCP as basically an agent-facing or self-describing API. It’s not that fancy. That’s really what it is, with some authentication layers and capabilities built in, which is super useful. Something like Work IQ (really, all the IQs) has to be authenticated. In order for Work IQ to see my email, Teams messages, documents and stuff like that, I have to be able to authenticate it to act on my behalf.
That gets us to another core differentiator that we will be announcing at Build, which is agent identity. We have this Entra system, and Entra is, I believe, the world’s most widely used identity system for human users. For some time now, you have been able to declare an agent to have an identity in there. Now, agents will be able to have their own identity, their own Teams box, their own email inbox and stuff like that. These agents will use Work IQ to check their own email, check their own documents and that sort of thing.
VB: Enterprises are not one-size-fits-all on models. Microsoft supports many leading models through Foundry and Azure, while also building its own. Is Microsoft a model company, an infrastructure company or a connector between models and work products?
MC: The answer is yes.
We are obviously the hyperscaler. We are absolutely committed to model choice, and we will continue to offer the frontier models from all of the major players: OpenAI, Anthropic, Mistral, Black Forest, xAI, you name it. They are all going to be represented in there. At the same time, we have what is now called our Microsoft AI Superintelligence Team, formed by Mustafa Suleyman, and we are building our own frontier models as well. Like I said earlier, we are really gearing these models toward optimization: token efficiency, bang for the buck and customization. These are things our customers have been asking for: the ability to more finely customize models, whether that is fine-tuning or continued pre-training. Continued pre-training is literally changing the weights of the model, whereas fine-tuning is adding a little layer on top. We have these capabilities in Foundry: fine-tuning, distillation and those kinds of things.
I would note, by the way, that our MAI models are not distilled. Some model providers, especially some of the less scrupulous ones, will distill other models into theirs, and that can have unusual effects. We don’t do that. The data provenance of our models is of primary importance to us. When we come out with these models, we want our customers to know that the data provenance is clean in terms of the rights to the data, where it came from and all that kind of stuff.
The choice thing also goes above the model layer. When we talk about Foundry hosted agents, we have the Microsoft Agent Framework. You talk about agent orchestration (how you make agents work together when you have multiple agents), and Microsoft Agent Framework is an excellent framework for that. However, I can make a LangGraph or LangChain Foundry hosted agent. I can make a CrewAI Foundry hosted agent. I can use any number of orchestration frameworks and put that up as a Foundry hosted agent, and it becomes a first-class Foundry agent. That means I get the observability.
It shows up in the Foundry control plane. I can do evaluations on it. I can do traces on it. I can get all those things from the Foundry control plane with an agent built in really any framework I choose.
VB: Some companies are interested in Chinese and open-source models. How much of Microsoft’s decision to offer its own models is about giving customers an American version of that?
MC: I can’t speak to that exactly. Of course, we offer DeepSeek models and Qwen models in Foundry, so we offer all of these choices today, and our customers can make that choice. The MAI models are really focused on token efficiency and customizability. That is what our customers are demanding, and that is the gap we are filling.
VB: As agents take on longer tasks and more specialized work, will enterprises keep expanding the number of models they use, or will there be a winnowing?
MC: I do see it expanding. We are not just focused on tokens per se. A token is not a token is not a token. One token is not necessarily equivalent across these things. It is all about what you are doing with each token and the efficiency of that. It comes back to what kind of value you are getting for the cost. That is a lot of the rationale behind why we are developing our own MAI models.
Part of my job is to travel all around the world. I’ve been all over the place. For example, I’ve been working with Bayer. One of the things we are measuring is not just token usage, but number of users (monthly active users and daily active users), because we have a lot of first-party capabilities like Microsoft 365 Copilot. Over the last year, we’ve seen a 6x increase in monthly active users. We have over 20 million users of Microsoft 365 Copilot alone. That is in terms of the agents you use. In terms of the agents you build, Bayer put up its own agent system on Foundry, and now it has 20,000 of its own employees on it. A few weeks ago, I was in Sydney, Australia, hanging out with AEMO, the Australian Energy Market Operator.
They operate the electrical grid of Australia. They showed me that they had built agents to manage grid operations. This is a human-centered thing. They have grid operators sitting in centers in West Sydney, Brisbane and places like that, and they are bombarded with alerts. I wouldn’t believe it if I hadn’t seen it myself. The alerts are constant. They built a system to triage those alerts. Is this alert a super major thing, or is it just that a transformer is getting a little hot? It also says: here is when we had this problem last time, and here is how we resolved it last time. Maybe now we need to replace this component, or whatever. Ultimately, it is the grid operators making the choice.
A lot of our philosophy here is human empowerment. These human-centered agents are the ones that are working best among our customers. What I saw at AEMO and Bayer is this notion of human empowerment: taking away some of the grunt work, or in the case of AEMO, taking billions of alerts and reducing them to something much more manageable and actionable for the people involved. We are moving past the era where agents are just answering questions. AI in general is moving past that. We are not just answering questions anymore. We are moving toward a place where AI can really meaningfully help you do your work.
VB: How do observability, tokenomics, ROI analysis and agent governance fit into Microsoft Foundry?
MC: That is what the Foundry control plane is all about. We introduced it in November of last year. If you looked at my own Foundry control plane (I’ve built a ton of these agents, and I am a developer by background), you would see all of my agents that are running and the ones that are paused. I can see how many tokens they’ve used over the last day, week or month. I can look at trends. I can look at costs, because the cost will be different depending on what underlying model I’m using.
If I’m using our model router, it can route to different models depending on the complexity of the inbound prompt. We also have Azure cost management overall. Azure has had cost management for over a decade, before the AI thing even happened. This integrates with overall Azure cost management. It is not just narrowly about what your AI is doing. Your AI will be using storage resources, data resources and other compute resources around that AI. You can get a complete picture of not just the cost and token usage of the AI itself, but everything around it.
When you think about governance, that also extends to evaluation. One of the things we are releasing in preview is rubric-based evaluation. Rubric-based evaluation is much more granular. Let’s say you have built a restaurant reservation agent. The things you want to test about that agent are not really groundedness. Groundedness is the opposite of hallucination, and that is very much a question-answering concern. For a restaurant reservation agent, you want to test very granular things. If you say, “Make me a table for two tomorrow,” did it come back and ask, “What time would you like the table?” Before it gave you a table for two tomorrow at 6 p.m., did it actually check that the table was available, or did it randomly give you a table without checking first? There are very granular things you want to test about that specific use case. You don’t just want to test whether the agent works. You want to test whether the agent works right. That is what we are addressing with our new rubric-based evaluation system. You will see that in Satya Nadella’s keynote. I have been using it myself lately, and I’m very happy about it. I’ve been waiting for this.
VB: Microsoft is also partnering with companies like Anthropic and allowing Claude to work with Microsoft 365. How important is Copilot to this story? Why would someone turn to Copilot over other options?
MC: Microsoft 365 Copilot is a huge advantage for us.
As I mentioned, we crossed the 20 million user mark on Copilot relatively recently. The great thing about that is that Copilot is the face. When you go into Foundry and make an agent, there is a button that says “publish to Copilot.” Actually, it says “publish to Copilot in Teams,” because you can put it in Teams too. The idea is that you want to put these agents where your users are. A lot of people who use the Microsoft ecosystem are in Teams, or they are using Copilot. I can create a custom agent, as many of my colleagues have, and now it is in Copilot, which I use maybe 50 times a day. Since January, Copilot has become more and more capable. I now use it to draft my email. I am not just using it for question answering. I’m starting to use it to manage my calendar and draft emails. I really do this every day now. When I want to use a custom agent (for example, to file my expenses, because we have a custom agent for that now), I can access that agent not in some random standalone interface, but in Copilot or Teams, where I already am. That surface area that people are already engaging with is a major advantage.
VB: As people offload more repetitive work to AI, what are they able to spend more time doing?
MC: Let’s consider something I did yesterday. I got an email from a customer named Frankie, and he asked me a question about Foundry hosted agents. I knew the answer because I had talked to my colleague Jeff Holland, who is the head of our hosted agents product management. I had asked Jeff the same question two weeks ago. Where or how I asked him, I don’t remember. Was it in Teams? Was it email? Was it a meeting? I don’t really remember. But I knew the answer to the question Frankie was asking. So I went into Copilot and said, “Answer Frankie’s question about how hosted agents scale, and reference the conversation I had with Jeff a couple of weeks ago on this same topic.” And it did it. It drafted the email. Over time, I have taught Copilot my style.
I don’t do the bold-print thing. I tell it: don’t use em dashes and that kind of stuff. I have a certain style in the way I write emails. It’s a little terse, to be perfectly honest, but I want it to be the way I write. It drafted this thing. It searched through my Teams messages, my emails and the transcripts of my meetings with Jeff. It used Work IQ, as a matter of fact. It found the answer, drafted the email and provided a link to the documentation that specifically covered the question Frankie was asking. I looked at the draft and thought, yep, that’s it.
Yes, I could have composed this email myself. I knew the answer to the question. I could have looked up the documentation. If I dug around, I’m sure I could have found the conversation I had with Jeff in whatever medium that was. I could have done that stuff. It probably would have taken me, I don’t know, an hour to find all the information and compose it. Instead, I did it in about a minute. I had a draft, I looked at it, I was happy with it, I pressed send, and that was the end of that. It really is about giving people time back. It is not even just grunt work. It is all this time you spend looking things up and finding things. Now, I can make it take an action. It didn’t just answer the question. It fully drafted the email and copied Jeff.
VB: Do you fear for your job? How has AI changed your own work?
MC: I don’t fear for my job. My job has changed. For one thing, I do a lot more now, both in my business life and personal life. This weekend I was using Web IQ, the new Web IQ. I’ve been car shopping. My car’s lease is coming up, and there is a very specific car I’m trying to find, which is hard to find. It’s a Hyundai Ioniq 6, which Hyundai, for whatever reason, has stopped offering in the United States. I’m going to get one, though.
Using Web IQ, I set my agent the task of finding all the Hyundai Ioniq 6s available in the entire Bay Area: everywhere, all the way out to Sacramento and as far south as Gilroy. I set it to this task, and then I went on a hike. When I got back, I had a big long list of all the Hyundai Ioniq 6s, at least the 2024 and 2025 models, available in the entire Bay Area. From that, I started calling down these dealers. Even in my personal life, I’m using it constantly. It saves me a ton of time. Going through every single dealer’s inventory like this would have taken me hours. But Web IQ could do that, and it was super quick.
VB: Any final thoughts for developers about this news?
MC: Foundry is really the place. This is the place where you can build your agents, scale your agents, test your agents and improve your agents. That’s what it’s all about, and it’s happening.
DocuSign Shares Drop After Full-Year Guidance Disappoints Investors
DocuSign shares fell on a broadly negative day for markets after the company's full-year guidance failed to excite investors. Allan Thygesen, CEO of DocuSign, discussed the company's AI-powered intelligent agreement management platform, highlighting strong adoption with 40,000 customers currently live on the platform. He speaks with Romaine Bostick and Katie Greifeld on "The Close." (Source: Bloomberg)
AI-Designed Vaccine Marks Global Breakthrough
Scientists unveil the world’s first AI-designed vaccine, raising hopes for faster disease prevention and future medical breakthroughs.
Palantir wins £9M contract to run UK firearms licensing
The CIA-backed business will hold gun, bomb, and poison records after beating Accenture and NEC for the decade-long deal.
Airbnb’s Brian Chesky plans to launch a new AI lab
Airbnb CEO Brian Chesky is reportedly planning to establish a dedicated laboratory for artificial intelligence research.
Meta Targets Health-Focused AI
Meta is prioritizing health as a differentiator for its AI strategy, integrating these capabilities into platforms like Instagram and WhatsApp.
Geopolitics, Policy & Governance
AI assistants face EU review for possible 'gatekeeper' designation
Large tech companies are providing EU enforcers with usage data on AI assistants, which could lead to tighter regulation under the Digital Markets Act.
MAGA hates AI, but Trump agrees with Bernie it might be time for partial government ownership
"You make them a partnership in this revolution," Trump told reporters Friday. "It would be a beautiful thing."