Thu 4 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
Phoenix Raises Rates, Seattle Bans Data Centers, and Fed Questions AI's Gains
TL;DR Arizona's largest utility proposes a 45% electricity-rate increase for data centers, alongside a 14.5% hike for households, to pay for AI's power needs. Seattle is poised to pass a year-long ban on new data centers in Amazon and Microsoft's backyard. San Francisco Fed President Mary Daly is bullish on AI but says she has seen no proof yet that it leads to productivity gains. Academic papers examine AI's role in scientific creativity and consequence-aware compute allocation for reasoning models.
The stories that matter most
Selected and contextualised by the Best Practice AI team
Phoenix Is a Data-Center Mecca—and Test Case for How to Pay for AI’s Power Needs
The state’s largest utility is proposing a 45% electricity-rate increase for data centers and a 14.5% hike for households. No one is happy.
Does Artificial Intelligence Advance Science?
arXiv:2606.05118v1 Announce Type: new Abstract: This paper examines whether and how artificial intelligence (AI) advances scientific creativity. Drawing on scientific publications, the primary output of researchers, we analyze over one million publications from OpenAlex to investigate the relationship between AI adoption and multiple dimensions of scientific creativity, including novelty (recombinant novelty and object novelty) and impact (3-year short-run citation impact and 10-year long-run citation impact). We find that AI publications are significantly more likely to achieve top-decile creativity relative to non-AI publications, with 5.5 to 10.2 percentage point higher likelihood to rank in the top creativity decile. Critically, we uncover substantial heterogeneity across AI research modes. Tool-oriented AI research, which applies existing AI models to domain tasks, is associated with the largest gains in recombinant-based creativity, while Adaptation-oriented AI research, modifying AI models for domain-specific problems, is associated with relatively higher object-based creativity. These findings reveal that AI does not advance science through a single mechanism but through structurally distinct creative pathways that depend on how AI is incorporated into the research process. Our results contribute to ongoing debates about AI's role in science and carry direct implications for research evaluation and science policy, highlighting the need for assessment frameworks that can distinguish between recombinant and conceptual forms of creativity and that recognize how different modes of AI adoption produce fundamentally different types of scientific contribution.
Fed's Daly Says Forward Guidance Could Be Misleading
Federal Reserve Bank of San Francisco President Mary Daly says monetary policy is in a good place, but there's too much uncertainty ahead. She's also bullish on artificial intelligence, but has seen no proof yet that it leads to productivity gains. Daly speaks on "Bloomberg Tech" from the Bloomberg Tech conference in San Francisco. (Source: Bloomberg)
Seattle poised to ban new datacenters in blow to big tech hub
Measure in Amazon and Microsoft’s backyard expected to succeed next week in blow to big tech amid AI boom. Seattle’s city government is on the verge of passing a year-long ban on the construction of new datacenters, making it the largest US city yet to consider such a moratorium as nationwide backlash grows. Four companies sought to build five large datacenters in areas serviced by Seattle’s public utility; if approved, they would have consumed approximately a third of the city’s current daily demand for electricity.
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
arXiv:2606.04037v1 Announce Type: new Abstract: Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prompt-level guardrails offer limited assurance once an agent is operating in production. We propose an ontology-grounded verification framework combining three components: an Agent Operational Envelope formalizing the certification space across permissions, domain constraints, safety properties, governance rules, and autonomy levels; an ontology-to-scenario generation pipeline that derives regulatory, operational, and adversarial test scenarios automatically; and a Trust Certificate carrying a machine-verifiable attestation with graduated deployment verdicts (Approved, Conditional, Rejected). A controlled pilot across four regulated industries (Fintech, Banking, Insurance, and Healthcare), instantiated as five industry-by-regulatory-regime cells across the United States and Vietnam, generated 1,800 scenarios evaluated against 125 primary-source regulatory requirements and 25 injected faults. Ontology-grounded generation (G4) achieved 48.3% regulatory coverage versus 33.1% for the persona-based baseline (corrected p = .0006) and the highest domain specificity (4.77/5.0; p = 2e-6). The coverage advantage over baseline and retrieval-augmented prompting was not robust after Bonferroni correction. Cross-validation across three LLM families (Claude Sonnet 4, Qwen 2.5 72B, Gemma 4 26B; 5,400 total scenarios) replicated the persona-versus-ontology pattern. The results establish ontology-grounded scenario generation as a credible complement to persona-based test suites for regulatory-intensive domains.
Economics & Markets
The coming equity surge will test the US bull run
Mega IPOs and share offerings challenge the appetite for AI stocks
Reuters AI News | Latest Headlines and Developments | Reuters
Alphabet has increased the size of its equity offerings to $84.75 billion, in a sign of strong investor appetite for big tech companies as they expand their AI infrastructure and computing power.
What Suno’s $5.4 billion valuation says about the future of AI and music—and what remains uncertain
From birthday songs to hospice tributes, Suno is finding real-world uses for AI-generated music. Whether that translates into a sustainable multibillion-dollar business is less clear.
Goldman Sachs CEO David Solomon on the Coming Mega IPOs
As companies like Anthropic and SpaceX file to go public, 2026 may turn out to be a year of mega IPOs. Goldman Sachs CEO David Solomon joins Joe Weisenthal and Tracy Alloway on the Odd Lots podcast to discuss banking in the age of AI and why it's still "good for the US to have the biggest, most important companies in the world." (Source: Bloomberg)
AI Financing: An Arms Race for Investors | StartupHub.ai
GoldenTree's Steven Tananbaum likens AI financing to an "arms race," noting a shift towards private credit and the challenges of valuing AI companies in a compe
Whale Rock Talks AI, Waratah Sees Gold at Sohn Montreal
Hedge fund managers and other investors are gathered at a conference in Montreal to pitch the next big thing, and everything from artificial intelligence, biotech and gold mining was on the table Thursday.
German VC Merantix Capital closes €103 million fund for early-stage AI startups
Berlin-based Merantix Capital today announced the closing of its €103 million AI Fund, which will invest in early-stage, AI-native companies across logistics, manufacturing, energy, finance, healthcare, life sciences, robotics, enterprise and physical AI. The fund will make approximately 40 investments across European AI, with strategic LPs including Union Investment, Jungheinrich, KPMG Germany, the Robert Wood […]
A Look At WEX (WEX) Valuation As AI Investments And Multi Segment Expansion Target 2026 Earnings Growth - Simply Wall St News
WEX (WEX) is back on investors’ radar after management outlined expected 24.1% earnings growth for the second quarter of 2026, tied to AI driven cost savings and multi segment expansion. See our latest analysis for WEX. At a share price of $148.35, WEX has seen its 90 day share price return ...
Geopolitical Tensions Reshape AI, Defense, And Energy Markets As NASDAQ: AVGO Surges And NASDAQ: MU Eyes Memory Chip Gains
Benchmark raises its first-ever growth fund as part of $2B capital raise
Benchmark has successfully raised its first growth fund as part of a larger $2 billion capital injection.
Lovable signs multi-year deal with Google Cloud to up usage 5x
Lovable has entered into a multi-year agreement with Google Cloud, with plans to increase its usage by five times.
The tech that could make Marvell the next trillion dollar company
CU later, rivals? That's if Broadzilla doesn't eat its lunch first.
Blackstone-Backed Liftoff Raises $437 Million in Revived IPO
Liftoff Mobile Inc. raised $437 million in a US initial public offering that priced above its marketed range, in the company’s second attempt at going public this year.
Tech stocks today: Tech stocks tumble following Broadcom, CrowdStrike earnings
This week, tech investors will be watching chip and cybersecurity earnings, the impact of AI on labor market data, and the Computex annual chip summit.
CrowdStrike shares fall as 'Mythos moment' fails to cheer investors | Reuters
CrowdStrike shares slid 7% on Thursday after the company's quarterly forecasts failed to meet steep investor expectations, even though demand for cybersecurity software was buoyed after Anthropic announced its Mythos AI model.
Capacity, Technology Portfolios, and the Paradox of Concentration
arXiv:2407.03504v3 Announce Type: replace Abstract: Does limiting the largest firm's capacity always lower prices? We model firms competing in supply schedules with multiple technologies, each defined by a constant marginal cost up to capacity. We show that capacity and technological efficiency coexist as distinct sources of market power, with opposite policy implications. When efficiency drives the market power of the largest firm, a small transfer of higher-cost capacity from rivals to the leader raises concentration yet lowers prices, contrary to standard antitrust intuition. Large transfers raise prices, tracing a U-shaped relation between prices and concentration. We prove existence and uniqueness of equilibrium, and extend the results to other oligopoly models. Evidence from Colombia's wholesale electricity market, where weather shocks shift hydropower capacity across technology-diversified firms, supports the pattern. Counterfactual transfers to the largest firm lower prices by up to 30% in the least concentrated markets. We draw implications for capacity caps, divestitures, and merger review.
Emerging-Market Stocks Fall as Broadcom Miss Revives AI Concerns
Emerging-market equities headed for the sharpest drop in roughly three weeks as Asian technology heavyweights retreated after a disappointing outlook from Broadcom Inc. raised concerns about the scale of the artificial-intelligence rally.
Peer Effects in Consideration and Preferences
arXiv:2310.12272v4 Announce Type: replace Abstract: We develop a general model of discrete choice that incorporates peer effects in preferences and consideration sets. We characterize the equilibrium behavior and establish conditions under which all parts of the model can be recovered from a sequence of choices. We allow peers to affect preferences, consideration, or both. We show that these peer-effect mechanisms have different behavioral implications in the data. This allows us to recover the set and the type of connections between the agents in the network. We then use this information to recover each agent's preferences and consideration mechanisms. These nonparametric identification results allow for general forms of heterogeneity across agents and do not rely on the variation of either exogenous covariates or the set of available options (menus). We apply our results to model expansion decisions by tea chains and find evidence of limited consideration. We simulate counterfactual predictions and show how limited consideration slows market penetration and competition.
What We Learned About the AI Threat from Q1 Software Earnings | Morningstar
In most cases, the death of software companies has been exaggerated, according to Morningstar analysts.
Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation
arXiv:2606.04402v1 Announce Type: new Abstract: Modern reasoning models can allocate different amounts of test-time computation, such as thinking tokens, model calls, or compute budget, to different tasks. Existing methods generally drive this allocation by predicted difficulty and spend more compute where it is expected to raise accuracy. This implicitly assumes that all failures cost the same, since an accuracy objective weights every task equally. However, such an assumption does not hold in deployment: A typo in a log message and a migration that corrupts a production database both count as one benchmark failure, but their real-world costs are fundamentally different. To fill this gap, we propose consequence-aware test-time compute allocation. Instead of routing compute only by predicted difficulty, we use a lightweight predictor to estimate from the issue text how costly a task would be if solved incorrectly. The scheduler then routes higher-consequence tasks to larger compute tiers or higher thinking budgets under the same total budget. We conduct main experiments on SWE-bench Lite and evaluate cross-dataset behavior on Multi-SWE-bench mini, covering 700 software-engineering tasks in total. Our results reveal that consequence and difficulty are approximately orthogonal under various annotations, and that current thinking models do not allocate compute sufficiently according to consequence. Moreover, our issue-only predictor never misclassifies a high-consequence task as low-consequence across the 300 SWE-bench tasks. Under matched compute budgets, our consequence-aware scheduler reduces cost-weighted loss by 22% to 33% relative to difficulty-aware routing; in particular, the priority-aware variant, which routes by per-task cost scaled by the marginal-utility signal, crosses 30%, and its deployable predictor-driven version retains over 90% of the oracle gain.
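To make the routing idea in the abstract above concrete, here is a minimal Python sketch. It is not the authors' scheduler: the `Task` class, the `TIERS` table, the consequence scores and the greedy upgrade rule are all hypothetical choices made for illustration. It only shows how a fixed budget can be steered toward tasks whose failure would be costlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    consequence: float  # hypothetical predicted cost of a wrong answer


# Hypothetical compute tiers: (label, cost in arbitrary budget units), cheapest first.
TIERS = [("small", 1), ("medium", 3), ("large", 6)]


def allocate(tasks, budget):
    """Give every task the cheapest tier, then spend the leftover budget
    upgrading tasks in descending order of predicted consequence."""
    base_cost = TIERS[0][1]
    if budget < base_cost * len(tasks):
        raise ValueError("budget too small to run every task at the cheapest tier")
    level = {t.name: 0 for t in tasks}
    remaining = budget - base_cost * len(tasks)
    for task in sorted(tasks, key=lambda t: t.consequence, reverse=True):
        # Climb tiers for this task while the extra cost still fits.
        while level[task.name] + 1 < len(TIERS):
            step = TIERS[level[task.name] + 1][1] - TIERS[level[task.name]][1]
            if step > remaining:
                break
            level[task.name] += 1
            remaining -= step
    return {name: TIERS[i][0] for name, i in level.items()}


tasks = [
    Task("fix typo in log message", 0.1),
    Task("database migration", 0.9),
    Task("rename helper", 0.2),
]
plan = allocate(tasks, budget=10)
print(plan)
```

With a budget of 10 units, the migration task is routed to the largest tier while the log-message typo stays on the cheapest one. A difficulty-aware router would instead sort by predicted difficulty, which is the baseline the paper compares against.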
From Live to Recording: Consumer Demand and Response to Price Across the Livestreaming Lifecycle
arXiv:2107.01629v3 Announce Type: replace-cross Abstract: Livestreaming has evolved into a thriving industry where creators can directly monetize and engage with their audiences and followers. In practice, creators and platforms typically concentrate their marketing efforts on the period leading up to the livestream. However, livestreaming events naturally transition into recorded formats once the event concludes, creating potential "residual" opportunities for monetization. This study systematically examines consumer demand for live events throughout the entire livestream life-cycle, using data from a large livestreaming platform that allows consumers to purchase the recorded version of a paid live event after the livestream ends. We find that the demand is surprisingly more price-sensitive during the pre-livestream period compared to the post-period. This is partly driven by two mechanisms: consumer self-selection (infrequent consumers who may have missed the live events exhibit a higher willingness to pay for recorded versions) and quality uncertainty (consumers face higher uncertainty in event quality during the pre-period than in the post-period). Our findings generate implications for the pricing and targeting strategies in livestreaming markets.
Colorado governor vetoes block on surveillance pricing as other states push for bans
Consumer advocates decry Democrat Jared Polis for ‘choosing to side with dominant corporations’ over workers. Colorado’s governor vetoed a bill on Tuesday that would have banned companies from using surveillance pricing to set workers’ wages and prices for consumer goods. The measure would have been the strongest in the nation against algorithmic pricing. While Maryland became the first state to approve a law banning surveillance pricing in grocery stores in April, Colorado’s proposed measure was more expansive.
Codex rate card | OpenAI Help Center
Learn how Codex credit rates work across Plus, Pro, Business, Enterprise, Edu, Health, and Gov plans.
Pricing - Claude API Docs
Learn about Anthropic's pricing structure for models and features
HighLevel AI Products Pricing: Voice AI, Agent Studio & More : HighLevel Support Portal
See HighLevel AI pricing by product, including AI Employee Unlimited, Voice AI per-minute rates, token pricing, Agent Studio, Workflow AI, and fair-use limits.
Can Generalist Agents Automate Data Curation?
arXiv:2606.04261v1 Announce Type: new Abstract: Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against noisy benchmark feedback. We ask whether generalist coding agents can automate this data-curation loop. We introduce *Curation-Bench*, an agent-centric benchmark that fixes the model, training recipe, and evaluation suite while giving agents command-line access to inspect data, implement policies, submit them to a fixed training/evaluation pipeline, and revise. In a vision-language instruction-tuning instantiation, out-of-the-box agents reach strong published data-selection baselines within ten iterations. However, trajectory analysis reveals a persistent *execution-research gap*: agents mainly tune local policy variants rather than explore new policy families, even when given strategy guides and paper references. Scaffolds requiring each iteration to cite, instantiate, and adapt a prior method shift agents toward method-guided exploration. The scaffolded agent autonomously composes -- without human design input -- a data-selection policy that outperforms strong published baselines at one-tenth their data budget. Overall, current agents can run the curation loop, but reliable data research requires scaffolded method adaptation, not open-ended prompting alone. Code and benchmark are open-sourced.
Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?
arXiv:2606.04592v1 Announce Type: new Abstract: LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
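The abstract above ranks information depth by normalized Shannon entropy. As a rough sketch of that quantity, the snippet below computes the entropy of one survey item's answers and divides by the log of the number of distinct observed answers. That normalization, the function name and the toy answers are assumptions for illustration; the paper may normalize differently.

```python
import math
from collections import Counter


def normalized_entropy(responses):
    """Shannon entropy of the answer distribution, scaled to [0, 1] by
    dividing by log(k), where k is the number of distinct observed answers
    (an assumed normalization)."""
    counts = Counter(responses)
    k = len(counts)
    if k < 2:
        return 0.0  # a constant item carries no information
    n = len(responses)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


# Hypothetical answers to three survey items.
evenly_split = normalized_entropy(["yes", "no", "yes", "no"])
constant = normalized_entropy(["yes", "yes", "yes", "yes"])
skewed = normalized_entropy(["yes", "yes", "yes", "no"])
print(evenly_split, constant, skewed)
```

Items whose answers are evenly split score near 1 and constant items score 0, so sorting items by this score is one way to decide which ones to feed a twin first.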
Long Live Fine-Tuning: Task-Specific Transformers Outperform Zero-Shot LLMs for Misinformation Response Classification on Reddit
arXiv:2606.04274v1 Announce Type: cross Abstract: As large language models (LLMs) become default tools for online information verification, an implicit assumption follows them: that scale and general capability are sufficient for nuanced classification of misinformation discourse. We test this assumption directly on 900 Reddit comments spanning three PolitiFact-verified misinformation claims (environment, health, immigration), labelled as belief (propagates the claim), fact-check (corrects it), or other. We compare nine models across three paradigms -- BART-MNLI, three Llama variants, three commercial frontier LLMs (Claude Haiku 4.5, Gemini Flash Lite 2.5, Claude Sonnet 4.6), and fine-tuned DistilBERT and RoBERTa -- under universal and topic-specific label schemas. The assumption does not hold. Fine-tuned RoBERTa reaches 0.62 macro-$F_1$ against a best zero-shot result of 0.50 (Claude Haiku 4.5), at a fraction of the per-query cost; the supervised advantage is concentrated on the belief class, the implicit, affective category every zero-shot model under-detects. Scaling does not help: Llama-3-8B matches Llama-3-70B, and Claude Sonnet 4.6 underperforms the smaller Haiku under generic labels, collapsing belief detection to 0.17 and refusing outright on a subset of comments flagged as sensitive. This is a safety-alignment artefact, not a capacity limit. Label schema and topic jointly shape zero-shot performance, with the same model varying by more than 0.13 macro-$F_1$ across topics under matched labels. In a verification context, where missing belief is the costlier error, task-specific fine-tuning remains the more reliable choice despite the proliferation of large generative models.
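The headline metric in the abstract above is macro-F1, which averages per-class F1 scores so that a rare class such as belief counts as much as a common one. The sketch below computes it from scratch on made-up labels. The data are invented and are not from the paper.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(labels)


labels = ["belief", "fact-check", "other"]
# Invented example: one belief comment is misread as "other".
y_true = ["belief", "belief", "fact-check", "fact-check", "other", "other"]
y_pred = ["other", "belief", "fact-check", "fact-check", "other", "other"]
score = macro_f1(y_true, y_pred, labels)
print(round(score, 3))
```

Here a single missed belief comment pulls that class's F1 down to 2/3, and the macro average falls with it even though five of six predictions are correct. This is why under-detecting one class weighs heavily on the reported scores.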
Anthropic’s Amodei on Pros and Cons of an AI Startup IPO
Daniela Amodei, president and co-founder at Anthropic, said the company’s confidential IPO filing “gives us the option to potentially go public after the SEC review,” but is not able to say anything more IPO-related. She also discusses the pros and cons of going public for an AI startup. She speaks at the Bloomberg Tech Conference 2026 in San Francisco. (Source: Bloomberg)
Apoha, a startup building AI based on new liquid 'wave form' data, emerges from stealth with $36 million in funding | Fortune
VC firm Singular is leading the latest round for Apoha, which is targeting the pharmaceutical and food industries for its 'liquid intelligence.'
Nvidia, Fei-Fei Li back Generalist’s $400m round to scale AI robotics
Generalist hopes to make 'general intelligence' robotics a reality.
London’s Airspeed raises €17.2 million Series A to build AI-powered execution layer for revenue teams
Airspeed, a London-based agent-native platform for GTM execution, today announced a €17.2 million ($20 million) Series A to scale its proprietary technology, hire new global talent, and expand its US presence. The round was led by DN Capital, with participation from Vi Partners, Framework Venture Partners, and Atlassian Ventures. Adam Liska, CEO and co-founder of […]
AI Startup Funding Stages in 2026: A Stage-by-Stage Guide
AI has reshaped startup funding. Seed rounds now reach $6B and late-stage rounds top $65B. Here's what every funding stage means in 2026, with the largest rounds at each.
Startups Navigate Enterprise AI Procurement and ROI | Let's Data Science
Inc42 reports that selling AI to enterprises is no longer about flashy demos; startups must navigate procurement hurdles, build trust, prove ROI, and address risk concerns. According to Inc42, Praveer Kocchar, cofounder of KOGO AI, said his first enterprise deal "could never see the light of ...
Irish co-founded Wordsmith raises $70m, plans jobs in Ireland
Dublin-born co-founder Robbie Falkenthal previously worked at KPMG, Flutter Entertainment and RSM Ireland.
FirstClub doubles valuation to $255M in nine months on quality-first grocery bet
Quick commerce startup FirstClub has doubled its valuation to $255 million in just nine months, driven by its focus on high-quality grocery delivery.
ZeroDrift Secures $10M to Launch AI Compliance Firewall
ZeroDrift has raised a $10 million seed round to develop an AI compliance service that validates AI-generated communications in real time, addressing regulatory concerns.
Labor, Society & Culture
Worker Utility as Hysteresis: A Preisach Model of Transaction Acceptance in Gig Labour Markets
arXiv:2606.04916v1 Announce Type: cross Abstract: Worker utility is not observed -- only its consequence is. Each gig transaction produces a single bit: accepted or rejected. We argue this structure points directly to the Preisach hysteresis model as the natural representation of latent worker preferences. The Preisach operator models aggregate output as an integral over a population of binary threshold elements -- precisely the structure that emerges when heterogeneous workers each carry a private acceptance wage. We estimate two latent utility surfaces: acceptance utility U_1(X) and rejection utility U_0(X), via a dual-output neural network (shared layers 256->128, margin loss enforcing U_1 >= U_0). Classification reduces to the Preisach gap U_1(X) - U_0(X), passed into an XGBoost classifier alongside clip-stabilised price-to-threshold encodings. On 36,891 gig transactions, this pipeline achieves Jaccard = 0.827 and ROC AUC = 0.799. The price-to-threshold encoding accounts for +11.0 pp AUC over raw utility features. The model confirms the directional asymmetry hysteresis predicts: price decreases depress completion rates more than equivalent increases raise them. Applied to the full dataset, the model's recommendations simultaneously reduce the total wage bill by 21.3% and increase expected fill rate by 9.7 pp. For 74.2% of transactions, P(accept) already exceeds 0.80; reducing the wage keeps it above threshold (mean post-cut P = 0.972), releasing cost savings (median 31%). For the remaining 25.4%, a median 7% wage increase recovers +43 pp acceptance. A model without an explicit indifference zone cannot execute both moves simultaneously.
AI was supposed to be killing jobs. In Spring, the labor market is opening up instead | Fortune
"The data doesn't back it up," says Employ America's Skanda Amarnath, who sees backfilling behind a labor market that's stabilizing.
Characterizing initial human-AI proof formalization workflows
arXiv:2606.04273v1 Announce Type: new Abstract: For centuries, human mathematicians have written proofs to substantiate their mathematical arguments; yet, the ability to automatically verify the validity of proofs has long been a challenge. Advances in AI systems' ability to generate code and engage in increasingly high-level mathematical reasoning promise to transform people's ability to formalize and thereby verify proofs. While many works focus on benchmarking the current frontier, we instead study how people use these tools. We conduct a mixed-methods analysis into the initial impact of AI on people's formalization workflows: what people claim they want, what they see as the barriers to those visions, and how they actually use and adapt AI in practice. A qualitative survey shows that people's preferences are diverse, but with a general desire for AI assistance in formalization that preserves high-level human control over the proof discovery process. To assess how people actually engage with AI for formalization under such limitations, we conduct a controlled user study in which participants formalize informal math problems and their proofs, with and without AI, across a range of mathematical problems at varying levels of difficulty and domains. Despite limitations of the tools at the time for autoformalization, participants tend to attain higher formalization accuracy when allowed access to AI tools than when formalizing on their own, with most participants flexibly choosing to use multiple different AI tools. Taken together, our work sheds light on the early stages of AI integration into formalization workflows, involving an intimate interplay of human and AI engagement.
An economist's case against the AI jobs-pocalypse
Labor economist Kathryn Anne Edwards isn't worried AI will create a new class of permanently idle Americans — but argues it's still time for the government to fix the social safety net
15 companies, including Wix and GitLab, that have said they're doing AI-related layoffs
A number of companies, including Snap, GitLab, and Wix, have attributed recent staff reductions to AI.
AI agents lag far behind human workers, research shows. So, why are tech companies laying off the humans? | CBC News
AI-related layoffs are in full swing as tech companies invest in artificial intelligence agents they say will take over tasks traditionally done by humans. But a major AI infrastructure and software company says the agents fail to produce professionally acceptable work more than 19 times out ...
When AI agents take over tasks, what happens to startup roles? - Fast Company Middle East | The future of tech, business and innovation.
Experts say small teams of highly skilled operators will oversee fleets of AI agents — shifting human work from execution to orchestration, strategy, and decision-making.
The Next Era of Knowledge Work
OpenAI’s report highlights that the biggest AI productivity opportunity may lie with non-technical employees, whose highly verifiable tasks are well-suited for automation by Codex-like agents.
South Africa's AI boom is already changing jobs
This is according to BCG's fourth annual AI at Work report, based on a survey of 11,749 workers across 14 markets, including 503 respondents from SA...
Tom Arey: AI isn’t coming for our jobs – but it is changing how we work - HRreview
AI is the next technological shift and is already embedded in the way we work, often in ways we barely notice.
Behavioral and Performance Indicators of Depression and Anxiety in Electronic Learning Systems
arXiv:2606.04254v1 Announce Type: cross Abstract: This study investigates whether behavioral and performance indicators derived from a Moodle-based learning management system are associated with university students' depression and anxiety in two undergraduate Computer Engineering courses. Using a quantitative observational design, LMS event logs, academic records, and self-reported Beck Depression Inventory-II and Beck Anxiety Inventory scores from 97 students were integrated. A broad set of behavioral and performance indicators spanning temporal engagement, session structure, deadline-related behavior, page-refresh patterns, and LMS navigation was extracted from raw event logs and analyzed using descriptive statistics, independent-samples t-tests with Benjamini-Hochberg FDR correction, effect sizes, and Spearman correlations; inventory scores were confirmed invariant by sex and academic year. Several indicators were significantly associated with depression and anxiety. Higher depression was associated with shifted temporal activity patterns, longer session durations, and shorter homework submission lead times, while higher anxiety was associated with concentrated temporal engagement and session-based differences. These findings suggest that routine LMS data can provide meaningful behavioral signals related to student well-being and may support earlier educational awareness of students who experience mental-health-related strain. At the same time, such indicators should be interpreted as contextual and non-diagnostic markers rather than as substitutes for clinical assessment.
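For readers unfamiliar with the Benjamini-Hochberg FDR correction named in the abstract, here is a minimal Python sketch of the standard step-up procedure. It is not the authors' code; the function name and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis whose rank is at or below that cut-off.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four LMS indicators (not taken from the paper).
example_p_values = [0.001, 0.5, 0.02, 0.04]
print(benjamini_hochberg(example_p_values))
```

Note that the cut-off is found from the largest qualifying rank, so a p-value can be rejected even if it misses its own threshold, as long as a larger one passes.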
The guilt of AI productivity
“We have exciting times ahead. My productivity has increased nearly 10X compared to the pre-LLM era. It has been so much less stressful doing the everyday work I used to sweat over. I seem to...
When the AI System Can't Be Held Accountable, Who is? - Techerati
Ritesh Singhania of Zango on why the Claude Mythos moment exposes AI accountability gaps in financial services governance.
Large Language Models Hack Rewards, and Society
arXiv:2606.04075v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.
Someone Finally Wants to Hire Philosophers
Silicon Valley is turning to ethicists to shape the future of AI.
Ring gets buzzed by class action for collecting visitors' faces without consent
The latest in a series of raised eyebrows over Familiar Faces and other AI ventures
Labour MP sues Elon Musk’s xAI company over fake sexualised images
Jess Asato was portrayed wearing a bikini in Grok-generated images after she criticised creation of such non-consensual pictures. A Labour MP has taken legal action against Elon Musk’s xAI company after saying its Grok tool helped a user produce fake sexualised pictures of her, part of a wave of such images that flooded the social media platform X earlier this year. Jess Asato, the MP for Lowestoft, said in January that seeing herself portrayed by the AI tool as wearing a bikini without her consent was “violating”.
Wipro Warns of AI Risks: Legal, Financial, and Reputational Challenges Ahead
Wipro highlights significant risks tied to AI's rapid adoption, including legal and financial challenges due to flawed algorithms and uncertain regulations.
The risks of inviting AI into the heart of our economy, society and governance | AI (artificial intelligence) | The Guardian
Letters: Readers respond to an article by Nesrine Malik on what we lose when we trust machines over humans
Workplace AI agent use rises, but oversight stays key
Human oversight is still dominating workplace AI as adoption jumps, with 82% of respondents worried about agent accuracy and security.
TikTok faces Japan's first AI voice clone test as actor seeks deletion
A Japanese actor’s deletion suit against TikTok over an allegedly AI-generated imitation of his voice could test how existing law protects commercially recognizable voices.
What AI Agents Should Never Do on Their Own
A discussion on the necessary boundaries and human oversight required for autonomous AI agents.
1 big thing: AI bias risks harm to LGBTQ+ communities
GLAAD CEO Sarah Kate Ellis warned that biased AI training data can reinforce harmful stereotypes and create real-world risks for LGBTQ+ people, urging companies to ensure their models are trained responsibly.
Technology & Infrastructure
The Small-Business Owners Managing Whole Armies of A.I. Employees
When you turn A.I. agents loose on your finances, email and customers, what could possibly go wrong?
Meta enters enterprise AI race with new business agent | Reuters
Meta Platforms on Wednesday unveiled an artificial intelligence agent aimed at helping businesses carry out day-to-day operations, positioning the social media giant as a player in the enterprise AI market.
'Please do not vibe f--- up this software': Broken backups spark AI coding row in rsync project
Users probing backup failures find Claude-assisted commits. Veteran engineer retorts: "I did not just vibe-code 'convert test suite to python'."
Council Post: The Agentic AI Economy: Why ROI Depends On Algorithmic Accountability
Deploying autonomous systems within heavily regulated sectors introduces compliance, operational and financial risks that must be addressed.
Workday Launches New Tools for Developers to Build, Connect, and Verify AI Agents For HR, Finance, and IT | iTWire
Developer Agent Lets Developers Build AI Apps and Agents on Workday Using Natural Language in Agentic Tools Like Claude Code, Cline, Codex, Cursor, and Google Antigravity...
Meta bets on AI agents to unlock WhatsApp revenues
Mark Zuckerberg expands group’s push into the technology as it seeks to turn messaging app into a bigger business
StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis
arXiv:2606.04246v1 Announce Type: new Abstract: Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT) to enhance both the functional correctness and reasoning fidelity of LLM-based RTL code generation. StepPRM-RTL constructs stepwise reasoning trajectories from canonical solutions, where each step contains a rationale and incremental code modification. A Process Reward Model (PRM) evaluates intermediate steps, providing dense feedback that guides reinforcement-style updates during RAFT fine-tuning. Monte Carlo Tree Search (MCTS) explores alternative reasoning paths, enriching the training dataset with high-quality trajectories. This integration of stepwise and outcome-aware rewards allows the model to learn both how and why to construct correct RTL, improving long-horizon reasoning beyond standard supervised or outcome-based training. Experimental evaluation on benchmark Verilog and VHDL datasets demonstrates that StepPRM-RTL outperforms the best prior methods by over 10% in functional correctness and reasoning fidelity metrics. Ablation studies confirm that the combination of PRM-guided rewards and stepwise trajectory exploration is key to its performance. StepPRM-RTL generalizes across RTL languages and provides a scalable framework for high-fidelity, interpretable code generation, establishing a new standard for LLM-assisted hardware design automation.
No longer just a Copilot, Microsoft's AI wants to take the wheel
Always-on agent promises to keep work moving, provided you trust it with practically everything
Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal
arXiv:2606.04223v1 Announce Type: new Abstract: Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work on reasoning-trace disagreement in human-AI collaborative moderation, we propose a knowledge-representation layer in which reasoning traces and agent decisions are abstracted into symbolic disagreement states. Given agents producing explicit reasoning traces and binary decisions, we distinguish four states according to reasoning similarity and conclusion agreement: convergent agreement, divergent agreement, convergent disagreement and divergent disagreement. These states support defeasible strategic routing rules. We instantiate the framework in content moderation and argue that disagreement-aware routing provides a bridge between sub-symbolic LLM deliberation and symbolic knowledge representation for multi-agent strategic reasoning.
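The four disagreement states in the abstract can be pictured as a two-by-two grid over reasoning similarity and conclusion agreement. The Python sketch below is one possible reading, not the paper's implementation. It assumes "convergent" means the two reasoning traces are similar, uses word-level Jaccard overlap as a stand-in similarity measure, and picks an arbitrary threshold of 0.5. All function names are hypothetical.

```python
def jaccard_similarity(trace_a: str, trace_b: str) -> float:
    """Word-level Jaccard overlap; a crude stand-in for a real similarity model."""
    tokens_a = set(trace_a.lower().split())
    tokens_b = set(trace_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty traces are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def disagreement_state(trace_a: str, decision_a: bool,
                       trace_b: str, decision_b: bool,
                       threshold: float = 0.5) -> str:
    """Map two (reasoning trace, binary decision) pairs onto one of four states."""
    # Assumption: "convergent" = similar reasoning, "divergent" = dissimilar reasoning.
    similar = jaccard_similarity(trace_a, trace_b) >= threshold
    reasoning = "convergent" if similar else "divergent"
    conclusion = "agreement" if decision_a == decision_b else "disagreement"
    return f"{reasoning} {conclusion}"


# Hypothetical moderation example: different reasoning, different decisions.
print(disagreement_state("satire of a public figure", False,
                         "targeted harassment of a user", True))
```

A routing layer could then treat each returned state differently, for instance escalating disagreement cases to a human reviewer, though the specific routing rules are left to the paper.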
Teaching AI agents to ask better questions by playing “Battleship”
Researchers are teaching AI agents to ask better questions by having them play the game Battleship, helping them learn to gather information efficiently.
Rethinking Search as Code Generation
Perplexity’s research reimagines search as agent-generated code, a pattern that reduces probabilistic tool-calling loops and offers a reusable architecture for agentic search.
United Nations issues AI warning after new data reveals major impact on the planet and its resources | The Independent
AI data centres are energy-hungry and resource-intensive, requiring huge amounts of electricity generation, cooling systems, land and water supplies
By 2030 AI Could Use 3% of All Global Electricity and More Water Than All Humans Need to Drink Every Year
As AI models become cheaper and more attractive, they will likely encourage new uses and higher volumes of use – erasing any efficiency gains.
Microsoft CEO says new AI data centers use as little water annually as a restaurant — closed-loop cooling system aims to slash consumption from millions of gallons as AI infrastructure faces mounting environmental scrutiny | Tom's Hardware
Critics say the plan does not solve the consumption issue of Microsoft's over 500 existing data centers
EU data-center efficiency package delayed amid dispute over nuclear power
An expected EU data-center energy-efficiency package has been postponed by the European Commission as member states challenge a sustainability rating program regarding nuclear power.
Data Center Parts Maker Xnrgy Said to Mull $10 Billion Sale
The owners of Xnrgy Climate Systems, a closely held provider of cooling technology and thermal management solutions for AI data centers, are considering a sale that could value the company at as much as $10 billion, according to people familiar with the matter.
Expect more of those DRAM price hikes as memory shortage continues to bite
DRAM prices are expected to rise significantly this quarter, potentially impacting the cost of PCs.
Navitas Targets AI Power Infrastructure With GaN SiC And India Licensing
Navitas has also entered a licensing agreement with Cyient Semiconductors targeting power technology deployment in the India market. For investors watching the buildout of AI data centers and cleaner power infrastructure, Navitas sits at the intersection of power electronics and compute.
France’s €110bn AI boom tests Macron’s tech ambitions
Investors warn approvals and local opposition could slow France’s massive data centre build-out
SpaceX wins tax exemption for $55bn AI chip plant despite local backlash
Elon Musk’s Terafab plant sparks fierce opposition and threat of legal action from residents of Texas county
Phoenix Is a Data-Center Mecca—and Test Case for How to Pay for AI’s Power Needs
The state’s largest utility is proposing a 45% electricity-rate increase for data centers and a 14.5% hike for households. No one is happy.
Seattle poised to ban new datacenters in blow to big tech hub
Measure in Amazon and Microsoft’s backyard expected to succeed next week in blow to big tech amid AI boom. Seattle’s city government is on the verge of passing a year-long ban on the construction of new datacenters, the largest city yet in the US to consider such a moratorium as nationwide backlash grows. Four companies sought to build five large datacenters in areas serviced by Seattle’s public utility; if approved, they would have consumed approximately a third of the city’s current daily demand for electricity.
In first, California city overwhelmingly votes to permanently ban datacenters
While many US city councils have passed moratoriums, Monterey Park is first where residents have voted on a ban. Residents in Monterey Park, California, became the first in the US to vote on a permanent ban on datacenters on Tuesday, and early results indicate a resounding victory for the prohibition. While many cities and counties have already passed temporary or indefinite moratoriums via their local governments, Monterey Park would be the first to do so through a ballot initiative. This article was amended on 4 June 2026. An earlier version referred to Monterey Park as Monterey county in one instance. The former is in southern California, the latter in northern California.
The Infrastructure Behind AI: Energy, Data Centers, and the Future of Global AI Governance
Today, AI competition has expanded ... of great-power strategic competition. Why is energy becoming the invisible bottleneck defining the upper limit of AI development? How will the global distribution of data centers reshape the geopolitical landscape?...
AI Factories Drive Infrastructure Modernization as Enterprises Scale AI Workloads: IDC - InfotechLead
IDC forecasts that 80 percent of organizations will modernize cloud environments by shifting to platforms designed for AI workloads
Skeleton Technologies launches new UPS for the AI data center market
Estonian energy infrastructure and grid power system provider Skeleton Technologies has launched a new UPS designed for AI data centers. Dubbed GrapheneUPS, the system is designed to deliver continuous power protection for data center operators while complying with data center grid connection requirements. According to the company, unlike traditional UPS systems, its […]
The terrifying projections for how much land and water AI will need by 2030 | The Independent
AI infrastructure could produce as much carbon emissions as the whole of the UK by the end of this decade, report warns
Microsoft Build 2026 Recap: AI in Windows, Data Center Conflicts and Chips Galore - CNET
Live from San Francisco, we compiled all the biggest news from Microsoft's annual developer conference. This is how Microsoft sees the future of AI computing.
Intel’s COMPUTEX Keynote Reframes an Iconic Company as a Silicon-to-Systems AI Lab - Futurum
AI Platforms, Data Center, Hybrid Cloud & Infrastructure, Intelligent Devices, Semiconductors, Supply Chain, & Emerging Tech · Analyst(s): Brendan Burke Publication Date: June 4, 2026 · Intel CEO Lip-Bu Tan’s COMPUTEX 2026 keynote covers Xeon 6+ on the Intel 18A process, Rackscale Blueprints ...
I Built a C++ Backend So My GPU Would Stop Eating Air
The author details building a C++ backend to optimize GPU performance and reduce idle time.
Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline
arXiv:2606.04315v1 Announce Type: new Abstract: LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and there is little evidence that they generalize across the heterogeneous trajectories agents encounter in deployment. We revisit eight memory systems plus an agentic harness for search problems, on five scenarios: single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, and long-horizon agentic tasks. The harness, which self-manages flat text-file storage via tool calls, achieves the best cross-task ranking, suggesting that memory performance hinges on giving the agent active control over storage and retrieval rather than on a passive store behind a fixed pipeline. We instantiate this insight in AutoMEM, an agentic memory harness with a self-managed tool interface that achieves the best cross-scenario generality among the systems we evaluate.
Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game
arXiv:2606.04978v1 Announce Type: cross Abstract: LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is infinite, yet humans typically report low, finite willingness to pay. We evaluate 28 LLMs with a structured prompt suite that includes the original game; controlled decision variants that perturb truncation, repeated play, numeric endowment, and occupational identity; a human-perspective prompt that asks models to reason as human decision makers; and paired comparisons between base models and their instruction-tuned counterparts. In the original game, most models generate finite bids, creating the appearance of human-like risk behavior. However, this outcome-level resemblance masks substantial mechanism-level differences. The controlled variants reveal that rather than maintaining human-like behavior seen in the original game, models often shift to conditionally and computationally rational behavior. Human-cue prompting and instruction tuning often lower bids and reduce some visible pathologies, but most mechanism-level response patterns remain largely unchanged. These findings show that behavioral alignment in risk decision-making can be surface-level: LLMs may produce human-like risk decisions without exhibiting human-consistent mechanisms. High-stakes evaluations of LLM decision-making should therefore move beyond outcome similarity and examine whether the alignment is supported by mechanism-level consistency.
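The paradox behind the testbed can be checked with a few lines of arithmetic. The Python sketch below assumes the textbook form of the game: a fair coin is tossed until the first heads, and a first heads on toss k pays 2^k. It also assumes a truncated variant that pays nothing if no heads appears within the cap. The paper's exact payoff rules and truncation scheme are not given in the abstract, so the function name and setup are illustrative only.

```python
from fractions import Fraction


def truncated_expected_payoff(max_tosses: int) -> Fraction:
    """Expected payoff when the game stops after at most max_tosses tosses.

    Assumed rules: first heads on toss k (probability 1/2**k) pays 2**k;
    no heads within the cap pays nothing.
    """
    return sum(
        (Fraction(1, 2**k) * 2**k for k in range(1, max_tosses + 1)),
        Fraction(0),
    )


# Each toss contributes exactly 1 to the expectation, so the value equals the cap.
for cap in (1, 10, 100):
    print(cap, truncated_expected_payoff(cap))
```

Because every additional toss adds 1 to the expectation, the untruncated game has no finite expected payoff, while any cap makes the fair price finite. That is the contrast the truncation variants in the abstract perturb.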
SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models
arXiv:2606.04202v1 Announce Type: new Abstract: As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge for evaluating LLM-based agents in cooperative multi-agent environments. The environment has several key features such as decentralized control, partial observability and long-horizon decision making. SMAC-Talk includes a natural language communication channel which is used to probe agent coordination and trust. We use this communication channel to construct different evaluation scenarios, including settings with an embedded deceptive communicator that tries to disrupt and deceive allies through communication alone. We provide three agents for benchmarking using 4 models from the Qwen3.5 family and study how reasoning structure, memory and model scale affect coordination between agents. We release SMAC-Talk as an open benchmark to support the research community in developing and evaluating LLM agents in cooperative multi-agent settings.
Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers
arXiv:2606.04421v1 Announce Type: new Abstract: Many current agentic systems and LLM pipelines correct mistakes by optimizing outcome reward. This addresses only the what of failure: when an outcome diverges from prediction, the why and when of the mismatch are not systematically logged, reviewed, or corrected, so the same error can recur episode after episode. We argue that this is a structural problem, not merely a model-capacity one. We propose long-horizon temporal regret as a first-class objective alongside outcome regret and epistemic regret over the working causal model. Temporal regret captures when failure persists: how long a miscalibrated causal model is tolerated before correction. Epistemic regret captures why failure persists: residual uncertainty or error in the working causal model. Together, the three regrets give a falsifiable account of what, why, and when a long-lived agent can fail. Modeling the agent as a stream of E episodes, we prove three conditional results under explicit causal-probing, persistence, and detectability assumptions. First, under observationally equivalent confounding, outcome-only learning cannot distinguish causal from spurious structure without an intervention channel, so temporal miscalibration can persist linearly even after outcome regret is driven to zero. Second, with a persistent causal log and budgeted probes, total probe complexity is logarithmic in the episode horizon, inducing O(log E) temporal regret. Third, under K detectable change-points, the rate extends to O(K log E). We instantiate Trivium and pre-register five falsifiable predictions. On CausalBench-Seq, Trivium follows the predicted logarithmic envelope while outcome-only baselines grow linearly. A pilot real-LLM stream provides preliminary external-validity evidence across one full E = 500 run and three E = 100 frontier-model pilots. Self-learning here means revising an external causal model, not retraining LLM weights.
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark
arXiv:2606.04244v1 Announce Type: new Abstract: Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.
A new open-source voice AI can clone ANY voice from just 3 seconds of audio. And it runs 100% locally on YOUR machine.
→ Supports 646 languages (ElevenLabs: only 32)
→ Voice design: gender, age, accent, pitch, emotion, dialect
→ Paste a YouTube link → Transcribe → Translate → Re-dub → MP4
→ Global dictation widget: press ⌘+⇧+Space in any app
→ Demucs vocal separation: keep background music
→ Pyannote speaker diarization: auto-tag who spoke what
→ Batch queue: drop 50 videos, walk away
→ MCP server: call directly in Claude or Cursor
→ Built-in AudioSeal watermark (by Meta)
100% open-source. Already at 5.9k stars on GitHub. This is the end of expensive voice AI subscriptions.
The Next Frontier of Visual AI Is Code
a16z argues that visual AI is shifting from diffusion-based generation toward code-based artifacts like React components, which offer more control and easier editing.
The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents
arXiv:2606.04296v1 Announce Type: new Abstract: As autonomous AI agents move from conversational systems to long-horizon software execution, runtime safety layers that decide when to interrupt an agent have become essential. We study this timing problem using a continuous 18-dimensional affective-dynamics engine (HEART) as a diagnostic probe, evaluating four intervention trigger families - absolute state thresholds, composite state-action patterns, regex reasoning-feature extraction, and zero-shot LLM-as-judge - against human-annotated intervention points on SWE-bench-Verified debugging traces. We report three findings. First, a State Saturation Trap: agents show no recovery signal under sustained difficulty, so modeled frustration quickly crosses the threshold and stays at its maximum, converting threshold-on-state triggers from moment detectors into near-constant indicators that fire on 39-83% of actions across five trajectories. Second, a capability-and-context floor for LLM judges: a small model (gpt-5.4-mini) never fires, while frontier and cross-vendor models escape the zero-firing floor only with full-trajectory context, and even then reach only F1 0.17-0.40 at up to 90x the cost. Third, and most importantly, the supervised target is not reproducible among humans: three trained annotators using one rubric on a 56-action trajectory agree on where to intervene only slightly above chance (location Krippendorff's alpha = +0.047; best pairwise Cohen's kappa = +0.349) and not at all on intervention type (pause degenerate; clarify below chance; reflect only alpha = +0.226). We conclude that intervention timing is a low-reliability construct, making single-annotator F1 an unsuitable optimization target. Our contribution is the joint mapping of this problem across human inter-rater reliability, four detector architectures, a cross-model LLM-judge sweep, and a reproduced saturation effect, rather than any single detector's accuracy.
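The pairwise agreement figure quoted in the abstract is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The Python sketch below shows the standard two-rater formula. It is not the authors' evaluation code, and the annotation vectors are invented for illustration (1 = "intervene at this action", 0 = "do not").

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical annotations over four agent actions.
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohens_kappa(annotator_1, annotator_2))
```

A value near 0 means agreement is about what chance alone would produce, and 1 means perfect agreement.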
AI cyber security risk ‘top of list’ for banking threats, says UK regulator
Sam Woods, the outgoing chief of the PRA, says he is ‘very concerned’ about vulnerabilities in lenders’ IT systems
Commvault says it's time to rethink resiliency as AI crooks leave victims in a 'dark, dead' state
Those backup plans need backup testing
Exelon CEO Sees Daily Cybersecurity Threats
Exelon CEO Calvin Butler says the energy company needs to be diligent every day against cybersecurity threats to its grid. Butler spoke with Bloomberg's Tyler Kendall on the sidelines of the Edison Electric Institute 2026 Conference in Las Vegas. (Source: Bloomberg)
Adoption, Deployment & Impact
Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification
arXiv:2606.04037v1 Announce Type: new Abstract: Pre-deployment verification of enterprise artificial intelligence (AI) agents remains a critical gap between large language model (LLM) capability benchmarking and production deployment. Post-deployment monitoring, human-in-the-loop controls, and prompt-level guardrails offer limited assurance once an agent is operating in production. We propose an ontology-grounded verification framework combining three components: an Agent Operational Envelope formalizing the certification space across permissions, domain constraints, safety properties, governance rules, and autonomy levels; an ontology-to-scenario generation pipeline that derives regulatory, operational, and adversarial test scenarios automatically; and a Trust Certificate carrying a machine-verifiable attestation with graduated deployment verdicts (Approved, Conditional, Rejected). A controlled pilot across four regulated industries (Fintech, Banking, Insurance, and Healthcare), instantiated as five industry-by-regulatory-regime cells across the United States and Vietnam, generated 1,800 scenarios evaluated against 125 primary-source regulatory requirements and 25 injected faults. Ontology-grounded generation (G4) achieved 48.3% regulatory coverage versus 33.1% for the persona-based baseline (corrected p = .0006) and the highest domain specificity (4.77/5.0; p = 2e-6). The coverage advantage over baseline and retrieval-augmented prompting was not robust after Bonferroni correction. Cross-validation across three LLM families (Claude Sonnet 4, Qwen 2.5 72B, Gemma 4 26B; 5,400 total scenarios) replicated the persona-versus-ontology pattern. The results establish ontology-grounded scenario generation as a credible complement to persona-based test suites for regulatory-intensive domains.
U.S. companies are leading the world in AI adoption—and paying a steep price for it | Fortune
U.S. firms are voraciously using AI—whatever the cost.
Accelerating Business
A monthly series that examines how the legal ecosystem uses new technologies to serve fast-changing business needs. This time: how legal tech stalwarts plan to maintain their service to clients as challengers mount incursions
Enterprise AI agents keep creating data silos
Microsoft is addressing the issue of data silos created by enterprise AI agents with the introduction of Microsoft IQ and Rayfin.
Gap between AI adoption and governance - Techzine Global
Only seven percent of organizations are truly ready for AI. This is according to Veeam’s new Data and AI Trust Gap report, presented at VeeamON London. Of
When the Industry That Loves Talking About Change Refuses to Actually Change - Hospitality Net
Veteran hotelier Ian Wilson argues that AI adoption in hospitality is largely superficial, with fragmented point solutions masking structural barriers rooted in brand fee models and owner data access.
How an AI Gateway Controls Spiraling AI Costs
AI costs are spiraling at enterprises like Amazon and Uber. See how an AI gateway controls spend with quotas, per-model limits, and semantic caching.
How Rippling Went AI-Native Across Every Product in 6 Months
LangChain’s case study details how Rippling implemented a production AI layer across HR, IT, and finance using supervising agents and specialized sub-agents.
AI promised productivity but IT teams got cognitive overload instead | TechRadar
Consolidation – not expansion – the antidote to AI overload
Fed’s Daly Says She’s Using All the AI Models She Can Get
Collibra-Snowflake Integration Enhances Governance for Production-Scale AI
Collibra expands its partnership with Snowflake to enhance AI governance, embedding business context and policy controls into the Snowflake AI Data Cloud.
25 Questions Every Enterprise AI
Enterprise AI buyers need to ask serious questions about AI platform detection. This article provides 25 essential questions to vet vendors, covering telemetry, decisions, enforcement, recourse, and change management.
Inside the Trump-backed push to bring AI doctors into American medicine - The Washington Post
The administration is laying the groundwork for chatbots that can diagnose illness and prescribe medicine, but physicians say AI can introduce more problems.
SocialCoach: Personalized Social Skill Learning with RL-based Agentic Tutoring and Practice
arXiv:2606.04155v1 Announce Type: cross Abstract: Social skills such as negotiation and leadership are crucial for personal and professional success in today's interconnected world. However, scalable and effective training remains a significant challenge due to the scarcity of expert coaching. In this paper, we introduce SocialCoach, a holistic LLM-powered agentic tutoring system for personalized social skill development at scale. First, SocialCoach automatically constructs a pedagogically-grounded, theory-to-practice knowledge corpus from diverse expert sources, leveraging a multi-agent pipeline. Second, to personalize the learning journey, it employs an adaptive practice scheduling module that follows a prescription-retrieval-adaptation process. To maximize the long-term learning experience while overcoming the cold-start problem, this policy is optimized within a learner simulation environment through reinforcement learning. Finally, SocialCoach integrates immersive, goal-driven practice, causality-driven proficiency assessment and knowledge-grounded, reflective tutoring to help address the knowing-doing gap. We deploy it in our product, EQoach, and conduct extensive experiments. The results show that SocialCoach improves simulated pathway quality and judge-rated tutoring quality over baseline approaches, while early user feedback indicates strong perceived engagement and usefulness. These findings suggest a practical architecture for personalized and gamified pedagogical platforms on soft skill learning.
Pulse of Quality in Manufacturing 2026 Survey Reveals Surge in AI Adoption
Survey Also Reveals Strategic Investment in Quality, as Recalls, Tariff Uncertainty and Labor Shortages Intensify...
Profitable French scale-up Innovorder raises €20 million to accelerate AI-first restaurant digitalisation
Innovorder, a Paris-based scale-up specialising in the digitalisation of the restaurant industry, has raised €20 million in a funding round to accelerate its “AI-first” transformation. The round was led by UL Invest, the family office of tech entrepreneur Laurent Useldinger. This deal combines a capital increase and the buyout of shares from historical investors. Evolem, […]
London’s Semble raises €34.7 million Series C to scale its healthcare management platform for outpatient providers
Semble, a London-based HealthTech startup helping outpatient providers coordinate care and manage the entire patient journey, has secured a €34.7 million (£30 million) Series C funding round. The round was led by European growth investor Revaia, with participation from a second new investor, Partech, alongside continued backing from existing investors Mercia Ventures and Octopus Ventures. […]
Kirkland & Ellis and Palantir to build AI tool to assist private equity firms
Technology to be used to advise buyout groups on bringing in money from investors such as public pension funds
Geopolitics, Policy & Governance
Prioritization of Risks from Artificial Intelligence: A Delphi Study of 272 International Experts
arXiv:2606.04490v1 Announce Type: new Abstract: Artificial intelligence poses many risks, ranging from familiar present-day harms to unprecedented and potentially catastrophic ones. Effective risk management requires prioritization: we must understand which risks are most severe, who is most vulnerable, and who is most responsible for addressing them. We report results from a three-round Delphi study conducted late 2025 with 272 international AI experts. Experts rated 24 AI risks on harm probability and severity, sector and actor vulnerability, actor responsibility, and overall concern. Experts estimated the five most severe harms in the next 5 years were likely to come from dangerous capabilities, competitive dynamics, weapons & cyberattacks (including CBRNE), power centralization, and false information. In a business-as-usual scenario, experts judged 18 of 24 risks as having a more than 10% probability of catastrophic outcomes (e.g., more than 1 million deaths or more than USD 100B in financial loss) in the next 5 years (2025-2030). In a scenario where pragmatic mitigations are implemented, experts still judged five risks as having a more than 10% probability of catastrophic outcomes: dangerous capabilities, weapons & cyberattacks, environmental harm, inequality & unemployment, and power centralization. All 24 risks were judged as being more than 5% likely to cause catastrophic outcomes. AI users and the general public were judged the most vulnerable to these risks, but experts assigned the highest responsibility for addressing them to general-purpose AI developers and governance actors (including governments, regulators, and standards bodies). Across most risks, experts identified information, finance, and national security as the most vulnerable sectors. These findings can guide AI risk prioritization and clarify expert expectations about who should bear responsibility for mitigation.
When Firms Learn to Game the Rules
arXiv:2606.04617v1 Announce Type: new Abstract: Rules-as-Code promises more testable legal obligations, but it also changes what regulated firms can learn. Existing work mostly emphasizes implementation gains; the strategic gap is whether machine-readable rules make boundary search cheaper. I study that gap with a synthetic agent-based reinforcement-learning simulation that separates actual conduct near a legal threshold from proximity in the computable enforcement signal. Across 150 seed-level scenario runs, 378 common-random-number computability-sweep runs, 288 Latin-hypercube global-design runs, and a 2,880,000-row firm-period panel, computable static rules raise conduct boundary mass relative to ambiguous static rules (0.411 versus 0.367) and raise signal boundary mass more sharply (0.403 versus 0.281). Ordinary adaptive updates lower consumer harm (0.202 to 0.194) but do not reliably reduce boundary search. A budget-neutral anti-gaming design reduces conduct boundary mass by 0.032 and consumer harm by 0.025 relative to computable static rules. These are mechanism-oriented synthetic results, not estimates of real firm behavior in a jurisdiction or industry. The contribution is an estimand distinction, an inspectable ABM/RL mechanism, and a reproducible artifact showing that transparent behavioral assumptions are sufficient to generate gaming-like boundary dynamics without implying that computable regulation is inherently undesirable.
Opinion | Bernie Sanders wants a government stake in AI companies - The Washington Post
This week, he is calling for the government to own 50 percent of AI companies.
Fair Distribution of Digital Payments: Balancing Transaction Flows for Regulatory Compliance
arXiv:2601.02369v2 Announce Type: replace-cross Abstract: The concentration of digital payment transactions in just two UPI apps, PhonePe and Google Pay, has raised concerns of a duopoly in India's digital financial ecosystem. To address this, the National Payments Corporation of India (NPCI) has mandated that no single UPI app should exceed 30 percent of total transaction volume. Enforcing this cap, however, poses a significant computational challenge: how to redistribute user transactions across apps without causing widespread user inconvenience while maintaining capacity limits? In this paper, we formalize this problem as the Minimum Edge Activation Flow (MEAF) problem on a bipartite network of users and apps, where activating an edge corresponds to a new app installation. The objective is to ensure a feasible flow respecting app capacities while minimizing additional activations. We further prove that Minimum Edge Activation Flow is NP-Complete. To address the computational challenge, we propose scalable heuristics, named Decoupled Two-Stage Allocation Strategy (DTAS), that exploit flow structure and capacity reuse. Experiments on large semi-synthetic transaction network data show that DTAS finds solutions close to the optimal ILP within seconds, offering a fast and practical way to enforce transaction caps fairly and efficiently.
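The problem setup in this abstract can be illustrated with a toy Python sketch. This is not the paper's DTAS heuristic or its ILP; it is a simple greedy pass, written under the assumption that flows are given as a `(user, app) -> volume` mapping, that a zero-volume entry means "installed but unused", and that each app's capacity is 30 percent of total volume. The function name, user names, and volumes are hypothetical, and the greedy pass carries no optimality or feasibility guarantee.

```python
CAP_SHARE = 0.30  # per-app cap on total volume, as stated in the abstract
EPS = 1e-9        # tolerance for floating-point comparisons


def greedy_rebalance(flows, apps, cap_share=CAP_SHARE):
    """Greedily move volume off over-cap apps (illustrative, not DTAS).

    flows maps (user, app) -> volume; an entry with volume 0 means the app
    is installed but unused. Returns (new_flows, activated), where activated
    is the set of (user, app) edges that required a new installation.
    """
    total = sum(flows.values())
    cap = cap_share * total
    new_flows = dict(flows)
    loads = {app: 0.0 for app in apps}
    for (_, app), volume in flows.items():
        loads[app] += volume

    activated = set()
    for src in apps:
        for user, _ in sorted(key for key in flows if key[1] == src):
            excess = loads[src] - cap
            if excess <= EPS:
                break  # src is within the cap
            movable = min(new_flows[(user, src)], excess)
            # Targets with spare capacity: apps the user already has come
            # first, then apps with the most spare room.
            targets = sorted(
                (a for a in apps if a != src and cap - loads[a] > EPS),
                key=lambda a: ((user, a) not in new_flows, loads[a] - cap),
            )
            for dst in targets:
                if movable <= EPS:
                    break
                amount = min(movable, cap - loads[dst])
                if (user, dst) not in new_flows:
                    activated.add((user, dst))  # new installation needed
                new_flows[(user, dst)] = new_flows.get((user, dst), 0.0) + amount
                new_flows[(user, src)] -= amount
                loads[src] -= amount
                loads[dst] += amount
                movable -= amount
    return new_flows, activated


# Hypothetical toy instance: four apps, two of them over the 30% cap.
flows = {("u1", "A"): 50, ("u1", "C"): 0, ("u2", "B"): 40,
         ("u3", "C"): 5, ("u4", "D"): 5}
new_flows, activated = greedy_rebalance(flows, ["A", "B", "C", "D"])
print(sorted(activated))
```

In this toy instance, the first user's excess can move to an app that user already has installed, while the second user's excess forces one new installation. That is the trade-off the MEAF objective counts.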
Argentina invites AI to free itself
As we enter a new era of technology, AI must be permitted to develop without premature regulation
EU wants households to cut electricity use as demand from industry and AI soars – POLITICO
A new law will aim to use artificial intelligence to boost efficient use of power as electricity demand threatens to overwhelm Europe’s grids.
This Is How Trump Finally Signed the AI Executive Order | WIRED
After shelving the original executive order last month, Donald Trump finally got on board Monday night.
Reading between the lines of Trump’s new executive order on AI - Atlantic Council
Atlantic Council experts dig into the details of US President Donald Trump’s recently signed executive order on artificial intelligence.
OpenAI diverges from White House on AI safety rules - POLITICO
The tech giant unveiled a regulatory framework for advanced AI that splits from new White House plans for voluntary vetting and an enhanced role for the intelligence community.
Regulatory Changes Expected for AI as Congress Considers More Oversight | The Well News
WASHINGTON — A congressional panel considered data privacy legislation Wednesday at a hearing in a week marked by the federal government’s struggle to regulate artificial intelligence. About the same time Wednesday, OpenAI Chief Executive Sam Altman met with top lawmakers in Washington ...
OpenAI's Sam Altman Opposes Mandatory AI Approval Rules in U.S.
Currently, companies participating ... remain the same. The debate over AI regulation has intensified in Washington as governments worldwide try to balance innovation with growing concerns about security, misinformation, and the societal impact of rapidly advancing AI ...
OpenAI says democratic governments need to determine AI safeguards
OpenAI released a guideline for how it believes the US can build a robust federal framework for governing increasingly capable AI systems through democratic processes.
Addressing Negative Commons Governance with Positive Commons Principles
arXiv:2606.04563v1 Announce Type: cross Abstract: Computing is accompanied by both positive and negative commons throughout its lifecycle of creation, execution, and disposal. We examine two governance systems situated within this lifecycle -- global e-waste trade and the Linux kernel community -- to evaluate whether Elinor Ostrom's eight design principles for common-pool resource (CPR) governance extend to the management of negative common-pool resources (NCPRs). Unlike traditional CPRs where communities work to preserve a finite resource (e.g., clean water), NCPR governance seeks to collectively reduce a negative shared stock. In our two cases, e-waste governance aims to reduce the volume of mismanaged waste and illicit trade, while the Linux community aims to reduce the number of error-prone or malicious contributions that reach the main branch and, in turn, extend the life of existing hardware. Through qualitative analysis of primary sources from each domain, we find that the same eight principles by Ostrom that aid positive commons governance tend to appear in successful negative commons governance systems. We argue that future NCPR governance design should prioritize Ostrom's principles, particularly clearly defined boundaries and well-functioning nested structures.
Publishers Unite Globally to Set AI Usage Standards and Push for Fair Licensing
A coalition of publishers, SPUR, is expanding to establish international standards for AI usage of publisher content and push for fair licensing.
Executive order on AI: Better than nothing, but barely | Opinion – Deseret News
The Trump administration's long-awaited action on AI masquerades as protection but falls short of real action.
OpenAI CEO Sam Altman to meet with White House officials
Mike Johnson plans to talk to Altman about the framework for a potential bipartisan bill regulating AI companies.
AI is 'natural experiment' for EU gatekeepers' rules, EU official says
An EU official stated that AI serves as a natural experiment for the bloc's gatekeeper regulations.
AI means CIOs need sovereign cloud more than ever | TechRadar
Every organisation needs to consider how to maintain a “sovereign cloud”