AI Intelligence Brief

Sat 27 June 2026

Daily Brief — Curated and contextualised by Best Practice AI

186Articles
Editor's pickSummary

Washington vets OpenAI clients, capital flees India, and engineers debug ghosts

TL;DRThe Trump administration is now mandating government approval for all new customers of OpenAI and Anthropic's most powerful models. Global capital is shifting away from emerging markets like India toward Korea and Taiwan to chase the $110 billion AI ecosystem. Meanwhile, researchers report that prompt-composed agentic systems suffer from 'instruction bleed,' where editing one module silently corrupts others. Microsoft is simultaneously abandoning flat subscriptions for usage-based billing to capture the value of AI-driven labor displacement.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

12 of 186 articles
Lead story
Editor's pickPAYWALLGovernment & Public Sector
Daily Brew· Yesterday

U.S. government will decide who gets to use GPT-5.6

The U.S. government is set to oversee access to OpenAI's latest AI model, GPT-5.6, as part of new regulatory oversight.

Editor's pickTechnology
Azeem Azhar· Yesterday

AI Ecosystem Revenue Hits $110 Billion, Outpacing Historical IT Growth Cycles

The AI ecosystem has generated $110 billion in revenue, growing three times faster than previous mobile or internet waves. While enterprise adoption is scaling, executive sentiment remains heavily tied to successful AI integration.

Editor's pickPAYWALLTechnology
NYT· Yesterday

How a Niche Technology Became a Choke Point for A.I.

Advanced chip packaging, which boosts computing power for artificial intelligence, has made the United States more reliant on Taiwan than ever.

Editor's pickFinancial Services
Arxiv· Today

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

arXiv:2606.26173v1 Announce Type: new Abstract: Recent work shows that Large Language Models (LLMs) can act as semantic mutation operators for the evolutionary discovery of programs and proofs. Most current applications focus on static coding benchmarks. We extend this paradigm to algorithmic trading. This domain is uniquely challenging because it is noisy, non-stationary, and highly discontinuous. We present AlgoEvolve, an LLM-driven evolutionary framework that generates, evaluates, and iteratively improves executable trading strategies. These strategies are expressed as Python code and evaluated through a rigorous testing protocol. Across multiple experiments, the system exhibits emergent regime-adaptive strategy logic, including autonomous shifts in trading rules. We further introduce a meta-evolutionary outer loop that evolves the prompts guiding program synthesis in the inner loop. This outer loop discovers improved search heuristics. These heuristics balance exploration and exploitation while reducing zero-trade failures. They consistently outperform initial human-designed instructions. The results demonstrate that LLM-based semantic evolution provides a viable approach for continual program synthesis in complex environments.

Editor's pickTechnology
The Economy· Yesterday

“From Subscriptions to Usage-Based Billing”: After Anthropic and OpenAI, Microsoft Revamps Pricing as the Era of ‘AI Maxxing’ Nears | The Economy

The pricing structure for generative artificial intelligence (AI) models is undergoing a fundamental transformation. As more enterprise customers use AI agents under subscription plans costing only a few dozen dollars per month to perform workloads equivalent to those handled by a full-time ...

Editor's pickDefense & National Security
Arxiv· Today

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

arXiv:2606.26298v1 Announce Type: new Abstract: Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production software deployment. This paper observes that human institutions have governed powerful autonomous actors not by monitoring their reasoning but by requiring independently attested evidence at the point of consequential action. We formalise this institutional pattern as a computational governance model for AI agent systems. Under the proposed model, an agent retains full autonomy over planning and reasoning but holds no execution authority over designated high-risk actions. Execution is conditional on preconditions that are each independently attested by a separate authoritative source, cryptographically bound to a declared intent, and evaluated by a deterministic policy. Decisions are recorded in a tamper-evident log amenable to independent re-verification. We present a proof-of-concept implementation and illustrate the model with examples from software deployment and clinical prescribing.

Editor's pickProfessional Services
Arxiv· Today

Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

arXiv:2606.26356v1 Announce Type: new Abstract: Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.

Editor's pickFinancial Services
Livemint· Yesterday

Foreign investors exit India funds as AI boom redirects capital. Will inflows return? | Stock Market News

Foreign investors pulled money from India-focused funds in the March quarter as capital shifted toward AI-linked markets such as Korea and Taiwan, even as some investors see early signs of macro stabilization in India.

Editor's pickTechnology
Arxiv· Today

Accelerating Returns and the Qualitative Engine for Science

arXiv:2606.26359v1 Announce Type: new Abstract: Ray Kurzweil described a thesis of accelerating returns, which is the most influential narratives in discussions of technological progress. Its central claim is that advances in multiple technological fields, especially compute, artificial intelligence, brain science, and biotechnology, interact in such a way that progress becomes self-amplifying and approximately exponential. This paper gives a simple mathematical interpretation of that claim and then argues that, even if such acceleration is real, it does not by itself resolve the central problem of scientific discovery. The reason is that accelerating returns apply most naturally to executional and infrastructural capability, whereas genuine discovery often depends on a different capacity: qualitative reasoning about when a current framework is structurally inadequate and what conceptual move is needed next. Recent ARC-AGI-3 results sharpen this distinction: humans solve the benchmark at ceiling, whereas frontier AI systems remain below 1%, indicating that the gap between current AI and human flexible reasoning is still very large. At the same time, Demis Hassabis has emphasized that humans must retain their sense of meaning and what they choose to focus their lives on, a reminder that the future of AI is not only a technical forecast but also a question of what forms of human understanding are worth preserving and transmitting. This paper positions the Qualitative Engine for Science (QES) [3] as a response to that missing capacity. In this view, the Kurzweil theory helps explain why quantitative capability may accelerate, while QES addresses the central problem in scientific discovery that acceleration alone does not solve. Its value does not depend on when AGI arrives, but on the fact that the processes of scientific discovery themselves constitute a form of human wisdom worth preserving, organizing, and making accessible.

Editor's pick
Daily AI News June 26, 2026: AI Startups Are Coming With Two-Thirds Fewer People· Yesterday

AI-Native Firms

This paper explores how AI-native companies are scaling revenue and operations with leaner teams. It highlights how agentic AI lowers the threshold for a minimum viable company.

Editor's pickEnergy & Utilities
Tech Times· Yesterday

AI Data Center Water Use Is Not Solved: Nvidia's Cooling Fix Stops at the Walls

AI data center water use remains unsolved despite Nvidia’s June 2026 DSX closed-loop cooling breakthrough, which cuts on-site water consumption to near zero but leaves fossil fuel power plant water demand — roughly 54% of AI’s total projected water footprint through 2050 — entirely ...

Editor's pickTechnology
Arxiv· Today

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

arXiv:2606.26300v1 Announce Type: new Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow; and verification must co-evolve with the generator.

Economics & Markets

44 articles
AI Investment & Valuations18 articles
Editor's pickTechnology
Azeem Azhar· Yesterday

AI Revenue Growth Defies Expectations Amid Sustained Data Center Capital Expenditure

AI revenue growth continues to outpace forecasts, signaling sustained demand for compute infrastructure. This trend suggests that capital expenditure in data centers is increasingly translating into tangible market returns.

Editor's pickTechnology
Azeem Azhar· Yesterday

AI Ecosystem Revenue Hits $110 Billion, Outpacing Historical IT Growth Cycles

The AI ecosystem has generated $110 billion in revenue, growing three times faster than previous mobile or internet waves. While enterprise adoption is scaling, executive sentiment remains heavily tied to successful AI integration.

Editor's pickPAYWALLTechnology
NYT· Yesterday

Apple, Micron, OpenAI and A.I.’s Rough Summer

Rising memory prices, more expensive iPads and a longer wait for OpenAI to go public: The sector that has driven markets skyward is hitting turbulence.

Editor's pickPAYWALLFinancial Services
Bloomberg· Yesterday

AI Trade’s Bruising Week Forces Investors to Be More Selective

Adam Dell, founder and CEO at Domain Money joined Bloomberg Businessweek Daily to discuss the intersection of fintech, VC, and early stage startup mechanics and building a tech business. (Source: Bloomberg)

Editor's pickPAYWALLTechnology
FT· Yesterday

SpaceX bonds sell off days after AI and rocket group’s $25bn debt deal

Yields move towards levels commonly associated with junk-rated companies

Editor's pickPAYWALLTechnology
FT· Yesterday

S&P 500 notches longest losing streak in 10 months as chipmakers slide

Software stock-led recovery proves fleeting as Wall Street’s main indices fall for fifth consecutive session

Editor's pickTechnology
Siliconrepublic· Yesterday

NYT: OpenAI mulls delaying IPO over valuation concerns

CEO Sam Altman reportedly does not want OpenAI to be valued at less than $1trn at IPO. Read more: NYT: OpenAI mulls delaying IPO over valuation concerns

Editor's pickTechnology
Business Insider· Yesterday

Enterprise AI Spending Grows, OpenAI Leads, RBC Reveals - Business Insider

RBC's latest CIO survey finds no AI token panic, no SaaSpocalypse, and surging enterprise AI spending, led by OpenAI.

Editor's pickFinancial Services
Whalesbook· Today

AI Bubble Fears Hit Global Markets: What India's Macro Upgrade Means | Whalesbook

Global AI stocks face a sell-off amid bubble fears, while India's GDP forecast rises to 6.8%. Here is what investors need to know about this market shift.

Editor's pickFinancial Services
Livemint· Yesterday

India can still build AI winners despite US, China lead, says Accel's Subrata Mitra | Company Business News

Accel is deploying capital across AI, consumer, fintech and manufacturing from its $650 million eighth India fund launched in January 2025.

Editor's pickTechnology
Artificial Intelligence Newsletter | June 26, 2026· 2 days ago

Onsemi to acquire Synaptics in all-stock deal, gives Synaptics $7bn enterprise value

Onsemi has agreed to acquire Synaptics in an all-stock deal valued at $7 billion, aiming to extend its capabilities into intelligent systems through Synaptics' Edge AI compute franchise.

Editor's pickTechnology
Daily Brew· Today

ON Semiconductor to Acquire Synaptics in $7 Billion Deal to Boost AI Market Reach

ON Semiconductor is set to acquire Synaptics in a $7 billion all-stock transaction to bolster its AI capabilities and expand its market potential to $30 billion by 2030.

Editor's pickTechnology
BitcoinWorld· Yesterday

Apple Stock Dips As Rising Chip Costs Pressure AI-Driven Growth

Apple shares fall as rising chip costs pressure the AI trade. Analysis of market implications and supply chain challenges for tech investors.

Editor's pickTechnology
Crypto Briefing· Yesterday

KOSPI plunges as Samsung and SK Hynix lead chipmaker rout amid AI sentiment shift

The company makes high-bandwidth memory chips, the kind that AI data centers consume voraciously. Its ascent to Korea’s most valuable company, leapfrogging Samsung, was a testament to just how much money was flowing into AI hardware bets. Margin debt across the Korean market had reached a record 38.5 trillion won earlier in June. South Korea’s semiconductor sector sits at the heart of the AI supply chain...

Editor's pickTechnology
Intellectia.AI· Today

AI Semiconductor Boom 2026: Top Chip Stocks Driving the $1.3T Market

The AI semiconductor boom is driven by massive capital expenditure from hyperscale cloud providers building AI infrastructure. Companies like Microsoft, Google, Amazon, and Meta are investing hundreds of billions in data centers for AI training and inference. Bank of America projects the global semiconductor market will reach $1.3 trillion in 2026...

Editor's pickFinancial Services
Livemint· Yesterday

Foreign investors exit India funds as AI boom redirects capital. Will inflows return? | Stock Market News

Foreign investors pulled money from India-focused funds in the March quarter as capital shifted toward AI-linked markets such as Korea and Taiwan, even as some investors see early signs of macro stabilization in India.

Editor's pickFinancial Services
Cointribune· Yesterday

Billionaire Jeremy Grantham Predicts A Major Tech Market Correction

For Jeremy Grantham, the euphoria surrounding AI signals a major crash, while bitcoin is doomed to disappear.

Editor's pickTechnology
Eciks· Yesterday

As markets deliver positive returns through mid-2026, investors are turning to tech stocks and AI-driven themes to build wealth this year.

Cloud companies plan to spend over $670 billion on AI infrastructure in 2026, driving a historic data center buildout.

AI Market Competition10 articles
Editor's pickTechnology
Substack· Today

AI: Blip 2.0++, OpenAI IPO now 2027, 'RAMAgeddon' reaches Apple. AI-RTZ #1130

Nadella reframes AI as job reorganization over elimination, and says the industry still has to “earn the social permission.” My take: this is the Frenemies dynamic in full color — Microsoft wants to be far less dependent on a handful of frontier model companies (beyond Nvidia too), and it’s carving out an ‘us vs them’ position with the customers right ahead of the Open AI and Anthropic mega-IPOs.

Editor's pickMedia & Entertainment
News-articles· Yesterday

Adobe's Strategic Expansion of Ecosystem Accessibility

Adobe uses generative AI to expand accessibility for novices while maintaining professional quality standards to compete with tools like Canva and AI startups.

Editor's pickTechnology
Reuters· Yesterday

Reuters Tech News | Today's Latest Technology News | Reuters

Senior research scientist John Jumper said on Friday he would leave Google DeepMind to join AI startup Anthropic, the latest high-profile departure ​at the Big Tech giant's AI lab.

Editor's pickTechnology
eTeknix· Yesterday

AMD EPYC Venice Could Ship More Units Than NVIDIA Vera CPUs in 2027, Says Morgan Stanley - eTeknix

The race for AI hardware is no longer focused only on GPUs. According to a Morgan Stanley report, AMD could ship more EPYC Venice CPUs than NVIDIA ships Vera CPUs in 2027. The report estimates that AMD will ship around 6.75 million EPYC Venice processors, while NVIDIA is expected to ship about ...

Editor's pickTechnology
Artificial Intelligence Newsletter | June 26, 2026· 2 days ago

New York Times proposes third amended complaint against OpenAI, Microsoft

The New York Times is requesting to file a third amended copyright complaint against OpenAI and Microsoft, featuring expanded contributory infringement claims regarding AI training services.

AI Productivity4 articles
Editor's pickTechnology
VentureBeat· Yesterday

Most companies think they're building a software factory. They're actually just shipping bugs faster.

Industrialized factories changed how the world produced physical goods: more output, lower costs, faster than anything that came before. Now a similar shift is happening with software.  LLMs have lowered the barrier to writing code, increased individual output, and pushed organizations to think about software development as a production system. The standard software development lifecycle and CI/CD practices that have held for decades won't hold up under that pressure. That's where the software factory comes in — and like physical factories, it needs more than speed to actually work. The idea of a “software factory” started to solidify over the past year. Luca Rossi's "The Era of the Software Factory" made the case plainly: AI is not just changing how fast people write code — it's changing the whole production system around software.  The concept can mean different things: a collection of coding agents and skills files; faster CI/CD; better review systems; or more automation around software delivery. A better frame is to think of it less as a tool category and more as a set of principles. A software factory can't just be a loose collection of prompts, agents, and plugins. It needs a platform that defines how work moves through the system and how code is generated, reviewed, tested, traced, deployed, and improved when something goes wrong. Otherwise all you’re doing is putting yet another one-off machine into an empty room and calling it a factory.  Why is this happening now? There are a few forces all hitting at the same time. Companies have always wanted more software than engineers can produce. That’s why tools like Excel exist: They often fill in the gap for a lot of the software that many companies wish they could make. AI has also lowered the barrier of entry to creating code, and this is the part everyone focuses on. Code creation is now easier, though not always cheaper or better, as evidenced by many high-profile companies fretting over their high AI bills. The barrier to writing functional code has effectively collapsed. More importantly, a single engineer can generate more code than they could just a few years ago. That changes the bottleneck: it’s no longer “How fast can someone write this?” or even, in some cases, “Can someone understand how to code?” Instead it becomes, “Should this be written?”  More importantly, can we actually create end products that are durable and reliable and don’t just build tech debt? Or are we just putting out more AI slop faster than ever? That’s where the danger lies.  The dangers of the modern software factory All of this sounds great. Factories, after all, made production faster and more consistent.  They made it possible to build more cars and products, less expensively, which led to more people being able to afford cars and products. Putting environmental impacts aside, you could argue this was positive. But like many things in engineering, there are always tradeoffs, and in this case, there are new risks. When you increase the output of one person with machinery, digital or otherwise, you also increase the mistakes that can be made either by the individual or the machinery. The speed at which code can now be put out is on an industrial scale. Even smaller organizations can suddenly have code bases ballooning up to the size of tech company code bases a decade ago.  The data is already showing problems. Faros AI found that while task throughput per developer is up 33.7% and PR merge rate is up 16.2%, the incidents-to-PR ratio has risen 242.7% and bugs per developer are up 54%. Google’s DORA research found that more AI adoption was actually associated with worse delivery stability.  As a fractional head of data, I've been brought in to fix these exact issues. In the past year alone, I've worked on two projects where AI-generated data infrastructure slowly started to morph over time. Between multiple engineers trying to move quickly and a lack of standards, these projects became unruly. Code bases tend to go through some level of evolution, but as different styles blend, the LLMs in turn start to create their own mutations. Codebases developed five to six different styles within months — a process that previously took years. Layer by layer, the engineers would slowly stop understanding exactly what was going on. The pattern echoes what happened a decade ago with self-service tooling: early productivity gains that masked downstream complexity. And that’s why the software factory can’t just be about speed.  What makes a software factory work There are several key principles to consider when building a software factory. Platform over tools: Many teams are slowly implementing AI into their coding workflows at the edges — adding a PR review agent or a skills file into their repos. But building an actual software factory requires a platform, not a collection of tools at the edges. A platform provides a unified foundation where tools aren't scattered in separate corners. Instead, they actively share data, talk to each other, and work as a single cohesive system — standards, processes, and the work itself all connected.  Rerunability and traceability: A real platform requires the ability to go back into any run, identify what went wrong, and rerun it — which is why one-off agents don't make a factory. The system needs to support taking a serial ID, looking it up, and tracing exactly how it got to the output it produced. This is why state machines make more sense than loops for AI workflows: they make it far easier to rerun a process and understand what happened at each step. Safety and guardrails: Factories are not safe places. Neither is a software factory. As more people develop on these platforms, better guardrails and safety measures need to be built in. Testing and quality control need to be pushed to the front of the process — catching bugs at the lowest possible stage reduces the cost to fix them and limits the blast radius. Standardization: At the enterprise level, every codebase has its own flavor. Layering a code assistant on top without standards produces an amalgamation of styles. Standardization has to be built into the process from the start. Quality control: In older manufacturing models, quality control happened at the end of the line. The product was built, inspected, defects found, and fixed later. Toyota's approach was different. Quality was pushed into the process itself — workers were expected to stop the line when something was wrong. The goal wasn't to catch defects at the end; it was to prevent them from flowing downstream in the first place.  The same is true for the software factory. QC needs to be baked into the entire process, starting with how the spec is written. That means integrating static code analysis that catches obvious errors and providing templates to LLMs so they know the structure the code should follow. Without that, the bottleneck becomes the final review — or teams just push out more AI slop. Speed without quality isn't productivity Improving the speed of your code output is not actual productivity if the downstream issues aren’t managed. A company is not more productive because it produces millions of cars, only to see them all fall apart within 100 miles. It’s also not more productive if all it does is produce an endless stream of proofs-of-concept that never enter production.  Actual productivity is when the software factory takes ephemeral tokens and turns them into durable outputs. It's easy to talk about lines of code and how much faster your team is moving. The software factory that wins isn't the one that generates the most code. It's the one that generates the fewest defects downstream.

Editor's pickProfessional Services
Forbes· Yesterday

Council Post: How AI Is Changing The Economics Of Compliance

When AI can complete a security questionnaire in hours that previously required days of analyst time, the unit economics of a compliance engagement begin to invert.

Labor, Society & Culture

21 articles
AI & Employment12 articles
Editor's pickEducation
Daily Brew· Yesterday

David Autor named head of the Department of Economics

MIT has appointed economist David Autor as the new head of its Department of Economics.

Editor's pickPAYWALLProfessional Services
FT· Yesterday

The AI factory: the rewiring of India's tech industry

Outsourced data services may not be enough for India to thrive in the AI era

Editor's pickProfessional Services
Top Daily Headlines: Infosys boss says vibe coding is no threat because there’s more to writing software than writing software· Yesterday

Infosys boss says vibe coding is no threat because there’s more to writing software than writing software

Despite warnings of revenue deflation, the chairman predicts AI will create more work rather than less for services organizations.

Editor's pickTechnology
eWeek· Yesterday

2026 Layoffs Tracker: Meta, Robinhood, Walmart, and Oracle Lead AI-Driven Job Cuts | eWeek

Tech layoffs are accelerating as companies invest heavily in AI, raising questions about automation, over-hiring, and corporate accountability.

Editor's pick
The Hindu BusinessLine· Yesterday

AI unlikely to trigger 'Job Apocalypse', it may create uneven workforce disruption: Goldman Sachs Report - The HinduBusinessLine

Goldman Sachs report suggests AI will reshape labor markets, displacing some jobs but also creating new opportunities over time.

Editor's pick
ETEnterpriseai.com· Yesterday

Goldman Sachs Report: AI Won't Cause Job Apocalypse, But Workforce Disruption is Inevitable, ETEnterpriseai

Making AI Work: A new Goldman Sachs report assures that while an 'AI job apocalypse' is unlikely, significant changes in the labor market are expected, with both job displacement and new opportunities arising over time.

Editor's pickEducation
EdTech Innovation Hub· Yesterday

RAISE US launches AI workforce initiative | ETIH EdTech News — EdTech Innovation Hub

AI skills and workforce training gain more than $500 million in commitments as Gina Raimondo and Eric Holcomb launch RAISE US. ETIH edtech news covers state pilots, apprenticeships, worker transitions, and support from Microsoft, Amazon, Anthropic, and the OpenAI Foundation.

Editor's pickTechnology
Dark Reading· Yesterday

AI Won't Wipe-Out Entry-Level Cybersecurity Jobs

Instead of eliminating jobs for early-career cyber pros, AI is creating new opportunities for candidates with strong human decision-making skills.

Editor's pickTechnology
Information Week· Yesterday

CIO guide: Balancing the board's AI hype and employee pushback

How can CIOs deliver on board and C-suite expectations while employees worry about the impact of AI? Learn how CIOs are measuring employee sentiment.

Editor's pickProfessional Services
PR Newswire· Yesterday

India's technology services sector will continue to grow in the AI era: Nasscom US CEO Forum

/PRNewswire/ -- The technology services sector in India will continue to remain central to global enterprises transformation in the AI era. AI does not reduce...

Editor's pickProfessional Services
The Manila Times· Yesterday

Artificial intelligence won't replace humans, only their jobs, says IBM executive | The Manila Times

ARTIFICIAL intelligence may be advancing at a pace that is reshaping entire industries, but it is not displacing the need for human judgment, accountability or domain expertise. Instead, it is reorganizing the structure of work itself, according to Arun Biswas, Global AI and Sustainability ...

Editor's pick
Washington Examiner· Yesterday

Trap is set: Job market is about to get crushed if Labor Department doesn't act

The impact of artificial intelligence on the workforce is more on how the technology shapes their hours, pay, and, by extension, their quality of life.

Technology & Infrastructure

57 articles
AI Agents & Automation10 articles
Editor's pickFinancial Services
Arxiv· Today

OpenFinGym: A Verifiable Multi-Task Gym Environment for Evaluating Quant Agents

arXiv:2606.26350v1 Announce Type: new Abstract: Although large language model agents are increasingly applied to quantitative-finance workflows, their evaluation remains fragmented across isolated tasks, while the financial relevance of benchmark tasks is often overlooked. Yet financial workflows are inherently multi-stage, spanning interdependent tasks such as forecasting, strategy construction, risk management, and trading. Existing platforms typically focus on a single task, and can therefore overstate agent competence and fail to reveal weaknesses in generalization, real-market interaction, and financially meaningful decision-making. We introduce OpenFinGym, a unified gym environment for quantitative-finance agent development that covers forecasting, market generation, real-time trading, and fraud detection under a single execution and verification interface. OpenFinGym additionally provides an automated task-construction pipeline that turns quantitative finance publications into executable task packages; a containerised runtime with a host-side verifier service that supports scalable agent rollouts and prevents runtime train-test leakage; a paper trading engine with a low-latency data-stream design; deferred-resolution support for long-horizon and event-market forecasts; and integration for SFT and RL post-training

Editor's pickManufacturing & Industrials
Robotics & Automation News· Yesterday

AI agents and business process automation beyond the factory floor

Automation is expanding beyond manufacturing into finance, HR, procurement, and customer operations. Learn how AI agents are transforming business process automation and enterprise workflows.

Editor's pick
Daily AI News June 26, 2026: AI Startups Are Coming With Two-Thirds Fewer People· Yesterday

How agents are transforming work

OpenAI discusses the shift from chat-based assistance to long-horizon workflows where agents execute tasks across engineering and knowledge work.

Editor's pickProfessional Services
KDG· Yesterday

What Business Leaders Need to Know About AI Agent Observability

Explore the role of AI agents in organizations and the critical need for observability to ensure responsible AI adoption.

Editor's pickConsumer & Retail
TechRadar· Yesterday

Know your agent: building the foundation of autonomous commerce | TechRadar

Identity verification and "Know Your Agent" protocols are essential for secure AI commerce

Editor's pickFinancial Services
Arxiv· Today

AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs

arXiv:2606.26173v1 Announce Type: new Abstract: Recent work shows that Large Language Models (LLMs) can act as semantic mutation operators for the evolutionary discovery of programs and proofs. Most current applications focus on static coding benchmarks. We extend this paradigm to algorithmic trading. This domain is uniquely challenging because it is noisy, non-stationary, and highly discontinuous. We present AlgoEvolve, an LLM-driven evolutionary framework that generates, evaluates, and iteratively improves executable trading strategies. These strategies are expressed as Python code and evaluated through a rigorous testing protocol. Across multiple experiments, the system exhibits emergent regime-adaptive strategy logic, including autonomous shifts in trading rules. We further introduce a meta-evolutionary outer loop that evolves the prompts guiding program synthesis in the inner loop. This outer loop discovers improved search heuristics. These heuristics balance exploration and exploitation while reducing zero-trade failures. They consistently outperform initial human-designed instructions. The results demonstrate that LLM-based semantic evolution provides a viable approach for continual program synthesis in complex environments.

Editor's pickProfessional Services
Arxiv· Today

Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems

arXiv:2606.26356v1 Announce Type: new Abstract: Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.

Editor's pickTechnology
Ethan Mollick· Yesterday

AI Coding Agents Demonstrate Significant Gains in Engineering Productivity

Recent experiments show AI models completing complex software engineering tasks in hours that previously required weeks of human labor. These results indicate a rapid acceleration in automation capabilities for technical roles.

Editor's pickTechnology
VentureBeat· Yesterday

New agentic memory framework uses 118K tokens per query. LangMem burns through 3.26M.

Long-horizon reasoning exposes a core weakness in AI agents: context windows fill up fast, and retrieval pipelines return noise instead of signal. To solve this, researchers at the National University of Singapore developed MRAgent, a framework that abandons the static "retrieve-then-reason" approach. Instead, it uses a mechanism that allows an agent to dynamically develop its memory based on accumulating evidence.  This multi-step memory reconstruction is integrated into the reasoning process of the large language model (LLM). While not the only framework in this space, MRAgent significantly reduces token consumption and runtime costs compared to other agentic memory management approaches. The limits of passive retrieval in long-horizon tasks In classic retrieval pipelines, documents are retrieved through vector search or graph traversal and passed on to an LLM for reasoning. This passive approach fails because it cannot combine reasoning with memory access, creating three major bottlenecks: These systems cannot revise their retrieval strategy mid-reasoning. If an agent fetches a document and discovers a crucial missing cue — a specific date or person — it has no way to issue a new query based on that finding. Fixed similarity scores and predefined graph expansions return surface-level matches that flood the LLM's context window with irrelevant noise, degrading reasoning. Current systems rely heavily on pre-constructed structures such as top-k results and static relevance functions, limiting the flexibility required to scale across unpredictable, long-horizon user interactions. The researchers argue that to overcome these limitations, developers must shift toward an “active and associative reconstruction process,” a concept inspired by cognitive neuroscience.  Under this paradigm, memory recall unfolds sequentially rather than operating as a passive read-out of a static database. The system starts with small, specific triggers from the user's prompt, such as a person's name, an action, or a place. These initial hints point to connecting concepts or categories instead of massive blocks of text.  By following these metadata stepping stones, the agent gathers small pieces of evidence one by one. It uses each new piece of information to guide its next step until it successfully pieces together the full, accurate story. How MRAgent implements active memory reconstruction Instead of viewing memory as a static database, MRAgent (Memory Reasoning Architecture for LLM Agents) treats it as an interactive environment. When processing a complex query, the agent uses the backbone LLM’s reasoning abilities to explore multiple candidate retrieval paths across a structured memory graph.  At each step, the LLM evaluates the intermediate evidence it has gathered and uses it to iteratively optimize its search. It infers new search constraints, pursues the paths with the best information, and prunes irrelevant branches. This allows MRAgent to piece together deeply buried information without filling the LLM’s context with noise. To make this active exploration computationally efficient and scalable, the framework organizes its database using a “Cue-Tag-Content” mechanism. This operates as a multi-layered associative graph with three node types: Cues: Fine-grained keywords, such as entities or contextual attributes extracted from user interactions. Content: The actual stored memory units. These are divided into multi-granular layers, such as episodic memory for concrete events and semantic memory for stable facts and user preferences. Tags: Semantic bridges that summarize the relational associations between specific Cues and Content. This structure enables a highly efficient two-stage retrieval process. The LLM first navigates from Cues to candidate Tags. Because Tags explicitly expose the semantic relationships and structural associations of the data, the agent evaluates these short summaries to judge their relevance. The LLM identifies promising traversal paths and discards irrelevant branches before spending compute and prompt tokens to access the detailed, heavy memory contents. For example, a user might ask an AI agent, "How did Nate use the prize money when he won his third video game tournament?" MRAgent first extracts fine-grained starting cues from the prompt, such as "Nate," "video game tournament," and "win." The agent maps these initial cues to the memory graph and looks at the available associative Tags connected to them. The agent sees tags like "Tournament Victory" and "Tournament Participation.” Since it is only concerned with what the person did after they won the championship, MRAgent drops the tournament participation tag and pursues the victory tag. The agent retrieves the episodic content linked to the chosen Cue-Tag pair, retrieving three distinct memory episodes where Nate won a tournament. MRAgent looks at the three memories, decides one of them in particular is relevant to the query, and discards the other two. With this information, it updates its cues and starts another round of discovery and pruning. From the new episodic memory it has retrieved, the agent adds “tournament earnings” to its cues and uses that to traverse new tags and home in on new memories. It repeats this process until it gathers enough information to answer the query, which could be something like “Nate saved the money.” MRAgent performance on industry benchmarks MRAgent operates alongside several other frameworks addressing agentic memory building. Alternatives include A-MEM, a graph-based agentic memory framework, and MemoryOS, a hierarchical memory framework. Other persistent memory frameworks include LangMem and Mem0. The researchers tested MRAgent on the LoCoMo and LongMemEval industry benchmarks. These test the abilities of agents to resolve queries on long-horizon tasks and conversations across dozens of sessions and hundreds of turns of dialogue. The backbone models used were Gemini 2.5 Flash and Claude Sonnet 4.5. The system was tested against standard RAG, A-MEM, MemoryOS, LangMem, and Mem0.  MRAgent consistently outperformed every baseline across both models and all question types by a significant margin.  However, for enterprise developers, the most critical metric is often computational cost. In the LongMemEval tests, MRAgent slashed prompt token consumption to just 118k per sample. By comparison, A-Mem consumed 632k tokens, and LangMem burned through 3.26 million tokens per query. MRAgent also effectively halved the runtime compared to A-Mem, dropping from 1,122 seconds to 586 seconds. What makes MRAgent efficient in practice is its on-demand behavior. Evaluating tags and pruning irrelevant paths before retrieval saves money and context space. Furthermore, the system autonomously evaluates its accumulated context and inherently knows when to stop searching, completely avoiding redundant data exploration. Implementation and development catch While MRAgent is highly effective, the Cue-Tag-Content structure needs to be prepared before the agent can query it. Developers must figure out how to architect the underlying memory database to enable the LLM to efficiently navigate associative items and prune irrelevant paths without exploding compute costs. Fortunately, developers do not have to manually label or structure this data. The authors designed MRAgent with an automated distillation pipeline that uses LLMs to process raw interaction histories and automatically populate the memory graph. For a developer, the job is to implement and orchestrate this automated ingestion pipeline, rather than manually tag data. You need to set up a background job or streaming pipeline that passes raw user interactions through prompt templates to extract this metadata before storing it in your graph database. However, the authors emphasize that this is a lightweight construction phase and MRAgent intentionally keeps ingestion simple.  The authors have released the code on GitHub.

Editor's pickTelecommunications
Theregister· Yesterday

ZTE showcases practical path to Level-4 autonomous networks through agentic AI and cross-domain innovation at DTW Ignite 2026

PARTNER CONTENT: Highlighting Level-4 autonomous network solutions, TM Forum Excellence Award finalists, and joint operator trials powering cross-domain fault management and Dynamic 5G Slicing

AI Infrastructure & Compute15 articles
Editor's pickTechnology
Fortune· Yesterday

Meet Micron, the under-the-radar chipmaker that just reported a 346% sales surge and helped stop a global AI selloff

The company behind the memory chips powering AI started in a dentist’s basement in Boise.

Editor's pickTechnology
Reuters· Yesterday

Micron overtakes Meta, Tesla in market value amid relentless AI infrastructure demand

The Australian airline studied details from nutrition and ergonomics to movement and light.

Editor's pickEnergy & Utilities
Fox Baltimore· Yesterday

The AI build-out is driving prices higher for consumers

The race to build up the infrastructure behind artificial intelligence is creating a new source of inflation.

Editor's pickEnergy & Utilities
BitcoinWorld· Yesterday

Bitcoin Miners Emerge As Grid Flexibility Tool Amid AI Power Surge, Bloomberg Reports

The rapid expansion of AI infrastructure is placing unprecedented strain on regional power grids, leading to longer interconnection queues and rising electricity prices. Bitcoin miners, which have historically been criticized for their energy consumption, are now being recognized for the grid services they can provide. By participating in demand ...

Editor's pickEnergy & Utilities
Energy Magazine· Yesterday

Capgemini: Utilities Cannot Predict the Energy Demand of AI | Energy Digital

New research from Capgemini reveals ... forecast the energy demand created by the rapid expansion of AI-driven data centres. The report found that 77% of utilities are struggling to predict this demand. Electricity consumption from AI training and inferencing is set to rise ...

Editor's pickTechnology
PCMAG· Yesterday

MacBooks Are the Latest Victim of the Memory Shortage. Here's Why Laptop Prices Keep Rising | PCMag

Memory costs have exploded in 2026, leading Apple and other computing giants to jack up laptop prices. Against that backdrop, here’s how to save on a new PC this year.

Editor's pickPAYWALLTechnology
NYT· Yesterday

How a Niche Technology Became a Choke Point for A.I.

Advanced chip packaging, which boosts computing power for artificial intelligence, has made the United States more reliant on Taiwan than ever.

Editor's pickTechnology
AI Insider· Yesterday

Amazon Commits Additional $13B to India AI and Cloud Infrastructure Through 2030

Amazon has announced a further $13 billion investment in India’s AI and cloud infrastructure, bringing its total committed spending in the country to $48

Editor's pickTechnology
Daily Brew· Today

KAYTUS Launches Gigawatt-Scale Prefab AI Data Center Solution, Slashes Deployment Time by 60%

KAYTUS has launched an innovative gigawatt-scale AI data center solution at ISC 2026, offering rapid deployment with prefabricated modules.

Editor's pickEnergy & Utilities
SourceTrail· Yesterday

AI Infrastructure Expansion: Energy, Investment & Local Impact

This gap in legislation has led ... growing demand for advanced cooling technologies, such as liquid cooling, which can reduce noise levels but significantly increase the cost of construction. As the industry matures, the focus is shifting toward a more integrated approach that considers infrastructure as a vital public utility. Experts suggest that the future of AI will depend on the ability of governments and private sectors to coordinate on energy grids, water ...

Editor's pickTelecommunications
Theregister· Yesterday

Jiangsu's first AI-powered 10 Gbps all-optical campus network launched at Southeast University

PARTNER CONTENT: Integrating 50G-PON, FTTR-B, Wi-Fi 7, and intelligent AI scheduling to deliver 10 Gbps bidirectional speeds with ultra-low 0.1ms latency across Southeast University

Editor's pickTechnology
ETF Trends· Yesterday

Physical AI and Infrastructure

Excitement around AI software and large language models (LLMs) remains high in 2026. However, the emerging focus shifted toward infrastructure.

Editor's pickTechnology
Substack· Yesterday

AI to ROI: News & Analysis - June 26, 2026

Jalapeño does not perform AI model training. Open AI still depends on Nvidia GPUs for that. But inference is where Open AI spends the most compute at scale, serving hundreds of millions of requests for ChatGPT and Codex daily.

Editor's pickTechnology
OpenPR· Yesterday

Building Europe's AI Factories: How Nebius, Azur Datacenters and Inflect Are Reshaping Northern France's Digital Infrastructure

Image http www abnewswire com pressreleases wp content uploads 2026 06 2026 After two years dominated by investment announcements sovereign AI initiatives and multi billion dollar capacity plans attention across the industry is shifting decisively towards execution The question is ...

Editor's pickTechnology
eWeek· Yesterday

Orbital Data Centers Aren't Ridiculous, But They Won't Save Us From Earth's AI Infrastructure Crunch | eWeek

Orbital data centers may eventually support AI and space-native workloads, but launch costs, cooling, maintenance, and timing keep Earth central for now.

AI Models & Capabilities16 articles
Editor's pickPAYWALLTechnology
Bloomberg· Yesterday

Anthropic’s Mythos 5 AI Model Cleared by US for Wider Use

Anthropic PBC won US approval to restore some access to its powerful Mythos 5 artificial intelligence model, after resolving Trump administration concerns about the technology’s potential threats to national security.

Editor's pickTechnology
Arxiv· Today

Accelerating Returns and the Qualitative Engine for Science

arXiv:2606.26359v1 Announce Type: new Abstract: Ray Kurzweil described a thesis of accelerating returns, which is the most influential narratives in discussions of technological progress. Its central claim is that advances in multiple technological fields, especially compute, artificial intelligence, brain science, and biotechnology, interact in such a way that progress becomes self-amplifying and approximately exponential. This paper gives a simple mathematical interpretation of that claim and then argues that, even if such acceleration is real, it does not by itself resolve the central problem of scientific discovery. The reason is that accelerating returns apply most naturally to executional and infrastructural capability, whereas genuine discovery often depends on a different capacity: qualitative reasoning about when a current framework is structurally inadequate and what conceptual move is needed next. Recent ARC-AGI-3 results sharpen this distinction: humans solve the benchmark at ceiling, whereas frontier AI systems remain below 1%, indicating that the gap between current AI and human flexible reasoning is still very large. At the same time, Demis Hassabis has emphasized that humans must retain their sense of meaning and what they choose to focus their lives on, a reminder that the future of AI is not only a technical forecast but also a question of what forms of human understanding are worth preserving and transmitting. This paper positions the Qualitative Engine for Science (QES) [3] as a response to that missing capacity. In this view, the Kurzweil theory helps explain why quantitative capability may accelerate, while QES addresses the central problem in scientific discovery that acceleration alone does not solve. Its value does not depend on when AGI arrives, but on the fact that the processes of scientific discovery themselves constitute a form of human wisdom worth preserving, organizing, and making accessible.

Editor's pickPAYWALLTechnology
FT· Yesterday

OpenAI releases GPT-5.6 to select users vetted by US government

San Francisco-based company announces ‘limited preview’ of new models with powerful cyber security capabilities

Editor's pickHealthcare
Nature· Yesterday

Evaluating the robustness and readiness of large frontier models in health AI applications | Nature Medicine

Large frontier models such as GPT-5 and Gemini have demonstrated remarkable performance in a wide range of health application benchmarks. However, underneath the seemingly promising results lie salient growth areas, especially in cutting-edge frontiers such as multimodal reasoning.

Editor's pickTechnology
Arxiv· Today

Detecting and Controlling Sycophancy with Cascading Linear Features

arXiv:2606.26155v1 Announce Type: new Abstract: Interpreting and controlling model behaviors through activation steering methods requires many pairs of contrastive samples that clearly exhibit desired or undesired behavior. These data pairs determine the degree to which interpretability frameworks can reliably detect model features responsible for a behavior, and therefore the ability to steer models toward or away from such behavior. In this work, we present an iterative data generation pipeline that isolates cascading linear features responsible for a behavior. Specifically, we show how moving beyond simple binary pairs of samples, and instead isolating samples that show degrees of features that scale linearly with behavior, allows for better disentanglement of features. We focus on detecting and steering away from sycophancy -- the tendency of language models to prioritize user validation. We demonstrate that sycophancy features discovered through cascading samples form linearly separable subspaces, and allow for selection of model activations that more clearly correspond to the desired behavior than baseline approaches. We also evaluate their ability to enable detection, deterministic scoring, and robust steering, and see that they either match or outperform LLM-as-a-judge and system prompting baselines while providing lower computational demand and more interpretability guarantees. Code & Data: https://cascading-feats.github.io/

Editor's pick
Arxiv· Today

What We are Missing in Multimodal LLM Evaluation?

arXiv:2606.26348v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process diverse inputs, e.g., text, images, audio, and video, and generate textual responses. While their capabilities have advanced rapidly, evaluation of such models has not kept pace. Most existing evaluation benchmarks are limited to isolated tasks and reveal little about whether a model integrates information across modalities. We examine current means for evaluating MLLMs and review the existing benchmark taxonomy to identify gaps, including temporal-spatial coherence, physical world understanding, multimodal consistency, and selective attention. Addressing these gaps is essential for measuring real progress in multimodal intelligence and exposing capability boundaries.

Editor's pickTechnology
VentureBeat· Yesterday

OpenAI unveils GPT-5.6 Sol, Terra and Luna models — but only accessible to limited preview partners for now, per US Gov

OpenAI is announcing a limited preview of its newest frontier AI model GPT-5.6 family, which comes in three variants: Sol, Terra, and Luna. Sol is for the hardest problems, such as complex coding and security research; Terra is for high-volume business tasks like customer support, internal tools and document analysis; and Luna is for faster, lower-cost everyday work like summarization, drafting and routine automation. Sol and Terra set new high benchmark scores, while Luna performs near GPT-5.5 levels on several tests despite being positioned as the fastest and lowest-cost model in the GPT-5.6 family. However, the models are being made available initially to a narrow set of approximately 20 total organizations, after OpenAI shared the models and release plans with the U.S. government. A general release is planned for "the coming weeks." The staggered release follows an executive order issued by President Donald J. Trump earlier this month on June 2, 2026, which calls upon various federal agencies to collaborate on a process for benchmarking and assessing capabilities of new AI models to ensure they are safe and appropriate for wide release. While this process remains underway (it was said in the order to take 30 days, so July 2), OpenAI says in its release blog post that it "previewed our plans and the models’ capabilities ahead of today’s launch. At [the U.S. government's] request, we are starting with a limited preview for a small group of trusted partners." OpenAI's limited preview release strategy also follows the drastic step taken by the U.S. government to issue an export control order against Anthropic, OpenAI's top U.S. competitor, over jailbreaks found in its most powerful generally released model, Claude Fable 5, to which Anthropic responded by removing any access to the model and its cybersecurity focused counterpart Claude Mythos 5 by public or private parties. (Anthropic had earlier previewed a prior version of the model as "Claude Mythos Preview" to a selected small number of external participants in its cybersecurity research program "Project Glasswing," dating back to April.) Because OpenAI is coordinating its release framework with the White House ahead of a broader public launch, enterprise buyers must navigate a novel landscape of real-time safety interventions, mandatory compliance parameters, and structured token caching systems. How the 3 new GPT-5.6 models differ: Sol vs. Terra vs. Luna The three GPT-5.6 models are designed to address different enterprise needs and performance profiles. Sol is the top-tier option, built for the most demanding tasks such as complex reasoning, extended coding sessions, advanced agent-driven workflows, and security-focused applications. Sol delivers the highest level of capability but comes at the highest price: $5.00 per million input tokens / $30.00 per million output tokens — the same as GPT-5.5 — and OpenAI says it delivers a major performance gain for long-running coding, cybersecurity and agentic tasks. Terra balances strong performance with efficiency. It is intended for large-scale production environments where organizations need reliable results across high volumes of work without the overhead of the most advanced model. It's available for $2.50/$15 per 1M tokens. Luna is the most lightweight and cost-efficient option, optimized for speed and everyday use cases. It is well suited for simpler tasks, routine workflows, and applications where responsiveness and scalability are more important than maximum depth of reasoning, and is the most affordably priced at $1/$6 per million tokens in and out, respectively. Sources with knowledge of OpenAI's inner workings shared with VentureBeat that the new naming scheme was designed to move away from the "nano" and "mini" variants of GPT-5, as these models are not so different in terms of size or raw intelligence, but rather, designed for different distinct use cases. As OpenAI states in its blog post about the new naming scheme: "In this new naming system introduced with GPT‑5.6, the number identifies a model’s generation, while Sol, Terra, and Luna identify durable capability tiers that can advance on their own cadence. Together, the family gives people and developers clearer choices across intelligence, speed, and cost." Also, sources said OpenAI sought to evoke a sense of inspiration by looking to the cosmos and names associated with it. Further, Sol fits well alongside OpenAI's Daybreak opt-in program for organizations interested in using OpenAI models to bolster cyber defense, which is an added bonus. The "Sol" voice style for OpenAI's voice mode on ChatGPT is unrelated, and will likely be renamed. The new GPT-5.6 system card adds another important point for businesses: OpenAI is classifying all three GPT-5.6 models — not just Sol — at its “High” risk level for both cyber and biological/chemical capability, while rating them below that level for AI self-improvement. That means even the cheaper Terra and Luna tiers may carry new governance obligations for companies using them in security, life sciences or other sensitive workflows. Here's how they stack up against the rest of the current leading LLM field in price — note that OpenAI's cheapest option is overall a mid-priced model, and still more expensive than the frontier-level GLM-5.2 VentureBeat Frontier AI Model API Pricing Snapshot Model Input Output Total Cost Source MiMo-V2.5 Flash $0.10 $0.30 $0.40 Xiaomi MiMo deepseek-v4-flash $0.14 $0.28 $0.42 DeepSeek deepseek-v4-pro $0.435 $0.87 $1.305 DeepSeek MiniMax-M3 $0.30 $1.20 $1.50 MiniMax Gemini 3.1 Flash-Lite $0.25 $1.50 $1.75 Google Qwen3.7-Plus $0.40 $1.60 $2.00 Alibaba Cloud MiMo-V2.5 $0.40 $2.00 $2.40 Xiaomi MiMo Grok 4.3 (low context) $1.25 $2.50 $3.75 xAI MiMo-V2.5 Pro (≤256K) $1.00 $3.00 $4.00 Xiaomi MiMo Kimi-K2.6 $0.95 $4.00 $4.95 Moonshot/Kimi GLM-5.2 $1.40 $4.40 $5.80 Z.ai GPT-5.6 Luna $1.00 $6.00 $7.00 OpenAI Grok 4.3 (high context) $2.50 $5.00 $7.50 xAI MiMo-V2.5 Pro (>256K) $2.00 $6.00 $8.00 Xiaomi MiMo Qwen3.7-Max $2.50 $7.50 $10.00 Alibaba Cloud Gemini 3.5 Flash $1.50 $9.00 $10.50 Google Gemini 3.1 Pro Preview (≤200K) $2.00 $12.00 $14.00 Google GPT-5.6 Terra $2.50 $15.00 $17.50 OpenAI GPT-5.4 $2.50 $15.00 $17.50 OpenAI Gemini 3.1 Pro Preview (>200K) $4.00 $18.00 $22.00 Google Claude Opus 4.8 $5.00 $25.00 $30.00 Anthropic GPT-5.5 $5.00 $30.00 $35.00 OpenAI GPT-5.5 Instant (chat-latest) $5.00 $30.00 $35.00 OpenAI Sakana Fugu Ultra (≤272K) $5.00 $30.00 $35.00 Sakana AI GPT-5.6 Sol $5.00 $30.00 $35.00 OpenAI Claude Fable 5 / Claude Mythos 5 $10.00 $50.00 $60.00 Anthropic Technology: deeper reasoning and subagent-based work The main technical change in GPT-5.6 centers on giving the model more time and structure for hard tasks during inference. OpenAI is adding a new max reasoning setting for GPT-5.6 Sol, aimed at problems that require more extended deliberation. OpenAI is also introducing ultra mode, which brings in subagents that can split up and accelerate complex projects, rather than keeping the work inside a single-agent flow. The company’s launch evaluations suggest this approach improves performance on several agent-style tasks. Benchmarks show measurable improvement from GPT-5.5, and new state-of-the-art on TerminalBench 2.1 command-line tasks The GPT-5.6 series demonstrates a clear performance leap over its predecessors across complex reasoning and long-horizon tasks. In command-line automation evaluated on TerminalBench 2.1, both the flagship Sol model and the mid-tier Terra outpace the previous GPT-5.5 benchmark, though notably Sol used the new ultra thinking mode to achieve a record-high score of 91.91% on the benchmark, and the max mode achieved 88.76% — ahead of both GPT-5.5's 83.4% and Claude Mythos 5's 88%. This superiority extends into professional workflows on Agent's Last Exam, where Sol is the sole model to successfully clear the halfway mark for task completion at 50.9% in "code mode," while the everyday Luna tier also manages to narrowly edge out the prior generation's flagship. In quantitative biology and genomics testing, Sol and Terra achieve higher accuracy rates than both GPT-5.5 and GPT-5.4, with Sol explicitly managing these stronger results while consuming fewer tokens. Finally, across cybersecurity evaluations measuring vulnerability research and exploitation, the new models push past prior performance ceilings; Sol reaches significantly higher intended exploit rates as reasoning time scales up and achieves competitive capability caps using a fraction of the output tokens required by older models. On ExploitBench, OpenAI says Sol performs near Mythos Preview while generating roughly one-third as many output tokens. Predictable prompt caching mechanics and a Cerebras speed bump To help enterprises control the unpredictable cost curves of running agentic loops, the GPT-5.6 API introduces a revamped prompt caching protocol. Developers can now implement explicit cache breakpoints, backed by a guaranteed 30-minute minimum cache lifetime. Under this framework, initial cache writes cost 1.25x the model’s standard uncached input rate, while later cache reads receive a 90% discount. In practice, businesses running repeated or similar operations pay more to establish the cache, then much less each time they reuse that cached context during at least the 30-minute minimum cache window. For systems that routinely pass massive context windows or codebase definitions back into the model, this predictability is a critical financial guardrail. Furthermore, for enterprise applications where latency is the primary barrier to adoption, OpenAI is launching GPT-5.6 Sol on Cerebras hardware this July. This infrastructure partnership claims processing speeds of up to 750 tokens per second, targeting specialized enterprise applications requiring real-time, frontier-grade reasoning. Enterprise implications: High security and algorithmic friction For corporate engineering, information security, and compliance teams, the deployment of GPT-5.6 requires a meticulous look at its security architecture. To achieve clearance for release, OpenAI dedicated roughly 700,000 A100e GPU hours solely to automated red-teaming GPT-5.6. This compute was allocated to discovering "universal jailbreaks"—systemic attack vectors designed to bypass safeguards across varied contexts, rather than single-prompt workarounds. OpenAI says it has implemented a multi-layered safeguard stack that operates in real time, putting up intentional operational hurdles for enterprise security teams. Model-level refusals: GPT-5.6 is tuned to reject banned cyber help, including requests that mask malicious intent or attempt jailbreak-style workarounds. Live misuse screening: Separate cyber and biology detectors review generations while they are being produced. Activation-based screening: For Sol and Terra, OpenAI says it is adding activation classifiers that monitor internal model signals during inference. If those systems detect a risky pattern, output streaming can pause while another safety check reviews the content. Luna does not appear to receive that same activation-classifier layer, though it is still covered by other monitoring systems. Reasoning review pauses: When risk appears elevated, generation can stop while a larger reasoning system examines the exchange and surrounding context. If the system classifies the output as disallowed, the answer is blocked before it reaches the endpoint. Because legitimate defensive work—such as code reviews, vulnerability discovery, patch engineering, and defensive testing—frequently utilizes the exact same code primitives as offensive exploits, OpenAI admits that its classifiers may regularly trigger false positives. The system card says OpenAI’s monitoring stack posted 94.8% overall recall on its biology evaluation set and 81.6% overall recall on its cybersecurity evaluation set. Those figures give enterprises a rare quantitative look at the safeguards, but they also show the system is not perfect and may miss some risky cases or block some legitimate work. Persistent flagging can trigger automated account-level reviews across historical conversations to evaluate if an enterprise client is engaging in malicious behavior or standard security research. OpenAI is currently negotiating longer-term enterprise safety compliance controls, including customer-operated safety overrides and privacy-preserving detection mechanisms, to insulate corporate data from manual review pipelines. Importantly, OpenAI notes that under testing, Sol remains optimized for defensive containment rather than offensive deployment. In evaluations running against the Chromium and Firefox codebases, the model successfully isolated bugs and exploitation primitives but was unable to autonomously engineer a functional, full-chain exploit, keeping it safely below the organization's "Cyber Critical" alert threshold. But all three GPT-5.6 models crossed its “High” cyber threshold on internal capture-the-flag testing, with Sol reaching 96.7%, Terra reaching 91.84% and Luna reaching 85.19%. That distinction matters for enterprise security buyers: OpenAI is presenting GPT-5.6 as powerful enough to help automate parts of vulnerability research and exploit analysis, but not yet as a system that can reliably run a complete advanced attack campaign without human direction under the company’s test conditions. The Geopolitics of the phased release The broader rollout of the GPT-5.6 series reflects an escalating entanglement between frontier AI labs and national security protocols. The decision to limit initial access to a small circle of vetted partners whose details are shared with the U.S. government stems from direct coordination regarding the developing cyber Executive Order framework. OpenAI has taken the unusual step of publicly critiquing this sovereign gatekeeping within its official product announcement documentation. The company states plainly: "We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them." This tension highlights the precarious position of modern tech enterprises. While organizations can leverage unprecedented agentic efficiency and robust defensive patching capabilities via benchmarks like ExploitGym and ExploitBench, they must also accept that access to premier tools remains subject to diplomatic and regulatory authorization.

Editor's pickTechnology
Arxiv· Today

Refusal Lives Downstream of Persona in Chat Models

arXiv:2606.26161v1 Announce Type: new Abstract: Linear directions in activation space have been identified for both refusal and persona traits in instruction-tuned chat models, but the two have been studied as separate mechanisms. We show they interact: a compliant persona gates refusal. In Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct, we extract a compliant model-persona direction and a refusal direction and intervene on both. Compliant persona steering suppresses refusal -- in Llama, the refusal rate falls from 97% to 2%. Reintroducing the refusal direction partially restores refusal at late layers but not at early ones. Projecting out the persona direction in a late-layer window restores it to baseline; projecting out a random direction does not. Refusal is therefore gated at the late-layer expression stage, downstream of where it is computed. Treating refusal as a single isolated direction misses its dependence on persona.

Editor's pickHealthcare
Arxiv· Today

Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking

arXiv:2606.26205v1 Announce Type: new Abstract: Patients increasingly seek medication information online, yet safety knowledge for psychiatric drugs is split between regulatory adverse-event records, which are authoritative but abstract, and patient narratives, which are experience-near but unvalidated. Integrating them without conflating evidence and anecdote is especially consequential in psychiatry, where poorly contextualised information can amplify fear, nocebo responses, and non-adherence. Here we develop a provenance-aware, knowledge-graph-based multi-agent framework unifying 466,525 Reddit posts, 60,782 WebMD reviews, and twenty years of U.S. FDA Adverse Event Reporting System records for nine antidepressants. A large-language-model entity-recognition pipeline benchmarked against physician annotations reached highest F1 scores of 0.969 for medications and 0.973 for conditions. The two community platforms were far more concordant with each other (overlap up to a Jaccard similarity of 0.905) than with regulatory reports, indicating that patient-generated data form a partly independent safety signal. For sertraline, many adverse events appeared in community sources hundreds of days before the corresponding FDA date. A Neo4j knowledge graph grounded in ATC-N, ICD-10, and MedDRA vocabularies preserves provenance, keeping every claim traceable and regulatory facts distinct from patient experience. These results establish source-aware integration as a route to more auditable psychiatric medication information, with usefulness and patient benefit to be tested prospectively.

Editor's pick
Newsweek· Yesterday

Top AI Models Might Be Confident—Doesn’t Mean They’re Right - Newsweek

“Mostly right is the wrong bar,” Pearl CEO Andy Kurtzig says, as research tests top AI models against professional judgment.

Editor's pick
Dwarkesh Podcast· Yesterday

The next big breakthrough will be AIs learning on the job

The people optimistic about this ... research problems in natural language processing collapsed against the flood of compute thrown into LLMs. Yes, these models are 1/1-millionth as sample efficient as humans during training. But training a one-time cost amortized across billions of user sessions. What matters is how smart, general, and sample efficient the model is within a session, and that’s clearly been improving as we do more RL training. AIs are able to ...

Editor's pickTechnology
Everything PR· Yesterday

Salesforce Audit: AI Answer Engine Failures & Benchmarks

A Salesforce AI Research audit exposes 16 failure modes in AI answer engines like Perplexity & Bing Chat. This study offers crucial benchmarks for understanding

Editor's pickTechnology
Qwen AgentWorld 🌍, Nous Research Pet Sprites 🐾, xAI Grok T3 Editor 💻· Yesterday

Qwen releases open-source world model that simulates 7 agent environments

Qwen-AgentWorld is a model trained to simulate environments like MCP, Terminal, and Android, allowing agents to train in a sandbox without needing real-world access.

Editor's pickTechnology
Daily Brew· Today

Previewing GPT-5.6 Sol: a next-generation model

OpenAI has provided a preview of its latest model, GPT-5.6 Sol, detailing its capabilities and development.

Editor's pick
Arxiv· Today

Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System

arXiv:2606.26267v1 Announce Type: new Abstract: Rating systems such as Elo serve as the gold standard for matchmaking in competitive chess. However, they inherently suffer from response lag due to their exclusive reliance on match outcomes, neglecting the granular quality of gameplay. Nevertheless, incorporating move-by-move information into rating adjustments presents a significant challenge given the substantial noise and the vastness of the game-state space. To address this, we propose the Drift-Diffusion-Enhanced Elo Rating System (DD-Elo), a novel skill assessment framework inspired by the drift diffusion model (DDM) from cognitive neuroscience. By modeling skill expression as a decision-making process, our model integrates move-level data to capture rapid skill fluctuations. We provide a rigorous mathematical derivation proving that DD-Elo maintains a bounded deviation from the traditional Elo system, ensuring theoretical alignment. Extensive experiments demonstrate that DD-Elo adapts to skill changes faster than Elo. Our findings suggest that DD-Elo offers an explainable, highly responsive, and backward-compatible solution for chess rating ecosystems. The implementation code is publicly available at https://github.com/Aquila-zhou1/DD-Elo .

Editor's pickManufacturing & Industrials
Daily Brew· Yesterday

LLMs help robots understand vague instructions and focus on key details

Researchers at MIT are using large language models to help robots interpret ambiguous human instructions and prioritize relevant environmental details.

AI Research & Science1 articles
Editor's pickTechnology
Arxiv· Today

The Verification Horizon: No Silver Bullet for Coding Agent Rewards

arXiv:2606.26300v1 Announce Type: new Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow; and verification must co-evolve with the generator.

AI Security & Cybersecurity8 articles
Editor's pickFinancial Services
Times of India· Yesterday

AI is changing financial regulation as watchdogs build tools to fight cyber threats - The Times of India

Financial regulators are racing to adopt artificial intelligence (AI) to keep pace with rapidly evolving cyber threats, with watchdogs increasingly developing their own AI-powered supervisory tools to strengthen oversight of banks and digital assets, Reuters reported.

Editor's pickTechnology
VentureBeat· Yesterday

Autonomous security agents need complete data. Here's how to check if yours is ready.

An endpoint agent cannot report its own absence. The 2026 Axonius Actionability Report, conducted with the Ponemon Institute and surveying 662 IT and security professionals, put a number on a gap SOC teams have worked around for years. Across the Axonius customer base, 12.7% of devices in a 298,000-device median inventory are missing their expected security agent. If a device has no agent, no management console shows it. If a CMDB record is stale, no reconciliation flags it. An employee who installed Claude Enterprise outside procurement created a SaaS workspace, identity surface, and API-token footprint that endpoint telemetry alone will not reliably inventory. The coverage percentage on the EDR dashboard is structurally incomplete because the reporting mechanism cannot see what it does not cover. That gap matters more now than it did six months ago. SOC and XDR vendors are pushing more autonomous investigation and remediation into production. Those agents will query the same dashboards, trust the same coverage percentages, and act on the same blind spots human analysts learned to work around. A human analyst second-guesses a 98% coverage number. An autonomous agent treats it as ground truth and moves at machine speed. Three independent signals converged on the same gap Gravitee’s 2026 survey of 900-plus executives found 88% reported confirmed or suspected AI-related incidents, and only 14.4% sent agents live with full security approval. The Axonius/Ponemon report found 52% of respondents would let autonomous agents act on recommendations — while 63% said the underlying data lacks important information. The CSA's Agentic Trust Framework requires verified data governance before agents act on any finding. Mike Riemer, Field CISO at Ivanti, said that known vulnerabilities on Azure’s honeypot networks are now attacked in under 90 seconds. “Traditional security measures continue to work,” Riemer told VentureBeat. The caveat is that those measures only protect what they can see. An EDR agent deployed across 87.3% of the device inventory leaves the remaining 12.7% outside that agent’s telemetry, policy enforcement, and detection logic. Exclusive deployment data quantifies the scale Joe Diamond, CEO of Axonius, told VentureBeat that the average CISO sees roughly 50% of what is actually on the network. “Say 50% of their environment is sitting in dark matter,” Diamond said. “They don’t know what it is, or where it is, or who has access to it, if it’s secure, if it’s not secure.” Deployment data from more than 900 Axonius customers confirms those numbers. TransUnion went from 70% to 99% endpoint coverage after out-of-band verification. Western Union went from 85% to 99% by consolidating data from 38 tools and cutting manual workload by half. Lumen discovered 1.1 million assets, where the CMDB showed 17,000. That translates to roughly 37,000 unmanaged endpoints per organization sitting outside every policy, every patch cycle, and every detection rule. Diamond pointed to Mythos, Anthropic’s frontier reasoning model, as a sign that machine-speed offensive capability will make any unknown asset far riskier than it is today. “People tend to have shiny object syndrome,” he said. “If you didn’t understand what 50% of your environment looked like from a traditional endpoint perspective, and you think you’re going to wind sprint to granular control and governance of AI, your program will fail.” Diamond called the broader AI shift “as big, if not bigger than the internet.” Three approaches compete to close the gap No single architecture solves the visibility problem today. Three approaches compete, each with named tradeoffs security teams should evaluate before procurement. A dedicated integration layer uses bidirectional API adapters to build an always-current inventory. Axonius runs 1,400-plus adapters and now discovers shadow Claude Enterprise installations via its Anthropic adapter (GA June 15). “We created a bidirectional API integration with all the IT systems and all the security controls to build an always up-to-date inventory of what the environment looks like,” Diamond told VentureBeat. Platform-native EDR and XDR intelligence builds richer asset context inside the agent footprint. Depth within the agent footprint is the advantage. The limitation is structural. Platform-native intelligence is bounded by what the agent can see, and the gap the Ponemon report identified lives precisely where that visibility ends. CMDB modernization requires continuous reconciliation against three or more independent telemetry sources. Only 13% of organizations reconcile daily, according to Axonius/Ponemon data. The remaining 87% operate on stale records that feed incorrect prioritization into any automated remediation pipeline. EDR data readiness: Five gates before autonomous remediation Before you let autonomous SOC agents close tickets or quarantine assets, this checklist tells you whether your EDR and asset data is solid enough to trust. It is vendor-agnostic, works with any EDR and CMDB, and gives you five pass/fail gates you can run in a single working session. Risk Area What the data shows Readiness threshold Action to take now Asset inventory delta Ponemon: only 45% consolidate into a single view. Forrester TEI: 150% more assets than previously identified. Lumen: 17K in CMDB vs. 1.1M discovered. Delta ≤10% between discovery, CMDB, and EDR agent count. Delta above 10% blocks automated remediation until reconciled. Run API-based discovery against all segments. Diff against CMDB and EDR console count. Reconcile quarterly minimum. Unmanaged AI services Gravitee: 88% confirmed or suspected AI incidents. Only 14.4% with full security approval. Anthropic adapter (GA June 15) discovers unmanaged Claude Enterprise installations. No high-risk AI services outside approved procurement. Weekly SaaS discovery scans. Unmanaged high-risk instances trigger IR triage before exception review. Deploy SaaS discovery or protocol-level adapters for AI service detection. Automate weekly scans. Route unmanaged instances to IR queue. CMDB record accuracy Ponemon: only 13% reconcile daily (RSAC 2026). Brooks Running: 20% server discrepancy between console and independent discovery. Top remediation barriers: unclear prioritization, unclear ownership, inconsistent data. ≥85% of records validated against 3+ independent telemetry sources. No stale or orphaned records in active remediation queue. Cross-reference CMDB against cloud inventory, EDR telemetry, and IdP directory. Continuous reconciliation replaces annual audit cycles. Endpoint agent coverage gap Ponemon: an agent cannot report its own absence (p. 8). TransUnion: 70% to 99% after out-of-band verification. RSAC 2026: 12.7% of 298K median devices missing expected agent. ≥95% agent coverage verified via out-of-band discovery. Many CISOs set this as the minimum before allowing autonomous remediation. No self-reported-only metrics in board reports. Run network-based or API-driven discovery against managed device list. Coverage below 95% blocks automated remediation scoping. Asset ownership mapping Ponemon: 32% apply tags consistently. Only 51% assign ownership on new exposures (pp. 9, 16). TransUnion: 12K to 190K assets with ownership mapped. Owner assigned within 24 hours. Tags consistent across cloud, EDR, CMDB. Three systems showing three owners = failure. Automate ownership via cloud tags, IdP group membership, or CMDB metadata. Map asset, remediation, and business owner as separate fields. Five questions to ask before allowing autonomous SOC action What independently verifies endpoint-agent coverage outside the EDR console? How does the SOC reconcile conflicts between EDR, CMDB, cloud inventory, IdP, and discovery tools? Can AI agents act on assets with unknown or disputed ownership? Can the system distinguish “not vulnerable” from “not visible”? What data-quality gate blocks autonomous remediation when coverage or ownership falls below threshold? Board-ready risk framing Kayne McGladrey, IEEE Senior Member, has confirmed the pattern across multiple published VentureBeat interviews. The structural gap in self-reported coverage is not new. What is new is that autonomous agents will act on it at machine speed without the institutional workarounds human analysts developed over years of experience. Diamond put the board-level stakes plainly in an April 2026 press statement: “Findings pile up because the data isn’t trusted, ownership isn’t clear, and entire asset classes aren’t even in the picture.” The CSA’s Agentic Trust Framework requires that any agent promoted to a higher autonomy level must pass five gates, including demonstrated accuracy and a security audit. The EU AI Act’s Article 50 transparency obligations take effect August 2, 2026. The May 2026 Digital Omnibus pushed high-risk system obligations to December 2027, but organizations deploying agentic SOC agents on incomplete asset data face immediate operational risk that outpaces any regulatory timeline. The board-ready sentence: Our EDR coverage reports are structurally incomplete because an endpoint agent cannot report its own absence, and we are verifying coverage through out-of-band discovery before deploying autonomous agents that would act on those reports at machine speed. Security director playbook Run out-of-band asset discovery this week. Compare results against your CMDB export and EDR console count. If the delta exceeds 10%, halt automated remediation scoping until the gap is reconciled. Deploy SaaS discovery for AI services. Employees install AI ahead of procurement, ahead of security. Weekly scans are the minimum. Route any unmanaged high-risk instance to your incident response queue for triage before exception review. Map asset ownership to remediation responsibility. Ponemon found only 32% of organizations apply tags consistently. If three systems show three different owners for the same asset, automated remediation has no routing target. Fix the ownership layer before deploying agents that depend on it. Kill self-reported-only coverage metrics. Any risk calculation or board report that relies on EDR console-reported coverage alone is built on data the reporting system cannot verify. Require out-of-band verification for every coverage number that informs a risk decision.

Editor's pickTechnology
Theregister· Yesterday

Amazon Q flaw let booby-trapped Git repos execute code, swipe cloud creds

Researchers warn many AI coding assistants now execute commands from project configurations

Editor's pickTechnology
Theregister· Yesterday

Miasma campaign poisons 20-plus npm packages, hunts for developer secrets

Microsoft says latest attack targets Leo Platform and RStreams packages, harvesting creds and going after more maintainers

Adoption, Deployment & Impact

23 articles
AI Adoption Barriers & Enablers6 articles
AI Applications7 articles
Editor's pickEnergy & Utilities
Arxiv· Today

How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks?

arXiv:2606.26346v1 Announce Type: new Abstract: Agentic benchmarks have emerged across general-purpose and domain-specific settings, including finance, coding, law, and drug discovery, yet energy-domain evaluations remain largely limited to static knowledge recall. This is a critical gap for a sector that requires live data retrieval, specialized regulatory and market knowledge, and multi-step quantitative reasoning under real-world constraints. We present an empirical study of tool-augmented LLM agents on real-world energy market analytics tasks. Our evaluation environment includes 243 expert-curated problems across three categories: (1) Market Data Retrieval and Analysis, (2) Knowledge Retrieval and Interpretation, and (3) Advanced Quantitative Modeling and Decision Analytics. Tasks include price and demand analysis, tariff impact modeling, asset revenue and returns estimation, hedging strategy analysis, and optimization modeling, with problems spanning multiple difficulty levels. Agents are equipped with a configurable suite of domain tools, including live electricity market APIs for major U.S. ISOs, regulatory docket search, utility tariff databases, asset optimization models, and retrieval-augmented generation over energy market documents. We assess agent responses using a multi-dimensional evaluation protocol that scores approach correctness, answer accuracy, attribute alignment, and source validity, with category-aware routing to match scoring criteria to question type. We evaluate both closed-source and open-source LLMs, providing a comparative analysis of how model capability and domain tooling interact in a high-stakes professional domain. Key artifacts are publicly released to support reproducibility and future research.

Editor's pickTechnology
Theregister· Yesterday

Notion kills its Gmail client after AI agents keep humans from troubling inbox

More than half of users now let bots handle email, so service is headed for shutdown

Editor's pickHealthcare
Daily Brew· Today

AI Revolutionizes Healthcare: Enhancing Patient Care and Hospital Efficiency with Smart Technology

AI is revolutionizing healthcare by enhancing disease detection, treatment planning, and operational efficiency, ultimately improving patient care.

Editor's pickProfessional Services
Daily AI News June 26, 2026: AI Startups Are Coming With Two-Thirds Fewer People· Yesterday

Semantic Search for AI Agents at Scale: Retrieval and Ranking for LinkedIn’s Hiring Assistant

A production case study from LinkedIn Engineering on using ranking systems and evaluation loops to improve agentic search in hiring.

AI Measurement & Evaluation1 articles
Editor's pickTechnology
Arxiv· Today

Life After Benchmark Saturation: A Case Study of CORE-Bench

arXiv:2606.26158v1 Announce Type: new Abstract: When a benchmark's accuracy saturates, it is often retired and replaced with a more challenging version. We show that this approach privileges accuracy and misses the opportunity to study six other key dimensions of agent performance: construct validity issues such as shortcuts, out-of-distribution generalizability, efficiency, reliability, the relative importance of the model versus the scaffold, and uplift from human-agent collaboration. We use CORE-Bench Hard, a benchmark for computational reproducibility of scientific code, as a case study to demonstrate that measuring agents along these dimensions yields meaningful insights into agent performance even after accuracy saturates. First, we surface threats to construct validity in CORE-Bench Hard that are difficult to anticipate with less capable agents. We introduce an improved benchmark, CORE-Bench v1.1, and an out-of-distribution task suite, CORE-Bench OOD. Second, we find that despite accuracy saturation, CORE-Bench v1.1 remains useful for measuring efficiency, reliability, model performance, and scaffold performance. Finally, we conduct a small-scale randomized experiment to measure uplift from human-agent collaboration on real-world computational reproducibility tasks. We find a statistically significant speedup by about a factor of two -- likely underestimated due to one-fifth of human-only reproductions reaching the time limit before completing -- and describe various other findings. Together, our contributions present a more rigorous alternative to the dominant accuracy-centric evaluation paradigm.

AI ROI & Business Case7 articles

Geopolitics, Policy & Governance

41 articles
AI Geopolitics6 articles
AI National Strategy5 articles
AI Policy & Regulation29 articles
Editor's pickGovernment & Public Sector
Social Europe· Yesterday

Democracy, Control, or Competitiveness: The AI Trilemma - Social Europe

A new papal encyclical exposes the democratic trilemma shaping how the world governs artificial intelligence.

Editor's pickPAYWALLTechnology
Bloomberg· Yesterday

Anthropic Moves Toward Deal With US to Lift Curbs on AI Models

Anthropic PBC and the Trump administration are moving closer to an agreement that would lift US restrictions on the company’s top two artificial intelligence models after weeks of talks between the two sides over security of the systems, according to people familiar with the matter.

Editor's pickDefense & National Security
Arxiv· Today

Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems

arXiv:2606.26298v1 Announce Type: new Abstract: Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production software deployment. This paper observes that human institutions have governed powerful autonomous actors not by monitoring their reasoning but by requiring independently attested evidence at the point of consequential action. We formalise this institutional pattern as a computational governance model for AI agent systems. Under the proposed model, an agent retains full autonomy over planning and reasoning but holds no execution authority over designated high-risk actions. Execution is conditional on preconditions that are each independently attested by a separate authoritative source, cryptographically bound to a declared intent, and evaluated by a deterministic policy. Decisions are recorded in a tamper-evident log amenable to independent re-verification. We present a proof-of-concept implementation and illustrate the model with examples from software deployment and clinical prescribing.

Editor's pickPAYWALLTechnology
FT· Yesterday

Trump administration asks OpenAI to stagger release of new model to vet users

The US Treasury, commerce department and other government offices request limited distribution of GPT 5.6

Editor's pickPAYWALLTechnology
NYT· Today

U.S. Loosens Restrictions on Anthropic’s Mythos A.I. Model

The move de-escalates a clash between the Trump administration and the company over its cutting-edge artificial intelligence systems.

Editor's pickPAYWALLGovernment & Public Sector
Daily Brew· Yesterday

U.S. government will decide who gets to use GPT-5.6

The U.S. government is set to oversee access to OpenAI's latest AI model, GPT-5.6, as part of new regulatory oversight.

Editor's pickPAYWALLTechnology
Bloomberg· Yesterday

OpenAI Limits Release of New Model Under Pressure From US

OpenAI is rolling out a preview version of a more capable new artificial intelligence model to select partners before making it available more widely in the coming weeks, following pressure from the Trump administration to stagger the release.

Editor's pickPAYWALLTechnology
FT· Today

Trump administration allows some access to Anthropic’s Mythos

Move eases tension with AI lab but unease over Washington’s ad hoc regulatory approach remains

Editor's pickGovernment & Public Sector
Reuters· Yesterday

US allows Anthropic to release Mythos AI to 'trusted' US organizations

Eurocommerce, the European retail association whose members include Amazon , H&M, Inditex, and Ikea, is ​asking EU tech chief Henna Virkkunen to exempt ‌ AI -generated advertisements from the bloc's new regulation requiring disclosure of AI use.

Editor's pickTechnology
Fortune· Yesterday

OpenAI agrees to stagger rollout of its most powerful model to only Trump-approved customers

It is the second time in a month that a frontier lab’s most powerful model has been held back from general release over fears about cyber capabilities.

Editor's pickTechnology
Guardian· Yesterday

OpenAI staggers AI model release after Trump administration request

Sam Altman announces limited preview of GPT 5.6 in move that echoes launch of Anthropic’s Mythos Business live – latest updates OpenAI is staggering the release of its latest AI model after a request from the US government, in a move echoing the launch of Anthropic’s Mythos product. The company behind ChatGPT signalled its dissatisfaction with the move, saying that doing so keeps the best AI tools from “users, developers, enterprises, cyber defenders, and global partners who need them”. Continue reading...

Editor's pickGovernment & Public Sector
Ethan Mollick· Yesterday

The Need for Transparency in Government AI Safety and Risk Mitigation

Public understanding of government safety concerns regarding frontier AI is essential for firm-level risk management. Increased transparency is required to prepare for potential security implications as open-source models advance.

Editor's pickTechnology
Top Daily Headlines: Infosys boss says vibe coding is no threat because there’s more to writing software than writing software· Yesterday

European Commission lines up Amazon and Microsoft for cloud gatekeeper status

The European Commission is considering designating Amazon and Microsoft as gatekeepers under the Digital Markets Act.

Editor's pickPAYWALLGovernment & Public Sector
NYT· Yesterday

Trump Threatens to Impose 100% Tariff on European Countries Over Tech Taxes

The president claimed the tariffs would override a trade deal with the European Union, which European officials finalized just days ago.

Editor's pickPAYWALL
Washington Post· Yesterday

Opinion | What if Trump is right to pump the brakes on the most advanced AI?

Nationalist fervor over beating China biases AI policy toward recklessness — and possible catastrophe.

Editor's pickTechnology
Arxiv· Today

Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols

arXiv:2606.26203v1 Announce Type: new Abstract: As AI agent protocols proliferate, the governance structures shaping their interoperability standards remain empirically underexamined. We introduce an LLM-powered comparative pipeline for large-scale governance discourse analysis, integrating automated annotation, neural topic modeling, and multi-layer network analysis to study socio-technical power structures at scale. We validate it on two contrasting standards for agent interoperability: ERC-8004 (permissionless, on-chain) and Google A2A (corporate-led). Analyzing 4,323 governance participation records, we combine LLM-assisted coding, topic modeling, and multi-layer network analysis to examine how institutional design shapes thematic priorities and community structure. We find that while governance form influences substantive focus, both regimes exhibit comparable levels of participation inequality and community fragmentation. Discourse alignment is denser in the permissionless setting, suggesting that open governance may foster greater thematic convergence despite decentralized participation. These findings illustrate how LLM-assisted methods can advance the empirical study of technology governance, with implications for designing more equitable agentic AI standards. All data and code are openly available.

Editor's pickTechnology
Theregister· Yesterday

Google wants AI regulation, but on its own terms

Surely, we can have rules that allow us to continue doing what we're doing

Editor's pickGovernment & Public Sector
Fortune· Yesterday

Anthropic and OpenAI waged a $27 million proxy war in a Manhattan congressional race. The winner told them both to get lost

Micah Lasher won the most expensive AI election yet—then used his victory speech to reject both companies and promise to regulate them anyway.

Editor's pickDefense & National Security
Mathrubhumi English· Yesterday

Is the US tightening grip on AI? Trump administration slows GPT-5.6 rollout

The US government has requested a staggered rollout of OpenAI's GPT-5.6 due to national security concerns. Learn how this impacts AI regulation. Read more here.

Editor's pickDefense & National Security
CoinGape· Yesterday

Breaking: White House Orders OpenAI To Limit GPT 5.6 Release Over Security Reasons

OpenAI agreed to limit GPT-5.6 access after a White House request as U.S. officials navigate AI safety concerns and regulatory uncertainty.

Editor's pickTechnology
DigitalToday· Yesterday

OpenAI to limit early access to GPT-5.6 for some companies

OpenAI is delaying a full launch of its next flagship model, GPT-5.6, and will begin a limited release for selected corporate customers. The Trump administration requested a phased rollout over potential security concerns, with customer access expected to require case-by-case approval during ...

Editor's pickTechnology
DIGITIMES· Yesterday

OpenAI to release GPT 5.6 model on staggered basis in face of US regulatory uncertainty

OpenAI CEO Sam Altman told staff that its latest model, GPT 5.6, would be released on a staggered basis, with a small group of entities first gaining preview access to it after approval by the US government. The case highlights the regulatory uncertainty many local AI developers are facing ...

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | June 26, 2026· 2 days ago

US House committee passes bipartisan AI legislation

The US House Science, Space, and Technology Committee passed a bipartisan package of AI legislation aimed at expanding research access, strengthening the workforce, and bolstering national security.

Editor's pickGovernment & Public Sector
iTech Post· Today

California AI Job Loss Tracker Launches to Monitor Layoffs Across the Workforce

California launches an AI job loss tracker to monitor AI layoffs, unemployment trends, and workforce changes through a new public dashboard.

Editor's pickGovernment & Public Sector
Governing· Yesterday

California Creates the Nation’s First AI Job-Loss Tracker

The new dashboard is designed to help state leaders monitor AI's impact on employment and respond with targeted workforce policies.

Editor's pickPAYWALL
Bloomberg· Yesterday

Forum AI CEO on Pitfalls With AI in Politics

Campbell Brown, Co-Founder and CEO of Forum AI, discusses the need for AI companies to open up for scrutiny. She discusses with Romaine Bostick on Bloomberg's "The Close." (Source: Bloomberg)

Editor's pickGovernment & Public Sector
Digital Watch Observatory· Yesterday

Google proposes a balanced approach to AI governance in the US | Digital Watch Observatory

A new policy paper from Google proposes independent oversight for frontier AI development in the US.

Editor's pickTechnology
Siliconrepublic· Yesterday

Italian watchdog probes Microsoft over 365 price hike concerns

Microsoft did not provide consumers with sufficient information to assess the changes, the watchdog said. Read more: Italian watchdog probes Microsoft over 365 price hike concerns

Editor's pickGovernment & Public Sector
The Parliament Magazine· Yesterday

Europe 2031: The viral AI scenario warning Brussels about Europe’s future

A fictional doomsday scenario by European AI researchers has gone viral in Brussels tech circles, exposing a divide over AI safety, sovereignty and wh...

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.

Get the full executive brief

Receive curated insights with practical implications for strategy, operations, and governance.

AI Daily Brief — leaders actually read it.

Free email — not hiring or booking. Optional BPAI updates for company news. Unsubscribe anytime.

Include

No spam. Unsubscribe anytime. Privacy policy.