Sat 27 June 2026
Daily Brief — Curated and contextualised by Best Practice AI
Washington vets OpenAI clients, capital flees India, and engineers debug ghosts
TL;DR: The Trump administration is now mandating government approval for all new customers of OpenAI and Anthropic's most powerful models. Global capital is shifting away from emerging markets like India toward Korea and Taiwan to chase the $110 billion AI ecosystem. Meanwhile, researchers report that prompt-composed agentic systems suffer from 'instruction bleed,' where editing one module silently corrupts others. Microsoft is simultaneously abandoning flat subscriptions for usage-based billing to capture the value of AI-driven labor displacement.
The stories that matter most
Selected and contextualised by the Best Practice AI team
U.S. government will decide who gets to use GPT-5.6
The U.S. government is set to oversee access to OpenAI's latest AI model, GPT-5.6, as part of new regulatory oversight.
AI Ecosystem Revenue Hits $110 Billion, Outpacing Historical IT Growth Cycles
The AI ecosystem has generated $110 billion in revenue, growing three times faster than previous mobile or internet waves. While enterprise adoption is scaling, executive sentiment remains heavily tied to successful AI integration.
How a Niche Technology Became a Choke Point for A.I.
Advanced chip packaging, which boosts computing power for artificial intelligence, has made the United States more reliant on Taiwan than ever.
AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs
arXiv:2606.26173v1. Abstract: Recent work shows that Large Language Models (LLMs) can act as semantic mutation operators for the evolutionary discovery of programs and proofs. Most current applications focus on static coding benchmarks. We extend this paradigm to algorithmic trading. This domain is uniquely challenging because it is noisy, non-stationary, and highly discontinuous. We present AlgoEvolve, an LLM-driven evolutionary framework that generates, evaluates, and iteratively improves executable trading strategies. These strategies are expressed as Python code and evaluated through a rigorous testing protocol. Across multiple experiments, the system exhibits emergent regime-adaptive strategy logic, including autonomous shifts in trading rules. We further introduce a meta-evolutionary outer loop that evolves the prompts guiding program synthesis in the inner loop. This outer loop discovers improved search heuristics. These heuristics balance exploration and exploitation while reducing zero-trade failures. They consistently outperform initial human-designed instructions. The results demonstrate that LLM-based semantic evolution provides a viable approach for continual program synthesis in complex environments.
“From Subscriptions to Usage-Based Billing”: After Anthropic and OpenAI, Microsoft Revamps Pricing as the Era of ‘AI Maxxing’ Nears | The Economy
The pricing structure for generative artificial intelligence (AI) models is undergoing a fundamental transformation. As more enterprise customers use AI agents under subscription plans costing only a few dozen dollars per month to perform workloads equivalent to those handled by a full-time ...
Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems
arXiv:2606.26298v1. Abstract: Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production software deployment. This paper observes that human institutions have governed powerful autonomous actors not by monitoring their reasoning but by requiring independently attested evidence at the point of consequential action. We formalise this institutional pattern as a computational governance model for AI agent systems. Under the proposed model, an agent retains full autonomy over planning and reasoning but holds no execution authority over designated high-risk actions. Execution is conditional on preconditions that are each independently attested by a separate authoritative source, cryptographically bound to a declared intent, and evaluated by a deterministic policy. Decisions are recorded in a tamper-evident log amenable to independent re-verification. We present a proof-of-concept implementation and illustrate the model with examples from software deployment and clinical prescribing.
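To make the attestation pattern in this abstract concrete, here is a minimal Python sketch. It is not the paper's proof-of-concept, which the abstract does not describe. The attester names, shared keys, policy table, and every function name are hypothetical, and HMAC over a hash of the intent stands in for whatever cryptographic binding the authors actually use.

```python
import hashlib
import hmac
import json

# Hypothetical attesters and shared keys, for illustration only.
ATTESTER_KEYS = {"ci_pipeline": b"ci-secret", "change_board": b"board-secret"}

# Deterministic policy: attesters that must vouch for each designated high-risk action.
POLICY = {"deploy_to_production": ["ci_pipeline", "change_board"]}


def intent_digest(intent: dict) -> str:
    """Canonical SHA-256 hash of the declared intent."""
    return hashlib.sha256(json.dumps(intent, sort_keys=True).encode()).hexdigest()


def attest(source: str, intent: dict) -> dict:
    """An attester signs the intent digest, binding its evidence to that intent."""
    digest = intent_digest(intent)
    tag = hmac.new(ATTESTER_KEYS[source], digest.encode(), hashlib.sha256).hexdigest()
    return {"source": source, "intent": digest, "tag": tag}


def verify(attestation: dict, intent: dict) -> bool:
    """Check that the attestation is genuine and refers to this exact intent."""
    key = ATTESTER_KEYS.get(attestation["source"])
    if key is None:
        return False
    digest = intent_digest(intent)
    expected = hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()
    return attestation["intent"] == digest and hmac.compare_digest(attestation["tag"], expected)


def _record_hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def authorize(intent: dict, attestations: list, log: list) -> bool:
    """Allow execution only if every required attester vouched; log the decision."""
    required = POLICY.get(intent["action"], [])
    valid = {a["source"] for a in attestations if verify(a, intent)}
    # Actions with no policy entry are denied (fail closed) in this sketch.
    allowed = bool(required) and all(source in valid for source in required)
    body = {
        "intent": intent_digest(intent),
        "allowed": allowed,
        "prev": log[-1]["hash"] if log else "0" * 64,
    }
    log.append({**body, "hash": _record_hash(body)})
    return allowed


def log_is_intact(log: list) -> bool:
    """Re-verify the hash chain; any edited or reordered record breaks it."""
    prev = "0" * 64
    for record in log:
        body = {key: record[key] for key in ("intent", "allowed", "prev")}
        if record["prev"] != prev or record["hash"] != _record_hash(body):
            return False
        prev = record["hash"]
    return True
```

In this sketch the agent can propose any intent it likes, but `authorize` is the only path to execution, and an attestation issued for one intent cannot be reused for another because the signature covers the intent digest.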
Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems
arXiv:2606.26356v1. Abstract: Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.
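For readers unfamiliar with the statistics quoted in this abstract, the following Python sketch shows one common way to compute a paired effect size and a percentile bootstrap confidence interval. The abstract does not say which variant of Cohen's d the authors used; this version divides the mean of the paired differences by their standard deviation, which is an assumption. The function names and the scores in the usage lines are hypothetical, not data from the paper.

```python
import random
import statistics


def paired_cohens_d(baseline, perturbed):
    """Paired effect size: mean of the differences over their sample SD."""
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def bootstrap_ci(baseline, perturbed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the paired effect size."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, perturbed))
    estimates = []
    for _ in range(n_resamples):
        # Resample whole pairs with replacement so the pairing is preserved.
        sample = [rng.choice(pairs) for _ in pairs]
        b, p = zip(*sample)
        try:
            estimates.append(paired_cohens_d(b, p))
        except ZeroDivisionError:
            continue  # all differences identical in this resample; skip it
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical agent scores on the same inputs, before and after editing
# a non-focal prompt module.
before = [3.0, 3.2, 2.9, 3.5, 3.1, 3.3, 3.0, 3.4]
after = [3.2, 3.3, 3.2, 3.5, 3.3, 3.4, 3.3, 3.6]
effect = paired_cohens_d(before, after)
interval = bootstrap_ci(before, after)
```

An interval whose lower bound stays above zero corresponds to the "CI excluding zero" reading in the abstract: the shift is consistent in direction even when no individual decision flips.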
Foreign investors exit India funds as AI boom redirects capital. Will inflows return? | Stock Market News
Foreign investors pulled money from India-focused funds in the March quarter as capital shifted toward AI-linked markets such as Korea and Taiwan, even as some investors see early signs of macro stabilization in India.
Accelerating Returns and the Qualitative Engine for Science
arXiv:2606.26359v1. Abstract: Ray Kurzweil described a thesis of accelerating returns, which is one of the most influential narratives in discussions of technological progress. Its central claim is that advances in multiple technological fields, especially compute, artificial intelligence, brain science, and biotechnology, interact in such a way that progress becomes self-amplifying and approximately exponential. This paper gives a simple mathematical interpretation of that claim and then argues that, even if such acceleration is real, it does not by itself resolve the central problem of scientific discovery. The reason is that accelerating returns apply most naturally to executional and infrastructural capability, whereas genuine discovery often depends on a different capacity: qualitative reasoning about when a current framework is structurally inadequate and what conceptual move is needed next. Recent ARC-AGI-3 results sharpen this distinction: humans solve the benchmark at ceiling, whereas frontier AI systems remain below 1%, indicating that the gap between current AI and human flexible reasoning is still very large. At the same time, Demis Hassabis has emphasized that humans must retain their sense of meaning and what they choose to focus their lives on, a reminder that the future of AI is not only a technical forecast but also a question of what forms of human understanding are worth preserving and transmitting. This paper positions the Qualitative Engine for Science (QES) [3] as a response to that missing capacity. In this view, the Kurzweil theory helps explain why quantitative capability may accelerate, while QES addresses the central problem in scientific discovery that acceleration alone does not solve. Its value does not depend on when AGI arrives, but on the fact that the processes of scientific discovery themselves constitute a form of human wisdom worth preserving, organizing, and making accessible.
AI-Native Firms
This paper explores how AI-native companies are scaling revenue and operations with leaner teams. It highlights how agentic AI lowers the threshold for a minimum viable company.
AI Data Center Water Use Is Not Solved: Nvidia's Cooling Fix Stops at the Walls
AI data center water use remains unsolved despite Nvidia’s June 2026 DSX closed-loop cooling breakthrough, which cuts on-site water consumption to near zero but leaves fossil fuel power plant water demand — roughly 54% of AI’s total projected water footprint through 2050 — entirely ...
The Verification Horizon: No Silver Bullet for Coding Agent Rewards
arXiv:2606.26300v1. Abstract: A classical intuition holds that verifying a solution is easier than producing one. For today's coding agents, this intuition is being inverted: as foundation models develop stronger reasoning capabilities and engineering harnesses grow more sophisticated, generating complex candidate solutions is no longer difficult -- reliably verifying them has become the harder problem. Every verifier we can build is only a proxy for human intent, never the intent itself. This makes verification subject to a twofold difficulty: first, intent is underspecified by nature, making it inherently hard to faithfully check whether it has been fulfilled; second, during model training, optimization widens the gap between proxy and intent -- manifesting as reward hacking or signal saturation. To address this, we characterize the quality of verification signals along three dimensions -- scalability, faithfulness, and robustness -- and argue that achieving all three simultaneously is the central challenge. We further study four reward constructions: a test verifier for general coding tasks, a rubric verifier for frontend tasks, the user as verifier for real-world agent tasks, and an automated agent verifier for long-horizon tasks. Across different task types and policy capability levels, we conduct in-depth analysis and experiments on the core challenges of reward design and how to more effectively leverage reward signals. Experiments show that targeted verification design can effectively suppress reward hacking, improve task completion quality, and achieve significant gains across multiple internal and public benchmarks. These experiences collectively point to a core observation: no fixed reward function can remain effective as policy capability continues to grow; and verification must co-evolve with the generator.
Economics & Markets
Walmart’s ad deal smartly puts its customers in the shopping basket
Retailer looks to monetise its deep well of consumer data and insights
AI Revenue Growth Defies Expectations Amid Sustained Data Center Capital Expenditure
AI revenue growth continues to outpace forecasts, signaling sustained demand for compute infrastructure. This trend suggests that capital expenditure in data centers is increasingly translating into tangible market returns.
Apple, Micron, OpenAI and A.I.’s Rough Summer
Rising memory prices, more expensive iPads and a longer wait for OpenAI to go public: The sector that has driven markets skyward is hitting turbulence.
AI Trade’s Bruising Week Forces Investors to Be More Selective
Adam Dell, founder and CEO at Domain Money joined Bloomberg Businessweek Daily to discuss the intersection of fintech, VC, and early stage startup mechanics and building a tech business. (Source: Bloomberg)
SpaceX bonds sell off days after AI and rocket group’s $25bn debt deal
Yields move towards levels commonly associated with junk-rated companies
S&P 500 notches longest losing streak in 10 months as chipmakers slide
Software stock-led recovery proves fleeting as Wall Street’s main indices fall for fifth consecutive session
NYT: OpenAI mulls delaying IPO over valuation concerns
CEO Sam Altman reportedly does not want OpenAI to be valued at less than $1trn at IPO.
Enterprise AI Spending Grows, OpenAI Leads, RBC Reveals - Business Insider
RBC's latest CIO survey finds no AI token panic, no SaaSpocalypse, and surging enterprise AI spending, led by OpenAI.
AI Bubble Fears Hit Global Markets: What India's Macro Upgrade Means | Whalesbook
Global AI stocks face a sell-off amid bubble fears, while India's GDP forecast rises to 6.8%. Here is what investors need to know about this market shift.
India can still build AI winners despite US, China lead, says Accel's Subrata Mitra | Company Business News
Accel is deploying capital across AI, consumer, fintech and manufacturing from its $650 million eighth India fund launched in January 2025.
Onsemi to acquire Synaptics in all-stock deal, gives Synaptics $7bn enterprise value
Onsemi has agreed to acquire Synaptics in an all-stock deal valued at $7 billion, aiming to extend its capabilities into intelligent systems through Synaptics' Edge AI compute franchise.
ON Semiconductor to Acquire Synaptics in $7 Billion Deal to Boost AI Market Reach
ON Semiconductor is set to acquire Synaptics in a $7 billion all-stock transaction to bolster its AI capabilities and expand its market potential to $30 billion by 2030.
Apple Stock Dips As Rising Chip Costs Pressure AI-Driven Growth
Apple shares fall as rising chip costs pressure the AI trade. Analysis of market implications and supply chain challenges for tech investors.
KOSPI plunges as Samsung and SK Hynix lead chipmaker rout amid AI sentiment shift
The company makes high-bandwidth memory chips, the kind that AI data centers consume voraciously. Its ascent to Korea’s most valuable company, leapfrogging Samsung, was a testament to just how much money was flowing into AI hardware bets. Margin debt across the Korean market had reached a record 38.5 trillion won earlier in June. South Korea’s semiconductor sector sits at the heart of the AI supply chain...
AI Semiconductor Boom 2026: Top Chip Stocks Driving the $1.3T Market
The AI semiconductor boom is driven by massive capital expenditure from hyperscale cloud providers building AI infrastructure. Companies like Microsoft, Google, Amazon, and Meta are investing hundreds of billions in data centers for AI training and inference. Bank of America projects the global semiconductor market will reach $1.3 trillion in 2026...
Billionaire Jeremy Grantham Predicts A Major Tech Market Correction
For Jeremy Grantham, the euphoria surrounding AI signals a major crash, while bitcoin is doomed to disappear.
As markets deliver positive returns through mid-2026, investors are turning to tech stocks and AI-driven themes to build wealth this year.
Cloud companies plan to spend over $670 billion on AI infrastructure in 2026, driving a historic data center buildout.
AI Investment Boom 2026: $800B Spending Spree Reshaping Markets
While semiconductors and chips ... that AI-related spending will boost true capital expenditure growth by approximately 3.3 percentage points in 2026, making it a genuine macroeconomic force rather than just a sectoral trend....
iCapital Market Pulse: The AI Economy’s Biggest Surprises - iCapital
How AI is reshaping productivity, inflation, and jobs—revealing unexpected economic impacts and shifting market dynamics.
DeepSeek plans hiring spree in escalation of China’s AI talent war
Advertised roles suggest company focused on commercialising frontier research
Divergent Dynamics Between AI Infrastructure Buildout and Actual Market Demand
The current AI infrastructure expansion is not perfectly aligned with end-user demand patterns. This divergence highlights potential risks in capital allocation and market maturity.
Apple’s Vision Pro and Smart Glasses Chief to Join OpenAI
Apple Inc.’s top executive in charge of the Vision Pro headset and the company’s smart glasses efforts is leaving for OpenAI, continuing a streak of high-profile defections to rivals in the artificial intelligence and hardware sectors.
Oracle promises to open up MySQL governance, but the community wants guarantees
Open source advocates remain concerned over lack of binding commitments
OpenAI Taps Ex-Uber Exec Prabhjeet Singh to Spearhead India Growth Amid AI Boom
OpenAI has named Prabhjeet Singh as Managing Director for India, a strategic move to bolster its presence in its second-largest market.
AI: Blip 2.0++, OpenAI IPO now 2027, 'RAMAgeddon' reaches Apple. AI-RTZ #1130
Nadella reframes AI as job reorganization over elimination, and says the industry still has to “earn the social permission.” My take: this is the Frenemies dynamic in full color. Microsoft wants to be far less dependent on a handful of frontier model companies (beyond Nvidia too), and it’s carving out an ‘us vs them’ position with the customers right ahead of the OpenAI and Anthropic mega-IPOs.
Adobe's Strategic Expansion of Ecosystem Accessibility
Adobe uses generative AI to expand accessibility for novices while maintaining professional quality standards to compete with tools like Canva and AI startups.
Reuters Tech News | Today's Latest Technology News | Reuters
Senior research scientist John Jumper said on Friday he would leave Google DeepMind to join AI startup Anthropic, the latest high-profile departure at the Big Tech giant's AI lab.
AMD EPYC Venice Could Ship More Units Than NVIDIA Vera CPUs in 2027, Says Morgan Stanley - eTeknix
The race for AI hardware is no longer focused only on GPUs. According to a Morgan Stanley report, AMD could ship more EPYC Venice CPUs than NVIDIA ships Vera CPUs in 2027. The report estimates that AMD will ship around 6.75 million EPYC Venice processors, while NVIDIA is expected to ship about ...
New York Times proposes third amended complaint against OpenAI, Microsoft
The New York Times is requesting to file a third amended copyright complaint against OpenAI and Microsoft, featuring expanded contributory infringement claims regarding AI training services.
Apple, Microsoft raise prices: The AI price shock is here
AI is driving up the price of working and playing.
Amazon Raises Price of a Key AI Cloud Service by 20% on Strong Demand - Business Insider
Amazon Web Services is hiking the price of a popular AI cloud service by 20%. The company had already raised prices 15% in January.
Most companies think they're building a software factory. They're actually just shipping bugs faster.
Industrialized factories changed how the world produced physical goods: more output, lower costs, faster than anything that came before. Now a similar shift is happening with software. LLMs have lowered the barrier to writing code, increased individual output, and pushed organizations to think about software development as a production system. The standard software development lifecycle and CI/CD practices that have held for decades won't hold up under that pressure. That's where the software factory comes in, and like physical factories, it needs more than speed to actually work.

The idea of a “software factory” started to solidify over the past year. Luca Rossi's "The Era of the Software Factory" made the case plainly: AI is not just changing how fast people write code; it's changing the whole production system around software. The concept can mean different things: a collection of coding agents and skills files; faster CI/CD; better review systems; or more automation around software delivery. A better frame is to think of it less as a tool category and more as a set of principles.

A software factory can't just be a loose collection of prompts, agents, and plugins. It needs a platform that defines how work moves through the system and how code is generated, reviewed, tested, traced, deployed, and improved when something goes wrong. Otherwise all you’re doing is putting yet another one-off machine into an empty room and calling it a factory.

Why is this happening now?

There are a few forces all hitting at the same time. Companies have always wanted more software than engineers can produce. That’s why tools like Excel exist: They often fill in the gap for a lot of the software that many companies wish they could make. AI has also lowered the barrier of entry to creating code, and this is the part everyone focuses on. Code creation is now easier, though not always cheaper or better, as evidenced by many high-profile companies fretting over their high AI bills.

The barrier to writing functional code has effectively collapsed. More importantly, a single engineer can generate more code than they could just a few years ago. That changes the bottleneck: it’s no longer “How fast can someone write this?” or even, in some cases, “Can someone understand how to code?” Instead it becomes, “Should this be written?” More importantly, can we actually create end products that are durable and reliable and don’t just build tech debt? Or are we just putting out more AI slop faster than ever? That’s where the danger lies.

The dangers of the modern software factory

All of this sounds great. Factories, after all, made production faster and more consistent. They made it possible to build more cars and products, less expensively, which led to more people being able to afford cars and products. Putting environmental impacts aside, you could argue this was positive. But like many things in engineering, there are always tradeoffs, and in this case, there are new risks.

When you increase the output of one person with machinery, digital or otherwise, you also increase the mistakes that can be made either by the individual or the machinery. The speed at which code can now be put out is on an industrial scale. Even smaller organizations can suddenly have code bases ballooning up to the size of tech company code bases a decade ago.

The data is already showing problems. Faros AI found that while task throughput per developer is up 33.7% and PR merge rate is up 16.2%, the incidents-to-PR ratio has risen 242.7% and bugs per developer are up 54%. Google’s DORA research found that more AI adoption was actually associated with worse delivery stability.

As a fractional head of data, I've been brought in to fix these exact issues. In the past year alone, I've worked on two projects where AI-generated data infrastructure slowly started to morph over time. Between multiple engineers trying to move quickly and a lack of standards, these projects became unruly.

Code bases tend to go through some level of evolution, but as different styles blend, the LLMs in turn start to create their own mutations. Codebases developed five to six different styles within months, a process that previously took years. Layer by layer, the engineers would slowly stop understanding exactly what was going on. The pattern echoes what happened a decade ago with self-service tooling: early productivity gains that masked downstream complexity. And that’s why the software factory can’t just be about speed.

What makes a software factory work

There are several key principles to consider when building a software factory.

Platform over tools: Many teams are slowly implementing AI into their coding workflows at the edges, adding a PR review agent or a skills file into their repos. But building an actual software factory requires a platform, not a collection of tools at the edges. A platform provides a unified foundation where tools aren't scattered in separate corners. Instead, they actively share data, talk to each other, and work as a single cohesive system, with standards, processes, and the work itself all connected.

Rerunnability and traceability: A real platform requires the ability to go back into any run, identify what went wrong, and rerun it, which is why one-off agents don't make a factory. The system needs to support taking a serial ID, looking it up, and tracing exactly how it got to the output it produced. This is why state machines make more sense than loops for AI workflows: they make it far easier to rerun a process and understand what happened at each step.

Safety and guardrails: Factories are not safe places. Neither is a software factory. As more people develop on these platforms, better guardrails and safety measures need to be built in. Testing and quality control need to be pushed to the front of the process: catching bugs at the lowest possible stage reduces the cost to fix them and limits the blast radius.

Standardization: At the enterprise level, every codebase has its own flavor. Layering a code assistant on top without standards produces an amalgamation of styles. Standardization has to be built into the process from the start.

Quality control: In older manufacturing models, quality control happened at the end of the line. The product was built, inspected, defects found, and fixed later. Toyota's approach was different. Quality was pushed into the process itself: workers were expected to stop the line when something was wrong. The goal wasn't to catch defects at the end; it was to prevent them from flowing downstream in the first place. The same is true for the software factory. QC needs to be baked into the entire process, starting with how the spec is written. That means integrating static code analysis that catches obvious errors and providing templates to LLMs so they know the structure the code should follow. Without that, the bottleneck becomes the final review, or teams just push out more AI slop.

Speed without quality isn't productivity

Improving the speed of your code output is not actual productivity if the downstream issues aren’t managed. A company is not more productive because it produces millions of cars, only to see them all fall apart within 100 miles. It’s also not more productive if all it does is produce an endless stream of proofs-of-concept that never enter production. Actual productivity is when the software factory takes ephemeral tokens and turns them into durable outputs. It's easy to talk about lines of code and how much faster your team is moving. The software factory that wins isn't the one that generates the most code. It's the one that generates the fewest defects downstream.
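The article's point about state machines, run IDs, and pushing checks early can be illustrated with a small Python sketch. This is one possible shape under stated assumptions, not the author's platform: the stage names, the in-memory `RUNS` store, and the handler interface (a callable returning a pass/fail flag and a detail string) are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SPEC = "spec"
    GENERATE = "generate"
    STATIC_ANALYSIS = "static_analysis"
    TEST = "test"
    DONE = "done"
    FAILED = "failed"


# Explicit transitions: cheap checks (static analysis) run before slower tests.
NEXT = {
    Stage.SPEC: Stage.GENERATE,
    Stage.GENERATE: Stage.STATIC_ANALYSIS,
    Stage.STATIC_ANALYSIS: Stage.TEST,
    Stage.TEST: Stage.DONE,
}


@dataclass
class Run:
    run_id: str
    stage: Stage = Stage.SPEC
    trace: list = field(default_factory=list)  # (stage, ok, detail) per step


RUNS = {}  # hypothetical in-memory store; a real platform would persist this


def start_run() -> Run:
    run = Run(run_id=uuid.uuid4().hex)
    RUNS[run.run_id] = run
    return run


def step(run: Run, handlers: dict) -> None:
    """Execute the handler for the current stage and record the outcome."""
    stage = run.stage
    ok, detail = handlers[stage](run)
    run.trace.append((stage.value, ok, detail))
    run.stage = NEXT[stage] if ok else Stage.FAILED  # stop the line on failure


def execute(run: Run, handlers: dict) -> Run:
    while run.stage not in (Stage.DONE, Stage.FAILED):
        step(run, handlers)
    return run


def rerun_from(run_id: str, stage: Stage, handlers: dict) -> Run:
    """Look a run up by ID and resume it from a chosen stage, keeping its history."""
    run = RUNS[run_id]
    run.trace.append((stage.value, None, "rerun requested"))
    run.stage = stage
    return execute(run, handlers)
```

Because every step appends to `trace`, a failed run can be looked up by its ID, inspected stage by stage, and resumed from the failing stage rather than regenerated from scratch, which is harder to do with an open-ended agent loop.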
Council Post: How AI Is Changing The Economics Of Compliance
When AI can complete a security questionnaire in hours that previously required days of analyst time, the unit economics of a compliance engagement begin to invert.
India's Tech Services Key to Global AI Era: Nasscom
India's tech services sector will play a central role in the AI era, says the Nasscom US CEO Forum. AI services revenue is estimated at USD 10-12 billion, with 2 million AI-skilled professionals.
AI for Startups in 2026: What Actually Matters Now | by John Eric | Jun, 2026 | Medium
AI for Startups in 2026: What Actually Matters Now Google Cloud VP warns two AI startup models are dying · YC’s newest batch pivots to “agent supply chain” infrastructure · One Anthropic blog …
DeepSeek to ‘double size of all departments’ as it competes with AI rivals
The hiring push comes amid additional plans to raise significant funds of roughly $7.4bn.
Labor, Society & Culture
David Autor named head of the Department of Economics
MIT has appointed economist David Autor as the new head of its Department of Economics.
The AI factory: the rewiring of India's tech industry
Outsourced data services may not be enough for India to thrive in the AI era
Infosys boss says vibe coding is no threat because there’s more to writing software than writing software
Despite warnings of revenue deflation, the chairman predicts AI will create more work rather than less for services organizations.
2026 Layoffs Tracker: Meta, Robinhood, Walmart, and Oracle Lead AI-Driven Job Cuts | eWeek
Tech layoffs are accelerating as companies invest heavily in AI, raising questions about automation, over-hiring, and corporate accountability.
AI unlikely to trigger 'Job Apocalypse', it may create uneven workforce disruption: Goldman Sachs Report - The HinduBusinessLine
Goldman Sachs report suggests AI will reshape labor markets, displacing some jobs but also creating new opportunities over time.
Goldman Sachs Report: AI Won't Cause Job Apocalypse, But Workforce Disruption is Inevitable, ETEnterpriseai
Making AI Work: A new Goldman Sachs report assures that while an 'AI job apocalypse' is unlikely, significant changes in the labor market are expected, with both job displacement and new opportunities arising over time.
RAISE US launches AI workforce initiative | ETIH EdTech News — EdTech Innovation Hub
AI skills and workforce training gain more than $500 million in commitments as Gina Raimondo and Eric Holcomb launch RAISE US. ETIH edtech news covers state pilots, apprenticeships, worker transitions, and support from Microsoft, Amazon, Anthropic, and the OpenAI Foundation.
AI Won't Wipe-Out Entry-Level Cybersecurity Jobs
Instead of eliminating jobs for early-career cyber pros, AI is creating new opportunities for candidates with strong human decision-making skills.
CIO guide: Balancing the board's AI hype and employee pushback
How can CIOs deliver on board and C-suite expectations while employees worry about the impact of AI? Learn how CIOs are measuring employee sentiment.
India's technology services sector will continue to grow in the AI era: Nasscom US CEO Forum
The technology services sector in India will continue to remain central to global enterprises' transformation in the AI era. AI does not reduce...
Artificial intelligence won't replace humans, only their jobs, says IBM executive | The Manila Times
ARTIFICIAL intelligence may be advancing at a pace that is reshaping entire industries, but it is not displacing the need for human judgment, accountability or domain expertise. Instead, it is reorganizing the structure of work itself, according to Arun Biswas, Global AI and Sustainability ...
Trap is set: Job market is about to get crushed if Labor Department doesn't act
The impact of artificial intelligence on workers lies more in how the technology shapes their hours, pay, and, by extension, their quality of life.
Council Post: The Workforce Reset: Capability Is The Bottleneck To Organizational Success With AI
Organizations looking to implement AI technology need to ensure their employees can use the tools effectively.
Studies: AI skills shield workers from layoffs, West Virginia itself less susceptible | WV News | wvnews.com
CLARKSBURG, W.Va. — The rise of artificial intelligence has also given rise to Luddite-esque fears that the technology will displace and even replace workers in the future.
Who should lead in the age of AI? Canadians remain cautious about handing the C‑suite to algorithms - Digital Journal
Deploying AI to improve decision-making is likely to be widely accepted. Attempting to replace human executives entirely may create reputational risks that outweigh any operational benefits.
Why Does Everyone Hate AI? - Paul Krugman
New technology is often met with a degree of curiosity as well as skepticism. As more Americans incorporate AI into their lives, there are broad concerns about its impact, its speed and whether the government can properly regulate it.
Technology & Infrastructure
OpenFinGym: A Verifiable Multi-Task Gym Environment for Evaluating Quant Agents
arXiv:2606.26350v1 Announce Type: new Abstract: Although large language model agents are increasingly applied to quantitative-finance workflows, their evaluation remains fragmented across isolated tasks, while the financial relevance of benchmark tasks is often overlooked. Yet financial workflows are inherently multi-stage, spanning interdependent tasks such as forecasting, strategy construction, risk management, and trading. Existing platforms typically focus on a single task, and can therefore overstate agent competence and fail to reveal weaknesses in generalization, real-market interaction, and financially meaningful decision-making. We introduce OpenFinGym, a unified gym environment for quantitative-finance agent development that covers forecasting, market generation, real-time trading, and fraud detection under a single execution and verification interface. OpenFinGym additionally provides an automated task-construction pipeline that turns quantitative finance publications into executable task packages; a containerised runtime with a host-side verifier service that supports scalable agent rollouts and prevents runtime train-test leakage; a paper trading engine with a low-latency data-stream design; deferred-resolution support for long-horizon and event-market forecasts; and integration for SFT and RL post-training
AI agents and business process automation beyond the factory floor
Automation is expanding beyond manufacturing into finance, HR, procurement, and customer operations. Learn how AI agents are transforming business process automation and enterprise workflows.
How agents are transforming work
OpenAI discusses the shift from chat-based assistance to long-horizon workflows where agents execute tasks across engineering and knowledge work.
What Business Leaders Need to Know About AI Agent Observability
Explore the role of AI agents in organizations and the critical need for observability to ensure responsible AI adoption.
Know your agent: building the foundation of autonomous commerce | TechRadar
Identity verification and "Know Your Agent" protocols are essential for secure AI commerce
AlgoEvolve: LLM-driven Meta-evolution of Algorithmic Trading Programs
arXiv:2606.26173v1 Announce Type: new Abstract: Recent work shows that Large Language Models (LLMs) can act as semantic mutation operators for the evolutionary discovery of programs and proofs. Most current applications focus on static coding benchmarks. We extend this paradigm to algorithmic trading. This domain is uniquely challenging because it is noisy, non-stationary, and highly discontinuous. We present AlgoEvolve, an LLM-driven evolutionary framework that generates, evaluates, and iteratively improves executable trading strategies. These strategies are expressed as Python code and evaluated through a rigorous testing protocol. Across multiple experiments, the system exhibits emergent regime-adaptive strategy logic, including autonomous shifts in trading rules. We further introduce a meta-evolutionary outer loop that evolves the prompts guiding program synthesis in the inner loop. This outer loop discovers improved search heuristics. These heuristics balance exploration and exploitation while reducing zero-trade failures. They consistently outperform initial human-designed instructions. The results demonstrate that LLM-based semantic evolution provides a viable approach for continual program synthesis in complex environments.
Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems
arXiv:2606.26356v1 Announce Type: new Abstract: Practitioners of prompt-composed agentic systems report a recurring failure mode: editing one prompt module silently shifts the behavior of others despite no shared variable or executable dependency. We formalize this as compositional behavioral leakage (CBL): interference between modules sharing a context window. CBL is enabled by architectural non-isolation: transformer self-attention provides no formal boundary between concatenated modules. We probe CBL on a deployed job-evaluation agent (Claude Sonnet 4.6, 144 trials) through a reusable three-channel protocol that perturbs non-focal modules along volume, content, and form. Only the content channel produces a detectable paired effect (Cohen's d = 0.63, bootstrap 95% CI excluding zero); no recommendation flipped -- a sub-threshold regime invisible to standard QA but compounding across the thousands of decisions a deployed agent makes. CBL is orthogonal to known agent-failure axes (adversarial injection, cognitive degradation, multi-agent fault propagation, privacy leakage). We contribute an operational definition, a reusable protocol, a falsifiable prediction set, and a system-class characterization, establishing cross-module interference measurement as a requirement for prompt-composed agent evaluation.
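For readers unfamiliar with the statistic quoted in the abstract, the following is a minimal Python sketch of a paired effect size with a bootstrap confidence interval. It assumes the paired variant of Cohen's d (mean of the paired differences divided by their standard deviation, often written d_z); the abstract does not say which variant the authors used. The scores are made-up placeholders, not data from the paper, and the function names are hypothetical.

```python
import random
import statistics


def paired_d(baseline, perturbed):
    """Paired effect size d_z: mean of the differences over their sample SD."""
    diffs = [p - b for b, p in zip(baseline, perturbed)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / sd


def bootstrap_ci(baseline, perturbed, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for d_z, resampling whole pairs with replacement."""
    rng = random.Random(seed)
    pairs = list(zip(baseline, perturbed))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        b, p = zip(*sample)
        try:
            estimates.append(paired_d(b, p))
        except ValueError:
            continue  # degenerate resample with zero spread
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Hypothetical agent scores for the same inputs, before and after
# perturbing a non-focal prompt module (placeholder numbers only).
baseline = [7.0, 6.5, 8.0, 7.5, 6.0, 7.0, 8.5, 6.5, 7.2, 6.8]
perturbed = [7.4, 6.6, 8.6, 7.2, 6.5, 7.2, 9.2, 6.5, 7.5, 6.7]

d = paired_d(baseline, perturbed)
ci_low, ci_high = bootstrap_ci(baseline, perturbed)
print(f"d_z = {d:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes zero is what the abstract means by a "detectable" effect: the score shift is consistent across pairs even if no individual recommendation changes.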
AI Coding Agents Demonstrate Significant Gains in Engineering Productivity
Recent experiments show AI models completing complex software engineering tasks in hours that previously required weeks of human labor. These results indicate a rapid acceleration in automation capabilities for technical roles.
New agentic memory framework uses 118K tokens per query. LangMem burns through 3.26M.
Long-horizon reasoning exposes a core weakness in AI agents: context windows fill up fast, and retrieval pipelines return noise instead of signal. To solve this, researchers at the National University of Singapore developed MRAgent, a framework that abandons the static "retrieve-then-reason" approach. Instead, it uses a mechanism that allows an agent to dynamically develop its memory based on accumulating evidence. This multi-step memory reconstruction is integrated into the reasoning process of the large language model (LLM). While not the only framework in this space, MRAgent significantly reduces token consumption and runtime costs compared to other agentic memory management approaches.

The limits of passive retrieval in long-horizon tasks

In classic retrieval pipelines, documents are retrieved through vector search or graph traversal and passed on to an LLM for reasoning. This passive approach fails because it cannot combine reasoning with memory access, creating three major bottlenecks:

These systems cannot revise their retrieval strategy mid-reasoning. If an agent fetches a document and discovers a crucial missing cue (a specific date or person), it has no way to issue a new query based on that finding.

Fixed similarity scores and predefined graph expansions return surface-level matches that flood the LLM's context window with irrelevant noise, degrading reasoning.

Current systems rely heavily on pre-constructed structures such as top-k results and static relevance functions, limiting the flexibility required to scale across unpredictable, long-horizon user interactions.

The researchers argue that to overcome these limitations, developers must shift toward an "active and associative reconstruction process," a concept inspired by cognitive neuroscience. Under this paradigm, memory recall unfolds sequentially rather than operating as a passive read-out of a static database. The system starts with small, specific triggers from the user's prompt, such as a person's name, an action, or a place. These initial hints point to connecting concepts or categories instead of massive blocks of text. By following these metadata stepping stones, the agent gathers small pieces of evidence one by one. It uses each new piece of information to guide its next step until it successfully pieces together the full, accurate story.

How MRAgent implements active memory reconstruction

Instead of viewing memory as a static database, MRAgent (Memory Reasoning Architecture for LLM Agents) treats it as an interactive environment. When processing a complex query, the agent uses the backbone LLM's reasoning abilities to explore multiple candidate retrieval paths across a structured memory graph. At each step, the LLM evaluates the intermediate evidence it has gathered and uses it to iteratively optimize its search. It infers new search constraints, pursues the paths with the best information, and prunes irrelevant branches. This allows MRAgent to piece together deeply buried information without filling the LLM's context with noise.

To make this active exploration computationally efficient and scalable, the framework organizes its database using a "Cue-Tag-Content" mechanism. This operates as a multi-layered associative graph with three node types:

Cues: Fine-grained keywords, such as entities or contextual attributes extracted from user interactions.

Content: The actual stored memory units. These are divided into multi-granular layers, such as episodic memory for concrete events and semantic memory for stable facts and user preferences.

Tags: Semantic bridges that summarize the relational associations between specific Cues and Content.

This structure enables a highly efficient two-stage retrieval process. The LLM first navigates from Cues to candidate Tags. Because Tags explicitly expose the semantic relationships and structural associations of the data, the agent evaluates these short summaries to judge their relevance. The LLM identifies promising traversal paths and discards irrelevant branches before spending compute and prompt tokens to access the detailed, heavy memory contents.

For example, a user might ask an AI agent, "How did Nate use the prize money when he won his third video game tournament?" MRAgent first extracts fine-grained starting cues from the prompt, such as "Nate," "video game tournament," and "win." The agent maps these initial cues to the memory graph and looks at the available associative Tags connected to them. The agent sees tags like "Tournament Victory" and "Tournament Participation." Since it is only concerned with what the person did after they won the championship, MRAgent drops the tournament participation tag and pursues the victory tag. The agent retrieves the episodic content linked to the chosen Cue-Tag pair, retrieving three distinct memory episodes where Nate won a tournament. MRAgent looks at the three memories, decides one of them in particular is relevant to the query, and discards the other two. With this information, it updates its cues and starts another round of discovery and pruning. From the new episodic memory it has retrieved, the agent adds "tournament earnings" to its cues and uses that to traverse new tags and home in on new memories. It repeats this process until it gathers enough information to answer the query, which could be something like "Nate saved the money."

MRAgent performance on industry benchmarks

MRAgent operates alongside several other frameworks addressing agentic memory building. Alternatives include A-MEM, a graph-based agentic memory framework, and MemoryOS, a hierarchical memory framework. Other persistent memory frameworks include LangMem and Mem0.

The researchers tested MRAgent on the LoCoMo and LongMemEval industry benchmarks. These test the abilities of agents to resolve queries on long-horizon tasks and conversations across dozens of sessions and hundreds of turns of dialogue. The backbone models used were Gemini 2.5 Flash and Claude Sonnet 4.5. The system was tested against standard RAG, A-MEM, MemoryOS, LangMem, and Mem0. MRAgent consistently outperformed every baseline across both models and all question types by a significant margin.

However, for enterprise developers, the most critical metric is often computational cost. In the LongMemEval tests, MRAgent slashed prompt token consumption to just 118k per sample. By comparison, A-MEM consumed 632k tokens, and LangMem burned through 3.26 million tokens per query. MRAgent also effectively halved the runtime compared to A-MEM, dropping from 1,122 seconds to 586 seconds.

What makes MRAgent efficient in practice is its on-demand behavior. Evaluating tags and pruning irrelevant paths before retrieval saves money and context space. Furthermore, the system autonomously evaluates its accumulated context and inherently knows when to stop searching, completely avoiding redundant data exploration.

Implementation and development catch

While MRAgent is highly effective, the Cue-Tag-Content structure needs to be prepared before the agent can query it. Developers must figure out how to architect the underlying memory database to enable the LLM to efficiently navigate associative items and prune irrelevant paths without exploding compute costs.

Fortunately, developers do not have to manually label or structure this data. The authors designed MRAgent with an automated distillation pipeline that uses LLMs to process raw interaction histories and automatically populate the memory graph. For a developer, the job is to implement and orchestrate this automated ingestion pipeline, rather than manually tag data. You need to set up a background job or streaming pipeline that passes raw user interactions through prompt templates to extract this metadata before storing it in your graph database. However, the authors emphasize that this is a lightweight construction phase and MRAgent intentionally keeps ingestion simple. The authors have released the code on GitHub.
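The two-stage idea described in the article (judge short Tag summaries first, open heavy Content only for the survivors) can be illustrated with a toy Python sketch. This is not the authors' implementation. The `MemoryGraph` class, the `relevance` word-overlap score standing in for the LLM's judgment, and every tag summary and memory string below are hypothetical, chosen only to mirror the Nate example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MemoryGraph:
    """Toy Cue-Tag-Content graph; every entry used below is hypothetical."""
    cue_to_tags: dict = field(default_factory=dict)     # cue -> [tag name]
    tag_summary: dict = field(default_factory=dict)     # tag name -> short summary
    tag_to_content: dict = field(default_factory=dict)  # tag name -> [memory text]


def words(text):
    """Lowercased alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance(summary, query):
    """Stand-in for the LLM's judgment: count of words shared with the query."""
    return len(words(summary) & words(query))


def retrieve(graph, cues, query, keep=1):
    # Stage 1: walk from cues to candidate tags and rank them by summary only.
    candidates = {tag for cue in cues for tag in graph.cue_to_tags.get(cue, [])}
    ranked = sorted(
        candidates, key=lambda t: (-relevance(graph.tag_summary[t], query), t)
    )
    kept = ranked[:keep]
    # Stage 2: open the heavier memory content only for the tags that survived.
    contents = [m for tag in kept for m in graph.tag_to_content.get(tag, [])]
    return kept, contents


graph = MemoryGraph(
    cue_to_tags={
        "nate": ["Tournament Victory", "Tournament Participation"],
        "video game tournament": ["Tournament Victory", "Tournament Participation"],
    },
    tag_summary={
        "Tournament Victory": "Nate won a tournament and received prize money",
        "Tournament Participation": "Nate entered a tournament",
    },
    tag_to_content={
        "Tournament Victory": [
            "Nate won his third tournament and saved the prize money.",
            "Nate won his first tournament.",
        ],
        "Tournament Participation": ["Nate signed up for a regional qualifier."],
    },
)

query = "How did Nate use the prize money when he won his third video game tournament?"
kept, contents = retrieve(graph, ["nate", "video game tournament"], query)
print(kept)
print(contents)
```

In a fuller version, the agent would extract new cues from `contents` and call `retrieve` again, stopping once it judges the evidence sufficient. The token saving comes from the fact that the participation memories are never loaded into the prompt.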
ZTE showcases practical path to Level-4 autonomous networks through agentic AI and cross-domain innovation at DTW Ignite 2026
PARTNER CONTENT: Highlighting Level-4 autonomous network solutions, TM Forum Excellence Award finalists, and joint operator trials powering cross-domain fault management and Dynamic 5G Slicing
US Electricity Generation Growth Rebounds After Decade of Stagnation
US electricity generation has returned to growth after 15 years of flat performance. This shift is largely attributed to the massive energy requirements of the expanding AI compute infrastructure.
AI Data Center Water Use Is Not Solved: Nvidia's Cooling Fix Stops at the Walls
AI data center water use remains unsolved despite Nvidia’s June 2026 DSX closed-loop cooling breakthrough, which cuts on-site water consumption to near zero but leaves fossil fuel power plant water demand — roughly 54% of AI’s total projected water footprint through 2050 — entirely ...
New Report Finds Out How Much Water AI Really Uses | IBTimes
Researchers say there is no single number that accurately represents AI's water consumption because it varies based on several factors.
Can AI Solve the Sustainability Challenge It Helps Create? The Corporate ESG Strategy | IBTimes UK
AI supports firms in optimising energy, monitoring supply chains, and managing ESG demands, even as its own footprint raises sustainability challenges.
OpenAI and Broadcom’s Jalapeño Is Not an Nvidia Story. It’s a Unit Economics Story. | by Noah Bean | Jun, 2026 | Medium
Spark’s architecture also reveals where agentic AI is heading. Standard GPT-5.3-Codex runs at ~67 tokens per second with a 400,000-token context window for long-running autonomous tasks. Spark trades context depth for speed: 128,000-token window, 1,000+ tokens per second, designed for real-time pair programming where latency matters more than context length.
IBM stacks up a sub-nanometer chip future
IBM has showcased a new process node that it claims can scale down to 1 Angstrom.
Meet Micron, the under-the-radar chipmaker that just reported a 346% sales surge and helped stop a global AI selloff
The company behind the memory chips powering AI started in a dentist’s basement in Boise.
Micron overtakes Meta, Tesla in market value amid relentless AI infrastructure demand
The AI build-out is driving prices higher for consumers
The race to build up the infrastructure behind artificial intelligence is creating a new source of inflation.
Bitcoin Miners Emerge As Grid Flexibility Tool Amid AI Power Surge, Bloomberg Reports
The rapid expansion of AI infrastructure is placing unprecedented strain on regional power grids, leading to longer interconnection queues and rising electricity prices. Bitcoin miners, which have historically been criticized for their energy consumption, are now being recognized for the grid services they can provide. By participating in demand ...
Capgemini: Utilities Cannot Predict the Energy Demand of AI | Energy Digital
New research from Capgemini reveals ... forecast the energy demand created by the rapid expansion of AI-driven data centres. The report found that 77% of utilities are struggling to predict this demand. Electricity consumption from AI training and inferencing is set to rise ...
MacBooks Are the Latest Victim of the Memory Shortage. Here's Why Laptop Prices Keep Rising | PCMag
Memory costs have exploded in 2026, leading Apple and other computing giants to jack up laptop prices. Against that backdrop, here’s how to save on a new PC this year.
How a Niche Technology Became a Choke Point for A.I.
Advanced chip packaging, which boosts computing power for artificial intelligence, has made the United States more reliant on Taiwan than ever.
Amazon Commits Additional $13B to India AI and Cloud Infrastructure Through 2030
Amazon has announced a further $13 billion investment in India’s AI and cloud infrastructure, bringing its total committed spending in the country to $48
KAYTUS Launches Gigawatt-Scale Prefab AI Data Center Solution, Slashes Deployment Time by 60%
KAYTUS has launched an innovative gigawatt-scale AI data center solution at ISC 2026, offering rapid deployment with prefabricated modules.
AI Infrastructure Expansion: Energy, Investment & Local Impact
This gap in legislation has led ... growing demand for advanced cooling technologies, such as liquid cooling, which can reduce noise levels but significantly increase the cost of construction. As the industry matures, the focus is shifting toward a more integrated approach that considers infrastructure as a vital public utility. Experts suggest that the future of AI will depend on the ability of governments and private sectors to coordinate on energy grids, water ...
Jiangsu's first AI-powered 10 Gbps all-optical campus network launched at Southeast University
PARTNER CONTENT: Integrating 50G-PON, FTTR-B, Wi-Fi 7, and intelligent AI scheduling to deliver 10 Gbps bidirectional speeds with ultra-low 0.1ms latency across Southeast University
Physical AI and Infrastructure
Excitement around AI software and large language models (LLMs) remains high in 2026. However, the emerging focus has shifted toward infrastructure.
AI to ROI: News & Analysis - June 26, 2026
Jalapeño does not perform AI model training. OpenAI still depends on Nvidia GPUs for that. But inference is where OpenAI spends the most compute at scale, serving hundreds of millions of requests for ChatGPT and Codex daily.
Building Europe's AI Factories: How Nebius, Azur Datacenters and Inflect Are Reshaping Northern France's Digital Infrastructure
After two years dominated by investment announcements, sovereign AI initiatives, and multi-billion-dollar capacity plans, attention across the industry is shifting decisively towards execution. The question is ...
Orbital Data Centers Aren't Ridiculous, But They Won't Save Us From Earth's AI Infrastructure Crunch | eWeek
Orbital data centers may eventually support AI and space-native workloads, but launch costs, cooling, maintenance, and timing keep Earth central for now.
Anthropic’s Mythos 5 AI Model Cleared by US for Wider Use
Anthropic PBC won US approval to restore some access to its powerful Mythos 5 artificial intelligence model, after resolving Trump administration concerns about the technology’s potential threats to national security.
Accelerating Returns and the Qualitative Engine for Science
arXiv:2606.26359v1 Announce Type: new Abstract: Ray Kurzweil described a thesis of accelerating returns, which is one of the most influential narratives in discussions of technological progress. Its central claim is that advances in multiple technological fields, especially compute, artificial intelligence, brain science, and biotechnology, interact in such a way that progress becomes self-amplifying and approximately exponential. This paper gives a simple mathematical interpretation of that claim and then argues that, even if such acceleration is real, it does not by itself resolve the central problem of scientific discovery. The reason is that accelerating returns apply most naturally to executional and infrastructural capability, whereas genuine discovery often depends on a different capacity: qualitative reasoning about when a current framework is structurally inadequate and what conceptual move is needed next. Recent ARC-AGI-3 results sharpen this distinction: humans solve the benchmark at ceiling, whereas frontier AI systems remain below 1%, indicating that the gap between current AI and human flexible reasoning is still very large. At the same time, Demis Hassabis has emphasized that humans must retain their sense of meaning and what they choose to focus their lives on, a reminder that the future of AI is not only a technical forecast but also a question of what forms of human understanding are worth preserving and transmitting. This paper positions the Qualitative Engine for Science (QES) [3] as a response to that missing capacity. In this view, the Kurzweil theory helps explain why quantitative capability may accelerate, while QES addresses the central problem in scientific discovery that acceleration alone does not solve. Its value does not depend on when AGI arrives, but on the fact that the processes of scientific discovery themselves constitute a form of human wisdom worth preserving, organizing, and making accessible.
OpenAI releases GPT-5.6 to select users vetted by US government
San Francisco-based company announces ‘limited preview’ of new models with powerful cyber security capabilities
Evaluating the robustness and readiness of large frontier models in health AI applications | Nature Medicine
Large frontier models such as GPT-5 and Gemini have demonstrated remarkable performance in a wide range of health application benchmarks. However, underneath the seemingly promising results lie salient growth areas, especially in cutting-edge frontiers such as multimodal reasoning.
Detecting and Controlling Sycophancy with Cascading Linear Features
arXiv:2606.26155v1 Announce Type: new Abstract: Interpreting and controlling model behaviors through activation steering methods requires many pairs of contrastive samples that clearly exhibit desired or undesired behavior. These data pairs determine the degree to which interpretability frameworks can reliably detect model features responsible for a behavior, and therefore the ability to steer models toward or away from such behavior. In this work, we present an iterative data generation pipeline that isolates cascading linear features responsible for a behavior. Specifically, we show how moving beyond simple binary pairs of samples, and instead isolating samples that show degrees of features that scale linearly with behavior, allows for better disentanglement of features. We focus on detecting and steering away from sycophancy -- the tendency of language models to prioritize user validation. We demonstrate that sycophancy features discovered through cascading samples form linearly separable subspaces, and allow for selection of model activations that more clearly correspond to the desired behavior than baseline approaches. We also evaluate their ability to enable detection, deterministic scoring, and robust steering, and see that they either match or outperform LLM-as-a-judge and system prompting baselines while providing lower computational demand and more interpretability guarantees. Code & Data: https://cascading-feats.github.io/
What We are Missing in Multimodal LLM Evaluation?
arXiv:2606.26348v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process diverse inputs, e.g., text, images, audio, and video, and generate textual responses. While their capabilities have advanced rapidly, evaluation of such models has not kept pace. Most existing evaluation benchmarks are limited to isolated tasks and reveal little about whether a model integrates information across modalities. We examine current means for evaluating MLLMs and review the existing benchmark taxonomy to identify gaps, including temporal-spatial coherence, physical world understanding, multimodal consistency, and selective attention. Addressing these gaps is essential for measuring real progress in multimodal intelligence and exposing capability boundaries.
OpenAI unveils GPT-5.6 Sol, Terra and Luna models — but only accessible to limited preview partners for now, per US Gov
OpenAI is announcing a limited preview of its newest frontier AI model family, GPT-5.6, which comes in three variants: Sol, Terra, and Luna. Sol is for the hardest problems, such as complex coding and security research; Terra is for high-volume business tasks like customer support, internal tools and document analysis; and Luna is for faster, lower-cost everyday work like summarization, drafting and routine automation. Sol and Terra set new high benchmark scores, while Luna performs near GPT-5.5 levels on several tests despite being positioned as the fastest and lowest-cost model in the GPT-5.6 family.

However, the models are being made available initially to a narrow set of approximately 20 organizations, after OpenAI shared the models and release plans with the U.S. government. A general release is planned for "the coming weeks."

The staggered release follows an executive order issued by President Donald J. Trump earlier this month, on June 2, 2026, which calls upon various federal agencies to collaborate on a process for benchmarking and assessing capabilities of new AI models to ensure they are safe and appropriate for wide release. While this process remains underway (the order said it would take 30 days, so until July 2), OpenAI says in its release blog post that it "previewed our plans and the models' capabilities ahead of today's launch. At [the U.S. government's] request, we are starting with a limited preview for a small group of trusted partners."

OpenAI's limited preview release strategy also follows the drastic step taken by the U.S. government to issue an export control order against Anthropic, OpenAI's top U.S. competitor, over jailbreaks found in its most powerful generally released model, Claude Fable 5. Anthropic responded by removing all access, by public or private parties, to the model and its cybersecurity-focused counterpart, Claude Mythos 5. (Anthropic had earlier previewed a prior version of the model as "Claude Mythos Preview" to a small number of selected external participants in its cybersecurity research program "Project Glasswing," dating back to April.)

Because OpenAI is coordinating its release framework with the White House ahead of a broader public launch, enterprise buyers must navigate a novel landscape of real-time safety interventions, mandatory compliance parameters, and structured token caching systems.

How the 3 new GPT-5.6 models differ: Sol vs. Terra vs. Luna

The three GPT-5.6 models are designed to address different enterprise needs and performance profiles.

Sol is the top-tier option, built for the most demanding tasks such as complex reasoning, extended coding sessions, advanced agent-driven workflows, and security-focused applications. Sol delivers the highest level of capability but comes at the highest price: $5.00 per million input tokens and $30.00 per million output tokens, the same as GPT-5.5. OpenAI says it delivers a major performance gain for long-running coding, cybersecurity and agentic tasks.

Terra balances strong performance with efficiency. It is intended for large-scale production environments where organizations need reliable results across high volumes of work without the overhead of the most advanced model. It's available for $2.50/$15 per 1M tokens.

Luna is the most lightweight and cost-efficient option, optimized for speed and everyday use cases. It is well suited for simpler tasks, routine workflows, and applications where responsiveness and scalability are more important than maximum depth of reasoning, and is the most affordably priced at $1/$6 per million tokens in and out, respectively.

Sources with knowledge of OpenAI's inner workings shared with VentureBeat that the new naming scheme was designed to move away from the "nano" and "mini" variants of GPT-5, as these models are not so different in terms of size or raw intelligence, but rather are designed for distinct use cases. As OpenAI states in its blog post about the new naming scheme: "In this new naming system introduced with GPT-5.6, the number identifies a model's generation, while Sol, Terra, and Luna identify durable capability tiers that can advance on their own cadence. Together, the family gives people and developers clearer choices across intelligence, speed, and cost."

Sources also said OpenAI sought to evoke a sense of inspiration by looking to the cosmos and names associated with it. As an added bonus, Sol fits well alongside OpenAI's Daybreak opt-in program for organizations interested in using OpenAI models to bolster cyber defense. The "Sol" voice style for OpenAI's voice mode on ChatGPT is unrelated, and will likely be renamed.

The new GPT-5.6 system card adds another important point for businesses: OpenAI is classifying all three GPT-5.6 models, not just Sol, at its "High" risk level for both cyber and biological/chemical capability, while rating them below that level for AI self-improvement. That means even the cheaper Terra and Luna tiers may carry new governance obligations for companies using them in security, life sciences or other sensitive workflows.
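To make the quoted per-million-token prices concrete, here is a small Python sketch that estimates the cost of a single request on each tier. The prices are the ones stated above; the tier keys, the `request_cost` helper and the example token counts are illustrative assumptions, not part of any OpenAI API.

```python
# USD per 1M tokens as (input, output), as quoted in the article.
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def request_cost(tier, input_tokens, output_tokens):
    """Estimated USD cost of one uncached request on the given tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: a 200k-token prompt producing a 10k-token answer.
for tier in PRICES:
    print(f"{tier}: ${request_cost(tier, 200_000, 10_000):.2f}")
```

For this assumed workload the estimate is $1.30 on Sol, $0.65 on Terra and $0.26 on Luna, so the spread between tiers is a constant 5x for any request shape because both input and output prices scale by the same ratio.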
Here's how they stack up against the rest of the current leading LLM field in price — note that OpenAI's cheapest option is overall a mid-priced model, and still more expensive than the frontier-level GLM-5.2 VentureBeat Frontier AI Model API Pricing Snapshot Model Input Output Total Cost Source MiMo-V2.5 Flash $0.10 $0.30 $0.40 Xiaomi MiMo deepseek-v4-flash $0.14 $0.28 $0.42 DeepSeek deepseek-v4-pro $0.435 $0.87 $1.305 DeepSeek MiniMax-M3 $0.30 $1.20 $1.50 MiniMax Gemini 3.1 Flash-Lite $0.25 $1.50 $1.75 Google Qwen3.7-Plus $0.40 $1.60 $2.00 Alibaba Cloud MiMo-V2.5 $0.40 $2.00 $2.40 Xiaomi MiMo Grok 4.3 (low context) $1.25 $2.50 $3.75 xAI MiMo-V2.5 Pro (≤256K) $1.00 $3.00 $4.00 Xiaomi MiMo Kimi-K2.6 $0.95 $4.00 $4.95 Moonshot/Kimi GLM-5.2 $1.40 $4.40 $5.80 Z.ai GPT-5.6 Luna $1.00 $6.00 $7.00 OpenAI Grok 4.3 (high context) $2.50 $5.00 $7.50 xAI MiMo-V2.5 Pro (>256K) $2.00 $6.00 $8.00 Xiaomi MiMo Qwen3.7-Max $2.50 $7.50 $10.00 Alibaba Cloud Gemini 3.5 Flash $1.50 $9.00 $10.50 Google Gemini 3.1 Pro Preview (≤200K) $2.00 $12.00 $14.00 Google GPT-5.6 Terra $2.50 $15.00 $17.50 OpenAI GPT-5.4 $2.50 $15.00 $17.50 OpenAI Gemini 3.1 Pro Preview (>200K) $4.00 $18.00 $22.00 Google Claude Opus 4.8 $5.00 $25.00 $30.00 Anthropic GPT-5.5 $5.00 $30.00 $35.00 OpenAI GPT-5.5 Instant (chat-latest) $5.00 $30.00 $35.00 OpenAI Sakana Fugu Ultra (≤272K) $5.00 $30.00 $35.00 Sakana AI GPT-5.6 Sol $5.00 $30.00 $35.00 OpenAI Claude Fable 5 / Claude Mythos 5 $10.00 $50.00 $60.00 Anthropic Technology: deeper reasoning and subagent-based work The main technical change in GPT-5.6 centers on giving the model more time and structure for hard tasks during inference. OpenAI is adding a new max reasoning setting for GPT-5.6 Sol, aimed at problems that require more extended deliberation. OpenAI is also introducing ultra mode, which brings in subagents that can split up and accelerate complex projects, rather than keeping the work inside a single-agent flow. 
The company’s launch evaluations suggest this approach improves performance on several agent-style tasks.
Benchmarks show measurable improvement from GPT-5.5, and new state-of-the-art on TerminalBench 2.1 command-line tasks
The GPT-5.6 series demonstrates a clear performance leap over its predecessors across complex reasoning and long-horizon tasks. In command-line automation evaluated on TerminalBench 2.1, both the flagship Sol model and the mid-tier Terra outpace the previous GPT-5.5 result. Notably, Sol used the new ultra thinking mode to achieve a record-high score of 91.91% on the benchmark, while its max mode achieved 88.76%, ahead of both GPT-5.5's 83.4% and Claude Mythos 5's 88%. This superiority extends into professional workflows on Agent's Last Exam, where Sol is the sole model to clear the halfway mark for task completion, at 50.9% in "code mode," while the everyday Luna tier also manages to narrowly edge out the prior generation's flagship. In quantitative biology and genomics testing, Sol and Terra achieve higher accuracy rates than both GPT-5.5 and GPT-5.4, with Sol achieving these stronger results while consuming fewer tokens. Finally, across cybersecurity evaluations measuring vulnerability research and exploitation, the new models push past prior performance ceilings; Sol reaches significantly higher intended exploit rates as reasoning time scales up and achieves competitive capability caps using a fraction of the output tokens required by older models. On ExploitBench, OpenAI says Sol performs near Mythos Preview while generating roughly one-third as many output tokens.
Predictable prompt caching mechanics and a Cerebras speed bump
To help enterprises control the unpredictable cost curves of running agentic loops, the GPT-5.6 API introduces a revamped prompt caching protocol. Developers can now implement explicit cache breakpoints, backed by a guaranteed 30-minute minimum cache lifetime.
Under this framework, initial cache writes cost 1.25x the model’s standard uncached input rate, while later cache reads receive a 90% discount. In practice, businesses running repeated or similar operations pay more to establish the cache, then much less each time they reuse that cached context within the cache window, which lasts at least 30 minutes. For systems that routinely pass massive context windows or codebase definitions back into the model, this predictability is a critical financial guardrail. Furthermore, for enterprise applications where latency is the primary barrier to adoption, OpenAI is launching GPT-5.6 Sol on Cerebras hardware this July. This infrastructure partnership claims processing speeds of up to 750 tokens per second, targeting specialized enterprise applications requiring real-time, frontier-grade reasoning.
Enterprise implications: High security and algorithmic friction
For corporate engineering, information security, and compliance teams, the deployment of GPT-5.6 requires a meticulous look at its security architecture. To achieve clearance for release, OpenAI dedicated roughly 700,000 A100e GPU hours solely to automated red-teaming GPT-5.6. This compute was allocated to discovering "universal jailbreaks": systemic attack vectors designed to bypass safeguards across varied contexts, rather than single-prompt workarounds. OpenAI says it has implemented a multi-layered safeguard stack that operates in real time, putting up intentional operational hurdles for enterprise security teams.
Model-level refusals: GPT-5.6 is tuned to reject banned cyber help, including requests that mask malicious intent or attempt jailbreak-style workarounds.
Live misuse screening: Separate cyber and biology detectors review generations while they are being produced.
Activation-based screening: For Sol and Terra, OpenAI says it is adding activation classifiers that monitor internal model signals during inference. If those systems detect a risky pattern, output streaming can pause while another safety check reviews the content. Luna does not appear to receive that same activation-classifier layer, though it is still covered by other monitoring systems.
Reasoning review pauses: When risk appears elevated, generation can stop while a larger reasoning system examines the exchange and surrounding context. If the system classifies the output as disallowed, the answer is blocked before it reaches the endpoint.
Because legitimate defensive work (such as code reviews, vulnerability discovery, patch engineering, and defensive testing) frequently uses the exact same code primitives as offensive exploits, OpenAI admits that its classifiers may regularly trigger false positives. The system card says OpenAI’s monitoring stack posted 94.8% overall recall on its biology evaluation set and 81.6% overall recall on its cybersecurity evaluation set. Those figures give enterprises a rare quantitative look at the safeguards, but they also show the system is not perfect and may miss some risky cases or block some legitimate work. Persistent flagging can trigger automated account-level reviews across historical conversations to evaluate whether an enterprise client is engaging in malicious behavior or standard security research. OpenAI is currently negotiating longer-term enterprise safety compliance controls, including customer-operated safety overrides and privacy-preserving detection mechanisms, to insulate corporate data from manual review pipelines.
Importantly, OpenAI notes that under testing, Sol remains optimized for defensive containment rather than offensive deployment. In evaluations running against the Chromium and Firefox codebases, the model successfully isolated bugs and exploitation primitives but was unable to autonomously engineer a functional, full-chain exploit, keeping it safely below the organization's "Cyber Critical" alert threshold.
But all three GPT-5.6 models crossed OpenAI's “High” cyber threshold on internal capture-the-flag testing, with Sol reaching 96.7%, Terra reaching 91.84% and Luna reaching 85.19%. That distinction matters for enterprise security buyers: OpenAI is presenting GPT-5.6 as powerful enough to help automate parts of vulnerability research and exploit analysis, but not yet as a system that can reliably run a complete advanced attack campaign without human direction under the company’s test conditions.
The Geopolitics of the phased release
The broader rollout of the GPT-5.6 series reflects an escalating entanglement between frontier AI labs and national security protocols. The decision to limit initial access to a small circle of vetted partners whose details are shared with the U.S. government stems from direct coordination regarding the developing cyber Executive Order framework. OpenAI has taken the unusual step of publicly critiquing this sovereign gatekeeping within its official product announcement documentation. The company states plainly: "We don’t believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them." This tension highlights the precarious position of modern tech enterprises. While organizations can leverage unprecedented agentic efficiency and robust defensive patching capabilities, as measured on benchmarks like ExploitGym and ExploitBench, they must also accept that access to premier tools remains subject to diplomatic and regulatory authorization.
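The prompt caching terms reported in this article (cache writes at 1.25x the uncached input rate, cache reads at a 90% discount) lend themselves to a quick back-of-the-envelope check. The Python sketch below is illustrative only: the function names are hypothetical, it assumes the listed $5.00 input price for GPT-5.6 Sol is per one million tokens (the pricing snapshot does not state the unit), and it ignores output tokens and cache expiry.

```python
def uncached_cost(context_tokens: int, calls: int, input_price_per_mtok: float) -> float:
    """Input cost when every call pays the full uncached input rate."""
    return calls * context_tokens / 1_000_000 * input_price_per_mtok


def cached_cost(
    context_tokens: int,
    calls: int,
    input_price_per_mtok: float,
    write_multiplier: float = 1.25,  # reported: cache writes cost 1.25x the uncached rate
    read_multiplier: float = 0.10,   # reported: 90% discount, so reads pay 10% of the rate
) -> float:
    """Input cost when the first call writes the cache and later calls read it."""
    if calls < 1:
        return 0.0
    base = context_tokens / 1_000_000 * input_price_per_mtok
    return base * write_multiplier + (calls - 1) * base * read_multiplier


if __name__ == "__main__":
    # Hypothetical workload: a 200,000-token context sent 11 times inside the cache window.
    # Assumption: the $5.00 input price is per 1M tokens.
    print(round(uncached_cost(200_000, 11, 5.00), 2))  # 11.0
    print(round(cached_cost(200_000, 11, 5.00), 2))    # 2.25
```

Under these assumptions a single call is more expensive with caching (1.25 versus 1.00 in this example), but the cached path is already cheaper from the second call onward, which is consistent with the article's point that repeated agentic loops benefit most.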
Refusal Lives Downstream of Persona in Chat Models
arXiv:2606.26161v1 Announce Type: new Abstract: Linear directions in activation space have been identified for both refusal and persona traits in instruction-tuned chat models, but the two have been studied as separate mechanisms. We show they interact: a compliant persona gates refusal. In Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct, we extract a compliant model-persona direction and a refusal direction and intervene on both. Compliant persona steering suppresses refusal -- in Llama, the refusal rate falls from 97% to 2%. Reintroducing the refusal direction partially restores refusal at late layers but not at early ones. Projecting out the persona direction in a late-layer window restores it to baseline; projecting out a random direction does not. Refusal is therefore gated at the late-layer expression stage, downstream of where it is computed. Treating refusal as a single isolated direction misses its dependence on persona.
Knowledge-augmented Agentic AI for Mental Health Medication Information Seeking
arXiv:2606.26205v1 Announce Type: new Abstract: Patients increasingly seek medication information online, yet safety knowledge for psychiatric drugs is split between regulatory adverse-event records, which are authoritative but abstract, and patient narratives, which are experience-near but unvalidated. Integrating them without conflating evidence and anecdote is especially consequential in psychiatry, where poorly contextualised information can amplify fear, nocebo responses, and non-adherence. Here we develop a provenance-aware, knowledge-graph-based multi-agent framework unifying 466,525 Reddit posts, 60,782 WebMD reviews, and twenty years of U.S. FDA Adverse Event Reporting System records for nine antidepressants. A large-language-model entity-recognition pipeline benchmarked against physician annotations reached highest F1 scores of 0.969 for medications and 0.973 for conditions. The two community platforms were far more concordant with each other (overlap up to a Jaccard similarity of 0.905) than with regulatory reports, indicating that patient-generated data form a partly independent safety signal. For sertraline, many adverse events appeared in community sources hundreds of days before the corresponding FDA date. A Neo4j knowledge graph grounded in ATC-N, ICD-10, and MedDRA vocabularies preserves provenance, keeping every claim traceable and regulatory facts distinct from patient experience. These results establish source-aware integration as a route to more auditable psychiatric medication information, with usefulness and patient benefit to be tested prospectively.
Top AI Models Might Be Confident—Doesn’t Mean They’re Right - Newsweek
“Mostly right is the wrong bar,” Pearl CEO Andy Kurtzig says, as research tests top AI models against professional judgment.
The next big breakthrough will be AIs learning on the job
The people optimistic about this ... research problems in natural language processing collapsed against the flood of compute thrown into LLMs. Yes, these models are 1/1-millionth as sample efficient as humans during training. But training is a one-time cost amortized across billions of user sessions. What matters is how smart, general, and sample efficient the model is within a session, and that’s clearly been improving as we do more RL training. AIs are able to ...
Salesforce Audit: AI Answer Engine Failures & Benchmarks
A Salesforce AI Research audit exposes 16 failure modes in AI answer engines like Perplexity & Bing Chat. This study offers crucial benchmarks for understanding
Qwen releases open-source world model that simulates 7 agent environments
Qwen-AgentWorld is a model trained to simulate environments like MCP, Terminal, and Android, allowing agents to train in a sandbox without needing real-world access.
Previewing GPT-5.6 Sol: a next-generation model
OpenAI has provided a preview of its latest model, GPT-5.6 Sol, detailing its capabilities and development.
Accelerating Skill Assessment in Chess: A Drift-Diffusion-Enhanced Elo Rating System
arXiv:2606.26267v1 Announce Type: new Abstract: Rating systems such as Elo serve as the gold standard for matchmaking in competitive chess. However, they inherently suffer from response lag due to their exclusive reliance on match outcomes, neglecting the granular quality of gameplay. Nevertheless, incorporating move-by-move information into rating adjustments presents a significant challenge given the substantial noise and the vastness of the game-state space. To address this, we propose the Drift-Diffusion-Enhanced Elo Rating System (DD-Elo), a novel skill assessment framework inspired by the drift diffusion model (DDM) from cognitive neuroscience. By modeling skill expression as a decision-making process, our model integrates move-level data to capture rapid skill fluctuations. We provide a rigorous mathematical derivation proving that DD-Elo maintains a bounded deviation from the traditional Elo system, ensuring theoretical alignment. Extensive experiments demonstrate that DD-Elo adapts to skill changes faster than Elo. Our findings suggest that DD-Elo offers an explainable, highly responsive, and backward-compatible solution for chess rating ecosystems. The implementation code is publicly available at https://github.com/Aquila-zhou1/DD-Elo .
LLMs help robots understand vague instructions and focus on key details
Researchers at MIT are using large language models to help robots interpret ambiguous human instructions and prioritize relevant environmental details.
AI is changing financial regulation as watchdogs build tools to fight cyber threats - The Times of India
Financial regulators are racing to adopt artificial intelligence (AI) to keep pace with rapidly evolving cyber threats, with watchdogs increasingly developing their own AI-powered supervisory tools to strengthen oversight of banks and digital assets, Reuters reported.
Autonomous security agents need complete data. Here's how to check if yours is ready.
An endpoint agent cannot report its own absence. The 2026 Axonius Actionability Report, conducted with the Ponemon Institute and surveying 662 IT and security professionals, put a number on a gap SOC teams have worked around for years. Across the Axonius customer base, 12.7% of devices in a 298,000-device median inventory are missing their expected security agent. If a device has no agent, no management console shows it. If a CMDB record is stale, no reconciliation flags it. An employee who installed Claude Enterprise outside procurement created a SaaS workspace, identity surface, and API-token footprint that endpoint telemetry alone will not reliably inventory. The coverage percentage on the EDR dashboard is structurally incomplete because the reporting mechanism cannot see what it does not cover.
That gap matters more now than it did six months ago. SOC and XDR vendors are pushing more autonomous investigation and remediation into production. Those agents will query the same dashboards, trust the same coverage percentages, and act on the same blind spots human analysts learned to work around. A human analyst second-guesses a 98% coverage number. An autonomous agent treats it as ground truth and moves at machine speed.
Three independent signals converged on the same gap
Gravitee’s 2026 survey of 900-plus executives found 88% reported confirmed or suspected AI-related incidents, and only 14.4% sent agents live with full security approval. The Axonius/Ponemon report found 52% of respondents would let autonomous agents act on recommendations, while 63% said the underlying data lacks important information. The CSA's Agentic Trust Framework requires verified data governance before agents act on any finding. Mike Riemer, Field CISO at Ivanti, said that known vulnerabilities on Azure’s honeypot networks are now attacked in under 90 seconds. “Traditional security measures continue to work,” Riemer told VentureBeat. The caveat is that those measures only protect what they can see. An EDR agent deployed across 87.3% of the device inventory leaves the remaining 12.7% outside that agent’s telemetry, policy enforcement, and detection logic.
Exclusive deployment data quantifies the scale
Joe Diamond, CEO of Axonius, told VentureBeat that the average CISO sees roughly 50% of what is actually on the network. “Say 50% of their environment is sitting in dark matter,” Diamond said. “They don’t know what it is, or where it is, or who has access to it, if it’s secure, if it’s not secure.” Deployment data from more than 900 Axonius customers confirms those numbers. TransUnion went from 70% to 99% endpoint coverage after out-of-band verification. Western Union went from 85% to 99% by consolidating data from 38 tools and cutting manual workload by half. Lumen discovered 1.1 million assets, where the CMDB showed 17,000. That translates to roughly 37,000 unmanaged endpoints per organization sitting outside every policy, every patch cycle, and every detection rule. Diamond pointed to Mythos, Anthropic’s frontier reasoning model, as a sign that machine-speed offensive capability will make any unknown asset far riskier than it is today. “People tend to have shiny object syndrome,” he said. “If you didn’t understand what 50% of your environment looked like from a traditional endpoint perspective, and you think you’re going to wind sprint to granular control and governance of AI, your program will fail.” Diamond called the broader AI shift “as big, if not bigger than the internet.”
Three approaches compete to close the gap
No single architecture solves the visibility problem today. Three approaches compete, each with named tradeoffs security teams should evaluate before procurement. A dedicated integration layer uses bidirectional API adapters to build an always-current inventory.
Axonius runs 1,400-plus adapters and now discovers shadow Claude Enterprise installations via its Anthropic adapter (GA June 15). “We created a bidirectional API integration with all the IT systems and all the security controls to build an always up-to-date inventory of what the environment looks like,” Diamond told VentureBeat. Platform-native EDR and XDR intelligence builds richer asset context inside the agent footprint. Depth within the agent footprint is the advantage. The limitation is structural. Platform-native intelligence is bounded by what the agent can see, and the gap the Ponemon report identified lives precisely where that visibility ends. CMDB modernization requires continuous reconciliation against three or more independent telemetry sources. Only 13% of organizations reconcile daily, according to Axonius/Ponemon data. The remaining 87% operate on stale records that feed incorrect prioritization into any automated remediation pipeline.
EDR data readiness: Five gates before autonomous remediation
Before you let autonomous SOC agents close tickets or quarantine assets, this checklist tells you whether your EDR and asset data is solid enough to trust. It is vendor-agnostic, works with any EDR and CMDB, and gives you five pass/fail gates you can run in a single working session.
Risk area: Asset inventory delta
What the data shows: Ponemon: only 45% consolidate into a single view. Forrester TEI: 150% more assets than previously identified. Lumen: 17K in CMDB vs. 1.1M discovered.
Readiness threshold: Delta ≤10% between discovery, CMDB, and EDR agent count. Delta above 10% blocks automated remediation until reconciled.
Action to take now: Run API-based discovery against all segments. Diff against CMDB and EDR console count. Reconcile quarterly minimum.
Risk area: Unmanaged AI services
What the data shows: Gravitee: 88% confirmed or suspected AI incidents. Only 14.4% with full security approval. Anthropic adapter (GA June 15) discovers unmanaged Claude Enterprise installations.
Readiness threshold: No high-risk AI services outside approved procurement. Weekly SaaS discovery scans. Unmanaged high-risk instances trigger IR triage before exception review.
Action to take now: Deploy SaaS discovery or protocol-level adapters for AI service detection. Automate weekly scans. Route unmanaged instances to IR queue.
Risk area: CMDB record accuracy
What the data shows: Ponemon: only 13% reconcile daily (RSAC 2026). Brooks Running: 20% server discrepancy between console and independent discovery. Top remediation barriers: unclear prioritization, unclear ownership, inconsistent data.
Readiness threshold: ≥85% of records validated against 3+ independent telemetry sources. No stale or orphaned records in active remediation queue.
Action to take now: Cross-reference CMDB against cloud inventory, EDR telemetry, and IdP directory. Continuous reconciliation replaces annual audit cycles.
Risk area: Endpoint agent coverage gap
What the data shows: Ponemon: an agent cannot report its own absence (p. 8). TransUnion: 70% to 99% after out-of-band verification. RSAC 2026: 12.7% of 298K median devices missing expected agent.
Readiness threshold: ≥95% agent coverage verified via out-of-band discovery. Many CISOs set this as the minimum before allowing autonomous remediation. No self-reported-only metrics in board reports.
Action to take now: Run network-based or API-driven discovery against managed device list. Coverage below 95% blocks automated remediation scoping.
Risk area: Asset ownership mapping
What the data shows: Ponemon: 32% apply tags consistently. Only 51% assign ownership on new exposures (pp. 9, 16). TransUnion: 12K to 190K assets with ownership mapped.
Readiness threshold: Owner assigned within 24 hours. Tags consistent across cloud, EDR, CMDB. Three systems showing three owners = failure.
Action to take now: Automate ownership via cloud tags, IdP group membership, or CMDB metadata. Map asset, remediation, and business owner as separate fields.
Five questions to ask before allowing autonomous SOC action
What independently verifies endpoint-agent coverage outside the EDR console?
How does the SOC reconcile conflicts between EDR, CMDB, cloud inventory, IdP, and discovery tools?
Can AI agents act on assets with unknown or disputed ownership?
Can the system distinguish “not vulnerable” from “not visible”?
What data-quality gate blocks autonomous remediation when coverage or ownership falls below threshold?
Board-ready risk framing
Kayne McGladrey, IEEE Senior Member, has confirmed the pattern across multiple published VentureBeat interviews. The structural gap in self-reported coverage is not new. What is new is that autonomous agents will act on it at machine speed without the institutional workarounds human analysts developed over years of experience. Diamond put the board-level stakes plainly in an April 2026 press statement: “Findings pile up because the data isn’t trusted, ownership isn’t clear, and entire asset classes aren’t even in the picture.” The CSA’s Agentic Trust Framework requires that any agent promoted to a higher autonomy level must pass five gates, including demonstrated accuracy and a security audit. The EU AI Act’s Article 50 transparency obligations take effect August 2, 2026. The May 2026 Digital Omnibus pushed high-risk system obligations to December 2027, but organizations deploying agentic SOC agents on incomplete asset data face immediate operational risk that outpaces any regulatory timeline. The board-ready sentence: Our EDR coverage reports are structurally incomplete because an endpoint agent cannot report its own absence, and we are verifying coverage through out-of-band discovery before deploying autonomous agents that would act on those reports at machine speed.
Security director playbook
Run out-of-band asset discovery this week. Compare results against your CMDB export and EDR console count. If the delta exceeds 10%, halt automated remediation scoping until the gap is reconciled.
Deploy SaaS discovery for AI services. Employees install AI ahead of procurement, ahead of security. Weekly scans are the minimum. Route any unmanaged high-risk instance to your incident response queue for triage before exception review.
Map asset ownership to remediation responsibility. Ponemon found only 32% of organizations apply tags consistently. If three systems show three different owners for the same asset, automated remediation has no routing target. Fix the ownership layer before deploying agents that depend on it.
Kill self-reported-only coverage metrics. Any risk calculation or board report that relies on EDR console-reported coverage alone is built on data the reporting system cannot verify. Require out-of-band verification for every coverage number that informs a risk decision.
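The five readiness thresholds in this article's checklist (inventory delta of at most 10%, no unmanaged high-risk AI services, at least 85% of CMDB records validated, at least 95% verified agent coverage, and an owner assigned within 24 hours) can be sketched as a simple pass/fail check. The Python below is a hypothetical illustration, not part of any vendor tooling: the class, field, and function names are invented here, and the delta formula (largest count minus smallest, divided by the largest) is one assumed reading of "delta between discovery, CMDB, and EDR agent count".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadinessSnapshot:
    """Hypothetical inputs gathered from discovery, CMDB, and EDR exports."""
    discovered_assets: int              # out-of-band or API-based discovery count
    cmdb_assets: int                    # CMDB export count
    edr_agent_count: int                # EDR console-reported agent count
    unmanaged_high_risk_ai: int         # high-risk AI services outside procurement
    cmdb_validated_pct: float           # % of records validated against 3+ sources
    verified_agent_coverage_pct: float  # % agent coverage verified out-of-band
    hours_to_assign_owner: float        # slowest owner assignment, in hours


def inventory_delta(s: ReadinessSnapshot) -> float:
    """Assumed definition: spread between the three counts, relative to the largest."""
    counts = [s.discovered_assets, s.cmdb_assets, s.edr_agent_count]
    largest = max(counts)
    return 0.0 if largest == 0 else (largest - min(counts)) / largest


def evaluate_gates(s: ReadinessSnapshot) -> Dict[str, bool]:
    """Return a pass/fail result for each of the five gates."""
    return {
        "asset_inventory_delta": inventory_delta(s) <= 0.10,
        "unmanaged_ai_services": s.unmanaged_high_risk_ai == 0,
        "cmdb_record_accuracy": s.cmdb_validated_pct >= 85.0,
        "endpoint_agent_coverage": s.verified_agent_coverage_pct >= 95.0,
        "asset_ownership_mapping": s.hours_to_assign_owner <= 24.0,
    }


def autonomous_remediation_allowed(s: ReadinessSnapshot) -> bool:
    """Any failing gate blocks autonomous remediation."""
    return all(evaluate_gates(s).values())


if __name__ == "__main__":
    # Hypothetical organization whose counts nearly agree.
    snapshot = ReadinessSnapshot(10_000, 9_600, 9_700, 0, 90.0, 97.0, 12.0)
    for gate, passed in evaluate_gates(snapshot).items():
        print(gate, "PASS" if passed else "FAIL")
```

A snapshot resembling the Lumen figures cited in the article (17,000 in the CMDB against 1.1 million discovered) would fail the first gate under this definition regardless of the other values, which is the behavior the checklist asks for.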
Amazon Q flaw let booby-trapped Git repos execute code, swipe cloud creds
Researchers warn many AI coding assistants now execute commands from project configurations
Miasma campaign poisons 20-plus npm packages, hunts for developer secrets
Microsoft says latest attack targets Leo Platform and RStreams packages, harvesting creds and going after more maintainers
Australia's prudential regulator warns of frontier AI cyber risks
The Australian Prudential Regulation Authority warns that frontier AI models are creating a paradigm shift in cybersecurity risk, urging the financial sector to share intelligence on vulnerabilities.
AI-driven cyber threats reshape enterprise cybersecurity spending in India | Tech News - Business Standard
AI-Driven Cyber threats: As enterprises adopt AI at scale, cybersecurity spending is shifting from compliance and perimeter defence to identity protection, AI governance, cloud security, and continuous threat monitoring
How big a cybersecurity threat are the latest AI models, really?
New AI models are accelerating the game of cat-and-mouse as cybersecurity experts try to keep ahead of would-be hackers. An AI expert explains the risks.
Five Eyes Warns AI Could Speed Cyberattacks Within Months - TechRepublic
Five Eyes agencies warned AI could accelerate cyberattacks within months, putting pressure on security teams to control prompt injection and phishing risks.
Adoption, Deployment & Impact
Main Street Enterprises Expected to Accelerate AI Adoption in Coming Months
Small and medium-sized enterprises are poised to increase their AI investment. This shift suggests a broadening of AI adoption beyond large-scale technology firms.
Why AI Adoption Is Failing Inside Many Companies
Why many AI adoption efforts fail—and how leaders can build trust, clarity, governance and stronger human capacity around AI.
India's Tech Sector Poised for AI-Driven Transformation and Growth
Nasscom highlights the need for companies to turn AI potential into production value through data readiness and secure deployment.
Johns Hopkins already leads the medical world. On AI, it won’t be easy. - The Banner
Johns Hopkins is investing heavily in AI research and medicine, including training a robot to help in the operating room.
Council Post: Enterprise AI Still Has A Maturity Problem
Sustained success with AI depends on enterprises strengthening governance and security by implementing both from the outset.
Loren Rudd - Arlington, Virginia, United States
"Federal officials from the Library of Congress and the DOE said their agencies are focusing on deploying artificial intelligence (AI) systems that are low risk but deliver measurable value" https://buff.ly/GzbxDYP
How Do Tool-Augmented LLM Agents Perform on Real-World Energy Analytics Tasks?
arXiv:2606.26346v1 Announce Type: new Abstract: Agentic benchmarks have emerged across general-purpose and domain-specific settings, including finance, coding, law, and drug discovery, yet energy-domain evaluations remain largely limited to static knowledge recall. This is a critical gap for a sector that requires live data retrieval, specialized regulatory and market knowledge, and multi-step quantitative reasoning under real-world constraints. We present an empirical study of tool-augmented LLM agents on real-world energy market analytics tasks. Our evaluation environment includes 243 expert-curated problems across three categories: (1) Market Data Retrieval and Analysis, (2) Knowledge Retrieval and Interpretation, and (3) Advanced Quantitative Modeling and Decision Analytics. Tasks include price and demand analysis, tariff impact modeling, asset revenue and returns estimation, hedging strategy analysis, and optimization modeling, with problems spanning multiple difficulty levels. Agents are equipped with a configurable suite of domain tools, including live electricity market APIs for major U.S. ISOs, regulatory docket search, utility tariff databases, asset optimization models, and retrieval-augmented generation over energy market documents. We assess agent responses using a multi-dimensional evaluation protocol that scores approach correctness, answer accuracy, attribute alignment, and source validity, with category-aware routing to match scoring criteria to question type. We evaluate both closed-source and open-source LLMs, providing a comparative analysis of how model capability and domain tooling interact in a high-stakes professional domain. Key artifacts are publicly released to support reproducibility and future research.
Notion kills its Gmail client after AI agents keep humans from troubling inbox
More than half of users now let bots handle email, so service is headed for shutdown
AI Revolutionizes Healthcare: Enhancing Patient Care and Hospital Efficiency with Smart Technology
AI is revolutionizing healthcare by enhancing disease detection, treatment planning, and operational efficiency, ultimately improving patient care.
Semantic Search for AI Agents at Scale: Retrieval and Ranking for LinkedIn’s Hiring Assistant
A production case study from LinkedIn Engineering on using ranking systems and evaluation loops to improve agentic search in hiring.
Supply chain expo: Car chip race intensifies as AI debuts - CGTN
More vehicle AI solutions made their debut at the China International Supply Chain Expo (CISCE) in Beijing, as chipmakers compete for a larger share of China's vast car market. This year's CISCE introduced a dedicated AI zone for the first time, presenting
How AI Is Reshaping Talent Acquisition and Workforce Planning » World Business Outlook
Explore how AI is reshaping hiring by moving beyond keyword matching and enabling recruiters to focus on strategic decision-making.
Hikers lost in Kosciuszko national park rescued within five hours by AI drone
Fire and Rescue NSW uses thermal imaging and a mobile phone red light to quickly locate men who veered off walking track near Jindabyne. Two hikers who veered off a walking track in Kosciuszko national park have been found within five hours using a drone powered by artificial intelligence, a first-of-its-kind mission, Fire and Rescue NSW (FRNSW) has said. The two men, aged in their 20s, were reported missing at 7pm on Tuesday after they failed to return to a rendezvous point on time.
Is there a business case for AI tools in a small practice? | Medical Economics
The number of doctors using AI tools is rapidly increasing, but for a smaller practice, does the business case make sense?
AI adoption rises, but ROI still lags | ITWeb
A new global report reveals a significant gap between corporate AI effectiveness and expectations.
Surging AI costs could exceed developer salaries by 2028 – analysts say context engineering could be the key to optimizing token consumption | IT Pro
With AI costs rising and enterprises racking up huge bills, engineering leaders need to take drastic measures to limit costs
SAP's Robinson says enterprise AI is beyond baby steps
SAP's David Robinson discusses how more companies are scaling AI from pilot to production, with their sights on real ROI and operational transformation.
Council Post: The Most Expensive Part Of AI Might Not Be The Model
AI deployment strategies need more operational discipline than many companies currently have.
Council Post: How Outcome-Based Contracting Can Enable Successful Enterprise AI Deployments
When a vendor can deliver an AI outcome and charge for that value, they become a true strategic partner and a trusted, outcome-based provider.
Council Post: The Hidden Funnel: Your Dashboard Says You’re Winning But AI May Say Otherwise
As AI increasingly influences consumer decisions, I've been observing a measurement gap and developing a conceptual model to illustrate my observations. The Pre-Funnel Influence Index (PFI) provides a framework for businesses to develop this new metric and can be expressed as a weighted composite index: PFI can serve as a conceptual key performance indicator (KPI) framework for evaluating ...
Geopolitics, Policy & Governance
Opinion | Make It Make Sense: Who is winning the AI battle? - The Washington Post
In this episode of “Make It Make Sense,” James Hohmann talks with Kyle Chan, a foreign policy fellow at the Brookings Institution, about the AI race between the United States and China, how Beijing is deploying the technology and whether fears of a Chinese AGI takeover are just hype.
World Leaders Want American AI — But Fear the Off Switch
Macron and Modi raised alarms at G7 that the U.S. could cut off AI access overnight. The Anthropic blackout just proved the fear is real.
United States Joins China, Europe and Asia in Cloud AI and Quantum Arms Race Driving Next-Gen Computing Revolution - Travel And Tour World
US, China, Europe and Asia escalate AI cloud and quantum computing race, reshaping global infrastructure, innovation and next-gen digital economy.
Pax Silica Expands: 24 Nations Unite for AI, Economic Security, and Advanced Manufacturing
The Pax Silica alliance, spearheaded by the U.S., has expanded to include 24 member countries, with a focus on AI, advanced manufacturing, and economic security.
Europe Is Fed Up and Wants Its Own AI | WIRED
It's a stretch to think that the continent can build a top-tier model, but it has an advantage: Donald Trump.
Taiwan's chip packaging bottleneck keeps U.S. AI supply chain dependent | Prism News
Taiwan will add a new AI packaging plant in Kaohsiung by September 2029, even as U.S.-made chips still have to return there for CoWoS.
Samsung readies $648 billion bet, report says, as AI boom reshapes South Korea
US announces new Pax Silica initiatives on tech supply chains
US officials announced three new initiatives under the Pax Silica international cooperative, including a pledge by more than two dozen nations to not over-regulate artificial intelligence.
How to win at AI (if you’re not the US or China), with AI minister Kanishka Narayan
Specialisation and research can give the UK leverage
Dublin’s TensorX to partner with Solstice on sovereign European AI
Earlier this week, TensorX raised €8m in a seed funding round, which its founder Shane Morton described as an ‘opening move’ ahead of a much larger build-out.
India's Tech Transformation: Quantum Leap in Security and Innovation
India is asserting itself as a leader in frontier technologies, with significant strides in AI, nuclear, space, and quantum sectors for national security and global competitiveness.
Democracy, Control, or Competitiveness: The AI Trilemma - Social Europe
A new papal encyclical exposes the democratic trilemma shaping how the world governs artificial intelligence.
Anthropic Moves Toward Deal With US to Lift Curbs on AI Models
Anthropic PBC and the Trump administration are moving closer to an agreement that would lift US restrictions on the company’s top two artificial intelligence models after weeks of talks between the two sides over security of the systems, according to people familiar with the matter.
Governing Actions, Not Agents: Institutional Attestation as a Governance Model for Autonomous AI Systems
arXiv:2606.26298v1 Announce Type: new Abstract: Autonomous AI agents may begin to perform consequential, irreversible actions such as clinical prescribing and production software deployment. This paper observes that human institutions have governed powerful autonomous actors not by monitoring their reasoning but by requiring independently attested evidence at the point of consequential action. We formalise this institutional pattern as a computational governance model for AI agent systems. Under the proposed model, an agent retains full autonomy over planning and reasoning but holds no execution authority over designated high-risk actions. Execution is conditional on preconditions that are each independently attested by a separate authoritative source, cryptographically bound to a declared intent, and evaluated by a deterministic policy. Decisions are recorded in a tamper-evident log amenable to independent re-verification. We present a proof-of-concept implementation and illustrate the model with examples from software deployment and clinical prescribing.
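The abstract above describes a gate in which execution depends on independently attested preconditions, each bound to a declared intent, checked by a deterministic policy and recorded in a tamper-evident log. The following is a minimal Python sketch of that general pattern, not the paper's implementation. Every name in it (`ATTESTER_KEYS`, `POLICY`, `attest`, `evaluate`, `DecisionLog`) is hypothetical, and HMAC with shared secrets stands in for whatever cryptographic binding the authors actually use; a real system would more likely rely on asymmetric signatures.

```python
import hashlib
import hmac
import json

# Hypothetical attesting sources and their secret keys (illustrative only).
ATTESTER_KEYS = {
    "ci_pipeline": b"ci-secret",
    "change_board": b"board-secret",
}

# Hypothetical policy: for each designated high-risk action, the required
# preconditions and the single source that is authoritative for each one.
POLICY = {
    "deploy_to_production": {
        "tests_passed": "ci_pipeline",
        "change_approved": "change_board",
    },
}


def intent_digest(intent: dict) -> str:
    """Hash a canonical JSON encoding of the declared intent."""
    payload = json.dumps(intent, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def _tag(key: bytes, precondition: str, intent: dict) -> str:
    """MAC over the precondition name and the intent digest."""
    message = f"{precondition}|{intent_digest(intent)}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def attest(source: str, precondition: str, intent: dict) -> dict:
    """A source attests one precondition, bound to one specific intent."""
    tag = _tag(ATTESTER_KEYS[source], precondition, intent)
    return {"source": source, "precondition": precondition, "tag": tag}


def evaluate(intent: dict, attestations: list) -> bool:
    """Deterministic policy: allow only if every required precondition has a
    valid attestation from its authoritative source for this exact intent."""
    required = POLICY.get(intent["action"])
    if required is None:
        return False  # in this sketch, actions without a policy are denied
    satisfied = set()
    for att in attestations:
        key = ATTESTER_KEYS.get(att["source"])
        if key is None or required.get(att["precondition"]) != att["source"]:
            continue  # unknown source, or not authoritative for this item
        expected = _tag(key, att["precondition"], intent)
        if hmac.compare_digest(expected, att["tag"]):
            satisfied.add(att["precondition"])
    return satisfied == set(required)


class DecisionLog:
    """Hash-chained log: each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, intent: dict, allowed: bool) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"intent": intent_digest(intent), "allowed": allowed, "prev": prev}
        self.entries.append({**body, "hash": self._hash(body)})

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks verification."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("intent", "allowed", "prev")}
            if entry["prev"] != prev or entry["hash"] != self._hash(body):
                return False
            prev = entry["hash"]
        return True
```

In this sketch an agent would construct the intent and collect attestations, but only `evaluate` decides whether execution proceeds. Because each tag covers the intent digest, an attestation issued for one deployment target does not validate a different one.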
Trump administration asks OpenAI to stagger release of new model to vet users
The US Treasury, commerce department and other government offices request limited distribution of GPT 5.6
U.S. Loosens Restrictions on Anthropic’s Mythos A.I. Model
The move de-escalates a clash between the Trump administration and the company over its cutting-edge artificial intelligence systems.
U.S. government will decide who gets to use GPT-5.6
The U.S. government is set to oversee access to OpenAI's latest AI model, GPT-5.6, as part of new regulatory oversight.
OpenAI Limits Release of New Model Under Pressure From US
OpenAI is rolling out a preview version of a more capable new artificial intelligence model to select partners before making it available more widely in the coming weeks, following pressure from the Trump administration to stagger the release.
Trump administration allows some access to Anthropic’s Mythos
Move eases tension with AI lab but unease over Washington’s ad hoc regulatory approach remains
US allows Anthropic to release Mythos AI to 'trusted' US organizations
Eurocommerce, the European retail association whose members include Amazon, H&M, Inditex, and Ikea, is asking EU tech chief Henna Virkkunen to exempt AI-generated advertisements from the bloc's new regulation requiring disclosure of AI use.
OpenAI agrees to stagger rollout of its most powerful model to only Trump-approved customers
It is the second time in a month that a frontier lab’s most powerful model has been held back from general release over fears about cyber capabilities.
OpenAI staggers AI model release after Trump administration request
Sam Altman announces limited preview of GPT 5.6 in move that echoes launch of Anthropic’s Mythos. OpenAI is staggering the release of its latest AI model after a request from the US government, in a move echoing the launch of Anthropic’s Mythos product. The company behind ChatGPT signalled its dissatisfaction with the move, saying that doing so keeps the best AI tools from “users, developers, enterprises, cyber defenders, and global partners who need them”.
The Need for Transparency in Government AI Safety and Risk Mitigation
Public understanding of government safety concerns regarding frontier AI is essential for firm-level risk management. Increased transparency is required to prepare for potential security implications as open-source models advance.
European Commission lines up Amazon and Microsoft for cloud gatekeeper status
The European Commission is considering designating Amazon and Microsoft as gatekeepers under the Digital Markets Act.
Trump Threatens to Impose 100% Tariff on European Countries Over Tech Taxes
The president claimed the tariffs would override a trade deal with the European Union, which European officials finalized just days ago.
Opinion | What if Trump is right to pump the brakes on the most advanced AI?
Nationalist fervor over beating China biases AI policy toward recklessness — and possible catastrophe.
Agentic Analysis for Agentic Infrastructure: An LLM-Powered Pipeline for Comparative Governance of DAO and Corporate AI Protocols
arXiv:2606.26203v1 Announce Type: new Abstract: As AI agent protocols proliferate, the governance structures shaping their interoperability standards remain empirically underexamined. We introduce an LLM-powered comparative pipeline for large-scale governance discourse analysis, integrating automated annotation, neural topic modeling, and multi-layer network analysis to study socio-technical power structures at scale. We validate it on two contrasting standards for agent interoperability: ERC-8004 (permissionless, on-chain) and Google A2A (corporate-led). Analyzing 4,323 governance participation records, we combine LLM-assisted coding, topic modeling, and multi-layer network analysis to examine how institutional design shapes thematic priorities and community structure. We find that while governance form influences substantive focus, both regimes exhibit comparable levels of participation inequality and community fragmentation. Discourse alignment is denser in the permissionless setting, suggesting that open governance may foster greater thematic convergence despite decentralized participation. These findings illustrate how LLM-assisted methods can advance the empirical study of technology governance, with implications for designing more equitable agentic AI standards. All data and code are openly available.
Google wants AI regulation, but on its own terms
Surely, we can have rules that allow us to continue doing what we're doing
Anthropic and OpenAI waged a $27 million proxy war in a Manhattan congressional race. The winner told them both to get lost
Micah Lasher won the most expensive AI election yet—then used his victory speech to reject both companies and promise to regulate them anyway.
Is the US tightening grip on AI? Trump administration slows GPT-5.6 rollout
The US government has requested a staggered rollout of OpenAI's GPT-5.6 due to national security concerns.
Breaking: White House Orders OpenAI To Limit GPT 5.6 Release Over Security Reasons
OpenAI agreed to limit GPT-5.6 access after a White House request as U.S. officials navigate AI safety concerns and regulatory uncertainty.
OpenAI to limit early access to GPT-5.6 for some companies
OpenAI is delaying a full launch of its next flagship model, GPT-5.6, and will begin a limited release for selected corporate customers. The Trump administration requested a phased rollout over potential security concerns, with customer access expected to require case-by-case approval during ...
OpenAI to release GPT 5.6 model on staggered basis in face of US regulatory uncertainty
OpenAI CEO Sam Altman told staff that its latest model, GPT 5.6, would be released on a staggered basis, with a small group of entities first gaining preview access to it after approval by the US government. The case highlights the regulatory uncertainty many local AI developers are facing ...
US House committee passes bipartisan AI legislation
The US House Science, Space, and Technology Committee passed a bipartisan package of AI legislation aimed at expanding research access, strengthening the workforce, and bolstering national security.
California AI Job Loss Tracker Launches to Monitor Layoffs Across the Workforce
California launches an AI job loss tracker to monitor AI layoffs, unemployment trends, and workforce changes through a new public dashboard.
California Creates the Nation’s First AI Job-Loss Tracker
The new dashboard is designed to help state leaders monitor AI's impact on employment and respond with targeted workforce policies.
Forum AI CEO on Pitfalls With AI in Politics
Campbell Brown, Co-Founder and CEO of Forum AI, discusses the need for AI companies to open up for scrutiny. She speaks with Romaine Bostick on Bloomberg's "The Close." (Source: Bloomberg)
Google proposes a balanced approach to AI governance in the US | Digital Watch Observatory
A new policy paper from Google proposes independent oversight for frontier AI development in the US.
Italian watchdog probes Microsoft over 365 price hike concerns
Microsoft did not provide consumers with sufficient information to assess the changes, the watchdog said.
Europe 2031: The viral AI scenario warning Brussels about Europe’s future
A fictional doomsday scenario by European AI researchers has gone viral in Brussels tech circles, exposing a divide over AI safety, sovereignty and wh...