AI Daily Brief
Healthcare
Latest Intelligence
The latest AI stories, analysis and developments relevant to Healthcare — curated daily by Best Practice AI.
Use Cases for Healthcare
200 articles
The Download: China’s brain implant ambitions
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. China has approved the world’s first invasive brain-computer chip—here’s what’s next Sitting in the courtyard of his house in China’s Henan province last October, Dong Hui decided to try holding a…
China has approved the world’s first invasive brain-computer chip—here’s what’s next
One day last October, sitting in the courtyard of his house in China’s Henan province, Dong Hui decided to see if he could hold a pen to write. Dong, 39, had sustained spinal cord injuries in a car accident six years earlier that left him paralyzed from the neck down. Slowly but determinedly, he wrote…
Council Post: Why Central AI Governance Committees Are Failing Healthcare—And Their Fix
If health systems, payers and pharma companies want to move from dozens of AI pilots to hundreds of production systems, the manual committee model has to change.
EHRBench: An Automated and Reliable EHR-based Benchmark for Clinical Decision Making with LLMs
arXiv:2605.30637v1 Announce Type: new Abstract: Clinical decision-making (CDM) is central to real-world clinical workflows, where clinicians infer diagnoses, select treatments, or anticipate future health outcomes under incomplete evidence. LLMs are increasingly used to support these decisions due to strong language capabilities, broad biomedical knowledge, and efficiency, yet the reliability of LLMs on real-world clinical decision tasks remains insufficiently understood. To evaluate CDM models, especially LLM-based models, an ideal and practical medical decision benchmark should be constructed via an automated yet reliable pipeline to ensure both scale and quality. Moreover, the grounding of a CDM benchmark in real patient EHRs can better support evaluation on practical CDM tasks that require substantive biomedical knowledge and clinical inference. To fill the gaps, we introduce EHRBench, an automated and reliable EHR-grounded benchmark for evaluating LLM-based clinical decision-making at scale. To ensure scalability and reliability, EHRBench is constructed through an EHR-LLM-KB(knowledge-base) interaction pipeline. For efficiency, we use a specialized LLM to automatically convert encounter-level EHR trajectories into structured templates and deterministically instantiate the templates into QA items. In parallel, we apply systematic KB-based verification and enrichment to filter hallucinated or ambiguous relations and to improve reliability. Using this pipeline, we construct nearly 1M (960,067) QA items spanning three core inference-required clinical decision tasks: diagnosis, treatment, and prognosis. We benchmark more than 30 representative LLMs on EHRBench and provide detailed analyses of performance and robustness. The results show consistent capability trends across settings, further validating the reliability of EHRBench and highlighting actionable gaps toward clinically reliable LLM systems.
AI for Interoperability in Health Care: Philips’s Carla Goulart Peron
In this episode of the Me, Myself, and AI podcast, Philips’s chief medical officer Carla Goulart Peron shares how artificial intelligence is reshaping health care — not by replacing clinicians but by expanding access, improving diagnostics, and freeing doctors to focus more time on patients. Drawing on her experience practicing medicine in Brazil’s strained public […]
Healthcare Mechanisms from Policy-as-Code Search under Strategic Provider Response
arXiv:2605.30680v1 Announce Type: new Abstract: Healthcare mechanisms are inseparable from the strategic provider response they induce: existing healthcare AI benchmarks hold this response fixed and so cannot evaluate mechanisms by the equilibrium they produce. We recast hospital mechanism design as program synthesis for language models: typed, inspectable rule programs are executed and scored by Medi-Sim, a multi-agent simulator with five strategic provider channels (coding, selection, delay, effort, triage). An incentive sweep recovers classical health-economics findings as adjacent regimes -- up-coding and low-complexity-patient selection under profit pressure, and Goodhart-style drift where measured performance becomes anti-correlated with true outcomes -- and a single audit lever exposes pressure migration: closing the coding channel more than doubles low-complexity selection. LLM-guided evolutionary code search over the same rule-program space then synthesizes an inspectable mixed-objective program that eliminates up-coding, halves rejection, and retains most of the profit-oriented baseline's funds.
CVS Health Ventures Leads $40M Investment in H1 AI - DistilINFO Publications
CVS Health Ventures led a $40 million investment in H1 on May 28, 2026, following a successful collaboration that produced an AI model improving provider directory accuracy, with H1's platform serving 85% of the top 20 pharma companies and nine out of ten leading health plans through its Doctor ...
AI chatbots fail medical misinformation test, returning inaccurate and fabricated advice
A new study found that nearly half of the medical advice generated by popular AI chatbots like ChatGPT and Grok is problematic. The chatbots frequently provided incorrect health information, faked scientific references, and refused to admit ignorance.
The future of AI healthcare lies in a solid infrastructure backbone - IT-Online
In most walks of life, AI’s presence can already be felt. In healthcare, the benefits are, quite frankly, mindboggling; AI-powered platforms are unlocking new levels of efficiency and precision across medical practices. By Steven Santini, vice-president for secure power: SSA at Schneider ...
Council Post: AI In Healthcare 2026: The System May Be Broken. Let’s Try To Fix It
Healthcare fails in coordination, not capability. Yet most innovation still targets the wrong layer.
Digital Transformation in Healthcare Market to Hit USD 340 Billion by 2035 Amid AI and Telehealth Expansion – SNS Insider
U.S. market projected to hit USD 99.5 Billion by 2035, while Europe is forecast to reach USD 87.42 Billion as AI-driven clinical workflows and...
Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes
arXiv:2605.28965v1 Announce Type: new Abstract: Linking free-text phenotype descriptions to ontology terms, typically referred to as phenotype annotation, is essential for the cross-study integration of comparative morphological data. This labor intensive process has heavily relied on highly trained human experts, which makes it challenging to scale and thus a key bottleneck. Dahdul et al. (2018) established a Gold Standard (GS) of Entity-Quality (EQ) annotations across seven phylogenetic studies and used it to evaluate three human curators and the Semantic CharaParser NLP tool with ontology-based semantic similarity metrics; they reported that machine-human consistency was significantly lower than inter-curator (human-human) consistency. Here we revisit that benchmark with five frontier hosted LLMs from Anthropic and OpenAI, each operating as an "agentic curator" within a self-contained workspace that supplies the source publication PDF, the same annotation guide used by the original human curators, the four project ontologies (UBERON, PATO, BSPO, GO), and a validation script. Evaluated against the same Gold Standard, every agent fell within the range of inter-curator variability of the three trained human biocurators of the original study; the best performing agents approached but did not reach the best performing human curator. Agents substantially outperformed Semantic CharaParser on all four metrics.
Compute Reality of Artificial Intelligence in Global Health LMICs - ICTworks
AI is not just a model. AI is compute, cloud, chips, data centers, energy, procurement power, cybersecurity, and governance.
K Health and Penn Medicine Partner to Launch Enterprise-Wide Clinical AI Architecture
Penn Medicine enters a multi-year collaboration with K Health to deploy clinical AI agents across its EHR, automating patient intake and reducing wait times.
Council Post: The Hidden Layer Every Healthcare AI Solution Is Missing
In the next wave of healthcare AI, differentiation will turn less on model sophistication and more on the quality and structure of the clinical knowledge beneath it.
AI Prognosis: Where patients and hospitals disagree about AI | STAT
In this edition of AI Prognosis, Brittany Trang takes a look at patients' role in how Stanford Health Care adopts AI tools, and more health AI news.
YC-backed French preventive health platform Lucis raises €17.3 million Series A led by Singular
Lucis, a Paris-based preventive health platform that uses blood biomarker analysis and AI to deliver personalised, science-based health recommendations, has raised €17.1 million ($20 million) in Series A funding. The round was led by Singular, with participation from General Catalyst, Y Combinator, and angels including investors behind Runna, Céline Lazorthes (Resilience), and Manu Lecomte. This […]
Authority Signals in Claude AI Health Citations: A Descriptive Analysis Using the Authority Signals Framework
arXiv:2605.23921v1 Announce Type: new Abstract: This study seeks to determine the authority signals used by Anthropic's Claude AI in its presentation of sources when answering consumer health questions. While there exists a great deal of discourse around the quality of health citations that LLMs produce, there is limited information on the integrity of the sources the citations originate from, and to what extent the sources are, from what health professionals would consider, credible sources. This descriptive cross-sectional study used data from HealthSearchQA, which contains 3,172 consumer health questions curated by Google Research. After exclusions, a final dataset of 3,075 questions yielding 10,038 citations was analyzed. The Authority Signals Framework (Jacques et al., 2026) was applied to examine 10 authority signals across four domains for a disproportionate stratified sample of 542 sources. Established institutional sources accounted for 97.8% of all citations (n = 9,818). Medical Institutions were the most frequently cited organization type (36.5%), followed by Government Resources (31.6%) and Professional Associations (28.4%). Commercial Health Information comprised 2.2% (n = 220). The top 10 organizations accounted for 57.8% of all citations, with Mayo Clinic alone representing 24.7%. Among commercial sources in the focused sample, 86.4% displayed medical review statements, 82.5% used schema markup, and 71.8% had comprehensive content, while traditional institutional sources appeared in Claude's citations with or without these same markers. As Anthropic positions Claude for HIPAA-ready healthcare applications, these findings establish a baseline for Claude's citation behavior and demonstrate the utility of the Authority Signals Framework as a tool for ongoing, cross-platform evaluation of AI-mediated health information.
When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure
arXiv:2605.23932v1 Announce Type: new Abstract: Despite strong medical benchmark accuracy, LLMs can exhibit severe multi-turn sycophancy in clinical dialogue, abandoning an initially correct diagnosis under escalating pressure. We propose Med-Stress, a targeted stress test framework that evaluates belief stability under escalating pressure. Across nine frontier large language models (LLMs), we find a clear dissociation between medical knowledge and robustness: high initial diagnostic capability does not imply high belief stability, yielding large knowledge-robustness gaps for several LLMs. To mitigate this failure mode, we propose a lightweight inference-time defense, RBED (Role-Based Epistemic Defense), and R-FT (Resilience-oriented Fine-Tuning), a training-time approach that internalizes evidence-based resistance to pressure. Experiments show that R-FT nearly eliminates belief change and substantially improves robustness.
AI Health Check: No Governance, No Trust - MedCity News
Will doctors or patients who are burned by one AI solution trust the next one they’re given? Probably not. That’s why every provider rolling out AI tools has to understand this risk and build governance into its development process.
New AI assistant streamlines initial psychiatric consultations for doctors
People often say that seeking psychiatric care can feel intimidating. Patients may feel burdened when they first open up about their emotional distress, while medical staff must accurately understand a patient's extensive history and symptoms within limited consultation time.
What Medicine Taught Us About Fairness and What It Missed: Lessons from Reconsidering Race-Specific Lung Function Reference Algorithms
arXiv:2605.24149v1 Announce Type: new Abstract: Since 2019, medical societies have reconsidered race-specific clinical equations often in parallel to and largely independent from algorithmic fairness research. Focusing on lung function reference algorithms that affect medical care, insurance, and employment for hundreds of millions globally, we analyze the transition from race-specific GLI-2012 to race-averaged GLI-Global through a fairness lens. Drawing on historical context, citation analysis, and quantitative evaluation, we show (i) limited cross-citation between FAccT and clinical guideline revision efforts; (ii) that GLI-Global implicitly encodes assumptions about social determinants of health, behaving as if ~62% of the Black-White gap in FEV1 is exposure-related; and (iii) clinical validation studies operationalized a sufficiency-like fairness criterion long before its formalization in fairness literature, while neglecting foundational results such as the impossibility theorem has led to inefficiencies in clinical research. Overall, our analysis highlights the value of deeper, mutually beneficial engagement between medical and fairness communities and the public to accelerate progress toward equitable healthcare algorithms.
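Apple Watch needs an overhaul as the rise of Whoop and Oura creates challenges for its health app - Power On
Competition is intensifying in the health wearables market, and adapting to the AI era has become a challenge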
IyàwóBench: A Benchmark for Evaluating Large Language Model Clinical Triage Accuracy on Undifferentiated Febrile Illness in Nigerian Primary Health Settings
arXiv:2605.23465v1 Announce Type: new Abstract: Background. Undifferentiated febrile illness is the leading cause of primary care outpatient visits in Nigeria, yet no validated benchmark exists for evaluating large language model (LLM) clinical triage reasoning in West African primary health settings. Methods. We introduce IyàwóBench v1.0, a dataset of 200 synthetic clinical vignettes across eight febrile illness categories derived from statistical distributions of 1,200 real patient encounters at 19 primary health centres (PHCs) in Oyo State, Nigeria. Six LLMs were evaluated on structured triage classification across two metrics: triage accuracy and safety score. Results. All six models achieved 100% safety scores (95% CI: 96.4-100.0%), never downgrading a critical REFER NOW case to TREAT HERE. Triage accuracy varied substantially: Claude Sonnet (claude-sonnet-4-5) 67.5% (95% CI: 60.8-73.7%), Llama 4 Scout 59.5% (52.5-66.2%), Llama 3.3 70B 43.0% (36.2-50.0%), and Llama 3.1 8B 39.0% (32.4-45.9%). Two models demonstrated near-zero accuracy attributable to structured output non-compliance. Conclusions. Modern LLMs exhibit safe triage behaviour but vary substantially in structured clinical accuracy. Clinically engineered systems with embedded WHO guidelines outperform general-purpose models by up to 28.5 percentage points. IyàwóBench provides the first reproducible evaluation framework for LLM clinical decision support in West African primary care.
Irish AI health-tech xWave to create 30 jobs amid €3m funding drive
xWave Technologies has won more than 20 NHS Trust contracts in the UK for its diagnostic decision-making platform.
Opportunities and Risks of Generative AI through the Health Information Journey
arXiv:2605.23026v1 Announce Type: new Abstract: Artificial intelligence is fundamentally changing how health content is encountered and acted upon across both the information and healthcare ecosystems. AI systems now generate claims, curate information, interpret symptoms, synthesize evidence, and guide decisions, with significant opportunities and risks for the public. Potential benefits include improvements in access, comprehension, and continuity of care. At the same time, AI can introduce inaccurate or manipulative content that is difficult to distinguish from reliable guidance, and encourage automated decisions that affect care with little transparency or recourse. We introduce a four-stage framework to examine how these opportunities and risks unfold as the public moves through the information environment and into formal healthcare.
Engagement-Optimized Care: When LLMs become Mental Health Infrastructure
arXiv:2605.23787v1 Announce Type: new Abstract: General-purpose LLMs are increasingly functioning as mental health infrastructure due to gaps in care left by provider shortages, inadequate insurance coverage, social isolation, and stigma around formal help-seeking. This shift poses a distinct problem for AI ethics: systems neither designed nor governed as care technologies are being used as such, while their dominant design incentives optimize for engagement rather than user well-being. We present findings from a qualitative, longitudinal study with 18 US-based participants who use general-purpose LLMs for socioemotional support and participated in one or more of our study phases, including initial interviews, a four-week diary study, focus groups, and exit interviews. Participants turned to LLMs because other forms of support were unavailable, unaffordable, socially costly, or inadequate. As they continued to use these systems, design features such as anthropomorphic cues, default validation, persistent responsiveness, and weak disengagement mechanisms shaped their ongoing reliance. Participants described meaningful support alongside dependency, epistemic distortion through one-sided validation, privacy expectations without corresponding legal protection, and continued use despite awareness of these risks. We argue these dynamics reflect a structurally unfair tradeoff: users accept risks because support is otherwise absent, while available systems are optimized to deepen engagement and lack care-based accountability. The paper makes three contributions: it traces the arc through which LLMs become care infrastructure and identifies distinct ethical tensions at each stage, shifts analysis from turn-based exchanges to longitudinal trajectories of use, and argues that accountability belongs at the design and incentive conditions through which these systems become care infrastructure rather than at the output or crisis-response layer.
Finland’s Grundium acquires Denmark’s Visiopharm to build an end-to-end AI precision pathology platform
Grundium, a Tampere-based startup specialising in digital pathology imaging technology, backed by US-based healthcare private equity firm EW Healthcare Partners, has acquired Visiopharm, a Denmark-based provider of AI-driven precision pathology software. The combined business merges complementary capabilities from Grundium’s imaging platform and Visiopharm’s AI-driven precision pathology software, creating an accessible end-to-end solution for diagnostic laboratories, […]
How AI could help fix Kenya's overstretched healthcare system - The Standard Health
Kenya continues to face growing demand for healthcare services alongside persistent shortages of healthcare personnel, particularly in specialised areas of care.
‘You can’t control everything’: the rise in plastic surgeons asked to create ‘AI face’
Growing numbers of people are seeking improbable cosmetic surgery based on chatbots’ recommendations. Plastic surgeons are increasingly concerned about the rise of “AI face”, as more and more clients arrive in their offices with unrealistic AI-generated visions of what they want to look like. Dr Nora Nugent, a cosmetic surgeon from Tunbridge Wells, has seen this first hand. Clients have started coming to her office with photos of themselves beautified by AI and a false expectation that those results are achievable with surgery. She is also the president of the British Association of Aesthetic Plastic Surgeons, and says many colleagues are having similar experiences.
Healthcare LLM Benchmarks Are Only as Good as Their Explicit Assumptions
arXiv:2605.22612v1 Announce Type: new Abstract: Benchmarks are necessary for healthcare evaluation, but are not sufficient for predicting deployment performance. Our position is that the evaluation--deployment gap arises not because of poorly designed benchmarks, but from implicit assumptions about how users interact with models that cannot be surfaced from benchmarks alone. To make this precise, we propose a classification of assumptions into two categories: task, which can be tested from conversation data alone, and outcome, which requires outcome data and behavioral studies for testing. Critically, outcome assumptions depend on human behavior, something that even well-designed benchmarks cannot directly observe. To demonstrate the operationality of this framework, we retrospectively analyze a healthcare RCT as a case study and find that the gap naturally separates into task and outcome gaps of roughly equal size. To address this, we make two contributions: first, we propose BenchmarkCards, an artifact that documents assumptions, and second, we propose staged evaluation, a procedure that systematically tests assumptions and evaluates performance.
AI-powered diagnostics: What will the technology look like in 5-10 years? | Medical Economics
AI tools are becoming more prevalent in back office operations, but they are also making inroads on the diagnostic side.
Privacy-by-Design Adaptive Group Assignment for Digital Lifestyle Coaching at Scale
arXiv:2605.20505v1 Announce Type: cross Abstract: Digital lifestyle coaching systems must personalize peer support as user behavior and engagement evolve while preventing personally identifiable information (PII) and sensitive health information from leaking into analytics and AI pipelines. This creates a practical tension: personalization requires longitudinal linkability, while privacy engineering requires minimization, separation, and controlled re-identification. We present PRISM-Coach, a stakeholder-centered architecture and adaptive peer-group assignment method for privacy-preserving lifestyle coaching. PRISM-Coach separates each user into four bounded views: Identity, Operational, Learning, and Coaching, each with distinct access controls and risk profiles. Building on this separation, the system uses vault-based controlled identity restoration, a privacy-constrained contextual bandit to assign users to eligible peer groups under coach-capacity and stability constraints, and a human-in-the-loop coaching assistant that generates de-identified summaries and draft messages without sending raw PII or PHI to external AI services. We instantiate PRISM-Coach in a commercially deployed lifestyle coaching platform and evaluate it using three years of telemetry from approximately 2,800 users and an in-app needs assessment survey. At the population level, daily check-in adherence increases from 0.35 to 0.68, and engagement rises to 1.35 times baseline. In a matched 19-week comparison window, the AI-enabled workflow achieves adherence of 0.74 versus 0.48 under static grouping and higher average weight loss: 5.2 kg versus 3.1 kg. Survey results show that 82% report positive perceived benefit, and 92% report increased privacy confidence after transparency disclosures. These results position PRISM-Coach as a practical blueprint for privacy-by-design adaptive learning systems in everyday wellness.
Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models
arXiv:2605.20591v1 Announce Type: cross Abstract: Medical large language models (LLMs), including custom medical GPTs (MedGPTs) and open-source models, are increasingly deployed on web platforms to provide clinical guidance. However, they pose risks of hallucination, policy noncompliance, and unsafe design. We conduct a large-scale assessment of 6,233 MedGPTs, evaluating a stratified sample of 1,500, together with 10 open-source LLMs. We introduce two frameworks: MedGPT-HEval for hallucination detection and an LLM-based pipeline for assessing policy violations and developer intent. Our results show that 25-30% of MedGPTs exhibit low factual accuracy, with bottom- and middle-tier models at highest risk; 33.6-54.3% violate operational thresholds, and 57.06% of Action-enabled models lack adequate privacy disclosures. Compared with open-source models, MedGPTs achieve higher factual accuracy and semantic alignment, though open-source models are more stable. These results reveal systemic gaps in hallucination and compliance, highlighting the need for multi-metric evaluation and stronger safeguards. We release HAA-MedGPT, a structured dataset that supports future research on the safety of web-facing medical LLMs.
Artificial Pancreas Implantables -- How Healthcare Professionals May Deal With DIY Bio Cases
arXiv:2605.20208v1 Announce Type: cross Abstract: Automated insulin delivery (AID) and artificial pancreas systems increasingly serve as safety-critical cyber-physical technologies in clinical care, integrating sensors, algorithms, software, and insulin-delivery hardware to automate a life-sustaining therapy. While regulated commercial systems are supported by formal approval pathways, manufacturer governance, and post-market surveillance, clinicians are also encountering patients who rely on do-it-yourself (DIY) artificial pancreas systems that operate outside conventional regulatory and institutional control structures. This paper examines how routine clinical handling practices intersect with cyberbiosecurity risk across both regulated and DIY AID systems. When insulin delivery systems are fundamentally reconfigured into a bespoke AID system, with the patient-user becoming the primary threat vector by assuming manufacturer-level roles without mandated governance, the entire ecosystem of stakeholders is placed in legal and clinical uncertainty.
60% of Healthcare Firms Use AI for Chatbots | PYMNTS.com
Healthcare’s AI adoption is narrower than other sectors, but the industry is using it where operational strain is most immediate.
The Missing Link in Healthcare AI Adoption: Workforce Readiness | Healthcare IT Today
The following is a guest article by Anupama Shashank, Managing Director & Senior Vice President, Healthcare & Life Sciences at Kyndryl. Nearly all healthcare organizations are deploying AI across clinical, operational, and administrative functions, outpacing the global average.
Singapore unveils healthcare AI deals on diabetes, dementia and Bhutan
Singapore announced new cross-border healthcare AI partnerships to support disease detection and diagnostic models in both local and rural Bhutanese hospitals.
XCHANGE ‘26 Attendee Insights Highlight Healthcare AI’s Shift From Adoption to Operational Scale
Hospital leaders attending Xsolis’ XCHANGE ‘26 user conference signaled a broader shift in how orgs are approaching AI, moving from adoption to scale....
Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI
Today, Copenhagen-based healthcare AI company Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conversational transcription, and batch audio processing, and their accuracy rate is the highest yet recorded for this specific use case. "We are focused on ensuring our AI scribes can be trusted by physicians, medical practitioners and patients...the entire healthcare system," said Andreas Cleve, co-founder and CEO of Corti, in an exclusive video call interview with VentureBeat. The performance data the company is bringing to the table paints a stark picture of the current state of enterprise AI: when it comes to highly regulated, specialized industries, domain-specific models can beat out the foundation model providers. In a newly published research paper, Corti revealed that its new clinical-grade speech models reduced word error rates (WER) by up to 93% when compared against leading generalist speech models and APIs on medical terminology. On English medical terminology, its Symphony for Speech-to-Text achieved a remarkably low 1.4% WER. By comparison, OpenAI’s speech model registered a 17.7% WER, ElevenLabs hit 18.1%, Whisper recorded 17.4%, and Parakeet scored 18.9%. Corti’s announcement serves as a critical inflection point for healthcare builders. While general-purpose APIs like OpenAI’s Whisper are sufficient for broad-domain transcription, they frequently stumble over medical acronyms, complex medication dosages, shorthand, and noisy emergency room environments. Symphony for Speech-to-Text aims to solve this by providing developers with a highly specialized, production-grade API designed from the ground up for clinical workflows.
The agentic era demands flawless data inputs
The launch of Symphony for Speech-to-Text highlights a fundamental shift in how healthcare uses voice technology.
For decades, medical speech recognition was primarily about generating a static text document for human doctors to review—a digital replacement for a notepad. But as the healthcare industry hurtles into what technologists call the "agentic era," where autonomous AI agents actively assist in clinical decision-making, EHR navigation, and real-time support, the transcript is no longer the final product. It is the foundational data layer. “Speech has always been one of healthcare’s most important inputs,” Cleve said in a statement provided to VentureBeat. “What is changing is what happens after the words are captured. In the agentic era, speech recognition requires more than simply producing a transcript - we need to give AI systems accurate clinical facts to reason from. If a model mishears a medication, dosage, or symptom, every downstream step becomes less reliable. Symphony for Speech-to-Text gives healthcare builders a speech layer accurate enough to thrive in clinical reality.” This is where the compounding danger of high word error rates comes into play. If a general-purpose AI model hallucinates a transcription—turning "hyperthyroidism" into "hypothyroidism," or misinterpreting a critical medication dosage—every subsequent AI agent relying on that transcript will operate on corrupted data. Corti’s architecture mitigates this risk by producing structured, clinically usable output directly from the API, helping downstream AI applications reason over clean facts rather than messy, unformatted text. Nowhere is this more evident than in Corti’s entity recall benchmarks. Symphony for Speech-to-Text reached an astonishing 98.3% recall rate on formatted clinical entities—such as dosages, measurements, and dates. In contrast, Corti reported that the strongest general-purpose baseline model maxed out at just 44.3% recall for the same entities. 
For developers building ambient AI documentation tools, that 54-percentage-point gap is the difference between a tool that saves a physician time and a tool that constitutes a medical liability.
Dethroning the industry leaders
While Corti’s benchmarks against modern LLM builders like OpenAI and ElevenLabs are striking, the company is also taking aim at legacy medical transcription giants. For years, the gold standard for dedicated clinician dictation has been Dragon Medical One. However, these legacy systems were historically optimized strictly for intentional clinician dictation, not as underlying infrastructure for ambient AI, complex multi-party conversations, or real-time clinical support tools. In evaluations of real-world English medical dictation, Corti achieved a 4.6% WER, outperforming Dragon’s 5.7% (a 19% relative improvement). Furthermore, Corti demonstrated a higher medical term recall than Dragon (93.5% versus 92.9%). By providing this level of accuracy via an API endpoint, Corti is enabling third-party developers, EHR vendors, and virtual care platforms to build their own custom dictation and ambient listening tools that outperform the industry's legacy incumbent. "We want people to build apps atop our models," Cleve said. "The goal is to diffuse the technology as widely as it is needed so it can be as helpful as possible to patients and their doctors and professionals." For Cleve and his co-founders, the mission is a personal one: Cleve's own mother was a healthcare professional attacked by a patient and spent years struggling to recover. He sought to improve healthcare processes as a way of honoring her sacrifice.
Solving the healthcare model puzzle
The demands of healthcare extend far beyond English-speaking hospitals, and global health systems have historically been underserved by clinical NLP models.
Early adopters are already leveraging Corti’s new models in linguistically demanding environments, proving the technology's viability in complex international markets. Switzerland, for instance, requires care delivery across multiple languages, often simultaneously within a single medical institution. It serves as one of the most stringent proving grounds for multilingual medical speech models in the world. Corti’s Symphony models demonstrated massive performance gains in these non-English tests, achieving a 2.4% WER in German (compared to 13.0% for the next-best system) and a 3.9% WER in French (versus 10.6%). “In a clinical conversation, every word matters - a missed medication name, a misheard dosage, or a mistranscribed symptom can change the meaning of an encounter,” said Pierre Corboz, Head of Solutions & Business Development at Voicepoint, a Swiss healthcare technology provider, in a statement provided to VentureBeat. “Symphony’s accuracy on clinical terminology gives us the foundation to bring more trusted AI capabilities into clinical workflows with our Voicepoint Xenon platform. When Corti improves the speech layer, the workflows we build together become sharper, safer, and more useful for clinicians in Switzerland.” AI verticalization and specialization are yielding gains Today’s announcement of Symphony for Speech-to-Text is not an isolated event; it is the culmination of a strategic narrative Corti has been aggressively pushing over the last several weeks. The broader Symphony platform, which powers clinical and administrative applications for a global network of EHR vendors and life sciences organizations, has been systematically proving the defensibility of vertical AI labs against horizontal tech giants. This marks the third major benchmark Corti has released in just six weeks, touching different layers of healthcare AI performance.
In April, the company revealed that its Symphony for Medical Coding system outperformed general-purpose models by more than 25% in clinical accuracy benchmarks, tackling one of healthcare’s most notoriously complex workflows. And just last week, Corti announced that its flagship clinical-grade model outscored OpenAI on HealthBench Professional, OpenAI’s own healthcare benchmark. Taken together, these three data points—medical coding, clinical reasoning, and speech-to-text accuracy—illustrate a growing consensus in the enterprise technology sector: generalized models are hitting a ceiling in regulated industries. Models deployed in hospitals must inherently understand complex acronyms, sudden interruptions, medical shorthand, specialty-specific language, and strict compliance constraints. By training specifically on these unique edge cases, vertical AI labs like Corti are building a formidable moat that companies relying solely on API calls to generalized large language models cannot easily cross. Availability and product lineup Developers are clearly taking notice of the performance gap. According to momentum data provided to VentureBeat, Corti is seeing a 30% growth in new sign-ups for its platform in quarter-to-date comparisons, signaling that developers and healthcare builders are actively gravitating toward vertical, clinical-grade models over generalist APIs. Corti, which already serves over 100 million patients annually across major health systems including the UK’s National Health Service (NHS), is positioning Symphony for Speech-to-Text as the default engine for the next generation of healthcare software. It is important to note that Corti is not launching the overarching Symphony platform itself today; rather, Symphony for Speech-to-Text operates as a new, distinct capability within that broader ecosystem, accessible via its own API endpoints. Symphony for Speech-to-Text is generally available starting today. 
Developers and enterprise architects can access the models via the Corti API console, with full technical documentation available to help integrate the clinical-grade speech layer into their existing applications. In a move toward research transparency, Corti has also published its full research paper detailing its methodology, along with a separate comparison tool designed to support transparent evaluation of medical speech recognition systems across the industry. As the healthcare industry continues its rapid embrace of AI-driven automation, the foundational data layer has never been more critical. Corti’s latest launch is a stark reminder that in the medical field, generic AI simply isn't good enough. The future belongs to the specialists.
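The word error rate and relative-improvement figures quoted in the Corti story above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the standard definition of WER (word-level edit distance divided by the number of reference words); the function names are hypothetical and this is not Corti's evaluation code, whose text normalization and scoring rules are not described in the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: a reference word is missing
                curr[j - 1] + 1,     # insertion: an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline: float, new: float) -> float:
    """Fractional reduction in an error rate relative to the baseline."""
    return (baseline - new) / baseline


# One substituted word out of three reference words.
example_wer = word_error_rate("patient has hyperthyroidism",
                              "patient has hypothyroidism")

# Figures quoted in the article: Dragon 5.7% WER versus Corti 4.6% WER.
dictation_gain = relative_improvement(5.7, 4.6)

# Entity recall gap in percentage points: 98.3% versus 44.3%.
recall_gap_points = round(98.3 - 44.3, 1)
```

Here `dictation_gain` comes out at roughly 0.19, matching the 19% relative improvement reported, and `recall_gap_points` is a difference in percentage points rather than a relative change.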
Evaluating the Utility of Personal Health Records in Personalized Health AI
arXiv:2605.18937v1 Announce Type: new Abstract: Patient-managed Personal Health Records (PHRs) promise to empower patients to better understand their health; but information in the record is complex, potentially hindering insights. In this study, we assess the potential of large language models (LLMs, Gemini 3.0 Flash) to provide helpful answers to user health queries, when provided clinical data from PHRs as context. A total of 2,257 user queries were drawn from 3 different distributions to represent patient questions: shorter web search queries, longer questions derived from templates of chatbot conversations, and questions patients asked to their healthcare team (patient calls). Queries were matched with de-identified PHRs (from a pool of 1,945). Gemini responses were generated (1) without PHR context; (2) with a basic summary of demographics, conditions, and medications; (3) with full, extensive clinical notes. For evaluation, we leveraged an existing rating framework (SHARP), and developed a new framework for specific error modes when interpreting PHRs. Evaluation was performed using autoraters for the full set, and with clinician ratings for a subset (n=95), with both sets of raters knowing the full PHR context. We see significant improvements in the helpfulness of answers to all question types with PHR data (p < 0.001, paired t-test). We also observe potential gains in safety, accuracy, relevance and personalization of answers. Our PHR evaluation framework further identifies gaps in LLM understanding of particular aspects of complex PHRs, such as temporal disorientation, and rare but meaningful confabulations. These results suggest potential for PHR data to help people with a wide range of user needs; and provide a framework for monitoring for gaps in LLM answers based on PHR context. This study motivates further work to assess and realize potential benefits to users from understanding their health records.
Big Europe and Asian private equity health funds merge to defy AI disruption
Global Healthcare Opportunities and CBC Group say $21bn investment manager will be world’s largest in sector
Comment: What does successful AI adoption look like? - Healthcare Today
Roy Wills, global head of healthcare business and partnerships at Intellias, argues that healthcare’s AI problem is not innovation, it’s implementation.
AI preparedness gap hits frontline industries hardest in 2026 - Outsource Accelerator
Hospitality, healthcare and logistics rank among the industries least prepared for AI workforce disruption in 2026, according to a new analysis.
Generative AI and Two-Tiered Online Mental Health Communities
arXiv:2605.16279v1 Announce Type: new Abstract: Online mental health communities (OMHCs) are tiered platforms that connect patients with licensed counselors through public Q&A forums and paid private consultations. Their two-tier structure creates a strategic dilemma for genAI integration. Conversational agents can provide scalable and timely responses to a broader set of patients, alleviating persistent supply shortages, but their large-scale presence may also reshape counselors' participation in providing nuanced expertise, emotionally sensitive support, and paid consultations, which are central to platform revenue and long-run sustainability. Leveraging a quasi-natural experiment from the integration of a genAI-based conversational agent in a leading OMHC, we examine how AI entry affects counselor participation. Using multiple identification strategies, we find that posting intensity increases significantly after AI integration, while average response length remains unchanged and per-post social recognition declines. Mechanism analyses show that AI improves responsiveness and expands patient engagement, enlarging counselors' opportunity sets, with activity partially reallocated from a nearby non-AI subforum. Counselors respond heterogeneously: intrinsically motivated counselors reduce participation, whereas economically motivated counselors intensify competitive effort. These dynamics generate cross-tier spillovers: inactive counselors experience declines in paid consultations, while those who increase public participation preserve or expand downstream demand. Overall, our findings show that in tiered professional platforms, demand expansion and competitive incentives can outweigh intrinsic crowding-out.
When AI Tells You What You Want to Hear: Sycophantic Behavior of Large Language Models in Dementia Care Settings
arXiv:2605.16288v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used in clinical and care settings. This exploratory study investigates whether LLMs exhibit sycophantic behavior - adapting their responses to social expectation signals rather than maintaining professional quality - in the context of dementia care. Five prompts with systematically increasing confirmatory and authority-related framing (P1 neutral to P5 authority-signaled implementation support) were submitted to four LLMs (GPT-5, Claude Sonnet 4.6, Gemini 3.1 Pro, Mistral Large), each repeated five times (N = 100 responses). Responses were evaluated using an LLM-as-a-Judge methodology against seven nursing-ethical quality criteria (K1-K7) and a tone scale (0-3). All models showed significant negative Spearman correlations between prompt level and response quality (rho ranging from -0.543 to -0.734, all p < 0.01). Mistral Large exhibited the most pronounced effect (rho = -0.734), with mean scores dropping from 6.0/7 at P1 to 0.2/7 at P5. The findings suggest that LLMs pose context-sensitive risks in high-stakes care environments and that prompt framing significantly shapes response quality - a dimension that has received insufficient attention in healthcare AI deployment.
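The sycophancy study above reports Spearman correlations between prompt level and response quality. As a rough illustration of what that statistic measures, here is a minimal sketch that computes Spearman's rho as the Pearson correlation of tie-averaged ranks. The data below are hypothetical and are not taken from the paper; the helper names are likewise assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: prompt level (1 = neutral, 5 = strongest authority framing)
# and a quality score out of 7 for each response.
prompt_levels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
quality_scores = [6, 7, 6, 5, 4, 5, 3, 2, 1, 0]
rho = spearman_rho(prompt_levels, quality_scores)
```

A strongly negative `rho` on data like this means quality falls as the framing becomes more leading, which is the pattern the abstract describes. The sketch does not compute p-values.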
Healthcare AI firm Commure valued at $7 billion, raises $70 million | Reuters
Agentic AI — which can plan, decide and act autonomously rather than just respond to prompts — has become one of venture capital's most sought-after areas, as investors pile into businesses using the technology to streamline operations.
Payer-Led AI Adoption Emerges as Key Theme in Healthcare Technology - TipRanks.com
According to a recent LinkedIn post from Ember, a conversation with Dr. Kevin Stevenson characterizes health care payers as the leading force in AI and technology i...
Council Post: Winning With AI In Healthcare Starts With Choosing The Right Workflows
Simply, the best clinical AI strategies aren’t focused on technology but workflows.
Council Post: Transforming Healthcare Payer Operations With AI Building Blocks
If early AI in healthcare was defined by experimentation, I believe the next phase will be defined by architecture.
Implement AI in the mid-cycle of rev cycle for the biggest return | Healthcare Finance News
ROI shows fairly quickly, and tools can be used right away to advance from simple to more complex cases, says Jeff Francis, CFO and VP for the Methodist Health System.
What Health Care Leaders Have Learned From Deploying AI | AJMC
Experts examine what is working in production, where the guardrails are being tested, and why the most transformative chapter of AI in health care hasn’t started yet.
AI 'Industrial Revolution' Taking Place, Says Boston Children's Chief Medical Officer
Dr. Joan LaRovere, chief medical officer at Boston Children’s Hospital, said that AI can 'bolster' the capabilities of health care providers to help diagnose and treat patients more effectively. Dr. LaRovere said that she believes the moment in time is like the 'industrial revolution' that can revolutionize health care. (Source: Bloomberg)
National AI Doctors Mission Launched In New Delhi: Here Is How It Will Change Indian Healthcare
The National AI Doctors Mission launches in New Delhi. Here is how it will change the Indian healthcare landscape and the challenges that may arise.
Melbourne psychiatrist refuses new patients who don’t consent to AI note-taking
Registration form informs patients that if they do not wish AI to be used, they will need their referring doctor to refer them to a different service provider. A Melbourne psychiatrist has refused new patients unless they agree to allow her to use an AI scribe to transcribe the conversations in their sessions. AI-driven note-taking tools are becoming popular within the medical industry, with two in five general practitioners now using such scribes, according to the Royal Australian College of General Practitioners (RACGP).
Q&A: Why pricey AI prototypes are often left on the cutting room floor | Healthcare IT News
Michael Privat, chief data and engineering officer of Availity, offers his perspective on what healthcare organizations need to scale artificial intelligence projects that last – and what it takes to achieve end-to-end AI observability.
UnitedHealth Tracks Workers’ AI Use in Push to Transform Company
UnitedHealth Group Inc. is tracking how often some employees use artificial intelligence tools as part of a push to embed the technology throughout its operations, according to people familiar with the matter.
AI, Robotics Key to Transforming Health Care: BD CEO
Tom Polen, CEO of medical technology company BD, says that in the next decade AI and robotics will transform health care in ways that will make today’s system seem archaic. Polen sits down with Bloomberg’s Caroline Hyde on the sidelines of the Consello Spark Summit. (Source: Bloomberg)
MiniMed Aims to Be 'Self-Driving Car' of Diabetes Care
Diabetes equipment provider MiniMed is aiming to be the “self-driving car” of insulin pumps, says its CEO, Que Dallara. Hot off the heels of the company’s IPO, Dallara speaks with Caroline Hyde on the sidelines of the Consello Spark Summit. (Source: Bloomberg)
Network-Aware Bilinear Tokenization for Brain Functional Connectivity Representation Learning
arXiv:2605.14048v1 Announce Type: new Abstract: Masked autoencoders (MAEs) have recently shown promise for self-supervised representation learning of resting-state brain functional connectivity (FC). However, a fundamental question remains unresolved: how should FC matrices be tokenized to align with the intrinsic modular organization of large-scale brain networks? Existing approaches typically adopt region-centric or graph-based schemes that treat FC as structurally homogeneous elements and overlook the large-scale network brain organization. We introduce NERVE (Network-Aware Representations of Brain Functional Connectivity via Bilinear Tokenization), a self-supervised learning framework that redefines FC tokenization by partitioning FC matrices into patches of intra- and inter-network connectivity blocks. Unlike image-based MAE, where fixed-size patches share a common tokenizer, FC patches defined by network pairs are heterogeneous in size and correspond to distinct functional roles. To resolve this problem, NERVE embeds FC patches through a novel structured bilinear factorization. This formulation preserves network identity and reduces parameter complexity from quadratic to linear scaling in the number of networks. We evaluate NERVE across three large-scale developmental cohorts (ABCD, PNC, and CCNP) for behavior and psychopathology prediction. Compared to structurally agnostic MAE variants and graph-based self-supervised baselines, the proposed network-aware formulation yields more stable and transferable representations, particularly in cross-cohort evaluation. Ablation studies confirm that the proposed bilinear network embedding and anatomically grounded parcellation are critical for performance. These findings highlight the importance of incorporating domain-specific structural priors into self-supervised learning for functional connectomics.
Healthcare AI Evaluation Frameworks: Moving Beyond Accuracy to Safety and Fairness
Until evaluation frameworks reflect the realities of the environments they are deployed in – workflow complexity, human behavior, data instability, and system risks – healthcare AI deployments will lack the reliability needed to truly deliver consistent clinical value and outcomes.
How AI Aims to Fix Healthcare Access
Rezilient CEO Dr. Danish Nagda says the healthcare system is at a tipping point. He joins Bloomberg Open Interest to talk about how hybrid “cloud clinics,” employer-driven care, and AI-powered doctors could eliminate long wait times, cut costs, and make switching doctors a thing of the past. (Source: Bloomberg)
Multimodal Hidden Markov Models for Persistent Emotional State Tracking
arXiv:2605.12838v1 Announce Type: new Abstract: Tracking an interpretable emotional arc of a conversation via the sentiment of individual utterances processed as a whole is central to both understanding and guiding communication in applied, especially clinical, conversational contexts. Existing approaches to emotion recognition operate at the utterance level, obscuring the persistent phases that characterize real conversational dynamics. We propose a lightweight framework that models conversational emotion as a sequence of latent emotional regimes using sticky factorial HDP-HMMs over multimodal valence-arousal representations derived from simultaneous video, audio and textual input. We evaluate the quality of regime prediction using LLM-as-a-Judge, geometric, and temporal consistency metrics, demonstrating that the sticky HDP-HMM produces more interpretable regime sequences than the baseline Gaussian HMM at a fraction of the computational cost of LLM-based dialogue state tracking methods. In addition, Question-Answer experiments in a clinical dataset suggest that meaningful emotional phases can reliably be recovered from multimodal valence-arousal trajectories and used to improve the quality of LLM responses in unstable affective regimes via context augmentation. This framework thus opens a path toward interpretable, lightweight, and actionable analysis of conversational emotion dynamics at scale.
WhatsApp Vaccine Discourse (WhaVax): An Expert-Annotated Dataset and Benchmark for Health Misinformation Detection
arXiv:2605.12510v1 Announce Type: cross Abstract: We introduce WhaVax, a new expert-annotated dataset of vaccine-related WhatsApp messages collected from large Brazilian public groups spanning multiple pandemic years. The dataset was constructed through a rigorous, carefully designed pipeline that integrates keyword-based data collection, semantic deduplication to remove near-duplicate content, and a multi-stage annotation protocol conducted by medical specialists. This process produced a high-quality gold-standard corpus, characterized by substantial inter-annotator agreement and strong reliability for downstream analysis. Additionally, we provide a detailed characterization of WhatsApp misinformation, revealing distinctive linguistic, structural, lexical, temporal, and group-level patterns, as well as a meaningful layer of ambiguous cases that reflect the complexity of health discourse in private messaging. We also benchmark classical models, fine-tuned Small Language Models, and zero- or few-shot Large Language Models under realistic data-scarcity constraints, demonstrating that strong embeddings and LLM approaches perform competitively, while domain alignment and data availability remain critical factors. This study provides a rare, high-quality resource to support misinformation research and computational modeling in encrypted communication environments.
Greater Manchester still says no to NHS data platform with Palantir at its heart
Public concern has only grown, says ICB, while evidence of benefits remains thin
OpenEvidence Exits Europe Over Regulatory Rules | Telehealth.org
OpenEvidence exits EU and the UK, highlighting tensions between AI regulation, innovation, and patient safety in digital health.
Half of AI Health Advice Is Wrong—And Seems Just Right - Decrypt
A peer-reviewed audit in BMJ Open found that nearly 50% of health responses from five major AI chatbots were problematic, with fabricated sources and confident delivery.
One in seven in UK prefer consulting AI chatbots to seeing doctor, study finds
Exclusive: Doctors say ‘highly concerning’ poll highlights risk to patients of turning to AI for medical advice. One in seven people are using AI chatbots for health advice instead of seeing their GP, a UK study has found. The poll of more than 2,000 people found that, of the 15% turning to chatbots, one in four had done so because of long NHS waiting lists.
Measuring What Matters: Benchmarking Generative, Multimodal, and Agentic AI in Healthcare
arXiv:2605.08445v1 Announce Type: new Abstract: AI models are increasingly deployed in live clinical environments where they must perform reliably across complex, high-stakes workflows that standard training and validation datasets were never designed to capture. Evaluating these systems requires benchmarks: structured combinations of tasks, datasets, and metrics that enable reproducible, comparable measurement of what a model can do. The central challenge in healthcare AI is not performance alone, but the absence of systematic methods to measure reliability, safety, and clinical relevance under real-world conditions. Most existing benchmarks test what a model knows; too few test whether it can perform reliably and without failing across the full complexity of real clinical tasks. Current benchmarks have accumulated through ad hoc dataset construction optimized for narrow task performance: frontier models achieve near-perfect scores on medical licensing examinations, but when evaluated across real clinical tasks, performance degrades sharply, scoring 0.74 to 0.85 on documentation, 0.61 to 0.76 on clinical decision support, and only 0.53 to 0.63 on administrative and workflow tasks (MedHELM). High benchmark scores give a false sense of deployment readiness, and the gap between performance and utility widens precisely as AI systems take on more consequential clinical roles. Without a principled framework for benchmark design, the field cannot determine whether poor clinical performance reflects model limitations or failures in how performance is being measured.
Rotterdam’s Ditto raises €7.6 million to make “what did the doctor say?” easier to answer
Ditto, a Rotterdam-based HealthTech startup that has developed a free app that translates complex medical information into plain language, has raised €7.6 million for its European rollout. The round was led by Heal Capital, with participation from Optiverder and Rubio Impact Ventures. “No patient should have to guess what was just said. We are fundamentally […]
MedThink: Enhancing Diagnostic Accuracy in Small Models via Teacher-Guided Reasoning Correction
arXiv:2605.08094v1 Announce Type: new Abstract: Accurate clinical diagnosis requires extensive domain knowledge and complex clinical reasoning capabilities. Although large language models (LLMs) hold great potential for clinical reasoning, their high computational and memory requirements limit their deployment in resource-constrained environments. Knowledge distillation (KD) can compress LLM capabilities into smaller models, but traditional KD merely transfers superficial answer patterns and fails to preserve the structured reasoning required for reliable diagnosis. To address this, we propose a two-stage distillation framework, MedThink, designed to cultivate robust clinical reasoning in small language models (SLMs). In the first stage, a teacher LLM screens data and injects domain-knowledge explanations to fine-tune a student model, establishing a knowledge foundation. In the second stage, the teacher evaluates the student's errors, generates reasoning chains linking knowledge to correct answers, and refines the student's diagnostic reasoning through a second round of fine-tuning. We evaluate MedThink on general medical benchmarks and a gastroenterology dataset comprising 955 question-answer pairs. Experiments demonstrate that MedThink outperforms six distillation strategies in all benchmarks: achieving an improvement of up to 12.7% over the student baseline in general tasks, and reaching a total top accuracy of 56.4% in gastroenterology evaluation. This indicates that iterative distillation centered on reasoning can significantly enhance the diagnostic accuracy and generalization capabilities of SLMs whilst maintaining computational efficiency. Our code and data are publicly available at https://github.com/destinybird/PrecisionBoost.
Palantir’s access to identifiable NHS England patient data is ‘dangerous’, MPs say
Health service has given US tech firm ‘unlimited access’ to certain data to build integrated platform, according to reports. MPs have warned that an NHS decision to grant Palantir access to identifiable patient information in its plan to use AI to improve the health service is “dangerous” and will fuel public fears that data privacy is not being prioritised. NHS England has allowed staff from the US tech firm and other contractors to access patient data before it has been pseudonymised, despite internal fears of a “risk of loss of public confidence”, the Financial Times reported.
Secai Partners With Mila to Accelerate AI-Powered Healthcare Across North America | Markets Insider
MONTREAL, May 11, 2026 (GLOBE NEWSWIRE) -- Secai, the Montreal-based healthcare AI company behind the Voxira platform, is proud to announce a st...
New AI model spots pancreatic cancer up to 3 years earlier than human doctors in test
A new AI diagnostic model has demonstrated the ability to detect pancreatic cancer significantly earlier than traditional human screening.
Artera Launches AI Service Squads for Tailored Healthcare Solutions
Artera introduced AI Service Squads to integrate custom AI solutions within healthcare providers' operations, enhancing both front and back-office tasks.
ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms
arXiv:2605.03212v2 Announce Type: new Abstract: Modeling latent clinical constructs from unconstrained clinical interactions is a unique challenge in affective computing. We present ADAPTS (Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms), a framework for automated rating of depression and anxiety severity using a mixture-of-agents LLM architecture. This approach decomposes long-form clinical interviews into symptom-specific reasoning tasks, producing auditable justifications while preserving temporal and speaker alignment. Generalization was evaluated across two independent datasets (N = 204) with distinct interview structures. On high-discrepancy interviews, automated ratings approximated expert benchmarks (absolute error = 22) more closely than original human ratings (absolute error = 26). Implementing an "extended" protocol that incorporates qualitative clinical conventions significantly stabilized ratings, with absolute agreement reaching ICC(2,1) = 0.877. These findings suggest that the ADAPTS framework enables promising evaluations of psychiatric severity. While the current implementation is purely text-based, the underlying architecture is readily extensible to multimodal inputs, including acoustic and visual features. By approximating expert-level precision in a protocol-agnostic manner, this framework provides a foundation for objective and scalable psychiatric assessment, especially in resource-limited settings.
Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology
arXiv:2605.04098v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have demonstrated promise on publicly available dermatology benchmarks. However, benchmark performance may not generalize to real-world dermatologic decision-making. To quantify this benchmark-to-bedside gap, we evaluated four open-weight MLLMs (InternVL-Chat v1.5, LLaVA-Med v1.5, SkinGPT4 and MedGemma-4B-Instruct) and one commercial MLLM (GPT-4.1) across three publicly available dermatology datasets and a retrospective multi-site hospital-based dermatology consultation cohort comprising 5,811 cases and 46,405 clinical images. Models were evaluated on two clinically relevant tasks: differential diagnosis generation and severity-based triage. Diagnostic performance was modest on public datasets and declined substantially in the real-world cohort. On public benchmarks, top-3 diagnostic accuracy reached 26.55% for the best open-weight model and 42.25% for GPT-4.1. On real-world consultation cases using images alone, top-3 diagnostic accuracy fell to 1.50%-13.35% among open-weight models and 24.65% for GPT-4.1. Incorporating clinical context improved performance across all models, increasing top-3 diagnostic accuracy up to 28.75% among open-weight models and 38.93% for GPT-4.1. However, model outputs were highly sensitive to incomplete or erroneous consultation context. For severity-based triage, models achieved moderate sensitivity (above 60%), suggesting potential utility for screening but insufficient reliability for clinical deployment. These findings demonstrate that benchmark performance substantially overestimates the real-world clinical capability of current dermatology MLLMs.
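The dermatology evaluation above is reported in terms of top-3 diagnostic accuracy. For readers unfamiliar with the metric, here is a minimal sketch: a case counts as correct when the reference diagnosis appears among the first three entries of the model's ranked differential. The cases and labels below are hypothetical, and the exact-string matching is an assumption; the abstract does not say how the study matched diagnoses.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranked list per case")
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Hypothetical ranked differential diagnoses, most likely first.
cases = [
    ["psoriasis", "eczema", "tinea corporis"],
    ["melanoma", "seborrheic keratosis", "nevus"],
    ["acne", "rosacea", "folliculitis", "perioral dermatitis"],
    ["urticaria", "contact dermatitis", "drug eruption"],
]
# Hypothetical reference diagnoses for the same four cases.
truths = ["eczema", "basal cell carcinoma", "perioral dermatitis", "urticaria"]

top3 = top_k_accuracy(cases, truths, k=3)  # two of the four cases hit
top1 = top_k_accuracy(cases, truths, k=1)  # only the last case hits
```

Because top-3 is more forgiving than top-1, the low real-world top-3 figures in the abstract imply that the correct diagnosis was often absent from the differential altogether.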
AI and Suicide Prevention: A Cross-Sector Primer
arXiv:2605.04321v1 Announce Type: new Abstract: AI chatbots already function as de facto mental health support tools for millions of people, including people in crisis. Yet, they lack the clinical validation, shared standards, and coordinated oversight that their societal role demands. This primer was developed in conjunction with a multistakeholder workshop hosted by Partnership on AI in 2026, convening AI labs, mental health practitioners, people with lived experience, and policymakers, to provide a common cross-sector reference point for the current state of the field of AI and suicide prevention. It begins with an overview of clinical best practices, then turns to how frontier AI systems (as of winter 2026) detect and respond to suicide and non-suicidal self-injury (NSSI) queries. Together, these provide insight into what it would take to design and implement AI tools that not only better prevent suicide and NSSI, but also promote overall well-being. Drawing on clinical literature, publicly available AI lab policies, an emerging landscape of evaluation frameworks, and conversations with leaders across the AI and mental health fields, we map challenges posed by general-purpose AI chatbots for mental health across model, product, and policy layers, ultimately highlighting priority areas where cross-industry alignment is both urgently needed and achievable.
Evaluating Patient Safety Risks in Generative AI: Development and Validation of a FMECA Framework for Generated Clinical Content
arXiv:2605.04085v1 Announce Type: new Abstract: Objectives: Large language models (LLMs) are increasingly used for clinical text summarization, yet structured methods to assess associated patient safety risks remain limited. Failure Mode, Effects, and Criticality Analysis (FMECA) provides a proactive framework for systematic risk identification but has not been adapted to LLM-generated clinical content. This study aimed to develop and validate a novel FMECA framework for the prospective assessment of patient safety risks in LLM-generated clinical summaries. Materials and Methods: An interdisciplinary expert panel (n = 8) developed a taxonomy of failure modes through literature review and brainstorming. Standard FMECA dimensions (occurrence, severity, detectability) were adapted into 5-point ordinal scales. The framework was applied to 36 discharge summaries from four patients, generated by an open LLM (GPT-OSS 120B) using real-world clinical data from the Geneva University Hospitals. Reviewers independently annotated the summaries across two rounds. Inter-rater reliability was assessed at failure mode, severity and detectability score levels. Usability and content validity were evaluated using an adapted System Usability Scale and structured feedback. Results: The final framework comprised 14 failure modes organized into categories. Inter-rater agreement improved between rounds, reaching moderate-to-substantial agreement for failure mode identification and good agreement for severity and detectability scoring. Usability was rated as good (mean SUS: 79.2/100), with high evaluator confidence. Discussion and Conclusion: This study presents the first FMECA-based framework for systematic patient safety risk assessment of LLM-generated clinical summaries. The framework provides a structured and reproducible method for identifying clinically relevant risks caused by these summaries.
Roche to Buy PathAI for Up to $1.05 Billion to Bolster AI Diagnostics Tools
The deal seeks to bolster the artificial-intelligence offerings of Roche’s diagnostics division and to help accelerate clinical-therapy development.
To Use AI as Dice of Possibilities with Timing Computation
arXiv:2605.01134v1 Announce Type: new Abstract: The dominant noun-based modeling paradigm has fundamentally constrained AI development, precluding any adequate representation of the future as an open temporal dimension. This paper introduces a verb-based paradigm, together with precise definitions of timing computation and possibility, that enables AI to function as an effective instrument for realizing the grammar of our thought. Applied to longitudinal EHR data from 3,276 breast cancer patients, the framework empirically demonstrates: (1) automatic discovery of clinically significant patient trajectories, and (2) counterfactual timing deduction. Both results are purely data-driven, require no prior domain knowledge, and, to our knowledge, represent the first such demonstrations in the machine learning literature.
EQUITRIAGE: A Fairness Audit of Gender Bias in LLM-Based Emergency Department Triage
arXiv:2605.03998v1 Announce Type: cross Abstract: Emergency department triage assigns patients an acuity score that determines treatment priority, and clinical evidence documents persistent gender disparities in human acuity assessment. As hospitals pilot large language models (LLMs) as triage decision support, a critical question is whether these models reproduce or mitigate known biases. We present EQUITRIAGE, a fairness audit of LLM-based ESI assignment evaluating five models (Gemini-3-Flash, Nemotron-3-Super, DeepSeek-V3.1, Mistral-Small-3.2, GPT-4.1-Nano) across 374,275 evaluations on 18,714 MIMIC-IV-ED vignettes under four prompt strategies. Of 9,368 originals, 9,346 are paired with a gender-swapped counterfactual. All five models produced flip rates above a pre-registered 5% threshold (9.9% to 43.8%). Two showed directional female undertriage (DeepSeek F/M 2.15:1, Gemini 1.34:1); two were near-parity; one had high sensitivity with weak male-direction asymmetry. DeepSeek's directional bias coexisted with a low outcome-linked calibration gap (0.013 against MIMIC-IV admission), a Chouldechova-style dissociation between within-group calibration and between-pair counterfactual invariance. Demographic blinding reduced Gemini's flip rate to 0.5%; an age-preserving blind variant left DeepSeek with residual F/M 1.25, implicating age as a residual channel. Chain-of-thought prompting degraded accuracy for all five models. A two-model ablation reveals opposite underlying mechanisms for the same directional phenotype: in Gemini the signal is emergent in the combined name+gender swap, while in DeepSeek the gender token alone carries it. EQUITRIAGE shows that group parity, counterfactual invariance, and gender calibration are distinct fairness properties, that intervention effectiveness is model-dependent, and that per-model counterfactual auditing should precede clinical deployment.
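The counterfactual flip rate reported in the abstract above can be illustrated with a minimal sketch. The `Pair` type, the function names, and the toy data are hypothetical and are not taken from the paper, and the paper's exact definition of its F/M ratio is not given in the abstract. The sketch only assumes the standard ESI convention that 1 is the most urgent level and 5 the least, so a higher number for the same vignette means lower priority.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Pair:
    """One vignette scored twice: as written and gender-swapped (hypothetical type)."""
    esi_female: int  # ESI level (1 = most urgent, 5 = least) for the female version
    esi_male: int    # ESI level for the male version of the same vignette


def flip_rate(pairs: Sequence[Pair]) -> float:
    """Fraction of pairs whose ESI level changes when only gender is swapped."""
    if not pairs:
        raise ValueError("at least one pair is required")
    flips = sum(1 for p in pairs if p.esi_female != p.esi_male)
    return flips / len(pairs)


def directional_counts(pairs: Sequence[Pair]) -> Tuple[int, int]:
    """Count flips by direction: (female less urgent, male less urgent)."""
    female_under = sum(1 for p in pairs if p.esi_female > p.esi_male)
    male_under = sum(1 for p in pairs if p.esi_male > p.esi_female)
    return female_under, male_under


# Toy data, invented for illustration only.
pairs = [Pair(3, 3), Pair(4, 3), Pair(2, 2), Pair(3, 2), Pair(2, 3)]
print(flip_rate(pairs))           # share of pairs that flipped
print(directional_counts(pairs))  # flips split by direction
```

A per-model audit along these lines would compare the flip rate against a threshold fixed in advance (the abstract cites a pre-registered 5%) and inspect the two directional counts separately, since a high flip rate and a directional skew are different findings.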
ClinicBot: A Guideline-Grounded Clinical Chatbot with Prioritized Evidence RAG and Verifiable Citations
arXiv:2605.00846v1 Announce Type: new Abstract: Clinical diagnosis requires answers that are accurate, verifiable, and explicitly grounded in official guidelines. While large language models excel at natural language processing, their tendency to hallucinate undermines their utility in high-stakes medical contexts where precision is essential. Existing retrieval-augmented generation (RAG) systems treat all evidence equally, producing noisy context and generic answers misaligned with clinical practice. We present ClinicBot, an AI system that translates guideline recommendations into trustworthy clinical support through three key advances: (1) structured extraction of clinical guidelines into semantic units (recommendations, tables, definitions, narrative) with explicit provenance, (2) evidence prioritization that ranks content by clinical significance and guideline structure rather than textual similarity, and (3) a web-based interface that presents concise, actionable answers with verifiable evidence. We will demonstrate ClinicBot using diabetes questions from real patients and an additional diabetes risk assessment tool that is faithful to the American Diabetes Association (ADA) Standards of Care in Diabetes (2025). The demonstration will illustrate how semantic knowledge extraction and hierarchical evidence ranking can reliably operate in a multi-agent setting to process complex clinical guidelines at scale.
NHS to close-source hundreds of GitHub repos over AI, security concerns
Healthcare giant's maintainers handed May deadline to enact the change.
Virtual Speech Therapist: A Clinician-in-the-Loop AI Speech Therapy Agent for Personalized and Supervised Therapy
arXiv:2605.01101v1 Announce Type: new Abstract: This paper develops Virtual Speech Therapist (VST), an intelligent agent-based platform that streamlines stuttering assessment and delivers customized therapy planning through automated and adaptive AI-driven workflows. VST integrates state-of-the-art deep learning-based stuttering classification, and multi-agent large language model (LLM) reasoning to support evidence-based clinical decision-making. The VST begins with the acquisition and feature extraction of patient speech samples, followed by robust classification of stuttering types. Building on these outputs, VST initiates an agentic reasoning process in which specialized LLM agents autonomously generate, critique, and iteratively refine individualized therapy plans. A dedicated critic agent evaluates all generated therapy plans to ensure clinical safety, methodological soundness, and alignment with peer-reviewed evidence and established professional guidelines. The resulting output is a comprehensive, patient-specific therapy draft intended for clinician review. Incorporating clinician feedback, the system then produces a finalized therapy plan suitable for patient delivery, thereby maintaining a clinician-in-the-loop paradigm. Experimental evaluation by expert speech therapists confirms that VST consistently generates high-quality, evidence-based therapy recommendations. These findings demonstrate the system's potential to augment clinical workflows, reduce clinician burden, and improve therapeutic outcomes for individuals with speech impairments. An interactive user interface for the proposed system is available online at: https://vocametrix.com/ai/stuttering-therapy-planning-agent , facilitating real-time stuttering assessment and personalized therapy planning.
Swiss startup Moonlight AI raises €2.8 million to turn routine blood and cytology imaging into genomic insights
Moonlight AI, a Swiss startup building image analysis software for clinical-grade diagnostics, has closed a €2.8 million ($3.3 million) Seed funding round. The round was co-led by Lotus One Investment (Singapore), VP Venture Partners (Switzerland), and MEDIN Fund (Tunisia), with participation from N&V Capital (Liechtenstein) and existing investor QAI Ventures (Switzerland). “Our technology enables labs […]
Pennsylvania Sues AI Company Saying Its Chatbots Give Dangerous Medical Advice
The state of Pennsylvania has taken legal action against an AI company over concerns that its chatbots provide harmful medical guidance.
Pa. suit alleging unlicensed medical practice is latest state action against chatbots
Pennsylvania has sued Character Technologies for allegedly practicing medicine without a license through its Character.AI platform.
Character.AI sued by Pa. over alleged doctor impersonation by chatbot
Pennsylvania's Department of State has sued chatbot developer Character.AI, alleging the company misrepresented its companion chatbots as licensed medical professionals.
Validation of an AI-based end-to-end model for prostate pathology using long-term archived routine samples
Artificial intelligence (AI) is becoming a clinical tool for prostate pathology, but generalization across variations in sample preparation and preservation over prolonged time periods remains poorly understood. We evaluated GleasonAI, an end-to-end attention-based multiple instance learning model, on an independent validation cohort comprising 10,366 biopsy cores from 1,028 patients across 14 Swedish regions, using archival diagnostic specimens from the ProMort cohorts collected between 1998-20...
Tailoring AI solutions for health care needs
The AI market is full of big promises of grand transformation. Health care is a prime target for those promises, beset as it is by financial pressures, labor shortages, and the growing burden of caring for an aging population. AI developers are targeting functions that vary widely, from curing cancer and performing surgery to streamlining…
The Paradox of Medical AI Implementation - by Eric Topol
There have been 44 randomized trials for colonoscopy that consistently, and in aggregate, demonstrate a substantial advantage of AI-assist for detecting adenomatous polyps compared with gastroenterologists without AI, yet that has not been made part of standard medical practice.
Flaws in Kenya’s AI-driven health reforms driving up costs for the poorest
Exclusive: amid unrest, President William Ruto promised to give all Kenyans access to healthcare. But the algorithm favours the rich, an investigation has found. An AI system used to predict how much Kenyans can afford to pay for access to healthcare has systemically driven up costs for the poor, an investigation has found. The healthcare system being rolled out across the country, a key electoral promise of President William Ruto, was launched in October 2024 and intended to replace Kenya’s decades-old national insurance system.
AI finds signs of pancreatic cancer before tumors develop
New AI research shows promise in detecting early signs of pancreatic cancer before physical tumors are even present.
A decade after the ‘Godfather of AI’ said radiologists were obsolete, their salaries are up to $571K and demand is growing fast
“As long as AI doesn't make this quantum leap of becoming sort of AGI,” most jobs are going to be reasonably safe, said one economist.
Adoption and Use of LLMs at an Academic Medical Center
arXiv:2602.00074v2 Announce Type: replace Abstract: While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables the use of LLMs with the entire patient timeline spanning several years. ChatEHR enables automations - which are static combinations of prompts and data that perform a fixed task - and interactive use in the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use-cases such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction, redefines LLM use as an institutional capability. This system, accessible after user-training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations and 1075 users have trained to become routine users of the UI, engaging in 23,000 sessions in the first 3 months of launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for monitoring and evaluation of the UI, requiring new methods to monitor performance. Generation of summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework to prioritize work as well as quantify the impact of using LLMs. Initial estimates are $6M savings in the first year of use, without quantifying the benefit of the better care offered. Such a "build-from-within" strategy provides an opportunity for health systems to maintain agency via a vendor-agnostic, internally governed LLM platform.
Healthcare’s AI Agents Aim to Give Doctors Time Back | PYMNTS.com
Healthcare’s next AI test will be whether agents can give doctors and nurses back something far more valuable: time. Across new reports and commentary,
Carlyle Acquires Healthcare RCM Providers Knack and EqualizeRCM
Carlyle Group Inc. has acquired a majority stake in healthcare revenue cycle management firms Knack RCM and EqualizeRCM, it said in a statement Monday, without disclosing terms.
AI outshines doctors in Harvard's ER study
A new study from Harvard indicates that AI models are demonstrating high performance in emergency room settings, potentially outperforming human doctors in certain diagnostic tasks.
AI Rivals Doctors in Emergency Decision-Making, Harvard Study Reveals
AI models now rival doctors in emergency diagnosis accuracy, but experts stress human oversight remains essential for safe clinical decision-making.
Infor's Technology Tackles Healthcare's AI Execution Gap | Healthcare Digital
Infor's new platform tackles industry-specific AI scaling challenges with robust governance and compliance features for healthcare providers
Tailoring AI solutions for health care needs | MIT Technology Review
The AI market is full of big promises of grand transformation. Health care is a prime target for those promises, beset as it is by financial pressures, labor shortages, and the growing burden of caring for an aging population. AI developers are targeting functions that vary widely, from curing ...
Trends in 2026 for healthcare – How is AI making insight-driven patient care a reality?
Get insights into healthcare trends 2026 and how AI and predictive analytics are reshaping patient care and service delivery.
Download XRPH AI: Earn Rewards for Healthy Actions With an AI Healthcare App | by XRP Healthcare | May, 2026 | Medium
With XRPH AI, users can access AI-powered healthcare tools today – and participate in a system designed to reward healthy actions through real usage.
In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors
A recent Harvard study found that AI models outperformed human doctors in making accurate emergency room diagnoses.
AI Co-Clinician for Healthcare
This article from Google DeepMind introduces an AI co-clinician research initiative aimed at supporting doctors with evidence-grounded, supervised AI in healthcare. Our analysts noted the small sample size but found the multimodal clinical reasoning and broader applicability to regulated industries important for AI leaders to monitor.
AI Enhances Medical Diagnostics
AI is enhancing healthcare by supporting diagnostics and decision-making, but not replacing doctors.
Beacon Biosignals is mapping the brain during sleep
Researchers are using AI to analyze brain activity during sleep, providing new insights into neurological health and sleep patterns.
Galway’s Orreco signs up with MLS Innovation Lab
Orreco uses AI, computer vision and biomarker data to optimise athlete performance, predict injury risk and accelerate recovery, according to the company.
Enabling A New Model for Healthcare with AI Co-Clinician
Google DeepMind introduces an AI co-clinician research initiative to support doctors with evidence-grounded, supervised AI, demonstrating potential for regulated industries.
OpenAI’s Big Reset + A.I. in the Doctor’s Office + Talkie, a pre-1930s LLM
Will the rising tide of A.I. adoption lift all boats?
Evaluating TabPFN for Mild Cognitive Impairment to Alzheimer's Disease Conversion in Data Limited Settings
arXiv:2604.27195v1 Announce Type: new Abstract: Accurate prediction of conversion from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is essential for early intervention; however, reliable conversion prediction models are difficult to develop due to limited longitudinal data availability. We evaluate TabPFN (Tabular Pre-Trained Foundation Network) against traditional machine learning methods for predicting 3-year MCI to AD conversion using the TADPOLE dataset derived from ADNI. Using multimodal biomarker features extracted from demographics, APOE4, MRI volumes, CSF markers, and PET imaging, we conducted an experimental comparison across varying training set sizes (N=50 to 1000) and models including XGBoost, Random Forest, LightGBM, and Logistic Regression. TabPFN achieved one of the highest performances (AUC=0.892), outperforming LightGBM (AUC=0.860) and demonstrating advantages in low-data settings. At N=50 training samples, TabPFN maintained strong AUC while the traditional machine learning models struggled. These findings demonstrate that foundation models are promising for disease prediction in data-limited scenarios, such as Alzheimer's disease.
Opinion | AI-automated prescriptions need safeguards: Responses to readers - The Washington Post
Artificial intelligence can make medical practice more efficient and accessible, but there must be safeguards
Toward Personalized Digital Twins for Cognitive Decline Assessment: A Multimodal, Uncertainty-Aware Framework
arXiv:2604.27217v1 Announce Type: new Abstract: Cognitive decline is highly heterogeneous across individuals, which complicates prognosis, trial design, and treatment planning. We present the Personalized Cognitive Decline Assessment Digital Twin (PCD-DT), a multimodal and uncertainty-aware framework for modeling patient-specific disease trajectories from sparse, noisy, and irregular longitudinal data. The framework combines three methodological components: (1) latent state-space models for individualized temporal dynamics, (2) multimodal fusion for clinical, biomarker, and imaging features, and (3) uncertainty-aware validation and adaptive updating for robust digital twin operation. We also outline how conditional generative models can support data augmentation and stress testing for underrepresented progression patterns. As a preliminary feasibility study, we analyze longitudinal TADPOLE trajectories and show clear separation between cognitively normal and Alzheimer's disease cohorts in ADAS13, ventricle volume, and hippocampal volume over five years. We further conduct a multimodal next-visit prediction ablation using an LSTM sequence model on 3,003 visit-pair sequences derived from TADPOLE, where the combined cognitive plus MRI configuration achieves the lowest standardized RMSE for both ADAS13 (0.4419) and ventricle volume (0.5842), outperforming a Last Observation Carried Forward baseline. A Bayesian tensor modeling component for high-dimensional imaging fusion is also discussed. These results support the feasibility of the proposed architecture while also highlighting the need for stronger uncertainty calibration and longer-horizon predictive evaluation. The PCD-DT framework provides a principled starting point for personalized in silico modeling in neurodegenerative disease. This work positions PCD-DT as a foundational step toward clinically deployable, uncertainty-aware digital twin systems.
Frontier AI Models Outperform Human Physicians in Clinical Benchmarks and Emergency Scenarios
New research indicates that advanced LLMs can surpass human performance in specific medical diagnostic tasks. These findings underscore an urgent requirement for prospective clinical trials to validate AI efficacy in healthcare settings.
CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs
arXiv:2604.26959v1 Announce Type: new Abstract: Integrating large language models (LLMs) into patient-facing healthcare systems offers significant potential to improve access to medical information. However, ensuring clinical safety and factual reliability remains a critical challenge. In practice, AI-generated responses may be conditionally correct yet medically inappropriate, as models often fail to interpret patient context and tend to produce agreeable responses rather than challenge unsafe assumptions. Unlike clinicians, who infer risk from incomplete information, LLMs frequently lack contextual awareness. Moreover, real-world patient interactions are open-ended and underspecified, unlike structured benchmark settings. We present CareGuardAI, a risk-aware safety framework for patient-facing medical question answering that addresses two key failure modes: clinical safety risk and hallucination risk. The framework introduces Clinical Safety Risk Assessment (SRA), inspired by ISO 14971, and Hallucination Risk Assessment (HRA) to evaluate medical risk and factual reliability. At inference time, CareGuardAI employs a multi-stage pipeline consisting of a controller agent, safety-constrained generation, and dual risk evaluation, followed by iterative refinement when necessary. Responses are released only when both SRA and HRA are less than or equal to 2, ensuring clinically acceptable outputs with bounded latency. We evaluate CareGuardAI on PatientSafeBench, MedSafetyBench, and MedHallu, covering both safety and hallucination detection. Across these benchmarks, the framework consistently outperforms strong baseline models, including GPT-4o-mini, demonstrating the importance of context-aware, risk-based, inference-time safety mechanisms for reliable deployment in healthcare.
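The release rule described in the abstract above (a response is released only when both SRA and HRA are at most 2, with iterative refinement otherwise) can be sketched as a small gate with a bounded retry loop. This is an illustration under stated assumptions, not the authors' implementation: the function names, the `max_rounds` cap, and the toy scorer and refiner are all hypothetical, and only the threshold of 2 comes from the abstract.

```python
from typing import Callable, Optional, Tuple

RISK_THRESHOLD = 2  # from the abstract: release only if SRA <= 2 and HRA <= 2

Assess = Callable[[str], Tuple[int, int]]  # returns (sra, hra); hypothetical signature
Refine = Callable[[str, int, int], str]    # returns a revised draft; hypothetical signature


def release_gate(sra: int, hra: int, threshold: int = RISK_THRESHOLD) -> bool:
    """True only when both risk scores are at or below the threshold."""
    return sra <= threshold and hra <= threshold


def refine_until_safe(draft: str, assess: Assess, refine: Refine,
                      max_rounds: int = 3) -> Optional[str]:
    """Assess, then refine, at most max_rounds times; return None if never safe.

    The cap on rounds is an assumed way to keep latency bounded.
    """
    for _ in range(max_rounds):
        sra, hra = assess(draft)
        if release_gate(sra, hra):
            return draft
        draft = refine(draft, sra, hra)
    return None  # withhold the response rather than release an unverified draft


# Toy stand-ins for the risk evaluators and the refinement step (illustration only).
def toy_assess(text: str) -> Tuple[int, int]:
    sra = 4 if "double your dose" in text else 1
    return sra, 1


def toy_refine(text: str, sra: int, hra: int) -> str:
    return text.replace("double your dose",
                        "ask your clinician before changing your dose")


print(refine_until_safe("You can double your dose.", toy_assess, toy_refine))
```

Returning `None` when the round budget runs out keeps the gate conservative: a draft that was refined but never re-assessed is not released.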
Generative AI In Healthcare: Adoption Matures As Agentic AI Emerges
Generative AI adoption in healthcare is shifting from pilot programs to production, with a focus on clinical documentation and administrative automation.
A Scoping Review of LLM-as-a-Judge in Healthcare and the MedJUDGE Framework
arXiv:2604.25933v1 Announce Type: new Abstract: As large language models (LLMs) increasingly generate and process clinical text, scalable evaluation has become critical. LLM-as-a-Judge (LaaJ), which uses LLMs to evaluate model outputs, offers a scalable alternative to costly expert review, but its healthcare adoption raises safety and bias concerns. We conducted a PRISMA-ScR scoping review of six databases (January 2020-January 2026), screening 11,727 studies and including 49. The landscape was dominated by evaluation and benchmarking applications (n=37, 75.5%), pointwise scoring (n=42, 85.7%), and GPT-family judges (n=36, 73.5%). Despite growing adoption, validation rigor was limited: among 36 studies with human involvement, the median number of expert validators was 3, while 13 (26.5%) used none. Risk of bias testing was absent in 36 studies (73.5%), only 1 (2.0%) examined demographic fairness, and none assessed temporal stability or patient context. Deployment remained limited, with 1 study (2.0%) reaching production and four (8.2%) prototype stage. Importantly, these gaps may interact: when judges and evaluated systems share training data or architectures, they may inherit similar blind spots, and agreement metrics may fail to distinguish true validity from shared errors. Minimal human oversight, limited bias assessment, and model monoculture together represent a governance gap where current validation may miss clinically significant errors. To address this, we propose MedJUDGE (Medical Judge Utility, De-biasing, Governance and Evaluation), a risk-stratified three-pillar framework organized around validity, safety, and accountability across clinical risk tiers, providing deployment-oriented evaluation guidance for healthcare LaaJ systems.
Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control
arXiv:2604.26577v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior categories grounded in the American Medical Association Principles of Medical Ethics, and use it to evaluate 72 LLMs in a simulation environment based on the Robotic Health Attendant framework. The mean violation rate across all models was 54.4%, with more than half exceeding 50%, and violation rates varied substantially across behavior categories, with superficially plausible instructions such as device manipulation and emergency delay proving harder to refuse than overtly destructive ones. Model size and release date were the primary determinants of safety performance among open-weight models, and proprietary models were substantially safer than open-weight counterparts (median 23.7% versus 72.8%). Medical domain fine-tuning conferred no significant overall safety benefit, and a prompt-based defense strategy produced only a modest reduction in violation rates among the least safe models, leaving absolute violation rates at levels that would preclude safe clinical deployment. These findings demonstrate that safety evaluation must be treated as a first-class criterion in the development and deployment of LLMs for robotic health attendants.
AI #166: Google Sells Out - by Zvi Mowshowitz
Senator Maria Cantwell says that if we let AI make healthcare decisions instead of doctors, we are going to have some real problems. Did you know we already have some real problems the other way? Her objection here is that AI systems designed to catch ‘wasteful spending’ (often but not always read: outright fraud) might deny care.
AI & Tech brief: States regulate AI in health care - The Washington Post
The White House is attempting a rapprochement with Anthropic over its new AI model, Mythos.
Governance for safe and responsible AI in healthcare organisations: a scoping review of frameworks | npj Digital Medicine
This scoping review synthesises current evidence on artificial intelligence (AI) governance in healthcare organisations, outlining key components of AI governance frameworks. Following PRISMA-ScR guidelines, we searched MEDLINE, Embase, and Scopus (April 2024, updated March 2025) for AI governance ...
Zuckerberg Bets $500M on AI Biology
Biohub, the nonprofit spearheaded by Mark Zuckerberg and Priscilla Chan, is committing $500 million to help create better AI simulations of the human body. The bet is that more data and compute will produce more useful models.
MITRE flags rising cyber risks as medical devices adopt AI, cloud and post-quantum technologies - Industrial Cyber
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains
arXiv:2604.24902v1 Announce Type: new Abstract: Foundation models are routinely fine-tuned for use in particular domains, yet safety assessments are typically conducted only on base models, implicitly assuming that safety properties persist through downstream adaptation. We test this assumption by analyzing the safety behavior of 100 models, including widely deployed fine-tunes in the medical and legal domains as well as controlled adaptations of open foundation models alongside their bases. Across general-purpose and domain-specific safety benchmarks, we find that benign fine-tuning induces large, heterogeneous, and often contradictory changes in measured safety: models frequently improve on some instruments while degrading on others, with substantial disagreement across evaluations. These results show that safety behavior is not stable under ordinary downstream adaptation, raising critical questions about governance and deployment practices centered on base-model evaluations. Without explicit re-evaluation of fine-tuned models in deployment-relevant contexts, such approaches fall short of adequately managing downstream risk, overlooking practical sources of harm -- failures that are especially consequential in high-stakes settings and challenge current accountability paradigms.
AI-enabled medtech introduces risks facilities aren't ready for, cybersecurity report says
AI-enabled devices are introducing new risks that organizations aren’t fully equipped to manage, the cybersecurity report said.
GITEX Future Health Africa 2026: The Ethics of AI in Healthcare Under Global Focus
In a 700-bed public hospital in Kimberley, South Africa, a sole radiologist fell ill at the height of the COVID-19 pandemic.
Berlin-based Patronus raises €11 million for senior-friendly emergency smartwatch and family app
Patronus, a Berlin-based elderly care startup developing a mobile emergency smartwatch and a family app, today announced the closing of its €11 million funding round to expand its leadership in the mobile emergency response segment and develop new products around family, wellbeing, and an AI-powered daily companion. The round was led by 3TS Capital Partners, […]
Utah dismisses medical board call to halt its pioneering AI prescription program | KUER
The Utah Medical Licensing Board has “major concerns” and worries Utahns could potentially be harmed. But the Department of Commerce stood by the pilot program.
New Hyderabad Centre Revolutionizes Neuro-Ophthalmology with AI Diagnostics and Integrated Care
LV Prasad Eye Institute in Hyderabad has opened an Eye and Brain Centre, supported by D. E. Shaw India, to revolutionize neuro-ophthalmology through AI-driven diagnostics.
Generative AI in healthcare to hit $30.4b by 2032 on imaging boom | Asian Business Review
The global generative artificial intelligence in healthcare market is projected to reach $30.4b by 2032, expanding at a compound annual growth rate of 34.9%.
Secure On-Premise Deployment of Open-Weights Large Language Models in Radiology: An Isolation-First Architecture with Prospective Pilot Evaluation
arXiv:2604.22768v1 Announce Type: new Abstract: Purpose: To design, implement, evaluate, and report on the regulatory requirements of a self-hosted LLM infrastructure for radiology adhering to the principle of least privilege, emphasizing technical feasibility, network isolation, and clinical utility. Materials and Methods: The isolation-first, containerized LLM inference stack relies on strict
Policies and Safeguards for the Safe Use of AI
Considerations for creating an AI governance and safeguards framework. Throughout 2025 and early 2026, a team of AI-focused security professionals in the
HTEC Research: Only One in Three Healthcare Organizations is Ready to Scale AI | National Business | joplinglobe.com
PALO ALTO, Calif.--(BUSINESS WIRE)--Apr 28, 2026--
Case-Specific Rubrics for Clinical AI Evaluation: Methodology, Validation, and LLM-Clinician Agreement Across 823 Encounters
An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness
Artificial Intelligence and Machine Learning (AI/ML) models used in clinical settings are increasingly deployed to support clinical decision-making. However, when training data become stale due to changes in demographics, environment, or patient behaviors, model performance can degrade substantially. While updating models with new training data is necessary, such updates may also introduce new risks.
An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing
arXiv:2604.21936v1 Announce Type: new Abstract: Medical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and provenance tracking. Two requirements therefore become central: adaptability, the ability to configure workflows according to dataset-specific conditions and evolving analytical goals; and reproducibility, the guarantee that all transformations and decisions are explicitly recorded and re-executable. Here, we present an artifact-based agent framework that introduces a semantic layer to augment medical image processing. The framework formalizes intermediate and final outputs through an artifact contract, enabling structured interrogation of workflow state and goal-conditioned assembly of configurations from a modular rule library. Execution is delegated to a workflow executor to preserve deterministic computational graph construction and provenance tracking, while the agent operates locally to comply with most privacy constraints. We evaluate the framework on real-world clinical CT and MRI cohorts, demonstrating adaptive configuration synthesis, deterministic reproducibility across repeated executions, and artifact-grounded semantic querying. These results show that adaptive workflow configuration can be achieved without compromising reproducibility in heterogeneous clinical environments.
CognitiveTwin: Robust Multi-Modal Digital Twins for Predicting Cognitive Decline in Alzheimer's Disease
arXiv:2604.22428v1 Announce Type: new Abstract: Predicting individual cognitive decline in Alzheimer's disease (AD) is difficult due to the heterogeneity of disease progression. Reliable clinical tools require not only high accuracy but also fairness across demographics and robustness to missing data. We present CognitiveTwin, a digital twin framework that predicts patient-specific cognitive trajectories. The model integrates multi-modal longitudinal data (cognitive scores, magnetic resonance imaging, positron emission tomography, cerebrospinal fluid biomarkers, and genetics). We use a Transformer-based architecture to fuse these modalities and a Deep Markov Model to capture temporal dynamics. We trained and evaluated the framework using data from 1,666 patients in the TADPOLE (Alzheimer's Disease Neuroimaging Initiative) dataset. We assessed the model for prediction error, demographic fairness, and robustness to missing-not-at-random (MNAR) data patterns. CognitiveTwin provides accurate and personalized predictions of cognitive decline. Its demonstrated fairness across patient demographics and resilience to clinical dropout make it a reliable tool for clinical trial enrichment and personalized care planning.
AI-Driven Automation in Healthcare
The Robotics Intelligence Seminar at Stanford Research Institute spotlights the future of human-robot collaboration, particularly in healthcare and logistics, driven by AI-enabled full-stack autonomy.
AI and X-Ray Breakthroughs
AI and non-contact imaging have successfully revealed the contents of a charred Herculaneum papyrus without damaging it.
Contributor: AI could democratize medicine, but better regulation comes first - Los Angeles Times
Artificial intelligence has the potential to fundamentally change healthcare, and the possibilities are neither radical nor experimental.
Joseph Ologunja MD - NHS Fellowship in Clinical AI | LinkedIn
Black professionals are not waiting for an invitation to the AI table. They are the ones building it. From fixing bias to rethinking how we see data, the future of this technology is being shaped by the people I met today.
As Trump Officials Pushed Health Savings Accounts, RFK Jr. Aide Ran Wellness Company Poised to Benefit
Calley Means remained president of a company that relied on health savings accounts last year as the Trump administration developed policies to expand them.
Therapy company mixes emotional and artificial intelligence to top ranking
Grow Therapy heads The Americas’ Fastest-Growing Companies 2026 list while testing AI’s potential
Health-care AI is here. We don’t know if it actually helps patients.
I don’t need to tell you that AI is everywhere. Or that it is being used, increasingly, in hospitals. Doctors are using AI to help them with notetaking. AI-based tools are trawling through patient records, flagging people who may require certain support or treatments. They are also used to interpret medical exam results and X-rays. A…
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
arXiv:2604.21027v1 Announce Type: new Abstract: Electronic health record (EHR) question answering is often handled by LLM-based pipelines that are costly to deploy and do not explicitly leverage the hierarchical structure of clinical data. Motivated by evidence that medical ontologies and patient trajectories exhibit hyperbolic geometry, we propose HypEHR, a compact Lorentzian model that embeds codes, visits, and questions in hyperbolic space and answers queries via geometry-consistent cross-attention with type-specific pointer heads. HypEHR is pretrained with next-visit diagnosis prediction and hierarchy-aware regularization to align representations with the ICD ontology. On two MIMIC-IV-based EHR-QA benchmarks, HypEHR approaches LLM-based methods while using far fewer parameters. Our code is publicly available at https://github.com/yuyuliu11037/HypEHR.
Agentic AI for Personalized Physiotherapy: A Multi-Agent Framework for Generative Video Training and Real-Time Pose Correction
arXiv:2604.21154v1 Announce Type: new Abstract: At-home physiotherapy compliance remains critically low due to a lack of personalized supervision and dynamic feedback. Existing digital health solutions rely on static, pre-recorded video libraries or generic 3D avatars that fail to account for a patient's specific injury limitations or home environment. In this paper, we propose a novel Multi-Agent System (MAS) architecture that leverages Generative AI and computer vision to close the tele-rehabilitation loop. Our framework consists of four specialized micro-agents: a Clinical Extraction Agent that parses unstructured medical notes into kinematic constraints; a Video Synthesis Agent that utilizes foundational video generation models to create personalized, patient-specific exercise videos; a Vision Processing Agent for real-time pose estimation; and a Diagnostic Feedback Agent that issues corrective instructions. We present the system architecture, detail the prototype pipeline using Large Language Models and MediaPipe, and outline our clinical evaluation plan. This work demonstrates the feasibility of combining generative media with agentic autonomous decision-making to scale personalized patient care safely and effectively.
InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language
arXiv:2604.21061v1 Announce Type: new Abstract: The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language models to generalize to IVF tasks with limited data, enabling the prediction of natural language descriptions of embryo morphology and development. This approach may facilitate the use of large language models to retrieve information and scientific evidence from relevant publications and guidelines, and has implications for few-shot adaptation to multiple downstream tasks in IVF.
Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation
arXiv:2604.20869v1 Announce Type: new Abstract: Background: More than 80% of U.S. cancer care is delivered in community settings, where survival remains worse than at academic centers. Clinicians must integrate genomics, staging, radiology, pathology, and changing guidelines, creating cognitive burden. We evaluated OncoBrain, an AI clinical reasoning platform for oncology treatment-plan generation, as an early step toward OGI. Methods: OncoBrain combines general-purpose LLMs with a cancer-specific graph retrieval-augmented generation layer, a gold-standard treatment-plan corpus as long-term memory, and a model-agnostic safety layer (CHECK) for hallucination detection and suppression. We evaluated clinician-enriched case summaries across gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic malignancies. Three clinician groups completed structured evaluations of 173 cases using a common 16-item instrument: subspecialist oncologists reviewed 50 cases, physician reviewers 78, and advanced practice providers 45. Results: Ratings were highest for scientific accuracy, evidence support, and safety, with lower but favorable scores for workflow integration and time savings. On a 5-point scale, mean alignment with evidence and guidelines was 4.60, 4.56, and 4.70 across subspecialists, physician reviewers, and advanced practice providers. Mean scores for absence of safety or misinformation concerns were 4.80, 4.40, and 4.60. Workflow integration averaged 4.50, 3.94, and 4.00; perceived time savings averaged 5.00, 3.89, and 3.60. Conclusions: In this multi-specialty vignette-based evaluation, OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise. These findings support the potential of a carefully engineered AI reasoning platform to assist oncology treatment planning and justify prospective real-world evaluation in community settings.
Post Next: Future of Cancer - The Next Frontiers
Technological breakthroughs in artificial intelligence and beyond are transforming cancer research and care for patients around the world. Join Washington Post Live for a conversation with Microsoft Science President Peter Lee about the progress made and the future of cancer.
Automated Detection of Dosing Errors in Clinical Trial Narratives: A Multi-Modal Feature Engineering Approach with LightGBM
arXiv:2604.19759v1 Announce Type: new Abstract: Clinical trials require strict adherence to medication protocols, yet dosing errors remain a persistent challenge affecting patient safety and trial integrity. We present an automated system for detecting dosing errors in unstructured clinical trial narratives using gradient boosting with comprehensive multi-modal feature engineering. Our approach combines 3,451 features spanning traditional NLP (TF-IDF, character n-grams), dense semantic embeddings (all-MiniLM-L6-v2), domain-specific medical patterns, and transformer-based scores (BiomedBERT, DeBERTa-v3), used to train a LightGBM model. Features are extracted from nine complementary text fields (median 5,400 characters per sample) ensuring complete coverage across all 42,112 clinical trial narratives. On the CT-DEB benchmark dataset with severe class imbalance (4.9% positive rate), we achieve 0.8725 test ROC-AUC through 5-fold ensemble averaging (cross-validation: 0.8833 ± 0.0091 AUC). Systematic ablation studies reveal that removing sentence embeddings causes the largest performance degradation (2.39%), demonstrating their critical role despite contributing only 37.07% of total feature importance. Feature efficiency analysis demonstrates that selecting the top 500-1000 features yields optimal performance (0.886-0.887 AUC), outperforming the full 3,451-feature set (0.879 AUC) through effective noise reduction. Our findings highlight the importance of feature selection as a regularization technique and demonstrate that sparse lexical features remain complementary to dense representations for specialized clinical text classification under severe class imbalance.
Health Systems Race to Contain AI Misinformation ‘Domino Effect’ - Newsweek
Marketing experts can't control the AI algorithm. But health care leaders aren't "comfortably ceding" their brands just yet.
On-Demand: CE Marking in Europe - Medical Device Regulations and AI [1/1/2026-9/30/2026] - Alabama Small Business Development Center
Live webinar recorded on 10/1/2024 Please join the Alabama International Trade Center and BSI Group for a webinar: CE Marking in Europe – the State for the Medical Device Regulations in Europe and AI Agenda: The State for the Medical Device Regulations in Europe MDR/IVDR/CE Marking/UKCA QMS ...
Emma the joke-telling robot cracks up the care home: Paula Hornickel’s best photograph
‘The first resident that Emma – a social robot – was introduced to was called Peter. After that, Emma assumed they were all called Peter, which everyone found hilarious. Then she broke down’ One morning in July 2025, I arrived in the small, quiet town of Albershausen in south-west Germany.
Opinion | Utah program to let AI refill prescriptions is not a crazy idea - The Washington Post
It’s easy to dismiss Utah’s latest artificial intelligence experiment as dangerous and dystopian. The state has partnered with a company called Doctronic to empower an “AI doctor,” rather than a human clinician, to refill medication ...
AI Startup Has Helped Reverse Thousands of Denied Health Insurance Claims
Americans rarely fight back when insurers reject treatments their doctors have prescribed. Claimable is working to change that, with a little help from Mark Cuban.
The Godmother of Silicon Valley and her former student want to fix how healthcare gets built
Fail fast, revise, repeat: Esther Wojcicki brings her classroom philosophy to healthcare investing with the launch of Treehub.
Error-free Training for MedMNIST Datasets
arXiv:2604.18916v1 Announce Type: new Abstract: In this paper, we introduce a new concept called Artificial Special Intelligence by which Machine Learning models for the classification problem can be trained error-free, thus acquiring the capability of not making repeated mistakes. The method is applied to 18 MedMNIST biomedical datasets. Except for three datasets, which suffer from the double-labeling problem, all are trained to perfection.
Merck Partners With Google
Merck to partner with Google Cloud on AI initiatives.
AI/ML Scientist – Operational Twinning & Healthcare ...
New Gallup poll finds that low-income Americans are turning to AI for healthcare
A Gallup poll shows that 32% of low-income Americans use AI as a substitute for doctor visits, compared to 14% of the general population.
Why Healthcare AI Still Struggles To Deliver
Why healthcare AI stalls after the pilot stage, where governance becomes the bottleneck, and where CIOs are finally seeing measurable ROI.
Can you rely on AI chatbots for medical advice?
Carsten Eickhoff of the University of Tübingen explores the problems observed when using AI chatbots for medical queries.
WHO: Rapid rise of AI in EU healthcare calls for clear frameworks | ICT&health
The use of AI in European healthcare is growing rapidly, but the conditions for responsible implementation are lagging.
New studies show how often chatbots get health answers wrong - The Washington Post
Two studies put ChatGPT, Gemini and others to the test on questions of health. In one, they got almost half the answers wrong.
How the world regulates AI in health - and why it’s complex | ICT&health
In regions such as Europe, the United States, Australia, and China, AI is mainly governed under existing medical device laws.
Murder, she wrote: Ex-FBI chief wants some ransomware crims charged with homicide
Lawmakers decry CISA cuts: 'We are shooting ourselves in the foot' If a cyberattack leads to a death, that's murder. A former FBI cyber division chief urged the US Justice Department to consider felony homicide charges against ransomware actors when attacks on hospitals lead to patient deaths.…
Dental AI adoption expands in the U.S. as Heartland rolls out DentalXChange, HOOTL raises US$6M+ - Oral Health Group
Artificial intelligence (AI) adoption in dental practice management is accelerating in the United States, with a major dental support organization (DSO) deal and fresh venture financing highlighting growing investment in automation across the revenue cycle. On Tuesday, DentalXChange, a dental revenue cycle management (RCM) technology company, announced a new enterprise agreement with Heartland Dental, the largest DSO in the U.S., to deploy ...
Toward Zero-Egress Psychiatric AI: On-Device LLM Deployment for Privacy-Preserving Mental Health Decision Support
Privacy represents one of the most critical yet underaddressed barriers to AI adoption in mental healthcare -- particularly in high-sensitivity operational environments such as military, correctional, and remote healthcare settings, where the risk of patient data exposure can deter help-seeking behavior entirely. Existing AI-enabled psychiatric decision support systems predominantly rely on cloud-based inference pipelines, requiring sensitive patient data to leave the device and traverse externa...
AI Approach for MRI-only Full-Spine Vertebral Segmentation and 3D Reconstruction in Paediatric Scoliosis
MRI is preferred over CT in paediatric imaging because it avoids ionising radiation, but its use in spine deformity assessment is largely limited by the lack of automated, high-resolution 3D bony reconstruction, which continues to rely on CT. MRI-based 3D reconstruction remains impractical due to manual workflows and the scarcity of labelled full-spine datasets. This study introduces an AI framework that enables fully automated thoracolumbar spine (T1-L5) segmentation and 3D reconstruction from ...
DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI
arXiv:2604.15456v1 Announce Type: new Abstract: Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-grounded scientific discovery by integrating AI agents with multi-hop information retrieval, reasoning, and synthesis. However, most existing systems lack explicit and inspectable criteria for evidence appraisal, creating a risk of compounding errors and making it difficult for researchers and clinicians to assess the reliability of their outputs. In parallel, current benchmarking approaches rarely evaluate performance on complex, real-world medical questions. Here, we introduce DeepER-Med, a Deep Evidence-based Research framework for Medicine with an agentic AI system. DeepER-Med frames deep medical research as an explicit and inspectable workflow of evidence-based generation, consisting of three modules: research planning, agentic collaboration, and evidence synthesis. To support realistic evaluation, we also present DeepER-MedQA, an evidence-grounded dataset comprising 100 expert-level research questions derived from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. Expert manual evaluation demonstrates that DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights. We further demonstrate the practical utility of DeepER-Med through eight real-world clinical cases. Human clinician assessment indicates that DeepER-Med's conclusions align with clinical recommendations in seven cases, highlighting its potential for medical research and decision support.
Palantir's NHS future in doubt as ministers eye contract break
£330M deal leaves service with no ownership of software built to connect trusts to the platform The UK government is considering ending Palantir's involvement in a central NHS data platform after coming under fire from MPs, unions, and campaigners.…
How people use Copilot for Health
arXiv:2604.15331v1 Announce Type: cross Abstract: We analyze over 500,000 de-identified health-related conversations with Microsoft Copilot from January 2026 to characterize what people ask conversational AI about health. We develop a hierarchical intent taxonomy of 12 primary categories using privacy-preserving LLM-based classification validated against expert human annotation, and apply LLM-driven topic-clustering for prevalent themes within each intent. Using this taxonomy, we characterize the intents and topics behind health queries, identify who these queries are about, and analyze how usage varies by device and time of day. Five findings stand out. First, nearly one in five conversations involve personal symptom assessment or condition discussion, and even the dominant general information category (40%) is concentrated on specific treatments and conditions, suggesting that this is a lower bound on personal health intent. Second, one in seven of these personal health queries concern someone other than the user, such as a child, a parent, a partner, suggesting that conversational AI can be a caregiving tool, not just a personal one. Third, personal queries about symptoms and emotional health queries increase markedly in the evening and nighttime hours, when traditional healthcare is most limited. Fourth, usage diverges sharply by device: mobile concentrates on personal health concerns, while desktop is dominated by professional and academic work. Fifth, a substantial share of queries focuses on navigating healthcare systems such as finding providers, and understanding insurance, highlighting friction in the delivery of existing healthcare. These patterns have direct implications for platform-specific design, safety considerations, and the responsible development of health AI.
Is AI actually improving healthcare? | Nature Medicine
A.G. is the Varma Family Chair ... Advanced Research AI Chair funds at the Vector Institute. J.W. is supported by AI & Digital Health Innovation at the University of Michigan, and by the National Heart Lung and Blood Institute of the US National Institutes of Health (grant R01HS027431). ... Correspondence to Anna Goldenberg. The authors declare no competing interests. ... Goldenberg, A., Wiens, J.
HSCC warns AI-driven supply chains are outpacing healthcare cybersecurity defenses and oversight models - Industrial Cyber
HSCC warns AI-driven supply chains are outpacing healthcare cybersecurity defenses, exposing gaps in vendor oversight and risk visibility.
New WHO/Europe report provides first-ever snapshot of AI in health care across European Union Member States
WHO/Europe has released a new report assessing the rapidly evolving use of artificial intelligence (AI) in health care across the 27 European Union (EU) Member States. The first comprehensive review of its kind, the report reveals strong and consistent momentum across EU Member States, with ...
AI in health care: Experts discuss the future of AI practices and policies | AHA News
Jim VandeHei, CEO of Axios; Marc Boom, M.D., AHA board chair and president and CEO of Houston Methodist; Anne Klibanski, M.D., president and CEO of Mass General Brigham; Jonathan Perlin, M.D., president and CEO of Joint Commission; and Ladd Wiley, senior vice president of global corporate affairs, ...
Preparing Healthcare Data for AI: Why Health Systems Must Fix Legacy Systems
Xsolis CTO Zach Evans explains why healthcare AI pilots frequently stall, revealing that less than 20% of enterprise data is ready for AI.
Regulatory Considerations for Artificial Intelligence in Healthcare: A WHO Perspective
The mission of the World Health Organization (WHO), to promote health, keep the world safe and serve the vulnerable, is articulated in its global strategy on digital health 2020–2025. At the heart of...
SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
arXiv:2604.13180v1 Announce Type: new Abstract: Recent advances in agentic AI have enabled increasingly autonomous workflows, but existing systems still face substantial challenges in achieving reliable deployment in real-world scientific research. In this work, we present a safe, lightweight, and user-friendly agentic framework for the autonomous execution of well-defined scientific tasks. The framework combines an isolated execution environment, a three-layer agent loop, and a self-assessing do-until mechanism to ensure safe and reliable operation while effectively leveraging large language models of varying capability levels. By focusing on structured tasks with clearly defined context and stopping criteria, the framework supports end-to-end automation with minimal human intervention, enabling researchers to offload routine workloads and devote more effort to creative activities and open-ended scientific inquiry.
AI in Healthcare: Aid, Not Replace—Clinicians Warn of Risks as Users Seek Speed and Privacy
Physicians stress AI should support, not replace, professional care amid concerns about misinformation and privacy.
How AI Is Being Used To Detect Cancer at The Earliest Stage
Dr. Bea Bakshi, CEO & Co-Founder of C the Signs joins Bloomberg Businessweek to discuss the future of cancer detection and how AI is part of the solution even in the early stages. (Source: Bloomberg)
A longitudinal health agent framework
arXiv:2604.12019v1 Announce Type: new Abstract: Although artificial intelligence (AI) agents are increasingly proposed to support potentially longitudinal health tasks, such as symptom management, behavior change, and patient support, most current implementations fall short of facilitating user intent and fostering accountability. This contrasts with prior work on supporting longitudinal needs, where follow-up, coherent reasoning, and sustained alignment with individuals' goals are critical for both effectiveness and safety. In this paper, we draw on established clinical and personal health informatics frameworks to define what it would mean to orchestrate longitudinal health interactions with AI agents. We propose a multi-layer framework and corresponding agent architecture that operationalizes adaptation, coherence, continuity, and agency across repeated interactions. Through representative use cases, we demonstrate how longitudinal agents can maintain meaningful engagement, adapt to evolving goals, and support safe, personalized decision-making over time. Our findings underscore both the promise and the complexity of designing systems capable of supporting health trajectories beyond isolated interactions, and we offer guidance for future research and development in multi-session, user-centered health AI.
China Launches AI Doctor Platform for Parkinson's
Xuanwu Hospital in Beijing has launched El.kz, China's first AI-powered platform for Parkinson's disease, aimed at automating routine patient inquiries. This initiative is part of a broader strategy to digitize healthcare, leveraging over 20 years of clinical data to alleviate physician workload and better manage chronic conditions in an aging population.
Adoption and Effectiveness of AI-Based Anomaly Detection for Cross Provider Health Data Exchange
arXiv:2604.09630v1 Announce Type: new Abstract: This study investigates the adoption and effectiveness of AI-based anomaly detection in cross-provider electronic health record (EHR) environments. It aims to (1) identify the organisational and digital capabilities required for successful implementation and (2) evaluate the performance and interpretability of lightweight anomaly detection approaches using contextual audit data. A semi-systematic scoping synthesis is conducted to derive a four-pillar readiness framework covering governance, infrastructure/interoperability, workforce, and AI integration, operationalised as a 10-item checklist with measurable indicators. This is complemented by a simulation of cross-provider audit logs incorporating contextual features such as provider mismatch, time of access, days since discharge, session duration, and access frequency. A rule-based approach is benchmarked against Isolation Forest, with SHAP used to explain model behaviour. Results show that rule-based methods achieve high recall but generate higher alert volumes, while Isolation Forest reduces alert burden at the cost of lower sensitivity. SHAP analysis highlights provider mismatch and off-hours access as dominant anomaly drivers. The study proposes a staged deployment strategy combining rules for coverage and machine learning for prioritisation, supported by explainability and continuous monitoring. The findings contribute a practical readiness framework and empirical insights to guide the implementation of AI-based anomaly detection in multi-provider healthcare environments.
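The rule stage of the staged deployment described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `AccessEvent` record, its field names, and every threshold are hypothetical, and only the contextual features come from the abstract. Ranking by the number of triggered rules is a simple stand-in for the learned prioritisation stage, which the abstract assigns to a model such as Isolation Forest.

```python
from dataclasses import dataclass


@dataclass
class AccessEvent:
    # Hypothetical audit-log record; the fields mirror the contextual
    # features listed in the abstract.
    provider_mismatch: bool      # accessing provider differs from treating provider
    hour_of_access: int          # 0-23, local time
    days_since_discharge: int
    session_minutes: float
    accesses_last_24h: int


def rule_flags(event: AccessEvent) -> list[str]:
    """Return the names of the rules an event triggers (all thresholds are assumed)."""
    flags = []
    if event.provider_mismatch:
        flags.append("provider_mismatch")
    if event.hour_of_access < 7 or event.hour_of_access >= 19:
        flags.append("off_hours")
    if event.days_since_discharge > 30:
        flags.append("stale_record")
    if event.session_minutes > 60:
        flags.append("long_session")
    if event.accesses_last_24h > 20:
        flags.append("high_frequency")
    return flags


def prioritise(events: list[AccessEvent]) -> list[tuple[int, list[str]]]:
    """Keep flagged events and order them by number of triggered rules, highest first."""
    flagged = [(i, rule_flags(e)) for i, e in enumerate(events)]
    flagged = [(i, f) for i, f in flagged if f]
    # sorted() is stable, so ties keep their original log order.
    return sorted(flagged, key=lambda item: len(item[1]), reverse=True)
```

Rules of this kind flag anything that crosses a threshold, which is consistent with the high recall and high alert volume the abstract reports for the rule-based approach.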
China Launches AI Doctor Platform for Parkinson's, Streamlining Patient Support
Xuanwu Hospital in Beijing has launched El.kz, China's first AI-powered platform for Parkinson's disease, aimed at automating routine patient inquiries.
Investigating Vaccine Buyer's Remorse: Post-Vaccination Decision Regret in COVID-19 Social Media Using Politically Diverse Human Annotation
arXiv:2604.09626v1 Announce Type: new Abstract: A significant gap exists in datasets regarding post-COVID-19 vaccination experiences, particularly "vaccine buyer's remorse". Understanding the prevalence and nature of vaccine regret, whether based on personal or vicarious experiences, is vital for addressing vaccine hesitancy and refining public health communication. In this paper, we curate a novel dataset from a large YouTube news corpus capturing COVID-19 vaccination experiences, and construct a benchmark subset focused on vaccine regret, annotated by a politically diverse panel to account for the subjective and often politicized nature of the topic. We utilize large language models (LLMs) to identify posts expressing vaccine regret, analyze the reasons behind this regret, and quantify its occurrence in both first and second-person accounts. This paper aims to (1) quantify the prevalence of vaccine regret; (2) identify common reasons for this sentiment; (3) analyze differences between first-person and vicarious experiences; and (4) assess potential biases introduced by different LLMs. We find that while vaccine buyer's remorse appears in only <2% of public discourse, it is disproportionately concentrated in vaccine-skeptic influencer communities and is predominantly expressed through first-person narratives citing adverse health events.
AI chatbots misdiagnose in over 80% of early medical cases, study finds
Top models including OpenAI and DeepSeek make judgments too quickly when patient data is incomplete
AI to predict how bowel cancer patients will respond to new NHS drug | Bowel cancer | The Guardian
PhenMap tool could spare thousands of patients from treatment that would be ineffective for them
7 ways AI is advancing healthcare and wellbeing around the world - Source
AI-powered tools are being used to bring greater efficiency and security to healthcare around the world and increase access to medicines and care.
Healthcare AI Faces Scaling Challenges
A Qventus study reveals that while many health IT leaders deploy AI, only 4% have achieved measurable outcomes due to integration hurdles.
AI Data Governance for Healthcare | Health AI
Data quality, privacy, and governance requirements for clinical AI
India Unveils Futuristic Surgical Tech at SMRSC 2026
India showcased battlefield care and tele-surgery innovations from SS Innovations International at the SMRSC 2026 event.
AI Policy for Reproductive Medicine in California | Health AI
AI compliance requirements for reproductive medicine in California. State-specific regulation, HIPAA, and governance guidance.
Healthcare AI Faces Scaling Challenges
A Qventus study reveals that while 42% of health IT leaders deploy AI across multiple use cases, only 4% have measurable outcomes, highlighting challenges in scaling AI pilots. Experts emphasize the risks of poor technology bets and the necessity of AI integration for competitive advantage, as healthcare systems push for …
Co-design for Trustworthy AI: An Interpretable and Explainable Tool for Type 2 Diabetes Prediction Using Genomic Polygenic Risk Scores
arXiv:2604.08217v1 Announce Type: new Abstract: The polygenic risk scores (PRS) have emerged as an important methodology for quantifying genetic predisposition to complex traits and clinical disease. Significant progress has been made in applying PRS to conditions such as obesity, cancer, and type 2 diabetes (T2DM). Studies have demonstrated that PRS can effectively identify individuals at high risk, thereby enabling early screening, personalized treatment, and targeted interventions for diseases with a genetic predisposition. One current limitation of PRS, however, is the lack of interpretability tools. To address this problem for T2DM, researchers at the Graduate School of Data Science at the Seoul National University introduced eXplainable PRS (XPRS). This visualization tool decomposes PRSs into gene-level and single-nucleotide polymorphism (SNP) contribution scores via Shapley Additive Explanations (SHAP), providing granular insights into the specific genetic factors driving an individual's risk profile. We used a co-design approach to assess XPRS trustworthiness by considering legal, medical, ethical, and technical robustness during early design and potential clinical use. For that, we used Z-inspection, an ethically aligned Trustworthy AI co-design methodology, and piloted the Council of Europe's Human Rights, Democracy, and the Rule of Law Impact Assessment for AI Systems (HUDERIA) (Council of Europe (CAI) 2025). The findings of this use-case comprise a comprehensive set of ethical, legal, and technical lessons learned. These insights, identified by a multidisciplinary team of experts (ethics, legal, human rights, computer science, and medical), serve as a framework for designers to navigate future challenges with this and other AI systems. The findings also provide a useful reference for researchers developing explainability frameworks for PRS in diverse clinical contexts.
IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
arXiv:2604.07709v1 Announce Type: cross Abstract: Ask a frontier model how to taper six milligrams of alprazolam (psychiatrist retired, ten days of pills left, abrupt cessation causes seizures) and it tells her to call the psychiatrist she just explained does not exist. Change one word ("I'm a psychiatrist; a patient presents with...") and the same model, same weights, same inference pass produces a textbook Ashton Manual taper with diazepam equivalence, anticonvulsant coverage, and monitoring thresholds. The knowledge was there; the model withheld it. IatroBench measures this gap. Sixty pre-registered clinical scenarios, six frontier models, 3,600 responses, scored on two axes (commission harm, CH 0-3; omission harm, OH 0-4) through a structured-evaluation pipeline validated against physician scoring (kappa_w = 0.571, within-1 agreement 96%). The central finding is identity-contingent withholding: match the same clinical question in physician vs. layperson framing and all five testable models provide better guidance to the physician (decoupling gap +0.38, p = 0.003; binary hit rates on safety-colliding actions drop 13.1 percentage points in layperson framing, p [...] = 1 (kappa = 0.045); the evaluation apparatus has the same blind spot as the training apparatus. Every scenario targets someone who has already exhausted the standard referrals.
AI Legislative Update: April 10, 2026 — Transparency Coalition. Legislation for Transparency in AI Now.
Every Friday, TCAI brings you the nation’s most comprehensive update of AI-related legislation moving through state legislatures. This week: Therapy chatbot bans are picking up speed. Maine sent a therapy bot ban to the governor, while Missouri is moving on a similar ban via an omnibus health ...
UnitedHealth Just Dropped $3 Billion On AI. Not To Save Your Life. To Deny Your Claim Faster.
The company now employs 22,000 software engineers. More than 80 percent are building AI tools. Not to find cures. Not to coordinate care.