Healthcare AI · June 25, 2026 · Kaddu Livingstone · 6 min read

Agentic AI in Healthcare, and Why the Best Systems Do Less Than They Could

Agentic AI is past the pilot stage in healthcare, but the systems that survive real clinical environments are the most constrained, not the most autonomous. We explain the autonomy paradox and what separates a deployable agent from theater.

Agentic AI has moved past the pilot stage in healthcare, but the systems that survive contact with real clinical environments are not the most autonomous ones. They are the most constrained. In most industries, value scales with how much an agent can do on its own. In healthcare, autonomy is less a feature to maximize than a liability to be engineered down to a safe level. The buyer question that matters is not "how much can it do without us," but "can you prove what it did, and can you stop it."

That reframing matters now because the money is already moving. In Deloitte's 2026 US Health Care Outlook Survey of 120 C-suite executives, more than 80% expect generative and agentic AI together to deliver moderate-to-significant value across clinical, business, and back-office functions this year (Deloitte, 2026). A related Deloitte study of healthcare technology leaders found 61% already building or budgeting for agentic initiatives, with 85% planning to increase investment over the next two to three years (Deloitte, 2026). The strategic question for most organizations is no longer whether to invest. It is what to deploy first, and how to keep it safe at scale.

What "agentic" actually changes

An AI agent is not a chatbot or a copilot. A copilot drafts a note faster when a clinician asks it to. An agent perceives data across systems, decides on a course of action, executes a multi-step workflow, and adapts when conditions change, often without a human directing each step. A scheduling chatbot answers a question about clinic hours. A scheduling agent checks live provider availability, applies insurance eligibility rules, books the slot in the electronic health record, sends a confirmation, and sets a reminder.

The difference is consequential because it changes what can go wrong. A copilot that drafts a flawed summary produces a document a human still reviews. An agent that misreads a payer rule can take an action with downstream effects before anyone looks. Capability and exposure rise together.

Where it is genuinely working

The strongest current evidence sits in administrative and documentation workflows, not in autonomous clinical decision-making.

Ambient documentation is the clearest win. Tools that draft clinical notes from the patient encounter are saving clinicians one to two hours a day in reported deployments, and the US Department of Veterans Affairs is expanding AI scribe technology across its medical centers (ACL Digital, 2026). Documentation also shows up in clinical-adjacent work that stays under human review. Deloitte points to Northwestern Medicine's in-house system, which drafts radiology reports in real time that are roughly 95% complete and automatically flags life-threatening findings, with no measured impact on clinical accuracy (Deloitte, 2026). Prior authorization, claims follow-up, and revenue cycle work are the other heavily targeted areas, driven by high volume, rule-bound logic, and the cost of manual processing.

A note on grading the evidence is warranted. Much of the most striking ROI circulating in the market, the appointment platforms claiming several hundred percent returns or the prior-authorization tools citing 8x payback, comes from vendor case studies rather than peer-reviewed evaluation. Treat those as directional marketing, not as evidence. The credible signal is narrower and more honest: executive expectation is high and well-documented, documentation tools show consistent time savings, and at-scale clinical outcome data remains thin. Deloitte's own numbers make the point. In the same survey where more than 80% expect value, only about 30% of health systems run generative AI at scale in selected areas, just 2% report enterprise-wide deployment, and 49% are still experimenting (Deloitte, 2026). The center of gravity has shifted. The work has not finished.

The autonomy paradox

Here is the part the hype cycle skips. In healthcare, more autonomy does not reliably mean more value, and past a certain point it means more risk for the same return.

Three structural problems explain why. The first is what some practitioners call the hidden middle. An agent may take twenty or thirty reasoning steps to reach a conclusion, and a failure buried in the intermediate chain is hard to detect from the final output alone (ODSC, 2026). The second is the illusion of transparency. A model's stated chain of thought is often a plausible-sounding explanation rather than a faithful account of how the decision was actually made, which means you cannot fully trust an agent's own description of its reasoning. The third is agent sprawl. As organizations deploy many agents with system access, ownership and controls get inconsistent, and the agents themselves can become a new category of unmanaged identity and security risk (Imprivata, 2026).

The wider trust signal points the same direction. One analysis found that confidence in fully autonomous AI fell from 43% to 27% over 2025, while fewer than 10% of organizations reported robust governance frameworks (AURA, 2025). We read those figures cautiously, since they aggregate across sectors, but the message is consistent with what serious healthcare buyers are saying out loud. The reasonable posture, in the words of one practitioner, is to treat clinical agentic AI as being in its "Year 1," comparable to the early days of cybersecurity, where full autonomy in high-stakes settings is not yet warranted.

This is why constrained autonomy wins. The best systems do less than they technically could because the marginal task they give up is rarely worth the oversight it would cost.

What separates a real system from theater

The 2026 market is crowded with what buyers at the J.P. Morgan Healthcare Conference described as generic models repackaged as clinical AI (ACL Digital, 2026). The era of rewarding any pitch deck with "AI" on it is closing. The distinguishing features are unglamorous and verifiable.

Auditability. Every action an agent takes should leave a complete, immutable trail. Under HIPAA, demonstrable access and audit controls are a baseline for any system touching protected health information, not an add-on.
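One way to make an action trail tamper-evident is to chain entries by hash, so that changing any earlier record breaks every hash after it. The sketch below is a minimal illustration under our own assumptions: the `AuditLog` class, its field names, and the example agent ID are hypothetical, and an in-memory list is not immutable storage. A production system would also record timestamps and write to append-only or write-once storage.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry commits to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, agent_id: str, action: str, detail: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {
            "seq": len(self.entries),
            "agent_id": agent_id,
            "action": action,
            "detail": detail,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; False if any entry was altered."""
        prev_hash = GENESIS_HASH
        for seq, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if (entry["seq"] != seq
                    or entry["prev_hash"] != prev_hash
                    or entry["hash"] != _digest(body)):
                return False
            prev_hash = entry["hash"]
        return True


# Usage: record two actions, then confirm the chain is intact.
audit = AuditLog()
audit.append("scheduler-01", "book_slot", {"slot": "2026-07-01T09:00"})
audit.append("scheduler-01", "send_confirmation", {"channel": "sms"})
chain_ok = audit.verify()
```

The chain only detects tampering. Preventing it still depends on where the log is stored and who can write to it.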

Scoped permissions. An agent should hold the narrowest system access its task requires, with a clear owner, rather than broad standing credentials. The Health Information Sharing and Analysis Center has already flagged weak governance and credential misuse as live risks in agentic deployments (ISMG, 2026).
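Narrow access with a named owner can be enforced in code rather than left to policy documents. The following is a rough sketch, not a reference to any real identity product: `AgentGrant`, `require_scope`, the scope strings, and the owner label are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrant:
    """Hypothetical record of what one agent may do and who answers for it."""
    agent_id: str
    owner: str            # the accountable human or team
    scopes: frozenset     # the narrowest set of permissions the task needs


class ScopeError(PermissionError):
    """Raised when an agent attempts an action outside its grant."""


def require_scope(grant: AgentGrant, scope: str) -> None:
    if scope not in grant.scopes:
        raise ScopeError(
            f"{grant.agent_id} (owner: {grant.owner}) lacks scope {scope!r}"
        )


def book_slot(grant: AgentGrant, slot_id: str) -> str:
    # Every privileged operation checks the grant before doing any work.
    require_scope(grant, "schedule:write")
    return f"booked {slot_id}"


# Usage: a scheduling agent can book, but holds no chart access at all.
scheduler = AgentGrant(
    agent_id="scheduler-01",
    owner="access-operations",
    scopes=frozenset({"schedule:read", "schedule:write"}),
)
confirmation = book_slot(scheduler, "slot-42")
```

Because the grant is frozen and carries an owner, a denied call names both the agent and the party responsible for it.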

The right human in the loop. For high-stakes clinical decisions, the agent recommends and a human approves before action, which is the human-in-the-loop model that regulatory frameworks such as the EU AI Act effectively require for high-risk systems (EW Solutions, 2026). For lower-risk, high-volume administrative work, a human-on-the-loop model, where the agent acts and humans review logs and exceptions, can be appropriate. The skill is matching the oversight model to the stakes, not applying one setting everywhere.
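Matching the oversight model to the stakes can be expressed as a small routing rule. This sketch assumes a two-level split between clinical and administrative work, which is a simplification. The `Stakes` and `Oversight` enums and the `run` function are hypothetical names, and the approval callback stands in for whatever review step an organization actually uses.

```python
from enum import Enum
from typing import Callable, Optional


class Stakes(Enum):
    ADMINISTRATIVE = "administrative"
    CLINICAL = "clinical"


class Oversight(Enum):
    HUMAN_IN_THE_LOOP = "human approves before the agent acts"
    HUMAN_ON_THE_LOOP = "agent acts, humans review logs and exceptions"


def oversight_for(stakes: Stakes) -> Oversight:
    """Assumed policy: clinical work always requires prior approval."""
    if stakes is Stakes.CLINICAL:
        return Oversight.HUMAN_IN_THE_LOOP
    return Oversight.HUMAN_ON_THE_LOOP


def run(
    stakes: Stakes,
    action: Callable[[], str],
    approve: Optional[Callable[[], bool]] = None,
    review_log: Optional[list] = None,
) -> tuple:
    if oversight_for(stakes) is Oversight.HUMAN_IN_THE_LOOP:
        # No approver, or a refusal, means the action never executes.
        if approve is None or not approve():
            return ("blocked", None)
        return ("executed", action())
    # On-the-loop: act first, then leave a record for human review.
    result = action()
    if review_log is not None:
        review_log.append(result)
    return ("executed", result)


# Usage: an administrative task runs and is queued for later review.
reviews = []
status, outcome = run(Stakes.ADMINISTRATIVE, lambda: "reminder sent", review_log=reviews)
```

The design choice worth noting is the default: when no approver is supplied for a clinical action, the sketch blocks rather than proceeds.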

Lifecycle governance. Agents need to be registered, authorized, monitored, updated, and retired under a defined control plane, with telemetry that adjusts oversight to an agent's autonomy and clinical criticality (governance research, 2026). Static, once-a-quarter committee review does not scale to fleets of agents.
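A control plane of this kind can be approximated as a registry that only permits defined lifecycle transitions. The stage names below follow the list above, but the transition table is our own assumption (for example, that an updated agent must be re-authorized before it runs again), and `AgentRegistry` is a hypothetical class, not an existing tool. Telemetry-driven oversight is out of scope for this sketch.

```python
from enum import Enum


class Stage(Enum):
    REGISTERED = "registered"
    AUTHORIZED = "authorized"
    MONITORED = "monitored"   # running in production under observation
    UPDATED = "updated"
    RETIRED = "retired"


# Assumed transition rules; a real policy would be set by the organization.
ALLOWED = {
    Stage.REGISTERED: {Stage.AUTHORIZED, Stage.RETIRED},
    Stage.AUTHORIZED: {Stage.MONITORED, Stage.RETIRED},
    Stage.MONITORED: {Stage.UPDATED, Stage.RETIRED},
    Stage.UPDATED: {Stage.AUTHORIZED, Stage.RETIRED},  # re-authorize after change
    Stage.RETIRED: set(),                              # terminal
}


class AgentRegistry:
    """Hypothetical registry tracking each agent's lifecycle stage."""

    def __init__(self) -> None:
        self._stages = {}

    def register(self, agent_id: str) -> None:
        if agent_id in self._stages:
            raise ValueError(f"{agent_id} is already registered")
        self._stages[agent_id] = Stage.REGISTERED

    def transition(self, agent_id: str, new_stage: Stage) -> None:
        current = self._stages[agent_id]
        if new_stage not in ALLOWED[current]:
            raise ValueError(
                f"{agent_id}: {current.value} -> {new_stage.value} is not allowed"
            )
        self._stages[agent_id] = new_stage

    def may_act(self, agent_id: str) -> bool:
        # Unregistered, unauthorized, updated, and retired agents cannot act.
        return self._stages.get(agent_id) is Stage.MONITORED


# Usage: an agent must be registered and authorized before it may act.
registry = AgentRegistry()
registry.register("scheduler-01")
registry.transition("scheduler-01", Stage.AUTHORIZED)
registry.transition("scheduler-01", Stage.MONITORED)
active = registry.may_act("scheduler-01")
```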

Honest evidence. Outputs should be source-linked and outcomes measurable. Vendors who can show what an agent did, why, and to what effect are operating differently from those selling a confidence number.

The takeaway

The organizations that win with agentic AI in healthcare will not be the ones that automate the most. They will be the ones that treat governance as part of the product rather than paperwork bolted on afterward. That is the discipline this technology rewards. Build agents that are auditable, scoped, supervised at the right point, and honest about their evidence, and the autonomy you do grant becomes an asset you can defend. Maximize autonomy for its own sake, and you have built a liability that looks impressive until the first action no one can explain.

At Curely AI, that is the standard we hold our own systems to, because in healthcare the trustworthy system and the valuable system are the same system.