Healthcare AI · June 30, 2026 · Curely AI Research · 10 min read
We Refused to Build One Big Healthcare AI. Here Is What We Built Instead
The industry is racing to build a single AI that does everything in healthcare. Curely AI is doing the opposite, on purpose. Here is the agent ecosystem we built, what the evidence actually supports, and why the clinician stays in control.

The healthcare AI industry is racing to build a single assistant that does everything. Curely AI is building the opposite, and the choice is deliberate.
A general-purpose assistant is easy to demo and hard to trust. It hides the line between the tasks where AI is genuinely reliable today and the tasks where the evidence is still thin, and in medicine that hidden line is where patients get hurt. So we built an ecosystem of specialized agents instead. Each one owns a single clinical or operational problem. Each one is given exactly as much autonomy as the evidence justifies, and not one degree more. And in every case, the human professional makes the decision that matters.
The research has arrived at the same place by a different road. A 2026 scoping review of AI agents in healthcare found the field moving decisively toward multi-agent systems with explicit planning and self-correction, while warning that technical innovation is outpacing the translational research and governance needed to deploy it safely. That gap is the whole problem. Our response to it is narrow, accountable agents, and a refusal to overstate what any of them can do.
Here is the ecosystem, with the evidence behind each agent graded plainly. Strong evidence means randomized trials, systematic reviews, or regulatory guidance. Weaker signals mean single studies, early deployments, or vendor claims. We tell you which is which, because that honesty is the product.
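The grading rule above can be sketched in a few lines. This is a minimal illustration, not Curely's actual grading code: the `Grade` enum, the source-type labels, and `grade_evidence` are hypothetical names, and the sketch models only the two poles defined above, not the intermediate grades such as "moderate" or "mixed" that appear later.

```python
from enum import Enum


class Grade(Enum):
    STRONG = "strong"
    WEAK = "weak"


# Source types taken from the paragraph above; the string labels are illustrative.
STRONG_SOURCES = {"randomized_trial", "systematic_review", "regulatory_guidance"}
WEAK_SOURCES = {"single_study", "early_deployment", "vendor_claim"}


def grade_evidence(source_types):
    """Grade a claim from the kinds of sources behind it.

    Assumption: one strong source is enough for a STRONG grade, and a claim
    with no sources at all is treated as WEAK.
    """
    kinds = set(source_types)
    unknown = kinds - STRONG_SOURCES - WEAK_SOURCES
    if unknown:
        raise ValueError(f"unrecognized source types: {sorted(unknown)}")
    if kinds & STRONG_SOURCES:
        return Grade.STRONG
    return Grade.WEAK
```

Under these assumptions, a claim backed only by a vendor figure grades as weak, and adding a randomized trial lifts it to strong.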
1. Clinical Documentation Agent
Start here, because this is where the evidence is strongest and the pain is sharpest. Physicians spend roughly two hours on paperwork for every hour of patient care, and that burden is now one of the leading drivers of burnout.
The Curely Clinical Documentation Agent generates consultation notes, discharge summaries, referral letters, and follow-up instructions, and structures the record so it is consistent and complete. The clinician reviews and signs. Nothing reaches the chart unreviewed.
Evidence grade, strong. This is one of the few corners of healthcare AI now backed by randomized trials. A pragmatic randomized trial of 238 outpatient physicians across 14 specialties, published in NEJM AI in late 2025, found that ambient AI scribes cut documentation time and produced modest but real improvements in burnout, work exhaustion, and task load. A separate randomized trial at UW Health reported about 30 minutes of documentation time saved per provider per day, alongside a clinically meaningful drop in burnout.
The same evidence carries the warning. The NEJM AI trial found that occasional inaccuracies in the generated notes required ongoing physician vigilance, and that two competing vendors performed almost identically, which means the tool is not the differentiator; the review process is. The eye-catching figures of 60 to 90 minutes saved per day come from vendor and industry analyses, not controlled trials, and should be read as ceilings, not norms. We built the agent around the clinician's review step rather than around the headline number, because that step is exactly what keeps a fast note from becoming a wrong one.
2. Patient Intelligence Agent
A clinician who walks into a consultation without the full picture is not making a decision; they are guessing efficiently. Where records are scattered across facilities, which is the norm across much of Africa, that is the default condition.
The Patient Intelligence Agent closes that gap. It continuously analyzes medical history, previous visits, medications, allergies, laboratory results, and vital signs, and hands the clinician an organized clinical summary before the encounter begins, not buried somewhere inside it.
Evidence grade, moderate. Cutting the time clinicians spend hunting for information is among the most consistently demonstrated benefits of clinical AI, and large language models are demonstrably capable at parsing and summarizing records. The risk is specific. A clean summary can be more dangerous than a messy record, because it invites trust it has not yet earned, and summarization can quietly drop the one detail that mattered. So the agent's summary always links back to the source. It is a faster way into the record, never a substitute for it.
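One way to make "always links back to the source" a structural guarantee rather than a convention is to refuse any summary line that carries no source reference. The sketch below assumes a simple record model; `SourceRef`, `SummaryLine`, and `build_summary` are hypothetical names, not part of any Curely API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    record_id: str  # identifier of the underlying record (hypothetical format)
    field: str      # which part of that record the statement came from


@dataclass(frozen=True)
class SummaryLine:
    text: str
    sources: tuple  # tuple of SourceRef; must not be empty


def build_summary(lines):
    """Return the summary only if every line links back to a source."""
    unlinked = [line.text for line in lines if not line.sources]
    if unlinked:
        raise ValueError(f"summary lines without a source link: {unlinked}")
    return list(lines)
```

The design choice is that a missing link is an error, not a warning, so an unsourced statement cannot reach the clinician.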
3. Clinical Decision Support Agent
This is the agent that can do the most good and the most harm, and it is where our entire design philosophy gets tested.
The Clinical Decision Support Agent highlights abnormal laboratory results, identifies potential drug interactions, suggests evidence-based guidelines, flags clinical risks, and supports differential diagnosis. It does not decide. The clinician owns the final call. That is not fine print. It is the constraint the agent is built around.
Evidence grade, mixed, and the mix is the lesson. Decision support has decades of history and a notorious failure mode. Traditional rule-based alerting, including drug-interaction warnings, floods clinicians with low-relevance noise, and override rates have been reported as high as 96 percent. That is alert fatigue, and it is its own patient safety hazard, because a clinician trained to dismiss alerts will eventually dismiss the one that would have saved someone. A 2025 systematic review in JAMIA found the evidence linking drug-interaction alerts to better patient outcomes remains limited, partly because the underlying knowledge of which interactions truly matter is itself often low quality.
Newer AI methods promise to fix this by making alerts specific and patient-aware, and early studies suggest they cut inappropriate alerts. But a scoping review of those studies found that not one reported external validation or transparency of model development. The promise is real. The proof is not yet in.
We draw three hard rules from this. Precision beats volume, always, because an agent that interrupts constantly is worse than no agent at all. Until external validation exists, the agent supports the clinician's reasoning and never issues instructions. And the responsibility line does not move. The agent surfaces, the clinician decides.
4. Hospital Operations Agent
A hospital makes thousands of operational decisions a day, most of them blind to the rest of the building. The result is the queue that forms while a bed sits empty two floors up.
The Hospital Operations Agent works on patient flow, queue management, scheduling, bed allocation, staff workload, and resource utilization. The objective is efficiency without new administrative weight, which is the exact trap most operational software walks straight into.
Evidence grade, early and promising. Operations sit in a friendlier risk class than clinical care, because a scheduling error is recoverable in a way a missed diagnosis is not, and the lighter regulatory bar lets deployment move faster. The evidence here is mostly single-site deployments rather than controlled trials, so the right expectation is measurable local gains, not a guaranteed systemwide leap. In a crowded, under-resourced hospital, though, even a modest improvement in flow shows up directly as shorter waits and less crowding. We hold this agent to the specific bottlenecks of the facility it serves, not to a generic benchmark that flatters everyone.
5. Revenue Cycle Agent
In many African facilities, revenue lost to bad coding is not an accounting footnote. It is the reason staff go unpaid and shelves go empty.
The Revenue Cycle Agent verifies billing completeness, reviews service documentation, identifies missing billable services, and improves claim readiness. It makes sure the work that was done is the work that gets captured, and paid.
Evidence grade, moderate, mostly commercial. Revenue cycle automation is one of the most active areas of healthcare AI deployment, and improved coding accuracy and claim readiness are well-established gains. Much of the supporting evidence is vendor-reported return on investment rather than independent study, so read the numbers with care. But the mechanism is sound and, unusually, self-verifying. Better-documented, more complete, more accurate claims get paid more reliably, and the agent's output can be checked against what was actually reimbursed. A finance team can confirm this agent's value directly, rather than take it on faith. That is rare in this field, and it is the point.
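The check a finance team could run is simple arithmetic: compare what was billed with what was actually reimbursed. The sketch below is illustrative only; the function names and the dictionary layout (claim ID mapped to an amount in minor currency units) are assumptions, not a description of the agent's internals.

```python
def capture_rate(claims, payments):
    """Fraction of billed value that was actually reimbursed.

    claims:   {claim_id: billed amount}
    payments: {claim_id: paid amount}; claims with no payment are absent
    """
    billed = sum(claims.values())
    if billed == 0:
        raise ValueError("no billed amount to compare against")
    paid = sum(payments.get(claim_id, 0) for claim_id in claims)
    return paid / billed


def unpaid_claims(claims, payments):
    """Claim IDs that received no reimbursement at all, sorted for review."""
    return sorted(cid for cid in claims if payments.get(cid, 0) == 0)
```

Tracking `capture_rate` before and after deployment gives a direct, independent read on whether the agent is earning its keep.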
6. Laboratory Intelligence Agent
The critical result is useless if it sits unread in a queue behind two hundred normal ones. That delay, not the test itself, is where labs lose patients.
The Laboratory Intelligence Agent detects abnormal trends, prioritizes critical results, assists workflow, notifies clinicians of urgent findings, and monitors quality indicators. The function that matters most is the fastest one, putting a critical result in front of the right clinician without delay.
Evidence grade, moderate. Abnormality detection and result prioritization are mature, well-understood machine learning tasks, and automated notification of significant findings already runs in advanced surveillance systems. The hard part was never detection. It is the closed loop, making sure an urgent finding actually reaches someone who acts on it, and that the alerting does not itself decay into noise. We engineer the escalation logic around that loop, because a critical-result alert no one sees is identical to no alert at all.
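The closed loop can be sketched as an alert that keeps moving up an escalation chain until someone acknowledges it. This is a minimal sketch under stated assumptions: `CriticalAlert`, `escalate`, and the recipient labels are hypothetical, and the acknowledgement timeout that would trigger each call to `escalate` is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CriticalAlert:
    result_id: str
    escalation_chain: list  # ordered recipients, first responder first
    notified: list = field(default_factory=list)
    acknowledged_by: Optional[str] = None

    def acknowledge(self, who):
        """Close the loop; only someone who was notified can acknowledge."""
        if who not in self.notified:
            raise ValueError(f"{who} was never notified of {self.result_id}")
        self.acknowledged_by = who


def escalate(alert):
    """Notify the next recipient in the chain and return their name.

    Returns None if the alert is already acknowledged or the chain is
    exhausted; a real system would need a fallback for the latter case.
    """
    if alert.acknowledged_by is not None:
        return None
    if len(alert.notified) >= len(alert.escalation_chain):
        return None
    recipient = alert.escalation_chain[len(alert.notified)]
    alert.notified.append(recipient)
    return recipient
```

The point of the structure is that an alert is never considered delivered just because it was sent; it stays open until `acknowledged_by` is set.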
7. Pharmacy Intelligence Agent
Medication safety is among the highest-stakes problems in any health system, and in much of Africa it runs headlong into a second problem, the stockout.
The Pharmacy Intelligence Agent supports pharmacists by reviewing prescriptions, detecting interactions, identifying duplicate therapies, monitoring availability, and forecasting inventory.
Evidence grade, mixed for safety, strong and practical for supply. The interaction-detection function inherits the same alert-fatigue caution as decision support, and a 2025 comparative evaluation found that large language models and conventional interaction databases each leave meaningful gaps when tested against real patient data, which is precisely why the pharmacist's judgment stays central. The supply-side functions stand on firmer and more useful ground, and they answer a problem that is acute across the continent, where essential-medicine stockouts are routine and forecasting is often done by hand. An agent that sees a stockout coming before it happens delivers value that is easy to verify and hard to overstate.
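As a rough illustration of seeing a stockout coming, the simplest forecast is days of cover: stock on hand divided by average daily consumption, compared against the resupply lead time. This is a deliberately naive moving-average sketch, not the agent's forecasting method; the function names and parameters are assumptions.

```python
def days_of_cover(stock_on_hand, daily_usage):
    """Days the current stock lasts at the recent average consumption rate."""
    if not daily_usage:
        raise ValueError("need at least one day of usage data")
    average = sum(daily_usage) / len(daily_usage)
    if average == 0:
        return float("inf")  # nothing is being dispensed
    return stock_on_hand / average


def stockout_risk(stock_on_hand, daily_usage, lead_time_days):
    """True if stock is expected to run out before a new order could arrive."""
    return days_of_cover(stock_on_hand, daily_usage) < lead_time_days
```

For example, 120 units on hand with recent usage of 10, 14, and 12 units per day gives 10 days of cover, which is a stockout risk if resupply takes 14 days and not if it takes 7.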
8. Public Health Intelligence Agent
Aggregated, anonymized data reveals patterns no single facility can see: an outbreak forming, coverage collapsing, a region tipping toward crisis.
The Public Health Intelligence Agent identifies disease outbreaks, regional health trends, resource shortages, vaccination coverage, and population-level risk, to put real signal in front of the people who allocate resources.
Evidence grade, strong for detection, weak for forecasting, and confusing the two is how public health AI fails. AI is genuinely good at detecting anomalies and surfacing early signals. A 2025 study of an AI tool for hospital outbreak investigation found 37 transmission routes that manual review had missed, reaching a sensitivity above 90 percent after accounting for downstream exposures. That is a real, demonstrated strength.
Forecasting is another story. A systematic review of 67 AI-based early-warning studies found no shared performance benchmark and persistent concerns about data quality, bias, and deployment readiness. AI also breaks down on genuinely novel outbreaks with no historical data to learn from, and it performs worst in exactly the data-poor rural and low-income settings where early warning would save the most lives. The credible position, and ours, is blunt. AI should sharpen traditional public health surveillance, not replace it. The agent flags signals for human epidemiologists to verify. It does not issue forecasts on its own.
The spine that holds the ecosystem together
Three principles run through all eight agents, and they are the reason this is an architecture rather than a feature list.
Autonomy is rationed to evidence. Documentation and operations move faster because errors there are recoverable. Decision support and forecasting are deliberately held back, because the cost of misplaced confidence there is counted in patients. No agent is trusted beyond what the data has proven.
The human stays accountable. Across every agent, the professional decides and the agent prepares. This is not a positioning statement. It is the only way to build a system clinicians will trust and regulators will accept.
Context is not an afterthought. Curely is built in Kampala for the realities of African and other lower-resource health systems: fragmented records, scarce specialists, frequent stockouts, long queues. The literature is unambiguous that AI built for well-resourced systems transfers poorly to these settings, and that infrastructure, governance, and local validation, not raw algorithmic power, are the binding constraints. An ecosystem of specialized agents, validated where it is actually deployed, is an honest answer to that reality. A single assistant promising to do everything is not.
The intelligence layer for healthcare will not arrive as one system that knows everything. It will be assembled from many narrow systems, each doing one thing reliably and each knowing the edge of what it knows. That is what we are building. And grading our own evidence honestly, in public, is not a caveat on the work. It is the work.

