Building Trust in AI Adoption for Healthcare

Artificial Intelligence (AI) in healthcare is often expected to be perfect because the stakes are so high. We ask: Will AI make zero mistakes? Can it outperform doctors in every scenario? These questions, while understandable, overlook a more rational starting point: the Human Baseline.

Every healthcare process, from triage and diagnostics to discharge planning and billing, is built on human judgment. These processes, while invaluable, are also imperfect, variable, and frequently unmeasured. Acknowledging this baseline lets us evaluate AI fairly: not against an impossible ideal, but against the real-world performance of human clinicians and staff. In every case, the expectation should not be that AI automates clinicians' work, treatments, or interventions, but that it serves as an assistant for decision support.

This perspective reframes AI adoption from hype or fear into an evidence-driven, trust-building approach.

Why the Human Baseline Matters

1. Human error is pervasive but rarely measured.

  • Radiologists miss lung nodules on chest X-rays in 19–30% of cases, depending on the study and nodule type.
  • Colonoscopy misses adenomas in up to 25–30% of patients.
  • Emergency department triage has misclassification rates as high as 10–20%.
  • In mammography screening (the MASAI trial), AI-supported reading reduced workload by 44.3% while maintaining cancer detection at levels comparable to standard double reading (a quick arithmetic sketch follows this list).
  • Trust Implication: AI doesn’t need to outperform humans outright. By matching human detection while reducing workload, it adds measurable value.
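
To see what the workload figure means in absolute terms, here is a minimal arithmetic sketch. The annual screening volume is a made-up number for illustration; only the 44.3% reduction comes from the trial report.

```python
# Illustrative arithmetic only: the program volume is an assumption,
# not trial data; the 44.3% figure is the reported relative reduction
# in screen-reading workload versus standard double reading.
annual_screens = 100_000                    # hypothetical screening volume
reads_double_reading = 2 * annual_screens   # double reading: two reads per screen
workload_reduction = 0.443                  # reported relative reduction

reads_saved = reads_double_reading * workload_reduction
print(f"Screen reads under double reading: {reads_double_reading:,}")
print(f"Screen reads saved with AI support: {reads_saved:,.0f}")
```

At this volume, the same relative reduction translates into roughly 88,600 fewer screen reads per year with no reported loss in cancer detection.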

2. Colonoscopy: Miss Rates Matter

  • Human Baseline: Adenoma detection rate (ADR) in standard colonoscopy is roughly 25–30%, with substantial miss rates.
  • AI Result: A 2024 meta-analysis of 28 RCTs (23,861 participants) found that AI-assisted colonoscopy increased ADR by 20% and cut adenoma miss rates by 55%, both relative changes (see the arithmetic sketch after this list).
  • Trade-off: Slightly longer procedure times.
  • Trust Implication: Used again as decision support, AI catches what humans miss and delivers tangible improvement without needing to be perfect.
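
Relative improvements are easy to misread as absolute ones, so the sketch below converts the reported figures into absolute rates. The baseline values are assumptions chosen from the ranges quoted above, not trial data.

```python
# Convert relative improvements into absolute rates. Baseline values are
# illustrative assumptions, not results from the meta-analysis itself.
baseline_adr = 0.27          # assumed baseline adenoma detection rate
baseline_miss = 0.26         # assumed baseline adenoma miss rate

adr_with_ai = baseline_adr * 1.20           # 20% relative increase
miss_with_ai = baseline_miss * (1 - 0.55)   # 55% relative reduction

print(f"ADR:       {baseline_adr:.1%} -> {adr_with_ai:.1%}")
print(f"Miss rate: {baseline_miss:.1%} -> {miss_with_ai:.1%}")
```

Under these assumptions, ADR rises from 27.0% to 32.4% and the miss rate falls from 26.0% to 11.7%: modest-sounding relative numbers become a meaningful absolute difference per patient.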

3. Sepsis: Time is Life

  • Human Baseline: Usual care detects sepsis late; bundle compliance is inconsistent. Mortality remains high.
  • AI Result: The COMPOSER model, deployed in two emergency departments, increased bundle compliance and was associated with a 17% relative reduction in mortality.
  • Caution: Independent validation of other tools (e.g., the Epic Sepsis Model) showed lower accuracy than vendor claims, underscoring the need for local benchmarking against human baselines (a minimal validation sketch follows this list).
  • Trust Implication: AI can save lives — but only when validated transparently against real-world baselines.
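
Local benchmarking need not be elaborate. The sketch below shows a minimal validation pass, assuming a hypothetical chart-reviewed local cohort with adjudicated sepsis labels, the vendor model's risk scores, and a record of whether usual care flagged the patient; every name here is invented for illustration.

```python
# Minimal local-validation sketch. All inputs are hypothetical:
# 0/1 adjudicated labels, model risk scores, and usual-care flags.
import numpy as np
from sklearn.metrics import roc_auc_score

def sens_spec(y_true, y_flag):
    """Sensitivity and specificity for binary arrays."""
    y_true, y_flag = np.asarray(y_true), np.asarray(y_flag)
    tp = np.sum((y_true == 1) & (y_flag == 1))
    fn = np.sum((y_true == 1) & (y_flag == 0))
    tn = np.sum((y_true == 0) & (y_flag == 0))
    fp = np.sum((y_true == 0) & (y_flag == 1))
    return tp / (tp + fn), tn / (tn + fp)

def local_benchmark(labels, ai_scores, clinician_flags, threshold=0.5):
    """Compare the AI alert with the human baseline on the same patients."""
    ai_flags = (np.asarray(ai_scores) >= threshold).astype(int)
    ai_sens, ai_spec = sens_spec(labels, ai_flags)
    hb_sens, hb_spec = sens_spec(labels, clinician_flags)
    print(f"AI AUROC on local data: {roc_auc_score(labels, ai_scores):.3f}")
    print(f"AI alert:       sensitivity {ai_sens:.2f}, specificity {ai_spec:.2f}")
    print(f"Human baseline: sensitivity {hb_sens:.2f}, specificity {hb_spec:.2f}")
```

Run on a locally adjudicated cohort before go-live, a pass like this exposes any gap between vendor-reported and locally observed accuracy, and puts the human baseline in the same table.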

4. Diabetic Retinopathy: Closing the Screening Gap

  • Human Baseline: In youth with diabetes, referral-based screening often fails; adherence is poor.
  • AI Result: A 2024 cluster RCT (Wolf et al.) found that autonomous AI increased screening completion and follow-up adherence vs traditional referral.
  • Trust Implication: Here, AI doesn’t just improve diagnostic accuracy — it solves a workflow bottleneck humans routinely miss.

5. Stroke: Speeding Up Critical Pathways

  • Human Baseline: Stroke care often suffers delays — every 15 minutes of treatment delay reduces the chance of a good outcome.
  • AI Result: The VALIDATE multicenter study found that an AI-enabled coordination platform shortened neuro-intervention contact time by ~40 minutes.
  • Trust Implication: AI contributes by reducing delays, a key weakness of human communication and coordination.

Ethical and Operational Lessons

The Human Baseline reframes AI adoption as:

  • Incremental gains, not perfection: Even minor relative improvements save lives.
  • Transparent trade-offs: AI might increase false positives, but if it dramatically cuts false negatives, the net gain is clear (a worked example follows this list).
  • Shared accountability: Both human and AI processes should be continuously measured, audited, and improved.
  • Equity in baselines: Human performance varies by population and setting; AI must be tested across diverse baselines, not a single gold standard.
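
One way to make a trade-off transparent is to express it per 1,000 patients rather than as percentages. Every number below is an assumed value for illustration, not a result from any study cited here.

```python
# Illustrative per-1,000-patient accounting of a sensitivity/specificity
# trade-off. All rates are assumptions, not published results.
patients = 1_000
prevalence = 0.05                                    # assumed disease prevalence

human = {"sensitivity": 0.80, "specificity": 0.95}   # assumed human baseline
ai    = {"sensitivity": 0.92, "specificity": 0.92}   # assumed AI performance

def misses_and_false_alarms(perf):
    positives = patients * prevalence
    negatives = patients - positives
    fn = positives * (1 - perf["sensitivity"])       # missed cases
    fp = negatives * (1 - perf["specificity"])       # false alarms
    return fn, fp

fn_h, fp_h = misses_and_false_alarms(human)
fn_a, fp_a = misses_and_false_alarms(ai)
print(f"Missed cases per 1,000:  human {fn_h:.0f}, AI {fn_a:.0f}")
print(f"False alarms per 1,000:  human {fp_h:.0f}, AI {fp_a:.0f}")
```

Under these assumptions the AI avoids six missed cases per 1,000 patients at the cost of roughly 29 extra false alarms: a concrete exchange that clinicians and patients can actually weigh.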

Building a Culture of Trust

Healthcare leaders should embed the Human Baseline approach into AI governance by:

  1. Measuring human performance first — make invisible errors visible.
  2. Benchmarking AI against humans, not perfection (a minimal measurement sketch follows this list).
  3. Openly communicating trade-offs with clinicians and patients.
  4. Monitoring both AI and human performance over time.
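
Steps 1, 2, and 4 can share one measurement harness: compute the same metric for human and AI decisions each review period, so that drift in either becomes visible. Below is a minimal sketch under that assumption; the data, names, and quarterly cadence are all illustrative.

```python
# Minimal monitoring-harness sketch: the same metric is computed for human
# and AI decisions every review period. All data below are hypothetical.
from dataclasses import dataclass

@dataclass
class PeriodReport:
    period: str
    human_sensitivity: float
    ai_sensitivity: float

def sensitivity(outcomes, flags):
    """Fraction of true cases (outcome == 1) that were flagged."""
    flagged = [f for o, f in zip(outcomes, flags) if o == 1]
    return sum(flagged) / len(flagged) if flagged else float("nan")

def review_period(period, outcomes, human_flags, ai_flags):
    report = PeriodReport(
        period=period,
        human_sensitivity=sensitivity(outcomes, human_flags),
        ai_sensitivity=sensitivity(outcomes, ai_flags),
    )
    print(f"{report.period}: human {report.human_sensitivity:.2f}, "
          f"AI {report.ai_sensitivity:.2f}")
    return report

# Hypothetical quarterly audit: adjudicated outcomes, human flags, AI flags.
review_period("2025-Q1", [1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1], [1, 0, 1, 1, 1, 1])
```

The same harness measures the human baseline first (step 1), benchmarks the AI against it rather than against perfection (step 2), and, repeated each period, monitors both over time (step 4).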

AI is not flawless. But neither are humans. By anchoring evaluation in the Human Baseline, healthcare systems can shift adoption from hype or fear to rational trust. AI should be judged not by whether it is perfect, but by whether it makes care better, safer, and fairer than the current reality.

The Human Baseline doesn’t lower the bar — it shows us where the bar truly is.

📚 Key References

  • Lång et al. Lancet Digital Health (2023) — MASAI trial, AI in mammography.
  • Hernström et al. Lancet Digital Health (2025) — registry analysis of AI screening.
  • Makar et al. (2024) — meta-analysis of AI in colonoscopy, ADR improvements.
  • Boussina et al. npj Digital Medicine (2024) — COMPOSER sepsis model outcomes.
  • Wong et al. JAMA Intern Med (2021) — Epic Sepsis Model external validation.
  • Wolf et al. Nature Communications (2024) — autonomous AI in diabetic retinopathy.
  • Devlin et al. Frontiers in Stroke (2024) — VALIDATE AI platform, stroke workflow.