NB: this series is still a work in progress.

Implementing Healthcare AI

This post builds on a previous introduction to the healthcare AI lifecycle and a discussion of healthcare AI development. Neither is required reading, but both provide some additional context and grounding.

Implementation is the work of integrating an AI model into clinical care and putting it to use.

In this post we will first cover some of the key steps of implementation and then discuss some of the general challenges associated with implementing AI tools in healthcare.

Figure: The implementation portion of the healthcare AI lifecycle. Implementation is the integration of models into workflows.

Implementation Steps

As with development, I like to break implementation down into five steps:

  • Technical Integration
  • Prospective Validation
  • Workflow Integration
  • Monitoring
  • Updating

The same caveats about the steps’ distinctness and non-linearity apply here, but I tend to think there’s a bit more structure to this process. That’s because there’s more “on the line” the further you get along the lifecycle.

Technical Integration

Technical integration is the first step, where the rubber meets the road. It’s about getting the AI model to communicate effectively with existing healthcare IT systems. This often involves working closely with IT departments to ensure data flow smoothly and securely from electronic medical records (EMRs) to the AI model and back. This step is crucial for silent prospective validation, where the model’s predictions are tested in a live environment without affecting clinical decisions.
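To make the data flow concrete, here is a minimal sketch of the transform step in the middle of that round trip: take raw observations pulled from the EMR, turn them into the feature vector the model expects, score them, and package the result to send back. Everything here is hypothetical (the `Observation` record, the feature names and defaults, `toy_model`, and the write-back payload); a real integration would go through the EMR vendor’s interfaces, such as HL7 FHIR APIs, and would be built with the health system’s IT team.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical feature specification: model feature name -> default used when
# the EMR has no recorded value for that observation.
FEATURE_DEFAULTS = {"heart_rate": 80.0, "resp_rate": 16.0, "lactate": 1.0}


@dataclass
class Observation:
    name: str
    value: float
    time: float  # hours since admission (illustrative time axis)


def build_features(observations: list[Observation]) -> dict[str, float]:
    """Transform raw EMR observations into the feature vector the model expects."""
    latest: dict[str, Observation] = {}
    for obs in observations:
        if obs.name not in FEATURE_DEFAULTS:
            continue  # the model does not use this observation
        current = latest.get(obs.name)
        if current is None or obs.time > current.time:
            latest[obs.name] = obs  # keep the most recent value
    return {
        name: latest[name].value if name in latest else default
        for name, default in FEATURE_DEFAULTS.items()
    }


def score_patient(
    patient_id: str,
    observations: list[Observation],
    model: Callable[[dict[str, float]], float],
) -> dict:
    """EMR data -> features -> model -> payload to write back (format is hypothetical)."""
    features = build_features(observations)
    return {"patient_id": patient_id, "risk": model(features), "features_used": features}


def toy_model(features: dict[str, float]) -> float:
    """Stand-in for a trained model: a made-up linear score clipped to [0, 1]."""
    z = 0.01 * (features["heart_rate"] - 80) + 0.2 * (features["lactate"] - 1)
    return min(1.0, max(0.0, 0.5 + z))


obs = [
    Observation("heart_rate", 90.0, 1.0),
    Observation("heart_rate", 110.0, 3.0),
    Observation("resp_rate", 22.0, 2.0),
    Observation("temperature", 38.5, 2.5),  # not a model feature, so it is ignored
]
payload = score_patient("patient-001", obs, toy_model)
print(payload)
```

Even in a toy like this, choices such as “use the most recent value” and “fill missing values with a default” are part of the integration, and they need to match what the model saw during development.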

Prospective Validation

Prospective validation is where we test the waters. It’s about running the model in the real world but in a controlled manner. The aim is to see how the model performs with live data, without directly impacting patient care. This step is critical for assessing the model’s readiness for full-scale implementation and identifying any unforeseen issues that might not have been apparent during development.
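A minimal sketch of silent-mode bookkeeping, assuming a binary outcome that becomes known some time after each prediction: scores are logged but never shown to the care team, and once outcomes arrive, live performance can be compared with the estimate from development. The `SilentModeLog` class and encounter ids are hypothetical, and AUROC stands in for whichever metrics matter for your use case.

```python
from dataclasses import dataclass, field


@dataclass
class SilentModeLog:
    """Collects live predictions that are NOT shown to clinicians, plus outcomes."""
    predictions: dict[str, float] = field(default_factory=dict)
    outcomes: dict[str, int] = field(default_factory=dict)

    def record_prediction(self, encounter_id: str, risk: float) -> None:
        # Silent mode: the score is stored for later evaluation only.
        self.predictions[encounter_id] = risk

    def record_outcome(self, encounter_id: str, outcome: int) -> None:
        # Outcomes arrive later (e.g., after chart review) and are linked by encounter id.
        self.outcomes[encounter_id] = outcome

    def labeled(self) -> tuple[list[float], list[int]]:
        # Only encounters with both a prediction and an observed outcome can be scored.
        ids = sorted(self.predictions.keys() & self.outcomes.keys())
        return [self.predictions[i] for i in ids], [self.outcomes[i] for i in ids]


def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


log = SilentModeLog()
for eid, risk in [("e1", 0.9), ("e2", 0.2), ("e3", 0.6), ("e4", 0.4), ("e5", 0.7)]:
    log.record_prediction(eid, risk)
for eid, outcome in [("e1", 1), ("e2", 0), ("e3", 0), ("e4", 1)]:  # e5 still pending
    log.record_outcome(eid, outcome)

scores, labels = log.labeled()
print(f"prospective AUROC on {len(labels)} encounters: {auroc(scores, labels):.2f}")
```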

Prospective validation is sometimes the only way to assess whether your model development and technical integration worked correctly. We did a deep dive on an AI model we developed and implemented for our health system; this work is catalogued in the Mind the Performance Gap: Dataset Shift During Prospective Validation paper. In addition to discussing prospective validation, the paper uncovered a new type of dataset shift driven primarily by issues in our health IT infrastructure. Differences between the data our model saw in the development and implementation environments caused a pretty noticeable degradation in performance.
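One simple check for this kind of infrastructure-driven shift, assuming you can extract the same encounters through both the development data source and the live pipeline, is to compare them feature by feature. The sketch below is an illustration, not the analysis from the paper; the function name, feature names, and tolerance are hypothetical.

```python
import math


def feature_mismatch_rates(
    dev_rows: dict[str, dict[str, float]],
    live_rows: dict[str, dict[str, float]],
    rel_tol: float = 1e-6,
) -> dict[str, float]:
    """Per feature, the fraction of shared encounters whose value differs between
    the development (retrospective) extract and the live (implementation) extract."""
    shared = sorted(dev_rows.keys() & live_rows.keys())
    if not shared:
        return {}
    features = sorted({f for eid in shared for f in (*dev_rows[eid], *live_rows[eid])})
    rates = {}
    for f in features:
        mismatches = 0
        for eid in shared:
            a, b = dev_rows[eid].get(f), live_rows[eid].get(f)  # None means missing
            if a is None and b is None:
                continue  # missing in both extracts: consistent
            if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol):
                mismatches += 1
        rates[f] = mismatches / len(shared)
    return rates


# Hypothetical extracts for the same two encounters.
dev = {"e1": {"lactate": 2.1, "heart_rate": 95.0}, "e2": {"lactate": 1.0, "heart_rate": 80.0}}
live = {"e1": {"heart_rate": 95.0}, "e2": {"lactate": 1.0, "heart_rate": 80.0}}
print(feature_mismatch_rates(dev, live))
```

Here lactate is present in the development extract for encounter `e1` but missing from the live feed, so half of the shared encounters disagree on that feature while heart rate agrees everywhere.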

Workflow Integration

Integrating an AI model into clinical workflows is more art than science. It’s about understanding how healthcare professionals work and how the AI tool can fit into their routines without causing disruption. This might involve designing user interfaces that are intuitive for clinicians or setting up alert systems that provide actionable insights without overwhelming the user.
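One common way to keep alerts actionable is to set an alert budget (a maximum number of alerts per day that a clinical team can reasonably act on) and choose the model’s threshold from historical scores so that the budget is respected. The sketch below assumes higher scores mean higher risk; the function name and numbers are hypothetical, and the right budget is a workflow decision to make with clinicians rather than a purely technical one.

```python
import math
from collections import Counter


def threshold_for_alert_budget(
    historical_scores: list[float], days: float, max_alerts_per_day: float
) -> float:
    """Lowest threshold t such that flagging score >= t on the historical data
    produces at most max_alerts_per_day alerts per day on average."""
    budget = max_alerts_per_day * days
    counts = Counter(historical_scores)
    threshold = math.inf  # flag no one if even the top score would exceed the budget
    flagged = 0
    for score in sorted(counts, reverse=True):
        flagged += counts[score]  # tied scores are flagged together
        if flagged > budget:
            break
        threshold = score
    return threshold


# Two days of hypothetical historical risk scores.
scores = [0.95, 0.9, 0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
print(threshold_for_alert_budget(scores, days=2, max_alerts_per_day=2))  # 0.7
print(threshold_for_alert_budget(scores, days=2, max_alerts_per_day=1))  # 0.95
```

Lowering the budget raises the threshold, trading sensitivity for fewer interruptions.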

Monitoring

Once an AI model is up and running, the job isn’t over. Continuous monitoring is essential to ensure that the model remains performant and relevant over time. This involves tracking the model’s performance, identifying any drift in accuracy, and being alert to changes in clinical practice that might affect how the model should be used.
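Outcome labels often arrive with a delay, so a label-free signal can help flag changes early. A minimal sketch, assuming risk scores in [0, 1]: compute the population stability index (PSI) between the score distribution from a reference period (e.g., prospective validation) and a recent window. The binning scheme and `eps` floor are illustrative choices; values around 0.1 and 0.25 are commonly cited rules of thumb for moderate and large shifts, not clinical standards.

```python
import math


def population_stability_index(
    reference: list[float], current: list[float], n_bins: int = 10, eps: float = 1e-4
) -> float:
    """PSI between two samples of risk scores in [0, 1], using equal-width bins."""

    def bin_proportions(scores: list[float]) -> list[float]:
        if not scores:
            raise ValueError("need at least one score")
        counts = [0] * n_bins
        for s in scores:
            if not 0.0 <= s <= 1.0:
                raise ValueError(f"score {s} is outside [0, 1]")
            counts[min(int(s * n_bins), n_bins - 1)] += 1  # s == 1.0 goes in the top bin
        # Floor empty bins at eps so the log term below is defined.
        return [max(c / len(scores), eps) for c in counts]

    ref, cur = bin_proportions(reference), bin_proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical score samples: a validation-period baseline and a later week.
baseline = [0.05] * 50 + [0.55] * 50
later_week = [0.55] * 100
print(round(population_stability_index(baseline, baseline), 3))  # 0.0
print(round(population_stability_index(baseline, later_week), 3))
```

A shift in scores does not by itself mean performance has degraded, and performance can degrade without one, so this complements rather than replaces tracking performance once outcomes are available.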

Updating

You don’t “set it and forget it” with AI models in healthcare. They need to evolve as medical knowledge advances and as patient populations change. Updating models might involve retraining with new data, incorporating feedback from users, or even redesigning the model to accommodate new clinical guidelines or technologies.

Ensuring models remain current and relevant involves more than just routine retraining with new datasets. It demands a thoughtful approach that considers how updates might impact the user’s trust and the model’s usability in clinical settings. This is where our recent work on Updating Clinical Risk Stratification Models Using Rank-Based Compatibility comes into play.

Updating models to maintain or enhance their performance is crucial, especially as new data become available or when data shifts occur. However, it’s imperative that these updates do not disrupt the user’s expectations or the established workflow. Our research introduced a novel rank-based compatibility measure that allows us to evaluate and ensure that the updated model’s rankings align with those of the original model, preserving the clinician’s trust in the AI tool.
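To make the idea of rank-based compatibility concrete, here is a minimal sketch, assuming binary outcomes: among the (positive, negative) patient pairs that the original model ranks correctly, measure the fraction the updated model still ranks correctly. This is an illustration in the spirit of the paper rather than its exact formulation; see the paper for the precise definition. The scores below are made up.

```python
from itertools import product


def correctly_ranked_pairs(scores: list[float], labels: list[int]) -> set[tuple[int, int]]:
    """(positive index, negative index) pairs where the positive scores strictly higher."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return {(i, j) for i, j in product(pos, neg) if scores[i] > scores[j]}


def rank_compatibility(old_scores: list[float], new_scores: list[float], labels: list[int]) -> float:
    """Share of the pairs the original model ranked correctly that the updated
    model also ranks correctly (1.0 means no previously correct pair is lost)."""
    old_pairs = correctly_ranked_pairs(old_scores, labels)
    if not old_pairs:
        raise ValueError("the original model ranks no positive-negative pair correctly")
    new_pairs = correctly_ranked_pairs(new_scores, labels)
    return len(old_pairs & new_pairs) / len(old_pairs)


def auroc(scores: list[float], labels: list[int]) -> float:
    """Fraction of positive-negative pairs ranked correctly (ties count as incorrect here)."""
    n_pos = sum(labels)
    return len(correctly_ranked_pairs(scores, labels)) / (n_pos * (len(labels) - n_pos))


labels = [1, 1, 0, 0]
original = [0.9, 0.4, 0.6, 0.1]
update_a = [0.8, 0.7, 0.3, 0.5]
update_b = [0.5, 0.9, 0.2, 0.6]
for name, new in [("update A", update_a), ("update B", update_b)]:
    print(f"{name}: AUROC {auroc(new, labels):.2f}, "
          f"compatibility {rank_compatibility(original, new, labels):.2f}")
# update A: AUROC 1.00, compatibility 1.00
# update B: AUROC 0.75, compatibility 0.67
```

Both updates match or beat the original model’s AUROC of 0.75, but update B reorders pairs the original model got right, which is exactly the kind of change that can surprise clinicians who have learned the original model’s behavior.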


Overall, implementing ML models into clinical care is extremely challenging. During implementation, the goal is to use models to estimate unknown information that can guide various healthcare processes. This exposes models to the shifting behaviors of the healthcare system. Over time we expect the model’s performance to change: even though the model in use is not changing, the healthcare system is, and these changes may reflect new patterns that the model was not trained to identify.

It’s important to contrast this with the fact that the model in use may also change over time. Although we often talk about static models (which may be updated occasionally by model developers), it is important to note that some models are inherently dynamic: they change their behavior over time. Using updated or dynamic models means there is a second set of reasons why a model’s performance would be expected to change over time, which can make it hard to disentangle issues arising from new model behavior from those arising from changes in the healthcare system.

To make things more concrete, here are some examples:

  • A model flags patients based on their risk of developing sepsis. There is an increase in the population of patients admitted with respiratory complaints due to a viral pandemic. This change in patient population leads to a massive increase in the number of patients the model flags, and the overall model performance drops because these patients do not end up experiencing sepsis. This is an example of a static model being impacted by the changes in the healthcare system over time.
  • A model identifies physicians who could benefit from additional training. The model uses a limited set of specially collected information. Model developers create a new model version that utilizes EHR data. After implementation, the updated model identifies physicians with better accuracy. This is an example of a static model being updated to improve performance over time.

Transition from Bench-to-Bedside

Implementation into clinical care requires the model to be connected to systems that can present it with data in real time. We refer to these systems as infrastructure: the systems (primarily IT systems) needed to take data recorded as part of clinical care operations and present it in a format accessible to ML models. This infrastructure determines the availability, format, and content of information. Although data may be collected in the same source health information technology (HIT) system (e.g., an EHR system), the data may pass through a different series of extract, transform, and load (ETL) processes (sometimes referred to as pipelines) depending on the intended use of the data.
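As an illustration of how the same source data can yield different model inputs depending on the pipeline, here is a sketch comparing a retrospective extract keyed on when a lab was collected with a real-time feed that only sees results already filed in the EHR. The `LabResult` fields, the `latest_value` function, and both pipeline behaviors are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabResult:
    name: str
    value: float
    collected_at: float  # hours since admission when the specimen was collected
    filed_at: float  # hours since admission when the result became visible in the EHR


def latest_value(
    results: list[LabResult], name: str, prediction_time: float, real_time: bool
) -> Optional[float]:
    """Most recent value of a lab as seen by one of two hypothetical pipelines.

    real_time=False: a retrospective extract keyed on collection time, which can
    include results collected before prediction_time but filed after it.
    real_time=True: a live pipeline that only sees results already filed.
    """
    time_field = "filed_at" if real_time else "collected_at"
    eligible = [
        r for r in results
        if r.name == name and getattr(r, time_field) <= prediction_time
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r.collected_at).value


labs = [
    LabResult("lactate", 1.1, collected_at=1.0, filed_at=2.0),
    LabResult("lactate", 4.2, collected_at=5.0, filed_at=7.0),
]
print(latest_value(labs, "lactate", 6.0, real_time=False))  # 4.2
print(latest_value(labs, "lactate", 6.0, real_time=True))  # 1.1
```

At hour 6 the retrospective extract returns the newer lactate of 4.2, but the live pipeline can only see the earlier 1.1 because the newer result was not filed until hour 7. A model trained on the first kind of extract and run on the second will see systematically different inputs.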

Once connected into clinical care, ML models need monitoring and updating. For example, developers may want to incorporate knowledge about a new biomarker that changes how a disease is diagnosed and managed. Model developers may thus consider updating models as a part of their regular maintenance.

Physician-AI Teams

This maintenance is complicated because models do not operate in a vacuum. In many application areas, users interact with models and learn about their behavior over time. In safety-critical applications like healthcare, models and users may function as a team: the user and model each assess patients individually, and the decision maker, usually the user, considers both assessments (their own and the model’s) and then makes a decision based on all available information. The performance of these decisions is the user-model team performance.

A Note on Deployment vs Integration vs Implementation

As we finish up, I want to make a quick note on nomenclature. We often toss around implementation, deployment, and integration interchangeably, but there are subtle and important distinctions between them. I think using these three terms precisely is important when discussing how to connect AI tools to care processes.

  • Deployment – Now, this one’s got a bit of a heavy vibe to it, doesn’t it? Kind of like rolling out tanks in a military operation. In the tech realm, it’s about pushing out code or updates from one side (developers) without much say from the other side (users). I view it as a one-way street, with the developers calling the shots. But in healthcare, where stakes are high and workflow and subject matter expertise are paramount, this frame of mind doesn’t yield great results. We can deploy code, but we should be wary of deploying workflows; instead, we should co-develop workflows with all the necessary stakeholders.
  • Integration – This is the nuts and bolts of getting an AI model to play nice with the existing tech stack, like fitting a new piece into a complex puzzle. But here’s the kicker: just because the piece fits doesn’t mean it’s going to be used effectively, or at all. Integration focuses on the technical handshake between systems, but it can miss the bigger picture – workflow needs and human factors. It’s like setting up an elaborate stage for a play without considering whether the actors know their lines or if the audience will even show up.
  • Implementation – Now we’re talking. Implementation is where the magic happens. It’s not just about the technical melding of AI into healthcare systems; it’s about weaving it into the fabric of clinical workflows and practices. It’s a two-way street, a dialogue between developers and end-users (clinicians and sometimes patients). Implementation is a collaborative, evolving process that treats users as partners in the socio-technical development of an AI system. It acknowledges that for AI to truly make a difference, it needs to be embraced and utilized by those on the front lines of patient care.

So, when we talk about bringing AI into healthcare, let’s lean more towards implementation. It’s about more than just getting the tech right; it’s about fostering a collaborative ecosystem where we can make tools that genuinely contribute to better health outcomes by meeting the needs of clinical users and workflows.

Wrapping Up

And there you have it, an overview of implementing healthcare AI. We’ve walked through the steps of technical integration, prospective validation, workflow integration, monitoring, and updating, each with its own set of challenges and nuances. We’ve also untangled some jargon along the way – implementation, deployment, integration – words that might seem interchangeable but carry different implications in the realm of healthcare AI. Implementation is more than just a technical task; it’s a collaborative endeavor that calls for developers and clinicians to join forces, ensuring AI tools not only fit into healthcare workflows but also genuinely enhance patient care.


Some of this content was adapted from the introductory chapter of my doctoral thesis, Machine Learning for Healthcare: Model Development and Implementation in Longitudinal Settings.