NB: this series is still a work in progress.

Implementing Healthcare AI

This post builds off of a previous introduction to the healthcare AI lifecycle and a discussion on healthcare AI development. Neither is necessary pre-reading, but both provide good background for the main focus of this post: implementation. Implementation is the work of integrating an AI model into clinical care and putting it to use.

In this post, we will first cover some of the key implementation steps and then some of the general challenges associated with implementing AI tools in healthcare.

Healthcare AI implementation portion of the lifecycle. Implementation is the integration of models into workflows and generally has the following steps: technical integration, prospective validation, workflow integration, monitoring, and updating.

Implementation Steps

Like the development process, I break down implementation into five steps.

  • Technical Integration
  • Prospective Validation
  • Workflow Integration
  • Monitoring
  • Updating

Although the same caveats about distinction and non-linearity apply to these steps, I tend to think there’s a bit more structure to this process. That’s because there’s more “on the line” the further along the lifecycle you get. So, it’s best to be sure you’ve perfected a step before moving on to the next.

Technical Integration

Technical integration is the first step where the rubber meets the road. It’s about getting the AI model to communicate effectively with existing healthcare IT systems. This often involves working closely with IT departments to ensure data flows smoothly and securely from electronic medical records (EMRs) to the AI model and back. This step is crucial for silent prospective validation, where the model’s predictions are tested in a live clinical data environment without affecting clinical decisions.
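To make this concrete, here is a minimal sketch of the kind of translation layer that sits at this boundary, mapping an EMR message into model features and packaging a score back into a result the EMR can file. All field names, units, and thresholds are hypothetical assumptions for illustration, not any particular EMR’s interface:

```python
# Hypothetical sketch of the integration boundary between an EMR feed and a
# model. Field names, units, and the flag threshold are illustrative
# assumptions, not any specific health system's interface.

def emr_to_features(record):
    """Map a raw EMR message (dict) into the ordered feature vector the
    model expects, applying unit conversions and defaults for missing data."""
    return [
        record.get("age_years", 0),
        record.get("heart_rate_bpm", 0),
        # Assume the EMR reports temperature in Fahrenheit while the model
        # was trained on Celsius -- a typical unit mismatch at this boundary.
        (record.get("temp_f", 98.6) - 32) * 5 / 9,
    ]

def score_to_emr_result(score, patient_id):
    """Package a model score as a result message the EMR can file, keeping
    the raw score alongside a coarse flag for clinicians."""
    return {
        "patient_id": patient_id,
        "risk_score": round(score, 3),
        "flag": "HIGH" if score >= 0.8 else "ROUTINE",
    }

features = emr_to_features({"age_years": 67, "heart_rate_bpm": 110, "temp_f": 101.3})
result = score_to_emr_result(0.91, patient_id="12345")
```

In practice this layer is where IT departments and model developers negotiate data formats, and where silent errors (wrong units, silently missing fields) tend to hide.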

Prospective Validation

Prospective validation is the first high-fidelity test of the model. It’s about running the model in the real world but in a controlled manner. The aim is to see how the model performs with live data without directly impacting patient care. This step is critical for assessing the model’s readiness for full-scale implementation and identifying any unforeseen issues that might not have been apparent during development.

Prospective validation is sometimes the only way to assess whether your model development and technical integration worked correctly. We did a deep dive into an AI model we developed and implemented for our health system, cataloged in the paper Mind the Performance Gap: Dataset Shift During Prospective Validation. In addition to discussing prospective validation, the paper uncovered a new type of dataset shift driven primarily by issues in our health IT infrastructure. The difference between the data our model saw in the development and implementation environments caused a noticeable degradation in performance, so we needed to rework both the model and the technical integration to ameliorate it.
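As a hedged sketch of what this kind of check can look like, the snippet below compares live (prospective) discrimination against the development benchmark to flag a performance gap. The AUROC helper, the tolerance, and the numbers are illustrative assumptions, not the paper’s methodology:

```python
# A minimal sketch of checking for a development-to-implementation
# performance gap during silent prospective validation. Predictions are
# logged but never shown to clinicians; later, live performance is compared
# against the development benchmark. Tolerance and data are illustrative.

def auroc(labels, scores):
    """Probability that a randomly chosen positive outranks a random negative
    (ties count half) -- a simple pairwise AUROC estimate."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def performance_gap(dev_auroc, prosp_labels, prosp_scores, tolerance=0.05):
    """Flag a dataset-shift-sized drop between development and live AUROC."""
    prosp_auroc = auroc(prosp_labels, prosp_scores)
    return dev_auroc - prosp_auroc > tolerance, prosp_auroc

shifted, live_auroc = performance_gap(
    dev_auroc=0.85,
    prosp_labels=[1, 0, 1, 0, 0, 1],
    prosp_scores=[0.9, 0.6, 0.4, 0.3, 0.7, 0.8],
)
```

A gap flagged here does not tell you *why* performance dropped; as in our case, the cause may sit in the IT infrastructure rather than the model itself.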

Workflow Integration

Integrating an AI model into clinical workflows is more art than science. It’s about understanding how healthcare professionals work and how the AI tool can fit into their routines without causing disruption. This might involve designing intuitive user interfaces for clinicians or setting up alert systems that provide actionable insights without overwhelming the user.


Monitoring

The job isn’t over once an AI model is up and running. Continuous monitoring ensures the model remains performant and relevant over time. This involves tracking the model’s performance, identifying any drifts in accuracy, and being alert to changes in clinical practices that might affect how the model should be used.
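One hedged sketch of such monitoring: track the model’s weekly flag rate and alert when it drifts outside control limits derived from a baseline period. The window sizes, limits, and rates below are illustrative assumptions:

```python
# Illustrative drift monitor: compare each live week's flag rate against
# control limits (mean +/- n_sigma) computed from a baseline period.
# Baseline length, sigma multiplier, and rates are hypothetical.

from statistics import mean, stdev

def drift_alerts(baseline_rates, live_rates, n_sigma=3):
    """Return the (zero-indexed) live weeks whose flag rate falls outside
    the baseline control limits."""
    mu, sigma = mean(baseline_rates), stdev(baseline_rates)
    lo, hi = mu - n_sigma * sigma, mu + n_sigma * sigma
    return [week for week, rate in enumerate(live_rates) if not lo <= rate <= hi]

# Baseline: ~5% of patients flagged per week; the last two live weeks spike,
# e.g. after a shift in the admitted patient population.
alerts = drift_alerts(
    baseline_rates=[0.05, 0.06, 0.04, 0.05, 0.05],
    live_rates=[0.05, 0.06, 0.05, 0.15, 0.18],
)
```

Flag-rate drift is only a proxy; it catches population shifts quickly because it needs no outcome labels, but label-based metrics (like AUROC) are still needed once outcomes accrue.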


Updating

You don’t “set it and forget it” with AI models in healthcare. Models in use must be maintained as medical knowledge advances and patient populations change. Updating models might involve:

  • retraining with new data,
  • incorporating feedback from users, or
  • redesigning the model to accommodate new clinical guidelines or technologies.

Ensuring models remain current and relevant involves more than just routine retraining with new datasets. It demands a thoughtful approach, considering how updates might impact the user’s trust and the model’s usability in clinical settings. This is where our recent work on Updating Clinical Risk Stratification Models Using Rank-Based Compatibility comes into play. We developed mathematical techniques to ensure that updated models maintain the correct behavior of previous models that physicians may have come to depend on. 

Updating models to maintain or enhance their performance is crucial, especially as new data become available or when data shifts occur. However, these updates must respect the user’s expectations and the established workflow. Our research introduced a novel rank-based compatibility measure that allows us to evaluate and ensure that the updated model’s rankings align with those of the original model, preserving the clinician’s trust in the AI tool.
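To give a feel for the idea, here is a simplified sketch: among patient pairs (one who experienced the outcome, one who did not) that the original model ranks correctly, what fraction does the updated model also rank correctly? This is an illustrative reading of the concept, not the paper’s exact estimator:

```python
# Simplified sketch of a rank-based compatibility idea: of the patient pairs
# the ORIGINAL model orders correctly (outcome patient scored above
# non-outcome patient), count the fraction the UPDATED model preserves.
# This is an illustration of the concept, not the paper's estimator.

def rank_compatibility(labels, old_scores, new_scores):
    patients = list(zip(labels, old_scores, new_scores))
    concordant_old = preserved = 0
    for li, oi, ni in patients:
        for lj, oj, nj in patients:
            if li == 1 and lj == 0 and oi > oj:  # old model ranks pair correctly
                concordant_old += 1
                preserved += ni > nj             # updated model keeps the order
    return preserved / concordant_old if concordant_old else 1.0

compat = rank_compatibility(
    labels=[1, 1, 0, 0],
    old_scores=[0.9, 0.7, 0.6, 0.2],
    new_scores=[0.8, 0.5, 0.6, 0.3],
)
```

A compatibility of 1.0 would mean the update never reverses a ranking the original model got right; lower values quantify how much of the behavior clinicians relied on has changed.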


Implementation Challenges

Implementing AI models into clinical care can be challenging. During implementation, models estimate unknown information that guides various healthcare processes. This exposes models to the transient behaviors of the healthcare system. Over time, we expect the model’s performance to change. Even though the model in use may not be changing, the healthcare system is, and these changes may surface new patterns that the model was not trained to identify.

It is essential to contrast this with the fact that the model itself may also change over time. Although we often talk about static models (which developers may update occasionally), some models are inherently dynamic: they change their behavior over time. Updated and dynamic models introduce a second set of factors affecting how a model’s performance could change, making it hard to disentangle issues arising from new model behaviors from issues arising from changes in the healthcare system.

To make things more concrete, here are some examples:

  • A model flags patients based on their risk of developing sepsis. There is an increase in the population of patients admitted with respiratory complaints due to a viral pandemic. This change in patient population leads to a massive increase in the number of patients the model flags, and the overall model performance drops because these patients do not end up experiencing sepsis. This is an example of a static model being impacted by the changes in the healthcare system over time.
  • A model identifies physicians who could benefit from additional training. The model uses a limited set of specially collected information. Model developers create a new model version that utilizes EHR data. After implementation, the updated model identifies physicians with better accuracy. This is an example of a static model being updated to improve performance over time.

Transition from Bench-to-Bedside

Implementation into clinical care requires the model to be connected to systems that can present it with real-time data. We refer to these systems as infrastructure: the systems (primarily IT systems) needed to take data recorded during clinical care operations and present it in a format accessible to ML models. This infrastructure determines the availability, format, and content of information. Although data may be collected in the same source HIT system (e.g., an EHR system), the data may be passed through a different series of extract, transform, and load (ETL) processes (sometimes referred to as pipelines) depending on the data use target.
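The following toy sketch shows how two ETL pipelines over the same source EHR record can hand a model different views of “the same” field. The field names and pipeline rules are hypothetical:

```python
# Illustrative sketch: the same source EHR record, seen through a
# retrospective research pipeline versus a real-time pipeline. Field names
# and rules are hypothetical.

def research_etl(raw):
    """Retrospective extract: curated, with lab values backfilled from the
    final, verified result -- the view the model was trained on."""
    return {"lactate": raw.get("lactate_final")}

def realtime_etl(raw):
    """Real-time feed: only the preliminary result exists at prediction
    time -- the view the model sees once implemented."""
    return {"lactate": raw.get("lactate_prelim")}

raw = {"lactate_prelim": None, "lactate_final": 3.1}  # result not yet verified
dev_view = research_etl(raw)
live_view = realtime_etl(raw)
```

A model trained on the research view would always see a lactate value here, while in real time the field arrives missing, which is exactly the kind of infrastructure-driven mismatch that surfaces during prospective validation.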

Once connected to clinical care, ML models need monitoring and updating. For example, developers may want to incorporate knowledge about a new biomarker that changes how a disease is diagnosed and managed. Model developers may thus consider updating models as a part of their regular maintenance.

Physician-AI Teams

This maintenance is complicated because models do not operate in a vacuum. In many application areas, users interact with models and learn about their behavior over time. In safety-critical applications, like healthcare, models and users may function as a team. The user and model each individually assess patients. The decision maker, usually the user, considers both assessments (their own and the model’s) and then decides based on all available information. The performance of this decision is the user-model team performance.
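A toy sketch of this framing: the clinician and model each assess a patient, the clinician makes the final call using both assessments, and we score the team’s decisions rather than either party alone. The combination rule (escalate when either is concerned) and the cases are hypothetical illustrations:

```python
# Toy sketch of user-model team performance. The decision maker (clinician)
# sees both their own assessment and the model's score; we evaluate the
# accuracy of the resulting TEAM decisions. The combination rule and
# threshold are hypothetical.

def team_decision(clinician_concerned, model_score, threshold=0.8):
    """Final call rests with the clinician, informed by the model's score:
    here, escalate if either the clinician or the model is concerned."""
    return clinician_concerned or model_score >= threshold

def team_accuracy(cases):
    """cases: (clinician_concerned, model_score, true_outcome) triples."""
    correct = sum(team_decision(c, m) == bool(y) for c, m, y in cases)
    return correct / len(cases)

acc = team_accuracy([
    (True, 0.3, 1),   # clinician catches what the model missed
    (False, 0.9, 1),  # model catches what the clinician missed
    (False, 0.2, 0),  # both correctly unconcerned
    (False, 0.5, 1),  # both miss
])
```

The point of the framing is that updating the model changes the team, not just the model: the clinician’s learned sense of when to trust the score is part of the system being evaluated.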

A Note on Deployment vs Integration vs Implementation

As we finish, I want to make a quick note on terminology. We often use the terms implementation, deployment, and integration interchangeably; however, there are subtle but important distinctions between them. Precision with these three terms is crucial when discussing how AI tools connect to care processes.

  • Deployment—This one has a heavy-handed vibe; it may conjure up images of a military operation. In the tech realm, it’s about pushing out code or updates from one side (developers) without much say from the other side (users). I view it as a one-way street, with the developers calling the shots. But in healthcare, where the stakes are high and workflow and subject matter expertise are paramount, this mindset doesn’t yield great results. We can deploy code, but we should be wary of deploying workflows. Instead, we should co-develop workflows with all the necessary stakeholders.
  • Integration—This is the process of getting an AI model to work with the existing tech stack, like fitting a new piece into a complex puzzle. But just because the piece fits doesn’t mean it will be used effectively or at all. Integration focuses on the technical handshake between systems, but it can miss the bigger picture – workflow needs and human factors.
  • Implementation—This is where the magic happens. It’s not just about the technical melding of AI into healthcare systems; it’s about weaving it into the fabric of clinical workflows and practices. It’s a two-way street, a dialogue between developers and end-users (clinicians and sometimes patients). Implementation is a collaborative, evolving process that treats users as partners in the socio-technical development of an AI system. It acknowledges that for AI to make a difference, it needs to be embraced and utilized by those on the front lines of patient care.

So, when discussing AI in healthcare, let’s lean more towards implementation. It’s about more than just getting the tech right; it’s about fostering a collaborative ecosystem where we can make tools that genuinely contribute to better health outcomes by meeting the needs of clinical users and workflows.

Wrapping Up

We’ve traversed the steps of technical integration, prospective validation, workflow integration, monitoring, and updating, each with its challenges and nuances. We’ve also untangled some jargon—implementation, deployment, integration—words that might seem interchangeable but have different implications in healthcare AI. Implementation is more than a technical task; it’s a collaborative endeavor that requires developers and clinicians to work together, ensuring AI tools not only fit into healthcare workflows but also genuinely enhance patient care.

This post wraps up our overview of the healthcare AI lifecycle. In the next few posts, we will discuss the infrastructure necessary to power all this.

Some of this content was adapted from the introductory chapter of my doctoral thesis, Machine Learning for Healthcare: Model Development and Implementation in Longitudinal Settings.

Go ÖN Home