NB: this series is still a work in progress.

Healthcare AI Development

This is the second post in our series on the healthcare AI lifecycle. To start at the beginning, click here.

Now that we have a general framework for the healthcare AI lifecycle, we can get a little deeper into the weeds. In the absence of a better starting spot, we will start at what I perceive to be the “beginning,”[1] which is development.

Development comprises the processes involved in creating an AI model.

Figure: Healthcare AI development portion of the lifecycle. Development is the creation of models and involves predictive task selection, data access, data preparation, model training, and model validation.

Development Steps

I like to break down the development phase into five steps:

  • Task Selection
  • Data Access
  • Data Preparation
  • Model Training
  • Model Validation

It’s easiest to depict these steps as discrete and chronological, but that’s a little disingenuous. The steps are semi-continuous with one another and tend not to be linear. Instead, you may find model developers jumping back and forth between them or doing them concurrently. Still, all five are usually present in a model development project, and you tend to finalize them in the order presented.

We will now go into a brief discussion of the development steps.

Task Selection

Choosing the right problem for AI to tackle is crucial. It’s not just about finding a gap; it’s about ensuring the AI solution can actually improve outcomes or efficiency in a meaningful way. We’re looking for problems where AI can provide insights or automation that weren’t feasible before. This step should involve lots of discussions with clinicians to pinpoint where they feel the pressure and where they think AI could help.

Caution should always be exercised when someone says “I just want an AI to predict/do X”. There may be deeper or related problems that should be surfaced before jumping directly in the initial direction. A great approach for overcoming this issue is to ask a bunch of questions. Some of my favorite lines of inquiry are:

  • Sequential Why? Repeatedly asking why or how is often a very fast way of understanding the existing problem/system.
  • Would magic help? Asking how a “perfect solution” would help (e.g., “If I could give you Y information with 100% accuracy how would that help?”) gives you a sense of the maximum possible benefit of a solution.
  • Do you have data? If the answer is no, you should think long and hard about whether this project is truly feasible.

Data Access

Getting the right data is often the first big hurdle of AI development. The data needs to be comprehensive, clean(able), and relevant. This step involves negotiating access to medical records while ensuring patient privacy and data security. It’s imperative that you consider the following:

  • Provenance: where is the data coming from? Who’s going to get it for you?
  • Protection: how are you going to ensure that the data are properly protected? I recommend working directly in hospital IT systems or working with hospital IT to spec out compliant environments.
  • Prospective use: will you have this data available when you are trying to use this system prospectively or in the real-world?
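The prospective-use question can be made concrete with a quick feature audit. The sketch below is only an illustration: the feature names, the availability times, and the `usable_features` helper are all hypothetical, and real availability would have to be confirmed with the people who run the source systems.

```python
# Hypothetical audit: which candidate features exist by the time a
# prediction has to be made? All names and hours are made up.

def usable_features(availability_hours, prediction_hour):
    """Return the features that are available at or before prediction time.

    availability_hours maps a feature name to the number of hours after
    admission at which that feature is typically recorded.
    """
    return sorted(
        name
        for name, hours in availability_hours.items()
        if hours <= prediction_hour
    )


# Illustrative catalog (assumed values, not taken from any real system).
catalog = {
    "age": 0,
    "triage_vitals": 1,
    "first_lab_panel": 4,
    "discharge_diagnosis_code": 96,
}

# Suppose the model must produce its prediction 6 hours after admission.
features_at_6h = usable_features(catalog, prediction_hour=6)
print(features_at_6h)
```

A feature that appears in the retrospective extract but not in this list would leak future information into training and would be missing at prediction time.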

Data Preparation

Having obtained data, model developers may realize that healthcare data, like healthcare itself, is complicated. Processing and transforming data for AI model development requires a unique mix of clinical and technical expertise. Preparing this data for AI involves cleaning it, dealing with missing values, and transforming it into a format that algorithms can work with.
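As a small illustration of handling missing values, here is a minimal sketch that uses median imputation plus a "was missing" indicator for each column. This is one common approach rather than a recommendation for any particular project, and the column names, values, and helper functions are hypothetical.

```python
from statistics import median


def fit_medians(rows, columns):
    """Compute per-column medians from the observed (non-missing) values."""
    medians = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        medians[col] = float(median(observed)) if observed else 0.0
    return medians


def transform(rows, columns, medians):
    """Turn each row into a numeric vector: [value, missing_flag] per column."""
    vectors = []
    for r in rows:
        vec = []
        for col in columns:
            value = r.get(col)
            is_missing = value is None
            vec.append(medians[col] if is_missing else float(value))
            vec.append(1.0 if is_missing else 0.0)
        vectors.append(vec)
    return vectors


# Toy training rows (made-up values); None marks a missing measurement.
train_rows = [
    {"heart_rate": 72, "lactate": 1.0},
    {"heart_rate": None, "lactate": 2.0},
    {"heart_rate": 90, "lactate": None},
]
columns = ["heart_rate", "lactate"]

medians = fit_medians(train_rows, columns)
features = transform(train_rows, columns, medians)
print(medians)
print(features)
```

The medians are fit on training rows only and then reused for any later data, so that information from validation data does not leak into preprocessing. The indicator columns are kept because, in clinical data, the fact that a measurement was not taken can itself be informative.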

This step is usually pretty labor intensive; my estimate is that 90% of the engineering time will be dedicated to data preparation work. Tools that automate parts of data preparation can help. I made a tool called TemporalTransformer that can help you quickly convert EMR or claims data into a format ready for processing with neural networks/foundation models. I discuss it in the supplement of my paper on predicting return to work, and you can find the code here.

Model Training

Training the model may be the most exciting step for the technical members of the project, but it’s often one of the shortest parts of the project (in terms of wall time, not CPU/GPU time). We select algorithms, tune parameters, and iteratively improve the model based on its performance. This step is a mix of science, art, and a bit of luck. The goal is to develop a model that’s both performant and generalizable.

Model Validation

After being developed, models must be validated to assess if they will benefit patients, physicians, or healthcare systems. Validation means testing the model on new, unseen data to ensure it performs well in settings representative of intended real-world usage. Ultimately, it’s about making sure the model isn’t just memorizing the data it’s seen but can actually make good predictions on new data.
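To make the "memorizing versus generalizing" check concrete, here is a minimal sketch that compares discrimination on the training data with discrimination on held-out data. It uses AUROC as the metric, which is my choice for illustration and not the only reasonable one. The labels and scores are made up, and the `auroc` helper is a plain pairwise implementation written for this example.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up model outputs on data the model was trained on...
train_labels = [0, 0, 1, 1]
train_scores = [0.05, 0.10, 0.90, 0.95]

# ...and on new, unseen data.
heldout_labels = [0, 0, 1, 1]
heldout_scores = [0.10, 0.40, 0.35, 0.80]

train_auc = auroc(train_labels, train_scores)
heldout_auc = auroc(heldout_labels, heldout_scores)
print(f"train AUROC:    {train_auc:.2f}")
print(f"held-out AUROC: {heldout_auc:.2f}")
```

A large gap between the two numbers suggests the model has fit quirks of its training data. In practice you would also want confidence intervals and a held-out set far larger than four patients.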

This step often involves internal and external validation to ensure robustness. There are varying definitions for internal and external validation, but the distinction I like to use is based on the system generating the underlying data. If the data comes from the same system (e.g., same hospital, just a different timespan) then I would consider it internal validation data; data from a different system (e.g., another hospital) would be external validation data. A well-conducted external validation is a great way to assess if a model will work in a given environment. However, external validation may be challenging due to data-sharing restrictions. Despite this challenge, it’s often a great place to get started in engaging with healthcare AI systems, especially for physicians. Here are some examples of external validation studies that I’ve worked on:

Wrapping Up

We’ve taken a closer look at the development phase of healthcare AI, covering everything from task selection to model validation. Each step is filled with unique challenges and requires a blend of clinical insight, data science expertise, and ethical considerations. It’s worth noting that while we’ve covered a lot of ground here, each of these development steps could easily merit its own detailed post. The discussions here have been intentionally brief to provide an overview and establish a foundation. In future posts, I may delve deeper into each aspect, unpacking the complexities and sharing insights on how to navigate these crucial stages in the lifecycle of healthcare AI development.

The journey of developing healthcare AI is a continuous one, with each step uncovering new data, insights, and evolving clinical needs. As we advance, our goal remains clear: to harness the potential of AI in improving patient care, enhancing healthcare operations, and advancing the practice of medicine.

Thank you for joining me on this exploration of healthcare AI development. Stay tuned for more detailed discussions on each step and other facets of the healthcare AI lifecycle in upcoming posts. Until next time, keep pushing the boundaries of what’s possible in healthcare with AI.


Some of this content was adapted from the introductory chapter of my doctoral thesis, Machine Learning for Healthcare: Model Development and Implementation in Longitudinal Settings.

  1. “If you wish to make an apple pie from scratch, you must first invent the universe.” - Carl Sagan