NB: this series is still a work in progress.

This post builds on our previous discussions of healthcare AI infrastructure. If you are unfamiliar with that infrastructure, it may be helpful to review the posts that cover the AI lifecycle or the general infrastructure landscape.


Technical Integration Testing

Although I’ve alluded to it, we haven’t formally discussed the testing side of integration yet. I refer to testing all the components needed to technically implement the system as technical integration testing. After careful consideration of clinical workflow, it is one of the most important steps in the implementation process, imo.

The basic premise of technical integration testing is to double-check that you get the expected results from each implementation component and that the system functions correctly.
This can be tricky because you need a good end-to-end understanding of the system and should approach each component from several different perspectives (software engineer, data engineer, ML engineer). Additionally, we don’t have a standard toolbox for conducting technical integration testing.

Although I didn’t have a guidebook, I tried to approach this process systematically over the course of the M-CURES project. I ended up creating several techniques that can be … [TODO: transition]

A couple of the techniques were simply about getting more information out of the integration system. These involved closely examining the way data was passed to and from the model. This is crucial because small changes in the format of incoming data can have big downstream consequences. As such, we developed techniques that allowed us to debug how our model was receiving and processing data. These techniques used the Python error console that Epic provided in the ECCP management dashboard. We built custom errors that helped ensure we were receiving and processing data correctly. This process helped us refine our mental model of ECCP so that it matched the way ECCP actually works.
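To make the custom-error idea a bit more concrete, here is a minimal sketch of what such a check could look like. It is not the code we ran and it does not use any Epic or ECCP API: `PayloadInspectionError`, `summarize_payload`, `check_payload`, and the field names in the usage note are all hypothetical, and the sketch assumes the incoming data arrives as a plain Python dict. The only idea it relies on is that an uncaught exception's message ends up somewhere you can read it, such as an error console.

```python
import json


class PayloadInspectionError(Exception):
    """Hypothetical custom error whose message summarizes the incoming data."""


def summarize_payload(payload: dict) -> dict:
    """Describe each field by type and size without echoing raw values."""
    summary = {}
    for key, value in payload.items():
        entry = {"type": type(value).__name__}
        if isinstance(value, (list, dict, str)):
            entry["len"] = len(value)
        if value is None:
            entry["missing"] = True
        summary[key] = entry
    return summary


def check_payload(payload: dict, expected_types: dict) -> None:
    """Raise a descriptive error if a field is absent or has an unexpected type."""
    problems = []
    for key, expected in expected_types.items():
        if key not in payload:
            problems.append(f"{key}: absent")
        elif not isinstance(payload[key], expected):
            found = type(payload[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {found}")
    if problems:
        # The message is JSON so it stays readable when copied out of a console.
        message = {"problems": problems, "summary": summarize_payload(payload)}
        raise PayloadInspectionError(json.dumps(message, sort_keys=True))
```

Calling `check_payload({"age": "70"}, {"age": int, "labs": list})` would raise an error reporting that `age` arrived as a string and that `labs` was absent. The summary deliberately reports types and lengths rather than values, which seemed like a reasonable default when the payload contains patient data.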

Part of the ECCP production debugging was inspired by another line of testing that we had conducted, which was diffing predictions. Diffing predictions grew out of a technique we had developed for analyzing prospective performance degradation. The basic premise is straightforward. Run the same information through two different implementations of the same

These techniques were:

  • ECCP Production Debugging
  • Diffing Patient-Level Predictions
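As a rough illustration of the second technique, the sketch below compares patient-level scores from two runs. It assumes each run can be exported as a mapping from patient ID to a single score. The function name `diff_predictions`, the tolerance, and the example IDs and scores are all made up for illustration and are not from the M-CURES project.

```python
import math


def diff_predictions(preds_a: dict, preds_b: dict, tol: float = 1e-6) -> dict:
    """Compare two {patient_id: score} mappings and report disagreements."""
    # Patients that appear in only one of the two runs.
    only_a = sorted(set(preds_a) - set(preds_b))
    only_b = sorted(set(preds_b) - set(preds_a))
    # Patients scored by both runs whose scores differ by more than tol.
    mismatched = {}
    for pid in sorted(set(preds_a) & set(preds_b)):
        a, b = preds_a[pid], preds_b[pid]
        if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
            mismatched[pid] = (a, b, abs(a - b))
    return {"only_in_a": only_a, "only_in_b": only_b, "mismatched": mismatched}


# Hypothetical scores for illustration only.
production = {"p1": 0.12, "p2": 0.80, "p3": 0.45}
reference = {"p1": 0.12, "p2": 0.75, "p4": 0.30}
report = diff_predictions(production, reference)
```

Here `report` would flag `p3` and `p4` as each being present in only one run and `p2` as a score mismatch. Separating "missing patient" from "different score" is a design choice: the two usually point to different problems (cohort selection versus feature processing).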

ECCP Production Debugging

During this implementation process I developed 2 techniques that could ev

some approaches for That being said, I did take a shot at doing a systematic

This is an area I’m particularly interested in and hopefully I can convince some peer reviewers that

It would be

It’s tricky, as you need to approach the system from a c

During this process - slate vs. manually running the model - production debugging
