Published in the Journal of Surgical Education.

Abstract

Objective

To validate the performance of a natural language processing (NLP) model in characterizing the quality of feedback provided to surgical trainees.

Design

Narrative feedback transcripts for surgical residents were collected from a large academic institution and classified for quality by trained coders. Seventy-five percent of the classified transcripts were used to train a logistic regression NLP model, and the remaining 25% were used to test it. The model was trained on the coder-classified transcripts and tested on the held-out transcripts with their classifications withheld, which it sorted into dichotomized high- and low-quality ratings. Model performance was assessed primarily by accuracy, with sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) as secondary measures.

Setting

A surgical residency program based in a large academic medical center.

Participants

All surgical residents who received feedback via the Society for Improving Medical Professional Learning smartphone application (SIMPL, Boston, MA) in August 2019.

Results

The model classified the quality (high vs. low) of 2,416 narrative feedback transcripts with an accuracy of 0.83 (95% confidence interval: 0.80, 0.86), a sensitivity of 0.37 (0.33, 0.45), a specificity of 0.97 (0.96, 0.98), and an AUROC of 0.86 (0.83, 0.87).
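The reported measures can be computed from coder labels and model scores as sketched below. This is a hedged illustration, not the study's code: it assumes label 1 (high quality) is the positive class, that scores are predicted probabilities dichotomized at a threshold of 0.5, and that confidence intervals come from a percentile bootstrap. The abstract does not state how its intervals were obtained, so `bootstrap_ci` is only one plausible method.

```python
import random

# Assumptions: label 1 = high-quality feedback (positive class), scores are
# predicted probabilities, and 0.5 is the classification threshold.

def _confusion(labels, scores, threshold=0.5):
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(labels, scores, threshold=0.5):
    tp, tn, fp, fn = _confusion(labels, scores, threshold)
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(labels, scores, threshold=0.5):
    # True-positive rate: share of high-quality transcripts rated high quality.
    tp, _, _, fn = _confusion(labels, scores, threshold)
    return tp / (tp + fn)

def specificity(labels, scores, threshold=0.5):
    # True-negative rate: share of low-quality transcripts rated low quality.
    _, tn, fp, _ = _confusion(labels, scores, threshold)
    return tn / (tn + fp)

def auroc(labels, scores):
    # Probability that a random positive outscores a random negative; ties count 1/2.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (an assumed CI method)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks a class; sensitivity/specificity/AUROC undefined
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

Accuracy is a prevalence-weighted average of the other two rates: accuracy = sensitivity × π + specificity × (1 − π), where π is the share of positive transcripts in the test set. If high-quality feedback is the positive class, the reported point estimates imply π ≈ (0.97 − 0.83) / (0.97 − 0.37) ≈ 0.23 (subject to rounding), which would explain how accuracy stays high even though sensitivity is 0.37.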

Conclusions

The NLP model classified the quality of operative performance feedback with high accuracy and specificity. NLP offers residency programs an efficient way to measure feedback quality. This information can inform efforts to improve feedback and, ultimately, the education of surgical trainees.

Natural Language Processing and Assessment of Resident Feedback Quality

Erkin Ötleş
