Forecasting emergency department boarding

Presented at SAIL 2026, this poster describes a prospective evaluation of methods for forecasting emergency department (ED) boarding across two UC San Diego Health emergency departments. We compared classical econometric models, machine learning models, time-series foundation models, and hybrid approaches in a live operational setting.

The central finding is that, for this kind of short-horizon operational forecasting problem, classical multivariate forecasting remains very difficult to beat. A well-specified vector autoregression (VAR) model consistently improved on the operational moving-average baseline at both La Jolla and Hillcrest. More complex hybrid and foundation-model approaches were sometimes helpful, but their gains were more modest and site-dependent.

I think this is an important result for healthcare AI. There is a natural tendency to assume that newer model classes should dominate older methods, especially as foundation models become available for more data modalities. However, in operational medicine, the best model is often the one that fits the structure of the problem, can be retrained reliably, produces interpretable outputs, and can be placed into a real decision-making workflow. In this case, a transparent multivariate time-series model provided a strong practical foundation for forecast-driven operational planning.

Presentation

Link to download.

Abstract

Background: Emergency department boarding is a major driver of crowding, delays in care, and downstream operational harm. Short-horizon forecasts of boarding volume may support proactive health-system actions, including discharge acceleration, intra-system transfers, and elective surgical adjustments. However, much of the prior forecasting literature focuses on ED arrivals and is evaluated retrospectively, rather than prospectively in live operational settings.

Methods: We prospectively compared six forecasting approaches spanning econometrics, machine learning, and time-series foundation models to predict daily ED boarding volume up to four days ahead (T+1 to T+4) across two UC San Diego Health EDs, La Jolla and Hillcrest. Daily covariates represented key operational drivers, including recent boarding volume, planned surgical volume, hospital census, and temporal indicators. Each morning at 07:10, models were retrained using all available data through day T and generated forecasts for T+1 to T+4 using an identical rolling evaluation framework. We evaluated vector autoregression (VAR), XGBoost, Google TimesFM, and two hybrid extensions (VAR+XGBoost and TimesFM+XReg), against a two-week moving-average operational baseline.

Results: During a four-month prospective validation period, VAR consistently outperformed the operational baseline at both sites. The VAR+XGBoost hybrid achieved the lowest RMSE for three of four forecast horizons at La Jolla and two of four forecast horizons at Hillcrest. Relative to baseline RMSE, VAR+XGBoost reduced forecast error by 16% to 42% at La Jolla and 5% to 19% at Hillcrest. However, the incremental gain over VAR alone was modest, highlighting the continued value of interpretable econometric structure for multivariate operational time series. TimesFM performance was site-dependent, and the addition of external regressors through TimesFM+XReg improved accuracy at La Jolla but only at shorter horizons at Hillcrest. VAR was implemented in live operations, with forecasts emailed daily to Mission Control and reviewed in the morning huddle to inform same-day planning. Forecast uncertainty was communicated using 80% confidence intervals, which demonstrated good coverage during prospective use.

Conclusion: This work demonstrates a practical learning health system workflow for forecast-driven decision support. It provides a prospective, real-world comparison of econometric, machine learning, and foundation-model approaches for ED boarding prediction, and suggests that classical multivariate forecasting remains a strong baseline even when modern foundation models and hybrid methods are available.

What we studied

ED boarding is an operationally meaningful forecasting target because it sits at the intersection of emergency care, inpatient capacity, transfer decisions, discharge planning, and elective procedural scheduling. Unlike ED arrivals, which are often modeled as a demand signal, boarding reflects the interaction between demand, hospital throughput, and capacity constraints. That makes it a more difficult forecasting problem, but also a more actionable one.

We compared six approaches for daily ED boarding forecasts:

1) Two-week moving average, used as the operational baseline

2) Vector autoregression (VAR), a classical econometric model for multivariate time series

3) XGBoost, a flexible machine learning model

4) TimesFM, a time-series foundation model

5) VAR + XGBoost, a hybrid model designed to combine structured time-series forecasting with nonlinear residual modeling

6) TimesFM + XReg, a foundation-model extension incorporating external regressors

The models were evaluated prospectively from July through October 2024 at two UC San Diego Health emergency departments, La Jolla and Hillcrest.
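
To make the evaluation setup concrete, here is a minimal sketch of the two-week moving-average baseline scored with a rolling-origin RMSE at horizons T+1 to T+4. It is an illustration under assumptions, not the code used in the study: the function names, the `start` parameter, and the toy series are all hypothetical, and the flat "repeat the 14-day mean" forecast is one plausible reading of a moving-average baseline.

```python
import math


def moving_average_forecast(history, window=14, horizons=4):
    """Flat forecast: mean of the last `window` days, repeated for every horizon."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizons


def rolling_rmse(series, forecaster, start, horizons=4):
    """Rolling-origin evaluation.

    At each origin day t, forecast T+1..T+horizons using only data through
    day t, then score each horizon against what was actually observed.
    """
    if start < 0 or start >= len(series) - horizons:
        raise ValueError("no forecast origins available for this start/horizon")
    squared_errors = {h: [] for h in range(1, horizons + 1)}
    for t in range(start, len(series) - horizons):
        history = series[: t + 1]  # data through day T only
        preds = forecaster(history)
        for h in range(1, horizons + 1):
            squared_errors[h].append((preds[h - 1] - series[t + h]) ** 2)
    return {h: math.sqrt(sum(e) / len(e)) for h, e in squared_errors.items()}


def pct_rmse_reduction(model_rmse, baseline_rmse):
    """Percent reduction in RMSE relative to the baseline (positive is better)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Made-up daily boarding counts, for illustration only.
toy_series = [12, 15, 11, 14, 18, 9, 8] * 5
baseline_rmse = rolling_rmse(toy_series, moving_average_forecast, start=13)
```

Any candidate model can be passed in place of `moving_average_forecast`, as long as it takes the history through day T and returns one value per horizon. Scoring every model through the same loop is what keeps the comparison fair.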

Operational implementation

The VAR model was implemented in live operations. Each morning at 07:10, updated forecasts were generated and emailed to Mission Control staff and health-system leaders. The forecasts were then reviewed during the daily huddle and used to support proactive operational planning, including discharge planning, intra-system transfers, and surgical-case scheduling.

This implementation detail matters. Retrospective forecasting studies can show whether a model might have been accurate under historical conditions, but prospective use tests a broader set of requirements. The data must be available on time. The pipeline must run reliably. The forecast must arrive early enough to shape operational decisions. The output must be understandable to the people who can act on it. In this project, the deployed forecast was not simply an academic prediction exercise; it was incorporated into the daily rhythm of health-system operations.

Key findings

  • VAR consistently outperformed the operational moving-average baseline at both sites.
  • VAR + XGBoost achieved the lowest RMSE at several horizons (three of four at La Jolla, two of four at Hillcrest), but only modestly improved on VAR alone.
  • TimesFM performance varied by site, performing worse than baseline at La Jolla and better than baseline at Hillcrest.
  • Adding external regressors (TimesFM + XReg) improved TimesFM at La Jolla, but only at shorter horizons at Hillcrest.
  • Uncertainty estimates from the deployed VAR model were communicated as 80% confidence intervals and showed good prospective coverage.
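
As a rough illustration of what "good coverage" means for the last point, the sketch below computes empirical coverage: the share of observed values that landed inside their forecast interval. The function name and all numbers are hypothetical and are not results from the poster.

```python
def interval_coverage(actuals, lowers, uppers):
    """Fraction of observed values that fall inside their forecast interval."""
    if not actuals or not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(1 for y, lo, hi in zip(actuals, lowers, uppers) if lo <= y <= hi)
    return hits / len(actuals)


# Made-up values for illustration only; not results from the poster.
actuals = [5, 12, 9, 20, 7]
lowers = [4, 8, 8, 10, 8]
uppers = [8, 14, 12, 18, 12]
coverage = interval_coverage(actuals, lowers, uppers)  # 3 of 5 inside -> 0.6
```

For a nominal 80% interval, empirical coverage close to 0.8 over the evaluation period suggests the intervals are well calibrated. Coverage far below that means the intervals are too narrow, and coverage far above it means they are wider than they need to be.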

Why this matters

For me, the most interesting part of this work is not that one model won. It is that the results complicate a simple narrative about model sophistication. Modern foundation models and flexible machine learning approaches are powerful, but operational healthcare forecasting is not only a model-selection problem. It is also a data availability problem, a workflow problem, and a decision-support problem.

In this setting, the classical VAR model had several advantages. It could directly model relationships among operational time series. It could be retrained in a straightforward rolling framework. It produced forecasts and uncertainty estimates that were relatively easy to communicate. It also fit the cadence of the operational workflow, where daily forecasts needed to be available early enough to support planning.
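
To show what "directly model relationships among operational time series" looks like in practice, here is a deliberately small VAR(1) sketch in plain Python. It is an assumption-laden illustration, not the deployed model: the poster does not state the lag order or specification, so the single lag, the absence of exogenous terms, and every function name here are hypothetical. In real use one would reach for an established library such as statsmodels rather than a hand-rolled solver.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; series may be collinear or too short")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_var1(rows):
    """Fit y_t = c + A y_{t-1} by ordinary least squares, one equation per series.

    rows: list of daily observations, each a list of k values.
    Returns k coefficient lists of the form [intercept, a_1, ..., a_k].
    """
    k = len(rows[0])
    design = [[1.0] + list(prev) for prev in rows[:-1]]  # yesterday's values
    xtx = [[sum(x[i] * x[j] for x in design) for j in range(k + 1)]
           for i in range(k + 1)]
    coefs = []
    for target in range(k):
        y = [row[target] for row in rows[1:]]  # today's value of one series
        xty = [sum(x[i] * yi for x, yi in zip(design, y)) for i in range(k + 1)]
        coefs.append(solve(xtx, xty))
    return coefs


def forecast_var1(coefs, last, horizons=4):
    """Iterate the fitted equations forward, feeding each forecast back in."""
    preds, state = [], list(last)
    for _ in range(horizons):
        state = [c[0] + sum(a * s for a, s in zip(c[1:], state)) for c in coefs]
        preds.append(state)
    return preds
```

Each series gets its own equation, but every equation sees yesterday's values of all series, so a change in (for example) hospital census can carry through to the boarding forecast. The fitted coefficients are also directly readable, which is part of what makes this model class easy to explain. Retraining in a rolling framework amounts to calling `fit_var1` on the data through day T each morning and then `forecast_var1` from the latest observation.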

The broader lesson is that healthcare AI systems should be evaluated in the context in which they will be used. For operational forecasting, a useful system needs to be accurate enough, reliable enough, understandable enough, and timely enough to support action. Prospective evaluation is essential because it captures the frictions that retrospective experiments often miss.

Citation

Poursoltan L, Ötleş E, Cao J, Clay B, Trimble B, Adrid L, Pan J, Chua A, Bell J, Longhurst CA, Zhu K, Singh K. Prospective comparison of econometric, machine learning, and foundation models for forecasting emergency department boarding patients. Presented at SAIL 2026.

Cheers,

Erkin