Using Complex Bayesian Hierarchical Models for “Detection” with Count Data

Ben Brintz

IDEAS / VA

Domains of Interest as Collaborative Statistician

  • Clinical Prediction & Decision Support
  • Design & Analysis

Complex Bayesian Hierarchical Models in Stan

  • Two examples of work in progress:
    • Stochastic SEIR models that incorporate imperfect detection - COVID-19
    • Early detection of call spikes - United Way 211 Calls (Food Insecurity Resources)

Note

SEIR models track transitions between susceptible, exposed, infectious, and removed states to represent epidemic dynamics.
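
To make the four states concrete, here is a minimal deterministic, discrete-time SEIR sketch. It is not the stochastic model from this talk; the parameter names (`beta`, `sigma`, `gamma`) and their values are illustrative assumptions.

```python
def seir_step(s, e, i, r, beta, sigma, gamma):
    """Advance a deterministic SEIR model by one day."""
    n = s + e + i + r
    new_exposed = beta * s * i / n   # S -> E: contacts with infectious people
    new_infectious = sigma * e       # E -> I: end of incubation
    new_removed = gamma * i          # I -> R: recovery or removal
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed)


def simulate_seir(days, s0, e0, i0, r0, beta=0.4, sigma=0.2, gamma=0.1):
    """Return the list of (S, E, I, R) states for day 0..days."""
    state = (s0, e0, i0, r0)
    path = [state]
    for _ in range(days):
        state = seir_step(*state, beta, sigma, gamma)
        path.append(state)
    return path


# Illustrative values only: 10,000 people, 10 initially infectious.
path = simulate_seir(days=120, s0=9990.0, e0=0.0, i0=10.0, r0=0.0)
peak_day = max(range(len(path)), key=lambda t: path[t][2])
```

Every transition moves people from one compartment to the next, so the total population is conserved at each step.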

Case study 1: COVID-19 is not perfectly reported

  • Latent disease transmission is never directly observed
  • Reported cases reflect testing intensity, reporting practices, and asymptomatic disease
    • Only the sickest got diagnosed with COVID-19 early in the pandemic
    • “We must rely on data from people who are sick enough to get themselves tested, which is a bit like trying to understand exercise trends among average Americans by surveying the participants of a marathon” - Utah Hero Project
  • Key capability: separate what is happening from what is observed.

Various approaches have been developed to estimate clinical detection rates

  • Back-calculation methods estimate the number of infections from hospitalizations and deaths
  • Seroprevalence studies estimate the proportion of the population that has been infected by testing for antibodies
  • Compartmental models estimate the number of infections from observed cases and the distribution of the incubation period
    • E.g. SIR (Susceptible-Infectious-Recovered) models
  • Key Challenge: identifiability issues arise when trying to estimate both the transmission rate and the detection rate from observed data, especially when testing practices change over time.
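
The identifiability problem can be seen with simple arithmetic. In the sketch below, the function name and the numbers are hypothetical: two very different epidemics produce the same expected number of reported cases, so reported counts alone cannot tell them apart.

```python
import math


def expected_reported(true_infections, detection_rate):
    """Expected reported cases when each infection is detected independently."""
    return true_infections * detection_rate


# Hypothetical scenarios: a small, well-observed outbreak versus a large,
# poorly observed one.
scenario_a = expected_reported(true_infections=1000, detection_rate=0.40)
scenario_b = expected_reported(true_infections=4000, detection_rate=0.10)

same_signal = math.isclose(scenario_a, scenario_b)
```

Both scenarios imply about 400 expected reported cases. Separating them requires extra structure, such as priors, a hierarchy across districts, or temporal dynamics.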

We propose a novel hierarchical Bayesian SEIR model using reported counts of new cases

  • We model detection with a continuous approximation of the binomial distribution (Stan cannot sample discrete parameters)
    • Lay interpretation: we assume we’re only observing a proportion of true infections
  • Model assumes transmission follows a hierarchical AR(1) process over time
    • Lay interpretation: transmission this week tends to resemble recent weeks, while still allowing small differences between districts.
  • Model estimation allows for different start times of infections across health-districts
  • Allows evolving incubation and recovery rates (E->I and I->R transitions) using a novel approximation of a beta-binomial transition
    • Lay interpretation: the time it takes for people to become infectious/recover can change over time, and we model this flexibility directly rather than assuming it is constant.
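
One common way to build a continuous approximation of the binomial is a moment-matched normal density. The slides do not say which approximation the model uses, so the sketch below is an assumption for illustration; the function names are hypothetical.

```python
import math


def binomial_logpmf(k, n, p):
    """Exact binomial log-pmf; meaningful only for an integer number of trials n."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)


def normal_approx_logpdf(k, n, p):
    """Normal log-density with the binomial mean and variance.

    Here n (latent true infections) may be any positive real number,
    which is what a sampler restricted to continuous parameters needs.
    """
    mean = n * p
    var = n * p * (1.0 - p)
    return -0.5 * math.log(2.0 * math.pi * var) - (k - mean) ** 2 / (2.0 * var)


# Illustrative values: 120 reported cases out of 1000 true infections
# with a 12% detection rate.
exact = binomial_logpmf(k=120, n=1000, p=0.12)
approx = normal_approx_logpdf(k=120, n=1000.0, p=0.12)
```

The approximation is close when `n * p * (1 - p)` is large and degrades for small counts, which matters early in an outbreak.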

Modeling strategy: allow flexibility where variability is real.

Model estimation allows for different start times of infections across health-districts

Note

These observed data aren’t smooth SEIR curves

Estimating evolving incubation and recovery rates using beta-binomial transitions
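
For intuition, the sketch below simulates an exact beta-binomial transition: the transition probability is itself drawn from a beta distribution before the binomial draw. The talk uses a novel approximation that is not specified in these slides, so this is only the textbook version with illustrative parameter values.

```python
import random


def beta_binomial_draw(n, a, b, rng):
    """Draw how many of n individuals leave a compartment in one step."""
    p = rng.betavariate(a, b)  # transition probability varies from step to step
    return sum(1 for _ in range(n) if rng.random() < p)


def beta_binomial_moments(n, a, b):
    """Mean and variance of a beta-binomial(n, a, b) count."""
    mean = n * a / (a + b)
    var = n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))
    return mean, var


rng = random.Random(1)
# Illustrative values: 200 infectious people, average removal probability 0.2.
draws = [beta_binomial_draw(200, a=4.0, b=16.0, rng=rng) for _ in range(2000)]
mean, var = beta_binomial_moments(200, 4.0, 16.0)
binomial_var = 200 * 0.2 * 0.8  # variance if the probability were fixed at 0.2
```

The beta-binomial variance is much larger than the plain binomial variance with the same mean, which is the extra flexibility the transitions need.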

Estimating an evolving transmission rate using a hierarchical AR(1) process
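
A hierarchical AR(1) process can be sketched as follows. This is one plausible structure, assumed for illustration: each district has its own long-run mean drawn around a shared mean, and its log transmission rate reverts toward that mean over time. The names `mu`, `tau`, `phi`, and `sigma` and their values are hypothetical.

```python
import math
import random


def simulate_hierarchical_ar1(n_districts, n_weeks, mu=-1.0, tau=0.1,
                              phi=0.9, sigma=0.05, seed=0):
    """Simulate weekly transmission rates for several districts."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_districts):
        mu_d = rng.gauss(mu, tau)  # district mean, partially pooled toward mu
        x = mu_d                   # log transmission rate, started at its mean
        path = []
        for _ in range(n_weeks):
            # AR(1): this week resembles last week, pulled back toward mu_d.
            x = mu_d + phi * (x - mu_d) + rng.gauss(0.0, sigma)
            path.append(math.exp(x))  # exponentiate so the rate stays positive
        paths.append(path)
    return paths


paths = simulate_hierarchical_ar1(n_districts=3, n_weeks=52)
```

Working on the log scale keeps every simulated transmission rate positive without extra constraints.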

Utah COVID-19 clinical detection rate (posterior density)

Utah Estimated True Infections vs Reported Cases

Case study 2: Can spikes in 211 calls be detected early?

  • 211 is a nationwide, free public referral system that responds to >20 million requests for help each year from U.S. residents
  • We have Utah call counts from 2019 to 2025.
  • Calls are often zero or low volume, but occasionally spike when there are major events (e.g. COVID-19, natural disasters, economic shocks)
  • Can we capture these spikes early to help with resource allocation and response?

How we model the call data

  • Zero-inflated negative binomial (ZINB) model for daily call counts
    • Lay interpretation: some days truly have no demand, while other days vary widely and can spike.
  • Day-of-week effects included in both the zero-inflated and NB count components
    • Lay interpretation: Mondays and weekends can have different baseline call patterns.
  • Latent intensity evolves through an AR(1) state process over time
    • Lay interpretation: call demand today is related to recent days, not independent day-to-day.
  • Annual seasonality represented with Fourier harmonics
    • Lay interpretation: recurring yearly patterns are modeled smoothly rather than with hard cutoffs.
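
The ZINB likelihood mixes a point mass at zero with a negative binomial count. The sketch below assumes the mean and dispersion parameterization (the one Stan calls `neg_binomial_2`); the function names and the example values are hypothetical.

```python
import math


def nb2_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and dispersion phi."""
    return (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
            + y * math.log(mu / (mu + phi))
            + phi * math.log(phi / (mu + phi)))


def zinb_logpmf(y, pi_zero, mu, phi):
    """Zero-inflated negative binomial log-pmf.

    pi_zero is the probability of a structural zero (a day with no demand).
    """
    nb = nb2_logpmf(y, mu, phi)
    if y == 0:
        # A zero is either structural or an ordinary NB zero.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(nb))
    return math.log1p(-pi_zero) + nb


# Sanity check with illustrative values: the probabilities should sum to one.
total = sum(math.exp(zinb_logpmf(y, 0.3, 5.0, 2.0)) for y in range(2000))
```

In a full model, `pi_zero` and `mu` would each depend on day-of-week effects, with `mu` also driven by the latent AR(1) intensity and seasonality.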

Modeling strategy: match distributional assumptions to observed call behavior.
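
Annual seasonality with Fourier harmonics amounts to regressing on paired sine and cosine terms. The sketch below is a generic construction; the number of harmonics, the period of 365.25 days, and the coefficient values are assumptions, not the values used in the model.

```python
import math


def fourier_features(day, n_harmonics=2, period=365.25):
    """Return [sin, cos] pairs for harmonics 1..n_harmonics at a given day."""
    feats = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * math.pi * k * day / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def seasonal_effect(day, coefs, period=365.25):
    """Smooth seasonal term: a weighted sum of the Fourier features."""
    feats = fourier_features(day, len(coefs) // 2, period)
    return sum(c * f for c, f in zip(coefs, feats))


# Hypothetical coefficients for two harmonics (sin1, cos1, sin2, cos2).
coefs = [0.30, -0.10, 0.05, 0.02]
effect_today = seasonal_effect(45, coefs)
effect_next_year = seasonal_effect(45 + 365.25, coefs)
```

The effect repeats exactly after one period, and more harmonics allow sharper seasonal features at the cost of more coefficients.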

How early warning works

  • Smooth logistic change-point ramps capture medium-term structural shifts
    • The number of ramps is pre-specified, but horseshoe priors let the model shrink away ramps the data do not support
    • Lay interpretation: the model allows gradual shifts in demand level but avoids overreacting to noise.
  • Daily shock term captures short-lived departures from baseline
    • Lay interpretation: unusual surges are flagged when calls jump above expected patterns.
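
A smooth logistic change-point ramp can be sketched as a sigmoid in time. The parameterization below (magnitude `delta`, midpoint `tau`, and `width` in days) is an assumption for illustration, as are the example values.

```python
import math


def logistic_ramp(t, tau, width):
    """Smooth step from 0 to 1 centered at time tau.

    Written with tanh, which equals the logistic function but cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * (t - tau) / width))


def level_shift(t, ramps):
    """Total shift in (log) demand at time t from a list of ramps.

    Each ramp is a (delta, tau, width) tuple. A shrinkage prior such as the
    horseshoe would pull most delta values toward zero.
    """
    return sum(delta * logistic_ramp(t, tau, width) for delta, tau, width in ramps)


# Hypothetical example: one real shift around day 80, one ramp shrunk to zero.
ramps = [(0.8, 80.0, 3.0), (0.0, 200.0, 3.0)]
before = level_shift(40.0, ramps)
midpoint = level_shift(80.0, ramps)
after = level_shift(120.0, ramps)
```

A small `width` approaches an abrupt change point, while a large `width` gives a gradual drift in demand level.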

Capability: separate expected temporal structure from transient anomalies.

Prospective model fitting from Mar 25 to Apr 10, 2020 (COVID-19 example)

Orange dots are observed call counts. The dark grey line and light grey band show the model fit and its posterior credible interval. The green bar near the x-axis shows the 90% credible interval for change-point timing, and the dashed blue line marks the current date.

Prospective model fitting from Mar 25 to Apr 10, 2020 (COVID-19 example)

Top panel overlays raw calls (orange points) with posterior mean and 90% credible interval. Bottom panel shows posterior kappa points with reference lines at 1.0 and +/-10% (y-axis fixed to 0.5-1.5).

Practical decision use is still a work in progress

  • Posterior expected calls are compared against latent baseline expectations
  • Kappa values above a certain level indicate anomalous behavior, but thresholds for action are still being determined
  • Tau and ramps can be used to show a systematic shift in call behavior, but how to use this information for decision-making is still being explored
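
One possible decision rule, sketched under assumptions: treat kappa as the ratio of posterior expected calls to the latent baseline, and report the posterior probability that it exceeds a threshold. The ratio definition, the 1.10 threshold (borrowed from the +10% reference line), and the toy draws below are all hypothetical.

```python
def exceedance_probability(expected_draws, baseline_draws, threshold=1.10):
    """Share of posterior draws where expected calls exceed baseline by the threshold."""
    kappas = [e / b for e, b in zip(expected_draws, baseline_draws)]
    return sum(k > threshold for k in kappas) / len(kappas)


# Toy numbers standing in for posterior draws on a single day.
expected_draws = [30.0, 36.0, 41.0, 33.0]
baseline_draws = [30.0, 30.0, 31.0, 29.0]
prob = exceedance_probability(expected_draws, baseline_draws)  # 0.75
```

A rule of this kind would flag a day when `prob` passes an agreed cutoff, and choosing that cutoff is exactly the open question noted above.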

What unifies this work

  • Both projects involve detection of latent phenomena (true infections, call spikes) from observed count data.
  • The complexity of these models requires an efficient Bayesian hierarchical approach
  • Latent-process Bayesian models on count data with temporal dependence
  • Flexible distributions matched to real-world count data

Thank you

Questions or collaborations welcome.