Using Complex Bayesian Hierarchical Models for “Detection” with Count Data

Ben Brintz

IDEAS / VA

Domains of Interest as a Collaborative Statistician

Clinical Prediction & Decision Support

Design & Analysis

Complex Bayesian Hierarchical Models in Stan

Two examples of work in progress:
- Stochastic SEIR models that incorporate imperfect detection - COVID-19
- Early detection of call spikes - United Way 211 Calls (Food Insecurity Resources)

Note

SEIR models track transitions between susceptible, exposed, infectious, and removed states to represent epidemic dynamics.
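The stochastic SEIR dynamics mentioned above can be sketched as a minimal discrete-time chain-binomial update. The sketch assumes daily time steps and constant per-day rates. The rates `beta`, `sigma` and `gamma`, the population size, and the helper names are illustrative assumptions, not estimates or code from this work.

```python
import math
import random

def binomial(rng, n, p):
    """Draw Binomial(n, p) by summing n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def seir_step(state, beta, sigma, gamma, rng):
    """One day of a chain-binomial SEIR model.

    beta: transmission rate, sigma: E->I rate, gamma: I->R rate (per day).
    Returns the new state and the number of people who became infectious.
    """
    s, e, i, r = state
    n = s + e + i + r
    # Convert per-day hazards into per-day transition probabilities
    p_inf = 1 - math.exp(-beta * i / n)  # S -> E
    p_inc = 1 - math.exp(-sigma)         # E -> I
    p_rec = 1 - math.exp(-gamma)         # I -> R
    new_e = binomial(rng, s, p_inf)
    new_i = binomial(rng, e, p_inc)
    new_r = binomial(rng, i, p_rec)
    return (s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r), new_i

rng = random.Random(1)
state = (990, 0, 10, 0)  # hypothetical population of 1000 with 10 initially infectious
incidence = []
for day in range(60):
    state, new_infectious = seir_step(state, beta=0.5, sigma=1 / 3, gamma=1 / 5, rng=rng)
    incidence.append(new_infectious)
print(state, sum(state))
```

Because every transition is a binomial draw from the compartment it leaves, the total population stays fixed and no compartment can go negative.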

Case study 1: COVID-19 is not perfectly reported

Latent disease transmission is never directly observed
Reported cases reflect testing intensity, reporting practices, and asymptomatic disease
- Only the sickest got diagnosed with COVID-19 early in the pandemic
- “We must rely on data from people who are sick enough to get themselves tested, which is a bit like trying to understand exercise trends among average Americans by surveying the participants of a marathon” - Utah Hero Project
Key capability: separate what is happening from what is observed.

Various approaches have been developed to estimate clinical detection rates

Back-calculation methods estimate the number of infections from hospitalizations and deaths
Seroprevalence studies estimate the proportion of the population that has been infected by testing for antibodies
Compartmental models estimate the number of infections from observed cases and the distribution of the incubation period
- E.g., SIR (Susceptible-Infectious-Recovered) models
Key challenge: identifiability issues arise when trying to estimate both the transmission rate and the detection rate from observed data, especially when testing practices change over time.
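Here is a toy illustration of one source of the identifiability problem. It assumes pure exponential growth early in an epidemic, which is a simplification and not the proposed model. Expected reported cases are p · I0 · exp(r·t), so the detection rate p and the unknown initial number of infections I0 enter only through their product. The names p, i0 and r are assumptions made for this sketch.

```python
import math

def expected_reported(p, i0, r, days):
    """Expected reported cases under pure exponential growth: p * I0 * exp(r * t)."""
    return [p * i0 * math.exp(r * t) for t in range(days)]

# High detection with few infections vs. low detection with many infections
a = expected_reported(p=0.50, i0=20, r=0.2, days=10)
b = expected_reported(p=0.10, i0=100, r=0.2, days=10)
print(all(math.isclose(x, y) for x, y in zip(a, b)))  # True: the data cannot tell them apart
```

Separating the two scenarios requires extra structure or information beyond early case counts, such as informative priors or later-epidemic dynamics.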

We propose a novel hierarchical Bayesian SEIR model using reported counts of new cases

We model detection with a continuous approximation of the binomial distribution (Stan does not support discrete parameters)
- Lay interpretation: we assume we’re only observing a proportion of true infections
Model assumes transmission follows a hierarchical AR(1) process over time
- Lay interpretation: transmission this week tends to resemble recent weeks, while still allowing small differences between districts.
Model estimation allows for different start times of infections across health districts
Model allows evolving incubation and recovery rates (E->I and I->R transitions) using a novel approximation of a beta-binomial transition
- Lay interpretation: the time it takes for people to become infectious/recover can change over time, and we model this flexibility directly rather than assuming it is constant.
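The slides do not say which continuous approximation of the binomial detection step is used. As a sketch, assume the common moment-matched normal approximation: reported cases y given n true infections and detection rate p are treated as Normal(n·p, n·p·(1−p)). The code compares this approximation against the exact binomial log-pmf at an integer point. The function names are hypothetical.

```python
import math

def binomial_logpmf(y, n, p):
    """Exact binomial log-pmf for integer y and n."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))

def normal_approx_logpdf(y, n, p):
    """Moment-matched normal approximation: y ~ Normal(n p, n p (1 - p)).

    Unlike the binomial, this is defined for a non-integer latent count n,
    so n can be a continuous parameter in a gradient-based sampler.
    """
    mean = n * p
    var = n * p * (1 - p)
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

# 200 reported cases out of 1000 true infections, detection rate 0.2
exact = binomial_logpmf(200, 1000, 0.2)
approx = normal_approx_logpdf(200, 1000, 0.2)
print(round(exact, 3), round(approx, 3))
# The approximation also accepts a non-integer latent count
print(normal_approx_logpdf(200, 1000.5, 0.2))
```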

Modeling strategy: allow flexibility where variability is real.

Model estimation allows for different start times of infections across health districts

Note

These observed data aren’t smooth SEIR curves
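The slides do not describe how district-specific start times are parameterized. One generic way to keep a start time continuous, which Stan requires, is to replace a hard on/off switch at day t0 with a logistic ramp. This is an assumption for illustration, not necessarily the approach used in this model. The district names, start times and seed size below are hypothetical.

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def seeding_weight(t, t0, scale=1.0):
    """Smooth 0-to-1 switch around a continuous start time t0 (hypothetical).

    A hard step (infections begin exactly on day t0) would make t0 discrete;
    a logistic ramp keeps the likelihood differentiable in t0.
    """
    return inv_logit((t - t0) / scale)

# Hypothetical start times (days since a reference date) for three districts
start_times = {"district_a": 5.0, "district_b": 12.5, "district_c": 20.0}
seed_size = 10.0  # hypothetical initial infections once a district "switches on"
for name, t0 in start_times.items():
    seeds = [round(seed_size * seeding_weight(t, t0), 2) for t in range(0, 30, 5)]
    print(name, seeds)
```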

Estimating evolving incubation and recovery rates using beta-binomial transitions
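The novel approximation used in the model is not described here. The sketch below only shows the beta-binomial transition it approximates. It assumes a mean transition probability `mu_t` that drifts over time on the logit scale and a precision `phi`. The sketch computes the exact beta-binomial mean and variance and reports how much extra variance `phi` adds relative to a plain binomial. Matching these moments with a continuous distribution is one generic route to a differentiable transition, but it is not necessarily the approach used in this model. All values and function names are hypothetical.

```python
import math

def beta_binomial_moments(n, mu, phi):
    """Mean and variance of BetaBinomial(n, a = mu * phi, b = (1 - mu) * phi)."""
    mean = n * mu
    var = n * mu * (1 - mu) * (phi + n) / (phi + 1)
    return mean, var

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def transition_moments(n_from, logit_mu_series, phi):
    """Moments of daily transitions out of a compartment (e.g. E->I) when the
    mean transition probability mu_t drifts over time on the logit scale."""
    out = []
    for logit_mu in logit_mu_series:
        mu = inv_logit(logit_mu)
        mean, var = beta_binomial_moments(n_from, mu, phi)
        binom_var = n_from * mu * (1 - mu)
        out.append((mean, var, var / binom_var))  # last item = overdispersion factor
    return out

# Hypothetical: 500 people in E, slowly increasing E->I probability, phi = 50
logit_mu = [-1.0 + 0.05 * t for t in range(10)]
moments = transition_moments(500, logit_mu, phi=50.0)
for t, (mean, var, factor) in enumerate(moments):
    print(t, round(mean, 1), round(var, 1), round(factor, 2))
```

The overdispersion factor equals (phi + n) / (phi + 1), so a smaller `phi` allows more day-to-day variability in transitions than a binomial would.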

Estimating an evolving transmission rate using a hierarchical AR(1) process
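The following is a minimal simulation of a hierarchical AR(1) process for transmission. It assumes weekly steps, a shared statewide AR(1) trend on log transmission, and smaller district-level AR(1) deviations with the same autocorrelation `rho`. This exact structure, the baseline of 0.3 and all scale values are assumptions for illustration, not the model's specification.

```python
import math
import random

def simulate_hierarchical_ar1(n_districts, n_weeks, rho, sigma_global,
                              sigma_district, mu=math.log(0.3), seed=0):
    """Simulate log transmission rates: shared AR(1) trend plus district deviations.

    log beta[d][t] = mu + g[t] + u[d][t]
      g[t]    = rho * g[t-1]    + Normal(0, sigma_global)    (shared trend)
      u[d][t] = rho * u[d][t-1] + Normal(0, sigma_district)  (small district departures)
    """
    rng = random.Random(seed)
    g = [0.0]
    for _ in range(1, n_weeks):
        g.append(rho * g[-1] + rng.gauss(0.0, sigma_global))
    betas = []
    for _ in range(n_districts):
        u = [0.0]
        for _ in range(1, n_weeks):
            u.append(rho * u[-1] + rng.gauss(0.0, sigma_district))
        betas.append([math.exp(mu + g_t + u_t) for g_t, u_t in zip(g, u)])
    return betas

betas = simulate_hierarchical_ar1(n_districts=3, n_weeks=12, rho=0.8,
                                  sigma_global=0.2, sigma_district=0.05)
for d, row in enumerate(betas):
    print(d, [round(b, 3) for b in row])
```

Making `sigma_district` much smaller than `sigma_global` is what lets districts share most of their week-to-week movement while still differing slightly.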

Utah COVID-19 clinical detection rate (posterior density)

Utah Estimated True Infections vs Reported Cases

Case study 2: Can spikes in 211 calls be detected early?

211 is a nationwide, free public referral system that responds to >20 million requests for help each year from U.S. residents
We have Utah call counts from 2019 to 2025.
Daily call counts are often zero or low, but occasionally spike during major events (e.g., COVID-19, natural disasters, economic shocks)
Can we capture these spikes early to help with resource allocation and response?

How we model the call data

Zero-inflated negative binomial (ZINB) model for daily call counts
- Lay interpretation: some days truly have no demand, while other days vary widely and can spike.
Day-of-week effects included in both the zero-inflation and NB count components
- Lay interpretation: Mondays and weekends can have different baseline call patterns.
Latent intensity evolves through an AR(1) state process over time
- Lay interpretation: call demand today is related to recent days, not independent day-to-day.
Annual seasonality represented with Fourier harmonics
- Lay interpretation: recurring yearly patterns are modeled smoothly rather than with hard cutoffs.
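A minimal sketch of a ZINB log-likelihood for a single day's count follows. It uses the mean/dispersion form of the negative binomial (variance mu + mu²/phi), the same parameterization as Stan's `neg_binomial_2`. The values of pi, mu and phi are hypothetical. In the full model, pi and mu would depend on day of week, seasonality and the latent state.

```python
import math

def nb_logpmf(y, mu, phi):
    """Negative binomial log-pmf with mean mu and dispersion phi (var = mu + mu^2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu)) + y * math.log(mu / (phi + mu)))

def zinb_logpmf(y, pi, mu, phi):
    """Zero-inflated NB: with probability pi the day is a structural zero."""
    if y == 0:
        # A zero can come from the inflation component or from the NB itself
        return math.log(pi + (1 - pi) * math.exp(nb_logpmf(0, mu, phi)))
    return math.log1p(-pi) + nb_logpmf(y, mu, phi)

pi, mu, phi = 0.3, 4.0, 2.0
probs = [math.exp(zinb_logpmf(y, pi, mu, phi)) for y in range(200)]
print(round(sum(probs), 6))  # total probability, approximately 1
print(round(probs[0], 4))    # P(y = 0) = pi + (1 - pi) * (phi / (phi + mu)) ** phi
```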

Modeling strategy: match distributional assumptions to observed call behavior.
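The mean-structure pieces listed above (day-of-week effects, Fourier seasonality and an AR(1) latent state) could combine on the log scale for the NB mean as follows. This sketch assumes a log link and two harmonics, and treats the day index as day of year (a hypothetical series starting on Jan 1). All coefficient values and function names are hypothetical. Day-of-week effects would enter the zero-inflation probability in the same way, on the logit scale.

```python
import math
import random

def fourier_terms(day_of_year, n_harmonics=2, period=365.25):
    """Sine/cosine pairs that give smooth annual seasonality."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * day_of_year / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms

def log_mean_series(n_days, intercept, dow_effects, fourier_coefs, rho, sigma, seed=0):
    """log mu_t = intercept + dow[t % 7] + seasonal(t) + x_t, with x_t an AR(1) state."""
    rng = random.Random(seed)
    x = 0.0
    out = []
    for t in range(n_days):
        if t > 0:
            x = rho * x + rng.gauss(0.0, sigma)  # today's state depends on yesterday's
        seasonal = sum(b * f for b, f in zip(fourier_coefs, fourier_terms(t)))
        out.append(intercept + dow_effects[t % 7] + seasonal + x)
    return out

# Hypothetical values: 7 day-of-week effects, 2 harmonics -> 4 Fourier coefficients
dow = [0.3, 0.1, 0.1, 0.0, 0.0, -0.4, -0.5]
coefs = [0.2, -0.1, 0.05, 0.0]
log_mu = log_mean_series(30, intercept=1.0, dow_effects=dow,
                         fourier_coefs=coefs, rho=0.7, sigma=0.1)
mu = [math.exp(v) for v in log_mu]
print([round(m, 2) for m in mu[:7]])
```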

How early warning works

Smooth logistic change-point ramps capture medium-term structural shifts
- The number of ramps is pre-specified, but horseshoe priors on their sizes let unneeded ramps shrink toward zero
- Lay interpretation: the model allows gradual shifts in demand level but avoids overreacting to noise.
Daily shock term captures short-lived departures from baseline
- Lay interpretation: unusual surges are flagged when calls jump above expected patterns.
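The following sketch covers the two early-warning components. It assumes the logistic ramps act additively on log expected calls, and that each ramp size has a horseshoe prior: a normal scaled by a global scale times a half-Cauchy local scale. It also treats the daily shock as a multiplier where 1.0 means "no shock". That multiplier form, the number of ramps, the ramp locations and all scales are assumptions made for illustration.

```python
import math
import random

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def ramp_effect(t, taus, deltas, width):
    """Sum of smooth logistic steps on the log scale.

    Ramp k shifts log expected calls by deltas[k], centered on day taus[k];
    `width` controls how gradual each shift is.
    """
    return sum(d * inv_logit((t - tau) / width) for tau, d in zip(taus, deltas))

def draw_horseshoe(n, global_scale, rng):
    """Prior draws: delta_k ~ Normal(0, global_scale * lambda_k), lambda_k ~ Half-Cauchy(0, 1)."""
    draws = []
    for _ in range(n):
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # half-Cauchy via inverse CDF
        draws.append(rng.gauss(0.0, global_scale * lam))
    return draws

rng = random.Random(42)
taus = [10.0, 25.0, 40.0]                     # K = 3 ramps fixed in advance (hypothetical)
deltas = draw_horseshoe(len(taus), 0.1, rng)  # most sizes near 0, a few can be large

# Shape of a single ramp of size 1.0 centered on day 20
before = ramp_effect(0, [20.0], [1.0], width=2.0)   # close to 0
middle = ramp_effect(20, [20.0], [1.0], width=2.0)  # exactly half the shift
after = ramp_effect(60, [20.0], [1.0], width=2.0)   # close to 1

# Hypothetical daily shock as a multiplier on top of the smooth baseline
baseline_log = 1.5
shock_today = 1.4
expected_today = math.exp(baseline_log + ramp_effect(30, [20.0], [1.0], width=2.0)) * shock_today
print(round(before, 5), middle, round(after, 5), round(expected_today, 2))
```

The heavy-tailed local scale is what lets the horseshoe keep a few large shifts while shrinking the rest, which matches the goal of reacting to real changes but not to noise.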

Capability: separate expected temporal structure from transient anomalies.

Prospective model fitting from Mar 25 to Apr 10, 2020 (COVID-19 example)

Orange dots are observed call counts. The dark grey line and light grey band show the model fit and posterior credible intervals. Green bars near the x-axis represent 90% credible intervals for change-point timing, and the dashed blue line marks the current date.

Prospective model fitting from Mar 25 to Apr 10, 2020 (COVID-19 example)

Top panel overlays raw calls (orange points) with the posterior mean and 90% credible interval. Bottom panel shows posterior kappa estimates as points, with reference lines at 1.0 and ±10% (0.9 and 1.1); the y-axis is fixed to 0.5 to 1.5.

Practical decision use is still a work in progress

Posterior expected calls are compared against latent baseline expectations
Kappa values above a certain level indicate anomalous behavior, but thresholds for action are still being determined
Tau and the ramps can indicate a systematic shift in call behavior, but how to use this information for decision-making is still being explored
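Action thresholds are still being determined, so the rule below is only a hypothetical illustration of turning posterior draws of kappa into a flag. It treats kappa as a multiplier around 1.0, as the reference line at 1.0 suggests. It flags a day when the posterior probability that kappa exceeds 1.10 (the +10% reference line) is at least 0.90. Both cutoffs are placeholders, and the draws are simulated rather than produced by the model.

```python
import random

def exceedance_probability(draws, threshold):
    """Share of posterior draws above threshold, an estimate of P(kappa_t > threshold | data)."""
    return sum(1 for d in draws if d > threshold) / len(draws)

def flag_day(draws, threshold=1.10, min_prob=0.90):
    """Hypothetical rule: flag if we are at least min_prob sure kappa exceeds threshold."""
    return exceedance_probability(draws, threshold) >= min_prob

rng = random.Random(7)
quiet_day = [rng.gauss(1.00, 0.03) for _ in range(4000)]  # simulated draws near 1.0
spike_day = [rng.gauss(1.30, 0.05) for _ in range(4000)]  # simulated draws well above 1.1
print(flag_day(quiet_day), flag_day(spike_day))
```

Requiring a posterior probability rather than a point estimate above the cutoff builds the model's uncertainty into the decision. The choice of both numbers is exactly what remains to be settled.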

What unifies this work

Both projects involve detection of latent phenomena (true infections, call spikes) from observed count data.
The complexity of these models calls for an efficient Bayesian hierarchical approach
Latent-process Bayesian models on count data with temporal dependence
Flexible distributions matched to real-world count data

Thank you

Questions or collaborations welcome.