Modeling Covid-19 in Pennsylvania

Forecasting the impact of Covid-19 using an epidemiological model trained on real-world data

Location determines the demographic data used by the model, including population, existing data about the spread of Covid-19 in the region, and historical social distancing levels.

The social distancing scenario models what the people and governments in the region might do in the future — how socially distanced will they be, and for how long?


Instead of selecting a single scenario, our model provides several potential options for social distancing. Many scenarios illustrate what happens when distancing measures are stopped entirely: a second wave of cases. Several scenarios suppress the virus enough for a robust “test and trace” strategy to become feasible. We model these options to show how our policies and collective actions could impact the overall spread of the virus.

These options are themselves a simplification, and each involves only one distancing period. In practice, distancing is complex and will vary over time. We hope to model more complex distancing scenarios based on real-world proposals and data in the future.


Social distancing levels in Pennsylvania

The following graph displays social distancing levels relative to regular social activity. The current distancing level, , is calculated as the average of the past seven days of available mobility data for Pennsylvania, which was last updated on .

Past social distancing levels are based on available mobility data for Pennsylvania, and prospective social distancing levels are based on the selected scenario: .

Past social distancing levels are calculated from Google’s Covid-19 Community Mobility Reports, which track movement trends over time by geographic area and location category (e.g., retail, transit, workplace) relative to a baseline. For more information, see the documentation.
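As a rough illustration of the seven-day averaging described above, here is a minimal sketch. It assumes a single combined daily series of percent changes from baseline (the format Google's reports use per category; how the model combines retail, transit, workplace and other categories is not stated here) and assumes the distancing level is simply the negated average expressed as a fraction. The function name and this conversion are hypothetical, not the model's actual code.

```python
from statistics import mean


def current_distancing_level(percent_change_from_baseline, window=7):
    """Hypothetical: average the most recent `window` days of mobility data.

    percent_change_from_baseline: daily values, oldest first, in the style of
        Google's reports (e.g. -40.0 means 40% less movement than baseline).
    Returns the assumed fraction of regular activity removed (0.0 = none).
    """
    if len(percent_change_from_baseline) < window:
        raise ValueError(f"need at least {window} days of data")
    recent = percent_change_from_baseline[-window:]
    # Less movement than baseline counts as distancing; more counts as none.
    return max(0.0, -mean(recent) / 100.0)
```

For example, ten days of data ending in seven days at -40% would yield a distancing level of 0.4 under these assumptions.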

We use the data available to us (location demographics, reported fatalities, and positive test cases) to estimate when Covid-19 began to spread in a location. The model estimates that Covid-19 began to spread in Pennsylvania on .

How does social distancing relate to how the virus spreads?

Epidemiologists measure how quickly a disease spreads through R₀, its basic reproduction number: the average number of people a single infected person will infect when everyone around them is susceptible. R₀ differs across geographic locations based on population demographics and density. The model couples this information with confirmed fatality and positive test data to estimate how contagious Covid-19 is in each location. The model estimates that Covid-19 had an R₀ of in Pennsylvania when the virus first arrived and there were no distancing measures in place.

When social distancing measures are introduced, it becomes more difficult for a disease to spread through a population. We represent this using Rt, the effective reproduction number. Rt represents how many people a single case of the disease will spread to at a given point in time, taking social distancing measures into account.

As the virus spreads through a significant portion of the population, fewer people remain susceptible, so each infected person has a decreasing chance of reaching someone who can still catch it. This also reduces Rt.
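The two effects above can be combined in a simple formula. This sketch assumes a multiplicative relationship, Rt = R₀ × (1 − distancing) × (susceptible / population), which is a common simplification in SEIR-style models; the function and its parameters are hypothetical and may not match how the model computes Rt.

```python
def effective_reproduction_number(r0, distancing, susceptible, population):
    """Hypothetical Rt under a simple multiplicative assumption.

    r0: basic reproduction number with no distancing in place
    distancing: fraction of regular contact removed (0 = none, 1 = all)
    susceptible: number of people who can still be infected
    population: total population
    """
    if not 0.0 <= distancing <= 1.0:
        raise ValueError("distancing must be between 0 and 1")
    contact_factor = 1.0 - distancing                 # effect of distancing
    susceptible_fraction = susceptible / population   # effect of prior spread
    return r0 * contact_factor * susceptible_fraction
```

With an R₀ of 3.0, 50% distancing, and 90% of people still susceptible, this returns roughly 1.35: distancing halves transmission and prior spread trims it by another tenth.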

How could distancing affect the population?

Our model is based upon a standard epidemiological model called the SEIR model. The SEIR model is a compartmental model, which estimates the spread of a virus by dividing the population into different groups:

  • Susceptible people are healthy and at risk for contracting Covid-19.
  • Exposed people have Covid-19 and are in the incubation period; the model assumes most exposed people cannot infect others.
  • Infectious people have Covid-19 and can infect others.
  • Hospitalized people are currently in the hospital or ICU. As a simplifying assumption, we do not model susceptible healthcare workers. As a result, the model assumes hospitalized people cannot infect others.
  • Recovered people have had Covid-19 and are considered immune to re-infection. Our model assumes that the typical immune response will last “at least a year.”
  • Deceased people have passed away due to Covid-19.

Healthcare workers are at a higher risk of contracting Covid-19. The fraction of overall reported cases who are healthcare workers was observed as 10% in China, 10% in Italy, and 15.8% in Ontario. Modeling this scenario is complicated: healthcare workers are also part of the overall susceptible population and have a higher likelihood of being tested.
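To make the compartments concrete, here is a minimal SEIR-style sketch with hospitalized and deceased groups added, stepped forward one day at a time with explicit Euler updates. It is not the model's code: every parameter default is a placeholder, the mass-action transmission term is the textbook SEIR form, and the simplification that deaths occur only among hospitalized patients is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Compartments:
    S: float  # susceptible
    E: float  # exposed: incubating, assumed not yet infectious
    I: float  # infectious
    H: float  # hospitalized: assumed not to infect others
    R: float  # recovered: assumed immune
    D: float  # deceased

    def total(self) -> float:
        return self.S + self.E + self.I + self.H + self.R + self.D


def step(c: Compartments, r0: float, distancing: float,
         incubation_days: float = 5.0, infectious_days: float = 7.0,
         hospital_days: float = 10.0, p_hospitalized: float = 0.05,
         p_death_in_hospital: float = 0.2, dt: float = 1.0) -> Compartments:
    """Advance every compartment by one explicit Euler step of `dt` days.

    All default parameter values are placeholders, not the model's estimates.
    """
    living = c.total() - c.D
    # Transmission rate chosen so that Rt = r0 * (1 - distancing) * S / living.
    beta = r0 * (1.0 - distancing) / infectious_days
    new_exposed = beta * c.S * c.I / living * dt
    new_infectious = c.E / incubation_days * dt
    leaving_infectious = c.I / infectious_days * dt
    leaving_hospital = c.H / hospital_days * dt
    return Compartments(
        S=c.S - new_exposed,
        E=c.E + new_exposed - new_infectious,
        I=c.I + new_infectious - leaving_infectious,
        H=c.H + p_hospitalized * leaving_infectious - leaving_hospital,
        R=c.R + (1 - p_hospitalized) * leaving_infectious
              + (1 - p_death_in_hospital) * leaving_hospital,
        D=c.D + p_death_in_hospital * leaving_hospital,
    )


def simulate(initial: Compartments, r0: float, distancing: float, days: int):
    """Return the daily trajectory, starting with `initial`."""
    trajectory = [initial]
    for _ in range(days):
        trajectory.append(step(trajectory[-1], r0, distancing))
    return trajectory


# Usage with a hypothetical population of one million and ten initial exposures.
start = Compartments(S=999_990.0, E=10.0, I=0.0, H=0.0, R=0.0, D=0.0)
path = simulate(start, r0=3.0, distancing=0.4, days=180)
cumulative_cases = start.total() - path[-1].S  # everyone who has left S
```

Every flow moves people between groups without creating or removing anyone, so the total stays constant. Because everyone who leaves the susceptible group passes through the exposed group, `start.total() - S` gives the cumulative number of cases in this sketch.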

This graph shows a detailed view of how we project that Covid-19 will affect the population of Pennsylvania over time. While only a small portion of the population actively has Covid-19 at any given time, the virus can spread quickly. The graph in the top right shows how small changes compound to impact the population as a whole.

The model estimates that by , of the Pennsylvania population will have contracted Covid-19.

Comparing the model with verified data

We use two primary data sources to calibrate the curves for each state: confirmed positive tests and confirmed fatalities. The model takes these data points alongside the distancing data and computes a set of curves that satisfy the epidemiological constraints of the SEIR model.

Both datasets are imperfect. The model assumes that the number of reported positive tests is lower than the true number of cases in a region, and it predicts the fraction of cases detected by tests in Pennsylvania, accounting for how testing capacity varies over time. Fatalities can also be underreported, but the model makes no separate adjustment for this; predicting the fraction of cases detected is its only correction for underreporting.

To predict the fraction of cases detected, the model combines overall estimates of testing capacity across the United States over time with an adjustment computed for each state to fit the available data.
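One way to picture this is a national testing curve scaled by a single per-state multiplier. The sketch below assumes exactly that: a hypothetical daily national testing index on a 0 to 1 scale, multiplied by a fitted state factor and clipped to a valid fraction, then applied to modeled new cases. The names and the multiplicative form are assumptions; the model's actual adjustment may be shaped differently.

```python
def detected_fraction(national_testing_index, state_multiplier):
    """Hypothetical per-day fraction of cases detected by testing.

    national_testing_index: relative US testing capacity for each day,
        assumed to be on a 0..1 scale
    state_multiplier: single per-state adjustment found during fitting
    """
    # Clip so the result is always a valid fraction.
    return [min(1.0, max(0.0, x * state_multiplier)) for x in national_testing_index]


def expected_positive_tests(new_cases, fractions):
    """Scale modeled new cases by the fraction detected on each day."""
    return [c * f for c, f in zip(new_cases, fractions)]
```

For instance, a national index rising from 0.1 to 0.5 with a state multiplier of 2.5 gives detection fractions of 0.25, 0.5, and then 1.0 once clipped.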

If we look at this data on a logarithmic scale, we can see how the actual data for positive tests and reported fatalities aligns with the model’s predictions, and how both compare to the predicted number of total Covid-19 cases in Pennsylvania after accounting for estimated testing rates.

Reading the graph: Each line represents the model’s best estimation. The shaded area around a line indicates uncertainty: the darker the area, the more likely the outcome.

The total number of cases is equivalent to the cumulative number of people who have entered the “exposed” group.

Finding the best fit

To arrive at the prediction of viral spread that best matches the available data for Pennsylvania, the model adjusts three values: the date Covid-19 arrived, R₀ (the basic reproduction number), and the fraction of cases being detected in Pennsylvania. Reported fatalities and confirmed positive cases are weighted at a 3:1 ratio during the fitting process.
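The fitting procedure itself is not spelled out here. As a hedged illustration of how a 3:1 weighting could enter the search, this sketch scores candidate parameter sets by squared error on log counts (an assumption, in the spirit of the logarithmic-scale comparison above) and picks the best one by exhaustive grid search, a simple stand-in for whatever optimizer the model actually uses. `run_model` is a hypothetical callable, not part of any real API.

```python
import math
from itertools import product


def weighted_log_error(pred_deaths, obs_deaths, pred_cases, obs_cases,
                       death_weight=3.0, case_weight=1.0):
    """Squared error on log counts, weighting fatalities 3:1 over cases."""
    def sse(pred, obs):
        # log1p keeps days with zero reported counts finite.
        return sum((math.log1p(p) - math.log1p(o)) ** 2 for p, o in zip(pred, obs))
    return (death_weight * sse(pred_deaths, obs_deaths)
            + case_weight * sse(pred_cases, obs_cases))


def grid_search(run_model, obs_deaths, obs_cases,
                arrival_days, r0_values, detection_values):
    """Try every combination of the three fitted values and keep the best.

    run_model(arrival_day, r0, detection) is a hypothetical callable that
    returns (predicted_deaths, predicted_positive_tests), aligned day by day
    with the observed series.
    """
    best = None
    for arrival, r0, detection in product(arrival_days, r0_values, detection_values):
        pred_deaths, pred_cases = run_model(arrival, r0, detection)
        loss = weighted_log_error(pred_deaths, obs_deaths, pred_cases, obs_cases)
        if best is None or loss < best[0]:
            best = (loss, arrival, r0, detection)
    return best  # (loss, arrival_day, r0, detection_fraction)
```

A grid search is easy to read but scales poorly; a real fit over continuous values would more likely use a numerical optimizer.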

Visualizing the data on a daily basis

The graph above shows the impact of the virus on a cumulative basis: this gives us a sense of overall impact, but doesn’t give us a good look at the daily change in cases. While daily reports tend to fluctuate, over time they indicate whether viral spread is increasing or decreasing.
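Turning cumulative counts into daily counts is a simple differencing step, and a trailing average is one common way to see the trend through day-to-day fluctuations. This is a minimal sketch; whether the daily graph applies any smoothing is not stated, so the seven-day window is an assumption.

```python
def daily_from_cumulative(cumulative):
    """Turn a cumulative series into day-over-day new counts."""
    if not cumulative:
        return []
    # The first day's new count is taken to be its cumulative value.
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def rolling_mean(values, window=7):
    """Trailing average that smooths day-to-day reporting noise."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

For example, cumulative counts of 1, 3, 6, 10 become daily counts of 1, 2, 3, 4.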

The following graph compares new infections per day, positive tests per day, and reported fatalities per day, along with their respective confirmed data points:

Modeling hospitalizations

The following charts show our projections of hospitalizations due to Covid-19. Unlike the fatality and confirmed case data, we do not fit the model to hospitalization data. Instead, the model projects where hospital occupancy is expected to fall based on published times from symptom onset to hospitalization. Hospitalized cases are not consistently reported by all states. When reported, they have variable reporting delays, may not reflect all hospital systems in a state, and usually only include cases with confirmed positive tests (and not unconfirmed suspected cases).

We estimate the hospital capacity for Covid-19 patients by taking the number of available beds and discounting for that hospital system’s typical occupancy rate. Note that these hospitalization estimates do not include patients who are admitted to the intensive care unit, which is modeled separately below.
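The capacity estimate above is a single discount. The sketch below shows the arithmetic; the function name and the example numbers (30,000 beds at a typical 65% occupancy) are hypothetical and are not Pennsylvania's figures.

```python
def covid_bed_capacity(total_beds, typical_occupancy_rate):
    """Beds left for Covid-19 patients after typical (non-Covid) occupancy."""
    if not 0.0 <= typical_occupancy_rate <= 1.0:
        raise ValueError("occupancy rate must be between 0 and 1")
    return total_beds * (1.0 - typical_occupancy_rate)
```

With the hypothetical numbers above, about 10,500 beds would remain available for Covid-19 patients.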

Time from symptom onset to hospitalization was observed as 8.5 days in California and Washington, 5.9 days in Singapore, 2.7 days in China, and 3.4 days in Shenzhen.

The graph compares projections for patients requiring hospitalization and patients currently reported hospitalized against estimated hospital capacity. The two differ for two reasons: the reported number lags behind the actual number of infections severe enough to require hospitalization, and some cases that require hospitalization will never be tested (the model assumes all reported hospitalizations have tested positive). We don’t expect the number reported to ever exceed the hospital capacity.
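The relationship between the two curves can be sketched as a shift plus a scale: reported hospitalizations trail required hospitalizations by some delay, and only the tested share is counted. The fixed whole-day delay and constant tested fraction below are simplifying assumptions for illustration, not values or mechanics taken from the model.

```python
def reported_hospitalizations(required, delay_days, tested_fraction):
    """Hypothetical view of what official reports would show.

    required: modeled patients needing a hospital bed, one value per day
    delay_days: assumed reporting lag, in whole days
    tested_fraction: assumed share of those patients who ever test positive
    """
    reported = []
    for t in range(len(required)):
        source = t - delay_days
        # Before the lag has elapsed, nothing has been reported yet.
        reported.append(tested_fraction * required[source] if source >= 0 else 0.0)
    return reported
```

Because the reported curve is a delayed, scaled-down copy of the required curve in this sketch, it can never rise above the required curve at its peak.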

The model estimates that hospitals in Pennsylvania capacity.

Next, we look at an analogous graph for cumulative hospitalizations, which is how some states report this statistic.

Reading the graph: This graph shows cumulative hospitalizations as a result of Covid-19, as some states report only cumulative numbers.

Projecting intensive care unit (ICU) occupancy

We also model the expected number of Covid-19 cases that will require intensive care. Similar to hospitalizations, we do not fit the model to the reported ICU admission data. Instead, we show what the model would expect for Pennsylvania.

Pennsylvania typically has total ICU beds. As the number of patients who currently require intensive care approaches ICU capacity, Pennsylvania can add ICU beds and personnel to care for incoming patients. As a result, the model allows the number of patients currently reported in intensive care to exceed the typical total number of ICU beds.

Cities like New York significantly increased ICU capacity when faced with an influx of cases. As more data becomes available, we would like to display the reported ICU capacity in Pennsylvania for additional clarity.

While we expect that exceeding ICU capacity would have a dramatic effect on the fatality rate of Covid-19, the model currently does not adjust the fatality rate in this situation.

Next, we look at an analogous graph for cumulative ICU admissions, which is how some states report this statistic.

Model details, references, and outcomes