
Learn Bayesian Data Analysis and Stan with Jonah Gabry [Virtual]

Wed Jul 14, 2021 9:00 AM - Fri Jul 16, 2021 5:30 PM EDT Online, Zoom

Learn Bayesian Data Analysis (BDA) and Markov chain Monte Carlo (MCMC) computation using Stan in this three-day workshop with Stan developer Jonah Gabry.

Jonah is a Stan developer based at Columbia University and the developer of many R packages for applied Bayesian data analysis (rstan, rstanarm, rstantools, bayesplot, shinystan, loo). Jonah will be joined by fellow Stan developers Scott Spencer and Rob Trangucci, and other members of the Stan Development Team, such as Andrew Gelman, will make some guest appearances at various times throughout the course.

We will provide instructions for accessing the virtual workshop, where participants can ask questions and take an active part in the sessions.

The course consists of three main themes: Bayesian inference and computation; the Stan modeling language; applied statistics/Bayesian data analysis in practice. There will be some lectures to cover important concepts, but the course will also be heavily interactive, with much of the time dedicated to hands-on examples. We will be interfacing with Stan from R, but users of Python and other languages/platforms can still benefit from the course as all of the code we write in the Stan language (and all of the modeling techniques and concepts covered in the course) can be used with any of the Stan interfaces.

Before class, everyone should install R and RStudio (or the Python equivalents) and Stan on their computers. If problems occur, please join the stan-users group and post any questions. It is important that all participants get Stan running and bring their laptops to the course. Participants will receive a copy of Andrew Gelman's landmark book Bayesian Data Analysis. Proceeds from the class support further development of Stan and the New York Open Statistical Programming Meetup.

Example topics and the course structure are below. Actual coverage will be determined during the class based on the participants. Be sure to tune in 5 minutes prior to 9:00 AM.

*Content is subject to change according to the needs of the class.*

Section 1: Bayesian Inference in Theory and Practice

This section covers (or reviews, depending on the audience) the most essential concepts that form the foundations of Bayesian inference. The focus is on the necessary background required to successfully apply Bayesian statistics to real world problems.

What is Bayesian inference and how does it differ from other forms of statistical inference?
- Advantages and disadvantages compared to frequentist inference and approximate forms of Bayesian inference
- Generative models
- The role of prior distributions in practice
- Important properties of posterior distributions
The Bayesian data analysis workflow
- Iterative model building, checking, and refinement
Bayesian computation with Markov chain Monte Carlo
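To give a feel for what Markov chain Monte Carlo does, here is a minimal sketch of a random-walk Metropolis sampler for a binomial proportion with a Beta(1, 1) prior. It is an illustration only: Stan itself uses a more efficient Hamiltonian Monte Carlo algorithm, and the data (7 successes in 20 trials), the step size, and all function names are assumptions made up for this example. Because this model has a known Beta posterior, the sampler's answer can be checked against the exact posterior mean.

```python
import math
import random


def log_posterior(theta, successes, trials, a=1.0, b=1.0):
    """Unnormalized log posterior: Beta(a, b) prior times binomial likelihood."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return ((a - 1 + successes) * math.log(theta)
            + (b - 1 + trials - successes) * math.log(1.0 - theta))


def metropolis(successes, trials, n_draws=20000, step=0.25, seed=1):
    """Random-walk Metropolis; returns draws after discarding the first half."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, trials)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws[n_draws // 2:]


# Hypothetical data: 7 successes in 20 trials.
draws = metropolis(successes=7, trials=20)
mcmc_mean = sum(draws) / len(draws)
exact_mean = (1 + 7) / (2 + 20)  # mean of the exact Beta(8, 14) posterior
print(f"MCMC estimate: {mcmc_mean:.3f}, exact: {exact_mean:.3f}")
```

The two numbers should agree closely; the remaining gap is Monte Carlo error, which shrinks as the number of draws grows.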

Section 2: Intro to Stan

This section introduces the Stan modeling language, RStan, the interface for fitting Stan models via R, and the rest of the Stan ecosystem (e.g., the many supporting R packages).

What is Stan and why is it an important tool for Bayesian data analysis?
Programming statistical models in the Stan language
- Understanding the structure of a Stan program
- Key similarities and differences between the Stan language and other common programming languages
Using the Stan interfaces to fit models
- Introduction to RStan, the R interface to Stan
- Estimating models in Stan using data from an R session
- Working with the fitted model objects returned by RStan
- Brief intro to the Stan Development Team’s packages for diagnostics and visualization (bayesplot, shinystan). These packages are covered in more detail in later sections.

Section 3: Linear and Generalized Linear Models in Stan

This section covers how to program the most commonly used regression models in Stan and fit them using the RStan interface. In subsequent sections we will add hierarchical structure to these models.

Review of regression and generalized linear models (GLMs) from a Bayesian perspective
Programming and fitting GLMs in Stan
- Models for continuous data, binary data, count data
- Examples will be drawn from various domains including A/B testing, political polling, clinical trials, sports and more
How to think about specifying prior distributions for parameters in GLMs
- Weakly informative defaults
- When are non-informative priors appropriate?
- Translating prior knowledge into mathematical form
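One way to see why prior choice matters in a GLM is to simulate from the prior and look at what it implies on the outcome scale. The sketch below does this for the intercept of a logistic regression, comparing a very wide normal prior with a narrower, weakly informative one. The specific scales (10 and 1.5) and the function names are assumptions chosen for illustration, not course recommendations.

```python
import math
import random


def inv_logit(x):
    """Map a value on the log-odds scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def share_extreme(prior_sd, n_draws=10000, seed=1):
    """Share of prior draws implying a probability below 0.01 or above 0.99."""
    rng = random.Random(seed)
    probs = [inv_logit(rng.gauss(0.0, prior_sd)) for _ in range(n_draws)]
    return sum(p < 0.01 or p > 0.99 for p in probs) / n_draws


wide = share_extreme(prior_sd=10.0)  # intended as "non-informative"
weak = share_extreme(prior_sd=1.5)   # weakly informative
print(f"normal(0, 10):  {wide:.1%} of draws are extreme")
print(f"normal(0, 1.5): {weak:.1%} of draws are extreme")
```

A normal(0, 10) prior on the log-odds scale places roughly 65% of its mass on probabilities below 1% or above 99%, so a prior that looks vague on one scale can be quite opinionated on another.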

Section 4: Model Checking and Model Comparison

This section covers methods and tools for checking the fit of a model to data and comparing (or combining) multiple competing models.

Understanding the role of the posterior predictive distribution for model checking
Graphical and numerical posterior predictive checking using the bayesplot and shinystan R packages:
- How to use visualizations of the posterior predictive distribution to identify important features of the data not captured by a model
- Using posterior predictive checks to motivate improvements to the A/B testing model from Section 3
Comparing multiple models on estimated predictive performance
- When are techniques like cross-validation appropriate?
- The importance of predictive power and explanatory power and the frequent tension between them
- Introduction to the loo R package for model comparison and model averaging
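The idea behind a posterior predictive check can be sketched in a few lines: draw parameters from the posterior, simulate replicated datasets, and compare a statistic of the replications with the same statistic of the observed data. The example below is hand-rolled for illustration and does not use bayesplot or shinystan. It assumes a Poisson model with a Gamma(1, 1) prior (so the posterior is available in closed form) and made-up count data with many zeros.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def share_zeros(data):
    """Test statistic: proportion of zeros in a dataset."""
    return sum(v == 0 for v in data) / len(data)


# Hypothetical counts with more zeros than a Poisson model would expect.
y = [0, 0, 0, 0, 0, 0, 3, 4, 5, 6]

rng = random.Random(1)
n_reps = 2000
exceed = 0
for _ in range(n_reps):
    # Posterior for the rate under a Gamma(1, 1) prior is
    # Gamma(shape = 1 + sum(y), rate = 1 + n); gammavariate takes a scale.
    lam = rng.gammavariate(1 + sum(y), 1.0 / (1 + len(y)))
    y_rep = [poisson_draw(rng, lam) for _ in y]
    if share_zeros(y_rep) >= share_zeros(y):
        exceed += 1

p_value = exceed / n_reps  # posterior predictive p-value for the zero share
print(f"observed share of zeros: {share_zeros(y):.2f}, p-value: {p_value:.3f}")
```

A p-value near zero means the model almost never reproduces as many zeros as were observed, which would motivate a model change such as a zero-inflated likelihood.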

Section 5: Hierarchical/Multilevel Models (Part 1)

In this section we focus on more advanced models that incorporate hierarchical structures unique to the particular application. These models are more difficult computationally and require paying more attention to diagnostics that motivate changes to the models.

Review of hierarchical models from a Bayesian perspective
- Bias/variance tradeoff
- Partial pooling, shrinkage, borrowing strength, regularization
- Hyperpriors and hyperparameters
Implementing hierarchical models in Stan
- Adding hierarchical structure to the GLMs from Section 3
- Coding tips for balancing ease of programming, code clarity, and computational efficiency
Diagnosing and fixing computational problems when fitting hierarchical models
- Visual and numerical Markov chain Monte Carlo diagnostics
- Sampler tuning parameters
- Reparameterization
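Partial pooling and shrinkage can be illustrated with the simplest hierarchical model, where each group estimate is pulled toward the population mean by an amount that depends on how noisy it is. The sketch below assumes the population mean and standard deviation are known, which a full Stan model would instead estimate; the numbers and the function name are invented for the example.

```python
def partial_pool(y, se, mu, tau):
    """Conditional posterior means of group effects theta_j.

    Model: y_j ~ normal(theta_j, se_j) and theta_j ~ normal(mu, tau),
    with mu and tau treated as known for this illustration.
    """
    estimates = []
    for y_j, se_j in zip(y, se):
        data_precision = 1.0 / se_j ** 2
        prior_precision = 1.0 / tau ** 2
        # Weight on the group's own data; the rest goes to the population mean.
        w = data_precision / (data_precision + prior_precision)
        estimates.append(w * y_j + (1.0 - w) * mu)
    return estimates


# Hypothetical group estimates and their standard errors.
y = [12.0, -3.0, 5.0]
se = [2.0, 8.0, 4.0]
pooled = partial_pool(y, se, mu=4.0, tau=4.0)
for raw, s, est in zip(y, se, pooled):
    print(f"raw {raw:6.1f} (se {s:.0f}) -> partially pooled {est:5.1f}")
```

The noisiest group (standard error 8) moves most of the way toward the population mean of 4, while the most precise group (standard error 2) barely moves. This is the sense in which groups "borrow strength" from one another.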

Section 6: Hierarchical/Multilevel Models (Part 2)

In this section we continue with the topic of hierarchical models and introduce techniques for decision making using inferences from the models.

Intro to more advanced hierarchical modeling techniques
- Temporal variation
- Spatial correlation structures
- Splines and Gaussian processes
Forecasting and out-of-sample prediction with hierarchical models
- Out-of-sample prediction using the generated quantities block of a Stan program
- Decision analysis (e.g., setting prices to maximize expected revenue, cost/benefit analysis in healthcare)
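For the pricing example, a decision analysis averages the quantity of interest over posterior draws and then picks the action with the best expected outcome. The sketch below assumes a hypothetical log-linear demand curve, demand = exp(a - b * price), and uses simulated values as a stand-in for posterior draws of (a, b); in practice the draws would come from a fitted Stan model.

```python
import math
import random


def expected_revenue(price, draws):
    """Average revenue over posterior draws of a log-linear demand curve."""
    revenues = [price * math.exp(a - b * price) for a, b in draws]
    return sum(revenues) / len(revenues)


rng = random.Random(1)
# Stand-in for posterior draws of (a, b); these values are made up.
draws = [(rng.gauss(5.0, 0.1), rng.gauss(0.5, 0.05)) for _ in range(2000)]

# Candidate prices 0.5, 1.0, ..., 10.0.
prices = [0.5 * k for k in range(1, 21)]
best_price = max(prices, key=lambda p: expected_revenue(p, draws))
print(f"price with highest expected revenue: {best_price}")
```

Averaging revenue over the draws, rather than plugging in a single point estimate of a and b, carries the parameter uncertainty through to the decision.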

Section 7: Wrapping Up

Review essential concepts from previous sections
Time for discussing additional topics of interest to participants
Various tips and tricks for becoming an advanced Stan user
Q&A session with other Stan developers
