Data Science Team Training
Building data science capacity in the public health workforce: A free e-book for public health practitioners upskilling in data science.
I’ve been a project coach with the Data Science Team Training (DSTT) program run by CSTE for several years now, working with public health agencies across the country to build data science capacity and upskill the public health workforce. Each year I work with ~4-8 state, tribal, local, or territorial public health agencies, each working on a data science project, and I meet with them monthly to provide general “coaching” support, which could be technical or higher-level executive data science. It’s one of my favorite projects I’ve been involved with over the years. And over those years I’ve watched teams struggle with the same set of problems.
There’s no shortage of resources for technical topics like how to make a plot with ggplot2, how to wrangle data with dplyr, or how to write SQL. And these days you can get your favorite AI to handle most of that anyway. The harder problems aren’t necessarily about code: organizing a project so your collaborators (and your future self) can navigate it; using version control as a team without stepping on each other’s work; managing reproducible environments so your analysis doesn’t break six months later because a package updated; naming files; managing scope; communicating findings to people who didn’t run the analysis and won’t read a methods section; and, the newest one, using AI in a data science project.

These are the topics I kept coming back to in coaching sessions, so I started writing them down. Data Science Team Training is a free, open-source e-book covering the practical foundations that make data science work sustainable and collaborative in a public health setting. Technical chapters address, among other topics, organizing and validating data, connecting to and querying relational databases, writing clean and well-documented code, managing reproducible environments and package dependencies, building R packages to share functions across projects, producing accessible reproducible reports and dashboards, and working effectively with AI coding assistants. Nontechnical chapters cover project management, peer review of analytical work, navigating the data governance and IT relationships that shape what public health teams can actually do with their data, and communicating findings clearly to audiences who did not run the analysis.

The book grew out of the DSTT program, but I hope the material is broadly useful to anyone getting started with team-based data science, whether in public health or not. It’s very much a work in progress, and I’ll keep updating it as I spend more time with my teams this year.
The book is available at dstt.stephenturner.us, where you can read it online or download the PDF or EPUB for free, and the source (Quarto) is on GitHub. It’s also available on Amazon for your Kindle.
As with my previous book, Biological Data Science with R, I wrote this book with Quarto. More on how:
