I just wrapped up my first useR conference since Brussels in 2017, and it was just the best. The #rstats community is one of a kind, and it was delightful seeing old friends and hearing about all of the amazing things folks are building. Yesterday morning Davis Vaughan & I whipped up a last-minute Shiny app to help us facilitate Q&As after the sessions we were chairing, and I ended up ditching my original talk today and writing one about that instead 🙃 🧰 tools used: Claude, Positron, RStudio, ...
This analysis was made possible by the mdr R package, which uses data originally compiled by Sam_Badi on Reddit. The data consists of every elevator ding in the Severance episodes along with the episode number, time stamp, pitch of the ding, and the associated action. Examining the plot below, we see that across all dings the G is consistently associated with both innies and outies going to sleep, and the C# with both innies and outies waking up. (Spoiler: There is one notable exception, a...
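As a rough illustration of how a plot like this could be built (not the exact code from the post), here is a minimal sketch assuming the ding data is available as a data frame called `dings` with `pitch` and `action` columns (those names are my guess):

```r
library(tidyverse)
library(mdr)

# hypothetical column names: `pitch` (e.g. "G", "C#") and `action`
# (e.g. "innie waking up", "outie going to sleep")
dings |>
  count(pitch, action) |>
  ggplot(aes(x = action, y = n, fill = pitch)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Number of dings", fill = "Pitch")
```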
This analysis was made possible by the mdr R package, which uses data originally compiled by the Severance wiki. Here, we create a little sentiment profile for each episode, binning the transcripts into three-minute increments and calculating the average AFINN sentiment score in each bin.

```r
library(tidytext)
library(mdr)
library(tidyverse)

df <- transcripts |>
  mutate(
    timestamp_seconds = as.numeric(timestamp),
    bin = floor(timestamp_seconds / 180) * 180
  ) |>
  left_join(episodes, by = c("season", "episode"))

df |>
  mutate(id = g...
```
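The excerpt cuts off above, but to give a sense of the kind of computation involved, here is a minimal sketch of how a per-bin AFINN average could be calculated with tidytext, continuing from the `df` created above (it assumes a `dialogue` column holds each line of the transcript; this is an illustration, not necessarily the code from the full post):

```r
# average AFINN sentiment within each three-minute bin of each episode
sentiment_by_bin <- df |>
  unnest_tokens(word, dialogue) |>
  inner_join(get_sentiments("afinn"), by = "word") |>
  group_by(season, episode, bin) |>
  summarise(avg_sentiment = mean(value), .groups = "drop")

sentiment_by_bin |>
  ggplot(aes(x = bin / 60, y = avg_sentiment)) +
  geom_line() +
  facet_wrap(~ episode) +
  labs(x = "Minutes into the episode", y = "Average AFINN sentiment")
```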
In this analysis, I use my mdr R package, which uses data originally compiled by the Severance wiki. For each episode we count the number of words each of the four main characters (Mark, Helly, Dylan, and Irving) speaks in each minute and visualize them below. Click on the tabs to switch episodes.

```r
library(tidyverse)
library(tidytext)
library(ggiraph)
library(mdr)

make_plot <- function(input) {
  data <- transcripts |>
    mutate(speaker = case_when(
      grepl("Cobel", speaker) ~ "Cobel",
      speaker == "Mark W" ~ "M...
```
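To sketch the general idea of the word tally (again assuming a `dialogue` column and a `timestamp` that can be coerced to seconds; this is illustrative, not necessarily the code from the full post), a simpler non-interactive version could look like:

```r
# words spoken per minute by each of the four main characters
words_per_minute <- transcripts |>
  filter(speaker %in% c("Mark", "Helly", "Dylan", "Irving")) |>
  mutate(minute = floor(as.numeric(timestamp) / 60)) |>
  unnest_tokens(word, dialogue) |>
  count(season, episode, speaker, minute, name = "words")

words_per_minute |>
  filter(season == 1, episode == 1) |>
  ggplot(aes(x = minute, y = words, color = speaker)) +
  geom_line() +
  labs(x = "Minute", y = "Words spoken", color = NULL)
```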
I thought it’d be fun to celebrate spooky season with a little stats-punny plot. We’re going to turn a normal distribution into a paranormal distribution! HA! Ok, first let’s get some packages.

```r
library(tidyverse)
library(tweenr)
library(gganimate)
```

Now let’s generate our normal data. We’re doing this twice in the data frame because we need the same number of data points as the paranormal data, which has a little wiggly bottom.

```r
n <- 100
x_top <- seq(-1.5, 1.5, length.out = n)
y_top <...
```
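The excerpt stops before the animation, but to illustrate the general mechanism of tweening between two shapes, here is a minimal gganimate sketch with a made-up "wiggly" stand-in for the paranormal outline (not the actual shape or code from the post):

```r
library(tidyverse)
library(gganimate)

# two states of the same curve: a normal density and a "wiggly" variant
x <- seq(-1.5, 1.5, length.out = 100)
normal <- tibble(x = x, y = dnorm(x), state = "normal")
paranormal <- tibble(x = x, y = dnorm(x) + 0.05 * sin(12 * x), state = "paranormal")

# animate the morph from one state to the other
bind_rows(normal, paranormal) |>
  ggplot(aes(x, y)) +
  geom_path(linewidth = 1) +
  transition_states(state, transition_length = 2, state_length = 1) +
  ease_aes("cubic-in-out")
```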
Here we are going to look at several diagnostic plots that are helpful when attempting to answer a causal question. They can be used to visualize the target population, balance, and treatment effect heterogeneity.

Setup

I’ve simulated data to demonstrate the utility of the various plots. In each simulation, we have four pre-treatment variables (var1, var2, var3, and var4), a treatment, t, and an outcome, y. I have also fit a propensity score model for each and calculated ATE, ATT, and overlap weig...
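As a hedged sketch of that setup (the variable names match the description above, but the data-generating details are illustrative rather than the post's own):

```r
library(tidyverse)
set.seed(8)

n <- 1000
sim <- tibble(
  var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n), var4 = rnorm(n),
  t    = rbinom(n, 1, plogis(0.5 * var1 - 0.5 * var2)),
  y    = t + var1 + var2 + rnorm(n)
)

# propensity score model
ps_mod <- glm(t ~ var1 + var2 + var3 + var4, data = sim, family = binomial)

# the three common weights built from the propensity score
sim <- sim |>
  mutate(
    ps        = predict(ps_mod, type = "response"),
    w_ate     = t / ps + (1 - t) / (1 - ps),
    w_att     = t + (1 - t) * ps / (1 - ps),
    w_overlap = t * (1 - ps) + (1 - t) * ps
  )
```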
To celebrate the 40th anniversary of the paper The Central Role of the Propensity Score in Observational Studies for Causal Effects, published in Biometrika in 1983, the journal Observational Studies had a special issue highlighting the methods in the paper and those developed since. This led us to take a closer look at this seminal paper, and in doing so we noticed mention of a visual diagnostic tool that we haven’t seen used often but might be useful for exploring potential treatment effect heter...
After my previous post about missing data, Kathy asked on Twitter whether two wrong models (the imputation model + the outcome model) would be better than one (the outcome model alone). Without doing any of the math, I’d guess that the assumption of correctly specifying the model also has a bigger impact in the complete case (CC) analysis: you need correct specification in multiple imputation (MI) twice, but you trade off that potential bias for higher precision. This is a great question! I am going to investigate via a small simulation (so the answer cou...
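The excerpt ends before the simulation itself, but a hedged skeleton of this kind of comparison (illustrative, not the post's actual simulation) might look like:

```r
library(tidyverse)
library(mice)
set.seed(28)

n <- 500
dat <- tibble(
  z = rnorm(n),
  x = z + rnorm(n),
  y = x + z + rnorm(n)
) |>
  # make x missing more often when z is large
  mutate(x = ifelse(rbinom(n, 1, plogis(z)) == 1, NA, x))

# complete case analysis (lm drops rows with missing x by default)
cc_fit <- lm(y ~ x + z, data = dat)

# multiple imputation, then pool the fits across imputed datasets
imp <- mice(dat, m = 5, printFlag = FALSE)
mi_fit <- pool(with(imp, lm(y ~ x + z)))

coef(cc_fit)
summary(mi_fit)
```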
Here is the scenario: You are trying to predict some outcome, y, and some of your predictors have missing data. Will doing a complete case analysis give you unbiased results? What additional information do you need before deciding? For some reason, when I tried to answer this question, my first instinct was to try to decide whether the data were missing at random, but it turns out this might not be the right first question! Why? Complete case analysis will give us unbiased estimates even if t...
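To make the intuition concrete, here is a small hedged simulation (mine, not the post's) comparing two missingness mechanisms for a predictor x: one where missingness depends only on x, and one where it depends on the outcome y:

```r
library(tidyverse)
set.seed(9)

n <- 10000
dat <- tibble(x = rnorm(n), y = x + rnorm(n))  # true slope = 1

# missingness in x depends only on x: the complete case slope is still ~1
dep_x <- dat |> mutate(x = ifelse(rbinom(n, 1, plogis(x)) == 1, NA, x))
coef(lm(y ~ x, data = dep_x))

# missingness in x depends on the outcome y: the complete case slope is biased
dep_y <- dat |> mutate(x = ifelse(rbinom(n, 1, plogis(2 * y)) == 1, NA, x))
coef(lm(y ~ x, data = dep_y))
```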
I created a little Shiny application to demonstrate that neural networks are just souped-up linear models: https://lucy.shinyapps.io/neural-net-linear/ This application has a neural network fit to a dataset with one predictor, x, and one outcome, y. The network has one hidden layer with three activations. You can click a “Play” button to watch how the neural network fits across 300 epochs. You can also click on the nodes of the neural network diagram to highlight each of the individual ac...
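As a rough companion to the app (not the app's own code), here is a hedged R sketch of the same idea using the nnet package: a single hidden layer with three units, where the prediction is just a linear combination of three sigmoid activations.

```r
library(nnet)
set.seed(10)

# made-up data with one predictor and one outcome
df <- data.frame(x = seq(-3, 3, length.out = 200))
df$y <- sin(df$x) + rnorm(200, sd = 0.1)

# one hidden layer with three units and a linear (identity) output
fit <- nnet(y ~ x, data = df, size = 3, linout = TRUE, trace = FALSE)

# the fitted curve is just b0 + w1*a1(x) + w2*a2(x) + w3*a3(x),
# where each a_k(x) is a sigmoid of its own linear function of x
plot(df$x, df$y, pch = 16, col = "grey")
lines(df$x, predict(fit, df), lwd = 2)
```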
On this week’s episode of Casual Inference we talk about a “Causal Quartet”: a set of four datasets generated under different mechanisms, all with the same statistical summaries (including visualizations!) but different true causal effects. The figures and tables are from our recent preprint: https://arxiv.org/pdf/2304.02683.pdf Given a single dataset with three variables (an exposure, an outcome, and a covariate, z), how can statistics help you decide whether to adjust for z? It can’t! For example, her...
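The episode and the preprint have the full quartet; as a quick hedged illustration of the underlying point (my own toy simulation, not the quartet data itself), here are three mechanisms where z plays a different role and no amount of staring at the regression output tells you which adjustment is correct:

```r
library(tidyverse)
set.seed(11)

n <- 10000
confounder <- tibble(z = rnorm(n), x = z + rnorm(n), y = 0.5 * x + z + rnorm(n))
mediator   <- tibble(x = rnorm(n), z = x + rnorm(n), y = 0.5 * x + z + rnorm(n))
collider   <- tibble(x = rnorm(n), y = 0.5 * x + rnorm(n), z = x + y + rnorm(n))

# which coefficient recovers the causal effect of x on y depends entirely on
# the role of z, something the data alone cannot tell you:
#   confounder -> adjust for z; mediator/collider -> do not adjust
map_dfr(
  list(confounder = confounder, mediator = mediator, collider = collider),
  \(d) tibble(
    unadjusted = coef(lm(y ~ x, data = d))[["x"]],
    adjusted   = coef(lm(y ~ x + z, data = d))[["x"]]
  ),
  .id = "mechanism"
)
```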
Transparency in public health messaging matters. Hannah Mendoza and I looked at how providing transparent information about why a public health recommendation is being made can increase uptake, in a randomized trial published today in PLOS ONE. What did we do? We conducted a randomized controlled trial to assess whether disclosing elements of uncertainty in an initial public health statement would change the likelihood that participants will accept new, different advice that arises as more evide...
We have migrated our blog from Hugo to Quarto! Here are a few quick tips that made the transition a bit smoother.

1. Setting up a Quarto website

It is super easy to set up a Quarto website. To get the basic template, you can run the following in your terminal:

```bash
quarto create-project mysite --type website
```

You can find lots of details about how to customize your site in the Quarto Docs. The rest of this post will cover a few things that made the transition smooth for us.

2. Moving .Rmd files fro...
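The excerpt cuts off here; for what it's worth, one hedged way to move old .Rmd posts into a Quarto posts/ directory from R (the directory names are hypothetical, and this is an illustration rather than the approach in the full post) is:

```r
library(fs)

# hypothetical layout: Hugo's content/post/<slug>/index.Rmd moved into a
# Quarto posts/ directory, keeping each post's folder name
old_posts <- dir_ls("content/post", recurse = TRUE, glob = "*.Rmd")
new_posts <- path("posts", path_rel(old_posts, "content/post"))
dir_create(path_dir(new_posts))
file_move(old_posts, new_posts)
```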
The tipr R package has some new features! And a new and improved API!

What is tipr

tipr is an R package that allows you to conduct sensitivity analyses for unmeasured confounders. Why might you want to do that? Well, as it turns out, the assumption of “no unmeasured confounders” is integral to any estimation of a causal effect. This assumption is untestable, so often the best we can do is examine how far off our estimates would be should an unmeasured confounder exist, hence sensitivity ...
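The post goes on to demonstrate the new API; as a hand-rolled sketch of the underlying idea (my own illustration of the arithmetic, not tipr's functions), the tipping point for a normally distributed unmeasured confounder can be thought of like this:

```r
# a hedged, hand-rolled illustration of a tipping-point analysis (not the tipr
# API): an observed coefficient is roughly attenuated by
#   adjusted = observed - (confounder-outcome effect) x
#                         (difference in confounder means between groups)
# so the mean difference needed to "tip" the estimate all the way to 0 is:
tip_point <- function(observed_effect, confounder_outcome_effect) {
  observed_effect / confounder_outcome_effect
}

# e.g. an observed effect of 1.5 would be fully explained away by a confounder
# that shifts the outcome by 2 per unit and differs between exposure groups by
tip_point(observed_effect = 1.5, confounder_outcome_effect = 2)  # 0.75
```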
I recently noticed that the Pfizer immunobridging trials, presumably set up to demonstrate that their COVID-19 vaccines elicit the same antibody response in children as was seen in 16-25 year olds (for whom efficacy has previously been demonstrated), have a strange criterion for “success”. GMT: geometric mean titer. This is a measure of the antibody titers. We use the geometric mean because this data is quite skewed (it is also why you typically see it plotted on the log scale). For those o...
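For a concrete sense of the quantity being compared, here is a hedged sketch of computing a geometric mean titer and a GMT ratio (with a confidence interval) from two made-up samples; the numbers are invented for illustration only:

```r
set.seed(15)
# made-up antibody titers for two groups (log-normal, i.e. skewed)
titers_children <- rlnorm(100, meanlog = 7.0, sdlog = 1)
titers_adults   <- rlnorm(100, meanlog = 6.9, sdlog = 1)

# geometric mean titer = exp(mean of the log titers)
gmt <- function(x) exp(mean(log(x)))
gmt(titers_children)
gmt(titers_adults)

# GMT ratio and its 95% CI, computed on the log scale and exponentiated
tt <- t.test(log(titers_children), log(titers_adults))
c(ratio = gmt(titers_children) / gmt(titers_adults), exp(tt$conf.int))
```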
This post explores the impact of setting particular criteria for “success” in clinical trial designs. A common study design is a “non-inferiority” trial. The goal here is to show that some intervention is not inferior (that is, not worse) than some already approved intervention, by some specific definition of “not worse”. While this may sound straightforward, it can be tricky! Especially because in all clinical trials we are working with a sample of individuals, so there is necessarily u...
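To make the idea concrete, here is a hedged toy example (mine, not the post's) of a non-inferiority check: compare the confidence interval for the difference between a new and an approved treatment against a pre-specified margin.

```r
set.seed(16)
# made-up outcomes: higher is better; the new treatment is truly a bit worse
approved <- rnorm(200, mean = 10, sd = 3)
new      <- rnorm(200, mean = 9.7, sd = 3)
margin   <- -1  # "not worse" defined as: (new - approved) no lower than -1

# CI for the difference in means (new - approved)
ci <- t.test(new, approved)$conf.int
ci

# declare non-inferiority only if the whole CI sits above the margin
ci[1] > margin
```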
In SNL’s cold open last night, “President Joe Biden” suggested that the COVID-19 surge we are seeing in the US is due to people seeing Spider-Man: No Way Home. If people would just stop seeing this film, he argues, cases would go back down! Interesting hypothesis, let’s take a looksy at the data, shall we?

And now, a message from President Joe Biden. pic.twitter.com/Q8TglFNBlF — Saturday Night Live - SNL (@nbcsnl) January 16, 2022

I pulled the domestic box office data from the-number...
I’m seeing lots of confusion around the frequency of breakthrough cases and the effectiveness of vaccines (in fact, a recent interview I did resulted in a confusing headline on this topic!), so let’s dive in! Vaccine effectiveness is a relative measure: it tells us how protected you will be relative to an unvaccinated person. Even with delta, this looks ok for infections (and very good for severe illness). Scenario 1: 🤒 if an unvaccinated person has a 10% chance of getting sick 💉 and ...
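The scenario is cut off above, but the arithmetic it sets up is simple enough to sketch (the effectiveness value here is an assumption for illustration, not a figure from the post):

```r
# relative risk reduction: risk if vaccinated = risk if unvaccinated * (1 - VE)
risk_unvaccinated <- 0.10   # the 10% chance from the scenario above
ve <- 0.80                  # assumed vaccine effectiveness, for illustration

risk_vaccinated <- risk_unvaccinated * (1 - ve)
risk_vaccinated  # 0.02, i.e. a 2% chance of getting sick
```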
I’ve seen a lot today about how effective the vaccines are; mistakes aside, lots of folks seem to be mixing up which denominators matter - good thing statisticians LOVE denominators! If you see something like “x% of the sick/hospitalized/deceased were vaccinated”, the better the vaccine uptake, the scarier this number will seem! It is using the wrong denominator. For example, here is a scenario with 90% vaccination where 4 people got sick: 2 vaccinated and 2 unvaccinated. In this scenario, 50% of the s...
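Filling in the arithmetic for that scenario (using an assumed population of 1,000 people so the counts are concrete):

```r
# 90% vaccination in a population of 1,000 (assumed size, for illustration)
n_vaccinated   <- 900
n_unvaccinated <- 100

# 4 people got sick: 2 vaccinated, 2 unvaccinated
sick_vaccinated   <- 2
sick_unvaccinated <- 2

# "50% of the sick were vaccinated" uses the sick as the denominator...
sick_vaccinated / (sick_vaccinated + sick_unvaccinated)  # 0.5

# ...but the risks use the right denominators and tell a different story
sick_vaccinated / n_vaccinated      # ~0.2% risk if vaccinated
sick_unvaccinated / n_unvaccinated  # 2% risk if unvaccinated
```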
There was a recent email thread in the IsoStat listserv about a cool visualization that recently came out in the New York Times showing COVID-19 cases over time. This sparked a discussion about whether this was possible to recreate in R with ggplot, so of course I gave it a try!

```r
library(tidycensus)
library(tidyverse)
library(geofacet)
library(zoo)
```

The plot shows cases per 100,000 by state, so I first needed to pull population data. To do that I used the tidycensus package. (If you don’t have a...
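The excerpt cuts off at the setup, but a hedged sketch of the population pull along these lines (the exact ACS variable and year are my assumptions, not necessarily those in the full post) could be:

```r
library(tidycensus)
library(tidyverse)

# total population by state from the ACS (requires a Census API key);
# "B01003_001" is the ACS total population variable
state_pop <- get_acs(
  geography = "state",
  variables = "B01003_001",
  year = 2019
) |>
  select(state = NAME, population = estimate)

head(state_pop)
```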