Measures of Central Tendency
For symmetric normal-like distributions there is a clear winner for measuring central tendency: the sample mean. The mean has the highest precision/efficiency and is also representative of a typical observation from the population distribution. But the mean is not robust: it is overly affected by extreme values when the distribution is heavy-tailed or asymmetric. For general continuous distributions with arbitrarily heavy tails or skewness, the sample median is robust ...
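A quick sketch of the contrast, using illustrative simulated data (not from the article):

```r
# Log-normal sample: skewed with a heavy right tail. The mean is pulled toward
# the extreme values while the median stays near a typical observation.
set.seed(3)
x <- exp(rnorm(200))
c(mean = mean(x), median = median(x))
```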
Event: ACTStats 2025 Annual Meeting Keynote Talk, Nashville, TN, USA | Slides | Details
Background
The goal here is strong internal validation after fitting a pre-specified regression model, or one derived using backwards step-down variable selection such that the same selection procedure can be repeated afresh for each bootstrap repetition. Strong internal validation, then, means estimating a variety of model performance measures in a way that does not reward them for overfitting and that penalizes for all aspects of model selection and derivation that utilized the ...
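A minimal sketch of such a validation with the rms package, assuming a binary logistic model on a data frame d with candidate predictors x1–x4 (all names are placeholders). With bw=TRUE the backwards step-down is repeated afresh in every bootstrap resample, so the optimism estimates reflect the full selection process:

```r
# Sketch only; d, y, x1-x4 are hypothetical names
require(rms)
dd <- datadist(d); options(datadist = 'dd')
f <- lrm(y ~ x1 + x2 + x3 + x4, data = d, x = TRUE, y = TRUE)
validate(f, B = 300, bw = TRUE)   # bootstrap validation; step-down repeated per resample
```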
Background
This article considers the following setting. Suppose we have one continuous predictor X and an outcome variable Y, and we wish to estimate a smooth, usually nonlinear, relationship between X and some property of Y such as the mean or the probability that Y exceeds some specified value. When there is no censoring on Y, one can estimate such a smooth relationship nonparametrically using a standard smoother such as loess or the R “super smoother” supsmu. Semiparametric ordinal regression, us...
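For the uncensored case, a minimal sketch (x and y are assumed to be numeric vectors of equal length):

```r
# Two standard nonparametric smoothers for an uncensored outcome
plot(x, y, col = 'gray')
lines(lowess(x, y), col = 'blue')   # loess-type local smoother
lines(supsmu(x, y), col = 'red')    # Friedman's super smoother
```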
Overview
Maximum likelihood estimation (MLE) is a gold-standard estimation procedure in non-Bayesian statistics, and the likelihood function is central to Bayesian statistics (even though it is not maximized in the Bayesian paradigm). MLE may be unpenalized (the standard approach), or penalty functions such as L1 (lasso; absolute-value penalty) and L2 (ridge regression; quadratic penalty) may be added to the log-likelihood to achieve shrinkage (aka regularization). I have been doing...
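A hedged sketch of penalized MLE using the glmnet package (X is an assumed predictor matrix and y a binary outcome):

```r
# Elastic-net penalized maximum likelihood: alpha = 1 gives the L1 (lasso) penalty,
# alpha = 0 the L2 (ridge) penalty; lambda is chosen by cross-validation
require(glmnet)
lasso <- cv.glmnet(X, y, family = 'binomial', alpha = 1)
ridge <- cv.glmnet(X, y, family = 'binomial', alpha = 0)
coef(lasso, s = 'lambda.min')   # shrunken coefficient estimates
```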
Background
Clustering of patients to find new “phenotypes” is now a fad. For example, repeating the false assertion that diabetes was ever a binary diagnosis, Ahlqvist et al. claimed to have found 5 diabetes subtypes using a purely statistical analysis not driven by clinical knowledge. What they found is likely just inefficient prognostic stratification that could be improved upon by directly relating patient characteristics to outcomes. Maarten van Smeden showed that clustering algorithms...
Janice Pogue Lecture in Biostatistics, Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Ontario, Canada, 2024-12-06 | Slides
Background
In clinical and epidemiologic studies one is frequently tasked with maximizing accuracy when assessing the presence of clinical conditions (symptoms, diagnoses, syndromes, etc.) or verifying outcome events such as stroke, myocardial infarction, or death from a specific cause. Prospective studies have the advantage of standardizing definitions of clinical conditions, minimizing bias, and being honest about disagreements over clinical designations. Many studies have clinical endpoin...
One of my best decisions was to build my own web sites hbiostat.org and fharrell.com so that I have total control of content and formatting and can easily and quickly post content updates. I want to share a few things I’ve learned. While your organization’s web pages are great for static content, my public-facing content evolves rapidly, with constant improvements made to course web pages, miscellaneous web pages such as hbiostat.org/data, blog articles, and course handouts. To make it eas...
Background
Consider these four conditions:

- There is no reliable prior information about an effect, and an uninformative prior is used in the Bayesian analysis
- There is only one look at the data
- The look was pre-planned and not data-dependent
- A one-sided assessment is of interest, so that one-tailed p-values and Bayesian posterior probabilities P(θ > 0 | data) are used, where θ is the effect parameter of interest (e.g., difference in means, log effect ratio) and | means “conditional on” or “given”

One-Sid...
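Under these four conditions the posterior probability and the one-tailed p-value are complementary. A minimal numerical sketch, assuming a normal approximation to an estimated difference dhat with standard error se (values are hypothetical):

```r
# With a flat prior the posterior for theta is N(dhat, se^2), so
# P(theta > 0 | data) = 1 - one-tailed p-value
dhat <- 1.2; se <- 0.6
p.one.tailed <- 1 - pnorm(dhat / se)   # one-tailed p-value for H0: theta <= 0
post.prob    <- pnorm(dhat / se)       # P(theta > 0 | data) under a flat prior
c(p = p.one.tailed, post = post.prob)  # note post = 1 - p
```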
Slides
Event: Consilium Scientific | Slides | Video
Event: International Chinese Statistical Association Applied Statistics Symposium, Nashville, Tennessee, USA | Slides
Background
As explained here, the power for a group comparison can be greatly increased over that provided by a binary endpoint, with a greater increase when an ordinal endpoint has several well-populated categories or has a great many categories, in which case it becomes a standard continuous variable. When a randomized clinical trial (RCT) is undertaken and deaths can occur, there are disadvantages to excluding deaths and analyzing responses only on survivors, using death as a competing risk, wh...
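One way to avoid excluding anyone is to make death the worst category of an ordinal outcome. A minimal sketch, assuming a data frame d with a numeric response y measured on survivors, a death indicator, and treatment tx (all names hypothetical), using the rms package:

```r
# Place death below the worst observed response, then fit a proportional odds model
require(rms)
d$yord <- ifelse(d$death == 1, min(d$y, na.rm = TRUE) - 1, d$y)
f <- orm(yord ~ tx, data = d)
f
```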
Background
A binary endpoint in a clinical trial is a minimum-information endpoint that yields the lowest power for treatment comparisons. A time-to-event outcome, when only a minority of subjects suffer the event, has little power gain over a pure binary endpoint, since its power comes from the number of events (the number of uncensored observations). The highest-power endpoint would come from a continuous variable that is measured precisely and reflects the clinical outcome situation. An ordinal ...
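A rough illustration of the power gap using base R power calculators (the effect sizes and sample size are illustrative assumptions, not taken from the article):

```r
# Hypothetical two-arm trial with n = 200 per arm
power.prop.test(n = 200, p1 = 0.30, p2 = 0.40)$power   # dichotomized (binary) endpoint
power.t.test(n = 200, delta = 0.25, sd = 1)$power      # precisely measured continuous endpoint
```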
Background
The log-rank test is a Mantel-Haenszel “observed − expected frequency” type of test that was derived in a slightly ad hoc way by Nathan Mantel in 1966 and named the logrank test by R. Peto and J. Peto in 1972. It was later formally derived as the rank test having optimal local power for a shift in the type I extreme value (Gumbel) distribution. This horizontal shift is equivalent to a vertical shift in survival distributions after log-log transforming them. This is identical to s...
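The log-rank statistic is also the score test from a Cox proportional hazards model with the group indicator as the only covariate, which can be checked directly with the survival package (t, e, group, and d are assumed names):

```r
# The two chi-square statistics agree (up to ties handling)
require(survival)
survdiff(Surv(t, e) ~ group, data = d)               # log-rank test
summary(coxph(Surv(t, e) ~ group, data = d))$sctest  # Cox PH score (log-rank) test
```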
A Definition
All statistical procedures have assumptions. Even the simplest response variable (Y), one whose possible values are 0 and 1, when analyzed using the proportion of observations with Y=1, requires assuming that Y is truly binary, that every observation has the same probability that Y=1, and that observations are independent. Non-categorical Y have more assumptions. Even simple descriptive statistics have assumptions, as described below. But what does it mean that an assumption is required for using a statistica...
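The independence assumption already matters for something as simple as a proportion. An illustrative simulation (hypothetical setup, not from the article) in which observations are duplicated within clusters:

```r
# With 50 clusters of 2 identical observations each, the usual SE formula
# sqrt(p(1-p)/n) with n = 100 understates the true variability of the proportion
set.seed(1)
est <- replicate(2000, mean(rep(rbinom(50, 1, 0.3), each = 2)))
sd(est)                  # simulated SD of the estimated proportion
sqrt(0.3 * 0.7 / 100)    # naive SE assuming 100 independent observations
```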
UCLA Cardiology Grand Rounds, 2020-10-23 | Video (better video below)
Vanderbilt University Department of Biostatistics, 2020-11-18
Vanderbilt Translational Research Forum, 2021-11-04 | Video
Consilium Scientific, 2024-03-14 | Video and here
Slides
Background
Consider the problem of comparing two treatments by doing sequential analyses, avoiding putting too much faith in a fixed sample size design. As shown here, the lowest expected sample size will result from looking at the developing data as often as possible in a Bayesian design. The Bayesian approach computes probabilities about unknowns, e.g., the treatment effect, and one can update the current evidence base as often as desired, knowing that the current information has made pre...
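A minimal sketch of such continual monitoring, under a normal approximation with a flat prior and hypothetical looks every 20 subjects (all numbers are illustrative assumptions):

```r
# At each look compute P(effect > 0 | data accrued so far)
set.seed(2)
y <- rnorm(400, mean = 0.2, sd = 1)          # hypothetical per-subject treatment contrasts
looks <- seq(20, 400, by = 20)
post <- sapply(looks, function(n) {
  est <- mean(y[1:n]); se <- sd(y[1:n]) / sqrt(n)
  pnorm(est / se)                            # posterior P(effect > 0) under a flat prior
})
round(cbind(n = looks, post), 3)
```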
Slides | Elaborations | Video