Articles from 2022
The Mann-Whitney U test (MWU), also known as the Wilcoxon Rank Sum Test and the Mann-Whitney-Wilcoxon test, continues to be advertised as the go-to test for analyzing non-normally distributed data. In online experimentation it is often touted as the most suitable for analyses of non-binomial metrics with typically non-normal (skewed) distributions, such as average […]
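For readers unfamiliar with the test itself, below is a minimal sketch of how the Mann-Whitney U test is typically applied to a skewed, non-binomial metric. The lognormal "revenue per user" data and all parameters are simulated and purely illustrative, not taken from the article.

```python
# Minimal illustrative sketch: Mann-Whitney U test on simulated skewed data.
# The lognormal "revenue per user" values and parameters are hypothetical.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
control = rng.lognormal(mean=3.00, sigma=1.0, size=5000)  # skewed metric, e.g. revenue per user
variant = rng.lognormal(mean=3.05, sigma=1.0, size=5000)

u_stat, p_value = mannwhitneyu(control, variant, alternative="two-sided")
print(f"U = {u_stat:.0f}, p = {p_value:.4f}")
```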
Have you heard that there is a much greater probability than generally expected that a statistically significant test outcome is in fact a false positive? In industry jargon: that a variant has been identified as a “winner” when it is not. In demonstrating the above, the terms “False Positive Risk” (FPR), “False Findings Rate” (FFR), […]
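As a hedged illustration of the kind of calculation involved (the numbers below are hypothetical assumptions, not figures from the article), a false positive risk figure can be derived from the significance threshold, the power of the tests, and an assumed share of true "winners" among tested variants:

```python
# Illustrative only: how a false positive risk type of figure is typically derived.
# alpha, power, and prior_true are hypothetical assumptions, not article data.
alpha = 0.05        # significance threshold of the tests
power = 0.80        # power against the true effects that do exist
prior_true = 0.10   # assumed share of tested variants that are genuine winners

false_sig = alpha * (1 - prior_true)   # significant outcomes from no-effect tests
true_sig = power * prior_true          # significant outcomes from real winners
fpr = false_sig / (false_sig + true_sig)
print(f"false positive risk ≈ {fpr:.1%}")  # ≈ 36% despite the 5% threshold
```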
In A/B testing, sequential tests are gradually becoming the norm due to the increased efficiency and flexibility they grant practitioners. In most practical scenarios, sequential tests offer a balance of risks and rewards superior to that of an equivalent fixed-sample test. Sequential monitoring achieves this superiority by trading statistical power for the ability to stop earlier on average under any true value of the primary metric.
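The trade-off described above can be illustrated with a small Monte Carlo sketch. This is not the AGILE method or the boundaries used by Analytics Toolkit: it uses a naive Bonferroni-style per-look threshold and hypothetical conversion rates purely to show that interim analyses stop earlier on average while giving up some power relative to a fixed-sample test of the same maximum size.

```python
# Monte Carlo sketch of a crude sequential design with evenly spaced looks.
# All parameters are hypothetical; the per-look bound is a naive Bonferroni
# correction, not a proper group-sequential boundary.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
baseline, lift = 0.10, 0.01            # hypothetical true conversion rates
n_per_look, looks, alpha, sims = 2000, 5, 0.05, 2000
z_bound = norm.ppf(1 - alpha / looks)  # stricter per-look threshold

stops, wins = [], 0
for _ in range(sims):
    a = b = 0
    for look in range(1, looks + 1):
        a += rng.binomial(n_per_look, baseline)
        b += rng.binomial(n_per_look, baseline + lift)
        n = look * n_per_look
        pooled = (a + b) / (2 * n)
        z = (b / n - a / n) / np.sqrt(2 * pooled * (1 - pooled) / n)
        if z > z_bound:
            wins += 1
            break
    stops.append(look)

# Analytical power of a fixed-sample one-sided z-test at the same maximum n
n_max = looks * n_per_look
se = np.sqrt(baseline * (1 - baseline) / n_max + (baseline + lift) * (1 - baseline - lift) / n_max)
fixed_power = 1 - norm.cdf(norm.ppf(1 - alpha) - lift / se)

print(f"sequential: power ≈ {wins / sims:.2f}, average stop at look {np.mean(stops):.2f} of {looks}")
print(f"fixed-sample power at the same maximum sample size ≈ {fixed_power:.2f}")
```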
In this article we continue our examination of the AGILE statistical approach to A/B testing with a more in-depth look into futility stopping, or stopping early for lack of positive effect (lack of superiority). We’ll cover why such rules are helpful and how they boost the ROI of A/B testing, why a rigorous statistical rule is required in order to stop early when results are unpromising or negative, and how it works in practice. We’re reviewing this from the standpoint of the AGILE method.
Overgeneralization is a mistake in interpreting the outcomes of online controlled experiments (a.k.a. A/B tests) that can have a detrimental impact on any data-driven business. Overgeneralization is used in the typical sense of going above and beyond what the evidence at hand supports, with “evidence” being a statistically significant or non-significant outcome of an online […]
Sequential statistics are gathering interest, and more and more questions are being posed by CROs looking into the matter. For this article I teamed up with Lucia van den Brink, a distinguished CRO consultant who recently started using Analytics Toolkit and integrated frequentist sequential testing into her client workflow. In this short interview she asks […]
Google Analytics 4 has been a letdown in many aspects, based on every discussion I’ve seen and had with professionals of all stripes – marketers, advertising specialists, CROs, GA professionals, online experimentation experts, etc. One of the less discussed issues it brought with it is the heavyweight default GTAG library integration it comes with. GA4’s GTAG compares quite unfavorably to Universal Analytics’ analytics.js library and incurs a much heavier toll on website performance.
The concept of statistical significance is central to planning, executing and evaluating A/B (and multivariate) tests, but at the same time it is the most misunderstood and misused statistical tool in internet marketing, conversion optimization, landing page optimization, and user testing.
What is Statistical Power?
I got a question today about our AGILE A/B testing calculator and the statistics behind it, and realized that I have yet to write a dedicated post explaining in more detail the efficiency gains from using the method. This is despite the fact that these speed gains are clearly communicated and verified through simulation results presented in our AGILE statistical method white paper [1].
What is the goal of A/B testing? How long should I run a test for? Is it better to run many quick tests, or one long one? How do I know when is a good time to stop testing? How do I choose the significance threshold for a test? Is there something special about 95%? Does it make sense to run tests at 50% significance? How about 5%? What is the cost of adding more variants to test?
After many months of statistical research and development, we are happy to announce two major releases that we believe have the potential to reshape statistical practice in the area of A/B testing by substantially increasing the accuracy, efficiency, and ultimately the return on investment of all kinds of A/B testing efforts in online marketing: a free white paper and a statistical calculator for A/B testing practitioners. In this post we’ll briefly cover the need for a new method, some highlights...
This is a comprehensive guide to the different types of costs and benefits, risks and rewards related to A/B testing. Understanding them in detail should be valuable to A/B testers and businesses considering whether to engage in A/B testing or not, what to A/B test and what not to test, etc. As far as I am aware, this is the first attempt to systematically review all the different factors contributing to the return on investment from the process of A/B testing. Here I will cover A/B testing m...
Several charges are commonly thrown at A/B testing, either while it is being considered or after it has become standard practice in a company. They may come from product teams, designers, developers, or management, and can be summed up like this: A good way to address these and to make the business case for experimentation is to […]
“Observed power”, “post hoc power”, and “retrospective power” all refer to the statistical power of a statistical significance test to detect a true effect equal to the observed effect. In a broader sense these terms may also describe any power analysis performed after an experiment has concluded. Importantly, it is the first, narrower sense that […]
The above is a question asked by some practitioners of A/B testing, as well as a number of their clients, when examining the outcome of an online controlled experiment. It may be raised regardless of whether the outcome is statistically significant or not. In both cases the fact that the observed effect in an A/B test is […]
Observed power, often referred to as “post hoc power” or “retrospective power”, is the statistical power of a test to detect a true effect equal to the observed effect size. “Detect” in the context of a statistical hypothesis test means to result in a statistically significant outcome. Some calculators aimed at A/B testing practitioners use […]
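As a minimal numeric sketch of the definition above (all counts and rates below are hypothetical, not from the article), observed power for a one-sided two-proportion z-test can be computed directly from the observed rates and sample size:

```python
# Illustrative sketch of "observed power": the power of a one-sided two-proportion
# z-test against a true effect exactly equal to the observed one. Numbers are hypothetical.
from math import sqrt
from scipy.stats import norm

n = 10_000                            # users per arm (hypothetical)
p_control, p_variant = 0.100, 0.106   # observed conversion rates (hypothetical)
alpha = 0.05

observed_diff = p_variant - p_control
se = sqrt(p_control * (1 - p_control) / n + p_variant * (1 - p_variant) / n)
z_crit = norm.ppf(1 - alpha)

observed_power = 1 - norm.cdf(z_crit - observed_diff / se)
print(f"observed power ≈ {observed_power:.2f}")
```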
The question of whether one should run A/B tests (a.k.a online controlled experiments) using one-tailed versus two-tailed tests of significance was something I didn’t even consider important, as I thought the answer (one-tailed) was so self-evident that no discussion was necessary.
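Setting the substantive arguments aside, the purely mechanical difference between the two is small. Here is a minimal sketch with a hypothetical z-score, showing how the same test statistic yields different p-values under the two conventions:

```python
# Illustrative only: one-tailed vs. two-tailed p-value from the same z statistic.
from scipy.stats import norm

z = 1.8  # hypothetical z-score of a lift in the hoped-for direction

p_one_tailed = 1 - norm.cdf(z)              # probability mass in one tail
p_two_tailed = 2 * (1 - norm.cdf(abs(z)))   # mass in both tails

# With this z, the one-tailed p is below 0.05 while the two-tailed p is not.
print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
```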
Analytics Toolkit was conceived in 2012 as a set of tools that automate essential Google Analytics-related tasks and augment the GA functionalities in various ways. This goal was achieved in the years since with the release of over a dozen tools utilizing the Google Analytics API. These were accompanied by dozens of in-depth technical articles on the same topic posted on this very blog which gathered hundreds of thousands of views over time. The toolkit served hundreds of digital agencies and...
A short, understandable, yet accurate explanation of p-values and confidence intervals. Starting from the problem of random variability and building up with minimal jargon, this is the most accessible introduction to these basic statistical concepts. Understand the meaning and utility of confidence intervals and p-values in statistical hypothesis testing and estimation.
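As a minimal companion sketch (the counts below are simulated and hypothetical, not from the article), both the p-value and the confidence interval for a difference in conversion rates are built from the same estimate and standard error:

```python
# Illustrative only: p-value and 95% CI for a difference in conversion rates,
# both derived from the same estimate and standard error. Counts are hypothetical.
from math import sqrt
from scipy.stats import norm

conv_a, n_a = 1000, 10_000   # hypothetical control conversions / users
conv_b, n_b = 1085, 10_000   # hypothetical variant conversions / users

p_a, p_b = conv_a / n_a, conv_b / n_b
diff = p_b - p_a
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

z = diff / se
p_value = 2 * (1 - norm.cdf(abs(z)))                    # two-sided p-value
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se    # 95% confidence interval

print(f"diff = {diff:.4f}, p = {p_value:.4f}, 95% CI [{ci_low:.4f}, {ci_high:.4f}]")
```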
Navigating the maze of A/B testing statistics can be challenging. This is especially true for those new to statistics and probability. One reason is the obscure terminology popping up in every other sentence. Another is that the writings can be vague, conflicting, incomplete, or simply wrong, depending on the source. Articles sprinkled with advanced math, calculus equations, and poorly-labeled graphs represent a major hurdle for newcomers.
A central feature of sequential testing is the idea of stopping “early”, as in “earlier compared to an equivalent fixed-sample size test”. This allows running A/B tests with fewer users and in a shorter amount of time while adhering to the targeted error guarantees.
Running shorter tests is key to improving the efficiency of experimentation, as it translates to smaller direct losses from testing inferior experiences, as well as less unrealized revenue due to the late implementation of superior ones.
One topic has surfaced as causing the most confusion in my ten years of developing statistical tools, consulting, and participating in discussions and conversations with CRO & A/B testing practitioners: statistical power and the related concept of minimum detectable effect (MDE). Some myths were previously dispelled in “Underpowered A/B tests – confusions, myths, and reality”, “A comprehensive guide to observed power (post hoc power)”, and other works. Yet others remain.
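As a hedged numeric sketch of how the two concepts connect (the inputs below are hypothetical, not recommendations from the article), the sample size required by a one-sided two-proportion z-test follows from the baseline rate, the MDE, the significance threshold, and the target power:

```python
# Illustrative only: per-arm sample size for a given MDE via the normal
# approximation for two proportions. All inputs are hypothetical.
from math import sqrt, ceil
from scipy.stats import norm

baseline = 0.10            # control conversion rate
mde = 0.01                 # smallest absolute lift worth detecting
alpha, power = 0.05, 0.80  # one-sided significance threshold and target power

p2 = baseline + mde
p_bar = (baseline + p2) / 2
z_alpha, z_beta = norm.ppf(1 - alpha), norm.ppf(power)

n_per_arm = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
              z_beta * sqrt(baseline * (1 - baseline) + p2 * (1 - p2))) / mde) ** 2
print(f"~{ceil(n_per_arm)} users per arm")  # smaller MDE or higher power -> larger n
```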
How long does a typical A/B test run for? What percentage of A/B tests result in a ‘winner’? What is the average lift achieved in online controlled experiments? How good are top conversion rate optimization specialists at coming up with impactful interventions for websites and mobile apps?