As I’ve written about before, as a data scientist supporting a product or marketing team with A/B testing, the job is communication – helping to translate between business requirements and what we can learn from statistics. I (and many, many others) have found that there is a lot of value in having a document, shared among the team that is running the test. Some A/B testing tools include some limited workflow for collaboration on testing (e.| Harlan D. Harris
The other day, I was reading a post by Venkatesh Rao (thousands of words of under-edited brilliance, as usual), and was struck by this note about the complexity of climate solutions: I tend to take as an article of faith the systems science rule of thumb that the complexity of solutions generally matches the complexity of the problems. If it doesn’t, then you either got lucky, or there are negative externalities you’re ignoring.| Harlan D. Harris
There’s been an immense amount of discussion about Large Language Models (LLMs) such as ChatGPT over the last year, of course. Some of that discussion has been whether they are intelligent, conscious, or on the path to Artificial General Intelligence. I’m particularly interested in the “consciousness” question, as it was an area of personal interest when I was working as a cognitive scientist, in a prior career. I never did research on the topic, but I read plenty of philosophers of m...| Harlan D. Harris
Recently I wrote a blog post that mentioned “Superiority” as a type of A/B test decision. In this post I want to talk about all five types of A/B test decision that I think are relevant. This is an adaptation and extension of a talk I gave last year at the Quant UX conference (it’s a great event; you should check it out). Note that I go into a little more statistical detail here, although most of the below is readable by non-data scientists.| Harlan D. Harris
Recently, tech-journalism site The Markup ran a long, detailed, critical investigation of a predictive machine learning model used by the State of Wisconsin to identify public school students at risk of not graduating. I mostly agree with the conclusions of the piece – the system appears not to be fit for purpose and needs to be substantially improved – but I want to comment on several aspects of the model and the Markup’s reporting.| Harlan D. Harris
The “best practice”, when evaluating the results of an online controlled experiment (A/B test), is to use classical statistical tests, proceeding with a change if (and only if) the result of the test includes a p value of less than 0.05. But the American Statistical Association (ASA) said in a prominent 2016 statement that “…business… decisions should not be based only on whether a p-value passes a specific threshold.” Wait, what?| Harlan D. Harris
A/B testing is a tool for supporting decision-making in business, and so in addition to getting the statistics right, it’s really important to communicate well with the non-statisticians who will have the final say on the go/no-go decision. Most A/B tests in practice are testing ratios, conversion rates of various sorts – say, the proportion of people who visit your web site who buy at least one pair of shoes.| Harlan D. Harris
Suppose you’re a data scientist at an e-commerce web site that sells shoes, responsible for supporting A/B tests. Many A/B tests are easy, and there are a number of companies that sell tools that make the easy cases as simple as clicking a few buttons and looking at pretty graphs. But A/B tests can get statistically complex surprisingly quickly, which is why hiring data scientists with a strong statistics background can make a big difference in the quality of decisions.| Harlan D. Harris
Just a quick post here to note a few professional accomplishments: I just added a new publication to my vita – a peer-reviewed conference proceedings article about abstractions for building repeated, related versions of similar predictive models. Check out some longer thoughts on Medium, or read the full article. Earlier this year, I added an incredibly old project! A paper that I had contributed a bit to in… 2005! finally got published!| Harlan D. Harris
Recently I’ve been cooking a recipe of my own creation that I really like, and there isn’t much like it on the internet, so I’m sharing the recipe here. It’s a combination of two great things – hot-smoking fish with wood chips in a stovetop smoker, and the fermented flavors of Hunanese cuisine. Smoked fish and duck are common flavors in Chinese cooking. Tea-smoked duck is common on Chinese menus in America, and Chinese cookbooks have recipes like Steamed Smoked Fish with Black Beans...| Harlan D. Harris
A thing that I do when I cook is to re-write the recipes I’m using (whether they’re from a cookbook or my own invention) onto a piece of paper in a very specific way. I think the approach I use is handy, so I’m describing it here in case you’d like to use it. (Or in case you need more evidence about how weird I am.) There are 4 ideas that I think are important:| Harlan D. Harris
This is my first new post on harlan.harris.name for a while. The occasion is a change of scenery. For about 10 years, my primary blog has been on WordPress, more recently supplemented by Medium. But WordPress and Medium are limited for technical writing, and the trend among data people recently has been to publish static sites built with Blogdown and Hugo. So that’s what this is. The technology I’m using (more on it below) lets me do fun things like trivially embed math: \(\sum_i a^2_i\),...| Harlan D. Harris
This post was originally published on Medium. There’s recently been some interesting opinionated writing in the R statistical programming community about how and when to teach the abstracted, easy-to-use approaches to solving problems, versus the underlying nitty-gritty. David Robinson, Data Scientist at Stack Overflow, wrote a blog post recently called Don’t teach students the hard way first. The primary example was on the data-manipulation tools in the tidyverse, versus the underlying me...| Harlan D. Harris
This post was originally published on Medium. I recently attended two small conferences – the ISBIS (International Society for Business and Industrial Statistics) 2017 conference, held at IBM Research in Westchester County, and the Domino Data Lab Popup, held in West SoHo. I was invited to speak at ISBIS (slides here, if you’re curious), but for this post, I want to summarize some insights from other people’s talks. In chronological (to me) order… First a few talks from ISBIS that I pa...| Harlan D. Harris
This post was originally published on Medium. Occasionally when chatting with other data scientists, especially with others who are interested in integrating predictive models into production software systems, the word “scaling” comes up. Not this. Although some West Coast data scientists are into this kind of scaling too. I think this is a great question, but it’s a little underspecified. There seem to be at least three qualitatively different notions of “scaling” in data science, an...| Harlan D. Harris
This post was originally published on Medium. A particularly good way to get a little more out of professional conferences is to blog about your experiences, I think. It makes you focus your thoughts on things like “what’s the big take-away here,” and “what should I be asking people in the hallways?” Rather than just summarizing what you saw, or making snarky Twitter comments (also worth doing!), a great conference blog post is synthesis – combining insights from multiple presentat...| Harlan D. Harris
This post was originally published on Medium. A particularly good talk at Strata NY last year was by Brett Goldstein, former CIO of Chicago, who talked about accountability and transparency in predictive models that affect people’s lives. This struck a strong chord with me, so I wanted to take some time to write down some thoughts. (And a rather longer time to publish those thoughts…) I’m sure others have thought about this more and have better takes on this – please comment and pro...| Harlan D. Harris
I, Harlan D. Harris, hereby commit to the neveragain.tech pledge. Please stand with me and hold me to it. It starts: We, the undersigned, are employees of tech organizations and companies based in the United States. We are engineers, designers, business executives, and others whose jobs include managing or processing data about people. We are choosing to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administrat...| Harlan D. Harris
This post was originally published on Medium. When building a complex system, it’s often helpful to think about the design of that system using patterns and abstractions. Architects and software engineers do so frequently, and the experience of implementing predictive modeling pipelines has recently led to a variety of patterns and best practices. For instance, when dealing with large amounts of streaming data, some organizations use the Lambda Architecture to handle both real-time and compu...| Harlan D. Harris
This post was originally published on Medium. You’re a data scientist, and you’ve got a predictive model – great work! Now what? In many cases, you need to hook it up to some sort of large, complex software product so that users can get access to the predictions. Think of LinkedIn’s People You May Know, which mines your professional graph for unconnected connections, or Hopper’s flight price predictions. Those started out as prototypes on someone’s laptop, and are now running at sc...| Harlan D. Harris
This post was originally published on Medium. Yesterday was the 2016 National Day of Civic Hacking, a Code for America event that encourages people with technology and related skills to explore projects related to civil society and government. My friend Josh Tauberer wrote a thoughtful post earlier about the event called Why We Hack – on what the value of this sort of event might be – please read it.| Harlan D. Harris
This is an updated version of an article first published on Medium on Oct. 24, 2015. I love my smartwatch, way more than I thought I would when I bought it, over a year ago. It’s a Moto 360, which is still better looking than the Apple Watch, I think. Why do I love it? It’s not the health monitoring. I turned that junk off as soon as I got the thing.| Harlan D. Harris
This post was originally published on Medium. “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” – @josh_wills, May 3, 2012. There are different types of data scientists, with different backgrounds and career paths. With Sean Murphy and Marck Vaisman, I wrote an article about this for O’Reilly a few years back, based on survey research we’d done.| Harlan D. Harris
This post was originally published on Medium. I’m the Director of Data Science at EAB, a firm that provides best-practices research and enterprise software for colleges and universities. My team is responsible for the predictive models and other advanced analytics that are part of the Student Success Collaborative product that’s used by academic advisors and other campus leadership. We’re hiring data scientists, and I wanted to publicly say a few things about the roles we have advertised.| Harlan D. Harris
The below is a public version of a post originally posted on an internal blog at the Education Advisory Board (EAB), my current employer. We don’t yet have a public tech blog, but I got permission to edit and post it here, along with the referenced code. Data Science teams get asked to do a lot of different sorts of things. Some of what the team that I’m part of builds is enterprise-scale predictive analytics, such as the Student Risk Model that’s part of the Student Success Collabor...| Harlan D. Harris
[Embedded tweet] Let me unpack that a bit… [Image: Hugh and Crye t-shirt] Recently, Hugh & Crye, a DC-based clothing firm for men with a novel take on sizing, did a Kickstarter campaign for their new line of fitted t-shirts. What the hell? H&C has been around for about 5 years, and based on their product growth and hiring seems to be doing quite well. I like their stuff. Why do they need a Kickstarter? The original goal of Kickstarter was to “kickstart” new products ...| Harlan D. Harris
Earlier this year, I attended the INFORMS Conference on Business Analytics & Operations Research, in Boston. I was asked beforehand if I wanted to be a conference blogger, and for some reason I said I would. This meant I was able to publish posts on the conference’s WordPress web site, and was also obliged to do so! Here are the five posts that I wrote, along with an excerpt from each.| Harlan D. Harris
On Monday, October 28th, 2013, I gave a 5-minute Ignite talk entitled “Why a Data Community is Like a Music Scene” at an event associated with the Strata conference. Here’s the video: [Embedded video] And here are the acknowledgements and references for the talk: Data Community DC; How Music Works, by David Byrne; my slides for the Ignite talk; my blog post (written first). Photos: CBGB’s exterior: NYC - East Village: CBGB & OMFUG by wallyg, on Flickr (Creative Commons); Graffiti wall: cbgb, september 2006...| Harlan D. Harris
A topographic map of Washington in 1791 by Don Alexander Hawkins. I live on the top edge of the map, on one of those hills. I’m a generally happy user of DC’s Capital Bikeshare system – just renewed my annual membership today in fact. But I don’t use it as much as I’d like to, for one critical reason. I live on top of a hill. Riders are happy to take bikes from the neighborhood to their jobs downhill, but are much less likely to ride them uphill.| Harlan D. Harris
And, we’re back! After being off-line for several weeks, this site is now live again! I can’t imagine you missed it. Here’s what happened. Let’s start at the beginning. In 2003, ICANN added .name to the list of top-level domains (like .com, .edu, etc.). The idea is that individuals would use it for personal sites and email addresses. You can still do this, but (in case you haven’t noticed), it’s not very popular, and most domain name registrars don’t even sell .| Harlan D. Harris
For those people (or, more likely, 0 or 1 persons) who follow this blog to catch up on my professional thoughts: I’ve been doing a little bit of writing on the Data Community DC blog. Here are all my posts over there: http://datacommunitydc.org/blog/author/harlan/ I’d definitely encourage you to read everyone else’s work on the DC2 blog too! Two titles of my own: Examining Overlapping Meetup Memberships with Venn Diagrams; Hackathons and DataDives. And three of others’:| Harlan D. Harris
I recently gave a presentation on communication issues around the terms “Data Science” and “Data Scientist”, based in part on a survey that I did with my Meetup colleagues Marck and Sean. The basic idea is that these new, extremely-broad buzzwords have resulted in confusion, which has impacted the ability of people with skills and people with data to meet and effectively communicate about who does what and what appropriate expectations should be.| Harlan D. Harris
My newish cooking club had a dinner yesterday with the theme American Beer. I was tasked with dessert, and came up with this recipe for Pretzel Whoopie Pies. They turned out extremely well, so I thought I’d share the recipe here. Sources: About.com Southern Food (basic whoopie pie recipe); Southern Living (Stout buttercream); Ideas In Food (flavor combo); Bakewise (cookie recipe tweaks). Ingredients: 2 egg yolks; 1/2 c minus 1 T sugar; 1 T light corn syrup; 1/2 c finely ground unsalted mini pr...| Harlan D. Harris
I just returned from the useR! 2012 conference for developers and users of R. One of the common themes to many of the presentations was integration of R-based statistical systems with other systems, be they other programming languages, web systems, or enterprise data systems. Some highlights for me were an update to Rserve that includes 1-stop web services, and a presentation on ESB integration. Although I didn’t see it discussed, the new httr package for easier access to web services is al...| Harlan D. Harris
As I’ve discussed here before, there is a debate raging (ok, maybe not raging) about terms such as “data science”, “analytics”, “data mining”, and “big data”. What do they mean, how do they overlap, and perhaps most importantly, who are the people who work in these fields? Along with two other DC-area Data Scientists, Marck Vaisman and Sean Murphy, I’ve put together a survey to explore some of these issues.| Harlan D. Harris
On last week’s Build and Analyze – a great podcast nominally about iOS development, but actually more about just living a tech-geek lifestyle – Marco talked a lot about the rumored “Apple TV” and whether it could actually be a groundbreaking product. He concluded that it probably couldn’t. Most people wouldn’t dump a working TV just for an Apple brand; the touch-screen interface that Apple has been using for the iPad and iPhone wouldn’t work for a TV; the only apps that would ...| Harlan D. Harris
I’m fond of navel gazing, meta discussions, and so forth. I’ve recently written about inferring navel gazing from link data, and about the meaning of the “Analytics” buzzword. This post will be my second on that other infectious buzzword, “Data Science”. When I moved to Washington DC in July, I was struck by the fact that there was no Meetup for analytics/applied statistics/machine learning/data science. There’s a great DC Tech Meetup, a great Big Data Meetup, and a great R Meet...| Harlan D. Harris
This past Friday, the web portal to the US Federal government, USA.gov, organized hackathons across the US for programmers and data scientists to work with and analyze the data from their link-shortening service. It turns out that if you shorten a web link with bit.ly, the shortened link looks like 1.usa.gov/V6NpL (that one goes to a NASA page). And because this service was paid for by taxpayer money, the data about each clickthrough is freely available.| Harlan D. Harris
In my previous post, I motivated a web application that would allow small-scale sustainable meat producers to sell directly to consumers using a meat share approach, using constrained optimization techniques to maximize utility for everyone involved. In this post, I’ll walk through some R code that I wrote to demonstrate the technique on a small scale. Although the problem is set up in R, the actual mathematical optimization is done by Symphony, an open-source mixed-integer solver that’s ...| Harlan D. Harris
A personal interest I have is the ethical and sustainable production of food. I’ve been a member of and helped run Community Supported Agriculture groups, and my wife and I currently purchase the majority of our meat from a group of upstate NY pastured-livestock producers who sell their products through CSAs. It’s an à la carte business model, where I place an order on a website, and the next week I pick up the frozen products cut and packaged as if for retail.| Harlan D. Harris
For a project I’m working on at work, I’m building a predictive model that categorizes something (I can’t tell you what) into two bins. There is a default bin that 95% of the things belong to and a bin that the business cares a lot about, containing 5% of the things. Some readers may be familiar with the use of predictive models to identify better sales leads, so that you can target the leads most likely to convert and minimize the amount of effort wasted on people who won’t purchase ...| Harlan D. Harris
I recently attended the INFORMS Conference on Business Analytics and Operations Research (aka “INFORMS Analytics 2011”) in Chicago. This deserves a little bit of an explanation. INFORMS is the professional organization for Operations Research (OR) and Management Science (MS), which are terms describing approaches to improving business efficiency by use of mathematical optimization and simulation tools. OR is perhaps best known for the technique of Linear Programming (read “Pr...| Harlan D. Harris
Neil Saunders has an interesting (to me) blog post up this morning, with the title “Dumped on by data scientists.” He uses an appearance of “data scientist” in a Chronicle of Higher Ed article as a jumping-off point to rant a little bit about the term. For Neil, it’s redundant, as the act of doing science necessarily requires data; it’s insulting, as if “scientist” wasn’t cool enough and you have to add “data”; and it’s misleading, as many people who call themselves “data scientists” are actua...| Harlan D. Harris
I was recently given the opportunity to re-present my ggplot2 talk, which I originally gave to the NYC R Meetup, to the DC R Meetup group. The Meetup was held co-located with the Predictive Analytics World conference in Alexandria, VA. (More on my thoughts on PAW below…) Contentwise, I made only small changes, changing a bit of patter and adding more examples at the end. I still love ggplot, with some frustration at the way it is typically introduced.| Harlan D. Harris
The Meetup phenomenon, which is now substantial and longstanding enough to be more of a cultural change than a flash in the pan, continues to impress me. Even more so than tools like LinkedIn, Meetups have changed the nature of professional networking, making it more informal, diverse, and decentralized. Last night, statistics consultant (and cheap eats guru) Jared Lander and I presented a talk on a statistical technique tangentially related to my professional work (more closely associated wi...| Harlan D. Harris
As more and more people get smartphones that can play MP3s or streamed music, like the iPhone or an Android phone such as the upcoming HTC Evo 4G (I’m gettin’ one!), fewer and fewer people are buying standalone MP3 players. Why have two gadgets when you can have just one? I think there are good reasons to do so, but I don’t think the right combination of products is currently on the market. Here’s my thinking.| Harlan D. Harris
A few months back I gave a presentation to the NYC R Meetup. (R is a statistical programming language. If this means nothing to you, feel free to stop reading now.) The presentation was on ggplot2, a popular package for generating graphs of data and statistics. In the talk (which you can see here, including both my slides and my patter!) I presented both the really great things about ggplot2 and some of its downsides. In this blog post, I wanted to expand a bit on my thinking on ggplot, the G...| Harlan D. Harris
Hah, it rhymes! The fall haul (hah!) from the CSA inevitably means two things, root vegetables and ungodly numbers of pears. I love root vegetables, but tend to find pears to be pale imitations of apples. But when poached in red wine, or cooked with butter and sugar, pears can have some redeeming value. Recently, for the cooking club, I teamed up to make a dessert with the theme “Fall Harvest.| Harlan D. Harris
The problem of how to monetize online publishing, particularly news publishing, is neither new nor all that surprising. But the ongoing lack of a solution is …| www.harlan.harris.name
Comments on a few things from Larson's book, and highlights of a few things that jumped out as particularly resonant.| www.harlan.harris.name