Editor's Note: Honing your BS meter with a sound understanding of statistics has never been more important, as Stephanie Thomas notes in this Classic post.
My Cafe Colleague Ann Bares wrote a post at Compensation Force entitled "Stats Literacy: A Survival Skill for the Age of Data." She references John Sumser, who said that "statistics literacy looks like a survival skill for 21st Century organizational life."
This couldn't be more true.
Statistical literacy, however, is more than just brushing up on the long-forgotten concepts and formulae from your Stats 101 course. The concepts and formulae are important, but the reality is this: without highly specialized training far beyond Stats 101, you're simply not going to have the quantitative skills to cut it on the "technical" side of Big Data.
The limits on the size of data sets that can be processed can now be measured in terms of exabytes (one exabyte = 1,000,000,000 gigabytes) of data. Handling these kinds of massively large data sets requires a particular set of skills: one part quant, one part IT, one part philosopher, and one part translator/communicator. Possessing these skills in the requisite proportions is a rarity, and it's why good Big Data analysts earn Big Bucks.
Does this mean you should just scrap any effort to improve your quantitative skills? Absolutely not.
Focus your statistical literacy training on developing your BS meter. You can improve your ability to critically evaluate empirical analyses simply by knowing what questions to ask. Here are three examples to give you a flavor of what I mean:
- Were any additional studies performed that aren't presented here? It's not uncommon for multiple study approaches to be used in evaluating a data set. In some instances, different analyses lead to different conclusions. One study may indicate a strong positive relationship between wellness programs and employee retention, while another study may indicate that wellness programs have no impact on employee retention. It's important to get the big picture and evaluate all of the results, not just the ones that support a particular position. Hiding data or analyses should set your BS meter to high alert.
- What is the logical explanation for the reported relationship between X and Y? Big Data gave us data mining, or exploring large data sets to identify previously unknown patterns. Big Data also gave us data dredging, or searching for relationships among variables with no pre-defined expectations based on logic and no validation of those relationships. There has to be some underlying explanation for the relationship. If the explanation seems contrived, it probably is, and you may have to call BS on this finding.
- How does this result generalize from the sample to the larger population? Even though they have access to exabytes of data, analysts often focus on smaller samples within the larger data set. These samples can be smaller in terms of people (e.g., employees, customers, suppliers, etc.) or they can be smaller in terms of time (e.g., summer months, pre-Christmas holiday season, etc.). What is true for a given subset may not be true for the population as a whole. It's important to understand how a given result translates to the population at large. If it's not translatable, it's not usable.
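To see why data dredging deserves a raised eyebrow, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything from a real study: the function names `pearson_r` and `dredge`, the 50 rows, the 200 candidate variables, and the cutoff of roughly 0.279 (approximately the correlation that would be called "significant" at the 5% level with 50 observations). Every variable is pure random noise, so any "relationship" the search turns up is spurious by construction.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def dredge(n_rows=50, n_vars=200, r_cutoff=0.279, seed=0):
    """Count how many pure-noise variables 'correlate' with a noise outcome.

    All parameters are hypothetical. r_cutoff is an approximate 5%-level
    threshold for n_rows=50; it is hard-coded here, not computed.
    """
    rng = random.Random(seed)
    # The outcome is random noise: nothing can genuinely explain it.
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_vars):
        # Each candidate "explanatory" variable is also random noise.
        candidate = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(candidate, outcome)) > r_cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Spurious 'findings' out of 200 noise variables:", dredge())
```

With a 5% threshold and 200 candidates, you would expect somewhere around ten noise variables to clear the bar by chance alone. An analyst who reports only those ten, with no logical story and no validation on fresh data, is exactly the situation your BS meter is meant to catch.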
As noted in the Harvard Business Review, "even as companies invest eight- and nine-figure sums to derive insight from information streaming in from suppliers and customers, less than 40% of employees have sufficiently mature processes and skills to do so."
Statistical literacy is about being able to make good decisions by balancing analysis and judgment. Successfully maintaining this balance is a critical skill for the 21st Century workplace.
Stephanie Thomas, Ph.D., is a Lecturer in the Department of Economics at Cornell University. She teaches undergraduate and graduate courses on economic theory and labor economics in the College of Arts and Sciences and in Cornell’s School of Industrial and Labor Relations. Throughout her career, Stephanie has completed research on a variety of topics including wage determination, pay gaps and inequality, and performance-based compensation systems. She frequently provides expert commentary in media outlets such as The New York Times, CBC, and NPR, and has published papers in a variety of journals.