This couldn't be more true.
Statistical literacy, however, is more than just brushing up on long-forgotten concepts and formulae from your Stats 101 course. The concepts and formulae are important, but the reality is this: without highly specialized training far beyond Stats 101, your quantitative skills simply aren't going to cut it on the "technical" side of Big Data.
As of 2012, the largest data sets being processed were measured in exabytes (one exabyte = 1,000,000,000 gigabytes). Handling massively large data sets like these requires a particular set of skills: one part quant, one part IT, one part philosopher, and one part translator/communicator. Possessing these skills in the requisite proportions is a rarity, and it's why good Big Data analysts earn Big Bucks.
Does this mean you should just scrap any effort to improve your quantitative skills? Absolutely not.
Focus your statistical literacy training on developing your BS meter. You can improve your ability to critically evaluate empirical analyses simply by knowing what questions to ask. Here are three examples to give you a flavor of what I mean:
- Were any additional studies performed that aren't presented here? It's not uncommon for multiple study approaches to be used in evaluating a data set. In some instances, different analyses lead to different conclusions. One study may indicate a strong positive relationship between wellness programs and employee retention, while another study may indicate that wellness programs have no impact on employee retention. It's important to get the big picture and evaluate all of the results, not just the ones that support a particular position. Hiding data or analyses should set your BS meter to high alert.
- What is the logical explanation for the reported relationship between X and Y? Big Data gave us data mining: exploring large data sets to identify previously unknown patterns. Big Data also gave us data dredging: searching for relationships among variables with no pre-defined, logic-based expectations and no validation of the relationships found. There has to be some underlying explanation for the relationship. If the explanation seems contrived, it probably is, and you may have to call BS on this finding.
- How does this result generalize from the sample to the larger population? Even though they have access to exabytes of data, analysts often focus on smaller samples within the larger data set. These samples can be smaller in terms of people (e.g., employees, customers, suppliers, etc.) or they can be smaller in terms of time (e.g., summer months, pre-Christmas holiday season, etc.). What is true for a given subset may not be true for the population as a whole. It's important to understand how a given result translates to the population at large. If it's not translatable, it's not usable.
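The data-dredging problem above is easy to demonstrate for yourself. Here's a minimal, hypothetical simulation (the sample size and number of "metrics" are invented for illustration): test enough random, unrelated variables against an outcome, and at least one will look impressively correlated by pure chance.

```python
import random

random.seed(0)

# A random "outcome" for 30 hypothetical employees -- pure noise.
n = 30
outcome = [random.gauss(0, 1) for _ in range(n)]

def corr(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Dredge 200 random, unrelated "metrics" and keep the strongest
# correlation with the outcome. Every relationship here is noise,
# yet the best one found will typically look substantial.
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(n)], outcome))
    for _ in range(200)
)
print(round(best, 2))
```

The strongest dredged correlation routinely clears thresholds that would look "significant" in a report, which is exactly why a finding with no logical explanation and no out-of-sample validation deserves a high BS-meter reading.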
As noted in the Harvard Business Review, "even as companies invest eight- and nine-figure sums to derive insight from information streaming in from suppliers and customers, less than 40% of employees have sufficiently mature processes and skills to do so."
Statistical literacy is about being able to make good decisions by balancing analysis and judgment. Successfully maintaining this balance is a critical skill for the 21st Century workplace.
Stephanie R. Thomas is an economic and statistical consultant specializing in EEO issues and employment litigation risk management. Since 1999, she's been working with businesses and government agencies providing expert quantitative analysis. Stephanie's articles on examining compensation systems for internal equity have appeared in professional journals and she has appeared on NPR to discuss the gender wage gap. Stephanie is the founder of Thomas Econometrics Inc., the host of The Proactive Employer radio show, and author of the upcoming book Compensating Your Employees Fairly: A Guide to Internal Pay Equity. Follow her on Twitter at @proactivemployr.
Image courtesy of Summers Fire via Flickr