I recently came across a few discussion pages where people asked for recommendations on statistical software that would be good for non-statisticians. The responses generally fell into one or both of two categories:

  • Don’t use anything with a point-and-click interface (e.g., SPSS, which is what I, my students, and my colleagues generally use) because you’ll have no idea what you’re doing and you’ll mess everything up. Instead, only people who can use R should be allowed to do statistics.
  • People who do not have extensive, specialized, academic training in the mathematical bases of statistics should never do statistics because – again – they have no idea what they’re doing and everything will blow up.

I find these answers deeply disappointing. First, they ignore the original question, which was for easy-to-use statistical software for non-statisticians, even though the asker made it clear that they had reasonable statistical training for their field. This violates a basic principle of communication: if you’re asked a question, then you should answer the question.

Second, the responders seemed to believe that if you are not a full-time, rigorously trained specialist – which many of the responders apparently were (or acted as though they were) – then you should stay as far away from data as possible because you’ll break things. I certainly wouldn’t tell people that if they don’t have a PhD in psychology like I do – Social/Personality Psychology, City University of New York, 1999 – then they should never talk about people’s thoughts, feelings, and behaviors. That would be demeaning and amazingly restrictive.

Third (and related), they ignore the likely possibility that most analyses are not complicated and the questions are pretty basic: what’s the mean of this variable, do these two groups have different means, and so on. You don’t need a PhD in statistics to provide a workable answer to those questions. As Bob Dylan sang, “You don’t need a weatherman to know which way the wind blows.”

Mostly, these responses strike me as unthoughtful, undemocratic, and unkind. They’re unthoughtful because they don’t seem to reflect on the reality of the asker’s needs and abilities. They’re undemocratic because they explicitly claim that only an elite group is qualified to work with data. And they’re unkind because they’re shutting somebody down for asking a sincere and important question.

Very sad.

Fortunately, the world is bigger than that, and data is bigger than that. People are bigger than that, too. I’m reminded of a line by writer Robert A. Heinlein:

A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.

I don’t necessarily think that means that everybody everywhere should be able to do all of those things expertly at the same time. Rather, I think what it means (or might mean) is that people have diverse abilities and potentialities. People are not restricted in their function in quite the same way that other organisms might be.

Data is everywhere. I believe that everybody would be best served by being able to work personally and fluently with data (even if they don’t use R). I’m a very strong supporter of the democratization of data and data science, and that’s why I created datalab.cc. Don’t get bullied; the data is waiting for you, so dive right in.

[By the way, my answer to the question of user-friendly software for non-statisticians with an adequate understanding of statistics is this: First, spreadsheets such as Google Sheets or Excel. Second, visualization with Tableau Public or Plotly. Third, analysis with SPSS. That should take care of about 98% of what most non-specialists need. R and Python are nice – I use both – and the ability to manipulate data with Bash scripts or to query relational databases with SQL is definitely helpful. But those tools are not especially user-friendly (compared to the others), and they’re definitely oriented towards specialists and full-time statisticians and data scientists. And there you have it.]