Influential statisticians

To this list, I’d add Prasanta Mahalanobis, of Mahalanobis distance, who also set up the Indian Statistical Institute.

Seth lists the statisticians who’ve had the biggest effect on how he analyzes data:

1. John Tukey. From Exploratory Data Analysis I [Seth] learned to plot my data and to transform it. A Berkeley statistics professor once told me this book wasn’t important!

2. John Chambers. Main person behind S. I [Seth] use R (open-source S) all the time.

3. Ross Ihaka and Robert Gentleman. Originators of R. R is much better than S: Fewer bugs, more commands, better price.

4. William Cleveland. Inventor of loess (local regression). I [Seth] use loess all the time to summarize scatterplots.

5. Ronald Fisher. I [Seth] do ANOVAs.

6. William Gosset. I [Seth] do t tests.

My data analysis is 90% graphs, 10% numerical summaries (e.g., means) and statistical tests (e.g., ANOVA). Whereas most statistics texts are about 1% graphs, 99% numerical summaries and statistical tests.

I think this list is pretty reasonable, but I have a few comments:

1. Just to let youall know, I wasn’t the Berkeley prof who told Seth that EDA wasn’t important. I’ve even published an article about EDA. That said, Tukey’s book isn’t perfect. I mean, really, who cares about the January temperature in Yuma?

2, 3. I agree that S and R are hugely important. But if they weren’t invented, maybe we’d just be using APL or Matlab?

4. Cleveland also made important contributions to statistical graphics.

5. I’ve written an article about Anova too, but at this point I think of Fisher’s version of Anova as an excellent lead-in to hierarchical models and not such a great tool in itself. I think that psychology researchers will be better off when they forget about sums of squares, mean squares, and F tests, and instead focus on coefficients, variance components, and scale parameters.

6. I don’t really do t-tests.
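To make comment 5 a bit more concrete, here is a minimal Python sketch of how the same one-way layout can be summarized either by an F statistic or by group means and variance components. It is only an illustration under stated assumptions: the function name `variance_components` is hypothetical, the design is assumed balanced (equal group sizes), and the between-group scale is a simple method-of-moments estimate, not the full hierarchical model that the comment points toward.

```python
from statistics import mean


def variance_components(groups):
    """Summarize a balanced one-way layout (hypothetical helper).

    Returns the classical F statistic alongside the quantities the
    comment argues are more useful: group means and the within-group
    and between-group standard deviations (method of moments).
    """
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()   # observations per group
    k = len(groups)   # number of groups
    if k < 2 or n < 2:
        raise ValueError("need at least 2 groups and 2 observations per group")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # valid because the design is balanced

    # Classical ANOVA mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (k * (n - 1))

    # Moment estimate: E[MS_between] = sigma_within^2 + n * sigma_between^2.
    # Truncate at zero, since the raw difference can be negative.
    var_between = max(0.0, (ms_between - ms_within) / n)

    return {
        "group_means": group_means,
        "f_statistic": ms_between / ms_within if ms_within > 0 else float("inf"),
        "sd_within": ms_within ** 0.5,
        "sd_between": var_between ** 0.5,
    }
```

The F statistic answers only "is there any between-group variation?", while `sd_between` and `sd_within` say how large that variation is on the scale of the data. A hierarchical model would estimate the same scale parameters while also partially pooling the group means.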

P.S. I wouldn’t even try to make my own list. As a statistician myself, I’ve been influenced by so many statisticians that any such list would run to the hundreds of names. I suppose if I had to list the statisticians who have had the biggest effect on how I analyze data, it might go something like:

1. Rubin: He taught me applied statistics and clearly has had the largest influence on me (and, maybe, on many readers of my books)

2. Laplace/Lindley/etc.: The various pioneers of hierarchical modeling and applied Bayesian statistics

3. Gauss: Least squares, error models, etc etc

4. Cleveland: Crisp, clean graphics for data analysis. Although maybe if Cleveland had never existed, I’d have picked this up from somewhere else

5. Fisher: He’s gotta be there, since he’s had such a big influence on the statistical practice of the twentieth century

6. Jaynes: Not the philosophy of Bayes stuff, but just one bit (an important bit) in his book where he demonstrated the principle of setting up a model, taking it really seriously, looking hard to see where it doesn’t fit the data, and then looking deeply at the misfit to see what it reveals about how the model could be improved.

But I’m probably missing some big influences that I’m forgetting right now.

