Excited about statistics (really?)

Feb 2, 2014 • Pauline Barmby

Yes, really.

I spent much of summer 2013 revising a paper about the prevalence of X-ray-emitting binary stars in a certain type of star cluster. Some star clusters have these X-ray-emitting stars, and some don’t, and we can figure out which are which relatively easily. What’s not so easy to figure out is why: while it’s known that certain properties of a star cluster make it more or less likely to host an X-ray source, these properties are all correlated with one another, so it’s tough to disentangle what is going on. Previous attempts at doing this in the literature were kind of a mess, and the paper’s referee wasn’t too happy with our first attempt at it either.

We had a set of measurements of star clusters, both with and without X-ray sources. I had been doing backflips trying to figure out how to quantify the importance of the various parameters, and a clever person on an online forum suggested that I try logistic regression, which deals with the case of categorical dependent variables and non-categorical independent variables. It was great to know that the technique I needed was already in existence: people in my field have the bad habit of reinventing such things from scratch, which I didn’t want to do. I found a nice online example which explained how to do logistic regression in Python; following along with this also let me enlarge my Python toolbox with the addition of pandas and statsmodels.

Why was I so excited about learning this new piece of statistics? I had a problem to solve and a paper to get finished, and this did the trick on both counts. I was also excited about the idea of introducing this technique to astronomers; I can see it being useful for many. But it was also really fun just to learn something new: I spend a fair bit of my time trying to figure out better ways to teach stuff I already know *to other people*, and it was refreshing to dive into something that was new to me.