
How about non-beginners looking to refresh / deepen their intuitions?

I've recently been working with the Python toolset in this space -- pandas, numpy, matplotlib -- and have run smack dab into my rusty regression analysis. In particular I need to better understand the distributional assumptions about the error terms and the variances of the coefficient and intercept estimates.

Any suggestions for some deeper study / refresher?
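
To make the question concrete, here is a minimal sketch of the quantities I mean, for the one-predictor case and in plain Python rather than numpy. It assumes the classical setup: errors with mean zero, constant variance, and no correlation between observations. The function name `simple_ols` and the data are made up for illustration.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns the estimates plus their classical standard errors, which are
    only valid if the errors have constant variance and are uncorrelated.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance, with n - 2 degrees of freedom because two
    # parameters (slope and intercept) were estimated from the data.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)

    # Sampling variances of the estimates under the assumptions above.
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1.0 / n + x_bar ** 2 / sxx))

    return {
        "slope": slope,
        "intercept": intercept,
        "sigma2": sigma2,
        "se_slope": se_slope,
        "se_intercept": se_intercept,
    }


if __name__ == "__main__":
    # Made-up data, purely for illustration.
    fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    for name, value in fit.items():
        print(f"{name}: {value:.4f}")
```

As I understand it, these standard-error formulas do not require normal errors; normality only matters if you want exact t-based intervals in small samples. That is the part I would like to get solid on.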



Depends on the data sets you want to work with. For straight-up linear regression, with a heavy emphasis on observational data appropriate for microeconometrics, "Introductory Econometrics: A Modern Approach" by Jeff Wooldridge is absolutely phenomenal (an old edition is fine). (This is usually assigned for advanced undergraduate econ majors or non-advanced masters students; I don't know what the equivalent would be for undergraduate stats majors).

For more "intuition about working with data, especially if you're a visual person," Howard Wainer's books are wonderful; one example is "Graphic Discovery: A Trout in the Milk and Other Visual Adventures." They're non-technical, with short chapters that each discuss a different data set.

Bill Cleveland's "Visualizing Data" and "Elements of Graphing Data" cover the same material -- graphing data -- at a more technical level. I don't know whether Cleveland's books would help with the issues you asked about, but... they are amazing books and if you're interested in the subject at all I can't recommend them highly enough.

I don't have any free recommendations, unfortunately.


Seconding Wooldridge. It's the only economics text I'm keeping from university - I'm getting rid of all the rest. It really digs into the material and highlights the pitfalls and incorrect assumptions in regression and forecasting. I'm planning to consult it when I start working on analytics in current/future projects.


Honestly, the only way to understand regression is to study something like McCullagh & Nelder's book. With anything else you are going to have a very hard time being really useful without misinterpreting the results. There are some real subtleties to the interpretation of regression coefficients and, more importantly, to structuring your data in such a way that you will answer the questions you want.

It's not an easy book, but if you've gone through to at least third-year level in statistics it's approachable and you will come away understanding regression at a deep level.


If you can follow the maths of things like Poisson distributions, the central-limit theorem, etc., then it's the modelling issues that are tricky.

Take a look at Dudley's Statistics for Applications on MIT's OpenCourseWare - http://ocw.mit.edu/courses/mathematics/18-443-statistics-for... - which is supposed to be very thorough and easy to follow, but be warned: it's based on an expensive textbook.


Gelman and Hill is a nice book organized specifically around regression.


Gelman and Hill is a wonderful and under-rated book. My guess is that its clumsy title (Data Analysis Using Regression and Multilevel/Hierarchical Models) hides the fact that it's an introductory textbook that takes the reader from knowing nothing to eventually constructing complex Bayesian models. Plus, it's a pretty good tutorial on R and BUGS.



