# Don’t Judge Your Ecuadorian Chef or How to Learn Statistics (Part 1)

There is a small Ecuadorian cafe right outside of my apartment building.  The beauty of living in Astoria Queens is that delicious, reasonably priced food is just a few blocks away.  The owner, Claudio, is a good buddy of mine and cooks up amazing dishes like Pernil, Ceviche Camaron, and other Ecuadorian goodness.  It is usually quite warm inside, with South American music playing in the background and clunking sounds of dishes and cutlery filling in the rest of the spectrum.

Beef Stew, Ecuadorian Style - \$8 with Soup

Claudio’s brother Edgar helps run the place by chopping vegetables and occasionally stepping up to the stove.  The place is small, usually a bit crowded and you get this feeling that these guys earn every dime the hard way.  Always hustling, delivering lunches and dinners, but never forgetting to smile.  Recently I had the following conversation with Edgar.

“Enrique, cómo está, my friend?”, says Edgar as I enter the store.
(Everyone there calls me Enrique.  I think I might have said my name was Enrique, when we first met.  They also sometimes call me Che Guevara. So it goes.)

“Bien, gracias, ¿y tú?”
(This is the extent of my Spanish)

“Very good; jost getting the lonches ready”
(He knows, that was the extent of my Spanish.)

Then after a while.

“Enrique, I want it to ask you so’mthin. You mens’ioned that you studied stateestics. I want to ask you, what should I do to get into this field.”

I feel embarassed.  I never thought that he could be even remotely interested in the subject. So, like a total ass I proceeded with a question of my own.

“Oh yeah?  Have you taken any math classes?”, I ask sardonically.

“Well,  jes.  I have taken Calculus I through III, Probability Theeory, Leenear Algebra, Differensial Equasions, Real Analysis I, Real Analysis II, Heestory of Mathemathics, and lets see.  I have also taken Leenear Regression.  They let me take graduate classes at Hunter, since I have done so well in my undergraduate ones”, nonchalantly answers Edgar, while wiping off the butcher’s knife.

OK, now I feel like a special kind of ass – a dumb-ass.

“Dude, you are much more qualified than I ever was.  What can I do for you?”, I said trying to recover.
(I call a lot of people ‘dude’.  Even my daughter.  She once said: “I am not a dude daddy, I am a girl!”. “Whatever, dude”, came the reply.)

“Well, I was looking for graduate schools, but they are so espensive. I don’t think I can afford it.”

“You don’t need to go to graduate school, Edgar.  I mean you can, but I think that you can get the skills necessary without it and perhaps learn even more.”

When I came home, I thought about how one would go about learning the subject without relying on the traditional venues.  What if the person does not have Edgar’s math background?

The Teaser
The first question one usually asks when confronted with a new subject is why?  The answer is that statistics will make you a better human being, no more, no less.  As a side effect, it will help you make better decisions when confronted with uncertainty, which is pretty much always.

Here is a teaser.  Suppose you are given two sequences of Boy/Girl births: BGGBGG and BBBBBG.  Which one is more likely to occur?  If you think that the first one is more likely, you are in the majority.  You are also wrong, but not to worry.  A casual investigation of probabilities will clear this right up.

Since, I am in the baby mood, here is another.  A small rural hospital and a large urban hospital reported a ratio of boys to girls to be 0.70 on a given day.  Do you believe this?  Where is it more likely to occur?

Study after study shows, that we do not have the intrinsic intuition for probabilities.  Nature, it seems, did not prepare us to live in a highly uncertain world.  All of us need some help with this, some more than others.  For example, if the right answer to this problem, is immediately obvious to you, you are either a highly evolved human being or an alien robot.  In either case, I envy you!

The Tools
While discussing the subject, I purposely avoid terms like Machine Learning, Data Mining, and Data Science.  A lot has been written about the subtle differences.  If you care about that, here is one discussion thread.  Also, see a related post from Drew Conway here.

There are three main threads I see that are needed to be an applied statistician. (Theoretical statistics is largely an academic discipline, and if you are interested in it and live in New York take a look here.)

• PROGRAMMING: Statistics is becoming more and more computational.  In the periphery, it is important to be able to prepare your data for analysis.  At the core, a rapidly evolving, but old discipline of Bayesian Statistics seldom offers analytical solutions and the ability to program simulations becomes very important.  Programming your own optimization algorithms, is also a pedagogical and rewarding exercise.  Do you need to be a computer scientist?  No, but it helps to think like one.  If you have never programmed, read the first chapter and ignore the references to Python.  More on programming languages later.
• FOUNDATIONAL MATH: Despite Edgar’s love for Mathematics, a basic grasp of multivariate calculus, probability, and linear algebra is enough to acquire the skills and intuition necessary for applied statistics. If you want to dig a little deeper, Measure Theory, a branch on Real Analysis, helped solidify the foundations of probability. It is good to know it exists, but I would trust that it works, and put off the investigation till much later.  If all this sounds daunting, don’t be alarmed.  Anyone can learn enough math to understand what is going on.  (I did, and I am no mathematician.)
• STATISTICS:  This will largely depend on the type of work you enjoy doing, but the core will include Linear Regression (like predicting income based on age, sex, education, and so on) and Generalized Linear Models (like classifying a tumor as benign or malignant, based on its size.)  I would also recommend Time Series (used in climate models, economics and finance, among others.) Understanding of Markov Chains and Markov Chain Monte Carlo (MCMC) will be helpful for Bayesian Statistics. This list go on and on, but even if you master the first sentence in this paragraph, you are way ahead of the curve.

If the above sounds like a bunch of gibberish, do not be deterred.  Because, so far I have not told you much.  I listed and named the areas of focus, but I have not described the method.  Where should I acquire these skills, using what mode of interaction, and in what sequence?

To give you a preview, I do not like the highly theoretical and sequential method used in the most graduate Statistics programs.  In the next post, I will describe an alternative.

In the meantime, I am off to Claudio’s cafe.   I hear he has lentil soup and goat stew cooking.  (While I eat, Claudio is trying to convince me to follow Jesus.  After sampling the food, I am starting to believe…)