November 4, 2015 10:04 PM

How can I become a Data Scientist? The first answer in this Quora thread is a pretty concise profile of this hot (and hyped) new career choice, written by William Chen, whose data science blog Storytelling with Statistics has some cool stuff in it, like the Probability Cheat Sheet.

Some nifty nuggets from Chen's Quora answer:

“Knowledge is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad.” - Miles Kington

"How does Richard Feynman distinguish which concepts he understands and which concepts he doesn't?

Feynman was a truly great teacher. He prided himself on being able to devise ways to explain even the most profound ideas to beginning students. Once, I said to him, "Dick, explain to me, so that I can understand it, why spin one-half particles obey Fermi-Dirac statistics." Sizing up his audience perfectly, Feynman said, "I'll prepare a freshman lecture on it." But he came back a few days later to say, "I couldn't do it. I couldn't reduce it to the freshman level. That means we don't really understand it." - David L. Goodstein, Feynman's Lost Lecture: The Motion of Planets Around the Sun"
posted by storybored (16 comments total) 106 users marked this as a favorite
Wow, that's quite a bit of info in the first link there. Helpful, but perhaps a bit overwhelming.

So why not add to it. Here are my thoughts:

(1) Yep, R is the computer language to learn. Matlab will also do, but R is free, so. I first learned R some 24 years ago -- back then it was called S -- and I still use it today, even though I'm not in any kind of data science field professionally.

(2) Basic stats and probability theory. IMHO, there's no better intro text for this than the Freedman, Pisani & Purves text. I TA'd for two of the authors when I was in grad school at Berkeley, so I know firsthand that this book works wonders for undergrads who are trying to understand the logic underlying statistics.

(3) Linear algebra. Specifically, understanding what matrices are and how to analyze them in the context of a linear vector space. A huge amount of data analysis depends on an understanding of this theory. Plus, it's really cool stuff, so just learn it.
posted by mikeand1 at 11:57 PM on November 4, 2015 [24 favorites]

I have always been concerned that anything I do will be over-saturated by people doing it, because I am unlikely to be the outlier in cultural trends. Studying statistics does not help the feeling.

At least I appear to be doing all the right things.
posted by solarion at 12:19 AM on November 5, 2015 [4 favorites]

Determining whether the data science field is oversaturated seems like a job for a data scientist.
posted by um at 12:37 AM on November 5, 2015 [14 favorites]

Also a slight derail, but: I went to Quora recently to find I was logged in under the name of somebody I'd never heard of, and who appeared to be last active in 2011 (although they'd subscribed to some lists).

Anecdotal, and therefore insufficient data, but you might want to change your password.
posted by solarion at 2:21 AM on November 5, 2015

Am I a jerk for thinking step one to becoming a data scientist is not asking "How do I become a data scientist?" on Quora?

There's occasionally some decent discussion on Quora (and William Chen definitely writes good answers almost across the board--of course he works for Quora, so is all but obliged to), but I feel like 95% of the questions are "How do I do X? My sole motivation is to make loads of money." Your life is not over if your job title is not "Data Science" or you don't work for one of Google, Facebook or LinkedIn (or Quora or possibly Tesla, for some reason).

Also, since I'm complaining about Quora, I also find it amusing that their terrible tagging is the backdrop to all these "I want to be a data scientist" questions.
posted by hoyland at 4:09 AM on November 5, 2015 [3 favorites]

I have to say that, if you are going to say:

Many news articles have inherently flawed main premises.

You probably shouldn't quote:

“Knowledge is knowing that a tomato is a fruit, wisdom is not putting it in a fruit salad.”

Because the proper response to that first clause is "what classification system are you using?" Because a tomato is a fruit in certain classification systems and not in others. Also, you probably could put a tomato in a fruit salad, if it's the right tomato and the right salad.
posted by GenjiandProust at 4:28 AM on November 5, 2015 [3 favorites]

The first step in becoming a data scientist is understanding that "How do I become a data scientist?" is a different question from "How does one become a data scientist?" which in turn is a different question from "How did you become a data scientist?".
posted by srboisvert at 5:03 AM on November 5, 2015 [2 favorites]

I like to put tomatoes in a fruit salad with plums and nectarines and herbs and shreds of white cheese and maybe a little honey and balsamic vinegar and IT'S FUCKING DELICIOUS.
I miss summer :(
posted by casarkos at 7:22 AM on November 5, 2015 [3 favorites]

I don't think it's an empirical statement that you can get a data science job without a degree, which is one implicature made by all the self-education stuff. See the Kaggle jobs board. Of the top 10 featured jobs currently displayed to me, 1 requires a BS, 3 require a BS but prefer an MS or PhD, 2 require a PhD, 2 require a grad degree (MS or PhD), and 1 requires "An excellent academic record in a numerical discipline", leaving one position where you can try without one of those. Although you can definitely get a BS and an MS without learning half this stuff, especially in physics and math.

I mean, it doesn't stop you from applying for this stuff, but it's less like software engineering proper, where an ideological claim exists that a degree is not necessary (and it is rendered true in some places, although less so than you think).

I note that W. Chen, the author, has a bachelor's in stats from Harvard and a master's in math, also from Harvard.
posted by curuinor at 7:30 AM on November 5, 2015 [5 favorites]

Holy smokes, thank you. I've been trying to use my background in stats and data management to break into data science and further my career but until recently (maybe literally a few months ago) the whole realm seemed like this frustrating, amorphous blob of skill-sets, credentials, education and job descriptions that I couldn't navigate. Luckily there are also online MS programs available that I am looking into that seem pretty damn affordable.
posted by Young Kullervo at 7:33 AM on November 5, 2015 [1 favorite]

“Data science is statistics on a Mac” is how I’ve heard it described.

Many years ago my company Stamen helped develop their first job description for what would later be named data science. They ended up hiring a fabulous Physics PhD with the twin skillsets of analysis and storytelling. He was equally at home doing the math and performing in the videos about the math. The ToC of Programming Collective Intelligence was probably the closest thing to a description of the field coming out of that time, but the narrative ability was key.
posted by migurski at 8:28 AM on November 5, 2015 [3 favorites]

Hello, I have been employed as a data scientist for two years. Here is roughly how I would rank candidates, and I would say if you score ten points or more you are hire-able, but to be the top candidate you will need to score closer to twenty. This is not super scientific, just off the top of my head.

Industry experience doing data science: +20
Quantitative PhD: +10
High-ranking Kaggle profile: +7
Industry experience using applied math and statistics: +7
Industry experience doing research: +5
Industry experience writing software: +5
Non-quantitative PhD: +5
Quantitative research oriented master's: +5
Skills-oriented Master's degree (e.g. data science specializations): +5
CS, Math, Software Engineering, Statistics undergrad: +5
Non-quantitative research oriented master's: +3
Other numerically focused undergrad (science, engineering etc): +3
Average Kaggle profile: +3
Coursera courses: +3
Data science personal projects: +2
Self-taught programmer: +1
Self-taught statistics: +1
Self-taught machine learning: +1
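
The list above can be tallied mechanically. Here is a minimal Python sketch of that arithmetic. The point values are copied from the list; the dictionary keys, the function names, the assumption that points simply add up, and the reading of "closer to twenty" as a cutoff of 20 are all illustrative guesses, not anything the list itself specifies.

```python
# Point values copied from the list above; keys are shorthand labels (hypothetical names).
POINTS = {
    "industry_data_science": 20,
    "quantitative_phd": 10,
    "kaggle_high_rank": 7,
    "industry_applied_math_stats": 7,
    "industry_research": 5,
    "industry_software": 5,
    "non_quantitative_phd": 5,
    "quantitative_research_masters": 5,
    "skills_oriented_masters": 5,
    "cs_math_se_stats_undergrad": 5,
    "non_quantitative_research_masters": 3,
    "other_numerical_undergrad": 3,
    "kaggle_average": 3,
    "coursera_courses": 3,
    "personal_projects": 2,
    "self_taught_programming": 1,
    "self_taught_statistics": 1,
    "self_taught_machine_learning": 1,
}


def score(attributes):
    """Sum the points for a candidate's attributes (assumes points are additive)."""
    # set() ensures each attribute is counted at most once.
    return sum(POINTS[name] for name in set(attributes))


def verdict(total):
    """Map a total to a rough label; 10 comes from the text, 20 is an assumed cutoff."""
    if total >= 20:
        return "competitive for top candidate"
    if total >= 10:
        return "hire-able"
    return "below the bar"


# Two made-up candidates for illustration.
self_taught = [
    "coursera_courses",
    "personal_projects",
    "self_taught_programming",
    "self_taught_statistics",
    "self_taught_machine_learning",
]
phd_with_kaggle = ["quantitative_phd", "kaggle_average"]

for label, attrs in [("self-taught", self_taught), ("PhD + Kaggle", phd_with_kaggle)]:
    total = score(attrs)
    print(f"{label}: {total} points, {verdict(total)}")
```

Under these assumptions the purely self-taught profile totals 8 points and falls short of the ten-point bar, while a quantitative PhD with an average Kaggle profile totals 13.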

The interesting thing is that the word is out about data science. There are a huge number of candidates with PhDs in the sciences who want to be data scientists. I heard a recent opening in our company had 300 applicants. The vast majority never made it past the first filters.

What's needed here is a mix of a strong numerical foundation, an ability to do research, some facility with software, and of course communication skills, though those are harder to measure. And experience of course. If you don't have either a quantitative or a computer science foundation, your first step is to go build one. If you don't have a research-oriented graduate degree, then the best way to show you can do the job is to get some experience doing some part of it. Right now I think Kaggle is the best platform for that.

I think one message, though, is: don't expect to throw together some Coursera courses and be competitive. Coursera is great, but we have our pick of candidates with science PhDs and they will always rank higher, because so much of the job is unstructured research. There are other ways in, but none of them are quick and easy.
posted by PercussivePaul at 8:46 AM on November 5, 2015 [10 favorites]

Also a slight derail, but: I went to Quora recently to find I was logged in under the name of somebody I'd never heard of, and who appeared to be last active in 2011 (although they'd subscribed to some lists).

Is it possible you had used a shared login from BugMeNot or similar? I'm never "myself" on that site.
posted by Gordafarin at 8:53 AM on November 5, 2015

IAADS. I'll just be glad when we get a different title, because I feel a bit silly saying "I'm a data scientist". What other kind of scientist is there?

I work in my company's "Big Data" group, and that's even worse, because a lot of the best projects we've done have been on medium or even small data. And everybody who's ever heard of Hadoop thinks they can be a "Big Data software engineer"; we get the worst piles of buzzword-filled resumes to sort through.
posted by madcaptenor at 8:55 AM on November 5, 2015

My title is "Data Engineer", although I also poked at my boss to change it to "Guy Who Does Stuff" and he was like, "you're an idiot". This may be true.
posted by curuinor at 12:49 PM on November 5, 2015 [1 favorite]
