“Word Embedding Models let us take a stab formalizing an interesting counterfactual question: what would the networks of meaning in language look like if patterns that map onto gender did not exist?” [more inside]
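One way to picture that counterfactual is a rough sketch, not necessarily what the quoted author did: estimate a "gender direction" from a pair of word vectors and subtract every word's component along it. The 3-dimensional vectors below are invented for illustration, and using the difference between "he" and "she" as the direction is an assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def remove_direction(v, direction):
    """Return v minus its projection onto the unit vector `direction`."""
    scale = dot(v, direction)
    return [a - scale * d for a, d in zip(v, direction)]

# Toy vectors, made up for this sketch; real embeddings have hundreds of dimensions.
vectors = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "nurse": [-0.6, 0.7, 0.3],
    "engineer": [0.5, 0.8, 0.2],
}

# Assumption: the he/she difference stands in for the gender direction.
gender_direction = normalize(
    [a - b for a, b in zip(vectors["he"], vectors["she"])]
)

# Every word with its gender component projected out.
neutralized = {
    word: remove_direction(vec, gender_direction)
    for word, vec in vectors.items()
}
```

After this step no word in `neutralized` has any component along `gender_direction`, so whatever similarity remains between words comes from the other dimensions.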
What's the Difference Between Data Science and Statistics? Not long ago, the term "data science" meant nothing to most people, even to those who worked with data. A likely response to the term was: "Isn't that just statistics?" These days, data science is hot. The Harvard Business Review called data scientist the "Sexiest Job of the 21st Century." So what changed? Why did data science become a distinct term? And what distinguishes data science from statistics?
"Julia is a high-level, high-performance dynamic programming language for technical computing." The language is elegant (homoiconic, multiple-dispatch, consistent and extensible type system), but with easy-to-learn syntax. The standard library includes a wide array of fast and useful functions, and the number of useful packages is growing. [more inside]
Interactive map of how the pronunciation and use of various words and phrases differ by region in the US. Based on Bert Vaux's online survey of English dialects, the program allows you to see results for individual cities, as well as nationwide (though inexplicably it does not include Alaska or Hawaii).
Yesterday, the New York Post published a dramatic image on its cover of a Queens man just seconds from being hit by a Q train after being pushed by another man who is now in custody. [more inside]
Bully is an unflinching new documentary about teenagers and bullying. Controversially, the MPAA is giving it an R for "language", preventing its subjects from seeing it, and is refusing to change that rating. In response, Harvey Weinstein is considering a leave of absence from the MPAA, and 75,000 people have signed an online petition urging that the rating be overturned. Now, in retaliation, the National Association of Theatre Owners is threatening to give all Weinstein Company films an automatic NC-17 rating in future.
A hive plot (slides) is a beautiful and compelling way to visualize multiple, complex networks, without resorting to "hairball" graphs that are often difficult to qualitatively compare and contrast. [more inside]
OpenCPU provides a RESTful interface to the popular open-source statistical package R, enabling the user to perform calculations and create publication-quality or web-embeddable visualizations via standard web requests.
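As a rough sketch of what "standard web requests" could look like here: the snippet below builds (but does not send) an HTTP POST that asks R's `rnorm` for five random numbers. The host name and the `/ocpu/library/{package}/R/{function}/json` path layout are assumptions based on OpenCPU's public API as commonly documented; check the current OpenCPU docs before relying on them. `build_call` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_call(base_url, package, function, args, fmt="json"):
    """Build a POST request that calls an R function through OpenCPU.

    The URL layout is an assumption, not taken from the post above.
    """
    url = f"{base_url}/ocpu/library/{package}/R/{function}/{fmt}"
    body = urlencode(args).encode("ascii")  # form-encoded function arguments
    return Request(url, data=body, method="POST")

# Assumed public demo host; substitute your own OpenCPU server.
req = build_call("https://cloud.opencpu.org", "stats", "rnorm", {"n": 5})

# To actually send it (requires network access):
#   from urllib.request import urlopen
#   print(urlopen(req).read().decode())
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything hits the network.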
Dataists give their hopes and dreams for data, data tools and data science in 2011. Already, Google has provided Google Refine (previously) to help clean your datasets, while great visualizations can be created with online tools or by combining R (great posts previously) with ggplot2, GGobi, and even Google Motion Charts With R (already built into Google Spreadsheets). Need data? Needlebase helps non-programmers scrape, harvest, and merge data from the web. Or if you’re introspective, Your Flowing Data and Daytum provide tools to measure and chart details of your own life.
Her name is Shawnee Jenkins... A brief fairy tale about a little girl and her parents. Who live in the basement... [more inside]
R is powerful, but tricky. RKWard is a UI that makes R user-friendly or, at least, makes it more similar to SPSS or Stata. Screenshots!
R is quickly becoming the programming language for data analysis and statistics. R (an implementation of S) is free, open-source, and has hundreds of packages available. You can use it on the command-line, through a GUI, or in your favorite text editor. Use it with Python, Perl, or Java. Sweave R code into LaTeX documents for reproducible research. [more inside]