When Bertoni showed me a list of his 25 most-difficult-to-predict movies, I noticed they were all similar in some way to “Napoleon Dynamite” — culturally or politically polarizing and hard to classify, including “I Heart Huckabees,” “Lost in Translation,” “Fahrenheit 9/11,” “The Life Aquatic With Steve Zissou,” “Kill Bill: Volume 1” and “Sideways.”
posted by Twang (97 comments total) 16 users marked this as a favorite
Netflix should cancel this new, irresponsible contest, which it has dubbed Netflix Prize 2. Researchers have known for more than a decade that gender plus ZIP code plus birthdate uniquely identifies a significant percentage of Americans (87% according to Latanya Sweeney's famous study.) True, Netflix plans to release age not birthdate, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of "information entropy": even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
posted by russm at 3:39 AM on September 22, 2009 [2 favorites]
I have no doubt that researchers will be able to use the techniques of Narayanan and Shmatikov, together with databases revealing sex, zip code, and age, to tie many people directly to these supposedly anonymized new records.
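For anyone who wants to see the "simple arithmetic" spelled out, here is a rough back-of-the-envelope sketch. The population, ZIP code count, and age range are round-number assumptions of mine, not figures from the quoted post, and the calculation assumes people are spread evenly across buckets, which they are not. Real ZIP codes vary hugely in size, so many actual groups would be much smaller than the average.

```python
import math

def anonymity_estimate(population, zip_codes, genders, ages):
    """Average group size and information content, assuming a uniform spread."""
    buckets = zip_codes * genders * ages      # distinct (ZIP, gender, age) combinations
    avg_group = population / buckets          # average people sharing one combination
    bits_needed = math.log2(population)       # bits required to single out one person
    bits_revealed = math.log2(buckets)        # bits the three attributes give away
    return avg_group, bits_needed, bits_revealed

# All four inputs are illustrative round numbers (assumptions).
avg_group, bits_needed, bits_revealed = anonymity_estimate(
    population=300_000_000, zip_codes=40_000, genders=2, ages=80
)
print(f"average people per (ZIP, gender, age) bucket: {avg_group:.1f}")
print(f"bits needed to identify one person: {bits_needed:.1f}")
print(f"bits revealed by the three attributes: {bits_revealed:.1f}")
```

Under these assumptions the three attributes reveal most of the bits needed to pick out an individual, which leaves only a few bits for the movie ratings themselves to supply.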
I'm curious...what would be the effect of taking those top 25 movies and excluding them from the ranking algorithm? Sort of throwing out the most troublesome data? I know it wouldn't solve the issue, but...I wonder if the end result, in this case, might not be a MORE accurate suggestion engine?

If the ratings for these movies were pure noise, the equivalent of a die roll, then excluding them could result in more accurate ratings. However, if they are honest ratings that really do give insight into a person's opinions on movies, then it might be helpful to give them more importance during learning rather than less, which is called boosting. I always liked boosting because it makes intuitive sense, in the same way as practicing the things that you're bad at, but it's also a very important theoretical result in a field where it's extremely difficult to build theory.
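To make "more importance during learning" concrete, here is a minimal sketch of the reweighting step from AdaBoost, one well-known boosting scheme. It is a classification toy, not a rating predictor, and nothing here is meant to describe what any Netflix Prize team actually did. The function name and the four-example data are hypothetical.

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost round: raise the weight of examples the current model got wrong.

    weights: per-example weights, assumed to sum to 1.
    correct: per-example booleans, True where the current model was right.
    """
    # Weighted error of the current model.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0 < err < 0.5:
        raise ValueError("this sketch needs a weighted error strictly between 0 and 0.5")
    # How much say this model gets in the final vote.
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct examples, grow weights of mistakes, then renormalize.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

# Hypothetical data: four equally weighted examples, the last one misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_reweight(weights, correct)
print(alpha, new_weights)
```

After the update, the one hard example carries as much weight as the three easy ones combined, so the next model is pushed to get it right. That is the "practice what you're bad at" intuition in code.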
posted by twoleftfeet at 2:56 AM on September 22, 2009