27 posts tagged with datamining.

Datamining Shakespeare --- Othello is a Shakespearean tragedy: when the hero makes a terrible mistake of judgment, his once promising world is led into ruin. Computer analysis of the play, however, suggests that the play is a comedy or, at least, that it does the same things with words that comedies usually do. On October 26, 2011, Folger Shakespeare Library Director Michael Witmore discussed his recent work in Shakespeare studies which combines computer analysis of texts, linguistics, and traditional literary history. Taking the case of Shakespeare's genres as a starting point, Witmore shows how subtle human judgments about the kinds of plays Shakespeare wrote — were they comedies, histories or tragedies? — are connected to frequent, widely distributed features in the playwright's syntax, vocabulary, and diction. (approx. 30 minute lecture.) [more inside]
posted by crunchland on Dec 8, 2011 - 29 comments

Oren Etzioni is a renowned data mining expert who sold Farecast, his airline-ticket price predictor, to Microsoft for $115 million. Now he's turned his focus to the general problem of finding when the best shopping bargains occur. Punch in a consumer electronics item and his website will tell you whether to buy now or to wait. Over time he'll be adding more product categories. In any case, he can tell you right now the best prices for most things aren't on Black Friday or Cyber Monday.
posted by storybored on Dec 1, 2011 - 14 comments

YouTube Insult Generator. Enter a keyword or phrase, and the Insult Generator will trawl YouTube for relevant videos, and pull insults from those videos. Wired write-up. [via]
posted by Phire on Oct 21, 2011 - 71 comments

Google and its massive knowledge feedback loop.
posted by Weebot on Oct 4, 2011 - 63 comments

By processing a million songs in twenty minutes and using the Stairway detector, Paul discovered many songs that Slow Build "more" (up to 29) than Stairway to Heaven (which gets only a 9). [via] [more inside]
posted by morganw on Sep 19, 2011 - 44 comments

It’s for your own good—that is Google’s cherished belief. If we want the best possible search results, and if we want advertisements suited to our needs and desires, we must let them into our souls. James Gleick writes about 'How Google Dominates Us' for the New York Review of Books. [more inside]
posted by WalterMitty on Aug 1, 2011 - 61 comments

On not reading books. Franco Moretti, author of the controversial Graphs, Maps, Trees: Abstract Models for a Literary History, proposes that literary study needs to abandon "close reading" for "distant reading": "understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data." He is co-founder of the Stanford Literary Lab, where he and like-minded colleagues have published studies on programming computers to use statistical analysis to identify a novel's genre (PDF) and analyzing plots as networks (PDF). Similar projects are on the way.
posted by Saxon Kane on Jun 26, 2011 - 53 comments

It has applications in health care, pharmaceuticals, facial recognition, economics/related areas, and of course, much much more. Previously, MeFi discussed controversial homeland security applications, and the nexus between social networking and mobile devices that further contributes to the pool. With plenty to dig into, let's talk Data Mining in more detail. [more inside]
posted by JoeXIII007 on Apr 22, 2011 - 14 comments

"The results were astounding. In a six-month period — from Aug 31, 2009, to Feb. 28, 2010, Deutsche Telekom had recorded and saved his longitude and latitude coordinates more than 35,000 times. It traced him from a train on the way to Erlangen at the start through to that last night, when he was home in Berlin. Mr. Spitz has provided a rare glimpse — an unprecedented one, privacy experts say — of what is being collected as we walk around with our phones."
posted by Scoop on Mar 26, 2011 - 45 comments

"In many places the concentration [of convicted residents] is so dense that states are spending in excess of a million dollars a year to incarcerate the residents of single city blocks."
Using rarely accessible data from the criminal justice system, the Spatial Information Design Lab and the Justice Mapping Center have created maps of these “million dollar blocks” and of the city-prison-city-prison migration flow for five of the nation’s cities. The maps suggest that the criminal justice system has become the predominant government institution in these communities and that public investment in this system has resulted in significant costs to other elements of our civic infrastructure — education, housing, health, and family. Prisons and jails form the distant exostructure of many American cities today.
See the several linked pdfs.
posted by OmieWise on Dec 28, 2010 - 59 comments

MeFi's own Elizabeth Pisani, of The Wisdom of Whores, on Big Data and the End of the Scientific Method (PDF).
posted by Weebot on Dec 19, 2010 - 28 comments

Kaggle hosts competitions to glean information from massive data sets, a la the Netflix Prize. Competitors can enter free, while companies with vast stores of impenetrable data pay Kaggle to outsource their difficulties to the world population of freelance data-miners. Kaggle contestants have already developed dozens of chess rating systems which outperform the Elo rating currently in use, and identified genetic markers in HIV associated with a rise in viral load. Right now, you can compete to forecast tourism statistics or predict unknown edges in a social network. Teachers who want to pit their students against each other can host a Kaggle contest free of charge.
posted by escabeche on Nov 13, 2010 - 10 comments

Social Networks and Data Mining: Where it is and Where it's Going
Telecoms operators naturally prize mobile-phone subscribers who spend a lot, but some thriftier customers, it turns out, are actually more valuable. Known as “influencers”, these subscribers frequently persuade their friends, family and colleagues to follow them when they switch to a rival operator. The trick, then, is to identify such trendsetting subscribers and keep them on board with special discounts and promotions. People at the top of the office or social pecking order often receive quick callbacks, do not worry about calling other people late at night and tend to get more calls at times when social events are most often organised, such as Friday afternoons. Influential customers also reveal their clout by making long calls, while the calls they receive are generally short. Companies can spot these influencers, and work out all sorts of other things about their customers, by crunching vast quantities of calling data with sophisticated “network analysis” software. Instead of looking at the call records of a single customer at a time, it looks at customers within the context of their social network.

posted by Weebot on Sep 4, 2010 - 22 comments

Roger Ebert gets his voice back [more inside]
posted by Blazecock Pileon on Mar 2, 2010 - 56 comments

According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes. Data data everywhere and possibly too much to drink?
posted by Glibpaxman on Feb 28, 2010 - 21 comments

How (not) to write an online-dating message, based on a sample of 500,000 "first contact" messages. [more inside]
posted by Kadin2048 on Sep 14, 2009 - 79 comments

The National Security Agency is building a data center in San Antonio that’s the size of the Alamodome. Microsoft has opened an 11-acre data center a few miles away. Coincidence? Not according to author James Bamford, who probably knows more about the NSA than any outsider. Bamford's new book reports that the biggest U.S. spy agency wanted assurances that Microsoft would be in San Antonio before it moved ahead with the Texas Cryptology Center. Bamford notes that under current law, the NSA could legally tap into Microsoft’s data without a court order. Whatever you do, don't take pictures of the spy building unless you want to be taken in for questioning.
posted by up in the old hotel on Dec 8, 2008 - 42 comments

Worried about social-network data mining? Facebook hires Ted Ullyot, former right-hand man to former Attorney General Alberto Gonzales, as its general counsel. Tapping Ullyot, who worked on the infamous torture memo and other illustrious projects, is a sign that the burgeoning Scrabble platform "is a little more grown-up," says Facebook public-policy VP Elliot Schrage.
posted by digaman on Sep 30, 2008 - 40 comments

Pluribo is a way-cool Firefox extension that automagically summarises Amazon product reviews.
posted by matthewr on Jul 1, 2008 - 25 comments

The idea was that a spike in, say, falafel sales, combined with other data, would lead to Iranian secret agents in the south San Francisco-San Jose area. I've read this article twice now because I was laughing too hard the first time. If I were more paranoid I might actually seriously ask what sort of data mining the FBI is doing, but... falafel sales! via. [more inside]
posted by tarheelcoxn on Nov 6, 2007 - 75 comments

Arguing Against Datamining MySpace in search of Pedophiles. In certain circles, MySpace has become the villain du jour for all sorts of debauchery (threatening the President, phishing, dismembered women, etc.), as well as being fertile hunting grounds for the pedophile. Given the huge size of MySpace, reported as 100 million accounts (although estimates of active accounts are far lower, at approximately 43 million), and a hypothetical and absurdly low natural incidence of pedophiles and pederasts (let's say just 1%), one could assume that there could be as many as 430,000 to 1,000,000 of them out there. Wired contributor and reformed hacker (Kevin Poulsen) has developed a script to weed out the bad seeds [via]. His script was effective, although it took several months of sifting and refining, as well as numerous false positives - 744 registered sex offenders, 497 with convictions for crimes against children. While such an experiment has merit, how much time, resources, and law enforcement manpower will be wasted chasing down the "high-cost" "false positives", and what will be neglected and sacrificed for that effort?
posted by rzklkng on Oct 16, 2006 - 38 comments

AOL releases 3 months of queries from 500k users. AOL, either fairly or unfairly, is sometimes considered the internet with training wheels. So while parsing this data, keep that in mind. Some of these queries seem like spam email subjects, don't they? Don't forget, this is the same demographic that brought you the September that didn't end. AOL tried to retract the data, but it's of no use - it's out there, on the web.
posted by rzklkng on Aug 7, 2006 - 89 comments

The Secret History of Able Danger. The WP may have the goods on Able Danger. The Pentagon and intelligence officials are mum on the data mining project because it could have been illegal.
posted by raaka on Sep 29, 2005 - 16 comments

Exploring Enron -- A breathtaking web of conspiratorial email messages. How often did Jeff Skilling email Ken Lay? How often were those emails about company business? Internal alliances? The company's allegiance? The California energy crisis? Who else was talking about it? Who wasn't? Temptingly complete with software download and MySQL tables for your own tinfoil hat explorations.
posted by boo_radley on Jun 13, 2005 - 10 comments

Docusearch settles claim for 75K with family whose daughter was killed by a stalker who purchased her personal information from them -- a killer whose intentions were described on a Googleable website. The NH Supreme Court determined last year that Docusearch, the company that sold Amy Boyer's work address and SSN to her killer, could be held liable for her death, even though some of that information was publicly available. An "Amy Boyer's Law" intended to increase privacy by restricting the display, sale or use of SSNs received negative reviews from privacy organizations and ultimately was removed from an appropriations bill. In a statement, Amy's parents encourage others to use the Internet to keep track of who may be keeping track of their kids. "If only we had typed our daughter's name into any search engine, the Amy Boyer Web site that was posted by her killer would have come up, and we could have called the police...This may never have happened."
posted by jessamyn on Mar 11, 2004 - 6 comments

That U.S. intelligence agencies confuse terrorists with children on passenger jets is a reminder that data collection is easy, but data analysis is hard. That must be why the six-year-old daughter of one of Boing Boing's co-founders is on the CAPPS list as a security risk. All this is also a reminder that we need privacy safeguards for these data mining programs.
posted by homunculus on Jan 11, 2004 - 34 comments

The Patriots didn't win; Britney did. TiVo analyzed its viewers' behavior during the Super Bowl and came up with some pretty interesting results. How soon till TV programming adapts to viewer behavior?
posted by costas on Feb 5, 2002 - 36 comments
