

Computers replace grad students
April 4, 2013 3:51 PM

A new software program grades essay answers automatically. While not the first to do so, the program released by EdX is expected to gain more traction as it will be used to give instant feedback for the non-profit's free online courses offered by top universities. Critics have already found ways to game the system.
posted by DoubleLune (69 comments total) 18 users marked this as a favorite

 
The software uses artificial intelligence to grade student essays and short written answers, freeing professors for other tasks.

Yet more evidence that techno-utopianism is a faith unfalsifiable by mere facts like those of labor politics — this should really read "freeing professors from their jobs."
posted by RogerB at 3:59 PM on April 4, 2013 [26 favorites]


Funny, using flowery language, writing huge sentences and paragraphs, and everything else he mentions is exactly how I got such good grades from my human scorers when I was in school. Maybe the robots really are right.
posted by Ghostride The Whip at 3:59 PM on April 4, 2013 [2 favorites]


Based on the Game the System link, the E-Rater doesn't sound too different from human SAT graders. Longer essays better? check. Longer sentences better? Check. Longer words better? Check. Doesn't matter what you say as long as you sound good saying it? Check.

I'd use it for an SAT prep class, is what I'm saying. Or really, I'm starting to agree with the EdX people:
Good writers have internalized the skills that give them better fluency, he said, enabling them to write more in a limited time.
If this can be used to teach underserved populations how to BS a five-paragraph essay as well as I learned to BS one, then what's the harm? It's a skill that served me well in college, after all.
posted by muddgirl at 4:04 PM on April 4, 2013 [2 favorites]
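A toy illustration of the length-based scoring described in the comment above, offered only as a sketch: the three features, the weights and the 1 to 6 cutoffs are invented for illustration and are not e-Rater's actual model. It shows why a scorer built purely on surface features gives the same mark whether a date is right or wrong.

```python
import re

def surface_features(essay: str) -> dict:
    """Crude length-based features; nothing here looks at meaning or truth."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)  # digits are ignored entirely
    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_sentence_len": n_words / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / max(n_words, 1),
    }

def toy_score(essay: str) -> int:
    """Map the features to a 1-6 score. Weights are hypothetical."""
    f = surface_features(essay)
    raw = f["word_count"] / 100 + f["avg_sentence_len"] / 10 + f["avg_word_len"] / 2
    return max(1, min(6, round(raw)))
```

For example, `toy_score("The War of 1812 started in 1945.")` equals `toy_score("The War of 1812 started in 1812.")`, because the only thing that changed is a number the features never see.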


A recently released study has concluded that computers are capable of scoring essays on standardized tests as well as human beings do.

If a bot which cannot pass the Turing test can grade standardized tests as well as humans, I question the value of the standardized test.
posted by justsomebodythatyouusedtoknow at 4:06 PM on April 4, 2013 [23 favorites]


all that's remaining is for a program that will generate the essays in the first place
posted by rebent at 4:10 PM on April 4, 2013 [7 favorites]


Can this AI system do fact-checking and logic-checking? Otherwise you still need a human grader to review each essay.
posted by Bwithh at 4:28 PM on April 4, 2013 [1 favorite]


The software uses artificial intelligence to grade student essays and short written answers, freeing professors for other tasks.

Yet more evidence that techno-utopianism is a faith unfalsifiable by mere facts like those of labor politics — this should really read "freeing professors from their jobs."
posted by RogerB at 3:59 PM on April 4


Surely "long-suffering PhD student TAs" rather than profs...
posted by Bwithh at 4:40 PM on April 4, 2013 [5 favorites]


That is crap. Talk about a race to the bottom. ETS should eliminate the written section entirely if they can't be bothered with human readers. An algorithm capable of the analysis required for this task would also be capable of passing the Turing test.
posted by Skwirl at 4:42 PM on April 4, 2013 [1 favorite]


Computers replace grad students

Computers can never replace grad students because programs cannot feel the abject humiliation that results from being dressed down by an associate professor with self-esteem issues.
posted by Joey Michaels at 4:43 PM on April 4, 2013 [45 favorites]


ETS should eliminate the written section entirely if they can't be bothered with human readers.

Who has claimed that ETS is eliminating human readers entirely, or EdX for that matter?
posted by muddgirl at 4:47 PM on April 4, 2013


Surely "long-suffering PhD student TAs" rather than profs...

Considering the limited number of ways for grad students to support themselves without incurring crushing debt, eliminating TA positions in favor of machine graders seems like a way to induce more suffering.
posted by Noms_Tiem at 4:49 PM on April 4, 2013 [1 favorite]


Anyone want to help me create a high school/intro college smartphone app? Basically it'll just track that a student is sitting in a chair for 7 hours a day, play videos, pop up an automatic quiz periodically, and send monthly bills to the parents and local school district.
posted by Grimgrin at 4:50 PM on April 4, 2013 [3 favorites]


But how are the computers' kvetching skills?
posted by ThatFuzzyBastard at 4:58 PM on April 4, 2013 [1 favorite]


all that's remaining is for a program that will generate the essays in the first place

That's addressed in the Times article:


Two former students who are computer science majors told him that they could design an Android app to generate essays that would receive 6’s from e-Rater. He says the nice thing about that is that smartphones would be able to submit essays directly to computer graders, and humans wouldn’t have to get involved.
posted by amarynth at 5:06 PM on April 4, 2013 [9 favorites]


Can this AI system do fact-checking and logic-checking? Otherwise you still need a human grader to review each essay.

Can this metafilter commenter do the link-clicking and article-reading? Otherwise you still need a human reader to review each link and make sure that their comments are not asking questions directly answered by the articles under discussion.
posted by jacalata at 5:08 PM on April 4, 2013 [17 favorites]


In high stakes testing where e-Rater has been used, like grading the Graduate Record Exam, the writing samples are also scored by a human, they point out. And if there is a discrepancy between man and machine, a second human is summoned.

Summoning a second human? Ha ha, what a fun idea! What will our benevolent technarchons think of next?
posted by Greg Nog at 5:08 PM on April 4, 2013 [7 favorites]
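The quoted passage only says that a second human is summoned when man and machine disagree. A minimal sketch of that escalation rule follows; the one-point threshold and the averaging rules are assumptions for illustration, not ETS's documented procedure, and `get_second_human` is a hypothetical callback.

```python
from typing import Callable

def adjudicate(human: float, machine: float,
               get_second_human: Callable[[], float],
               threshold: float = 1.0) -> float:
    """Combine one human and one machine score, escalating on disagreement.

    threshold and both averaging rules are illustrative assumptions.
    """
    if abs(human - machine) <= threshold:
        # Close enough: report the mean of the two scores.
        return (human + machine) / 2
    # Discrepancy: summon a second human and drop the machine score.
    second = get_second_human()
    return (human + second) / 2
```

The second reader is only consulted when the two first scores disagree, which is what keeps the human workload low in this scheme.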


But if we automate being idiots, what will the idiots do?
posted by klangklangston at 5:13 PM on April 4, 2013


This whole idea is terrible in every way. "Let's give up giving people an education, and just pretend to give them an education."

How long before a US school just fires all its teachers, replaces them with "teaching assistants", provides canned lectures from a MOOC and grades everything by machine?
posted by BungaDunga at 5:17 PM on April 4, 2013 [7 favorites]


There are numerous exploitative adjunct positions at most American universities, Noms_Tiem. Maybe this makes their lives more dreary, but hopefully it reduces their numbers.

I'd expect departments ultimately spend vastly more on their TAs than their adjuncts, especially with tuition remission, but TAs represent part of the university's educational mission, and the field's future at lower tier universities, while adjuncts represent that mission's failure.

Automating grading improves society by improving consistency and reducing bias, such as racism and sexism. I'm far less confident about video lectures benefiting students, though.
posted by jeffburdges at 5:22 PM on April 4, 2013


How long before a US school just fires all its teachers, replaces them with "teaching assistants", provides canned lectures from a MOOC and grades everything by machine?

This sounds like a good charter school idea, actually.
posted by Noms_Tiem at 5:22 PM on April 4, 2013 [1 favorite]


It's another example of trying to use technology to solve a human problem. The theory is that this will save money, but only in the sense that it will allow the state to further cut funding. Soon we will have a primary and secondary school system that pretends to prepare students for college, which will robo-pretend to prepare students for the jobs they won't get because of outsourcing and automation. In the end, there will just be a few legislators lurking in the rubble of their filthy chambers, eyeing each other hungrily and wondering how it came to this. Naturally, having never learned critical thinking skills, they will have no answer.
posted by GenjiandProust at 5:24 PM on April 4, 2013 [21 favorites]


Yeah, seems like the smart thing to do is learn how to program your robot.
posted by notyou at 5:30 PM on April 4, 2013 [1 favorite]


The theory is that this will save money, but only in the sense that it will allow the state to further cut funding.

What state is going to cut funding? Every organization mentioned in connection with this technology (Harvard, MIT, EdX, Stanford, ETS) is a private institution.
posted by muddgirl at 5:31 PM on April 4, 2013 [1 favorite]


What state is going to cut funding?

They're making the program available for free, which means any school would be able to use it at no cost. They're talking about this on a K-12 level for state budget purposes.
posted by DoubleLune at 5:36 PM on April 4, 2013 [1 favorite]


I can't speak for other schools, but round these parts computers are more expensive than grad students.
posted by pemberkins at 5:38 PM on April 4, 2013 [2 favorites]


They're talking about this on a K-12 level for state budget purposes.

Did I miss an article in this thread about that?

I taught myself how to BS a 5-paragraph essay, and it served me well in college. I'm not necessarily opposed to providing a tool for students to learn how to quickly and accurately structure an essay. As has been pointed out, these programs can't teach logic and argument-forming - a teacher is still needed for that important task. It seems like we're trying to argue that providing kids with free calculators puts math teachers out of work.
posted by muddgirl at 5:40 PM on April 4, 2013


In high stakes testing where e-Rater has been used, like grading the Graduate Record Exam, the writing samples are also scored by a human, they point out. And if there is a discrepancy between man and machine, a second human is summoned.

Summoning a second human? Ha ha, what a fun idea! What will our benevolent technarchons think of next?


Man, this conversation is weird for me since it's about my day job and all.
posted by PhoBWanKenobi at 5:40 PM on April 4, 2013 [1 favorite]


I can just imagine the joys of dealing with first-year students who've internalized these concepts: "How DARE you fail me! Sure, I never read the play and I've argued that Shakespeare wrote "As You Like It" in 1945, but I used lots of big words!"

Fuck it. I've been diagnosed with the most virulent form of a nasty, fatal melanoma, and the prospect of a vastly shortened life is not much fun, but at least I won't have to deal with this shit.
posted by jrochest at 5:41 PM on April 4, 2013 [4 favorites]


"Considering the limited number of ways for grad students to support themselves without incurring crushing debt, eliminating TA positions in favor of machine graders seems like a way to induce more suffering."

Considering the limited number of tenure-track job openings relative to the number of PhDs graduating each year in most fields, eliminating the need for so much cheap TA labor to do the grading drudgework seems like a way to reduce the number of surplus grad students who are strung along by their departments for years and years, only to find out they've wasted their 20s preparing for a career they'll never have.
posted by Jacqueline at 5:43 PM on April 4, 2013 [2 favorites]


There is now a range of companies offering commercial programs to grade written test answers, and four states — Louisiana, North Dakota, Utah and West Virginia — are using some form of the technology in secondary schools. A fifth, Indiana, has experimented with it.

muddgirl, it's not a far leap from selective use to wide-spread use when it goes from costing money to free.
posted by DoubleLune at 5:49 PM on April 4, 2013 [1 favorite]


Summoning a second human? Ha ha, what a fun idea! What will our benevolent technarchons think of next?

I think it's a hopeful sign they don't summon a second e-grader instead.
posted by looli at 6:08 PM on April 4, 2013


What state is going to cut funding? Every organization mentioned in connection with this technology (Harvard, MIT, EdX, Stanford, ETS) is a private institution.

You think it would stay there? As I said in the MOOC thread earlier today, I don't think that Stanford and MIT are planning to replace their own courses with MOOCs....
posted by GenjiandProust at 6:13 PM on April 4, 2013 [1 favorite]


Also see Steven Levy's recent article in Wired, "Can an Algorithm Write a Better News Story Than a Human Reporter?"
posted by jjwiseman at 6:14 PM on April 4, 2013 [1 favorite]


muddgirl, it's not a far leap from selective use to wide-spread use when it goes from costing money to free.

It does seem like a big leap to me, because, unlike instructors of unaccredited college courses, K-12 teachers in the US have to follow a pretty rigorous set of guidelines for textbooks and teaching tools, guidelines that are set at a very high level by people who get input from a lot of different agents, including parents and teachers. A program like this isn't one that can just be implemented on the fly.

That doesn't mean it won't ever be implemented, or that some school board isn't going to jump on this without enough thought, but I don't think it's as easy as "free resource == all high school english teachers will be fired and replaced by robots." Are there any reports that a school board was considering implementing this tool? Or that this is being marketed to school boards?

One area of potential concern is that the ETS provides standardized testing for high school students, so this tool could be used to grade standardized written tests. On the other hand, my understanding of high-school level standardized essay testing is that it already doesn't require a high level of analytical thinking, similar to the SAT writing exam (I got a top score on an essay I wrote about getting a Christmas gift. I don't think the content mattered). TL;DR if high school essays can be graded by a computer, there is a much deeper problem in the grading standards.
posted by muddgirl at 6:18 PM on April 4, 2013


all that's remaining is for a program that will generate the essays in the first place

Would you like that in humanities or computer science flavor?
posted by 23 at 6:28 PM on April 4, 2013 [1 favorite]


It does seem like a big leap to me

There are all kinds of exceptions to regulations -- private schools don't have a required amount of time/days in class, charter schools offer online courses. And regulations are changing every year. I'm just saying it's out there, and I would not be surprised to see someone more worried about money than quality find a way to use this.
posted by DoubleLune at 6:32 PM on April 4, 2013


This is nuts. My whole motivation for writing essays was (and probably would be, if I was still writing essays) to be communicating ideas with another person, for a reader. That mattered to me, that I was creating something, for someone else. If it's just going to be fed into a computer, scanned and discarded, what is the point? It reduces writing to little more than just shoveling a pile of dirt from one side of the room to the other.
posted by Flashman at 6:37 PM on April 4, 2013 [10 favorites]


I have a great deal of respect for the work by Vijay Kumar and his open courseware initiative. However, the EdX effort is more typical of MIT's other messes in online education. The OKI OSID stuff was a horrible mess of unimplementable babble. MIT never delivered on their promises to Sakai, even though iirc they got a bit of the Mellon Foundation grant money. MIT and their partners are spending millions of dollars to reinvent Moodle. Then they are throwing in a few imaginary features that will never really work right, such as auto-grading essays.
posted by humanfont at 6:38 PM on April 4, 2013 [1 favorite]


Our societal tombstone will read:

They were too fucking smart clever stupid for their own good.
posted by Benny Andajetz at 6:49 PM on April 4, 2013


Grah! In that "game the system" article: "Gargantuan words are indemnified because e-Rater interprets them ....". That's just an utterly incorrect usage of "indemnify", right? Do you think that was intentional?
posted by benito.strauss at 6:54 PM on April 4, 2013 [1 favorite]


Before insisting that this sort of system couldn't possibly do a better job than a human, we should step back and ask what it is we hope to achieve when grading essays. (Note - any similarities between what follows and a five paragraph standardized test essay are genuinely unintentional.)

A graded assignment serves three basic functions. First, its most obvious and least interesting function is to provide students with a grade, allowing for the evaluation both of an individual's mastery of the material and the quality of the course itself. Second, and more important - especially for the vast majority of college students in the world who don't attend highly selective universities - graded papers teach students the specific skills required to write coherent statements and make themselves understood. Third, and much more important, it forces students to spend time thinking about material that they otherwise wouldn't.

Which of these things can a machine grader hope to accomplish today?

When it comes to evaluation, it's fair to be skeptical. There are plenty of subjects in which judging the difference between a good and a poor essay response requires a lot more than grammar and keywords, and a computer that knows enough about the world to turn free-form language into testable logical statements is a long way off. It isn't hard to invent questions where an algorithmic approach seems incredible. (To pick an actual example, "write a dialog in the style of Plato which critiques Mach's principle using modern evidence.") But, in truth, there actually are quite a lot of instances where keywords get you most of the way there. If the point of a question is to see if students did the reading and know the major events involving Pepin the Short and the Pope, a keyword template is likely to do more or less exactly the same thing a human grader would.
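A keyword template of the kind described might look like the sketch below. The rubric entries are hypothetical phrasings chosen for illustration, and plain substring matching is an assumption; a real template would need to handle misspellings and negation.

```python
# Hypothetical rubric: each concept maps to phrasings that count as mentioning it.
RUBRIC = {
    "Pepin": ["pepin", "pippin"],
    "Pope": ["pope", "papacy", "zachary", "stephen ii"],
    "Donation": ["donation of pepin"],
    "Anointing": ["anoint", "crowned"],
}

def keyword_score(answer: str, rubric: dict = RUBRIC) -> float:
    """Fraction of rubric concepts mentioned at least once (case-insensitive)."""
    text = answer.lower()
    hits = sum(any(phrase in text for phrase in phrases)
               for phrases in rubric.values())
    return hits / len(rubric)
```

`keyword_score("Pepin was anointed by the Pope.")` returns 0.75: three of the four concepts appear, and the Donation is missing.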

That you can game the system by deliberately creating nonsense using the same form and words doesn't necessarily prove the machine grader is wrong; in order to successfully trick the system, the student may need the very skills required to do well on the test anyway. Like so many opportunities to cheat, this one likely takes just as much effort and skill as receiving high marks without cheating.

What are the limits to this sort of system? That's an impossible question to answer a priori, and it's both amusing and terrifying to see the number of otherwise reasonable people who attempt to do so. Fortunately, we don't have to. If there's one thing MOOCs are good at generating, it's score data. I saw a talk by Agarwal recently that included data from thousands of tests showing that machine grading is better at reproducing the average human grader score than any one human grader for several very specific kinds of tests. That's hard evidence to refute without resorting to theology. In the next few years we'll find out just where the current limits are to accurate machine grading, and I suspect many of us who take grading seriously will be annoyed by the answer.
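Comparisons like "machine versus any one human" need an agreement statistic. Quadratic weighted kappa is one commonly used in essay-scoring work: 1 means perfect agreement, 0 means chance-level agreement, and large disagreements are penalized more than near misses. The sketch below assumes integer scores on a known range; it is not a reconstruction of the analysis in the talk mentioned above.

```python
def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Agreement between two integer ratings of the same essays.

    a, b: equal-length lists of integer scores in [min_score, max_score].
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = max_score - min_score + 1
    if n < 2:
        raise ValueError("score range must contain at least two values")
    # observed[i][j] counts essays rated i by a and j by b
    observed = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - min_score][y - min_score] += 1
    total = len(a)
    hist_a = [sum(observed[i][j] for j in range(n)) for i in range(n)]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / total  # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa undefined: no variation in the ratings")
    return 1.0 - num / den
```

To mirror the comparison described, one would compute kappa between each individual rater (human or machine) and the averaged human score rounded back onto the scale, then see which rater comes out highest.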

When it comes to basic writing skills, it's pretty hard to make the case that humans with red pens are a good idea, except as some sort of cruel and particularly degrading public works program for academics. Not only are there plenty of classes explicitly focused on writing, but it's a significant component of most undergraduate humanities courses. As someone who tends to pal around with academic elites, I was shocked the first time I talked at length with a typical community college professor about what she does with her day. For a lot of people - perhaps most - college has more to do with learning rudimentary communication skills than mastering specific material. When it comes to rudimentary writing, computers can do a fine job, and we'd be crazy not to let them. Hiring someone with decades of education to circle misplaced commas isn't just wasteful, it's obscene. Replacing them with a system that provides real-time feedback on multiple drafts and can handle an unlimited number of assignments sounds like a great idea.

The trick, of course, is to find a way to take human graders and let them spend their time doing something more useful. If time spent correcting poor sentences were instead spent discussing the results of that grading, or providing one-on-one tutoring, or teaching more classes and discussion sections, the campus would be a better place for everyone. I recognize making that happen is no easy task, and it's going to take a lot to ensure we don't just fire all the graders. But the alternative is to insist that we cannot do away with soul-crushing busywork because our society would never be willing to pay people to actually teach and interact with students if we didn't trick them into it. If that's true, then one has to wonder why we are so worried about keeping academia alive in the first place. If there's really no alternative between circling commas and being unemployed, we might well be better off unemployed.

When it comes to student motivation, there's a lot less data, and most of it is complicated by self-selection. This is the part we really ought to be thinking and arguing about, if you ask me. How do you convince someone to actually engage with course material? By testing them on it, or (occasionally) by telling them that it's really exciting. Both benefit from the knowledge that a human knows who they are and will see what they write. Judging by the knee-jerk reaction most academics have to the idea of machine grading, students will probably have a tough time convincing themselves that it's as good as the response of a human reader. But there's no reason a human has to see *everything* that they do, or that the human necessarily has to be a professor. Intermittent and randomized grading, peer grading, or opportunities to be informally tested in class can do a lot of the same thing. There's plenty of room here to experiment. If you tell a student that one essay per course will be graded by a professor - and that it will be read thoroughly and thoughtfully - is that enough to convince them to work hard on every essay? I don't know the answer to that. At what point do a handful of carefully graded random papers become more useful than every paper hastily graded? Are there structures in which peer grading can be just as good as grading by academics? We need to find out, and it's worth making sure we don't trick ourselves into making poor decisions before we've done so.
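The randomized scheme floated above ("one essay per course will be graded by a professor") is easy to sketch. The data layout and function name are hypothetical; the one property that matters is that the choice is random and is not revealed in advance, so every essay might be the one that gets read.

```python
import random

def pick_essays_for_close_reading(submissions, per_student=1, seed=None):
    """Choose which essays a human reads closely; the rest are machine-graded.

    submissions: dict mapping a student id to a list of that student's essay ids.
    seed: optional, only so a given draw can be reproduced and audited later.
    """
    rng = random.Random(seed)
    return {
        student: rng.sample(essays, min(per_student, len(essays)))
        for student, essays in submissions.items()
    }
```

Raising `per_student` trades grader time for more frequent human contact, which is exactly the open question the comment raises.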

There are countless grounds for complaining about the current crop of MOOCs and their advocates. (Perhaps slightly fewer when it comes to edX than the others.) But the worry that they may endanger the noble tradition of grading papers by hand seems like an odd choice. History suggests that starting from the position "we know this technology is better, but we like our jobs the way they are" is rarely a winning argument. Figuring out whether the technology actually is better, and where it is not, is likely to be more useful.

Besides, we were all students once. We know just how sloppy, fickle, uneven, and vengeful human graders can be, and how easy it is to game the existing system to earn high marks. Not making things worse is important, but making things better would be fantastic.
posted by eotvos at 7:20 PM on April 4, 2013 [7 favorites]


Why the hell should I have to write stupid clunky English for a computer to read? The whole point of writing silly human languages is to convey meaning to humans.
posted by Matt Oneiros at 7:35 PM on April 4, 2013 [1 favorite]


Why the hell should I have to write stupid clunky English for a computer to read? The whole point of writing silly human languages is to convey meaning to humans.

Readable Score: 57 (The higher the score the easier the article is to read!)
Grade Level: 6

-http://www.writingtester.com/tester/grade_level.php
posted by sebastienbailard at 8:03 PM on April 4, 2013 [2 favorites]
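The tester does not say which formula it uses, so as an assumption here is the classic Flesch Reading Ease and Flesch-Kincaid Grade Level pair, with a deliberately crude syllable counter. Its numbers will not necessarily match the 57 and 6 quoted above.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop one for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (reading_ease, grade_level) using the standard Flesch constants."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / max(len(sentences), 1)  # words per sentence
    spw = syllables / max(len(words), 1)       # syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

Both scores depend only on sentence length and word length, so they say nothing about whether the text makes sense, which is rather the point of the exchange above.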


There is a domain of problems in machine learning where the goal is to take the representation of a piece of data and represent it in a lower-dimensional space.

For example, an autoencoder is a neural network that takes in a piece of data and puts out the same piece of data, something which is absolutely pointless unless it passes through a lower-dimensional space along the way. So instead of representing a handwriting sample as pixels, a nicely trained autoencoder will represent it as a linear combination of edges. Like this. The specific implementation there puts it in a higher-dimensional but sparse space, for efficiency reasons.
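The simplest relative of the autoencoder is linear: with squared-error loss, a linear autoencoder with a k-unit bottleneck ends up spanning the same subspace as principal component analysis. The sketch below swaps in PCA via power iteration (not a neural network, and not the sparse model linked above) to show the encode-to-fewer-numbers, decode-back idea on 2-D points squeezed to one number.

```python
import random

def mean_center(data):
    """Subtract the per-column mean; return the centered rows and the mean."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in data], mean

def top_component(centered, iters=200, seed=0):
    """Leading eigenvector of the (unnormalized) covariance via power iteration."""
    d = len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def encode(row, mean, v):
    """Compress a point to one number: its coordinate along v."""
    return sum((x - m) * c for x, m, c in zip(row, mean, v))

def decode(code, mean, v):
    """Rebuild a point from its one-number code."""
    return [m + code * c for m, c in zip(mean, v)]
```

Points that lie on the learned line are rebuilt exactly; a point off the line loses its perpendicular component, which is the "lost detail" the comment returns to at the end.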

If you look at the problem of grading, the essential problem, as posed by a computer scientist, is much the same: take the very high-dimensional data point of a human's judgment of another human being and reduce it into a discrete one-dimensional datapoint: A+, A, A-, B+, B, B-, etc.
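In that framing, the last step is collapsing many features to one number and then binning it into letter grades. A minimal sketch, assuming a hypothetical linear weighting and conventional-looking cutoffs on a 0-100 scale (neither comes from any real grading system):

```python
from bisect import bisect_right

# Hypothetical cutoffs: a score below 60 is F, 60 up to 63 is D-, ..., 97+ is A+.
CUTOFFS = [60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93, 97]
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

def project(features, weights):
    """Collapse a feature vector to a single number (a 1-D linear projection)."""
    return sum(f * w for f, w in zip(features, weights))

def letter_grade(score):
    """Bin a 0-100 score into a letter using the hypothetical cutoffs."""
    return LETTERS[bisect_right(CUTOFFS, score)]
```

For instance, `letter_grade(project([80, 90], [0.5, 0.5]))` gives `"B"`. Everything the weights ignore is gone by the time the letter is assigned.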

It has long been established that humans are the best at dimensionality reduction and this specific sort of pattern recognition, by a fair bit. However, what is rapidly changing is how good the computers are at it. There was a NYT article some time ago about a large-scale ML effort to do that sort of thing. Then, there was a NYT article some time ago about how people are making money off of that sort of thing. The fundamental idea behind the first linked NYT article was essentially dimensionality reduction, with innovations made to get it to work reasonably well on a bunch of computers on a shitfuckton of data.

(I have a suspicion that it's the NYT reporting on this stuff because they have an in at a lot of info visualization places: they're the best data visualization newspaper by far, and I imagine that also leads to a lot of connections at ML and statistics departments.)

Now, if you look at the first NYT article again, there's an interesting Stanford professor named Andrew Ng. Is that the same Stanford professor who cofounded Coursera? Why, yes he is.

One of the first classes on Coursera was a machine learning class taught by him, as a matter of fact. I was once told that he took out a mathematics requirement for his great big machine learning lecture class at Stanford because so many people interested in education - education students, developmental psych students, etc. - were pestering him about whether they could take his class without the math requirements. So he changed it to "be good at math," and then someone told him he couldn't do that, so he changed it to "proficiency in statistics, linear algebra and programming."

I'm very pessimistic about whether the programs which have been tested as linked do anything at all. I'm also very pessimistic about our human ability to do better.

Les Perelman has damn good reason to criticize the ETS's system, but he also has plenty of grounds to criticize the ETS's human system, which has long been known to be a system where the length of an essay is the best predictor of its score.

I distinctly recall Daphne Koller (the other cofounder of Coursera) mentioning that they had some plans for automated essay grading. I don't see a Coursera automated essay grading system in the NYT article. I don't know if they're still pursuing that. But if I were to judge whether an automated grading system was feasible, I would wait until two of the world's experts on dimensionality reduction came out with theirs.

I believe it would still be no better than human graders, and probably worse in a humorous way. Perhaps they will, after all, realize the obvious problem of dimensionality reduction: you lose detail as you shed those dimensions. I think that might be an unexpected benefit, something all parties might unexpectedly agree upon.
posted by curuinor at 8:35 PM on April 4, 2013 [5 favorites]


Can this AI system do fact-checking and logic-checking? Otherwise you still need a human grader to review each essay.

Can this metafilter commenter do the link-clicking and article-reading? Otherwise you still need a human reader to review each link and make sure that their comments are not asking questions directly answered by the articles under discussion.
posted by jacalata at 5:08 PM on April 4


How rude - really.

Yes, I read the article.
posted by Bwithh at 8:41 PM on April 4, 2013


If you can game the system, you're probably smart enough to skip a grade or three.
posted by Blazecock Pileon at 8:43 PM on April 4, 2013 [1 favorite]


Yes, I read the article

Well then, I apologise. I didn't imagine you could have read it and missed this discussion:

The e-Rater’s biggest problem, he says, is that it can’t identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. “E-Rater doesn’t care if you say the War of 1812 started in 1945,” he said.

... The substance of an argument doesn’t matter, he said, as long as it looks to the computer as if it’s nicely argued.

For a question asking students to discuss why college costs are so high, Mr. Perelman wrote that the No. 1 reason is excessive pay for greedy teaching assistants.

“The average teaching assistant makes six times as much money as college presidents,” he wrote. “In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”

E-Rater gave him a 6.

...The possibilities are limitless. If E-Rater edited newspapers, Roger Clemens could say, “Remember the Maine,” Adele could say, “Give me liberty or give me death,” Patrick Henry could sing “Someone Like You.”

...E.T.S. also acknowledges that truth is not e-Rater’s strong point. “E-Rater is not designed to be a fact checker,” said Paul Deane, a principal research scientist.

posted by jacalata at 8:58 PM on April 4, 2013 [2 favorites]


The only way this could be better is if it dynamically changed its assessment based on patterns it doesn't yet recognize but that tend to turn up in writing it scores well. Wait, does it do this?

Because in that case, I can imagine student writing gradually over the years drifting farther and farther away from anything that we could conventionally call English.* It'd be funny.

Seriously, though, this is really interesting work so long as it's being used in situations where it's treated as a method of reading in and of itself, rather than in situations that invite us / demand us to assess the algorithm itself in terms of its fidelity to the grades human readers give. It's fascinating stuff... but its connection to the various ways various humans assess texts is contingent at best.

*: Picture a child raised in an environment where they and everyone around them had a star-trek-style universal translator. Why would the child learn to speak any extant language? Instead, the child would be incentivized to produce any pattern of sounds that the translation machine recognized as expressing desires or thought, regardless of whether or not any other human had used those sounds to express anything before. The student in a machine-graded-writing scenario has a much easier task in front of them than a child hashing out a pattern of sounds that the universal translator could use to render their desires or statements into something intelligible to others, because the student, unlike the child, isn't concerned with expressing desires or making statements or, really, anything other than producing patterns that the algorithm will identify as "quality." Although initially these patterns might co-locate with whatever EdX's team of human readers might consider quality, there's no reason that it's necessary that what the algorithm picks up as quality and what humans pick up as quality bear any relation to each other, other than the relation of appearing in the same text. Presumably the EdX people won't let it drift too far from consensus ideas of language, but it would be great if they did.
posted by You Can't Tip a Buick at 9:18 PM on April 4, 2013 [2 favorites]


I saw this earlier today and giggled when I saw the reporter's name. Too bad he didn't also write the older story about the robo-writers.
posted by fantabulous timewaster at 9:19 PM on April 4, 2013 [2 favorites]


Well then, I apologise. I didn't imagine you could have read it and missed this discussion:


I don't accept bad faith apologies.

The article you quote from is one year old and describes a different piece of software produced by a different organization than the software described in the NYT article from today (the first link). I assumed the grading software reported on as newly developed by the NYT article today is the main focus of discussion here, not the year-old one.
posted by Bwithh at 9:19 PM on April 4, 2013 [2 favorites]


Moreover, we already assess standardized test writing, and even student writing in general, in ways that insert wholly new, stereotyped and quasi-machinic assessment standards — witness the five paragraph essay form, one that has precisely fuckall to do with good writing*, but which millions upon millions of schoolchildren are called upon to master specifically because it is easily assessed.

Okay, so I guess the conclusion that I'm coming to here is that writing training in America isn't about writing, exactly; it's about learning to demonstrate willingness to submit to orders, ability to suss out unstated details of orders, and cleverness in carrying out orders. It's purely accidental that these exercises happen to take place in the medium of typed words. I trust machines to successfully automate evaluating that much more than I trust them to evaluate writing.

*: Or rather, it's a form that constrains the student such that they can either meet the demands of the form or produce writing that is of any value to anyone... but never both.
posted by You Can't Tip a Buick at 9:27 PM on April 4, 2013 [1 favorite]


humanfont: "MIT never delivered on their promises to Sakai, even though iirc they got a bit of the Mellon Foundation grant money. MIT and their partners are spending millions of dollars to reinvent Moodle."

I'm on board with that as long as they teach Moodle how to transactional database. It's ridiculous that DB sessions will make it such that a 30-student quiz will kill the entire site...
posted by pwnguin at 9:33 PM on April 4, 2013


Fun link to the grade level tester, sebastienbailard.

Spork hater!
Grade level: 2.

"Spork hater," he exhaled.
Grade level: 6.

"Spork hater," he exhaled, with a certain reluctant exuberance.
Grade level: 12.
posted by chortly at 10:23 PM on April 4, 2013 [6 favorites]


curuinor: "If you look at the problem of grading, the essential problem, as posed by a computer scientist, is much the same: take the very high-dimensional data point of a human's judgment of another human being and reduce it into a discrete one-dimensional datapoint: A+, A, A-, B+, B, B-, etc."

There's an interesting point buried in here: why do we do that? Sure, there are far too many dimensions to present to the student, but why boil it down into one number, with a sentence or two at best? Perhaps we should be breaking these out into four or five axes for students. Or use all billion of them but show the top four or five most influential for a given piece of work.
posted by pwnguin at 10:29 PM on April 4, 2013


I would like to comment in this thread, as I am in higher education and am keenly interested in the ways in which techno-utopians are envisioning my inevitable redundancy. Alas, it is 2:00 AM and I am reading student paper proposals, giving abundant annotation, bibliographical advice and general guidance for how to engage in a sustained research project on a literary topic. For each of these 6-8 page writeups, I will take probably an hour to think through the various issues each proposal presents. I will consider the skills and interests of the students (each of whom I have probably met with 3 times by this point in the semester), shaping my advice and expectations to my assessment of their individual skills and long-term educational goals. After they receive my comments, I'll meet with each of them multiple times over the remaining weeks of the term, offering follow-up advice, reassurance and, occasionally, reading drafts.

This is what undergraduate education looks like. Done properly, it is a personalized interaction between a dedicated teacher and an engaged student. Its function is to help students become better thinkers and writers, and to learn how to approach complex problems systematically. The skills my students learn will help them no matter what profession they ultimately choose. And the topics we discuss and the books we read over the semester will enrich their lives incalculably.

Write a computer program that does all that, EdX. Free me up for other things. Like sleeping.

The problem here is grades. We have convinced ourselves over the past 50 years that grades are a reliable measure of worth. Having reduced education to a quantifiable, numerical outcome, the logical question is, then, how we can derive that number more efficiently, forgetting all the while that the grade was always just a stand-in for something else: a core human ability called "learning."

Mark my words, if this takes root in higher education, the next fad in assessment will be the Voight-Kampff test.
posted by R. Schlock at 10:52 PM on April 4, 2013 [14 favorites]


the five paragraph essay form, one that has precisely fuckall to do with good writing

Really? Though it is not likely to produce a masterpiece in itself, the five-paragraph essay is a manageable way to teach fourth-graders the basic building blocks of making a cogent written argument. If you are expected to produce them in the twelfth grade (I'm looking at you, standardized tests), then you are being underused, but I am pretty happy with the form as a starting piece.
posted by dame at 10:54 PM on April 4, 2013 [3 favorites]


"That you can game the system by deliberately creating nonsense using the same form and words doesn't necessarily prove the machine grader is wrong; in order to successfully trick the system the student may need the very skills required to do well on the test anyway. Like so many opportunities to cheat, this one likely takes just as much effort and skill as receiving high marks without cheating."

My senior year, we had to take the MEAP (Michigan Education Assessment Program) and that was the first year that they introduced the essay portion. But, lucky us, the essay wasn't required for our diplomas like it was for every year after us.

We still had to spend a fucking hour writing a one page essay on "A time you made a choice."

Mine was about choosing to come to school and sit through bullshit exams because it was less hassle to fuck around and draw dicks on an essay form than it was to walk out from the testing classrooms.

To be fair, the next year when I was at community college knocking down my requirements because I couldn't afford a real university, I got the same question from my English comp teacher, and she freaked out because I answered it with a one-page essay about how my deep dedication to the prescriptive rules of grammar got me through a North Vietnamese torture camp; I think that was my first F on any assignment, and we had to have a conference since I "Didn't take the assignment seriously." I had always assumed there was a tacit recognition of the rote, meaningless bullshit that underlay essays like that, where, as long as you were reasonable about the form, your teacher was bright enough to recognize what a mind-numbing slog it was to write any of this crap. Robots would have been an improvement in pedagogy.

(Of course, that was also the teacher that decreed that any color movie was better than any black and white movie, and stuck to it even when I asked her if that meant "Ernest Goes to Jail" was better than "Citizen Kane.")
posted by klangklangston at 11:36 PM on April 4, 2013 [3 favorites]


I assumed the grading software reported on as newly developed by the NYT article today is the main focus of discussion here, not the year-old one.

Well, I guess in that case you would start with the quotes in that article.

“Let’s face the realities of automatic essay scoring,” the group’s statement reads in part. “Computers cannot ...measure ...accuracy, reasoning, ...convincing argument.... and veracity, among others.”

These quotes are from Les Perelman, the same guy who was arguing against the other software in the other article. I may be overestimating everyone involved here, but I imagine that if the EdX software had managed to accurately grade the truth and evidence in an article, it would have been mentioned right after the quote saying it could not do so. Instead, the writer follows the quote with the comment "but we expect lots of people to use it anyway". So I'm pretty sure it has not overcome this, and the only 'new development' they've got is that they are giving away their software for free, which will likely see it become much more widely used than the previous versions, which were used only by the standardised testing companies that developed them.
posted by jacalata at 11:51 PM on April 4, 2013


I propose that MetaFilter place the grade level tester's evaluation after every comment for next April Fools'.
posted by jeffburdges at 12:48 AM on April 5, 2013 [3 favorites]


If these robots are so goddam great why do I still have to go to work?
posted by nowhere man at 3:50 AM on April 5, 2013 [2 favorites]


Poom!

Readable Score: 122 (The higher the score the easier the article is to read!)
Grade Level: -4

What is a grade -4? Those cats seemed pretty dim, but hardly uneducated....
posted by GenjiandProust at 4:26 AM on April 5, 2013 [3 favorites]
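The thread never says which formula the grade level tester uses. Assuming it reports the standard Flesch Reading Ease and Flesch-Kincaid Grade Level (an assumption, not confirmed by the site), a negative grade is just arithmetic: the grade formula subtracts a fixed 15.59, so text made of very short sentences of one-syllable words drops below zero, and the ease score can climb past 100. A minimal sketch that takes word, sentence and syllable counts as given (syllable counting, the hard and heuristic part, is left out), with a hypothetical one-word, one-syllable sentence as input:

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (assumed to be what the tester reports)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier, and nothing caps it at 100."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Hypothetical input: a single sentence of one one-syllable word.
print(round(flesch_kincaid_grade(1, 1, 1), 1))  # -3.4
print(round(flesch_reading_ease(1, 1, 1), 1))   # 121.2
```

This does not try to reproduce the exact 122 / -4 reported above, since the text actually tested isn't known; it only shows why "grade -4" is a possible output of this kind of formula.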


Computers can never replace grad students because programs cannot feel the abject humiliation that results from being dressed down by an associate professor with self-esteem issues.

I imagine there's some kind of WOPR machine deep in the earth that is busy analysing all facets of the feelings in these relationships, but would much prefer a nice game of chess.
posted by urbanwhaleshark at 4:33 AM on April 5, 2013


I feel like there is something truly horrible about writing an essay only for an algorithm to read. For one, the algorithm doesn't read; it may be the case that a gradeable essay is written to a specific rubric and toward a specific response, but the essence of that response is that it is human, and arises from having been read. At the heart of the graded essay is language. Language is inextricably social. Even in the highly artificial environment of the graded essay, it is the effect of language on a language-using reader which is the expected, and only fair, means of evaluation.
posted by adoarns at 5:17 AM on April 5, 2013 [3 favorites]


Always worth a read, Manna, by Marshall Brain.

Touches on the same issues starting in the fast food world.
posted by KaizenSoze at 5:43 AM on April 5, 2013 [1 favorite]


“There is a huge value in learning with instant feedback,” Dr. Agarwal said. “Students are telling us they learn much better with instant feedback.”

Real-time feedback certainly has its place, but reinforcing students' expectations for instant gratification in everything is not a good idea.
posted by sriracha at 6:58 AM on April 5, 2013


This is where standardized testing is heading vis-a-vis essay-grading in secondary schools, from what I understand. I sympathize with the poster above (jrochest) who is facing death, yet hasn't lost his sense of humor. At this point (and who knows what my doctor will say next week) my only joy is that I get to retire before all this bullshit becomes the norm. Computers grading essays? Indefensible. See arguments above.
posted by kozad at 7:49 AM on April 5, 2013


This entire thread through comment 66:

Readable Score: 45 (The higher the score the easier the article is to read!)
Grade Level: 8
posted by double block and bleed at 9:59 PM on April 5, 2013


Paradise Lost:

Readable Score: 23 (The higher the score the easier the article is to read!)
Grade Level: 12

The Canterbury Tales (with footnotes):

Readable Score: 67 (The higher the score the easier the article is to read!)
Grade Level: 4

Needs a little work, if you ask me.
posted by double block and bleed at 10:14 PM on April 5, 2013


What an awful idea, and like remonstrations against the use of torture, it seems a side issue whether it works. The problem is more innate: students should write for an audience, to actually connect to someone, about something they care about. If the problem is that there are too many students and not enough graders, especially for online work, there can be ways to arrange for meaningful peer review, or to frame the assignments to be useful to the world at large rather than pro forma for a single bored grader. (A colleague of mine likens grading law exams to "watching the same bad movie eighty times.")

Or they should just have the students turn in computer-generated prose for the computers to grade, eliminating the middleman.
posted by zittrain at 8:40 PM on April 6, 2013 [1 favorite]

