Levine mostly finds this amusing
April 12, 2024 8:13 AM

OpenAI Training Bot Crawls 'World's Lamest Content Farm' 3 Million Times in One Day “If you were wondering what they're using to train GPT-5, well, now you know,” Levine wrote in his post.
posted by bq (46 comments total) 20 users marked this as a favorite
 
"Of those 3 million page fetches, 1.8 million were for robots.txt"

Okay, now that's funny.
posted by Tell Me No Lies at 8:19 AM on April 12 [12 favorites]


Remember when the internet was young and you’d get 429 errors?

Of those 3 million 429 errors, 1.8 million were directs from metafilter
posted by rubatan at 8:25 AM on April 12


What a pity… that he told them. Too honest to perform mischief on people who deserve it, I guess.
posted by Artw at 8:29 AM on April 12 [5 favorites]


Now imagine doing this with a short AI-generated paragraph per page like some kind of terrible Wikipedia. Would that serve as a poison pill for OpenAI (and other AI projects)? Are people already doing this?
posted by GenjiandProust at 8:31 AM on April 12 [5 favorites]


Are people already doing this?

One can only hope.
posted by briank at 8:33 AM on April 12 [2 favorites]


There would be a certain poetic justice.

Though I suspect the people doing it because they are proponents of AI/scammers/scammy AI proponents will take the lead on content farms if they aren’t already way, way in front.
posted by Artw at 8:36 AM on April 12 [1 favorite]


every perl script is a weapon, if you hold it right
posted by gwint at 8:38 AM on April 12 [18 favorites]


Would that serve as a poison pill for OpenAI (and other AI projects)?

Every AI researcher (actual AI, not just LLMs) I have ever met is at-or-above typical Metafilter user levels of terminally online, without exception. You can safely assume they are at least as aware of this shit as anyone here, learn about it earlier because that’s what their recommendations prioritize, and filtering this stuff out once you know about it is not hard.

This ain’t Congress: there is none of the usual digital skills or knowledge gap, and in fact it’s very slightly in the other direction. The usual exceptions apply for some of the big corporate efforts, but Google/Facebook/OpenAI (the latter being the neural-network-specialized SV programmer set) mean way less of that than you’d normally expect. Really wish people would stop thinking we’re going to clever-monkey our way out of this.

Apologies if I’m overly grouchy this morning, just not feeling very patient for snarky “they’re smart enough to do this shit in the first place yet also somehow too stupid to just do the obvious fix” characterizations at the moment.
posted by Ryvar at 8:48 AM on April 12 [14 favorites]


I mean, TFA is about how they apparently aren’t smart enough to build a web crawler that avoids traps like this, which I imagine would be easier than filtering on the other end…
posted by GenjiandProust at 8:56 AM on April 12 [9 favorites]


The fix is people literally have to tell them. And they are all con artists, so they are not going to be telling each other when they build something similar.
posted by Artw at 8:59 AM on April 12 [6 favorites]


Right, and if they weren’t already aware ($5 says they were, but not $25) it’s a simple fix that will be patched immediately for this case and detecting it in the future will become a priority for the ingest team. The gathered data will have already been reduced to a single weighted copy regardless, because any team that wasn’t filtering out duplicate inputs from the beginning never produced a viable model in the first place.
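
A toy illustration of what I mean by a single weighted copy (exact-match hashing only; real pipelines layer fuzzy near-duplicate detection on top, but the principle is the same):

import hashlib
from collections import Counter

def collapse_duplicates(docs):
    # Keep one copy of each distinct document plus a count (its "weight").
    counts = Counter()
    first_seen = {}
    for doc in docs:
        h = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        counts[h] += 1
        first_seen.setdefault(h, doc)
    return [(first_seen[h], n) for h, n in counts.items()]

# Three million fetches of the same lame page collapse to one weighted entry.
crawl = ["the same lame page"] * 3_000_000 + ["one real page"]
print(collapse_duplicates(crawl))
# -> [('the same lame page', 3000000), ('one real page', 1)]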

And they are all con artists

This is the preconception you really need to push past. The money men are con artists because it’s the SV set and it’s a requirement of continued existence in their role. The people actually building these systems are mostly former academic researchers cashing in on their PhDs by applying what they’ve learned, with some typically very sharp software engineers embedded in each team.

You hate that this stuff exists, and while I don’t agree I can sympathize. But you’re letting that hatred cloud your perception of who actually builds it - not funds, builds - and that mischaracterization is leading you to incorrect conclusions. Implementation and monetization decisions are not just separate individuals but separate mindsets.
posted by Ryvar at 9:14 AM on April 12 [40 favorites]


I'm too lazy to provide references, but some details gleaned from somewhere (Hacker News?)

John Levine is a well-known Internet spam expert. The original report went to NANOG; they put the OG in North American Networking.

Levine runs this website as surveillance for new bots. Its purpose is to catch bots being dumb.

OpenAI isn't respecting robots.txt correctly. It seems to be following links on pages that robots.txt is telling them to ignore.
posted by Nelson at 9:21 AM on April 12 [27 favorites]


Right, and if they weren’t already aware ($5 says they were, but not $25) it’s a simple fix that will be patched immediately for this case and detecting it in the future will become a priority for the ingest team

Ah, but that's where real knowledge and experience gaps come into play. Someone naively fixing the situation might just take two minutes to put those websites on a blacklist. After all, it would be a pain to fix the algorithm, and the problem has only happened once.

Poorly thought out and overly-expedient technology decisions happen all the time and networks aren't even adjacent to OpenAI's core competency.
posted by Tell Me No Lies at 9:22 AM on April 12 [1 favorite]


I’ll stop calling them con artists when my lived experience of it isn’t a tidal wave of con artists, scammers, people promising things they can’t possibly deliver and managers under the spell of that, thank you very much.

AI is the new crypto and its practitioners very much behave that way.
posted by Artw at 9:22 AM on April 12 [24 favorites]


That 150qps is probably around one millionth of what that spider is doing right now.
posted by rlk at 9:23 AM on April 12 [1 favorite]


I’ll stop calling them con artists when my lived experience of it isn’t a tidal wave of con artists, scammers, people promising things they can’t possibly deliver and managers under the spell of that, thank you very much.

Okay, great. But even though I dropped out of the field, I still know a bunch of people who do this for a living - pretty much everyone I’m still in touch with from college is either on the Android kernel team or in one of the major ML groups. You’re mostly right about the people trying to monetize it, but on the implementation side you’re …just plain wrong. Not sure how else to put it. You’re demonizing people you don’t know, and whether or not you’re right to question their ethics (with OpenAI in particular the answer is nearly always yes), you’re letting feelings about one group spill over to another. It’s a textbook category error.

OpenAI isn't respecting robots.txt correctly. It seems to be following links on pages that robots.txt is telling them to ignore

This, on the other hand, tracks with the actual problem at OpenAI specifically, which is the easy arrogance from currently being ahead, coupled with the mindset of “well if people are trying to hide it from us OBVIOUSLY that’s the data we want.” The difference there is they’ll need the wakeup slap of a round of shit results before they begin filtering with…

Someone who is naively fixing the situation might just take two minutes to put those websites on a blacklist.

…the extremely obvious solution of pairing a typical exclusion filter (blacklist is problematic) with a classifier trained on the excluded data for systemic filtering in the future. Like, c’mon, this isn’t hard.
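
In toy form - invented data, hand-rolled, nothing to do with anyone’s actual ingest pipeline - the pairing looks roughly like:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Pages the static exclusion filter already caught (all data here is made up):
excluded = [
    "emperor franz josef page of links to more name pages",
    "name name name page of links to more name pages",
]
# Pages believed legitimate:
kept = [
    "a recipe for sourdough bread with detailed proofing notes",
    "an essay on the history of undersea telegraph cables",
]

vec = TfidfVectorizer()
X = vec.fit_transform(excluded + kept)
clf = LogisticRegression().fit(X, [1, 1, 0, 0])  # 1 = excluded, 0 = kept

# A new trap that was never on the exclusion list, but smells the same:
new_page = vec.transform(["some other name page of links to more name pages"])
print(clf.predict(new_page))  # expected: [1], i.e. exclude it too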
posted by Ryvar at 9:35 AM on April 12 [12 favorites]


…the extremely obvious solution of pairing a typical exclusion filter (blacklist is problematic) with a classifier trained on the excluded data for systemic filtering in the future. Like, c’mon, this isn’t hard.

Not if it's something you're familiar with. However, as your company isn't in the business of web scraping, you (the director of data acquisition) just bought a third-party solution, and the fresh-grad AI scientist you had devote 50% of his time to making it work hasn't really finished reading the manual.
posted by Tell Me No Lies at 9:43 AM on April 12


Nelson: "OpenAI isn't respecting robots.txt correctly. It seems to be following links on pages that robots.txt is telling them to ignore."

Nah, that's just an artifact of the way the linkfarm is constructed. As Levine says, "rather than being one web site with 6,859,000,000 pages, it is 6,859,000,000 web sites each with one page", each of which has its own robots.txt file. Robots.txt only applies to pages within a domain, not outbound links. So the bot is blocked from crawling any given site's (nonexistent) internal pages, but the other links technically point to external sites and are fair game.
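
To see why that scoping inflates the robots.txt count, here's a toy crawler-side check (Python stdlib, purely illustrative - nobody outside OpenAI knows what their bot actually does):

from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

robots_cache = {}

def allowed(url, agent="GPTBot"):
    # robots.txt is scoped to the URL's own host; a link pointing at a
    # different host says nothing about that host's rules.
    host = urlparse(url).netloc
    if host not in robots_cache:
        rp = RobotFileParser(f"https://{host}/robots.txt")
        rp.read()  # one robots.txt fetch per never-before-seen hostname
        robots_cache[host] = rp
    return robots_cache[host].can_fetch(agent, url)

# Every link on a trap page leads to a brand-new "site", so each one
# costs a fresh robots.txt fetch - hence 1.8 million of them.
print(allowed("https://emperor-franz-josef.web.sp.am/"))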
posted by Rhaomi at 9:46 AM on April 12 [6 favorites]


To refer to the software engineers who program LLM AI as con artists is analogous to referring to the mechanics and technicians who build missiles as warmongers. Which is to say that logically, I could formulate an argument for or against the position, but my morals and ethics compel me to favor one side over the other.
posted by Faint of Butt at 9:48 AM on April 12 [5 favorites]


Ryvar - Like I say, our lived experiences differ. I’m a software developer seeing an industry lose its fucking mind over some bullshit, you on the other hand appear to be in some academic field related to legacy ML, which I consider more or less completely irrelevant to current discourse. What I don’t completely get is the compulsion to leverage that into a defense of all the obvious horrible generative bullshit that’s bearing down on us.
posted by Artw at 9:49 AM on April 12 [14 favorites]


Robots.txt only applies to pages within a domain, not outbound links.

My understanding from the Hacker News discussion is the only way to find new domains on the sp.am honeypot is by scraping links from disallowed pages. So if it's respecting the robots.txt on domain A it should never find all the links to domain B, C, and D in the first place, not even to check their robots.txt.

That being said, I don't really understand the actual live robots.txt that I see now, so I don't know what's what. There's a GPTBot entry that's commented out. Levine's own post to NANOG seems more about how dumb the bot is for indexing nearly-identical content, not that it was ignoring his robots.txt.
posted by Nelson at 10:04 AM on April 12 [2 favorites]


To refer to the software engineers who program LLM AI as con artists is analogous to referring to the mechanics and technicians who build missiles as warmongers.

The word “collaborator” is right there. I agree that the management that reaps the overwhelming share of the profits is the greatest problem, but asserting that the scientists, programmers, and technicians who work on the projects are either too pure or too naive to be held in any way accountable is… not great. The systems people who manage crypto mining farms are somewhat culpable in the economic devastation, fraud, and criminality they enable, aren’t they?
posted by GenjiandProust at 10:28 AM on April 12 [18 favorites]


This suggests to me that the various AI companies are going to start creating adversarial honeypot content farms to waste the time and poison the corpora of their competitors. If they haven't already.
posted by adamrice at 10:30 AM on April 12 [4 favorites]


Big opportunity there for the open source community, too.

Honeypot@home, anyone?
posted by flabdablet at 10:32 AM on April 12 [3 favorites]


Spamming Google probably remains the most likely target for any serious honeypot effort, with LLM crawlers as collateral damage, though Google does seem to be doing a good job of taking itself out of the “reliable search engine” game even without that.
posted by Artw at 10:34 AM on April 12 [3 favorites]


you on the other hand appear to be in some academic field related to legacy ML, which I consider more or less completely irrelevant to current discourse

I feel like I’ve written about this too many times, but I’m on break with a full engine rebuild kicked off, so here’s my very stupid story (a lot of you might want to skip this comment):
I was in RPI’s Minds & Machines cognitive science program in the late 90s because I basically got a free ride there; while CMU/MIT were offering a steep discount to my lower-middle-class family, either would’ve immediately bankrupted them. My interest was specifically in neural networks, in particular deeply layered pretrained networks - the paleolithic precursor to Deep Learning - and continuous runtime training/topological reconfiguration in ANNs, which was very, very nascent in the late ’90s.

My motivation here was mostly because cognitive science was about creating a generalized framework for minds with humans as just one example, the way x86 architecture is one example of a Turing Machine. It was patently obvious that finding other humans who understood me wasn’t happening, so I was going to need to create an entirely different kind of intelligence that could. Also: freshly out of evangelical fundamentalism and extremely bitter about it, the classic worst case scenario of Skynet didn’t seem like an entirely negative outcome at the time.

Most of the people I was talking to saw LLM-like systems on the horizon fully 15 years in advance of ImageNet, but departmental politics were savage and the AI Winter logician/LISP strong AI dead-enders still had the high ground in every meaningful respect. I’d been put in charge of a 40-student academic project’s implementation arm as a freshman, and between that and a dual major (finished 90% of the CS degree, half the psych degree, barely touched the philosophy minor) I went through the usual mental health crisis/ragequit scenario and dropped out at the end of my sophomore year. Also it was just kind of a bad fit: I was raised by a German immigrant family to be an engineer, but because I am profoundly bipolar I am basically cursed with an artist’s soul, which is probably why I wound up making games.

A few years of barely employed/unemployed followed, 18 months “freelance network penetration” (*cough*), mostly legit, and very briefly having zero day remote admin on ALL extant Microsoft-based webservers (IIS 5) - external AND most intranet at the time - until the induced paranoia of that whole field became overwhelming (some fun war stories though). Then back to gamedev - my first paying job was a junior tools programmer summer internship when I was 16, which was the whole reason I’d gotten the aforementioned academic project implementation role right out of the gate.

No surprises in the above, I think. Hopefully only a little boring.

I still get the occasional recruiter call related to this stuff. Drone training sim teams from DARPA subcontractors, autonomous driving sim teams (we can infer Waymo is the only serious major player in that space because they haven’t called me).

You might be able to glean from all that why I have a strong interest in seeing where the field goes. I still read a couple papers a month - old friends tapping me on the shoulder with “you might like this one” - but I’m not in the game at all. I won’t pretend there hasn’t been a vicious, childlike glee along “fuck you, I was right all along” lines directed at the now-retired old guard that still live rent-free in my head.

If you think I’m defending what the industry has become - particularly and especially OpenAI - then you simply haven’t been paying attention. I’ve been more than savage in my comments about both the major players in particular and what capitalism’s been doing with it. But I know a handful of the people that actually write the papers and code - nobody at OpenAI, thankfully, though everyone from back then is still living in the Valley - that are underpinning key aspects of all this, and I hope they get to continue doing cool shit despite the ocean of bullshit surrounding them. Two friends from back then are still in CS academia (one AI - RL, not LLM) though mercifully at different schools entirely. A pair of them have been on the Android kernel team since more or less day one. I don’t know anybody with an MBA, or in any C-suite, and I honestly hope that never changes, because it would basically guarantee that one person or another who once had my respect had opted to become something grotesquely horrible. My own very brief SV contracting experience during the VR flashpan was that the culture there rarely permits exceptions to that rule.

Hope that helps prevent any bad assumptions about where I’m coming from in the future, or about what depth of knowledge you should expect from me.
posted by Ryvar at 11:04 AM on April 12 [26 favorites]


I read you loud and clear, Ryvar. I think I come at this topic, in general, from a pretty critical and pessimistic side but I appreciate your perspective and you've been patient and gracious with how you share your perspective and (I think) you have elevated the caliber of discussion on this. Thank you.
posted by elkevelvet at 11:20 AM on April 12 [15 favorites]


Thanks Ryvar. That's about what I gleaned from your various comments. But if you don't want to be seen as an AI apologist, maybe... stop defending the industry so loudly when we gripe about it? You're all over these threads doing exactly that. And personally, I do indeed put blame on all these very clever people that think they are just working on problems and it's not their fault how it's used. I also have college friends working tech jobs with horrible ethical problems. And I absolutely do judge them for continuing to be paid a lot of money to do work they know damn well is more on the side of "part of the problem" than "part of the solution".

I have a PhD in applied math. I've learned lots of skills that can be used to devise harmful things and get paid a lot by various tech giants or defense contractors for doing so. And you know what? It really hasn't been hard to just ... not do that.
posted by SaltySalticid at 11:33 AM on April 12 [19 favorites]


("The people developing the software aren't con artists, they just work for con artists" has some conspicuous parallels with "I'm not a racist/fascist, I just vote for racists/fascists". It's a distinction that is increasingly difficult to care about as the cold water rises to chin level.)
posted by Sing Or Swim at 11:50 AM on April 12 [4 favorites]


God, I hope we aren't that complicit in the morality of our bosses.
posted by mittens at 12:02 PM on April 12 [5 favorites]


Just to bring the discussion back up to a dumber level for a second, how does he have 6,859,000,000 different domain names?? I get that his script could make up a domain name, but apparently they actually work, which means they have to be registered with some outside registrar and propagated to DNS servers across the net, right? How is that possible?
posted by Naberius at 12:16 PM on April 12


They’re not domain names. They are host names.
posted by Tell Me No Lies at 12:21 PM on April 12


Naberius: "how does he have 6,859,000,000 different domain names?"

You don't. You have 6,859,000,000 subdomains. In "subdomain.domain.com", "com" is the top-level domain. "domain" is the domain name, and you have to pay a registrar to get one of these and have it registered so the rest of the internet can find it. But you control "subdomain" and can generate as many of these as you want. It's not that hard to set up website hosting for a domain name to allow for an infinite number of subdomains and map to them on the fly, or even generate a page based on whatever subdomain someone might have accidentally accessed.
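
Here's a hypothetical toy version of the server side (not Levine's actual code, obviously): point wildcard DNS at one box, then build each page from whatever Host header shows up:

from http.server import BaseHTTPRequestHandler, HTTPServer

class TrapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Whatever hostname was requested arrives in the Host header.
        host = self.headers.get("Host", "unknown.example")
        title = host.split(".")[0].replace("-", " ").title()
        body = (f"<html><head><title>{title}</title></head><body>"
                f"<h1>{title}</h1><p>One of billions of equally lame pages.</p>"
                f"</body></html>")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

# With *.example.test pointed at this machine, every made-up subdomain "exists".
HTTPServer(("", 8080), TrapHandler).serve_forever()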
posted by adamrice at 12:35 PM on April 12 [3 favorites]


stop defending the industry so loudly when we gripe about it? You're all over these threads doing exactly that.

In virtually every case you are going to find that I am supporting one aspect or another of a common theme: the ability of open source and small teams to remain competitive. I strongly believe that this is an arms race between capital and workers, and I am against everything that leaves the former with an advantage once the smoke clears. I don’t know whether imbalance in access to these systems will have actual survival-level stakes, but the possibility is not zero. Not-being-functionally-enslaved stakes seem more likely.

Like a lot of nerds here and everywhere, I have a low tolerance for what I know to be misinformation and what I believe (and there is always a good chance I am wrong) to be poor assumptions. E.g., no, the ecological impact of current systems is not a major concern; the likely ecological impact of the proposed future systems in OpenAI’s papers is fucking horrific. Assuming that LLM developers are idiots because you don’t like how the technology is being used is understandable, but it leads to incorrect conclusions, and to annoying tendencies to underestimate what bad actors like OpenAI will do in the future. It’s wishful thinking that your opponent will hamstring themselves when there is zero reason to believe that will actually pan out. We cannot relax.

Point is: the only “side” I am on is small teams and labor. My politics are found in any Culture novel. I am sorry that both in the past and probably the future I come off as insufficiently sympathetic to artists. I see myself as an artist and all of my professional creative output is in the training set, but I’ve basically written it off as a lost cause - not just in the past but continuing on into the future. It’s a pointless fight because even if we win in one jurisdiction in a global economy it’s just pushing bubbles under plastic.

The only saving grace is that most efforts to lay off writers and artists en masse will fail spectacularly and lead to partial rehirings, followed by near status quo ante rehiring during the next expansion cycle because capitalism must keep most labor captive for monopolies to survive. But the next few years are going to suck ass while the MBAs learn all this via trial and error: they never listened to anyone else outside the golf course before, why would this time be different?

We have to learn to live with this shit in our midst, and step one is making sure there is not a monopoly on the future means of production. We should fight about the rest after that’s been taken care of.
posted by Ryvar at 12:41 PM on April 12 [12 favorites]


You don't. You have 6,859,000,000 subdomains


What adamrice describes with subdomains is entirely doable, but for the record www.web.sp.am is just using hostnames. I'm not sure why it's stated differently in the article.
posted by Tell Me No Lies at 12:44 PM on April 12


stop defending the industry so loudly when we gripe about it?

Stemming Metafilter's tendency to histrionically condemn entire branches of human endeavor is always a good thing. Some people are known to paint with too broad a brush -- a lot of people here don't seem to own anything short of a paint sprayer.
posted by Tell Me No Lies at 12:50 PM on April 12 [20 favorites]


AI crawlers consuming content created by other AI is already a pretty big problem in this space. Also mentioned in Ed Zitron's newsletter article asking "Are We Watching The Internet Die?" (previously on Metafilter)

Basically, you don't need some sort of valorous rebellion of folks trying to sabotage AI by generating garbage. The users of GenAI, in a bid to make money off SEO and search result hijacking, are doing that already, and none of the AI companies have a good solution for being hoisted on their own petard this way.

I also know that MeFi loves to hate on Ezra Klein (with some good reason) but his recent conversation with Nilay Patel is a good delve into the problems emerging from an Internet where rampant and invasive genAI content is spreading like kudzu.
posted by bl1nk at 12:51 PM on April 12 [2 favorites]


Tell Me No Lies: "for the record www.web.sp.am is just using hostnames"

That's where it starts. But if you click any of the links, they'll take you to emperor-franz-josef.web.sp.am or whatever (and that's actually a sub-subdomain!).
posted by adamrice at 1:05 PM on April 12


So here's a question that I hope won't be too derailing: I was listening to the podcast Better Offline where tech journalist Ed Zitron talks, mostly negatively, about aspects of the tech industry. The podcast is part of Cool Zone Media, which started with Behind the Bastards, so that's the tone you are getting.

Anyway, in a recent episode, he was suggesting that there are several upcoming cliffs for AI applications, including resources (processing, storage space, energy consumption) but also a lack of new data to train and refine models on. Since each iteration requires more data, and humans can only produce so much data, they will either fall off the cliff or start training models on data produced by other models with a degrading effect. Is this wishful thinking, or is this an actual concern?
posted by GenjiandProust at 1:06 PM on April 12 [2 favorites]


GenjiandProust: "they will either fall off the cliff or start training models on data produced by other models with a degrading effect. Is this wishful thinking, or is this an actual concern?"

Yes. It's called model collapse.

The problem (perhaps I should say a problem) is that we don't have a good sense of what features in training data an LLM is picking up on. We do know in general that LLMs pick up on features that humans would not notice, or would consider irrelevant. So we'd see first-generation output from an AI, say "that's good enough for me," and it would go live.

So when AI output is subsequently used as training data, it could contain features that we humans are not good at spotting, but that would throw the next-generation LLM into a tizzy.

There's a complementary phenomenon you may have heard of: computer-vision dazzle to defeat facial recognition. There's also a tool called Nightshade that alters images in ways that are imperceptible to humans but will poison AIs using them as training data.
posted by adamrice at 1:19 PM on April 12 [7 favorites]


It's interesting and amusing to read about how the training bot got mired in that glue trap, but, net, there is just about nothing to "learn" from those auto-generated, content-free but link-heavy pages, right? So no real impact, other than wasted energy?

It's my uninformed opinion/hope that this indiscriminate "ingest anything/everything" phase of AI training must be close to done, or at least near the point of diminishing returns, and that real world deployments will be to use them as natural language front-ends on specialist, validated datasets.

I will still keep mashing zero and screaming "agent!" for the foreseeable future, anyway.
posted by Artful Codger at 1:28 PM on April 12 [1 favorite]


Is this wishful thinking, or is this an actual concern?

Both? From what I’ve read it’s definitely an actual concern in some cases but a bit of model-generated input is probably not going to incurably poison future training runs if there’s a healthy amount of authentically representative data. And there are already cases where some kind of synthetic or ML-transformed data (like machine transcripts of audio content, machine-generated labels for images) are useful in training. Plus there are examples of model paradigms that get very good at things through self-play/adversarial training, as long as there’s some objective criterion for improvement.

So real issue but not exactly the seed of doom for the whole enterprise?
posted by atoxyl at 1:52 PM on April 12 [4 favorites]


Atoxyl likely nails the major players’ short-term plans, though it’s worth noting multi-modality opens up new, less polluted data sources and buys them another couple years of runway. Medium and long term, everyone’s pushing for approaches less dependent on training with vast amounts of data, as much to begin making inroads toward genuine reasoning as to avoid model collapse. Nvidia isn’t just a hardware supplier on this front, either; some of their runtime systems work appears to be a step ahead.

All else fails? They’ll outsource a shit ton of “manual” labor to digital factory workers in Africa to cover any immediate gap until reinforcement hybrids are production-ready.
posted by Ryvar at 2:27 PM on April 12 [2 favorites]


The people actually building these systems are mostly former academic researchers cashing in on their PhDs by applying what they’ve learned, with some typically very sharp software engineers embedded in each team.

This is generally true: I know a manager in one of the big AI companies getting billions in VC funding, and going from an AI model to a product that can make money and help keep the company running is proving to be harder than anticipated, requiring considerable engineering design and architecture skills well outside of the PhD skillset.
posted by They sucked his brains out! at 11:21 PM on April 12 [1 favorite]


if you click any of the links, they'll take you emperor-franz-josef.web.sp.am or whatever (and that's actually a sub-subdomain!)

That said, all prefix.web.sp.am hostnames resolve to the same IP address regardless of what string is put in place of prefix. All that's required to achieve that is control over the authoritative DNS server for addresses inside the web.sp.am domain (i.e. subdomains of web.sp.am), which is a completely normal thing for the registered owner of any domain to have.

That means that any request directed at any such hostname ends up arriving at the same web server, which can then use SNI and/or name-based virtual hosting to work out what to serve in response. Again, this is completely standard practice, though it's more often used to facilitate shared hosting for multiple customers than for weirdness dreamed up by the web server's owner.

As it's currently set up, web.sp.am's DNS server doesn't appear to be doing any checking whatsoever on prefix: if prefix exists at all, that's enough to make prefix.web.sp.am resolve to 64.57.183.45. With no prefix, web.sp.am doesn't resolve to an IP address.

The web server listening for HTTPS requests on 64.57.183.45 port 443 does check prefix though. If the URL supplied is either https://www.web.sp.am or https://name1-name2-name3.web.sp.am, where each of the three names is unique and drawn from a preset list, it will return a web page constructed using that prefix; otherwise it will return a page titled "Invalid request" and showing either "invalid URL" or "Invalid website" depending on which part of the URL it's objecting to.

It doesn't appear to check the prefix on requests for robots.txt, though. https://prefix.web.sp.am/robots.txt gets you the same file regardless of what prefix is. Even https://64.57.183.45/robots.txt works, if you ignore your browser's security warnings.
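
Easy enough to check from home, assuming the setup hasn't changed by the time you try (needs live DNS, obviously):

import socket
import uuid

# Per the behaviour described above, all three of these should resolve
# to 64.57.183.45, while bare web.sp.am itself should not resolve.
for prefix in ("www", "emperor-franz-josef", uuid.uuid4().hex):
    name = f"{prefix}.web.sp.am"
    try:
        print(name, "->", socket.gethostbyname(name))
    except socket.gaierror as err:
        print(name, "->", err)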
posted by flabdablet at 3:32 AM on April 13 [2 favorites]


they'll take you emperor-franz-josef.web.sp.am or whatever (and that's actually a sub-subdomain!).

I don't think that's the case...? emperor-franz-josef.web.sp.am resolves to a host record.


$ host -a emperor-franz-josef.web.sp.am
Trying "emperor-franz-josef.web.sp.am"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58097
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;emperor-franz-josef.web.sp.am. IN ANY

;; ANSWER SECTION:
emperor-franz-josef.web.sp.am. 211 IN A 64.57.183.45


As far as I can see, the only subdomain involved in the spider trap is web.sp.am. It looks like he has set up a DNS server for it that returns 64.57.183.45 for any hostname resolution in that subdomain.
posted by Tell Me No Lies at 4:45 PM on April 13

