The internet is about to get a little worse as Reddit moves to block the Internet Archive so AI companies can’t scrape its content
The internet, which was once a useful thing, is about to become a little less so: A new report from The Verge says Reddit is going to start blocking the Wayback Machine from indexing most of its content.
The Wayback Machine, part of the Internet Archive, takes “snapshots” of websites as they exist at various points through their history—even if those websites don’t exist anymore. Want to know what the old BioWare forums looked like before they were closed in 2016? Wayback Machine’s got you. It’s also incredibly handy for tracking things like Steam page changes and answering questions like, “Hey, did the CIA ever run a Star Wars fan site?” (And yes, it did.)
The Internet Archive’s ability to do this is dependent on crawling and indexing websites, and that’s what Reddit is going to block: In future, the Wayback Machine will only be able to index the reddit.com homepage, meaning individual subreddits and posts will be out of reach, which effectively renders the archive useless as a record of Reddit. Reddit spokesperson Tim Rathschmidt said the block is being imposed because “we’ve been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.”
The report says limits on the Wayback Machine’s ability to scrape Reddit will start “ramping up” today. Rathschmidt said Reddit had been in touch with the Internet Archive in advance, to “inform them of the limits before they go into effect.”
I’m generally all for anything that makes life more difficult for AI companies, but I can’t really hand it to Reddit in this case because the principle in question here appears to be, well, not principle, but money: Reddit made a deal with Google in 2024 to make its content available for AI training. Another deal with OpenAI followed a few months later.
Reddit’s thing isn’t so much about preventing the abuses of AI training, then, as it is about charging top dollar for the privilege. In that light, this really sucks: The Internet Archive is a non-profit organization, and the Wayback Machine, in sharp contrast to AI-powered chatbots, is genuinely useful, even vital given how quickly working links turn into dead ones. The Internet Archive provides a valuable service, accurately and without unprompted racist slurs. Cutting the Wayback crawler off from Reddit, a massive trove of information on just about every subject imaginable, is a loss for us all.