Where to find what's disappeared online, and a whole lot more: the Internet Archive

Brewster Kahle, founder of the Internet Archive

In a gleaming white former church with Greek-style pillars, shaded by cypress trees in a quiet San Francisco neighborhood, an effort is underway to preserve much of what’s online, to scan books, and to save video streams from around the world.

The Internet Archive's headquarters, in San Francisco

Credit: Mary Kay Magistad

Go in the front entrance, and you may see volunteers unloading books, while others sit at scanning machines, digitizing the books so they can be made available online. Go into the old church itself, and you’ll see a bank of servers blinking green in the back, while soft light streams in from the stained glass dome, onto wooden pews below, and a little army of statues of people — one for each of those who have worked here for at least three years.

Internet Archive staff, represented as statues, in its headquarters in a former church in San Francisco

Credit: Mary Kay Magistad

This is the Internet Archive, the brainchild of Brewster Kahle, an MIT-educated computer engineer, internet entrepreneur and digital librarian. Since the Internet Archive started in 1996, its staff — now, about 140 people — have digitized almost 3 million books, and are aiming for 10 million.

They’ve saved video streams from major television networks around the world. And they’ve saved multiple versions of websites and webpages that might otherwise have disappeared, available to anyone who goes to archive.org and uses the “Wayback Machine.” About five million people use it every day, Kahle says.

At work in the Internet Archive

Credit: Mary Kay Magistad

The Wayback Machine is playfully named after the contraption invented by the cartoon talking dog Mr. Peabody, in The Rocky and Bullwinkle Show, but it has a serious mission: to preserve knowledge, and to give investigative reporters and ordinary citizens a way to find information, and past statements, that those misusing power might prefer to see disappear. MSNBC host Rachel Maddow is a fan.

“The Wayback Machine. It is a national treasure. It is an international treasure,” she said on her show in late 2016. “We have used it hundreds, probably thousands of times in the preparation of this show. Any time somebody tries to make something go away, that’s the first place you look, to see if you can find it, despite the efforts of the person who tried to disappear that information.”

Maddow offers examples. When Mike Pence was first running for Congress, she says, he proposed that rather than allocating money to help people with AIDS, that money should go to curing people of being gay. And when Donald Trump was first elected president, he put up a page online, called “Meet the President-elect,” promoting his business interests. Both pages have since been taken down, but they live forever on the Internet Archive.

But the Wayback Machine is only one part of what the Internet Archive is all about.

Sign in the Internet Archive

Credit: Mary Kay Magistad

“It's actually much more than that,” says founder Brewster Kahle. “We're trying to build a Library of Alexandria, Version 2, so can we make all the published works of humankind available to people, permanently. If you're curious enough to want to have access, can we make it available to all the books, music, video, web pages, software, lectures, available to anybody wanting to have access.”

Why the reference to the Library of Alexandria, which has long been the symbol of the Internet Archive, and which its current home in a former Christian Science church vaguely resembles?

Brewster Kahle, Internet Archive founder, in the former church that now serves as its headquarters

Credit: Mary Kay Magistad

“That was the last time that someone tried to collect it all,” Kahle says. “The idea of, have everything together. It was the center of learning in the ancient world. They were agnostic about where the information came from. They wanted the works from the Hebrews, from the Hittites, from the Greeks, from the Romans. And they brought it all together, and they made it so that they could learn from it.”

“And they came up with great things: geometry! They knew that the earth was round and how big it was within a few percent,” he adds. “They figured out all sorts of things by being able to bring all of this information together. They built a global brain.”

Building a global brain in the internet age poses a different set of challenges than building one out of papyrus and paper. Vast amounts of information, and misinformation, are added to the internet daily. What do you choose to preserve? And how do you actually preserve it, given how quickly technologies and formats change, not to mention the risk of digital decay?

“It's challenging in a digital age,” Kahle says. “Paper lasts 500 years. Palm leaves last a thousand years. Papyrus, we have a papyrus from the Egyptians. But hard drives? Flash drives? Floppy drives? How are we going to make this last?”

Internet Archive scanner

Credit: Internet Archive

An important part of the answer, he says, is making copies: lots of copies, in different places, in case political intervention or natural calamities damage or destroy the archive in one location. It also means updating the copies in each place, to make sure a fresh and complete digital copy is always available. The Internet Archive now has 30 petabytes of data, and Kahle says that is likely to double in the next two or three years. A partial copy of it is in Alexandria, Egypt, another is in Amsterdam, and Kahle plans to put a full copy in Canada. A fundraising drive has already secured more than half of the $5 million needed to do that.

Much of the Internet Archive’s funding comes from donations. It also works with 500 libraries around the world, including the Library of Congress and the National Archives, getting 10 cents a page to digitize books. The Internet Archive digitizes about 1,000 books a day, in 29 scanning centers in eight countries. Kahle says Donald Trump’s election as US president, after his musings on the campaign trail that maybe parts of the internet should be "shut down," has spurred ever more volunteers to show up and pitch in.

“It puts a lot more fire under our butts,” Kahle says. “And people are volunteering out of nowhere. So at the end of every [presidential] term, in 2008 and 2012, and now 2016, we archive everything on dot gov and dot mil, very, very well.  But there's a lot more interest this time around, obviously. There's volunteers. There's hackathons. There's a button on the Wayback Machine page, of Save Page Now. And so people are doing that, up a storm. But we're also expanding now into government data sets, climate change data sets, particularly. There have been people that are now in power that have said they want to just have the whole Department of Energy go away, much less the EPA, or the NEA and NIH, all of these major American institutions can just be erased. What the taxpayers have paid for should stay around. So we're very motivated, and we're getting lots and lots of help.”

Internet Archive conference

Credit: Internet Archive

The Internet Archive takes a default “all in” approach to what it saves, but does sometimes agree to remove data if a government or other entity asks it to do so. Kahle says this is considered on a case-by-case basis, and sometimes, the Internet Archive pushes back. The Chinese government, which is an advocate of internet sovereignty and which censors and blocks many websites, has blocked the Internet Archive, rather than insisting that certain sites created in China be taken down.

How the Trump Administration might deal with the Internet Archive is still an open question. One thing it won’t be able to do is glean information about who is searching for what, because the Internet Archive makes searching anonymous.

“We specifically ignore who it is that's using us, to be able to protect reader privacy. Because we do get demands, sometimes from law enforcement, that want to know about the patrons of the Internet Archive,” Kahle says. “And it's very helpful to just be able to say, 'we don't know.' And they just turn around and go away. And sometimes they still come and demand things. They give us one of these national security letters, where there’s a gag order, where they demand information. And the only reason why I can say we've gotten them is because we fought them and we fought them and won. And the libraries have a long history of dealing with authorities that have come and rounded up people for what it is they've read, and bad things happening to them.”

Kahle is committed to the pursuit and preservation of knowledge, recognizing what it cost the world in generations past when books were burned and libraries destroyed. In an age when some Americans find it comforting to look inward, he has a vision of universal access to all knowledge, through a decentralized web that can’t be controlled or shut down by illiberal forces.

Internet Archive server in its headquarters, in a former church in San Francisco

Credit: Mary Kay Magistad

“What doomed the Library of Alexandria was starting to be a shift of interest from universal ideas into a much smaller ideas of the Christians of that day, that really shrank the interest and the support for libraries,” Kahle says. “And that is what we want to fight going forward. We like the idea of universal access to all knowledge and education and growth. Let's keep that alive and growing.”
