The New York Public Library has just released a treasure trove of incredible archival images to the Internet

January 13, 2016

By Julie Leibach

Prospectors in Alaska — A stereograph featuring “Prospectors returning to camp. 62 degrees below zero, Alaska.”
The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Photography Collection, The New York Public Library

Searching for a 14th-century manuscript for a school report? How about an old baseball photo for your stash of sports memorabilia? You might try the New York Public Library’s Digital Collections. Recently, the library made more than 187,000 digitized, public-domain items more easily accessible in the highest resolution available.

Specifically, the library removed permissions and payment processes that encumbered access to this material, according to Ben Vershbow, director of NYPL Labs, one of the departments involved in the project. The institution, which celebrated its 120th anniversary last year, also added updates to its API and GitHub account to enable further use of its content.

Science Friday recently spoke with Vershbow about the library’s archives, its approach to digitization, and the importance of making digital collections available to the public.

Science Friday: The NYPL has been digitizing for a while, right?
Ben Vershbow: The library’s digitization story started at some scale around ’99, roughly. In 2005, we launched the predecessor [called the Digital Gallery] to the current Digital Collections website. That was really the library’s first big move at scale, with over a quarter of a million items: in copyright, out of copyright, a whole mix. That’s grown over the years, and will undoubtedly undergo further evolutions.

What kind of equipment do you use to digitize?
Obviously digitization that happened in the earlier days was done using somewhat different tools, and what was high-resolution then is not so high-resolution now. A lot of the early-wave digitization the library did, and many libraries and cultural heritage organizations did, involved flatbed scanners.

The digitization we do today is overhead photography in a copy-stand setup, and there are variations on that. We have a lot of what you call ‘transmissive media materials’ (materials that are not reflective but where light moves through them, like slides and glass plate negatives), and that work requires its own kind of modification to a copy stand. There’s also book-scanning equipment, of course, which can vary widely for different kinds of books. And there are other high-volume apparatuses being developed. What we are making available through this public domain project represents a lot of different types of digitization, to be sure.

The library’s in-house digitization lab is in our department at NYPL Labs, in an awesome space in a non-public facility that the library runs in Queens, and it’s really very meticulous, expert work.

What can we find when we plunge into this archive of high-resolution, public-domain images?
Well, it’s incredibly diverse. You’ll see there’s a lot of maps, there’s a lot of stereographs, there’s a lot of sheet music, there’s a lot of other kinds of photography, there’s a lot of correspondence and manuscript material. I think the single biggest collection and maybe genre category is the stereoscopic views. These are 3-D images, and they were an incredibly popular form of entertainment and virtual sightseeing, in a sense, in their day in the late 19th, early 20th century.

Is there a trend among cultural institutions to digitally open their collections to the public?
There’s a whole movement for opening up collections in GLAMs [an acronym for galleries, libraries, archives, and museums]. The web has become a vibrant cultural commons, and I think that we’ve seen that—whether they’re legacy cultural institutions like libraries and museums and archives, or more Internet-native public institutions like Wikipedia, Wikimedia, and the Internet Archive—more are offering unrestricted open content into the web.

We’re trying to share data so that people can build aggregators so that you don’t have to go to each institution’s web presence to search. Wikimedia itself is fed by a lot of different institutions that are releasing content into that commons, for example. And the Digital Public Library of America (DPLA) is a great way to expose what’s been digitized. It points you to that local collection and web property, and then you can work through the particular use parameters of that institution [if the item still has use restrictions]. [There’s also a European forerunner to the DPLA, called Europeana.]

Why has it become important to GLAM institutions to get this content out to the public?
It’s a very powerful thing. It creates a common resource base that everyone can draw on and repurpose very freely, leading to all kinds of new uses and illuminating projects. [The library created a few examples of how users might repurpose its content.]

When you think of the web and the Internet in general as a cultural medium, and as a place that is not just about finding your way to resources that live elsewhere, but is in fact made itself of resources—of materials that can be used in a digitally native context, even if they make their way back into physical forms and other forms of distribution—it does start to feel quite limited if your materials are mostly there as a reference that points you back to something that you either need to pay for or require permission to use.

Now, copyright and all kinds of other things that we have to work through and respect and abide by do require us to place requirements on certain materials. But the ones that don’t have those constrictions, that are out of copyright? I think people are starting to realize, let’s just make those as freely available as possible, because then you’re really able to attain greater impact in terms of these things being used in ways that are both expected and unexpected. For all of us working in the space, it’s obvious to us that we have to do this for any material that we can. Let’s just get it out there and see what people can do.

How would you describe NYPL Labs?
NYPL Labs is a new kind of what was traditionally called a ‘digital library program.’ We’re really looking at that entire life cycle of bringing our research collections onto the Internet and even working proactively to engage new users and create context where things can be used. For example, we host hackathons that are exploring ways that we can engage local technologists and creators to work with us on projects or show us how these things can be used in new ways. We feel like we’re sketching a new kind of organ, in a way, of research libraries that supports people working in a new mode with cultural data that the libraries collected.

What other projects are you excited about?
The NYC Space/Time Directory, which is an initiative made up of a lot of different projects but with this unifying dream of opening up historical geographic data about New York City—as a resource in itself, because I think having a record of the city’s changes is very important for people who are understanding the city’s development, and it’s certainly a historical interest for a wide number of people, but also as an organizing framework for aggregating other information, such as photographs about these past places. It’s great to be able to search in our digital collection site and just find photos, but what if you could browse a map and find them geographically and also temporally? Or you might search for a place that doesn’t exist anymore. Or you might want to see what the layout of the city was at a certain time. We have coverage of the city at these different time periods.

This project is something we’re undertaking in the coming two years and just got a Knight Foundation grant [through the Knight News Challenge] to do.

This story was first published by Science Friday with Ira Flatow.