Hello, our names are Eleanor, David, Lucian, and Elizabeth, and we are students in the School of Information Sciences at the University of Illinois at Urbana-Champaign. We are creating a digital labor history of the Internet Archive’s book scanning project for an Introduction to Digital Humanities course. See our GitHub repository here.
The complete and processed data files, scraped from the Internet Archive API, that we used for this project are available at this Box site.
Our project is called “Scanning Labor in the Internet Archive: Erasure in Metadata and Digital Products.” In it, we seek to make more visible the often invisibilized labor of the workers who scan the books that make up the Internet Archive. To do this, we’ve mined the Internet Archive API to scrape metadata records associated with the scanning process. We have used these records to create a map that visualizes the spatial patterns of the archive's digitization over time. We wanted to ground this spatial-data history in the experiences of Internet Archive workers, so we sought to conduct oral histories with them and to embed their stories in the map as pop-ups linking out to narratives, quotes, or media. Ultimately, due to the high turnover rate of workers at the Internet Archive and the relatively high threshold and time commitment an interview requires, we have not yet been able to connect with a scanning worker to conduct live oral histories. Pivoting our approach, we decided to send a survey en masse to the email addresses found in the scraped metadata.
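As a rough illustration of what mining scanning metadata can look like, here is a minimal Python sketch. It is not our actual scraping code. It assumes the Internet Archive's public metadata endpoint (`https://archive.org/metadata/<identifier>`), and the field names in `SCAN_FIELDS` are assumptions about which keys carry scanning information; real records may use different keys or omit them. The helper names and the sample record are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed names of scanning-related metadata keys; real records may differ.
SCAN_FIELDS = ("scanningcenter", "operator", "scanner", "scandate")


def fetch_record(identifier):
    """Fetch one item's record from the public metadata endpoint (needs network)."""
    with urlopen(f"https://archive.org/metadata/{identifier}") as resp:
        return json.load(resp)


def scanning_fields(record):
    """Keep only the scanning-related fields that are present in a record."""
    meta = record.get("metadata", {})
    return {key: meta[key] for key in SCAN_FIELDS if key in meta}


# Hypothetical record, shaped like an API response, used here instead of a live call.
sample = {
    "metadata": {
        "identifier": "example-item",
        "title": "An Example Book",
        "scanningcenter": "examplecity",
        "operator": "worker@example.org",
    }
}

print(scanning_fields(sample))
```

For a live item, `scanning_fields(fetch_record(identifier))` would return the same kind of dictionary, which could then be aggregated by scanning center and date to feed a map.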
In addition to the information gathered through our survey and Zoom chats with the Internet Archive's management, we've found details about the scanning sites through the Internet Archive's own collection, ProPublica, and Glassdoor. Alongside this website and map, we've created a profile of each scanning center to detail our findings.
Still, central to this project is the movement to reappraise the work of scanning. Without scans, and the workers who create them, many projects in the digital humanities (DH) would be left without their source materials. The workers who enable many of the exciting possibilities of DH are too often left uncredited in DH projects, including on the webpages where their digital products are displayed. The work of scanning is often underpaid, outsourced, and physically taxing, if not destructive, to the body. If DH, or any research, purports to be critical, it must rectify its valuation of (scanning) labor and its understanding of the materiality of scanning. In doing so, we hope, it can move toward rectifying its own relationship to exploited labor.