Metadata key
Complete and processed data files are available at this Box site.
The code to create the website is available at our static site repo.
The code for the high-volume API queries and for creating all of our data visualizations is available at this GitHub repo.
Geocoding refers to the process through which geographers (or GIS systems) transform language referring to a place into mappable geographic coordinates. GIS systems can often automatically geocode information like addresses, city names, or country names. However, they are incapable of resolving less standardized place references to geographic coordinates. In that case, a human must geocode them.
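To make the distinction concrete, the following is a minimal sketch of geocoding with a hand-built lookup table, in which values that cannot be matched are set aside for human review. It is an illustration only, not the project's code: the table `MANUAL_GAZETTEER`, its two entries, and the helper names are all hypothetical.

```python
from typing import Optional

# Hypothetical lookup table built by hand: each key is a place string as it
# might appear in a metadata field, each value is (latitude, longitude).
MANUAL_GAZETTEER: dict[str, tuple[float, float]] = {
    "san francisco": (37.7749, -122.4194),
    "toronto": (43.6532, -79.3832),
}


def normalize(place: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match one key."""
    return " ".join(place.lower().split())


def geocode(place: str) -> Optional[tuple[float, float]]:
    """Return coordinates for a known place, or None if a human must resolve it."""
    return MANUAL_GAZETTEER.get(normalize(place))


def split_resolved(places):
    """Separate values that geocode from the table from those needing review."""
    resolved, unresolved = {}, []
    for place in places:
        coords = geocode(place)
        if coords is None:
            unresolved.append(place)
        else:
            resolved[place] = coords
    return resolved, unresolved
```

Calling `split_resolved` on a list of raw field values returns the matched coordinates alongside a list of the values a person would still need to research by hand.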
For the Scanning Labor project, we geocoded the “scanningcenter” metadata field in the downloaded Open Library records. Of the 3 million records we scraped from the Open Library API, only 2.5 million had any information in the “scanningcenter” field. The remaining 500,000 records were created mostly from 2001 to 2008 under the purview of the Google Books project. Google, unlike IA, required employees to sign NDAs barring them from revealing scanning center locations. As a result, we cannot geocode the centers at which workers scanned these 500,000 books based on the records we have. This geographic opacity is strategic on Google’s part.
The other 2.5 million records contain 93 unique values in the “scanningcenter” field. We attempted to geocode these through the following method:
KeplerGL is an open-source mapping platform run on Uber’s mapping API. We decided to use it in lieu of Esri's ArcGIS because it is open source and allows for easy export to HTML. We uploaded the CSV file containing the number of scans per center per month to Kepler’s online user interface. From there, we made the radius of each point correspond to the number of scans (the “count” column). Next, we made the color of each point depend on how certain we were that we had geocoded the location properly. Finally, we added a time filter to create the animation.
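As a rough illustration of the kind of table described above, the sketch below counts scans per center per month and writes one CSV row per pair, with coordinates and a certainty label attached. It is not the project's actual script. Apart from “scanningcenter” and “count”, every name here is an assumption: the `scandate` field, the other column names, and the shape of the `locations` mapping are hypothetical.

```python
import csv
import io
from collections import Counter


def monthly_counts(records):
    """Count scans per (center, month). Each record is a dict with a
    'scanningcenter' value and a hypothetical 'scandate' in YYYY-MM-DD form."""
    counts = Counter()
    for rec in records:
        center = (rec.get("scanningcenter") or "").strip()
        date = rec.get("scandate") or ""
        if not center or len(date) < 7:
            continue  # skip records with no center or no usable date
        counts[(center, date[:7])] += 1
    return counts


def write_kepler_csv(counts, locations, out):
    """Write one row per center per month. `locations` maps a center to
    (latitude, longitude, certainty); centers without an entry are skipped."""
    writer = csv.writer(out)
    writer.writerow(["center", "month", "count", "latitude", "longitude", "certainty"])
    for (center, month), n in sorted(counts.items()):
        if center not in locations:
            continue
        lat, lon, certainty = locations[center]
        writer.writerow([center, month, n, lat, lon, certainty])
```

In a mapping tool, the count column could then drive point radius and the certainty column point color. Depending on how the tool parses dates, the month column may need to be written as a full date before it can drive a time filter.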
The Python script we used to create the CSV file, the CSV file itself, and the map (as a JSON file) are all available on our GitHub.
Initially, our goal for this project was to include oral histories of scan operators at each of the scanning centers. To do so, we developed a series of questions (found here: DH-IA-oral history questions) and planned to conduct one-hour oral interviews to capture the day-to-day experience of scanning for IA.
We scraped over 500 email addresses from the metadata and reached out individually to 25 of those people. Unfortunately, we had trouble connecting with any of them: almost all of the emails bounced back. This highlights the high turnover rate of scanners at IA, which contributes to the invisibility of these workers. We did receive one reply from a scan operator: “Thank you for considering me for this project. Unfortunately I don't feel I would be a good fit at this time. Please contact Chris Freeland at chrisfreeland@archive.org. Chris is our PR representative. I'm sure they will be able to help you.” We reached out to Chris Freeland at Internet Archive but did not receive a reply.
Pivoting our approach, we created a Google Forms survey to send out the oral history questions via email. We felt a survey would lead to more responses because it takes less time from interviewees and is completely anonymous. We wanted to be cognizant of the extra time and unpaid labor we would be asking of these workers, so the survey has only 13 questions and none of them are required. We were able to send this survey to all 500 of the email addresses in the metadata, which gave us a broader reach and a higher likelihood of connecting with workers who were interested and whose email addresses were still active. So far we have received only three responses, and many emails bounced back.
Beyond gathering stories and anecdotes through oral history and the survey, we also sought out pre-existing worker narratives on Glassdoor. From our experience, it seems crucial to approach oral history more intentionally and slowly, and to be realistic about the commitment to relationship building it entails.
Another aspect of oral history in our project was meeting with several non-scanning staff at the Internet Archive (although two of the people we interviewed had started as scanners). We conducted two hour-long interviews to understand the workflow of scanning centers, better understand our findings in the metadata, and get contacts for further interviews.
While IA middle management were initially willing to meet with us and be interviewed for the project, they refused to go forward with it after we sent our survey to IA scanning center workers. After sending the survey, we received this email from an IA staff member: “I'm glad our conversation was helpful for your project. At this point, we have participated as an organization to the extent that we are comfortable. If you have any further inquiries, please direct them to me.”