Big Archival Data: Designing Workflows and Access to Large-Scale Digitized Collections

Session Type: Presentation

Session Description
In 2007, “Digitization Matters,” a forum on scaling up the digitization of special collections, held at the Newberry Library and sponsored by the Society of American Archivists and the Research Libraries Group, reverberated through the archival community. Three panelists will present initiatives that have heeded the forum’s call to ramp up digitization and that provide digital access to large-scale archival collections of image, text, and sound artifacts. We seek to answer two questions: How do we achieve large-scale workflows? And does the result, as envisioned in 2007, enable new scholarly research as well as new forms of community discovery and engagement?

The first paper will discuss Milwaukee’s Polonia, a project to digitize and publish online, within one year, 35,000 images (primarily glass plate negatives) by employing a novel “rail system” that efficiently guides materials through digitization and by reusing existing item-level metadata. The speakers will discuss community and scholarly reactions to the collection and how those responses will inform the further development of tools and contextual materials.

The second paper will focus on the digitization model of the Smithsonian’s Archives of American Art, which relies on archivists’ EAD (Encoded Archival Description) finding aids for metadata and workflow support. Since 2005, this model has made it possible to digitize more than 140 collections, totaling over 1,200 linear feet and 2 million digital files. The paper will also address a more recent challenge: introducing a Smithsonian-wide crowdsourcing and transcription platform.

The third paper will present the High Performance Sound Technologies for Access and Scholarship (HiPSTAS) project, which is developing open-source software (ARLO) that uses machine learning and visualization to help librarians and archivists automate the creation of descriptive metadata for large, undescribed sound collections. ARLO is currently installed on the Stampede supercomputer at the University of Texas at Austin and has been used with PennSound, the American Philosophical Society’s Native American Projects collection, and the Texas Folklore Society’s collection of field recordings.

Session Leaders
Ann Hanlon, University of Wisconsin-Milwaukee
Michael Doylen, University of Wisconsin-Milwaukee
Karen Weiss, Smithsonian Archives of American Art
Tanya Clement, University of Texas
