Embedded Semantic Markup, schema.org, the Common Crawl, and Web Data Commons: What Big Web Data Means for Libraries and Archives and Enhancing an OAI-PMH Service Using Linked Data

Session Type: Presentation

Session Description
For over ten years, the Sheet Music Consortium (SMC) has been harvesting metadata using the OAI Protocol for Metadata Harvesting (OAI-PMH) and providing user services at http://digital2.library.ucla.edu/sheetmusic/. With the support of IMLS planning and leadership grants, the latest iteration of the portal maps all metadata to MODS (rather than Dublin Core), invites users to add structured metadata to records, offers metadata downloads, and provides metadata mapping and Static Repository services that make it easier for smaller institutions with fewer technical resources to participate. Despite these enhancements, the problems inherent in metadata harvesting projects persist, including variant metadata standards, inconsistent application of standards, and varying levels of authority control. Using the user-supplied metadata infrastructure together with Linked Open Data (LOD) principles and standards, SMC has initiated a project both to improve the normalization of the Consortium’s metadata and to expose the enhanced metadata as LOD. This expands the project's impact and our ability to share data more widely and effectively, both directly with our users and through automated systems.
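For readers unfamiliar with OAI-PMH harvesting, the sketch below shows how a ListRecords request URL can be assembled. It is an illustration only, not the Consortium's harvester: the function name and the base URL are hypothetical, and `oai_dc` is used as the default because every OAI-PMH repository is required to support it.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, to fetch the next page of a partial response.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
first_page = list_records_url("https://repository.example.org/oai")
next_page = list_records_url("https://repository.example.org/oai",
                             resumption_token="abc123")
```

A harvester would repeat the second form of the request until the repository stops returning a resumption token.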

The Consortium’s strategy for publishing trustworthy linked data leverages the user-supplied metadata layer of the data repository, which is maintained separately from the harvested data. Text analysis tools such as OpenRefine and Voyeur/Voyant are used to group variant forms of the data and to help identify appropriate normalized forms. These normalized forms are then written to the user-supplied metadata layer, which forms the basis for publication of LOD records.
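The grouping step can be pictured with a simplified version of key-collision ("fingerprint") clustering, one of the clustering methods OpenRefine offers. This is a minimal sketch, not the Consortium's actual workflow: the function names are hypothetical, the keying rules are a reduced approximation of OpenRefine's, and the sample publisher strings are invented variants rather than harvested records.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalized key so that variant forms collide."""
    # Trim, lowercase, and strip diacritics by decomposing to ASCII.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop punctuation, then sort and de-duplicate the remaining tokens.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep only keys with more than one variant."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


# Invented sample strings, for illustration only.
publishers = [
    "M. Witmark & Sons",
    "M. Witmark & sons.",
    "Witmark, M. & Sons",
    "Oliver Ditson Co.",
]
clusters = cluster(publishers)
```

Each resulting cluster is a candidate set of variants for which a person would still choose the preferred normalized form; the keying only proposes groupings.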

Our presentation will address and compare the challenges and possibilities for publishing LOD for creators, titles (works), publishers and subjects. Then we will discuss a pilot project that focuses on normalizing publisher information and exposing that as linked open data.
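As one illustration of what exposing a normalized publisher as linked open data could look like, the sketch below builds a JSON-LD description. The choice of schema.org's `Organization` type with the `alternateName` and `sameAs` properties is an assumption made for this example, as are the function name and the example.org URI; the abstract does not state which vocabulary or URI scheme the pilot uses.

```python
import json


def publisher_jsonld(uri, preferred_name, variant_names, same_as=None):
    """Describe a normalized publisher as a JSON-LD document (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org",
        "@id": uri,
        "@type": "Organization",
        "name": preferred_name,
        # Harvested variants are kept as alternate names, minus the preferred form.
        "alternateName": sorted(set(variant_names) - {preferred_name}),
    }
    if same_as:
        # Links to matching entries in external authority files, if any are known.
        doc["sameAs"] = same_as
    return doc


# Hypothetical URI and invented name variants, for illustration only.
record = publisher_jsonld(
    "https://example.org/publisher/1",
    "M. Witmark & Sons",
    ["M. Witmark & Sons", "Witmark, M. & Sons"],
)
serialized = json.dumps(record, indent=2)
```

Keeping the variant forms alongside the preferred name preserves the link back to the harvested records while giving consumers a single stable identifier to cite.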

While this case study is focused on sheet music, the methods discussed are generally applicable in the context of harvested metadata.

Session Leaders
Stephen Davison, University of California, Los Angeles
Elizabeth McAulay, University of California, Los Angeles
Claudia Horning, University of California, Los Angeles
