Piloting a Peer-to-Peer Process for Becoming a Trusted Digital Repository
In this presentation, representatives from UF and UNT will share their work collaboratively creating a pilot peer-to-peer process for TRAC as a step toward becoming Trusted Digital Repositories. They will also discuss how the process supports related needs, including different types and scales of collaboration for achieving TRAC goals. Peer-to-peer review of TRAC offers an important option for building capacity both locally and as a community.
In 2014 the University of Florida (UF) and the University of North Texas (UNT) began a collaborative process in which each institution would complete a full self-audit of its digital repository using the Trustworthy Repositories Audit & Certification (TRAC) checklist. In addition to the self-audit, each institution agreed to participate in a peer review process, evaluating and scoring the other's self-audit and supporting documentation.
The goals of the project are as follows:
Session Leaders
Laurie Taylor, University of Florida
Chelsea Dinsmore, University of Florida
Suchi Yellapantula, University of Florida
Mark Phillips, University of North Texas
This presentation will provide an update on Fedora 4, both in terms of community support and technical development. Attendees will learn about the new Fedora 4.0 feature set, as well as use cases and strategies for migrating from Fedora 3.x to Fedora 4.
Session Leader
Mike Durbin, University of Virginia
Two project updates:
Researcher Identifiers—What's in a Name (or URI)?
A number of approaches to providing authoritative researcher identifiers have emerged, but they tend to be limited by discipline, affiliation or publisher. The rise of bibliometrics and its extension, altmetrics—the attempt to measure the impact of a work including mentions in social media and news media—strengthens the need to uniquely identify researchers and correctly associate them with their scholarly output. Both institutions and researchers have a stake in ensuring their scholarly output is accurately represented across academia and the web. It is time for universities to transition from watchful waiting to engagement.
Researchers who write primarily journal articles rather than monographs are difficult to identify uniquely, because they are often not represented in national name authority files. An OCLC Research Task Group comprising specialists from the US, the UK, and the Netherlands (see http://www.oclc.org/research/activities/registering-researchers.html#taskgroup) developed eighteen use-case scenarios around different stakeholders, derived a list of functional requirements from these scenarios, and profiled 20 research networking systems. A researcher ID information flow diagram illustrates the complexity of the current ecosystem: the same information about a specific researcher may be represented in multiple databases, only some of which interoperate with one another.
This presentation will summarize emerging adoption trends and focus on three identifiers—ISNI, ORCID and VIAF. Participants will be asked to comment on the recommendations targeted to librarians, researchers and university administrators and share their experiences with or plans for researcher identifiers at their institutions.
Session Leader
Karen Smith-Yoshimura, OCLC Research
AND
SHARE: An Update on the SHared Access Research Ecosystem
An update on the latest developments with SHARE, a higher education and research community initiative to facilitate the preservation of, access to, and reuse of research outputs. Learn the status of the project's first undertaking, the SHARE Notification Service, which aims to notify interested stakeholders when research release events occur. Currently in prototype development, the SHARE Notification Service is working with funding agencies, sponsored research offices, institutional repositories, disciplinary repositories, publishers, data archives, and other interested parties to provide a timely, structured, and comprehensive communication channel. The presentation will describe how the SHARE Notification Service can be used by researchers to keep interested parties apprised of their scholarly output; by universities to facilitate the work of sponsored research offices and tenure and promotion committees, and to oversee open access policies; by funding agencies to track grant compliance; and by libraries to help populate their institutional repositories.
The presentation will also touch on SHARE’s larger vision of a coordinated repository infrastructure that will give campus-driven research outputs their widest exposure, and facilitate their broad reuse. In its fully realized state, SHARE will provide a registry of what is available within publicly accessible repositories and facilitate discovery of, and access to, content across these repositories. SHARE will expose this content so that the community can reuse, mine, and build services on top of the corpus. We look forward to detailing this vision and getting your critical input as we pursue this community-driven project.
Session Leader
Eric Celeste, SHARE
The presentation will demonstrate the workflow system being implemented to manage this massive press collection, which has so far yielded more than 400,000 pages. It will shed light on the BA's Digital Assets Factory (DAF), the nucleus on which the digitization process for the CEDEJ collection has been built. The presentation will also discuss the tools used to feed data into the digitization process, from indexing through the creation of the batches that are ingested into the system. The outflow will be discussed in terms of organizing and grouping multipart press clips, as well as reviewing, validating, and correcting the output. Finally, the presentation will address the challenges of pairing the online archive with a powerful search engine that supports multidimensional search while keeping navigation user-friendly.
Session Leaders
Bassem Elsayed, Bibliotheca Alexandrina
Ahmed Samir, Bibliotheca Alexandrina
The project is co-led by IU's Vice President for Information Technology and its Dean of University Libraries. IU is partnering with a commercial vendor, Memnon Archiving Services of Belgium, to set up a facility in Bloomington, Indiana, to digitize these materials in a workflow that will produce as much as 12 terabytes of digital data per day for preservation, beginning in summer 2014.
MDPI grew out of IU leadership's recognition that large portions of IU's media holdings were becoming seriously endangered by media degradation and/or format obsolescence. A 2008-2009 survey of holdings at IU Bloomington (http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf) uncovered more than 569,000 audiovisual items on 51 different physical formats, held in the collections of 80 organizational units across the campus, with significant quantities of rare and unique items in danger of becoming inaccessible within 5-15 years due to degradation or obsolescence.
In this presentation, we will outline the goals and history of MDPI, describe the workflows that we are establishing to feed content into the digitization process and manage content coming out of the process, and discuss planned strategies for preservation storage, access, and metadata.
Session Leaders
Juliet Hardesty, Indiana University
Jon Dunn, Indiana University
AND
Building a Ten-Campus Digital Library Collection at the University of California
The University of California (UC) Libraries and the California Digital Library are nearing the conclusion of an ambitious project to build a shared system for creating, managing, and providing access to unique digital resources across ten campuses (see http://bit.ly/UCLDC).
The platform we are creating will have three major components: 1) a shared digital asset management system for librarians to centrally add and edit digital files and metadata, 2) a metadata harvest for digital resources hosted on external platforms, and 3) an integrated public interface so end-users can seamlessly search across these disparate resources. Together, these components will provide critical infrastructure for the UC Libraries to more efficiently, economically, and collaboratively manage and surface digital content. We will also be leveraging this platform to participate in the Digital Public Library of America (DPLA), and we are investigating the possibility of extending it to facilitate participation in DPLA by additional libraries, archives, and museums throughout California.
This session will build on a "Community Idea Exchange" poster presentation from the 2013 Forum, when the project had just begun, to describe in more depth the components of the platform and the technologies employed, as well as the challenges we have faced and the changes in our approach since we embarked. One of the more interesting aspects of our technology stack is that we have opted to license and customize a vendor product for the digital asset management system, Nuxeo (http://www.nuxeo.com/), with which the digital library community may not be very familiar; in this session we will discuss our experiences with it. We will also describe how our project and platform will connect with other initiatives, most notably the DPLA, and how they may provide a piece of the technical infrastructure needed for institutions across California to share their digital resources.
Session Leaders
Sherri Berger, California Digital Library
Brian Tingle, California Digital Library
AND
Redesigning Electronic Record Processing and Preservation at NARA
The US National Archives and Records Administration (NARA) is in the process of refactoring its infrastructure for the processing and preservation of electronic records.
In gathering requirements to enhance the tool suite at NARA, a number of needs were identified. The key need was for a flexible processing environment with an expandable set of software tools to verify and process a large volume and wide variety of electronic records. Existing systems lacked support for non-Federal digital materials (e.g., digital surrogate masters and Legislative, Donated, and Supreme Court materials) and for classified digital materials. In addition, highly successful partnerships with other types of organizations have created growing storage needs for digital surrogates, along with a need for more efficient workflows to provide public access.
This new infrastructure is described as the Optimized Ingest Framework (OIF). This framework includes a new model for managing the receipt and processing of digital materials for preservation and access; a modular approach to systems managing digital materials; a departure from the model of a single, monolithic system; the refactoring and evolution of existing systems; the establishment of an environment to provide necessary processing flexibility and tools for a wide variety of digital materials; and a more automated and robust solution for digital preservation with reduced complexity.
This refactoring comprises three modular systems: a Digital Processing Environment (DPE) that encompasses a suite of tools for processing including validation, characterization and transformation of files; a Business Object Management system to create and manage workflows for transfer and ingest; and an enhanced Digital Object Repository for the management and preservation of records and surrogates.
This project is just getting underway at NARA, with the first iteration of the DPE prototype currently scheduled for early 2015.
Session Leader
Leslie Johnston, National Archives and Records Administration
Two project updates:
Running Up That Hill: The Academic Preservation Trust: A Community-Based Approach to Digital Preservation
The Academic Preservation Trust (APTrust), a consortium of 17 institutions, was formed two and a half years ago when a small group of academic library deans agreed to take a community approach to building and managing a repository that would provide long-term preservation of the scholarly record. The repository also aims to aggregate content, to provide for disaster recovery, to leverage economies of scale, and to explore access and other services. From its beginning, APTrust has been a layered collaboration of deans, technology experts, content/preservation specialists, and a small APTrust staff located at the University of Virginia.
The growth of the consortium has been bumpy at times, with differences of opinion over technology decisions and, within the University of Virginia, challenges in building awareness that an entrepreneurial program requires quick responses from the infrastructure. APTrust remains repository- and format-agnostic by using the BagIt specification for content submission. Metadata is managed in Fedora, with pointers to content preserved in Amazon S3 and Glacier, and administrative functions are built using Hydra and Blacklight. The repository is scheduled to go live in July and will become a DPN node. A panel of APTrust partners and UVA staff will describe the interplay in decision making among deans, technologists, and content experts, and will discuss the evolving nature of an effort that is approaching full production, including questions of governance, business modeling, and certification goals, as well as the consortium's evolving approach to the complex issues of digital preservation.
Session Leaders
Bradley Daigle, University of Virginia
Scott Turnbull, APTrust
Laura Capell, University of Miami
Stephen Davis, Columbia University
Elisabeth Long, University of Chicago
Nathan Tallman, University of Cincinnati
We will share our processes and encourage discussion with participants concerning digital preservation of complex media.
Project Background:
In February 2013, the Rose Goldsen Archive of New Media Art, part of Cornell University Library's Division of Rare and Manuscript Collections, received a $300,000 grant from the National Endowment for the Humanities to develop PAFDAO: preservation and access frameworks for complex digital media art objects: http://www.neh.gov/files/grants/cornell_universitypreservation_and_access_framework_for_digital_art_objects.pdf.
PAFDAO's test collection includes more than 300 interactive born-digital artworks created for CD-ROM, DVD-ROM, and web distribution, many of which date back to the early 1990s. Though vitally important to understanding the development of media art and aesthetics over the past two decades, these materials are at serious risk of degradation and are unreadable without obsolete computers and software.
Our goal is to create a scalable preservation workflow that ensures the best feasible access to these materials for decades to come, and to contribute to the development of coherent best practices for preserving complex media collections.
Session Leaders
Jason Kovari, Cornell University
Dianne Dietrich, Cornell University
Michelle A. Paolillo, Cornell University