Converts ArchivesSpace EAD Finding Aid XML to an OAI-PMH Static Repository with DC metadata.

ArchivesSpace OAI/EAD to OAI/DC Conversion

There are two components to this service:

  • A Python 3 script to convert ArchivesSpace Encoded Archival Description (EAD) finding aids to Dublin Core (DC) records.
  • An Open Archives Initiative (OAI) data provider to provide access to the DC records.

Introduction

The Python 3 script, ead2dc.py, takes as input the OAI output of an ArchivesSpace resource finding aid in EAD format and writes an XML file. That file contains DC records for the digital resources found in the finding aid; only records that link to digital objects are included. The output is a 'static repository': it is inspired by the OAI Static Repository specification but does not adhere to it strictly. The static repository serves as the data source for the OAI Data Provider.

The OAI Data Provider adheres to the OAI standard and supports all six verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord), as well as resumption tokens and sets. Only DC metadata is provided. Sets correspond to the archival collections in the Caltech Archives.
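As a rough illustration of how a harvester might address the verbs listed above, the following sketch builds request URLs with the standard library. The base URL and the set name are hypothetical placeholders, not values taken from this project; the argument names (metadataPrefix, set, identifier, resumptionToken) are those defined by OAI-PMH.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of the deployed data provider.
BASE_URL = "https://example.org/oai"

VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_url(verb, **params):
    """Build an OAI-PMH request URL for one of the six verbs."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb always comes first, followed by any verb-specific arguments.
    return f"{BASE_URL}?{urlencode({'verb': verb, **params})}"


# 'oai_dc' is the standard prefix for Dublin Core; the set name is made up.
print(oai_url("Identify"))
print(oai_url("ListRecords", metadataPrefix="oai_dc", set="ExampleCollection"))
```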

ead2dc - Main features and assumptions:

  • Both input and output are OAI-compliant XML files.
  • All 12 levels of EAD container are supported.
  • Titles are inherited down the container hierarchy.
  • All other metadata is mapped from the record containing the digital object references.
  • Records without digital object references are ignored (i.e., not mapped to the output file).
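The title inheritance and filtering rules above can be sketched as follows. This is a minimal illustration, not the code in ead2dc.py: the sample XML is invented, and it omits the EAD and xlink namespaces that real ArchivesSpace output carries, so the link attribute appears as a plain href. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented, namespace-free fragment for illustration only.
SAMPLE = """
<dsc>
  <c01>
    <did><unittitle>Correspondence</unittitle></did>
    <c02>
      <did>
        <unittitle>Letters, 1950</unittitle>
        <daogrp><daoloc href="https://example.org/object/1"/></daogrp>
      </did>
    </c02>
    <c02>
      <did><unittitle>Letters, 1951</unittitle></did>
    </c02>
  </c01>
</dsc>
"""

# The twelve EAD container levels: c01 through c12.
CONTAINER_TAGS = {f"c{n:02d}" for n in range(1, 13)}


def walk(node, inherited, records):
    """Visit nested containers, passing titles down the hierarchy."""
    for child in node:
        if child.tag not in CONTAINER_TAGS:
            continue
        title = child.findtext("did/unittitle", default="").strip()
        titles = inherited + [title] if title else list(inherited)
        links = [d.get("href") for d in child.findall("did/daogrp/daoloc")]
        if links:
            # Only containers with digital object links become records.
            records.append({"titles": titles, "links": links})
        walk(child, titles, records)


def extract_records(xml_text):
    """Return one record per container that references a digital object."""
    records = []
    walk(ET.fromstring(xml_text.strip()), [], records)
    return records


for record in extract_records(SAMPLE):
    print(record)
```

In this sample only the "Letters, 1950" container yields a record, and that record carries the title of its parent as well as its own.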

OAI Data Provider - Main features and assumptions:

  • The OAI Data Provider uses a static repository, i.e. it does not dynamically generate records.
  • DC metadata only.
  • Sets correspond to the archival collections and do not overlap.
  • Records are delivered in batches of 250.
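Batched delivery with resumption tokens can be pictured with the sketch below. It assumes the token is simply the offset of the next batch, encoded as a string; this token format and the function name are assumptions made for illustration, and the actual provider may encode its tokens differently.

```python
BATCH_SIZE = 250  # batch size stated in the feature list


def list_batch(identifiers, token=None):
    """Return one batch of identifiers plus the token for the next batch.

    The token is None when the list is exhausted.
    """
    start = int(token) if token else 0
    batch = identifiers[start:start + BATCH_SIZE]
    next_start = start + BATCH_SIZE
    next_token = str(next_start) if next_start < len(identifiers) else None
    return batch, next_token


# Made-up identifiers standing in for the static repository's records.
ids = [f"oai:example:{i}" for i in range(600)]

token = None
while True:
    batch, token = list_batch(ids, token)
    print(len(batch), token)
    if token is None:
        break
```

A harvester keeps sending the returned token until none is supplied, which is the loop shown at the end of the sketch.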

Installation and Usage

The ead2dc.py file is designed to be run from the command line, or from within your favorite editing environment. It uses standard Python libraries and has been tested using Python 3.9.10 and 3.9.17.

The OAI Data Provider is a web application written in Python 3 using the Flask micro web framework. Installing Flask also installs its dependencies, such as Jinja2 and Werkzeug. No additional libraries are required.

Mapping

| Element | Encoded Archival Description | Dublin Core |
|---|---|---|
| Collection title | archdesc/did/unittitle | title |
| Container titles | dsc/c??/did/unittitle | title |
| Personal creators | dsc/c??/did/origination label="creator"/persname | creator |
| Corporate creators | dsc/c??/did/origination label="creator"/corpname | creator |
| Dates | dsc/c??/did/unitdate | date |
| Extent | dsc/c??/did/physdesc/extent | extent |
| Description | dsc/c??/did/abstract | description |
| Subject, general | dsc/c??/controlaccess/subject | subject |
| Subject, geographic | dsc/c??/controlaccess/geogname | subject |
| Subject, person | dsc/c??/controlaccess/persname | subject |
| Subject, corporate | dsc/c??/controlaccess/corpname | subject |
| Subject, activity | dsc/c??/controlaccess/function | subject |
| Identifier | dsc/c??/did/unitid | identifier |
| Identifier, link | dsc/c??/did/daogrp/daoloc['xlink:href'] | identifier |
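One way to picture the mapping is as a lookup list applied to a single container element. The sketch below covers only the simple path-to-element rows; creators and links are left out because they depend on attributes. The sample XML is invented and namespace-free, and the names MAPPING and to_dc are hypothetical rather than taken from ead2dc.py.

```python
import xml.etree.ElementTree as ET

# (path relative to a container element, Dublin Core element name)
MAPPING = [
    ("did/unittitle", "title"),
    ("did/unitdate", "date"),
    ("did/physdesc/extent", "extent"),
    ("did/abstract", "description"),
    ("controlaccess/subject", "subject"),
    ("controlaccess/geogname", "subject"),
    ("controlaccess/persname", "subject"),
    ("controlaccess/corpname", "subject"),
    ("controlaccess/function", "subject"),
    ("did/unitid", "identifier"),
]

# Invented container for illustration only.
SAMPLE_CONTAINER = """
<c01>
  <did>
    <unittitle>Flight notebooks</unittitle>
    <unitdate>1977</unitdate>
    <unitid>Box 1</unitid>
  </did>
  <controlaccess>
    <subject>Aeronautics</subject>
    <geogname>California</geogname>
  </controlaccess>
</c01>
"""


def to_dc(container):
    """Return (DC element, value) pairs for one container element."""
    pairs = []
    for path, dc_name in MAPPING:
        for element in container.findall(path):
            text = (element.text or "").strip()
            if text:
                pairs.append((dc_name, text))
    return pairs


for name, value in to_dc(ET.fromstring(SAMPLE_CONTAINER.strip())):
    print(f"{name}: {value}")
```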

Example

Paul B. MacCready Papers ca. 1931-2002, Caltech Archives

License

Software produced by the Caltech Library is Copyright © 2023 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.

Acknowledgments

This work was funded by the California Institute of Technology Library.
