Harvesting Guideline


DISMARC Harvesting Guidelines

DISMARC establishes a gateway to freely available audio content, increasing the visibility of individual collections and making their content discoverable to a wider audience. The project offers music archives the opportunity to submit music collections for aggregation in the DISMARC repository by harvesting the metadata records associated with the resources.


The term ‘metadata’ here refers to the cataloguing and indexing information that describes the data items. For metadata harvesting, DISMARC uses the international standard Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, www.openarchives.org), which enables connecting distributed electronic repositories of many kinds. The DISMARC OAI-PMH harvester supports the Dublin Core metadata schema, adapted to the needs of music metadata on the basis of the Dublin Core application profile for libraries (DC-Lib). An outstanding feature of the DISMARC search and browse services will be multilingual search support and multilingual portal presentation.
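
As an illustration of what an OAI-PMH exchange looks like, the following minimal sketch builds a ListRecords request URL and reads identifier and title from a Dublin Core response. The base URL, the sample record and the function names are assumptions made for this example only; they do not describe the actual DISMARC harvester.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request for an OAI-PMH provider.

    base_url is a placeholder for the provider's endpoint.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


# Hypothetical response with a single record, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example recording</dc:title>
          <dc:creator>Example performer</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter("{%s}record" % OAI_NS):
        ident = record.findtext("{%s}header/{%s}identifier" % (OAI_NS, OAI_NS))
        title = record.findtext(".//{%s}title" % DC_NS)
        pairs.append((ident, title))
    return pairs


if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(parse_titles(SAMPLE_RESPONSE))
```

The prefix oai_dc is the unqualified Dublin Core format that every OAI-PMH provider must support; a music-specific profile would use a different prefix agreed with the service provider.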


Harvesting with DISMARC

DISMARC partners (data providers) need to take the following steps to make their resources discoverable via DISMARC (service provider):

  1. The archive should provide us with details such as the nature of the archive resources, the number of metadata records to be harvested, and the metadata schemas and vocabularies in use.
  2. The archive should make available a test sample of objects. Preferred export formats are plain text files (.csv, .xml, .txt); if the archive uses a relational database system, please make samples available as an SQL dump.
  3. OPTIONAL: Install the DISMARC Metadata Manager (MDM) at the archive’s local site (DISMARC-on-a-stick).
  4. Finalise the metadata (and vocabulary) mapping using the provided tools (our team will support this procedure).
  5. Send the mapping information to our technical service.
  6. OPTIONAL: If a local provider has been set up, advise us of the provider’s URL so that we can set up the schedule for future harvesting.
  7. We then conduct trial harvesting and notify the partner of any problems and incompatibilities.
  8. When trial harvesting is successful, the partner makes a full set of metadata records available via its local provider.
  9. We conduct an initial full harvest into a test system, followed by a test search to make sure the harvested metadata works correctly with the search functionality.
  10. We conduct an initial full harvest into our production system, which makes the harvested metadata discoverable via the search tools.
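
A partner who has set up a local provider may want to check it before reporting its URL. The sketch below shows one way a trial harvest could be run: it requests ListRecords, follows the resumptionToken until the provider reports no further pages, and stops on any OAI-PMH error. The function names and the injectable fetch parameter are assumptions of this example, not part of the DISMARC tools.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher: return the raw response body for a URL."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, fetch=http_fetch, metadata_prefix="oai_dc"):
    """Collect all record identifiers from a provider, page by page.

    fetch can be replaced by a stub, so the loop can be tried
    without network access.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers = []
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            # Any protocol error ends the trial run.
            raise RuntimeError(
                "OAI-PMH error %s: %s" % (error.get("code"), error.text)
            )
        for header in root.iter(OAI + "header"):
            identifiers.append(header.findtext(OAI + "identifier"))
        # A missing or empty resumptionToken marks the last page.
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            return identifiers
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Comparing the number of identifiers returned with the number of records the archive expects to expose is a quick plausibility check before the full harvest.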

Preferred Data Exports

The following ranking gives an overview of which kinds of data export are easiest to integrate into DISMARC and which cause additional import effort, from 1 (minimal analysis effort) to 6 (major analysis effort).

Export formats that can currently be used:

  1. XML (with UTF-8 encoding)
  2. CSV (any encoding)
  3. ISO2709
  4. Plain text files
  5. Database (with ER model attached)
  6. Database (without ER model)
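
An archive that currently exports CSV could move its sample to the top-ranked format with a small conversion step. The following sketch turns CSV text into a UTF-8 encoded XML document. The element names records and record are invented for this example, and the column headers are assumed to be valid XML element names; DISMARC does not prescribe this layout.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text):
    """Convert CSV text (first row = column headers) to UTF-8 XML bytes.

    Assumes every column header is a valid XML element name.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            if column is None:
                # Surplus cells without a header are skipped.
                continue
            ET.SubElement(record, column).text = value
    buffer = io.BytesIO()
    # Write with an explicit XML declaration naming the encoding.
    ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "title,creator\nExample recording,Example performer\n"
    print(csv_to_xml(sample).decode("utf-8"))
```

When reading a CSV file from disk, the file's actual encoding has to be passed to open() so that non-ASCII characters survive the conversion.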

Ongoing Operation

The archive should update its metadata according to the agreed schedule.


Help

The team provides a Help service to support contributors during the harvesting process. Email kochg@ait.co.at with any questions.