Expressing UNIMARC relator codes in RDA

Facilitating access to scientific bibliographic collections on the web of data is a major challenge for the sovereignty and scientific collaboration of European countries. The LRM-Factory project aims to create the first metadata factory that helps bibliographic agencies move to the semantic web through the mass migration of collections to vocabularies such as RDA. Members of the medical library of the Hospices Civils de Lyon (Documentation centrale) presented the project at the 6th UNIMARC Users Meeting. Following the presentation, “From UNIMARC to RDA: LRM-Factory, An Online Mass-migration Tool”, we were asked about the difficulties encountered. This contribution answers that question by describing one piece of mapping work carried out as part of the project: the alignment between UNIMARC relator codes and RDA. After a brief overview of the project and the partners involved, we describe the method and tools used to construct this mapping, then examine in detail the three levels of alignment we observed. A special case is briefly mentioned before the conclusion.


Project and actors: The ACASE consortium

The LRM-Factory project is led by the ACASE consortium (Accès à la Culture, aux Arts et aux Sciences par les Entités/Access to Culture, Arts and Sciences by Entities), which consists of three partners, each with a specific role. Senolys owns and produces CoM3T (“Case-oriented MARC Metadata Migration Tool”), the mass migration tool used in this project to configure the migration rules. It was developed on the basis of the thesis of Joffrey Decourselle, the owner of Senolys. Tech’Advantage owns and develops Syrtis, an Integrated Library System (ILS) capable of producing and receiving RDA records, which we use in the project to check and visualize RDA records. Tech’Advantage also develops the LRM-Factory platform, which is connected to CoM3T and is used to configure the conversion (files used and other parameters) and to consult the analysis results. Finally, Hospices Civils de Lyon (HCL) is the second-largest university hospital in France. Its documentation experts define and refine the migration rules needed to convert the dataset. Another of our missions is to produce documentation that is as clear and reusable as possible for any documentation professional wishing to try the tool, since LRM-Factory is intended to become an online platform open to anyone.

For this project, we work with a dataset of 18 million records from Abes (Agence bibliographique de l’enseignement supérieur/Bibliographic agency of higher education). The original dataset is in the UNIMARC format. The goal is to convert it in a single pass through CoM3T in order to obtain an RDA/RDF dataset. Once developed, CoM3T should make it possible to process millions of metadata records in a simplified way. Mass processing on this scale would be a European first.

It should be noted that the LRM-Factory project has obtained funding from the French Government under the France 2030 investment program.

Relator codes alignment methodology

As no complete mapping exists between UNIMARC and RDA, we decided to develop our own by building an RDF mapping table in which each UNIMARC field and subfield is expressed in RDA properties. The mapping differs significantly depending on the type of resource, and it has evolved over the course of the project. With this in mind, and in order to facilitate exchanges between the project partners as well as the implementation of the mapping in CoM3T, the current version of the mapping is expressed in RDF triples. This work also showed us the need for specific mappings for the relator codes present in the 7XX block and for the metadata present in the 0XX (identification) and 1XX (coded information) blocks.

Relator codes are used to indicate the nature of a person’s or corporate body’s intellectual or other responsibility for a resource. The 7XX subfields for authorized access point and/or subfield 7XX$3 are also used to link to the authority record. Subfield 7XX$4, in which the relator codes are used, can therefore be used to indicate the authority’s role. The codes are not mutually exclusive: where several codes appear to be suitable, institutions using them may choose the code that is most specific or most suited to their uses.

The list of relator codes suggested by Abes is based on the 6th French edition of the UNIMARC Manual: Bibliographic Format (2010) and standard Z44-059 (1987). Recent additions to Appendix C of the UNIMARC Bibliographic Format Manual (2023, online ed.) make some of the Abes recommendations partially obsolete. Sudoc usage is detailed in a table available online. Some relator codes are not, or are no longer, used in the Sudoc catalogue, for example 000 “Undetermined function”, 240 “Compositor”, 305 “Dissertant”, 400 “Funder”, 420 “Honoree” and 570 “Other”. These ‘obsolete’ codes, which are due to be corrected or replaced, have not been included in our mapping because their status is unstable. The mapping work was carried out in an Excel spreadsheet. The codes in use in Sudoc were examined one by one to look for possible correspondences with RDA agent properties. We consulted numerous examples and made extensive use of the tools provided by Abes (methodological guide, table of relator codes), the RDA Toolkit and the RDA Registry. Exchanges with experts also supported our approach.

In UNIMARC, the 7XX block is broken down as follows:

    • 700/701/702 Personal Name,
    • 710/711/712 Corporate Body Name,
    • 720/721/722 Family Name,
    • 716 Trademark (a special case).

In order to respect the precision of the information provided, our mapping adapts this breakdown to the agent types in RDA. For example, the 310 “Distributor” relator code corresponds to three RDA properties, depending on whether it is used for a personal name, a corporate name or a family name: rdamo:P30359 “has distributor person”, rdamo:P30417 “has distributor corporate body”, rdamo:P30446 “has distributor family”. In some cases, this full breakdown is not possible. For example, codes 725 “Standards body”, 981 “Laboratory associated with academic work”, 982 “Company associated with academic work”, 983 “Foundation associated with academic work”, 984 “Research team associated with academic work”, 995 “Co-supervision body” and 996 “Doctoral school associated with the thesis” are only mapped to corporate body properties. Similarly, code 956 “Chairman of the jury” can only be applied to a person. Finally, the mapping involves studying the links between agents and WEMI entities (work, expression, manifestation, item). A relator code ultimately links the agent to the work (480 “Librettist”), to the expression (730 “Translator”), to the manifestation (760 “Wood engraver”) or to the item (920 “Current owner”), depending on the responsibility expressed in the definition of the relator code given by the Sudoc guide. The table of relator codes offered by Abes and the RDA documentation provide some guidance on this subject, and the team has carried out a thorough semantic and linguistic review.
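The agent-type breakdown described above can be sketched as a lookup keyed by relator code and agent type. This is only a minimal illustration in Python, not the way the mapping is implemented in CoM3T: the names AGENT_TYPE_BY_TAG, RDA_PROPERTY and rda_property_for are hypothetical, the prefixed property names are kept exactly as written in the text, and only code 310 is filled in because it is the only complete example given here.

```python
from typing import Optional

# Agent type implied by the UNIMARC 7XX field tag (breakdown given in the text).
# Field 716 (trademarks) is deliberately left out of this sketch.
AGENT_TYPE_BY_TAG = {
    "700": "person", "701": "person", "702": "person",
    "710": "corporate_body", "711": "corporate_body", "712": "corporate_body",
    "720": "family", "721": "family", "722": "family",
}

# (relator code, agent type) -> RDA property, as prefixed names.
# Hypothetical table: only relator code 310 "Distributor" is taken from the text.
RDA_PROPERTY = {
    ("310", "person"): "rdamo:P30359",          # has distributor person
    ("310", "corporate_body"): "rdamo:P30417",  # has distributor corporate body
    ("310", "family"): "rdamo:P30446",          # has distributor family
}


def rda_property_for(tag: str, relator_code: str) -> Optional[str]:
    """Return the RDA property for a 7XX tag and a $4 relator code, or None."""
    agent_type = AGENT_TYPE_BY_TAG.get(tag)
    if agent_type is None:
        return None  # tag not covered by this sketch
    return RDA_PROPERTY.get((relator_code, agent_type))
```

With this table, rda_property_for("710", "310") yields the corporate body property, while a code that only exists for corporate bodies would simply have no entry for the person and family agent types.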


Three possible levels of alignment

This mapping work has enabled us to identify three possible levels of alignment between UNIMARC relator codes and RDA. At the first level, a relator code corresponds directly to an RDA property. For example, 030 “Arranger” is equivalent to rdaeo:P20365 “has arranger person of music” and, following the agent-type breakdown described in the methodology, to rdaeo:P20483 “has arranger corporate body of music” and rdaeo:P20542 “has arranger family of music”.

In some cases, the relator code is trickier to align because, although a similar RDA property can be found, the two do not correspond exactly. In this case, we have chosen to adopt a generic property accompanied by a note on the entity it is linked to. Thus, 062 “Attributed author” becomes rdawo:P10436 “has author person”, and a note is added (here, on the work). The description of the relator code is added to retain as much information as possible on the nature of the link between authority/agent and resource/entity.
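The first two levels of alignment can be sketched as a small record type. This is a minimal Python illustration under stated assumptions: the Alignment class, its field names, the level labels and the wording of the generated note are all hypothetical, and the text does not say which RDA element carries the note, so the sketch only records which entity it is attached to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alignment:
    """One row of a hypothetical relator-code alignment table."""
    relator_code: str
    label: str
    rda_property: str
    level: str                     # "exact" or "generic_with_note"
    note_on: Optional[str] = None  # WEMI entity that carries the note, if any

    def note_text(self) -> Optional[str]:
        """Text kept as a note when the RDA property is only a generic match."""
        if self.level != "generic_with_note":
            return None
        # Assumed wording; the source only says the code's description is added.
        return f"UNIMARC relator code {self.relator_code}: {self.label}"


# Level 1: direct equivalence (person variant shown).
ARRANGER = Alignment("030", "Arranger", "rdaeo:P20365", "exact")

# Level 2: generic property plus a note on the work.
ATTRIBUTED_AUTHOR = Alignment(
    "062", "Attributed author", "rdawo:P10436", "generic_with_note", note_on="work"
)
```

Keeping the original code and label in the note means the more precise UNIMARC meaning is not lost when only a generic RDA property is available.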

In the most complex cases of alignment, no agent property can express the nature of the relationship indicated by the relator code. The relationship between the authority/agent and the resource/entity has to be expressed in a different way, and the solution involves designing bibliographic patterns. This is the case for relator codes 233 “Composer of adapted work” and 236 “Composer of main work”. Their alignment requires an adapted work pattern: a second work is created, the two works are linked with the property rdaw:P10142 “is adaptation of work”, and the agents are linked to the relevant work with the property rdaw:P10053 “has composer agent” (also used for mapping relator code 230 “Composer”, to which it corresponds exactly).
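The adapted work pattern can be sketched as a function that emits the two kinds of triples involved. This is a hedged Python illustration, not the CoM3T rule itself: the function name, its parameters and the ex: identifiers are hypothetical, and because the text does not state which of the two works each relator code attaches its composer to, that choice is left to the caller.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Property names as given in the text, kept as prefixed names.
IS_ADAPTATION_OF_WORK = "rdaw:P10142"
HAS_COMPOSER_AGENT = "rdaw:P10053"


def adapted_work_pattern(
    adaptation: str, source_work: str, composer: str, composer_of: str
) -> List[Triple]:
    """Build the triples of a hypothetical adapted work pattern.

    composer_of selects the work the composer is attached to:
    "adaptation" or "source". Which value applies to codes 233 and 236
    is an assumption left open in this sketch.
    """
    if composer_of not in ("adaptation", "source"):
        raise ValueError("composer_of must be 'adaptation' or 'source'")
    composed_work = adaptation if composer_of == "adaptation" else source_work
    return [
        (adaptation, IS_ADAPTATION_OF_WORK, source_work),  # work-to-work link
        (composed_work, HAS_COMPOSER_AGENT, composer),     # agent-to-work link
    ]


# Placeholder identifiers, for illustration only.
triples = adapted_work_pattern("ex:work1", "ex:work2", "ex:agent1", "source")
```

The point of the pattern is that a single 7XX$4 value produces more than one statement: a new work node, a relationship between works, and only then the agent relationship.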

For several complex alignments, we have had to represent the bibliographic pattern in the form of a diagram. For instance, while a critical review can be understood as a work or expression that is part of an aggregate, the relator code 675 “Author of the critical review” actually corresponds in RDA to a work-to-work relationship (with which agent/work relationships are of course associated) and is expressed as follows.

Figure 1: Schema for the author of the critical review (source: Bastien, Marie – Project LRM-Factory)

Conclusion

First of all, the work carried out highlights the value, but also the challenges, of the notion of ‘bibliographic pattern’. At the crossroads of semantic and technical issues, it is central to the transition from one format to another, and bibliographic patterns are at the heart of how CoM3T functions. The complexity of aligning certain codes that correspond to more semantically complex or historically marked functions (e.g. attributed author, forger, printer-bookseller, holder of the privilege) also highlights the particularities of heritage and specialised collections. Finally, our approach demonstrates the need to adapt to changes in the various formats and standards: while code 380 “Forger” does not (for the moment) correspond directly to any RDA property, an equivalent now exists in RDA-FR.

Prepared by: Marie Bastien, Carole Bruno and Morgane Sedoud (Hospices Civils de Lyon, Documentation centrale, Project LRM-Factory)

Bibliography