Some metadata may be more useful than others. Some may be more accurate than others. Some may be clearer. The quality of metadata can make a big difference to the effectiveness of an information retrieval tool.
Chapter 5 – Metadata quality
What are some of the key criteria for good metadata?
“Ultimately, what makes for good metadata is its potential to support effectively information retrieval. Its adherence to certain standards may indicate that professionals were involved in its creation, but this does not in itself make it good metadata” (Hider, 2012, p.86). While it is important that metadata reflects the information context, aspects that might be considered when assessing metadata quality include:
- Functionality – While there are numerous possibilities in terms of elements that could be recorded in the description of an information resource, some are more likely to be useful than others depending on contextual factors such as the needs of the user and the retrieval system being utilised. To improve the functionality of metadata, metadata specialists may study aspects of the information context. However, this is not an exact science, and ascertaining information about things such as the different information users and their needs can be difficult.
- Comprehensiveness – Some descriptions are fuller than others and this can affect their effectiveness. However, more detailed descriptions often take longer to produce and are therefore often more costly. Given that many of the possible attributes that could be recorded may not be that important, decisions regarding the amount of detail required are not simple. Different users require different levels of description (e.g. a university library may need more detail than a school library for its resources) and different types of resources may require different degrees of comprehensiveness (e.g. an Internet resource might only require minimal description to support discovery but not selection, versus a novel, which might require more information).
- Accuracy – Given the minimal information that is often used to find, identify, select or obtain information resources, the accuracy of the information provided is essential. While mistakes can occur, it is important that these are rectified, as they can undermine users’ confidence in professional resource description or hinder a user in their search for information resources. Misspellings, a misinterpretation of the subject, incorrectly recording a value or not updating an attribute of an information resource that changes (e.g. a website that updates or a serial that continues to add new issues) may all impact the accuracy of metadata used for describing information resources.
- Clarity – While metadata needs to be accurate, it is also imperative that it is recorded in a way that considers the users of this information. Metadata may be less accessible to users depending on the language used and the degree of familiarity the user has with terms or abbreviations used by metadata creators (e.g. cataloguer jargon), but increasingly metadata specialists are acknowledging the wide user audience. It also needs to be succinct so that more information can be presented to the user (i.e. more elements and descriptions) and to improve the quality of search results (i.e. fewer terms should allow for more precise matches).
- Consistency – Consistency encompasses both elements and values and can greatly improve retrieval. The use of standard values can make systems easier for users to use, as they are more likely to be able to read and interpret information if it is standardised. Consistency may facilitate semantic interoperability within and across systems. However, the notion of consistency is far from simple, as discrepancies might exist in interpretations between indexer and searcher (e.g. in one context the resource might be considered to be about terrorism while in another about freedom fighters) and between indexers.
- Vocabulary and authority control – To improve the effectiveness of retrieval systems, those creating metadata often use standardised or controlled vocabularies. Controlled vocabularies often regulate or standardise subject terms (e.g. influenza rather than flu). They may point users to their preferred terms through what are known as cross-references (e.g. Gaol use jail, images use pictures, etc.). This may mean that antonyms (e.g. sickness and health) are also allocated a single subject term (e.g. health being used for both). Other metadata elements can also be controlled, such as author names and titles. This too can be problematic as authors may share the same name (e.g. Jane Doe), and to disambiguate this element additional information may be given (e.g. Jane Joan Doe versus Jane May Doe) to separate two quite distinct sets of resources.
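The "use" cross-references described under vocabulary control can be pictured as a simple lookup from non-preferred terms to preferred terms. The following is a minimal sketch only: the `USE_REFERENCES` table and both function names are made up for illustration and do not come from Hider (2012) or any real controlled vocabulary.

```python
# Hypothetical "use" cross-references: non-preferred term -> preferred term.
# Real controlled vocabularies are far larger and maintained by professionals.
USE_REFERENCES = {
    "gaol": "jail",
    "images": "pictures",
    "flu": "influenza",
}


def preferred_term(term: str) -> str:
    """Return the preferred subject term, following a 'use' reference if one exists."""
    key = term.strip().lower()
    return USE_REFERENCES.get(key, key)


def index_subjects(terms: list[str]) -> list[str]:
    """Normalise subject terms and drop duplicates, keeping first-seen order."""
    seen: dict[str, None] = {}
    for term in terms:
        seen.setdefault(preferred_term(term), None)
    return list(seen)


print(index_subjects(["Flu", "influenza", "Images"]))
```

Because "Flu" and "influenza" collapse to one preferred term, a search on either word would retrieve the same set of resources, which is the consistency benefit the criteria above describe.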
How is metadata quality assurance achieved?
Quality of metadata can impact information retrieval. The quality of metadata might be improved through:
- Information agencies regularly engaging in a cycle of monitoring (e.g. by supervisors or senior metadata experts overseeing), evaluating (e.g. audits using criteria or utilising automated computer functions that check for such things as spelling errors) and improvement of metadata
- The development and refinement of best practice standards by professionals
- Ongoing professional development for information professionals
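As a rough illustration of the automated checks mentioned in the monitoring and evaluation cycle, the sketch below audits a single record against a few criteria. The required elements, the allowed format values and the four-digit-year rule are all assumptions standing in for a local policy; they are not taken from Hider (2012) or from any particular standard.

```python
REQUIRED_ELEMENTS = ("title", "creator", "date")  # hypothetical local policy
ALLOWED_FORMATS = {"book", "serial", "website"}   # hypothetical controlled list


def audit_record(record: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    # Comprehensiveness: every required element must have a non-empty value.
    for element in REQUIRED_ELEMENTS:
        if not record.get(element, "").strip():
            problems.append(f"missing {element}")
    # Consistency: the format value must come from the controlled list.
    fmt = record.get("format", "").strip().lower()
    if fmt and fmt not in ALLOWED_FORMATS:
        problems.append(f"uncontrolled format value: {fmt}")
    # Accuracy (very roughly): the date must look like a four-digit year.
    date = record.get("date", "").strip()
    if date and not (date.isdigit() and len(date) == 4):
        problems.append(f"date is not a four-digit year: {date}")
    return problems


print(audit_record({"title": "", "creator": "Doe, Jane", "date": "c2012", "format": "Bok"}))
```

A check like this can only flag candidates for review. Judging whether a subject has been misinterpreted, for example, still needs a person.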
Hider, P. (2012). Chapter 5, Metadata quality (pp.77-91). Information resource description: Creating and managing metadata. London: Facet.
Activity: Quiz – The following are examples of abbreviations and jargon which have been routinely used by library cataloguers. Do you understand what is meant by all of them?
For the given abbreviations I was able to guess:
- t.p. = title page
- ill. = illustrations
- ports. = portraits
- prelim. = preliminaries (although I did not know what this meant in the context of library cataloguing until checking the answers, i.e. the Anglo-American Cataloguing Rules define it as the pages before and including the title page and cover)
- repr. = reprint
- fl. = flourished (i.e. was living)
Read: The Introduction to Chapter 7 on pp. 103-104 of Hider, P. (2012). Information resource description. London: Facet. Then read the rest of Chapter 7 from pp. 104-144.
What are standards?
“A standard is more than a convention. It represents a practice that is prescribed, not simply what is normal” (Hider, 2012, p.103). Standards are set out in documentation and may be created for use by a single organisation or a group of related organisations, possibly even on a national or international level.
What types of metadata standards have been developed?
What are some of the issues with metadata standards?
“Ultimately, there is a trade-off between a desire for consistency and best practice, on the one hand, and the desire to address local needs and economic realities on the other” (Hider, 2012, p.104).
Metadata standards for key information domains:
- Web publishing – Hypertext Markup Language (HTML), Extensible Markup Language (XML), Resource Description Framework (RDF)
- Libraries – Anglo-American Cataloguing Rules (AACR), International Standard Bibliographic Description (ISBD), Resource Description and Access (RDA), Functional Requirements for Bibliographic Records (FRBR), Functional Requirements for Authority Data (FRAD), Machine-Readable Cataloguing (MARC), Z39.50
- Digital libraries – Dublin Core (DC), Metadata Object Description Schema (MODS), Metadata Encoding and Transmission Standard (METS), Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), OpenURL
- Archives – General International Standard Archival Description (ISAD(G)), Encoded Archival Description (EAD)
- Museums – Standard Procedures for Collections Recording Used in Museums (SPECTRUM), International Guidelines for Museum Object Information: the CIDOC information categories (CIDOC), Conceptual Reference Model (CRM)
What are the ‘standards’ for libraries?
- Elements and values
o Anglo-American Cataloguing Rules (AACR).
- Cataloguing code of 1967
- Covered both headings and descriptions
- Adopted primarily by English-speaking countries
- AACR2 was published in 1978 in response to the Anglo-American library community wanting AACR to align with ISBD and the need to incorporate rule revisions occurring since the original publication
- Developed at a time when card catalogues were widely used
- It was adopted by most libraries in English-speaking countries
- Revised versions of AACR2 were released in 1988, 1998 and 2002
- Later versions of AACR2 cover resources such as websites but they use terms and concepts that might reflect the 20th century
- It has a lot of rules but also three different levels of description; at the minimal levels some rules may not apply or may be optional, and rules are frequently adapted to address local contextual needs
- It is organised in two parts: description (e.g. from general rules related to all resources such as how and what to describe to rules specific to particular types of material) and headings (e.g. choice of access points, headings for persons, geographic names, etc.)
- AACR2, like ISBD, does not systematically define various elements which can lead to rich resource descriptions but also create records that are less conducive to computer processing
o International Standard Bibliographic Description (ISBD)
- First published in 1971 by the International Federation of Library Associations (IFLA) specifically for books
- Later versions were created (e.g. ISBD for other material and ISBD(G) for materials in general)
- Originally developed for catalogue cards but consolidated editions are still being published (e.g. 2011)
- Prescribes the elements for cataloguers to include in their description of library resources and stipulates how this description is to be presented (e.g. order, punctuation etc.)
- Descriptions are in human-readable form to facilitate sharing
o Resource Description and Access (RDA)
- Intended to replace AACR2, as it was decided that a complete overhaul of the system was needed to take advantage of the possibilities technology might afford, rather than releasing another version of AACR2, which was unlikely to be computer-friendly given its inception in the era of card catalogues
- Released in 2010 and implemented in 2013
- RDA focuses on content (e.g. elements and their values)
- It is different from AACR as it does not prescribe the ISBD or any format. It accommodates ISBD-based descriptions as well as descriptions that might be schematic in nature (e.g. RDF/XML), which makes it more useful in the online world
- “Its aim is to serve as the basis for the development of all cataloguing codes” so that international sharing of records is facilitated
- An advantage to those wishing to utilise RDA is that the code can be applied without radical change (which is significant in an era of fiscal constraint)
- RDA covers more elements and defines them more narrowly than AACR (e.g. different elements would be recognised for a title derived from the resource versus one constructed by the cataloguer when none exists)
- It is more relevant to a variety of information agencies beyond the library sector as it is not limited to the ISBD elements (more broad)
- RDA aims to cover all the elements that might be usefully included in authority and bibliographic records (e.g. various attributes of the persons, corporate bodies and other entities)
- RDA identifies 463 elements and sub-elements for bibliographic records and 59 for authority records (not all will be applicable for every resource but a small number of ‘core’ RDA elements are required if applicable for each resource)
- RDA’s organisation is based on the theoretical framework of two conceptual models of how an effective catalogue functions (FRBR/FRAD).
- The 10 sections are divided into chapters and rules are grouped together according to the specific element they cover.
- Online cataloguing tool (contains hyperlinks that improve navigation)
- Offers vocabularies, or sets of values for some of its elements; controlled vocabulary for description elements
o MARC (Machine-Readable Cataloguing)
- “MARC is a record exchange format used by automated library systems to share and process cataloguing data” (Hider, 2012, p.122)
- It does not tell the cataloguer what to record but how to record or encode catalogue records so they can be used by computerised systems
- Many countries developed their own variants of MARC, but USMARC became the standard as international catalogue exchange became more common and as many records were already in this format. Later, MARC21 became the prominent variant and is maintained by the Library of Congress
- MARC is commonly used in the library domain, and while other formats that make bibliographical data more interoperable exist, the cost of converting to another format may be considerable data loss
- Transmission standards
Transmission standards are required for the sharing of catalogue files, as computers need to be able not only to process records such as MARC (which is widely used by libraries) but also to receive them. Some users of bibliographic records will allow their computer to search for records, which may include searching several different online catalogues. As a result, many library management systems have applications designed for this purpose, applying a client-server protocol. These are often configured by specialists who must consider how to make retrieval effective given the different ways in which databases can be searched and ways in which records can be indexed. Specialised transfer protocols (between the client and the server) for downloading bibliographic records in formats such as MARC include:
o Z39.50 – It is widely used for MARC records but can be limited where systems are based on newer formats or structures.
o SRU & CQL – Newer and more adaptable than Z39.50, as it facilitates Search/Retrieve by URL (SRU), meaning it is not limited to MARC records. It “enables search applications to communicate with systems outside the library domain, such as search engines, as well as online catalogues. It is maintained by the Library of Congress and may end up superseding Z39.50. The protocol is ‘XML focused’ and utilises Contextual Query Language (CQL)” (Hider, 2012, p.126).
o OAI-PMH – Stands for Open Archives Initiative Protocol for Metadata Harvesting and is the Dublin Core (DC) equivalent of Z39.50.
o OpenURL – Has been widely adopted in many digital libraries. Allows a search to be duplicated, using URL format, on multiple systems via a ‘link resolver’. An example would be a journal article available through multiple databases to which a library subscribes.
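To make "Search/Retrieve by URL" more concrete, the sketch below assembles an SRU searchRetrieve request as an ordinary URL carrying a CQL query. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) follow SRU 1.x, but the base URL is a placeholder, and the `dc.title` index and the `marcxml` schema name are assumptions; a real server advertises what it actually supports, so treat this as illustrative rather than a working client.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10) -> str:
    """Assemble an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,                  # the CQL search expression
        "maximumRecords": str(max_records),
        "recordSchema": "marcxml",           # assumed schema name; server-dependent
    }
    return f"{base_url}?{urlencode(params)}"


def read_sru_params(url: str) -> dict[str, str]:
    """Decode the query string back into a dict, e.g. to check what will be sent."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# Placeholder endpoint; substitute the SRU address published by a real catalogue.
url = build_sru_url("https://catalogue.example.org/sru", 'dc.title = "metadata"')
print(url)
print(read_sru_params(url)["query"])
```

Because the whole request is a URL, it can be issued by any HTTP client, and the response comes back as XML. This is part of what makes SRU easier to use outside the library domain than Z39.50.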
What is FRAD?
- Focuses on the elements needed for authority records, which are often used by cataloguers to control the names of authors etc. rather than end-users
- Based on the FRBR model but focuses on group 2 entities (i.e. person, family, corporate body)
- Based on a slightly different set of user tasks. “The tasks are: to find an entity associated with a resource; identify that entity; contextualise this entity amongst similar entities; and justify the preferred name for the entity” (Hider, 2012, p.120). Last task = cataloguer rather than end-user.
Hider, P. (2012). Chapter 7, Metadata standards (pp.103-150). Information resource description: Creating and managing metadata. London: Facet.