ETL505 Module 02: Tools and systems

2.1 Tools of library organisation

Tools and systems in information organisation utilise metadata created by information professionals, authors, publishers, and others to facilitate access to valuable information resources within specific collections. A library catalogue is a classic example of such a tool, but it is not the only one.

Online resources can be accessed via a database, but they first need to be discovered and identified as the appropriate or best resource. Information organisation tools must support all four FRBR user tasks: find, identify, select, and obtain. As mentioned in Chapter 2 of the prescribed text, navigating collections is also crucial. Thus, navigation is the fifth function supported by the tools and systems discussed in this module.

The primary tool for improving access to information resources (and to many other types of resource) is the index. This tool organises a collection intellectually rather than physically, allowing each resource to be distinguished by its attributes. In libraries, these indexes are called catalogues; in archives, they are known as finding aids; in museums, they are referred to as registers. Indexers and cataloguers are responsible for creating and maintaining these indexes.

All indexes are essentially sets of resource descriptions. A simple index, such as a back-of-the-book index, might consist of entries that are each a single word or phrase. More complex indexes contain records divided into fields representing various attributes of the resource, with separate indexes for each field, such as title, author, and subject indexes. Regardless of their complexity, indexes are essential for accessing physical and online collections effectively, transforming a collection into a functional library.

In addition to indexes, other tools help users find what they need, such as online directories that offer a less directed way of exploring resource collections. However, the most widely used tool is another form of index synonymous with online information retrieval: the search engine, with Google being the prime example. Most search engines for the entire Internet do not rely heavily on metadata but instead index the actual content of resources, particularly the text. While effective in many situations, search engines are not always the best tool, especially when information needs are vague, cannot be precisely expressed, or involve non-textual content. Therefore, metadata-based tools like library catalogues remain highly valuable.

Read

The first three sections of Chapter 3, ‘Tools and systems’ (pp. 43-49) in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Arrangements

Humans have been organising items, including information resources, for thousands of years. Arrangements are designed to aid in searching for and browsing collections of information resources. Given the numerous attributes of resources, they can be arranged in various ways. Retaining the ‘original order’ of resources, an important archival principle, is sometimes best. It is essential that users understand the basis of any arrangement. Arrangements can be made across one, two, or three dimensions. A common system is arranging by one dimension, such as by author.

Sophisticated arrangements often use labels for both groups and individual items. In libraries, for instance, sections like ‘fiction’ or ‘sports’ are common, with specific labels indicating each resource’s place within the section. Labels can be literal or symbolic, such as an author’s name or a library call number. Online, labels are used in hyperlinks, forming menus or directories.

Labels can also be non-textual symbols, like colours representing different subjects. Arrangements involving letters, numbers, or symbols adhere to conventions and standards, such as numerical or alphabetical order. For example, the American Library Association’s Filing Rules standardised the arrangement of library catalogue cards.

Symbolic notations, such as Dewey Decimal Classification (DDC), are used in library classification schemes. These notations allow resources on similar topics to be placed close together, creating a classified order. Website menus and online directories also use classified arrangements, often based on the meaning of labels rather than alphabetical order, providing logical and user-friendly structures.
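As a small illustration of classified order, the Python sketch below sorts a handful of shelf labels so that items with nearby class numbers sit together. The labels and the helper name shelf_key are invented for illustration, and the sketch assumes each label is simply a DDC-style number followed by author letters.

```python
from decimal import Decimal

# Illustrative shelf labels: a DDC-style number followed by author letters.
LABELS = ["796.33 SMI", "025.3 HID", "599.5 BRO", "025.04 WIT", "025.3 CHA"]


def shelf_key(label):
    """Sort first by class number (as a decimal), then by the author letters."""
    number, letters = label.split()
    return (Decimal(number), letters)


SHELF_ORDER = sorted(LABELS, key=shelf_key)
for label in SHELF_ORDER:
    print(label)
```

The class number is treated as a decimal so that 025.04 files before 025.3, and the author letters only break ties within the same number.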

Indexes and Databases

The primary tool for information organisation is the index, which comprises metadata arranged independently of the resources it represents. This independence allows for multiple access points to a resource, increasing the likelihood of finding it. Index entries can link directly to online resources or provide information about the location of physical resources. Indexes are compact, making it easier to view descriptions than the actual resources.

Indexes were in use long before computers and remain prevalent in both physical and electronic forms. They underpin various information retrieval systems, including library catalogues, bibliographic and citation databases, archival finding aids, and search engines.

Indexes can be created for single resources or collections, providing intellectual access to their components. Book indexes, for example, list terms alphabetically, referring to page numbers. In contrast, open indexes for growing collections require flexibility, such as the loose-leaf file or card index. Card indexes, like the library card catalogue, have largely been replaced by computer databases, which store vast numbers of records efficiently.

Database systems, organised to process search queries efficiently, assign records a number and divide them into fields. Separate indexes are compiled for different fields, allowing for detailed searches. Modern computers can index every word in a field, offering deep access impossible with card indexes.
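As a rough illustration of the paragraph above, the following Python sketch numbers a few records, divides them into fields, and compiles a separate word index for each field. The sample records, the field names, and the helper names build_field_indexes and search are assumptions made for illustration; they are not taken from any particular library system.

```python
from collections import defaultdict

# Hypothetical sample records: record number -> named fields.
RECORDS = {
    1: {"title": "Information resource description", "author": "Philip Hider", "subject": "Metadata"},
    2: {"title": "How to build a digital library", "author": "Ian Witten", "subject": "Digital libraries"},
    3: {"title": "Digital metadata basics", "author": "Jane Example", "subject": "Metadata"},
}


def build_field_indexes(records):
    """Compile one keyword index per field: word -> set of record numbers."""
    indexes = defaultdict(lambda: defaultdict(set))
    for number, fields in records.items():
        for field, value in fields.items():
            for word in value.lower().split():
                indexes[field][word].add(number)
    return indexes


def search(indexes, field, word):
    """Return the sorted record numbers whose given field contains the word."""
    return sorted(indexes.get(field, {}).get(word.lower(), set()))


INDEXES = build_field_indexes(RECORDS)
print(search(INDEXES, "title", "digital"))     # search the title index
print(search(INDEXES, "subject", "metadata"))  # search the subject index
```

Because every word in every field is indexed, a search can start from any single word, which is the kind of deep access a card index could not offer.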

Computer databases store various types of data, not just bibliographic records. They are used in hospitals for patient records, universities for student records, and many other applications. As data quantities have grown exponentially, traditional databases face challenges. ‘Big data’ involves massive datasets requiring advanced computing and mathematics for analysis, differing from traditional information organisation’s goal of efficient access to discrete collections.

 

The arrangement of resources by Dewey Decimal Classification, alphabetical order, and occasionally by other attributes such as material type, difficulty level, and genre, along with the use of indexes and databases like the library catalogue, are key tools for information retrieval used in school libraries.

Tools Used in Libraries

Libraries employ various tools to organise information, including:

  • Library catalogues
  • Periodical databases
  • Citation databases
  • Image and other specialised databases
  • Bibliographies and subject guides
  • Online subject gateways and directories
  • Search engines

The form and structure of today’s classification schemes and library catalogues have their origins in the nineteenth century. Many library databases evolved from printed indexes compiled over the last century. However, librarians do not avoid using newer, often more powerful tools developed in recent years—librarians also use Google! Often, a combination of tools is used, as different tools can be more effective in different situations. Throughout this module, we will explore their advantages and disadvantages.

Listen

Listen to this talk by Alice Ferguson, from Charles Sturt University Library, in which she discusses using various tools for organising and retrieving information.

Library staff spend about 90% of their day providing information to students through the library on the desktop, physical books, online databases, and online searching. Catalogue records are compiled and supplied by academics, so they are accurate. This is not the case with online searching, where information is added at random with no authority. Careful organisation keeps catalogues useful. Without this formal cataloguing, libraries could not be the knowledge databases they are. Despite this, the ease of access to the web dominates searching.

Library Catalogues

Library catalogues are a classic example of an information organisation tool. Librarians have been describing the resources in their collections for a very long time. Initially, these were simple inventories for stocktaking purposes. However, as collections expanded and the need to serve patrons became more professional, the catalogue became an essential tool. It became the primary means for staff and patrons to discover what a library held. Card catalogues, introduced in the late nineteenth century, featured record cards for each item, sorted into drawers under titles, authors, subjects, etc. While some libraries still use card catalogues, they have largely been replaced by the online public access catalogue (OPAC).

Online catalogues offer many features that card catalogues did not, but the basic content of the records remains similar, with titles, author names, subject headings, and more. In online catalogues, many more elements can be searched, including individual words within elements, enabling patrons to find items with minimal information, such as an author’s surname and a keyword from the title. Online catalogues also present more records simultaneously, facilitating easier selection.

While many elements in a modern catalogue can be searched (known as access points), some are not indexed. For instance, most people would not search for a book by the number of its pages. However, these elements still form part of the description and are generally displayed in the ‘full record’. Catalogue records describe both the physical features of the carrier (e.g., a book’s number of pages) and the information itself (e.g., its subject). Titles often indicate a resource’s subject, but additional terms and synopses (especially for novels or films) are also included.

Library cataloguers routinely control key access points, such as authors’ names and subject terms, using consistent names and terms for the same author or subject. This standardisation is known as authority control.
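Authority control can be pictured as a lookup from the many forms of a name to one agreed heading. The Python sketch below is only an illustration: the tiny authority file, its entries, and the function name authorised_heading are assumptions, and real authority files are far larger and maintained by cataloguing agencies.

```python
# Hypothetical authority file: variant forms of a name -> one authorised heading.
AUTHORITY_FILE = {
    "mark twain": "Twain, Mark, 1835-1910",
    "twain, mark": "Twain, Mark, 1835-1910",
    "samuel clemens": "Twain, Mark, 1835-1910",
    "samuel langhorne clemens": "Twain, Mark, 1835-1910",
}


def authorised_heading(name):
    """Return the controlled form of a name, or the name unchanged if unknown."""
    key = " ".join(name.lower().split())  # normalise case and spacing
    return AUTHORITY_FILE.get(key, name)


print(authorised_heading("Samuel  Clemens"))
print(authorised_heading("Mark Twain"))
```

Both calls print the same heading, so a catalogue search under either form of the name would lead to the same set of records.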

Watch

The Process of Cataloguing

A library catalogue record is a set of metadata describing a resource, providing enough detail for users to select items that meet their needs and confirm the item matches the library’s offerings. Cataloguers are responsible for accurately recording details about resources, whether they are printed materials, electronic resources, websites, films, or DVDs.

Steps in Cataloguing:

  1. Describing the Resource:
    • Basic information includes the creator’s name, the title, the general content type, required equipment for viewing, and packaging details.
    • Important publication details such as the publisher’s information, publication date, edition, and standard identification numbers (e.g., ISBN for books, ISSN for journals) are also recorded to distinguish the publication from other versions.
  2. Recording Physical Information:
    • This includes the number of pages for books, illustrations, sound details, running time for audiovisual materials, and dimensions.
    • Additional important characteristics or attributes such as alternative titles, bibliographic information, and any accompanying materials are also identified.
  3. Creating Access Points:
    • These are terms users might search for, including the title, creator, alternative titles, series names, and subject descriptors.
    • Classification involves assigning the resource to a specific collection and using systems like the Dewey Decimal Classification for organisation.
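The three steps above can be sketched as a simple record structure. This Python example is a minimal illustration only: the class name CatalogueRecord, its field names, and the sample values are all hypothetical and do not follow RDA or MARC element names.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueRecord:
    # Step 1: describing the resource
    title: str
    creator: str
    publisher: str
    publication_date: str
    identifier: str  # e.g. an ISBN or ISSN
    # Step 2: recording physical information
    extent: str  # e.g. number of pages or running time
    dimensions: str
    # Step 3: access points and classification
    subjects: list = field(default_factory=list)
    classification: str = ""

    def access_points(self):
        """Return the terms a user might search under."""
        return [self.title, self.creator, *self.subjects]


# A made-up record used purely to show the structure.
record = CatalogueRecord(
    title="An example book",
    creator="Example, Jane",
    publisher="Example Press",
    publication_date="2020",
    identifier="ISBN not recorded",
    extent="120 pages",
    dimensions="24 cm",
    subjects=["Cataloguing"],
    classification="025.3",
)
print(record.access_points())
```

Only some elements are returned as access points; the physical details remain part of the description without being searchable.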

Example Process: The tutorial illustrates cataloguing with the book “Learn Cataloguing the RDA Way.” Cataloguers examine the book’s cover and additional materials for relevant information such as the title, authors, and series. They check the title page for primary details and the verso for publishing information, including the publisher name, copyright date, and ISBN.

Physical Assessment: Cataloguers note the number of pages, illustrations, and the book’s height. They document any bibliographies or indexes included in the book, then compile descriptive elements including the title, authors, physical details, and additional resources like bibliographies.

Recording Access Points: Cataloguers enter authors’ names surname first so that they can be searched consistently, then identify the book’s subject matter and match it to standardised terms from the Library of Congress subject headings, ensuring consistency.

Final Steps: The process culminates in structured records using guidelines from RDA (Resource Description and Access), Library of Congress subject headings, and Dewey Decimal Classification. The tutorial emphasises the importance of these standards in maintaining catalogue consistency and preparing for electronic data sharing through machine-readable cataloguing (MARC) formats. This foundational understanding of cataloguing practices equips cataloguers for more advanced practical units in the future.

Activity

Go to the SCIS website https://www.scisdata.com/ then click on ‘Login’ in the right-hand panel. Use the SCIS username and password that you created in Module 1 to access SCIS Data. Click on ‘Search’ and you will be in the SCIS catalogue (or SCIS database) which contains all the catalogue records created by SCIS for school libraries. Guidance on using this catalogue is available at ‘Help’.

Locate three catalogue records of interest to you and explore which elements can be searched in this particular catalogue, and which elements describe a resource’s information content. It is best to search for current records (2014 and after) to see recent examples of SCIS records.

Read

‘Library catalogues’, pp. 50-58, in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing. 

Library Catalogues

Libraries have been providing catalogues of their collections for centuries. Originally, these catalogues were lists of manuscripts in medieval European monastic libraries. As university libraries and other significant collections grew, their catalogues became more organised and served as tools for retrieval rather than mere inventories. By the 19th century, catalogue records were created on separate sheets of paper, later transitioning to cards by the early 20th century. The advent of card catalogues coincided with the standardisation of cataloguing practices, enabling the sharing of records among libraries. Catalogue entries typically included author, title, and subject headings.

The transition to digital began in the 1970s and 1980s with the Online Public Access Catalogue (OPAC), which replaced card catalogues. This shift required the conversion of vast amounts of bibliographic data into electronic formats, often using the MARC standard. OPACs integrated with circulation modules to provide real-time availability information and evolved to offer more sophisticated search options and remote access.

Despite these advancements, some OPACs have struggled to keep pace with the user-friendly interfaces of search engines like Google and Amazon. New-generation catalogues have emerged, featuring single search boxes, relevance ranking, and more interactive user-system interactions. They allow users to explore records through facets, tag clouds, and FRBR-based displays, which organise results by work and its various expressions and manifestations. Web 2.0 features, such as user tagging and rating, have also been introduced.
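Facets can be thought of as counts of search results grouped by an attribute, which the user can then click to narrow the list. The Python sketch below illustrates the idea with made-up results; the records and the helper names facet_counts and narrow are assumptions for illustration, not features of any specific catalogue product.

```python
from collections import Counter

# Hypothetical search results, each with attributes that can act as facets.
RESULTS = [
    {"title": "Result A", "format": "book", "year": 2015},
    {"title": "Result B", "format": "article", "year": 2018},
    {"title": "Result C", "format": "book", "year": 2018},
    {"title": "Result D", "format": "video", "year": 2015},
]


def facet_counts(results, facet):
    """Count how many results fall under each value of a facet."""
    return Counter(result[facet] for result in results)


def narrow(results, facet, value):
    """Keep only the results that match the chosen facet value."""
    return [result for result in results if result[facet] == value]


print(facet_counts(RESULTS, "format"))
print([r["title"] for r in narrow(RESULTS, "format", "book")])
```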

However, challenges remain. The quality of metadata and the limitations of MARC records affect the effectiveness of next-generation features. Additionally, library catalogues represent only a fraction of available online resources. To address this, libraries are increasingly integrating their metadata with the broader online environment. Federated search systems, or discovery layers, have been implemented in many libraries to allow simultaneous searching across multiple databases.

While library catalogues have evolved significantly, ongoing efforts are needed to enhance their functionality and integration with the wider digital landscape.

Which generation of OPAC do you feel the SCIS catalogue represents? Note that SCIS has chosen to link a range of social networking sites to its records in this catalogue.

Combes, B. (2012). Practical curriculum opportunities and the library catalogue. Connections, 2012(82), 5-7.
(View this article by going to the Home page of SCIS https://www.scisdata.com/ . Click on the tab ‘Connections’ and scroll down to Issue 82, Term 3 2012. You will need to download the issue to view the article.)

This article provides an overview of the roles a school library catalogue can potentially play in a school’s educational programs.

Understanding and Utilising Creative Commons: An Essential Guide for Educators

Creative Commons (CC) offers an alternative to the traditional “All Rights Reserved” copyright model, providing a range of licenses that allow creators to share their works more freely while retaining some rights. These licenses help creators specify permissions for reuse, remixing, and redistribution of their works.

Types of Creative Commons Licenses:

  1. Attribution (BY): Allows others to use the work in any way, provided they give appropriate credit to the creator.
  2. Attribution-ShareAlike (BY-SA): Permits remixing and redistribution, including for commercial purposes, as long as the new work is licensed under identical terms.
  3. Attribution-NoDerivs (BY-ND): Allows redistribution, commercial and non-commercial, provided the work is unchanged and credited.
  4. Attribution-NonCommercial (BY-NC): Lets others remix, tweak, and build upon the work non-commercially, with credit to the creator.
  5. Attribution-NonCommercial-ShareAlike (BY-NC-SA): Permits non-commercial remixing, tweaking, and building upon the work, with credit and the same licensing terms for new creations.
  6. Attribution-NonCommercial-NoDerivs (BY-NC-ND): The most restrictive license, allowing only non-commercial sharing with credit and no modifications.

Key Terms:

  • Remix: Originally an audio term, now generally used for combining various works (photos, videos, text) to create something new.
  • ShareAlike: Ensures that any derivatives of the work are licensed under identical terms.

Using Creative Commons: To license a work, creators can visit the Creative Commons website and answer two questions:

  1. Allow commercial uses of your work?
  2. Allow modifications of your work?

The website then generates the appropriate license, which can be embedded as HTML code on a web page or displayed as text.
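The mapping from the two questions to the six licenses listed earlier can be sketched in a few lines of Python. This is an illustration rather than the Creative Commons website's own logic: the function name choose_license is hypothetical, and the sketch assumes the modifications question accepts three answers (yes, yes provided others share alike, or no), which is what is needed to reach all six licenses.

```python
def choose_license(allow_commercial, modifications):
    """Map the two answers to a license code.

    allow_commercial: True or False.
    modifications: 'yes', 'sharealike', or 'no' (an assumed set of answers).
    """
    parts = ["BY"]  # every license in the list requires attribution
    if not allow_commercial:
        parts.append("NC")
    if modifications == "sharealike":
        parts.append("SA")
    elif modifications == "no":
        parts.append("ND")
    elif modifications != "yes":
        raise ValueError("modifications must be 'yes', 'sharealike', or 'no'")
    return "-".join(parts)


print(choose_license(True, "yes"))
print(choose_license(False, "sharealike"))
print(choose_license(False, "no"))
```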

Implications for K–12 Education:

  1. Finding and Using CC Licensed Materials: Students and teachers should be familiar with finding and interpreting CC-licensed materials, which provides a legal and ethical way to use creative works. Useful resources include the CC search tool and directories of CC-licensed content.
  2. Licensing Educational Materials: Teachers can license their materials with CC to share with other educators, promoting collaboration and resource sharing. Check local policies regarding ownership of teacher-created materials.
  3. Teaching Intellectual Property: Requiring students to license their own creations under CC helps them understand intellectual property rights and responsibilities. This fosters empathy towards content creators and a better grasp of copyright issues.

The adoption of CC licenses narrows the gap between technology use and legal/ethical standards, promoting a culture of sharing and collaboration in education.

By embracing Creative Commons, educators and students alike can legally and ethically engage in the creation and sharing of knowledge, benefiting the entire educational community.

Author and Context: The article is authored by Doug Johnson, Director of Media and Technology in Mankato Schools, and was originally published on his website in 2009.

Read

‘Bibliographic databases’ and ‘Citation databases’ on pages 49 and 70, in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Bibliographic Databases

Bibliographic databases are extensively used in libraries and related fields like bookselling. These databases describe information resources, primarily at the manifestation level, in bibliographic records divided into various fields representing different bibliographic elements. Typically, a bibliographic database includes a general keyword index, title keyword index, author keyword index, and subject keyword index, among others.

A significant type of bibliographic database is the library catalogue, discussed earlier in these notes. Other types include those provided by commercial services. Indexing and abstracting services’ databases complement library catalogues by offering deeper access to information resources. While library catalogues usually index periodicals as single resources, these databases index individual periodical articles and conference papers, sometimes providing direct online access to the articles.

Major providers like EBSCOhost and ProQuest aggregate records from multiple sources to create extensive databases that serve as crucial academic tools. Records in these databases often include an abstract, typically supplied by the author, though sometimes created by the service provider. Database providers add value by assigning their own index terms to each article, in addition to author-supplied keywords. Other bibliographic databases cover materials such as theses, dissertations, and books not specific to a library collection.

Citation Databases

Citation databases represent another type of index based on metadata, specifically the reference lists appended to journal articles and other texts. These references establish a relationship between citing and cited documents, implying a form of similarity. Although documents can be cited for various reasons, and relevant documents may not always be linked through citations, citation databases offer an alternative and often effective method to find related materials. Unlike other information organisation tools that utilise diverse metadata sources, citation databases rely on references provided by authors.
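To show how reference lists can act as an index, the Python sketch below builds a simple 'cited by' lookup and ranks documents by the number of references they share. The document labels, the data, and the function names are hypothetical, and ranking by shared references is just one simple way of using citation links, not a description of how any particular citation database works.

```python
from collections import defaultdict

# Hypothetical reference lists: citing document -> documents it cites.
REFERENCES = {
    "A": ["X", "Y"],
    "B": ["Y", "Z"],
    "C": ["X", "Y"],
}


def build_cited_by(references):
    """Invert the reference lists: cited document -> set of citing documents."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def related_by_shared_references(references, doc):
    """Rank other documents by how many references they share with doc."""
    own = set(references[doc])
    scores = {
        other: len(own & set(refs))
        for other, refs in references.items()
        if other != doc
    }
    ranked = [other for other, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda other: (-scores[other], other))


CITED_BY = build_cited_by(REFERENCES)
print(sorted(CITED_BY["Y"]))
print(related_by_shared_references(REFERENCES, "A"))
```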

Read

‘Federated search systems’ on pp. 58-60 in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing. 

Federated Search Systems

There is a growing expectation for service convergence, especially online, with many users preferring a ‘one-stop shopping’ approach to information seeking. Library websites often provide access to numerous databases, but users are generally unwilling to search each one individually. Consequently, libraries have introduced federated search options, enabling users to search multiple databases simultaneously. The results from these databases are integrated, de-duplicated, and presented as if they come from a single system. This concept, formerly referred to as a ‘portal’, represents a single access point to various resources.
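The integrate-and-de-duplicate step can be sketched in Python as follows. This is a simplified illustration: the source names, records, shared identifiers, and the function federated_search are all hypothetical, and the sketch assumes every database exposes the same 'id' and 'title' fields, which real systems often do not.

```python
# Hypothetical databases, each returning records with a shared identifier.
SOURCES = {
    "catalogue": [
        {"id": "doc-1", "title": "Nikola Tesla and electricity"},
        {"id": "doc-2", "title": "Library classification"},
    ],
    "article_database": [
        {"id": "doc-1", "title": "Nikola Tesla and electricity"},
        {"id": "doc-3", "title": "Tesla coils explained"},
    ],
}


def federated_search(sources, query):
    """Search every source, then merge hits and drop duplicates by identifier."""
    merged = {}
    for source_name, records in sources.items():
        for record in records:
            if query.lower() not in record["title"].lower():
                continue
            entry = merged.setdefault(
                record["id"], {"title": record["title"], "sources": []}
            )
            entry["sources"].append(source_name)
    return list(merged.values())


HITS = federated_search(SOURCES, "tesla")
for hit in HITS:
    print(hit["title"], hit["sources"])
```

The record found in both sources appears once in the merged list, with both sources noted against it.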

Major discovery tools offered by commercial vendors include Primo from Ex Libris, Summon from ProQuest, and EBSCO Discovery Service, with open-source solutions like VuFind also available. The main challenge in creating federated search applications is interoperability, as different database systems often do not communicate effectively due to differences in syntax, functionality, and semantic definitions. Standardisation is the primary solution, but it is difficult to achieve across databases designed for different communities. Consequently, federated search systems may offer more basic functionality than individual database systems, making certain searches more challenging.

Libraries continue to provide access to individual databases alongside their federated search systems due to the mixed blessings of discovery tools. Benefits include ease of use, one-stop shopping, facet limiting, and citation information, while concerns involve excessive results, relevancy issues, incomplete content, loss of specific catalogue details, reduced database functionality, and user knowledge gaps. Metadata quality is crucial, as it must work well both in its original system and in the discovery tool.

Libraries are not the only institutions developing federated search systems, but their contributions are significant. For example, Trove offers integrated access to the National Library of Australia’s collections. Some systems, like the union catalogue hosted by the SearchM25 consortium of academic libraries in and around London, and the European Library, which includes collections from 48 national and research libraries across Europe, operate across multiple institutions. The (UK) National Archives’ Discovery system provides aggregated access to multiple archives.

Federated search systems, however, remain silos when their contents are hidden from general web search engines. While proprietary databases cannot be made publicly accessible, libraries aim to maximise exposure for their catalogues. Therefore, they have started providing catalogue content to companies like Google and Yahoo! for indexing and publishing this content as ‘linked data’ in anticipation of the Semantic Web.

Discussion / Forum

Conduct a search on Primo or Trove to see the wide range of sources that are brought together by these federated search engines.

Share your observations in the discussion forum:

I searched for the first topic that came to mind, which is linked to a Year 4 unit on Nikola Tesla (inventors).

On Trove, there were over 2000 results in a variety of formats, from newspaper and research articles to books and audio and video resources. I like how there was a separate option to explore further on websites, expanding search opportunities while making it clear that these were moving outside the ‘vetted’ resources available directly through Trove.

On Primo, there were just over 4,300 results, again from a variety of sources, mostly newspaper articles and only 138 website links.

Delving further, some resources were the same but I did like that there were differences between resources within the searches.

Having resources appear in the same location is incredibly helpful and something that we are struggling to utilise in school – we don’t have the systems in place and often go between various sites, which can be challenging for both students and staff when looking for credible and reliable sources to use in lessons.

Activity

Scootle is an example of an Australian educational digital library for schools. Scootle does not describe itself as a library but could be considered to be a database, collection, website, or even an information agency. The terminology frequently becomes less specific in the online environment. Read the overview of Scootle and find out more about the National Digital Learning Resources Network.

Comment on Scootle as a starting point when searching for electronic resources.

What I do appreciate about Scootle from a teaching perspective is the resources and how they are set out for searching and for supporting teaching and planning. However, it is limited for the Primary Years: there are no history resources available before Year 7, for example, and it would be great to see this expand to support the curriculum throughout F-10.

On the positive side, delving into the curriculum objectives for the core subjects and being able to choose resources based on specific objectives is incredibly helpful, as is the variety of resources available, such as units of work and reports. Again, it would be nice to have linked suggestions for physical resources.

Read

‘Content management and repository systems’, pp. 64-66, in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Content Management and Repository Systems

Many institutions, including museums, now possess substantial digital content requiring effective management. Various content management systems (CMS) have been developed to cater to diverse digital collections. Some CMS are designed for organising internal business documents on company intranets, with SharePoint being a leading application for enterprise search. Others showcase digitised collections of museums, libraries, and similar institutions to the public. These systems often allow ongoing contributions from staff or the public.

Universities have established publicly accessible institutional repositories to store and sometimes share the scholarly outputs and associated datasets of their researchers. Some repositories, such as YouTube, rely entirely on public contributions. Institutions like the National Library of Australia (NLA) host collaborative collections like Pandora, an archive of web resources curated by the NLA and other Australian institutions. Subject-specific repositories, such as PubMed, are becoming less common compared to institutional repositories, which are easier to manage.

Digital collections are widespread on the web and intranets, varying greatly in size and nature. Content can be textual, audiovisual, or multimedia, with many collections now born digital, while others result from digitisation projects. The design and functionality of these systems vary significantly. They may use minimal or extensive metadata, depending on the resources’ nature and the availability of metadata. Professional indexing is often too costly for large collections, necessitating reliance on creators, contributors, or users for metadata.

Public-facing online collections often use generic database management applications, which form part of the hidden web, not indexed by external search engines. A solution has been to make metadata available for harvesting and indexing in other search systems. Digital collections managed by libraries, museums, and archives have driven the development of metadata standards and research in information retrieval/organisation.

Visualisation is a key area of study, focusing on presenting information innovatively, such as displaying search results as maps or clouds, or replacing text with pictures in interfaces for children. Visualisation becomes challenging with mixed media resources. Virtual reality technologies may offer new ways for users to engage with and explore digital collections.

 

Challenges for Teacher Librarians

Teacher librarians face the challenge of integrating and providing access to selected external digital collections in a way that best serves their school community’s needs. Additionally, they must decide whether to develop a digital collection alongside the physical one to meet the specific needs of teachers and students. As digital resources are typically external to the school, the key to building such a collection lies in the access provided.

Read

‘The scope of digital libraries’, pp. 6-9 and ‘Metadata: Elements of organisation’ pp. 285-286 in Witten, I. H., Bainbridge, D., & Nichols, D. M. (2010). How to build a digital library (2nd ed.). Morgan Kaufmann. Available from CSU eBooks.

The Scope of Digital Libraries

Digital libraries span a vast range, as illustrated by four examples in various stages of development. While libraries might appear scholarly and specialised to many, they can cater to practical interests, such as those of Kataayi’s members. Academic libraries aim to support research and education, with fields like high-energy physics already relying heavily on electronic document collections. Digital libraries uniquely capture, preserve, and share culture in multimedia formats. Popular collections, like those featuring music, films, or TV shows, have become consumer products, accessible on portable, web-enabled devices.

A “killer app” is an application that creates a sustained market for a promising technology. The term, first used in the mid-1980s for the Lotus spreadsheet, described the software that drove the business market for IBM PCs. The World Wide Web is often seen as the Internet’s killer app. For digital libraries, music collections might be this transformative application. Additionally, in the developing world, digital libraries themselves could be the killer apps for computer technology.

Libraries and Digital Libraries

Is a digital library an institution or technology? The term “digital library,” much like “library,” holds different meanings for different people. Many envisage libraries as physical buildings housing books, while professional librarians view them as institutions responsible for preserving, collecting, organising, and providing access to various materials. Libraries encompass more than books, including art, film, sound recordings, botanical specimens, and cultural objects. Researchers see libraries as networks offering access to the world’s recorded knowledge. Modern students, however, often mistakenly equate libraries with the World Wide Web.

A digital library is more than a “digitised library.” It represents a new way of handling knowledge—preserving, collecting, organising, propagating, and accessing information—not merely converting physical libraries into digital formats. A digital library is defined as: “A focused collection of digital objects, including text, video, and audio, with methods for access, retrieval, selection, organisation, and maintenance of the collection.” This definition includes various digital objects beyond text, such as 3D objects, simulations, dynamic visualisations, and virtual reality. It emphasises the importance of both user access and librarian organisation.

Digital libraries blur the line between user and librarian roles, with collections often created by non-professionals. However, the distinction between these roles remains important. Digital library software aids both users in searching and browsing, and librarians in organising and maintaining collections.

Digital libraries, though without physical walls, require boundaries. Collections must have a purpose and guiding principles to ensure their cohesion and integrity. Decisions on what to include or exclude are crucial and challenging.

Unlike physical libraries, digital collections often appear opaque, with little indication of their scope or content quality. A physical library’s presence and permanence are apparent, while digital collections lack this tangibility. Digital libraries differ from the Web, which lacks selection and organisation. They also differ from well-organised websites, which may not be easily expandable. In digital libraries, new acquisitions should integrate seamlessly without manual updates, akin to cataloguing in physical libraries, where metadata (information about data) plays a crucial role.

Metadata: Elements of Organisation

Metadata, often referred to as data about data, is essential for the organisation of digital content. It underpins the structure of digital libraries, ensuring they are more than just unorganised collections of digital objects. Metadata enables the creation of organised collections through various digital library technologies discussed in this book, such as surrogates, video browsing categories, and usage information.

Metadata facilitates:

  • Item displays and surrogates in the Pergamos Digital Library.
  • Browsing structures in the Village Brickmaking collection.
  • The table of contents of the Otago Witness newspaper.
  • Coloured tabs marking chapters in the realistic book “Farming Snails I”.
  • Complete record displays.
  • Structured form searching interfaces.
  • Various searching interfaces.
  • Structured subject browsing.
  • User log graphs.

Generally, metadata is structured information about an information resource, allowing meaningful manipulation without content understanding. For example, bibliographic information serves as metadata for source documents, providing structure by identifying author names, titles, and so forth. Additionally, metadata might include information about the bibliographic items, such as the compiler and the compilation date.

To clarify the role of metadata in your digital library, consider these questions:

  • What is the source of your metadata? Is it extracted automatically, manually assigned, or imported?
  • How does metadata influence document display, browsing, searching, and maintenance?
  • Does the metadata require additional processing before use? For instance, should variations in people’s names be standardised?
  • Is your metadata influenced by end-user activities?
  • Can you monitor metadata quality continually?
  • Is your metadata private or shareable?
  • Can your metadata be migrated to another software application?

Consider how a school might organise digital content.

 

Digital Libraries

The term ‘digital libraries’ is widely used, referring not just to traditional libraries but to all types of virtual information centres. These digital libraries provide online access to collections of resources, which may not necessarily be owned or stored by the centres themselves. The merging of different media and types of information resources has blurred the lines between libraries, archives, and museums in the digital space. Consequently, ‘digital library’ can describe online museums, archives, and other digital collections. As these collections expand, the methods used to organise and describe them become increasingly significant.

Search Engines

Search engines are not solely a tool for library retrieval; they are used to access a variety of resources both within and outside libraries. While their scope can be limited to specific websites, intranets, or types of resources (such as those in French), popular search engines like Google and Bing aim to search the entire web. In practice, even the most successful search engines only scratch the surface and do not perfectly order the thousands or millions of results by relevance. Nevertheless, they are highly effective for many information-seeking tasks, particularly for fact-finding searches, as long as the retrieved information is treated with caution.

Users of search engines may encounter issues with ‘homonyms’ and ‘synonyms’. For instance, searching for ‘football’ might yield results about the wrong type of football, or searching for ‘glasses’ might return sites about spectacles instead of tableware (homonyms). Conversely, a search for ‘glasses’ will miss resources that use only the synonym ‘spectacles’. These problems arise due to a lack of vocabulary control; search engines index terms as they appear in resources, and different resources use terms in various ways. Additionally, different search engines will find and index different resources, ranking them according to distinct algorithms.
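To see why indexing terms ‘as they appear’ causes these problems, here is a minimal sketch of a content index with no vocabulary control. The three documents and the function name are made up for illustration; real search engines are far more sophisticated.

```python
from collections import defaultdict

# Hypothetical mini-collection: document id -> text.
DOCUMENTS = {
    "doc1": "reading glasses and spectacles for sale",
    "doc2": "wine glasses and other tableware",
    "doc3": "spectacles frames repaired",
}

def build_index(documents):
    """Map each word to the set of documents that contain it, exactly as written."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

index = build_index(DOCUMENTS)

# Homonym problem: one term, two meanings, and both kinds of document come back.
glasses_hits = sorted(index["glasses"])        # ['doc1', 'doc2']

# Synonym problem: doc3 is about eyewear but never uses the word 'glasses'.
spectacles_hits = sorted(index["spectacles"])  # ['doc1', 'doc3']
```

A controlled vocabulary would address both issues by assigning one preferred term per concept, which is what subject headings do in a library catalogue.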

The key point is that search engines primarily rely on the content of resources rather than the metadata describing them.

Read

‘Search engines’, pp. 66-68, in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Search Engines and Their Role in Information Retrieval

Search engines index resources in more ‘open’ digital collections, primarily focusing on content rather than metadata. Their significance in modern information retrieval is immense. Major search engines like Google, Bing, and Yahoo provide access to millions of online resources and are often the first choice for finding information on the web.

According to Dawson and Hamilton (2006), Google’s success can be attributed to its speed, reliability, extensive indexing of various document types (HTML, PDF, Word), useful contextual summaries, constant updates, advanced features, simplicity of use, and being free to users worldwide. The advent of mobile computing has further enhanced the usability of search engines, making them ideal for obtaining information on the go.

While search engines like Google aim to index as much of the web as possible, others focus on specific areas, such as particular subjects, media, formats, countries, or languages. Some are designed to cover specific organisations’ intranets rather than the entire internet. The effectiveness of most search engines lies in their ability to index vast amounts of content and organise search results by relevance rather than alphabetically. This relevance ranking has been adopted by other retrieval systems, including next-generation library catalogues. These algorithms, refined over many years, involve complex mathematics and are crucial for improving search accuracy.
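As a rough illustration of relevance ranking, the sketch below scores documents with a simplified TF-IDF weighting: a term counts for more when it is frequent in a document and rare across the collection. This is a textbook technique chosen for illustration, not the algorithm any particular search engine uses, and the documents are invented.

```python
import math
from collections import Counter

# Hypothetical mini-collection: document id -> text.
DOCUMENTS = {
    "a": "library catalogue records for school library",
    "b": "football results and football news",
    "c": "school football team",
}

def tf_idf_rank(query, documents):
    """Return ids of matching documents, most relevant first (simplified TF-IDF)."""
    tokenised = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenised)
    scores = {}
    for doc_id, words in tokenised.items():
        counts = Counter(words)
        score = 0.0
        for term in query.lower().split():
            doc_freq = sum(1 for other in tokenised.values() if term in other)
            if doc_freq == 0:
                continue  # term appears nowhere in the collection
            term_freq = counts[term] / len(words)
            inverse_doc_freq = math.log(n_docs / doc_freq) + 1.0
            score += term_freq * inverse_doc_freq
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

ranking = tf_idf_rank("football", DOCUMENTS)  # ['b', 'c']
```

Document "b" ranks first because ‘football’ makes up a larger share of its text; an alphabetical listing would carry no such signal.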

Personalisation has become a significant feature in enhancing relevance rankings, tailoring results based on users’ previous interactions with the system. This technique is distinct from customisation, which involves direct user involvement with system settings. Personalisation is also used in ‘recommender systems’, which suggest resources based on content similarities or shared ratings.

However, search engines are not flawless. They may not be suitable for comprehensive scholarly research or for users with only a vague idea of their information needs. Miller (2004) argues that current search engines struggle with focused requests for richly structured information found in databases of institutions like libraries, hospitals, and universities. The retrieved information is often buried among millions of hits with varying degrees of accuracy, authority, and relevance.

Indexing more metadata alongside content is not always the solution. Metadata, when available, is not always reliable. Some website managers engage in ‘spam tagging’, using popular keywords irrelevant to their content to manipulate relevance rankings. Proper configuration of search engines to effectively use metadata can be challenging, particularly in enterprise search environments where technical expertise may be lacking. Nonetheless, when quality metadata is effectively harnessed, it can significantly improve search performance (White, 2016).

Many search engines are commercial enterprises that profit by incorporating sponsored search results. These sponsored resources appear at the top of search results, though they are usually marked as such.

 

 

Ironically, despite Hider’s comments on search engines, users often prefer using search engines like Google over thoughtfully constructed school library collections, which provide more controlled and generally more reliable access. A teacher librarian’s expertise in information resource description is essential for developing educational programmes that help users effectively and thoughtfully use various access tools and systems.

Most search engines retrieve information based solely on text. However, significant research has been conducted into content-based retrieval of images and sounds. This approach indexes the attributes of the images and sounds themselves, such as colour, texture, and pitch, rather than relying on captions and index terms. Several experimental search engines that offer visual and audio search methods are available on the web. Their limited effectiveness can be seen either as an opportunity for further research or as a sign that the approach lacks a viable future.
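One simple way to index an image by colour rather than by caption is a colour histogram. The sketch below is only an illustration of the idea, assuming images are already available as lists of RGB tuples; the tiny ‘images’ and function names are invented, and real systems use far richer features.

```python
from collections import Counter

def colour_histogram(pixels, levels=4):
    """Quantise each RGB channel into `levels` bins and return each bin's share of pixels."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {colour_bin: n / total for colour_bin, n in counts.items()}

def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(share, hist_b.get(colour_bin, 0.0)) for colour_bin, share in hist_a.items())

# Hypothetical four-pixel 'images' as lists of (red, green, blue) values.
sunset = [(250, 120, 30)] * 3 + [(40, 20, 90)]
beach = [(240, 110, 20)] * 2 + [(30, 30, 100)] * 2
forest = [(20, 140, 40)] * 4

sunset_hist = colour_histogram(sunset)
sunset_vs_beach = similarity(sunset_hist, colour_histogram(beach))
sunset_vs_forest = similarity(sunset_hist, colour_histogram(forest))
```

Here the sunset image scores as closer to the beach image than to the forest image purely on colour, with no words involved. The same weakness is visible too: two images can share colours without sharing a subject.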

Image retrieval

The video introduces an innovative feature from Google that allows users to search using images instead of text, emphasising the idea that a picture can convey more information than words. This feature, referred to as “Search by Image,” enables users to explore a wide range of subjects, including locations, artworks, and even unidentified creatures.

Users can access this capability by visiting images.google.com and clicking on the camera icon, making it easy to start a search using any image. This functionality is not limited to images found on the web; users can also upload photos from their personal collections, such as those taken during holidays. This versatility opens up new opportunities for users to discover and learn more about the things they encounter visually.

To enhance the efficiency of this feature, the video suggests using browser extensions available for Chrome and Firefox, which streamline the process of searching by image. By leveraging this tool, every image becomes a gateway to further exploration, examination, and discovery, transforming the way users interact with visual information online.

Sound Retrieval

Sound possesses specific attributes, such as pitch and interval, that distinguish it from other types of information. Ideally, one would input a sequence of sounds, such as by singing or playing a melody into a computer, and retrieve similar sequences. However, accurately determining where a melody starts and ends is a task at which humans excel but computers do not, so current content-based music retrieval systems are not very effective. Consequently, sound recordings are usually indexed and searched by associated words, such as the title of a song, a description of the sound (e.g., railway engines), or the name of the composer, performer, or group.

 

2.2 Other tools and systems

Why do information centres need to share metadata?
Information centres share metadata due to the considerable advantages it offers, including economic benefits and enhanced efficiency and accessibility of their collections.

How do information centres share metadata?
Metadata is shared by creating it according to agreed standards for reuse in different systems and transmitting it through standard protocols.

Library record sharing
The most notable example of organised metadata sharing is in library catalogue records. Librarians have long understood the economic benefits of duplicating records from each other’s catalogues, which saves the cost of creating new records. Initially, this was done with catalogue cards, but now electronic files, usually in MARC (MAchine Readable Cataloguing) format, are distributed via the Internet.

The sharing of catalogue records has driven the standardisation of library cataloguing for over a century. In the twentieth century, the concept of Universal Bibliographic Control (UBC) was introduced, aiming to have every published book catalogued once, with the resulting record available to any library worldwide. Although UBC has not been fully achieved, the library community has reached impressive levels of cooperation, greatly benefiting both librarians and patrons.

Read

‘Library catalogue records’ of Chapter 6, ‘Sharing metadata’ (pp. 111-116), in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Library Catalogue Records

Before computers, one of the most common information retrieval systems was the library card catalogue. Librarians saw the potential efficiency gains from copying and distributing record cards for use in multiple catalogues. However, few libraries could mass-produce their catalogue cards, so distribution was usually one-way. The Library of Congress (LC) began its card distribution service in 1901 and has been a leading record supplier since then (Yee, 2009). Other national libraries, such as the British Library, also started providing catalogue copies to their library communities, while some companies like H. W. Wilson in the USA supplied cards commercially. Some libraries sent their copies to suppliers like LC for editing and distribution.

The computer revolutionised the library catalogue and the sharing of its records. MARC records were developed by LC in the mid-1960s to facilitate record distribution. Soon after, libraries began building the first automated library systems with bibliographic record databases at their core. Initially, these databases were shared by library consortia due to the high costs involved. A significant example is the Ohio College Library Center, founded in 1967, which later became the Online Computer Library Center (OCLC). Its bibliographic database is now the world’s largest repository of library catalogue records.

The shared databases were initially populated with machine-readable records from primary suppliers like LC. However, these records rarely covered all of a library’s holdings, so libraries started contributing their records to the consortium, allowing others to use them. This marked the beginning of earnest library catalogue record exchange.

Bibliographic Networks

In the 1970s and 1980s, as automated systems became more affordable, many libraries acquired their own systems but continued to use shared databases. These databases became hubs of bibliographic networks, linking libraries through a central database for record copy and exchange. These networks subscribed to the services of major record suppliers on behalf of their member libraries. Initially, records were downloaded via tape, but now they are downloaded via the internet. When a record is not found, a cataloguer creates a record in the central database or their local system, uploading it for other member libraries to use. If all holdings are represented in the central database, it effectively becomes a union catalogue.
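The copy-or-create workflow described above can be sketched as follows. This is a toy model only: the central database is a dictionary keyed by ISBN, and the function names, library names, and placeholder ISBN are all assumptions made for illustration.

```python
# Hypothetical central database: ISBN -> shared record (including holdings).
central_database = {}

def make_record(isbn):
    """Stand-in for original cataloguing by a human cataloguer."""
    return {"isbn": isbn, "title": "Placeholder title"}

def catalogue_item(isbn, library, create_record=make_record):
    """Copy an existing shared record if there is one; otherwise create and upload it."""
    record = central_database.get(isbn)
    if record is None:
        record = create_record(isbn)      # original cataloguing, done once
        record["holdings"] = set()
        central_database[isbn] = record   # uploaded for other member libraries
        source = "original"
    else:
        source = "copy"                   # reuse saves the cost of a new record
    record["holdings"].add(library)       # holdings make this a union catalogue
    return source

first = catalogue_item("978-0-00-000000-0", "Library A")   # 'original'
second = catalogue_item("978-0-00-000000-0", "Library B")  # 'copy'
```

Only the first library pays the cost of creating the record; every later library copies it and simply adds its holding.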

As library automation expanded through the 1980s and 1990s, more libraries joined bibliographic networks, fostering ‘co-operative cataloguing’. OCLC led the way, but other networks also emerged, often catering to specific library types or regions. National networks, such as the Australian Bibliographic Network (now Libraries Australia), were established. University and research libraries in the UK formed the Consortium of University and Research Libraries, now Research Libraries UK (RLUK), and the USA established the Research Libraries Group (RLG), which later merged with OCLC. School libraries in Australia and New Zealand are served by the Schools Catalogue Information Service (SCIS).

As networks grew, so did their databases, with many libraries contributing to the growth of MARC records from thousands to millions. Major bibliographic databases included RLIN, UTLAS, and WLN. The vast coverage of these databases exemplifies successful library cooperation.

With faster and cheaper internet connections in the 1990s and 2000s, regional bibliographic databases became less necessary, leading to consolidation and mergers with larger networks. Some networks diversified into other areas like software development and training. Most English-speaking libraries are now served by a few large databases. OCLC has over 16,000 institutional members from 124 countries, and its WorldCat database contains over 400 million bibliographic records. Newer companies like LibLime and SkyRiver offer alternative record supply options, but OCLC remains dominant.

A public version of WorldCat is available online, and some networks also have their databases available as union catalogues. However, some networks use a distributed model, where libraries download and upload records directly from each other’s systems. Despite seeming efficient, this model can face issues like traffic congestion and less precise searches. A central database pool is more common, with some networks using a hybrid model.

Standardisation is crucial for sharing catalogue records. Initially, records were distributed on standard-sized cards, and machine-readable records had to be standardised for different computers. The MARC format, developed by LC, is widely used today, allowing computer systems to import catalogue records and indicating how metadata should be indexed and displayed. Libraries also agree on a minimum set of elements and content standards for their catalogue records.

Libraries in bibliographic networks agree to broader standards and policies, covering database access and contributions to maintain an accurate union catalogue. However, libraries can still follow local cataloguing policies as long as they do not alter the master record in the network’s database.

Sharing catalogue records has enabled libraries to provide detailed catalogues and extend their reach. The goal of coordinating cataloguing to account for the world’s published knowledge continues, supported by legal deposit laws. The Program for Cooperative Cataloging (PCC), administered by LC, pools resources among elite cataloguing departments, sharing detailed records and contributing to authority files. PCC’s mission is to support metadata producers and promote the discovery and use of knowledge.

Libraries now supply records to search and social media companies to make their catalogues more accessible. OCLC indexes WorldCat on search engines like Google and social cataloguing sites like Goodreads. Libraries aim to maximise the visibility and use of their metadata, with some publishing it as linked data.

Chadwick, B. (2015). SCIS is more. Connections, 2015(92), 12. https://www.scisdata.com/connections/issue-92/scis-is-more/

The issue highlights various aspects of SCIS (Schools Catalogue Information Service), focusing on its benefits, features, and the importance of maintaining up-to-date subscriptions and contact details.

Cover Images

SCIS offers access to cover images for over 500,000 publications. These images can be utilised in library management systems, school websites, blogs, wikis, newsletters, and intranet platforms.

Community Support

A SCISWeb subscription provides access to professional development webinars, workshops, the SCIS blog, and social media channels. Additionally, some government school systems and Catholic Dioceses coordinate access to SCIS for their schools.

Subscriber Management

Current subscribers are advised to update passwords and contact details for security reasons. The process for changing passwords and editing contact details is straightforward and can be done through the SCISWeb My Profile page.

Time-Saving Benefits

The issue emphasises the time-saving benefits of SCIS. For instance, a typical school can save hundreds of hours annually by subscribing to SCIS, which provides ready-to-upload catalogue records instead of creating them from scratch.

Breadth of Resources

SCIS boasts over 1.35 million records, making it the largest database of school-related catalogue records in the southern hemisphere. It focuses on educational resources, including electronic materials like websites, DVDs, online videos, digital learning objects, and eBooks.

Quality of Records

SCIS adheres to strict cataloguing standards to ensure high-quality bibliographic records and authority control. This high standard minimises duplications and ensures data integrity, which is crucial for library staff.

Educational Value-Add

SCIS metadata caters specifically to the education market, providing Dewey Decimal classifications and subject headings suitable for students and teachers. The SCIS Subject Headings and Schools Online Thesaurus (ScOT) align with key curriculum concepts and support the classification of fiction resources by genre.

SCIS as a Selection Tool

The SCIS catalogue is a valuable resource for selecting curriculum-relevant materials. Enhanced content services in the SCIS OPAC, such as plot summaries, author notes, awards, and reviews, further assist in resource selection.

 

Z39.50

In addition to content, format, and vocabulary standards, the sharing of catalogue records necessitates transmission standards. Computers must be capable of not only processing but also receiving MARC records. Basic internet protocols such as FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are helpful, alongside specialised protocols for downloading bibliographic records in formats like MARC. One crucial protocol is the North American standard ANSI/NISO Z39.50-1995, commonly known as Z39.50.

The Z39.50 protocol is a set of rules facilitating data communication, specifically for the search and retrieval of bibliographic data from remote information retrieval systems. It is extensively used in libraries, museums, and government information centres. Z39.50 standardises the search format, enabling both the client (e.g., a local library’s catalogue) and the server (e.g., a bibliographic database in another location) to understand it.

Read

‘Z39.50’ (p. 149), in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Z39.50 is a client-server protocol that lets a user’s system search a remote database for MARC records. First established in 1988, the standard was most recently updated in 2003 and is maintained by the Library of Congress.

Chadwick, B. (2015). SCIS is more. Connections, 2015(93), 12. https://www.scisdata.com/connections/issue-93/scis-is-more/

Z39.50 is an application layer network protocol designed for information retrieval in bibliographic databases, covered by ANSI/NISO Z39.50 and ISO 23950 standards, and maintained by the Library of Congress. It facilitates data sharing between library management systems (LMS) and databases worldwide. Despite its development in the 1970s, Z39.50 remains widely used in the library industry.

Beyond Z39.50

Because it was designed specifically for MARC records, Z39.50 does not work well in the wider computing environment, where other mechanisms, such as OpenURL, are used instead.

Read

‘SRU and CQL’ on p. 150 and ‘OpenURL’ on p. 154, in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

SRU

Also maintained by LC, SRU (Search/Retrieve via URL) allows applications outside libraries, as well as within them, to send searches as ordinary web URLs. It is XML-focussed, returning results as XML, and uses CQL (Contextual Query Language) for searches.
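A minimal sketch of what an SRU search looks like in practice: the whole request is a URL whose parameters carry the operation and a CQL query. The parameter names (operation, version, query, maximumRecords, recordSchema) come from the SRU standard; the server address is hypothetical, and the ‘dc’ schema name and the dc.title and dc.creator indexes are assumptions, since each server declares what it supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical SRU endpoint, for illustration only.
BASE_URL = "https://catalogue.example.org/sru"

def build_sru_url(cql_query, max_records=10, record_schema="dc"):
    """Build a searchRetrieve request URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,  # assumed short name; server-dependent
    }
    return f"{BASE_URL}?{urlencode(params)}"

# CQL combines index = "term" clauses with boolean operators.
url = build_sru_url('dc.title = "school libraries" and dc.creator = "hider"')

# Decoding the URL again shows the query survives the round trip intact.
decoded = parse_qs(urlsplit(url).query)
```

Because the request is just a URL and the response is XML, any web-capable application can run the search, without the specialised client software that Z39.50 requires.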

OpenURL

OpenURL is mostly used to link records for online journals and articles across databases.
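The sketch below shows how a citation can be packed into an OpenURL so that a link resolver can locate the article in whichever database the library subscribes to. The key names (url_ver, rft_val_fmt, rft.atitle and so on) follow the OpenURL 1.0 key/encoded-value format; the resolver address is hypothetical, and the citation is the Connections article cited in these notes.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical link resolver address, for illustration only.
RESOLVER = "https://resolver.example.edu/openurl"

def build_openurl(citation):
    """Encode a journal-article citation as an OpenURL 1.0 key/value query."""
    params = {
        "url_ver": "Z39.88-2004",                       # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # journal metadata format
        "rft.genre": "article",
    }
    for key, value in citation.items():
        params[f"rft.{key}"] = value  # 'rft' = the referent, i.e. the item wanted
    return f"{RESOLVER}?{urlencode(params)}"

openurl = build_openurl({
    "aulast": "Chadwick",
    "atitle": "SCIS is more",
    "jtitle": "Connections",
    "date": "2015",
    "issue": "92",
    "spage": "12",
})

fields = parse_qs(urlsplit(openurl).query)
```

The link describes the article rather than pointing at one fixed copy, which is why the same OpenURL can resolve to different sources in different libraries.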

Metadata Mapping

Libraries are keen to share metadata due to the commonality of their collections. This practice is less prevalent in archives and museums, although it can still benefit them. Sharing metadata benefits both the provider and the recipient by increasing the likelihood of the metadata being discovered online, thereby enhancing the resource’s usage.

There are several methods to distribute metadata online. One effective strategy is to ensure that metadata is indexed by major search engines like Google; another is to make it accessible through a variety of protocols, not just Z39.50. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which is simpler and based on Dublin Core, is commonly used in the digital library and archives community.

Read

‘Metadata from repositories, archives and museums’ (pp.116-117) and ‘OAI-PMH’ (p. 154), in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Metadata from repositories, archives and museums

Many organisations are keen to promote their assets online. In the same way that MARC records are used for library cataloguing, search engines and database systems can be used to retrieve resource information, with scholastic resource companies promoting their products through a growing number of partnerships, such as with Google or Google Books.

OAI-PMH

The Dublin Core alternative to Z39.50 is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It is built around two roles: data providers, which expose their metadata, and service providers, which harvest it to build search services.
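A harvest with OAI-PMH can be sketched in two steps: ask a repository to list its records in Dublin Core, then read the XML that comes back. The verb (ListRecords), the oai_dc metadata prefix, and the XML namespaces are part of the protocol; the repository address and the sample response are invented so that the example runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical repository endpoint, for illustration only.
BASE_URL = "https://repository.example.org/oai"

def list_records_url(metadata_prefix="oai_dc"):
    """Build the request a harvester sends to collect a repository's records."""
    return f"{BASE_URL}?{urlencode({'verb': 'ListRecords', 'metadataPrefix': metadata_prefix})}"

# Invented response, trimmed to one record with two Dublin Core elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample resource title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [
        (
            record.findtext("oai:header/oai:identifier", namespaces=NAMESPACES),
            record.findtext(".//dc:title", namespaces=NAMESPACES),
        )
        for record in root.iterfind(".//oai:record", NAMESPACES)
    ]

request_url = list_records_url()
harvested = harvest_titles(SAMPLE_RESPONSE)
```

Unlike a Z39.50 search, nothing is queried at the moment a user searches: the harvester collects the metadata in advance and indexes it locally.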

Interoperability

There are numerous standards available for metadata specialists, and it is unlikely that a single set of standards will be universally adopted in the near future. Therefore, retrieval systems must be capable of receiving and processing metadata from various protocols and formats to ensure interoperability. Crosswalks are essential for this process, as they map one metadata format onto another. For instance, if records are downloaded in both International Standard Archival Description (ISAD(G)) and MARC21 formats, crosswalks enable users to search these records simultaneously by mapping the two formats together.

ISAD(G) elements can be mapped to MARC 21 fields, for example:

ISAD(G) element                                 MARC 21 field
Title                                           245
Extent and medium of the unit of description    300
Scope and content                               520
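In software, a crosswalk like this is essentially a lookup table. The sketch below applies the three ISAD(G) to MARC 21 mappings listed above to an invented archival description; the record contents and function name are illustrative, and a real crosswalk would cover many more elements.

```python
# The three mappings listed above: ISAD(G) element name -> MARC 21 field tag.
ISAD_TO_MARC = {
    "Title": "245",
    "Extent and medium of the unit of description": "300",
    "Scope and content": "520",
}

def crosswalk(isad_record, mapping=ISAD_TO_MARC):
    """Convert an ISAD(G) description to MARC-tagged fields; report what has no mapping."""
    marc_fields, unmapped = {}, {}
    for element, value in isad_record.items():
        tag = mapping.get(element)
        if tag is None:
            unmapped[element] = value   # no equivalent in this (partial) crosswalk
        else:
            marc_fields[tag] = value
    return marc_fields, unmapped

# Invented archival description, for illustration only.
sample = {
    "Title": "School council minutes",
    "Extent and medium of the unit of description": "3 boxes",
    "Scope and content": "Minutes of monthly meetings.",
    "Archival history": "Transferred from the school office.",
}

marc_fields, unmapped = crosswalk(sample)
```

Once both sets of records share the same tags they can be searched together, while anything left in `unmapped` shows where the crosswalk still has gaps.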

Significant work has been done to develop crosswalks between various standards, with several websites publishing lists of these mappings, such as Michael Day’s Metadata: Mapping between metadata formats. This work has made substantial progress towards achieving interoperability, and potentially, the development of the Semantic Web.

Read

‘Interoperability’ (pp. 117-119), in Hider, P. (2018). Information resource description: Creating and managing metadata (2nd ed.). Facet Publishing.

Interoperability

The sharing of metadata between information retrieval systems relies on the cooperation of the involved parties. For metadata to be interoperable across different systems, it either needs to be standardised or converted. In communities with a shared practice, such as libraries, agencies can often agree on various standards. However, when this is not possible, metadata can still be shared by converting it to a format that the host system can use. As information agencies increasingly operate across domains, interest in converting metadata for interoperability has grown, leading to the common practice of retrieval systems handling multiple standards and formats.

Metadata conversion can occur at the format, element, or value level; it often involves both the format and element levels, and sometimes all three. Applications based on generic mappings convert incoming metadata into a format that the host system can process effectively. These mappings of different element sets are sometimes called crosswalks. Unfortunately, mapping across metadata standards is rarely a straightforward one-to-one process. Different element sets cover different attributes and may represent the same attributes in varying ways. Consequently, an element in one set may map to several elements in another scheme or vice versa. Similar to language translation, crosswalks can be imprecise, and different versions can be equally valid. They work best when the standards have been developed within the same community of practice (Godby, Smith and Childress, 2003).

A common example of a crosswalk is the conversion of MARC records into the simpler Dublin Core (DC) format. Many fields in MARC are not covered in DC, leading to potential data loss. Additionally, some DC elements may correspond to several MARC fields, which reduces the richness of the metadata. For instance, MARC distinguishes between authors, co-authors, and corporate authors, which may all map to the Creator element in DC, resulting in a loss of specific information. Some MARC fields only partially map to DC elements; for example, the 700 field in MARC may represent a co-author (mapped to Creator in DC) or other associated individuals (possibly mapped to Contributor in DC), making the mapping an approximation.
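The loss of detail in a MARC to Dublin Core conversion can be made concrete with a small sketch. The mapping below is a deliberately tiny, simplified subset chosen for illustration, not an official crosswalk, and the record is invented.

```python
# Simplified, partial mapping (an assumption for illustration): MARC tag -> DC element.
MARC_TO_DC = {
    "100": "creator",  # main personal author
    "110": "creator",  # corporate author
    "700": "creator",  # added name, treated here as a co-author (an approximation)
    "245": "title",
    "650": "subject",
}

def marc_to_dc(marc_fields):
    """Convert (tag, value) pairs to Dublin Core; return the result and the dropped tags."""
    dublin_core, dropped = {}, []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            dropped.append(tag)  # no DC equivalent in this mapping: data is lost
        else:
            dublin_core.setdefault(element, []).append(value)
    return dublin_core, dropped

# Invented record, for illustration only.
record = [
    ("100", "Smith, Jane"),
    ("245", "An example title"),
    ("300", "1 volume"),
    ("700", "Lee, Sam"),
]

dublin_core, dropped = marc_to_dc(record)
```

After conversion both names sit side by side under Creator, so the distinction MARC made between the main author (100) and the added name (700) is gone, and the physical description (300) is dropped altogether.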

Adhering to metadata standards remains crucial when using crosswalks. If imported metadata closely follows a particular standard, a crosswalk can function reasonably well. Otherwise, significant manual editing may be required. Fortunately, many domains now have well-established metadata standards, not just in librarianship. These standards are increasingly being adopted by metadata creators, and crosswalks between them are being established and implemented more frequently.
