METS / ALTO

What is METS?

METS (Metadata Encoding and Transmission Standard) is a standard for exchanging digitized documents between heritage institutions. It was developed as an initiative of the Digital Library Federation (DLF) and is an implementation of the OAIS reference model (Reference Model for an Open Archival Information System). The Library of Congress in the United States currently maintains the METS schema.

METS is an XML schema for the creation of digital objects. A digital object can be simple or complex: it can consist of one or more digital files in different formats, and it can describe a detailed internal structure.

What is ALTO?

ALTO (Analyzed Layout and Text Object) is an XML standard that originated in the European project METAe. It is designed to represent a physical document in terms of page layout and word positions, among other details.

It is used to store information about the content and layout of physical documents and is especially well suited to representing OCR results. The BnL uses ALTO together with METS. Each digitized page is represented by one ALTO file. The ALTO files hold the contents of individual pages, while METS holds the metadata, the structural information and the links between external files.

How does the BnL use METS / ALTO?

The BnL has defined clear technical requirements and guidelines on how to use METS / ALTO.

METS File

Each document (newspaper issue or monograph) is modeled in one METS file. The file contains the metadata, the file sections, and the physical and logical structures. The logical structure closely follows the requirements of the BnL. The METS file describes the relationship between the ALTO, PDF, TIFF, PNG and JPG files.
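
As a rough illustration of how the file sections tie the different files together, the following Python sketch lists the files referenced in a METS file section. The element and attribute names (fileSec, fileGrp, file, FLocat, xlink:href) come from the METS schema; the sample document, its group names, identifiers and paths are hypothetical and far shorter than a real BnL file.

```python
import xml.etree.ElementTree as ET

# Namespaces used by METS and XLink.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical, heavily shortened METS document. A real file also
# carries metadata sections and structural maps.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="Images">
      <mets:file ID="IMG00001" MIMETYPE="image/tiff">
        <mets:FLocat LOCTYPE="URL" xlink:href="file://./images/00001.tif"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="Text">
      <mets:file ID="ALTO00001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="file://./text/00001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def list_files(mets_xml):
    """Return one dict per file referenced in the METS file section."""
    root = ET.fromstring(mets_xml)
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for file_el in group.findall("mets:file", NS):
            locat = file_el.find("mets:FLocat", NS)
            # xlink:href is a namespaced attribute, so it needs the
            # {namespace}name form when read with ElementTree.
            href = None
            if locat is not None:
                href = locat.get("{%s}href" % NS["xlink"])
            files.append({
                "group": group.get("USE"),
                "id": file_el.get("ID"),
                "mimetype": file_el.get("MIMETYPE"),
                "href": href,
            })
    return files


if __name__ == "__main__":
    for entry in list_files(SAMPLE_METS):
        print(entry["group"], entry["id"], entry["mimetype"], entry["href"])
```

The same function could be pointed at a real METS file by reading its contents into a string first; the grouping and naming of the files will then depend on the BnL's own conventions.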

ALTO Files

Each scanned page goes through an OCR engine, and the result is stored in an ALTO file as text blocks, lines and individual words with their coordinates. The text blocks are linked from within the METS file.
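
To make the block, line and word hierarchy more concrete, here is a minimal Python sketch that reads the words and their coordinates from an ALTO file. TextBlock, TextLine and String, together with the CONTENT, HPOS, VPOS, WIDTH and HEIGHT attributes, are defined by the ALTO schema. The sample page and its values are hypothetical. Because the ALTO namespace differs between schema versions, the sketch matches elements by local name only, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal ALTO page with one text block and two words.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="900" HEIGHT="60">
          <TextLine HPOS="100" VPOS="200" WIDTH="900" HEIGHT="60">
            <String CONTENT="Example" HPOS="100" VPOS="200" WIDTH="420" HEIGHT="60"/>
            <SP WIDTH="20"/>
            <String CONTENT="text" HPOS="540" VPOS="200" WIDTH="160" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return every word with its coordinates and the ID of its text block."""
    root = ET.fromstring(alto_xml)
    words = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        for element in block.iter():
            if local_name(element.tag) != "String":
                continue
            words.append({
                "block": block.get("ID"),
                "text": element.get("CONTENT"),
                "hpos": float(element.get("HPOS")),
                "vpos": float(element.get("VPOS")),
                "width": float(element.get("WIDTH")),
                "height": float(element.get("HEIGHT")),
            })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word["block"], word["text"], word["hpos"], word["vpos"])
```

Keeping the block ID next to each word makes it possible to follow a reference from the METS file to the text it points at.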

PDF Files

There is also a PDF file for every page and one PDF of the entire document, which also contains the table of contents. All PDFs contain the full text as an overlay and are fully searchable and selectable.

Original Images

Each page of the document is scanned and saved as a TIFF file with a resolution of 300 PPI. Since 2018, the BnL has checked image quality against ISO/TS 19264-1.
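
The resolution relates pixel dimensions to physical size: at 300 PPI, 300 pixels cover one inch (25.4 mm). The short Python sketch below shows the conversion in both directions. The function names are hypothetical, and the A4 page size is used only as an illustrative input, not as a statement about BnL material.

```python
MM_PER_INCH = 25.4


def pixels_to_mm(pixels, ppi=300):
    """Convert a length in pixels to millimetres at the given resolution."""
    return pixels / ppi * MM_PER_INCH


def mm_to_pixels(mm, ppi=300):
    """Convert a length in millimetres to the nearest whole pixel count."""
    return round(mm / MM_PER_INCH * ppi)


if __name__ == "__main__":
    # Illustrative only: an A4 page (210 mm x 297 mm) scanned at 300 PPI.
    print(mm_to_pixels(210), "x", mm_to_pixels(297), "pixels")
```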

Black & White Images

In addition to the other images, a high-contrast black and white PNG image is generated for each page from the original TIFF.

Thumbnails

In addition to the other images, a smaller JPEG thumbnail is generated for each page.

BnL’s Technical Requirements

The National Library of Luxembourg wrote a complete technical document that describes the requirements for all its digitization projects. It covers aspects such as transport, image quality, rules for the logical structure and metadata requirements, and it contains numerous examples.

Download BnL’s Latest Technical Requirements (21MB)