About BookMetaHub


We have built a new environment with a free interface for creating, maintaining, enriching, and exporting open book metadata.

Open metadata for books is essential for the transformation of the whole scholarly landscape, and one of its greatest advantages is full and immediate accessibility.

The new BookMetaHub was created to enable institutional and academic publishers, libraries, and university presses to easily maintain and enrich their metadata. These stakeholders get free access to a state-of-the-art system in which they can create records from scratch or enrich existing input via an easy-to-use interface that stores data in the versatile BITS XML format. In turn, an open API allows data export to facilitate distribution across various databases and repositories and to ensure compliance with common standards and best practices.



Back in 2019 we started working with book content and now have over 6 million book and chapter records. Coming from journal indexing, however, we quickly learned that the data and data formats used in the book publishing industry for sharing book content fall short, both qualitatively and quantitatively, of what is available for article content. They were not primarily created for indexing in a digital research environment, and ONIX in particular was not created only for books but covers a great variety of other media formats as well.

So arguably, even though MARC and ONIX records are still the most common data source for book indexing, they are not inherently well suited for this purpose. They were originally intended for library catalogs or distribution channels, where the main objective is to list all available formats with respect to the individual physical manifestation in print. Of course, these formats are also updated constantly to adjust to recent developments and to respond to shifting needs in indexing. However, records oftentimes do not even reference themselves. This can lead to a number of problematic consequences, such as missing digitized and persistent bibliographic data and/or fragmentary portability or interoperability.

As a result, book publications not only still lack visibility in electronic environments, but the whole publishing landscape around books also seems to suffer from a tendency towards non-standardized, and as such potentially more error-prone, communication between various indexing systems and their respective data formats. As an indexer and research database, our aim is always to consolidate datasets, that is, to bring formats together under one header. Most importantly, we want to emphasize the version of record side by side with further, potentially freely accessible, versions.

Therefore, our focus is very different: instead of accentuating the variety, we aim to unify the records. As a fundamental prerequisite, we require persistent identifiers for creating constant linkages in a digital world. Most importantly, we need stable link-outs to the actual content pages, ideally DOIs (this is a huge qualitative difference!), and we need license information to show open access content as freely accessible. Licenses are often not provided, or, if they are, they are not clearly tagged in a machine-readable format, which effectively makes the content lost to OA discovery. Persistent IDs are not necessarily entered in MARC or ONIX records, even if the data structure is there. However, and this is important to stress, those perceived gaps are oftentimes due to a different intended purpose for creating the content, or simply to a lack of awareness of the importance of adding those essential elements.

Our idea was to change the status quo and help publishers upgrade their (“analog”) metadata coming from print to e-metadata 2.0 for the digital world.

How does BookMetaHub work?

Put plainly, we have created a free interface for publishers to enhance their metadata so that it complies with the requirements of digital environments, and to enrich and standardize it to increase interoperability for easy sharing and reuse on other platforms.

The new BookMetaHub is able to ingest a variety of formats, most notably ONIX 2.1 and 3.0, Crossref metadata, and BITS XML. Publishers can then enhance and enrich the metadata according to a selected output and its format-specific requirements. Those who have none of these at hand can input information directly via the user interface.
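To make the ingest step more concrete, the following is a minimal sketch of how a Crossref metadata record could be flattened into a simple internal record. It is an illustration only and not a description of how BookMetaHub is implemented: the function name `record_from_crossref`, the output keys, and the sample values (including the DOI and ORCID iD) are assumptions made for this example. The input keys (`DOI`, `title`, `license`, `author`, `editor`, `ORCID`, `affiliation`) follow the JSON structure returned by the public Crossref REST API for a single work.

```python
from typing import Any


def record_from_crossref(message: dict[str, Any]) -> dict[str, Any]:
    """Flatten a Crossref 'work' message into a simple record (hypothetical layout)."""
    titles = message.get("title") or []
    licenses = message.get("license") or []
    contributors = []
    for role in ("author", "editor"):
        for person in message.get(role) or []:
            orcid = person.get("ORCID")  # Crossref delivers ORCID iDs as URLs
            name_parts = (person.get("given"), person.get("family"))
            contributors.append({
                "role": role,
                "name": " ".join(part for part in name_parts if part),
                "orcid": orcid.rsplit("/", 1)[-1] if orcid else None,
                "affiliations": [
                    aff["name"]
                    for aff in person.get("affiliation") or []
                    if aff.get("name")
                ],
            })
    return {
        # DOIs are case-insensitive, so store them in lower case
        "doi": (message.get("DOI") or "").lower() or None,
        "type": message.get("type"),
        "title": titles[0] if titles else None,
        "license_urls": [lic["URL"] for lic in licenses if lic.get("URL")],
        "contributors": contributors,
    }


# Made-up sample input shaped like a Crossref work message
sample_message = {
    "DOI": "10.1234/EXAMPLE-BOOK",
    "type": "monograph",
    "title": ["An Example Open Access Monograph"],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
    "author": [{
        "given": "Ada",
        "family": "Example",
        "ORCID": "http://orcid.org/0000-0002-1825-0097",
        "affiliation": [{"name": "Example University"}],
    }],
}

record = record_from_crossref(sample_message)
```

Keeping the license URLs and ORCID iDs as separate, explicit fields is the point of the sketch: these are the elements that later need to be machine-readable in the output.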

The designated output is primarily PubMed-compatible BITS XML and Crossref-compatible XML for easy content distribution and for DOI and metadata registration. For further output formats such as ONIX, MARC, KBART, or JSON, we have set up an easy data transfer between the BookMetaHub and the Thoth system.

The overall objective is to make use of the variety of formats and enhance them so that they can be reused for digital indexing, to help solidify and standardize relevant metadata elements, and to upgrade records for use in an e-environment.


Our approach to bridging those data gaps is therefore threefold:
  • First, data is uploaded via a Crossref or DataCite DOI or via ONIX, and the elements relevant for indexing are stored in the database. For best results and a richer starting set of metadata, records queried from Crossref can be updated with ONIX (e.g. for cover images). Alternatively, records can be created by direct, manual input via the interface.
  • The next step is format-specific enhancement to make the records a suitable data source for indexing in digital environments. Those who have no book records at hand can also create book and chapter metadata from scratch via the interface.
    • For example, a source ONIX record could be uploaded and enriched with detailed affiliation data, ORCID IDs, FundRef IDs, DOIs, translated titles, abstracts, keywords, etc.
  • Lastly, enhanced records can be exported as BITS XML to be sent to other databases or to Crossref to make the freely available metadata set richer.
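The enrich-and-export part of these steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the BookMetaHub export itself: the helper names `enrich` and `to_bits_like_xml` and the record keys are hypothetical, the sample values are made up, and the output only borrows a handful of BITS element names (`book`, `book-meta`, `book-id`, `book-title-group`, `book-title`, `permissions`, `license`, `abstract`). It has not been validated against the BITS schema.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)  # serialize the namespace as "xlink:"


def enrich(record: dict, **extra) -> dict:
    """Return a copy of the record with extra fields added; None values are skipped."""
    enriched = dict(record)
    enriched.update({key: value for key, value in extra.items() if value is not None})
    return enriched


def to_bits_like_xml(record: dict) -> str:
    """Serialize a record as a small BITS-like fragment (not schema-validated)."""
    book = ET.Element("book")
    meta = ET.SubElement(book, "book-meta")
    if record.get("doi"):
        book_id = ET.SubElement(meta, "book-id", {"book-id-type": "doi"})
        book_id.text = record["doi"]
    title_group = ET.SubElement(meta, "book-title-group")
    ET.SubElement(title_group, "book-title").text = record.get("title")
    if record.get("license_url"):
        # A machine-readable license link is what lets indexers detect OA content
        permissions = ET.SubElement(meta, "permissions")
        ET.SubElement(
            permissions,
            "license",
            {"license-type": "open-access", f"{{{XLINK_NS}}}href": record["license_url"]},
        )
    if record.get("abstract"):
        abstract = ET.SubElement(meta, "abstract")
        ET.SubElement(abstract, "p").text = record["abstract"]
    return ET.tostring(book, encoding="unicode")


# Step 1: a sparse source record (made-up values)
source = {"doi": "10.1234/example-book", "title": "An Example Open Access Monograph"}

# Step 2: enrichment with license and abstract information
enriched = enrich(
    source,
    license_url="https://creativecommons.org/licenses/by/4.0/",
    abstract="A short English summary of the book.",
)

# Step 3: export
bits_xml = to_bits_like_xml(enriched)
print(bits_xml)
```

The sketch keeps the source record untouched and works on a copy, so the original input stays available for comparison after enrichment.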


An Overview of Essential Elements as Part of the Metadata to Boost Book and Chapter Discoverability:
  • DOI (Digital Object Identifier)
    • Track publications throughout their lifecycle of various formats, editions, platforms, or versions
  • Chapter-DOIs – connected to a book title
    • Browse chapter pages with rich metadata connected via a TOC menu
    • Chapter-DOIs as additional links back to your content
  • OA licenses
    • Machine-readability to guarantee OA publications will be detected across the landscape
  • Copyright details
  • Book-level abstracts (in English and original language of publication)
    • Help reach wider audiences and help researchers evaluate the best fit
    • Abstracts as essential data for many machine-learning and AI systems
  • Chapter-level abstracts (in English and original language of publication)
    • Easy enrichment at all levels of a book
  • Funding details (Funder, Funder-IDs, Grant no.)
  • Institution/Affiliation
    • Add more insight to the context around a publication
  • ORCID ID (authors/editors)
    • Add persistent IDs
  • Open references
    • Why keep them under wraps? Increase your citation metrics instead
  • Open Reviews
    • Add transparency to the workflow
    • Credit the review community
  • Open Data Linking
    • Better reproducibility
    • Better transparency
    • Less redundancy
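
A list like the one above lends itself to a simple completeness check. The sketch below is one possible way to report which essential elements a record is still missing. The field names in `ESSENTIAL_FIELDS`, the function names, and the sample record are assumptions made for this example and do not reflect an actual BookMetaHub data model.

```python
# Hypothetical field names, loosely following the essential elements listed above
ESSENTIAL_FIELDS = (
    "doi",
    "license_url",
    "copyright",
    "abstract",
    "funding",
    "affiliations",
    "orcids",
    "references",
)


def missing_elements(record: dict) -> list[str]:
    """List the essential fields that are absent or empty in the record."""
    return [field for field in ESSENTIAL_FIELDS if not record.get(field)]


def completeness(record: dict) -> float:
    """Share of essential fields that are filled in, between 0.0 and 1.0."""
    filled = len(ESSENTIAL_FIELDS) - len(missing_elements(record))
    return filled / len(ESSENTIAL_FIELDS)


# Made-up record: a DOI and a license are present, everything else is empty or absent
sample_record = {
    "doi": "10.1234/example-book",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "abstract": "",
    "orcids": [],
}

missing = missing_elements(sample_record)
score = completeness(sample_record)
print(f"{score:.0%} complete, missing: {', '.join(missing)}")
```

Empty strings and empty lists count as missing here, since an element that is present but blank does not help discoverability.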