About BookMetaHub

We have built a new environment with a free interface for creating, maintaining, enriching, and exporting open book metadata.

Open metadata for books is essential for the transformation of the whole scholarly landscape, and one of its greatest advantages is full immediate accessibility.

The new BookMetaHub was created to enable institutional scientific publishers, libraries, and university presses to easily maintain and enrich their metadata. These stakeholders will be able to use a state-of-the-art system to create records from scratch or enrich available input via an easy-to-use interface that stores data in the versatile BITS XML format. In the other direction, an API will allow data export in customary industry standards, such as BITS, MARC, ONIX, or Crossref. This facilitates wide distribution across various databases and repositories and helps ensure compliance with common standards and best practices.



Back in 2019 we started working with book content and now hold over 6 million book and chapter records. Coming from journal indexing, however, we quickly learned that the data and data formats used for sharing book content in the publishing industry have qualitative and quantitative downsides compared to available article content: they were not primarily created for indexing in a digital research environment, and ONIX in particular was not created only for books but covers a great variety of other media formats as well.

So arguably, even though MARC and ONIX records are still the most common data source for book indexing, they are not inherently well suited to this purpose. They were originally intended for library catalogs or distribution channels, where the main objective is to list every available format as an individual physical manifestation in print. Sometimes those individual records do not even reference one another. This can lead to a number of problems, such as missing digital and persistent bibliographic data and/or fragmentary portability or interoperability.

As a result, book publications not only still lack visibility in an electronic environment; the whole publishing landscape around books also seems to suffer from a tendency towards non-standardized, and therefore potentially more error-prone, communication between the various indexing systems and their data formats. As an indexer and research database, our aim is always to consolidate datasets, that is, to bring formats together under one header. Most importantly, we want to emphasize the version of record side by side with further, potentially freely accessible, versions.

Our focus is therefore quite different: instead of accentuating the variety, we aim to unify the records. As a fundamental prerequisite, we require persistent identifiers to create constant linkages in a digital world. Most importantly, we need stable link-outs to the actual content pages (this is a huge qualitative difference!), ideally DOIs, and we need license information to show open access content as freely accessible. Often, licenses are not provided, or if they are, they are not clearly tagged in a machine-readable format, which effectively turns the publication into lost OA content. Persistent IDs are not necessarily entered in MARC or ONIX records, even where the data structure provides for them. It is important to stress, however, that these perceived gaps in the data result not from a lack of understanding but from a different original purpose.
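The prerequisites named above (a persistent identifier, a stable link-out, and a machine-readable license) can be illustrated with a small check. This is only a minimal sketch: the field names `doi`, `license_url`, and `content_url`, the function `find_gaps`, and the sample records are hypothetical and do not reflect how BookMetaHub stores or validates data. The DOI pattern is a commonly used heuristic for modern DOIs and is an assumption as well.

```python
import re

# Heuristic pattern for modern DOIs (an assumption; it does not cover every legacy DOI).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def find_gaps(record: dict) -> list:
    """Return the names of the fields a digital index would miss in this record."""
    gaps = []
    if not DOI_PATTERN.match(record.get("doi") or ""):
        gaps.append("doi")
    # A license only counts as machine-readable here if it is given as a URL,
    # not as free text such as "Open Access".
    if not (record.get("license_url") or "").startswith(("http://", "https://")):
        gaps.append("license_url")
    if not record.get("content_url"):
        gaps.append("content_url")
    return gaps


# Hypothetical records: one derived from print metadata, one enriched for digital use.
print_only = {"title": "Example Monograph", "isbn": "978-3-16-148410-0"}
enriched = {
    **print_only,
    "doi": "10.1234/example.book",
    "license_url": "https://creativecommons.org/licenses/by/4.0/",
    "content_url": "https://doi.org/10.1234/example.book",
}

print(find_gaps(print_only))  # ['doi', 'license_url', 'content_url']
print(find_gaps(enriched))    # []
```

The print-derived record is not wrong; it simply answers a different question, which is the point made above about a different original purpose.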

Our idea was to change the status quo and help publishers upgrade their (“analog”) metadata coming from print to e-metadata 2.0 for the digital world.

How does BookMetaHub work?

Put plainly, we have created a free interface that lets publishers enhance their metadata so that it complies with the requirements of digital environments, and enrich and standardize it to increase interoperability for easy sharing and re-use on other platforms.

The new BookMetaHub is able to ingest a variety of formats, in particular ONIX 2.1 and 3.0, Crossref metadata, and BITS XML. Publishers can then enhance and enrich the metadata according to a selected output and its format-specific requirements. Those who have none of these at hand can enter information directly via the user interface.
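To illustrate what accepting several input formats involves, the sketch below guesses the format of an uploaded XML file from its root element. It is a rough heuristic under stated assumptions, not a description of BookMetaHub's actual ingest logic: the function names are hypothetical, the check does no schema validation, and it assumes that ONIX 3.0 messages carry `release="3.0"` on the root element while any other ONIX message is treated as 2.1.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, so '{http://...}book' becomes 'book'."""
    return tag.rsplit("}", 1)[-1]


def detect_format(xml_text: str) -> str:
    """Guess the metadata format from the root element (heuristic, no validation)."""
    root = ET.fromstring(xml_text)
    name = local_name(root.tag)
    # ONIX allows both reference names and short tags for the root element.
    if name in ("ONIXMessage", "ONIXmessage"):
        # Assumption: only ONIX 3.0 declares release="3.0"; everything else is treated as 2.1.
        return "ONIX 3.0" if root.get("release") == "3.0" else "ONIX 2.1"
    if name == "doi_batch":
        return "Crossref"
    if name == "book":
        return "BITS"
    return "unknown"


# Minimal, empty root elements standing in for real files.
samples = [
    '<ONIXMessage release="3.0" xmlns="http://ns.editeur.org/onix/3.0/reference"/>',
    "<ONIXMessage/>",
    '<doi_batch version="5.3.1"/>',
    "<book/>",
]
for sample in samples:
    print(detect_format(sample))
```

A production system would validate against the respective schema or DTD rather than rely on the root element alone.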

The designated output is primarily PubMed-compatible BITS XML, but also ONIX 3.0, as the current ONIX standard, and Crossref XML. We will also add a MARC XML implementation later this year so that we can continue to serve librarians and libraries, as MARC records are sometimes rather slim and could benefit from additional or translated abstracts, keywords, and the like to provide more context for the publication.

The overall objective is to take the variety of existing formats and enhance them so that they can be reused for digital indexing, to help solidify and standardize relevant metadata elements, and to upgrade records for use in an e-environment.


Our approach to bridging those data gaps is therefore threefold:
  • First, data is uploaded and the elements relevant for indexing are stored in the database to keep the records viable in their original format; alternatively, records can be created by direct, manual input via the interface.
  • The next step is format-specific enhancement. Building on the interface we implemented this year for creating book metadata from scratch, we created format-specific input pages in a tabular structure. They list the elements that need to be added to the source content to make it work better in a digital environment, or to make it transformable into a different output format for the same purpose.
    • For example, a source ONIX record could be uploaded and enriched with detailed affiliation data, including ORCID IDs, FundRef IDs, DOIs, translated titles or abstracts, keywords, etc.
  • Lastly, enhanced records can be exported as BITS XML to be sent to other databases or to Crossref to make the freely available metadata set richer.
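The three steps above can be sketched in a few lines of code. This is an illustrative sketch only: the classes `BookRecord` and `Contributor` and the functions `enrich` and `to_bits_like_xml` are hypothetical names, not BookMetaHub's API, and the output is a simplified BITS-style fragment that has not been validated against the BITS schema. The DOI is a placeholder, and the contributor is the fictitious sample researcher used in ORCID's own documentation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Contributor:
    surname: str
    given_names: str
    orcid: str = ""


@dataclass
class BookRecord:
    title: str
    doi: str = ""
    license_url: str = ""
    contributors: list = field(default_factory=list)


def enrich(record, **additions):
    """Step 2: add or overwrite fields on an ingested record."""
    for name, value in additions.items():
        if not hasattr(record, name):
            raise AttributeError(f"unknown field: {name}")
        setattr(record, name, value)
    return record


def to_bits_like_xml(record):
    """Step 3: serialize to a simplified, BITS-style XML string (not schema-validated)."""
    book = ET.Element("book")
    meta = ET.SubElement(book, "book-meta")
    if record.doi:
        ET.SubElement(meta, "book-id", {"book-id-type": "doi"}).text = record.doi
    title_group = ET.SubElement(meta, "book-title-group")
    ET.SubElement(title_group, "book-title").text = record.title
    if record.contributors:
        group = ET.SubElement(meta, "contrib-group")
        for person in record.contributors:
            contrib = ET.SubElement(group, "contrib", {"contrib-type": "author"})
            if person.orcid:
                contrib_id = ET.SubElement(contrib, "contrib-id", {"contrib-id-type": "orcid"})
                contrib_id.text = person.orcid
            name = ET.SubElement(contrib, "name")
            ET.SubElement(name, "surname").text = person.surname
            ET.SubElement(name, "given-names").text = person.given_names
    if record.license_url:
        permissions = ET.SubElement(meta, "permissions")
        # Simplification: real BITS uses the namespaced xlink:href attribute here.
        ET.SubElement(permissions, "license", {"href": record.license_url})
    return ET.tostring(book, encoding="unicode")


# Step 1: a record as it might look after ingesting a sparse source file.
record = BookRecord(title="Example Monograph")
enrich(
    record,
    doi="10.1234/example.book",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    contributors=[Contributor("Carberry", "Josiah", "https://orcid.org/0000-0002-1825-0097")],
)
xml_out = to_bits_like_xml(record)
print(xml_out)
```

Keeping the record in a format-neutral structure between ingest and export is one possible design; it lets the same enriched data feed several output serializers.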


A preview of essential additions to the metadata set as envisioned to boost open access book discoverability:
  • DOI (Digital Object Identifier)
    • Track publications throughout their lifecycle of various formats, editions, platforms, or versions
  • Chapter-DOIs – connected to a book title
    • Browse chapter pages with rich metadata connected via a TOC menu
    • Chapter-DOIs as additional links back to your content
  • OA licenses
    • Machine-readability to guarantee OA publications will be detected across the landscape
  • Copyright details
  • Book-level abstracts (in English and original language of publication)
    • Help reach wider audiences and help researchers evaluate the best fits
    • Abstracts as essential data for many machine-learning and AI systems
  • Chapter-level abstracts (in English and original language of publication)
    • Easy enrichment at all levels of a book
  • Funding details (Funder, Funder-IDs, Grant no.)
  • Institution/Affiliation
    • Add more insight to the context around a publication
  • ORCID ID (authors/editors)
    • Add persistent IDs
  • Open references
    • Why keep them under wraps? Increase your citation metrics instead
  • Open Reviews
    • Add transparency to the workflow
    • Credit the review community
  • Open Data Linking
    • Better reproducibility
    • Better transparency
    • Less redundancy