The niso-jats library is a parser and generator for NISO JATS XML documents.

Supported versions

This library targets NISO JATS XML v1.2 and v1.3 documents and models the full Journal Publishing Tag Set element vocabulary. The model layer is version-agnostic: a single unified set of Ruby classes covers all JATS versions from 1.0 through 1.3.

Installation

Add this line to your application’s Gemfile:

gem 'niso-jats'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install niso-jats

Usage

Parsing a NISO JATS XML document

require 'niso/jats'

article = Niso::Jats::Article.from_xml(File.read('article.xml'))
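To load a whole directory of documents, you can wrap from_xml in a loop that records failures instead of stopping at the first malformed file. The sketch below is a hypothetical helper (parse_jats_files is not part of the gem). Its parser is injectable and defaults to Niso::Jats::Article.from_xml, which requires 'niso/jats' to be loaded.

```ruby
# Hypothetical helper, not part of niso-jats: parse every path and keep going
# when a document fails. Returns [{ path => article }, { path => error message }].
# The default parser needs `require 'niso/jats'` before it is called.
def parse_jats_files(paths, parser: ->(xml) { Niso::Jats::Article.from_xml(xml) })
  parsed = {}
  failed = {}
  paths.each do |path|
    parsed[path] = parser.call(File.read(path, encoding: "UTF-8"))
  rescue StandardError => e
    # Record the failure and continue with the next file.
    failed[path] = e.message
  end
  [parsed, failed]
end
```

For example, articles, errors = parse_jats_files(Dir.glob("spec/fixtures/metrologia/*.xml")) would return the parsed articles alongside any files that failed.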

Extracting bibliographic data

The Article class provides convenience methods for common lookups:

article.dtd_version   # => "1.1"
article.journal_title # => "Metrologia"
article.doi           # => "10.1088/0026-1394/52/1/155"

article.contributors
# => [#<Niso::Jats::Contrib>, #<Niso::Jats::Contrib>, ...]

article.affiliation("aff1")
# => [#<Niso::Jats::Aff>]

article.pub_dates
# => ["2015-02-01", "2015-01-30"]

article.locality
# => [["volume", "52"], ["issue", "1"], ["page", "155", "162"]]

article.doi_links
# => [{ content: "https://doi.org/10.1088/0026-1394/52/1/155", type: "src" },
#     { content: "https://doi.org/10.1088/0026-1394/52/1/155", type: "doi" }]
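The locality entries are plain arrays of the form [type, value, ...], so they are easy to post-process. As an illustration, this hypothetical helper (format_locality is not part of the gem) turns them into a compact citation string; it assumes the entry shapes shown above.

```ruby
# Hypothetical helper: format the [type, value, ...] entries returned by
# Article#locality as a compact citation string such as "52(1):155-162".
def format_locality(locality)
  # Index entries by type: { "volume" => ["52"], "page" => ["155", "162"], ... }
  parts = locality.to_h { |type, *values| [type, values] }
  out = +""
  out << parts["volume"].first.to_s if parts["volume"]
  out << "(#{parts['issue'].first})" if parts["issue"]
  if parts["page"]
    out << ":" unless out.empty?
    # A page entry holds a first page and, optionally, a last page.
    out << parts["page"].compact.join("-")
  end
  out
end
```

With the example above, format_locality(article.locality) would yield "52(1):155-162".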

Working with contributors

The Contrib class provides helpers that filter a contributor's cross-references by type:

contrib = article.contributors.first
contrib.contrib_type  # => "author"
contrib.aff_xrefs     # => [#<Niso::Jats::Xref ref_type="aff" rid="aff1">]
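Combining aff_xrefs with Article#affiliation resolves a contributor's affiliations. The sketch below is a hypothetical helper (affiliations_for is not part of the gem) and assumes Xref exposes its rid attribute through a #rid reader. In JATS an xref rid may list several space-separated IDs, so each ID is looked up separately.

```ruby
# Hypothetical helper: resolve a contributor's affiliation xrefs to <aff>
# elements, using Contrib#aff_xrefs and Article#affiliation.
# Assumption: Xref#rid returns the raw rid attribute value.
def affiliations_for(article, contrib)
  contrib.aff_xrefs
         .flat_map { |xref| xref.rid.to_s.split } # "aff1 aff2" -> ["aff1", "aff2"]
         .uniq
         .flat_map { |rid| article.affiliation(rid) } # each lookup returns an array
end
```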

Accessing nested elements

You can navigate the full JATS element tree:

# Journal metadata
article.front.journal_meta.journal_title_group.journal_title.content
# => "Metrologia"

# Article metadata
article.front..title_group.article_title.content
# => "News from the BIPM laboratories: 2014"

# Permissions (copyright, license)
perms = article.front..permissions
perms.copyright_statement.content # => "© 2015 BIPM & IOP Publishing Ltd"
perms.license.license_type        # => "iop-standard"

# History dates
article.front..history.date.each do |date|
  puts "#{date.date_type}: #{date.year}-#{date.month}-#{date.day}"
end
# => received: 2014-12-22
#    revised: 2014-12-22
#    accepted: 2014-12-22
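Most JATS elements are optional, so a long accessor chain can raise NoMethodError when one link is absent. A minimal sketch of a nil-safe walk, assuming the readers return nil for elements missing from the document; jats_dig is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: call a chain of reader methods in order and return nil
# as soon as one link is nil, instead of raising NoMethodError.
def jats_dig(node, *path)
  path.reduce(node) do |current, name|
    break nil if current.nil?
    current.public_send(name)
  end
end
```

For example, jats_dig(article, :front, :journal_meta, :journal_title_group, :journal_title, :content) follows the same path as the journal title example above.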

Generating a NISO JATS XML document

puts article.to_xml(declaration: true, encoding: "utf-8")
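To save the result instead of printing it, write the serialized string as UTF-8 so that the file matches the encoding in its XML declaration. A minimal sketch; write_jats is a hypothetical helper that only relies on the to_xml call shown above.

```ruby
# Hypothetical helper: serialize an article with the options shown above and
# write it to disk as UTF-8. Returns the path that was written.
def write_jats(article, path)
  xml = article.to_xml(declaration: true, encoding: "utf-8")
  File.write(path, xml, mode: "w:UTF-8")
  path
end
```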

Working with inline text and mixed content

Elements that contain both text and child elements (mixed content) are supported via the mixed_content feature. For example, affiliations often contain inline text with embedded <institution> or <country> elements:

aff = article.front..contrib_group.first.aff.first
aff.content
# => "\n1\nBureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, 92312 CEDEX, Sèvres, France\n"
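Mixed content keeps the newlines that surround inline children, as in the value above. For display you can collapse the whitespace; this sketch only transforms the returned string and does not change the parsed object. normalized_text is a hypothetical helper, not part of the gem.

```ruby
# Hypothetical helper: collapse newlines and runs of whitespace in
# mixed-content text into single spaces, for display purposes.
def normalized_text(text)
  text.to_s.split.join(" ")
end
```

Applied to the aff.content value above, this gives "1 Bureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, 92312 CEDEX, Sèvres, France".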

Test fixtures

Round-trip tests parse XML fixtures, serialize them back, and verify semantic equivalence using the canon gem.

Source fixtures

spec/fixtures/niso-jats/

A git submodule of the official NCBI NISO JATS repository (github.com/ncbi/niso-jats). It contains the official DTD distributions and sample XML documents (Smallsamples/) for all JATS tag sets (Archiving, Article Authoring, Publishing), covering versions 0.4 through 1.2d1. Versions 0.4 and 1.1d3 are skipped in the tests.
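Skipping versions amounts to filtering the fixture paths before generating examples. The sketch below is hypothetical: it assumes the version appears as its own directory segment (for example .../1.1d3/...), which may not match the submodule's real layout, and runnable_fixtures is not a name taken from the spec suite.

```ruby
# Hypothetical filter: drop fixtures whose path contains a skipped version
# as a directory segment. Assumes a ".../<version>/..." layout.
SKIPPED_JATS_VERSIONS = %w[0.4 1.1d3].freeze

def runnable_fixtures(paths)
  paths.reject do |path|
    path.split("/").any? { |segment| SKIPPED_JATS_VERSIONS.include?(segment) }
  end
end
```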

spec/fixtures/metrologia/

Real-world Metrologia journal articles (IOP Publishing) obtained from the github.com/relaton/rawdata-bipm repository. These exercise JATS patterns not covered by the NISO samples: <collab> for team authors, <email> inside <contrib>, mixed-content <aff> with inline <institution> and <country>, <author-notes>, <custom-meta-group>, <elocation-id>, and IOP-specific license attributes.

spec/fixtures/bmj_sample.xml

BMJ journal article (JATS 1.3) exercising <processing-meta>, <article-version>, CRediT <role> elements, and <institution-wrap> with ISNI/DOI identifiers.

spec/fixtures/pnas_sample.xml

PNAS journal article (JATS 1.3) exercising <processing-meta>, <article-version>, CRediT roles, <contrib contrib-type="software">, and <prefix> name parts.

spec/fixtures/element_citation.xml

Small fragment for testing <element-citation> with xlink attributes.

How the tests work

The shared example "a serializer" in spec/support/shared_examples.rb performs the round-trip:

  1. Parse the fixture with Article.from_xml(input)

  2. Serialize back with to_xml(declaration: true, encoding: "utf-8")

  3. Strip spurious empty <oasis:table/> elements (known OASIS table workaround)

  4. Compare input and output for semantic equivalence using be_xml_equivalent_to from the canon gem (with C14N preprocessing)
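Step 3 can be done with a plain string substitution before the comparison. The sketch below is a hypothetical version of that workaround, not the code in spec/support: it assumes the elements use the oasis: prefix and removes only self-closing (empty) oasis:table elements, with or without attributes.

```ruby
# Hypothetical version of step 3: remove empty, self-closing <oasis:table/>
# elements (optionally with attributes) from serialized XML.
# Non-empty <oasis:table>...</oasis:table> elements are left untouched.
EMPTY_OASIS_TABLE = %r{<oasis:table(?:\s[^>]*)?/>}

def strip_empty_oasis_tables(xml)
  xml.gsub(EMPTY_OASIS_TABLE, "")
end
```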

OASIS table fixtures (paths containing "oasis") use a weaker assertion: the output must re-parse without error but is not compared for equivalence, because the duplicate <table> mapping in TableWrap produces duplicated entry elements.
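The two assertion modes can be summarized as a small decision function. This is a hypothetical sketch of the policy, not the shared example itself: round_trip_ok? is an illustrative name, and the re-parse and equivalence checks are passed in as callables so that the sketch does not depend on RSpec, canon, or niso-jats.

```ruby
# Hypothetical sketch of the assertion policy: OASIS fixtures only need to
# re-parse, while all other fixtures must be semantically equivalent.
# reparse: callable that raises on invalid XML.
# equivalent: callable returning true when input and output match.
def round_trip_ok?(path, input, output, reparse:, equivalent:)
  if path.include?("oasis")
    begin
      reparse.call(output)
      true
    rescue StandardError
      false
    end
  else
    equivalent.call(input, output)
  end
end
```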

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/metanorma/niso-jats.

The gem is available as open source under the terms of the MIT License.

All rights reserved. Ribose