The niso-jats library is a parser and generator for
NISO JATS XML documents.
Supported versions
This library targets NISO JATS XML v1.2 (and v1.3) documents. It models the full Journal Publishing Tag Set element vocabulary. The model layer is version-agnostic: a single unified set of Ruby classes covers all JATS versions from 1.0 through 1.3.
Installation
Add this line to your application’s Gemfile:
gem 'niso-jats'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install niso-jats
Usage
Parsing a NISO JATS XML document
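A minimal sketch of loading a document, assuming the gem exposes `Niso::Jats::Article.from_xml` (the call used by the round-trip tests described under "How the tests work"). The `load_jats` helper and the `require` path in the comment are illustrative assumptions, not part of the library.

```ruby
# Hypothetical helper (not part of niso-jats): read a JATS file as UTF-8 and
# hand the XML string to any parser class that responds to .from_xml.
def load_jats(path, parser)
  xml = File.read(path, encoding: "UTF-8")
  parser.from_xml(xml)
end

# With the gem loaded, a call might look like this (require path is assumed):
#   require "niso-jats"
#   article = load_jats("article.xml", Niso::Jats::Article)
```

Passing the parser class as an argument keeps the sketch independent of the exact constant and require path.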
Extracting bibliographic data
The Article class provides convenience methods for common lookups:
article.dtd_version # => "1.1"
article.journal_title # => "Metrologia"
article.doi # => "10.1088/0026-1394/52/1/155"
article.contributors
# => [#<Niso::Jats::Contrib>, #<Niso::Jats::Contrib>, ...]
article.affiliation("aff1")
# => [#<Niso::Jats::Aff>]
article.pub_dates
# => ["2015-02-01", "2015-01-30"]
article.locality
# => [["volume", "52"], ["issue", "1"], ["page", "155", "162"]]
article.doi_links
# => [{ content: "https://doi.org/10.1088/0026-1394/52/1/155", type: "src" },
# { content: "https://doi.org/10.1088/0026-1394/52/1/155", type: "doi" }]
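The locality value above is a plain array of arrays, so it can be post-processed with ordinary Ruby. The following sketch, which uses a hypothetical `format_locality` helper that is not part of the gem, turns it into a short locator string.

```ruby
# Sample value copied from the locality example above.
locality = [["volume", "52"], ["issue", "1"], ["page", "155", "162"]]

# Hypothetical helper: each entry is [type, from] or [type, from, to].
def format_locality(locality)
  locality.map do |type, from, to|
    range = to ? "#{from}-#{to}" : from
    "#{type} #{range}"
  end.join(", ")
end

puts format_locality(locality)
# prints: volume 52, issue 1, page 155-162
```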
Working with contributors
The Contrib class provides filtering on cross-references:
contrib = article.contributors.first
contrib.contrib_type # => "author"
contrib.aff_xrefs # => [#<Niso::Jats::Xref ref_type="aff" rid="aff1">]
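As an illustration of how `contrib_type` and `aff_xrefs` might be combined, the sketch below collects the affiliation ids referenced by all authors. It uses `Struct` stand-ins rather than the real `Niso::Jats::Contrib` and `Niso::Jats::Xref` classes, and `affiliation_ids` is a hypothetical helper.

```ruby
# Stand-ins that mimic only the attributes shown above (assumption).
Xref = Struct.new(:ref_type, :rid)
Contrib = Struct.new(:contrib_type, :aff_xrefs)

contributors = [
  Contrib.new("author", [Xref.new("aff", "aff1")]),
  Contrib.new("author", [Xref.new("aff", "aff1"), Xref.new("aff", "aff2")]),
  Contrib.new("editor", []),
]

# Hypothetical helper: unique affiliation ids for contributors of one type.
def affiliation_ids(contributors, type: "author")
  contributors
    .select { |c| c.contrib_type == type }
    .flat_map { |c| c.aff_xrefs.map(&:rid) }
    .uniq
end

p affiliation_ids(contributors)
# prints: ["aff1", "aff2"]
```

Each id could then be passed to `article.affiliation` as shown earlier.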
Accessing nested elements
You can navigate the full JATS element tree:
# Journal metadata
article.front..journal_title_group.journal_title.content
# => "Metrologia"
# Article metadata
article.front..title_group.article_title.content
# => "News from the BIPM laboratories: 2014"
# Permissions (copyright, license)
perms = article.front..
perms.copyright_statement.content # => "© 2015 BIPM & IOP Publishing Ltd"
perms.license.license_type # => "iop-standard"
# History dates
article.front..history.date.each do |date|
  puts "#{date.date_type}: #{date.year}-#{date.month}-#{date.day}"
end
# Prints:
#   received: 2014-12-22
#   revised: 2014-12-22
#   accepted: 2014-12-22
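If real `Date` objects are more convenient than strings, the history entries can be converted. This is a sketch with a `Struct` stand-in. It assumes `year`, `month`, and `day` convert to numeric strings via `to_s`, as the interpolation in the history loop suggests. `history_dates` is a hypothetical helper.

```ruby
require "date"

# Stand-in for the gem's history date objects (assumption).
HistoryDate = Struct.new(:date_type, :year, :month, :day)

history = [
  HistoryDate.new("received", "2014", "12", "22"),
  HistoryDate.new("revised",  "2014", "12", "22"),
  HistoryDate.new("accepted", "2014", "12", "22"),
]

# Hypothetical helper: map each date_type to a Date instance.
def history_dates(dates)
  dates.to_h do |d|
    [d.date_type, Date.new(d.year.to_s.to_i, d.month.to_s.to_i, d.day.to_s.to_i)]
  end
end

puts history_dates(history)["received"].iso8601
# prints: 2014-12-22
```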
Generating a NISO JATS XML document
puts article.to_xml(declaration: true, encoding: "utf-8")
Working with inline text and mixed content
Elements that contain both text and child elements (mixed content) are
supported via the mixed_content feature. For example, affiliations
often contain inline text with embedded <institution> or <country>
elements:
aff = article.front..contrib_group.first.aff.first
aff.content
# => "\n1\nBureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, 92312 CEDEX, Sèvres, France\n"
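The returned text keeps the whitespace of the source XML. A small sketch of tidying it follows, assuming `content` is a single string as in the example above. `content_lines` is a hypothetical helper, not a gem method.

```ruby
# Sample value copied from the example above.
content = "\n1\nBureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, 92312 CEDEX, Sèvres, France\n"

# Hypothetical helper: split on newlines, trim, and drop blank lines.
def content_lines(content)
  content.to_s.lines.map(&:strip).reject(&:empty?)
end

label, address = content_lines(content)
puts label    # prints: 1
puts address
```

Here the first line is the affiliation label and the second is the address text.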
Test fixtures
Round-trip tests parse XML fixtures, serialize them back, and verify
semantic equivalence using the canon gem.
Source fixtures
spec/fixtures/niso-jats/
A git submodule of the official NCBI NISO JATS repository (github.com/ncbi/niso-jats). Contains official DTD distributions and sample XML documents (Smallsamples/) across all JATS tag sets (archiving, article authoring, publishing) and versions 0.4 through 1.2d1. Versions 0.4 and 1.1d3 are skipped in tests.

spec/fixtures/metrologia/
Real-world Metrologia journal articles (IOP Publishing) obtained from the github.com/relaton/rawdata-bipm repository. These exercise JATS patterns not covered by the NISO samples: <collab> for team authors, <email> inside <contrib>, mixed-content <aff> with inline <institution> and <country>, <author-notes>, <custom-meta-group>, <elocation-id>, and IOP-specific license attributes.

spec/fixtures/bmj_sample.xml
BMJ journal article (JATS 1.3) exercising <processing-meta>, <article-version>, CRediT <role> elements, and <institution-wrap> with ISNI/DOI identifiers.

spec/fixtures/pnas_sample.xml
PNAS journal article (JATS 1.3) exercising <processing-meta>, <article-version>, CRediT roles, <contrib contrib-type="software">, and <prefix> name parts.

spec/fixtures/element_citation.xml
Small fragment for testing <element-citation> with xlink attributes.
How the tests work
The shared example "a serializer" in spec/support/shared_examples.rb
performs the round-trip:
1. Parse the fixture with Article.from_xml(input).
2. Serialize back with to_xml(declaration: true, encoding: "utf-8").
3. Strip spurious empty <oasis:table/> elements (known OASIS table workaround).
4. Compare input and output for semantic equivalence using be_xml_equivalent_to from the canon gem (with C14N preprocessing).
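The first three steps can be sketched in plain Ruby. This is an illustration, not the actual shared example. The regex-based stripping is an assumption about how the workaround could be done, and `round_trip` and `strip_empty_oasis_tables` are hypothetical names. The parser is passed in, so any class with `from_xml` and a matching `to_xml` will work.

```ruby
# Matches <oasis:table/> and <oasis:table></oasis:table> with optional whitespace.
EMPTY_OASIS_TABLE = %r{<oasis:table\s*/>|<oasis:table\s*>\s*</oasis:table>}

# Step 3: remove spurious empty OASIS table elements (assumed regex approach).
def strip_empty_oasis_tables(xml)
  xml.gsub(EMPTY_OASIS_TABLE, "")
end

# Steps 1 to 3: parse, serialize, and clean the output.
def round_trip(input, parser)
  article = parser.from_xml(input)
  output = article.to_xml(declaration: true, encoding: "utf-8")
  strip_empty_oasis_tables(output)
end

# Step 4 would then run inside an RSpec example, roughly:
#   expect(round_trip(input, Niso::Jats::Article)).to be_xml_equivalent_to(input)
```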
OASIS table fixtures (paths containing "oasis") use a weaker assertion:
the output must be re-parseable without error, but is not compared
element-by-element, because the duplicate <table> mapping in
TableWrap causes duplicated entry elements.
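The path-based choice of assertion could be expressed as follows. This is only a sketch. `partition_fixtures` is a hypothetical helper, and the third path is made up for illustration.

```ruby
fixtures = [
  "spec/fixtures/bmj_sample.xml",
  "spec/fixtures/pnas_sample.xml",
  "spec/fixtures/hypothetical/oasis-table.xml", # hypothetical path
]

# Hypothetical helper: fixtures whose path contains "oasis" only need to
# re-parse without error; all others get the full equivalence check.
def partition_fixtures(paths)
  reparse_only, equivalent = paths.partition { |path| path.include?("oasis") }
  { reparse_only: reparse_only, xml_equivalent: equivalent }
end

groups = partition_fixtures(fixtures)
puts groups[:reparse_only].length   # prints: 1
puts groups[:xml_equivalent].length # prints: 2
```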
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/metanorma/niso-jats.
Copyright and license
The gem is available as open source under the terms of the MIT License.
Copyright Ribose. All rights reserved.