Module: Relaton::Bib::Sanitizer

Defined in:
lib/relaton/bib/sanitizer.rb

Overview

Strips inline markup not in the basicdoc PureTextElement set (plus <p>, <eref>, <xref>, <fn>) from raw marked-up content strings. Disallowed elements are unwrapped: tags removed, inner text kept.

<fn> is admitted beyond strict PureTextElement because bibliographic titles in real Metanorma input routinely carry footnotes (e.g. ISO standard titles with a disclaimer footnote), and downstream consumers, notably relaton-render’s own inline-tag allow-list, already accept <fn> as a legitimate child of <title>. Stripping it here would break the round-trip.

Constant Summary

ALLOWED =
%w[
  em strong sub sup tt underline strike smallcap br stem
  p eref xref fn
].freeze
RENAME =
{
  "italic" => "em",
}.freeze
TAG_RX =
%r{<[a-zA-Z/!?]}
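
TAG_RX reads as a cheap pre-check for "does this string contain anything tag-like", which would let plain-text titles skip XML parsing altogether. The sketch below illustrates what the pattern does and does not match. The helper name looks_like_markup? is hypothetical; the real sanitizable? is not shown in this listing and may apply further conditions.

```ruby
# Same pattern as the TAG_RX constant documented above.
TAG_RX = %r{<[a-zA-Z/!?]}

# Hypothetical helper (an assumption, not the library's sanitizable?):
# true only for strings where "<" is followed by a character that can
# start a tag, a closing tag, a comment/doctype, or a processing instruction.
def looks_like_markup?(content)
  content.is_a?(String) && content.match?(TAG_RX)
end

looks_like_markup?("Plain title")         # false: no "<" at all
looks_like_markup?("a < b")               # false: "<" followed by a space
looks_like_markup?("H<sub>2</sub>O")      # true: opening tag
looks_like_markup?("<!-- note -->")       # true: comment
looks_like_markup?(nil)                   # false: not a String
```

A bare "<" used as a comparison operator does not trip the check, so such strings would be returned untouched rather than handed to the parser.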

Class Method Summary

Class Method Details

.sanitize(content) ⇒ Object



# File 'lib/relaton/bib/sanitizer.rb', line 27

def self.sanitize(content)
  return content unless sanitizable?(content)

  fragment = Nokogiri::XML::DocumentFragment.parse(content)
  return content if fragment.errors.any?

  sanitize_children(fragment)
  fragment.children.map { |c| c.to_xml(encoding: "UTF-8") }.join
end