Module: Relaton::Bib::Sanitizer
- Defined in:
- lib/relaton/bib/sanitizer.rb
Overview
Strips inline markup not in the basicdoc PureTextElement set (plus <p>, <eref>, <xref>, <fn>) from raw marked-up content strings. Disallowed elements are unwrapped: tags removed, inner text kept.
<fn> is admitted beyond strict PureTextElement because bibliographic titles in real Metanorma input routinely carry footnotes (e.g. ISO standard titles with a disclaimer footnote), and downstream consumers, notably relaton-render’s own inline-tag allow-list, already accept <fn> as a legitimate child of <title>. Stripping it here would break the round-trip.
Constant Summary
- ALLOWED =
%w[ em strong sub sup tt underline strike smallcap br stem p eref xref fn ].freeze
- RENAME =
{ "italic" => "em", }.freeze
- TAG_RX =
%r{<[a-zA-Z/!?]}
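As a rough illustration of how these three constants can work together, the sketch below uses TAG_RX as a cheap pre-check and resolves a tag name through RENAME before testing it against ALLOWED. The helper names looks_like_markup? and resolve_tag are hypothetical and do not appear in the source; in particular, the body of the real sanitizable? check is not shown on this page, and the rename-before-allow-list order is an assumption.

```ruby
# Constants copied from the summary above.
ALLOWED = %w[em strong sub sup tt underline strike smallcap br stem p eref xref fn].freeze
RENAME = { "italic" => "em" }.freeze
TAG_RX = %r{<[a-zA-Z/!?]}

# Hypothetical stand-in for a pre-check: true only when the string contains
# "<" immediately followed by a letter, "/", "!" or "?", i.e. something that
# could open a tag, closing tag, comment or processing instruction.
def looks_like_markup?(content)
  content.is_a?(String) && content.match?(TAG_RX)
end

# Hypothetical helper: map a legacy name to its canonical form, then return
# the canonical name if it is allowed, or nil if the element should be unwrapped.
# Assumption: renaming happens before the allow-list check.
def resolve_tag(name)
  canonical = RENAME.fetch(name, name)
  ALLOWED.include?(canonical) ? canonical : nil
end

looks_like_markup?("Water <em>quality</em>") # markup present
looks_like_markup?("x < 3 and y > 2")        # bare "<" followed by a space, no markup
resolve_tag("italic")                        # renamed to "em", which is allowed
resolve_tag("span")                          # not allowed
```

A pre-check of this kind lets plain strings that merely contain a comparison sign skip XML parsing entirely.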
Class Method Summary
Class Method Details
.sanitize(content) ⇒ Object
# File 'lib/relaton/bib/sanitizer.rb', line 27
def self.sanitize(content)
  return content unless sanitizable?(content)

  fragment = Nokogiri::XML::DocumentFragment.parse(content)
  return content if fragment.errors.any?

  sanitize_children(fragment)
  fragment.children.map { |c| c.to_xml(encoding: "UTF-8") }.join
end