Class: Html2rss::Selectors::PostProcessors::SanitizeHtml

Inherits:
Base
  • Object
Defined in:
lib/html2rss/selectors/post_processors/sanitize_html.rb

Overview

Returns sanitized HTML code as String.

It sanitizes by using the [sanitize gem](github.com/rgrove/sanitize) with [Sanitize::Config::RELAXED](github.com/rgrove/sanitize#sanitizeconfigrelaxed).

Furthermore, it:

  • adds `rel="nofollow noopener noreferrer"` to <a> tags

  • adds `referrerpolicy="no-referrer"` to <img> tags

  • wraps every <img> tag whose direct parent is not an <a> in an <a> linking to the <img>'s `src`.

Imagine this HTML structure:

<section>
  Lorem <b>ipsum</b> dolor...
  <iframe src="https://evil.corp/miner"></iframe>
  <script>alert();</script>
</section>

YAML usage example:

selectors:
  description:
    selector: 'section'
    extractor: html
    post_process:
      name: sanitize_html

Would return:

'<p>Lorem <b>ipsum</b> dolor ...</p>'

Constant Summary

TAG_ATTRIBUTES =
{
  'a' => {
    'rel' => 'nofollow noopener noreferrer',
    'target' => '_blank'
  },

  'area' => {
    'rel' => 'nofollow noopener noreferrer',
    'target' => '_blank'
  },

  'img' => {
    'referrerpolicy' => 'no-referrer',
    'crossorigin' => 'anonymous',
    'loading' => 'lazy',
    'decoding' => 'async'
  },

  'iframe' => {
    'referrerpolicy' => 'no-referrer',
    'crossorigin' => 'anonymous',
    'loading' => 'lazy',
    'sandbox' => 'allow-same-origin',
    'src' => true,
    'width' => true,
    'height' => true
  },

  'video' => {
    'referrerpolicy' => 'no-referrer',
    'crossorigin' => 'anonymous',
    'preload' => 'none',
    'playsinline' => 'true',
    'controls' => 'true'
  },

  'audio' => {
    'referrerpolicy' => 'no-referrer',
    'crossorigin' => 'anonymous',
    'preload' => 'none'
  }
}.freeze

Instance Attribute Summary

Attributes inherited from Base

#context, #value

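Class Method Summary

  • .get(html, url) ⇒ String?

    Shorthand method to get the sanitized HTML.

  • .validate_args!(value, context) ⇒ void

Instance Method Summary

  • #get ⇒ String?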

Methods inherited from Base

assert_type, expect_options, #initialize

Constructor Details

This class inherits a constructor from Html2rss::Selectors::PostProcessors::Base

Class Method Details

.get(html, url) ⇒ String?

Shorthand method to get the sanitized HTML.

Parameters:

Returns:

  • (String, nil)


# File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 98

def self.get(html, url)
  return nil if String(html).empty?

  context = Selectors::Context.new(config: { channel: { url: } }, options: {})
  new(html, context).get
end
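
The guard at the top of `.get` relies on `Kernel#String`, which converts `nil` to an empty string, so both `nil` and `''` return `nil` before any sanitizing happens. A minimal sketch of just that guard follows; the helper name `blank_input?` is hypothetical and not part of html2rss.

```ruby
# Hypothetical helper mirroring the guard in .get; not part of html2rss.
def blank_input?(html)
  # Kernel#String converts nil to "", so nil is treated like an empty string.
  String(html).empty?
end

puts blank_input?(nil)          # nil is converted to "" first
puts blank_input?('')
puts blank_input?('<b>x</b>')
puts blank_input?(' ')          # whitespace-only input is not empty
```

Whitespace-only input passes this guard; it is left to `#get`, which strips the sanitized string and returns `nil` when nothing remains.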

.validate_args!(value, context) ⇒ void

This method returns an undefined value.

Parameters:

  • value (String)

    extracted selector value

  • context (Selectors::Context)

    post-processor context



# File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 89

def self.validate_args!(value, context)
  assert_type value, String, :value, context:
end
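
`assert_type` is inherited from Base and its body is not shown on this page. The sketch below is a hypothetical stand-in that only illustrates the kind of check `validate_args!` delegates to; the error class (`ArgumentError`) and the message format are assumptions, not the library's actual behavior.

```ruby
# Hypothetical stand-in for Base.assert_type. The real error class and
# message are not documented here; ArgumentError is an assumption.
def assert_type(value, type, name, context:)
  return if value.is_a?(type)

  raise ArgumentError,
        "#{name} must be a #{type}, got #{value.class} (context: #{context.inspect})"
end

# A String passes silently.
assert_type('<p>ok</p>', String, :value, context: nil)

# Anything else raises.
begin
  assert_type(42, String, :value, context: nil)
rescue ArgumentError => e
  puts e.message
end
```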

Instance Method Details

#get ⇒ String?

Returns:

  • (String, nil)


# File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 107

def get
  sanitized_html = Sanitize.fragment(value, sanitize_config).to_s
  sanitized_html.gsub!(/\s+/, ' ')
  sanitized_html.strip!
  sanitized_html.empty? ? nil : sanitized_html
end
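
The steps after the `Sanitize.fragment` call can be tried in isolation. This is a minimal sketch with the Sanitize call left out; `normalize_whitespace` is a hypothetical name used only for illustration.

```ruby
# Hypothetical stand-in for the post-sanitize steps in #get.
# The Sanitize.fragment call is intentionally omitted.
def normalize_whitespace(html)
  normalized = html.dup           # work on a copy so the caller's string is untouched
  normalized.gsub!(/\s+/, ' ')    # collapse each run of whitespace, newlines included, into one space
  normalized.strip!               # drop leading and trailing whitespace
  normalized.empty? ? nil : normalized
end

p normalize_whitespace("  <p>Lorem\n  <b>ipsum</b></p>\n")
p normalize_whitespace(" \n\t ")
```

The bang methods `gsub!` and `strip!` return `nil` when they change nothing, which is why they are called as separate statements rather than chained.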