Class: Html2rss::Selectors::PostProcessors::SanitizeHtml
- Defined in: lib/html2rss/selectors/post_processors/sanitize_html.rb
Overview
Returns sanitized HTML code as String.
It sanitizes by using the [sanitize gem](github.com/rgrove/sanitize) with [Sanitize::Config::RELAXED](github.com/rgrove/sanitize#sanitizeconfigrelaxed).
Furthermore, it:
- adds `rel="nofollow noopener noreferrer"` to <a> tags
- adds `referrerpolicy="no-referrer"` to <img> tags
- wraps every <img> tag whose direct parent is not an <a> in an <a> linking to the <img>'s `src`.
Imagine this HTML structure:
    <section>
      Lorem <b>ipsum</b> dolor...
      <iframe src="https://evil.corp/miner"></iframe>
      <script>alert();</script>
    </section>
YAML usage example:
    selectors:
      description:
        selector: '.section'
        extractor: html
        post_process:
          name: sanitize_html
Would return:
'<p>Lorem <b>ipsum</b> dolor ...</p>'
Constant Summary
- TAG_ATTRIBUTES =
    {
      'a' => { 'rel' => 'nofollow noopener noreferrer', 'target' => '_blank' },
      'area' => { 'rel' => 'nofollow noopener noreferrer', 'target' => '_blank' },
      'img' => {
        'referrerpolicy' => 'no-referrer',
        'crossorigin' => 'anonymous',
        'loading' => 'lazy',
        'decoding' => 'async'
      },
      'iframe' => {
        'referrerpolicy' => 'no-referrer',
        'crossorigin' => 'anonymous',
        'loading' => 'lazy',
        'sandbox' => 'allow-same-origin',
        'src' => true,
        'width' => true,
        'height' => true
      },
      'video' => {
        'referrerpolicy' => 'no-referrer',
        'crossorigin' => 'anonymous',
        'preload' => 'none',
        'playsinline' => 'true',
        'controls' => 'true'
      },
      'audio' => {
        'referrerpolicy' => 'no-referrer',
        'crossorigin' => 'anonymous',
        'preload' => 'none'
      }
    }.freeze
Instance Attribute Summary
Attributes inherited from Base
Class Method Summary
- .get(html, url) ⇒ String?
  Shorthand method to get the sanitized HTML.
- .validate_args!(value, context) ⇒ void
Instance Method Summary
- #get ⇒ String?
Methods inherited from Base
assert_type, expect_options, #initialize
Constructor Details
This class inherits a constructor from Html2rss::Selectors::PostProcessors::Base
Class Method Details
.get(html, url) ⇒ String?
Shorthand method to get the sanitized HTML.
    # File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 98
    def self.get(html, url)
      return nil if String(html).empty?

      context = Selectors::Context.new(config: { channel: { url: } }, options: {})
      new(html, context).get
    end
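The guard at the top of `.get` relies on `Kernel#String`, which converts `nil` to an empty string, so both `nil` and `''` return `nil` before any context is built. The following is a minimal sketch of that guard in isolation; `blank_input?` is a hypothetical name used only for illustration and is not part of Html2rss.

```ruby
# Hypothetical predicate isolating the guard used by .get.
def blank_input?(html)
  # Kernel#String turns nil into "", so nil and "" both count as blank.
  String(html).empty?
end

p blank_input?(nil)          # nil is treated as blank
p blank_input?('')           # so is the empty string
p blank_input?('<p>Hi</p>')  # real markup is not blank
p blank_input?('   ')        # whitespace-only input passes this guard
```

Whitespace-only input is not caught here; it is reduced to `nil` later, when `#get` strips the sanitized string and finds it empty.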
.validate_args!(value, context) ⇒ void
This method returns an undefined value.
    # File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 89
    def self.validate_args!(value, context)
      assert_type value, String, :value, context:
    end
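To illustrate the kind of check `validate_args!` delegates to, here is a minimal sketch of a type assertion. It is a hypothetical stand-in, not the inherited `assert_type`: the real helper's behaviour, error class, and use of `context` are not shown on this page and may differ.

```ruby
# Hypothetical stand-in for the inherited assert_type helper.
# Assumptions: it raises ArgumentError on a mismatch and ignores context.
def assert_type(value, types, name, context: nil)
  # Accept a single class or a list of classes.
  allowed = Array(types)
  return if allowed.any? { |type| value.is_a?(type) }

  raise ArgumentError, "#{name} must be #{allowed.join(' or ')}, got #{value.class}"
end

# A String passes silently.
assert_type('<p>ok</p>', String, :value)

# Anything else raises with a descriptive message.
begin
  assert_type(42, String, :value)
rescue ArgumentError => e
  puts e.message
end
```

Running the check before sanitizing means a misconfigured selector surfaces as an explicit error rather than as an obscure failure inside the sanitizer.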
Instance Method Details
#get ⇒ String?
    # File 'lib/html2rss/selectors/post_processors/sanitize_html.rb', line 107
    def get
      sanitized_html = Sanitize.fragment(value, sanitize_config).to_s
      sanitized_html.gsub!(/\s+/, ' ')
      sanitized_html.strip!
      sanitized_html.empty? ? nil : sanitized_html
    end
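The whitespace handling in `#get` can be tried on its own. The following is a minimal sketch rather than the library's code: `normalize_whitespace` is a hypothetical helper name, and it skips the `Sanitize.fragment` call so that it runs with the standard library alone.

```ruby
# Hypothetical helper mirroring the steps #get applies after sanitizing.
# Assumption: the input is already-sanitized HTML as a String.
def normalize_whitespace(sanitized_html)
  # Work on a copy so the caller's string is not mutated.
  html = sanitized_html.dup
  # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
  html.gsub!(/\s+/, ' ')
  # Remove leading and trailing whitespace.
  html.strip!
  # An empty result is reported as nil, matching the String? return type.
  html.empty? ? nil : html
end

p normalize_whitespace("<p>Lorem\n   <b>ipsum</b>\tdolor</p>\n")
p normalize_whitespace(" \n\t ")
```

The first call prints `"<p>Lorem <b>ipsum</b> dolor</p>"` and the second prints `nil`, which is why callers should expect `nil` when the input contains nothing but whitespace.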