Class: Browsable::HtmlExtractor

Inherits:
Object
Defined in:
lib/browsable/html_extractor.rb

Overview

Pure-Ruby parser for a rendered HTML response. Walks the document for asset references (`<link rel="stylesheet">`, `<script src>`) and inline CSS/JS blocks, then asks the configured AssetResolver to translate each external URL into an on-disk path.

This is the only HTML work the runtime middleware performs per request. No analysis happens here; that is the TestReport’s job at the end of the suite.
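This page does not document the AssetResolver interface. The following is a minimal sketch, assuming a resolver is any object that responds to a hypothetical `resolve(url)` and returns an on-disk path or `nil`. It shows how a static-file resolver might map same-origin URLs onto a public directory and treat everything else as external:

```ruby
require "uri"

# Hypothetical resolver: the class name and the #resolve(url) contract are
# assumptions for illustration, not Browsable's actual AssetResolver API.
class PublicDirResolver
  def initialize(public_root, host: nil)
    @public_root = File.expand_path(public_root)
    @host = host # URLs naming any other host are treated as external
  end

  # Returns an absolute path under public_root, or nil when the URL cannot
  # be mapped (other host, empty path, unparseable, or escaping via "..").
  def resolve(url)
    uri = URI.parse(url)
    return nil if uri.host && uri.host != @host
    return nil if uri.path.nil? || uri.path.empty?

    # Query strings and fragments are dropped because only uri.path is used.
    path = File.expand_path(File.join(@public_root, uri.path))
    path.start_with?(@public_root + File::SEPARATOR) ? path : nil
  rescue URI::InvalidURIError
    nil
  end
end
```

With `PublicDirResolver.new("/srv/public", host: "www.example.com")`, a cache-busted `/assets/app.css?v=1` maps to `/srv/public/assets/app.css`, while a CDN URL resolves to `nil`.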

Defined Under Namespace

Classes: AssetRef, Extraction, InlineBlock

Constant Summary

EMPTY =
Extraction.new(asset_paths: [], inline_blocks: []).freeze
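`Object#freeze` is shallow. Assuming `Extraction` is a `Struct`- or `Data`-like value (its definition is not shown on this page), freezing `EMPTY` blocks attribute reassignment but leaves the two arrays passed to it mutable. A caller that appended to the shared sentinel would then affect every later result that returns `EMPTY`. The following is a minimal sketch using a stand-in struct:

```ruby
# Stand-in for Browsable::HtmlExtractor::Extraction; the real class may differ.
Extraction = Struct.new(:asset_paths, :inline_blocks, keyword_init: true)

# Shallow: the struct is frozen, but the arrays inside it are not.
SHALLOW_EMPTY = Extraction.new(asset_paths: [], inline_blocks: []).freeze

# Deep: freeze the members before freezing the container.
DEEP_EMPTY = Extraction.new(asset_paths: [].freeze, inline_blocks: [].freeze).freeze

SHALLOW_EMPTY.frozen?             # => true
SHALLOW_EMPTY.asset_paths.frozen? # => false (appending would succeed)
DEEP_EMPTY.asset_paths.frozen?    # => true  (appending raises FrozenError)
```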

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Constructor Details

#initialize(html, asset_resolver: nil) ⇒ HtmlExtractor

Returns a new instance of HtmlExtractor.



# File 'lib/browsable/html_extractor.rb', line 35

def initialize(html, asset_resolver: nil)
  @html = html.to_s
  @asset_resolver = asset_resolver
end

Instance Attribute Details

#asset_resolver ⇒ Object (readonly)

Returns the value of attribute asset_resolver.



# File 'lib/browsable/html_extractor.rb', line 33

def asset_resolver
  @asset_resolver
end

#html ⇒ Object (readonly)

Returns the value of attribute html.



# File 'lib/browsable/html_extractor.rb', line 33

def html
  @html
end

Class Method Details

.extract(html, asset_resolver: nil) ⇒ Object

Convenience entry point used by the middleware, so that a single call replaces `new(…)` followed by `#run` at the call site.



# File 'lib/browsable/html_extractor.rb', line 42

def self.extract(html, asset_resolver: nil)
  new(html, asset_resolver: asset_resolver).run
end
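The pattern here is a class-level shortcut over a single-use instance. The following is a generic sketch. `WordCounter` is hypothetical and not part of Browsable. It mirrors the shape of `.extract`, including the `to_s` coercion the constructor applies to its input:

```ruby
class WordCounter
  # One-shot entry point, mirroring HtmlExtractor.extract: callers write
  # WordCounter.count(text) instead of WordCounter.new(text).run.
  def self.count(text)
    new(text).run
  end

  def initialize(text)
    @text = text.to_s # nil becomes "", as in HtmlExtractor#initialize
  end

  def run
    @text.split.size
  end
end

WordCounter.count("a b  c") # => 3
WordCounter.count(nil)      # => 0
```

Keeping the instance around for the duration of one call lets `#run` share state through instance variables and private helpers. Callers still see a single function-like call.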

Instance Method Details

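#run ⇒ Object

Parses #html with Nokogiri::HTML5 and returns an Extraction holding the asset paths and inline blocks found in the document. Returns the shared EMPTY extraction when the HTML is blank, or when any StandardError is raised during parsing or extraction.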



# File 'lib/browsable/html_extractor.rb', line 46

def run
  return EMPTY if html.strip.empty?

  doc = Nokogiri::HTML5.parse(html)
  Extraction.new(
    asset_paths: extract_assets(doc),
    inline_blocks: extract_inline_blocks(doc)
  )
rescue StandardError
  EMPTY
end
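`run` has two paths to the shared `EMPTY` result. Blank input short-circuits before parsing. The method-level `rescue StandardError` turns any parse or extraction failure into `EMPTY` instead of failing the request. Exceptions outside `StandardError`, such as `Interrupt` or `NoMemoryError`, still propagate. The following stdlib-only sketch reproduces the same control flow. It replaces the Nokogiri parse and the private `extract_*` helpers (which this page does not show) with an injected, hypothetical `parser` callable:

```ruby
# Stand-ins for Extraction and EMPTY; names are hypothetical.
Result = Struct.new(:asset_paths, :inline_blocks, keyword_init: true)
EMPTY_RESULT = Result.new(asset_paths: [].freeze, inline_blocks: [].freeze).freeze

class SketchExtractor
  def initialize(html, parser:)
    @html = html.to_s
    @parser = parser # returns [asset_paths, inline_blocks]
  end

  def run
    return EMPTY_RESULT if @html.strip.empty? # blank or nil input: skip parsing

    assets, blocks = @parser.call(@html)
    Result.new(asset_paths: assets, inline_blocks: blocks)
  rescue StandardError
    EMPTY_RESULT # swallow parse/extraction errors instead of failing the request
  end
end

ok   = ->(_html) { [["/srv/public/app.css"], []] }
boom = ->(_html) { raise ArgumentError, "malformed" }

SketchExtractor.new("  \n", parser: boom).run.equal?(EMPTY_RESULT) # => true (parser never called)
SketchExtractor.new("<p>", parser: boom).run.equal?(EMPTY_RESULT)  # => true (error swallowed)
SketchExtractor.new("<p>", parser: ok).run.asset_paths             # => ["/srv/public/app.css"]
```

The trade-off is that a bug in the extraction helpers surfaces as an empty extraction rather than as an exception.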