Class: Coelacanth::Extractor::EyecatchImageExtractor
- Inherits: Object
  - Object
    - Coelacanth::Extractor::EyecatchImageExtractor
- Defined in: lib/coelacanth/extractor/eyecatch_image_extractor.rb
Overview
Finds and downloads the representative image for a document.
Defined Under Namespace
Classes: Result
Constant Summary
- POSITIVE_KEYWORDS =
%w[eyecatch hero main featured cover headline banner article primary lead].freeze
- NEGATIVE_KEYWORDS =
%w[avatar icon logo emoji badge button profile author comment footer nav thumbnail thumb ad sponsor].freeze
- METADATA_SOURCES =
[
  { selector: "meta[property='og:image:secure_url']", attribute: "content", score: 140 },
  { selector: "meta[property='og:image:url']", attribute: "content", score: 135 },
  { selector: "meta[property='og:image']", attribute: "content", score: 130 },
  { selector: "meta[name='twitter:image:src']", attribute: "content", score: 125 },
  { selector: "meta[name='twitter:image']", attribute: "content", score: 120 },
  { selector: "meta[itemprop='image']", attribute: "content", score: 110 },
  { selector: "meta[name='thumbnail']", attribute: "content", score: 100 },
  { selector: "link[rel='image_src']", attribute: "href", score: 95 }
].freeze
- JSON_LD_IMAGE_KEYS =
%w[image imageUrl imageURL thumbnail thumbnailUrl thumbnailURL contentUrl contentURL].freeze
- LAZY_SOURCE_ATTRIBUTES =
%w[data-src data-original data-lazy-src data-lazy data-url data-image data-preview src].freeze
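The private methods that consume these constants are not shown on this page, so the following is only a minimal sketch of how they might be used. The module name `EyecatchSketch`, the three helper methods, the plain-Hash inputs, and the +10 / -15 keyword weights are all assumptions made for illustration. Only the constant values are taken from the summary above, and `METADATA_SOURCES` is abbreviated to three entries.

```ruby
# Hypothetical sketch; not the library's implementation.
module EyecatchSketch
  # Values copied from the constant summary above.
  POSITIVE_KEYWORDS = %w[eyecatch hero main featured cover headline banner article primary lead].freeze
  NEGATIVE_KEYWORDS = %w[avatar icon logo emoji badge button profile author comment footer nav thumbnail thumb ad sponsor].freeze
  LAZY_SOURCE_ATTRIBUTES = %w[data-src data-original data-lazy-src data-lazy data-url data-image data-preview src].freeze

  # Abbreviated subset of METADATA_SOURCES, kept short for the example.
  METADATA_SOURCES = [
    { selector: "meta[property='og:image']", attribute: "content", score: 130 },
    { selector: "meta[name='twitter:image']", attribute: "content", score: 120 },
    { selector: "link[rel='image_src']", attribute: "href", score: 95 }
  ].freeze

  module_function

  # Assumption: +found+ maps a selector string to the value read from the page.
  # Returns the value belonging to the highest-scored source that is present.
  def best_metadata_url(found)
    source = METADATA_SOURCES.select { |s| found[s[:selector]] }.max_by { |s| s[:score] }
    source && found[source[:selector]]
  end

  # Assumption: +attributes+ is a Hash of attribute name => value for one <img>,
  # and the attributes are checked in the order the constant lists them.
  def lazy_source(attributes)
    LAZY_SOURCE_ATTRIBUTES.each do |name|
      value = attributes[name].to_s.strip
      return value unless value.empty?
    end
    nil
  end

  # Assumption: each distinct positive token adds 10, each negative one subtracts 15.
  # Matching whole tokens avoids "ad" matching inside words such as "lead".
  def keyword_score(text)
    tokens = text.to_s.downcase.split(/[^a-z0-9]+/)
    (tokens & POSITIVE_KEYWORDS).size * 10 - (tokens & NEGATIVE_KEYWORDS).size * 15
  end
end
```

Under these assumptions, a page exposing both `og:image` and `image_src` would resolve to the `og:image` value, and an `<img>` carrying both `data-src` and a placeholder `src` would resolve to `data-src`, since `src` comes last in `LAZY_SOURCE_ATTRIBUTES`.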
Instance Method Summary
- #call(doc:, base_url: nil) ⇒ Object
- #initialize(http_client: Coelacanth::HTTP) ⇒ EyecatchImageExtractor (constructor)
  A new instance of EyecatchImageExtractor.
Constructor Details
#initialize(http_client: Coelacanth::HTTP) ⇒ EyecatchImageExtractor
Returns a new instance of EyecatchImageExtractor.
# File 'lib/coelacanth/extractor/eyecatch_image_extractor.rb', line 35
def initialize(http_client: Coelacanth::HTTP)
  @http_client = http_client
end
Instance Method Details
#call(doc:, base_url: nil) ⇒ Object
# File 'lib/coelacanth/extractor/eyecatch_image_extractor.rb', line 39
def call(doc:, base_url: nil)
  return unless doc

  image_url = locate_image_url(doc, base_url)
  return unless image_url

  download(image_url)
end
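The guard clauses in `#call` mean it returns `nil` both when no document is given and when no image URL can be located. The sketch below exercises that control flow with a stand-in class, since the real `locate_image_url` and `download` are private and not shown on this page. `SketchExtractor`, `FakeClient`, the Hash-shaped `doc`, and the `get(url)` client method are all hypothetical, and the real `Coelacanth::HTTP` interface may differ.

```ruby
require "uri"

# Stand-in that mirrors only the control flow shown above; NOT the library class.
class SketchExtractor
  def initialize(http_client:)
    @http_client = http_client
  end

  def call(doc:, base_url: nil)
    return unless doc

    image_url = locate_image_url(doc, base_url)
    return unless image_url

    download(image_url)
  end

  private

  # Assumption: doc is a plain Hash here; the real method inspects a parsed document.
  def locate_image_url(doc, base_url)
    path = doc[:image]
    return unless path

    base_url ? URI.join(base_url, path).to_s : path
  end

  # Assumption: the injected client responds to #get(url).
  def download(url)
    @http_client.get(url)
  end
end

# Hypothetical client injected through the http_client: keyword, so no network is used.
class FakeClient
  def get(url)
    "bytes from #{url}"
  end
end

extractor = SketchExtractor.new(http_client: FakeClient.new)
no_doc   = extractor.call(doc: nil)  # first guard: no document
no_image = extractor.call(doc: {})   # second guard: nothing located
fetched  = extractor.call(doc: { image: "/img/hero.jpg" }, base_url: "https://example.com/post/1")
```

Injecting the client through the constructor, as the real `#initialize` allows, makes it possible to test the extraction path without real HTTP requests.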