Class: SourceMonitor::Items::BatchItemCreator

Inherits:
Object
Defined in:
lib/source_monitor/items/batch_item_creator.rb

Overview

Builds a pre-fetched lookup index of existing items for a batch of entries.

Instead of N individual SELECT queries (one per feed entry) to check for existing items, this class:

1. Pre-parses all entries to collect GUIDs and content fingerprints
2. Runs a single WHERE guid IN (...) query to find existing items by GUID, using only entries that report a raw GUID
3. Runs a single WHERE content_fingerprint IN (...) query for the entries not matched by GUID
4. Returns an index hash that ItemCreator can use to skip per-entry SELECTs

The actual item creation/update is still done by ItemCreator.call, which accepts the index via the existing_items_index parameter.
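
The steps above can be sketched in plain Ruby, with an in-memory array standing in for the database. This is a minimal illustration under stated assumptions, not the library's implementation: `Item`, `Entry`, and `build_index_sketch` are hypothetical names, and each `select` call stands in for one IN (...) query.

```ruby
# Hypothetical stand-ins for stored items and parsed feed entries.
Item  = Struct.new(:guid, :content_fingerprint)
Entry = Struct.new(:guid, :fingerprint)

# Sketch: two batched lookups instead of one lookup per entry.
def build_index_sketch(stored_items, entries)
  return { by_guid: {}, by_fingerprint: {} } if entries.empty?

  # Pass 1: one "guid IN (...)" lookup for every entry that has a GUID.
  guids = entries.filter_map(&:guid).uniq
  by_guid = stored_items
    .select { |item| guids.include?(item.guid) }
    .to_h { |item| [item.guid, item] }

  # Pass 2: one "content_fingerprint IN (...)" lookup for entries
  # that were not already matched by GUID.
  fingerprints = entries.filter_map do |entry|
    next if entry.guid && by_guid.key?(entry.guid)

    entry.fingerprint unless entry.fingerprint.to_s.empty?
  end.uniq
  by_fingerprint = stored_items
    .select { |item| fingerprints.include?(item.content_fingerprint) }
    .to_h { |item| [item.content_fingerprint, item] }

  { by_guid: by_guid, by_fingerprint: by_fingerprint }
end

stored = [Item.new("a-1", "fp-a"), Item.new(nil, "fp-b")]
feed   = [Entry.new("a-1", "fp-a"), Entry.new(nil, "fp-b"), Entry.new("c-3", "fp-c")]
index  = build_index_sketch(stored, feed)
```

In this sketch the first entry is found by GUID, the second by fingerprint, and the third appears in neither map, so a caller would treat it as new. The number of lookups stays at two regardless of how many entries the batch contains.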

Constructor Details

#initialize(source:, entries:) ⇒ BatchItemCreator

Returns a new instance of BatchItemCreator.



# File 'lib/source_monitor/items/batch_item_creator.rb', line 26

def initialize(source:, entries:)
  @source = source
  @entries = Array(entries)
end
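
The constructor wraps entries in Kernel#Array, so a nil or single-object argument is normalized before build_index runs. A short illustration of that standard Ruby behavior follows; the sample values are made up for the example.

```ruby
# Kernel#Array normalizes its argument to an Array.
from_nil    = Array(nil)            # nil becomes an empty array
from_array  = Array(["e1", "e2"])   # arrays are returned as-is
from_single = Array("e1")           # a lone object is wrapped

# Caveat: a Hash is converted to key/value pairs, not wrapped.
from_hash = Array({ id: 1 })
```

With nil entries, @entries is therefore empty, and build_index returns the empty index through its early-return guard without issuing a query.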

Class Method Details

.build_index(source:, entries:) ⇒ Object

Builds a lookup index from a batch of feed entries. Returns a Hash with :by_guid and :by_fingerprint keys.



# File 'lib/source_monitor/items/batch_item_creator.rb', line 22

def self.build_index(source:, entries:)
  new(source: source, entries: entries).build_index
end
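
To show how the returned Hash might be consumed, here is a small sketch. `lookup_existing` is a hypothetical helper written for this example, and the symbol values stand in for item records; ItemCreator's actual lookup logic is not shown on this page.

```ruby
# Hypothetical consumer of the index shape returned by build_index.
# Checks the GUID map first, then falls back to the fingerprint map.
def lookup_existing(index, guid:, fingerprint:)
  index[:by_guid][guid] || index[:by_fingerprint][fingerprint]
end

index = {
  by_guid: { "a-1" => :item_a },
  by_fingerprint: { "fp-b" => :item_b }
}

via_guid        = lookup_existing(index, guid: "a-1", fingerprint: "fp-x")
via_fingerprint = lookup_existing(index, guid: nil, fingerprint: "fp-b")
not_found       = lookup_existing(index, guid: "z-9", fingerprint: "fp-z")
```

A nil result means neither map holds a match, so the entry can be treated as new. In the library itself, the index is handed to ItemCreator.call through the existing_items_index parameter.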

Instance Method Details

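#build_index ⇒ Object

Builds the lookup index for the entries given to the constructor. Returns { by_guid: {}, by_fingerprint: {} } without querying when there are no entries.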



# File 'lib/source_monitor/items/batch_item_creator.rb', line 31

def build_index
  return { by_guid: {}, by_fingerprint: {} } if @entries.empty?

  # Step 1: Pre-parse entries to extract GUIDs and fingerprints for bulk lookup.
  entry_identifiers = @entries.map do |entry|
    normalized_entry = NormalizedEntry.new(
      source: @source,
      entry: entry,
      content_extractor: content_extractor
    )

    {
      guid: normalized_entry.item_guid,
      fingerprint: normalized_entry.content_fingerprint,
      raw_guid_present: normalized_entry.raw_guid_present?
    }
  end

  # Step 2: Batch-fetch existing items by GUID (single query)
  guids = entry_identifiers
    .select { |ei| ei[:raw_guid_present] }
    .filter_map { |ei| ei[:guid] }
    .uniq

  existing_by_guid = if guids.any?
    @source.all_items.where(guid: guids).index_by(&:guid)
  else
    {}
  end

  # Step 3: For entries without a GUID match, batch-fetch by fingerprint
  unmatched_fingerprints = entry_identifiers.filter_map do |ei|
    guid = ei[:guid]
    next if ei[:raw_guid_present] && existing_by_guid.key?(guid)

    ei[:fingerprint].presence
  end.uniq

  existing_by_fingerprint = if unmatched_fingerprints.any?
    @source.all_items
      .where(content_fingerprint: unmatched_fingerprints)
      .index_by(&:content_fingerprint)
  else
    {}
  end

  { by_guid: existing_by_guid, by_fingerprint: existing_by_fingerprint }
end