Class: Woods::Unblocked::Exporter

Inherits:
Object
Defined in:
lib/woods/unblocked/exporter.rb

Overview

Orchestrates syncing Woods extraction data to an Unblocked collection.

Reads extraction output from disk via IndexReader, converts units to condensed Markdown documents, and pushes them through the Unblocked Documents API. Syncs are incremental: a SyncManifest records the content hash and remote document_id of everything last pushed, so each run PUTs only new or changed documents, skips unchanged ones, and deletes documents whose source unit has disappeared. Documents are upserted by URI, so a missing manifest (first run or CI cache miss) degrades to a correct full sync.
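The incremental decision described above can be sketched roughly as follows. This is an illustration only, not the library's code: plan_sync, the manifest layout (URI => { hash:, document_id: }), the woods:// URI scheme, and the choice of SHA-256 as the content hash are all assumptions made for the example.

```ruby
require 'digest'

# Hypothetical helper, not part of Woods: decides what a sync run would do.
# `manifest` maps URI => { hash:, document_id: } as recorded by the last push;
# `documents` maps URI => Markdown body built from the current index.
def plan_sync(manifest, documents)
  plan = { put: [], skip: [], delete: [] }

  documents.each do |uri, body|
    digest = Digest::SHA256.hexdigest(body) # assumed hash function
    previous = manifest[uri]
    if previous && previous[:hash] == digest
      plan[:skip] << uri   # unchanged since the last push
    else
      plan[:put] << uri    # new or changed
    end
  end

  # Anything in the manifest with no current document has lost its source unit.
  plan[:delete] = manifest.keys - documents.keys
  plan
end

manifest = {
  'woods://model/User'  => { hash: Digest::SHA256.hexdigest('# User'), document_id: 'doc-1' },
  'woods://model/Order' => { hash: Digest::SHA256.hexdigest('# Order v1'), document_id: 'doc-2' },
  'woods://model/Gone'  => { hash: Digest::SHA256.hexdigest('# Gone'), document_id: 'doc-3' }
}
documents = {
  'woods://model/User'  => '# User',
  'woods://model/Order' => '# Order v2',
  'woods://model/New'   => '# New'
}

plan = plan_sync(manifest, documents)
plan[:put]    # => ["woods://model/Order", "woods://model/New"]
plan[:skip]   # => ["woods://model/User"]
plan[:delete] # => ["woods://model/Gone"]
```

With an empty manifest every document lands in :put and nothing in :delete, which matches the "degrades to a correct full sync" behaviour as long as the remote upserts by URI.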

Examples:

exporter = Exporter.new(index_dir: "tmp/woods")
stats = exporter.sync_all
# => { synced: 12, skipped: 928, deleted: 1, errors: [] }

Constant Summary

MAX_ERRORS =
100
PURGE_GUARD_FRACTION =

Mass-deletion guard: refuse to purge when more than this fraction of a manifest of at least PURGE_GUARD_MIN_DOCS entries would be deleted, which is the signature of a sync run against a partial index. Override with force_purge.

0.30
PURGE_GUARD_MIN_DOCS =
10
FULL_SYNC_TYPES =

Unit types to sync, in priority order. All units are synced for these types.

%w[
  model controller service job mailer manager decorator concern serializer
  graphql graphql_type graphql_mutation graphql_resolver graphql_query
].freeze
PARTIAL_SYNC_TYPES =

Unit types where only the most-connected units are synced. Each entry: [type, max_count]

[
  ['poro', 100],
  ['lib', 50]
].freeze
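To make the mass-deletion guard described under PURGE_GUARD_FRACTION concrete, here is a minimal sketch of the condition as the description reads. The method name purge_blocked? and its keyword arguments are hypothetical; the real check lives in code not shown on this page and may differ in detail (for example in how the boundary case is treated).

```ruby
PURGE_GUARD_FRACTION = 0.30
PURGE_GUARD_MIN_DOCS = 10

# Hypothetical predicate, not the library's method: true when a purge
# should be refused under the documented guard.
def purge_blocked?(manifest_size:, stale_count:, force_purge: false)
  return false if force_purge                          # explicit override
  return false if manifest_size < PURGE_GUARD_MIN_DOCS # small manifests are exempt

  # "More than this fraction" is read here as a strict comparison.
  stale_count.to_f / manifest_size > PURGE_GUARD_FRACTION
end

purge_blocked?(manifest_size: 10, stale_count: 3)                    # => false
purge_blocked?(manifest_size: 10, stale_count: 4)                    # => true
purge_blocked?(manifest_size: 9,  stale_count: 9)                    # => false
purge_blocked?(manifest_size: 10, stale_count: 4, force_purge: true) # => false
```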

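Instance Method Summary

  • #initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ Exporter

    Returns a new instance of Exporter.

  • #sync_all ⇒ Hash

    Sync all configured unit types to the Unblocked collection.

  • #sync_type(type) ⇒ Hash

    Sync all units of a given type.

  • #sync_type_partial(type, max_count) ⇒ Hash

    Sync the top N most-connected units of a type (by dependent count).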

Constructor Details

#initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ Exporter

Returns a new instance of Exporter.

Parameters:

  • index_dir (String)

    Path to extraction output directory

  • config (Configuration) (defaults to: Woods.configuration)

    Woods configuration (default: global config)

  • client (Client, nil) (defaults to: nil)

    Unblocked API client (auto-created from config if nil)

  • reader (Object, nil) (defaults to: nil)

    IndexReader instance (auto-created if nil)

  • manifest (SyncManifest, nil) (defaults to: nil)

    Sync manifest (auto-created under index_dir if nil)

  • force_full (Boolean) (defaults to: false)

    Re-push every unit, ignoring the unchanged check

  • force_purge (Boolean) (defaults to: false)

    Bypass the mass-deletion guard

  • output (IO) (defaults to: $stdout)

    Progress output stream (default: $stdout)

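Raises:

  • (ConfigurationError)

    if unblocked_collection_id, unblocked_repo_url, or unblocked_api_token is missing from the configuration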



# File 'lib/woods/unblocked/exporter.rb', line 62

def initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil,
               manifest: nil, force_full: false, force_purge: false, output: $stdout)
  @collection_id = config.unblocked_collection_id
  raise ConfigurationError, 'unblocked_collection_id is required' unless @collection_id

  repo_url = config.unblocked_repo_url
  raise ConfigurationError, 'unblocked_repo_url is required' unless repo_url

  api_token = config.unblocked_api_token
  raise ConfigurationError, 'unblocked_api_token is required' unless api_token

  budget = ENV.fetch('UNBLOCKED_DAILY_BUDGET', RateLimiter::DEFAULT_BUDGET.to_s).to_i
  limiter = RateLimiter.new(daily_budget: budget)

  @client = client || Client.new(api_token: api_token, rate_limiter: limiter)
  @reader = reader || build_reader(index_dir)
  @builder = DocumentBuilder.new(repo_url: repo_url)
  @manifest = manifest || build_manifest(index_dir)
  @force_full = force_full
  @force_purge = force_purge
  @output = output
  # Initialized here as well as in sync_all so the public sync_type /
  # sync_type_partial methods work standalone (track_uri needs them).
  @current_uris = Set.new
  @budget_exhausted = false
  # base URI => identifier that keeps the bare URI (only populated for
  # URIs shared by >1 unit). Rebuilt per sync_all run.
  @uri_primary = {}
end
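The constructor reads the daily request budget from the UNBLOCKED_DAILY_BUDGET environment variable. The sketch below isolates that parsing step to show how it behaves for missing and malformed values. ASSUMED_DEFAULT_BUDGET is a hypothetical stand-in, since the actual value of RateLimiter::DEFAULT_BUDGET is not shown here, and daily_budget is an illustrative helper rather than a library method.

```ruby
# Hypothetical stand-in for RateLimiter::DEFAULT_BUDGET (real value not shown).
ASSUMED_DEFAULT_BUDGET = 1_000

# Mirrors the ENV.fetch(...).to_i expression in #initialize; `env` is
# injectable so the behaviour can be tried without touching the real ENV.
def daily_budget(env = ENV)
  env.fetch('UNBLOCKED_DAILY_BUDGET', ASSUMED_DEFAULT_BUDGET.to_s).to_i
end

daily_budget({})                                      # => 1000 (assumed default)
daily_budget({ 'UNBLOCKED_DAILY_BUDGET' => '250' })   # => 250
daily_budget({ 'UNBLOCKED_DAILY_BUDGET' => 'lots' })  # => 0, String#to_i does not raise
```

Because String#to_i returns 0 for non-numeric input instead of raising, a mistyped variable silently produces a budget of 0; Integer(value) would be the stricter alternative if failing fast is preferred.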

Instance Method Details

#sync_all ⇒ Hash

Sync all configured unit types to the Unblocked collection.

Returns:

  • (Hash)

    { synced:, skipped:, deleted:, errors: }



# File 'lib/woods/unblocked/exporter.rb', line 95

def sync_all
  @current_uris = Set.new
  @budget_exhausted = false
  build_uri_index
  reconcile_from_remote if @manifest.empty?

  synced = 0
  skipped = 0
  errors = []

  FULL_SYNC_TYPES.each do |type|
    break if @budget_exhausted

    result = sync_type(type)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  PARTIAL_SYNC_TYPES.each do |type, max_count|
    break if @budget_exhausted

    result = sync_type_partial(type, max_count)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  deleted = @budget_exhausted ? 0 : purge_stale(errors)
  { synced: synced, skipped: skipped, deleted: deleted, errors: cap_errors(errors) }
ensure
  save_manifest
end
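A caller such as a CI task might turn the stats hash returned by sync_all into a one-line summary and a pass/fail flag. The helper below is a hypothetical sketch built only on the documented hash shape ({ synced:, skipped:, deleted:, errors: }); summarize_sync is not part of Woods.

```ruby
# Hypothetical helper for a CI task: formats the stats hash and reports
# whether the run finished without errors.
def summarize_sync(stats)
  line = format('synced=%d skipped=%d deleted=%d errors=%d',
                stats[:synced], stats[:skipped], stats[:deleted], stats[:errors].size)
  [line, stats[:errors].empty?]
end

line, ok = summarize_sync({ synced: 12, skipped: 928, deleted: 1, errors: [] })
puts line # prints "synced=12 skipped=928 deleted=1 errors=0"
warn 'sync finished with errors' unless ok
```

Note that the errors array passes through cap_errors before it is returned, so its size may understate the number of failures on a very bad run, and deleted is 0 whenever the budget ran out before the purge step.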

#sync_type(type) ⇒ Hash

Sync all units of a given type.

Parameters:

  • type (String)

    Unit type (e.g. “model”, “controller”)

Returns:

  • (Hash)

    { synced:, skipped:, errors: }



# File 'lib/woods/unblocked/exporter.rb', line 133

def sync_type(type)
  units = @reader.list_units(type: type)
  log "  #{type}: #{units.size} units"

  sync_units(units)
end

#sync_type_partial(type, max_count) ⇒ Hash

Sync the top N most-connected units of a type (by dependent count).

Parameters:

  • type (String)

    Unit type

  • max_count (Integer)

    Maximum units to sync

Returns:

  • (Hash)

    { synced:, skipped:, errors: }



# File 'lib/woods/unblocked/exporter.rb', line 145

def sync_type_partial(type, max_count)
  units = @reader.list_units(type: type)
  return empty_stats if units.empty?

  # Load full data to sort by dependent count
  units_with_data = units.filter_map do |entry|
    data = @reader.find_unit(entry['identifier'])
    next unless data

    dep_count = (data['dependents'] || []).size
    { entry: entry, data: data, dep_count: dep_count }
  end

  # Every unit of this type still exists — track its URI so partial units
  # that fall *out* of the top-N are never mistaken for deletions.
  units_with_data.each { |u| track_uri(u[:data]) }

  top_units = units_with_data.sort_by { |u| -u[:dep_count] }.first(max_count)
  # Count against what was actually synced — units.size includes entries
  # whose unit data was missing (dropped by the filter_map above).
  skipped_count = units.size - top_units.size

  log "  #{type}: #{top_units.size}/#{units.size} units (top by dependents)"

  result = sync_unit_data(top_units.map { |u| [u[:entry], u[:data]] })
  result[:skipped] += skipped_count
  result
end
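The selection and skip accounting in sync_type_partial can be tried in isolation with plain hashes. The sketch below is a simplified mirror of that step, not the method itself: select_top is a hypothetical name, and lookup stands in for @reader.find_unit as an identifier => unit data hash.

```ruby
# Hypothetical, simplified mirror of the top-N step in sync_type_partial.
def select_top(entries, lookup, max_count)
  with_data = entries.filter_map do |entry|
    data = lookup[entry['identifier']]
    next unless data # index entry without unit data is dropped

    { entry: entry, dep_count: (data['dependents'] || []).size }
  end

  top = with_data.sort_by { |u| -u[:dep_count] }.first(max_count)
  # Everything not pushed counts as skipped, including dropped entries.
  { top: top.map { |u| u[:entry]['identifier'] }, skipped: entries.size - top.size }
end

entries = %w[A B C D E].map { |id| { 'identifier' => id } }
lookup = {
  'A' => { 'dependents' => %w[x y z] },
  'B' => { 'dependents' => %w[x] },
  'C' => {},                          # no dependents key, counts as 0
  'E' => { 'dependents' => %w[x y] }  # 'D' is absent: its data is missing
}

select_top(entries, lookup, 2) # => { top: ["A", "E"], skipped: 3 }
```

Ruby's sort_by is not guaranteed to be stable, so units with equal dependent counts may come out in either order; which of two tied units makes the cut at the max_count boundary can therefore vary between runs.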