Class: Woods::Unblocked::Exporter

Inherits:

Object

Defined in: lib/woods/unblocked/exporter.rb

Overview

Orchestrates syncing Woods extraction data to an Unblocked collection.

Reads extraction output from disk via IndexReader, converts units to condensed Markdown documents, and pushes them to the collection via the Unblocked Documents API. Syncs are incremental: a SyncManifest records the content hash and remote document_id of every document last pushed, so each run PUTs only new or changed documents, skips unchanged ones, and deletes documents whose source unit has disappeared. Documents are upserted by URI, so a missing manifest (first run or CI cache miss) degrades to a correct full sync.

Examples:

exporter = Exporter.new(index_dir: "tmp/woods")
stats = exporter.sync_all
# => { synced: 12, skipped: 928, deleted: 1, errors: [] }
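
The incremental behaviour described in the Overview can be illustrated with a minimal sketch. This is not the exporter's own code: `plan_sync` is a hypothetical helper, the manifest is reduced to a plain `{ uri => content_hash }` Hash, and SHA-256 is assumed as the hash function purely for illustration.

```ruby
require 'digest'

# Hypothetical helper: decide what an incremental sync would do.
# manifest:  { uri => content_hash } recorded by the previous run
# documents: { uri => markdown_body } produced by the current run
def plan_sync(manifest, documents)
  plan = { put: [], skip: [], delete: [] }

  documents.each do |uri, body|
    digest = Digest::SHA256.hexdigest(body) # hash function is an assumption
    # Unchanged content is skipped; new or changed content is pushed.
    (manifest[uri] == digest ? plan[:skip] : plan[:put]) << uri
  end

  # Anything in the manifest that no longer has a source unit is stale.
  plan[:delete] = manifest.keys - documents.keys
  plan
end

previous = {
  'models/user' => Digest::SHA256.hexdigest('# User'),
  'models/old'  => Digest::SHA256.hexdigest('# Old')
}
current = { 'models/user' => '# User', 'models/post' => '# Post' }

plan = plan_sync(previous, current)
full = plan_sync({}, current) # empty manifest: everything is pushed, nothing deleted
```

With an empty manifest every document lands in `:put` and `:delete` stays empty, which is why a lost manifest degrades to a full sync rather than to data loss.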

Constant Summary

MAX_ERRORS =

PURGE_GUARD_FRACTION = Mass-deletion guard: refuse to purge when more than this fraction of a manifest of at least PURGE_GUARD_MIN_DOCS entries would be deleted, which is the signature of a sync run against a partial index. Override with force_purge.

0.30

PURGE_GUARD_MIN_DOCS =

FULL_SYNC_TYPES = Unit types to sync, in priority order. All units are synced for these types.

%w[
  model controller service job mailer manager decorator concern serializer
  graphql graphql_type graphql_mutation graphql_resolver graphql_query
].freeze

PARTIAL_SYNC_TYPES = Unit types where only the most-connected units are synced. Each entry: [type, max_count]

[
  ['poro', 100],
  ['lib', 50]
].freeze
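
The mass-deletion guard described under PURGE_GUARD_FRACTION could look roughly like the following. This is a sketch under stated assumptions: `purge_blocked?` is a hypothetical name, and the `min_docs` default of 20 is a placeholder, since the value of PURGE_GUARD_MIN_DOCS is not shown on this page.

```ruby
# Hypothetical sketch of the mass-deletion guard.
# fraction mirrors PURGE_GUARD_FRACTION (0.30); min_docs is a placeholder
# standing in for PURGE_GUARD_MIN_DOCS, whose real value is not shown here.
def purge_blocked?(manifest_size:, stale_count:, fraction: 0.30, min_docs: 20, force: false)
  return false if force                    # force_purge bypasses the guard
  return false if manifest_size < min_docs # small manifests are not guarded

  stale_count > manifest_size * fraction
end

partial_index = purge_blocked?(manifest_size: 1000, stale_count: 400)
normal_churn  = purge_blocked?(manifest_size: 1000, stale_count: 10)
tiny_manifest = purge_blocked?(manifest_size: 10, stale_count: 9)
forced        = purge_blocked?(manifest_size: 1000, stale_count: 400, force: true)
```

The minimum-size condition keeps the guard from firing on small collections, where deleting a handful of documents is already a large fraction.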

Instance Method Summary

#initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ Exporter constructor

A new instance of Exporter.
#sync_all ⇒ Hash

Sync all configured unit types to the Unblocked collection.
#sync_type(type) ⇒ Hash

Sync all units of a given type.
#sync_type_partial(type, max_count) ⇒ Hash

Sync the top N most-connected units of a type (by dependent count).

Constructor Details

#initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ `Exporter`

Returns a new instance of Exporter.

Parameters:

index_dir (String) —

Path to extraction output directory
config (Configuration) (defaults to: Woods.configuration) —

Woods configuration (default: global config)
client (Client, nil) (defaults to: nil) —

Unblocked API client (auto-created from config if nil)
reader (Object, nil) (defaults to: nil) —

IndexReader instance (auto-created if nil)
manifest (SyncManifest, nil) (defaults to: nil) —

Sync manifest (auto-created under index_dir if nil)
force_full (Boolean) (defaults to: false) —

Re-push every unit, ignoring the unchanged check
force_purge (Boolean) (defaults to: false) —

Bypass the mass-deletion guard
output (IO) (defaults to: $stdout) —

Progress output stream (default: $stdout)

Raises:

(ConfigurationError) —

if unblocked_collection_id, unblocked_repo_url, or unblocked_api_token is not configured

# File 'lib/woods/unblocked/exporter.rb', line 62

def initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil,
               manifest: nil, force_full: false, force_purge: false, output: $stdout)
  @collection_id = config.unblocked_collection_id
  raise ConfigurationError, 'unblocked_collection_id is required' unless @collection_id

  repo_url = config.unblocked_repo_url
  raise ConfigurationError, 'unblocked_repo_url is required' unless repo_url

  api_token = config.unblocked_api_token
  raise ConfigurationError, 'unblocked_api_token is required' unless api_token

  budget = ENV.fetch('UNBLOCKED_DAILY_BUDGET', RateLimiter::DEFAULT_BUDGET.to_s).to_i
  limiter = RateLimiter.new(daily_budget: budget)

  @client = client || Client.new(api_token: api_token, rate_limiter: limiter)
  @reader = reader || build_reader(index_dir)
  @builder = DocumentBuilder.new(repo_url: repo_url)
  @manifest = manifest || build_manifest(index_dir)
  @force_full = force_full
  @force_purge = force_purge
  @output = output
  # Initialized here as well as in sync_all so the public sync_type /
  # sync_type_partial methods work standalone (track_uri needs them).
  @current_uris = Set.new
  @budget_exhausted = false
  # base URI => identifier that keeps the bare URI (only populated for
  # URIs shared by >1 unit). Rebuilt per sync_all run.
  @uri_primary = {}
end
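
The constructor reads the daily request budget from the UNBLOCKED_DAILY_BUDGET environment variable. The sketch below isolates that one line so its edge cases are visible. `daily_budget` is a hypothetical helper, a plain Hash stands in for ENV, and the `DEFAULT_BUDGET` value of 1_000 is a placeholder because RateLimiter::DEFAULT_BUDGET is not shown on this page.

```ruby
# Placeholder for RateLimiter::DEFAULT_BUDGET; the real value is not shown here.
DEFAULT_BUDGET = 1_000

# Hypothetical helper isolating the budget lookup. `env` can be ENV or any
# Hash, since both respond to fetch(key, default).
def daily_budget(env)
  env.fetch('UNBLOCKED_DAILY_BUDGET', DEFAULT_BUDGET.to_s).to_i
end

fallback = daily_budget({})                                 # variable unset
override = daily_budget('UNBLOCKED_DAILY_BUDGET' => '250')  # explicit override
garbled  = daily_budget('UNBLOCKED_DAILY_BUDGET' => 'lots') # non-numeric text
```

Because String#to_i returns 0 for text that does not start with a number, a mistyped value silently becomes a budget of 0. How RateLimiter treats a zero budget is not documented here, so validating the variable before a run may be worthwhile.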

Instance Method Details

#sync_all ⇒ `Hash`

Sync all configured unit types to the Unblocked collection.

Returns:

(Hash) —

{ synced:, skipped:, deleted:, errors: }

# File 'lib/woods/unblocked/exporter.rb', line 95

def sync_all
  @current_uris = Set.new
  @budget_exhausted = false
  build_uri_index
  reconcile_from_remote if @manifest.empty?

  synced = 0
  skipped = 0
  errors = []

  FULL_SYNC_TYPES.each do |type|
    break if @budget_exhausted

    result = sync_type(type)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  PARTIAL_SYNC_TYPES.each do |type, max_count|
    break if @budget_exhausted

    result = sync_type_partial(type, max_count)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  deleted = @budget_exhausted ? 0 : purge_stale(errors)
  { synced: synced, skipped: skipped, deleted: deleted, errors: cap_errors(errors) }
ensure
  save_manifest
end
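
The method-level `ensure` in sync_all means the manifest is saved whether the run finishes or raises part-way through. A reduced sketch of that control flow follows. `run_steps` and `on_finish` are hypothetical stand-ins for the per-type loop and for save_manifest; they are not part of the Exporter API.

```ruby
# Hypothetical miniature of the sync_all control flow.
# Each step is a callable returning { synced:, skipped:, errors: }.
def run_steps(steps, on_finish:)
  totals = { synced: 0, skipped: 0, errors: [] }
  steps.each do |step|
    result = step.call
    totals[:synced] += result[:synced]
    totals[:skipped] += result[:skipped]
    totals[:errors].concat(result[:errors])
  end
  totals
ensure
  # Runs on normal return and when a step raises, like save_manifest.
  on_finish.call
end

saves = 0
ok   = -> { { synced: 2, skipped: 5, errors: [] } }
boom = -> { raise IOError, 'network down' }

totals = run_steps([ok, ok], on_finish: -> { saves += 1 })

begin
  run_steps([ok, boom], on_finish: -> { saves += 1 })
rescue IOError
  # The error still propagates; the ensure block has already run.
end
```

Saving on failure preserves the record of documents that were pushed before the error, so the next run can skip them instead of pushing them again.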

#sync_type(type) ⇒ `Hash`

Sync all units of a given type.

Parameters:

type (String) —

Unit type (e.g. “model”, “controller”)

Returns:

(Hash) —

{ synced:, skipped:, errors: }

# File 'lib/woods/unblocked/exporter.rb', line 133

def sync_type(type)
  units = @reader.list_units(type: type)
  log "  #{type}: #{units.size} units"

  sync_units(units)
end

#sync_type_partial(type, max_count) ⇒ `Hash`

Sync the top N most-connected units of a type (by dependent count).

Parameters:

type (String) —

Unit type
max_count (Integer) —

Maximum units to sync

Returns:

(Hash) —

{ synced:, skipped:, errors: }

# File 'lib/woods/unblocked/exporter.rb', line 145

def sync_type_partial(type, max_count)
  units = @reader.list_units(type: type)
  return empty_stats if units.empty?

  # Load full data to sort by dependent count
  units_with_data = units.filter_map do |entry|
    data = @reader.find_unit(entry['identifier'])
    next unless data

    dep_count = (data['dependents'] || []).size
    { entry: entry, data: data, dep_count: dep_count }
  end

  # Every unit of this type still exists — track its URI so partial units
  # that fall *out* of the top-N are never mistaken for deletions.
  units_with_data.each { |u| track_uri(u[:data]) }

  top_units = units_with_data.sort_by { |u| -u[:dep_count] }.first(max_count)
  # Count against what was actually synced — units.size includes entries
  # whose unit data was missing (dropped by the filter_map above).
  skipped_count = units.size - top_units.size

  log "  #{type}: #{top_units.size}/#{units.size} units (top by dependents)"

  result = sync_unit_data(top_units.map { |u| [u[:entry], u[:data]] })
  result[:skipped] += skipped_count
  result
end
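
The selection step of sync_type_partial can be tried in isolation with the sketch below. `top_connected` is a hypothetical helper, and the sample units are invented for illustration; only the 'identifier' and 'dependents' keys come from the method above.

```ruby
# Hypothetical helper mirroring the selection step of sync_type_partial.
# units: array of hashes with 'identifier' and optional 'dependents' keys.
def top_connected(units, max_count)
  # A missing 'dependents' list counts as zero dependents.
  ranked = units.sort_by { |u| -(u['dependents'] || []).size }
  top = ranked.first(max_count)
  { top: top.map { |u| u['identifier'] }, skipped: units.size - top.size }
end

units = [
  { 'identifier' => 'Money',   'dependents' => %w[Order Invoice Refund] },
  { 'identifier' => 'Slugger', 'dependents' => nil },
  { 'identifier' => 'Clock',   'dependents' => %w[Job] }
]

selection = top_connected(units, 2)
```

Ruby's sort_by is not guaranteed to be stable, so units with equal dependent counts may appear in any order around the cut-off. Because the real method tracks the URI of every unit of the type, a unit that drops out of the top N on a later run is not treated as a deletion.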
