Class: Woods::Unblocked::Exporter
- Inherits:
- Object
  - Object
  - Woods::Unblocked::Exporter
- Defined in:
- lib/woods/unblocked/exporter.rb
Overview
Orchestrates syncing Woods extraction data to an Unblocked collection.
Reads extraction output from disk via IndexReader, converts units to condensed Markdown documents, and pushes them via the Unblocked Documents API. All syncs are idempotent: documents are upserted by URI.
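To illustrate why upserting by URI makes a sync idempotent, here is a minimal sketch. `FakeCollection` and the `woods://model/User` URI are hypothetical stand-ins made up for this example; the actual Unblocked Documents API request shape is not shown on this page and is not modeled here.

```ruby
# Hypothetical in-memory stand-in for a remote document collection.
# It is NOT the Unblocked client; it only models "upsert by URI".
class FakeCollection
  attr_reader :documents

  def initialize
    @documents = {}
  end

  # Insert the document, or replace the one already stored under +uri+.
  def upsert(uri:, title:, body:)
    @documents[uri] = { title: title, body: body }
  end
end

collection = FakeCollection.new

# Running the same sync twice leaves exactly one document per URI.
2.times do
  collection.upsert(uri: 'woods://model/User', title: 'User', body: '# User')
end
```

Because the URI is the key, re-running a sync overwrites existing documents instead of creating duplicates.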
Constant Summary
- MAX_ERRORS =
100
- FULL_SYNC_TYPES =
Unit types to sync, in priority order. All units are synced for these types.
%w[ model controller service job mailer manager decorator concern serializer graphql graphql_type graphql_mutation graphql_resolver graphql_query ].freeze
- PARTIAL_SYNC_TYPES =
Unit types where only the most-connected units are synced. Each entry: [type, max_count]
[ ['poro', 100], ['lib', 50] ].freeze
Instance Method Summary
- #initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, output: $stdout) ⇒ Exporter (constructor)
  A new instance of Exporter.
- #sync_all ⇒ Hash
  Sync all configured unit types to the Unblocked collection.
- #sync_type(type) ⇒ Hash
  Sync all units of a given type.
- #sync_type_partial(type, max_count) ⇒ Hash
  Sync the top N most-connected units of a type (by dependent count).
Constructor Details
#initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, output: $stdout) ⇒ Exporter
Returns a new instance of Exporter.
# File 'lib/woods/unblocked/exporter.rb', line 44

def initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, output: $stdout)
  @collection_id = config.unblocked_collection_id
  raise ConfigurationError, 'unblocked_collection_id is required' unless @collection_id

  repo_url = config.unblocked_repo_url
  raise ConfigurationError, 'unblocked_repo_url is required' unless repo_url

  api_token = config.unblocked_api_token
  raise ConfigurationError, 'unblocked_api_token is required' unless api_token

  budget = ENV.fetch('UNBLOCKED_DAILY_BUDGET', RateLimiter::DEFAULT_BUDGET.to_s).to_i
  limiter = RateLimiter.new(daily_budget: budget)

  @client = client || Client.new(api_token: api_token, rate_limiter: limiter)
  @reader = reader || build_reader(index_dir)
  @builder = DocumentBuilder.new(repo_url: repo_url)
  @output = output
end
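The constructor's two patterns, failing fast on missing settings and reading a numeric budget from the environment, can be sketched in isolation. Everything below is a hypothetical stand-in: `Settings`, `validate_settings!`, `daily_budget`, and the `DEFAULT_BUDGET` value of 1000 are assumptions for illustration, since the real `Woods.configuration` object and `RateLimiter::DEFAULT_BUDGET` are not shown on this page.

```ruby
# Hypothetical stand-ins; these are not the Woods classes themselves.
ConfigurationError = Class.new(StandardError)
Settings = Struct.new(:unblocked_collection_id, :unblocked_repo_url,
                      :unblocked_api_token, keyword_init: true)

# Assumed value; the real RateLimiter::DEFAULT_BUDGET is not shown.
DEFAULT_BUDGET = 1000

REQUIRED_KEYS = %i[unblocked_collection_id unblocked_repo_url unblocked_api_token].freeze

# Raise on the first missing setting, in the same order as the constructor.
def validate_settings!(settings)
  REQUIRED_KEYS.each do |key|
    raise ConfigurationError, "#{key} is required" unless settings[key]
  end
  settings
end

# Read the budget from an ENV-like object, falling back to the default.
# The default is passed as a String so both branches go through #to_i.
def daily_budget(env = ENV)
  env.fetch('UNBLOCKED_DAILY_BUDGET', DEFAULT_BUDGET.to_s).to_i
end
```

One consequence of `String#to_i` worth keeping in mind: a non-numeric value such as `"abc"` converts to `0` rather than raising, so a mistyped environment variable would silently produce a zero budget in this sketch.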
Instance Method Details
#sync_all ⇒ Hash
Sync all configured unit types to the Unblocked collection.
# File 'lib/woods/unblocked/exporter.rb', line 66

def sync_all
  synced = 0
  skipped = 0
  errors = []

  FULL_SYNC_TYPES.each do |type|
    result = sync_type(type)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  PARTIAL_SYNC_TYPES.each do |type, max_count|
    result = sync_type_partial(type, max_count)
    synced += result[:synced]
    skipped += result[:skipped]
    errors.concat(result[:errors])
  end

  { synced: synced, skipped: skipped, errors: cap_errors(errors) }
end
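The aggregation in `sync_all` amounts to summing per-type result hashes and then capping the error list. The sketch below shows that shape with a hypothetical `merge_stats` helper. The body of `cap_errors` is not included in this listing, so the version here simply assumes it keeps the first `MAX_ERRORS` entries; the real method may behave differently.

```ruby
MAX_ERRORS = 100

# Hypothetical helper: fold a list of { synced:, skipped:, errors: } hashes
# into one running total, as the two loops in sync_all do.
def merge_stats(results)
  totals = { synced: 0, skipped: 0, errors: [] }
  results.each do |result|
    totals[:synced] += result[:synced]
    totals[:skipped] += result[:skipped]
    totals[:errors].concat(result[:errors])
  end
  totals
end

# Assumed behavior only: keep at most +limit+ errors, dropping the rest.
def cap_errors(errors, limit = MAX_ERRORS)
  errors.first(limit)
end

results = [
  { synced: 2, skipped: 1, errors: ['timeout'] },
  { synced: 3, skipped: 0, errors: [] }
]
totals = merge_stats(results)
summary = totals.merge(errors: cap_errors(totals[:errors]))
```

Capping only at the end, as `sync_all` does, keeps the per-type results untouched and bounds the size of the final hash.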
#sync_type(type) ⇒ Hash
Sync all units of a given type.
# File 'lib/woods/unblocked/exporter.rb', line 92

def sync_type(type)
  units = @reader.list_units(type: type)
  log " #{type}: #{units.size} units"

  sync_units(units)
end
#sync_type_partial(type, max_count) ⇒ Hash
Sync the top N most-connected units of a type (by dependent count).
# File 'lib/woods/unblocked/exporter.rb', line 104

def sync_type_partial(type, max_count)
  units = @reader.list_units(type: type)
  return empty_stats if units.empty?

  # Load full data to sort by dependent count
  units_with_data = units.filter_map do |entry|
    data = @reader.find_unit(entry['identifier'])
    next unless data

    dep_count = (data['dependents'] || []).size
    { entry: entry, data: data, dep_count: dep_count }
  end

  top_units = units_with_data.sort_by { |u| -u[:dep_count] }.first(max_count)
  skipped_count = [units.size - max_count, 0].max

  log " #{type}: #{top_units.size}/#{units.size} units (top by dependents)"

  result = sync_unit_data(top_units.map { |u| [u[:entry], u[:data]] })
  result[:skipped] += skipped_count
  result
end
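The ranking step can be tried on plain hashes without a reader. This is a minimal sketch: `top_by_dependents` is a hypothetical helper, and the sample identifiers are invented. It mirrors the listing's approach of sorting by negated dependent count, taking the first `max_count`, and counting everything beyond `max_count` as skipped.

```ruby
# Hypothetical helper mirroring the selection logic in sync_type_partial.
# Each unit is a Hash that may carry a 'dependents' array.
def top_by_dependents(units, max_count)
  # Negating the count sorts in descending order; a missing list counts as 0.
  ranked = units.sort_by { |unit| -(unit['dependents'] || []).size }
  top = ranked.first(max_count)
  # Never report a negative skip count when fewer units exist than max_count.
  skipped = [units.size - max_count, 0].max
  [top, skipped]
end

units = [
  { 'identifier' => 'Formatter', 'dependents' => ['A'] },
  { 'identifier' => 'Money',     'dependents' => %w[A B C] },
  { 'identifier' => 'Orphan' },
  { 'identifier' => 'Slug',      'dependents' => %w[A B] }
]

top, skipped = top_by_dependents(units, 2)
top.map { |unit| unit['identifier'] } # Money, then Slug
```

Note that `sort_by` in Ruby is not guaranteed to be stable, so units with equal dependent counts may come out in any relative order.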