Class: Woods::Unblocked::Exporter
- Inherits: Object
  - Object
  - Woods::Unblocked::Exporter
- Defined in: lib/woods/unblocked/exporter.rb
Overview
Orchestrates syncing Woods extraction data to an Unblocked collection.
Reads extraction output from disk via IndexReader, converts units to condensed Markdown documents, and pushes via the Unblocked Documents API. Syncs are incremental: a SyncManifest records the content hash and remote document_id of everything last pushed, so each run only PUTs new/changed documents, skips unchanged ones, and deletes documents whose source unit has disappeared. Documents are upserted by URI, so a missing manifest (first run / CI cache miss) degrades to a correct full sync.
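The incremental behaviour described above can be pictured as a diff between the manifest and the documents produced by the current run. The following is a minimal sketch, not the exporter's implementation: the `plan_sync` helper, the manifest shape (`uri => { hash:, document_id: }`), the sample URIs, and the choice of SHA-256 are all assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper: decide what an incremental sync would do.
# manifest: { uri => { hash: String, document_id: String } } (assumed shape)
# current:  { uri => markdown_body } for every unit found in this run
def plan_sync(manifest, current)
  plan = { put: [], skip: [], delete: [] }

  current.each do |uri, body|
    digest = Digest::SHA256.hexdigest(body)
    previous = manifest[uri]
    # Unchanged content is skipped; new or changed content is PUT.
    if previous && previous[:hash] == digest
      plan[:skip] << uri
    else
      plan[:put] << uri
    end
  end

  # Anything recorded last time but absent now has lost its source unit.
  plan[:delete] = manifest.keys - current.keys
  plan
end

manifest = {
  'repo/app/models/user.rb' => { hash: Digest::SHA256.hexdigest('# User'), document_id: 'doc-1' },
  'repo/app/models/old.rb' => { hash: Digest::SHA256.hexdigest('# Old'), document_id: 'doc-2' }
}
current = {
  'repo/app/models/user.rb' => '# User',
  'repo/app/models/order.rb' => '# Order'
}

plan = plan_sync(manifest, current)
empty_plan = plan_sync({}, current)
```

With an empty manifest every document lands in `:put` and nothing in `:delete`, which matches the full-sync fallback described above as long as the remote upserts by URI.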
Constant Summary
- MAX_ERRORS = 100
- PURGE_GUARD_FRACTION = 0.30
  Mass-deletion guard: refuse to purge when more than this fraction of a manifest of at least PURGE_GUARD_MIN_DOCS entries would be deleted, the signature of a sync run against a partial index. Override with force_purge.
- PURGE_GUARD_MIN_DOCS = 10
- FULL_SYNC_TYPES = %w[model controller service job mailer manager decorator concern serializer graphql graphql_type graphql_mutation graphql_resolver graphql_query].freeze
  Unit types to sync, in priority order. All units are synced for these types.
- PARTIAL_SYNC_TYPES = [['poro', 100], ['lib', 50]].freeze
  Unit types where only the most-connected units are synced. Each entry: [type, max_count].
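The mass-deletion guard described for PURGE_GUARD_FRACTION and PURGE_GUARD_MIN_DOCS can be sketched as a single predicate. This is an illustration under assumptions: `purge_blocked?` is a hypothetical name, and the strict "greater than" and "at least" comparisons are read off the constant's description rather than taken from the purge code, which this page does not show.

```ruby
# Hypothetical predicate mirroring the documented guard.
# fraction and min_docs default to the documented constant values.
def purge_blocked?(manifest_size, stale_count, fraction: 0.30, min_docs: 10, force_purge: false)
  return false if force_purge                 # explicit override
  return false if manifest_size < min_docs    # small manifests are not guarded

  # Block only when MORE than the given fraction would be deleted.
  stale_count.to_f / manifest_size > fraction
end

blocked_large   = purge_blocked?(100, 31)
allowed_at_edge = purge_blocked?(100, 30)
allowed_small   = purge_blocked?(9, 9)
allowed_forced  = purge_blocked?(100, 90, force_purge: true)
```

Under these assumptions, deleting 31 of 100 manifest entries is refused, exactly 30 of 100 is allowed, and a manifest with fewer than 10 entries is never guarded.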
Instance Method Summary
- #initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ Exporter (constructor)
  A new instance of Exporter.
- #sync_all ⇒ Hash
  Sync all configured unit types to the Unblocked collection.
- #sync_type(type) ⇒ Hash
  Sync all units of a given type.
- #sync_type_partial(type, max_count) ⇒ Hash
  Sync the top N most-connected units of a type (by dependent count).
Constructor Details
#initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil, manifest: nil, force_full: false, force_purge: false, output: $stdout) ⇒ Exporter
Returns a new instance of Exporter.
    # File 'lib/woods/unblocked/exporter.rb', line 62

    def initialize(index_dir:, config: Woods.configuration, client: nil, reader: nil,
                   manifest: nil, force_full: false, force_purge: false, output: $stdout)
      @collection_id = config.unblocked_collection_id
      raise ConfigurationError, 'unblocked_collection_id is required' unless @collection_id

      repo_url = config.unblocked_repo_url
      raise ConfigurationError, 'unblocked_repo_url is required' unless repo_url

      api_token = config.unblocked_api_token
      raise ConfigurationError, 'unblocked_api_token is required' unless api_token

      budget = ENV.fetch('UNBLOCKED_DAILY_BUDGET', RateLimiter::DEFAULT_BUDGET.to_s).to_i
      limiter = RateLimiter.new(daily_budget: budget)
      @client = client || Client.new(api_token: api_token, rate_limiter: limiter)
      @reader = reader || build_reader(index_dir)
      @builder = DocumentBuilder.new(repo_url: repo_url)
      @manifest = manifest || build_manifest(index_dir)
      @force_full = force_full
      @force_purge = force_purge
      @output = output
      # Initialized here as well as in sync_all so the public sync_type /
      # sync_type_partial methods work standalone (track_uri needs them).
      @current_uris = Set.new
      @budget_exhausted = false
      # base URI => identifier that keeps the bare URI (only populated for
      # URIs shared by >1 unit). Rebuilt per sync_all run.
      @uri_primary = {}
    end
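The constructor reads the daily request budget from UNBLOCKED_DAILY_BUDGET and coerces it with `String#to_i`. The sketch below shows what that coercion yields for a few inputs. It is only an illustration: `read_budget` is a hypothetical helper, the environment is passed in as a plain Hash so the real ENV is not touched, and the default of 1000 is a placeholder because the actual value of RateLimiter::DEFAULT_BUDGET is not shown on this page.

```ruby
# Placeholder default; the real RateLimiter::DEFAULT_BUDGET is not shown here.
default_budget = 1000

# Hypothetical helper mirroring the ENV.fetch(...).to_i pattern, with the
# environment injected as a Hash so the example has no side effects.
def read_budget(env, default_budget)
  env.fetch('UNBLOCKED_DAILY_BUDGET', default_budget.to_s).to_i
end

unset   = read_budget({}, default_budget)
numeric = read_budget({ 'UNBLOCKED_DAILY_BUDGET' => '250' }, default_budget)
garbage = read_budget({ 'UNBLOCKED_DAILY_BUDGET' => 'lots' }, default_budget)
blank   = read_budget({ 'UNBLOCKED_DAILY_BUDGET' => '' }, default_budget)
```

Because `to_i` returns 0 for a non-numeric or empty string, a variable that is set but malformed does not fall back to the default. How RateLimiter treats a budget of 0 is outside what this page documents.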
Instance Method Details
#sync_all ⇒ Hash
Sync all configured unit types to the Unblocked collection.
    # File 'lib/woods/unblocked/exporter.rb', line 95

    def sync_all
      @current_uris = Set.new
      @budget_exhausted = false
      build_uri_index
      reconcile_from_remote if @manifest.empty?

      synced = 0
      skipped = 0
      errors = []

      FULL_SYNC_TYPES.each do |type|
        break if @budget_exhausted

        result = sync_type(type)
        synced += result[:synced]
        skipped += result[:skipped]
        errors.concat(result[:errors])
      end

      PARTIAL_SYNC_TYPES.each do |type, max_count|
        break if @budget_exhausted

        result = sync_type_partial(type, max_count)
        synced += result[:synced]
        skipped += result[:skipped]
        errors.concat(result[:errors])
      end

      deleted = @budget_exhausted ? 0 : purge_stale(errors)
      { synced: synced, skipped: skipped, deleted: deleted, errors: cap_errors(errors) }
    ensure
      save_manifest
    end
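The control flow of sync_all (accumulate per-type stats, stop as soon as the budget runs out, and skip purging in that case) can be isolated in a small sketch. Everything here is a stand-in: `run_all` and `fake_step` are hypothetical, each lambda plays the role of one sync_type call, and the `:exhausted` key substitutes for the exporter's @budget_exhausted instance variable.

```ruby
# Hypothetical stand-in for one sync_type call: records that it ran and
# returns a stats hash in the same shape as the real methods.
def fake_step(log, name, stats)
  lambda do
    log << name
    stats
  end
end

# Accumulates stats across steps and stops once a step reports exhaustion.
def run_all(steps)
  totals = { synced: 0, skipped: 0, errors: [] }
  exhausted = false

  steps.each do |step|
    break if exhausted

    result = step.call
    totals[:synced] += result[:synced]
    totals[:skipped] += result[:skipped]
    totals[:errors].concat(result[:errors])
    exhausted = result.fetch(:exhausted, false)
  end

  # Purging is only attempted when every type was processed.
  totals.merge(purge_allowed: !exhausted)
end

calls = []
steps = [
  fake_step(calls, :model, { synced: 3, skipped: 1, errors: [] }),
  fake_step(calls, :controller, { synced: 1, skipped: 0, errors: ['timeout'], exhausted: true }),
  fake_step(calls, :service, { synced: 9, skipped: 9, errors: [] })
]
totals = run_all(steps)
```

In this run the third step is never called, and `purge_allowed` is false, mirroring `deleted = @budget_exhausted ? 0 : purge_stale(errors)` in the method above.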
#sync_type(type) ⇒ Hash
Sync all units of a given type.
    # File 'lib/woods/unblocked/exporter.rb', line 133

    def sync_type(type)
      units = @reader.list_units(type: type)
      log " #{type}: #{units.size} units"

      sync_units(units)
    end
#sync_type_partial(type, max_count) ⇒ Hash
Sync the top N most-connected units of a type (by dependent count).
    # File 'lib/woods/unblocked/exporter.rb', line 145

    def sync_type_partial(type, max_count)
      units = @reader.list_units(type: type)
      return empty_stats if units.empty?

      # Load full data to sort by dependent count
      units_with_data = units.filter_map do |entry|
        data = @reader.find_unit(entry['identifier'])
        next unless data

        dep_count = (data['dependents'] || []).size
        { entry: entry, data: data, dep_count: dep_count }
      end

      # Every unit of this type still exists — track its URI so partial units
      # that fall *out* of the top-N are never mistaken for deletions.
      units_with_data.each { |u| track_uri(u[:data]) }

      top_units = units_with_data.sort_by { |u| -u[:dep_count] }.first(max_count)
      # Count against what was actually synced — units.size includes entries
      # whose unit data was missing (dropped by the filter_map above).
      skipped_count = units.size - top_units.size
      log " #{type}: #{top_units.size}/#{units.size} units (top by dependents)"

      result = sync_unit_data(top_units.map { |u| [u[:entry], u[:data]] })
      result[:skipped] += skipped_count
      result
    end
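The top-N selection in sync_type_partial can be tried in isolation. The sketch below is a simplified illustration with made-up unit records: `top_by_dependents` is a hypothetical helper, and only the 'identifier' and 'dependents' keys mirror the source.

```ruby
# Hypothetical helper: rank units by dependent count and keep the top N.
# A missing 'dependents' value is treated as an empty list, as in the source.
def top_by_dependents(units, max_count)
  ranked = units.sort_by { |u| -(u['dependents'] || []).size }
  top = ranked.first(max_count)
  { top: top.map { |u| u['identifier'] }, skipped: units.size - top.size }
end

units = [
  { 'identifier' => 'Money', 'dependents' => %w[Order Invoice Refund] },
  { 'identifier' => 'Slug', 'dependents' => nil },
  { 'identifier' => 'Range', 'dependents' => %w[Report] },
  { 'identifier' => 'Token', 'dependents' => %w[Session User] }
]

result = top_by_dependents(units, 2)
```

Ruby does not guarantee that `sort_by` is stable, so units with equal dependent counts may be ordered arbitrarily. Which of several tied units makes the cut can therefore vary between runs.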