Class: Exwiw::MongodbParallelPlan
- Inherits: Object
  - Object
    - Exwiw::MongodbParallelPlan
- Defined in: lib/exwiw/mongodb_parallel_plan.rb
Overview
Classifies a MongoDB dump’s collections into the three dependency groups the inter-collection fork schedule needs, plus the derived adjacency that schedule consumes. See docs/mongodb-dump-parallelism-2x-notes.md for the why; this class is the static, config-derived half of that plan.
It is a pure function of the loaded configs and the dump target — no DB access — so it can be computed once up front and unit-tested without a live MongoDB. The fork orchestration (worker pools, LPT bin-packing on output-size weights, @state Marshal sidecars, the Phase-2 cascade loop) lives elsewhere and consumes the structures produced here.
Input contract: `configs` are MongodbCollectionConfig instances that have already passed through `#reject_ignored_members!` (exactly as Runner#load_table_config produces them), so every surviving belongs_to has a non-nil `table_name`. ignore:true collections are still present in `configs`: they contribute to the schema and to the file-index ordering, but their data extraction is skipped, so they are excluded from the three processing groups.
The three groups partition the extractable collections exactly:
- genuine: reachable to the dump target by following belongs_to edges (the scoped DAG). Includes the target itself.
- leaf: no belongs_to at all; reference/master data dumped in full, with no input dependencies (embarrassingly parallel).
- ref_bt: has belongs_to but is NOT reachable to the target; reference data scoped by the adapter's strict-AND fallback. Its internal edges form shallow components.
`reachable` mirrors MongodbAdapter#genuine_scope_set exactly (fixpoint over all non-embedded configs, including ignore:true ones), so the genuine set here matches the adapter’s runtime scoping classification.
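The classification described above can be sketched in a few lines. This is a minimal illustration, not the class's actual code: `Cfg`, `reachable_from`, `classify` and the sample collection names are hypothetical, and each belongs_to edge is reduced to a plain parent name.

```ruby
require "set"

# Hypothetical stand-in for a collection config:
# name, belongs_to parent names, and the ignore flag.
Cfg = Struct.new(:name, :parents, :ignore, keyword_init: true)

# Fixpoint: start from the target and keep adding every collection that
# belongs_to something already reachable (ignore:true ones included).
def reachable_from(configs, target)
  reachable = Set[target]
  loop do
    added = configs.select do |c|
      !reachable.include?(c.name) && c.parents.any? { |p| reachable.include?(p) }
    end
    break if added.empty?

    added.each { |c| reachable << c.name }
  end
  reachable
end

# Partition the extractable (non-ignored) collections into the three groups.
def classify(configs, target)
  reachable = reachable_from(configs, target)
  groups = { genuine: [], leaves: [], ref_bt: [] }
  configs.reject(&:ignore).each do |c|
    key = if reachable.include?(c.name) then :genuine
          elsif c.parents.empty? then :leaves
          else :ref_bt
          end
    groups[key] << c.name
  end
  groups
end

EXAMPLE_CONFIGS = [
  Cfg.new(name: "shops", parents: [], ignore: false),
  Cfg.new(name: "orders", parents: ["shops", "currencies"], ignore: false),
  Cfg.new(name: "currencies", parents: [], ignore: false),
  Cfg.new(name: "tax_rules", parents: ["currencies"], ignore: false),
  Cfg.new(name: "audit_logs", parents: ["shops"], ignore: true),
].freeze

p classify(EXAMPLE_CONFIGS, "shops")
```

With "shops" as the target, "orders" is genuine, "currencies" is a leaf, and "tax_rules" is ref_bt because it has a belongs_to but cannot reach the target. "audit_logs" is reachable but ignored, so it appears in none of the groups.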
Instance Attribute Summary
- #consumed_leaves ⇒ Object (readonly)
  Leaf collections referenced (via belongs_to) by some non-leaf extractable collection (genuine OR ref_bt).
- #direct_leaf_genuine ⇒ Object (readonly)
  genuine collections that directly reference a leaf: the only genuine collections whose output can change once leaf @state is present (and only at runtime, when their genuine anchor turns out empty and they fall back to the leaf clause).
- #extractable ⇒ Object (readonly)
  #ordered_all minus ignore:true collections, i.e. the collections whose data is actually extracted.
- #genuine ⇒ Object (readonly)
  genuine: reachable to the dump target (includes the target).
- #genuine_children ⇒ Object (readonly)
  name => genuine children (genuine collections that belongs_to it), keyed only by reachable parents.
- #index_of ⇒ Object (readonly)
  name => 0-based position in #ordered_all (the file index is position + 1).
- #leaves ⇒ Object (readonly)
  leaf: no belongs_to; reference/master data with no input dependencies.
- #ordered_all ⇒ Object (readonly)
  Full processing order, INCLUDING ignore:true collections: the sequence the file index (insert-NNN-) is numbered over.
- #reachable ⇒ Object (readonly)
  The set of collection names genuinely scoped by the target (the target plus everything that can reach it through belongs_to).
- #ref_bt ⇒ Object (readonly)
  ref_bt: has belongs_to but not reachable to the target.
- #reference_components ⇒ Object (readonly)
  ref_bt collections as dependency-closed weakly-connected components over intra-ref_bt belongs_to edges, each returned in a valid topological order (a parent before its child).
Instance Method Summary
- #initialize(configs:, target_table_name:, logger: nil) ⇒ MongodbParallelPlan (constructor)
  A new instance of MongodbParallelPlan.
- #summary ⇒ Object
Constructor Details
#initialize(configs:, target_table_name:, logger: nil) ⇒ MongodbParallelPlan
Returns a new instance of MongodbParallelPlan.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 44

def initialize(configs:, target_table_name:, logger: nil)
  @by = configs.each_with_object({}) { |c, h| h[c.name] = c }
  @target_table_name = target_table_name

  dumpable = configs.reject(&:embedded?)
  # The file index (insert-NNN-) is taken over the FULL processing order,
  # including ignore:true collections, so the orchestrated run's filenames
  # are byte-identical to the serial Runner's (which numbers files the same
  # way). Data extraction, however, skips ignore:true — see #extractable.
  @ordered_all = DetermineTableProcessingOrder.run(dumpable, logger: logger).freeze
  @index_of = @ordered_all.each_with_index.to_h.freeze
  @extractable = @ordered_all.reject { |n| @by[n].ignore }.freeze

  @reachable = compute_reachable
  classify
  derive_consumed_leaves
  derive_cascade_adjacency
  @reference_components = compute_reference_components.freeze
end
Instance Attribute Details
#consumed_leaves ⇒ Object (readonly)
Leaf collections referenced (via belongs_to) by some non-leaf extractable collection (genuine OR ref_bt). These are the only leaves whose captured @state must be handed back (e.g. as a Marshal sidecar). Set<String>.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 97

def consumed_leaves
  @consumed_leaves
end
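One way to picture how this set is derived is shown here. It is a sketch under assumptions, not the private `derive_consumed_leaves` implementation: the `LEAF_DEMO_BELONGS_TO` map, the `LEAF_DEMO_LEAVES` set and the `consumed_leaves_of` helper are all hypothetical.

```ruby
require "set"

# Hypothetical input: belongs_to parent names per extractable collection.
LEAF_DEMO_BELONGS_TO = {
  "shops" => [],
  "orders" => ["shops", "currencies"],
  "currencies" => [],
  "countries" => [],
  "tax_rules" => ["currencies"],
}.freeze

# Hypothetical leaf group for the map above.
LEAF_DEMO_LEAVES = Set["currencies", "countries"].freeze

# Collect every leaf that some non-leaf collection points at via belongs_to.
def consumed_leaves_of(belongs_to, leaves)
  belongs_to.each_with_object(Set.new) do |(name, parents), acc|
    next if leaves.include?(name) # leaves have no belongs_to to inspect

    parents.each { |parent| acc << parent if leaves.include?(parent) }
  end
end

p consumed_leaves_of(LEAF_DEMO_BELONGS_TO, LEAF_DEMO_LEAVES)
```

In this sample "currencies" is consumed (by "orders" and "tax_rules"), while "countries" is a leaf nobody references, so its state would not need to be handed back.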
#direct_leaf_genuine ⇒ Object (readonly)
genuine collections that directly reference a leaf — the only genuine collections whose output can change once leaf @state is present (and only at runtime, when their genuine anchor turns out empty and they fall back to the leaf clause). These seed the Phase-2 cascade reprocess.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 103

def direct_leaf_genuine
  @direct_leaf_genuine
end
#extractable ⇒ Object (readonly)
#ordered_all minus ignore:true collections — the collections whose data is actually extracted. Union of the three groups below.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 73

def extractable
  @extractable
end
#genuine ⇒ Object (readonly)
genuine — reachable to the dump target (includes the target).
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 78

def genuine
  @genuine
end
#genuine_children ⇒ Object (readonly)
name => genuine children (genuine collections that belongs_to it), keyed only by reachable parents. Drives the Phase-2 cascade: when a reprocessed collection’s row count changes, its genuine children are re-enqueued.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 108

def genuine_children
  @genuine_children
end
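The adjacency and the re-enqueue rule can be illustrated as follows. The cascade loop itself lives outside this class, so `cascade_order` is only a guess at its general shape; `build_genuine_children`, the `changed` callback and the sample data are likewise hypothetical.

```ruby
require "set"

# Build parent => genuine children, keeping only parents that are reachable.
def build_genuine_children(belongs_to, genuine, reachable)
  children = {}
  genuine.each do |name|
    belongs_to.fetch(name, []).each do |parent|
      next unless reachable.include?(parent)

      (children[parent] ||= []) << name
    end
  end
  children
end

# Hypothetical cascade: reprocess the seeds; whenever `changed` reports that a
# collection's row count changed, enqueue its genuine children as well.
def cascade_order(seeds, children, changed)
  queue = seeds.dup
  processed = []
  until queue.empty?
    name = queue.shift
    processed << name
    next unless changed.call(name)

    children.fetch(name, []).each do |child|
      queue << child unless queue.include?(child) || processed.include?(child)
    end
  end
  processed
end

CASCADE_DEMO_BELONGS_TO = {
  "orders" => ["shops", "currencies"],
  "order_items" => ["orders"],
  "refunds" => ["orders"],
}.freeze
CASCADE_DEMO_GENUINE = ["shops", "orders", "order_items", "refunds"].freeze

CASCADE_DEMO_CHILDREN = build_genuine_children(
  CASCADE_DEMO_BELONGS_TO, CASCADE_DEMO_GENUINE, CASCADE_DEMO_GENUINE.to_set
)
p CASCADE_DEMO_CHILDREN
p cascade_order(["orders"], CASCADE_DEMO_CHILDREN, ->(name) { name == "orders" })
```

Here "orders" plays the role of a direct_leaf_genuine seed. Because only its row count is reported as changed, the walk stops after its two children.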
#index_of ⇒ Object (readonly)
name => 0-based position in #ordered_all (the file index is position + 1).
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 69

def index_of
  @index_of
end
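The position-to-file-index relationship is small enough to show directly. The three-digit zero padding is an assumption read off the insert-NNN- pattern, and `file_prefix` plus the sample names are hypothetical; only `each_with_index.to_h` comes from the constructor source.

```ruby
# Hypothetical processing order; ignore:true collections would also be listed.
INDEX_DEMO_ORDER = ["shops", "currencies", "orders"].freeze

# Same construction as the constructor: name => 0-based position.
INDEX_DEMO_INDEX_OF = INDEX_DEMO_ORDER.each_with_index.to_h.freeze

# Assumed filename prefix: file index = position + 1, zero-padded to 3 digits.
def file_prefix(index_of, name)
  format("insert-%03d-", index_of.fetch(name) + 1)
end

puts file_prefix(INDEX_DEMO_INDEX_OF, "orders")
```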
#leaves ⇒ Object (readonly)
leaf — no belongs_to; reference/master data with no input dependencies.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 81

def leaves
  @leaves
end
#ordered_all ⇒ Object (readonly)
Full processing order, INCLUDING ignore:true collections — the sequence the file index (insert-NNN-) is numbered over.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 66

def ordered_all
  @ordered_all
end
#reachable ⇒ Object (readonly)
The set of collection names genuinely scoped by the target (the target plus everything that can reach it through belongs_to). Exposed for inspection.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 112

def reachable
  @reachable
end
#ref_bt ⇒ Object (readonly)
ref_bt — has belongs_to but not reachable to the target.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 84

def ref_bt
  @ref_bt
end
#reference_components ⇒ Object (readonly)
ref_bt collections as dependency-closed weakly-connected components over intra-ref_bt belongs_to edges, each returned in a valid topological order (a parent before its child). A whole component can be processed serially by one worker with no cross-worker @state IPC and no level barriers, seeded only with the leaf @state its members reference.
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 91

def reference_components
  @reference_components
end
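A rough sketch of computing such components: group ref_bt collections by undirected connectivity over intra-ref_bt belongs_to edges, then order each group parents-first. This is an illustration only and not the private `compute_reference_components`; `components_of`, `parents_first` and the sample data are hypothetical.

```ruby
require "set"

# Depth-first post-order over parent edges: a parent is emitted before its child.
def parents_first(component, parents)
  placed = Set.new
  ordered = []
  visit = lambda do |name|
    return if placed.include?(name)

    placed << name
    parents[name].each { |parent| visit.call(parent) }
    ordered << name
  end
  component.each { |name| visit.call(name) }
  ordered
end

def components_of(belongs_to, ref_bt)
  in_group = ref_bt.to_set
  # Keep only edges whose two ends are both ref_bt collections.
  parents = ref_bt.to_h do |name|
    [name, belongs_to.fetch(name, []).select { |p| in_group.include?(p) && p != name }]
  end
  neighbours = Hash.new { |hash, key| hash[key] = [] }
  parents.each do |child, ps|
    ps.each do |parent|
      neighbours[child] << parent
      neighbours[parent] << child
    end
  end

  seen = Set.new
  ref_bt.each_with_object([]) do |start, components|
    next if seen.include?(start)

    component = []
    stack = [start]
    until stack.empty?
      name = stack.pop
      next unless seen.add?(name) # add? returns nil when already visited

      component << name
      stack.concat(neighbours.fetch(name, []))
    end
    components << parents_first(component, parents)
  end
end

COMPONENT_DEMO_BELONGS_TO = {
  "tax_rates" => ["tax_rules"],
  "tax_rules" => ["tax_zones", "currencies"], # "currencies" is a leaf, not ref_bt
  "tax_zones" => ["countries"],               # "countries" is a leaf, not ref_bt
  "promo_codes" => ["promo_campaigns"],
  "promo_campaigns" => ["countries"],
}.freeze
COMPONENT_DEMO_REF_BT = %w[tax_rates tax_rules tax_zones promo_codes promo_campaigns].freeze

p components_of(COMPONENT_DEMO_BELONGS_TO, COMPONENT_DEMO_REF_BT)
```

The sample yields two components, each of which one worker could process serially in the listed order, seeded only with the leaf state ("currencies", "countries") that its members reference.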
Instance Method Details
#summary ⇒ Object
# File 'lib/exwiw/mongodb_parallel_plan.rb', line 114

def summary
  {
    extractable: @extractable.size,
    genuine: @genuine.size,
    leaves: @leaves.size,
    ref_bt: @ref_bt.size,
    consumed_leaves: @consumed_leaves.size,
    direct_leaf_genuine: @direct_leaf_genuine.size,
    reference_components: @reference_components.map(&:size).sort.reverse,
  }
end