Module: Moult::Duplication::Confidence
- Defined in:
- lib/moult/duplication/confidence.rb
Overview
The per-finding confidence model for duplication — the duplication slice's realisation of Moult's protected confidence API. It answers a deliberately humble question: how confident are we that this clone group is genuine, consolidatable duplication rather than an incidental structural rhyme? It never asserts certainty; every contributing factor is recorded as a Reason so the judgement is auditable.
Confidence.assess is a pure function of the signals flay hands us (already extracted by Clones): no IO, no flay objects. That keeps it trivially unit-testable and lets the scoring be pinned against hand-built inputs. Any drift in the scores is treated as a bug, the same treatment ABC and the coverage Resolver get.
Defined Under Namespace
Classes: Assessment, Reason
Constant Summary
- CATEGORY =
  "duplication"
- BASE =
  Base likelihood before any adjustment, keyed by kind. An identical (byte-for-byte) match is near-certain duplication; a merely similar match (names/literals differ) is weaker and could be parallel-by-design.
  {identical: 0.6, similar: 0.45}.freeze
- SIMILAR_CAP =
  A structurally-similar (not identical) match never reaches high confidence: shared shape is not proof of shared intent.
  0.75
- MASS_LARGE =
  Larger duplicated structures are far less likely to be coincidental.
  100
- MASS_MEDIUM =
  40
- WHOLE_DEFINITION =
  sexp node types that are whole, cleanly-extractable definitions. A duplicated whole method/class is the least ambiguous "consolidate me".
  %w[defn defs class module sclass].freeze
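As a rough worked example of how these constants combine, the sketch below adds the base score to the deltas that appear in the .assess and .mass_reason listings on this page. It does not load Moult; the constant values are copied by hand and the `*_DELTA` names are illustrative, not part of the module.

```ruby
# Values copied from this page; this does not load Moult itself.
BASE = {identical: 0.6, similar: 0.45}.freeze
SIMILAR_CAP = 0.75

# Deltas as they appear in the .assess and .mass_reason listings.
# These constant names are hypothetical; the source uses inline literals.
LARGE_MASS_DELTA = 0.2
MANY_OCCURRENCES_DELTA = 0.07
WHOLE_DEFINITION_DELTA = 0.08

# Best case: every positive signal fires at once.
bonuses = [LARGE_MASS_DELTA, MANY_OCCURRENCES_DELTA, WHOLE_DEFINITION_DELTA]

identical_max = (BASE[:identical] + bonuses.sum).round(2)
similar_raw   = (BASE[:similar] + bonuses.sum).round(2)
similar_max   = [similar_raw, SIMILAR_CAP].min

puts "identical, all signals: #{identical_max}"
puts "similar, all signals:   #{similar_raw} raw, #{similar_max} after the cap"
```

Under these values an identical match tops out at 0.95, below the 1.0 clamp. A similar match only exceeds SIMILAR_CAP when all three signals fire together (0.45 + 0.2 + 0.07 + 0.08 = 0.80); any two of them stay under 0.75.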
Class Method Summary
- .assess(kind:, mass:, occurrence_count:, node_type:) ⇒ Assessment
- .base_detail(kind) ⇒ Object
- .mass_reason(mass) ⇒ Object
  Bucketed so the contribution is stable and pinnable regardless of the run's configurable --min-mass.
Class Method Details
.assess(kind:, mass:, occurrence_count:, node_type:) ⇒ Assessment
# File 'lib/moult/duplication/confidence.rb', line 55

def assess(kind:, mass:, occurrence_count:, node_type:)
  base = BASE.fetch(kind, BASE[:similar])
  reasons = [Reason.new(rule: :base_score, delta: base, detail: base_detail(kind))]

  mass_contribution = mass_reason(mass)
  reasons << mass_contribution if mass_contribution
  reasons << Reason.new(rule: :many_occurrences, delta: 0.07, detail: "duplicated across #{occurrence_count} locations") if occurrence_count >= 3
  reasons << Reason.new(rule: :whole_definition, delta: 0.08, detail: "duplicates a whole #{node_type}") if WHOLE_DEFINITION.include?(node_type)

  raw = reasons.sum(&:delta)
  if kind == :similar && raw > SIMILAR_CAP
    reasons << Reason.new(rule: :similar_cap, delta: 0.0, detail: "structural similarity is not proof of duplication; capped at #{SIMILAR_CAP}")
    raw = SIMILAR_CAP
  end

  Assessment.new(confidence: raw.clamp(0.0, 1.0).round(2), reasons: reasons)
end
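To see the audit trail that .assess produces, the following standalone sketch mirrors the logic of the listings on this page with stand-in value objects. The real Reason and Assessment classes are not shown here, so the Struct shapes are assumptions, `ConfidenceSketch` is a hypothetical name, and passing node_type as a String is inferred from WHOLE_DEFINITION being a `%w` list.

```ruby
# Stand-in value objects. The real Reason and Assessment classes are not
# shown on this page, so these Struct shapes are assumptions.
Reason = Struct.new(:rule, :delta, :detail, keyword_init: true)
Assessment = Struct.new(:confidence, :reasons, keyword_init: true)

# Hypothetical module mirroring the documented scoring; not Moult itself.
module ConfidenceSketch
  BASE = {identical: 0.6, similar: 0.45}.freeze
  SIMILAR_CAP = 0.75
  MASS_LARGE = 100
  MASS_MEDIUM = 40
  WHOLE_DEFINITION = %w[defn defs class module sclass].freeze

  module_function

  def assess(kind:, mass:, occurrence_count:, node_type:)
    base = BASE.fetch(kind, BASE[:similar])
    detail = kind == :identical ? "identical match" : "similar match"
    reasons = [Reason.new(rule: :base_score, delta: base, detail: detail)]

    contribution = mass_reason(mass)
    reasons << contribution if contribution
    if occurrence_count >= 3
      reasons << Reason.new(rule: :many_occurrences, delta: 0.07,
                            detail: "duplicated across #{occurrence_count} locations")
    end
    if WHOLE_DEFINITION.include?(node_type)
      reasons << Reason.new(rule: :whole_definition, delta: 0.08,
                            detail: "duplicates a whole #{node_type}")
    end

    # The cap is recorded as a zero-delta reason so it shows up in the trail.
    raw = reasons.sum(&:delta)
    if kind == :similar && raw > SIMILAR_CAP
      reasons << Reason.new(rule: :similar_cap, delta: 0.0,
                            detail: "capped at #{SIMILAR_CAP}")
      raw = SIMILAR_CAP
    end

    Assessment.new(confidence: raw.clamp(0.0, 1.0).round(2), reasons: reasons)
  end

  def mass_reason(mass)
    if mass >= MASS_LARGE
      Reason.new(rule: :large_mass, delta: 0.2, detail: "large duplicated mass (#{mass})")
    elsif mass >= MASS_MEDIUM
      Reason.new(rule: :medium_mass, delta: 0.1, detail: "moderate duplicated mass (#{mass})")
    end
  end
end

# Hand-built input: a similar match where every positive signal fires.
result = ConfidenceSketch.assess(kind: :similar, mass: 120,
                                 occurrence_count: 3, node_type: "defn")
puts "confidence: #{result.confidence}"
result.reasons.each do |reason|
  puts format("%-17s %+.2f  %s", reason.rule, reason.delta, reason.detail)
end
```

Because the function takes only plain values, a pinned test can assert on both the final score and the ordered list of rule names, so a change to any delta or threshold surfaces as a failing expectation.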
.base_detail(kind) ⇒ Object
# File 'lib/moult/duplication/confidence.rb', line 73

def base_detail(kind)
  if kind == :identical
    "identical structural match (byte-for-byte)"
  else
    "structurally-similar match (names/literals differ)"
  end
end
.mass_reason(mass) ⇒ Object
Bucketed so the contribution is stable and pinnable regardless of the run's configurable --min-mass.
# File 'lib/moult/duplication/confidence.rb', line 83

def mass_reason(mass)
  if mass >= MASS_LARGE
    Reason.new(rule: :large_mass, delta: 0.2, detail: "large duplicated mass (#{mass})")
  elsif mass >= MASS_MEDIUM
    Reason.new(rule: :medium_mass, delta: 0.1, detail: "moderate duplicated mass (#{mass})")
  end
end
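The bucket boundaries can be checked in isolation. This is a minimal sketch with a hypothetical helper, `mass_bucket`, that returns only the rule name mass_reason would choose; the thresholds are copied from the Constant Summary.

```ruby
# Thresholds copied from the Constant Summary on this page.
MASS_LARGE = 100
MASS_MEDIUM = 40

# Hypothetical helper: returns only the rule name mass_reason would pick,
# or nil when the mass is below both thresholds.
def mass_bucket(mass)
  if mass >= MASS_LARGE
    :large_mass
  elsif mass >= MASS_MEDIUM
    :medium_mass
  end
end

# Probe the values on either side of each threshold.
[39, 40, 99, 100].each do |mass|
  puts "mass #{mass}: #{mass_bucket(mass).inspect}"
end
```

Both comparisons are inclusive, so 40 and 100 fall into the higher bucket. A mass below 40 contributes no reason at all, which means raising or lowering --min-mass changes which findings are reported but not how a given mass is scored.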