Module: Moult::Duplication::Confidence

Defined in:
lib/moult/duplication/confidence.rb

Overview

The per-finding confidence model for duplication — the duplication slice's realisation of Moult's protected confidence API. It answers a deliberately humble question: how confident are we that this clone group is genuine, consolidatable duplication rather than an incidental structural rhyme? It never asserts certainty; every contributing factor is recorded as a Reason so the judgement is auditable.

Confidence.assess is a pure function of the signals flay hands us (already extracted by Clones): no IO, no flay objects. That keeps it trivially unit-testable and lets the scoring be pinned against hand-built inputs. Any drift in the scores is treated as a bug, the same treatment ABC and the coverage Resolver get.

Defined Under Namespace

Classes: Assessment, Reason

Constant Summary

CATEGORY =
"duplication"
BASE =

Base likelihood before any adjustment, keyed by kind. An identical (byte-for-byte) match is near-certain duplication; a merely similar match (names/literals differ) is weaker and could be parallel-by-design.

{identical: 0.6, similar: 0.45}.freeze
SIMILAR_CAP =

A structurally-similar (not identical) match never reaches high confidence: shared shape is not proof of shared intent.

0.75
MASS_LARGE =

Larger duplicated structures are far less likely to be coincidental.

100
MASS_MEDIUM =
40
WHOLE_DEFINITION =

sexp node types that are whole, cleanly-extractable definitions. A duplicated whole method/class is the least ambiguous "consolidate me".

%w[defn defs class module sclass].freeze
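
To see how these constants bound the result, the arithmetic below adds every positive adjustment to each base score. It is a rough sketch, not Moult code: the deltas (0.2, 0.07, 0.08) are copied from the assess and mass_reason source listed further down this page, and the names ADJUSTMENTS and ceiling_for are illustrative only.

```ruby
# Illustrative arithmetic only; ADJUSTMENTS and ceiling_for do not exist in Moult.
BASE = {identical: 0.6, similar: 0.45}.freeze
SIMILAR_CAP = 0.75

# Largest bonus each rule can add, taken from the source shown on this page.
ADJUSTMENTS = {large_mass: 0.2, many_occurrences: 0.07, whole_definition: 0.08}.freeze

# Highest confidence a clone group of the given kind can reach.
def ceiling_for(kind)
  raw = BASE.fetch(kind) + ADJUSTMENTS.values.sum
  raw = SIMILAR_CAP if kind == :similar && raw > SIMILAR_CAP
  raw.clamp(0.0, 1.0).round(2)
end

puts ceiling_for(:identical) # 0.6 + 0.2 + 0.07 + 0.08
puts ceiling_for(:similar)   # 0.45 + 0.35 exceeds the cap, so the cap applies
```

Under these numbers an identical match tops out at 0.95 and a similar match at 0.75, so no input reaches 1.0, which matches the stated intent of never asserting certainty.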

Class Method Summary

Class Method Details

.assess(kind:, mass:, occurrence_count:, node_type:) ⇒ Assessment

Parameters:

  • kind (Symbol)

    :identical or :similar

  • mass (Integer)

    flay's mass for the duplicated node

  • occurrence_count (Integer)

    number of sites (>= 2)

  • node_type (String)

    flay sexp type, e.g. "defn", "call"

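Returns:

  • (Assessment)

    the confidence, clamped to 0.0..1.0 and rounded to two places, together with the reasons that produced it
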
# File 'lib/moult/duplication/confidence.rb', line 55

def assess(kind:, mass:, occurrence_count:, node_type:)
  base = BASE.fetch(kind, BASE[:similar])
  reasons = [Reason.new(rule: :base_score, delta: base, detail: base_detail(kind))]

  mass_contribution = mass_reason(mass)
  reasons << mass_contribution if mass_contribution
  reasons << Reason.new(rule: :many_occurrences, delta: 0.07, detail: "duplicated across #{occurrence_count} locations") if occurrence_count >= 3
  reasons << Reason.new(rule: :whole_definition, delta: 0.08, detail: "duplicates a whole #{node_type}") if WHOLE_DEFINITION.include?(node_type)

  raw = reasons.sum(&:delta)
  if kind == :similar && raw > SIMILAR_CAP
    reasons << Reason.new(rule: :similar_cap, delta: 0.0, detail: "structural similarity is not proof of duplication; capped at #{SIMILAR_CAP}")
    raw = SIMILAR_CAP
  end

  Assessment.new(confidence: raw.clamp(0.0, 1.0).round(2), reasons: reasons)
end
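
For experimenting with the scoring outside the gem, the following stand-alone sketch mirrors the method above. ConfidenceSketch and both Struct definitions are assumptions made for illustration (the real Reason and Assessment classes are not shown on this page), and the detail strings are shortened.

```ruby
# Stand-alone mirror of the scoring above, for experimentation only.
# ConfidenceSketch and both Struct shapes are assumptions, not Moult's classes.
module ConfidenceSketch
  Reason = Struct.new(:rule, :delta, :detail, keyword_init: true)
  Assessment = Struct.new(:confidence, :reasons, keyword_init: true)

  BASE = {identical: 0.6, similar: 0.45}.freeze
  SIMILAR_CAP = 0.75
  WHOLE_DEFINITION = %w[defn defs class module sclass].freeze

  module_function

  def assess(kind:, mass:, occurrence_count:, node_type:)
    # Unknown kinds fall back to the weaker :similar base, as in the original.
    reasons = [Reason.new(rule: :base_score, delta: BASE.fetch(kind, BASE[:similar]), detail: kind.to_s)]

    # Mass buckets: at most one of the two reasons is recorded.
    if mass >= 100
      reasons << Reason.new(rule: :large_mass, delta: 0.2, detail: "mass #{mass}")
    elsif mass >= 40
      reasons << Reason.new(rule: :medium_mass, delta: 0.1, detail: "mass #{mass}")
    end
    reasons << Reason.new(rule: :many_occurrences, delta: 0.07, detail: "#{occurrence_count} locations") if occurrence_count >= 3
    reasons << Reason.new(rule: :whole_definition, delta: 0.08, detail: "whole #{node_type}") if WHOLE_DEFINITION.include?(node_type)

    raw = reasons.sum(&:delta)
    if kind == :similar && raw > SIMILAR_CAP
      # The cap is recorded as a zero-delta reason so it shows up in the audit trail.
      reasons << Reason.new(rule: :similar_cap, delta: 0.0, detail: "capped at #{SIMILAR_CAP}")
      raw = SIMILAR_CAP
    end
    Assessment.new(confidence: raw.clamp(0.0, 1.0).round(2), reasons: reasons)
  end
end

capped = ConfidenceSketch.assess(kind: :similar, mass: 120, occurrence_count: 4, node_type: "defn")
puts capped.confidence
puts capped.reasons.map(&:rule).inspect
```

In this call the raw sum is 0.45 + 0.2 + 0.07 + 0.08, which exceeds SIMILAR_CAP, so the result is capped and a :similar_cap reason is appended. One consequence of the zero delta: once the cap fires, the deltas in reasons no longer add up to the reported confidence, so a consumer should read the confidence field and not re-sum the reasons.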

.base_detail(kind) ⇒ Object



# File 'lib/moult/duplication/confidence.rb', line 73

def base_detail(kind)
  if kind == :identical
    "identical structural match (byte-for-byte)"
  else
    "structurally-similar match (names/literals differ)"
  end
end

.mass_reason(mass) ⇒ Object

Bucketed so the contribution is stable and pinnable regardless of the run's configurable --min-mass.



# File 'lib/moult/duplication/confidence.rb', line 83

def mass_reason(mass)
  if mass >= MASS_LARGE
    Reason.new(rule: :large_mass, delta: 0.2, detail: "large duplicated mass (#{mass})")
  elsif mass >= MASS_MEDIUM
    Reason.new(rule: :medium_mass, delta: 0.1, detail: "moderate duplicated mass (#{mass})")
  end
end
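
The bucket boundaries are inclusive, which is easiest to check at the edges. The sketch below maps a few sample masses to the rule that would fire. mass_bucket is a hypothetical helper that mirrors the thresholds above but returns only the rule name and delta, not a Reason.

```ruby
MASS_LARGE = 100
MASS_MEDIUM = 40

# Hypothetical helper: same thresholds as mass_reason, minus the Reason object.
# Returns nil when the mass is below the medium bucket.
def mass_bucket(mass)
  if mass >= MASS_LARGE
    [:large_mass, 0.2]
  elsif mass >= MASS_MEDIUM
    [:medium_mass, 0.1]
  end
end

[39, 40, 99, 100].each do |mass|
  rule, delta = mass_bucket(mass)
  puts "mass #{mass}: #{rule.inspect} #{delta.inspect}"
end
```

A mass below 40 yields no reason at all, so small clones are scored on the base and the remaining rules alone. Because the thresholds are fixed constants, changing --min-mass affects which clones are reported but not how a given mass is scored.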