Module: Moult::Duplication

Defined in:
lib/moult/duplication.rb,
lib/moult/duplication/confidence.rb

Overview

Orchestrates the duplication analysis: it asks the Clones adapter (flay) for every structural clone group, attributes each occurrence to its enclosing method (best-effort, for the cross-analysis join), and grades each group through the pure Confidence model. The result is a ranked DuplicationReport of confidence-graded clone groups — never an assertion that duplication is certainly removable.

This is the only layer that knows where the facts come from; Confidence stays a pure function of the extracted signals so it can be pinned in isolation.
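Because Confidence depends only on the signals passed to it, it can be pinned with plain value assertions and needs no flay run or files on disk. The module below is a hypothetical stand-in (`ToyConfidence`). Its scoring rule, its thresholds, and the `:identical`/`:similar` kind symbols are all invented for illustration. Only the keyword names mirror the call that `finding_for` makes to `Confidence.assess`.

```ruby
# Hypothetical stand-in for Moult::Duplication::Confidence. The real model's
# rules and thresholds are not documented on this page; these are invented.
module ToyConfidence
  Assessment = Struct.new(:confidence, :reasons, keyword_init: true)

  # Pure function: the result depends only on the keyword arguments, so it
  # needs no files, no flay run, and no global state to test.
  # node_type is accepted to mirror the documented call but ignored here.
  def self.assess(kind:, mass:, occurrence_count:, node_type:)
    score = 0.5
    reasons = []
    if kind == :identical
      score += 0.2
      reasons << "structurally identical"
    end
    if mass >= 64
      score += 0.2
      reasons << "large fragment (mass #{mass})"
    end
    if occurrence_count > 2
      score += 0.1
      reasons << "#{occurrence_count} occurrences"
    end
    # Cap at 1.0 and round away floating-point noise (0.5 + 0.2 + 0.2 + 0.1).
    Assessment.new(confidence: [score, 1.0].min.round(2), reasons: reasons)
  end
end

# Pinning: identical inputs must always yield an identical Assessment.
strong = ToyConfidence.assess(kind: :identical, mass: 80, occurrence_count: 3, node_type: :defn)
p strong.confidence # prints 1.0
```

Keeping the reasons alongside the score means a pinned test can assert why a group was graded as it was, not just the number.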

Defined Under Namespace

Modules: Confidence
Classes: MethodIndex

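Class Method Summary

  • .build_report: runs clone detection, grades every clone group, and returns a ranked DuplicationReport

  • .finding_for: grades one clone set and attributes its occurrences to their enclosing methods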

Class Method Details

.build_report(root:, files:, min_mass: Clones::DEFAULT_MIN_MASS, fuzzy: false, min_confidence: 0.0, git_ref: nil, generated_at: nil) ⇒ DuplicationReport

Parameters:

  • root (String)

    absolute analysis root

  • files (Array<String>)

    absolute Ruby file paths to scan

  • min_mass (Integer) (defaults to: Clones::DEFAULT_MIN_MASS)

    flay mass threshold; smaller fragments are ignored

  • fuzzy (Boolean) (defaults to: false)

    include near-matches (off by default)

  • min_confidence (Float) (defaults to: 0.0)

    drop findings below this confidence

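Returns:

  • (DuplicationReport)

    the confidence-graded clone groups at or above min_confidence, ordered by descending confidence, then descending mass, then node type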

# File 'lib/moult/duplication.rb', line 23

def build_report(root:, files:, min_mass: Clones::DEFAULT_MIN_MASS, fuzzy: false,
  min_confidence: 0.0, git_ref: nil, generated_at: nil)
  clones = Clones.detect(root: root, files: files, min_mass: min_mass, fuzzy: fuzzy)
  methods = MethodIndex.new(root: root, files: files)

  findings = clones.sets.map { |set| finding_for(set, methods) }
  findings.select! { |f| f.confidence >= min_confidence }
  # Highest-confidence first, then heaviest, with node type as a deterministic
  # tie-break so output is stable across runs.
  findings.sort_by! { |f| [-f.confidence, -f.mass, f.node_type] }

  DuplicationReport.new(
    root: root,
    findings: findings,
    git_ref: git_ref,
    generated_at: generated_at,
    backend: clones.backend,
    backend_version: clones.backend_version,
    min_mass: clones.min_mass,
    fuzzy: clones.fuzzy
  )
end
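The filter-and-rank step at the end of build_report can be exercised on its own. The sketch below assumes a hypothetical `ToyFinding` struct carrying only the three fields the sort key reads, and assumes node types are Symbols (which compare by name); the `rank` helper is likewise illustrative and not part of Moult.

```ruby
# Hypothetical Finding with only the fields the ranking step reads.
ToyFinding = Struct.new(:confidence, :mass, :node_type, keyword_init: true)

# Mirrors the select!/sort_by! pair in build_report, without mutation.
def rank(findings, min_confidence: 0.0)
  findings
    .select { |f| f.confidence >= min_confidence }
    .sort_by { |f| [-f.confidence, -f.mass, f.node_type] }
end

findings = [
  ToyFinding.new(confidence: 0.6, mass: 40, node_type: :iter),
  ToyFinding.new(confidence: 0.9, mass: 30, node_type: :defn),
  ToyFinding.new(confidence: 0.9, mass: 30, node_type: :call),
  ToyFinding.new(confidence: 0.9, mass: 50, node_type: :if),
  ToyFinding.new(confidence: 0.2, mass: 99, node_type: :hash)
]

rank(findings, min_confidence: 0.5).map(&:node_type)
# => [:if, :call, :defn, :iter]
```

Negating confidence and mass turns Ruby's ascending sort_by into "highest first". The node_type key matters because sort_by does not promise a stable order, so two findings with equal confidence and mass (here :call and :defn) could otherwise swap between runs. Findings that also share a node type remain tied.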

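.finding_for(set, methods) ⇒ DuplicationReport::Finding

Grades one clone set through Confidence.assess and converts each of its occurrences into a DuplicationReport::Occurrence tagged with the symbol_id that MethodIndex reports for the occurrence's path and line.

Returns:

  • (DuplicationReport::Finding)

    the graded finding for set, carrying its confidence, reasons, and attributed occurrences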

# File 'lib/moult/duplication.rb', line 46

def finding_for(set, methods)
  assessment = Confidence.assess(
    kind: set.kind,
    mass: set.mass,
    occurrence_count: set.occurrences.size,
    node_type: set.node_type
  )
  occurrences = set.occurrences.map do |occ|
    DuplicationReport::Occurrence.new(
      symbol_id: methods.symbol_id_at(occ.path, occ.line),
      path: occ.path,
      line: occ.line,
      fuzzy: occ.fuzzy
    )
  end
  DuplicationReport::Finding.new(
    confidence: assessment.confidence,
    kind: set.kind,
    node_type: set.node_type,
    mass: set.mass,
    reasons: assessment.reasons,
    occurrences: occurrences
  )
end
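The attribution step relies on `methods.symbol_id_at(occ.path, occ.line)`. How MethodIndex answers that query is not documented here, so the class below is a hypothetical sketch (`ToyMethodIndex`) under two assumptions: each method is recorded as a path plus a line range, and when spans nest the narrowest one wins. Returning nil for lines outside every method is also an assumption, in keeping with the Overview's "best-effort" wording. The symbol id strings are made up.

```ruby
# Hypothetical stand-in for MethodIndex: the real class's storage and lookup
# rules are not documented on this page.
class ToyMethodIndex
  Entry = Struct.new(:path, :lines, :symbol_id)

  def initialize(entries)
    @entries = entries
  end

  # Best-effort attribution: among method spans in the same file that cover
  # the line, the narrowest wins; nil means "not inside any known method".
  def symbol_id_at(path, line)
    hits = @entries.select { |e| e.path == path && e.lines.cover?(line) }
    hits.min_by { |e| e.lines.size }&.symbol_id
  end
end

index = ToyMethodIndex.new([
  ToyMethodIndex::Entry.new("lib/a.rb", 3..12, "Parser#parse"),
  ToyMethodIndex::Entry.new("lib/a.rb", 14..20, "Parser#tokens"),
  ToyMethodIndex::Entry.new("lib/b.rb", 1..50, "Report.build"),
  ToyMethodIndex::Entry.new("lib/b.rb", 10..18, "Report.build_row")
])

index.symbol_id_at("lib/a.rb", 5)  # => "Parser#parse"
index.symbol_id_at("lib/a.rb", 13) # => nil (between methods)
index.symbol_id_at("lib/b.rb", 12) # => "Report.build_row" (narrowest span)
```

A linear scan keeps the sketch short; an index over many files would more likely group entries by path first so each lookup only scans one file's methods.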