Class: Vivlio::Starter::CLI::IndexCommands::ScoringEngine

Inherits:
Object
  • Object
Defined in:
lib/vivlio/starter/cli/index/scoring_engine.rb

Overview

Scoring engine.

Constant Summary

WEIGHTS =

Scoring weights

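{
  tf: 1.0,           # term frequency
  idf: 5.0,          # IDF coefficient
  definition: 30.0,  # bonus for matching a definition pattern
  technical: 15.0,   # bonus for technical terms
  heading: 20.0,     # bonus for appearing near a heading
  first_occurrence: 10.0 # bonus for appearing at the start of a chapter
}.freeze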

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize ⇒ ScoringEngine

Returns a new instance of ScoringEngine.



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 39

def initialize
  @scores = Hash.new { |h, k| h[k] = { total: 0.0, components: {} } }
end

Instance Attribute Details

#scores ⇒ Object (readonly)

Returns the value of attribute scores.



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 37

def scores
  @scores
end

Instance Method Details

#add_score(term, component, value) ⇒ Object

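Adds a score to a term. The value is multiplied by the component's entry in WEIGHTS (1.0 if the component has no entry) before it is added to both the component subtotal and the term's total.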

Parameters:

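  • term (String)

    the term

  • component (Symbol)

    the score component (a key of WEIGHTS; unknown keys get a weight of 1.0)

  • value (Float)

    the raw score value, before weighting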



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 47

def add_score(term, component, value)
  weight = WEIGHTS[component] || 1.0
  weighted_value = value * weight

  @scores[term][:components][component] ||= 0.0
  @scores[term][:components][component] += weighted_value
  @scores[term][:total] += weighted_value
end
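
The weighting can be checked with a minimal sketch. It uses a trimmed-down stand-in that copies only WEIGHTS, #initialize, and #add_score from the listings on this page; the `MiniScoringEngine` name and the `:custom` component are hypothetical and exist only for illustration.

```ruby
# Hypothetical stand-in: copies WEIGHTS, #initialize and #add_score from this page.
class MiniScoringEngine
  WEIGHTS = { tf: 1.0, idf: 5.0, definition: 30.0, technical: 15.0,
              heading: 20.0, first_occurrence: 10.0 }.freeze

  attr_reader :scores

  def initialize
    @scores = Hash.new { |h, k| h[k] = { total: 0.0, components: {} } }
  end

  def add_score(term, component, value)
    weight = WEIGHTS[component] || 1.0
    weighted_value = value * weight

    @scores[term][:components][component] ||= 0.0
    @scores[term][:components][component] += weighted_value
    @scores[term][:total] += weighted_value
  end
end

engine = MiniScoringEngine.new
engine.add_score("TF-IDF", :definition, 1) # 1 * 30.0
engine.add_score("TF-IDF", :heading, 1)    # 1 * 20.0
engine.add_score("TF-IDF", :custom, 2.5)   # no WEIGHTS entry, so weight 1.0

p engine.scores["TF-IDF"][:total]      # 30.0 + 20.0 + 2.5
p engine.scores["TF-IDF"][:components]
```

Repeated calls with the same component accumulate into that component's subtotal rather than overwriting it.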

#calculate_tfidf(term, tf, df, doc_count) ⇒ Object

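Computes the TF-IDF score for a term and records it. Returns without recording anything when tf is zero. The IDF is smoothed (natural log of (doc_count + 1) / (df + 1), plus 1), and the :tf and :idf components are added to the total separately, each scaled by its WEIGHTS entry.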

Parameters:

  • term (String)

    the term

  • tf (Integer)

    term frequency (number of occurrences)

  • df (Integer)

    document frequency

  • doc_count (Integer)

    total number of documents



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 61

def calculate_tfidf(term, tf, df, doc_count)
  return if tf.zero?

  idf = Math.log((doc_count + 1.0) / (df + 1.0)) + 1.0
  add_score(term, :tf, tf)
  add_score(term, :idf, idf)
end
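
As a worked example of the formula above, the following sketch recomputes the two contributions outside the class. The `smoothed_idf` helper and the sample numbers (tf = 3, df = 4, doc_count = 9) are assumptions chosen for illustration; the weights are the `:tf` and `:idf` entries of WEIGHTS.

```ruby
# Hypothetical helper: the same smoothed IDF expression used in #calculate_tfidf.
def smoothed_idf(df, doc_count)
  Math.log((doc_count + 1.0) / (df + 1.0)) + 1.0
end

tf_weight  = 1.0 # WEIGHTS[:tf]
idf_weight = 5.0 # WEIGHTS[:idf]

tf  = 3
idf = smoothed_idf(4, 9) # ln(10 / 5) + 1 = ln(2) + 1

# The two contributions are summed, not multiplied together.
total = tf * tf_weight + idf * idf_weight

puts format("idf=%.3f total=%.3f", idf, total) # prints "idf=1.693 total=11.466"
```

Because of the smoothing, a term that appears in every document (df equal to doc_count) still gets an IDF of 1.0 rather than 0, and df = 0 does not cause a division by zero.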

#debug_scores(term) ⇒ Object

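For debugging: returns the score breakdown for a term, with the total and each component rounded to two decimal places.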



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 84

def debug_scores(term)
  data = @scores[term]
  return nil unless data

  {
    term: term,
    total: data[:total].round(2),
    components: data[:components].transform_values { it.round(2) }
  }
end
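
One detail worth keeping in mind when reading this method: `@scores` is built in #initialize with a default block, so `@scores[term]` stores and returns a zeroed entry for a term that was never scored, and the `return nil unless data` guard is not reached in that case. Also, the bare `it` block parameter requires Ruby 3.4 or later. The sketch below shows the default-block behavior on a standalone hash; the `breakdown` helper is a hypothetical alternative that checks `key?` first, not the library's code.

```ruby
# Same default block as in #initialize.
scores = Hash.new { |h, k| h[k] = { total: 0.0, components: {} } }

p scores.key?("unknown") # false: nothing stored yet
data = scores["unknown"] # the lookup runs the default block and stores an entry
p data.nil?              # false: a zeroed entry came back, not nil
p scores.key?("unknown") # true: the lookup itself created the entry

# Hypothetical guard (an assumption, not the library's implementation):
# check key? before reading so unknown terms yield nil and are not stored.
def breakdown(scores, term)
  return nil unless scores.key?(term)

  data = scores[term]
  {
    term: term,
    total: data[:total].round(2),
    components: data[:components].transform_values { |v| v.round(2) }
  }
end

p breakdown(scores, "missing") # nil, and "missing" is not added to scores
```

The explicit `|v|` block parameter keeps the sketch runnable on Ruby versions older than 3.4.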

#filter_by_threshold(threshold) ⇒ Hash

Returns the terms whose total score is at or above the threshold, sorted by total score in descending order.

Parameters:

  • threshold (Float)

    the threshold

Returns:

  • (Hash)

    a hash mapping each term to its score data



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 72

def filter_by_threshold(threshold)
  @scores.select { _2[:total] >= threshold }
         .sort_by { -_2[:total] }
         .to_h
end
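
The select / sort_by / to_h chain above can be tried on a plain hash shaped like `scores`. This is a minimal sketch: the sample terms and totals are invented for illustration, and explicit block parameters stand in for the numbered parameter `_2` (which needs Ruby 2.7 or later).

```ruby
# Sample data shaped like the engine's scores hash (values are made up).
scores = {
  "index"    => { total: 12.0, components: {} },
  "glossary" => { total: 48.5, components: {} },
  "the"      => { total: 3.0,  components: {} }
}

selected = scores.select { |_term, data| data[:total] >= 10.0 }
                 .sort_by { |_term, data| -data[:total] }
                 .to_h

p selected.keys # prints ["glossary", "index"]
```

Ruby hashes preserve insertion order, so the hash returned by `to_h` keeps the descending order produced by `sort_by`.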

#reset!Object

Resets all scores.



# File 'lib/vivlio/starter/cli/index/scoring_engine.rb', line 79

def reset!
  @scores.clear
end