Class: Kotoshu::Analyzers::SemanticAnalyzer
- Inherits:
- Object
- Defined in:
- lib/kotoshu/analyzers/semantic_analyzer.rb
Overview
Unified semantic error analyzer.
Uses word embeddings for context-aware error detection and suggestions. Spelling and grammar problems are handled by one analysis pass rather than being split into separate checkers.
Constant Summary
- HIGH_CONFIDENCE_THRESHOLD = 0.85
Similarity threshold for high-confidence suggestions
- MEDIUM_CONFIDENCE_THRESHOLD = 0.70
Similarity threshold for medium-confidence suggestions
- MIN_SIMILARITY = 0.50
Minimum similarity for suggestions
- DEFAULT_MAX_SUGGESTIONS = 5
Default number of suggestions to generate
Instance Attribute Summary
-
#max_suggestions ⇒ Object
readonly
Returns the value of attribute max_suggestions.
-
#model ⇒ Object
readonly
Returns the value of attribute model.
Instance Method Summary
-
#analyze(document) ⇒ Array<Models::SemanticError>
Analyze a document for semantic errors.
-
#calculate_confidence(suggestions) ⇒ Float
Calculate confidence score for suggestions.
-
#detect_error(word:, location:, context: nil) ⇒ Models::SemanticError?
Detect semantic error for a single word.
-
#initialize(model, max_suggestions: DEFAULT_MAX_SUGGESTIONS, min_similarity: MIN_SIMILARITY) ⇒ SemanticAnalyzer
constructor
Create a new semantic analyzer.
-
#suggest_corrections(word, context: nil) ⇒ Array<Models::Suggestion>
Suggest corrections for a word.
-
#valid_word?(word) ⇒ Boolean
Check if a word is valid (exists in vocabulary).
Constructor Details
#initialize(model, max_suggestions: DEFAULT_MAX_SUGGESTIONS, min_similarity: MIN_SIMILARITY) ⇒ SemanticAnalyzer
Create a new semantic analyzer.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 42

    def initialize(model, max_suggestions: DEFAULT_MAX_SUGGESTIONS, min_similarity: MIN_SIMILARITY)
      raise ArgumentError, "Model must be an EmbeddingModel" unless model.is_a?(Models::EmbeddingModel)

      @model = model
      @max_suggestions = max_suggestions
      @min_similarity = min_similarity
    end
Instance Attribute Details
#max_suggestions ⇒ Object (readonly)
Returns the maximum number of suggestions produced per word, as passed to the constructor (DEFAULT_MAX_SUGGESTIONS unless overridden).
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 35

    def max_suggestions
      @max_suggestions
    end
#model ⇒ Object (readonly)
Returns the embedding model passed to the constructor (a Models::EmbeddingModel), which is used for vocabulary lookups and nearest-neighbor queries.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 35

    def model
      @model
    end
Instance Method Details
#analyze(document) ⇒ Array<Models::SemanticError>
Analyze a document for semantic errors.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 54

    def analyze(document)
      errors = []

      # Get text nodes from document
      document.text_nodes.each do |text_node|
        # Tokenize and check each word
        words = tokenize_words(text_node.text)

        words.each do |word|
          next if valid_word?(word)

          # Detect error
          error = detect_error(
            word: word,
            location: text_node.location,
            context: document.context_for(text_node.location)
          )

          errors << error if error
        end
      end

      # Sort errors by location and confidence
      errors.sort
    end
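The closing `errors.sort` only works if the error objects define `<=>`. The definition on `Models::SemanticError` is not shown on this page, so the following is a minimal sketch under stated assumptions: `SketchError` is a hypothetical stand-in, locations are plain integers, and ties on location are broken by descending confidence.

```ruby
# Hypothetical stand-in for Models::SemanticError; the real class is not shown here.
SketchError = Struct.new(:original, :location, :confidence) do
  # Assumed ordering: earlier location first, then higher confidence first.
  def <=>(other)
    [location, -confidence] <=> [other.location, -other.confidence]
  end
end

errors = [
  SketchError.new("teh", 12, 0.7),
  SketchError.new("recieve", 3, 1.0),
  SketchError.new("tje", 12, 1.0)
]

# Array#sort uses <=> on the elements.
sorted_words = errors.sort.map(&:original)
```

Under these assumptions the error at location 3 sorts first, and the two errors at location 12 are ordered with the more confident one ahead.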
#calculate_confidence(suggestions) ⇒ Float
Calculate confidence score for suggestions.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 164

    def calculate_confidence(suggestions)
      return 0.0 unless suggestions&.any?

      # Confidence is based on top suggestion quality
      top = suggestions.first

      # High confidence: top suggestion > 0.85 similarity
      return 1.0 if top.confidence > HIGH_CONFIDENCE_THRESHOLD

      # Medium confidence: top suggestion > 0.70 similarity
      return 0.7 if top.confidence > MEDIUM_CONFIDENCE_THRESHOLD

      # Low confidence: top suggestion < 0.70
      0.5
    end
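Both comparisons in this method are strict (`>`), so a top suggestion sitting exactly on a threshold falls into the lower tier. The sketch below lifts the same tier logic out of the class to make that visible; `SketchSuggestion` and `confidence_tier` are hypothetical names introduced for illustration only.

```ruby
HIGH_CONFIDENCE_THRESHOLD = 0.85
MEDIUM_CONFIDENCE_THRESHOLD = 0.70

# Hypothetical stand-in for Models::Suggestion; only #confidence is needed here.
SketchSuggestion = Struct.new(:confidence)

# Same tier logic as SemanticAnalyzer#calculate_confidence, as a free function.
def confidence_tier(suggestions)
  return 0.0 unless suggestions&.any?

  top = suggestions.first
  return 1.0 if top.confidence > HIGH_CONFIDENCE_THRESHOLD
  return 0.7 if top.confidence > MEDIUM_CONFIDENCE_THRESHOLD

  0.5
end

# One-element suggestion lists with different top similarities.
tiers = [0.90, 0.85, 0.70, 0.55].map { |s| confidence_tier([SketchSuggestion.new(s)]) }
empty_tier = confidence_tier([])
nil_tier = confidence_tier(nil)
```

Only the first element of the list is inspected, so the result depends on the suggestions arriving ordered best-first.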
#detect_error(word:, location:, context: nil) ⇒ Models::SemanticError?
Detect semantic error for a single word.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 86

    def detect_error(word:, location:, context: nil)
      return nil if valid_word?(word)

      # Get suggestions
      suggestions = suggest_corrections(word, context: context)

      # Determine error type based on analysis
      error_type = classify_error(word, suggestions, context)

      # Calculate confidence based on suggestions
      confidence = calculate_confidence(suggestions)

      # Create error object
      Models::SemanticError.new(
        id: generate_error_id(word, location),
        location: location,
        original: word,
        suggestions: suggestions,
        error_type: error_type,
        confidence: confidence,
        context: context
      )
    end
#suggest_corrections(word, context: nil) ⇒ Array<Models::Suggestion>
Suggest corrections for a word.
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 115

    def suggest_corrections(word, context: nil)
      return [] if word.nil? || word.empty?

      # Get nearest neighbors from embedding model
      neighbors = @model.nearest_neighbors(word, k: @max_suggestions * 3)

      # Filter by minimum similarity
      neighbors = neighbors.select { |n| n.similarity >= @min_similarity }

      # If we have context, rank by contextual relevance
      if context && context.respond_to?(:surrounding_words)
        neighbors = rank_by_context(neighbors, context)
      end

      # Convert to Suggestions
      neighbors.first(@max_suggestions).map do |neighbor|
        Models::Suggestion.new(
          word: neighbor.word,
          confidence: neighbor.similarity,
          source: :semantic,
          metadata: {
            distance: neighbor.distance,
            similarity: neighbor.similarity
          }
        )
      end
    end
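The method over-fetches three times as many neighbors as it needs, discards those below the similarity floor, and only then truncates to `max_suggestions`. A minimal sketch of that pipeline follows, leaving out the optional context ranking. `SketchNeighbor`, `SketchModel`, `sketch_suggestions`, and the canned similarity values are all hypothetical; the stub also assumes the model returns neighbors sorted by descending similarity, which this page does not confirm.

```ruby
# Hypothetical neighbor record; the real model's return type is not shown here.
SketchNeighbor = Struct.new(:word, :similarity, :distance)

# Stub standing in for an embedding model. It returns canned neighbors,
# assumed to be sorted by descending similarity.
class SketchModel
  NEIGHBORS = [
    SketchNeighbor.new("receive", 0.92, 0.08),
    SketchNeighbor.new("received", 0.81, 0.19),
    SketchNeighbor.new("retrieve", 0.64, 0.36),
    SketchNeighbor.new("relieve", 0.48, 0.52),
    SketchNeighbor.new("recipe", 0.31, 0.69)
  ].freeze

  def nearest_neighbors(_word, k:)
    NEIGHBORS.first(k)
  end
end

# Over-fetch 3x candidates, drop weak ones, then keep the first few that remain.
def sketch_suggestions(model, word, max_suggestions:, min_similarity:)
  return [] if word.nil? || word.empty?

  model.nearest_neighbors(word, k: max_suggestions * 3)
       .select { |n| n.similarity >= min_similarity }
       .first(max_suggestions)
       .map(&:word)
end

picked = sketch_suggestions(SketchModel.new, "recieve", max_suggestions: 2, min_similarity: 0.50)
none = sketch_suggestions(SketchModel.new, "", max_suggestions: 2, min_similarity: 0.50)
```

Over-fetching matters because the filter runs before truncation: with a tight similarity floor, asking the model for exactly `max_suggestions` neighbors could leave fewer candidates than requested.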
#valid_word?(word) ⇒ Boolean
Check if a word is valid (exists in vocabulary).
# File 'lib/kotoshu/analyzers/semantic_analyzer.rb', line 147

    def valid_word?(word)
      return false if word.nil? || word.empty?

      # Skip numbers
      return true if word =~ /^\d+$/

      # Skip single characters (likely abbreviations)
      return true if word.length == 1

      # Check if word exists in model vocabulary
      @model.has_word?(word)
    end
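The checks run in a fixed order: empty input is rejected, digit-only tokens and single characters are accepted without consulting the model, and everything else is looked up in the vocabulary. The sketch below mirrors that order with the model replaced by a small `Set`; `VOCABULARY` and `sketch_valid_word?` are hypothetical names, and the word list is made up for illustration.

```ruby
require "set"

# Hypothetical vocabulary standing in for the embedding model's word list.
VOCABULARY = Set["receive", "the"].freeze

# Same checks as SemanticAnalyzer#valid_word?, with the model lookup replaced by a Set.
def sketch_valid_word?(word)
  return false if word.nil? || word.empty?
  return true if word =~ /^\d+$/   # digit-only tokens are skipped
  return true if word.length == 1  # single characters are skipped

  VOCABULARY.include?(word)
end

results = {
  "receive" => sketch_valid_word?("receive"),
  "recieve" => sketch_valid_word?("recieve"),
  "2024" => sketch_valid_word?("2024"),
  "x" => sketch_valid_word?("x"),
  "" => sketch_valid_word?("")
}

# In Ruby, ^ and $ anchor to line boundaries, so a token with an embedded
# newline and a digit-only line also passes the number check.
multiline = sketch_valid_word?("abc\n123")
```

Whether a token can ever contain a newline depends on the tokenizer, which is not shown on this page. If it can, anchoring the pattern with `\A` and `\z` would restrict the number check to whole-string matches.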