Class: Kotoshu::Models::WordEmbedding
- Inherits: Object
  - Object
  - Kotoshu::Models::WordEmbedding
- Defined in: lib/kotoshu/models/word_embedding.rb
Overview
Immutable value object for word embeddings.
Represents a word and its vector representation in a semantic space. Used for semantic similarity calculations and nearest neighbor searches.
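As a rough illustration of the nearest neighbor use case, the sketch below runs a brute-force search over plain Float arrays. The helpers `cosine` and `nearest_words` and the sample vocabulary are hypothetical and are not part of Kotoshu; with the real class, the ranking would presumably go through #similarity instead.

```ruby
# Hypothetical helper: cosine similarity between two plain Float arrays.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Hypothetical helper: brute-force search that returns the k words whose
# vectors are most similar to the query vector, best match first.
def nearest_words(query, vocabulary, k: 2)
  vocabulary
    .sort_by { |_word, vector| -cosine(query, vector) }
    .first(k)
    .map(&:first)
end

# Made-up two-dimensional vectors, for illustration only.
vocabulary = {
  "cat" => [0.9, 0.1],
  "dog" => [0.8, 0.3],
  "car" => [0.1, 0.9]
}

p nearest_words([1.0, 0.0], vocabulary, k: 2) # => ["cat", "dog"]
```

A linear scan like this costs one similarity computation per vocabulary entry, which is fine for small vocabularies but may need an index for large ones.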
Instance Attribute Summary
- #dimension ⇒ Object (readonly): Number of vector components the embedding was created with (defaults to 300).
- #language_code ⇒ Object (readonly): Language code supplied to the constructor.
- #vector ⇒ Object (readonly): The embedding vector, frozen by the constructor.
- #word ⇒ Object (readonly): The word this embedding represents.
Instance Method Summary
- #==(other) ⇒ Boolean (also: #eql?): Check if this embedding is equal to another.
- #distance(other) ⇒ Float: Calculate Euclidean distance from another embedding.
- #hash ⇒ Integer: Hash code for hash table usage.
- #initialize(word, vector, language_code, dimension: 300) ⇒ WordEmbedding (constructor): Create a new word embedding.
- #similarity(other) ⇒ Float: Calculate cosine similarity with another embedding.
- #to_s ⇒ String (also: #inspect): String representation.
Constructor Details
#initialize(word, vector, language_code, dimension: 300) ⇒ WordEmbedding
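Create a new word embedding. Raises ArgumentError when vector.size does not equal dimension. The vector is frozen in place (the caller's array is not copied), and the new instance is frozen as well.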
# File 'lib/kotoshu/models/word_embedding.rb', line 25

def initialize(word, vector, language_code, dimension: 300)
  raise ArgumentError, "Vector dimension mismatch" unless vector.size == dimension

  @word = word
  @vector = vector.freeze
  @language_code = language_code
  @dimension = dimension
  freeze
end
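The following sketch exercises the constructor behavior shown above. `SketchEmbedding` is a hypothetical stand-in that copies the constructor body so the example runs without the gem; it is not the Kotoshu class itself.

```ruby
# Hypothetical stand-in that copies the constructor logic shown above.
class SketchEmbedding
  attr_reader :word, :vector, :language_code, :dimension

  def initialize(word, vector, language_code, dimension: 300)
    raise ArgumentError, "Vector dimension mismatch" unless vector.size == dimension

    @word = word
    @vector = vector.freeze # freezes the caller's array in place, no copy is made
    @language_code = language_code
    @dimension = dimension
    freeze
  end
end

raw = [0.1, 0.2, 0.3]
embedding = SketchEmbedding.new("cat", raw, "en", dimension: 3)

p embedding.frozen? # => true
p raw.frozen?       # => true, because the same array object was frozen

# Without an explicit dimension the default of 300 applies,
# so a 3-element vector is rejected.
begin
  SketchEmbedding.new("cat", [0.1, 0.2, 0.3], "en")
rescue ArgumentError => e
  p e.message # => "Vector dimension mismatch"
end
```

Because the input array is frozen rather than copied, a caller that still needs a mutable array may want to pass `raw.dup`.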
Instance Attribute Details
#dimension ⇒ Object (readonly)
Returns the value of attribute dimension.
# File 'lib/kotoshu/models/word_embedding.rb', line 16

def dimension
  @dimension
end
#language_code ⇒ Object (readonly)
Returns the value of attribute language_code.
# File 'lib/kotoshu/models/word_embedding.rb', line 16

def language_code
  @language_code
end
#vector ⇒ Object (readonly)
Returns the value of attribute vector.
# File 'lib/kotoshu/models/word_embedding.rb', line 16

def vector
  @vector
end
#word ⇒ Object (readonly)
Returns the value of attribute word.
# File 'lib/kotoshu/models/word_embedding.rb', line 16

def word
  @word
end
Instance Method Details
#==(other) ⇒ Boolean Also known as: eql?
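Check if this embedding is equal to another. Two embeddings are equal when both word and language_code match; vector and dimension are not compared. Returns false for any object that is not a WordEmbedding.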
# File 'lib/kotoshu/models/word_embedding.rb', line 75

def ==(other)
  return false unless other.is_a?(WordEmbedding)

  @word == other.word && @language_code == other.language_code
end
#distance(other) ⇒ Float
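Calculate Euclidean distance from another embedding. Raises TypeError unless other is a WordEmbedding, and returns Float::INFINITY when the two embeddings have different dimensions.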
# File 'lib/kotoshu/models/word_embedding.rb', line 63

def distance(other)
  raise TypeError, "Must be WordEmbedding" unless other.is_a?(WordEmbedding)
  return Float::INFINITY if @dimension != other.dimension

  Math.sqrt(@vector.zip(other.vector).map { |a, b| (a - b)**2 }.sum)
end
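To make the arithmetic concrete, here is a minimal sketch of the same computation on plain arrays. `euclidean_distance` is a hypothetical free function, not a Kotoshu API, and it compares array sizes where the method above compares the stored dimensions.

```ruby
# Hypothetical free function mirroring the body of #distance for plain arrays.
def euclidean_distance(a, b)
  # Array sizes stand in for the stored dimension attribute here.
  return Float::INFINITY if a.size != b.size

  # Square root of the sum of squared component differences.
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

p euclidean_distance([0.0, 0.0], [3.0, 4.0]) # => 5.0 (a 3-4-5 triangle)
p euclidean_distance([1.0, 2.0], [1.0, 2.0]) # => 0.0
p euclidean_distance([1.0, 2.0], [1.0])      # => Infinity
```

Returning infinity for mismatched sizes means such pairs sort last in a nearest-first ranking instead of raising an error.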
#hash ⇒ Integer
Hash code derived from word and language_code. It is consistent with #==, so embeddings can be used as Hash keys.
# File 'lib/kotoshu/models/word_embedding.rb', line 85

def hash
  [@word, @language_code].hash
end
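The sketch below shows what the #== and #hash definitions imply for Hash keys and de-duplication. `KeyedEmbedding` is a hypothetical stand-in that reproduces only those two methods; the constructor validation is left out for brevity.

```ruby
# Hypothetical stand-in reproducing only the equality and hash logic.
class KeyedEmbedding
  attr_reader :word, :vector, :language_code

  def initialize(word, vector, language_code)
    @word = word
    @vector = vector
    @language_code = language_code
  end

  def ==(other)
    return false unless other.is_a?(KeyedEmbedding)

    @word == other.word && @language_code == other.language_code
  end
  alias eql? ==

  def hash
    [@word, @language_code].hash
  end
end

a = KeyedEmbedding.new("bank", [1.0, 0.0], "en")
b = KeyedEmbedding.new("bank", [0.0, 1.0], "en") # same word, different vector
c = KeyedEmbedding.new("bank", [1.0, 0.0], "de") # same vector, different language

p a == b              # => true, vectors are not compared
p a == c              # => false, language codes differ
p [a, b, c].uniq.size # => 2, a and b collapse into one entry

cache = { a => "first" }
cache[b] = "second"   # overwrites the entry stored under a
p cache.size          # => 1
```

Identity here is the (word, language_code) pair, so two embeddings of the same word from different models are treated as the same key.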
#similarity(other) ⇒ Float
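Calculate cosine similarity with another embedding.
Cosine similarity measures the cosine of the angle between two vectors. Up to floating-point rounding, it returns 1.0 for vectors pointing in the same direction, 0.0 for orthogonal vectors, and -1.0 for vectors pointing in opposite directions. The method also returns 0.0 when the dimensions differ or when either vector has zero magnitude, and raises TypeError unless other is a WordEmbedding.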
# File 'lib/kotoshu/models/word_embedding.rb', line 44

def similarity(other)
  raise TypeError, "Must be WordEmbedding" unless other.is_a?(WordEmbedding)
  return 0.0 if @dimension != other.dimension

  dot_product = @vector.zip(other.vector).map { |a, b| a * b }.sum
  magnitude_a = vector_magnitude
  magnitude_b = other.vector_magnitude

  return 0.0 if magnitude_a.zero? || magnitude_b.zero?

  dot_product / (magnitude_a * magnitude_b)
end
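A minimal sketch of the same computation on plain arrays follows. `cosine_similarity` is a hypothetical free function, not a Kotoshu API. The source of `vector_magnitude` is not shown on this page, so the sketch assumes it is the Euclidean norm (square root of the sum of squares).

```ruby
# Hypothetical free function mirroring the body of #similarity for plain arrays.
def cosine_similarity(a, b)
  # Array sizes stand in for the stored dimension attribute here.
  return 0.0 if a.size != b.size

  dot_product = a.zip(b).sum { |x, y| x * y }
  # Assumption: vector_magnitude is the Euclidean norm.
  magnitude_a = Math.sqrt(a.sum { |x| x * x })
  magnitude_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if magnitude_a.zero? || magnitude_b.zero?

  dot_product / (magnitude_a * magnitude_b)
end

p cosine_similarity([1.0, 0.0], [2.0, 0.0])  # => 1.0 (same direction, different length)
p cosine_similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
p cosine_similarity([1.0, 0.0], [-1.0, 0.0]) # => -1.0 (opposite direction)
p cosine_similarity([0.0, 0.0], [1.0, 0.0])  # => 0.0 (zero-magnitude guard)
```

Note that a result of 0.0 is ambiguous: it can mean orthogonal vectors, a dimension mismatch, or a zero vector.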
#to_s ⇒ String Also known as: inspect
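String representation of the form ClassName[word, language_code, dimensionD], for example Kotoshu::Models::WordEmbedding[cat, en, 300D].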
# File 'lib/kotoshu/models/word_embedding.rb', line 92

def to_s
  "#{self.class.name}[#{@word}, #{@language_code}, #{@dimension}D]"
end