Class: HTM::SqlBuilder

Inherits:
Object
  • Object
show all
Defined in:
lib/htm/sql_builder.rb

Overview

SQL building utilities for constructing safe, parameterized queries

Provides class methods for building SQL conditions for:

  • Timeframe filtering (single range or multiple ranges)

  • Metadata filtering (JSONB containment)

  • Embedding sanitization and padding (SQL injection prevention)

  • LIKE pattern sanitization (wildcard injection prevention)

All methods use proper escaping and parameterization to prevent SQL injection.

Examples:

Build a timeframe condition

HTM::SqlBuilder.timeframe_condition(1.week.ago..Time.now)
# => "(created_at BETWEEN '2024-01-01' AND '2024-01-08')"

Build a metadata condition

HTM::SqlBuilder.({ priority: "high" })
# => "(metadata @> '{\"priority\":\"high\"}'::jsonb)"

Sanitize an embedding

HTM::SqlBuilder.sanitize_embedding([0.1, 0.2, 0.3])
# => "[0.1,0.2,0.3]"

Sanitize a LIKE pattern

HTM::SqlBuilder.sanitize_like_pattern("test%pattern")
# => "test\\%pattern"

Constant Summary collapse

MAX_EMBEDDING_DIMENSION =

Maximum embedding dimension supported by pgvector with HNSW index

2000

Class Method Summary collapse

Class Method Details

.apply_metadata(scope, metadata, column: "metadata") ⇒ Sequel::Dataset

Apply metadata filter to Sequel dataset

Parameters:

  • scope (Sequel::Dataset)

    Base dataset

  • metadata (Hash)

    Metadata to filter by

  • column (String) (defaults to: "metadata")

    Column name (default: “metadata”)

Returns:

  • (Sequel::Dataset)

    Filtered dataset



169
170
171
172
173
# File 'lib/htm/sql_builder.rb', line 169

def (scope, , column: "metadata")
  return scope if .nil? || .empty?

  scope.where(Sequel.lit("#{column} @> ?::jsonb", .to_json))
end

.apply_timeframe(scope, timeframe, column: :created_at) ⇒ Sequel::Dataset

Apply timeframe filter to Sequel dataset

Parameters:

  • scope (Sequel::Dataset)

    Base dataset

  • timeframe (nil, Range, Array<Range>)

    Time range(s)

  • column (Symbol) (defaults to: :created_at)

    Column name (default: :created_at)

Returns:

  • (Sequel::Dataset)

    Filtered dataset



131
132
133
134
135
136
137
138
139
140
141
142
143
# File 'lib/htm/sql_builder.rb', line 131

def apply_timeframe(scope, timeframe, column: :created_at)
  return scope if timeframe.nil?

  case timeframe
  when Range
    scope.where(column => timeframe)
  when Array
    conditions = timeframe.map { |range| Sequel.expr(column => range) }
    scope.where(Sequel.|(*conditions))
  else
    scope
  end
end

.metadata_condition(metadata, table_alias: nil, column: "metadata") ⇒ String?

Build SQL condition for metadata filtering (JSONB containment)

Parameters:

  • metadata (Hash)

    Metadata to filter by

  • table_alias (String, nil) (defaults to: nil)

    Table alias (default: none)

  • column (String) (defaults to: "metadata")

    Column name (default: “metadata”)

Returns:

  • (String, nil)

    SQL condition or nil for no filter



152
153
154
155
156
157
158
159
160
# File 'lib/htm/sql_builder.rb', line 152

def (, table_alias: nil, column: "metadata")
  return nil if .nil? || .empty?

  prefix = table_alias ? "#{table_alias}." : ""
  full_column = "#{prefix}#{column}"

   = HTM.db.literal(.to_json)
  "(#{full_column} @> #{}::jsonb)"
end

.pad_embedding(embedding, target_dimension: MAX_EMBEDDING_DIMENSION) ⇒ Array<Numeric>

Pad embedding to target dimension

Pads embedding with zeros to reach the target dimension for pgvector compatibility.

Parameters:

  • embedding (Array<Numeric>)

    Embedding vector

  • target_dimension (Integer) (defaults to: MAX_EMBEDDING_DIMENSION)

    Target dimension (default: MAX_EMBEDDING_DIMENSION)

Returns:

  • (Array<Numeric>)

    Padded embedding



77
78
79
80
81
# File 'lib/htm/sql_builder.rb', line 77

def pad_embedding(embedding, target_dimension: MAX_EMBEDDING_DIMENSION)
  return embedding if embedding.length >= target_dimension

  embedding + Array.new(target_dimension - embedding.length, 0.0)
end

.sanitize_embedding(embedding) ⇒ String

Sanitize embedding for SQL use

Validates that all values are numeric and converts to safe PostgreSQL vector format. This prevents SQL injection by ensuring only valid numeric values are included.

Parameters:

  • embedding (Array<Numeric>)

    Embedding vector

Returns:

  • (String)

    Sanitized vector string for PostgreSQL (e.g., “[0.1,0.2,0.3]”)

Raises:

  • (ArgumentError)

    If embedding contains non-numeric values



44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
# File 'lib/htm/sql_builder.rb', line 44

def sanitize_embedding(embedding)
  unless embedding.is_a?(Array)
    raise ArgumentError, "Embedding must be an Array, got #{embedding.class}"
  end

  if embedding.empty?
    raise ArgumentError, "Embedding cannot be empty"
  end

  # Find invalid values for detailed error message
  invalid_indices = []
  embedding.each_with_index do |v, i|
    unless v.is_a?(Numeric) && v.respond_to?(:finite?) && v.finite?
      invalid_indices << i
    end
  end

  unless invalid_indices.empty?
    sample = invalid_indices.first(5).map { |i| "index #{i}: #{embedding[i].inspect}" }.join(", ")
    raise ArgumentError, "Embedding contains invalid values at #{sample}"
  end

  "[#{embedding.map(&:to_f).join(',')}]"
end

.sanitize_like_pattern(pattern) ⇒ String

Sanitize a string for use in SQL LIKE patterns

Escapes SQL LIKE wildcards (% and _) to prevent pattern injection.

Parameters:

  • pattern (String)

    Pattern to sanitize

Returns:

  • (String)

    Sanitized pattern safe for LIKE queries



90
91
92
93
94
# File 'lib/htm/sql_builder.rb', line 90

def sanitize_like_pattern(pattern)
  return "" if pattern.nil?

  pattern.to_s.gsub(/[%_\\]/) { |match| "\\#{match}" }
end

.timeframe_condition(timeframe, table_alias: nil, column: "created_at") ⇒ String?

Build SQL condition for timeframe filtering

Parameters:

  • timeframe (nil, Range, Array<Range>)

    Time range(s)

  • table_alias (String, nil) (defaults to: nil)

    Table alias (default: none)

  • column (String) (defaults to: "created_at")

    Column name (default: “created_at”)

Returns:

  • (String, nil)

    SQL condition or nil for no filter



103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
# File 'lib/htm/sql_builder.rb', line 103

def timeframe_condition(timeframe, table_alias: nil, column: "created_at")
  return nil if timeframe.nil?

  prefix = table_alias ? "#{table_alias}." : ""
  full_column = "#{prefix}#{column}"

  case timeframe
  when Range
    begin_quoted = HTM.db.literal(timeframe.begin.iso8601)
    end_quoted = HTM.db.literal(timeframe.end.iso8601)
    "(#{full_column} BETWEEN #{begin_quoted} AND #{end_quoted})"
  when Array
    conditions = timeframe.map do |range|
      begin_quoted = HTM.db.literal(range.begin.iso8601)
      end_quoted = HTM.db.literal(range.end.iso8601)
      "(#{full_column} BETWEEN #{begin_quoted} AND #{end_quoted})"
    end
    "(#{conditions.join(' OR ')})"
  end
end