Module: EzLogsAgent::BulkSqlParser

Defined in:
lib/ez_logs_agent/bulk_sql_parser.rb

Overview

Pure-functional parser for the four ActiveRecord bulk operations that bypass per-row callbacks: delete_all, update_all, insert_all, upsert_all. Used by BulkDatabaseCapturer to turn an AS::Notifications “sql.active_record” payload into a structured wire shape the server can humanize.

## What it extracts

For delete_all:

{ operation: :delete_all, where_template: "status = $1", where_binds: [{column:, value:}] }

For update_all:

{ operation: :update_all, set: {"status" => "paid"}, where_template:, where_binds: }

For insert_all / upsert_all:

{ operation: :insert_all|:upsert_all, columns: ["name", "email"] }
(No values — per the product decision; column SHAPE only.)

For anything else (subqueries, joins, raw SQL the regex can’t parse, malformed binds): returns { unparseable: true }. BulkDatabaseCapturer falls back to shipping row_count + operation + model_class with no template / binds / set — the timeline still reads “Bulk delete: 50,000 orders” minus the WHERE detail.

## Why regex (not Arel / pg_query)

Both Arel and pg_query are adapter-specific and either slow or heavy to add as a runtime dependency on every customer host app. Regex on the standardized AR-emitted SQL string is fast (sub-millisecond on typical statements) and adapter-tolerant. The graceful “unparseable” branch covers the long tail.

## Adapter quoting handled

  • PostgreSQL: “orders”.“status” = $1 (double-quoted, $N placeholders)

  • SQLite: “orders”.“status” = ? (double-quoted, ? placeholders)

  • MySQL: ‘orders`.`status` = ? (backticks, ? placeholders)

Constant Summary collapse

IDENTIFIER =

Loose match for an identifier wrapped in any of the three quote styles AR uses. Captures the unquoted name.

/["`]([^"`]+)["`]/.freeze
COLUMN =

Either a “qualified” or “bare” identifier reference, used in WHERE/SET clause columns. AR prefixes the table name in some paths and not in others — accept both. Captures just the column.

/(?:#{IDENTIFIER}\.)?#{IDENTIFIER}/.freeze
PLACEHOLDER =

A bind placeholder — Postgres uses $N (1-indexed), SQLite/MySQL use ?. The order of placeholders in the SQL corresponds 1:1 with the order of values in ‘binds` / `type_casted_binds`.

/(?:\$\d+|\?)/.freeze

Class Method Summary collapse

Class Method Details

.build_template_and_binds(where_sql, type_casted_binds) ⇒ Object

Given a WHERE clause SQL and the binds in placeholder order, walk the placeholders and pair each one with the column to its LEFT (the standard ‘“table”.“col” = $1` shape AR emits). Returns the template (placeholder-preserved) and the binds tagged with column.

Unrecognized shapes (subselects, joins, NULL checks without binds) leave the template intact but produce no column for the bind, in which case the bind ships with column: nil. The display layer handles nil-column binds.



263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 263

def build_template_and_binds(where_sql, type_casted_binds)
  return [nil, []] if where_sql.nil?

  template = where_sql
  binds = []
  bind_index = 0

  # Walk the template scanning each placeholder, looking backward
  # for the nearest column identifier to its left.
  template.scan(PLACEHOLDER) do
    match_data = Regexp.last_match
    next unless match_data

    column = nearest_column_left_of(template, match_data.begin(0))
    value = type_casted_binds[bind_index]
    binds << { column: column, value: value }
    bind_index += 1
  end

  [template, binds]
end

.detect_operation(sql) ⇒ Symbol?

Returns the symbolic operation name we expect downstream. Detected from SQL shape, independent of the ‘payload` Rails version variance (see plan §“insert_all/upsert_all payload :name varies”). Upsert detection requires the `ON CONFLICT` (PG) or `ON DUPLICATE KEY` (MySQL) clause — bare INSERTs are insert_all.

Returns:

  • (Symbol, nil)

    :delete_all, :update_all, :insert_all, :upsert_all, or nil if not a bulk op.



105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 105

def detect_operation(sql)
  return nil if sql.nil?

  sql_up = sql.lstrip.upcase

  return :delete_all if sql_up.start_with?("DELETE FROM ")
  return :update_all if sql_up.start_with?("UPDATE ")

  if sql_up.start_with?("INSERT INTO ")
    # Disambiguate insert_all vs upsert_all:
  # - insert_all on PG/SQLite emits `ON CONFLICT DO NOTHING` (still insert).
  # - upsert_all on PG/SQLite emits `ON CONFLICT ... DO UPDATE SET ...`.
  # - upsert_all on MySQL emits `ON DUPLICATE KEY UPDATE ...`.
    if (sql_up.include?("ON CONFLICT") && sql_up.include?("DO UPDATE")) ||
       sql_up.include?("ON DUPLICATE KEY")
      return :upsert_all
    end

    :insert_all
  end
end

.extract_insert_columns(sql) ⇒ Object

— INSERT INTO “users” (“name”,“email”) VALUES … — Extracts the ordered column list. Returns nil if the open paren column list isn’t present (e.g., INSERT … DEFAULT VALUES).



301
302
303
304
305
306
307
308
309
310
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 301

def extract_insert_columns(sql)
  # First parenthesized list after the table name.
  m = sql.match(/\AINSERT INTO\s+#{IDENTIFIER}\s*\(([^)]+)\)/i)
  return nil unless m

  columns_block = m[2]
  columns_block.split(",").map do |col|
    col.strip.gsub(/\A["`]|["`]\z/, "")
  end.reject(&:empty?)
end

.extract_set(sql) ⇒ Object

Returns the SET-clause SQL between “SET” and “WHERE”/end-of-string.



177
178
179
180
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 177

def extract_set(sql)
  match = sql.match(/\sSET\s+(.+?)(?:\s+WHERE\s+|\s+RETURNING\b|\z)/i)
  match && match[1].strip
end

.extract_where(sql) ⇒ Object

Returns the WHERE-clause SQL (without the keyword), or nil if absent. Stops at end-of-string, RETURNING, ORDER BY, LIMIT (AR rarely emits these on bulk DML but cheap insurance).



171
172
173
174
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 171

def extract_where(sql)
  match = sql.match(/\sWHERE\s+(.+?)(?:\s+(?:RETURNING|ORDER BY|LIMIT)\b|\z)/i)
  match && match[1].strip
end

.nearest_column_left_of(where_sql, pos) ⇒ Object

Finds the most recent column identifier to the left of ‘pos` in the WHERE clause. Used to attribute each bind to its column name so we can mask sensitive values by name downstream.



288
289
290
291
292
293
294
295
296
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 288

def nearest_column_left_of(where_sql, pos)
  prefix = where_sql[0...pos]
  # Look for the last identifier before the operator (=, <, >, etc.)
  match = prefix.match(/#{COLUMN}\s*(?:=|<>|!=|<=|>=|<|>|LIKE|IN|IS)\s*\z/i)
  return nil unless match

  # COLUMN has two capture groups: [table, col] or [_, col].
  match[2] || match[1]
end

.parse(sql:, type_casted_binds:) ⇒ Hash

Returns Always returns a Hash. Either the parsed structure (keys above) or ‘{ unparseable: true }`.

Parameters:

  • sql (String)

    Raw SQL string from ‘payload`.

  • type_casted_binds (Array)

    payload` —already-typecast values in placeholder order. May be nil for raw SQL paths.

Returns:

  • (Hash)

    Always returns a Hash. Either the parsed structure (keys above) or ‘{ unparseable: true }`.



63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 63

def parse(sql:, type_casted_binds:)
  return { unparseable: true } if sql.nil? || sql.empty?

  sql_stripped = strip_query_log_tags(sql.strip)

  case sql_stripped
  when /\ADELETE FROM /i
    parse_delete(sql_stripped, type_casted_binds || [])
  when /\AUPDATE /i
    parse_update(sql_stripped, type_casted_binds || [])
  when /\AINSERT INTO /i
    parse_insert(sql_stripped)
  else
    { unparseable: true }
  end
rescue StandardError
  # If the regex engine, binds zipping, or any sub-parse step raises,
  # ship unparseable rather than crash the capture handler. The
  # BulkDatabaseCapturer's own rescue would also catch this, but
  # defending here means the rest of the capturer sees a uniform
  # return shape.
  { unparseable: true }
end

.parse_delete(sql, type_casted_binds) ⇒ Object

— DELETE FROM “orders” WHERE “orders”.“status” = $1 —



128
129
130
131
132
133
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 128

def parse_delete(sql, type_casted_binds)
  where_sql = extract_where(sql)
  template, binds = build_template_and_binds(where_sql, type_casted_binds)

  { operation: :delete_all, where_template: template, where_binds: binds }
end

.parse_insert(sql) ⇒ Object

— INSERT INTO “users” (“name”,“email”) VALUES (…), (…) —



158
159
160
161
162
163
164
165
166
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 158

def parse_insert(sql)
  operation = detect_operation(sql)
  return { unparseable: true } unless operation == :insert_all || operation == :upsert_all

  columns = extract_insert_columns(sql)
  return { unparseable: true } if columns.nil? || columns.empty?

  { operation: operation, columns: columns }
end

.parse_set_assignments(set_sql) ⇒ Object

SET clause is comma-separated ‘“col” = <placeholder|literal>` pairs. AR inlines non-symbol values as literals (no placeholder) — we still surface those in the result so the reader sees what changed. Returns [[col_name, placeholder_or_literal_str], …] + total bind count.



186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 186

def parse_set_assignments(set_sql)
  pairs = []
  bind_count = 0

  split_top_level_commas(set_sql).each do |assignment|
    m = assignment.match(/\A#{COLUMN}\s*=\s*(.+)\z/)
    next unless m

    col = m[2] || m[1]
    rhs = m[3].strip
    pairs << [col, rhs]
    bind_count += 1 if rhs.match?(PLACEHOLDER)
  end

  [pairs, bind_count]
end

.parse_update(sql, type_casted_binds) ⇒ Object

— UPDATE “orders” SET “status” = $1 WHERE “orders”.“status” = $2 — SET binds come first in placeholder order, then WHERE binds.



137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 137

def parse_update(sql, type_casted_binds)
  set_sql = extract_set(sql)
  where_sql = extract_where(sql)
  return { unparseable: true } if set_sql.nil?

  set_pairs, set_bind_count = parse_set_assignments(set_sql)
  set_binds = type_casted_binds.first(set_bind_count)
  where_binds_raw = type_casted_binds.drop(set_bind_count)

  set_hash = zip_set_values(set_pairs, set_binds)
  where_template, where_binds = build_template_and_binds(where_sql, where_binds_raw)

  {
    operation: :update_all,
    set: set_hash,
    where_template: where_template,
    where_binds: where_binds
  }
end

.split_top_level_commas(str) ⇒ Object

Splits on commas that are NOT inside (parens) or quotes. Bulk DML SET / WHERE almost never nests, but defensive against function calls like ‘coalesce(x, 0)`.



222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 222

def split_top_level_commas(str)
  result = []
  depth = 0
  in_quote = false
  quote_char = nil
  buffer = +""

  str.each_char do |ch|
    if in_quote
      buffer << ch
      in_quote = false if ch == quote_char
    elsif ch == "'" || ch == '"' || ch == "`"
      in_quote = true
      quote_char = ch
      buffer << ch
    elsif ch == "("
      depth += 1
      buffer << ch
    elsif ch == ")"
      depth -= 1
      buffer << ch
    elsif ch == "," && depth.zero?
      result << buffer.strip unless buffer.empty?
      buffer = +""
    else
      buffer << ch
    end
  end
  result << buffer.strip unless buffer.empty?
  result
end

.strip_query_log_tags(sql) ⇒ Object

Strip Rails 7+ Query Log Tags (‘/*application=’X’,action=‘Y’*/‘) AND any other trailing SQL comments. They land at the end of every statement when `config.active_record.query_log_tags_enabled = true` and are pure noise on the timeline — they leak the host app’s name and controller into the user-visible filter line. Removing them at parse time means we never ship them on the wire.



93
94
95
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 93

def strip_query_log_tags(sql)
  sql.gsub(%r{/\*.*?\*/}m, "").rstrip
end

.zip_set_values(set_pairs, set_binds) ⇒ Object

Walks set_pairs in order, taking the next bind for each placeholder RHS and pulling the literal text for non-placeholder RHS (e.g., ‘updated_at = ’2026-06-05 12:00:00’‘).



206
207
208
209
210
211
212
213
214
215
216
217
# File 'lib/ez_logs_agent/bulk_sql_parser.rb', line 206

def zip_set_values(set_pairs, set_binds)
  bind_idx = 0
  set_pairs.each_with_object({}) do |(col, rhs), acc|
    if rhs.match?(PLACEHOLDER)
      acc[col] = set_binds[bind_idx]
      bind_idx += 1
    else
      # Strip surrounding quotes to surface the actual value.
      acc[col] = rhs.gsub(/\A['"]|['"]\z/, "")
    end
  end
end