Class: RubyWorker::Parser

Inherits:
Object
Defined in:
lib/ruby_worker/parser.rb

Overview

Parser walks each requested Ruby file via the whitequark/parser gem and produces the ParsedFile messages the Go-side structural detectors consume.

Mirrors `workers/ts/src/parser.ts` and `workers/php/src/Parser.php` structurally: same six collectors, same hashing scheme, same block-extraction shape. The cross-language symmetry is deliberate; it is how the canonical-hash detector clusters Ruby / TS / Go / PHP functions that share a structural shape.

## Hashing scheme

Two hashes per function:

* `hash` (language-specific): captures Ruby-flavored AST
  node types including operator method names on `:send`
  nodes. Two Ruby methods with the same hash share AST
  shape modulo identifier names and literal values.
* `canonical_hash` (cross-language): same scheme using
  the universal token vocabulary defined in
  `core/pkg/structural/lang/canonical.go`.

Both hashes are SHA-1 digests truncated to 16 hex characters (`HASH_HEX_LEN`). Trivial bodies (≤2 nodes) short-circuit to the empty string `""`.
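As a rough illustration of the language-specific hash, the sketch below walks a toy AST, keeps only node types (plus operator names on `:send` nodes), and hashes the resulting token stream. It is not the worker's actual implementation: the nested-array AST format, the `OPERATORS` list, `TRIVIAL_NODE_LIMIT`, and the helper names `shape_tokens` and `structural_hash` are assumptions made for this example. Only SHA-1, the 16-character truncation, and the ≤2-node short-circuit come from the description above.

```ruby
require "digest"

HASH_HEX_LEN = 16
TRIVIAL_NODE_LIMIT = 2 # hypothetical name for the "≤2 nodes" cutoff

# Hypothetical operator list; the real worker may recognize a different set.
OPERATORS = [:+, :-, :*, :/, :%, :==, :<, :>, :<=, :>=].freeze

# A toy node is [type, *children]. Children that are arrays are nested
# nodes; anything else (identifier names, literal values) is ignored.
def shape_tokens(node, out = [])
  type, *children = node
  token = type.to_s
  # On :send nodes the children are [receiver, method_name, *args];
  # operator method names are part of the shape, other names are not.
  if type == :send && OPERATORS.include?(children[1])
    token += ":#{children[1]}"
  end
  out << token
  children.each { |child| shape_tokens(child, out) if child.is_a?(Array) }
  out
end

# One token is emitted per node, so the token count equals the node count.
def structural_hash(node)
  tokens = shape_tokens(node)
  return "" if tokens.length <= TRIVIAL_NODE_LIMIT

  Digest::SHA1.hexdigest(tokens.join(" "))[0, HASH_HEX_LEN]
end
```

Under these assumptions, `def add(x, y) = x + y` and `def sum(l, r) = l + r` produce the same hash because identifier names never enter the token stream, while swapping `+` for `-` changes it.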

## Concerns

Per-file concerns are categorized into ConcernEvidenceRef entries tagged with one of the eight canonical categories. The classifier looks at:

* `:send` nodes whose method is a known state / network
  / io / config / dataaccess identifier
* `[]` accesses on session / cookies / ENV
* Rails.cache, Rails.application.config
* High-complexity methods → business

The taxonomy aligns with the Rails framework profile in `core/pkg/structural/framework/rails.go`.
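The classification rules above can be pictured as a pair of lookup tables plus a complexity threshold. The following is a minimal sketch under stated assumptions, not the worker's classifier: the table contents, the category assigned to each identifier, the helper names `classify_send` and `classify_method`, and the use of `>=` for the threshold are all hypothetical. Only `BUSINESS_COMPLEXITY = 8` and the category names themselves come from this page.

```ruby
BUSINESS_COMPLEXITY = 8

# Hypothetical method-name table; the real identifier lists are not shown here.
SEND_CONCERNS = {
  get: "network", post: "network",
  read: "io", write: "io",
  find: "dataaccess", where: "dataaccess",
}.freeze

# Hypothetical mapping for `[]` accesses on well-known receivers.
INDEX_CONCERNS = {
  "session" => "state",
  "cookies" => "state",
  "ENV" => "config",
}.freeze

# Classify a single :send node given its receiver name and method name.
# Returns a category string, or nil when the call carries no known concern.
def classify_send(receiver, method_name)
  return INDEX_CONCERNS[receiver] if method_name == :[]

  SEND_CONCERNS[method_name]
end

# Classify a whole method by its complexity score (assumed inclusive threshold).
def classify_method(complexity)
  complexity >= BUSINESS_COMPLEXITY ? "business" : nil
end
```

For instance, `classify_send("ENV", :[])` returns `"config"` under these assumed tables, while an ordinary call such as `classify_send("user", :name)` returns `nil`.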

## Error tolerance

Syntactically broken Ruby still yields a partial ParsedFile with whatever the parser salvaged. The `parse_error` field carries the SyntaxError's message verbatim. The parser gem's error recovery is strong enough that RuboCop relies on it for partial parses.

Constant Summary

BUSINESS_COMPLEXITY = 8
MIN_BLOCK_STMTS = 3
HASH_HEX_LEN = 16

Instance Method Summary

Constructor Details

#initialize ⇒ Parser

Returns a new instance of Parser.



# File 'lib/ruby_worker/parser.rb', line 65

def initialize
  @ruby_parser = ::Parser::CurrentRuby.new
  # Do not raise on recoverable syntax errors, so a broken file
  # can still yield a partial AST.
  @ruby_parser.diagnostics.all_errors_are_fatal = false
  # Silence parser-gem warnings on stderr; they pollute
  # the gRPC server logs.
  @ruby_parser.diagnostics.ignore_warnings = true
end

Instance Method Details

#parse_files(repo_path, rel_paths) ⇒ Array<Object>

Returns ParsedFile proto messages.

Parameters:

  • repo_path (String)

    absolute repo root path

  • rel_paths (Array<String>)

    repo-relative file paths

Returns:

  • (Array<Object>)

    ParsedFile proto messages



# File 'lib/ruby_worker/parser.rb', line 76

def parse_files(repo_path, rel_paths)
  rel_paths.map { |rel| parse_one(repo_path, rel) }
end