Module: Parse::PipelineSecurity

Defined in:
lib/parse/pipeline_security.rb

Overview

Canonical security validator for MongoDB aggregation pipelines and filter hashes that the SDK forwards to the driver or to Parse Server.

Previously the codebase had three different validators with three different rule sets:

  • `Parse::Agent::PipelineValidator`: strict allowlist for the Agent (read-only paths only)

  • `Parse::Query#validate_pipeline!`: outer-stage-only denylist

  • `Parse::MongoDB.assert_no_denied_operators!`: recursive denylist of server-side JS operators

`Parse::AtlasSearch.convert_filter_for_mongodb` was a complete passthrough that bypassed all three. A user-supplied filter containing `$where`/`$expr`/`$function`/`$regex` was injected straight into the pipeline `$match` stage, bypassing every existing constraint guard.

This module consolidates the rules. Every entry point that forwards a caller-supplied pipeline or filter to MongoDB now routes through one of the two public methods here:

  • PipelineSecurity.validate_pipeline! (strict mode): allowlist plus size/depth caps. Used by `Parse::Agent`. `Parse::Query#aggregate` does not use this mode; see the caveat for Query#aggregate callers below.

  • PipelineSecurity.validate_filter! (permissive mode): recursive denylist only. Used by `Parse::MongoDB.find/aggregate` and Atlas Search filter passthrough where the pipeline is constructed by SDK code but a user-controlled filter hash is interpolated. Refuses `$where`/`$function`/`$accumulator` and the data-mutating stages at any nesting depth.

Policy: allowlist top-level, denylist recursive

Strict mode enforces ALLOWED_STAGES ONLY at the top-level stage key. Nested sub-pipelines (inside `$lookup.pipeline`, `$unionWith.pipeline`, `$facet.*`, `$graphLookup`) are walked with the operator denylist but NOT with the stage allowlist. This is intentional: Atlas Search and uncommon-but-legitimate read stages like `$densify` and `$fill` must be allowed inside sub-pipelines even when the outer pipeline is strict-validated. The denylist is the security boundary; the allowlist is a shape check.
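The split between the two checks can be illustrated with a small, self-contained sketch. It is not this module's implementation: `SketchPolicy`, its two abbreviated lists, `check!`, `walk!`, and `refused?` are hypothetical names, and the sketch assumes sub-pipelines are reached simply by walking every nested Hash and Array.

```ruby
# Hypothetical, abbreviated stand-in for the policy described above.
module SketchPolicy
  ALLOWED_TOP_LEVEL = %w[$match $lookup $facet $limit].freeze # shape check
  DENIED_ANYWHERE   = %w[$where $function $out $merge].freeze # security boundary

  module_function

  # Strict check: allowlist on the outer stage key, denylist at every depth.
  def check!(pipeline)
    pipeline.each do |stage|
      name = stage.keys.first.to_s
      raise ArgumentError, "stage #{name} not allowed at top level" unless ALLOWED_TOP_LEVEL.include?(name)
      walk!(stage)
    end
    true
  end

  # Recursive denylist walk; nested stage names are NOT checked against the allowlist.
  def walk!(node)
    case node
    when Hash
      node.each do |key, value|
        raise ArgumentError, "operator #{key} denied" if DENIED_ANYWHERE.include?(key.to_s)
        walk!(value)
      end
    when Array
      node.each { |child| walk!(child) }
    end
  end
end

# Convenience wrapper for the demo: true when the sketch refuses the pipeline.
def refused?(pipeline)
  SketchPolicy.check!(pipeline)
  false
rescue ArgumentError
  true
end

nested_densify = [{ "$lookup" => { "from" => "Post", "as" => "posts",
                                   "pipeline" => [{ "$densify" => { "field" => "ts" } }] } }]
nested_out     = [{ "$facet" => { "a" => [{ "$out" => "stolen" }] } }]
top_densify    = [{ "$densify" => { "field" => "ts" } }]

puts refused?(nested_densify) # false: $densify is fine inside a sub-pipeline
puts refused?(nested_out)     # true: $out is denied at any depth
puts refused?(top_densify)    # true: $densify is not an allowed outer stage
```

The third case shows why the allowlist is only a shape check in this design: the same stage is accepted or refused depending on position, while denied operators are refused everywhere.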

Caveat for Query#aggregate callers

`Parse::Query#aggregate` routes through PipelineSecurity.validate_filter!, not PipelineSecurity.validate_pipeline!, so user-supplied pipelines are checked against the denylist only. Permissive mode does NOT block `$lookup`, `$graphLookup`, or `$unionWith` reading from arbitrary collections; these are legitimate read stages but powerful enough to cross Parse ACL/CLP boundaries when the source collection lacks row-level enforcement. **Never pass raw attacker-controlled input into `Parse::Query#aggregate`.** Construct the pipeline in SDK code and interpolate only validated values.

Capability gap: `$expr`

`$expr` itself is not in DENIED_OPERATORS. The recursive walker catches `$function`/`$accumulator` nested inside `$expr`, so the immediate JavaScript-execution risk is closed. A future Atlas operator gated under `$expr` would slip until DENIED_OPERATORS is extended. Defense-in-depth callers concerned about expensive aggregation expressions (`$regexMatch` ReDoS, large `$reduce` loops) should validate user input shape before reaching this module.

Defined Under Namespace

Classes: Error

Constant Summary

DENIED_OPERATORS =

Operators that are ALWAYS refused at any nesting depth. These either execute server-side JavaScript (`$where`, `$function`, `$accumulator`) or mutate the database (`$out`, `$merge`) or the server itself (`$collMod`, `$createIndex`, `$dropIndex`, `$planCacheSetFilter`, `$planCacheClear`). None of them are needed for read queries.

%w[
  $where $function $accumulator
  $out $merge
  $collMod $createIndex $dropIndex
  $planCacheSetFilter $planCacheClear
].freeze
DENIED_FIELD_REFS =

Field-reference paths (string values inside `$expr` whose first byte is `$`) that point at server-internal columns and must never be reachable from a user-influenced pipeline. A boolean expression inside `$expr` over any of these is a 1-bit-per-query side channel that bisects the value of a bcrypt hash, session token, or password-reset token. Names match Parse Server’s internal column layout (cf. MongoStorageAdapter).

%w[
  $_hashed_password $_password_history
  $_session_token $_sessionToken
  $_email_verify_token $_perishable_token
  $_failed_login_count $_account_lockout_expires_at
  $_rperm $_wperm
  $_auth_data
].freeze
DENIED_FIELD_REF_PREFIXES =

String prefix for per-provider auth-data field references inside `$expr`. Parse Server stores per-provider columns as `_auth_data_facebook`, `_auth_data_google`, etc., and none of these should be reachable from a user-influenced pipeline. The prefix `$_auth_data_` covers all of them without requiring an exhaustive list.

%w[$_auth_data_].freeze
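How an exact list and a prefix list might be combined to screen a single field-reference string is sketched below. `denied_field_ref?` and the two `*_SKETCH` lists are hypothetical, the lists are abbreviated, and comparing only the first dotted path segment is an assumption of this sketch rather than documented behavior.

```ruby
# Hypothetical helper; the lists are abbreviated copies of the constants above.
DENIED_REFS_SKETCH     = %w[$_hashed_password $_session_token $_rperm $_auth_data].freeze
DENIED_PREFIXES_SKETCH = %w[$_auth_data_].freeze

# True when value is a field-reference string that names a denied column.
# Assumption: only the first path segment ("$a" in "$a.b.c") is compared.
def denied_field_ref?(value)
  return false unless value.is_a?(String) && value.start_with?("$")
  head = value.split(".").first
  DENIED_REFS_SKETCH.include?(head) ||
    DENIED_PREFIXES_SKETCH.any? { |prefix| head.start_with?(prefix) }
end

puts denied_field_ref?("$_hashed_password")     # true (exact match)
puts denied_field_ref?("$_auth_data_github.id") # true (prefix match)
puts denied_field_ref?("$username")             # false
puts denied_field_ref?("_hashed_password")      # false (no leading $, so not a field reference)
```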
ALLOWED_UNDERSCORE_COLLECTIONS =

MongoDB collection names that an SDK aggregation IS permitted to name in `from:`/`coll:`. Any name starting with `_` outside this set is refused as an internal Parse Server collection. The four entries here are the only `_`-prefixed collections that hold Parse SDK data classes; everything else with a leading `_` is server-managed state (`_SCHEMA` discloses class-level permissions; `_Hooks` discloses Cloud Code webhook URLs + secret keys; `_GraphQLConfig` discloses GraphQL schema state; `_Audit` holds operational telemetry; `_Idempotency`/`_PushStatus`/`_JobStatus`/`_JobSchedule`/`_GlobalConfig`/`_Audience` hold internal Parse Server bookkeeping).

%w[_User _Role _Installation _Session].freeze
INTERNAL_FIELDS_DENYLIST =

Field names that are internal to Parse Server’s storage layout and must never appear in returned documents. Most are stripped by `Parse::MongoDB.convert_document_to_parse`, but a raw-result path (`raw: true`) bypasses that conversion and would otherwise surface the bcrypt hash, session token, or reset token.

`sessionToken` / `session_token` (no leading underscore) are the credential column on `_Session` rows. Unlike the `_User`-side `_session_token`, the Session class declares it as a regular property, so without this entry a master-key agent that has had the class explicitly unhidden would receive raw bearer tokens in every row of a `query_class("_Session")` response. The denylist is the process-level floor (independent of class-visibility state), so even a deliberate `agent_unhidden` on `_Session` (or a compromised superadmin tool) cannot exfiltrate active tokens.

%w[
  _hashed_password _password_history
  _session_token _sessionToken
  sessionToken session_token
  _email_verify_token _perishable_token
  _failed_login_count _account_lockout_expires_at
  _rperm _wperm _tombstone
  _auth_data
].freeze
INTERNAL_FIELDS_PREFIX_DENYLIST =

Prefix covering per-provider auth-data columns (`_auth_data_facebook`, `_auth_data_google`, …). Used by strip_internal_fields and by the walk_for_denied! field-name screen.

%w[_auth_data_].freeze
FORENSIC_OPERATORS =

Forensic string-introspection operators. When any of these appears INSIDE `$expr` with a field-reference input string, the query becomes a per-character oracle even though the operator itself is otherwise legitimate. Refused inside `$expr` regardless of the input: the validator does not try to introspect operand shapes deeply, and these operators have no legitimate use against Parse-Server-managed columns from an SDK aggregation.

%w[
  $regexMatch $regexFind $regexFindAll
  $substr $substrBytes $substrCP
  $indexOfBytes $indexOfCP
  $strLenBytes $strLenCP
  $strcasecmp
].freeze
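A rough sketch of context-sensitive gating follows: the walker carries a flag that flips once it descends through an `$expr` key, and only then treats these operators as refusals. `forensic_operator_in_expr` and `FORENSIC_SKETCH` are hypothetical names, the list is abbreviated, and the real walker may track context differently.

```ruby
# Abbreviated copy of the list above, for illustration only.
FORENSIC_SKETCH = %w[$regexMatch $substrCP $strLenCP $indexOfCP].freeze

# Returns the first forensic operator found beneath an `$expr` key, or nil.
def forensic_operator_in_expr(node, inside_expr: false)
  case node
  when Hash
    node.each do |key, value|
      name = key.to_s
      return name if inside_expr && FORENSIC_SKETCH.include?(name)
      found = forensic_operator_in_expr(value, inside_expr: inside_expr || name == "$expr")
      return found if found
    end
    nil
  when Array
    node.each do |child|
      found = forensic_operator_in_expr(child, inside_expr: inside_expr)
      return found if found
    end
    nil
  end
end

# Compares the first character of a field to a guess: a per-character oracle.
oracle = { "$match" => { "$expr" => { "$eq" => [{ "$substrCP" => ["$email", 0, 1] }, "a"] } } }
# The same operator outside `$expr` is not flagged by this sketch.
benign = { "$project" => { "initial" => { "$substrCP" => ["$name", 0, 1] } } }

p forensic_operator_in_expr(oracle) # "$substrCP"
p forensic_operator_in_expr(benign) # nil
```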
ALLOWED_STAGES =

Top-level pipeline stages permitted by the strict validator. The set covers Parse-Stack’s own aggregation use, plus Atlas Search entry points (`$search`, `$searchMeta`, `$listSearchIndexes`) so that `Parse::AtlasSearch` calls do not break.

%w[
  $match $group $sort $project $limit $skip $unwind $lookup
  $count $addFields $set $unset $bucket $bucketAuto $facet
  $sample $sortByCount $replaceRoot $replaceWith $redact
  $graphLookup $unionWith
  $search $searchMeta $listSearchIndexes
].freeze
MAX_PIPELINE_STAGES =

Cap on number of top-level stages in a strict-validated pipeline.

20
MAX_DEPTH =

Cap on nested object/array depth during recursive walks. Stops a caller from forcing the validator into a near-infinite traversal. Legitimate Parse-generated pipelines with `$facet` containing `$lookup` with `let` and correlated sub-pipelines (`$match.$expr.$and.`) can reach depth 12+ on a normal read, so we keep comfortable headroom above the real ceiling.

20
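A depth-capped walk can be sketched as below. `walk_with_cap`, `nest`, `DepthExceeded`, and `MAX_DEPTH_SKETCH` are hypothetical names, and the exact comparison (whether a scalar leaf counts toward the depth) is an assumption of this sketch.

```ruby
MAX_DEPTH_SKETCH = 20

class DepthExceeded < StandardError; end

# Walks nested Hashes/Arrays and raises once nesting passes the cap.
def walk_with_cap(node, depth = 0)
  raise DepthExceeded, "nesting deeper than #{MAX_DEPTH_SKETCH}" if depth > MAX_DEPTH_SKETCH
  case node
  when Hash  then node.each_value { |v| walk_with_cap(v, depth + 1) }
  when Array then node.each { |v| walk_with_cap(v, depth + 1) }
  end
  true
end

# Builds a Hash nested `levels` deep around the scalar 1.
def nest(levels)
  levels.times.inject(1) { |inner, _| { "a" => inner } }
end

puts walk_with_cap(nest(12)) # true: a depth-12 structure fits under the cap

begin
  walk_with_cap(nest(21))
rescue DepthExceeded => e
  puts e.message # nesting deeper than 20
end
```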

Class Method Details

.assert_collection_allowed!(name) ⇒ Object

Refuses any collection name reserved for Parse Server’s internal state. Accepts the four SDK-data system classes (`_User`, `_Role`, `_Installation`, `_Session`) and any non-`_`-prefixed name. Used by `LookupRewriter` and by the Agent’s pipeline walker to enforce a hard floor independent of any per-Agent `MetadataRegistry.hidden?` policy.

Parameters:

  • name (String, Symbol, nil)

    the collection name from `from:`/`coll:`. `nil` is treated as “no collection named”, and the call returns without raising.

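Raises:

  • (Error)

    if the name starts with `_` and is not in ALLOWED_UNDERSCORE_COLLECTIONS (reason `:denied_internal_collection`).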



# File 'lib/parse/pipeline_security.rb', line 282

def assert_collection_allowed!(name)
  return if name.nil?
  str = name.to_s
  return if str.empty?
  return unless str.start_with?("_")
  return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
  raise Error.new(
    "SECURITY: Collection '#{str}' is reserved for Parse Server's internal " \
    "state and is not reachable from an SDK aggregation pipeline.",
    operator: str,
    reason: :denied_internal_collection,
  )
end
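To try the gate without loading the SDK, the method body above can be copied into a stand-in module. `CollectionGateSketch` is a hypothetical name, and its `Error` class (with `operator` and `reason` readers) is a simplified assumption based only on the keyword arguments visible in the listing.

```ruby
# Stand-in that copies the method body shown above so the example runs on its own.
module CollectionGateSketch
  # Simplified assumption: the real Error class may expose more than this.
  class Error < StandardError
    attr_reader :operator, :reason

    def initialize(message, operator: nil, reason: nil)
      super(message)
      @operator = operator
      @reason = reason
    end
  end

  ALLOWED_UNDERSCORE_COLLECTIONS = %w[_User _Role _Installation _Session].freeze

  def self.assert_collection_allowed!(name)
    return if name.nil?
    str = name.to_s
    return if str.empty?
    return unless str.start_with?("_")
    return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
    raise Error.new("Collection '#{str}' is reserved", operator: str, reason: :denied_internal_collection)
  end
end

CollectionGateSketch.assert_collection_allowed!("Post") # passes: no leading underscore
CollectionGateSketch.assert_collection_allowed!(:_User) # passes: Symbols are stringified first
CollectionGateSketch.assert_collection_allowed!(nil)    # passes: no collection named

begin
  CollectionGateSketch.assert_collection_allowed!("_SCHEMA")
rescue CollectionGateSketch::Error => e
  puts e.reason # denied_internal_collection
end
```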

.refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void

This method returns an undefined value.

Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via `$<field>` on the RHS of a `$project` / `$addFields` / `$set` / `$group` / `$bucket` / `$replaceWith` / `$lookup.let` clause.

The protectedFields enforcement layer (CLPScope.redact_protected_fields!) strips the field by NAME from the result rows. But a pipeline can launder a protected field through a rename:

{ "$addFields" => { "ssn_copy" => "$ssn" } }
{ "$project"   => { "renamed"  => "$ssn", "objectId" => 1 } }
{ "$group"     => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }

The post-fetch strip walks the rows and deletes `ssn` keys, but the value is now stored under `ssn_copy` / `renamed` / `_id`, so the strip walks past it. This scanner runs BEFORE the pipeline reaches Mongo: any `$<field>` string whose unprefixed name is in the class’s protected-fields set raises CLPScope::Denied so the caller knows the pipeline was refused, rather than silently leaking the renamed value.

Variable references (`$$ROOT`, `$$CURRENT`, `$$user_var`) are NOT field references; they’re aggregation variables. The walker checks that the leading `$` is single, not double, before treating the string as a field path.

A nil resolution or a master-mode resolution short-circuits at entry: the walker is a no-op when the caller can read everything anyway.
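The distinction between field paths and aggregation variables can be sketched with a small scanner. `protected_refs` is a hypothetical helper that collects matches instead of raising, `protected_names` stands in for the set the real method derives from the class's CLP, and matching on the first dotted path segment is an assumption.

```ruby
# Hypothetical scanner: collects every `$field` string that names a protected field.
def protected_refs(node, protected_names, found = [])
  case node
  when Hash  then node.each_value { |v| protected_refs(v, protected_names, found) }
  when Array then node.each { |v| protected_refs(v, protected_names, found) }
  when String
    # "$ssn" is a field path; "$$ROOT" is an aggregation variable and is skipped.
    if node.start_with?("$") && !node.start_with?("$$")
      field = node[1..-1].split(".").first
      found << node if protected_names.include?(field)
    end
  end
  found
end

laundering = [
  { "$addFields" => { "ssn_copy" => "$ssn" } },
  { "$group"     => { "_id" => "$ssn", "n" => { "$sum" => 1 } } },
]

p protected_refs(laundering, ["ssn"])                       # ["$ssn", "$ssn"]
p protected_refs([{ "$replaceWith" => "$$ROOT" }], ["ssn"]) # []
```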

Parameters:

  • pipeline (Array<Hash>)

    the caller-supplied pipeline, before SDK-side ACL stages are prepended.

  • collection_name (String)

    the queried collection / class.

  • resolution (Parse::ACLScope::Resolution, nil)

    the resolved scope; nil-or-master short-circuits.

Raises:

  • (Parse::CLPScope::Denied)

    when any nested string in the pipeline names a protected field via `$<name>` syntax.



# File 'lib/parse/pipeline_security.rb', line 347

def refuse_protected_field_references!(pipeline, collection_name, resolution)
  return if resolution.nil? || (resolution.respond_to?(:master?) && resolution.master?)
  return if pipeline.nil? || pipeline.empty?
  perms = resolution.respond_to?(:permission_strings) ? resolution.permission_strings : nil
  return if perms.nil?

  # Lazy-require to avoid forcing CLPScope load order when the
  # caller hasn't otherwise needed it.
  require_relative "clp_scope" unless defined?(Parse::CLPScope)

  protected_set = Parse::CLPScope.protected_fields_for(collection_name, perms)
  return if protected_set.nil? || protected_set.empty?

  pipeline.each_with_index do |stage, idx|
    walk_for_protected_ref!(stage, protected_set, collection_name, "pipeline[#{idx}]")
  end
  nil
end

.strip_internal_fields(doc) ⇒ Object

Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep – raw search documents are flat). Returns a new Hash; the input is not mutated. Non-Hash inputs return unchanged so callers can pipe arbitrary cursor entries through this.



# File 'lib/parse/pipeline_security.rb', line 300

def strip_internal_fields(doc)
  return doc unless doc.is_a?(Hash)
  doc.each_with_object({}) do |(key, value), out|
    k = key.to_s
    next if INTERNAL_FIELDS_DENYLIST.include?(k)
    next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
    out[key] = value
  end
end
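The behavior described above (new Hash returned, input untouched, only top-level keys screened) can be checked with a stand-in that copies the method body. `StripSketch` is a hypothetical module and its two lists are abbreviated copies of the constants.

```ruby
# Stand-in copying the method body above; lists are abbreviated for the example.
module StripSketch
  INTERNAL_FIELDS_DENYLIST        = %w[_hashed_password _session_token sessionToken _rperm _wperm _auth_data].freeze
  INTERNAL_FIELDS_PREFIX_DENYLIST = %w[_auth_data_].freeze

  def self.strip_internal_fields(doc)
    return doc unless doc.is_a?(Hash)
    doc.each_with_object({}) do |(key, value), out|
      k = key.to_s
      next if INTERNAL_FIELDS_DENYLIST.include?(k)
      next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
      out[key] = value
    end
  end
end

raw = {
  "_id"               => "abc123",
  "username"          => "ada",
  "_hashed_password"  => "not-a-real-hash",
  :_auth_data_github  => { "id" => 1 },     # Symbol keys are compared via to_s
  "profile"           => { "_rperm" => [] } # nested keys are NOT screened
}

clean = StripSketch.strip_internal_fields(raw)
p clean.keys                    # ["_id", "username", "profile"]
p raw.key?("_hashed_password")  # true: the input Hash is not mutated
p clean["profile"]              # {"_rperm"=>[]}: stripping is one level deep
```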

.valid_filter?(node) ⇒ Boolean

Returns true if the node passes permissive validation.

Returns:

  • (Boolean)

    true if the node passes permissive validation.



# File 'lib/parse/pipeline_security.rb', line 263

def valid_filter?(node)
  validate_filter!(node)
  true
rescue Error
  false
end

.valid_pipeline?(pipeline) ⇒ Boolean

Returns true if the pipeline passes strict validation.

Returns:

  • (Boolean)

    true if the pipeline passes strict validation.



# File 'lib/parse/pipeline_security.rb', line 255

def valid_pipeline?(pipeline)
  validate_pipeline!(pipeline)
  true
rescue Error
  false
end

.validate_filter!(node, allow_internal_fields: false) ⇒ true

Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS. Does NOT check the top-level stage allowlist or the stage count cap. Used by direct-MongoDB sinks where callers have explicit intent and want flexibility in stage selection, but server-side JS and data-mutating operators must still be refused.

Parameters:

  • node (Hash, Array, Object)

    the structure to walk.

  • allow_internal_fields (Boolean) (defaults to: false)

    when true, skip the INTERNAL_FIELDS_DENYLIST check (e.g. for SDK-generated ACL filters that legitimately reference _rperm/_wperm via Query#readable_by_role and friends). The DENIED_OPERATORS walk and forensic-operator gating still apply. Default false for callers that forward raw, user-influenced pipelines (e.g. Agent MCP tools).

Returns:

  • (true)

Raises:

  • (Error)

    if a denied operator is found at any depth.



# File 'lib/parse/pipeline_security.rb', line 249

def validate_filter!(node, allow_internal_fields: false)
  walk_for_denied!(node, depth: 0, allow_internal_fields: allow_internal_fields)
  true
end

.validate_pipeline!(pipeline) ⇒ true

Strict validation: the pipeline must be a non-empty Array of at most MAX_PIPELINE_STAGES Hashes, each Hash’s top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.

Parameters:

  • pipeline (Array<Hash>)

    the aggregation pipeline.

Returns:

  • (true)

Raises:

  • (Error)

    if validation fails.



# File 'lib/parse/pipeline_security.rb', line 211

def validate_pipeline!(pipeline)
  unless pipeline.is_a?(Array)
    raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
  end
  if pipeline.empty?
    raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline)
  end
  if pipeline.size > MAX_PIPELINE_STAGES
    raise Error.new(
      "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
      reason: :too_many_stages,
    )
  end

  pipeline.each_with_index do |stage, idx|
    validate_stage!(stage, idx)
  end
  true
end
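The three shape checks at the top of this method each attach a distinct `reason:`. A caller that wants to branch on the failure kind might do something like the following sketch. `ShapeSketch`, `check_shape!`, and `outcome` are hypothetical, the `Error#reason` reader is an assumption inferred from the `reason:` keyword in the listing, and per-stage validation (`validate_stage!`) is deliberately left out.

```ruby
# Stand-in mirroring only the shape checks shown above.
module ShapeSketch
  MAX_PIPELINE_STAGES = 20

  # Simplified assumption about the Error class.
  class Error < StandardError
    attr_reader :reason

    def initialize(message, reason: nil)
      super(message)
      @reason = reason
    end
  end

  def self.check_shape!(pipeline)
    raise Error.new("Pipeline must be an Array", reason: :invalid_type) unless pipeline.is_a?(Array)
    raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline) if pipeline.empty?
    if pipeline.size > MAX_PIPELINE_STAGES
      raise Error.new("Pipeline has too many stages", reason: :too_many_stages)
    end
    true
  end

  # Returns :ok, or the reason symbol attached to the raised Error.
  def self.outcome(pipeline)
    check_shape!(pipeline)
    :ok
  rescue Error => e
    e.reason
  end
end

p ShapeSketch.outcome({ "$match" => {} })       # :invalid_type
p ShapeSketch.outcome([])                       # :empty_pipeline
p ShapeSketch.outcome([{ "$limit" => 1 }] * 21) # :too_many_stages
p ShapeSketch.outcome([{ "$match" => {} }])     # :ok
```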