Class: Gitlab::SecretDetection::Core::Scanner

Inherits:
Object
Defined in:
lib/gitlab/secret_detection/core/scanner.rb

Overview

Scanner is responsible for running the Secret Detection scan operation.

Constant Summary

DEFAULT_SCAN_TIMEOUT_SECS =

Default time limit (in seconds) for running the scan operation per invocation

180
DEFAULT_PAYLOAD_TIMEOUT_SECS =

Default time limit (in seconds) for running the scan operation on a single payload

30
DEFAULT_PATTERN_MATCHER_TAGS =

Tags used for creating the default pattern matcher

['gitlab_blocking'].freeze
MAX_PROCS_PER_REQUEST =

Maximum number of child processes to spawn per request. Ref: gitlab.com/gitlab-org/gitlab/-/issues/430160

5
MIN_CHUNK_SIZE_PER_PROC_BYTES =

Minimum cumulative size of the payloads required to spawn and run the scan within a new subprocess.

2_097_152
RUN_IN_SUBPROCESS =

Whether to run the scan in subprocesses. Defaults to false.

ENV.fetch('GITLAB_SD_RUN_IN_SUBPROCESS', false)
DEFAULT_MAX_FINDINGS_LIMIT =

Default limit on the number of findings returned by the scan

999
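
A note on RUN_IN_SUBPROCESS: `ENV.fetch` returns the variable's value as a String whenever the variable is set, and every String is truthy in Ruby, including "false". The sketch below illustrates that behavior and one possible way to parse such a flag into a real boolean. The helper `env_flag`, its `env:` parameter, and the list of accepted values are assumptions made for illustration; they are not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): parse an environment flag into a boolean.
TRUTHY_VALUES = %w[1 true yes on].freeze

def env_flag(name, default: false, env: ENV)
  raw = env.fetch(name, nil)
  return default if raw.nil?

  TRUTHY_VALUES.include?(raw.strip.downcase)
end

# A plain Hash stands in for ENV so the example does not touch the real environment.
sample_env = { 'GITLAB_SD_RUN_IN_SUBPROCESS' => 'false' }

# Fetching the raw value yields the String "false", which Ruby treats as truthy.
raw_value = sample_env.fetch('GITLAB_SD_RUN_IN_SUBPROCESS', false)
puts raw_value ? 'raw value is truthy' : 'raw value is falsy'

# The parsed flag reflects the intended meaning instead.
puts env_flag('GITLAB_SD_RUN_IN_SUBPROCESS', env: sample_env)
```

Callers who want explicit control can also pass `subprocess: true` or `subprocess: false` directly to #secrets_scan instead of relying on the environment.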

Instance Method Summary

Constructor Details

#initialize(rules:, logger: Logger.new($stdout)) ⇒ Scanner

Initializes the instance with a logger and performs the following operations:

  1. Extracts keywords from the parsed ruleset so that payloads can be matched against keywords before any regex operation.

  2. Builds and compiles the rule regex patterns obtained from the ruleset for the DEFAULT_PATTERN_MATCHER_TAGS tags. Raises RulesetCompilationError if the regex pattern compilation fails.
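
The two steps above amount to a keyword pre-filter followed by regex matching. A minimal sketch of that idea follows, using plain Ruby `Regexp` objects. The `Rule` struct, the sample rule, and both helper methods are hypothetical and only illustrate the technique; the gem's own ruleset format and matcher implementation may differ.

```ruby
# Hypothetical rule shape for illustration; the gem's ruleset format may differ.
Rule = Struct.new(:id, :keywords, :pattern, keyword_init: true)

SAMPLE_RULES = [
  Rule.new(id: 'example_token', keywords: ['extok-'], pattern: /extok-[0-9a-f]{8}/)
].freeze

# Step 1: collect every keyword so payloads can be screened with a cheap substring check.
def collect_keywords(rules)
  rules.flat_map(&:keywords).uniq
end

# Step 2: run the more expensive regex patterns only on payloads that contain a keyword.
def scan_with_prefilter(rules, payloads)
  keywords = collect_keywords(rules)
  candidates = payloads.select { |data| keywords.any? { |kw| data.include?(kw) } }
  candidates.flat_map do |data|
    rules.filter_map { |rule| rule.id if rule.pattern.match?(data) }
  end
end

matches = scan_with_prefilter(SAMPLE_RULES, ['no secrets here', 'key = extok-0123abcd'])
puts matches.inspect
```

The benefit of this arrangement is that payloads without any keyword never reach the regex engine at all.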



# File 'lib/gitlab/secret_detection/core/scanner.rb', line 35

def initialize(rules:, logger: Logger.new($stdout))
  @logger = logger
  @rules = rules
  @keywords = create_keywords(rules)
  @default_keyword_matcher = build_keyword_matcher(
    tags: DEFAULT_PATTERN_MATCHER_TAGS,
    include_missing_tags: false
  )
  @default_pattern_matcher, @default_rules = build_pattern_matcher(
    tags: DEFAULT_PATTERN_MATCHER_TAGS,
    include_missing_tags: false
  ) # includes only gitlab_blocking rules
end

Instance Method Details

#secrets_scan(payloads, timeout: DEFAULT_SCAN_TIMEOUT_SECS, payload_timeout: DEFAULT_PAYLOAD_TIMEOUT_SECS, exclusions: {}, tags: DEFAULT_PATTERN_MATCHER_TAGS, subprocess: RUN_IN_SUBPROCESS, max_findings_limit: DEFAULT_MAX_FINDINGS_LIMIT) ⇒ Object

Runs a Secret Detection scan on the given list of payloads. Both the total scan duration and the duration for each payload are time-bound, via timeout and payload_timeout respectively.

payloads

Array of payloads where each payload should have `id` and `data` properties.

timeout

Number of seconds (accepts floating-point values for sub-second limits) to limit the total scan duration

payload_timeout

Number of seconds (accepts floating-point values for sub-second limits) to limit the scan duration on each payload
exclusions

Hash with keys :raw_value and :rule, each mapping to an array of either

GRPC::Exclusion objects (when used as a standalone service)
or Security::ProjectSecurityExclusion objects (when used as a gem).
:raw_value - Exclusions in the :raw_value array are the raw values to ignore.
:rule - Exclusions in the :rule array are the rules to exclude from the ruleset used for the scan.
Each rule is represented by its ID, for example `gitlab_personal_access_token`
for a GitLab Personal Access Token. By default, no rule is excluded from the ruleset.
tags

Array of tag values used to filter the default ruleset when determining the rules used for the scan.

For example, pass `gitlab_blocking` to include only the rules for Push Protection. Defaults to
[`gitlab_blocking`] (+DEFAULT_PATTERN_MATCHER_TAGS+).
max_findings_limit

Integer to limit the number of findings returned by the scan. Defaults to 999 (+DEFAULT_MAX_FINDINGS_LIMIT+).
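
To make the expected shape of `payloads` concrete, here is a small sketch. The `Payload` struct and the `payloads_well_formed?` check are hypothetical: they only illustrate the documented requirement that each payload expose `id` and `data`, and they are not the gem's own input validation.

```ruby
# Hypothetical payload type; any object exposing `id` and `data` fits the description above.
Payload = Struct.new(:id, :data, keyword_init: true)

# Hypothetical check, not the gem's own validation: the input must be an Array
# whose elements all respond to `id` and `data`.
def payloads_well_formed?(payloads)
  payloads.is_a?(Array) &&
    payloads.all? { |payload| payload.respond_to?(:id) && payload.respond_to?(:data) }
end

sample_payloads = [Payload.new(id: 'blob-1', data: 'token = abc')]
puts payloads_well_formed?(sample_payloads)  # well-formed input
puts payloads_well_formed?('not an array')   # malformed input
```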

NOTE: Running the scan in fork mode is primarily intended to reduce the scan's memory consumption by offloading regex operations on large payloads to subprocesses. It does not guarantee lower overall latency; for smaller payloads in particular, the overhead of forking a new process adds to the scan's latency. For more on subprocess-based execution, see gitlab.com/gitlab-org/gitlab/-/issues/430160.
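
The trade-off described in the note is why MIN_CHUNK_SIZE_PER_PROC_BYTES and MAX_PROCS_PER_REQUEST exist: a subprocess is only worthwhile for enough work, and the number of subprocesses is capped. One way such limits could be applied when grouping payloads is sketched below. The grouping strategy and the `group_into_chunks` method are assumptions for illustration only; the gem's actual chunking logic is not shown in this page and may differ.

```ruby
MAX_PROCS = 5               # mirrors MAX_PROCS_PER_REQUEST
MIN_CHUNK_BYTES = 2_097_152 # mirrors MIN_CHUNK_SIZE_PER_PROC_BYTES

# Hypothetical grouping strategy: fill a chunk until it reaches the minimum size,
# then start a new one; once the chunk cap is hit, the last chunk takes the rest.
def group_into_chunks(payload_sizes, min_bytes: MIN_CHUNK_BYTES, max_chunks: MAX_PROCS)
  chunks = [[]]
  payload_sizes.each do |size|
    current_full = chunks.last.sum >= min_bytes
    chunks << [] if current_full && chunks.length < max_chunks
    chunks.last << size
  end
  chunks.reject(&:empty?)
end

# Small payloads stay together in a single chunk, so no extra process would be needed.
puts group_into_chunks([10, 20, 30]).inspect

# Ten payloads of 3 MB each are spread over at most MAX_PROCS chunks.
puts group_into_chunks(Array.new(10, 3_000_000)).map(&:length).inspect
```

Each resulting chunk would correspond to one unit of work for a subprocess. Under this strategy a trailing chunk can still fall below the minimum size; a fuller implementation might merge it into its predecessor.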

Returns an instance of Gitlab::SecretDetection::Core::Response with the following structure:

status: One of the Core::Status values
results: [SecretDetection::Finding]



# File 'lib/gitlab/secret_detection/core/scanner.rb', line 82

def secrets_scan(
  payloads,
  timeout: DEFAULT_SCAN_TIMEOUT_SECS,
  payload_timeout: DEFAULT_PAYLOAD_TIMEOUT_SECS,
  exclusions: {},
  tags: DEFAULT_PATTERN_MATCHER_TAGS,
  subprocess: RUN_IN_SUBPROCESS,
  max_findings_limit: DEFAULT_MAX_FINDINGS_LIMIT
)
  return Core::Response.new(status: Core::Status::INPUT_ERROR) unless validate_scan_input(payloads)

  # assign defaults since grpc passing zero timeout value to `Timeout.timeout(..)` makes it effectively useless.
  timeout = DEFAULT_SCAN_TIMEOUT_SECS unless timeout.positive?
  payload_timeout = DEFAULT_PAYLOAD_TIMEOUT_SECS unless payload_timeout.positive?
  tags = DEFAULT_PATTERN_MATCHER_TAGS if tags.empty?

  Timeout.timeout(timeout) do
    keyword_matcher = build_keyword_matcher(tags:)

    matched_payloads = filter_by_keywords(keyword_matcher, payloads)

    next Core::Response.new(status: Core::Status::NOT_FOUND) if matched_payloads.empty?

    # the pattern matcher will filter rules by tags so we use the filtered rule list
    pattern_matcher, active_rules = build_pattern_matcher(tags:)

    scan_args = {
      payloads: matched_payloads,
      payload_timeout:,
      pattern_matcher:,
      exclusions:,
      rules: active_rules,
      max_findings_limit:
    }.freeze

    logger.info(
      message: "Scan input parameters for running Secret Detection scan",
      timeout:,
      payload_timeout:,
      given_total_payloads: payloads.length,
      scannable_payloads_post_keyword_filter: matched_payloads.length,
      tags:,
      run_in_subprocess: subprocess,
      max_findings_limit:,
      given_exclusions: format_exclusions_hash(exclusions)
    )

    secrets, applied_exclusions = subprocess ? run_scan_within_subprocess(**scan_args) : run_scan(**scan_args)

    scan_status = overall_scan_status(secrets)

    logger.info(
      message: "Secret Detection scan completed with #{secrets.length} secrets detected in the given payloads",
      detected_secrets_metadata: (secrets),
      applied_exclusions: format_exclusions_arr(applied_exclusions)
    )

    Core::Response.new(status: scan_status, results: secrets, applied_exclusions:)
  end
rescue Timeout::Error => e
  logger.error "Secret detection operation timed out: #{e}"

  Core::Response.new(status: Core::Status::SCAN_TIMEOUT)
end
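
The timeout handling in the listing above relies on two behaviors of Ruby's standard `Timeout` module: a limit of 0 (or nil) runs the block with no time limit, and an exceeded limit raises `Timeout::Error`. The sketch below isolates that pattern. The `run_with_limit` method, its `default:` parameter, and the `:scan_timeout` return value are hypothetical stand-ins for the scanner's defaults and its SCAN_TIMEOUT response.

```ruby
require 'timeout'

DEFAULT_LIMIT_SECS = 180 # mirrors DEFAULT_SCAN_TIMEOUT_SECS

# Hypothetical wrapper: fall back to the default for non-positive limits,
# because Timeout.timeout(0) would run the block with no time limit at all.
def run_with_limit(limit_secs, default: DEFAULT_LIMIT_SECS)
  limit_secs = default unless limit_secs.positive?

  Timeout.timeout(limit_secs) do
    yield
  end
rescue Timeout::Error
  :scan_timeout
end

# A zero limit falls back to the default, and the block's value is returned.
fast_result = run_with_limit(0) { :completed }
puts fast_result

# A block that outlives its limit is interrupted and mapped to a timeout status.
slow_result = run_with_limit(0.05) { sleep 1; :completed }
puts slow_result
```

Because `Timeout.timeout` returns the value of its block, the `next Core::Response.new(...)` call in the listing above ends the block early and makes that response the method's return value.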