Module: ElasticGraph

Defined in:
lib/elastic_graph/error.rb,
lib/elastic_graph/version.rb,
lib/elastic_graph/constants.rb,
lib/elastic_graph/support/logger.rb,
lib/elastic_graph/support/time_set.rb,
lib/elastic_graph/support/hash_util.rb,
lib/elastic_graph/support/threading.rb,
lib/elastic_graph/support/time_util.rb,
lib/elastic_graph/support/from_yaml_file.rb,
lib/elastic_graph/support/memoizable_data.rb,
lib/elastic_graph/support/monotonic_clock.rb,
lib/elastic_graph/support/untyped_encoder.rb,
lib/elastic_graph/support/graphql_formatter.rb,
lib/elastic_graph/support/faraday_middleware/support_timeouts.rb,
lib/elastic_graph/support/faraday_middleware/msearch_using_get_instead_of_post.rb

Overview

Copyright 2024 Block, Inc.

Use of this source code is governed by an MIT-style license that can be found in the LICENSE file or at opensource.org/licenses/MIT.


Defined Under Namespace

Modules: Support

Classes: BadDatastoreRequest, ClusterOperationError, ConfigCannotBeMutatedError, ConfigError, ConfigSettingNotSetError, CountUnavailableError, CursorEncoderError, CursorEncodingError, Error, IdentifyDocumentVersionsFailedError, IndexOperationError, InvalidAggregationKeyError, InvalidArgumentValueError, InvalidCursorError, InvalidEventIDError, InvalidExtensionError, InvalidGraphQLNameError, InvalidMergeError, InvalidScriptDirectoryError, InvalidSortFieldsError, MessageIdsMissingError, MissingSchemaArtifactError, NotFoundError, QueryMergeError, RequestExceededDeadlineError, S3OperationFailedError, SchemaError, SearchFailedError, UnknownYAMLSettingError, UnsupportedOperationError

Constant Summary

VERSION =

The version of all ElasticGraph gems.

"0.17.1.4"
DATASTORE_DATE_FORMAT =

The datastore date format used by ElasticGraph. Matches ISO-8601/RFC-3339. See www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#built-in-date-formats

"strict_date"
DATASTORE_DATE_TIME_FORMAT =

The datastore date time format used by ElasticGraph. Matches ISO-8601/RFC-3339. See www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html#built-in-date-formats

"strict_date_time"
TIMEOUT_MS_HEADER =

HTTP header that ElasticGraph HTTP implementations (e.g. elasticgraph-rack, elasticgraph-lambda) look at to determine a client-specified request timeout.

"ElasticGraph-Request-Timeout-Ms"
INT_MIN =

Min/max values for the `Int` type. Based on the GraphQL spec:

> If the integer internal value represents a value less than -2^31 or greater than or equal to 2^31, a field error should be raised.

(from spec.graphql.org/June2018/#sec-Int)

-(2**31).to_int
INT_MAX =
-INT_MIN - 1
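The bounds quoted from the GraphQL spec can be sketched as a simple range check. The constants are redefined locally from the spec's wording (values below -2^31 or at or above 2^31 are errors), and `graphql_int?` is a hypothetical helper, not part of ElasticGraph.

```ruby
# Redefined locally from the spec's stated bounds so the sketch is self-contained.
INT_MIN = -(2**31)
INT_MAX = (2**31) - 1

# Hypothetical helper: true when `value` fits in the GraphQL `Int` range.
def graphql_int?(value)
  value.is_a?(Integer) && value.between?(INT_MIN, INT_MAX)
end

puts graphql_int?(2_147_483_647) # true
puts graphql_int?(2_147_483_648) # false
```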
JSON_SAFE_LONG_MIN =
-((2**53) - 1).to_int
JSON_SAFE_LONG_MAX =
-JSON_SAFE_LONG_MIN
LONG_STRING_MIN =

Min/max values for our `LongString` type. This range is derived from the Elasticsearch docs on its longs:

> A signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63 - 1.

(from www.elastic.co/guide/en/elasticsearch/reference/current/number.html)

-(2**63).to_int
LONG_STRING_MAX =
-LONG_STRING_MIN - 1
DEFAULT_MAX_KEYWORD_LENGTH =

When indexing large string values into the datastore, we’ve observed errors like:

> bytes can be at most 32766 in length

This is also documented on the Elasticsearch docs site, under “Choosing a keyword family field type”: www.elastic.co/guide/en/elasticsearch/reference/8.2/keyword.html#wildcard-field-type

Note that it’s a byte limit, but JSON schema’s maxLength is a limit on the number of characters. UTF-8 uses up to 4 bytes per character, so to guard against a maliciously crafted payload we limit the length to a quarter of 32766.

32766 / 4
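The arithmetic behind this limit can be checked directly. The sketch below builds a worst-case string made entirely of 4-byte UTF-8 characters at the maximum allowed length and confirms it stays under the byte limit. `MAX_KEYWORD_BYTES` and `worst_case` are names introduced for illustration.

```ruby
# Byte limit reported by the datastore (from the error message quoted above).
MAX_KEYWORD_BYTES = 32_766

# Integer division: a quarter of the byte limit, rounded down.
DEFAULT_MAX_KEYWORD_LENGTH = MAX_KEYWORD_BYTES / 4 # => 8191

# An emoji outside the Basic Multilingual Plane takes 4 bytes in UTF-8.
worst_case = "\u{1F600}" * DEFAULT_MAX_KEYWORD_LENGTH

puts worst_case.length   # 8191 characters
puts worst_case.bytesize # 32764 bytes, within the 32766 byte limit
```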
DEFAULT_MAX_TEXT_LENGTH =

Strings indexed as `text` can be much larger than `keyword` fields. In fact, there’s no limitation on the `text` length, except for the overall size of the HTTP request body when we attempt to index a `text` field. By default it’s limited to 100MB via the `http.max_content_length` setting:

www.elastic.co/guide/en/elasticsearch/reference/8.11/modules-network.html#http-settings

Note: there’s no guarantee that `text` values shorter than this will succeed when indexing them: it depends on how many other fields and documents are included in the indexing payload, since the limit is on the overall payload size, and not on the size of one field. Given that, there’s not really a discrete value we can use for the max length that guarantees successful indexing. But we know that values larger than this will fail, so this is the limit we use.

100 * (2**20).to_int
EVENT_ENVELOPE_JSON_SCHEMA_NAME =

The name of the JSON schema definition for the ElasticGraph event envelope.

"ElasticGraphEventEnvelope"
SINGLETON_CURSOR =

For some queries, we wind up needing a pagination cursor for a collection that will only ever contain a single value (and has no “key” to speak of to encode into a cursor). In those contexts, we’ll use this as the cursor value. Ideally, we want this to be a value that could never be produced by our normal cursor encoding logic. This cursor is encoded from data that includes a UUID, which we can trust is unique.

"eyJ1dWlkIjoiZGNhMDJkMjAtYmFlZS00ZWU5LWEwMjctZmVlY2UwYTZkZTNhIn0="
GRAPHQL_SCHEMA_FILE =

Schema artifact file names.

"schema.graphql"
JSON_SCHEMAS_FILE =
"json_schemas.yaml"
DATASTORE_CONFIG_FILE =
"datastore_config.yaml"
RUNTIME_METADATA_FILE =
"runtime_metadata.yaml"
JSON_SCHEMAS_BY_VERSION_DIRECTORY =

Name for directory that contains versioned json_schemas files.

"json_schemas_by_version"
JSON_SCHEMA_VERSION_KEY =

Name for field in json schemas files that represents schema “version”.

"json_schema_version"
ROLLOVER_INDEX_INFIX_MARKER =

String that goes in the middle of a rollover index name, used to mark it as a rollover index (and split on to parse a rollover index name).

"_rollover__"
DERIVED_INDEX_FAILURE_MESSAGE_PREAMBLE =
"Derived index update failed due to bad input data"
INDEX_DATA_UPDATE_SCRIPT_ID =

The current id of our static `index_data` update script. Verified by a test so you can count on it being accurate. We expose this as a constant so that we can detect this specific script in environments where we can’t count on `elasticgraph-schema_definition` (where the script is defined) being available, since that gem is usually only used in development.

Note: this constant is automatically kept up-to-date by our `schema_artifacts:dump` rake task.

"update_index_data_d577eb4b07ee3c53b59f2f6d6c7b2413"
OLD_INDEX_DATA_UPDATE_SCRIPT_ID =

The id of the old version of the update data script before ElasticGraph v0.9. For now, we are maintaining backwards compatibility with how it recorded event versions, and we have test coverage for that which relies upon this id.

TODO: Drop this when we no longer need to maintain backwards-compatibility.

"update_index_data_9b97090d5c97c4adc82dc7f4c2b89bc5"
UPDATE_WAS_NOOP_MESSAGE_PREAMBLE =

When an update script has a no-op result we often want to communicate more information about why it was a no-op back to ElasticGraph from the script. The only way to do that is to throw an exception with an error message, but, as far as I can tell, Painless doesn’t let you define custom exception classes. To allow elasticgraph-indexer to detect that the script “failed” due to a no-op (rather than a true failure) we include this common preamble in the exception message thrown from our update scripts for the no-op case.

"ElasticGraph update was a no-op: "
SELF_RELATIONSHIP_NAME =

The name used to refer to a document’s own/primary source event (that is, the event that has a `type` matching the document’s type). The name here was chosen to avoid naming collisions with relationships defined via the `relates_to_one`/`relates_to_many` APIs. The GraphQL spec reserves the double-underscore prefix on field names, which means that users cannot define a relationship named `__self` via the `relates_to_one`/`relates_to_many` APIs.

"__self"
VALID_LOCAL_TIME_REGEX =

This regex aligns with the datastore format of HH:mm:ss || HH:mm:ss.S || HH:mm:ss.SS || HH:mm:ss.SSS. See rubular.com/r/NHjBWrpZvzOTJO for examples.

/\A(([0-1][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?\z/
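A few sample inputs show which strings the regex accepts. The sample values are chosen here for illustration.

```ruby
VALID_LOCAL_TIME_REGEX = /\A(([0-1][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?\z/

valid = ["00:00:00", "23:59:59.999", "12:34:56.7"]
invalid = [
  "24:00:00",      # hour out of range
  "12:34",         # seconds are required
  "12:34:56.1234", # at most 3 fractional digits
  "1:02:03"        # hour must be 2 digits
]

(valid + invalid).each do |sample|
  puts "#{sample.inspect}: #{VALID_LOCAL_TIME_REGEX.match?(sample)}"
end
```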
VALID_LOCAL_TIME_JSON_SCHEMA_PATTERN =

`VALID_LOCAL_TIME_REGEX`, expressed as a JSON schema pattern. JSON schema supports a subset of Ruby Regexp features and is expressed as a String object. Here we convert the Ruby Regexp start-and-end-of-string anchors (`\A` and `\z`) to the JSON schema ones (`^` and `$`).

For more info, see:

json-schema.org/understanding-json-schema/reference/regular_expressions.html
www.rexegg.com/regex-anchors.html

VALID_LOCAL_TIME_REGEX.source.sub(/\A\\A/, "^").sub(/\\z\z/, "$")
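Evaluating the expression shows the resulting pattern string. Only the leading `\A` and trailing `\z` are swapped; the body of the pattern is untouched.

```ruby
VALID_LOCAL_TIME_REGEX = /\A(([0-1][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?\z/

# `Regexp#source` returns the pattern text. The first `sub` replaces a literal
# `\A` at the very start; the second replaces a literal `\z` at the very end.
pattern = VALID_LOCAL_TIME_REGEX.source.sub(/\A\\A/, "^").sub(/\\z\z/, "$")

puts pattern
# ^(([0-1][0-9])|(2[0-3])):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?$
```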
LIST_COUNTS_FIELD =

Special hidden field defined in an index where we store the count of elements in each list field. We index the list counts so that we can offer a `count` filter operator on list fields, allowing clients to query on the count of list elements.

The field name has a leading `__` because the GraphQL spec reserves that prefix for its own use, and we can therefore assume that no GraphQL fields have this name.

"__counts"
LIST_COUNTS_FIELD_PATH_KEY_SEPARATOR =

Character used to separate parts of a field path for the keys in the special `__counts` field which contains the counts of the various list fields. We were going to use a dot (as you’d expect) but ran into errors like this from the datastore:

> can’t merge a non object mapping [seasons.players.__counts.seasons] with an object mapping

When we have a list of `object`, and then a list field on that object type, we want to store the count of both the parent list and the child list, but if we use dots then the datastore treats it like a nested JSON object, and the JSON entry at the parent path can’t both be an integer (for the parent list count) and an object containing counts of its child lists.

By using `|` instead of `.`, we avoid this problem.

"|"
DATASTORE_PROPERTYLESS_OBJECT_TYPES =

The set of datastore field types which have no `properties` in the mapping, but which can be represented as a JSON object at indexing time.

I built this list by auditing the full list of index field mapping types: www.elastic.co/guide/en/elasticsearch/reference/8.9/mapping-types.html

[
  "aggregate_metric_double", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/aggregate-metric-double.html
  "completion", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/search-suggesters.html#completion-suggester
  "flattened", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/flattened.html
  "geo_point", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/geo-point.html
  "geo_shape", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/geo-shape.html
  "histogram", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/histogram.html
  "join", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/parent-join.html
  "percolator", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/percolator.html
  "point", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/point.html
  "range", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/range.html
  "rank_features", # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/rank-features.html
  "shape" # https://www.elastic.co/guide/en/elasticsearch/reference/8.9/shape.html
].to_set
GRAPHQL_NAME_WITHIN_LARGER_STRING_PATTERN =

This pattern matches the spec for a valid GraphQL name: spec.graphql.org/June2018/#sec-Names

…however, it allows additional non-valid characters before and after it.

/[_A-Za-z][_0-9A-Za-z]*/
GRAPHQL_NAME_PATTERN =

This pattern exactly matches a valid GraphQL name, with no extra characters allowed before or after.

/\A#{GRAPHQL_NAME_WITHIN_LARGER_STRING_PATTERN}\z/
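The difference between the two patterns shows up when a name is embedded in other text. The sample strings below are invented for illustration.

```ruby
GRAPHQL_NAME_WITHIN_LARGER_STRING_PATTERN = /[_A-Za-z][_0-9A-Za-z]*/
GRAPHQL_NAME_PATTERN = /\A#{GRAPHQL_NAME_WITHIN_LARGER_STRING_PATTERN}\z/

# The unanchored pattern finds a name inside a larger string...
extracted = "123 widget_count!"[GRAPHQL_NAME_WITHIN_LARGER_STRING_PATTERN]
puts extracted # widget_count

# ...while the anchored pattern requires the whole string to be a valid name.
puts GRAPHQL_NAME_PATTERN.match?("widget_count") # true
puts GRAPHQL_NAME_PATTERN.match?("widget-count") # false
puts GRAPHQL_NAME_PATTERN.match?("123abc")       # false
```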
GRAPHQL_NAME_VALIDITY_DESCRIPTION =

Description in English of the requirements for GraphQL names. (Used in multiple error messages).

"Names are limited to ASCII alphanumeric characters (plus underscore), and cannot start with a number."
STOCK_GRAPHQL_SCALARS =

The standard set of scalars that are defined by the GraphQL spec: spec.graphql.org/October2021/#sec-Scalars

%w[Boolean Float ID Int String].to_set.freeze
JSON_META_SCHEMA =

The current variant of JSON schema that we use.

"http://json-schema.org/draft-07/schema#"
DATASTORE_BULK_FILTER_PATH =

Filter the bulk response payload with a comma separated list using dot notation. www.elastic.co/guide/en/elasticsearch/reference/7.10/common-options.html#common-options-response-filtering

Note: anytime you change this constant, be sure to check all the comments in the unit specs that mention this constant. When stubbing a datastore client test double, it doesn’t respect this filtering obviously, so it’s up to us to accurately mimic the filtering in our stubbed responses.

[
  # The key under `items` names the type of operation (e.g. `index` or `update`) and
  # we use a `*` for it since we always use that key, regardless of which operation it is.
  "items.*.status", "items.*.result", "items.*.error"
].join(",")
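Since a stubbed datastore client will not apply this filtering, a test can approximate it by hand. The sketch below is one rough way to do that; `mimic_bulk_filtering`, `KEPT_KEYS`, and the sample response are hypothetical, and the real datastore filtering may differ in edge cases.

```ruby
DATASTORE_BULK_FILTER_PATH = [
  "items.*.status", "items.*.result", "items.*.error"
].join(",")

# The last segment of each path is the per-operation key that survives filtering.
KEPT_KEYS = DATASTORE_BULK_FILTER_PATH.split(",").map { |path| path.split(".").last }

# Hypothetical test helper: keeps only the filtered keys under each operation
# (e.g. `index` or `update`), approximating what the datastore would return.
def mimic_bulk_filtering(response)
  items = response.fetch("items").map do |item|
    item.transform_values { |details| details.slice(*KEPT_KEYS) }
  end
  {"items" => items}
end

# Invented unfiltered response, as a test double might produce it.
unfiltered = {
  "took" => 3,
  "items" => [{"index" => {"_id" => "1", "status" => 201, "result" => "created"}}]
}

puts DATASTORE_BULK_FILTER_PATH
puts mimic_bulk_filtering(unfiltered).inspect
```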
GRAPHQL_LAMBDA_AWS_ARN_HEADER =

HTTP header set by `elasticgraph-graphql_lambda` to indicate the AWS ARN of the caller.

"X-AWS-LAMBDA-CALLER-ARN"