Class: ElasticGraph::GraphQL::DatastoreQuery

Inherits:

Object

Object
ElasticGraph::GraphQL::DatastoreQuery

show all

Defined in:: lib/elastic_graph/graphql/datastore_query.rb,
lib/elastic_graph/graphql/datastore_query/paginator.rb,
lib/elastic_graph/graphql/datastore_query/routing_picker.rb,
lib/elastic_graph/graphql/datastore_query/document_paginator.rb,
lib/elastic_graph/graphql/datastore_query/index_expression_builder.rb

Overview

An immutable class that represents a datastore query. Since this represents a datastore query, and not a GraphQL query, all the data in it is modeled in datastore terms, not GraphQL terms. For example, any field names in a ‘Query` should be references to index fields, not GraphQL fields.

Filters are modeled as a ‘Set` of filtering hashes. While we usually expect only a single `filter` hash, modeling it as a set makes it easy for us to support merging queries. The datastore knows how to apply multiple `must` clauses that apply to the same field, giving us the exact semantics we want in such a situation with minimal effort.

Defined Under Namespace

Classes: Builder, IndexExpression, Paginator

Class Method Summary collapse

.perform(queries) ⇒ Object

Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results.

Instance Method Summary collapse

#all_filters ⇒ Object
#cluster_name ⇒ Object

Returns the name of the datastore cluster as a String where this query should be setn.
#document_paginator ⇒ Object
#effective_size ⇒ Object
#empty? ⇒ Boolean

Indicates if the query does not need any results from the datastore.
#excluding_indices? ⇒ Boolean
#hash ⇒ Object

‘DatastoreQuery` objects are used as keys in a hash.
#inspect ⇒ Object
#merge_with(individual_docs_needed: false, total_document_count_needed: false, client_filters: [], internal_filters: [], sort: [], requested_fields: [], request_all_fields: false, requested_highlights: [], request_all_highlights: false, document_pagination: {}, size_multiplier: 1, monotonic_clock_deadline: nil, aggregations: {}) ⇒ Object

Merges in the provided attribute overrides, honoring the intended semantics and invariants of ‘DatastoreQuery`.
#route_with_field_paths ⇒ Object

Returns a list of unique field paths that should be used for shard routing during searches.
#search_index_expression ⇒ Object

Returns an index_definition expression string to use for searches.
#shard_routing_values ⇒ Object

The shard routing values used for this search.
#to_datastore_msearch_header ⇒ Object
#to_datastore_msearch_header_and_body ⇒ Object

Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc.

Class Method Details

.perform(queries) ⇒ `Object`

Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results. The caller is responsible for returning a hash of responses by query from its block.

Note that some of the passed queries may not be yielded to the caller; when we can tell that a query does not have to be sent to the datastore we avoid yielding it from here. Therefore, the caller should not assume that all queries passed to this method will be yielded back.

The return value is a hash of ‘DatastoreResponse::SearchResponse` objects by query.

Note: this method uses ‘send` to work around ruby visibility rules. We do not want `#decoded_cursor_factory` to be public, as we only need it here, but we cannot access it from a class method without using `send`.

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 57

def self.perform(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  responses_by_query = Aggregation::QueryOptimizer.optimize_queries(present_queries) do |optimized_queries|
    header_body_tuples_by_query = optimized_queries.each_with_object({}) do |query, hash|
      hash[query] = query.to_datastore_msearch_header_and_body
    end

    yield(header_body_tuples_by_query)
  end

  empty_responses = empty_queries.each_with_object({}) do |query, hash|
    hash[query] = DatastoreResponse::SearchResponse::RAW_EMPTY
  end

  empty_responses.merge(responses_by_query).each_with_object({}) do |(query, response), hash|
    hash[query] = DatastoreResponse::SearchResponse.build(response, decoded_cursor_factory: query.send(:decoded_cursor_factory))
  end.tap do |responses_hash|
    # Callers expect this `perform` method to provide an invariant: the returned hash MUST contain one entry
    # for each of the `queries` passed in the args. In practice, violating this invariant primarily causes a
    # problem when the caller uses the `GraphQL::Dataloader` (which happens for every GraphQL request in production...).
    # However, our tests do not always run queries end-to-end, so this is an added check we want to do, so that
    # anytime our logic here fails to include a query in the response in any test, we'll be notified of the
    # problem.
    expected_queries = queries.to_set
    actual_queries = responses_hash.keys.to_set

    if expected_queries != actual_queries
      missing_queries = expected_queries - actual_queries
      extra_queries = actual_queries - expected_queries

      raise Errors::SearchFailedError, "The `responses_hash` does not have the expected set of queries as keys. " \
        "This can cause problems for the `GraphQL::Dataloader` and suggests a bug in the logic that should be fixed.\n\n" \
        "Missing queries (#{missing_queries.size}):\n#{missing_queries.map(&:inspect).join("\n")}.\n\n" \
        "Extra queries (#{extra_queries.size}): #{extra_queries.map(&:inspect).join("\n")}"
    end
  end
end

Instance Method Details

#all_filters ⇒ `Object`



256
257
258

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 256

def all_filters
  client_filters + internal_filters
end

#cluster_name ⇒ `Object`

Returns the name of the datastore cluster as a String where this query should be setn. Unless exactly 1 cluster name is found, this method raises a Errors::ConfigError.

Raises:

(Errors::ConfigError)

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 160

def cluster_name
  cluster_name = search_index_definitions.map(&:cluster_to_query).uniq
  return cluster_name.first if cluster_name.size == 1
  raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
    "for query targeting indices: #{search_index_definitions}"
end

#document_paginator ⇒ `Object`

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 231

def document_paginator
  @document_paginator ||= DocumentPaginator.new(
    sort_clauses: sort_with_tiebreaker,
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    decoded_cursor_factory: decoded_cursor_factory,
    schema_element_names: schema_element_names,
    size_multiplier: size_multiplier,
    max_effective_size: search_index_definitions.map { |i| i.max_result_window }.min,
    paginator: Paginator.new(
      default_page_size: default_page_size,
      max_page_size: max_page_size,
      first: document_pagination[:first],
      after: document_pagination[:after],
      last: document_pagination[:last],
      before: document_pagination[:before],
      schema_element_names: schema_element_names
    )
  )
end

#effective_size ⇒ `Object`



252
253
254

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 252

def effective_size
  document_paginator.effective_size
end

#empty? ⇒ `Boolean`

Indicates if the query does not need any results from the datastore. As an optimization, we can reply with a default “empty” response for an empty query.

Returns:

(Boolean)

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 203

def empty?
  # If we are searching no indices or routing to an empty set of shards, there is no need to query the datastore at all.
  # This only happens when our filter processing has deduced that the query will match no results.
  return true if search_index_expression.empty? || shard_routing_values&.empty?

  datastore_body = to_datastore_body
  datastore_body.fetch(:size) == 0 && !datastore_body.fetch(:track_total_hits) && aggregations_datastore_body.empty?
end

#excluding_indices? ⇒ `Boolean`

Returns:

(Boolean)



154
155
156

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 154

def excluding_indices?
  search_index_expression.split(",").any? { |expr| expr.start_with?("-") }
end

#hash ⇒ `Object`

‘DatastoreQuery` objects are used as keys in a hash. Computing `#hash` can be expensive (given how many fields an `DatastoreQuery` has) and it’s safe to cache since ‘DatastoreQuery` instances are immutable, so we memoize it here. We’ve observed this making a very noticeable difference in our test suite runtime.



227
228
229

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 227

def hash
  @hash ||= super
end

#inspect ⇒ `Object`

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 212

def inspect
  description = to_datastore_msearch_header.merge(to_datastore_body).map do |key, value|
    "#{key}=#{(key == :query) ? "<REDACTED>" : value.inspect}"
  end.join(" ")

  "#<#{self.class.name} #{description}>"
end

#merge_with(individual_docs_needed: false, total_document_count_needed: false, client_filters: [], internal_filters: [], sort: [], requested_fields: [], request_all_fields: false, requested_highlights: [], request_all_highlights: false, document_pagination: {}, size_multiplier: 1, monotonic_clock_deadline: nil, aggregations: {}) ⇒ `Object`

Merges in the provided attribute overrides, honoring the intended semantics and invariants of ‘DatastoreQuery`.

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 97

def merge_with(
  individual_docs_needed: false,
  total_document_count_needed: false,
  client_filters: [],
  internal_filters: [],
  sort: [],
  requested_fields: [],
  request_all_fields: false,
  requested_highlights: [],
  request_all_highlights: false,
  document_pagination: {},
  size_multiplier: 1,
  monotonic_clock_deadline: nil,
  aggregations: {}
)
  individual_docs_needed ||= self.individual_docs_needed ||
    !requested_fields.empty? || request_all_fields ||
    !requested_highlights.empty? || request_all_highlights

  total_document_count_needed ||= self.total_document_count_needed || aggregations.values.any?(&:needs_total_doc_count?)

  with(
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    client_filters: self.client_filters + client_filters,
    internal_filters: self.internal_filters + internal_filters,
    sort: merge_attribute(:sort, sort),
    requested_fields: self.requested_fields + requested_fields,
    request_all_fields: self.request_all_fields || request_all_fields,
    requested_highlights: self.requested_highlights + requested_highlights,
    request_all_highlights: self.request_all_highlights || request_all_highlights,
    document_pagination: merge_attribute(:document_pagination, document_pagination),
    size_multiplier: self.size_multiplier * size_multiplier,
    monotonic_clock_deadline: [self.monotonic_clock_deadline, monotonic_clock_deadline].compact.min,
    aggregations: self.aggregations.merge(aggregations)
  )
end

#route_with_field_paths ⇒ `Object`

Returns a list of unique field paths that should be used for shard routing during searches.

If a search is filtering on one of these fields, we can optimize the search by routing it to only the shards containing documents for that routing value.

Note that this returns a list due to our support for type unions. A unioned type can be composed of subtypes that have use different shard routing; this will return the set union of them all.



175
176
177

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 175

def route_with_field_paths
  search_index_definitions.map(&:route_with).uniq
end

#search_index_expression ⇒ `Object`

Returns an index_definition expression string to use for searches. This string can specify multiple indices, use wildcards, etc. For info about what is supported, see: www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 144

def search_index_expression
  @search_index_expression ||= index_expression_builder.determine_search_index_expression(
    all_filters,
    search_index_definitions,
    # When we have aggregations, we must require indices to search. When we search no indices, the datastore does not return
    # the standard aggregations response structure, which causes problems.
    require_indices: !aggregations_datastore_body.empty?
  ).to_s
end

#shard_routing_values ⇒ `Object`

The shard routing values used for this search. Can be ‘nil` if the query will hit all shards. `[]` means that we are routing to no shards.

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 181

def shard_routing_values
  return @shard_routing_values if defined?(@shard_routing_values)
  routing_values = routing_picker.extract_eligible_routing_values(all_filters, route_with_field_paths)

  @shard_routing_values ||=
    if routing_values&.empty? && !aggregations_datastore_body.empty?
      # If we return an empty array of routing values, no shards will get searched, which causes a problem for aggregations.
      # When a query includes aggregations, there are normally aggregation structures on the respopnse (even when there are no
      # search hits to aggregate over!) but if there are no routing values, those aggregation structures will be missing from
      # the response. It's complex to handle that in our downstream response handling code, so we prefer to force a "fallback"
      # routing value here to ensure that at least one shard gets searched. Which shard gets searched doesn't matter; the search
      # filter that led to an empty set of routing values will match on documents on any shard.
      ["fallback_shard_routing_value"]
    elsif contains_ignored_values_for_routing?(routing_values)
      nil
    else
      routing_values&.sort # order doesn't matter, but sorting it makes it easier to assert on in our tests.
    end
end

#to_datastore_msearch_header ⇒ `Object`



220
221
222

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 220

def to_datastore_msearch_header
  @to_datastore_msearch_header ||= {index: search_index_expression, routing: shard_routing_values&.join(",")}.compact
end

#to_datastore_msearch_header_and_body ⇒ `Object`

Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc



137
138
139

# File 'lib/elastic_graph/graphql/datastore_query.rb', line 137

def to_datastore_msearch_header_and_body
  @to_datastore_msearch_header_and_body ||= [to_datastore_msearch_header, to_datastore_body]
end

Class: ElasticGraph::GraphQL::DatastoreQuery

Overview

Defined Under Namespace

Class Method Summary collapse

Instance Method Summary collapse

Class Method Details

.perform(queries) ⇒ Object

Instance Method Details

#all_filters ⇒ Object

#cluster_name ⇒ Object

#document_paginator ⇒ Object

#effective_size ⇒ Object

#empty? ⇒ Boolean

#excluding_indices? ⇒ Boolean

#hash ⇒ Object

#inspect ⇒ Object

#route_with_field_paths ⇒ Object

#search_index_expression ⇒ Object

#shard_routing_values ⇒ Object

#to_datastore_msearch_header ⇒ Object

#to_datastore_msearch_header_and_body ⇒ Object