Class: ElasticGraph::GraphQL::DatastoreQuery

Inherits:
Object
Defined in:
lib/elastic_graph/graphql/datastore_query.rb,
lib/elastic_graph/graphql/datastore_query/paginator.rb,
lib/elastic_graph/graphql/datastore_query/routing_picker.rb,
lib/elastic_graph/graphql/datastore_query/document_paginator.rb,
lib/elastic_graph/graphql/datastore_query/index_expression_builder.rb

Overview

An immutable class that represents a datastore query. Because it represents a datastore query rather than a GraphQL query, all of its data is modeled in datastore terms, not GraphQL terms. For example, any field names in a `Query` should be references to index fields, not GraphQL fields.

Filters are modeled as a `Set` of filtering hashes. While we usually expect only a single `filter` hash, modeling them as a set makes it easy to support merging queries. The datastore knows how to apply multiple `must` clauses that target the same field, giving us exactly the semantics we want in that situation with minimal effort.
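A minimal sketch of the merging behavior described above, assuming each filter has already been translated into a datastore query clause. `SketchQuery` is a hypothetical stand-in that models only the filters; the real class tracks many more attributes and builds its query body differently.

```ruby
require "set"

# Hypothetical stand-in for DatastoreQuery that models only the filters.
# Each filter is assumed to already be a datastore query clause.
class SketchQuery
  attr_reader :filters

  def initialize(filters: [])
    # Copy into a frozen Set so instances stay immutable and duplicate
    # filters collapse into one entry.
    @filters = Set.new(filters).freeze
    freeze
  end

  # Merging never mutates `self`; it returns a new query holding both filter sets.
  def merge_with(filters: [])
    SketchQuery.new(filters: @filters + filters)
  end

  # Every filter becomes its own `must` clause; the datastore ANDs them,
  # even when several target the same field.
  def to_query_body
    {bool: {must: @filters.to_a}}
  end
end

adults = SketchQuery.new(filters: [{range: {age: {gt: 18}}}])
working_age = adults.merge_with(filters: [{range: {age: {lt: 65}}}])
working_age.to_query_body
# => {bool: {must: [{range: {age: {gt: 18}}}, {range: {age: {lt: 65}}}]}}
```

Merging the same filter in twice leaves a single entry, which is one practical benefit of the set representation.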

Defined Under Namespace

Classes: Builder, IndexExpression, Paginator


Class Method Details

.perform(queries) ⇒ Object

Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results. The caller is responsible for returning a hash of responses by query from its block.

Note that some of the passed queries may not be yielded to the caller; when we can tell that a query does not have to be sent to the datastore we avoid yielding it from here. Therefore, the caller should not assume that all queries passed to this method will be yielded back.

The return value is a hash of `DatastoreResponse::SearchResponse` objects by query.

Note: this method uses `send` to work around Ruby visibility rules. We do not want `#decoded_cursor_factory` to be public, as we only need it here, but we cannot access it from a class method without using `send`.
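The visibility workaround can be illustrated in isolation. `VisibilityDemo` and `secret_label` are hypothetical names used only for this sketch; calling the private method directly raises `NoMethodError`, while `send` skips the visibility check (unlike `public_send`, which respects it).

```ruby
# Hypothetical class: a class method that needs a private instance method
# of each object it handles.
class VisibilityDemo
  def self.describe_all(queries)
    # `query.secret_label` would raise NoMethodError because the method is
    # private; `send` bypasses the visibility check.
    queries.map { |query| query.send(:secret_label) }
  end

  def initialize(name)
    @name = name
  end

  private

  def secret_label
    "label:#{@name}"
  end
end

VisibilityDemo.describe_all([VisibilityDemo.new("a"), VisibilityDemo.new("b")])
# => ["label:a", "label:b"]
```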



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 57

def self.perform(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  responses_by_query = Aggregation::QueryOptimizer.optimize_queries(present_queries) do |optimized_queries|
    header_body_tuples_by_query = optimized_queries.each_with_object({}) do |query, hash|
      hash[query] = query.to_datastore_msearch_header_and_body
    end

    yield(header_body_tuples_by_query)
  end

  empty_responses = empty_queries.each_with_object({}) do |query, hash|
    hash[query] = DatastoreResponse::SearchResponse::RAW_EMPTY
  end

  empty_responses.merge(responses_by_query).each_with_object({}) do |(query, response), hash|
    hash[query] = DatastoreResponse::SearchResponse.build(response, decoded_cursor_factory: query.send(:decoded_cursor_factory))
  end.tap do |responses_hash|
    # Callers expect this `perform` method to provide an invariant: the returned hash MUST contain one entry
    # for each of the `queries` passed in the args. In practice, violating this invariant primarily causes a
    # problem when the caller uses the `GraphQL::Dataloader` (which happens for every GraphQL request in production...).
    # However, our tests do not always run queries end-to-end, so this is an added check we want to do, so that
    # anytime our logic here fails to include a query in the response in any test, we'll be notified of the
    # problem.
    expected_queries = queries.to_set
    actual_queries = responses_hash.keys.to_set

    if expected_queries != actual_queries
      missing_queries = expected_queries - actual_queries
      extra_queries = actual_queries - expected_queries

      raise Errors::SearchFailedError, "The `responses_hash` does not have the expected set of queries as keys. " \
        "This can cause problems for the `GraphQL::Dataloader` and suggests a bug in the logic that should be fixed.\n\n" \
        "Missing queries (#{missing_queries.size}):\n#{missing_queries.map(&:inspect).join("\n")}.\n\n" \
        "Extra queries (#{extra_queries.size}): #{extra_queries.map(&:inspect).join("\n")}"
    end
  end
end
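To make the caller contract of `.perform` concrete, here is a simplified sketch under stated assumptions: `PerformQuery`, `EMPTY_RESPONSE` and `perform_sketch` are hypothetical, and the real method's use of `Aggregation::QueryOptimizer` and `DatastoreResponse::SearchResponse.build` is left out. It shows that empty queries are never yielded, that the block must return responses keyed by query, and that the result still covers every input query.

```ruby
require "set"

# Hypothetical query type: `empty` marks queries that need no datastore call.
PerformQuery = Struct.new(:name, :empty, keyword_init: true) do
  def empty?
    empty
  end

  def to_msearch_header_and_body
    [{index: name}, {size: 10}]
  end
end

# Stand-in for the canned response returned for empty queries.
EMPTY_RESPONSE = {"hits" => {"hits" => []}}.freeze

def perform_sketch(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  # Only non-empty queries are handed to the caller's block.
  tuples_by_query = present_queries.to_h { |query| [query, query.to_msearch_header_and_body] }
  responses_by_query = yield(tuples_by_query)

  responses = empty_queries.to_h { |query| [query, EMPTY_RESPONSE] }.merge(responses_by_query)

  # Same invariant as the real method: exactly one entry per input query.
  unless responses.keys.to_set == queries.to_set
    raise ArgumentError, "responses must contain exactly the given queries"
  end

  responses
end

widgets = PerformQuery.new(name: "widgets", empty: false)
components = PerformQuery.new(name: "components", empty: true)

sent = nil
responses = perform_sketch([widgets, components]) do |tuples_by_query|
  sent = tuples_by_query.keys
  # A real caller would send one msearch request here; we fake the responses.
  tuples_by_query.transform_values { |(header, _body)| {"hits" => {"hits" => [header[:index]]}} }
end
```

Here only `widgets` reaches the block, yet `responses` contains entries for both queries.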

Instance Method Details

#all_filters ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 256

def all_filters
  client_filters + internal_filters
end

#cluster_name ⇒ Object

Returns the name (as a String) of the datastore cluster this query should be sent to. Unless exactly one cluster name is found, this method raises an Errors::ConfigError.

Raises:

  • (Errors::ConfigError)


# File 'lib/elastic_graph/graphql/datastore_query.rb', line 160

def cluster_name
  cluster_name = search_index_definitions.map(&:cluster_to_query).uniq
  return cluster_name.first if cluster_name.size == 1
  raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
    "for query targeting indices: #{search_index_definitions}"
end
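A standalone sketch of the "exactly one cluster" rule. `IndexDef`, `ConfigError` and `pick_cluster_name` are hypothetical stand-ins for the index definitions and `Errors::ConfigError`; note that an empty list of index definitions also raises, since zero is not exactly one.

```ruby
# Hypothetical stand-ins for the real index definitions and error class.
class ConfigError < StandardError; end
IndexDef = Struct.new(:name, :cluster_to_query)

def pick_cluster_name(index_definitions)
  names = index_definitions.map(&:cluster_to_query).uniq
  # Succeed only when every targeted index lives on the same single cluster.
  return names.first if names.size == 1
  raise ConfigError, "Found different datastore clusters (#{names}) to query"
end

pick_cluster_name([IndexDef.new("widgets", "main"), IndexDef.new("components", "main")])
# => "main"
```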

#document_paginator ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 231

def document_paginator
  @document_paginator ||= DocumentPaginator.new(
    sort_clauses: sort_with_tiebreaker,
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    decoded_cursor_factory: decoded_cursor_factory,
    schema_element_names: schema_element_names,
    size_multiplier: size_multiplier,
    max_effective_size: search_index_definitions.map { |i| i.max_result_window }.min,
    paginator: Paginator.new(
      default_page_size: default_page_size,
      max_page_size: max_page_size,
      first: document_pagination[:first],
      after: document_pagination[:after],
      last: document_pagination[:last],
      before: document_pagination[:before],
      schema_element_names: schema_element_names
    )
  )
end

#effective_size ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 252

def effective_size
  document_paginator.effective_size
end

#empty? ⇒ Boolean

Indicates if the query does not need any results from the datastore. As an optimization, we can reply with a default “empty” response for an empty query.

Returns:

  • (Boolean)


# File 'lib/elastic_graph/graphql/datastore_query.rb', line 203

def empty?
  # If we are searching no indices or routing to an empty set of shards, there is no need to query the datastore at all.
  # This only happens when our filter processing has deduced that the query will match no results.
  return true if search_index_expression.empty? || shard_routing_values&.empty?

  datastore_body = to_datastore_body
  datastore_body.fetch(:size) == 0 && !datastore_body.fetch(:track_total_hits) && aggregations_datastore_body.empty?
end
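The emptiness rule can be sketched as a pure function. `query_empty?` and its keyword arguments are hypothetical: they take plain values in place of the query's derived state (`search_index_expression`, `shard_routing_values`, `to_datastore_body` and `aggregations_datastore_body`).

```ruby
# Hypothetical pure version of the check, fed plain values instead of the
# real query's derived state.
def query_empty?(index_expression:, routing_values:, body:, aggregations_body:)
  # No indices, or an empty (non-nil) shard list: nothing can match.
  return true if index_expression.empty? || routing_values&.empty?

  # Otherwise the query is empty only if it wants no hits, no total count
  # and no aggregations.
  body.fetch(:size) == 0 && !body.fetch(:track_total_hits) && aggregations_body.empty?
end

query_empty?(index_expression: "widgets", routing_values: nil,
  body: {size: 0, track_total_hits: false}, aggregations_body: {}) # => true
query_empty?(index_expression: "widgets", routing_values: nil,
  body: {size: 10, track_total_hits: false}, aggregations_body: {}) # => false
```

Note the distinction the safe-navigation operator preserves: `nil` routing values mean "search all shards" and do not make the query empty, while `[]` does.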

#excluding_indices? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/elastic_graph/graphql/datastore_query.rb', line 154

def excluding_indices?
  search_index_expression.split(",").any? { |expr| expr.start_with?("-") }
end
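For illustration, the same check applied to a few expressions. The index names are hypothetical; a leading `-` on a comma-separated part is how Elasticsearch's multi-target syntax excludes indices, typically after a wildcard pattern.

```ruby
# Standalone copy of the check, taking the expression as an argument.
def excluding_indices?(search_index_expression)
  search_index_expression.split(",").any? { |expr| expr.start_with?("-") }
end

excluding_indices?("widgets_rollover__*,-widgets_rollover__2019") # => true
excluding_indices?("widgets,components")                          # => false
```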

#hash ⇒ Object

`DatastoreQuery` objects are used as hash keys. Computing `#hash` can be expensive (given how many fields a `DatastoreQuery` has), and it's safe to cache since `DatastoreQuery` instances are immutable, so we memoize it here. We've observed this making a very noticeable difference in our test suite runtime.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 227

def hash
  @hash ||= super
end
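A minimal sketch of why memoizing `#hash` pays off when objects are used as hash keys. `MemoizedKey` is hypothetical, and `compute_count` exists only to show how often the underlying hash is actually computed. The object itself is not frozen here, because `@hash ||=` must write an instance variable; the memoization is only safe because the fields never change.

```ruby
# Hypothetical value object whose fields never change after construction.
class MemoizedKey
  attr_reader :fields, :compute_count

  def initialize(fields)
    @fields = fields.freeze
    @compute_count = 0
  end

  # Hash keys need `eql?` and `hash` to agree.
  def eql?(other)
    other.is_a?(MemoizedKey) && fields == other.fields
  end
  alias_method :==, :eql?

  def hash
    # Computed on first use, then reused for every later lookup.
    @hash ||= begin
      @compute_count += 1
      fields.hash
    end
  end
end

key = MemoizedKey.new((1..1000).to_a)
lookup = {key => :response}
3.times { lookup[key] }
key.compute_count # => 1
```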

#inspect ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 212

def inspect
  description = to_datastore_msearch_header.merge(to_datastore_body).map do |key, value|
    "#{key}=#{(key == :query) ? "<REDACTED>" : value.inspect}"
  end.join(" ")

  "#<#{self.class.name} #{description}>"
end
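The redaction in `#inspect` can be seen in a small standalone sketch. `RedactedQuery` and its header and body are hypothetical. Redacting `:query` presumably keeps filter values out of logs and error messages, such as the `SearchFailedError` raised by `.perform`, which inspects queries.

```ruby
# Hypothetical holder for an msearch header and body, using the same
# inspect logic as above.
class RedactedQuery
  def initialize(header, body)
    @header = header
    @body = body
  end

  def inspect
    description = @header.merge(@body).map do |key, value|
      # Only the query clause is hidden; everything else is shown as-is.
      "#{key}=#{(key == :query) ? "<REDACTED>" : value.inspect}"
    end.join(" ")

    "#<#{self.class.name} #{description}>"
  end
end

q = RedactedQuery.new({index: "widgets"}, {query: {term: {ssn: "123-45-6789"}}, size: 10})
q.inspect # => "#<RedactedQuery index=\"widgets\" query=<REDACTED> size=10>"
```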

#merge_with(individual_docs_needed: false, total_document_count_needed: false, client_filters: [], internal_filters: [], sort: [], requested_fields: [], request_all_fields: false, requested_highlights: [], request_all_highlights: false, document_pagination: {}, size_multiplier: 1, monotonic_clock_deadline: nil, aggregations: {}) ⇒ Object

Merges in the provided attribute overrides, honoring the intended semantics and invariants of `DatastoreQuery`.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 97

def merge_with(
  individual_docs_needed: false,
  total_document_count_needed: false,
  client_filters: [],
  internal_filters: [],
  sort: [],
  requested_fields: [],
  request_all_fields: false,
  requested_highlights: [],
  request_all_highlights: false,
  document_pagination: {},
  size_multiplier: 1,
  monotonic_clock_deadline: nil,
  aggregations: {}
)
  individual_docs_needed ||= self.individual_docs_needed ||
    !requested_fields.empty? || request_all_fields ||
    !requested_highlights.empty? || request_all_highlights

  total_document_count_needed ||= self.total_document_count_needed || aggregations.values.any?(&:needs_total_doc_count?)

  with(
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    client_filters: self.client_filters + client_filters,
    internal_filters: self.internal_filters + internal_filters,
    sort: merge_attribute(:sort, sort),
    requested_fields: self.requested_fields + requested_fields,
    request_all_fields: self.request_all_fields || request_all_fields,
    requested_highlights: self.requested_highlights + requested_highlights,
    request_all_highlights: self.request_all_highlights || request_all_highlights,
    document_pagination: merge_attribute(:document_pagination, document_pagination),
    size_multiplier: self.size_multiplier * size_multiplier,
    monotonic_clock_deadline: [self.monotonic_clock_deadline, monotonic_clock_deadline].compact.min,
    aggregations: self.aggregations.merge(aggregations)
  )
end
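To summarize the merge rules in `#merge_with`, here is a sketch over a hypothetical subset of the attributes (`MergeSketch` is not part of ElasticGraph). Boolean flags are OR-ed, lists are concatenated, size multipliers are multiplied, and the deadline becomes the earliest non-`nil` value. Requesting any field also implies that individual documents are needed.

```ruby
# Hypothetical subset of DatastoreQuery's attributes.
MergeSketch = Struct.new(:individual_docs_needed, :requested_fields, :size_multiplier,
  :monotonic_clock_deadline, keyword_init: true) do
  def merge_with(requested_fields: [], size_multiplier: 1, monotonic_clock_deadline: nil)
    MergeSketch.new(
      # Asking for any field means we need the documents themselves back.
      individual_docs_needed: individual_docs_needed || !requested_fields.empty?,
      # `+` builds a new array, so the receiver is left untouched.
      requested_fields: self.requested_fields + requested_fields,
      size_multiplier: self.size_multiplier * size_multiplier,
      # The tightest (earliest) deadline wins; nil means "no deadline".
      monotonic_clock_deadline: [self.monotonic_clock_deadline, monotonic_clock_deadline].compact.min
    )
  end
end

base = MergeSketch.new(individual_docs_needed: false, requested_fields: ["id"],
  size_multiplier: 1, monotonic_clock_deadline: nil)
merged = base
  .merge_with(requested_fields: ["name"], size_multiplier: 2, monotonic_clock_deadline: 5_000)
  .merge_with(size_multiplier: 3, monotonic_clock_deadline: 3_000)

merged.individual_docs_needed   # => true
merged.requested_fields         # => ["id", "name"]
merged.size_multiplier          # => 6
merged.monotonic_clock_deadline # => 3000
```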

#route_with_field_paths ⇒ Object

Returns a list of unique field paths that should be used for shard routing during searches.

If a search is filtering on one of these fields, we can optimize the search by routing it to only the shards containing documents for that routing value.

Note that this returns a list due to our support for type unions. A unioned type can be composed of subtypes that use different shard routing; this returns the set union of them all.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 175

def route_with_field_paths
  search_index_definitions.map(&:route_with).uniq
end
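For a union type spanning several indices, the result is the de-duplicated list of their routing fields. `IndexDefinition` and the index and field names below are hypothetical.

```ruby
# Hypothetical index definitions for the subtypes of a union type.
IndexDefinition = Struct.new(:name, :route_with)

def route_with_field_paths(index_definitions)
  index_definitions.map(&:route_with).uniq
end

defs = [
  IndexDefinition.new("widgets", "workspace_id"),
  IndexDefinition.new("components", "id"),
  IndexDefinition.new("widget_archive", "workspace_id")
]
route_with_field_paths(defs) # => ["workspace_id", "id"]
```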

#search_index_expression ⇒ Object

Returns the index expression string to use for searches. This string can name multiple indices, use wildcards, etc. For info about what is supported, see: www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 144

def search_index_expression
  @search_index_expression ||= index_expression_builder.determine_search_index_expression(
    all_filters,
    search_index_definitions,
    # When we have aggregations, we must require indices to search. When we search no indices, the datastore does not return
    # the standard aggregations response structure, which causes problems.
    require_indices: !aggregations_datastore_body.empty?
  ).to_s
end

#shard_routing_values ⇒ Object

The shard routing values used for this search. Can be `nil` if the query will hit all shards. `[]` means that we are routing to no shards.



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 181

def shard_routing_values
  return @shard_routing_values if defined?(@shard_routing_values)
  routing_values = routing_picker.extract_eligible_routing_values(all_filters, route_with_field_paths)

  @shard_routing_values ||=
    if routing_values&.empty? && !aggregations_datastore_body.empty?
      # If we return an empty array of routing values, no shards will get searched, which causes a problem for aggregations.
      # When a query includes aggregations, there are normally aggregation structures on the response (even when there are no
      # search hits to aggregate over!) but if there are no routing values, those aggregation structures will be missing from
      # the response. It's complex to handle that in our downstream response handling code, so we prefer to force a "fallback"
      # routing value here to ensure that at least one shard gets searched. Which shard gets searched doesn't matter; the search
      # filter that led to an empty set of routing values will match on documents on any shard.
      ["fallback_shard_routing_value"]
    elsif contains_ignored_values_for_routing?(routing_values)
      nil
    else
      routing_values&.sort # order doesn't matter, but sorting it makes it easier to assert on in our tests.
    end
end
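The decision logic of `#shard_routing_values` can be sketched as a pure function. `pick_shard_routing_values` is hypothetical, and its `contains_ignored_values:` flag stands in for the real `contains_ignored_values_for_routing?` check. Because `nil` is a meaningful result here, a plain `@x ||= ...` would recompute it on every call, which is presumably why the real method also guards with `defined?`.

```ruby
FALLBACK_ROUTING_VALUE = "fallback_shard_routing_value"

# Hypothetical pure version: `routing_values` is nil when filters could not
# narrow the shards, and [] when they matched no routing values at all.
def pick_shard_routing_values(routing_values, has_aggregations:, contains_ignored_values:)
  if routing_values&.empty? && has_aggregations
    # Search one arbitrary shard so the aggregation structure is still returned.
    [FALLBACK_ROUTING_VALUE]
  elsif contains_ignored_values
    # Cannot safely narrow the search: hit every shard.
    nil
  else
    # Sorted only for deterministic output.
    routing_values&.sort
  end
end

pick_shard_routing_values(["b", "a"], has_aggregations: false, contains_ignored_values: false)
# => ["a", "b"]
pick_shard_routing_values([], has_aggregations: true, contains_ignored_values: false)
# => ["fallback_shard_routing_value"]
```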

#to_datastore_msearch_header ⇒ Object



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 220

def to_datastore_msearch_header
  @to_datastore_msearch_header ||= {index: search_index_expression, routing: shard_routing_values&.join(",")}.compact
end
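A standalone sketch of how the header is assembled (`msearch_header` is a hypothetical helper). `Hash#compact` drops the `routing` key when the routing values are `nil`, so the datastore searches all shards. An empty routing list would produce `routing: ""`, but `#empty?` reports such queries as empty, so `.perform` never sends them.

```ruby
def msearch_header(index_expression, routing_values)
  # `compact` removes the routing entry entirely when there is nothing to route on.
  {index: index_expression, routing: routing_values&.join(",")}.compact
end

msearch_header("widgets", ["a", "b"]) # => {index: "widgets", routing: "a,b"}
msearch_header("widgets", nil)        # => {index: "widgets"}
```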

#to_datastore_msearch_header_and_body ⇒ Object

Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc



# File 'lib/elastic_graph/graphql/datastore_query.rb', line 137

def to_datastore_msearch_header_and_body
  @to_datastore_msearch_header_and_body ||= [to_datastore_msearch_header, to_datastore_body]
end