Class: ElasticGraph::GraphQL::DatastoreQuery
- Inherits:
-
Object
- Object
- ElasticGraph::GraphQL::DatastoreQuery
- Defined in:
- lib/elastic_graph/graphql/datastore_query.rb,
lib/elastic_graph/graphql/datastore_query/paginator.rb,
lib/elastic_graph/graphql/datastore_query/routing_picker.rb,
lib/elastic_graph/graphql/datastore_query/document_paginator.rb,
lib/elastic_graph/graphql/datastore_query/index_expression_builder.rb
Overview
An immutable class that represents a datastore query. Since this represents a datastore query, and not a GraphQL query, all the data in it is modeled in datastore terms, not GraphQL terms. For example, any field names in a `Query` should be references to index fields, not GraphQL fields.
Filters are modeled as a `Set` of filtering hashes. While we usually expect only a single `filter` hash, modeling it as a set makes it easy for us to support merging queries. The datastore knows how to apply multiple `must` clauses that apply to the same field, giving us the exact semantics we want in such a situation with minimal effort.
Defined Under Namespace
Classes: Builder, IndexExpression, Paginator
Class Method Summary
-
.perform(queries) ⇒ Object
Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results.
Instance Method Summary
- #all_filters ⇒ Object
-
#cluster_name ⇒ Object
Returns the name of the datastore cluster, as a String, where this query should be sent.
- #document_paginator ⇒ Object
- #effective_size ⇒ Object
-
#empty? ⇒ Boolean
Indicates if the query does not need any results from the datastore.
- #excluding_indices? ⇒ Boolean
-
#hash ⇒ Object
`DatastoreQuery` objects are used as keys in a hash.
- #inspect ⇒ Object
-
#merge_with(individual_docs_needed: false, total_document_count_needed: false, client_filters: [], internal_filters: [], sort: [], requested_fields: [], request_all_fields: false, requested_highlights: [], request_all_highlights: false, document_pagination: {}, size_multiplier: 1, monotonic_clock_deadline: nil, aggregations: {}) ⇒ Object
Merges in the provided attribute overrides, honoring the intended semantics and invariants of `DatastoreQuery`.
-
#route_with_field_paths ⇒ Object
Returns a list of unique field paths that should be used for shard routing during searches.
-
#search_index_expression ⇒ Object
Returns an index_definition expression string to use for searches.
-
#shard_routing_values ⇒ Object
The shard routing values used for this search.
- #to_datastore_msearch_header ⇒ Object
-
#to_datastore_msearch_header_and_body ⇒ Object
Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc.
Class Method Details
.perform(queries) ⇒ Object
Performs a list of queries by building a hash of datastore msearch header/body tuples (keyed by query), yielding them to the caller, and then post-processing the results. The caller is responsible for returning a hash of responses by query from its block.
Note that some of the passed queries may not be yielded to the caller; when we can tell that a query does not have to be sent to the datastore we avoid yielding it from here. Therefore, the caller should not assume that all queries passed to this method will be yielded back.
The return value is a hash of `DatastoreResponse::SearchResponse` objects by query.
Note: this method uses `send` to work around Ruby visibility rules. We do not want `#decoded_cursor_factory` to be public, as we only need it here, but we cannot access it from a class method without using `send`.
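The calling contract described above can be illustrated with a minimal, self-contained sketch. It is not the ElasticGraph implementation: `PerformSketchQuery`, `SKETCH_EMPTY_RESPONSE`, and `sketch_perform` are hypothetical stand-ins, and the sketch skips query optimization and response building. It only shows that empty queries are never yielded, and that the returned hash still has one entry per query passed in.

```ruby
# Hypothetical stand-in for a query; only `empty?` matters for this sketch.
PerformSketchQuery = Struct.new(:name, :needs_results) do
  def empty?
    !needs_results
  end
end

# Hypothetical placeholder for the default "empty" response.
SKETCH_EMPTY_RESPONSE = {"hits" => {"hits" => []}}.freeze

# Simplified imitation of the flow: partition, yield only the non-empty
# queries (keyed by query), then merge in default responses for the rest.
def sketch_perform(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  tuples_by_query = present_queries.to_h { |query| [query, [{index: query.name}, {}]] }
  responses_by_query = yield(tuples_by_query)

  empty_responses = empty_queries.to_h { |query| [query, SKETCH_EMPTY_RESPONSE] }
  empty_responses.merge(responses_by_query)
end

queries = [PerformSketchQuery.new("widgets", true), PerformSketchQuery.new("parts", false)]
yielded = []

responses = sketch_perform(queries) do |tuples_by_query|
  yielded.concat(tuples_by_query.keys)
  # The block must return a hash of responses keyed by the same queries.
  tuples_by_query.to_h { |query, _tuple| [query, {"hits" => {"hits" => [{"_id" => "1"}]}}] }
end
```

In this sketch only the `widgets` query reaches the block, yet `responses` contains an entry for both queries.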
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 57

def self.perform(queries)
  empty_queries, present_queries = queries.partition(&:empty?)

  responses_by_query = Aggregation::QueryOptimizer.optimize_queries(present_queries) do |optimized_queries|
    header_body_tuples_by_query = optimized_queries.each_with_object({}) do |query, hash|
      hash[query] = query.to_datastore_msearch_header_and_body
    end

    yield(header_body_tuples_by_query)
  end

  empty_responses = empty_queries.each_with_object({}) do |query, hash|
    hash[query] = DatastoreResponse::SearchResponse::RAW_EMPTY
  end

  empty_responses.merge(responses_by_query).each_with_object({}) do |(query, response), hash|
    hash[query] = DatastoreResponse::SearchResponse.build(response, decoded_cursor_factory: query.send(:decoded_cursor_factory))
  end.tap do |responses_hash|
    # Callers expect this `perform` method to provide an invariant: the returned hash MUST contain one entry
    # for each of the `queries` passed in the args. In practice, violating this invariant primarily causes a
    # problem when the caller uses the `GraphQL::Dataloader` (which happens for every GraphQL request in production...).
    # However, our tests do not always run queries end-to-end, so this is an added check we want to do, so that
    # anytime our logic here fails to include a query in the response in any test, we'll be notified of the
    # problem.
    expected_queries = queries.to_set
    actual_queries = responses_hash.keys.to_set

    if expected_queries != actual_queries
      missing_queries = expected_queries - actual_queries
      extra_queries = actual_queries - expected_queries

      raise Errors::SearchFailedError, "The `responses_hash` does not have the expected set of queries as keys. " \
        "This can cause problems for the `GraphQL::Dataloader` and suggests a bug in the logic that should be fixed.\n\n" \
        "Missing queries (#{missing_queries.size}):\n#{missing_queries.map(&:inspect).join("\n")}.\n\n" \
        "Extra queries (#{extra_queries.size}): #{extra_queries.map(&:inspect).join("\n")}"
    end
  end
end
Instance Method Details
#all_filters ⇒ Object
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 256

def all_filters
  client_filters + internal_filters
end
#cluster_name ⇒ Object
Returns the name of the datastore cluster, as a String, where this query should be sent. Unless exactly 1 cluster name is found, this method raises an Errors::ConfigError.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 160

def cluster_name
  cluster_name = search_index_definitions.map(&:cluster_to_query).uniq
  return cluster_name.first if cluster_name.size == 1
  raise Errors::ConfigError, "Found different datastore clusters (#{cluster_name}) to query " \
    "for query targeting indices: #{search_index_definitions}"
end
#document_paginator ⇒ Object
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 231

def document_paginator
  @document_paginator ||= DocumentPaginator.new(
    sort_clauses: sort_with_tiebreaker,
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    decoded_cursor_factory: decoded_cursor_factory,
    schema_element_names: schema_element_names,
    size_multiplier: size_multiplier,
    max_effective_size: search_index_definitions.map { |i| i.max_result_window }.min,
    paginator: Paginator.new(
      default_page_size: default_page_size,
      max_page_size: max_page_size,
      first: document_pagination[:first],
      after: document_pagination[:after],
      last: document_pagination[:last],
      before: document_pagination[:before],
      schema_element_names: schema_element_names
    )
  )
end
#effective_size ⇒ Object
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 252

def effective_size
  document_paginator.effective_size
end
#empty? ⇒ Boolean
Indicates if the query does not need any results from the datastore. As an optimization, we can reply with a default “empty” response for an empty query.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 203

def empty?
  # If we are searching no indices or routing to an empty set of shards, there is no need to query the datastore at all.
  # This only happens when our filter processing has deduced that the query will match no results.
  return true if search_index_expression.empty? || shard_routing_values&.empty?

  datastore_body = to_datastore_body
  datastore_body.fetch(:size) == 0 && !datastore_body.fetch(:track_total_hits) && aggregations_datastore_body.empty?
end
#excluding_indices? ⇒ Boolean
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 154

def excluding_indices?
  search_index_expression.split(",").any? { |expr| expr.start_with?("-") }
end
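As a rough illustration of the check above, the following sketch applies the same test to a few index expressions. The helper name `sketch_excluding_indices?` and the index names are hypothetical; the only assumption is that an exclusion is written as a comma-separated entry starting with `-`.

```ruby
# Hypothetical free-standing version of the check, taking the index
# expression as an argument instead of reading it from the query.
def sketch_excluding_indices?(index_expression)
  index_expression.split(",").any? { |expr| expr.start_with?("-") }
end

with_exclusion = sketch_excluding_indices?("widgets_rollover__*,-widgets_rollover__2020")
without_exclusion = sketch_excluding_indices?("widgets,parts")
no_indices = sketch_excluding_indices?("")
```

A hyphen inside an index name (for example `my-index`) does not count as an exclusion, because only a leading `-` on an entry is checked.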
#hash ⇒ Object
`DatastoreQuery` objects are used as keys in a hash. Computing `#hash` can be expensive (given how many fields a `DatastoreQuery` has) and it’s safe to cache since `DatastoreQuery` instances are immutable, so we memoize it here. We’ve observed this making a very noticeable difference in our test suite runtime.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 227

def hash
  @hash ||= super
end
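The memoization idea can be seen in isolation with a small sketch. `MemoizedKey` is a hypothetical class, not part of ElasticGraph; it counts how often the underlying hash value is actually computed, so the effect of caching is observable. The approach is only safe because the object's fields never change after construction.

```ruby
# Hypothetical value object used as a Hash key, with a memoized `#hash`.
class MemoizedKey
  attr_reader :fields, :hash_computations

  def initialize(fields)
    @fields = fields.dup.freeze
    @hash_computations = 0
  end

  # Compute the (potentially expensive) hash once, then reuse it.
  def hash
    @hash ||= begin
      @hash_computations += 1
      fields.hash
    end
  end

  def eql?(other)
    other.is_a?(MemoizedKey) && fields == other.fields
  end
  alias_method :==, :eql?
end

key = MemoizedKey.new({index: "widgets", size: 10})
cache = {key => :response}

# Repeated lookups and explicit calls reuse the cached value.
3.times { cache[key] }
key.hash
key.hash

# An equal key built separately finds the same entry.
lookup = cache[MemoizedKey.new({index: "widgets", size: 10})]
```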
#inspect ⇒ Object
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 212

def inspect
  description = to_datastore_msearch_header.merge(to_datastore_body).map do |key, value|
    "#{key}=#{(key == :query) ? "<REDACTED>" : value.inspect}"
  end.join(" ")

  "#<#{self.class.name} #{description}>"
end
#merge_with(individual_docs_needed: false, total_document_count_needed: false, client_filters: [], internal_filters: [], sort: [], requested_fields: [], request_all_fields: false, requested_highlights: [], request_all_highlights: false, document_pagination: {}, size_multiplier: 1, monotonic_clock_deadline: nil, aggregations: {}) ⇒ Object
Merges in the provided attribute overrides, honoring the intended semantics and invariants of `DatastoreQuery`.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 97

def merge_with(
  individual_docs_needed: false,
  total_document_count_needed: false,
  client_filters: [],
  internal_filters: [],
  sort: [],
  requested_fields: [],
  request_all_fields: false,
  requested_highlights: [],
  request_all_highlights: false,
  document_pagination: {},
  size_multiplier: 1,
  monotonic_clock_deadline: nil,
  aggregations: {}
)
  individual_docs_needed ||= self.individual_docs_needed || !requested_fields.empty? || request_all_fields || !requested_highlights.empty? || request_all_highlights
  total_document_count_needed ||= self.total_document_count_needed || aggregations.values.any?(&:needs_total_doc_count?)

  with(
    individual_docs_needed: individual_docs_needed,
    total_document_count_needed: total_document_count_needed,
    client_filters: self.client_filters + client_filters,
    internal_filters: self.internal_filters + internal_filters,
    sort: merge_attribute(:sort, sort),
    requested_fields: self.requested_fields + requested_fields,
    request_all_fields: self.request_all_fields || request_all_fields,
    requested_highlights: self.requested_highlights + requested_highlights,
    request_all_highlights: self.request_all_highlights || request_all_highlights,
    document_pagination: merge_attribute(:document_pagination, document_pagination),
    size_multiplier: self.size_multiplier * size_multiplier,
    monotonic_clock_deadline: [self.monotonic_clock_deadline, monotonic_clock_deadline].compact.min,
    aggregations: self.aggregations.merge(aggregations)
  )
end
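The per-attribute merge rules visible in the listing (filters accumulate, boolean flags are OR-ed, size multipliers multiply, and the earliest deadline wins) can be tried out with a reduced sketch. `MergeSketchQuery` is a hypothetical struct covering only four of the attributes, and the filter hashes are placeholders rather than real filter syntax. Filters are held in a `Set`, as the overview describes, so merging the same filter twice keeps a single copy.

```ruby
require "set"

# Hypothetical, reduced stand-in covering four of the merged attributes.
MergeSketchQuery = Struct.new(
  :client_filters, :request_all_fields, :size_multiplier, :monotonic_clock_deadline,
  keyword_init: true
) do
  # Returns a new instance instead of mutating the receiver.
  def merge_with(client_filters: [], request_all_fields: false, size_multiplier: 1, monotonic_clock_deadline: nil)
    self.class.new(
      client_filters: self.client_filters + client_filters,
      request_all_fields: self.request_all_fields || request_all_fields,
      size_multiplier: self.size_multiplier * size_multiplier,
      monotonic_clock_deadline: [self.monotonic_clock_deadline, monotonic_clock_deadline].compact.min
    )
  end
end

base = MergeSketchQuery.new(
  client_filters: Set[{"color" => "red"}], # placeholder filter hash
  request_all_fields: false,
  size_multiplier: 2,
  monotonic_clock_deadline: nil
)

merged = base.merge_with(
  client_filters: [{"color" => "red"}, {"size" => "large"}],
  size_multiplier: 3,
  monotonic_clock_deadline: 500
)

merged_again = merged.merge_with(request_all_fields: true, monotonic_clock_deadline: 300)
```

Calling `merge_with` with no arguments leaves every attribute in this sketch unchanged, since each default is the identity for its merge rule.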
#route_with_field_paths ⇒ Object
Returns a list of unique field paths that should be used for shard routing during searches.
If a search is filtering on one of these fields, we can optimize the search by routing it to only the shards containing documents for that routing value.
Note that this returns a list due to our support for type unions. A unioned type can be composed of subtypes that use different shard routing; this will return the set union of them all.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 175

def route_with_field_paths
  search_index_definitions.map(&:route_with).uniq
end
#search_index_expression ⇒ Object
Returns an index_definition expression string to use for searches. This string can specify multiple indices, use wildcards, etc. For info about what is supported, see: www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 144

def search_index_expression
  @search_index_expression ||= index_expression_builder.determine_search_index_expression(
    all_filters,
    search_index_definitions,
    # When we have aggregations, we must require indices to search. When we search no indices, the datastore does not return
    # the standard aggregations response structure, which causes problems.
    require_indices: !aggregations_datastore_body.empty?
  ).to_s
end
#shard_routing_values ⇒ Object
The shard routing values used for this search. Can be `nil` if the query will hit all shards. `[]` means that we are routing to no shards.
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 181

def shard_routing_values
  return @shard_routing_values if defined?(@shard_routing_values)
  routing_values = routing_picker.extract_eligible_routing_values(all_filters, route_with_field_paths)

  @shard_routing_values ||=
    if routing_values&.empty? && !aggregations_datastore_body.empty?
      # If we return an empty array of routing values, no shards will get searched, which causes a problem for aggregations.
      # When a query includes aggregations, there are normally aggregation structures on the response (even when there are no
      # search hits to aggregate over!) but if there are no routing values, those aggregation structures will be missing from
      # the response. It's complex to handle that in our downstream response handling code, so we prefer to force a "fallback"
      # routing value here to ensure that at least one shard gets searched. Which shard gets searched doesn't matter; the search
      # filter that led to an empty set of routing values will match on documents on any shard.
      ["fallback_shard_routing_value"]
    elsif contains_ignored_values_for_routing?(routing_values)
      nil
    else
      routing_values&.sort # order doesn't matter, but sorting it makes it easier to assert on in our tests.
    end
end
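The three possible outcomes (`nil`, `[]`, or a list of values) and the aggregation fallback can be summarized in a small sketch. `sketch_shard_routing_values` is a hypothetical helper that takes the already-extracted routing values and a flag for whether aggregations are present; it deliberately omits the `contains_ignored_values_for_routing?` branch, whose behavior is not shown in this section.

```ruby
# Hypothetical simplification: `routing_values` is nil (all shards),
# [] (no shards), or a list of routing value strings.
def sketch_shard_routing_values(routing_values, aggregations_present:)
  if routing_values&.empty? && aggregations_present
    # Force at least one shard to be searched so that the response still
    # carries the aggregation structures.
    ["fallback_shard_routing_value"]
  else
    routing_values&.sort
  end
end

all_shards = sketch_shard_routing_values(nil, aggregations_present: false)
no_shards = sketch_shard_routing_values([], aggregations_present: false)
fallback = sketch_shard_routing_values([], aggregations_present: true)
specific = sketch_shard_routing_values(["tenant_b", "tenant_a"], aggregations_present: true)
```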
#to_datastore_msearch_header ⇒ Object
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 220

def to_datastore_msearch_header
  @to_datastore_msearch_header ||= {index: search_index_expression, routing: shard_routing_values&.join(",")}.compact
end
#to_datastore_msearch_header_and_body ⇒ Object
Pairs the multi-search headers and body into a tuple, as per the format required by the datastore: www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html#search-multi-search-api-desc
# File 'lib/elastic_graph/graphql/datastore_query.rb', line 137

def to_datastore_msearch_header_and_body
  @to_datastore_msearch_header_and_body ||= [to_datastore_msearch_header, to_datastore_body]
end
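For context on why headers and bodies are paired, the multi-search API takes newline-delimited JSON in which each search contributes a header line followed by a body line, and the payload ends with a newline. The sketch below shows that encoding for a list of header/body tuples. `sketch_msearch_header` and `sketch_msearch_payload` are hypothetical helpers, and the index names and bodies are made up; a datastore client library would normally perform this serialization.

```ruby
require "json"

# Hypothetical helper mirroring the header shape shown above: `routing` is
# omitted entirely when there are no routing values to apply.
def sketch_msearch_header(index_expression, routing_values)
  {index: index_expression, routing: routing_values&.join(",")}.compact
end

# Serializes [header, body] tuples as newline-delimited JSON.
def sketch_msearch_payload(header_body_tuples)
  lines = header_body_tuples.flat_map do |header, body|
    [JSON.generate(header), JSON.generate(body)]
  end
  lines.join("\n") + "\n"
end

tuples = [
  [sketch_msearch_header("widgets", ["a", "b"]), {size: 10}],
  [sketch_msearch_header("parts", nil), {size: 0, track_total_hits: true}]
]

payload = sketch_msearch_payload(tuples)
```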