Module: Elasticsearch::Persistence::Model::Find::ClassMethods

Defined in:
lib/elasticsearch/persistence/model/find.rb

Instance Method Details

#count(query_or_definition = nil, options = {}) ⇒ Integer

Returns the number of models

Examples:

Return the count of all models


Person.count
# => 2

Return the count of models matching a simple query


Person.count('fox or dog')
# => 1

Return the count of models matching a query in the Elasticsearch DSL


Person.count(query: { match: { title: 'fox dog' } })
# => 1

Returns:

  • (Integer)


# File 'lib/elasticsearch/persistence/model/find.rb', line 26

def count(query_or_definition = nil, options = {})
  gateway.count(query_or_definition, options)
end

#find_each(options = {}) ⇒ String, Enumerator

Iterate efficiently over models using the `find_in_batches` method.

All options are passed to `find_in_batches`, and each result is yielded to the passed block.

Examples:

Print out the people’s names by scrolling through the index


Person.find_each { |person| puts person.name }

# # GET http://localhost:9200/people/person/_search?scroll=5m&search_type=scan&size=20
# # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# Test 0
# Test 1
# Test 2
# ...
# # GET http://localhost:9200/_search/scroll?scroll=5m&scroll_id=c2Nhbj...
# Test 20
# Test 21
# Test 22

Leave out the block to return an Enumerator instance


Person.find_each.select { |person| person.name =~ /John/ }
# => [#<Person {id: "NkltJP5vRxqk9_RMP7SU8Q", name: "John Smith",  ...}>]

Returns:

  • (String, Enumerator)

    The `scroll_id` for the request, or an Enumerator when no block is passed



# File 'lib/elasticsearch/persistence/model/find.rb', line 144

def find_each(options = {})
  return to_enum(:find_each, options) unless block_given?

  find_in_batches(options) do |batch|
    batch.each { |result| yield result }
  end
end
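
The block-or-Enumerator pattern used above can be tried without an Elasticsearch cluster. The following is a minimal sketch, assuming a hypothetical `FakeModel` class whose `find_in_batches` yields slices of an in-memory array instead of scroll results; only the `to_enum` / `block_given?` structure mirrors the method above.

```ruby
# Hypothetical stand-in for a model class; not part of the library.
class FakeModel
  RECORDS = %w[Test0 Test1 Test2 Test3 Test4].freeze

  # Yields the records in arrays of at most :size elements.
  def self.find_in_batches(options = {})
    return to_enum(:find_in_batches, options) unless block_given?

    RECORDS.each_slice(options.fetch(:size, 2)) { |batch| yield batch }
  end

  # Same shape as the method above: flatten batches into single results.
  def self.find_each(options = {})
    return to_enum(:find_each, options) unless block_given?

    find_in_batches(options) do |batch|
      batch.each { |result| yield result }
    end
  end
end

# With a block, every record is yielded one by one.
names = []
FakeModel.find_each { |name| names << name }

# Without a block, an Enumerator is returned and can be chained.
enumerator = FakeModel.find_each(size: 3)
selected = enumerator.select { |name| name.end_with?("3", "4") }
```

Because the Enumerator is created with the original options, chained calls such as `select` still go through the batching method underneath.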

#find_in_batches(options = {}, &block) ⇒ String, Enumerator

Returns all models efficiently via Elasticsearch’s scan/scroll API.

You can restrict the models being returned with a query.

The Search API options are passed to the search method as parameters; all remaining options are passed as the `:body` parameter.

The full Repository::Response::Results instance is yielded to the passed block in each batch, so you can access any of its properties; calling `to_a` will convert the object to an Array of model instances.
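
To illustrate the split between Search API options and the request body described above, here is a minimal sketch in plain Ruby. The helper name `split_search_options` and the shortened key list are hypothetical, chosen for illustration; they are not part of the library.

```ruby
# Hypothetical, shortened list of Search API option names.
SEARCH_API_KEYS = %i[index type scroll size q routing _source_include].freeze

# Returns [search_params, body]: recognised keys become request
# parameters, everything else becomes the request body.
def split_search_options(options)
  search_params = options.select { |key, _| SEARCH_API_KEYS.include?(key) }
  body          = options.reject { |key, _| SEARCH_API_KEYS.include?(key) }
  [search_params, body]
end

params, body = split_search_options(
  size: 100,
  _source_include: "name",
  query: { match: { name: "test" } }
)
```

In this sketch `size` and `_source_include` end up in `params`, while the `query` definition ends up in `body`.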

Examples:

Return all models in batches of 20 x number of primary shards


Person.find_in_batches { |batch| puts batch.map(&:name) }

Return all models in batches of 100 x number of primary shards


Person.find_in_batches(size: 100) { |batch| puts batch.map(&:name) }

Return all models matching a specific query


Person.find_in_batches(query: { match: { name: 'test' } }) { |batch| puts batch.map(&:name) }

Return all models, fetching only the `name` attribute from Elasticsearch


Person.find_in_batches(_source_include: 'name') { |batch| puts batch.response.hits.hits.map(&:to_hash) }

Leave out the block to return an Enumerator instance


Person.find_in_batches(size: 100).map { |batch| batch.size }
# => [100, 100, 100, ... ]

Returns:

  • (String, Enumerator)

    The `scroll_id` for the request, or an Enumerator when no block is passed



# File 'lib/elasticsearch/persistence/model/find.rb', line 65

def find_in_batches(options = {}, &block)
  return to_enum(:find_in_batches, options) unless block_given?

  search_params = options.slice(
    :index,
    :type,
    :scroll,
    :size,
    :explain,
    :ignore_indices,
    :ignore_unavailable,
    :allow_no_indices,
    :expand_wildcards,
    :preference,
    :q,
    :routing,
    :source,
    :_source,
    :_source_include,
    :_source_exclude,
    :stats,
    :timeout
  )

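  # Everything that is not a Search API parameter is sent as the request body.
  # Computed before :scroll is removed from search_params below.
  body = options.reject { |key, _| search_params.key?(key) }

  scroll = search_params.delete(:scroll) || "5m"
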
  # Get the initial scroll_id
  #
  response = gateway.client.search({ index: gateway.index_name,
                                     type: gateway.document_type,
                                     search_type: "scan",
                                     scroll: scroll,
                                     size: 20,
                                     body: body }.merge(search_params))

  # Get the initial batch of documents
  #
  response = gateway.client.scroll({ scroll_id: response["_scroll_id"], scroll: scroll })

  # Break when receiving an empty array of hits
  #
  while response["hits"]["hits"].any?
    yield Repository::Response::Results.new(gateway, response)

    response = gateway.client.scroll({ scroll_id: response["_scroll_id"], scroll: scroll })
  end

  return response["_scroll_id"]
end
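
The scroll loop at the end of the method can be exercised in isolation. The following is a minimal sketch, assuming a hypothetical `FakeScrollClient` that returns canned pages and a hypothetical helper `each_scroll_page`; the fake uses keyword arguments and a constant scroll id for simplicity, which a real client is not guaranteed to do.

```ruby
# Hypothetical in-memory client that mimics the shape of scroll responses.
class FakeScrollClient
  def initialize(pages)
    @pages = pages.dup
  end

  # Each call returns the next page; an empty "hits" array marks the end.
  def scroll(scroll_id:, scroll:)
    { "_scroll_id" => scroll_id, "hits" => { "hits" => @pages.shift || [] } }
  end
end

# Yields every non-empty page and returns the last scroll_id.
def each_scroll_page(client, scroll_id, scroll: "5m")
  response = client.scroll(scroll_id: scroll_id, scroll: scroll)

  while response["hits"]["hits"].any?
    yield response["hits"]["hits"]
    response = client.scroll(scroll_id: response["_scroll_id"], scroll: scroll)
  end

  response["_scroll_id"]
end

client = FakeScrollClient.new([%w[a b], %w[c d], %w[e]])
batch_sizes = []
last_id = each_scroll_page(client, "c2Nhbj") { |hits| batch_sizes << hits.size }
```

The loop stops on the first empty page, so the block never receives an empty batch, and the method's return value is the scroll id from that final, empty response.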