Class: WaterDrop::Producer

Inherits: Object

Extended by: Forwardable

Includes: Async, Buffer, Sync

Defined in: lib/waterdrop/producer.rb,
lib/waterdrop/producer/sync.rb,
lib/waterdrop/producer/async.rb,
lib/waterdrop/producer/buffer.rb,
lib/waterdrop/producer/status.rb,
lib/waterdrop/producer/builder.rb,
lib/waterdrop/producer/dummy_client.rb

Overview

Main WaterDrop message producer.

Defined Under Namespace

Modules: Async, Buffer, Sync

Classes: Builder, DummyClient, Status

Instance Attribute Summary

#config ⇒ Object readonly

Dry-configurable config object.
#id ⇒ String readonly

UUID of the current producer.
#messages ⇒ Concurrent::Array readonly

Internal messages buffer.
#monitor ⇒ Object readonly

Monitor we want to use.
#status ⇒ Status readonly

Producer status object.

Instance Method Summary

#client ⇒ Rdkafka::Producer

Raw rdkafka producer.
#close ⇒ Object

Synchronously flushes the buffers and closes the producer.
#ensure_active! ⇒ Object

Ensures that we don’t run any operations when the producer is not configured or when it was already closed.
#initialize(&block) ⇒ Producer constructor

Creates a not-yet-configured instance of the producer.
#produce(message) ⇒ Object

Runs the client produce method with a given message.
#setup(&block) ⇒ Object

Sets up the whole configuration and initializes all that is needed.
#validate_message!(message) ⇒ Object

Ensures that the message we want to send out to Kafka is actually valid and that it can be sent there.
#wait(handler) ⇒ Object

Waits on a given handler.

Constructor Details

#initialize(&block) ⇒ `Producer`

Creates a producer instance. When a block is given, it is passed to #setup; otherwise the instance stays unconfigured until #setup is called.

Parameters:

block (Proc) —

configuration block

# File 'lib/waterdrop/producer.rb', line 35

def initialize(&block)
  @buffer_mutex = Mutex.new
  @connecting_mutex = Mutex.new
  @closing_mutex = Mutex.new

  @status = Status.new
  @messages = Concurrent::Array.new

  return unless block

  setup(&block)
end

Instance Attribute Details

#config ⇒ `Object` (readonly)

Returns dry-configurable config object.

Returns:

(Object) —

dry-configurable config object



# File 'lib/waterdrop/producer.rb', line 30

def config
  @config
end

#id ⇒ `String` (readonly)

Returns UUID of the current producer.

Returns:

(String) —

UUID of the current producer



# File 'lib/waterdrop/producer.rb', line 22

def id
  @id
end

#messages ⇒ `Concurrent::Array` (readonly)

Returns internal messages buffer.

Returns:

(Concurrent::Array) —

internal messages buffer



# File 'lib/waterdrop/producer.rb', line 26

def messages
  @messages
end

#monitor ⇒ `Object` (readonly)

Returns monitor we want to use.

Returns:

(Object) —

monitor we want to use



# File 'lib/waterdrop/producer.rb', line 28

def monitor
  @monitor
end

#status ⇒ `Status` (readonly)

Returns producer status object.

Returns:

(Status) —

producer status object



# File 'lib/waterdrop/producer.rb', line 24

def status
  @status
end

Instance Method Details

#client ⇒ `Rdkafka::Producer`

Note:

The client is lazily initialized, which also accounts for a fork that may happen at any time.

Note:

Forking a producer that is already in use is not recommended. When bootstrapping a cluster, fork producers that are configured but not yet used.

Returns raw rdkafka producer.

Returns:

(Rdkafka::Producer) —

raw rdkafka producer

Raises:

(Errors::ProducerNotConfiguredError)

(Errors::ProducerUsedInParentProcess)

# File 'lib/waterdrop/producer.rb', line 69

def client
  return @client if @client && @pid == Process.pid

  # Don't allow to obtain a client reference for a producer that was not configured
  raise Errors::ProducerNotConfiguredError, id if @status.initial?

  @connecting_mutex.synchronize do
    return @client if @client && @pid == Process.pid

    # We should raise an error when trying to use a producer from a fork, that is already
    # connected to Kafka. We allow forking producers only before they are used
    raise Errors::ProducerUsedInParentProcess, Process.pid if @status.connected?

    # We undefine all the finalizers, in case it was a fork, so the finalizers from the parent
    # process don't leak
    ObjectSpace.undefine_finalizer(id)
    # Finalizer tracking is needed for handling shutdowns gracefully.
    # I don't expect everyone to remember about closing all the producers all the time, thus
    # this approach is better. Although it is still worth keeping in mind, that this will
    # block GC from removing a no longer used producer unless closed properly but at least
    # won't crash the VM upon closing the process
    ObjectSpace.define_finalizer(id, proc { close })

    @pid = Process.pid
    @client = Builder.new.call(self, @config)

    # Register statistics runner for this particular type of callbacks
    ::Karafka::Core::Instrumentation.statistics_callbacks.add(
      @id,
      Instrumentation::Callbacks::Statistics.new(@id, @client.name, @config.monitor)
    )

    # Register error tracking callback
    ::Karafka::Core::Instrumentation.error_callbacks.add(
      @id,
      Instrumentation::Callbacks::Error.new(@id, @client.name, @config.monitor)
    )

    @status.connected!
  end

  @client
end
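
The lazy, fork-aware initialization used by #client can be illustrated with a minimal stdlib-only sketch. `LazyResource` and its `builds` counter are hypothetical names, not part of WaterDrop. Unlike #client, which raises Errors::ProducerUsedInParentProcess when an already connected producer is reused in a fork, this sketch simply rebuilds the resource when the process id changes.

```ruby
# Hypothetical stand-in, not WaterDrop code: models lazy, fork-aware initialization.
class LazyResource
  attr_reader :builds

  def initialize(&factory)
    @factory = factory
    @mutex = Mutex.new
    @builds = 0
  end

  # Returns the cached resource, building it on first use or after a fork.
  def resource
    # Fast path: already built in this process, no locking needed
    return @resource if @resource && @pid == Process.pid

    @mutex.synchronize do
      # Re-check inside the lock: another thread may have built it in the meantime
      return @resource if @resource && @pid == Process.pid

      @pid = Process.pid
      @builds += 1
      @resource = @factory.call
    end

    @resource
  end
end

lazy = LazyResource.new { Object.new }
threads = Array.new(8) { Thread.new { lazy.resource } }
results = threads.map(&:value)
```

All eight threads should receive the same object, and the factory block should run only once, because the second check inside the mutex stops late arrivals from building again.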

#close ⇒ `Object`

Synchronously flushes the buffers and closes the producer

# File 'lib/waterdrop/producer.rb', line 114

def close
  @closing_mutex.synchronize do
    return unless @status.active?

    @monitor.instrument(
      'producer.closed',
      producer_id: id
    ) do
      @status.closing!

      # No need for auto-gc if everything got closed by us
      # This should be used only in case a producer was not closed properly and forgotten
      ObjectSpace.undefine_finalizer(id)

      # We save this thread id because we need to bypass the activity verification on the
      # producer for final flush of buffers.
      @closing_thread_id = Thread.current.object_id

      # Flush has its own buffer mutex but even if it is blocked, flushing can still happen
      # as we close the client after the flushing (even if blocked by the mutex)
      flush(true)

      # We should not close the client in several threads the same time
      # It is safe to run it several times but not exactly the same moment
      # We also mark it as closed only if it was connected, if not, it would trigger a new
      # connection that anyhow would be immediately closed
      client.close(@config.max_wait_timeout) if @client

      # Remove callbacks runners that were registered
      ::Karafka::Core::Instrumentation.statistics_callbacks.delete(@id)
      ::Karafka::Core::Instrumentation.error_callbacks.delete(@id)

      @status.closed!
    end
  end
end
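
The guard at the top of #close (a mutex plus a status check) is what makes repeated or concurrent calls safe. A minimal sketch of that idea, using the hypothetical `ToyCloser` class and a plain boolean in place of the real Status object:

```ruby
# Hypothetical stand-in, not WaterDrop code: cleanup runs at most once.
class ToyCloser
  attr_reader :cleanups

  def initialize
    @mutex = Mutex.new
    @open = true
    @cleanups = 0
  end

  def close
    @mutex.synchronize do
      # Later callers find the resource already closed and return early
      return unless @open

      @cleanups += 1
      @open = false
    end
  end
end

closer = ToyCloser.new
Array.new(8) { Thread.new { closer.close } }.each(&:join)
```

However many threads call `close`, the cleanup body is expected to run a single time.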

#ensure_active! ⇒ `Object`

Ensures that we don’t run any operations when the producer is not configured or when it was already closed

Raises:

(Errors::ProducerNotConfiguredError)

(Errors::ProducerClosedError)

(Errors::StatusInvalidError)

# File 'lib/waterdrop/producer.rb', line 153

def ensure_active!
  return if @status.active?

  raise Errors::ProducerNotConfiguredError, id if @status.initial?
  raise Errors::ProducerClosedError, id if @status.closing? || @status.closed?

  # This should never happen
  raise Errors::StatusInvalidError, [id, @status.to_s]
end
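
To make the status checks above concrete, here is a small model of the guard. `ToyStatus`, `NotConfiguredError`, and `ClosedError` are hypothetical names. The real Status class is not shown on this page, so the definition of `active?` as "configured or connected" is an assumption.

```ruby
# Hypothetical model of the status checks; the real Status class is not shown here.
class NotConfiguredError < StandardError; end
class ClosedError < StandardError; end

class ToyStatus
  STATES = %i[initial configured connected closing closed].freeze

  def initialize
    @current = :initial
  end

  # Defines a predicate (`configured?`) and a setter (`configured!`) per state
  STATES.each do |state|
    define_method(:"#{state}?") { @current == state }
    define_method(:"#{state}!") { @current = state }
  end

  # Assumption: "active" means configured or connected
  def active?
    configured? || connected?
  end
end

# Mirrors the order of checks in #ensure_active!
def ensure_active!(status)
  return if status.active?

  raise NotConfiguredError if status.initial?
  raise ClosedError if status.closing? || status.closed?
end

status = ToyStatus.new
status.configured!
ensure_active!(status) # passes silently while the status is active
```

Calling `ensure_active!` on a fresh `ToyStatus` raises `NotConfiguredError`, and calling it after `closing!` or `closed!` raises `ClosedError`.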

#produce(message) ⇒ `Object`

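Runs the client produce method with a given message. When wait_on_queue_full is enabled and the queue is full, the error is instrumented, the call sleeps for wait_on_queue_full_timeout, and the dispatch is retried.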

Parameters:

message (Hash) —

message we want to send

# File 'lib/waterdrop/producer.rb', line 174

def produce(message)
  client.produce(**message)
rescue SUPPORTED_FLOW_ERRORS.first => e
  # Unless we want to wait and retry and it's a full queue, we raise normally
  raise unless @config.wait_on_queue_full
  raise unless e.code == :queue_full

  # We use this syntax here because we want to preserve the original `#cause` when we
  # instrument the error and there is no way to manually assign `#cause` value. We want to keep
  # the original cause to maintain the same API across all the errors dispatched to the
  # notifications pipeline.
  begin
    raise Errors::ProduceError
  rescue Errors::ProduceError => e
    # We want to instrument on this event even when we restart it.
    # The reason is simple: instrumentation and visibility.
    # We can recover from this, but despite that we should be able to instrument this.
    # If this type of event happens too often, it may indicate that the buffer settings are not
    # well configured.
    @monitor.instrument(
      'error.occurred',
      producer_id: id,
      message: message,
      error: e,
      type: 'message.produce'
    )

    # We do not poll the producer because polling happens in a background thread
    # It also should not be a frequent case (queue full), hence it's ok to just throttle.
    sleep @config.wait_on_queue_full_timeout
  end

  retry
end
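
The rescue, sleep, and retry flow in #produce can be sketched without Kafka. `QueueFullError`, `FlakyClient`, and `produce_with_backoff` are hypothetical names, not WaterDrop or rdkafka APIs. The sketch leaves out the instrumentation step and, like the original, retries with no upper bound.

```ruby
# Hypothetical names throughout; this is not the WaterDrop or rdkafka API.
class QueueFullError < StandardError; end

# Fake client that rejects the first `failures` calls, then accepts messages.
class FlakyClient
  attr_reader :delivered

  def initialize(failures)
    @failures = failures
    @delivered = []
  end

  def produce(message)
    if @failures.positive?
      @failures -= 1
      raise QueueFullError, 'queue full'
    end

    @delivered << message
  end
end

# Retries a produce call for as long as the queue is full, pausing between attempts.
# Returns the number of attempts that were needed.
def produce_with_backoff(client, message, wait_on_queue_full:, pause:)
  attempts = 0

  begin
    attempts += 1
    client.produce(message)
  rescue QueueFullError
    # Without the opt-in flag the error propagates unchanged
    raise unless wait_on_queue_full

    sleep pause
    retry
  end

  attempts
end

client = FlakyClient.new(2)
attempts = produce_with_backoff(
  client,
  { topic: 'events', payload: 'a' },
  wait_on_queue_full: true,
  pause: 0.01
)
```

With two simulated failures the third attempt goes through. With `wait_on_queue_full: false` the first `QueueFullError` is re-raised to the caller.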

#setup(&block) ⇒ `Object`

Sets up the whole configuration and initializes all that is needed

Parameters:

block (Block) —

configuration block

Raises:

(Errors::ProducerAlreadyConfiguredError)

# File 'lib/waterdrop/producer.rb', line 50

def setup(&block)
  raise Errors::ProducerAlreadyConfiguredError, id unless @status.initial?

  @config = Config
            .new
            .setup(&block)
            .config

  @id = @config.id
  @monitor = @config.monitor
  @contract = Contracts::Message.new(max_payload_size: @config.max_payload_size)
  @status.configured!
end

#validate_message!(message) ⇒ `Object`

Ensures that the message we want to send out to Kafka is actually valid and that it can be sent there

Parameters:

message (Hash) —

message we want to send

Raises:

(Errors::MessageInvalidError)



# File 'lib/waterdrop/producer.rb', line 167

def validate_message!(message)
  @contract.validate!(message, Errors::MessageInvalidError)
end

#wait(handler) ⇒ `Object`

Waits on a given delivery handle, using the configured max_wait_timeout and wait_timeout

Parameters:

handler (Rdkafka::Producer::DeliveryHandle)

# File 'lib/waterdrop/producer.rb', line 212

def wait(handler)
  handler.wait(
    max_wait_timeout: @config.max_wait_timeout,
    wait_timeout: @config.wait_timeout
  )
end
