Dipa

This gem provides an API for parallel processing similar to the parallel gem, but distributed and scalable across multiple machines. It needs minimal configuration, has few dependencies on specific technologies, and builds on the Rails ecosystem.

Dipa provides a Rails engine that depends on ActiveJob and ActiveStorage. You can use whatever backend you like for either of these components and configure them for your specific use case.

The purpose of this gem is to distribute heavy, long-running processing of large datasets over multiple processes or machines using ActiveJob.

Installation

Before you install Dipa, make sure ActiveJob and ActiveStorage are installed and configured properly.

Add this line to your application's Gemfile:

gem 'dipa'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install dipa

Then install and run the Dipa migrations:

bundle exec rake dipa:install:migrations
bundle exec rake db:migrate

Configuration

Dipa can be configured in the application config. These options set the defaults for this installation.

config.dipa.agent_queue = :default_queue_for_dipa_agent_jobs
config.dipa.coordinator_queue = :default_queue_for_coordinator_queue_jobs
config.dipa.agent_timeout = 900
config.dipa.agent_processing_timeout = 600
config.dipa.coordinator_timeout = 0
config.dipa.coordinator_processing_timeout = 18000

If an option is not set, the following defaults apply:

  • config.dipa.agent_queue defaults to config.active_job.default_queue_name.
  • config.dipa.coordinator_queue defaults to config.active_job.default_queue_name.
  • config.dipa.agent_timeout defaults to 0 (no timeout).
  • config.dipa.agent_processing_timeout defaults to 0 (no timeout).
  • config.dipa.coordinator_timeout defaults to 0 (no timeout).
  • config.dipa.coordinator_processing_timeout defaults to 0 (no timeout).

Usage

Minimum example:

Dipa.map(1..100).with('Integer', :sqrt)

More realistic examples:

Dipa.map(large_dataset, options: options).with('ProcessorClassName', :processor_class_method)
Dipa.each(large_dataset, options: options).with('ProcessorClassName', :processor_class_method)

Dipa.map returns an Array of the processed items. The result is in the same order as the input (large_dataset).

Dipa.each returns large_dataset.to_a.

large_dataset must be an Enumerable.
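
To make the return values concrete, here is a minimal plain-Ruby sketch that mimics only the documented contract of Dipa.map and Dipa.each. The helpers sequential_map and sequential_each are hypothetical stand-ins written for this illustration; they run sequentially in one process and do not use Dipa, ActiveJob, or ActiveStorage.

```ruby
# Hypothetical stand-ins that reproduce only the documented return values.
# They are NOT part of Dipa and do no distribution at all.

# Like Dipa.map: returns the processed items in input order.
def sequential_map(dataset, processor_class, processor_method)
  dataset.map { |item| processor_class.public_send(processor_method, item) }
end

# Like Dipa.each: processes every item, then returns dataset.to_a.
def sequential_each(dataset, processor_class, processor_method)
  dataset.each { |item| processor_class.public_send(processor_method, item) }
  dataset.to_a
end

mapped = sequential_map(1..5, Integer, :sqrt)
eached = sequential_each(1..5, Integer, :sqrt)

p mapped # => [1, 1, 1, 2, 2]
p eached # => [1, 2, 3, 4, 5]
```

Integer.sqrt returns the integer square root, which is why the mapped values are rounded down.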

options is a hash. The following keys are allowed:

  • agent_queue: [Symbol] Defaults to config.dipa.agent_queue.
  • coordinator_queue: [Symbol] Defaults to config.dipa.coordinator_queue.
  • agent_timeout: [Integer] Defaults to config.dipa.agent_timeout.
  • agent_processing_timeout: [Integer] Defaults to config.dipa.agent_processing_timeout.
  • coordinator_timeout: [Integer] Defaults to config.dipa.coordinator_timeout.
  • coordinator_processing_timeout: [Integer] Defaults to config.dipa.coordinator_processing_timeout.
  • keep_data: [true|false] Defaults to false. By default, all Dipa::* records and the associated ActiveStorage data are removed after processing. Set this to true to keep them, which is useful for debugging.
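
As an illustration of the keys listed above, an options hash might look like the sketch below. The queue names are hypothetical and the timeout values simply reuse the numbers from the configuration example; any key that is left out falls back to its config.dipa default.

```ruby
# Illustrative values only; the queue names are made up for this example.
options = {
  agent_queue: :dipa_agents,              # hypothetical queue name
  coordinator_queue: :dipa_coordinators,  # hypothetical queue name
  agent_timeout: 900,
  agent_processing_timeout: 600,
  keep_data: true                         # keep Dipa::* records for debugging
}

# Inside a Rails app with Dipa installed, the hash would be passed like this:
# Dipa.map(large_dataset, options: options).with('ProcessorClassName', :processor_class_method)
```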

ProcessorClassName must be a Class or a String. It defines the class that provides the processor method.

:processor_class_method must be a Symbol or a String. It defines the method used to process each element of large_dataset. It MUST be a class method and MUST accept exactly one element as its argument.
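
A processor that meets these requirements could look like the following sketch. WordCounter and its count method are hypothetical names chosen for this example; the only properties that matter are that count is a class method and takes exactly one element.

```ruby
# Hypothetical processor class; the names are illustrative, not part of Dipa.
class WordCounter
  # Class method that receives exactly one element of the dataset.
  def self.count(text)
    text.split.size
  end
end

# Local check without Dipa: process a single element directly.
single_result = WordCounter.count('distributed parallel processing')
p single_result # => 3

# Inside a Rails app with Dipa installed, it would be wired in as:
# Dipa.map(texts).with('WordCounter', :count)
```

Because the class may be given as a String, the class presumably has to be loadable on every worker that runs the jobs; this is an assumption, not something the text above states.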

TODO

TODO.md

Development

With nix

  • Install nix. See https://nixos.org/download.html for detailed instructions for your OS.

    • Enable flakes:

      mkdir -p ~/.config/nix
      echo "experimental-features = nix-command flakes" >> ~/.config/nix/nix.conf

    • Optional but recommended: Install direnv-nix as described here
    • Clone this repository
    • cd into the repository's directory
    • Enter shell:
      • Without direnv: Execute nix develop. You need to enter and leave the shell explicitly every time.
      • With direnv: Create an .envrc file and allow it:

        echo -e "use flake . --impure" > .envrc
        direnv allow

      • With direnv, the shell will start every time you cd into the project's directory and stop as soon as you leave it.

    The shell sets up the environment for working with this repository and installs all required tools for this project.

    Changes to the flake.nix file will trigger a rebuild of your devenv environment as soon as you hit the shell (return key or (re-)entering the shell). Only the parts that need rebuilding are rebuilt. You can also force a rebuild by executing direnv reload.

    Starting the shell for the first time might take a few minutes.

  • Run bundle install.

  • Start services in another terminal window with devenv up (as of 15.08.2023 it's mysql). The first run will also set up the database.

  • Run bundle exec rake db:migrate.

Without nix

After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rspec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

Contributing

Bug reports and pull requests are welcome on Codeberg at https://codeberg.org/empunkt/dipa. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.

License

The gem is available as open source under the terms of the MIT License.

Code of Conduct

Everyone interacting in the Dipa project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.