Dipa
This gem provides an API for parallel processing like the parallel gem, but distributed and scalable across different machines. It needs minimal configuration, has minimal dependencies on specific technologies, and builds on the Rails ecosystem.
Dipa provides a Rails engine which depends on ActiveJob and ActiveStorage. You can use whatever backend you like for either of these components and configure them for your specific use case.
The purpose of this gem is to distribute load-heavy, long-running processing of large datasets over multiple processes or machines using ActiveJob.
Installation
Before you install Dipa, make sure ActiveJob and ActiveStorage are installed and configured properly.
Add this line to your application's Gemfile:
gem 'dipa'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install dipa
Install the Dipa migrations:
bundle exec rake dipa:install:migrations
bundle exec rake db:migrate
Configuration
Dipa can be configured in the application config. These configuration options set the defaults for this installation.
config.dipa.agent_queue = :default_queue_for_dipa_agent_jobs
config.dipa.coordinator_queue = :default_queue_for_coordinator_queue_jobs
config.dipa.agent_timeout = 900
config.dipa.agent_processing_timeout = 600
config.dipa.coordinator_timeout = 0
config.dipa.coordinator_processing_timeout = 18000
- config.dipa.agent_queue defaults to config.active_job.default_queue_name.
- config.dipa.coordinator_queue defaults to config.active_job.default_queue_name.
- config.dipa.agent_timeout defaults to 0 (no timeout).
- config.dipa.agent_processing_timeout defaults to 0 (no timeout).
- config.dipa.coordinator_timeout defaults to 0 (no timeout).
- config.dipa.coordinator_processing_timeout defaults to 0 (no timeout).
Usage
Minimum example:
Dipa.map(1..100).with('Integer', :sqrt)
More realistic examples:
Dipa.map(large_dataset, options: ).with('ProcessorClassName', :processor_class_method)
Dipa.each(large_dataset, options: ).with('ProcessorClassName', :processor_class_method)
Dipa.map returns an Array of the processed items. The result is in the same order as the input (large_dataset).
Dipa.each returns large_dataset.to_a.
large_dataset must be an Enumerable.
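The return values described above can be illustrated without Dipa. The following is a local, sequential stand-in in plain Ruby, not Dipa's implementation: local_map and local_each are hypothetical helper names, nothing is enqueued or distributed, and the class lookup via Object.const_get is an assumption made for the sketch.

```ruby
# Local, sequential stand-ins that mimic the documented return values.
# Hypothetical helpers for illustration only; they are not part of Dipa.

# Mimics Dipa.map(...).with(...): returns the processed items in input order.
def local_map(dataset, klass, method_name)
  processor = Object.const_get(klass.to_s) # accepts a Class or a String
  dataset.map { |item| processor.public_send(method_name, item) }
end

# Mimics Dipa.each(...).with(...): processes every item, returns dataset.to_a.
def local_each(dataset, klass, method_name)
  processor = Object.const_get(klass.to_s)
  dataset.each { |item| processor.public_send(method_name, item) }
  dataset.to_a
end

roots = local_map(1..9, 'Integer', :sqrt)
items = local_each(1..3, Integer, :sqrt)

puts roots.inspect # [1, 1, 1, 2, 2, 2, 2, 2, 3]
puts items.inspect # [1, 2, 3]
```

Integer.sqrt returns the integer square root, which is why the mapped values repeat. The point of the sketch is only the shape of the results: map yields one output per input in the same order, each hands back the input as an Array.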
options is a hash. The following keys are allowed:
- agent_queue: [Symbol] Defaults to config.dipa.agent_queue.
- coordinator_queue: [Symbol] Defaults to config.dipa.coordinator_queue.
- agent_timeout: [Integer] Defaults to config.dipa.agent_timeout.
- agent_processing_timeout: [Integer] Defaults to config.dipa.agent_processing_timeout.
- coordinator_timeout: [Integer] Defaults to config.dipa.coordinator_timeout.
- coordinator_processing_timeout: [Integer] Defaults to config.dipa.coordinator_processing_timeout.
- keep_data: [true|false] Defaults to false. Useful for debugging. After processing, all Dipa::* records and the associated ActiveStorage data are removed. To keep them, set this to true.
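The way per-call options fall back to the configured defaults can be pictured as a hash merge. This is a minimal sketch in plain Ruby, not Dipa's code: CONFIG_DEFAULTS and effective_options are hypothetical names, the values are copied from the configuration example above, and rejecting unknown keys is an assumption about how "allowed" might be enforced.

```ruby
# Hypothetical stand-in for the values configured via config.dipa.*
# (numbers and queue names copied from the configuration example).
CONFIG_DEFAULTS = {
  agent_queue: :default_queue_for_dipa_agent_jobs,
  coordinator_queue: :default_queue_for_coordinator_queue_jobs,
  agent_timeout: 900,
  agent_processing_timeout: 600,
  coordinator_timeout: 0,
  coordinator_processing_timeout: 18_000,
  keep_data: false
}.freeze

# Merge per-call overrides over the defaults.
# Assumption: keys outside the allowed list raise an ArgumentError.
def effective_options(overrides = {})
  unknown = overrides.keys - CONFIG_DEFAULTS.keys
  raise ArgumentError, "unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  CONFIG_DEFAULTS.merge(overrides)
end

opts = effective_options({ agent_timeout: 60, keep_data: true })
puts opts[:agent_timeout] # 60 (overridden for this call)
puts opts[:agent_queue]   # default_queue_for_dipa_agent_jobs (from the defaults)
```

Only the keys passed for a given call change; everything else keeps the installation-wide default.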
ProcessorClassName must be a Class or a String. It names the class that provides the processor method.
:processor_class_method must be a Symbol or a String. It names the method used to process each single element of large_dataset. It MUST be a class method and MUST accept exactly one element as its argument.
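A processor that satisfies these requirements could look like the sketch below. WordCounter and valid_processor? are hypothetical names made up for this illustration. The check is a local sanity test you might run before handing a class to Dipa. It is not something Dipa is known to provide.

```ruby
# Hypothetical processor: a class method that takes exactly one element.
class WordCounter
  def self.count_words(text)
    text.split.size
  end

  # Counter-example: two required arguments, so it does not qualify.
  def self.count_words_matching(text, pattern)
    text.scan(pattern).size
  end
end

# Hypothetical pre-flight check: the class (Class or String) must respond
# to the method (Symbol or String) at class level with exactly one argument.
def valid_processor?(klass, method_name)
  klass = Object.const_get(klass) if klass.is_a?(String)
  return false unless klass.respond_to?(method_name)

  klass.method(method_name).arity == 1
end

puts valid_processor?('WordCounter', :count_words)        # true
puts valid_processor?(WordCounter, 'count_words')         # true
puts valid_processor?(WordCounter, :count_words_matching) # false
puts WordCounter.count_words('distributed parallel map')  # 3
```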
TODO
Development
With nix
Prerequisite: nix is installed. See https://nixos.org/download.html for detailed instructions for your OS.
- Enable flakes.
  mkdir -p ~/.config/nix
  echo "experimental-features = nix-command flakes" >> ~/.config/nix/nix.conf
- Optional but recommended: Install direnv-nix as described here
- Clone this repository
- cd into the repository's directory
- Enter shell:
  - Without direnv: Execute nix develop. You need to enter and leave the shell explicitly every time.
  - With direnv:
    echo -e "use flake . --impure" > .envrc
    direnv allow
    Shell will start every time you cd into the project's directory and stop as soon as you leave it.
The shell sets up the environment for working with this repository and installs all required tools for this project.
Changes in the flake.nix file will trigger a rebuild of your devenv environment as soon as you hit the shell (return key/(re-)enter shell). Specifically, it rebuilds only the parts that need rebuilding. You can also force a rebuild by executing direnv reload.
Starting the shell for the first time might take some minutes.
- Run bundle install.
- Start services in another terminal window with devenv up (as of 15.08.2023 it's mysql). The first run will also set up the database.
- Run bundle exec rake db:migrate.
Without nix
After checking out the repo, run bin/setup to install dependencies. Then, run bundle exec rspec to run the tests.
You can also run bin/console for an interactive prompt that will allow you to experiment.
Contributing
Bug reports and pull requests are welcome on Codeberg at https://codeberg.org/empunkt/dipa. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Dipa project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.