Bemi
A suite of tools that lets you reliably track data changes without adding extra Rails model callbacks.
Bemi stands for "beginner mindset" and is pronounced as [ˈbɛmɪ].
Overview
- Automatically stores database changes and any additional context in a structured form
- High performance: no callbacks that slow down your code execution
- 100% reliability by using a design pattern called Change Data Capture
- Easy to use: no data engineering knowledge or complex infrastructure is required
- Web UI and code tools for inspecting and auditing data changes and user activity
- Works with the most popular databases like MySQL, PostgreSQL and MongoDB (soon)
Code example
Here is an example of storing all data changes made when processing an HTTP request:
class ApplicationController < ActionController::Base
  before_action :set_bemi_context

  private

  # Attach any information you want to any subsequent data changes
  def set_bemi_context
    Bemi.set_context(
      user_id: current_user&.id,
      ip: request.remote_ip,
      user_agent: request.user_agent,
      controller: "#{self.class.name}##{action_name}",
    )
  end
end

class InvoicesController < ApplicationController
  # Automatically store *any* database changes
  def update
    invoice = Invoice.find(params[:id])
    invoice.update_column(:due_date, params[:due_date])
    invoice.client.recurring_schedule.delete
  end
end
Bemi then makes it easy to query data changes:
Bemi.activity(ip: '127.0.0.1').map(&:pretty_print)
# Bemi::Changeset
# - id: 2040
# - table: "invoices"
# - external_id: 43
# - action: "update"
# - committed_at: Sat, 03 Jun 2023 21:16:22 UTC +00:00
# - change:
#   - updated_at: ["2023-06-03 20:41:35", "2023-06-03 21:16:22"]
#   - due_date: ["2023-06-03", "2023-06-30"]
# - context:
#   - ip: "127.0.0.1"
#   - user_id: 3195
#   - controller: "InvoicesController#update"
#   - user_agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
#
# Bemi::Changeset
# - id: 2041
# - table: "recurring_schedules"
# - external_id: 5
# - action: "delete"
# - committed_at: Sat, 03 Jun 2023 21:16:22 UTC +00:00
# - change:
#   - id: 5
#   - frequency: 1
#   - occurrences: 0
#   - invoice_id: 43
#   - created_at: "2023-04-28 20:34:09"
#   - updated_at: "2023-04-28 20:34:09"
# - context:
#   - ip: "127.0.0.1"
#   - user_id: 3195
#   - controller: "InvoicesController#update"
#   - user_agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
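To make the shape of the `change` field above more concrete, here is a minimal sketch in plain Ruby of how the before/after pairs for an update could be derived from two snapshots of a row and combined with request context. `build_changeset` and its arguments are hypothetical names used only for illustration; this is not Bemi's internal implementation.

```ruby
# Hypothetical helper, not part of Bemi: builds a changeset-like hash
# from two snapshots of the same row plus request context.
def build_changeset(table:, before:, after:, context: {})
  # Keep only the columns whose values differ, as [old, new] pairs.
  change = after.each_with_object({}) do |(column, new_value), diff|
    old_value = before[column]
    diff[column] = [old_value, new_value] unless old_value == new_value
  end

  {
    table: table,
    external_id: after[:id],
    action: "update",
    change: change,
    context: context,
  }
end

before = { id: 43, due_date: "2023-06-03", updated_at: "2023-06-03 20:41:35" }
after  = { id: 43, due_date: "2023-06-30", updated_at: "2023-06-03 21:16:22" }

changeset = build_changeset(
  table: "invoices",
  before: before,
  after: after,
  context: { user_id: 3195, ip: "127.0.0.1" },
)

puts changeset[:change].inspect
```

Unchanged columns such as `id` are left out of `change`, which matches the "update" entry in the sample output.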
Architecture
Bemi is designed to be lightweight, composable, and simple to use by default.
/‾‾‾\
\___/
__/ \__
/ User \
│
│ Application code
- - - - - │ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
╵ │ ╵
╵ │ Update invoice ╵
╵ ∨ ╵
╵ ______________ ______________ ╵
╵ ┆ ┆ ┆ ┆ ╵
╵ ┆ Rails ┆ Structured changes ┆ Bemi ┆ ╵
╵ ┆ server ┆ ╷–––––––––––––––––––––– ┆ process ┆ ╵
╵ ┆ ┆ │ ┆ ┆ ╵
╵ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾ │ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾ ╵
╵ │ │ ⌃ ╵
╵ │ Database query │ Replication log │ ╵
╵ │ │ │ ╵
- - - - - │ - - - - - - - - - - - - - - - │ - - - - - - - - - - - - - - - │ - - - - -╵
│ ∨ │
│ [‾‾‾‾‾‾‾‾‾‾‾‾] │
│ [------------] │
╵–––––––––––––––––––––––> [ Database ] –––––––––––––––––––––––╵
[------------]
[____________]
Bemi reuses the same connection configuration and runs a simple process that reads the replication log that databases usually use to communicate within the same cluster:
- Binary Log for MySQL
- Write-Ahead Log for PostgreSQL
- Oplog for MongoDB
By default, it stores the structured data changes in the same database.
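The flow described above can be sketched in plain Ruby. The example below is a minimal illustration under stated assumptions: the replication log is modeled as an in-memory array of hypothetical event hashes, and `ChangeProcessor` is an invented name. It does not read a real Binary Log, Write-Ahead Log, or Oplog, and it is not Bemi's actual implementation.

```ruby
# Hypothetical stand-in for a replication log: an ordered list of
# low-level row events, each tagged with a log position.
REPLICATION_LOG = [
  { position: 1, table: "invoices", op: :update,
    old: { id: 43, due_date: "2023-06-03" },
    new: { id: 43, due_date: "2023-06-30" } },
  { position: 2, table: "recurring_schedules", op: :delete,
    old: { id: 5, frequency: 1 }, new: nil },
].freeze

# Illustrative processor: consumes events past the last seen position
# and turns them into structured change records.
class ChangeProcessor
  attr_reader :changes, :last_position

  def initialize
    @changes = []       # stands in for the table that stores changes
    @last_position = 0  # checkpoint so events are not processed twice
  end

  def process(log)
    log.each do |event|
      next if event[:position] <= @last_position

      @changes << structure(event)
      @last_position = event[:position]
    end
  end

  private

  def structure(event)
    row = event[:new] || event[:old]
    {
      table: event[:table],
      external_id: row[:id],
      action: event[:op].to_s,
      change: change_for(event),
    }
  end

  # Deletes keep the full deleted row; other operations keep [old, new]
  # pairs for the columns that changed, mirroring the sample output
  # earlier in this README.
  def change_for(event)
    return event[:old] if event[:op] == :delete

    old = event[:old] || {}
    event[:new].reject { |column, value| old[column] == value }
               .to_h { |column, value| [column, [old[column], value]] }
  end
end

processor = ChangeProcessor.new
processor.process(REPLICATION_LOG)
processor.process(REPLICATION_LOG) # no-op: the checkpoint skips seen events

processor.changes.each { |change| puts change.inspect }
```

Because the application never calls the processor directly, any write that reaches the log is captured, including writes that skip model callbacks.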
Usage
Installation
Add gem 'bemi' to your application's Gemfile and execute:
$ bundle install
Database migration
Create a new database migration to store changesets and their context in a structured form:
$ bundle exec rails g migration create_bemi_tables
Then paste the following into the created migration file:
# db/migrate/20230603190131_create_bemi_tables.rb
CreateBemiTables = Class.new(Bemi.generate_migration)
And run:
$ bundle exec rails db:migrate
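The one-line migration file relies on plain Ruby behavior: `Class.new(superclass)` creates an anonymous subclass, and assigning it to a constant gives the class that constant's name. The sketch below demonstrates this with a hypothetical `FakeMigration` base class standing in for whatever `Bemi.generate_migration` returns. The real return value is not shown in this README, so treat the stand-in as an assumption.

```ruby
# Hypothetical stand-in for the class returned by Bemi.generate_migration.
class FakeMigration
  def change
    "create tables"
  end
end

# Equivalent to: class CreateExampleTables < FakeMigration; end
CreateExampleTables = Class.new(FakeMigration)

puts CreateExampleTables.name        # prints "CreateExampleTables"
puts CreateExampleTables.superclass  # prints "FakeMigration"
puts CreateExampleTables.new.change  # prints "create tables"
```

This is why the migration needs no body of its own: all behavior is inherited from the generated parent class.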
Bemi process
Alternatives
Background jobs with persistent state
Tools like Sidekiq, Que, and GoodJob are similar in that they execute jobs in the background, persist the execution state, retry, etc. These tools, however, focus on executing a single job as a unit of work. Bemi can use these tools to perform single actions when managing chains of actions defined in workflows, without needing complex callbacks.
Bemi orchestrates workflows instead of trying to choreograph them. This makes the code easier to implement and maintain, reduces coordination overhead by having a central coordinator, improves observability, and simplifies troubleshooting.
Orchestration
Choreography
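As a rough illustration of the difference, the plain-Ruby sketch below runs the same two steps both ways. All names (`Orchestrator`, `EventBus`, `charge`, `send_receipt`) are hypothetical and unrelated to any Bemi API.

```ruby
# Orchestration: one coordinator knows the whole sequence and calls
# each step in order.
class Orchestrator
  def initialize(steps)
    @steps = steps
  end

  def run(order, log)
    @steps.each { |step| log << step.call(order) }
  end
end

# Choreography: steps react to events emitted by other steps, so no
# single place describes the full sequence.
class EventBus
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  def on(event, &handler)
    @handlers[event] << handler
  end

  def emit(event, payload)
    @handlers[event].each { |handler| handler.call(payload) }
  end
end

# Two hypothetical workflow steps.
charge = ->(order) { "charged #{order}" }
send_receipt = ->(order) { "receipt sent for #{order}" }

orchestrated = []
Orchestrator.new([charge, send_receipt]).run("order-1", orchestrated)

choreographed = []
bus = EventBus.new
bus.on(:order_placed) do |order|
  choreographed << charge.call(order)
  bus.emit(:order_charged, order)
end
bus.on(:order_charged) { |order| choreographed << send_receipt.call(order) }
bus.emit(:order_placed, "order-1")

puts orchestrated == choreographed # prints "true"
```

Both variants produce the same result, but only the orchestrated one states the full sequence in a single place, which is the maintainability and observability argument made above.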
Workflow orchestration tools and services
Tools like Temporal, AWS Step Functions, Argo Workflows, and Airflow allow orchestrating workflows, although they use quite different approaches.
Temporal grew out of challenges faced by big-tech and enterprise companies. As a result, it has a complex architecture with deployed clusters, support for databases like Cassandra and optional Elasticsearch, and multiple services for frontend, matching, history, etc. Its main differentiator is writing workflows imperatively instead of describing them declaratively (think of state machines). This makes code a lot more complex and forces you to mix business logic with implementation and execution details. Some would argue that Temporal's development and user experience are quite rough. Plus, at the time of this writing, it doesn't have an official stable SDK for our favorite programming language (Ruby).
AWS Step Functions relies on AWS Lambda to execute each action in a workflow. For various reasons, not everyone can use AWS and its serverless solution. Additionally, workflows must be defined in JSON using Amazon States Language rather than a regular programming language.
Argo Workflows relies on Kubernetes. It is closer to infrastructure-level workflows since it runs a container for each workflow action and doesn't provide code-level features and primitives. Additionally, it requires defining workflows in YAML.
Airflow is a popular tool for data engineering pipelines. Unfortunately, it works only with Python.
Ruby frameworks for writing better code
There are many libraries that also implement useful patterns and help organize code better, for example Interactor, ActiveInteraction, Mutations, Dry-Rb, and Trailblazer. They don't, however, help with asynchronous and distributed execution with the stronger reliability guarantees that many of us rely on to execute code "out-of-band" and avoid running long workflows within a request/response lifecycle, for example when sending emails, sending requests to other services, or running multiple actions in parallel.
License
The gem is available as open source under the terms of the MIT License.
Code of Conduct
Everyone interacting in the Bemi project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.