cdc-parallel

Optional high-throughput Ractor runtime for cdc-core.

cdc-parallel executes CDC::Core::Processor objects in Ractors when those processors explicitly declare themselves Ractor-safe.

Requirements

  • Ruby 4.0+
  • cdc-core
  • parallel-pool

Ruby 4.0+ is required because this gem targets the stabilized Ruby Ractor API.

Purpose

cdc-core
   │
   ▼
cdc-parallel
   │
   ▼
Parallel-aware processing

cdc-parallel is a runtime adapter. It does not define CDC events and does not parse database streams.

Installation

Add the gem to your Gemfile:

gem "cdc-parallel"

Usage

require "cdc/core"
require "cdc/parallel"

class MetricsProcessor < CDC::Core::Processor
  ractor_safe!

  def process(event)
    CDC::Core::ProcessorResult.success(
      table: event.table,
      operation: event.operation
    )
  end
end

runtime =
  CDC::Parallel::Runtime.new(
    processor: MetricsProcessor.new,
    size: 4
  )

result = runtime.process(event)

runtime.shutdown

Processor Safety

Only processors that declare ractor_safe! can run in this runtime.

class AnalyticsProcessor < CDC::Core::Processor
  ractor_safe!
end

Passing a processor that has not declared ractor_safe! raises:

CDC::Parallel::UnsafeProcessorError
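
The opt-in check can be pictured as a class-level flag plus a guard that runs before a processor is handed to a worker. The sketch below is self-contained and uses hypothetical stand-ins (BaseProcessor, validate_processor!, and a local UnsafeProcessorError); it is not the code in cdc-core or cdc-parallel, and treating the declaration as inherited by subclasses is an assumption.

```ruby
# Hypothetical stand-ins; the real classes live in cdc-core and cdc-parallel.
class UnsafeProcessorError < StandardError; end

class BaseProcessor
  # Class-level opt-in flag set by the processor author.
  def self.ractor_safe!
    @ractor_safe = true
  end

  # True when this class or an ancestor declared ractor_safe!
  # (inheritance of the flag is an assumption of this sketch).
  def self.ractor_safe?
    return true if @ractor_safe

    superclass.respond_to?(:ractor_safe?) && superclass.ractor_safe?
  end
end

class SafeProcessor < BaseProcessor
  ractor_safe!
end

class LegacyProcessor < BaseProcessor
end

# Guard a runtime could run before accepting a processor.
def validate_processor!(processor)
  return processor if processor.class.ractor_safe?

  raise UnsafeProcessorError,
        "#{processor.class.name} has not declared ractor_safe!"
end

validate_processor!(SafeProcessor.new) # returns the processor unchanged
```

Failing at construction time, rather than when the first event arrives, keeps an unsafe processor from ever reaching a worker.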

Concurrency Contract

CDC::Parallel::ProcessorPool accepts submissions from multiple Ruby threads. Dispatch state is synchronized inside the pool, while processor execution occurs inside isolated Ruby 4 Ractors.

Workers own their Ractor::Port inboxes. The pool sends work to those inboxes, and workers send results back to a caller-owned reply port.

Caller Thread A ─┐
Caller Thread B ─┼─> ProcessorPool
Caller Thread C ─┘        │
                          │ synchronized dispatch
                          ▼
                +-------------------+
                | worker selection  |
                +-------------------+
                  │       │       │
                  ▼       ▼       ▼
            inbox port inbox port inbox port
                  │       │       │
                  ▼       ▼       ▼
            Ractor 1  Ractor 2  Ractor 3
                  │       │       │
                  └───┬───┴───┬───┘
                      ▼       ▼
             caller-owned   reply port
                      │
                      ▼
             ordered ProcessorResult[]
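
The topology above can be modelled in a few lines. The sketch below deliberately swaps Ractors and Ractor::Port for plain threads and Queue objects so that it runs on any recent Ruby; it only illustrates the shape of the contract (synchronized worker selection, one inbox per worker, a caller-owned reply channel, results returned in input order). SketchPool and process_many are hypothetical names, and round-robin selection is an assumption.

```ruby
# Thread-and-Queue model of the dispatch topology. This is NOT the gem's
# implementation: real workers are Ractors with Ractor::Port inboxes.
class SketchPool
  def initialize(size:, &handler)
    @mutex = Mutex.new
    @next_worker = 0
    @inboxes = Array.new(size) { Queue.new }
    @workers = @inboxes.map do |inbox|
      Thread.new do
        # Each worker drains only its own inbox; pop returns nil once the
        # inbox is closed and empty, which ends the loop.
        while (job = inbox.pop)
          index, event, reply = job
          reply.push([index, handler.call(event)])
        end
      end
    end
  end

  # Safe to call from several caller threads at once.
  def process_many(events)
    reply = Queue.new # caller-owned reply channel
    events.each_with_index do |event, index|
      pick_inbox.push([index, event, reply])
    end
    # Results arrive in completion order; the index restores input order.
    Array.new(events.size) { reply.pop }.sort_by(&:first).map(&:last)
  end

  def shutdown
    @inboxes.each(&:close)
    @workers.each(&:join)
  end

  private

  # Round-robin worker selection; the mutex guards the shared cursor.
  def pick_inbox
    @mutex.synchronize do
      inbox = @inboxes[@next_worker]
      @next_worker = (@next_worker + 1) % @inboxes.size
      inbox
    end
  end
end

pool = SketchPool.new(size: 3) { |event| event * 2 }
results = pool.process_many([1, 2, 3, 4, 5]) # => [2, 4, 6, 8, 10]
pool.shutdown
```

Because every caller brings its own reply channel, concurrent callers never see each other's results. Error handling inside the workers is omitted to keep the sketch short.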

What Belongs Here

  • Ractor processor execution
  • Transaction envelope processing
  • Processor safety validation
  • Graceful shutdown
  • Result normalization

What Does Not Belong Here

  • PostgreSQL connection handling
  • pgoutput parsing
  • pgoutput decoding
  • Rails integration
  • Audit persistence
  • Kafka/Redis/S3 publishing

Ecosystem Position

pgoutput-parser
      │
      ▼
pgoutput-decoder
      │
      ▼
cdc-core
      │
      ▼
cdc-parallel
      │
      ▼
whodunit-chronicles

Roadmap

  • Persistent worker pools using parallel-pool
  • Mixed CompositeProcessor routing
  • Ratomic-backed queues
  • Ratomic-backed metrics
  • Backpressure policies
  • Transaction ordering strategies

Test Organization

The test suite is grouped by intent so the same structure can be reused across CDC ecosystem gems.

test/unit/          focused class and branch coverage
test/integration/   component interaction and runtime integration
test/behavior/      ecosystem contracts and guardrails
test/performance/   opt-in smoke benchmarks

Run the default quality suite:

bundle exec rake test

Run a specific group:

bundle exec rake test:unit
bundle exec rake test:integration
bundle exec rake test:behavior
bundle exec rake test:performance

The default test task runs unit, integration, and behavior tests. Performance tests are intentionally separate because they are environment-sensitive.
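
One way to wire this layout into Rake is a namespaced task per group plus a default task that depends on three of them. The Rakefile sketch below is an assumption about how such tasks could be defined, not a copy of this gem's Rakefile; in particular, the *_test.rb glob is a guess at the file naming.

```ruby
# Rakefile sketch; the glob patterns are assumptions about file naming.
require "rake"
require "rake/testtask"

TEST_GROUPS = %w[unit integration behavior performance].freeze
DEFAULT_TEST_GROUPS = %w[unit integration behavior].freeze

namespace :test do
  TEST_GROUPS.each do |group|
    # Defines test:unit, test:integration, test:behavior, test:performance.
    Rake::TestTask.new(group) do |t|
      t.libs << "test"
      t.pattern = "test/#{group}/**/*_test.rb"
    end
  end
end

# Performance tests stay out of the default run because they are
# environment-sensitive.
desc "Run unit, integration, and behavior tests"
task test: DEFAULT_TEST_GROUPS.map { |group| "test:#{group}" }
```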

License

MIT.

Benchmarking

cdc-parallel includes reproducible benchmarks that compare serial processor execution against the pre-warmed Ractor worker pool.

The benchmark focuses on three workload categories:

Workload   Purpose
tiny       Measure dispatch overhead
cpu        Measure CPU-bound processing throughput
batch      Measure batched CDC event processing throughput

See benchmark/README.md for the full benchmark methodology, configuration reference, report schema, and interpretation guidance.

Quick Start

Tiny workload:

BENCHMARK_WORKLOAD=tiny \
bundle exec rake benchmark:processor_pool

CPU-bound workload:

BENCHMARK_WORKLOAD=cpu \
BENCHMARK_CPU_ROUNDS=5000 \
bundle exec rake benchmark:processor_pool

Batch workload:

BENCHMARK_WORKLOAD=batch \
BENCHMARK_BATCH_SIZE=10000 \
bundle exec rake benchmark:processor_pool

Worker-count sweep:

BENCHMARK_WORKLOAD=cpu \
BENCHMARK_WORKER_COUNTS=1,2,4 \
bundle exec rake benchmark:processor_pool

Credibility controls:

BENCHMARK_TRIALS=7 \
BENCHMARK_MIN_DURATION=0.25 \
BENCHMARK_ITERATIONS=1000 \
bundle exec rake benchmark:processor_pool
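
To make the three settings concrete, here is a minimal sketch of how a trial loop might combine them: each trial runs the workload in batches of BENCHMARK_ITERATIONS calls until BENCHMARK_MIN_DURATION seconds have passed, and the median across BENCHMARK_TRIALS trials is reported. This interpretation, the helper names, and the fallback defaults are assumptions; benchmark/README.md remains the reference for what the rake task actually does.

```ruby
# Sketch of a trial loop; how the gem combines these settings is an assumption.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Runs the block in batches of `iterations` calls until at least
# `min_duration` seconds have elapsed, then returns calls per second.
def run_trial(iterations:, min_duration:)
  calls = 0
  started = monotonic_now
  loop do
    iterations.times { yield }
    calls += iterations
    elapsed = monotonic_now - started
    return calls / elapsed if elapsed >= min_duration && elapsed.positive?
  end
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Median throughput across several independent trials.
def benchmark(trials:, iterations:, min_duration:, &work)
  rates = Array.new(trials) do
    run_trial(iterations: iterations, min_duration: min_duration, &work)
  end
  median(rates)
end

# Fallback defaults mirror the example values above and are hypothetical.
trials       = Integer(ENV.fetch("BENCHMARK_TRIALS", "7"))
min_duration = Float(ENV.fetch("BENCHMARK_MIN_DURATION", "0.25"))
iterations   = Integer(ENV.fetch("BENCHMARK_ITERATIONS", "1000"))

rate = benchmark(trials: trials, iterations: iterations, min_duration: min_duration) { 1 + 1 }
puts format("%.0f calls/s (median of %d trials)", rate, trials)
```

Reporting a median rather than a mean keeps a single slow trial, for example one interrupted by another process, from skewing the result.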

Benchmark Docker Image

Build and run the reusable Docker image:

bundle exec rake benchmark:docker_build
bundle exec rake benchmark:docker_run

Or run the image directly after it is published to GitHub Container Registry:

docker run --rm ghcr.io/kanutocd/cdc-parallel-benchmark:main

The benchmark image is intended to become the shared performance-validation pattern for CDC ecosystem gems, so that benchmarks run reproducibly on local machines, in CI, and across different development environments.