Module: KairosMcp::Daemon::WalRecovery
- Defined in:
- lib/kairos_mcp/daemon/wal_recovery.rb
Overview
WalRecovery — boot-time reconciliation of a WAL directory.
Design (v0.2 P3.0):
On daemon startup, each `<mandate_id>.wal.jsonl` in `wal_dir` may
contain steps that were marked `executing` but never transitioned
to `completed` / `failed` because the process crashed mid-phase.
Recovery resets those steps back to `pending` so the cycle
runner's idempotency check can re-execute them safely.
What recovery does NOT do:
* It does not replay steps itself — that's the cycle runner's job.
* It does not delete WAL files. Finalized plans are archived
elsewhere (WAL#archive).
* It does not touch steps whose plan is already finalized.
Return value:
Integer — total number of steps reset to pending across all files.
Useful both for tests and for the daemon's boot log line.
Constant Summary collapse
- WAL_GLOB =
'*.wal.jsonl'
Class Method Summary collapse
-
.recover_file!(path, logger = nil) ⇒ Object
Recover a single WAL file.
-
.recover_from_wal!(wal_dir, logger = nil) ⇒ Object
Reset every ‘executing` step in `wal_dir` back to `pending`.
Class Method Details
.recover_file!(path, logger = nil) ⇒ Object
Recover a single WAL file. Isolated so one corrupt file can’t block recovery of the rest of the directory.
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 |
# File 'lib/kairos_mcp/daemon/wal_recovery.rb', line 52 def recover_file!(path, logger = nil) wal = KairosMcp::Daemon::WAL.open(path: path) count = 0 begin wal.plans_not_finalized.each do |plan| plan.steps.each do |step| next unless step.status == 'executing' wal.mark_reset_to_pending(step.step_id) count += 1 logger&.info('wal_recovery_reset_step', source: 'wal_recovery', details: { path: path, plan_id: plan.plan_id, step_id: step.step_id }) end end ensure wal.close end count rescue StandardError => e logger&.error('wal_recovery_file_failed', source: 'wal_recovery', details: { path: path, error: "#{e.class}: #{e.}" }) 0 end |
.recover_from_wal!(wal_dir, logger = nil) ⇒ Object
Reset every ‘executing` step in `wal_dir` back to `pending`. Returns the total reset count. A missing or empty directory returns 0 without raising — recovery on a clean boot is a no-op.
35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
# File 'lib/kairos_mcp/daemon/wal_recovery.rb', line 35 def recover_from_wal!(wal_dir, logger = nil) return 0 if wal_dir.nil? || wal_dir.to_s.empty? return 0 unless Dir.exist?(wal_dir) total = 0 Dir.glob(File.join(wal_dir, WAL_GLOB)).sort.each do |path| total += recover_file!(path, logger) end logger&.info('wal_recovery_complete', source: 'wal_recovery', details: { wal_dir: wal_dir, reset_count: total }) total end |