Class: Iriq::CLI

Inherits:
Object
Defined in:
lib/iriq/cli.rb

Overview

Flag-driven CLI. The default action for a single input is a combined parse + normalize + explain summary; the -p/-n/-e flags select individual sections. The only subcommand listed in USAGE is `cluster`, which is structurally different (many inputs, not one); #run also short-circuits on `completion <shell>`, which emits the completion script bundled with the gem. Construct with explicit IO objects so specs can run it without shelling out.
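The explicit-IO construction described above can be exercised in-process with StringIO. The sketch below is a minimal illustration of that pattern only: `DemoCLI` is a hypothetical stand-in that copies the documented constructor shape (keyword `stdin:`, `stdout:`, `stderr:`) and a `#run` that returns an exit code; it is not Iriq's implementation.

```ruby
require "stringio"

# Hypothetical stand-in mirroring the documented constructor of Iriq::CLI.
# Only the IO-injection pattern is the point here, not the behavior.
class DemoCLI
  attr_reader :stdin, :stdout, :stderr

  def initialize(stdin: $stdin, stdout: $stdout, stderr: $stderr)
    @stdin  = stdin
    @stdout = stdout
    @stderr = stderr
  end

  # Echo each argument to stdout; report a missing argument on stderr.
  # Returns an integer exit code, as #run is documented to do.
  def run(argv)
    if argv.empty?
      stderr.puts "demo: no input"
      return 1
    end
    argv.each { |arg| stdout.puts "input: #{arg}" }
    0
  end
end

# A spec can capture output and the exit code without spawning a process.
out  = StringIO.new
err  = StringIO.new
cli  = DemoCLI.new(stdin: StringIO.new(""), stdout: out, stderr: err)
code = cli.run(["foo.com/users/456"])
```

Going by the signatures on this page, the same approach should carry over to `Iriq::CLI.new(stdin: ..., stdout: ..., stderr: ...).run(argv)`, with assertions made against the captured strings.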

Constant Summary

SECTION_FLAGS =
%i[parse normalize].freeze
TOP_N_STATS =
10
LARGE_BATCH_THRESHOLD =

When extraction yields this many or more IRIs, the default pipe output switches from a URL list to clusters, since a longer list is easier to read as route-shape groups.

10
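As a rough illustration of the threshold rule described for LARGE_BATCH_THRESHOLD, the sketch below picks a view from an IRI count. The helper name `default_pipe_view` and the symbols it returns are assumptions made for this example; only the constant's value and the "this many or more" comparison come from the documentation above.

```ruby
LARGE_BATCH_THRESHOLD = 10

# Hypothetical helper: choose the default pipe view from the number of
# extracted IRIs. "This many or more" means the comparison is inclusive.
def default_pipe_view(iri_count, threshold: LARGE_BATCH_THRESHOLD)
  iri_count >= threshold ? :clusters : :url_list
end

small = default_pipe_view(9)   # below the threshold
large = default_pipe_view(10)  # exactly at the threshold
```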
USAGE =
<<~TXT
  iriq — find a URL's shape: the route template behind it (e.g. /users/{id}).

  Usage: iriq [options] <input>
         iriq [options] < text
         iriq cluster [options] [file]

  <input> may be an IRI, a file path (extracted automatically), or piped
  text via stdin.

  Sections (combine freely):
    -n, --normalize       Shape — variable parts become placeholders
    -c, --canonical       Clean form — tidy scheme/host, keep the values
    -p, --parse           Parsed fields
    -e, --explain         Annotated trace — per-segment notes about why
                          each placeholder / canonical value was chosen

  Corpus + stats:
        --corpus PATH     Load/create a JSON corpus; observe and save atomically.
                          -n becomes corpus-informed once it has data.
        --host MODE       Host-keying strategy for clustering:
                          full (default), registrable (or reg) strips
                          subdomains, none ignores host entirely.
        --stats           Print rolling aggregates
        --reinfer         Replay the source-IRI log through the current
                          classifier + reducers; rebuilds materialized
                          views from scratch. Requires --corpus.
        --propose-recognizers
                          Scan observed values for shape patterns that
                          recur enough to suggest a new Recognizer.
                          Combine with --json for structured output.
                          Requires --corpus.
        --cross-host-shapes
                          List route shapes that recur across
                          multiple hosts. Combine with --min-hosts.
                          Requires --corpus.
        --activate-above F  With --propose-recognizers, promote every
                          proposal at or above CONFIDENCE F into a
                          live Recognizer on the corpus, then
                          reinfer. Confidence integrates coverage
                          and cross-host corroboration.

  Thresholds (apply to --propose-recognizers / --cross-host-shapes):
        --min-observations N  proposal noise floor (default 20)
        --min-coverage F      proposal coverage floor (default 0.7)
        --min-hosts N         proposal: minimum hosts (default 1);
                              cross-host-shapes: minimum hosts to
                              list (default 2)

  Other:
    -h, --help            Show this message
    -j, --json            Emit JSON instead of human-readable output
    -J, --ndjson          Newline-delimited JSON (one object per line). Implies --json.
    -N, --no-hints        Use {integer} placeholders instead of {user_id}
        --no-scheme-less  Skip foo.com/path extraction (explicit-scheme only)
    -V, --version         Print version

  Subcommands:
    cluster [file]        Force cluster view (default for ≥10 IRIs anyway)

  Examples:
    iriq foo.com/users/456
    iriq -n https://foo.com/users/123
    iriq ./access.log                     # auto-detect file → extract URLs
    cat README.md | iriq -n               # one normalized URL per line
    cat README.md | iriq --corpus c.json
TXT
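The usage text says the corpus is saved atomically but this page does not show how. One common way to get that property, offered here purely as an assumption about the general technique and not as Iriq's code, is to write to a temporary file in the same directory and rename it over the target. The helper name `atomic_write_json` and the `.tmp` naming scheme are hypothetical.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: write data as JSON so readers never observe a
# half-written file. Not taken from Iriq's source.
def atomic_write_json(path, data)
  # Keep the temp file next to the target so the rename stays on one
  # filesystem, where it replaces the target atomically on POSIX systems.
  tmp_path = "#{path}.#{Process.pid}.tmp"
  File.open(tmp_path, "w") do |f|
    f.write(JSON.generate(data))
    f.flush
    f.fsync # push the bytes to disk before the rename makes them visible
  end
  File.rename(tmp_path, path)
ensure
  # Clean up the temp file if anything failed before the rename.
  File.delete(tmp_path) if tmp_path && File.exist?(tmp_path)
end

round_trip = nil
leftovers  = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "c.json")
  atomic_write_json(path, { "observations" => 1 })
  round_trip = JSON.parse(File.read(path))
  leftovers  = Dir.children(dir) - ["c.json"]
end
```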

Constructor Details

#initialize(stdin: $stdin, stdout: $stdout, stderr: $stderr) ⇒ CLI

Returns a new instance of CLI.



# File 'lib/iriq/cli.rb', line 90

def initialize(stdin: $stdin, stdout: $stdout, stderr: $stderr)
  @stdin  = stdin
  @stdout = stdout
  @stderr = stderr
end

Instance Attribute Details

#stderr ⇒ Object (readonly)

Returns the value of attribute stderr.



# File 'lib/iriq/cli.rb', line 88

def stderr
  @stderr
end

#stdin ⇒ Object (readonly)

Returns the value of attribute stdin.



# File 'lib/iriq/cli.rb', line 88

def stdin
  @stdin
end

#stdout ⇒ Object (readonly)

Returns the value of attribute stdout.



# File 'lib/iriq/cli.rb', line 88

def stdout
  @stdout
end

Instance Method Details

#parseable_iri?(input) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/iriq/cli.rb', line 152

def parseable_iri?(input)
  Iriq.parse(input)
  true
rescue Iriq::ParseError
  false
end
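The predicate above turns "parsing raised" into a boolean by rescuing the parser's error class. The same shape can be tried without the gem by swapping in Ruby's standard-library URI parser, as sketched below. This is a stand-in: `parseable_uri?` is a hypothetical name, and `URI.parse` accepts and rejects different inputs than `Iriq.parse` would.

```ruby
require "uri"

# Stand-in for the pattern above, using stdlib URI instead of Iriq.parse.
# Only the structure (parse, return true, rescue the parse error) matches.
def parseable_uri?(input)
  URI.parse(input)
  true
rescue URI::InvalidURIError
  false
end

good = parseable_uri?("https://foo.com/users/123")
bad  = parseable_uri?("http://bad host/") # a space is not valid in a URI
```

Rescuing only the parser's own error class keeps unrelated failures (for example a `nil` argument) visible instead of silently reporting `false`.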

#run(argv) ⇒ Object

Returns an integer exit code.



# File 'lib/iriq/cli.rb', line 97

def run(argv)
  # Pre-scan so an error during option parsing can still honor --json.
  # Re-set authoritatively from opts once parsing succeeds.
  @json = json_requested?(argv)
  args, opts = parse_options(argv)
  @json = opts[:json]

  return print_usage(stdout, 0) if opts[:help]
  return print_version          if opts[:version]

  # `iriq completion <shell>` short-circuits — no corpus, no IRI input,
  # just emit the script bundled with the gem.
  if args.first == "completion"
    args.shift
    return cmd_completion(args)
  end

  explicit_cluster = (args.first == "cluster")
  args.shift if explicit_cluster

  # Auto-detect: a positional argument that isn't parseable as an IRI
  # but IS an existing file gets treated as a file to extract from. This
  # is what makes `iriq ./access.log` and `iriq /var/log/foo.log` Just
  # Work without a separate --extract flag.
  positional_is_file = args.first && File.file?(args.first) && !parseable_iri?(args.first)

  batch_mode = explicit_cluster || positional_is_file ||
               (args.empty? && piped_stdin?)

  return print_usage(stdout, 0) if args.empty? && !batch_mode && !opts[:reinfer] && !opts[:propose] && !opts[:cross_host_shapes]

  corpus = opts[:corpus] ? load_corpus(opts[:corpus], host_strategy: opts[:host_strategy]) : nil

  code = if opts[:reinfer]
    cmd_reinfer(corpus, opts)
  elsif opts[:propose]
    cmd_propose(corpus, opts)
  elsif opts[:cross_host_shapes]
    cmd_cross_host_shapes(corpus, opts)
  elsif batch_mode
    cmd_batch(args, opts, corpus, explicit_cluster: explicit_cluster)
  elsif opts[:stats]
    cmd_stats(corpus, opts)
  else
    cmd_summary(args, opts, corpus)
  end

  corpus.save(opts[:corpus]) if corpus && opts[:corpus]
  code
rescue Iriq::ParseError => e
  emit_error("parse_error", e.message, 2, human: "iriq: parse error: #{e.message}")
rescue OptionParser::ParseError => e
  emit_error("option_error", e.message, 1)
end
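The branch order inside #run determines which command wins when several options are set at once. The sketch below restates that order as a pure function so the precedence is easy to check in isolation. The function name `pick_command` and the returned symbols are hypothetical; the order of the branches is copied from the source above.

```ruby
# Hypothetical restatement of the if/elsif chain in #run. It returns a
# symbol naming the branch instead of calling the cmd_* methods.
def pick_command(opts, batch_mode:)
  if opts[:reinfer]              then :reinfer
  elsif opts[:propose]           then :propose
  elsif opts[:cross_host_shapes] then :cross_host_shapes
  elsif batch_mode               then :batch
  elsif opts[:stats]             then :stats
  else                                :summary
  end
end

stats_with_pipe = pick_command({ stats: true }, batch_mode: true)
reinfer_first   = pick_command({ reinfer: true, stats: true }, batch_mode: true)
plain_input     = pick_command({}, batch_mode: false)
```

One consequence visible in this ordering is that the corpus-maintenance options take priority over any input, and that `--stats` combined with batch input is routed through the batch branch, not the stats branch. Whether the batch command then prints stats itself is not shown on this page.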