Class: Kotoshu::Documents::Document

Inherits:
  • Object
Defined in:
lib/kotoshu/documents/document.rb

Overview

Abstract base class for documents.

Provides a unified interface for different document formats:

  • Plain text

  • Markdown

  • AsciiDoc

  • Code files (with syntax awareness)

Subclasses implement format-specific parsing and context retrieval.

Examples:

Plain text document

doc = PlainTextDocument.new("Hello world\n")
doc.text_nodes.each { |node| puts node.text }

Markdown document

doc = MarkdownDocument.new("# Title\nParagraph text")
doc.text_nodes.each { |node| puts node.text }

Constant Summary

FORMATS =

Supported document formats

{
  text: 'Plain Text',
  markdown: 'Markdown',
  asciidoc: 'AsciiDoc',
  code: 'Code'
}.freeze

Constructor Details

#initialize(content, format: :text, language_code: 'en') ⇒ Document

Create a new document.

Parameters:

  • content (String)

    The document content

  • format (Symbol) (defaults to: :text)

    Document format (:text, :markdown, :asciidoc, :code)

  • language_code (String) (defaults to: 'en')

    ISO 639-1 language code (default: ‘en’)

Raises:

  • (ArgumentError)

    If format is not one of the keys of FORMATS


# File 'lib/kotoshu/documents/document.rb', line 102

def initialize(content, format: :text, language_code: 'en')
  raise ArgumentError, "Invalid format: #{format}" unless FORMATS.key?(format)

  @content = content
  @format = format
  @language_code = language_code
end
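
The format check can be tried in isolation. The sketch below re-creates the constructor shown above in a stand-in class so that it runs without the library; `SketchDocument` is a hypothetical name, not part of Kotoshu.

```ruby
# Stand-in that mirrors the constructor shown above; not the library class.
class SketchDocument
  FORMATS = {
    text: 'Plain Text',
    markdown: 'Markdown',
    asciidoc: 'AsciiDoc',
    code: 'Code'
  }.freeze

  attr_reader :content, :format, :language_code

  def initialize(content, format: :text, language_code: 'en')
    raise ArgumentError, "Invalid format: #{format}" unless FORMATS.key?(format)

    @content = content
    @format = format
    @language_code = language_code
  end
end

doc = SketchDocument.new("Hello world\n", format: :markdown, language_code: 'de')
puts doc.format         # prints "markdown"
puts doc.language_code  # prints "de"

begin
  SketchDocument.new('x', format: :html)
rescue ArgumentError => e
  puts e.message        # prints "Invalid format: html"
end
```

Because the keys of FORMATS are symbols, a string such as 'markdown' is rejected in the same way as an unknown format.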

Instance Attribute Details

#content ⇒ Object (readonly)

Returns the value of attribute content.



# File 'lib/kotoshu/documents/document.rb', line 87

def content
  @content
end

#format ⇒ Object (readonly)

Returns the value of attribute format.



# File 'lib/kotoshu/documents/document.rb', line 87

def format
  @format
end

#language_code ⇒ Object (readonly)

Returns the value of attribute language_code.



# File 'lib/kotoshu/documents/document.rb', line 87

def language_code
  @language_code
end

Class Method Details

.detect_format(content) ⇒ Symbol

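Detect the format from the content.

The heuristic is minimal: content that starts with '#' is treated as :markdown, content that ends with '.' is treated as :code, and everything else is :text. As shown, it never returns :asciidoc.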

Parameters:

  • content (String)

    The document content

Returns:

  • (Symbol)

    Detected format



# File 'lib/kotoshu/documents/document.rb', line 178

def self.detect_format(content)
  return :markdown if content.start_with?('#')
  return :code if content.end_with?('.')
  :text
end
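
To see how the heuristic classifies typical inputs, the method body can be copied into a standalone script. This is a copy for experimentation only, not a call into the library.

```ruby
# Standalone copy of the heuristic shown above.
def detect_format(content)
  return :markdown if content.start_with?('#')
  return :code if content.end_with?('.')
  :text
end

puts detect_format("# Title\nParagraph text")  # prints "markdown"
puts detect_format("#!/bin/sh\necho hi\n")      # prints "markdown" (a shebang also starts with '#')
puts detect_format('A sentence.')               # prints "code"
puts detect_format("A sentence.\n")             # prints "text" (the trailing newline hides the '.')
puts detect_format("= AsciiDoc Title\n")        # prints "text"
```

Callers that already know the format can bypass detection by passing format: to the constructor.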

.detect_language_from_path(path) ⇒ String

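Detect the language code from a file path.

Looks for a two-letter segment enclosed by dots (as in "README.en.md") and falls back to 'en' when none is found.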

Parameters:

  • path (String)

    File path

Returns:

  • (String)

    Language code



# File 'lib/kotoshu/documents/document.rb', line 219

def self.detect_language_from_path(path)
  # Extract from path like "README.en.md" or "document.de.txt"
  if path =~ /\.([a-z]{2})\./i
    Regexp.last_match(1)
  else
    'en'
  end
end
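
The regular expression accepts any two letters between dots, so it is worth checking a few paths by hand. The snippet below is a standalone copy of the method body, not a call into the library.

```ruby
# Standalone copy of the path heuristic shown above.
def detect_language_from_path(path)
  if path =~ /\.([a-z]{2})\./i
    Regexp.last_match(1)
  else
    'en'
  end
end

puts detect_language_from_path('README.en.md')     # prints "en"
puts detect_language_from_path('document.de.txt')  # prints "de"
puts detect_language_from_path('notes.txt')        # prints "en" (fallback)
puts detect_language_from_path('guide.FR.md')      # prints "FR" (case is preserved)
puts detect_language_from_path('script.rb.txt')    # prints "rb" (not a language code)
```

The match is not validated against ISO 639-1 and is not downcased, so callers may want to normalize the result.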

.from_file(path) ⇒ Document

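Create a document from a file. The file is read as UTF-8; the format is detected from its content and the language code from its path.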

Parameters:

  • path (String)

    Path to the file

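Returns:

  • (Document)

    A MarkdownDocument, AsciidocDocument, or PlainTextDocument, depending on the detected format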



# File 'lib/kotoshu/documents/document.rb', line 188

def self.from_file(path)
  content = File.read(path, encoding: 'UTF-8')
  format = detect_format(content)
  language_code = detect_language_from_path(path)

  case format
  when :markdown
    MarkdownDocument.new(content, language_code: language_code)
  when :asciidoc
    AsciidocDocument.new(content, language_code: language_code)
  else
    PlainTextDocument.new(content, language_code: language_code)
  end
end

.from_string(content, language_code: 'en') ⇒ Document

Create a document from a string, detecting the format from the content.

Parameters:

  • content (String)

    The document content

  • language_code (String) (defaults to: 'en')

    Language code (optional)

Returns:

  • (Document)

    A new instance of the receiving class with the detected format



# File 'lib/kotoshu/documents/document.rb', line 208

def self.from_string(content, language_code: 'en')
  format = detect_format(content)
  new(content, format: format, language_code: language_code)
end
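
Because .from_string calls new on its receiver, the detected format is recorded on the instance but does not select a format-specific subclass. The sketch below illustrates this with stand-in classes; `StringSketchDocument` and `StringSketchPlainText` are hypothetical names that only mirror the code shown above.

```ruby
# Stand-in mirroring the constructor and class methods shown above.
class StringSketchDocument
  FORMATS = { text: 'Plain Text', markdown: 'Markdown',
              asciidoc: 'AsciiDoc', code: 'Code' }.freeze

  attr_reader :content, :format, :language_code

  def initialize(content, format: :text, language_code: 'en')
    raise ArgumentError, "Invalid format: #{format}" unless FORMATS.key?(format)

    @content = content
    @format = format
    @language_code = language_code
  end

  def self.detect_format(content)
    return :markdown if content.start_with?('#')
    return :code if content.end_with?('.')
    :text
  end

  def self.from_string(content, language_code: 'en')
    new(content, format: detect_format(content), language_code: language_code)
  end
end

# Hypothetical subclass, only to show which class `new` resolves to.
class StringSketchPlainText < StringSketchDocument; end

base = StringSketchDocument.from_string("# Title\nBody", language_code: 'fr')
puts base.class          # prints "StringSketchDocument"
puts base.format         # prints "markdown"
puts base.language_code  # prints "fr"

sub = StringSketchPlainText.from_string("# Title\nBody")
puts sub.class           # prints "StringSketchPlainText"
puts sub.format          # prints "markdown"
```

Called on the abstract base class, the result therefore still raises NotImplementedError from methods such as #text_nodes.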

Instance Method Details

#apply(corrections) ⇒ Document

Apply corrections and return a new document.

Parameters:

  • corrections

    The corrections to apply

Returns:

  • (Document)

    New document with corrections applied

Raises:

  • (NotImplementedError)


# File 'lib/kotoshu/documents/document.rb', line 149

def apply(corrections)
  raise NotImplementedError, "#{self.class} must implement #apply"
end

#context_for(location, window: 5) ⇒ Models::Context

Get context around a specific location.

Parameters:

  • location (Location)

    The error location

  • window (Integer) (defaults to: 5)

    Number of lines before/after (default: 5)

Returns:

  • (Models::Context)

    The context around the location

Raises:

  • (NotImplementedError)


# File 'lib/kotoshu/documents/document.rb', line 141

def context_for(location, window: 5)
  raise NotImplementedError, "#{self.class} must implement #context_for"
end
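
The base class leaves the windowing logic to subclasses. As a rough illustration of what "lines before/after" could mean, here is a sketch under stated assumptions: `context_lines` is a hypothetical helper, the location is reduced to a 1-based line number, and the result is a plain Hash rather than a Models::Context.

```ruby
# Hypothetical helper: collects up to `window` lines on either side of a
# 1-based line number, clamped to the bounds of the content.
def context_lines(content, line_number, window: 5)
  lines = content.lines(chomp: true)
  index = line_number - 1
  first = [index - window, 0].max
  last = [index + window, lines.size - 1].min

  {
    before: lines[first...index],
    line: lines[index],
    after: lines[(index + 1)..last]
  }
end

text = (1..10).map { |n| "line #{n}" }.join("\n")
ctx = context_lines(text, 3, window: 2)
p ctx[:before]  # prints ["line 1", "line 2"]
p ctx[:line]    # prints "line 3"
p ctx[:after]   # prints ["line 4", "line 5"]
```

The sketch does not validate line_number; a real implementation would need to decide how to handle locations outside the document.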

#get_node(path) ⇒ Object?

Get node at a specific path (for structured formats).

Parameters:

  • path (Array)

    Node path (e.g., [:paragraph, 3, :text])

Returns:

  • (Object, nil)

    The node object or nil

Raises:

  • (NotImplementedError)


# File 'lib/kotoshu/documents/document.rb', line 123

def get_node(path)
  raise NotImplementedError, "#{self.class} must implement #get_node"
end

#line_count ⇒ Integer

Get line count.

Returns:

  • (Integer)

    Total line count



# File 'lib/kotoshu/documents/document.rb', line 163

def line_count
  @content.lines.size
end
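
The count follows String#lines, which can be checked directly on plain strings without the library:

```ruby
puts "one\ntwo\n".lines.size  # prints 2
puts "one\ntwo".lines.size    # prints 2 (a final newline is not required)
puts "one\n\n".lines.size     # prints 2 (the blank line counts)
puts ''.lines.size            # prints 0
```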

#name ⇒ String

Get document name (for display).

Returns:

  • (String)

    Document name or identifier



# File 'lib/kotoshu/documents/document.rb', line 170

def name
  "document"
end

#replace_node(location, new_text) ⇒ Document

Replace text at a specific location.

Parameters:

  • location (Location)

    The location to replace

  • new_text (String)

    The new text

Returns:

  • (Document)

    New document with replacement applied

Raises:

  • (NotImplementedError)


# File 'lib/kotoshu/documents/document.rb', line 132

def replace_node(location, new_text)
  raise NotImplementedError, "#{self.class} must implement #replace_node"
end

#text_nodes ⇒ Array<TextNode>

Get all text nodes for spell checking.

Subclasses implement format-specific text extraction.

Returns:

  • (Array<TextNode>)

    Text nodes in the document

Raises:

  • (NotImplementedError)


# File 'lib/kotoshu/documents/document.rb', line 115

def text_nodes
  raise NotImplementedError, "#{self.class} must implement #text_nodes"
end
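
A subclass fulfils this contract by overriding the method. The following is a minimal sketch with stand-in classes: `AbstractSketch`, `LineSketch`, and `SketchTextNode` are hypothetical, and the real TextNode may carry different fields.

```ruby
# Hypothetical stand-ins; the real Document and TextNode classes live in Kotoshu.
SketchTextNode = Struct.new(:text, :line)

class AbstractSketch
  def initialize(content)
    @content = content
  end

  def text_nodes
    raise NotImplementedError, "#{self.class} must implement #text_nodes"
  end
end

class LineSketch < AbstractSketch
  # One node per non-blank line, tagged with its 1-based line number.
  def text_nodes
    @content.each_line.with_index(1).filter_map do |line, number|
      text = line.chomp
      SketchTextNode.new(text, number) unless text.strip.empty?
    end
  end
end

nodes = LineSketch.new("Hello world\n\nSecond paragraph\n").text_nodes
nodes.each { |node| puts "#{node.line}: #{node.text}" }
# prints "1: Hello world" then "3: Second paragraph"

begin
  AbstractSketch.new('x').text_nodes
rescue NotImplementedError => e
  puts e.message  # prints "AbstractSketch must implement #text_nodes"
end
```

In Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare rescue does not catch it; it has to be named explicitly, as above.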

#word_count ⇒ Integer

Get word count.

Returns:

  • (Integer)

    Total word count



# File 'lib/kotoshu/documents/document.rb', line 156

def word_count
  @content.split(/\s+/).size
end
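
Splitting on /\s+/ has one edge case worth knowing: leading whitespace produces an empty first element, which is counted as a word. The comparison below uses plain strings and, as an alternative that is not part of the source, String#split without arguments.

```ruby
content = " Hello  world\n"

puts content.split(/\s+/).size  # prints 3 (the leading space yields an empty first element)
puts content.split.size         # prints 2 (argument-less split ignores leading whitespace)
puts ''.split(/\s+/).size       # prints 0
```

For content without leading whitespace the two forms give the same count.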