Module: HexaPDF::Task::ImportPages

Defined in:
lib/hexapdf/task/import_pages.rb

Overview

Task for importing pages from another document while preserving their visual appearance.

It takes care of

  • importing the specified pages with all associated objects,

  • handling optional content groups and their default state,

  • and merging form fields.

Note that the /Order, /AS and /Locked fields of the default optional content configuration dictionary are not preserved.

Example:

doc.task(:import_pages, source: source_doc, pages: [1..-2])


Class Method Details

.call(doc, source:, pages: :all, append: true, ocgs: :preserve, acro_form: :merge) ⇒ Object

Performs the necessary steps to import the pages from the source document into the target document doc. Returns the imported pages.

source

Specifies the source PDF document from which the pages should be imported.

pages

Specifies the pages that should be imported. The argument has to be one of the following:

:all

Imports all pages from the source document.

Integer value

Imports the page with the given zero-based index.

Range value

Imports the pages from the zero-based range.

Array of Integer or Range values

Imports all specified pages or page ranges.

Array of source page objects

Imports the given pages.

append

Specifies whether the imported pages should be appended to the target document’s page tree.

ocgs

Specifies the handling of optional content groups:

:preserve

Preserve the on/off state for all used OCGs.

:ignore

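Skip this step: the OCGs used on the imported pages are not added to the target document’s optional content properties, and their on/off state is not copied.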

acro_form

Specifies how AcroForm fields on the imported pages should be handled:

:merge

Merge AcroForm fields using the MergeAcroForm task.

:ignore

Ignore AcroForm fields.
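The following is a minimal pure-Ruby sketch of how the accepted pages forms resolve, mirroring the selection branch at the start of call. It assumes a hypothetical array of strings (SOURCE_PAGES) in place of source.pages and leaves out the array-of-page-objects case; select_pages is an illustrative helper, not part of HexaPDF.

```ruby
require "set"

# Hypothetical stand-in for source.pages: strings instead of page objects.
SOURCE_PAGES = %w[p0 p1 p2 p3 p4].freeze

# Illustrative helper (not HexaPDF API) mirroring how ImportPages.call
# resolves the :all, Integer, Range and Array-of-Integer/Range selectors.
def select_pages(all_pages, selector)
  case selector
  when :all
    all_pages.dup
  when Integer
    [all_pages[selector]]
  else
    result = Set.new # a Set drops pages that are selected more than once
    selectors = selector.kind_of?(Array) ? selector : [selector]
    # Array(...) wraps a single page and turns nil (out of range) into [].
    selectors.each {|s| result.merge(Array(all_pages[s])) }
    result.to_a
  end
end

p select_pages(SOURCE_PAGES, 2)            # ["p2"]
p select_pages(SOURCE_PAGES, [1..-2])      # ["p1", "p2", "p3"]
p select_pages(SOURCE_PAGES, [0, 3..4, 0]) # ["p0", "p3", "p4"]
```

Because the selection is collected in a Set, overlapping selectors such as [0..2, 1] import each page only once, in the order it was first selected.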



# File 'lib/hexapdf/task/import_pages.rb', line 90

def self.call(doc, source:, pages: :all, append: true, ocgs: :preserve, acro_form: :merge)
  # Retrieve all specified source pages
  pages = if pages == :all
            source.pages.each.to_a
          elsif pages.kind_of?(Integer)
            [source.pages[pages]]
          elsif pages.kind_of?(Array) && pages[0].kind_of?(HexaPDF::Type::Page)
            pages
          else
            result = Set.new
            all_pages = source.pages.each.to_a
            pages = [pages] unless pages.kind_of?(Array)
            pages.each {|selector| result.merge(Array(all_pages[selector])) }
            result
          end

  # Import the source pages and optionally append them to the target page tree
  pages = pages.map do |page|
    imported_page = doc.import(page)
    doc.pages << imported_page if append
    imported_page
  end

  doc.task(:merge_acro_form, source: source, pages: pages) if acro_form == :merge
  preserve_ocgs(doc, source, pages) if ocgs == :preserve

  pages
end

.preserve_ocgs(doc, source, pages) ⇒ Object

Preserves the state of the OCGs found on pages so that the visual appearance in the target document doc is the same as in the source document.
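A minimal sketch of the collection step of preserve_ocgs, using plain Ruby objects instead of PDF dictionaries. PdfObj, collect_ocgs and the hash-based page and resource layout are hypothetical stand-ins. It shows that an OCMD contributes the OCGs listed in its /OCGs entry and that a resources dictionary shared by several pages is scanned only once, while annotations are checked on every page.

```ruby
require "set"

# Hypothetical stand-in for an OCG or OCMD dictionary; only OCMDs use :ocgs.
PdfObj = Struct.new(:type, :name, :ocgs)

# Illustrative helper (not HexaPDF API) mirroring the first part of
# ImportPages.preserve_ocgs.
def collect_ocgs(pages)
  ocgs = Set.new
  process = lambda do |obj|
    if obj.type == :OCG
      ocgs << obj
    elsif obj.type == :OCMD
      ocgs.merge(obj.ocgs) # an OCMD refers to one or more OCGs
    end
  end

  # Key by object identity so a resources hash shared by pages is seen once.
  seen_resources = {}.compare_by_identity
  pages.each do |page|
    resources = page[:resources]
    unless seen_resources[resources]
      resources[:Properties]&.each_value {|obj| process.call(obj) if obj }
      resources[:XObject]&.each_value do |xobj|
        process.call(xobj[:OC]) if xobj.key?(:OC)
      end
      seen_resources[resources] = true
    end
    page[:annots].each {|annot| process.call(annot[:OC]) if annot.key?(:OC) }
  end
  ocgs
end

layer_a = PdfObj.new(:OCG, "A", nil)
layer_b = PdfObj.new(:OCG, "B", nil)
layer_c = PdfObj.new(:OCG, "C", nil)
layer_d = PdfObj.new(:OCG, "D", nil)
membership = PdfObj.new(:OCMD, nil, [layer_b, layer_c])

shared = { Properties: { oc1: layer_a, oc2: membership } }
pages = [
  { resources: shared, annots: [] },
  { resources: shared, annots: [{ OC: layer_a }] }, # shares the dictionary
  { resources: { XObject: { im1: { OC: layer_d } } }, annots: [] },
]
p collect_ocgs(pages).map(&:name).sort # ["A", "B", "C", "D"]
```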



# File 'lib/hexapdf/task/import_pages.rb', line 121

def self.preserve_ocgs(doc, source, pages)
  # Find all OCGs used on all pages
  ocgs = Set.new
  process_ocg_or_ocmd = lambda do |obj|
    if obj.type == :OCG
      ocgs << obj
    elsif obj.type == :OCMD
      ocgs.merge(obj[:OCGs].to_ary)
    end
  end
  seen_resources = {}
  pages.each do |page|
    unless seen_resources[page.resources] # handle case when pages share the resources dict
      page.resources[:Properties]&.each do |name, obj|
        next unless obj
        process_ocg_or_ocmd.call(obj)
      end

      page.resources[:XObject]&.each do |name, obj|
        process_ocg_or_ocmd.call(obj[:OC]) if obj.key?(:OC)
      end
    end

    page.each_annotation do |annot|
      process_ocg_or_ocmd.call(annot[:OC]) if annot.key?(:OC)
    end

    seen_resources[page.resources] = true
  end

  return if ocgs.empty?

  # Add all found OCGs to the optional content properties dictionary
  ocp = doc.optional_content
  ocgs.each {|ocg| ocp.add_ocg(ocg) }

  # Create a mapping from source OCGs to target OCGs and vice-versa
  source_ocg = {}
  target_ocg = {}
  source.optional_content.ocgs.each do |ocg|
    imported_ocg = doc.import(ocg)
    next unless ocgs.include?(imported_ocg)
    source_ocg[imported_ocg] = ocg
    target_ocg[ocg] = imported_ocg
  end

  # Ensure the initial state of the OCGs is correct
  source_config = source.optional_content.default_configuration
  target_config = ocp.default_configuration
  ocgs.each do |ocg|
    target_config.ocg_state(ocg, source_config.ocg_state(source_ocg[ocg]))
  end

  # Copy radio button groups from the source document, removing unknown OCGs from them
  source_config[:RBGroups]&.each do |array|
    result = array.map {|ocg| target_ocg[ocg] }.compact
    next if result.empty?
    (target_config[:RBGroups] ||= []) << result
  end
end
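The second half of preserve_ocgs maps source OCGs to their imported counterparts, copies each OCG's initial on/off state and filters the radio-button groups. The sketch below assumes OCGs are plain strings and models doc.import by appending an apostrophe; all variable names are illustrative and not part of HexaPDF.

```ruby
# Hypothetical data: source OCGs, the imported OCGs found on the pages,
# the source default on/off states and the source /RBGroups.
source_ocgs     = %w[A B C D]
used_on_pages   = %w[A' C']
source_state    = { "A" => :off, "B" => :on, "C" => :on, "D" => :off }
source_rbgroups = [%w[A B], %w[C D], %w[B D]]

# Map source -> target and target -> source, skipping OCGs not on the pages.
source_ocg = {}
target_ocg = {}
source_ocgs.each do |ocg|
  imported = "#{ocg}'" # stands in for doc.import(ocg)
  next unless used_on_pages.include?(imported)
  source_ocg[imported] = ocg
  target_ocg[ocg] = imported
end

# Copy the initial on/off state of every used OCG.
target_state = used_on_pages.to_h {|ocg| [ocg, source_state[source_ocg[ocg]]] }

# Copy radio-button groups, dropping unmapped OCGs and then empty groups.
target_rbgroups = source_rbgroups.filter_map do |group|
  mapped = group.map {|ocg| target_ocg[ocg] }.compact
  mapped unless mapped.empty?
end

p target_state    # {"A'"=>:off, "C'"=>:on}
p target_rbgroups # [["A'"], ["C'"]]
```

Groups that end up empty are dropped, while groups reduced to a single OCG are kept, matching the next if result.empty? check in the method above.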