Module: HexaPDF::Task::PDFA

Defined in:
lib/hexapdf/task/pdfa.rb

Overview

Task for creating a PDF/A compliant document.

It automatically

  • prevents the Standard 14 PDF fonts from being used.

  • adds an appropriate output intent if none is set.

  • adds the necessary PDF/A metadata properties.

Additionally, it applies fixes to the document so that the structures and content of non-conforming PDFs are corrected. See ::call for more information on the available fixes.

Note that you should use a PDF/A validation tool like veraPDF (verapdf.org/) to ensure that the resulting files conform to the PDF/A specification, because not all documents can be fixed at the moment.

Defined Under Namespace

Classes: CIDCollector

Constant Summary

SRGB_ICC = 'sRGB2014.icc' # :nodoc:

ALL_FIXES = [:fix_glyph_widths] # :nodoc:

FIXES_FOR_LOADED_DOCUMENTS = [:fix_glyph_widths] # :nodoc:

Class Method Details

.add_srgb_icc_output_intent(doc) ⇒ Object

:nodoc:



# File 'lib/hexapdf/task/pdfa.rb', line 108

def self.add_srgb_icc_output_intent(doc) # :nodoc:
  icc = doc.add({N: 3}, stream: File.binread(File.join(HexaPDF.data_dir, SRGB_ICC)))
  doc.catalog[:OutputIntents] = [
    doc.add({S: :GTS_PDFA1, OutputConditionIdentifier: SRGB_ICC, Info: SRGB_ICC,
             RegistryName: 'https://www.color.org', DestOutputProfile: icc}),
  ]
end

.call(doc, level: '3u', fixes: :default) ⇒ Object

Performs the necessary tasks to make the document PDF/A compatible.

level

Specifies the PDF/A conformance level that should be used. Can be one of the following strings: 2b, 2u, 3b, 3u.

fixes

Specifies the fixes that should be applied when converting a non-conforming PDF. If a document is created with HexaPDF but also includes parts of loaded documents, this argument has to be set to :all.

Can be :default (which is also the default value), :all or an array with one or more fix names.

:default

Applies all fixes if the document was loaded from a file. Otherwise applies only those fixes necessary for files created with HexaPDF.

:all

Applies all available fixes.

:glyph_widths

Corrects mismatching width information in fonts.
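
The selection rule described for :default and :all can be sketched in plain Ruby without loading HexaPDF. This is only an illustration: `resolve_fixes` and its `loaded:` flag are hypothetical names, and the two constants are local copies of the values listed in the constant summary.

```ruby
# Local copies of the constants shown on this page (illustration only).
ALL_FIXES = [:fix_glyph_widths].freeze
FIXES_FOR_LOADED_DOCUMENTS = [:fix_glyph_widths].freeze

# Hypothetical helper: turns the +fixes+ argument into a list of fix names.
# +loaded+ stands in for "the document was loaded from a file".
def resolve_fixes(fixes, loaded:)
  if fixes == :all || (fixes == :default && loaded)
    ALL_FIXES
  elsif fixes == :default
    # Documents created from scratch skip the fixes meant for loaded files.
    ALL_FIXES - FIXES_FOR_LOADED_DOCUMENTS
  else
    fixes
  end
end

p resolve_fixes(:default, loaded: true)  # prints [:fix_glyph_widths]
p resolve_fixes(:default, loaded: false) # prints []
p resolve_fixes(:all, loaded: false)     # prints [:fix_glyph_widths]
```

With the constants as listed, :default applies no fixes to a document built from scratch, which is why a newly created document that imports parts of loaded documents needs :all.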



# File 'lib/hexapdf/task/pdfa.rb', line 84

def self.call(doc, level: '3u', fixes: :default)
  unless level.match?(/\A[23][bu]\z/)
    raise ArgumentError, "The given PDF/A conformance level '#{level}' is not supported"
  end
  doc.config['font_loader'].delete('HexaPDF::FontLoader::Standard14')
  doc.register_listener(:complete_objects) do
    part, conformance = level.chars
    doc.metadata.property('pdfaid', 'part', part)
    doc.metadata.property('pdfaid', 'conformance', conformance.upcase)
    add_srgb_icc_output_intent(doc) unless doc.catalog.key?(:OutputIntents)

    fixes = if fixes == :all || (fixes == :default && doc.revisions.parser)
              ALL_FIXES
            elsif fixes == :default
              ALL_FIXES - FIXES_FOR_LOADED_DOCUMENTS
            else
              fixes
            end
    fixes.each {|fix| send(fix, doc) }
  end
end
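
The level handling in the listing above can be tried in isolation. The following is a minimal sketch in plain Ruby; `parse_pdfa_level` is a hypothetical helper, not part of HexaPDF, that repeats the same regular expression check and the same split into part and conformance.

```ruby
# Hypothetical helper mirroring the level check of the listing above.
def parse_pdfa_level(level)
  # Only part 2 or 3 combined with conformance b or u is accepted.
  unless level.match?(/\A[23][bu]\z/)
    raise ArgumentError, "The given PDF/A conformance level '#{level}' is not supported"
  end
  part, conformance = level.chars
  # The conformance letter is stored in upper case.
  {part: part, conformance: conformance.upcase}
end

result = parse_pdfa_level('3u')
puts "part=#{result[:part]} conformance=#{result[:conformance]}" # prints part=3 conformance=U

begin
  parse_pdfa_level('1a')
rescue ArgumentError => e
  puts e.message
end
```

Because the pattern is anchored with `\A` and `\z`, strings such as '3U' or '2b ' are rejected as well.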

.fix_glyph_widths(doc) ⇒ Object

Makes the glyph widths stored in the embedded fonts the same as the ones specified in the PDF font data structures.

Note: Currently only handles Type 2 CIDFonts.
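
The method's source compares the width from the PDF font dictionary (in thousandths of the em square) with the advance width of the embedded font (in font design units) and tolerates a rounded difference of up to one unit. A minimal sketch of that comparison, with `width_mismatch?` as a hypothetical helper name and made-up sample values:

```ruby
# Hypothetical helper: true if the PDF width and the embedded font's advance
# width differ by more than one unit after conversion to 1/1000 em.
def width_mismatch?(cid_width, advance_width, units_per_em)
  width_conversion_factor = 1000.0 / units_per_em
  gid_width = advance_width * width_conversion_factor
  (cid_width - gid_width).abs.round > 1
end

# 1024 design units at 2048 units per em correspond to 500.0.
puts width_mismatch?(500, 1024, 2048) # prints false
puts width_mismatch?(600, 1024, 2048) # prints true
```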



# File 'lib/hexapdf/task/pdfa.rb', line 124

def self.fix_glyph_widths(doc) # :nodoc:
  # Step 1: Collect all CIDs together with their respective fonts
  processor = CIDCollector.new
  doc.pages.each do |page|
    page.process_contents(processor)
    page.each_annotation do |annotation|
      next unless (appearance = annotation.appearance)
      appearance.process_contents(processor, original_resources: page.resources)
    end
  end

  # Step 2: Process all found fonts
  processor.map.each do |font_object, all_cids|
    next if all_cids.empty?
    font = HexaPDF::Font::TrueType::Font.new(StringIO.new(font_object.font_file.stream))
    cid_to_gid = cid_to_gid_mapping(font_object)

    # Process all found CIDs by comparing their width with the ones defined in the font and
    # correcting the font if necessary.
    raw_hmtx = font[:hmtx].raw_data
    width_conversion_factor = 1000.0 / font[:head].units_per_em
    all_cids.each do |cid|
      cid_width = font_object.width(cid)
      gid = cid_to_gid[cid]
      gid_width = font[:hmtx][gid].advance_width * width_conversion_factor
      next if (cid_width - gid_width).abs.round <= 1
      raw_hmtx[4 * gid, 2] = [(cid_width / width_conversion_factor).round].pack('n')
    end

    font_object.font_file.stream = font.build('hmtx' => raw_hmtx)
  end
end
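
The in-place correction in step 2 relies on the layout of the 'hmtx' table, where each full metric record is four bytes: a big-endian unsigned 16-bit advance width followed by a signed 16-bit left side bearing. The sketch below patches such a byte string in plain Ruby. The table contents and the helper `patch_advance_width` are made up for illustration, and it assumes that the glyph ID lies within the range of full four-byte records.

```ruby
# Sample metrics as [advance_width, left_side_bearing] pairs (invented values).
metrics = [[1024, 0], [1229, 50], [512, -10]]
record_format = 'ns>' # uint16 big-endian, int16 big-endian
raw_hmtx = metrics.flatten.pack(record_format * metrics.size)

# Hypothetical helper: overwrites the advance width of glyph +gid+ so that it
# matches +cid_width+ (given in 1/1000 em). The left side bearing is untouched.
def patch_advance_width(raw_hmtx, gid, cid_width, units_per_em)
  width_conversion_factor = 1000.0 / units_per_em
  raw_hmtx[4 * gid, 2] = [(cid_width / width_conversion_factor).round].pack('n')
  raw_hmtx
end

patch_advance_width(raw_hmtx, 1, 500, 2048)
p raw_hmtx.unpack(record_format * metrics.size) # prints [1024, 0, 1024, 50, 512, -10]
```

Only two of the four bytes per record are replaced, so the table keeps its length and the offsets of all other glyphs stay valid.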