Module: HexaPDF::Task::PDFA
Defined in: lib/hexapdf/task/pdfa.rb
Overview
Task for creating a PDF/A compliant document.
It automatically
- prevents the Standard 14 PDF fonts from being used.
- adds an appropriate output intent if none is set.
- adds the necessary PDF/A metadata properties.
Additionally, it applies fixes to the document so that the structures and content of non-conforming PDFs are corrected. See ::call for more information on the available fixes.
Note that you should use a PDF/A validation tool like veraPDF (verapdf.org/) to ensure that the resulting files conform to the PDF/A specification, because not all documents can be fixed at the moment.
Defined Under Namespace
Classes: CIDCollector
Constant Summary
- SRGB_ICC = 'sRGB2014.icc' (:nodoc:)
- ALL_FIXES = [:fix_glyph_widths] (:nodoc:)
- FIXES_FOR_LOADED_DOCUMENTS = [:fix_glyph_widths] (:nodoc:)
Class Method Summary
- .add_srgb_icc_output_intent(doc) ⇒ Object
  :nodoc:
- .call(doc, level: '3u', fixes: :default) ⇒ Object
  Performs the necessary tasks to make the document PDF/A compatible.
- .fix_glyph_widths(doc) ⇒ Object
  Makes the glyph widths stored in the embedded fonts the same as the ones specified in the PDF font data structures.
Class Method Details
.add_srgb_icc_output_intent(doc) ⇒ Object
:nodoc:
    # File 'lib/hexapdf/task/pdfa.rb', line 108
    def self.add_srgb_icc_output_intent(doc) # :nodoc:
      icc = doc.add({N: 3}, stream: File.binread(File.join(HexaPDF.data_dir, SRGB_ICC)))
      doc.catalog[:OutputIntents] = [
        doc.add({S: :GTS_PDFA1, OutputConditionIdentifier: SRGB_ICC, Info: SRGB_ICC,
                 RegistryName: 'https://www.color.org', DestOutputProfile: icc}),
      ]
    end
.call(doc, level: '3u', fixes: :default) ⇒ Object
Performs the necessary tasks to make the document PDF/A compatible.
- level:
  Specifies the PDF/A conformance level that should be used. Can be one of the following strings: 2b, 2u, 3b, 3u.
- fixes:
  Specifies the fixes that should be applied when converting a non-conforming PDF. If a document is created with HexaPDF but also includes parts of loaded documents, this argument has to be set to :all.
  Can be :default (which is also the default value), :all or an array with one or more fix names.
  - :default: Applies all fixes if the document was loaded from a file. Otherwise applies only those fixes necessary for files created with HexaPDF.
  - :all: Applies all available fixes.
  - :glyph_widths: Corrects mismatching width information in fonts.
    # File 'lib/hexapdf/task/pdfa.rb', line 84
    def self.call(doc, level: '3u', fixes: :default)
      unless level.match?(/\A[23][bu]\z/)
        raise ArgumentError, "The given PDF/A conformance level '#{level}' is not supported"
      end
      doc.config['font_loader'].delete('HexaPDF::FontLoader::Standard14')
      doc.register_listener(:complete_objects) do
        part, conformance = level.chars
        doc.metadata.property('pdfaid', 'part', part)
        doc.metadata.property('pdfaid', 'conformance', conformance.upcase)
        add_srgb_icc_output_intent(doc) unless doc.catalog.key?(:OutputIntents)
        fixes = if fixes == :all || (fixes == :default && doc.revisions.parser)
                  ALL_FIXES
                elsif fixes == :default
                  ALL_FIXES - FIXES_FOR_LOADED_DOCUMENTS
                else
                  fixes
                end
        fixes.each {|fix| send(fix, doc) }
      end
    end
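The level check and the fix selection in the listing above can be tried in isolation. The following is a minimal sketch in plain Ruby that mirrors that logic without loading HexaPDF. The module name `PDFASketch` and its helpers `valid_level?`, `split_level` and `select_fixes` are hypothetical, and the `loaded:` flag stands in for `doc.revisions.parser` being set. Because the listing dispatches each fix with `send(fix, doc)`, the sketch assumes that array entries are method names such as `:fix_glyph_widths`.

```ruby
# Stand-alone sketch of the argument handling in HexaPDF::Task::PDFA.call.
# Hypothetical module; it only mirrors the logic shown in the listing.
module PDFASketch
  ALL_FIXES = [:fix_glyph_widths].freeze
  FIXES_FOR_LOADED_DOCUMENTS = [:fix_glyph_widths].freeze

  module_function

  # Accepts exactly the strings 2b, 2u, 3b and 3u.
  def valid_level?(level)
    level.match?(/\A[23][bu]\z/)
  end

  # Splits a level such as '3u' into the pdfaid part and conformance values.
  def split_level(level)
    part, conformance = level.chars
    [part, conformance.upcase]
  end

  # Resolves the +fixes+ argument to the list of fixes that would be applied.
  # +loaded+ is an assumption standing in for "the document was parsed from a file".
  def select_fixes(fixes, loaded:)
    if fixes == :all || (fixes == :default && loaded)
      ALL_FIXES
    elsif fixes == :default
      ALL_FIXES - FIXES_FOR_LOADED_DOCUMENTS
    else
      fixes
    end
  end
end

p PDFASketch.split_level('3u')                      # prints ["3", "U"]
p PDFASketch.select_fixes(:default, loaded: true)   # prints [:fix_glyph_widths]
p PDFASketch.select_fixes(:default, loaded: false)  # prints []
```

With the constants as documented, `:default` resolves to an empty list for a document that was not loaded from a file, which is why a HexaPDF-created document that imports parts of loaded documents needs `fixes: :all`.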
.fix_glyph_widths(doc) ⇒ Object
Makes the glyph widths stored in the embedded fonts the same as the ones specified in the PDF font data structures.
Note: Currently only handles Type 2 CIDFonts.
    # File 'lib/hexapdf/task/pdfa.rb', line 124
    def self.fix_glyph_widths(doc) # :nodoc:
      # Step 1: Collect all CIDs together with their respective fonts
      processor = CIDCollector.new
      doc.pages.each do |page|
        page.process_contents(processor)
        page.each_annotation do |annotation|
          next unless (appearance = annotation.appearance)
          appearance.process_contents(processor, original_resources: page.resources)
        end
      end

      # Step 2: Process all found fonts
      processor.map.each do |font_object, all_cids|
        next if all_cids.empty?

        font = HexaPDF::Font::TrueType::Font.new(StringIO.new(font_object.font_file.stream))
        cid_to_gid = cid_to_gid_mapping(font_object)

        # Process all found CIDs by comparing their width with the ones defined in the font and
        # correcting the font if necessary.
        raw_hmtx = font[:hmtx].raw_data
        width_conversion_factor = 1000.0 / font[:head].units_per_em
        all_cids.each do |cid|
          cid_width = font_object.width(cid)
          gid = cid_to_gid[cid]
          gid_width = font[:hmtx][gid].advance_width * width_conversion_factor
          next if (cid_width - gid_width).abs.round <= 1
          raw_hmtx[4 * gid, 2] = [(cid_width / width_conversion_factor).round].pack('n')
        end
        font_object.font_file.stream = font.build('hmtx' => raw_hmtx)
      end
    end
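The width comparison and the in-place patch of the `hmtx` data in step 2 can be illustrated without a real font. The sketch below is plain Ruby under stated assumptions: `patch_advance_width` is a hypothetical helper, the sample table is hand-built, and every glyph is assumed to have a full 4-byte record (big-endian unsigned 16-bit advance width followed by a signed 16-bit left side bearing), which matches the `4 * gid` offset used in the listing.

```ruby
# Hypothetical helper mirroring the inner loop of fix_glyph_widths.
# raw_hmtx     - binary string of 4-byte horizontal metric records
# gid          - glyph ID whose record should be checked
# cid_width    - width from the PDF font dictionary, in 1/1000 text space units
# units_per_em - value from the font's 'head' table
# Returns true if the record was rewritten, false if the widths already agree.
def patch_advance_width(raw_hmtx, gid, cid_width, units_per_em)
  factor = 1000.0 / units_per_em
  gid_width = raw_hmtx[4 * gid, 2].unpack1('n') * factor
  # A difference of at most one unit after rounding is tolerated.
  return false if (cid_width - gid_width).abs.round <= 1

  # Convert the PDF width back to font units and overwrite only the advance width.
  raw_hmtx[4 * gid, 2] = [(cid_width / factor).round].pack('n')
  true
end

# Two sample records: glyph 0 (advance 1024, lsb 0), glyph 1 (advance 1229, lsb 50).
raw = [1024, 0, 1229, 50].pack('ns>ns>')

patch_advance_width(raw, 0, 500, 2048)  # 1024 font units equal 500.0, left unchanged
patch_advance_width(raw, 1, 650, 2048)  # about 600.1 vs. 650, record is rewritten
p raw.unpack('ns>ns>')                  # prints [1024, 0, 1331, 50]
```

Only the two advance-width bytes are replaced, so the left side bearing of the glyph is preserved.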