Class: HexaPDF::Document::Metadata

Inherits:
Object
Defined in:
lib/hexapdf/document/metadata.rb

Overview

This class provides methods for reading and writing the document-level metadata.

When an instance is created (usually through HexaPDF::Document#metadata), the metadata is read from the document’s information dictionary (see HexaPDF::Type::Info) and made available through the various methods.

By default, the metadata is written to the information dictionary as well as to the document’s metadata stream (see HexaPDF::Type::Metadata) once the document is written. This can be controlled via the #write_info_dict and #write_metadata_stream methods.

While HexaPDF is able to write an XMP packet (in a limited form) to the document’s metadata stream, it provides no way to read XMP metadata. If reading functionality or extended writing functionality is needed, make sure this class does not write the metadata, and read or create the metadata stream yourself.

Caveats

  • Disabling writing to the information dictionary will only prevent parts from being written. The #producer is always written to the information dictionary as per the AGPL license terms. The #modification_date may be written depending on the arguments to HexaPDF::Document#write.

  • If writing the metadata stream is enabled, any existing metadata stream is completely overwritten. This means the metadata stream is not updated with the changed information.

Adding custom metadata properties

All the properties specified for the information dictionary are supported.

Furthermore, HexaPDF supports writing custom properties to the metadata stream. For this to work, the XMP namespaces that are used need to be registered using #register_namespace. Additionally, the types of all XMP properties that are used need to be registered using #register_property_type.

The following types for XMP properties are supported:

String

Maps to the XMP simple string value. Values need to be of type String.

Integer

Maps to the XMP integer core value type and gets formatted as string. Values need to be of type Integer.

Date

Maps to the XMP simple string value, correctly formatted. Values need to be of type Time, Date, or DateTime.

URI

Maps to the XMP simple value variant of URI. Values need to be of type String or URI.

Boolean

Maps to the XMP simple string value, correctly formatted. Values need to be either true or false.

OrderedArray

Maps to the XMP ordered array. Values need to be of type Array and items must be XMP simple values.

UnorderedArray

Maps to the XMP unordered array. Values need to be of type Array and items must be XMP simple values.

LanguageArray

Maps to the XMP language alternatives array. Values need to be of type Array and items must either be strings (which are associated with the default language that has been set) or LocalizedString instances.

See: PDF2.0 s14.3, www.adobe.com/products/xmp.html
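
The value requirements listed above can be summarized as a small validation routine. The following is a minimal sketch in plain Ruby and not HexaPDF's implementation: the `SIMPLE_TYPES` table and the `valid_xmp_value?` helper are hypothetical names introduced for illustration, and the LanguageArray type is left out because it additionally involves LocalizedString.

```ruby
require 'date'
require 'uri'

# Hypothetical table: XMP simple type name => check for an acceptable Ruby value.
SIMPLE_TYPES = {
  'String'  => ->(v) { v.is_a?(String) },
  'Integer' => ->(v) { v.is_a?(Integer) },
  'Date'    => ->(v) { v.is_a?(Time) || v.is_a?(Date) }, # DateTime is a subclass of Date
  'URI'     => ->(v) { v.is_a?(String) || v.is_a?(URI::Generic) },
  'Boolean' => ->(v) { v == true || v == false },
}.freeze

# Hypothetical helper: returns true if +value+ is acceptable for the XMP +type+.
# Raises KeyError for type names that are not covered by this sketch.
def valid_xmp_value?(type, value)
  case type
  when 'OrderedArray', 'UnorderedArray'
    # Array items must themselves be simple values.
    value.is_a?(Array) &&
      value.all? {|item| SIMPLE_TYPES.any? {|_, check| check.call(item) } }
  else
    SIMPLE_TYPES.fetch(type).call(value)
  end
end

puts valid_xmp_value?('Integer', 42)
puts valid_xmp_value?('Boolean', nil)
puts valid_xmp_value?('OrderedArray', ['a', 'b'])
```

A check of this kind could be run before assigning a custom property, so that a mismatch between the registered type and the Ruby value is noticed early.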

Defined Under Namespace

Classes: LocalizedString

Constant Summary

PREDEFINED_NAMESPACES =

Contains a mapping of predefined prefixes for XMP namespaces for metadata.

{
  "rdf" => "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "xmp" => "http://ns.adobe.com/xap/1.0/",
  "pdf" => "http://ns.adobe.com/pdf/1.3/",
  "dc" => "http://purl.org/dc/elements/1.1/",
  "x" => "adobe:ns:meta/",
  "pdfaid" => "http://www.aiim.org/pdfa/ns/id/",
}.freeze

PREDEFINED_PROPERTIES =

Contains a mapping of predefined XMP properties to their types, i.e. from namespace to property and then type.

{
  "http://ns.adobe.com/xap/1.0/" => {
    'CreatorTool' => 'String',
    'CreateDate' => 'Date',
    'ModifyDate' => 'Date',
  }.freeze,
  "http://ns.adobe.com/pdf/1.3/" => {
    'Keywords' => 'String',
    'Producer' => 'String',
    'Trapped' => 'Boolean',
  }.freeze,
  "http://purl.org/dc/elements/1.1/" => {
    'creator' => 'OrderedArray',
    'description' => 'LanguageArray',
    'title' => 'LanguageArray',
  }.freeze,
  "http://www.aiim.org/pdfa/ns/id/" => {
    'part' => 'Integer',
    'conformance' => 'String',
  }.freeze,
}.freeze

Constructor Details

#initialize(document) ⇒ Metadata

Creates a new Metadata object for the given PDF document.



# File 'lib/hexapdf/document/metadata.rb', line 158

def initialize(document)
  @document = document
  @namespaces = PREDEFINED_NAMESPACES.dup
  @properties = PREDEFINED_PROPERTIES.transform_values(&:dup)
  @default_language = document.catalog[:Lang] || 'x-default'
  @metadata = Hash.new {|h, k| h[k] = {} }
  write_info_dict(true)
  write_metadata_stream(true)
  @document.register_listener(:complete_objects, &method(:write_metadata))
  
end
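
The constructor copies the frozen constants with `dup` and `transform_values(&:dup)` and uses a hash with a default block for the stored values. The sketch below illustrates, in plain Ruby with hypothetical data (`DEFAULTS` and the example.com URI are made up for this purpose), why a nested frozen table needs the per-value copy before it can be extended per instance.

```ruby
# Stand-in for a frozen, nested table of defaults (hypothetical data).
DEFAULTS = {
  'http://example.com/ns/a/' => {'Name' => 'String'}.freeze,
}.freeze

# A plain dup copies only the outer hash; the inner hashes stay shared and frozen.
shallow = DEFAULTS.dup
# transform_values(&:dup) builds a new outer hash whose inner hashes are unfrozen copies.
per_instance = DEFAULTS.transform_values(&:dup)

# Extending the per-instance copy works and leaves DEFAULTS untouched.
per_instance['http://example.com/ns/a/']['Count'] = 'Integer'

begin
  shallow['http://example.com/ns/a/']['Count'] = 'Integer'
rescue FrozenError => e
  puts "shallow copy is still frozen inside: #{e.class}"
end

# With a default block, an unknown key yields a fresh inner hash that is stored on first access.
store = Hash.new {|h, k| h[k] = {} }
store['ns']['key'] = 'value'
```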

Instance Method Details

#author(value = :UNSET) ⇒ Object

:call-seq:

metadata.author           -> author or nil
metadata.author(value)    -> value

Returns the name of the person who created the document (author) if no argument is given. Otherwise sets the author to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:creator.



# File 'lib/hexapdf/document/metadata.rb', line 297

def author(value = :UNSET)
  property('dc', 'creator', value)
end

#creation_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.creation_date           -> creation_date or nil
metadata.creation_date(value)    -> value

Returns the date and time (a Time object) the document was created if no argument is given. Otherwise sets the creation date to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreateDate.



# File 'lib/hexapdf/document/metadata.rb', line 376

def creation_date(value = :UNSET)
  property('xmp', 'CreateDate', value)
end

#creator(value = :UNSET) ⇒ Object

:call-seq:

metadata.creator           -> creator or nil
metadata.creator(value)    -> value

Returns the name of the PDF processor that created the original document from which this PDF was converted if no argument is given. Otherwise sets the name of the creator tool to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:CreatorTool.



# File 'lib/hexapdf/document/metadata.rb', line 346

def creator(value = :UNSET)
  property('xmp', 'CreatorTool', value)
end

#default_language(value = :UNSET) ⇒ Object

:call-seq:

metadata.default_language          -> language
metadata.default_language(value)   -> value

Returns the default language in RFC3066 format used for unlocalized strings if no argument is given. Otherwise sets the default language to the given language.

The initial default language is taken from the document catalog’s /Lang entry. If that entry is not set, the XMP default language (‘x-default’) is used.



# File 'lib/hexapdf/document/metadata.rb', line 179

def default_language(value = :UNSET)
  if value == :UNSET
    @default_language
  else
    @default_language = value
  end
end

#delete(ns = nil, property = nil) ⇒ Object

:call-seq:

metadata.delete
metadata.delete(ns_prefix)
metadata.delete(ns_prefix, name)

Deletes either all metadata properties, only the ones from a specific namespace, or a specific one.



# File 'lib/hexapdf/document/metadata.rb', line 258

def delete(ns = nil, property = nil)
  if ns.nil? && property.nil?
    @metadata.clear
  elsif property.nil?
    @metadata.delete(namespace(ns))
  else
    @metadata[namespace(ns)].delete(property)
  end
end
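
The three call forms correspond to three operations on a two-level hash. The following plain Ruby sketch uses a stand-in `store` variable (a hypothetical name) keyed by namespace URI, which is how the method body above addresses the data after resolving the prefix.

```ruby
# Stand-in for the internal store: namespace URI => {property name => value}.
store = Hash.new {|h, k| h[k] = {} }
store['http://purl.org/dc/elements/1.1/']['title'] = 'Report'
store['http://purl.org/dc/elements/1.1/']['creator'] = ['Jane']
store['http://ns.adobe.com/pdf/1.3/']['Keywords'] = 'a, b'

# delete(ns_prefix, name): removes one property, the namespace entry stays.
store['http://purl.org/dc/elements/1.1/'].delete('title')
after_single = store['http://purl.org/dc/elements/1.1/'].keys

# delete(ns_prefix): removes the namespace entry together with all its properties.
store.delete('http://ns.adobe.com/pdf/1.3/')
remaining = store.keys

# delete without arguments: removes everything.
store.clear
```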

#keywords(value = :UNSET) ⇒ Object

:call-seq:

metadata.keywords           -> keywords or nil
metadata.keywords(value)    -> value

Returns the keywords associated with the document if no argument is given. Otherwise sets keywords to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Keywords.



# File 'lib/hexapdf/document/metadata.rb', line 330

def keywords(value = :UNSET)
  property('pdf', 'Keywords', value)
end

#modification_date(value = :UNSET) ⇒ Object

:call-seq:

metadata.modification_date           -> modification_date or nil
metadata.modification_date(value)    -> value

Returns the date and time (a Time object) the document was most recently modified if no argument is given. Otherwise sets the modification date to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name xmp:ModifyDate.



# File 'lib/hexapdf/document/metadata.rb', line 391

def modification_date(value = :UNSET)
  property('xmp', 'ModifyDate', value)
end

#namespace(ns) ⇒ Object

Returns the namespace URI associated with the given prefix.



# File 'lib/hexapdf/document/metadata.rb', line 217

def namespace(ns)
  @namespaces.fetch(ns) do
    raise HexaPDF::Error, "Namespace prefix '#{ns}' not registered"
  end
end
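
The lookup relies on `Hash#fetch` with a block, which only runs when the key is missing. A minimal stand-alone sketch of the same idiom follows; `PrefixError`, `NAMESPACES` and `lookup` are hypothetical names standing in for HexaPDF::Error, the instance's namespace table and the method itself.

```ruby
# Hypothetical error class standing in for HexaPDF::Error.
class PrefixError < StandardError; end

NAMESPACES = {'dc' => 'http://purl.org/dc/elements/1.1/'}.freeze

def lookup(prefix)
  # The block is only evaluated for a missing key, so known prefixes return directly.
  NAMESPACES.fetch(prefix) do
    raise PrefixError, "Namespace prefix '#{prefix}' not registered"
  end
end

puts lookup('dc')

begin
  lookup('ex')
rescue PrefixError => e
  puts e.message
end
```

Since the other accessors resolve the prefix through this method, using an unregistered prefix fails loudly instead of silently storing values under nil.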

#producer(value = :UNSET) ⇒ Object

:call-seq:

metadata.producer           -> producer or nil
metadata.producer(value)    -> value

Returns the name of the PDF processor that converted the original document to PDF if no argument is given. Otherwise sets the name of the producer to the given value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Producer.



# File 'lib/hexapdf/document/metadata.rb', line 361

def producer(value = :UNSET)
  property('pdf', 'Producer', value)
end

#property(ns, property, value = :UNSET) ⇒ Object

:call-seq:

metadata.property(ns_prefix, name)           -> property_value
metadata.property(ns_prefix, name, value)    -> value

Returns the value for the property specified via the namespace prefix ns_prefix and name if the value argument is not provided. Otherwise sets the property to value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.



# File 'lib/hexapdf/document/metadata.rb', line 240

def property(ns, property, value = :UNSET)
  ns = @metadata[namespace(ns)]
  if value == :UNSET
    ns[property]
  elsif value.nil?
    ns.delete(property)
  else
    ns[property] = value
  end
end
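
Because nil already means "delete", it cannot also serve as the default that signals "no argument given"; the :UNSET symbol fills that role. The sketch below shows the same three-way dispatch in a stand-alone class. `Settings` and `option` are hypothetical names and not part of HexaPDF.

```ruby
# Hypothetical stand-in class that reuses the sentinel pattern shown above.
class Settings
  def initialize
    @values = {}
  end

  # One method acts as getter, setter and deleter, depending on the argument.
  def option(name, value = :UNSET)
    if value == :UNSET      # no argument given: read
      @values[name]
    elsif value.nil?        # explicit nil: delete
      @values.delete(name)
    else                    # anything else: write
      @values[name] = value
    end
  end
end

settings = Settings.new
settings.option('title', 'Report')   # set
current = settings.option('title')   # get
settings.option('title', nil)        # delete
```

One consequence of this design is that the symbol :UNSET itself cannot be stored as a value, which is rarely a practical limitation.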

#register_namespace(prefix, uri) ⇒ Object

Registers the prefix for the given namespace uri.



# File 'lib/hexapdf/document/metadata.rb', line 212

def register_namespace(prefix, uri)
  @namespaces[prefix] = uri
end

#register_property_type(prefix, property, type) ⇒ Object

Registers the property for the namespace specified via prefix as the given type.

The argument type has to be one of the following: ‘String’, ‘Integer’, ‘Date’, ‘URI’, ‘Boolean’, ‘OrderedArray’, ‘UnorderedArray’, or ‘LanguageArray’.



# File 'lib/hexapdf/document/metadata.rb', line 227

def register_property_type(prefix, property, type)
  (@properties[namespace(prefix)] ||= {})[property] = type
end
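
Registering a custom property therefore takes two steps, in this order: the namespace prefix first, then the property type. The following stand-alone sketch mirrors the two method bodies shown above in a hypothetical `Registry` class; the prefix `ex`, the example.com URI and the property names are assumptions chosen for illustration, and the class uses `Hash#fetch` (raising KeyError) where the original resolves the prefix through #namespace.

```ruby
# Hypothetical stand-in that mirrors the two registration methods.
class Registry
  attr_reader :namespaces, :properties

  def initialize
    @namespaces = {}
    @properties = {}
  end

  def register_namespace(prefix, uri)
    @namespaces[prefix] = uri
  end

  def register_property_type(prefix, property, type)
    # fetch raises KeyError for an unregistered prefix, so the order of calls matters.
    uri = @namespaces.fetch(prefix)
    # ||= creates the per-namespace hash on first use, then the type is stored in it.
    (@properties[uri] ||= {})[property] = type
  end
end

registry = Registry.new
registry.register_namespace('ex', 'http://example.com/ns/1.0/')
registry.register_property_type('ex', 'Reviewed', 'Boolean')
registry.register_property_type('ex', 'Revision', 'Integer')
```

Note that the types are stored per namespace URI rather than per prefix, which matches the layout of PREDEFINED_PROPERTIES.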

#subject(value = :UNSET) ⇒ Object

:call-seq:

metadata.subject           -> subject or nil
metadata.subject(value)    -> value

Returns the subject of the document if no argument is given. Otherwise sets the subject to the given value.

If the value is a LocalizedString, the language for the subject is taken from it. Otherwise the language specified via #default_language is used.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:description.



# File 'lib/hexapdf/document/metadata.rb', line 315

def subject(value = :UNSET)
  property('dc', 'description', value)
end

#title(value = :UNSET) ⇒ Object

:call-seq:

metadata.title           -> title or nil
metadata.title(value)    -> value

Returns the document’s title if no argument is given. Otherwise sets the document’s title to the given value.

If the value is a LocalizedString, the language for the title is taken from it. Otherwise the language specified via #default_language is used.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name dc:title.



# File 'lib/hexapdf/document/metadata.rb', line 282

def title(value = :UNSET)
  property('dc', 'title', value)
end
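
The language rule for titles and subjects can be sketched as follows. This is an illustration under assumptions, not HexaPDF's code: `TaggedString` is a hypothetical stand-in for LocalizedString, modeled here as a String subclass that carries a language tag, and `language_for` is a hypothetical helper.

```ruby
# Hypothetical stand-in for a localized string: a String that also carries a language tag.
class TaggedString < String
  attr_accessor :language
end

# Hypothetical helper: picks the language for a value as described above.
def language_for(value, default_language)
  if value.respond_to?(:language) && value.language
    value.language      # the value brings its own language
  else
    default_language    # plain strings fall back to the default language
  end
end

german = TaggedString.new('Bericht')
german.language = 'de'

puts language_for(german, 'en')     # language taken from the string itself
puts language_for('Report', 'en')   # plain string, so the default language applies
```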

#trapped(value = :UNSET) ⇒ Object

:call-seq:

metadata.trapped           -> trapped or nil
metadata.trapped(value)    -> value

If no argument is given, returns true if the document has been modified to include trapping information. Otherwise sets the trapped status to the given boolean value.

Returns nil if the property is not set. Passing nil as the value deletes the property from the metadata.

This metadata property is represented by the XMP name pdf:Trapped.



# File 'lib/hexapdf/document/metadata.rb', line 406

def trapped(value = :UNSET)
  property('pdf', 'Trapped', value)
end

#write_info_dict(value) ⇒ Object

Makes HexaPDF write the information dictionary if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 195

def write_info_dict(value)
  @write_info_dict = value
end

#write_info_dict? ⇒ Boolean

Returns true if the information dictionary should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 188

def write_info_dict?
  @write_info_dict
end

#write_metadata_stream(value) ⇒ Object

Makes HexaPDF write the metadata stream if value is true.

See the class documentation for caveats.



# File 'lib/hexapdf/document/metadata.rb', line 207

def write_metadata_stream(value)
  @write_metadata_stream = value
end

#write_metadata_stream? ⇒ Boolean

Returns true if the metadata stream should be written.

Returns:

  • (Boolean)


# File 'lib/hexapdf/document/metadata.rb', line 200

def write_metadata_stream?
  @write_metadata_stream
end