IiifPrint
Overview
IiifPrint is a gem (Rails "engine") for Hyrax-based digital repository applications. It supports displaying parent and child works in the same viewer (Universal Viewer) and searching OCR text from a parent work down to its child work(s).
IiifPrint is not a stand-alone application. It is designed to be integrated into a new or existing Hyku (v4.0-v5.0) application. Planned future development includes support for Hyrax-based applications without Hyku, IIIF Presentation API version 3 manifests, and AllinsonFlex metadata profiles.
IiifPrint supports:
- OCR and ALTO creation
- full-text search
- OCR keyword match highlighting
- viewer with page navigation and deep zooming
- splitting of PDFs to LZW compressed TIFFs for viewing
- configuring how the manifest canvases are sorted in the viewer
- adding metadata fields to the manifest with faceted search links and external links
- excluding specified work types from catalog search results
A complete list of features can be found here.
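Keyword match highlighting relies on knowing where each OCR word sits on the page image. The plain-Ruby sketch below is a simplified illustration of that idea. It assumes a word-coordinate JSON layout of the form {"words": [{"word": ..., "coordinates": [x, y, width, height]}]}. That layout is an assumption for illustration and not necessarily the exact format IiifPrint writes. The hypothetical highlight_boxes method returns the boxes a viewer could highlight for a search term.

```ruby
require 'json'

# Hypothetical word-coordinate JSON for one page (assumed layout, illustration only).
SAMPLE_WORD_COORDS = <<~JSON
  {
    "words": [
      { "word": "Harbor", "coordinates": [120, 40, 88, 22] },
      { "word": "opens", "coordinates": [214, 40, 61, 22] },
      { "word": "harbor,", "coordinates": [75, 310, 80, 21] }
    ]
  }
JSON

# Returns the [x, y, width, height] box of every word matching the query,
# ignoring case and any leading/trailing punctuation on the OCR word.
def highlight_boxes(word_coords_json, query)
  target = query.downcase
  JSON.parse(word_coords_json).fetch('words').select do |entry|
    entry['word'].downcase.gsub(/\A[[:punct:]]+|[[:punct:]]+\z/, '') == target
  end.map { |entry| entry['coordinates'] }
end

p highlight_boxes(SAMPLE_WORD_COORDS, 'harbor')
# prints [[120, 40, 88, 22], [75, 310, 80, 21]]
```

Normalizing punctuation matters because OCR engines usually attach trailing commas and periods to the word they follow.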
Documentation
Documents to help you learn about and deploy IiifPrint can be found on the Project Wiki.
IiifPrint was developed against Hyku v4.0-v5.0. If your application uses Bulkrax, please ensure that its version is 5.0.1 or greater.
Requirements
- Ruby >=2.4
- Rails ~>5.0
- Bundler
- Hyrax v2.5-v3.5.0
- ...and the various Samvera dependencies that entails.
- A Hyrax-based Rails application
Dependencies
- FITS
- Tesseract-ocr
- LibreOffice
- ghostscript
- poppler-utils
- ImageMagick
  - The ImageMagick policy XML may need to be more permissive about both the resources and the source media types it allows. See the template policy.xml.
- libcurl3
- libgbm1
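A quick way to confirm the command-line dependencies above are installed is to look for their executables on PATH. The script below is a hypothetical helper, not part of IiifPrint. The command names are common defaults and are assumptions: your packages may install different names (for example, ImageMagick 7 ships magick). libcurl3 and libgbm1 are shared libraries rather than commands, so they are not checked.

```ruby
# Assumed executable names for each dependency; adjust for your platform.
EXPECTED_COMMANDS = {
  'FITS' => 'fits.sh',
  'Tesseract' => 'tesseract',
  'LibreOffice' => 'soffice',
  'ghostscript' => 'gs',
  'poppler-utils' => 'pdftotext',
  'ImageMagick' => 'convert'
}.freeze

# Returns the full path of an executable file named +cmd+ found in the
# given PATH-style string, or nil if none is found.
def which(cmd, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Print a report only when this file is run directly.
if $PROGRAM_NAME == __FILE__
  EXPECTED_COMMANDS.each do |name, cmd|
    puts format('%-14s %-10s %s', name, cmd, which(cmd) || 'NOT FOUND')
  end
end
```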
Installation
IiifPrint integrates with Hyrax-based applications (see Requirements above for supported versions).
- Add gem 'iiif_print' to your Gemfile.
- Run bundle install
- Run rails generate iiif_print:install
- Set config options as indicated below...
Changes made by the installer:
- In app/assets/javascripts/application.js, it adds //= require iiif_print
- Adds app/assets/stylesheets/iiif_print.scss
- In app/controllers/catalog_controller.rb, it adds include BlacklightIiifSearch::Controller
- In app/controllers/catalog_controller.rb, it adds add_index_field and iiif_search config in the configure_blacklight block
- Adds app/models/iiif_search_build.rb
- In config/routes.rb, it adds concern :iiif_search, BlacklightIiifSearch::Routes.new
- In config/routes.rb, it adds concerns :iiif_search in the resources :solr_documents block
- Adds config/initializers/iiif_print.rb
- Adds three migrations: CreateIiifPrintDerivativeAttachments, CreateIiifPrintIngestFileRelations, and CreateIiifPrintPendingRelationships
- In solr/conf/schema.xml, it adds Blacklight IIIF Search autocomplete config
- In solr/conf/solrconfig.xml, it adds Blacklight IIIF Search autocomplete config
- Adds solr/lib/solr-tokenizing_suggester-7.x.jar
(It may be helpful to run git diff after installation to see all the changes made by the installer.)
Configuration to enable IiifPrint features
NOTE: The terms "work type" and "model" are used interchangeably here.
Model level configurations
In app/models/{work_type}.rb, add include IiifPrint.model_configuration to any work type that requires IiifPrint processing features (such as PDF splitting or OCR derivatives). See lib/iiif_print.rb for details on the configuration options.
# Example model Book, which splits PDFs into child works of
# model Page and runs only one derivative service (TIFFs)
class Book < ActiveFedora::Base
  include IiifPrint.model_configuration(
    pdf_split_child_model: Page,
    derivative_service_plugins: [
      IiifPrint::TIFFDerivativeService
    ]
  )
end
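Ruby's include expects a Module, so a call such as include IiifPrint.model_configuration(...) implies a method that builds and returns a module from its keyword options. The plain-Ruby sketch below illustrates only that pattern. ModelConfigSketch, its defaults, and the iiif_print_config reader are hypothetical and do not reproduce IiifPrint's implementation (see lib/iiif_print.rb for that).

```ruby
# Hypothetical stand-in for a gem namespace; not IiifPrint's real code.
module ModelConfigSketch
  DEFAULTS = { pdf_split_child_model: nil, derivative_service_plugins: [] }.freeze

  # Builds an anonymous module that, when included, exposes the merged
  # options on the including class through a class-level reader.
  def self.model_configuration(**options)
    unknown = options.keys - DEFAULTS.keys
    raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?

    settings = DEFAULTS.merge(options).freeze
    Module.new do
      define_singleton_method(:included) do |base|
        base.define_singleton_method(:iiif_print_config) { settings }
      end
    end
  end
end

# Hypothetical work types used only for this illustration.
class Page; end

class Book
  include ModelConfigSketch.model_configuration(pdf_split_child_model: Page)
end

p Book.iiif_print_config[:pdf_split_child_model] # prints Page
```

Building a fresh module per call lets each work type carry its own settings while sharing one configuration method, and rejecting unknown keys catches typos at class-load time.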
Application level configurations
In config/initializers/iiif_print.rb specify application level configuration options.
IiifPrint.config do |config|
  # Add models to be excluded from search so users will not see them in the search results.
  # By default, use the human-readable model names, for example:
  config.excluded_model_name_solr_field_values = ['Generic Work', 'Image']
  # Configure the Solr field key used for this exclusion. The default key is 'human_readable_type_sim';
  # if another key is used, make sure config.excluded_model_name_solr_field_values matches its values.
  config.excluded_model_name_solr_field_key = 'some_solr_field_key'
  # Configure how the manifest sorts its canvases. By default it sorts by `:title`, but a different
  # model property may be desired, such as :date_published.
  config.sort_iiif_manifest_canvases_by = :date_published
end
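The IiifPrint.config block follows a common Ruby configuration pattern: a settings object with defaults is yielded to the block, and any option the block does not set keeps its default. The hypothetical plain-Ruby sketch below shows that pattern, using the defaults this README states ('human_readable_type_sim' and :title). It also shows how a sort key such as :date_published could order canvases. SketchConfig, sketch_config, Canvas, and sort_canvases are made-up names, not IiifPrint's implementation.

```ruby
# Hypothetical settings object mirroring the options documented above.
class SketchConfig
  attr_accessor :excluded_model_name_solr_field_key,
                :excluded_model_name_solr_field_values,
                :sort_iiif_manifest_canvases_by

  def initialize
    # Key and sort defaults as described in this README.
    @excluded_model_name_solr_field_key = 'human_readable_type_sim'
    @sort_iiif_manifest_canvases_by = :title
    # Assumed default: exclude nothing.
    @excluded_model_name_solr_field_values = []
  end
end

# Yields a single, lazily created settings object to an optional block.
def sketch_config
  @sketch_config ||= SketchConfig.new
  yield @sketch_config if block_given?
  @sketch_config
end

# Hypothetical canvas record; a real manifest canvas carries much more.
Canvas = Struct.new(:title, :date_published, keyword_init: true)

# Orders canvases by the configured property, placing nil values last.
def sort_canvases(canvases, config)
  key = config.sort_iiif_manifest_canvases_by
  canvases.sort_by do |canvas|
    value = canvas.public_send(key)
    [value.nil? ? 1 : 0, value.to_s]
  end
end

sketch_config { |c| c.sort_iiif_manifest_canvases_by = :date_published }
pages = [
  Canvas.new(title: 'Page A', date_published: '1901-03-02'),
  Canvas.new(title: 'Page B', date_published: '1899-12-31')
]
p sort_canvases(pages, sketch_config).map(&:title)    # prints ["Page B", "Page A"]
p sort_canvases(pages, SketchConfig.new).map(&:title) # prints ["Page A", "Page B"]
```

Comparing values as strings keeps the sketch simple and works for ISO 8601 dates. A real implementation would need type-aware comparison for other property types.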
To Enable OCR Search (from the UV and catalog search)
catalog_controller.rb
- In the CatalogController, find the add_search_field config block for 'all_fields'. Add advanced_parse: false as seen in the following example:

    config.add_search_field('all_fields', label: 'All Fields', include_in_advanced_search: false, advanced_parse: false) do |field|
      all_names = config.show_fields.values.map(&:field).join(" ")
      title_name = 'title_tesim'
      field.solr_parameters = {
        qf: "#{all_names} file_format_tesim all_text_timv",
        pf: title_name.to_s
      }
    end

- Set config.search_builder_class = IiifPrint::CatalogSearchBuilder to remove child works (those indexed with is_child_bsi: true) from the catalog search results.
- Ensure that all-text search is configured in the default_solr_params config block:

    config.default_solr_params = {
      qt: "search",
      rows: 10,
      qf: "title_tesim description_tesim creator_tesim keyword_tesim all_text_timv"
    }
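Blacklight search builders adjust the outgoing Solr parameters through a chain of processor methods, each receiving the solr_parameters hash. The plain-Ruby sketch below imitates that idea without Blacklight. It appends negative filter queries that drop child works (is_child_bsi:true) and the excluded model names from the application configuration. ChildWorkFilterSketch and apply are hypothetical names, and the exact filters IiifPrint::CatalogSearchBuilder produces may differ.

```ruby
# Hypothetical, Blacklight-free imitation of a search builder step.
class ChildWorkFilterSketch
  def initialize(excluded_field_key: 'human_readable_type_sim', excluded_values: [])
    @excluded_field_key = excluded_field_key
    @excluded_values = excluded_values
  end

  # Mutates and returns the Solr parameter hash, adding negative filter
  # queries (fq) so matching documents are dropped from the results.
  def apply(solr_parameters)
    solr_parameters[:fq] ||= []
    solr_parameters[:fq] << '-is_child_bsi:true'
    @excluded_values.each do |value|
      solr_parameters[:fq] << %(-#{@excluded_field_key}:"#{value}")
    end
    solr_parameters
  end
end

params = ChildWorkFilterSketch.new(excluded_values: ['Generic Work', 'Image']).apply({ q: 'harbor' })
p params[:fq]
# prints ["-is_child_bsi:true", "-human_readable_type_sim:\"Generic Work\"", "-human_readable_type_sim:\"Image\""]
```

Filter queries (fq) are used here rather than changes to q because they restrict the result set without affecting relevance scoring.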
Ingesting Content
IiifPrint supports several ingest workflows:
- single-item ingest via the UI
- batch ingest of works from local files or remote files via Bulkrax
The ingest process is configurable at the model level, granting the option to:
- split a PDF into TIFFs and create child works
- create a full complement of derivatives, including TIFF, JP2, PDF, OCR text, and word-coordinate JSON
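Ghostscript, one of the listed dependencies, is a common tool for splitting a PDF into per-page LZW-compressed TIFFs. The sketch below only assembles a ghostscript argument list and does not run it. The gs_split_command method, the tiff24nc device, the resolution, and the output naming are assumptions for illustration rather than IiifPrint's actual settings.

```ruby
# Builds (but does not execute) a ghostscript command that renders each PDF
# page to its own LZW-compressed 24-bit color TIFF.
# Device, DPI, and file naming are illustrative assumptions.
def gs_split_command(pdf_path, output_dir, dpi: 400)
  basename = File.basename(pdf_path, '.pdf')
  [
    'gs',
    '-dNOPAUSE', '-dBATCH', '-dSAFER', '-q',
    '-sDEVICE=tiff24nc',
    '-sCompression=lzw',
    "-r#{dpi}",
    # Ghostscript expands %04d to the page number, starting at 1.
    "-sOutputFile=#{File.join(output_dir, "#{basename}-%04d.tiff")}",
    pdf_path
  ]
end

cmd = gs_split_command('/tmp/issue.pdf', '/tmp/pages')
puts cmd.join(' ')
# To run it: system(*cmd), or Open3.capture3(*cmd) to capture output.
```

Passing the arguments as an array to system or Open3.capture3 avoids invoking a shell, so file names with spaces or quotes need no escaping.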
Developing, Testing, and Contributing
We develop the IIIF Print gem using Docker and Docker Compose. You'll want to clone this repository and run the following commands:
$ docker compose build
$ docker compose up
$ docker compose exec web bash
You'll now be inside the web container:
$ bundle exec rake
The above builds the test application if it doesn't already exist. During the build you may be notified of conflicting files and asked whether to overwrite them. We recommend that you select the "accept all" option (i.e., typing a).
To rebuild the test application, delete the .internal_test_app directory.
Contributing
If you're working on a PR for this project, create a feature branch off of main.
This repository follows the Samvera Community Code of Conduct and language recommendations. Please do not create a branch called master for this repository or as part of your pull request; the branch will either need to be removed or renamed before it can be considered for inclusion in the code base and history of this repository.
We encourage anyone who is interested in newspapers and Samvera to contribute to this project. How can I contribute?
Acknowledgements
IIIF Print is a gem that was forked from Newspaper Works, a powerful and versatile library for working with digitized newspapers. We would like to thank the team and maintainers of Newspaper Works for creating such a useful and well-designed gem. Our work on IIIF Print would not have been possible without their hard work and dedication.
In particular, we would like to express our gratitude to brianmcbride, seanupton, ebenenglish, and JacobR for their pioneering efforts on Newspaper Works. Their foundation and expertise were invaluable in the development of this gem.
Thank you to the entire Newspaper Works team for creating and maintaining such a valuable resource for the Samvera community.