The MRC (Mixed Raster Content) Model and DjVu
a paper by PlanetDjVu, June 10, 2003
MRC 3-layer model
The MRC 3-layer model was developed initially as a way to compress color scanned page images for fax transmission. It was invisaged that color fax would, or could replace bitonal fax transmission, and that by compressing color page images sufficiently, the use of color faxes would become standard both for phone-line transmission and for web transmission.
The commercial implementation of color fax never happened (Adobe and Xerox could not agree on IP issues), but the MRC 3-layer model was implemented in a number of digital document formats for display and printing, both on the web and on the desktop.
These formats include .tfx (TIFF-FX), .ldx (LuraDocument), and .djvu (DjVu).
The 3 layers of the MRC model are Foreground and Background, which are both multi-level, and an Image Mask, which is bi-level. Each layer may appear only once on a page and is encoded independently of the other two. It is the Image Mask which merges the Foreground and Background layers together for display and printing.
The Extension of the MRC 3-layer model from fax to digital document
Both the LuraDocument and DjVu formats took the first step towards the implementation of the MRC 3-layer model for digital documents by producing a file type definition and viewers that support multipage files (MRC by itself only defines a single image or page).
Page navigation was added as a feature, including thumbnails, and DjVu went a step further by establishing two methods of multipage file storage, Bundled and Indirect.
Before AT&T sold the DjVu file format to LizardTech, the additional elements of a searchable text layer and hyperlinks were added to DjVu, making DjVu the most advanced MRC implementation of a digital document. LizardTech added annotations to DjVu, but has neglected to add document metadata and bookmarks, even after JRA offered a complete specification for these digital document features to LizardTech.
LuraDocument has not added a searchable text layer.
PDF can now be rendered using a limited MRC 3-layer model
Acrobat 6, released just two weeks ago, offers a new form of compression called Adaptive Capture. It follows the MRC 3-layer model by creating background and foreground layers, and creating a limited image mask (the transparancy part). The results are remarkable. The file size of Adaptive Compressed PDF compares very favorably to the best that DjVu can offer.
Here is an initial comparison of DjVu and PDF using the new Adaptive Compression. The PDF files can be viewed with either Acrobat 5 or 6 (they are PDF 1.4 files). Both formats were produced from 300 dpi color images except for the Newspaper Cover Page and the Wentworth News Cover DjVu pages, which are DjVuDigital. We have extended this study with examples produced with the updated Silx encoder (DgpEncode.exe), and Silx examples that were "re-fried" with the Acrobat 6 PDF Optimizer, in order to convert the JPEG portion of the pages to JPEG2000 in PDF 1.5 format.
* Acrobat 6 Reader, Standard or Professional required for viewing
Click here for more background information on the MRC 3-layer model.
Click here for an interesting technical paper on what is next for the MRC 3-layer model.
Click here for another interesting background paper on the MRC 3-layer model from PARC.
Samples including OCRed Text
|
||||||||||||||||||||||||||||||||||||||||
Conclusions
The OCRed text produced with ABBYY FineReader is remarkably better than the text produced by Acrobat 6. On page 2 of the Watershed Brochure, the ABBYY text is almost 100% correct, while Acrobat 6 failed to recognize any text on this page (too small for it to handle).
The image segmentation produced by Silx compression is remarkably better than the image segmentation produced by Acrobat 6 PDF Optimizer. Compare both yourself and I think you will agree.
The optimum file will be one that is OCRed by ABBYY FineReader (better text recognition), and then adaptively compressed using Silx Adaptive Compression (superior compression method).
Roxy Music Comparison
Conclusions
The DjVu and Acrobat 6 Adaptive Compression files are small in size because a "Clean" segmentation profile was used that did not place the page background into the color layer of the page.
The Silx Adaptive Compression files are very large because much of the page background is in the color layer of the file.
For document images of this type, it is undesirable to put the page background into the color layer. For Silx Adaptive Compression, we used the default settings of DgpEncode.exe. What is needed to make the Silx Adaptive Compression produce a small PDF file in this case is a "clean" profile - profile settings that you select when you want to force more of the page background to white. This is also the cure for color page images where the text from the opposite side can be seen in the page background (back-side bleedthrough).
Adobe Acrobat 6 PDF Optimizer does not have any control settings for segmentation. Only the default setting is available. The documentation for Acrobat mentions that is is "optimized for 300 dpi color images".
The PDF Optimizer did well with the agressive approach in this case, but consider the Watershed Brochure example above. In this document, which was printed with a light gray background, it is desirable to keep the background color - not force it to white. In this case the DgpEncode.exe default settings worked to produce a better file than PDF Optimizer, and this is because the light gray background of the page was retained.
Clearly there is a need for multiple "adaptive" or "segmentation" Profiles in both products A standard set, just like those that were implemented for DjVu originally by Patrick Haffner of AT&T Labs.
|