JRAConvert is full of surprises!

A report by James Rile, PlanetDjVu, November 26, 2002

JRAConvert is a companion application to JRAPublish. The two are complementary and designed to ship together; in fact, each application has built-in links to open the other on demand.

While JRAPublish is an unattended batch-processing application for creating DjVu and processing DjVu metadata, JRAConvert is an interactive wizard application, designed specifically for ad-hoc conversion processing when the need arises.

Let's review the basics of each JRAConvert operation:

Decode DjVu Documents to Images - You can produce raster image files from DjVu files on demand with this operation, in TIFF, multipage TIFF, JPG, and PNG formats.

Extract Text from DjVu Documents - You can extract the recognized text from DjVu files. You can choose whether to store the text in one file per page or one file per document.

Convert or Merge DjVu Documents to BUNDLED format - This converts existing DjVu documents to BUNDLED multipage DjVu format. You can even pack the contents of an entire directory into one DjVu document.

Convert or Merge DjVu Documents to INDIRECT format - This will convert existing DjVu documents to INDIRECT multipage DjVu format. You have full control over the naming conventions that are used to create INDIRECT format DjVu files.

Split multipage DjVu Documents into Single Pages - This option lets you split multipage DjVu documents into independent pages. If a DjVu document contains shared dictionaries, each page will receive a copy of the dictionary containing just the information it needs.

Prepare DjVu Documents for Page-level Indexing - This converts existing DjVu documents to INDIRECT format, optionally renaming individual page files and populating page-level metadata.

Decoding Image Files from DjVu

This operation can be performed for a single DjVu file or an entire folder of DjVu files. If the DjVu page is a segmented image, the layers are merged together before the image file is produced. You have full control over the naming conventions used for the image files.

If the DjVu files were created from bitonal scans, then the decoded TIFF files will be every bit as good as the TIFF files from which the DjVu was created in the first place! This feature gives you the option of discarding the original TIFF files, knowing you can "recover" them on demand.
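For readers who want to script a comparable decode step outside the wizard, here is a minimal sketch. It assumes the open-source DjVuLibre `ddjvu` tool as a stand-in decoder, which is not part of JRAConvert. The function name `build_decode_commands` and the `{stem}_{page:04d}.tif` naming pattern are hypothetical, chosen only to illustrate a per-page naming convention.

```python
from pathlib import Path


def build_decode_commands(djvu_path, page_count, out_dir,
                          pattern="{stem}_{page:04d}.tif"):
    """Return one ddjvu command (as an argument list) per page.

    Assumption: DjVuLibre's `ddjvu` is used as a stand-in decoder; this is
    not how JRAConvert works internally. `pattern` is a hypothetical naming
    convention with `stem` and `page` placeholders.
    """
    djvu_path = Path(djvu_path)
    out_dir = Path(out_dir)
    commands = []
    for page in range(1, page_count + 1):
        out_name = pattern.format(stem=djvu_path.stem, page=page)
        commands.append([
            "ddjvu",
            "-format=tiff",      # decode to TIFF
            f"-page={page}",     # one page per output file
            str(djvu_path),
            str(out_dir / out_name),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in build_decode_commands("report.djvu", 3, "images"):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` once you have confirmed that `ddjvu` is installed and that the page count matches the document.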

Extracting Text from DjVu Documents

The searchable text that was created during OCR of the DjVu can be extracted and saved to external text files. This can be done for a single DjVu file or an entire folder of DjVu files. The text file format can be UTF-16 or UTF-8, meaning that text in any language can be accurately extracted. You may need this feature if you want to import recognized text into a database. It is also a great way to check the accuracy of OCR.
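The two storage choices (one file per page or one file per document) and the two encodings can be illustrated with a short sketch. It assumes the page texts have already been extracted into a list of strings. The function name `save_extracted_text`, the file naming, and the form-feed page separator are all hypothetical and do not describe JRAConvert's actual output.

```python
from pathlib import Path


def save_extracted_text(pages, out_dir, stem, per_page=True, encoding="utf-8"):
    """Write already-extracted page texts to disk.

    `pages` is a list of strings, one per DjVu page. All names here are
    hypothetical; the text would come from whatever extractor you use.
    """
    if encoding not in ("utf-8", "utf-16"):
        raise ValueError("encoding must be 'utf-8' or 'utf-16'")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    if per_page:
        # One text file per page, numbered from 1.
        for number, text in enumerate(pages, start=1):
            target = out_dir / f"{stem}_{number:04d}.txt"
            target.write_text(text, encoding=encoding)
            written.append(target)
    else:
        # One text file per document; a form feed between pages keeps
        # the page boundaries recoverable (an assumption of this sketch).
        target = out_dir / f"{stem}.txt"
        target.write_text("\f".join(pages), encoding=encoding)
        written.append(target)
    return written
```

Per-page files suit a database import keyed by page number, while a single file per document is easier to skim when checking OCR accuracy.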

Converting and Merging DjVu Documents

This is a great tool for switching between DjVu Bundled and DjVu Indirect formats. It is also great for merging single-page DjVu files into a multipage DjVu document. Unlike similar functionality in the Command Line Encoder (now called Document Express Enterprise Edition), there are no memory or temp file constraints. So whether you are merging 50 files or 5,000, you will have absolutely no problem! You have full control over the naming conventions used in the output files.
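When merging a folder of single-page files, page order matters: a plain alphabetical sort places "page10" before "page2". The sketch below shows one way to order the files naturally and build a merge command. It assumes DjVuLibre's `djvm -c` as a stand-in for the merge step, which is not JRAConvert's own mechanism, and the helper names `natural_key` and `build_merge_command` are hypothetical.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split digits from text so 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def build_merge_command(page_files, output):
    """Return a `djvm -c` argument list that bundles pages in natural order.

    Assumption: DjVuLibre's `djvm` stands in for the merge step; this is
    not a description of how JRAConvert merges files.
    """
    ordered = sorted((Path(p) for p in page_files),
                     key=lambda p: natural_key(p.name))
    return ["djvm", "-c", str(output)] + [str(p) for p in ordered]


if __name__ == "__main__":
    print(build_merge_command(["p10.djvu", "p2.djvu", "p1.djvu"], "book.djvu"))
```

With thousands of input files, a single command line built this way may exceed the operating system's argument-length limit, so a scripted merge might need to work in batches.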

Splitting Multipage DjVu Files into Single Pages

This is a great way to "unbind" multipage DjVu files. You can reorder or reorganize the single pages, then combine them into new multipage DjVu documents.

Prepare DjVu Documents for Page-Level Indexing

This feature lets you prepare DjVu for page-level indexing in the SearchPDF product (formerly called DjVuSearch). Document-level metadata can be automatically distributed to the page level. With page-level indexing, you can search the individual pages of newspapers and magazines, for example, yet when you open a single page for viewing, you can navigate to the other pages of the issue. This is not possible with PDF files, as the PDF file architecture does not support page-level metadata.

Page-level indexing and embedded metadata at both the document and page levels are fully supported by JRAPublish, JRAConvert and SearchPDF. Embedded metadata is stored in DjVu files using standard XML syntax (tags).
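To make the idea of distributing document-level metadata to the page level concrete, here is a minimal sketch. The element names (`metadata`, `field`) and the added `page` and `pages` keys are hypothetical. They illustrate the concept only and are not the schema used by JRAPublish, JRAConvert, or SearchPDF.

```python
import xml.etree.ElementTree as ET


def page_metadata_xml(doc_meta, page_number, page_count):
    """Build a page-level metadata record that inherits document fields.

    The element names and the `page`/`pages` keys are hypothetical; they
    are not the schema used by JRAPublish or JRAConvert.
    """
    root = ET.Element("metadata")
    merged = dict(doc_meta)             # copy the document-level fields
    merged["page"] = str(page_number)   # then add page-specific fields
    merged["pages"] = str(page_count)
    for name, value in merged.items():
        field = ET.SubElement(root, "field", name=name)
        field.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(page_metadata_xml({"title": "Daily News", "date": "2002-11-26"}, 3, 12))
```

Because each page record carries the document fields as well as its own page number, a search hit on one page can still be traced back to the issue it belongs to.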

Summary

JRAConvert is a wizard application that is bundled with the JRAPublish application. When you have ad-hoc conversion requirements, JRAConvert is just a click away, rounding out the JRAPublish product as the most advanced DjVu Encoding and DjVu Metadata Management application ever produced!

JRAConvert Help File (PDF)