What is "DjVu Imager" and what is it for?
DjVu Imager v2.9 (1.17 MB) is a program for DjVu-encoding scans that contain halftone illustrations and photos. It is freeware distributed under the "GPL 2 and later" license, and its complete source code (73 KB) is available.
DjVu Imager has both Russian and English interfaces.
The program has a history of its own. Long ago people noticed that plain text was easy to DjVu-encode, while illustrated scans were a serious problem.
Here are some examples of illustrated scans:
Fig. 1. An illustrated hardware catalogue.
Fig. 2. A computer handbook with images of dialog boxes.
The reason is that, in terms of DjVu encoding, text and illustrations are very different things. They are even encoded in opposite ways.
Text is encoded with maximum sharpness, whereas illustrations are intentionally blurred as much as possible. So every time one had to create a DjVu file, the question arose: how exactly should the scan be encoded?
At that time there were only 3 choices:
1. Encode the scan in the "text mode" (DjVuBitonal).
2. Encode the scan in the "illustrations mode" (DjVuPhoto).
3. Encode the scan with automatic segmentation (an auto-created DjVuDocument).
Let's consider a particular scan as an example:
Fig. 3. The original scan with a halftone illustration (taken from the U235 site).
The first method (the "text mode") had 2 flaws: the generated DjVu files were too big, and the illustration lost its halftones (and looked like a medieval engraving):
Fig. 4. The original scan encoded in the "text mode".
The second method (the "illustrations mode") was far worse. It left the text around the image badly blurred and unreadable:
Fig. 5. The original scan encoded in the "illustrations mode".
The third method (automatic segmentation) was bad too: it produced artifacts over the image, because the automatic segmenter mistakenly tried to find text-like elements in it:
Fig. 6. The original scan encoded with automatic segmentation.
In figure 6 you can clearly see the erroneous dot-like artifacts. Usually, though, such artifacts look much worse, like coarse monochrome strands or scraps.
IMPORTANT: These auto-segmentation errors gave rise to the widespread misconceptions that "the DjVu format spoils the images" and "DjVu should not be used, only PDF is good".
Fortunately, a solution to the problem was eventually found. A person named manfred suggested the so-called "Separated scans method", the fourth one.
The idea of the method is to separate the illustration from the text (into a separate file) prior to DjVu encoding.
Then both files (which I call "subscans") are DjVu-encoded separately, each with the method that suits it best. After that the 2 resulting DjVu files are merged using a special technique:
Fig. 7. The text separated from the illustrated scan (according to the "Separated scans method"). This is the so-called "foreground subscan".
Fig. 8. The illustration separated from the illustrated scan (according to the "Separated scans method"). This is the so-called "background subscan".
This approach completely solves the problem of encoding illustrated scans:
Fig. 9. A DjVu file encoded by the "Separated scans method".
In figure 9 you can see that the "Separated scans method" both keeps the text sharp and preserves the halftones of the image, with no artifacts on it. The size of the resulting DjVu file is optimal too.
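To make the idea of the separation concrete, here is a minimal Python sketch that works on a tiny greyscale "scan". It is only an illustration under simplifying assumptions: the rectangle format of the picture zones, the names split_scan and WHITE, and the use of plain nested lists are all hypothetical, and this is not the actual code of any of the programs discussed here. Pixels inside a picture zone go to the background subscan, all the others stay in the foreground subscan, and the vacated pixels are filled with white.

```python
WHITE = 255  # blank paper in an 8-bit greyscale scan (assumption)


def split_scan(pixels, zones):
    """Split a greyscale scan into (foreground, background) subscans.

    pixels: list of rows, each row a list of grey values 0..255.
    zones:  list of (left, top, right, bottom) picture rectangles,
            with right and bottom exclusive (hypothetical format).
    """
    def in_zone(x, y):
        return any(left <= x < right and top <= y < bottom
                   for left, top, right, bottom in zones)

    # Foreground subscan: the text only, picture zones whitened out.
    foreground = [[WHITE if in_zone(x, y) else value
                   for x, value in enumerate(row)]
                  for y, row in enumerate(pixels)]
    # Background subscan: the pictures only, everything else whitened out.
    background = [[value if in_zone(x, y) else WHITE
                   for x, value in enumerate(row)]
                  for y, row in enumerate(pixels)]
    return foreground, background


scan = [
    [0,   0,   0, 0],
    [0, 100, 120, 0],
    [0, 140, 160, 0],
]
fg, bg = split_scan(scan, [(1, 1, 3, 3)])
```

Both subscans keep the size of the original scan, so that the two DjVu files made from them can later be put together page by page. The sketch does not binarize the foreground; it only shows which pixels end up in which subscan.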
Since its discovery, the separated scans method has gone through various technical transformations. They finally led to the creation of DjVu Imager, which implements the method in a practical way.
The outline of DjVu Imager usage looks like this:
First, in Scan Tailor Featured, the user separates the illustrated scans into the "text" (foreground subscan) and the "illustrations" (background subscan), as shown in figures 7 and 8.
"Foreground" means "the prototype of the future DjVu foreground layer"; "background" likewise means the prototype of the background layer.
Next, the foreground subscans are encoded to DjVu with DjVu Small.
Finally, DjVu Imager encodes the background subscans to DjVu and immediately "pastes" them into the textual DjVu (obtained from DjVu Small).
Let's go through the whole separated scans process using a greyscale scan (the most typical case) as an example.
1. Load the raw book scans into Scan Tailor Featured (in greyscale or color mode, not black-and-white). Process them as usual up to the "Output" stage. If you are new to Scan Tailor, read its instructions first.
2. At the "Output" stage, process the illustrated scans in a special way:
a. If the current scan is illustrated, switch the Mode to Mixed. If the scan has only an illustration and no text, set Color / Grayscale. If it has only text and no illustrations, set Black and White.
Fig. 10. Set the Mixed mode for the current scan in Scan Tailor.
As a result, the current scan will be reprocessed and the program will automatically detect the illustration zones and mark them on it.
IMPORTANT: Note that the Mode parameter is page-dependent! Every page can have its own mode (one of the 3 possible). A common mistake is to think that if the very first scan is set to, say, Black and White, then all the others "get the same mode". In general this is not true.
b. Switch to the Picture Zones tab. The auto-recognized picture zones are highlighted with a pulsating violet light:
Fig. 11. A picture zone highlighted (in Scan Tailor) with a pulsating violet light.
c. If the picture zones are auto-recognized inaccurately, correct them by hand by adding manual zones.
3. When the scans are processed, you must export the output scans. In the Tools menu choose Export... and the Export window will open:
Fig. 12. The output scans export window in Scan Tailor Featured.
Exporting is the way to get the output scans renamed to a consecutive name sequence: 0001.tif, 0002.tif, ..., 0010.tif, ..., 0100.tif, ...
The issue is that Scan Tailor names its output files in a rather odd way that reflects the original filenames and the original position of the page (left or right).
For the upcoming DjVu encoding, however, the output must be renamed to a consecutive sequence, and this is important.
Default export folder:
If you set this checkbox, the program ignores any user-set export folder. It automatically creates the folder "export" inside the "out" folder and exports the output scans there.
If you don't set this checkbox, you can choose the export folder yourself. Note that in this case the program also auto-creates the "export" folder, but this time inside the folder you chose.
So the output scans are exported to an "export" folder in either case.
Split mixed output:
If you set this checkbox, the output scans are split into pairs of foreground and background subscans during the export.
At the start of the export the program automatically creates the subfolders "1" and "2" inside the "export" folder.
The foreground subscans are placed into "1", and the background ones (with the same names) into "2".
If you don't set this checkbox, no splitting occurs and the export simply copies the files and renames them to the consecutive sequence.
It is recommended to set both checkboxes, "Default export folder" and "Split mixed output" (as shown in Fig. 12).
Press the "Export" button and wait for the export to finish.
4. You are now done with Scan Tailor Featured. Close the program (save the current project in case you need it later).
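The folder layout produced by the export in step 3 can be summarized with a short Python sketch. It only computes where each file would go and does not touch the disk. The function name plan_export, its parameters and the sample input filenames are hypothetical. The sketch also makes two simplifying assumptions: alphabetical order stands in for the page order of the project, and every page is treated as a mixed one that yields a pair of subscans.

```python
from pathlib import PurePosixPath


def plan_export(output_files, out_dir, chosen_dir=None, split_mixed=True):
    """Return {original name: [destination paths]} for an export run.

    chosen_dir=None models the "Default export folder" checkbox being set:
    the "export" folder is then created inside the "out" folder.
    """
    base = PurePosixPath(chosen_dir if chosen_dir else out_dir) / "export"
    plan = {}
    # Assumption: sorted order is used here in place of the project page order.
    for index, name in enumerate(sorted(output_files), start=1):
        new_name = f"{index:04d}.tif"  # 0001.tif, 0002.tif, ...
        if split_mixed:
            # Foreground subscan goes to "1", same-named background to "2".
            plan[name] = [str(base / "1" / new_name),
                          str(base / "2" / new_name)]
        else:
            # No splitting: plain copy under the new consecutive name.
            plan[name] = [str(base / new_name)]
    return plan


# Hypothetical output names, with both checkboxes set.
plan = plan_export(["page_b_2R.tif", "page_a_1L.tif"], "book/out")
```

With both checkboxes set, the foreground and background subscans of a page get the same consecutive name and differ only in the subfolder ("1" or "2"), which is what lets the later steps match them by number.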
In this context DjVu Small v0.4 is used only to DjVu-encode the foreground subscans.
Load the "1" folder into the program, set the "User BW" profile and press the Convert button. That's it: just wait until the encoding finishes and you are done with DjVu Small.
What you get from DjVu Small is your DjVu book, but without the illustrations yet. The illustrations are added later with DjVu Imager.
DjVu Imager creates the DjVu illustrations from the background subscans (produced by Scan Tailor Featured) and pastes them into the DjVu Small output.
Using DjVu Imager looks like this:
1. Press the Options button and set the Custom Filename checkbox. This is an outdated legacy of the times when DjVu Imager served only ScanKromsator. Now the Custom Filename checkbox should always be set.
2. Load the "2" folder into the program.
3. In the "#" column the program automatically fills in the "numbers" of the loaded scans (derived from the scan filenames). Each number is the number of the page (inside the black-and-white multipage DjVu from DjVu Small) into which this scan will be pasted as an illustration.
If needed, you can change this number by double-clicking it (i.e. the corresponding cell of the "#" column) and entering your own value.
Fig. 13. Double-click the file number (in the red circle) and enter your page number (if you want to change it). This changes the page inside the black-and-white multipage DjVu from DjVu Small into which this scan will be pasted as an illustration.
IMPORTANT: It is strongly recommended to ALWAYS load the background scans into DjVu Imager named in a consecutive sequence: 0001.tif, 0002.tif, 0003.tif, ..., 0010.tif, ..., 0100.tif, ... That is, the filenames should be such that they can easily be converted into numbers.
4. Set the encoding parameters. There are 2 of them: BSF (the background subsample factor) and the Background quality. The recommended BSF value is from 2 to 4.
BSF (Background Subsample Factor):
The ratio of the stored pixel dimensions of the foreground layer to those of the background layer (in a DjVu file). It ranges from 1 to 12, i.e. the background layer may be stored in a DjVu file downsampled by a factor of 1 to 12. Any DjVu viewer scales it back to the original size when you open the file. BSF is the most powerful way to cut the size of the resulting DjVu.
Background quality:
Sets the -slice option of the c44.exe utility according to the following formula:
- No more than 4 chunks are ever used (although up to 15 chunks are possible).
- The first chunk is always equal to 70.
- The current slider value is added to the first chunk value. So the real range of the quality setting is from 70 to 120 (although the maximum is 210, which does not make sense since the quality does not grow after 120).
- The current slider value is distributed evenly among the remaining 3 chunks: the value is divided by 16 with a remainder, and the resulting values (including the remainder) are assigned to the 3 chunks. If the value is greater than 48, the second chunk is increased by 1 or 2 (to avoid introducing a 5th chunk).
I recommend playing only with BSF and leaving the Background quality alone (because changing the latter makes almost no sense).
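To get a feeling for what BSF does, here is a small Python sketch that computes the stored size of the background layer for a given page size. The function names background_size and pixel_reduction are hypothetical, and rounding the downsampled dimensions up is an assumption of this sketch, not something taken from DjVu Imager itself.

```python
import math


def background_size(width, height, bsf):
    """Stored background layer size (in pixels) for a page of width x height."""
    if not 1 <= bsf <= 12:
        raise ValueError("BSF must be between 1 and 12")
    # Assumption: partial pixels at the right and bottom edges are rounded up.
    return math.ceil(width / bsf), math.ceil(height / bsf)


def pixel_reduction(width, height, bsf):
    """How many times fewer background pixels are stored than at BSF = 1."""
    bg_width, bg_height = background_size(width, height, bsf)
    return (width * height) / (bg_width * bg_height)


size_at_3 = background_size(2480, 3508, 3)    # (827, 1170)
reduction_at_4 = pixel_reduction(2400, 3600, 4)  # 16.0
```

The number of stored background pixels falls roughly with the square of BSF, which is why even the recommended values of 2 to 4 have a strong effect. Note that this is the reduction in pixels, not in file size: the file size also depends on the image content and on the Background quality.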
5. You can set the parameter values individually for every background subscan. To do so, click the desired item in the loaded file list, choose the Current file selector and set the desired values for BSF and Background quality. As a result the current file will be shown in bold in the file list.
To reset the individual settings use the C button: pressing it returns all the files to the common settings.
6. Press the Convert button and wait. DjVu Imager will encode every background subscan in the DjVuPhoto mode. The result is a set of single-page DjVuPhoto files.
Immediately after the encoding the program creates a temporary multipage DjVu consisting of all the created DjVuPhoto files and opens it in your current DjVu viewer (the one associated with DjVu files).
This is just for your convenience, so that you can look through all the created DjVu illustrations at once.
7. To fine-tune the individual encoding settings of different list items interactively, use the Current button. It encodes the current list item with its current parameters (regardless of whether it is an "individual" or a "common" file) and opens it right away in your DjVu viewer.
8. By adjusting the encoding values (BSF and Background quality), either for all the files at once or individually for some of them, and watching the result in the DjVu viewer, achieve the best quality/size ratio for the resulting set of DjVuPhoto files.
For your convenience the current total size of all the created DjVuPhoto files is displayed between the View and Current buttons; it is updated automatically after each DjVu encoding (by the Convert and Current buttons).
9. Press the source button and, in the window that opens, choose the black-and-white multipage DjVu from DjVu Small (by default it is the "DjVu Encoded.djvu" file on the Desktop). Press the Insert in DjVu button. The program will copy the source black-and-white multipage DjVu to the destination folder (with the "out" suffix by default) and will automatically paste all the available DjVuPhoto files into the corresponding pages of the black-and-white DjVu, guided by the page numbers.
The result is your DjVu book, filled with illustrations and completely ready for use.
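The role of the page numbers in steps 3 and 9 can be shown with a short Python sketch. It turns filenames such as 0012.tif into page numbers and checks that every illustration has a page to be pasted into. This is only a sketch of the principle: the names page_number and plan_insertion are hypothetical, and the checks are my own additions, not a description of what DjVu Imager does internally.

```python
from pathlib import PurePath


def page_number(filename):
    """Read the page number from a name such as '0012.tif'."""
    stem = PurePath(filename).stem
    if not stem.isdecimal():
        raise ValueError(f"cannot read a page number from {filename!r}")
    return int(stem)


def plan_insertion(background_files, page_count):
    """Map page number -> background file for a book of page_count pages."""
    plan = {}
    for name in background_files:
        number = page_number(name)
        if not 1 <= number <= page_count:
            raise ValueError(f"{name!r} points to page {number}, "
                             f"but the book has only {page_count} pages")
        if number in plan:
            raise ValueError(f"page {number} is claimed by two files")
        plan[number] = name
    return plan


# Two illustrations for a 10-page black-and-white DjVu.
plan = plan_insertion(["0003.tif", "0010.tif"], page_count=10)
```

A name like cover.tif cannot be converted into a number at all, which is exactly why the consecutive naming 0001.tif, 0002.tif, ... is recommended above.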
Instead of Scan Tailor Featured you can use ScanKromsator like this:
However, this is less recommended because ScanKromsator v5.92 cannot auto-recognize illustration zones.
The latest ScanKromsator versions implement the Separated scans method entirely internally (the whole DjVu creation cycle takes place inside ScanKromsator), but because of numerous bugs this feature is considered practically unusable.
Author: monday2000.
19 February 2013
E-Mail (monday2000 [at] yandex.ru)