Deskewing with JRAPublish 2.0 & QuickScan Pro 3.0
a report by PlanetDjVu, February 14, 2004

Introduction
JRAPublish 2.0 features a new deskew option that operates on bitonal, grayscale and color images. It runs as an image pre-process during document encoding.

Deskewing is available as a pre-process option in JRAPublish 2.0

Deskewing is the process of straightening an image to its originally intended orientation.

How does an image become skewed? There are several causes. The first introduction of skew may occur when the original page image is printed, if the paper was not properly aligned during the print job. The second introduction of skew may occur if the page was photocopied, and the page was not properly aligned in the photocopier. The third introduction of skew may occur if the page was not properly aligned as it was digitally scanned.

It is almost inevitable that document page images will contain some measure of skew, however small.

Why do we want to deskew images? Even a small measure of skew is evident to the human eye and disturbing to the viewer, because we always see the contents of a page in relation to, and framed by, the edges of the page. Another reason is that text lines that are true horizontal (or vertical) OCR better.

How Skew Is Detected
Skew can be detected either by the text lines on the page, or by the graphic objects on the page, or both. JRAPublish 2.0 detects skew based on both, but heavily favors skew recognition based on text lines.

Skew can also be determined by a vertical analysis of the page, by a horizontal analysis of a page, or both. The JRAPublish deskew feature will use either vertical or horizontal analysis, whichever orientation contains more lines of text.
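
This report does not describe how JRAPublish measures skew internally. As a rough illustration of detection based on text lines, the following Python sketch uses a generic projection-profile search: it tries a range of candidate angles and keeps the one that packs the ink pixels into the fewest rows. The function names, the list-of-pixels input, the 10 degree search range and the 0.25 degree step are all assumptions made for this example. It covers horizontal analysis only.

```python
import math

def profile_score(points, angle_deg):
    """Score how well ink pixels line up in rows after rotating by angle_deg.

    points is a list of (x, y) ink-pixel coordinates. The score is the sum of
    squared row counts, which peaks when text lines fall into few rows.
    """
    t = math.radians(angle_deg)
    sin_t, cos_t = math.sin(t), math.cos(t)
    counts = {}
    for x, y in points:
        row = round(y * cos_t - x * sin_t)  # row index after rotation
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(points, max_angle=10.0, step=0.25):
    """Return the candidate angle (degrees) whose row profile is sharpest."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))

# Synthetic page (hypothetical data): three text lines drawn with a 3 degree slope.
slope = math.tan(math.radians(3.0))
page = [(x, round(y0 + x * slope)) for y0 in (20, 40, 60) for x in range(200)]
estimated = estimate_skew(page)
```

On the synthetic page, the estimate should land close to the 3 degree slope that was drawn in. A vertical analysis could be sketched the same way by scoring columns instead of rows.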

Page Image Size After Deskew
The process of deskewing a page, by definition, always increases the size of the page image. A simple way to understand why is to pick up two pieces of paper, one on top of the other. Now slightly skew the top page. If you were to draw a rectangle around both pages now, it would have to be larger than the original paper size. It is a geometric law.
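
The two-sheets experiment can also be worked out numerically. The sketch below computes the smallest upright rectangle that encloses a page rotated by a given angle, using the standard bounding-box formulas for a rotated rectangle. The function name and the sample page size (2550 x 3300 pixels, a letter page at 300 dpi) are assumptions chosen for illustration.

```python
import math

def bounding_box_after_rotation(width, height, angle_deg):
    """Size in pixels of the upright box enclosing a width x height page
    rotated by angle_deg (valid for angles between -90 and 90 degrees)."""
    t = math.radians(abs(angle_deg))
    new_w = width * math.cos(t) + height * math.sin(t)
    new_h = width * math.sin(t) + height * math.cos(t)
    return math.ceil(new_w), math.ceil(new_h)

# A letter page scanned at 300 dpi, straightened by 2 degrees.
grown = bounding_box_after_rotation(2550, 3300, 2.0)
```

For this sample page, a 2 degree correction grows the enclosing rectangle to roughly 2664 x 3387 pixels, about 114 pixels wider and 87 pixels taller than the original.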

Deskew software, in order not to lose any part of the original page image, increases the image size. This is great in principle, but in practice there is a problem. We want the deskewed page size to be the same as the original because it adds consistency when viewing a multi-page document, and most importantly, it will then always print correctly.

For the sake of practicality, JRAPublish crops the deskewed image back to the original image size. Most of the time, the edges of a page image contain just margins (white space), and a little of it can be cropped away without any harm. You can turn off this option in JRAPublish if you wish.
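
The report does not say how JRAPublish positions the crop. Assuming the crop is centered, so that an equal amount of margin is trimmed from opposite edges, the crop rectangle could be computed as in this sketch. The function name and the sample sizes are hypothetical.

```python
def center_crop_box(deskewed_w, deskewed_h, orig_w, orig_h):
    """Return (left, top, right, bottom) of a centered crop that restores
    the original page size from a larger deskewed image."""
    left = (deskewed_w - orig_w) // 2
    top = (deskewed_h - orig_h) // 2
    return left, top, left + orig_w, top + orig_h

# Example: a deskewed image of 2664 x 3387 pixels cropped back to 2550 x 3300.
box = center_crop_box(2664, 3387, 2550, 3300)
```

In this example, 57 pixels come off the left edge and 43 off the top, with the remainder trimmed from the right and bottom. That much is usually lost from margin white space only.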

Color Fill
When the page image is deskewed, it creates blank, narrow triangular areas around each edge that have to be filled with color information. JRAPublish deskew always fills with a white color. Other deskew software sometimes will give you a choice of white or black. In the future, we hope to implement adjacent color fill, so the fill areas blend nicely into a color page image that may have color right up to the edge of the page.
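
To show where the fill areas come from, here is a minimal Python sketch that rotates a small grayscale image about its center and paints every output pixel with no source pixel in the fill color. It uses nearest-neighbor sampling and keeps the output at the original size. Both choices, along with the function name, are assumptions for this example and not a description of the JRAPublish implementation.

```python
import math

def rotate_with_fill(image, angle_deg, fill=255):
    """Rotate a grayscale image (list of rows) about its center.

    Each output pixel is looked up in the source image by inverse mapping.
    Pixels that map outside the source are set to the fill value.
    """
    h, w = len(image), len(image[0])
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cx, cy = (w - 1) / 2, (h - 1) / 2
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            sx = round(cx + dx * cos_t + dy * sin_t)   # source column
            sy = round(cy - dx * sin_t + dy * cos_t)   # source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

# An all-black 21 x 21 test image rotated by 9 degrees with white fill.
black = [[0] * 21 for _ in range(21)]
rotated = rotate_with_fill(black, 9.0, fill=255)
```

In the result, the center of the image stays black while the four corners take the white fill value. Those corners are the narrow triangular areas described above.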

Maximum Skew Percentage
All deskew software has a limit on the degree of skew that can be processed. In JRAPublish it is about 9 percent. Other advanced deskewing software can process up to 15 percent skew.  The more skew there is in a page image, the harder it is to preserve character quality when deskewed.  This is why there is a limit.

If the amount of skew in an image exceeds the maximum, it will not be automatically deskewed. Either the page will have to be rescanned, or it must be manually deskewed in an editing program like Photoshop (at your peril, because of the text-degrading problem). Rescanning is the recommended approach.

There are a couple of ways to check your page images for skew before processing them in JRAPublish. One is to use image management software, like ThumbsPlus, to generate thumbnail images of all the pages. Then just visually scroll the thumbnails looking for extreme skew (and any other problems). A more foolproof approach is to use advanced deskewing software like ScanFix, which will let you generate a report after measuring the skew of each image (without modifying it).
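
If your measuring tool can export one skew value per page, sorting the pages into "deskew automatically" and "rescan" is a simple threshold check. The sketch below assumes the measurements are already available as a dictionary of file names and skew values in the same unit as the limit. The file names, values and function name are hypothetical, and the code does not call ScanFix or JRAPublish.

```python
def skew_report(measured, max_skew=9.0):
    """Classify pages by measured skew.

    measured maps a file name to its skew, expressed in the same unit as
    max_skew. Returns a list of (name, skew, action) sorted by file name.
    """
    report = []
    for name, skew in sorted(measured.items()):
        action = "rescan" if abs(skew) > max_skew else "auto-deskew"
        report.append((name, skew, action))
    return report

# Hypothetical measurements for three scanned pages.
pages = {"page001.tif": 1.2, "page002.tif": -10.5, "page003.tif": 8.9}
report = skew_report(pages)
```

Here only page002.tif exceeds the limit and would be flagged for rescanning. The sign of the skew is ignored, since the limit applies in either direction.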

Deskewing in QuickScan Pro
QuickScan Pro 3.0 is the Scan & Index Application Option for JRAPublish, and it also has a deskew feature that can be applied to bitonal, grayscale and color image files, before they are processed in JRAPublish.


Option                   Description
Detect angle and deskew  Detects how much the image is skewed and straightens the image.
Rotate by fixed angle    Rotates the image clockwise according to the Fixed Angle setting (see below).
Detect angle             Detects the skew angle.
Fixed Angle              Sets the angle to rotate by using the Rotate by fixed angle setting (see above).
Black                    Adds black fill color to the areas around edges that may be created when the skewed image is straightened.
White                    Adds white fill color to the areas around edges that may be created when the skewed image is straightened.
Horizontal               Skew angle is measured in a horizontal direction.
Vertical                 Skew angle is measured in a vertical direction.
Both                     Skew angle is measured in both a horizontal and a vertical direction.

Binary Deskew with Black Border Removal

This bitonal image was deskewed, and then as a second image processing step, the black border was removed. Black border removal is a function that is only available for binary (bitonal) images. Black Border Removal should not be used with color images, or they may be automatically demoted to binary images before processing.

What this illustration does NOT show is that the After image is larger than the Before image, because there is no feature to crop the image back to original size in QuickScan Pro.

Deskewing Tests are Recommended for Best Results
We recommend that you try out the deskew feature before your final conversion run. This way, you can verify that it will perform correctly on all of your page images, and you can handle any exceptions or problems in advance.

To test in JRAPublish, turn off the OCR feature to save time. If you have a monthly-limited version of JRAPublish, and you are outputting DjVu or JBIG2-PDF files, turn on the Trial DjVu or Trial JBIG2-PDF feature so that you don't unnecessarily use up page clicks.

To test in QuickScan Pro, make a backup copy of your images before running the deskew function.

Deskewing in Other Software that Encodes Searchable-Image DjVu and PDF Files
Document Express software from LizardTech, for encoding DjVu files, does not currently have a deskew feature.

Adobe Acrobat, from Adobe Systems, will deskew the images during OCR, and there is no way to turn it off or set any parameters. The page size will always increase, so your encoded PDF file will always have irregular page sizes as a result.

Most other applications that encode PDF files have a deskew feature that only works with bitonal images (not grayscale or color).

Example of 3-page Color Document, Deskewed in JRAPublish









                                                                                                                                                                                                                                                                                                           