The DjvuOCR v2.2 beta program. The cvhtml2 utility


README for cvhtml2

Version 2.0

- Added the '-j' option

- Improved the handling of lines that end with a hyphenated word


Version 1.0

I use dtSearch to make CDs with full-text search. Since dtSearch does not recognize DJVU files, I wrote a utility that converts the OCR layer file into an HTML file containing the recognized text. This HTML file can be stored in a ZIP file together with the book (dtSearch can search inside ZIP files). In this way you can have a large DJVU collection with full-text search. When dtSearch finds a match inside a ZIP file, open the corresponding DJVU file. A suitable naming convention makes the pairing obvious, for example:

myfile.djvu
myfile.djvu.zip
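
The naming convention above can be automated. The following is a minimal Python sketch, not part of cvhtml2 itself. It assumes the ZIP file holds only the converted HTML and sits next to the DJVU file; the function names (zip_path_for, djvu_path_for, pack_html) are hypothetical and chosen for illustration.

```python
import zipfile
from pathlib import Path


def zip_path_for(djvu_path: Path) -> Path:
    """myfile.djvu -> myfile.djvu.zip (the naming convention shown above)."""
    return djvu_path.with_name(djvu_path.name + ".zip")


def djvu_path_for(zip_path: Path) -> Path:
    """myfile.djvu.zip -> myfile.djvu (for a search hit found inside the ZIP)."""
    if zip_path.suffix.lower() != ".zip":
        raise ValueError(f"not a ZIP file: {zip_path}")
    return zip_path.with_suffix("")


def pack_html(djvu_path: Path, html_path: Path) -> Path:
    """Store the converted HTML in a ZIP file next to the DJVU file.

    Assumption: the ZIP contains only the HTML text, not the book itself.
    """
    target = zip_path_for(djvu_path)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(html_path, arcname=html_path.name)
    return target
```

With this layout, a hit reported in myfile.djvu.zip maps back to the book by dropping the trailing ".zip".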

Usage:

cvthtml [-j] <in_file> <out_file>

    -j - joins lines that appear to belong to the same paragraph (i.e. removes the CR/LF at the end of
          lines that do not end with a punctuation mark)

    in_file - a text file, either produced by FRFGrab.EXE or extracted from a DJVU file using the command

djvused -e output-txt Myfile.djvu > ocrfile.txt

                    Note: check the end of ocrfile.txt for any error messages from djvused.exe.

    out_file - the resulting HTML file, in UTF-8 encoding. It can be viewed directly in a web browser.
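
To make the effect of -j concrete, here is a rough Python sketch of that kind of line joining. It is not the utility's actual code. The set of characters treated as punctuation, the treatment of blank lines as paragraph breaks, and the rule of dropping a trailing hyphen to rejoin a split word are all assumptions made for illustration.

```python
# Assumption: which characters count as "punctuation" at the end of a line.
SENTENCE_END = ".!?:;"


def join_paragraph_lines(lines):
    """Merge consecutive lines until one ends with a punctuation mark."""
    paragraphs = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: a blank line always closes the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Assumption: a trailing hyphen marks a word split across lines,
            # so drop the hyphen and join the two halves directly.
            current = current[:-1] + line
        elif current:
            current = current + " " + line
        else:
            current = line
        if current[-1] in SENTENCE_END:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

Note that dropping every trailing hyphen also merges genuine compound words that happened to break at the hyphen; a real implementation would need a better rule.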


Author: gencho  djvuocr [at] mail2world.com

Prepared by: monday2000.

March 9, 2007

E-Mail  (monday2000 [at] yandex.ru)
