Can Newgen revolutionise document compression?

In a market already dominated by established compression schemes such as CCITT G4, JPEG, JBIG2 and LZW, the viability of a new product is open to question. But with superior technology and an ingenious business strategy, document management and imaging solutions provider Newgen Software is geared to give tough competition to the existing image compression algorithms, says Shipra Arora


Looking at the enormous growth in volumes of data, any advancement in document compression technology is a milestone. What makes compressed images an important business need is the fact that documents are being constantly archived, communicated and manipulated in digital format and there is a growing demand for instant access to high quality documents.

Newgen's recent technology innovation in colour document compression, called Newgen Image File-Format (NIF), is the result of two-and-a-half years of research and development by the Advanced Image Processing Group and is believed to offer compression rates of up to 300 times. Hareish Gur, group head & deputy general manager, Advanced Imaging Group, Newgen Software Technologies, says that NIF will be both a compression scheme and a file format (.NIF), just like JPEG. What will drive competition, however, is the company's strategy of piggybacking on competition by integrating the NIF compression scheme into Adobe's PDF format. These factors can help in establishing the NIF brand name. The company is even considering patenting the technology, though it has not yet taken a decision on this. With the beta version being released this month, the technology will be commercially available by early next quarter.

What's new in NIF?

According to Gur, NIF is an open standard format that allows ultra-high compression ratios for scanned colour and grayscale office documents without losing text legibility and OCRability of the scanned document.

But the real benefit will depend on the quality of the images, because it has generally been observed that the higher the compression, the greater the distortion and blockiness in the images. "NIF will not only offer the advantage of higher compression but also the ability to do so without losing text and colour resulting in high resolution," explains Gur.

The business benefits to users include cost savings in storage and time savings in transferring colour documents over the Internet. This is even more critical considering the bandwidth scenario, as a majority of users still dial up for Internet connectivity. A smaller image size means that images can be transmitted quickly and viewed in standard Web browsers, resulting in efficient scanning, storing, downloading and emailing of mission-critical documents via corporate intranets or even the Internet. These are the benefits that the company will have to explain to users to get them on board, says an industry expert.

How does it work?

The technology works on multi-layer compression. The scanned document is separated into multiple layers: a layer containing high-resolution text (or hard edges), a layer of low-resolution background, and another layer containing colours and soft edges. Each layer is then compressed separately, using whichever algorithm yields the best results for image size and clarity for that layer. The choice is made on the basis of the technology's analysis of the document. The technology uses JPEG and JPEG 2000 for lossy compression, and CCITT G4 and JBIG2 for lossless compression.

(Graphics compression techniques are of two types: lossless and lossy. Lossless techniques throw away redundant bits of information without affecting the quality of the image, whereas lossy techniques reduce file size further by compromising on image quality.)
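The distinction can be illustrated with a minimal Python sketch. It is not NIF code: zlib (DEFLATE) stands in for a lossless codec, and dropping the low four bits of each sample stands in for a lossy step; the sample data is made up for illustration.

```python
import zlib

# Hypothetical sample data: a short repeating run of 8-bit pixel values.
pixels = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 64)

# Lossless: the compressed form is smaller, and decompression
# gives back every original byte exactly.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)

# Lossy (illustrative stand-in): discard the low 4 bits of each pixel
# before compressing. The discarded detail cannot be recovered.
quantised = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantised)
restored_lossy = zlib.decompress(packed_lossy)

print(restored == pixels)        # lossless round trip is exact
print(restored_lossy == pixels)  # lossy round trip is not
```

The lossless round trip reproduces the input byte for byte, while the quantised version differs from the original no matter how it is decompressed.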

Based on end-user requirements, these schemes can be mixed and matched without hindering interoperability. According to Gur, since all these standards are open, no proprietary format is imposed and user confidence is boosted. While the encoder for bitonal areas can encode losslessly, the background layer is compressed with a lossy scheme. The text layer undergoes neither resolution reduction nor any lossy operation, resulting in a clear digital document that retains the quality of the original scanned document at high compression ratios. The most critical stage in creating a NIF/PDF file is separating the foreground, background, colour and other parts via advanced image processing techniques known as segmentation.
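The layered idea can be sketched in a few lines of Python. This is only a toy illustration and not Newgen's segmentation method, which the article does not describe: the function name split_layers, the fixed darkness threshold and the block-averaging step are all assumptions, and blocks made up entirely of text are assumed to sit on white paper.

```python
def split_layers(gray, threshold=128, factor=2):
    """Split a grayscale page (rows of 0-255 ints) into a full-resolution
    bitonal text mask and a lower-resolution background layer."""
    height, width = len(gray), len(gray[0])

    # Text layer: 1 where a pixel is dark enough to count as ink.
    # It keeps the full scan resolution.
    mask = [[1 if px < threshold else 0 for px in row] for row in gray]

    # Background layer: average the non-text pixels of each
    # factor x factor block, which reduces the resolution.
    background = []
    for top in range(0, height, factor):
        out_row = []
        for left in range(0, width, factor):
            block = [gray[y][x]
                     for y in range(top, min(top + factor, height))
                     for x in range(left, min(left + factor, width))
                     if not mask[y][x]]
            # Assumption: a block covered entirely by text is filled white.
            out_row.append(sum(block) // len(block) if block else 255)
        background.append(out_row)
    return mask, background


# A 4x4 light-grey page with a 2x2 dark "character" in the middle.
page = [[200, 200, 200, 200],
        [200,   0,   0, 200],
        [200,   0,   0, 200],
        [200, 200, 200, 200]]
mask, background = split_layers(page)
```

In a scheme of the kind described above, the mask would then go to a bitonal codec such as CCITT G4 or JBIG2, and the background to a lossy codec such as JPEG.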

Competitive scenario

There are three major players in the compression market, namely CCITT G4, JPEG and LZW. While CCITT G4 is a black & white (B&W) compression standard, the latter two are colour compression standards. As per industry estimates, almost 90 percent of the compression market is still dominated by the B&W standard because of the higher costs and prohibitive file sizes involved in colour compression.

While on the one hand NIF will face competition from the B&W document compression market, on the colour front it will have the JPEG and LZW compression schemes to contend with. The 90 percent B&W market is also a potential market, which NIF can target and try to move towards colour. What will work to NIF's advantage is the increasing adoption of colour document imaging technologies by the business world, with the choice of storing colour documents at the same cost as the commonly used B&W standard (CCITT G4-compressed TIFF). According to Gur, the size of a NIF-compressed office document will be almost the same as that of a CCITT G4-compressed document.

On the colour front, JPEG has the disadvantage of being a lossy compression scheme, which means that content is lost during the compression process. The size is smaller but the quality suffers. As a result, the JPEG scheme is not very readily used in document compression and medical imaging. Apart from this, the JPEG compression scheme is only supported by file formats like .JPG, .JPEG and .JFIF. Similarly, there are only three file formats, namely .GIF, .TIF and .PDF, which use LZW compression. On the other hand, the first release of NIF itself will support file formats like .NIF (its own file format), .PDF, .TIFF, .BMP, .PNG and .GIF, which means that all these formats can be opened with the same viewer. This is done by converting the various formats into .NIF or NIF-compressed .PDF formats.

Unlike the LZW compression scheme, NIF is an open standards-based scheme. This means it will be available for all to read and implement, and will create a fair, competitive market for implementations of the standard, thereby not locking the customer into a particular vendor or group and maximising end-user choice. Being an open standard, NIF will be free for all to implement, with no royalty or fee. Newgen will be making a restricted version of NIF available as freeware on the Internet for individual users. However, the SDK and the advanced-level viewer will be priced. For LZW, on the other hand, the joint patent owners CompuServe and Unisys have an agreement under which GIF developers who use CompuServe as a distributor are encouraged to pay a royalty fee to Unisys. For each registered copy of a program that uses the LZW compression technology, the developer pays 1.5% of the sale price of the program to CompuServe, or $0.15, whichever is greater.
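The per-copy royalty rule quoted above works out as follows. This is a small illustrative calculation in Python; the function name lzw_royalty and the example prices are hypothetical.

```python
from decimal import Decimal

RATE = Decimal("0.015")     # 1.5% of the sale price
MINIMUM = Decimal("0.15")   # floor of $0.15 per registered copy


def lzw_royalty(sale_price):
    """Royalty per registered copy: 1.5% of the price or $0.15,
    whichever is greater."""
    return max(RATE * Decimal(sale_price), MINIMUM)


for price in ("5", "10", "20"):
    print(price, lzw_royalty(price))
```

The two terms are equal at a sale price of $10, so the $0.15 minimum applies to any program priced below that.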

However, what could make matters a little difficult for NIF is the fact that the patent for LZW will expire in June 2003, making it freely available. This will mean that people will be able to use .GIF file formats, etc. without paying. But this does not seem to deter Newgen. The company points out that the price tag won't drive the competition. "Competition will largely depend on who offers better compression schemes, both in terms of compression size and quality," adds Gur. LZW cannot offer more than 8-bit colour per pixel for .GIF and .PDF. This is because 24-bit colour compression tends to increase the size of the image. NIF, however, will be able to offer 24-bit colour at an optimised size through the segmentation process. It will also offer the 8-bit colour per pixel choice.

Business strategy

Breaking into the technology domain of CCITT G4, JPEG and LZW will, however, not be easy. Despite its technological strength, NIF will find it tough to establish itself among already established technologies. The failure of a similar document compression offering from US-based company LizardTech (which had acquired the DjVu colour document compression technology from AT&T Labs) to garner the expected market share says enough about the tough market scenario and the competition Newgen will have to face.

It suggests that, more than the technology, the company has to get its business strategy right. Learning a lesson from LizardTech's case, Newgen's think tank has tried to build more flexibility and risk capacity into its NIF strategy. NIF being an open standard format, Newgen has decided to use it to leverage Adobe's PDF market share as well, thereby widening the market it can address. This means that Newgen will be able not only to address the untapped market, but also to cater to Adobe's market. Gur explains that NIF technology complements Adobe's technology by packaging multiple NIF layers into a PDF file, in compliance with Adobe's specs for PDF creation. The resulting NIF-compressed PDF can be opened in Acrobat Reader. This will give Newgen access to the millions of desktops worldwide that have Acrobat Reader installed, which otherwise wasn't possible. With the advantage of a readymade market, the company's business strategy carries less risk.

According to an expert, what ails LizardTech is its proprietary format, due to which it has not been able to take on PDF's market share head on. Besides, the pricing of DjVu was also prohibitively high ($20,000 for the SDK and a conversion fee of about four cents per document). Learning from this, Newgen has kept its options open. The user has the choice of saving a file (BMP, TIFF, etc.) either in Newgen's .NIF format or in Adobe's .PDF format (NIF-compressed). Though NIF-compressed PDF files are slightly larger than the corresponding .NIF files, the conversion from .NIF to .PDF and vice versa is a fast and non-lossy process. Typically, a 25 MB uncompressed BMP file (A4, 300 DPI colour document), when converted into a NIF-compressed PDF file, will occupy only about 20 KB more than the corresponding .NIF file (the size of the .NIF file being around 100 KB). This is where the company feels it scores on strategy and will hit the competition the most.
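These figures can be checked with some back-of-the-envelope arithmetic, sketched here in Python. The assumptions are an A4 page of 210 x 297 mm, 24-bit colour (3 bytes per pixel), and decimal units (1 MB = 1,000 KB) for the quoted sizes; none of these is spelled out in the figures above.

```python
MM_PER_INCH = 25.4


def raw_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Uncompressed size of a scanned page, ignoring file headers."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


# A4 at 300 DPI in 24-bit colour: 2480 x 3508 pixels.
a4_raw = raw_size_bytes(210, 297, 300)
print(a4_raw)  # 26099520 bytes, in line with the quoted 25 MB

# Compression ratios implied by the quoted file sizes.
raw_kb = 25 * 1000       # 25 MB, decimal units assumed
nif_kb = 100             # approximate .NIF size
pdf_kb = nif_kb + 20     # NIF-compressed PDF is about 20 KB larger

ratio_nif = raw_kb / nif_kb
ratio_pdf = raw_kb / pdf_kb
print(ratio_nif, round(ratio_pdf, 1))
```

On these assumptions the .NIF file comes out at roughly 250:1 and the NIF-compressed PDF at roughly 208:1, both within the "up to 300 times" claim.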

The business strategy is also designed to generate multiple revenue streams. The company's revenue model comprises the following:

  • Bundling the technology along with scanners
  • Releasing viewers
  • Releasing a development suite providing a wide range of tools & APIs for integrating into any third party application.

The company is also looking at integrating the technology into its mainline solution OmniDocs, a document management system, to begin with. The restricted freeware version on the Internet will enable the company to get the possible users to experiment with the technology, ultimately leading to adoption of the complete version.

The software package for viewing, generation and distribution of lightweight colour document images, which will be available as NIFView, is an image viewer that supports opening and saving of files like NIF, PDF, TIFF, BMP, PNG, GIF, etc. The company's other offering, NIF SDK, will allow integration of NIF technology into all applications. It will include a collection of ActiveX controls, automation servers, applets and platform-independent APIs for viewing, loading, saving, extracting text layers, annotating, etc. External OCR engines can be fed the binary text layer directly, enabling faster and more accurate results than with a thresholded or binary-scanned image. (OCR engines normally work on binary images only.)

Bundling with scanners and MFDs will be another important revenue generation model for the company. It is already in talks with scanner and MFD vendors globally for bundling NIF technology with their product lines. The company has received positive responses from various potential partners. "All big companies, whom we have met and demonstrated the software, are gung-ho about it and want to bundle our technology with their scanners and MFDs," says Gur. According to him, if the vendors pay a royalty, the technology will carry their brand; otherwise it will remain Newgen's brand.

Application Areas

Some of the application areas that the company will be targeting are advertising, publication, distribution and scan-to-web applications and back file conversion of colour publications and documents, workflow applications etc. It also caters to a whole range of B2C and B2B applications, from financial record storage and distribution to online publishing, online retailing, web publishing, e-book publishing. The files in NIF format can be easily put onto the website or embedded in HTML documents. Newgen will be targeting segments like BPO, telecom and insurance sectors, which are likely to be early beneficiaries of NIF document compression technology. It will also be targeting libraries, SME and SOHO segments.

Final Word

According to an IDC estimate, by 2004 there will be 19 million flatbed scanners. This is the size of one of the potential markets for NIF technology. In addition, the increasing thrust towards colour document imaging by enterprises will pave the way for NIF technology. However, it's easier said than done. A lot will depend on how well the company is able to market the technology and establish the NIF brand strongly among CCITT G4, JPEG and LZW. Still to come are the counter-strategies of the competitors.

