

The description of the ED-format

Author: Artyom Bojenov.
Written: 2.11.98
(C) Cognitive Tech

Translated from Russian to English
by monday2000
monday2000 [at] yandex [dot] ru
23.09.09


The ED format is used to transfer the recognition results from the kernel to the converters, which perform the fragmentation and the other text conversions required to produce txt, rtf and other formats.

The file may contain only one recognized page. Its number is stored in the file header (see sheet_disk_descr.sheet_numb).

The file has the ed extension.

It consists of a set of blocks, each of which begins with a tag (data code) followed by the corresponding structure. The data code is the first byte of that structure (the field code), i.e. it is not duplicated. No tag can have a value greater than 0x1f. For the list of tags see Appendix 1.

In what follows, BYTE is 1 unsigned char and WORD is 2 unsigned chars.
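
As a minimal sketch of these conventions, the helper below reads a WORD from a byte buffer. The byte order is not stated in this description; little-endian is assumed here because the format originates from x86 software. The names ed_byte, ed_word and ed_read_word are hypothetical.

```c
#include <stdint.h>

typedef uint8_t  ed_byte;  /* BYTE: one unsigned char */
typedef uint16_t ed_word;  /* WORD: two unsigned chars */

/* Read a WORD stored at p, low byte first (little-endian is an assumption). */
static ed_word ed_read_word(const ed_byte *p)
{
    return (ed_word)(p[0] | (p[1] << 8));
}
```

Reading every multi-byte field through such a helper avoids relying on the compiler's structure padding or on the byte order of the host.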

All the offsets mentioned appear to be in pixels.

The "current version" means the format used in CuneiForm 96-97.

Now see the list of all the tags and their explanations:


HEADER

The code SS_SHEET_DESCR - the file header.

This is the very first byte of the file. The file must (?) start with the structure:

struct sheet_disk_descr
{
    Word8 code;   //=0x0A
    Int8 quant_fragm;
    Word16 sheet_numb;
    Word16 descr_lth;
    Word8 byte_flag;
    Word16 resolution;
    Word16 incline;
    Word16 version;
    Int8 tabl[11];
};

where
quant_fragm - the number of the fragments on the page. At least 1 fragment must be ALWAYS present.
sheet_numb - the page number (either counted from 0, or 0 means that the page number is missing (?))
descr_lth - the length of the page descriptor, including the following fragment descriptors.
resolution - DPI of the TIFF.
incline - page incline. Calculated with a formula: incline=(xideal-xreal)/yreal*2048=(yreal-yideal)/xreal*2048.
tabl - reserved (declared as tabl[11] in the structure above).
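
The incline formula above can be solved for the ideal coordinates, which maps a real (skewed) pixel position back to its deskewed position. This is a minimal sketch, assuming that incline should be interpreted as a signed 16-bit value and that integer division is precise enough; the function names are hypothetical.

```c
#include <stdint.h>

/* incline = (xideal - xreal) / yreal * 2048, solved for xideal */
static int32_t ed_ideal_x(int32_t xreal, int32_t yreal, int16_t incline)
{
    return xreal + (int32_t)incline * yreal / 2048;
}

/* incline = (yreal - yideal) / xreal * 2048, solved for yideal */
static int32_t ed_ideal_y(int32_t xreal, int32_t yreal, int16_t incline)
{
    return yreal - (int32_t)incline * xreal / 2048;
}
```

With incline equal to 0 both functions return the real coordinates unchanged.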

Next come, one after another, the fragment descriptors (quant_fragm in total). Each of them starts with the SS_FRAGMENT code.


The code SS_FRAGMENT - the new fragment descriptor.

!!!Note that OUTSIDE the header the code SS_FRAGMENT means SOMETHING COMPLETELY DIFFERENT (see below) and is not used in the current version!!!

!!!Also note that in the current version this structure is not used, i.e. one fictive fragment is created for compatibility, and the real fragment data is stored with SS_TEXT_REF!!!

The structure contains the fragment information. The fragment's number corresponds to the position of this structure in the file.

struct fragm_disk_descr
{
Word8 code;                  //=0x0B
Word16 row;                 // coordinates of the upper left
Word16 col;                  // corner of the fragment's frame
Word16 height;             // height of fragment
Word16 w_width;        // Q.w_width of fragment
Int8 type;
Word8 kegl;                 // kegl for following fragm
Word8 font;                 // font ~~~~~~~~~~~~~~~~~~
Word8 language;         /* language for fragment*/
Word8 type_underl;    // type specifications of font
};                                 // for additional information
                                   // look at underline

row - the row coordinate of the upper left corner of the fragment's frame
col - the column coordinate of the upper left corner of the fragment's frame
height - the height of the fragment's frame
w_width - the width of the fragment's frame
type - the type (0 - text, 1 - graphics)

It can also take the following values:
#define FD_TYPE_TEXT 0
#define FD_TYPE_PICT 1
#define FD_TYPE_TABLE 2
#define FD_TYPE_EMPTY 3

kegl - the point size (type size)
font - the font
language - the fragment's language
type_underl - the underline type (see the type values under the metasymbol SS_UNDERLINE - underlining)

!!!In the current version, next comes the block of fragment descriptors built on the SS_TEXT_REF tag. The header ends with an SS_TEXT_REF tag whose type field is SSR_FRAG_END.!!!

Next comes the file body.


FILE BODY

The code SS_BITMAP_REF - the mapping information.

The structure contains the mapping (the reference box in the image) of the character, or of a similar element such as a half-space (see below), that FOLLOWS the structure.

If no symbol that this structure can describe occurs between two such structures (which should not really happen), then the content of the first structure is ignored. Conversely, if the first structure is followed by more than one such symbol, then its information applies to all of them.

struct bit_map_ref
{
Word8 code; // == SS_BITMAP_REF
Word8 pos;
Word16 row; // Reference box
Word16 col;
Word16 width;
Word16 height;
};

row,col - the coordinates of the upper left object corner,
width,height - width and height.

SS_TEXT_REF - text descriptor
(SS_REMARK)

struct text_ref
{ // 0 - letter
Word8 code; // 1 -
Word8 type; // 2 - word
Word16 object; // 3 - string
};

code = 0x01

In the current version the field type describes what is in the object field.

The block is used as an alternative to the old text-formatting system and is more flexible. The list of the possible type values - Appendix 4.
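
A minimal sketch of decoding one SS_TEXT_REF block from a byte buffer. It assumes a packed 4-byte layout (matching the length given for code 0x01 in Appendix 3) and a little-endian object field; struct ed_text_ref and ed_decode_text_ref are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_TEXT_REF 0x01

struct ed_text_ref {
    uint8_t  type;    /* one of the SSR_... values from Appendix 4 */
    uint16_t object;  /* meaning depends on type */
};

/* Decode a 4-byte SS_TEXT_REF block; returns 1 on success, 0 otherwise. */
static int ed_decode_text_ref(const uint8_t *p, size_t len,
                              struct ed_text_ref *out)
{
    if (len < 4 || p[0] != SS_TEXT_REF)
        return 0;
    out->type = p[1];
    out->object = (uint16_t)(p[2] | (p[3] << 8)); /* little-endian assumed */
    return 1;
}
```

A reader would typically switch on the decoded type to decide how to interpret object.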

The possible values of the fields type and object are described next:


The series about the fragment

It is used as an alternative to SS_FRAGMENT from the header. Blocks with type=SSR_FRAG_... may appear only in the file header. All the information between two structures with type=SSR_FRAG_TYPE is considered to refer to the same fragment. The data must be ordered according to the sequence in which the lines with the attribute SSR_LINE_FN appear in the file body. I.e. these blocks should be written to the file according to the following algorithm:

while(the current fragment < the total fragments number)
{
    SSR_FRAG_TYPE,
    SSR_FRAG_N,X,W,Y,H,BASE(if such information is available),
    if(the fragment type == TAB_BEG)
    {
        SSR_FRAG_PNUM,
        for(i=0;i<the number of columns;i++)
        {
            SSR_FRAG_COLXW with x coordinate of the i-th column,
            SSR_FRAG_COLXW with the width of the i-th column
        },
    }
    if(the fragment type == MCOL_BEG)
    {
        SSR_FRAG_PNUM
    }
    the current fragment++,
}
SSR_FRAG_END,
the current fragment=0,
while(the current fragment < the total fragments number)
{
    if(the current fragment is centered relative to another fragment)
    {
        SSR_FRAG_SN,
        SSR_FRAG_REL
    }
    the current fragment++,
}
SSR_SHEET_TYPE
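
The first loop of the algorithm above could be sketched as follows for plain fragments only. This is an illustration under assumptions, not the original writer: it omits SSR_FRAG_BASE, tables, multicolumn sections, the centering pass (SSR_FRAG_SN, SSR_FRAG_REL) and SSR_SHEET_TYPE, and it assumes 4-byte SS_TEXT_REF blocks with a little-endian object. The names put_ref, put_fragments and struct frag are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_TEXT_REF   0x01
#define SSR_FRAG_TYPE 10
#define SSR_FRAG_X    12
#define SSR_FRAG_W    13
#define SSR_FRAG_N    14
#define SSR_FRAG_END  15
#define SSR_FRAG_Y    19
#define SSR_FRAG_H    20

/* Append one SS_TEXT_REF block (4 bytes, little-endian object assumed). */
static size_t put_ref(uint8_t *buf, size_t pos, uint8_t type, uint16_t object)
{
    buf[pos]     = SS_TEXT_REF;
    buf[pos + 1] = type;
    buf[pos + 2] = (uint8_t)(object & 0xff);
    buf[pos + 3] = (uint8_t)(object >> 8);
    return pos + 4;
}

struct frag { uint16_t type, n, x, w, y, h; };

/* Write the descriptors of plain fragments, then SSR_FRAG_END.
   buf must have room for (6 * count + 1) * 4 bytes. */
static size_t put_fragments(uint8_t *buf, const struct frag *f, uint16_t count)
{
    size_t pos = 0;
    for (uint16_t i = 0; i < count; i++) {
        pos = put_ref(buf, pos, SSR_FRAG_TYPE, f[i].type);
        pos = put_ref(buf, pos, SSR_FRAG_N, f[i].n);
        pos = put_ref(buf, pos, SSR_FRAG_X, f[i].x);
        pos = put_ref(buf, pos, SSR_FRAG_W, f[i].w);
        pos = put_ref(buf, pos, SSR_FRAG_Y, f[i].y);
        pos = put_ref(buf, pos, SSR_FRAG_H, f[i].h);
    }
    /* object of SSR_FRAG_END is the number of fragments written */
    return put_ref(buf, pos, SSR_FRAG_END, count);
}
```

For a single fragment this produces seven blocks, 28 bytes in total.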

Now more details:

SSR_FRAG_TYPE - the fragment type. The possible types are listed in Appendix 5 and described below.

object - the actual type
TP_ONE_LINE 0x0001
TP_LEFT_ALLIGN 0x0002
TP_RIGHT_ALLIGN 0x0004
TP_CENTER 0x0008
TP_POS_INDENT 0x0010
TP_NEG_INDENT 0x0020
TP_BULLET 0x0040
TP_POESY 0x0080
TP_LN_SPACE 0x0100
TP_NOT_RECOG 0x0200
TP_BORDER 0x0400
TP_BRACKET 0x8000
TP_MCOL_BEG (TP_BRACKET|0x1000) - the beginning of the multicolumn section
TP_NEW_COL (TP_BRACKET|0x2000) - the beginning of a new column inside the multicolumn section
TP_MCOL_END (TP_BRACKET|0x3000) - the end of the multicolumn section
TP_TAB_BEG (TP_BRACKET|0x4000)
TP_NEW_ROW (TP_BRACKET|0x5000)
TP_DEL (TP_NOT_RECOG|TP_BORDER)
TP_FICT_FR_FLAGS (TP_BORDER|TP_BRACKET)

FICT_FR_FLAGS - fragments with (object & FICT_FR_FLAGS) != 0 are fictive, i.e. they are not present in the file body as SS_FRAGMENT but serve only to indicate the text formatting.
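
A minimal sketch of the fictive-fragment test, using the flag values listed above. The function name ed_is_fictive is hypothetical. Note the parentheses around the mask: in C, == and != bind tighter than &.

```c
#include <stdint.h>

#define TP_NOT_RECOG     0x0200
#define TP_BORDER        0x0400
#define TP_BRACKET       0x8000
#define TP_TAB_BEG       (TP_BRACKET | 0x4000)
#define TP_DEL           (TP_NOT_RECOG | TP_BORDER)
#define TP_FICT_FR_FLAGS (TP_BORDER | TP_BRACKET)

/* Nonzero if the fragment type marks a fictive (formatting-only) fragment. */
static int ed_is_fictive(uint16_t type)
{
    return (type & TP_FICT_FR_FLAGS) != 0;
}
```

By this test every bracket type (TP_MCOL_BEG, TP_TAB_BEG and so on) and TP_DEL count as fictive, because each of them contains TP_BRACKET or TP_BORDER.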

SSR_FRAG_N - the fragment number assigned by the user at the segmentation stage (?).

object - the actual number

SSR_FRAG_X - the horizontal fragment position in TIFF
SSR_FRAG_W - object - correspondingly the left edge and the width of the fragment

SSR_FRAG_Y - the vertical fragment position in TIFF
SSR_FRAG_H - object - correspondingly the upper edge and the height of the fragment

SSR_FRAG_BASE - Regular margin from 'x'

SSR_FRAG_PNUM - if the type of the fragment is TAB_BEG or MCOL_BEG, then object is the number of cells or columns, respectively.

SSR_FRAG_COLXW - the horizontal position of the table columns. The structures must be paired; in the first object - x-coordinate, in the second object - the width. Such structures must be located in the block which has the TAB_BEG type.

SSR_FRAG_SN - object: the fragment number assigned to it by the OCR kernel (i.e. its current number in the file)

SSR_FRAG_REL - object: for a centered fragment, the automatically assigned number of the parent fragment relative to which it is centered

SSR_FRAG_END - ends the fragment descriptors. object - the number of fragments written.


The series about the line

The blocks of this series are placed independently of the tag SS_LINE_BEG. Together they form a kind of line header. They are written according to the algorithm:

if(the line is fictive)
{
    SSR_LINE_FICT
    end
}
SSR_LINE_FN,
SSR_LINE_TYPE,
SSR_LINE_NUMBER(this is not necessary),
if (the beginning of the paragraph)
{
    SSR_LINE_PARAGRAPH
    SSR_LINE_INDENT
}
if (the line is composite)
    SSR_LINE_NPIECES
if (the beginning of the paragraph with a marker)
{
    SSR_LINE_PARAGRAPH
    SSR_LINE_INDENT
}
SSR_LINE_X,
SSR_LINE_W,
SSR_LINE_BASELINE,
SSR_LINE_HEIGHT

Now more details:

SSR_LINE_FN - object: the number of the fragment to which the line belongs

SSR_LINE_PARAGRAPH - The line is the beginning of the new paragraph. Object can have the values:

#define ORD_LN 0
#define NEW_PAR 0x1 // new paragraph
#define BUL_PAR 0x2 // paragraph with bullet
If the line is not the beginning of the paragraph then this tag may be omitted

SSR_LINE_TYPE - object: the line type

0x0 for deleted line
#define DL_HUGE_XW 0x1 // XW changed due to huge letter
#define DL_HUGE 0x2 // Huge letter
#define DL_ASSEMBLY 0x4 // Line assembled from pieces
#define DL_SINGLE 0x8 // Single-letter line

SSR_LINE_BULIND - object - the indentation size for the marker (bullet), if available

SSR_LINE_INDENT - object - the indentation size for the beginning of the paragraph (if available)

SSR_LINE_X - the horizontal position in the TIFF
SSR_LINE_W - object - the left edge and the line width, respectively

SSR_LINE_BASELINE - the 3rd baseline position (ideal)

SSR_LINE_HEIGHT - the height between the 2nd and 3rd baselines

SSR_LINE_NPIECES - appears in a broken line (when the line type is DL_ASSEMBLY). object - the number of original pieces within the line minus 1

SSR_LINE_FICT - means that the line is fictive (should not be taken into account). object is ignored.

The type field may also take the following values:

SSR_LINE_NUMBER - object is the current number of the line in the file (it is unclear why it is needed, since it can easily be calculated)
SSR_WORD_BOLD - object is the density of the word
SSR_SHEET_TYPE - object is 0 or 1 depending on ExistSheets


The series about the broken line

These tags can be found in the middle of a line and describe the change of the values set by the tags SSR_LINE_...

type=SSR_BROKEN_X
Means the jump of the x-coordinate of the line. object is ignored.
It is equivalent to SS_LINE_BEG

type=SSR_BROKEN_W
object is ignored. It means the neglect of the line beginning (?)

type=SSR_BROKEN_BASELINE
A jump of the baseline, i.e. of the 3rd baseline position (ideal). object - the new baseline

type=SSR_BROKEN_HEIGHT
A jump of the line height, i.e. of the height between the 2nd and 3rd baselines.
object - the new height


The code SS_FRAGMENT - the beginning of the new fragment

!!!It is not used in the current version. The information about whether a line belongs to a fragment is present in SS_TEXT_REF type=SSR_LINE_FN;!!!

It means the beginning of a new fragment. The data covering the fragment are located in the corresponding descriptor in the file header. It also means the beginning of a new line (see SS_LINE_BEG: fragm_disk.depth==line_beg.base_line). So you should not place SS_LINE_BEG after SS_FRAGMENT.

struct fragm_disk
{
Word8 code; //=0x0B
Word8 fragm_numb;
Word16 depth;
};
fragm_numb - the number of the beginning fragment (from zero)
depth - the offset of the current line base from the upper fragment edge.


The code SS_LANGUAGE - the recognition language

Is this tag ignored?? It is needed because the editor is imperfect and cannot process languages properly. For example, after an English symbol was processed it turned into a Russian one, while the language tag was still saved as English. This leads to an error during the conversion in TIGER (the OCR module).

struct EdTagLanguage
{
Word8 code;         // 0x0f
Word8 language;
};
language - the language code (see Appendix 2).


The code SS_EXTENTION - the beginning of the extended block

This tag was added to extend the ED format and make it more flexible. It is followed by the structure that carries the actual content.

struct edExtention
{
Word8 code; /* always 0x1C */
Word16 Ecode; /* New extention code */
Word16 length; /* Length in bytes */
};

Ecode - contains one of the extended codes (see below) describing the content structure.
length - the size of the content structure _INCLUDING_ the size of the edExtention structure.
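
Because length includes the header, a reader can skip an extension it does not understand. The sketch below assumes a packed 5-byte header (code, Ecode, length) and little-endian fields; ed_ext_payload and ED_EXT_HDR_LEN are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SS_EXTENTION   0x1C
#define ED_EXT_HDR_LEN 5   /* code + Ecode + length, assuming no padding */

/* Returns the number of payload bytes after the edExtention header,
   or -1 if the block is malformed or truncated. */
static long ed_ext_payload(const uint8_t *p, size_t avail, uint16_t *ecode)
{
    uint16_t length;

    if (avail < ED_EXT_HDR_LEN || p[0] != SS_EXTENTION)
        return -1;
    *ecode = (uint16_t)(p[1] | (p[2] << 8));
    length = (uint16_t)(p[3] | (p[4] << 8));
    if (length < ED_EXT_HDR_LEN || length > avail)
        return -1;
    return (long)length - ED_EXT_HDR_LEN;
}
```

The next block then starts length bytes after the start of this one.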


The code SS_LINE_BEG - new line

!!!The fields height and base_line are not used in the current version. These and other parameters are described with the structure SS_TEXT_REF!!!

It denotes the beginning of a new line of symbols. Perhaps height is not really used, because this structure is very similar to fragm_disk (see SS_FRAGMENT).

The line is considered to belong to the last fragment encountered.

struct line_beg
{
Word8 code;            //0x0D
Word8 height;
Word16 base_line; // displacement for current
}; // line to upper frame of fragment
// i.e.Vertical offset from the fragm. start

height - reserved (?). Another opinion: perhaps it is the line height, i.e. the height between the 2nd and 3rd baselines
base_line - the shift (of the current line relative to the upper fragment edge)


The code SS_FONT_KEGL - the information about the font

The information about the font of a symbol (or any other similar object like HalfSpace see below) that FOLLOWS it.

Everything said about CC_BITMAPREF (SS_BITMAP_REF) applies to this code as well.

struct font_kegl // 1 - serific
{ // 2 - gelvetic
Word8 code; // 4 - bold
Word8 new_kegl; // 8 - light
Word16 new_font; // 16 - italic
};     // 32 - straight
       // 64 - underlined

code = 0x02
new_kegl - new size of type
new_font - new font (the possible choices are listed in a column, combinations are possible)
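
Since new_font is a combination of bit values, individual attributes can be tested with a mask. A minimal sketch; the bit values come from the comments of struct font_kegl above, while the ED_FONT_... names and ed_font_has are hypothetical.

```c
#include <stdint.h>

/* Bit values from the comments of struct font_kegl; names are hypothetical. */
#define ED_FONT_SERIFIC    1
#define ED_FONT_GELVETIC   2
#define ED_FONT_BOLD       4
#define ED_FONT_LIGHT      8
#define ED_FONT_ITALIC     16
#define ED_FONT_STRAIGHT   32
#define ED_FONT_UNDERLINED 64

/* Nonzero if the given attribute bit is set in new_font. */
static int ed_font_has(uint16_t new_font, uint16_t flag)
{
    return (new_font & flag) != 0;
}
```

For example, a new_font value of 4|16 describes a bold italic font.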


The code SS_POS_HALF_SPACE - the positive halfspace (adding a gap)

It inserts into the current line the symbol SS_POS_HALF_SPACE (?). In earlier versions the code 0x1f meant SS_NEG_HALF_SPACE, and 0x1e - SS_POS_HALF_SPACE

struct pos_half_space
{
Word8 code;         //0x1f
Word8 authenticity_degree;
};

authenticity_degree - the probability of the correct recognition of a symbol ( 0<=authenticity_degree<=255 ).


SS_NEG_HALF_SPACE - the negative halfspace

(?)In earlier versions the code 0x1f meant SS_NEG_HALF_SPACE, and 0x1e - SS_POS_HALF_SPACE

struct neg_half_space
{
Word8 code; //0x1e
Word8 authenticity_degree;
};

authenticity_degree - the probability of the correct recognition of a symbol ( 0<=authenticity_degree<=255 ).


SS_TABUL - the tab symbol

It inserts the tab symbol into the current line after the current symbol

struct tabul
{
Word8 code;                 //0x08
Word8 numb_in_tab_tabul; // number of position in
}; // table of tabulation

numb_in_tab_tabul - the position number in the tab table (see SS_TABL_TABUL)


SS_KEGL - the size of type

struct kegl
{
Word8 code; //0x03
Word8 new_kegl;
};

new_kegl - the new size of type


SS_SHIFT - the shift

struct shift
{
Word8 code; //0x04
Word8 kegl;
Word16 clearance; // value of lift or descent
}; // lift positive
// descent negative

kegl - the size of type
clearance - the shift (the value of the raise or the lowering of the baseline, the raise is expressed with a positive value, the lowering is expressed with a negative value)


SS_RETR_LEVEL - Restore the level

struct retrieve_level
{
Word8 code;        //0x05
Word8 kegl;
};

kegl - the new size of type


SS_UNDERLINE - underlining

struct underline
{ // 0 - thin straight
Word8 code; // 1- half thick straight
Word8 type; // 2- thick straight
}; // 3- thin cursive
// 4- half thick cursive
// 5- thick cursive
// 6- beg of underline
// 7- end of underline

code = 0x06
type - the type (0-thin straight, 1-half bold straight, 2-bold straight, 3-thin italic, 4-half bold italic, 5-bold italic, 6-beginning of the underlining, 7-end of the underlining)


SS_DENS_PRINT - the print density

struct dens_print
{
Word8 code; //0x07
Word8 dens_atr; // attribute of print's
}; // density

dens_atr - the attribute of the print density


SS_TABL_TABUL - the tab table

struct tabl_tabul
{
Word8 code;            //0x09
Word8 lth; // scale of arow
Word16 arow_pos[1] ;
};

lth - the array length
arow_pos[1] - the first element of the array


SS_STEP_BACK - The paragraph indentation

!!!Not used in the current version. SS_TEXT_REF with type=SSR_LINE_INDENT is used instead!!!

struct step_back
{     //0x0c
Word8 code;
Word8 Step_back; // value of backstep
};

Step_back - the indentation


SS_POSITION - the position

!!!Not used in the current version. SS_TEXT_REF with type=SSR_LINE_X is used instead!!!

struct position
{
Word8 code; //0x0E
Word8 store;
Word16 pos; // position in line for
}; // left frame of fragm

store - reserved
pos - the position (in the line relative to the left fragment edge)


SS_TABL_CONFORM_SIZES - The table of the sizes conformity

struct table_conform_sizes
{
Word8 code;            //0x10
Word8 store;
Int8 tab_val_A [9*2]; // table of sizes of letter A
}; // for kegles from 4 to 12

store - reserved
tab_val_A [9*2] - the table [9x2]. Its elements are the sizes of the capital letter A for the type sizes 4-12 (inclusive). Each element is 1 byte long.


SS_GROUP_WORDS - the group of words

struct group_words
{
Word8 code;            //0x11
Word8 gr_wd_type; // 0 - beg of group
}; // 1 - cur. word of group
// 2 - end
// 3 - partition between groups

gr_wd_type - type (0-the beginning of a group, 1-the current word of a group, 2-the end of a group, 3-the division between the groups.)


SS_GROUP_SYMBOLS - the group of symbols

struct group_symbols
{
Word8 code;            //0x12
Word8 gr_sb_type;
};

gr_sb_type - type (0-the beginning of a group, 1-the current word of a group, 2-the end of a group, 3-the division between the groups.)


SS_BORDER

struct border
{
Word8 code; // 0x16 SS_BORDER
Word8 type; // 1 - left
#define b_vert 1 // 2 - right
#define b_hor 4 // 4 - top
            // 8 - bottom
Word16 length; // Length of border in pixels
Word16 x,y; // Coordinates of line begin
};


SS_TABLE_HEADER

struct table_header
{
Word8 code; // 0x17 SS_TABLE_HEADER
Word8 cols; // Number of columns
Word16 lth; // Total length of record
Word16 nfrag[1]; // List of fragments in header of table
};


SS_LIST_OF_FRAGMENTS

struct list_of_fragments
{
Word8 code; // 0x18 SS_LIST_OF_FRAGMENTS
Word8 cols; // Number of columns
Word16 lth; // Total length of record
Word16 nfrag[1]; // List of fragments in table
};


SS_AKSANT - the accent

struct aksant
{
Word8 code1;            //0x1D
Word8 code;
};
code1 - symbol(?)
code - symbol


If you meet a tag that is not described above and its value is less than 0x20, it means that someone has extended the ED format (not recommended), and all we can do is return with the error "end of file can not be read".

If a value is not described above and is greater than or equal to 0x20 (the space code), then it is not a tag but the code of a recognized symbol.

A symbol is considered to belong to the last line encountered.

!!!The very first symbol on a page must be preceded by a block describing the line beginning (SS_FRAGMENT or SS_LINE_BEG)!!! If such a block is absent, then the symbol should be ignored (?).

A symbol is implemented as an array of pairs: <alternative typographic symbol> - <its probability>. The symbol and the probability occupy 1 byte each. A zero in the lowest bit of the probability marks the end of the array. The symbol with the highest probability in the array is placed first (before any user correction).

Now the same thing in my own words. A file contains no more than 8 (?) different recognized variants of a given symbol, i.e. no more than 8 consecutive structures

struct letter
{
Word8 bType; // ASCII code. >= ' '.
Word8 bAttrib;
};

bType - the recognized letter
bAttrib - the authenticity of the recognition

If the field bAttrib contains 1 in the lowest bit (i.e. (bAttrib & 1) == 1), then it is not the last alternative and one more such structure must be read. Otherwise it is the last one, and a new block follows. Computing (bAttrib & 254) gives the authenticity of the recognition of the corresponding letter. It seems most reasonable to take the variant with the maximum authenticity.
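
The reading rule for alternatives can be sketched as follows. The function walks the pairs until it meets one whose lowest bAttrib bit is clear and keeps the letter with the highest authenticity. The name ed_read_symbol and the choice to reject truncated input are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse the alternatives of one symbol starting at p.
   Stores the most probable letter in *best and returns the number of bytes
   consumed, or 0 if the data is truncated or does not start with a symbol. */
static size_t ed_read_symbol(const uint8_t *p, size_t avail, uint8_t *best)
{
    size_t pos = 0;
    int best_prob = -1;

    for (;;) {
        if (avail - pos < 2 || p[pos] < 0x20)
            return 0;
        uint8_t letter = p[pos];
        uint8_t attrib = p[pos + 1];
        int prob = attrib & 254;       /* authenticity without the flag bit */
        if (prob > best_prob) {
            best_prob = prob;
            *best = letter;
        }
        pos += 2;
        if ((attrib & 1) == 0)         /* flag bit clear: last alternative */
            return pos;
    }
}
```

The returned byte count tells the caller where the next block or symbol starts.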



EXTENDED ED

(see SS_EXTENTION)

Every block of this group is a structure that begins with edExtention, which tells what is stored in the block. The field edExtention.Ecode may take one of the following values (divided into logical groups):

0x0000 - 0x00FF special code
0x0100 - 0x01FF table description
0x0200 - 0x02FF picture description
0x0300 - 0xEFFF Your code please...

0xF000 - 0xFFFF temporary code for debugging ( ! Not used in release version !)

#define ITS_EDEXT_SPECIAL(a) ((a)>=0x0000 && (a)<0x0100)
#define ITS_EDEXT_TABLE(a) ((a)>=0x0100 && (a)<0x0200)
#define ITS_EDEXT_PICTURE(a) ((a)>=0x0200 && (a)<0x0300)

These macros will help to work with the extension code.

ITS_EDEXT_SPECIAL(Ecode) the codes in this range are reserved for future usage.

ITS_EDEXT_TABLE(Ecode) the codes in this range describe the tables.

ITS_EDEXT_PICTURE(Ecode) the codes in this range describe the images.

The codes 0x0300 - 0xEFFF may be used as you wish. Tell us which range you are going to use, to avoid range collisions.

The codes 0xF000 - 0xFFFF are reserved for the debugging purposes. Do not use them in the release versions.


Here is the description of the concrete values:

EDEXT_VERSION - the version

EDEXT_TABLE_START - the table descriptor

This code starts the table data. It refers to the structure that describes the table.

typedef struct sTable
{
EDEXT     head;
Int32 Sh; // the number of the horizontal lines
Int32 Sv; // the number of the vertical lines
Int32 angle; // the angle of the table skew in 1/1024 radians
Int32 VerCount; // the number of the vertical lines that are to be deleted
                          // by the means of the recognition program
Int32 ShiftX; // the shift of the upper left table corner from
Int32 ShiftY; // the same corner of the image
} edTable;

EDEXT_TABLE_VER
This code refers to an array of numbers: the X-coordinates of the vertical lines. The number of elements is Sv from the structure sTable. The type is INT.

EDEXT_TABLE_HOR
This code refers to an array of numbers: the Y-coordinates of the horizontal lines. The number of elements is Sh from the structure sTable. The type is INT.

EDEXT_TABLE_TAB
This code refers to an array of numbers forming the matrix that describes the table. Each matrix element is the user's fragment number (user_num) of the fragment located in that cell. If neighboring matrix cells have the same number, then together they form one table cell. The number of elements is (Sh-1)*(Sv-1). The type is INT.

EDEXT_TABLE_VERLINE
This code refers to an array of numbers: the X,Y coordinates of the vertical lines to be deleted. The number of elements is VerCount*2. The type is INT.
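
The size of the EDEXT_TABLE_TAB matrix follows from the line counts: Sh horizontal lines bound Sh-1 rows, and Sv vertical lines bound Sv-1 columns. A minimal sketch with hypothetical function names; the row-major storage order is an assumption, as this description does not state it.

```c
#include <stdint.h>

/* Number of entries in the EDEXT_TABLE_TAB array. */
static int32_t ed_table_cells(int32_t sh, int32_t sv)
{
    return (sh - 1) * (sv - 1);
}

/* Index of the matrix cell (row, col); row-major order is an assumption. */
static int32_t ed_table_index(int32_t sv, int32_t row, int32_t col)
{
    return row * (sv - 1) + col;
}
```

For example, a table with 4 horizontal and 3 vertical lines has 3 rows of 2 cells each.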

EDEXT_CTP
This code refers to the ctp_hdr structure and starts the work with images (see CTP.H).

struct ctp_hdr
{
#define SIGNA "CT Picture"
BYTE Signatura[10];
};
This structure is located at the very beginning of a ctp file

EDEXT_PICTURE - the picture descriptor. It precedes the structure

struct ctp_pic_hdr {
Word32    pic_size;    // in bytes
Word16    x_off;        // in pixels
Word16    y_off;        // in pixels
Word16    w;        // in pixels
Word16    h;        // in pixels
Word16    resolution;
Word16    bpl;        // bytes per line. Only if native format
Word8 bitpix;
Word8    type;
Word8 PicName[32];
};

where type may take the values:

#define ctp_BW 0
#define ctp_greytone 1
#define ctp_color    2
#define ctp_2xx        10 // compressed
#define ctp_4xx        11 // contured
#define ctp_native 64 /* if type less this const
            It one of native formats */
#define ctp_tiff 65
#define ctp_gif 66
#define ctp_pcx 67
#define ctp_bmp 68
#define ctp_wmf        69
#define ctp_jpeg    70

This structure describes a particular picture. All elements of this structure except PicName are for reference only. The element PicName contains the name of the file that holds the image. Currently (10.10.96) the file is assumed to be located in a subfolder of the folder containing the ED file, with the same name as the ED file.

This structure is located immediately after ctp_hdr in a ctp file.


Composite table of the extended codes

#define EDEXT_VERSION        0x0000

#define EDEXT_TABLE_START     0x0100 // edTable struct
#define EDEXT_TABLE_VER 0x0101 // array horiz. coord of vert lines (x0,x1,...)
#define EDEXT_TABLE_HOR 0x0102 // array vert. coord of horiz. lines (y0,y1,...)
#define EDEXT_TABLE_TAB 0x0103 // array ID of items
#define EDEXT_TABLE_VERLINE 0x0104 // array not delete vert lines (x00,y00,x01,y01,...)

#define EDEXT_CTP        0x0200 // filename of CTP file
#define EDEXT_PICTURE 0x0201 // struct of <edPicture>


Appendix 1

The list of the tags

#define SS_BITMAP_REF         0x00
#define SS_TEXT_REF         0x01
#define SS_REMARK SS_TEXT_REF
#define SS_FONT_KEGL         0x02
#define SS_KEGL         0x03
#define SS_SHIFT         0x04
#define SS_RETR_LEVEL         0x05
#define SS_UNDERLINE         0x06
#define SS_DENS_PRINT         0x07
#define SS_TABUL         0x08
#define SS_TABL_TABUL         0x09
#define SS_SHEET_DESCR         0x0a
#define SS_FRAGMENT         0x0b
#define SS_STEP_BACK         0x0c
#define SS_LINE_BEG         0x0d
#define SS_POSITION         0x0e
#define SS_LANGUAGE         0x0f
#define SS_TABL_CONFORM_SIZES     0x10
#define SS_GROUP_WORDS         0x11
#define SS_GROUP_SYMBOLS     0x12
#define SS_PARAGRAPH     0x15
#define SS_BORDER     0x16
#define SS_TABLE_HEADER     0x17
#define SS_LIST_OF_FRAGMENTS     0x18
#define SS_AKSANT         0x1d
#define SS_NEG_HALF_SPACE     0x1e
#define SS_POS_HALF_SPACE     0x1f


Appendix 2

The list of the language codes for the EDCC_LANGUAGE structure (the SS_LANGUAGE tag)

LANG_ENGLISH english
LANG_GERMAN german
LANG_FRENCH french
LANG_RUSSIAN russian
LANG_SWEDISH swedish
LANG_SPANISH spanish
LANG_ITALIAN italian
LANG_RUSENG russian&english
LANG_UKRAINIAN ukrainian
LANG_SERBIAN serbian
LANG_CROATIAN croatian
LANG_DANISH danish
LANG_PORTUGUESE portuguese
LANG_DUTCH dutch

#define LANG_ENGLISH 0
#define LANG_GERMAN 1
#define LANG_FRENCH 2
#define LANG_RUSSIAN 3
#define LANG_SWEDISH 4
#define LANG_SPANISH 5
#define LANG_ITALIAN 6
#define LANG_RUSENG 7
#define LANG_UKRAINIAN 8
#define LANG_SERBIAN 9
#define LANG_CROATIAN 10
#define LANG_DANISH 11
#define LANG_PORTUGUESE 12
#define LANG_DUTCH 13


Appendix 3

The sizes of the standard blocks

If the table value is greater than or equal to 0x40, the real size is stored in the structure itself at the offset (table value & 0xf). That size field is 2 bytes long if the table value is >=0x80, and 1 byte long otherwise.

static unsigned char ed_table[]=
{
sizeof (struct bit_map_ref),                   /* 0 SS_BITMAP_REF */
sizeof (struct text_ref),                          /* 1 SS_TEXT_REF */
sizeof (struct font_kegl),                       /* 2 SS_FONT_KEGL */
sizeof (struct kegl),                               /* 3 SS_KEGL */
sizeof (struct shift),                               /* 4 SS_SHIFT */
sizeof (struct retrieve_level),                 /* 5 SS_RETR_LEVEL */
sizeof (struct underline),                        /* 6 SS_UNDERLINE */
sizeof (struct dens_print),                      /* 7 SS_DENS_PRINT */
sizeof (struct tabul),                               /* 8 SS_TABUL */
0x41,                                                    /* 9 SS_TABL_TABUL */
0x84,                                                    /* 0A SS_SHEET_DESCR */
sizeof (struct fragm_disk),                      /* 0B SS_FRAGMENT */
sizeof (struct step_back),                       /* 0C SS_STEP_BACK */
sizeof (struct line_beg),                          /* 0D SS_LINE_BEG */
sizeof (struct position),                           /* 0E SS_POSITION */
sizeof(struct EdTagLanguage),               /* 0F SS_LANGUAGE */
sizeof (struct table_conform_sizes),       /* 10 SS_TABL_CONFORM_SIZES */
sizeof (struct group_words),                   /* 11 SS_GROUP_WORDS */
sizeof (struct group_symbols),                /* 12 SS_GROUP_SYMBOLS */
0,                                                          /* 13 Unused code */
0,                                                          /* 14 Unused code */
2,                                                          /* 15 ASCII symbol 'Start Paragraph' */
sizeof (struct border),                            /* 16 SS_BORDER */
0x82,                                                    /* 17 SS_TABLE_HEADER */
0x82,                                                    /* 18 SS_LIST_OF FRAGMENTS */
0,                                                          /* 19 Unused code */
0,                                                          /* 1A Unused code */
0,                                                          /* 1B Unused code */
0x83,                                                    /* 1C Special code of Extention ED */
sizeof (struct aksant),                             /* 1D SS_AKSANT */
sizeof (struct neg_half_space),               /* 1E SS_NEG_HALF_SPACE */
sizeof (struct pos_half_space)                /* 1F SS_POS_HALF_SPACE */
};
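
The size rule for ed_table entries can be sketched as a lookup helper. It assumes packed structures and a little-endian 2-byte size field; ed_block_size is a hypothetical name, and returning 0 for data that is too short is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a block given its ed_table entry and a pointer to the block start.
   Returns 0 for unused codes or when the size field is out of reach. */
static size_t ed_block_size(uint8_t table_value, const uint8_t *block,
                            size_t avail)
{
    size_t off;

    if (table_value < 0x40)
        return table_value;            /* fixed size */
    off = table_value & 0x0f;          /* offset of the size field */
    if (table_value >= 0x80) {         /* 2-byte size field */
        if (avail < off + 2)
            return 0;
        return (size_t)(block[off] | (block[off + 1] << 8));
    }
    if (avail < off + 1)               /* 1-byte size field */
        return 0;
    return block[off];
}
```

For instance, the entry 0x84 for SS_SHEET_DESCR points at offset 4, which is where descr_lth sits if the header structure is packed.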

The table of the codes of the special symbols

code   length in bytes   name
0x00   6                 the link to the graphical image
0x01   4                 the link to the text
0x02   4                 the font and the point size
0x03   2                 the point size
0x04   4                 the shift
0x05   2                 restore the level
0x06   2                 the underlining
0x07   2                 the print density
0x08   2                 the tab
0x09   ?                 the tab table
0x0a   ?                 the page descriptor
0x0b   14                the fragment descriptor (can be found only in the header next to the page descriptor)
0x0b   4                 the beginning of the fragment
0x0c   2                 the indentation
0x0d   4                 the beginning of the line
0x0e   4                 the position
0x10   18                the table of the sizes conformity
0x11   2                 the group of words
0x12   2                 the group of symbols
0x1d   2                 the accent
0x1e   2                 the positive halfspace
0x1f   2                 the negative halfspace

Appendix 4

The codes of the descriptors for SS_TEXT_REF(SS_REMARK)

#define SSR_HUGE        0 /* Internal remark about huge letter */

/* Seria within broken line: first - BROKEN_X, last - BROKEN_W */
#define SSR_BROKEN_BASELINE    1
#define SSR_BROKEN_HEIGHT    2
#define SSR_BROKEN_X        3
#define SSR_BROKEN_W        4

/* Seria about line: first - LINE_FN, other - optional */
#define SSR_LINE_FN        5
#define SSR_LINE_PARAGRAPH    6
#define SSR_LINE_TYPE        7
#define SSR_LINE_INDENT        8
#define SSR_LINE_NPIECES 9
#define SSR_LINE_FICT        16
#define SSR_LINE_BULIND        23
#define SSR_LINE_X        24
#define SSR_LINE_W        25
#define SSR_LINE_BASELINE    26
#define SSR_LINE_HEIGHT        27

/* Seria about fragment: first - FRAG_TYPE */
#define SSR_FRAG_TYPE        10
#define SSR_FRAG_BASE        11
#define SSR_FRAG_X        12
#define SSR_FRAG_W        13
#define SSR_FRAG_N        14
#define SSR_FRAG_SN        17
#define SSR_FRAG_REL        18
#define SSR_FRAG_Y         19
#define SSR_FRAG_H        20
#define SSR_FRAG_PNUM        21
#define SSR_FRAG_COLXW        28

#define SSR_FRAG_END        15    /* end of fragments list */

/* Miscellaneous */

#define SSR_LINE_NUMBER        22     /* Internal line number */
#define SSR_WORD_BOLD 32 /* density of word */
#define SSR_SHEET_TYPE 33 /* type of sheet */


Appendix 5

#define ONE_LINE 0x0001
#define LEFT_ALLIGN 0x0002
#define RIGHT_ALLIGN 0x0004
#define CENTER 0x0008
#define POS_INDENT 0x0010
#define NEG_INDENT 0x0020
#define BULLET 0x0040
#define POESY 0x0080
#define LN_SPACE 0x0100
#define NOT_RECOG 0x0200
#define BORDER 0x0400
#define BRACKET 0x8000
#define MCOL_BEG (BRACKET|0x1000)
#define NEW_COL (BRACKET|0x2000)
#define MCOL_END (BRACKET|0x3000)
#define TAB_BEG (BRACKET|0x4000)
#define NEW_ROW (BRACKET|0x5000)
#define DEL (NOT_RECOG|BORDER)
#define FICT_FR_FLAGS (BORDER|BRACKET)
