tesseract.exe eng.arial.01.tif eng.arial.01 batch.nochop makebox
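makebox writes eng.arial.01.box, a plain-text file with one line per character. Each line holds the character followed by the left, bottom, right and top pixel coordinates of its bounding box (origin at the bottom-left corner of the image) and, in Tesseract 3.x, a page number. Below is a minimal Python sketch for reading it, assuming that `char left bottom right top [page]` layout. `Box`, `parse_box_line` and `read_box_file` are hypothetical helpers, not part of Tesseract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    char: str      # the character this box is labelled with
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0  # assumed default when the page column is absent


def parse_box_line(line: str) -> Box:
    """Parse one box-file line: 'char left bottom right top [page]'."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected box line: {line!r}")
    left, bottom, right, top = (int(p) for p in parts[1:5])
    page = int(parts[5]) if len(parts) == 6 else 0
    return Box(parts[0], left, bottom, right, top, page)


def read_box_file(path: str) -> list[Box]:
    """Read every non-empty line of a .box file into Box records."""
    with open(path, encoding="utf-8") as f:
        return [parse_box_line(line) for line in f if line.strip()]


if __name__ == "__main__":
    box = parse_box_line("A 36 92 53 116 0")
    # Prints the character, then the box width and height in pixels.
    print(box.char, box.right - box.left, box.top - box.bottom)  # A 17 24
```

Checking box sizes this way is a quick way to spot boxes that makebox split or merged incorrectly before training.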


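4. Train. If a character cannot be recognized and the error "Couldn't find a matching blob" appears, try moving that character to a different position or adjusting its box coordinates.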

tesseract.exe eng.arial.01.tif eng.arial.01 nobatch box.train
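As the note in step 4 suggests, one remedy for "Couldn't find a matching blob" is to correct the coordinates on the offending line of eng.arial.01.box. The Python sketch below shifts the four coordinates of a single box line, assuming the `char left bottom right top [page]` layout; `adjust_box_line` and its delta parameters are hypothetical names used only for illustration.

```python
def adjust_box_line(line: str, d_left: int = 0, d_bottom: int = 0,
                    d_right: int = 0, d_top: int = 0) -> str:
    """Return a box-file line with its four coordinates shifted by the deltas.

    The character and the optional page column are kept unchanged.
    """
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected box line: {line!r}")
    coords = [int(p) for p in parts[1:5]]
    deltas = (d_left, d_bottom, d_right, d_top)
    left, bottom, right, top = (c + d for c, d in zip(coords, deltas))
    # Reject edits that would produce an empty or inverted rectangle.
    if left >= right or bottom >= top:
        raise ValueError("adjusted box is empty or inverted")
    return " ".join([parts[0], str(left), str(bottom), str(right), str(top), *parts[5:]])


if __name__ == "__main__":
    # Widen the box by 2 pixels on every side.
    print(adjust_box_line("A 36 92 53 116 0", -2, -2, 2, 2))  # A 34 90 55 118 0
```

After editing the box file, rerun the box.train command so the .tr file is regenerated from the corrected boxes.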


5. unicharset_extractor.exe eng.arial.01.box
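unicharset_extractor reads the box file and writes a unicharset file listing each distinct character it contains, together with character properties. The Python sketch below mimics only the collecting part, counting how often each character label occurs; it does not produce the real unicharset file format. It assumes the `char left bottom right top [page]` box layout, and `distinct_chars` is a hypothetical helper.

```python
from __future__ import annotations

from collections import Counter


def distinct_chars(box_lines: list[str]) -> Counter:
    """Count how often each character label appears in a box file's lines."""
    counts: Counter = Counter()
    for line in box_lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[parts[0]] += 1
    return counts


if __name__ == "__main__":
    lines = ["A 36 92 53 116 0", "b 60 92 72 118 0", "A 80 92 97 116 0", ""]
    counts = distinct_chars(lines)
    print(sorted(counts))  # ['A', 'b']
    print(counts["A"])     # 2
```

Counting occurrences also shows which characters have only a handful of training samples in the image.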

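6. Create font_properties.txt with the content: arial 0 0 0 0 0 (the font name followed by the italic, bold, fixed, serif and fraktur flags, each 0 or 1)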
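Each line of font_properties.txt is a font name followed by five 0/1 flags in the order italic, bold, fixed, serif, fraktur. Here the name is arial, matching the font part of the training file name eng.arial.01. A small Python sketch that builds such lines; `font_properties_line` and `write_font_properties` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path

# Order of the five flags after the font name in font_properties.txt.
FLAG_NAMES = ("italic", "bold", "fixed", "serif", "fraktur")


def font_properties_line(font: str, **flags: bool) -> str:
    """Build one font_properties line, e.g. 'arial 0 0 0 0 0'.

    Omitted flags default to 0; unknown flag names raise ValueError.
    """
    unknown = set(flags) - set(FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    values = [str(int(flags.get(name, False))) for name in FLAG_NAMES]
    return " ".join([font, *values])


def write_font_properties(path: str, lines: list[str]) -> None:
    """Write the given lines to the file, one font per line."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    print(font_properties_line("arial"))               # arial 0 0 0 0 0
    print(font_properties_line("arialbd", bold=True))  # arialbd 0 1 0 0 0
```

In the training directory, `write_font_properties("font_properties.txt", [font_properties_line("arial")])` produces the same file as step 6.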


7. mftraining.exe -F font_properties.txt -U unicharset eng.arial.01.tr

8. cntraining.exe eng.arial.01.tr
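The training commands are plain command-line calls, so they can be scripted. The Python sketch below assembles the argument lists for the box.train, unicharset_extractor, mftraining and cntraining commands exactly as written in these steps, and can run them in order with subprocess. It assumes the tools are on PATH under the .exe names used here; `training_commands` and `run_all` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess


def training_commands(base: str = "eng.arial.01") -> list[list[str]]:
    """Return the argv lists for the box.train, unicharset, mftraining and cntraining calls."""
    return [
        ["tesseract.exe", f"{base}.tif", base, "nobatch", "box.train"],
        ["unicharset_extractor.exe", f"{base}.box"],
        ["mftraining.exe", "-F", "font_properties.txt", "-U", "unicharset", f"{base}.tr"],
        ["cntraining.exe", f"{base}.tr"],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; check=True raises CalledProcessError on a non-zero exit."""
    for argv in commands:
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in training_commands():
        print(" ".join(argv))
    # run_all(training_commands())  # uncomment where the Tesseract tools are installed
```

Keeping the commands as lists rather than a single shell string avoids quoting problems if a file name ever contains spaces.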

9. Add the prefix "eng.arial.01." to the four files unicharset, inttemp, normproto and pffmtable (for example, unicharset becomes eng.arial.01.unicharset).
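The renaming can be done with four `ren` commands or with a short script. The Python sketch below renames the four files in a given directory; `add_prefix` is a hypothetical helper, and the demo runs on empty placeholder files in a scratch directory rather than on real training output.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# The four training outputs that combine_tessdata needs, in the order listed above.
TRAINING_OUTPUTS = ("unicharset", "inttemp", "normproto", "pffmtable")


def add_prefix(directory: str, prefix: str = "eng.arial.01.") -> list[Path]:
    """Rename each training output in `directory` to prefix + name; return the new paths."""
    renamed = []
    for name in TRAINING_OUTPUTS:
        src = Path(directory) / name
        dst = src.with_name(prefix + name)
        src.rename(dst)  # raises FileNotFoundError if the source file is missing
        renamed.append(dst)
    return renamed


if __name__ == "__main__":
    # Demonstrate on empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for name in TRAINING_OUTPUTS:
            (Path(tmp) / name).touch()
        for path in add_prefix(tmp):
            print(path.name)
```

Run against the real training directory, `add_prefix(".")` renames the files in place.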

10. combine_tessdata.exe eng.arial.01. (the trailing dot is part of the prefix argument)


Output:

Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 is -1
Offset for type 1 is 108
Offset for type 2 is -1
Offset for type 3 is 1660
Offset for type 4 is 327545
Offset for type 5 is 327781
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
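In this output an offset of -1 means that component is absent from the combined eng.arial.01.traineddata. Assuming the component order used by Tesseract 3.0x's TessdataManager (type 1 = unicharset, 3 = inttemp, 4 = pffmtable, 5 = normproto), the four present types are exactly the four files renamed in step 9. The first offset, 108, is consistent with a header of one 4-byte count plus thirteen 8-byte offsets. The byte size of every present component except the last is the difference between consecutive offsets. The Python sketch below does that arithmetic; the `TYPE_NAMES` list is an assumption to verify against tessdatamanager.h for your version, and both helpers are hypothetical.

```python
from __future__ import annotations

import re

# Assumed Tesseract 3.0x component order (one entry per type number 0-12).
TYPE_NAMES = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg",
]

OFFSET_RE = re.compile(r"Offset for type (\d+) is (-?\d+)")


def parse_offsets(log: str) -> dict[int, int]:
    """Map type number -> byte offset for every type present (offset != -1)."""
    offsets = {}
    for type_id, offset in OFFSET_RE.findall(log):
        if int(offset) != -1:
            offsets[int(type_id)] = int(offset)
    return offsets


def component_sizes(offsets: dict[int, int]) -> dict[str, int]:
    """Size of each present component except the last, whose end is the file size."""
    present = sorted(offsets)  # components are stored in type order
    return {TYPE_NAMES[t]: offsets[nxt] - offsets[t] for t, nxt in zip(present, present[1:])}


if __name__ == "__main__":
    log = "\n".join([
        "Offset for type 0 is -1", "Offset for type 1 is 108",
        "Offset for type 2 is -1", "Offset for type 3 is 1660",
        "Offset for type 4 is 327545", "Offset for type 5 is 327781",
    ])
    print(component_sizes(parse_offsets(log)))
    # {'unicharset': 1552, 'inttemp': 325885, 'pffmtable': 236}
```

The size of normproto (type 5) is the total file size of eng.arial.01.traineddata minus 327781.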




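Copy the resulting eng.arial.01.traineddata into Tesseract's tessdata directory, then recognize images with -l eng.arial.01:

#tesseract.exe test.jpg result -l eng.arial.01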

#tesseract.exe a.bmp result2 -l eng.arial.01


tesseract.exe 42.png result2 -l eng.arial.01 -psm 7
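This command reads 42.png, uses the eng.arial.01 language data, treats the image as a single text line (-psm 7), and writes the recognized text to result2.txt, because Tesseract appends .txt to the output base name. The Python sketch below builds the same command and reads the result back. It assumes tesseract.exe is on PATH; `recognition_command` and `recognize` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def recognition_command(image: str, out_base: str, lang: str = "eng.arial.01",
                        psm: int = 7) -> list[str]:
    """Build the argv for 'tesseract.exe IMAGE OUTBASE -l LANG -psm N' (3.0x syntax)."""
    return ["tesseract.exe", image, out_base, "-l", lang, "-psm", str(psm)]


def recognize(image: str, out_base: str, **kwargs) -> str:
    """Run Tesseract, then return the text it wrote to OUTBASE.txt."""
    subprocess.run(recognition_command(image, out_base, **kwargs), check=True)
    return Path(out_base + ".txt").read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    print(" ".join(recognition_command("42.png", "result2")))
    # tesseract.exe 42.png result2 -l eng.arial.01 -psm 7
```

Later Tesseract releases (4.x and up) spell this option --psm, so the argument list would need adjusting there.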


-psm N

Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
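When the -psm value comes from a script or user input, the list of modes can be kept as a lookup so an invalid value is rejected before Tesseract starts. A small Python sketch covering only modes 0 to 10 as listed here; `PSM_MODES` and `check_psm` are hypothetical names.

```python
# Page segmentation modes 0-10 as listed in this version's -psm help text.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD, or OCR",
    3: "Fully automatic page segmentation, but no OSD (default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
}


def check_psm(mode: int) -> str:
    """Return the description of a page segmentation mode, or raise ValueError if unknown."""
    try:
        return PSM_MODES[mode]
    except KeyError:
        raise ValueError(f"-psm must be one of {sorted(PSM_MODES)}, got {mode}") from None


if __name__ == "__main__":
    print(check_psm(7))  # Treat the image as a single text line
```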

