Emgu.CV.OCR.Tesseract.Tesseract(string, string, Emgu.CV.OCR.Tesseract.OcrEngineMode, string)
public Tesseract(string dataPath, string language, Emgu.CV.OCR.Tesseract.OcrEngineMode mode, string whiteList)
Member of Emgu.CV.OCR.Tesseract
Create a Tesseract OCR engine.
dataPath: The data path must be the path of the parent directory of tessdata and must end in "/". Any name after the last "/" is stripped.
language: The language is (usually) an ISO 639-3 code; null defaults to eng. It is entirely safe (and eventually will be efficient too) to call Init multiple times on the same instance to change the language, or just to reset the classifier.
The language may be a string of the form [~]<lang>[+[~]<lang>]*, indicating that multiple languages are to be loaded; e.g. hin+eng loads Hindi and English. Languages may specify internally that they want to be loaded with one or more other languages, so the ~ sign
is available to override that: e.g. if hin were set to load eng by default, then hin+~eng would force loading only hin. The number of loaded languages is limited only by memory, with the caveat that loading additional languages affects both speed and accuracy,
as there is more work to do to decide on the applicable language and more chance of hallucinating incorrect words.
mode: OCR engine mode
whiteList: A whitelist of characters for OCR, e.g. specify "1234567890" to recognize digits only. Note that the whitelist currently seems to work only with OcrEngineMode.OEM_TESSERACT_ONLY.
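A minimal sketch of how a language string of the form [~]<lang>[+[~]<lang>]* decomposes. TessLanguageSpec, Parse and Build are hypothetical helpers, not part of Emgu CV or Tesseract; the language string itself is still passed unchanged to the constructor or to Init, and Tesseract does its own parsing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (NOT part of Emgu CV or Tesseract): models the documented
// language string form [~]<lang>[+[~]<lang>]*, e.g. "hin+eng" or "hin+~eng".
public static class TessLanguageSpec
{
    // Splits a spec into (Code, Excluded) pairs; Excluded is true when the code had a '~' prefix.
    public static List<(string Code, bool Excluded)> Parse(string spec)
    {
        var result = new List<(string Code, bool Excluded)>();

        // The documentation says a null language defaults to eng.
        if (spec == null)
        {
            result.Add(("eng", false));
            return result;
        }

        foreach (string part in spec.Split('+'))
        {
            bool excluded = part.Length > 0 && part[0] == '~';
            string code = excluded ? part.Substring(1) : part;
            if (code.Length == 0)
                throw new FormatException($"Empty language code in \"{spec}\".");
            result.Add((code, excluded));
        }
        return result;
    }

    // Joins included codes with '+' and appends each excluded code with a '~' prefix.
    public static string Build(IEnumerable<string> include, IEnumerable<string> exclude)
    {
        var parts = new List<string>(include);
        foreach (string code in exclude)
            parts.Add("~" + code);
        return string.Join("+", parts);
    }
}
```

For example, Parse("hin+~eng") yields ("hin", false) and ("eng", true), and Build(new[] { "hin" }, new[] { "eng" }) returns "hin+~eng".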
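using Emgu.CV.OCR;

Tesseract tesseract = new Tesseract();
// path: the language data path (parent directory of tessdata, ending in "/"); lang: the language, e.g. "eng"
tesseract.Init(path, lang, Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY);
tesseract.SetVariable("tessedit_char_whitelist", "0123456789"); // recognize digits only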
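The dataPath rule ("must end in /", "any name after the last / is stripped") is easy to get wrong when building the path argument. The sketch below is a literal reading of that sentence, not Tesseract's actual implementation; TessDataPath, EnsureTrailingSlash and EffectivePath are hypothetical helpers, and only "/" is treated as a separator.

```csharp
using System;

// Hypothetical helper (NOT part of Emgu CV or Tesseract): a literal reading of the
// documented dataPath rule. Only '/' is treated as a separator; backslashes are ignored.
public static class TessDataPath
{
    // Appends '/' so that the parent directory of tessdata is not stripped.
    public static string EnsureTrailingSlash(string parentDir)
    {
        if (parentDir == null) throw new ArgumentNullException(nameof(parentDir));
        return parentDir.EndsWith("/", StringComparison.Ordinal) ? parentDir : parentDir + "/";
    }

    // Keeps everything up to and including the last '/', i.e. drops "any name after the last /".
    // Returns an empty string when there is no '/' at all.
    public static string EffectivePath(string dataPath)
    {
        if (dataPath == null) throw new ArgumentNullException(nameof(dataPath));
        int lastSlash = dataPath.LastIndexOf('/');
        return lastSlash < 0 ? "" : dataPath.Substring(0, lastSlash + 1);
    }
}
```

Under this reading, passing "C:/ocr" as the path would leave only "C:/", while EnsureTrailingSlash("C:/ocr") gives "C:/ocr/". Passing "C:/ocr/tessdata" also ends up as "C:/ocr/", but only because "tessdata" happens to be the stripped name.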