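Today I'll walk through recognizing a simple captcha.
It uses a standard format with no warping or distortion, so pytesseract alone is enough to recognize it.
Captcha URL: http://wscx.gjxfj.gov.cn/zfp/webroot/xfsxcx.html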
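The captcha to be recognized contains interference dots, so running OCR on it directly gives very poor results.
The first step is therefore to binarize and denoise the captcha image, and only then run the cleaned image through pytesseract.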
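The recognition rate is only 40%. To raise such a low rate, the captcha can be segmented into individual characters and each character classified separately. This captcha is easy to segment, which makes that a practical way to improve recognition accuracy.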
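A common way to segment a captcha whose characters don't touch is vertical projection. You count the black pixels in each column and cut wherever a column is empty. The sketch below is a minimal version under a few assumptions. The image is a row-major 2D list with 0 for black and 1 for white, which are the same values twoValue/clearNoise store in t2val. `column_ink`, `split_columns` and `crop` are hypothetical helpers, not code from the post. The classification step is left out.

```python
# Minimal sketch: split a binarized captcha into characters by vertical projection.
# Assumes grid[y][x] is 0 for black (ink) and 1 for white, and that characters
# are separated by at least one completely white column.

def column_ink(grid):
    """Count black (0) pixels in each column of a row-major 2D list."""
    height = len(grid)
    width = len(grid[0]) if height else 0
    return [sum(1 for y in range(height) if grid[y][x] == 0) for x in range(width)]


def split_columns(grid, min_width=1):
    """Return (start, end) column ranges (end exclusive) that contain ink.

    Runs narrower than min_width are dropped as leftover noise.
    """
    segments = []
    start = None
    for x, ink in enumerate(column_ink(grid)):
        if ink > 0 and start is None:
            start = x  # entering a character
        elif ink == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None  # leaving a character
    width = len(grid[0]) if grid else 0
    if start is not None and width - start >= min_width:
        segments.append((start, width))  # character touching the right edge
    return segments


def crop(grid, segment):
    """Cut one character out of the grid as its own 2D list."""
    start, end = segment
    return [row[start:end] for row in grid]


if __name__ == "__main__":
    # Toy 3x9 image: two "characters" separated by blank columns.
    toy = [
        [1, 0, 0, 1, 1, 1, 0, 1, 1],
        [1, 0, 1, 1, 1, 1, 0, 0, 1],
        [1, 0, 0, 1, 1, 1, 0, 1, 1],
    ]
    print(split_columns(toy))  # [(1, 3), (6, 8)]
```

To apply this to the t2val dictionary, you first build the grid, for example `grid = [[t2val[(x, y)] for x in range(w)] for y in range(h)]`. Raising `min_width` discards thin runs of columns, such as stray noise that survived denoising, before each cropped character is passed to a classifier.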
Binarization and denoising code:
```python
from PIL import Image, ImageDraw

# Binarized pixel values keyed by (x, y): 1 = white, 0 = black
t2val = {}


def twoValue(image, G):
    """Binarize a grayscale ("L") image: pixels brighter than threshold G become white."""
    for y in range(0, image.size[1]):
        for x in range(0, image.size[0]):
            g = image.getpixel((x, y))
            t2val[(x, y)] = 1 if g > G else 0


# Noise removal: compare the binarized value of pixel A with its 8 neighbours.
# Given N (0 < N < 8), if fewer than N neighbours share A's value, A is treated
# as noise and set to white.
# N: int, noise threshold, 0 < N < 8
# Z: int, number of denoising passes
# The result is written back into t2val; nothing is returned.
def clearNoise(image, N, Z):
    for i in range(0, Z):
        t2val[(0, 0)] = 1
        t2val[(image.size[0] - 1, image.size[1] - 1)] = 1
        for x in range(1, image.size[0] - 1):
            for y in range(1, image.size[1] - 1):
                nearDots = 0
                L = t2val[(x, y)]
                # Count the 8 neighbours that have the same value as (x, y)
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        if (dx or dy) and t2val[(x + dx, y + dy)] == L:
                            nearDots += 1
                if nearDots < N:
                    t2val[(x, y)] = 1


def saveImage(filename, size):
    """Write t2val out as a 1-bit image (1 -> white, 0 -> black)."""
    image = Image.new("1", size)
    draw = ImageDraw.Draw(image)
    for x in range(0, size[0]):
        for y in range(0, size[1]):
            draw.point((x, y), 255 * t2val[(x, y)])
    image.save(filename)


if __name__ == "__main__":
    for i in range(1, 11):
        path = "5/" + str(i) + ".jpg"
        image = Image.open(path).convert("L")
        twoValue(image, 222)      # binarization threshold G = 222
        clearNoise(image, 3, 6)   # N = 3, 6 passes
        path1 = "5/" + str(i) + ".png"
        saveImage(path1, image.size)
```
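The hypothetical helper below lets you see what the 8-neighbour rule in clearNoise does without any image files. `clear_noise_grid` is an illustrative name, not from the post. It is a minimal sketch that applies the same rule to a plain row-major 2D list, assuming the same 1 = white / 0 = black convention as t2val.

```python
# Hypothetical helper: the clearNoise rule applied to a row-major 2D list.
# grid[y][x] is 1 for white and 0 for black, matching t2val's values.

# (dx, dy) offsets of the 8 neighbours
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def clear_noise_grid(grid, n, passes):
    """Whiten pixels that have fewer than n same-valued neighbours.

    Mirrors clearNoise: updates grid in place, scans x in the outer loop,
    skips the one-pixel border, and forces two corners to white each pass.
    """
    height, width = len(grid), len(grid[0])
    for _ in range(passes):
        grid[0][0] = 1
        grid[height - 1][width - 1] = 1
        for x in range(1, width - 1):
            for y in range(1, height - 1):
                value = grid[y][x]
                same = sum(1 for dx, dy in NEIGHBOURS
                           if grid[y + dy][x + dx] == value)
                if same < n:
                    grid[y][x] = 1  # too few matching neighbours: treat as noise
    return grid


if __name__ == "__main__":
    # An isolated black dot in a white 5x5 patch is erased...
    dot = [[1] * 5 for _ in range(5)]
    dot[2][2] = 0
    print(clear_noise_grid(dot, 3, 1)[2][2])  # 1
    # ...while a solid 3x3 black block survives with n = 3.
    block = [[0 if 1 <= x <= 3 and 1 <= y <= 3 else 1 for x in range(5)]
             for y in range(5)]
    print(clear_noise_grid(block, 3, 6)[1][1])  # 0
```

With N = 3, a corner pixel of a solid stroke has exactly 3 matching neighbours, so it is kept. A lone dot has none and is removed. That is why the post's N = 3 clears the interference dots while leaving the characters intact.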
Recognition code:
```python
from PIL import Image
import pytesseract

# Point Tesseract at its tessdata directory (adjust the path for your install)
TESSDATA_CONFIG = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'


def recognize_captcha(img_path):
    """Run Tesseract on one cleaned captcha image and return the raw OCR text."""
    im = Image.open(img_path)
    return pytesseract.image_to_string(im, config=TESSDATA_CONFIG)


if __name__ == "__main__":
    for i in range(1, 11):
        img_path = "5/" + str(i) + ".png"
        res = recognize_captcha(img_path)
        # Keep only the first line of the OCR output and strip spaces
        strs = res.split("\n")
        print(strs[0].replace(" ", ""))
```
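Measuring a recognition rate like the 40% above requires the true text of each captcha. Below is a minimal sketch, assuming you label the images by hand. `raw_outputs` would be the strings returned by `recognize_captcha`. `clean_ocr_output` and `recognition_rate` are hypothetical names, not part of the post or of pytesseract.

```python
# Hypothetical helpers for scoring OCR output against hand-written labels.

def clean_ocr_output(text):
    """Same cleanup as the loop above: first line only, spaces removed."""
    return text.split("\n")[0].replace(" ", "")


def recognition_rate(raw_outputs, labels):
    """Fraction of captchas whose cleaned OCR text equals the label exactly."""
    if len(raw_outputs) != len(labels):
        raise ValueError("need exactly one label per OCR output")
    if not labels:
        return 0.0
    hits = sum(1 for raw, label in zip(raw_outputs, labels)
               if clean_ocr_output(raw) == label)
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up example: 3 of 4 outputs match their labels.
    outputs = ["ab12\n", "x9\n", "q q\n", "zz"]
    labels = ["ab12", "x8", "qq", "zz"]
    print(recognition_rate(outputs, labels))  # 0.75
```

Exact, case-sensitive matching is deliberately strict. A captcha counts as recognized only if every character is right, which is what matters when the answer is actually submitted.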
Original post: https://www.cnblogs.com/xuchunlin/p/11333578.html
Date: 2024-11-07 16:59:14