Recognizing Chinese characters, letters, and digits in images with Python and tesseract-ocr


        Environment: Ubuntu + Python 2.7

        Code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

from PIL import Image
import pytesseract

# Recognize Simplified Chinese text; lang='chi_sim' needs chi_sim.traineddata installed.
text = pytesseract.image_to_string(Image.open('/root/Desktop/444.jpg'), lang='chi_sim')
print(text)
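
The script above reads only Simplified Chinese. For images that mix Chinese with Latin letters and digits, Tesseract accepts several language codes joined with `+`. The following is a minimal sketch, assuming the `eng` language data is also installed and that Python 3 is used; the helper names `join_langs` and `ocr_image` are hypothetical and not part of pytesseract.

```python
def join_langs(*langs):
    """Join Tesseract language codes with '+', e.g. 'chi_sim+eng'."""
    cleaned = [code.strip() for code in langs if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_image(path, langs=("chi_sim", "eng")):
    """Run OCR on the image at `path` using every language in `langs`."""
    # Imported lazily so join_langs works even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=join_langs(*langs))


# Example call (hypothetical usage, same image path as the script above):
# print(ocr_image('/root/Desktop/444.jpg'))
```

Putting `chi_sim` first is a choice made for this sketch; whether the order affects accuracy on a given image is worth checking experimentally.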


        Result:

        [Images: 444.jpg, 1.png, 2.png]

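        Steps:

                1: We need two Python libraries: pytesseract and PIL

                2: We also need to install the recognition engine, tesseract-ocr

                3: Download the Simplified Chinese language data, chi_sim.traineddata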
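
Before running the OCR script, it can help to confirm that the prerequisites listed in the steps are actually present. The following is a rough sketch, assuming Python 3 (`shutil.which` and `importlib.util.find_spec` are not available in Python 2.7) and assuming the engine's executable is named `tesseract`; the function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def check_prerequisites(modules=("PIL", "pytesseract"), binary="tesseract"):
    """Return a dict mapping each requirement name to True or False."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    # shutil.which returns None when the executable is not on PATH.
    status[binary] = shutil.which(binary) is not None
    return status


# Example call (output depends on the machine):
# print(check_prerequisites())
```

Any entry reported as `False` points to the install step that still needs to be completed.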


        Install pytesseract and PIL

                (PIL is published on PyPI as Pillow; the import name is still PIL)

                pip install Pillow

                pip install pytesseract


        Install the recognition engine tesseract-ocr

                Install Tesseract

                sudo apt-get install tesseract-ocr

                Install the Simplified Chinese language pack

                sudo apt-get install tesseract-ocr-chi-sim
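
To confirm that the engine can see the Chinese language data after installation, the output of `tesseract --list-langs` can be inspected. The sketch below is one way to do that from Python 3, assuming the command prints a "List of available languages" header followed by one language code per line; the function names `parse_list_langs` and `installed_langs` are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.append(line)
    return langs


def installed_langs(binary="tesseract"):
    """Run the tesseract CLI and return the language codes it reports."""
    result = subprocess.run(
        [binary, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture the list even if it goes to stderr
        universal_newlines=True,
        check=True,
    )
    return parse_list_langs(result.stdout)


# Example check (requires tesseract on PATH):
# print('chi_sim' in installed_langs())
```

If `chi_sim` is missing from the returned list, `lang='chi_sim'` will fail in pytesseract as well.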


        Download the Simplified Chinese language data

                URL: https://download.csdn.net/download/leoeitail/10275552

                Destination path: /usr/local/share/tessdata/
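
The directory where Tesseract looks for `.traineddata` files varies between installations, so it may be worth checking where `chi_sim.traineddata` actually landed and pointing the engine at that directory explicitly. The following is a small sketch; the helper names `find_traineddata` and `tessdata_config` are hypothetical, while `--tessdata-dir` is a Tesseract command-line option that pytesseract passes through via its `config` argument.

```python
import os


def find_traineddata(lang, search_dirs):
    """Return the first directory in `search_dirs` holding `<lang>.traineddata`, or None."""
    filename = lang + ".traineddata"
    for directory in search_dirs:
        if os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


def tessdata_config(directory):
    """Build the extra-argument string that points tesseract at `directory`."""
    return '--tessdata-dir "{}"'.format(directory)


# Example usage (hypothetical; the path is the one given in this post):
# found = find_traineddata('chi_sim', ['/usr/local/share/tessdata/'])
# if found is not None:
#     text = pytesseract.image_to_string(img, lang='chi_sim',
#                                        config=tessdata_config(found))
```

Passing the directory explicitly avoids depending on whatever default search path the installed Tesseract build uses.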

