For OCR | 基于 OpenCV 的图像预处理_飞道的博客

For OCR | 基于 OpenCV 的图像预处理

2021-03-29 09:18 684人阅读评论(0)

欢迎关注 “小白玩转Python”，发现更多 “有趣”

如果你的图像有随机噪声，不均匀的光照，前面的物体上的洞，等等。在将图片发布到计算机视觉 API 之前，有几件事情是你可以做的。在本文中，我们将介绍几种使用 OpenCV 提高 OCR 处理效果的技术。

安装 OpenCV

使用首选的包管理器安装 OpenCV


   
    
     
      
     
     
      
       conda 
       install -c conda-forge opencv
      
     
    
     
      
     
     
      
       pip 
       install opencv-python

读取图片

我将以我最喜欢的一本书的第一版的封面为例。让我们首先读取图像，指定要读取图像的颜色类型，这将读取图像的默认颜色格式为 OpenCV 中的 BGR (即蓝绿红)。然后我们将颜色空间转换为更常见的 RGB 顺序(为了可视化) ，最后，编写一个小函数来显示图像：


   
    
     
      
     
     
      
       import cv2
      
     
    
     
      
     
     
      
       import matplotlib.pyplot 
       as plt
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       img = cv2.imread(
      
     
    
     
      
     
     
      
         filename=
       'hands-on-machine-learning.jpg',
      
     
    
     
      
     
     
      
         flags=cv2.IMREAD_COLOR,
      
     
    
     
      
     
     
      
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       h, w, c = img.shape
      
     
    
     
      
     
     
      
       print(
       f'Image shape: {h}H x {w}W x {c}C')
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       img = cv2.cvtColor(
      
     
    
     
      
     
     
      
         src=img,    
      
     
    
     
      
     
     
      
         code=cv2.COLOR_BGR2RGB,
      
     
    
     
      
     
     
      
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       def show_image(img, **kwargs):
      
     
    
     
      
     
     
          
       """
      
     
    
     
      
     
     
      
        Show an RGB numpy array of an image without any interpolation
      
     
    
     
      
     
     
      
        """
      
     
    
     
      
     
     
      
           plt.subplot()
      
     
    
     
      
     
     
      
           plt.axis(
       'off')
      
     
    
     
      
     
     
      
           plt.imshow(
      
     
    
     
      
     
     
      
               X=img,
      
     
    
     
      
     
     
      
               interpolation=
       'none',
      
     
    
     
      
     
     
      
               **kwargs
      
     
    
     
      
     
     
      
           )
      
     
    
     
      
     
     
          
      
     
    
     
      
     
     
          
      
     
    
     
      
     
     
      
       show_image(img)

裁剪图像

大多数情况下，您要么在文本周围有一个框坐标(来自标签工具) ，要么您只对图像的一部分感兴趣。我们的图像是一个 3D numpy 数组。要裁剪它，我们可以简单地沿着高度和宽度切片：


   
    
     
      
     
     
      
       ymin, ymax = 
       200, 
       780
      
     
    
     
      
     
     
      
       xmin, xmax = 
       100, 
       1000
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       img = img[
      
     
    
     
      
     
     
          
       int(ymin): 
       int(ymax),
      
     
    
     
      
     
     
          
       int(xmin): 
       int(xmax),
      
     
    
     
      
     
     
      
       ]
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       h, w, c = img.shape
      
     
    
     
      
     
     
      
       print(f
       'Image shape: {h}H x {w}W x {c}C')
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       show_image(img)

扩充边界

这对于使用针对文档进行 OCR 模型训练的 API (文档通常有白色边框)可能很有用


   
    
     
      
     
     
      
       img_border = 
       cv2.copyMakeBorder(
      
     
    
     
      
     
     
          
       src=
       img,
      
     
    
     
      
     
     
          
       top=
       10,
      
     
    
     
      
     
     
          
       bottom=
       10,
      
     
    
     
      
     
     
          
       left=
       10,
      
     
    
     
      
     
     
          
       right=
       10,
      
     
    
     
      
     
     
          
       borderType=
       cv2.BORDER_CONSTANT,
      
     
    
     
      
     
     
          
       value=
       (255, 255, 255),
      
     
    
     
      
     
     
      
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       h, 
       w, c = img_border.shape
      
     
    
     
      
     
     
      
       print(f'Image 
       shape: {h}H x {w}W x {c}C')
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       show_image(img_border)

调整图像大小

API 将为输入图像设置一个最大尺寸。如果图像需要调整大小，应该保留长宽比


   
    
     
      
     
     
      
       MAX_PIX = 
       800
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       def resize_image(img, flag):
      
     
    
     
      
     
     
          
       """
      
     
    
     
      
     
     
      
        Resize an RGB numpy array of an image, either along the height or the width, and keep its aspect ratio. Show restult.
      
     
    
     
      
     
     
      
        """
      
     
    
     
      
     
     
      
           h, w, c = img.shape
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       if flag == 
       'h':
      
     
    
     
      
     
     
      
               dsize = (int((MAX_PIX * w) / h), int(MAX_PIX))
      
     
    
     
      
     
     
          
       else:
      
     
    
     
      
     
     
      
               dsize = (int(MAX_PIX), int((MAX_PIX * h) / w))
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           img_resized = cv2.resize(
      
     
    
     
      
     
     
      
               src=img,
      
     
    
     
      
     
     
      
               dsize=dsize,
      
     
    
     
      
     
     
      
               interpolation=cv2.INTER_CUBIC,
      
     
    
     
      
     
     
      
           )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           h, w, c = img_resized.shape
      
     
    
     
      
     
     
      
           print(
       f'Image shape: {h}H x {w}W x {c}C')
      
     
    
     
      
     
     
          
      
     
    
     
      
     
     
      
           show_image(img_resized)
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       return img_resized
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       if h > MAX_PIX:
      
     
    
     
      
     
     
      
           img_resized = resize_image(img, 
       'h')    
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       if w > MAX_PIX:
      
     
    
     
      
     
     
      
           img_resized = resize_image(img, 
       'w')

图像形态学操作

Opening：腐蚀后使用 5×5 的内核膨胀操作，有助于消除噪声
Closing：膨胀后使用 5×5 的内核腐蚀操作，以用来堵住小洞


   
    
     
      
     
     
      
       def apply_morphology(img, method):
      
     
    
     
      
     
     
          
       """
      
     
    
     
      
     
     
      
        Apply a morphological operation, either opening (i.e. erosion followed by dilation) or closing (i.e. dilation followed by erosion). Show result.
      
     
    
     
      
     
     
      
        """
      
     
    
     
      
     
     
          
       if method == 
       'open':
      
     
    
     
      
     
     
      
               op = cv2.MORPH_OPEN
      
     
    
     
      
     
     
          
       elif method == 
       'close':
      
     
    
     
      
     
     
      
               op = cv2.MORPH_CLOSE
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           img_morphology = cv2.morphologyEx(
      
     
    
     
      
     
     
      
               src=img,
      
     
    
     
      
     
     
      
               op=op,
      
     
    
     
      
     
     
      
               kernel=np.ones((
       5, 
       5), np.uint8),
      
     
    
     
      
     
     
      
           )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           show_image(img_morphology)
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       return img_morphology

高斯模糊操作

图像模糊(又名平滑)是有用的消除高频(如噪声)的操作。注意，内核越大，模糊效果越差。


   
    
     
      
     
     
      
       img_gaussian =
        cv2.GaussianBlur(
      
     
    
     
      
     
     
          
       src=
       img,
      
     
    
     
      
     
     
          
       ksize=
       (5, 5),
      
     
    
     
      
     
     
          
       sigmaX=
       0,
      
     
    
     
      
     
     
          
       sigmaY=
       0,
      
     
    
     
      
     
     
      
       )

自适应阈值操作

阈值化将灰度图像转换为二值图像。自适应阈值是有用的时候，图像有不同的照明条件在不同的地区，因为它计算不同的阈值不同的区域。

下面的函数使用以下两种方法中的一种来自适应阈值化：

高斯：阈值是邻域值的加权和，其中权重是一个高斯窗口
平均值：阈值是邻域面积的平均值


   
    
     
      
     
     
      
       def apply_adaptive_threshold(img, method):
      
     
    
     
      
     
     
          
       """
      
     
    
     
      
     
     
      
        Apply adaptive thresholding, either Gaussian (threshold value is the weighted sum of neighbourhood values where weights are a Gaussian window) or mean (threshold value is the mean of neighbourhood area). Show result.
      
     
    
     
      
     
     
      
        """
      
     
    
     
      
     
     
      
           img = cv2.cvtColor(
      
     
    
     
      
     
     
      
               src=img,
      
     
    
     
      
     
     
      
               code=cv2.COLOR_RGB2GRAY,
      
     
    
     
      
     
     
      
           )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       if method == 
       'gaussian':
      
     
    
     
      
     
     
      
               adaptive_method = cv2.ADAPTIVE_THRESH_GAUSSIAN_C
      
     
    
     
      
     
     
          
       elif method == 
       'mean':
      
     
    
     
      
     
     
      
               adaptive_method = cv2.ADAPTIVE_THRESH_MEAN_C
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           img_adaptive = cv2.adaptiveThreshold(
      
     
    
     
      
     
     
      
               src=img,
      
     
    
     
      
     
     
      
               maxValue=
       255,
      
     
    
     
      
     
     
      
               adaptiveMethod=adaptive_method,
      
     
    
     
      
     
     
      
               thresholdType=cv2.THRESH_BINARY,
      
     
    
     
      
     
     
      
               blockSize=
       11,
      
     
    
     
      
     
     
      
               C=
       2,
      
     
    
     
      
     
     
      
           )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
           show_image(img_adaptive, cmap=
       'gray')
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       return img_adaptive

Sobel 滤波器操作

Sobel 算子结合了高斯平滑和微分(图像沿 x 或 y 的一阶导数)。它们有助于检测水平或垂直边缘，并能抵抗噪声。为了检测水平和垂直边缘，我们可以沿着 x 和 y 分别使用 Sobel 滤波器操作，然后查看结果。


   
    
     
      
     
     
      
       def 
       apply_sobel(img, direction):
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       img = 
       cv2.cvtColor(
      
     
    
     
      
     
     
              
       src=
       img,
      
     
    
     
      
     
     
              
       code=
       cv2.COLOR_RGB2GRAY,
      
     
    
     
      
     
     
          
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       if 
       direction == 'h':
      
     
    
     
      
     
     
              
       dx, 
       dy = 0, 1
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       elif 
       direction == 'v':
      
     
    
     
      
     
     
              
       dx, 
       dy = 1, 0
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       img_sobel = 
       cv2.Sobel(
      
     
    
     
      
     
     
              
       src=
       img,
      
     
    
     
      
     
     
              
       ddepth=
       cv2.CV_64F,
      
     
    
     
      
     
     
              
       dx=
       dx,
      
     
    
     
      
     
     
              
       dy=
       dy,
      
     
    
     
      
     
     
              
       ksize=
       5,
      
     
    
     
      
     
     
          
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       return 
       img_sobel

拉普拉斯滤波器操作

与 Sobel 算子类似，拉普拉斯算子使用微分进行处理。但是，他们使用图像沿 x 和 y 的二阶导数(通过内部加上使用 Sobel 算子计算的二阶 x 和 y 导数)。拉普拉斯算子用于检测边缘(以及其他二阶导数为0的可能无意义的位置，所以它们应该只适用于需要的地方)。


   
    
     
      
     
     
      
       def 
       apply_laplacian(img):
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       img = 
       cv2.cvtColor(
      
     
    
     
      
     
     
              
       src=
       img,
      
     
    
     
      
     
     
              
       code=
       cv2.COLOR_RGB2GRAY,
      
     
    
     
      
     
     
          
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       img_laplacian = 
       np.uint8(
      
     
    
     
      
     
     
              
       np.absolute(
      
     
    
     
      
     
     
                  
       cv2.Laplacian(
      
     
    
     
      
     
     
                      
       src=
       img,
      
     
    
     
      
     
     
                      
       ddepth=
       cv2.CV_64F,
      
     
    
     
      
     
     
                  
       )
      
     
    
     
      
     
     
              
       )
      
     
    
     
      
     
     
          
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       show_image(img_laplacian, 
       cmap='gray')
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
          
       return 
       img_laplacian

其他代码

最后，当你的图像中有一些不同区域的框坐标时(例如，你使用了一个标签工具来标注标题、副标题、作者等) ，你不希望: 裁剪每个区域 > 保存裁剪后的图像本地 > 使用本地图像发布一个 API 请求，因为这会拖慢你的速度。

为了解决这个问题，您可以改为：裁剪每个区域 > 编码裁剪图像到内存缓冲区 > 后请求使用内存缓冲区中的图像，这比第一种方法更快。

下面展示了如何将图像编码到 buffer 中，然后使用 buffer 为 OCR 请求准备好数据(以及如果需要的话如何将图像解码回来) ：


   
    
     
      
     
     
      
       _, buf =
        cv2.imencode(
      
     
    
     
      
     
     
          
       ext=
       ".jpg",
      
     
    
     
      
     
     
          
       img=
       img,
      
     
    
     
      
     
     
      
       )
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       data = 
       buf.tostring()
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
       
      
     
    
     
      
     
     
      
       img = 
       cv2.imdecode(
      
     
    
     
      
     
     
          
       buf=
       buf,
      
     
    
     
      
     
     
          
       flags=
       cv2.IMREAD_UNCHANGED,
      
     
    
     
      
     
     
      
       )

· END ·

HAPPY LIFE

转载：https://blog.csdn.net/weixin_38739735/article/details/114422618

查看评论

飞道的博客

飞道的博客

个人资料

文章分类

文章存档

阅读排行

评论排行

推荐文章

For OCR | 基于 OpenCV 的图像预处理

* 以上用户言论只代表其个人观点，不代表本网站的观点或立场