小言_互联网's Blog

佛爷 Walks You Through Building a Simple Image Retrieval System with an Autoencoder (Convolutional Neural Network, CNN)


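Image retrieval has been a very active and fast-moving research area over the past decade. Among the best-known systems are Google Image Search and Pinterest Visual Pin Search. In this article we build a very simple image retrieval system using a special type of neural network called an autoencoder. We work in an unsupervised way, that is, without looking at the image labels: images are retrieved using only their visual content (texture, shape, and so on). This kind of retrieval is called content-based image retrieval (CBIR), as opposed to keyword-based or text-based image retrieval.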

In this article we use images of handwritten digits from the MNIST dataset, together with the Keras deep learning framework.

Autoencoders

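In short, an autoencoder is a neural network that aims to copy its input to its output. It works by compressing the input into a latent space representation and then reconstructing the output from that representation. Such a network has two parts:
Encoder: the part of the network that compresses the input into a latent space representation. It can be described by an encoding function h = f(x).
Decoder: the part that aims to reconstruct the input from the latent space representation. It can be described by a decoding function r = g(h).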
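
To make the notation h = f(x) and r = g(h) concrete, here is a minimal NumPy sketch. It is not the network used in this article: the weight matrix `W` is a hypothetical, hand-picked projection rather than something learned, and it only illustrates how an encoder and a decoder compose.

```python
import numpy as np

# Hypothetical toy autoencoder: the weights are fixed by hand, not learned.
# The encoder f keeps the first two coordinates of a 4-dimensional input;
# the decoder g pads the code back to 4 dimensions with zeros.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])  # shape (2, 4)

def f(x):
    """Encoder: compress x (4 values) into a latent code h (2 values)."""
    return W @ x

def g(h):
    """Decoder: reconstruct a 4-value output r from the latent code h."""
    return W.T @ h

x = np.array([0.5, 0.25, 0.0, 0.0])
h = f(x)  # latent space representation
r = g(h)  # reconstruction of the input

print(h.shape, r.shape)   # (2,) (4,)
print(np.allclose(r, x))  # True
```

The reconstruction is exact here only because this particular input lies in the subspace the toy encoder keeps; a trained autoencoder has to learn which information is worth keeping.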

If you would like to learn more about autoencoders, I recommend reading "Deep inside: Autoencoders".

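This latent representation, or code, is what interests us here, because it is how the neural network compresses the visual content of each image. The hope is that similar images will be encoded in similar ways.
There are several types of autoencoder, but since we are working with images, a well-suited choice is the convolutional autoencoder, which uses convolutional layers to encode and decode the images.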

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

# Encoder: three conv + max-pooling stages, 28x28x1 -> 4x4x8
input_img = Input(shape=(28,28,1))
x = Conv2D(16,(3,3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2,2), padding='same')(x)
x = Conv2D(8,(3,3), activation='relu', padding='same')(x)
x = MaxPooling2D((2,2), padding='same')(x)
x = Conv2D(8,(3,3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2,2), padding='same', name='encoder')(x)

# Decoder: conv + upsampling stages, 4x4x8 -> 28x28x1
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# No padding='same' here: 16x16 shrinks to 14x14, which then upsamples to 28x28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# The full autoencoder is trained to reproduce its own input
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
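
It can help to trace how the spatial size changes through the layers above. The following is a small plain-Python sketch (no Keras needed); the helper names `conv_out`, `pool_out` and `upsample_out` are made up for this illustration, and the rules they encode assume stride-1 convolutions and 'same'-padded 2x2 pooling as in the model definition.

```python
import math

def conv_out(size, kernel=3, padding="same"):
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after max pooling with padding='same' (rounds up)."""
    return math.ceil(size / pool)

def upsample_out(size, factor=2):
    """Spatial size after nearest-neighbour upsampling."""
    return size * factor

size = 28
trace = [size]

# Encoder: three (conv 'same' -> max pool) stages.
for _ in range(3):
    size = pool_out(conv_out(size))
    trace.append(size)
latent_values = size * size * 8  # 8 filters in the last encoder convolution

# Decoder: three (conv -> upsample) stages; the third convolution has no
# padding='same', so it shrinks 16 to 14 before the final upsampling.
for padding in ("same", "same", "valid"):
    size = upsample_out(conv_out(size, padding=padding))
    trace.append(size)

print(trace)          # [28, 14, 7, 4, 8, 16, 28]
print(latent_values)  # 128
```

So each 28 x 28 image (784 values) is squeezed into a 4 x 4 x 8 code (128 values), and the unpadded convolution in the decoder is what brings the output back to exactly 28 x 28.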

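The first step, then, is to train the autoencoder on our training set so that it learns how to encode images into a latent space representation.

Once training is done, we only need the encoding part of the network.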

encoder = Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('encoder').output)

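This encoder can now be used to encode our query image.

The same encoding must be applied to the search database, the set of images in which we look for images similar to the query. We can then compare the query code with the database codes and try to find the closest ones. To make this comparison we use the nearest neighbors technique.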

Nearest neighbors

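We retrieve the closest codes by running a nearest neighbors algorithm. The principle behind nearest neighbors methods is to find a predefined number of samples closest in distance to a new point. The distance can be any metric, but the most common choice is the Euclidean distance. For a query image q and a sample s, both of dimension n, this distance is computed as:

$d(q, s) = \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2}$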
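
As a quick illustration of the Euclidean distance between a query q and a sample s, here is a minimal NumPy sketch. The function name `euclidean_distance` and the example vectors are made up for this illustration.

```python
import numpy as np

def euclidean_distance(q, s):
    """Euclidean distance between two arrays with the same number of elements."""
    q = np.asarray(q, dtype=float).ravel()
    s = np.asarray(s, dtype=float).ravel()
    return float(np.sqrt(np.sum((q - s) ** 2)))

q = [0.0, 0.0]
s = [3.0, 4.0]
print(euclidean_distance(q, s))  # 5.0
```

Because the inputs are flattened with `ravel()`, the same function also works on multi-dimensional codes, as long as both have the same number of elements.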


In this example, we retrieve the 5 images closest to the query image.

# Fit the NN algorithm to the encoded test set
nbrs = NearestNeighbors(n_neighbors=5).fit(codes)

# Find the closest images to the encoded query image
distances, indices = nbrs.kneighbors(np.array(query_code))
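
Note that `NearestNeighbors` expects 2-D input, so the 4 x 4 x 8 codes have to be flattened to vectors first (the full notebook further down does this with `reshape`). To show what the search computes, here is a brute-force NumPy stand-in. It is a sketch, not scikit-learn's implementation: `k_nearest` is a hypothetical helper, and the random arrays merely stand in for real encoder output.

```python
import numpy as np

def k_nearest(codes, query_code, k=5):
    """Brute-force k nearest neighbours under Euclidean distance."""
    flat_codes = codes.reshape(len(codes), -1)  # (N, D)
    flat_query = query_code.reshape(1, -1)      # (1, D)
    dists = np.linalg.norm(flat_codes - flat_query, axis=1)
    order = np.argsort(dists)[:k]               # indices of the k smallest
    return dists[order], order

rng = np.random.default_rng(0)
codes = rng.random((100, 4, 4, 8))  # stand-in for encoder.predict(...) output
query_code = codes[42:43] + 1e-3    # a slightly perturbed copy of item 42

distances, indices = k_nearest(codes, query_code, k=5)
print(indices.shape)  # (5,)
print(indices[0])     # 42
```

Unlike this helper, scikit-learn's `kneighbors` returns 2-D arrays with one row per query, so for a single query its result has shape (1, 5).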

Results

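The images we retrieved look very good: all of them closely resemble the query image, and all correspond to the same digit. This shows that, even without ever being shown the corresponding labels, the autoencoder found a way to encode similar images in very similar ways.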
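
One way to go beyond visual inspection is to use the labels purely for evaluation and measure the fraction of retrieved images that share the query's digit. The sketch below assumes you kept the test labels when loading MNIST (the notebook discards them) and removed the query's entry from them as well; `precision_at_k` is a hypothetical helper and the label lists are invented for illustration, not results from this article.

```python
import numpy as np

def precision_at_k(query_label, retrieved_labels):
    """Fraction of retrieved items whose label matches the query label."""
    retrieved_labels = np.asarray(retrieved_labels)
    return float(np.mean(retrieved_labels == query_label))

# Invented labels for illustration only.
print(precision_at_k(9, [9, 9, 9, 9, 9]))  # 1.0
print(precision_at_k(9, [9, 4, 9, 9, 7]))  # 0.6
```

Averaging this score over many query images would give a more reliable picture than a single example.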

Full experiment code:

Unsupervised Image retrieval
Import the libraries
In [1]:
import numpy as np
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.datasets import mnist
import matplotlib.pyplot as plt
Using TensorFlow backend.
Load the training data
In [2]:
(X_train,_),(X_test,_) = mnist.load_data()
Normalize the data
In [3]:
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
Reshape the data to have 1 channel
In [4]:
print(X_train.shape, X_test.shape)
(60000, 28, 28) (10000, 28, 28)
In [5]:
X_train = np.reshape(X_train, (-1, 28, 28, 1))
X_test = np.reshape(X_test, (-1, 28, 28, 1))
In [6]:
print(X_train.shape, X_test.shape)
(60000, 28, 28, 1) (10000, 28, 28, 1)
Create the autoencoder
In [7]:
input_img = Input(shape=(28,28,1))
x = Conv2D(16,(3,3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2,2), padding='same')(x)
x = Conv2D(8,(3,3), activation='relu', padding='same')(x)
x = MaxPooling2D((2,2), padding='same')(x)
x = Conv2D(8,(3,3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2,2), padding='same', name='encoder')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
Train it
In [8]:
autoencoder.fit(X_train, X_train, epochs=2, batch_size=32, validation_data=(X_test, X_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
60000/60000 [==============================] - 85s 1ms/step - loss: 0.1125 - val_loss: 0.1140
Epoch 2/2
60000/60000 [==============================] - 77s 1ms/step - loss: 0.1120 - val_loss: 0.1140
Out[8]:
<keras.callbacks.History at 0x12a58a908>
In [65]:
autoencoder.save('autoencoder.h5')
In [9]:
autoencoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 16)        160       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 8)           584       
_________________________________________________________________
encoder (MaxPooling2D)       (None, 4, 4, 8)           0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 8)           584       
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 28, 28, 1)         145       
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________
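
The parameter counts in the summary above can be checked by hand: a Conv2D layer with a bias has (kernel height x kernel width x input channels + 1) x filters parameters. A small plain-Python sketch, where `conv2d_params` is a helper name made up for this check:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# (input channels, filters) for conv2d_1 ... conv2d_7, all with 3x3 kernels
layers = [(1, 16), (16, 8), (8, 8), (8, 8), (8, 8), (8, 16), (16, 1)]
counts = [conv2d_params(3, 3, c_in, c_out) for c_in, c_out in layers]

print(counts)       # [160, 1160, 584, 584, 584, 1168, 145]
print(sum(counts))  # 4385
```

The pooling and upsampling layers have no parameters, which is why the total matches the sum of the seven convolution layers.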
Create the encoder part
The encoder part is the first half of the autoencoder, i.e. the part that will encode the input into a latent space representation. In this case, the dimension of this representation is $4 \times 4 \times 8$

In [10]:
encoder = Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('encoder').output)
In [66]:
encoder.save('encoder.h5')
Load the query image
We take a query image from the test set

In [11]:
query = X_test[7]
In [12]:
plt.imshow(query.reshape(28,28), cmap='gray')
Out[12]:
<matplotlib.image.AxesImage at 0x139a16320>

Encode the test images and the query image
In [13]:
X_test.shape
Out[13]:
(10000, 28, 28, 1)
We remove the query image from the test set (the set in which we will search for close images)

In [55]:
X_test = np.delete(X_test, 7, axis=0)
In [33]:
X_test.shape
Out[33]:
(9999, 28, 28, 1)
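A detail worth keeping in mind about the step above: `np.delete` returns a new array, and every item after the removed position shifts down by one. The indices returned later by the neighbor search therefore refer to the reduced array, not to the original test set. A tiny sketch with made-up data:

```python
import numpy as np

data = np.arange(10) * 10             # [0, 10, 20, ..., 90]
reduced = np.delete(data, 7, axis=0)  # drop the item at index 7 (value 70)

print(data.shape, reduced.shape)  # (10,) (9,)
print(reduced[7])                 # 80
```

Since the notebook cell assigns the result back to `X_test`, running it again would remove another image each time.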
Encode the query image and the test set
In [56]:
codes = encoder.predict(X_test)
In [57]:
query_code = encoder.predict(query.reshape(1,28,28,1))
In [58]:
codes.shape
Out[58]:
(9999, 4, 4, 8)
In [59]:
query_code.shape
Out[59]:
(1, 4, 4, 8)
Find the closest images
In [60]:
from sklearn.neighbors import NearestNeighbors
We will find the 5 closest images

In [89]:
n_neigh = 5
In [90]:
codes = codes.reshape(-1, 4*4*8); print(codes.shape)
query_code = query_code.reshape(1, 4*4*8); print(query_code.shape)
(9999, 128)
(1, 128)
Fit the KNN to the test set
In [91]:
nbrs = NearestNeighbors(n_neighbors=n_neigh).fit(codes)
In [92]:
distances, indices = nbrs.kneighbors(np.array(query_code))
In [93]:
closest_images = X_test[indices]
In [97]:
closest_images = closest_images.reshape(-1,28,28,1); print(closest_images.shape)
(5, 28, 28, 1)
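The reshape above is needed because `kneighbors` returns `indices` with shape (1, 5) for a single query, and indexing with a 2-D integer array adds a leading dimension. A NumPy-only sketch with a small stand-in array (the index values are invented):

```python
import numpy as np

X = np.zeros((50, 28, 28, 1), dtype=np.float32)  # stand-in for X_test
indices = np.array([[3, 17, 4, 29, 8]])          # shape (1, 5), one row per query

closest = X[indices]
print(closest.shape)                         # (1, 5, 28, 28, 1)
print(closest.reshape(-1, 28, 28, 1).shape)  # (5, 28, 28, 1)
print(X[indices[0]].shape)                   # (5, 28, 28, 1)
```

Indexing with `indices[0]` is an equivalent alternative to reshaping when there is only one query.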
Get the closest images
In [98]:
plt.imshow(query.reshape(28,28), cmap='gray')
Out[98]:
<matplotlib.image.AxesImage at 0x1a436d5ef0>

In [99]:
plt.figure(figsize=(20, 6))
for i in range(n_neigh):
    # display original
    ax = plt.subplot(1, n_neigh, i+1)
    plt.imshow(closest_images[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
plt.show()

Reference:

  1. https://github.com/nathanhubens/Unsupervised-Image-Retrieval
  2. https://towardsdatascience.com/build-a-simple-image-retrieval-system-with-an-autoencoder-673a262b7921

Reposted from: https://blog.csdn.net/Tong_T/article/details/102566127