Code Walkthrough: Learning the Attention Mechanism from Scratch! (blog of 小言_互联网)


2020-04-01 15:16


Image credit: unsplash.com/@titouanc

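One important property of human perception is that one does not tend to process a whole scene in its entirety at once. Instead, humans focus attention selectively on parts of the visual space to acquire information when and where it is needed, and combine information from different fixations over time to build up an internal representation of the scene, guiding future eye movements and decision making.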

From "Recurrent Models of Visual Attention", 2014

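This article shows how attention is implemented, in isolation from any larger model. When attention is implemented inside a real-world model, much of the effort goes into piping data around and juggling the various vectors rather than into the attention concepts themselves.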

We will implement attention scoring and then compute an attention context vector.

(Figure) Left: scaled dot-product attention. Right: multi-head attention.
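For reference, the scaled dot-product attention shown in the figure (from the Transformer paper, "Attention Is All You Need") is usually written as below, with queries Q, keys K, values V and key dimension d_k. The walk-through in this article can be read as an unscaled, single-query special case: one decoder hidden state acts as the query, and the annotations serve as both keys and values.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```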

Attention scoring:

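Let's start with the inputs to the scoring function. Assume we are at the first step of the decoding phase. The first input to the scoring function is the decoder's hidden state (we assume a toy RNN with three hidden nodes; that is not usable in practice, but it keeps the illustration simple).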

dec_hidden_state = [5,1,20]

To visualize the vectors, first import the plotting libraries:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Visualize the decoder hidden state as a single column:

plt.figure(figsize=(1.5, 4.5))
# reshape the 3-element list into a 3x1 column so it is drawn vertically
sns.heatmap(np.array(dec_hidden_state).reshape(-1, 1), annot=True,
            cmap=sns.light_palette("purple", as_cmap=True), linewidths=1)

The result is a single-column heatmap of the decoder hidden state (figure not reproduced).

Our first scoring function will score a single annotation (an encoder hidden state), which looks like this:

annotation = [3,12,45] #e.g. Encoder hidden state

Visualize the single annotation:

plt.figure(figsize=(1.5, 4.5))
sns.heatmap(np.array(annotation).reshape(-1, 1), annot=True,
            cmap=sns.light_palette("orange", as_cmap=True), linewidths=1)

Implementation: scoring a single annotation

Compute the dot product of the decoder hidden state and a single annotation. NumPy's np.dot handles this operation:

def single_dot_attention_score(dec_hidden_state, enc_hidden_state):
    # return the dot product of the two vectors
    return np.dot(dec_hidden_state, enc_hidden_state)

single_dot_attention_score(dec_hidden_state, annotation)

Result: 927
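To make the arithmetic behind np.dot explicit, here is a minimal pure-Python sketch with no NumPy dependency; the helper name dot_score is hypothetical and not part of the original code. The dot product multiplies the two vectors element by element and sums the products: 5·3 + 1·12 + 20·45 = 15 + 12 + 900 = 927.

```python
def dot_score(dec_hidden_state, enc_hidden_state):
    """Hypothetical helper: dot-product attention score of two equal-length vectors."""
    if len(dec_hidden_state) != len(enc_hidden_state):
        raise ValueError("vectors must have the same length")
    # multiply element-wise, then sum the products
    return sum(d * e for d, e in zip(dec_hidden_state, enc_hidden_state))


dec_hidden_state = [5, 1, 20]
annotation = [3, 12, 45]
print(dot_score(dec_hidden_state, annotation))  # 927
```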

Annotations matrix

Now let's look at scoring all the annotations at once. For that, we build an annotations matrix:

annotations = np.transpose([[3,12,45], [59,2,5], [1,43,5], [4,3,45.3]])

Visualized, it looks like this (each column is the hidden state of one encoder time step):

ax = sns.heatmap(annotations, annot=True,
                 cmap=sns.light_palette("orange", as_cmap=True), linewidths=1)
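np.transpose turns the four row lists into a 3×4 matrix whose columns are the original encoder hidden states. The sketch below reproduces that layout in pure Python, using zip(*rows) to transpose; the names encoder_states and column are illustrative only.

```python
# each inner list is one encoder hidden state (one time step)
encoder_states = [[3, 12, 45], [59, 2, 5], [1, 43, 5], [4, 3, 45.3]]

# transpose: row i of the result holds hidden unit i across all time steps
annotations = [list(row) for row in zip(*encoder_states)]


def column(matrix, j):
    """Return column j of a row-major matrix, i.e. encoder time step j."""
    return [row[j] for row in matrix]


print(len(annotations), len(annotations[0]))  # 3 4
print(column(annotations, 1))                 # [59, 2, 5]
```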

Implementation: scoring all annotations at once

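Next, score all the annotations in one step with matrix operations. We keep the dot-product scoring method, but in matrix form: transpose dec_hidden_state and matrix-multiply it with annotations.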

def dot_attention_score(dec_hidden_state, annotations):
    # return the product of the dec_hidden_state transpose and the encoder hidden states
    return np.matmul(np.transpose(dec_hidden_state), annotations)

attention_weights_raw = dot_attention_score(dec_hidden_state, annotations)
attention_weights_raw
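Transposing a 1-D NumPy array is a no-op, so np.matmul effectively treats dec_hidden_state as a row vector and returns one score per column of annotations. The pure-Python sketch below computes the same product column by column; the function name score_all is hypothetical. With the toy data it gives approximately [927, 397, 148, 929].

```python
def score_all(dec_hidden_state, annotations):
    """Dot-product score of the decoder state against every column of annotations."""
    n_steps = len(annotations[0])
    scores = []
    for j in range(n_steps):
        # column j = encoder hidden state at time step j
        col = [row[j] for row in annotations]
        scores.append(sum(d * a for d, a in zip(dec_hidden_state, col)))
    return scores


dec_hidden_state = [5, 1, 20]
annotations = [[3, 59, 1, 4],
               [12, 2, 43, 3],
               [45, 5, 5, 45.3]]
scores = score_all(dec_hidden_state, annotations)
print(scores)
```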

With the scores in hand, apply the softmax function:

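def softmax(x):
    x = np.asarray(x, dtype=np.float64)
    # shifting by the max leaves the result unchanged but prevents overflow on scores like 929
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

attention_weights = softmax(attention_weights_raw)
attention_weights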
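Why subtract the maximum? Softmax is unchanged when the same constant m is subtracted from every score, because the factor e^(-m) cancels between numerator and denominator. The shift also keeps exp from overflowing: float64 exp overflows for arguments above roughly 709, and the largest score here is 929. A pure-Python sketch of the same idea (stable_softmax is a hypothetical name):

```python
import math


def stable_softmax(scores):
    """Softmax computed after shifting by the max score to avoid overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


weights = stable_softmax([927, 397, 148, 929])
print([round(w, 4) for w in weights])  # [0.1192, 0.0, 0.0, 0.8808]
```

Only the first and fourth annotations receive meaningful weight, because their scores (927 and 929) are far above the other two.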

Applying the scores back to the annotations

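Now that we have the scores, multiply each annotation by its score to move one step closer to the attention context vector. This is the multiplication part of the context-vector formula; the summation is handled in the next step.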
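For reference, the steps of this walk-through can be written as the following equations. The notation is ours rather than the original article's: s is the decoder hidden state, h_j is the j-th annotation (column), e_j its raw score, α_j its softmax weight and c the attention context vector. This step computes each product α_j h_j.

```latex
e_j = s^{\top} h_j, \qquad
\alpha_j = \frac{\exp(e_j)}{\sum_{k} \exp(e_k)}, \qquad
c = \sum_{j} \alpha_j \, h_j
```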

def apply_attention_scores(attention_weights, annotations):
    # Multiply the annotations by their weights (broadcasts one weight per column)
    return attention_weights * annotations

applied_attention = apply_attention_scores(attention_weights, annotations)
applied_attention
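In NumPy, multiplying the shape-(4,) weight vector by the 3×4 annotations matrix broadcasts the weights along the last axis, so column j is scaled by weight j. A pure-Python equivalent, with the hypothetical helper name scale_columns and illustrative weights that are not the softmax output above:

```python
def scale_columns(weights, annotations):
    """Multiply column j of annotations by weights[j] (mirrors NumPy broadcasting)."""
    return [[w * a for w, a in zip(weights, row)] for row in annotations]


annotations = [[3, 59, 1, 4],
               [12, 2, 43, 3],
               [45, 5, 5, 45.3]]
weights = [0.5, 0.0, 0.0, 0.5]  # illustrative weights only
scaled = scale_columns(weights, annotations)
print(scaled)
```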

Now visualize the annotations after the attention weights have been applied:

# Let's visualize our annotations after applying attention to them
ax = sns.heatmap(applied_attention, annot=True,
                 cmap=sns.light_palette("orange", as_cmap=True), linewidths=1)

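Compare this with the visualization of the raw annotations: the second and third annotations (columns) have been almost wiped out, the first keeps some of its values, and the fourth is by far the most strongly represented.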

Calculating the attention context vector

All that remains is to sum the four weighted columns into a single attention context vector:

def calculate_attention_vector(applied_attention):
    # sum the weighted columns (one per encoder time step) into a single vector
    return np.sum(applied_attention, axis=1)

attention_vector = calculate_attention_vector(applied_attention)
attention_vector

# Let's visualize the attention context vector
plt.figure(figsize=(1.5, 4.5))
sns.heatmap(np.array(attention_vector).reshape(-1, 1), annot=True,
            cmap=sns.light_palette("blue", as_cmap=True), linewidths=1)
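Putting the steps together, here is a self-contained pure-Python sketch of the whole pipeline (score, softmax, weight, sum) on the same toy vectors. The function name attention_context is our own; the final element-wise sum over time steps corresponds to np.sum(..., axis=1) in the code above.

```python
import math


def attention_context(dec_hidden_state, encoder_states):
    """Dot-product attention over a list of encoder hidden states.

    Returns (weights, context_vector).
    """
    # 1. score: dot product of the decoder state with each encoder state
    scores = [sum(d * h for d, h in zip(dec_hidden_state, state))
              for state in encoder_states]
    # 2. softmax, shifted by the max score for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # 3 and 4. weight each encoder state, then sum them element-wise
    dim = len(dec_hidden_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


dec_hidden_state = [5, 1, 20]
encoder_states = [[3, 12, 45], [59, 2, 5], [1, 43, 5], [4, 3, 45.3]]
weights, context = attention_context(dec_hidden_state, encoder_states)
print([round(c, 2) for c in context])  # [3.88, 4.07, 45.26]
```

The context vector ends up close to the fourth annotation [4, 3, 45.3], which carries about 88% of the weight, consistent with the heatmap comparison of the weighted annotations.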