Boston Housing Price Prediction
This example uses the "Boston housing price" task to walk through the thinking and the concrete steps involved in building a neural network model with the Python language and the Numpy library.
Boston housing price prediction is a classic machine learning task. As with housing prices everywhere, prices in the Boston area are influenced by many factors.
Dataset
The dataset records 13 factors that may influence housing prices together with the average price of that type of house. The goal is to build a model that predicts the price from these 13 factors, as shown in Figure 1.
Figure 1: Factors that influence Boston housing prices
Prediction problems can be divided into regression tasks and classification tasks according to whether the predicted output is a continuous real value or a discrete label. Since the house price is a continuous value, price prediction is clearly a regression task. Below we solve it with the simplest linear regression model and implement that model as a neural network.
Linear Regression Model
Assume that the house price and the influencing factors can be described by a linear relationship: $y = \sum_{j} x_j w_j + b$. Solving the model means fitting each $w_j$ and $b$ from the data, where the $w_j$ and $b$ are the weights and bias of the linear model. In the one-dimensional case, $w$ and $b$ are the slope and intercept of a straight line.
The linear regression model uses mean squared error (MSE) as the loss function (Loss) to measure the gap between the predicted price and the true price: $MSE = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$.
Neural Network Structure of the Linear Regression Model
In a standard neural network, each neuron consists of a weighted sum followed by a nonlinear transformation, and many such neurons are arranged in layers and connected to form the network. The linear regression model can be viewed as a minimal special case of a neural network: a single neuron with only the weighted sum and no nonlinear transformation, so no network needs to be formed, as shown in Figure 2.
Figure 2: The linear regression model as a neural network
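To make Figure 2 concrete, here is a minimal sketch (with arbitrary, made-up numbers purely for illustration) of what this single neuron computes: a weighted sum of the 13 input features plus a bias, with no nonlinear activation.

import numpy as np

# Arbitrary illustrative values: 13 features for one house, 13 weights and a bias
x = np.random.rand(13)       # one sample's 13 feature values
w = np.random.randn(13, 1)   # one weight per feature
b = -0.2                     # bias term

# The "neuron" of Figure 2: a weighted sum plus the bias, no activation function
z = np.dot(x, w) + b
print(z)                     # a single predicted value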
Building the Neural Network Model for the Boston Housing Price Task
Deep learning not only enables end-to-end learning of models; it has also pushed artificial intelligence toward industrial-scale production by giving rise to standardized, automated, and modular frameworks. Deep learning models for different scenarios share a common structure, and a model can be built and trained in five steps, as shown in Figure 3.
Figure 3: Basic steps for building a neural network / deep learning model
Precisely because the modeling and training process is so generic, only the three model elements differ when building different models; the remaining steps are essentially the same.
Data Processing
Data processing consists of five parts: loading the data, reshaping it, splitting the dataset, normalizing the data, and wrapping everything into a load_data function. Only after preprocessing can the data be consumed by the model.
Reading the data:
The following code reads the data and lets us inspect the structure of the Boston housing dataset; the data is stored locally in the file housing.data.
# Import the packages we need
import numpy as np
import json

# Read in the training data
datafile = './work/housing.data'
data = np.fromfile(datafile, sep=' ')
data

array([6.320e-03, 1.800e+01, 2.310e+00, ..., 3.969e+02, 7.880e+00, 1.190e+01])
Reshaping the Data
The raw data comes in as a one-dimensional array with all values concatenated together, so it needs to be reshaped into a two-dimensional matrix in which each row is one data sample (14 values): 13 X values (the features that influence the price) and one Y value (the average price of that type of house).
# After reading, the data is a 1-D array: items 0-13 are the first record, items 14-27 the second
# Reshape the raw data into an N x 14 matrix
feature_names = [ 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', \
                  'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV' ]
feature_num = len(feature_names)
data = data.reshape([data.shape[0] // feature_num, feature_num])

# Inspect the data
x = data[0]
print(x.shape)
print(x)

(14,)
[6.320e-03 1.800e+01 2.310e+00 0.000e+00 5.380e-01 6.575e+00 6.520e+01 4.090e+00 1.000e+00 2.960e+02 1.530e+01 3.969e+02 4.980e+00 2.400e+01]
Splitting the Dataset
The dataset is split into a training set, used to fit the model parameters, and a test set, used to evaluate the model. In this example 80% of the data goes into the training set and 20% into the test set. In the code below, printing the shape of the training set shows 404 samples, each with 13 feature values and 1 target value.
ratio = 0.8
offset = int(data.shape[0] * ratio)
training_data = data[:offset]
training_data.shape

The result is:
(404, 14)
Normalizing the Data:
Each feature is normalized so that all features end up on a comparable, small scale (the code below subtracts each feature's mean and divides by its range). This has two benefits: model training is more efficient, and the magnitude of each weight then reflects how much the corresponding variable contributes to the prediction, because all features share the same scale.
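Written as a formula, the normalization applied in the code below maps each feature column $x_i$ using statistics computed on the training set only:
$$x_i \leftarrow \frac{x_i - \mathrm{avg}(x_i)}{\max(x_i) - \min(x_i)}$$
where the mean, maximum, and minimum are taken over the training samples.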
# Compute the maximum, minimum, and mean of the training set
maximums, minimums, avgs = \
    training_data.max(axis=0), \
    training_data.min(axis=0), \
    training_data.sum(axis=0) / training_data.shape[0]
# Normalize the data
for i in range(feature_num):
    #print(maximums[i], minimums[i], avgs[i])
    data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])
Wrapping it into a load_data function:
The data processing steps above are wrapped into a load_data function so the model-building code can call it, as follows:
def load_data():
    # Load the data from the file
    datafile = './work/housing.data'
    data = np.fromfile(datafile, sep=' ')

    # Each record has 14 items: the first 13 are influencing factors,
    # the 14th is the median price of that type of house
    feature_names = [ 'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', \
                      'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV' ]
    feature_num = len(feature_names)

    # Reshape the raw data into an [N, 14] matrix
    data = data.reshape([data.shape[0] // feature_num, feature_num])

    # Split the dataset into a training set and a test set
    # 80% of the data is used for training and 20% for testing
    # The two sets must not overlap
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]

    # Compute the maximum, minimum, and mean of the training set
    maximums, minimums, avgs = training_data.max(axis=0), training_data.min(axis=0), \
                               training_data.sum(axis=0) / training_data.shape[0]

    # Normalize the data
    for i in range(feature_num):
        #print(maximums[i], minimums[i], avgs[i])
        data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    # Split into training and test sets
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data

# Get the data
training_data, test_data = load_data()
x = training_data[:, :-1]
y = training_data[:, -1:]

# Inspect the data
print(x[0])
print(y[0])

[-0.02146321 0.03767327 -0.28552309 -0.08663366 0.01289726 0.04634817 0.00795597 -0.00765794 -0.25172191 -0.11881188 -0.29002528 0.0519112 -0.17590923]
[-0.00390539]
Model Design
Model design, also called network architecture design, is one of the key elements of a deep learning model; it defines the hypothesis space, i.e. the model's "forward computation" from input to output. If the input features and the predicted output are represented as vectors, the input x has 13 components and y has 1 component, so the shape of the weight parameter is 13×1. Suppose we initialize the parameters with the following arbitrary numbers:
w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, -0.1, -0.2, -0.3, -0.4, 0.0]
w = np.array(w).reshape([13, 1])
Take the first sample and see what the product of its feature vector with the parameter vector looks like.

x1 = x[0]
t = np.dot(x1, w)
print(t)

The result is:
[0.03395597]
The complete linear regression formula also needs a bias term b, which we likewise initialize arbitrarily, to -0.2. The full output of the linear regression model is then z = t + b. This process of computing the output value from the features and the parameters is called "forward computation".

b = -0.2
z = t + b
print(z)

The output is:
[-0.16604403]
The prediction computation above can be expressed as a class: the parameters w and b become member variables, and a forward function implements the forward computation from features and parameters to the predicted output, as follows:

class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # Use a fixed random seed so that results are reproducible across runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z
Based on this Network class, the model computation goes as follows:
net = Network(13)
x1 = x[0]
y1 = y[0]
z = net.forward(x1)
print(z)

[-0.63182506]
Training Configuration:
Once the model is designed, a training configuration is needed to find the best parameter values, which requires a loss function (Loss) to measure how good the model is. For regression problems the mean squared error is a common choice; classification problems usually use cross-entropy as the loss function.
Loss = (y1 - z) * (y1 - z)
print(Loss)

The result is:
[0.39428312]
Because the loss has to take every sample into account, we sum the per-sample losses and divide by the total number of samples N, as written out below.
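In formula form, this averaged loss over the $N$ samples (the quantity the loss method below returns) is
$$L = \frac{1}{N}\sum_{i=1}^{N} (z_i - y_i)^2 .$$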
The loss computation is added to the Network class as follows:
class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # Use a fixed random seed so that results are reproducible across runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        cost = error * error
        cost = np.mean(cost)
        return cost
Now compute the predictions and the loss with this class:
net = Network(13)
# Predictions and losses for several samples can be computed at once
x1 = x[0:3]
y1 = y[0:3]
z = net.forward(x1)
print('predict: ', z)
loss = net.loss(z, y1)
print('loss:', loss)

predict: [[-0.63182506] [-0.55793096] [-1.00062009]]
loss: 0.7229825055441156
Training Process
Next we look at how to solve for the parameters w and b. This is the model training process, and its goal is to make the loss function defined above as small as possible.
net = Network(13)
losses = []
# Only plot the part of the curve for parameters w3 and w9 in the interval [-160, 160],
# which contains the extremum of the loss function
w3 = np.arange(-160.0, 160.0, 1.0)
w9 = np.arange(-160.0, 160.0, 1.0)
losses = np.zeros([len(w3), len(w9)])

# Compute the Loss for every parameter setting in the chosen region
for i in range(len(w3)):
    for j in range(len(w9)):
        net.w[3] = w3[i]
        net.w[9] = w9[j]
        z = net.forward(x)
        loss = net.loss(z, y)
        losses[i, j] = loss

# Use matplotlib to draw a 3D plot of the two parameters and the corresponding Loss
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = Axes3D(fig)

w3, w9 = np.meshgrid(w3, w9)

ax.plot_surface(w3, w9, losses, rstride=1, cstride=1, cmap='rainbow')
plt.show()
The data and shape of each variable can be inspected with a short program:
x1 = x[0]
y1 = y[0]
z1 = net.forward(x1)
print('x1 {}, shape {}'.format(x1, x1.shape))
print('y1 {}, shape {}'.format(y1, y1.shape))
print('z1 {}, shape {}'.format(z1, z1.shape))

x1 [-0.02146321 0.03767327 -0.28552309 -0.08663366 0.01289726 0.04634817 0.00795597 -0.00765794 -0.25172191 -0.11881188 -0.29002528 0.0519112 -0.17590923], shape (13,)
y1 [-0.00390539], shape (1,)
z1 [-12.05947643], shape (1,)
Computing Gradients with Numpy
With Numpy, vectors and matrices can be manipulated as easily as single variables, so gradients can be computed very efficiently. In the gradient code the expression (z1 - y1) * x1 is used directly: for a single sample it yields a 13-dimensional vector whose components are the gradient contributions along each feature dimension, and applied to the whole dataset the same expression computes every sample's contribution at once.
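For reference, this is the calculus behind that expression. If the per-sample loss is written with the customary factor of $\frac{1}{2}$ (an assumption consistent with the code, which carries no factor of 2), then for sample $i$ and weight $w_j$
$$L_i = \frac{1}{2}(z_i - y_i)^2, \qquad \frac{\partial L_i}{\partial w_j} = (z_i - y_i)\,x_{ij}, \qquad \frac{\partial L_i}{\partial b} = z_i - y_i ,$$
so $(z_i - y_i)\,x_i$ is exactly sample $i$'s contribution to the gradient of $w$, and the full gradient is the average of these contributions over all samples.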
z = net.forward(x)
gradient_w = (z - y) * x
print('gradient_w shape {}'.format(gradient_w.shape))
print(gradient_w)

gradient_w shape (404, 13)
[[ 0.25875126 -0.45417275 3.44214394 ... 3.49642036 -0.62581917 2.12068622]
 [ 0.7329239 4.91417754 3.33394253 ... 0.83100061 -1.79236081 2.11028056]
 [ 0.25138584 1.68549775 1.14349809 ... 0.28502219 -0.46695229 2.39363651]
 ...
 [ 14.70025543 -15.10890735 36.23258734 ... 24.54882966 5.51071122 26.26098922]
 [ 9.29832217 -15.33146159 36.76629344 ... 24.91043398 -1.27564923 26.61808955]
 [ 19.55115919 -10.8177237 25.94192351 ... 17.5765494 3.94557661 17.64891012]]
The input data contains many samples, and each one contributes to the gradient. The rows of the matrix above are exactly these per-sample contributions; the same calculation can also be carried out for an individual sample, for example samples 2 and 3:
x2 = x[1]
y2 = y[1]
z2 = net.forward(x2)
gradient_w = (z2 - y2) * x2
print('gradient_w_by_sample2 {}, gradient.shape {}'.format(gradient_w, gradient_w.shape))

gradient_w_by_sample2 [ 0.7329239 4.91417754 3.33394253 2.9912385 4.45673435 -0.58146277 -5.14623287 -2.4894594 7.19011988 7.99471607 0.83100061 -1.79236081 2.11028056], gradient.shape (13,)

x3 = x[2]
y3 = y[2]
z3 = net.forward(x3)
gradient_w = (z3 - y3) * x3
print('gradient_w_by_sample3 {}, gradient.shape {}'.format(gradient_w, gradient_w.shape))

gradient_w_by_sample3 [ 0.25138584 1.68549775 1.14349809 1.02595515 1.5286008 -1.93302947 0.4058236 -0.85385157 2.46611579 2.74208162 0.28502219 -0.46695229 2.39363651], gradient.shape (13,)
Some readers may again think of using a for loop to compute each sample's contribution to the gradient and then averaging. That is not necessary: Numpy's matrix operations can do it directly, as the following three-sample case shows.
# Note: this takes the first 3 samples at once, not the 3rd sample
x3samples = x[0:3]
y3samples = y[0:3]
z3samples = net.forward(x3samples)

print('x {}, shape {}'.format(x3samples, x3samples.shape))
print('y {}, shape {}'.format(y3samples, y3samples.shape))
print('z {}, shape {}'.format(z3samples, z3samples.shape))

x [[-0.02146321 0.03767327 -0.28552309 -0.08663366 0.01289726 0.04634817 0.00795597 -0.00765794 -0.25172191 -0.11881188 -0.29002528 0.0519112 -0.17590923]
 [-0.02122729 -0.14232673 -0.09655922 -0.08663366 -0.12907805 0.0168406 0.14904763 0.0721009 -0.20824365 -0.23154675 -0.02406783 0.0519112 -0.06111894]
 [-0.02122751 -0.14232673 -0.09655922 -0.08663366 -0.12907805 0.1632288 -0.03426854 0.0721009 -0.20824365 -0.23154675 -0.02406783 0.03943037 -0.20212336]], shape (3, 13)
y [[-0.00390539]
 [-0.05723872]
 [ 0.23387239]], shape (3, 1)
z [[-12.05947643]
 [-34.58467747]
 [-11.60858134]], shape (3, 1)
The first dimension of x3samples, y3samples, and z3samples is 3, meaning there are 3 samples. Next we compute these 3 samples' contributions to the gradient.
gradient_w = (z3samples - y3samples) * x3samples
print('gradient_w {}, gradient.shape {}'.format(gradient_w, gradient_w.shape))

gradient_w [[ 0.25875126 -0.45417275 3.44214394 1.04441828 -0.15548386 -0.55875363 -0.09591377 0.09232085 3.03465138 1.43234507 3.49642036 -0.62581917 2.12068622]
 [ 0.7329239 4.91417754 3.33394253 2.9912385 4.45673435 -0.58146277 -5.14623287 -2.4894594 7.19011988 7.99471607 0.83100061 -1.79236081 2.11028056]
 [ 0.25138584 1.68549775 1.14349809 1.02595515 1.5286008 -1.93302947 0.4058236 -0.85385157 2.46611579 2.74208162 0.28502219 -0.46695229 2.39363651]], gradient.shape (3, 13)
As can be seen, gradient_w now has shape 3×13, and its rows match the per-sample gradients: row 1 is sample 1's contribution, row 2 matches gradient_w_by_sample2 computed above, and row 3 matches gradient_w_by_sample3. With matrix operations it is much more convenient to compute all 3 samples' contributions to the gradient at once.
For the case of N samples, all samples' contributions to the gradient can therefore be computed directly in the same way; this convenience comes from Numpy's broadcasting. To summarize how broadcasting is used here:
- On one hand it expands the parameter dimension, replacing a for loop that would compute one sample's gradient with respect to each of the parameters w0 through w12.
- On the other hand it expands the sample dimension, replacing a for loop that would compute the gradient contribution of each of samples 0 through 403.
z = net.forward(x)
gradient_w = (z - y) * x
print('gradient_w shape {}'.format(gradient_w.shape))
print(gradient_w)

gradient_w shape (404, 13)
[[ 0.25875126 -0.45417275 3.44214394 ... 3.49642036 -0.62581917 2.12068622]
 [ 0.7329239 4.91417754 3.33394253 ... 0.83100061 -1.79236081 2.11028056]
 [ 0.25138584 1.68549775 1.14349809 ... 0.28502219 -0.46695229 2.39363651]
 ...
 [ 14.70025543 -15.10890735 36.23258734 ... 24.54882966 5.51071122 26.26098922]
 [ 9.29832217 -15.33146159 36.76629344 ... 24.91043398 -1.27564923 26.61808955]
 [ 19.55115919 -10.8177237 25.94192351 ... 17.5765494 3.94557661 17.64891012]]
Each row of gradient_w above is one sample's contribution to the gradient. According to the gradient formula, the total gradient is the average of the per-sample contributions.
Numpy's mean function can do this directly:
# axis=0 means summing over the rows (samples) and then dividing by the number of rows
gradient_w = np.mean(gradient_w, axis=0)
print('gradient_w ', gradient_w.shape)
print('w ', net.w.shape)
print(gradient_w)
print(net.w)

gradient_w  (13,)
w  (13, 1)
[ 1.59697064 -0.92928123 4.72726926 1.65712204 4.96176389 1.18068454 4.55846519 -3.37770889 9.57465893 10.29870662 1.3900257 -0.30152215 1.09276043]
[[ 1.76405235e+00]
 [ 4.00157208e-01]
 [ 9.78737984e-01]
 [ 2.24089320e+00]
 [ 1.86755799e+00]
 [ 1.59000000e+02]
 [ 9.50088418e-01]
 [-1.51357208e-01]
 [-1.03218852e-01]
 [ 1.59000000e+02]
 [ 1.44043571e-01]
 [ 1.45427351e+00]
 [ 7.61037725e-01]]
Numpy's matrix operations made the gradient computation easy, but they introduced a new issue: gradient_w has shape (13,) while w has shape (13, 1). This happens because np.mean removes the 0th dimension. To keep addition, subtraction, multiplication, and division straightforward, gradient_w and w must have the same shape, so we reshape gradient_w to (13, 1) as well:
gradient_w = gradient_w[:, np.newaxis]
print('gradient_w shape', gradient_w.shape)

gradient_w shape (13, 1)
Putting the discussion together, the gradient computation looks like this:
z = net.forward(x)
gradient_w = (z - y) * x
gradient_w = np.mean(gradient_w, axis=0)
gradient_w = gradient_w[:, np.newaxis]
gradient_w

array([[ 1.59697064],
       [-0.92928123],
       [ 4.72726926],
       [ 1.65712204],
       [ 4.96176389],
       [ 1.18068454],
       [ 4.55846519],
       [-3.37770889],
       [ 9.57465893],
       [10.29870662],
       [ 1.3900257 ],
       [-0.30152215],
       [ 1.09276043]])
The code above computes the gradient of w very concisely. The gradient of b is computed on the same principle.
gradient_b = (z - y)
gradient_b = np.mean(gradient_b)
# b is a single number here, so np.mean directly yields a scalar
gradient_b

-1.0918438870293816e-13
The computations of the gradients of w and b above can now be written as a gradient method of the Network class, as follows.
class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # Use a fixed random seed so that results are reproducible across runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        num_samples = error.shape[0]
        cost = error * error
        cost = np.sum(cost) / num_samples
        return cost

    def gradient(self, x, y):
        z = self.forward(x)
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]
        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b

# Call the gradient function defined above to compute the gradients
# Initialize the network
net = Network(13)
# Set [w5, w9] = [-100., -100.]
net.w[5] = -100.0
net.w[9] = -100.0

z = net.forward(x)
loss = net.loss(z, y)
gradient_w, gradient_b = net.gradient(x, y)
gradient_w5 = gradient_w[5][0]
gradient_w9 = gradient_w[9][0]
print('point {}, loss {}'.format([net.w[5][0], net.w[9][0]], loss))
print('gradient {}'.format([gradient_w5, gradient_w9]))

point [-100.0, -100.0], loss 686.3005008179159
gradient [-0.850073323995813, -6.138412364807849]
Finding a Point with a Smaller Loss
Now we study how to use the gradient to update the parameters. First move a small step in the direction opposite to the gradient, to the next point P1, and observe how the loss changes.
# In the [w5, w9] plane, move against the gradient to the next point P1
# Define the step size eta
eta = 0.1
# Update the parameters w5 and w9
net.w[5] = net.w[5] - eta * gradient_w5
net.w[9] = net.w[9] - eta * gradient_w9
# Recompute z and the loss
z = net.forward(x)
loss = net.loss(z, y)
gradient_w, gradient_b = net.gradient(x, y)
gradient_w5 = gradient_w[5][0]
gradient_w9 = gradient_w[9][0]
print('point {}, loss {}'.format([net.w[5][0], net.w[9][0]], loss))
print('gradient {}'.format([gradient_w5, gradient_w9]))

point [-99.91499266760042, -99.38615876351922], loss 678.6472185028845
gradient [-0.8556356178645292, -6.0932268634065805]
Running this code shows that after one small step against the gradient, the loss at the new point has indeed decreased. If you are interested, run the code block above repeatedly and check whether the loss keeps shrinking.
In the code above, each parameter update uses the statement net.w[5] = net.w[5] - eta * gradient_w5, written out as a general formula after the list below.
- Subtraction: the parameter must move in the direction opposite to the gradient.
- eta: controls how far the parameter value moves along the negative gradient direction in each step, i.e. the step size, also known as the learning rate.
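Written as a general formula (just a restatement of the update statement above, with $\eta$ denoting the learning rate eta), every parameter is moved against its own gradient component:
$$w_j \leftarrow w_j - \eta\,\frac{\partial L}{\partial w_j}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}.$$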
You may wonder why we normalized the input features earlier so that their scales are consistent: it is so that a single, shared step size works well.
As shown in Figure 8, when the input features are normalized, the Loss as a function of the different parameters is a fairly regular surface and the learning rate can be set to a single common value. When the features are not normalized, parameters attached to features of different scales need different step sizes: large-scale parameters need large steps and small-scale parameters need small steps, so no single learning rate fits them all.
Figure 8: Without normalization, different feature dimensions require different ideal step sizes
Wrapping the Code into a Train Function
The iterative computation above is wrapped into train and update functions, as follows.
class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # Use a fixed random seed so that results are reproducible across runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.w[5] = -100.
        self.w[9] = -100.
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        num_samples = error.shape[0]
        cost = error * error
        cost = np.sum(cost) / num_samples
        return cost

    def gradient(self, x, y):
        z = self.forward(x)
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]
        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b

    def update(self, gradient_w5, gradient_w9, eta=0.01):
        self.w[5] = self.w[5] - eta * gradient_w5
        self.w[9] = self.w[9] - eta * gradient_w9

    def train(self, x, y, iterations=100, eta=0.01):
        points = []
        losses = []
        for i in range(iterations):
            points.append([self.w[5][0], self.w[9][0]])
            z = self.forward(x)
            L = self.loss(z, y)
            gradient_w, gradient_b = self.gradient(x, y)
            gradient_w5 = gradient_w[5][0]
            gradient_w9 = gradient_w[9][0]
            self.update(gradient_w5, gradient_w9, eta)
            losses.append(L)
            if i % 50 == 0:
                print('iter {}, point {}, loss {}'.format(i, [self.w[5][0], self.w[9][0]], L))
        return points, losses

# Get the data
train_data, test_data = load_data()
x = train_data[:, :-1]
y = train_data[:, -1:]
# Create the network
net = Network(13)
num_iterations = 2000
# Start training
points, losses = net.train(x, y, iterations=num_iterations, eta=0.01)

# Plot how the loss changes
plot_x = np.arange(num_iterations)
plot_y = np.array(losses)
plt.plot(plot_x, plot_y)
plt.show()
iter 0, point [-99.99144364382136, -99.93861587635192], loss 686.3005008179159
iter 50, point [-99.56362583488914, -96.92631128470325], loss 649.221346830939
iter 100, point [-99.13580802595692, -94.02279509580971], loss 614.6970095624063
iter 150, point [-98.7079902170247, -91.22404911807594], loss 582.543755023494
iter 200, point [-98.28017240809248, -88.52620357520894], loss 552.5911329872217
iter 250, point [-97.85235459916026, -85.9255316243737], loss 524.6810152322887
iter 300, point [-97.42453679022805, -83.41844407682491], loss 498.6667034691001
iter 350, point [-96.99671898129583, -81.00148431353688], loss 474.4121018974464
iter 400, point [-96.56890117236361, -78.67132338862874], loss 451.7909497114133
iter 450, point [-96.14108336343139, -76.42475531364933], loss 430.68610920670284
iter 500, point [-95.71326555449917, -74.25869251604028], loss 410.988905460488
iter 550, point [-95.28544774556696, -72.17016146534513], loss 392.5985138460824
iter 600, point [-94.85762993663474, -70.15629846096763], loss 375.4213919156372
iter 650, point [-94.42981212770252, -68.21434557551346], loss 359.3707524354014
iter 700, point [-94.0019943187703, -66.34164674796719], loss 344.36607459115214
iter 750, point [-93.57417650983808, -64.53564402117185], loss 330.33265059761464
iter 800, point [-93.14635870090586, -62.793873918279786], loss 317.2011651461846
iter 850, point [-92.71854089197365, -61.11396395304264], loss 304.907305311265
iter 900, point [-92.29072308304143, -59.49362926899678], loss 293.3913987080144
iter 950, point [-91.86290527410921, -57.930669402782904], loss 282.5980778542974
iter 1000, point [-91.43508746517699, -56.4229651670156], loss 272.47596883802515
iter 1050, point [-91.00726965624477, -54.968475648286564], loss 262.9774025287022
iter 1100, point [-90.57945184731255, -53.56523531604897], loss 254.05814669965383
iter 1150, point [-90.15163403838034, -52.21135123828792], loss 245.67715754581488
iter 1200, point [-89.72381622944812, -50.90500040003218], loss 237.796349191773
iter 1250, point [-89.2959984205159, -49.6444271209092], loss 230.3803798866218
iter 1300, point [-88.86818061158368, -48.42794056808474], loss 223.3964536766492
iter 1350, point [-88.44036280265146, -47.2539123610643], loss 216.81413643451378
iter 1400, point [-88.01254499371925, -46.12077426496303], loss 210.60518520483126
iter 1450, point [-87.58472718478703, -45.027015968976976], loss 204.74338990147896
iter 1500, point [-87.15690937585481, -43.9711829469081], loss 199.20442646183588
iter 1550, point [-86.72909156692259, -42.95187439671279], loss 193.96572062803054
iter 1600, point [-86.30127375799037, -41.96774125615467], loss 189.00632158541163
iter 1650, point [-85.87345594905815, -41.017484291751295], loss 184.3067847442463
iter 1700, point [-85.44563814012594, -40.0998522583068], loss 179.84906300239203
iter 1750, point [-85.01782033119372, -39.21364012642417], loss 175.61640587468244
iter 1800, point [-84.5900025222615, -38.35768737548557], loss 171.59326591927967
iter 1850, point [-84.16218471332928, -37.530876349682856], loss 167.76521193253296
iter 1900, point [-83.73436690439706, -36.73213067476985], loss 164.11884842217898
iter 1950, point [-83.30654909546485, -35.96041373329276], loss 160.64174090423475
Extending Training to All Parameters
To give an intuitive picture, the gradient descent demonstration above only involved the two parameters w5 and w9, but the full price-prediction model must solve for all of the parameters w and b. This requires changing the update and train functions of Network. Since the computation is no longer restricted to particular parameters (all of them participate), the modified code is actually simpler. The logic is to repeat four steps until the parameters reach an optimum: forward computation of the output, computing the Loss from the output and the true values, computing the gradients from the Loss and the input, and updating the parameter values with the gradients. The code is as follows.
class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # Use a fixed random seed so that results are reproducible across runs
        np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        num_samples = error.shape[0]
        cost = error * error
        cost = np.sum(cost) / num_samples
        return cost

    def gradient(self, x, y):
        z = self.forward(x)
        gradient_w = (z - y) * x
        gradient_w = np.mean(gradient_w, axis=0)
        gradient_w = gradient_w[:, np.newaxis]
        gradient_b = (z - y)
        gradient_b = np.mean(gradient_b)
        return gradient_w, gradient_b

    def update(self, gradient_w, gradient_b, eta=0.01):
        self.w = self.w - eta * gradient_w
        self.b = self.b - eta * gradient_b

    def train(self, x, y, iterations=100, eta=0.01):
        losses = []
        for i in range(iterations):
            z = self.forward(x)
            L = self.loss(z, y)
            gradient_w, gradient_b = self.gradient(x, y)
            self.update(gradient_w, gradient_b, eta)
            losses.append(L)
            if (i + 1) % 10 == 0:
                print('iter {}, loss {}'.format(i, L))
        return losses

# Get the data
train_data, test_data = load_data()
x = train_data[:, :-1]
y = train_data[:, -1:]
# Create the network
net = Network(13)
num_iterations = 1000
# Start training
losses = net.train(x, y, iterations=num_iterations, eta=0.01)

# Plot how the loss changes
plot_x = np.arange(num_iterations)
plot_y = np.array(losses)
plt.plot(plot_x, plot_y)
plt.show()
iter 9, loss 1.8984947314576224
iter 19, loss 1.8031783384598725
iter 29, loss 1.7135517565541092
iter 39, loss 1.6292649416831264
iter 49, loss 1.5499895293373231
iter 59, loss 1.4754174896452612
iter 69, loss 1.4052598659324693
iter 79, loss 1.3392455915676864
iter 89, loss 1.2771203802372915
iter 99, loss 1.218645685090292
iter 109, loss 1.1635977224791534
iter 119, loss 1.111766556287068
iter 129, loss 1.0629552390811503
iter 139, loss 1.0169790065644477
iter 149, loss 0.9736645220185994
iter 159, loss 0.9328491676343147
iter 169, loss 0.8943803798194307
iter 179, loss 0.8581150257549611
iter 189, loss 0.8239188186389669
iter 199, loss 0.7916657692169988
iter 209, loss 0.761237671346902
iter 219, loss 0.7325236194855752
iter 229, loss 0.7054195561163928
iter 239, loss 0.6798278472589763
iter 249, loss 0.6556568843183528
iter 259, loss 0.6328207106387195
iter 269, loss 0.6112386712285092
iter 279, loss 0.59083508421862
iter 289, loss 0.5715389327049418
iter 299, loss 0.5532835757100347
iter 309, loss 0.5360064770773406
iter 319, loss 0.5196489511849665
iter 329, loss 0.5041559244351539
iter 339, loss 0.48947571154034963
iter 349, loss 0.47555980568755696
iter 359, loss 0.46236268171965056
iter 369, loss 0.44984161152579916
iter 379, loss 0.43795649088328303
iter 389, loss 0.4266696770400226
iter 399, loss 0.41594583637124666
iter 409, loss 0.4057518014851036
iter 419, loss 0.3960564371908221
iter 429, loss 0.38683051477942226
iter 439, loss 0.3780465941011246
iter 449, loss 0.3696789129556087
iter 459, loss 0.3617032833413179
iter 469, loss 0.3540969941381648
iter 479, loss 0.3468387198244131
iter 489, loss 0.3399084348532937
iter 499, loss 0.33328733333814486
iter 509, loss 0.3269577537166779
iter 519, loss 0.32090310808539985
iter 529, loss 0.3151078159144129
iter 539, loss 0.30955724187078903
iter 549, loss 0.3042376374955925
iter 559, loss 0.2991360864954391
iter 569, loss 0.2942404534243286
iter 579, loss 0.2895393355454012
iter 589, loss 0.28502201767532415
iter 599, loss 0.28067842982626157
iter 609, loss 0.27649910747186535
iter 619, loss 0.2724751542744919
iter 629, loss 0.2685982071209627
iter 639, loss 0.26486040332365085
iter 649, loss 0.2612543498525749
iter 659, loss 0.2577730944725093
iter 669, loss 0.2544100986669443
iter 679, loss 0.2511592122380609
iter 689, loss 0.2480146494787638
iter 699, loss 0.24497096681926708
iter 709, loss 0.2420230418567802
iter 719, loss 0.23916605368251415
iter 729, loss 0.23639546442555454
iter 739, loss 0.23370700193813704
iter 749, loss 0.2310966435515475
iter 759, loss 0.2285606008362593
iter 769, loss 0.22609530530403904
iter 779, loss 0.22369739499361888
iter 789, loss 0.2213637018851542
iter 799, loss 0.21909124009208833
iter 809, loss 0.21687719478222933
iter 819, loss 0.21471891178284025
iter 829, loss 0.21261388782734392
iter 839, loss 0.2105597614038757
iter 849, loss 0.20855430416838638
iter 859, loss 0.20659541288730932
iter 869, loss 0.20468110187697833
iter 879, loss 0.2028094959090178
iter 889, loss 0.20097882355283644
iter 899, loss 0.19918741092814596
iter 909, loss 0.19743367584210875
iter 919, loss 0.1957161222872899
iter 929, loss 0.19403333527807176
iter 939, loss 0.19238397600456975
iter 949, loss 0.19076677728439412
iter 959, loss 0.1891805392938162
iter 969, loss 0.18762412556104593
iter 979, loss 0.18609645920539716
iter 989, loss 0.18459651940712488
iter 999, loss 0.18312333809366155
Stochastic Gradient Descent (SGD)
In the program above, every loss and gradient computation is based on the full dataset. For the Boston housing task this is fine, since there are only 404 samples, but real datasets are often very large, and computing over all of the data at every step is very inefficient, like using a sledgehammer to crack a nut. Because each update only moves the parameters a small step along the negative gradient, the direction does not need to be that precise. A reasonable solution is to randomly draw a small portion of the data each time to represent the whole, and to compute the gradient and loss on that portion when updating the parameters. This method is called Stochastic Gradient Descent (SGD), and its core concepts are:
- mini-batch: the batch of data drawn in each iteration is called a mini-batch.
- batch_size: the number of samples contained in one mini-batch is called the batch_size.
- epoch: as the program iterates, samples are drawn mini-batch by mini-batch; once the whole dataset has been traversed, one round of training, called an epoch, is complete. When launching training, the number of epochs num_epochs and the batch_size can be passed in as arguments (a minimal skeleton of this loop structure is sketched right after this list).
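As a minimal sketch of how these three concepts fit together (the variable names and the random stand-in data here are only for illustration; the real implementation is given later in this section):

import numpy as np

# Stand-in for the loaded dataset: 404 rows of 13 features + 1 label
train_data = np.random.rand(404, 14)
num_epochs = 5        # how many full passes over the training set
batch_size = 10       # samples per mini-batch

for epoch in range(num_epochs):
    np.random.shuffle(train_data)            # shuffle so each mini-batch is a random sample
    mini_batches = [train_data[k:k + batch_size]
                    for k in range(0, len(train_data), batch_size)]
    for batch in mini_batches:               # one parameter update per mini-batch
        x, y = batch[:, :-1], batch[:, -1:]  # features and labels of this mini-batch
        # forward pass, loss, gradient, and parameter update would go here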
Below we walk through the concrete implementation, which involves changes to two parts of the code: data processing and the training process.
Changes to the Data Processing Code
Data processing now needs two extra capabilities: splitting the data into batches and shuffling the samples (to get the effect of random sampling).
# Get the data
train_data, test_data = load_data()
train_data.shape

(404, 14)
train_data contains 404 records in total. With batch_size = 10, samples 0-9 form the first mini-batch, which we call train_data1.
train_data1 = train_data[0:10]
train_data1.shape

(10, 14)
Use the data in train_data1 (samples 0-9) to compute the gradient and update the network parameters.
net = Network(13)
x = train_data1[:, :-1]
y = train_data1[:, -1:]
loss = net.train(x, y, iterations=1, eta=0.01)
loss

[0.9001866101467375]
Then take samples 10-19 as the second mini-batch, compute the gradient, and update the network parameters again.
train_data2 = train_data[10:20]
x = train_data2[:, :-1]
y = train_data2[:, -1:]
loss = net.train(x, y, iterations=1, eta=0.01)
loss

[0.8903272433979657]
New mini-batches are drawn in this way one after another, gradually updating the network parameters.
Next, train_data is split into mini-batches of size batch_size, as in the code below: train_data is split into $\frac{404}{10} + 1 = 41$ mini-batches, of which the first 40 each contain 10 samples and the last one contains only 4 samples.
batch_size = 10
n = len(train_data)
mini_batches = [train_data[k:k+batch_size] for k in range(0, n, batch_size)]
print('total number of mini_batches is ', len(mini_batches))
print('first mini_batch shape ', mini_batches[0].shape)
print('last mini_batch shape ', mini_batches[-1].shape)

total number of mini_batches is 41
first mini_batch shape (10, 14)
last mini_batch shape (4, 14)
Note, however, that the mini-batches above are taken in order, whereas SGD draws a random subset to represent the whole. To get the effect of random sampling, we first shuffle the order of the samples in train_data and then slice out the mini-batches. Shuffling the sample order uses the np.random.shuffle function, introduced below.
Note:
Extensive experiments show that a model is impressed more deeply by the data it sees last: the closer training gets to the end, the more the final few batches influence the parameters. To keep this memorization effect from distorting training, the samples should be shuffled.
# Create a new array
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
print('before shuffle', a)
np.random.shuffle(a)
print('after shuffle', a)

before shuffle [ 1 2 3 4 5 6 7 8 9 10 11 12]
after shuffle [ 7 2 11 3 8 6 12 1 4 5 10 9]
Running the code above several times shows a different order after each shuffle. That was a 1-D array; now look at the effect of shuffling a 2-D array.
# Create a new array
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
a = a.reshape([6, 2])
print('before shuffle\n', a)
np.random.shuffle(a)
print('after shuffle\n', a)

before shuffle
 [[ 1 2]
 [ 3 4]
 [ 5 6]
 [ 7 8]
 [ 9 10]
 [11 12]]
after shuffle
 [[ 1 2]
 [ 3 4]
 [ 5 6]
 [ 9 10]
 [11 12]
 [ 7 8]]
The result shows that the elements are shuffled along dimension 0 while the order within dimension 1 is preserved: 2 still sits right after 1 and 8 right after 7, but whole rows such as [7, 8] can move to a different position. Using this, the SGD data handling can be assembled; it will be integrated into the train function of the Network class shortly, but first the basic flow (shuffle, split into mini-batches, train on each batch) looks like this:
# Get the data
train_data, test_data = load_data()

# Shuffle the sample order
np.random.shuffle(train_data)

# Split train_data into multiple mini_batches
batch_size = 10
n = len(train_data)
mini_batches = [train_data[k:k+batch_size] for k in range(0, n, batch_size)]

# Create the network
net = Network(13)

# Use each mini_batch in turn
for mini_batch in mini_batches:
    x = mini_batch[:, :-1]
    y = mini_batch[:, -1:]
    loss = net.train(x, y, iterations=1)
Changes to the Training Code
Each randomly drawn mini-batch is fed into the model for parameter training. The core of the training process is two nested loops:
- The outer loop counts how many times the whole sample set is traversed, i.e. the epochs: for epoch_id in range(num_epoches):
- The inner loop iterates over the batches that the sample set is split into within one traversal, all of which must be trained on, i.e. the iterations: for iter_id, mini_batch in enumerate(mini_batches):
Inside the two loops sits the classic four-step training flow, the same as before: forward computation -> compute the loss -> compute the gradients -> update the parameters:
x = mini_batch[:, :-1]
y = mini_batch[:, -1:]
a = self.forward(x)  # forward computation
loss = self.loss(a, y)  # compute the loss
gradient_w, gradient_b = self.gradient(x, y)  # compute the gradients
self.update(gradient_w, gradient_b, eta)  # update the parameters
Integrating both sets of changes into the train function of the Network class gives the final implementation below.
import numpy as np

class Network(object):
    def __init__(self, num_of_weights):
        # Randomly initialize w
        # (a fixed random seed could be set here to make runs reproducible)
        #np.random.seed(0)
        self.w = np.random.randn(num_of_weights, 1)
        self.b = 0.

    def forward(self, x):
        z = np.dot(x, self.w) + self.b
        return z

    def loss(self, z, y):
        error = z - y
        num_samples = error.shape[0]
        cost = error * error
        cost = np.sum(cost) / num_samples
        return cost

    def gradient(self, x, y):
        z = self.forward(x)
        N = x.shape[0]
        gradient_w = 1. / N * np.sum((z - y) * x, axis=0)
        gradient_w = gradient_w[:, np.newaxis]
        gradient_b = 1. / N * np.sum(z - y)
        return gradient_w, gradient_b

    def update(self, gradient_w, gradient_b, eta=0.01):
        self.w = self.w - eta * gradient_w
        self.b = self.b - eta * gradient_b

    def train(self, training_data, num_epoches, batch_size=10, eta=0.01):
        n = len(training_data)
        losses = []
        for epoch_id in range(num_epoches):
            # Shuffle the training data before each epoch,
            # then take batch_size records at a time
            np.random.shuffle(training_data)
            # Split the training data so that each mini_batch holds batch_size records
            mini_batches = [training_data[k:k+batch_size] for k in range(0, n, batch_size)]
            for iter_id, mini_batch in enumerate(mini_batches):
                #print(self.w.shape)
                #print(self.b)
                x = mini_batch[:, :-1]
                y = mini_batch[:, -1:]
                a = self.forward(x)
                loss = self.loss(a, y)
                gradient_w, gradient_b = self.gradient(x, y)
                self.update(gradient_w, gradient_b, eta)
                losses.append(loss)
                print('Epoch {:3d} / iter {:3d}, loss = {:.4f}'.
                      format(epoch_id, iter_id, loss))
        return losses

# Get the data
train_data, test_data = load_data()

# Create the network
net = Network(13)
# Start training
losses = net.train(train_data, num_epoches=50, batch_size=100, eta=0.1)

# Plot how the loss changes
plot_x = np.arange(len(losses))
plot_y = np.array(losses)
plt.plot(plot_x, plot_y)
plt.show()
Epoch 0 / iter 0, loss = 0.6273
Epoch 0 / iter 1, loss = 0.4835
Epoch 0 / iter 2, loss = 0.5830
Epoch 0 / iter 3, loss = 0.5466
Epoch 0 / iter 4, loss = 0.2147
Epoch 1 / iter 0, loss = 0.6645
Epoch 1 / iter 1, loss = 0.4875
Epoch 1 / iter 2, loss = 0.4707
Epoch 1 / iter 3, loss = 0.4153
Epoch 1 / iter 4, loss = 0.1402
Epoch 2 / iter 0, loss = 0.5897
Epoch 2 / iter 1, loss = 0.4373
Epoch 2 / iter 2, loss = 0.4631
Epoch 2 / iter 3, loss = 0.3960
Epoch 2 / iter 4, loss = 0.2340
Epoch 3 / iter 0, loss = 0.4139
Epoch 3 / iter 1, loss = 0.5635
Epoch 3 / iter 2, loss = 0.3807
Epoch 3 / iter 3, loss = 0.3975
Epoch 3 / iter 4, loss = 0.1207
Epoch 4 / iter 0, loss = 0.3786
Epoch 4 / iter 1, loss = 0.4474
Epoch 4 / iter 2, loss = 0.4019
Epoch 4 / iter 3, loss = 0.4352
Epoch 4 / iter 4, loss = 0.0435
Epoch 5 / iter 0, loss = 0.4387
Epoch 5 / iter 1, loss = 0.3886
Epoch 5 / iter 2, loss = 0.3182
Epoch 5 / iter 3, loss = 0.4189
Epoch 5 / iter 4, loss = 0.1741
Epoch 6 / iter 0, loss = 0.3191
Epoch 6 / iter 1, loss = 0.3601
Epoch 6 / iter 2, loss = 0.4199
Epoch 6 / iter 3, loss = 0.3289
Epoch 6 / iter 4, loss = 1.2691
Epoch 7 / iter 0, loss = 0.3202
Epoch 7 / iter 1, loss = 0.2855
Epoch 7 / iter 2, loss = 0.4129
Epoch 7 / iter 3, loss = 0.3331
Epoch 7 / iter 4, loss = 0.2218
Epoch 8 / iter 0, loss = 0.2368
Epoch 8 / iter 1, loss = 0.3457
Epoch 8 / iter 2, loss = 0.3339
Epoch 8 / iter 3, loss = 0.3812
Epoch 8 / iter 4, loss = 0.0534
Epoch 9 / iter 0, loss = 0.3567
Epoch 9 / iter 1, loss = 0.4033
Epoch 9 / iter 2, loss = 0.1926
Epoch 9 / iter 3, loss = 0.2803
Epoch 9 / iter 4, loss = 0.1557
Epoch 10 / iter 0, loss = 0.3435
Epoch 10 / iter 1, loss = 0.2790
Epoch 10 / iter 2, loss = 0.3456
Epoch 10 / iter 3, loss = 0.2076
Epoch 10 / iter 4, loss = 0.0935
Epoch 11 / iter 0, loss = 0.3024
Epoch 11 / iter 1, loss = 0.2517
Epoch 11 / iter 2, loss = 0.2797
Epoch 11 / iter 3, loss = 0.2989
Epoch 11 / iter 4, loss = 0.0301
Epoch 12 / iter 0, loss = 0.2507
Epoch 12 / iter 1, loss = 0.2563
Epoch 12 / iter 2, loss = 0.2971
Epoch 12 / iter 3, loss = 0.2833
Epoch 12 / iter 4, loss = 0.0597
Epoch 13 / iter 0, loss = 0.2827
Epoch 13 / iter 1, loss = 0.2094
Epoch 13 / iter 2, loss = 0.2417
Epoch 13 / iter 3, loss = 0.2985
Epoch 13 / iter 4, loss = 0.4036
Epoch 14 / iter 0, loss = 0.3085
Epoch 14 / iter 1, loss = 0.2015
Epoch 14 / iter 2, loss = 0.1830
Epoch 14 / iter 3, loss = 0.2978
Epoch 14 / iter 4, loss = 0.0630
Epoch 15 / iter 0, loss = 0.2342
Epoch 15 / iter 1, loss = 0.2780
Epoch 15 / iter 2, loss = 0.2571
Epoch 15 / iter 3, loss = 0.1838
Epoch 15 / iter 4, loss = 0.0627
Epoch 16 / iter 0, loss = 0.1896
Epoch 16 / iter 1, loss = 0.1966
Epoch 16 / iter 2, loss = 0.2018
Epoch 16 / iter 3, loss = 0.3257
Epoch 16 / iter 4, loss = 0.1268
Epoch 17 / iter 0, loss = 0.1990
Epoch 17 / iter 1, loss = 0.2031
Epoch 17 / iter 2, loss = 0.2662
Epoch 17 / iter 3, loss = 0.2128
Epoch 17 / iter 4, loss = 0.0133
Epoch 18 / iter 0, loss = 0.1780
Epoch 18 / iter 1, loss = 0.1575
Epoch 18 / iter 2, loss = 0.2547
Epoch 18 / iter 3, loss = 0.2544
Epoch 18 / iter 4, loss = 0.2007
Epoch 19 / iter 0, loss = 0.1657
Epoch 19 / iter 1, loss = 0.2000
Epoch 19 / iter 2, loss = 0.2045
Epoch 19 / iter 3, loss = 0.2524
Epoch 19 / iter 4, loss = 0.0632
Epoch 20 / iter 0, loss = 0.1629
Epoch 20 / iter 1, loss = 0.1895
Epoch 20 / iter 2, loss = 0.2523
Epoch 20 / iter 3, loss = 0.1896
Epoch 20 / iter 4, loss = 0.0918
Epoch 21 / iter 0, loss = 0.1583
Epoch 21 / iter 1, loss = 0.2322
Epoch 21 / iter 2, loss = 0.1567
Epoch 21 / iter 3, loss = 0.2089
Epoch 21 / iter 4, loss = 0.2035
Epoch 22 / iter 0, loss = 0.2273
Epoch 22 / iter 1, loss = 0.1427
Epoch 22 / iter 2, loss = 0.1712
Epoch 22 / iter 3, loss = 0.1826
Epoch 22 / iter 4, loss = 0.2878
Epoch 23 / iter 0, loss = 0.1685
Epoch 23 / iter 1, loss = 0.1622
Epoch 23 / iter 2, loss = 0.1499
Epoch 23 / iter 3, loss = 0.2329
Epoch 23 / iter 4, loss = 0.1486
Epoch 24 / iter 0, loss = 0.1617
Epoch 24 / iter 1, loss = 0.2083
Epoch 24 / iter 2, loss = 0.1442
Epoch 24 / iter 3, loss = 0.1740
Epoch 24 / iter 4, loss = 0.1641
Epoch 25 / iter 0, loss = 0.1159
Epoch 25 / iter 1, loss = 0.2064
Epoch 25 / iter 2, loss = 0.1690
Epoch 25 / iter 3, loss = 0.1778
Epoch 25 / iter 4, loss = 0.0159
Epoch 26 / iter 0, loss = 0.1730
Epoch 26 / iter 1, loss = 0.1861
Epoch 26 / iter 2, loss = 0.1387
Epoch 26 / iter 3, loss = 0.1486
Epoch 26 / iter 4, loss = 0.1090
Epoch 27 / iter 0, loss = 0.1393
Epoch 27 / iter 1, loss = 0.1775
Epoch 27 / iter 2, loss = 0.1564
Epoch 27 / iter 3, loss = 0.1245
Epoch 27 / iter 4, loss = 0.7611
Epoch 28 / iter 0, loss = 0.1470
Epoch 28 / iter 1, loss = 0.1211
Epoch 28 / iter 2, loss = 0.1285
Epoch 28 / iter 3, loss = 0.1854
Epoch 28 / iter 4, loss = 0.5240
Epoch 29 / iter 0, loss = 0.1740
Epoch 29 / iter 1, loss = 0.0898
Epoch 29 / iter 2, loss = 0.1392
Epoch 29 / iter 3, loss = 0.1842
Epoch 29 / iter 4, loss = 0.0251
Epoch 30 / iter 0, loss = 0.0978
Epoch 30 / iter 1, loss = 0.1529
Epoch 30 / iter 2, loss = 0.1640
Epoch 30 / iter 3, loss = 0.1503
Epoch 30 / iter 4, loss = 0.0975
Epoch 31 / iter 0, loss = 0.1399
Epoch 31 / iter 1, loss = 0.1595
Epoch 31 / iter 2, loss = 0.1209
Epoch 31 / iter 3, loss = 0.1203
Epoch 31 / iter 4, loss = 0.2008
Epoch 32 / iter 0, loss = 0.1501
Epoch 32 / iter 1, loss = 0.1310
Epoch 32 / iter 2, loss = 0.1065
Epoch 32 / iter 3, loss = 0.1489
Epoch 32 / iter 4, loss = 0.0818
Epoch 33 / iter 0, loss = 0.1401
Epoch 33 / iter 1, loss = 0.1367
Epoch 33 / iter 2, loss = 0.0970
Epoch 33 / iter 3, loss = 0.1481
Epoch 33 / iter 4, loss = 0.0711
Epoch 34 / iter 0, loss = 0.1157
Epoch 34 / iter 1, loss = 0.1050
Epoch 34 / iter 2, loss = 0.1378
Epoch 34 / iter 3, loss = 0.1505
Epoch 34 / iter 4, loss = 0.0429
Epoch 35 / iter 0, loss = 0.1096
Epoch 35 / iter 1, loss = 0.1279
Epoch 35 / iter 2, loss = 0.1715
Epoch 35 / iter 3, loss = 0.0888
Epoch 35 / iter 4, loss = 0.0473
Epoch 36 / iter 0, loss = 0.1350
Epoch 36 / iter 1, loss = 0.0781
Epoch 36 / iter 2, loss = 0.1458
Epoch 36 / iter 3, loss = 0.1288
Epoch 36 / iter 4, loss = 0.0421
Epoch 37 / iter 0, loss = 0.1083
Epoch 37 / iter 1, loss = 0.0972
Epoch 37 / iter 2, loss = 0.1513
Epoch 37 / iter 3, loss = 0.1236
Epoch 37 / iter 4, loss = 0.0366
Epoch 38 / iter 0, loss = 0.1204
Epoch 38 / iter 1, loss = 0.1341
Epoch 38 / iter 2, loss = 0.1109
Epoch 38 / iter 3, loss = 0.0905
Epoch 38 / iter 4, loss = 0.3906
Epoch 39 / iter 0, loss = 0.0923
Epoch 39 / iter 1, loss = 0.1094
Epoch 39 / iter 2, loss = 0.1295
Epoch 39 / iter 3, loss = 0.1239
Epoch 39 / iter 4, loss = 0.0684
Epoch 40 / iter 0, loss = 0.1188
Epoch 40 / iter 1, loss = 0.0984
Epoch 40 / iter 2, loss = 0.1067
Epoch 40 / iter 3, loss = 0.1057
Epoch 40 / iter 4, loss = 0.4602
Epoch 41 / iter 0, loss = 0.1478
Epoch 41 / iter 1, loss = 0.0980
Epoch 41 / iter 2, loss = 0.0921
Epoch 41 / iter 3, loss = 0.1020
Epoch 41 / iter 4, loss = 0.0430
Epoch 42 / iter 0, loss = 0.0991
Epoch 42 / iter 1, loss = 0.0994
Epoch 42 / iter 2, loss = 0.1270
Epoch 42 / iter 3, loss = 0.0988
Epoch 42 / iter 4, loss = 0.1176
Epoch 43 / iter 0, loss = 0.1286
Epoch 43 / iter 1, loss = 0.1013
Epoch 43 / iter 2, loss = 0.1066
Epoch 43 / iter 3, loss = 0.0779
Epoch 43 / iter 4, loss = 0.1481
Epoch 44 / iter 0, loss = 0.0840
Epoch 44 / iter 1, loss = 0.0858
Epoch 44 / iter 2, loss = 0.1388
Epoch 44 / iter 3, loss = 0.1000
Epoch 44 / iter 4, loss = 0.0313
Epoch 45 / iter 0, loss = 0.0896
Epoch 45 / iter 1, loss = 0.1173
Epoch 45 / iter 2, loss = 0.0916
Epoch 45 / iter 3, loss = 0.1043
Epoch 45 / iter 4, loss = 0.0074
Epoch 46 / iter 0, loss = 0.1008
Epoch 46 / iter 1, loss = 0.0915
Epoch 46 / iter 2, loss = 0.0877
Epoch 46 / iter 3, loss = 0.1139
Epoch 46 / iter 4, loss = 0.0292
Epoch 47 / iter 0, loss = 0.0679
Epoch 47 / iter 1, loss = 0.0987
Epoch 47 / iter 2, loss = 0.0929
Epoch 47 / iter 3, loss = 0.1098
Epoch 47 / iter 4, loss = 0.4838
Epoch 48 / iter 0, loss = 0.0693
Epoch 48 / iter 1, loss = 0.1095
Epoch 48 / iter 2, loss = 0.1128
Epoch 48 / iter 3, loss = 0.0890
Epoch 48 / iter 4, loss = 0.1008
Epoch 49 / iter 0, loss = 0.0724
Epoch 49 / iter 1, loss = 0.0804
Epoch 49 / iter 2, loss = 0.0919
Epoch 49 / iter 3, loss = 0.1233
Epoch 49 / iter 4, loss = 0.1849
Looking at how the Loss evolves, stochastic gradient descent speeds up training, but because each parameter update and loss value is based on only a small number of samples, the loss curve oscillates as it decreases.
Note:
Because the housing dataset is quite small, the speedup from stochastic gradient descent is hard to feel here.
Summary
In this section we covered in detail how to implement gradient descent with Numpy and built and trained a simple linear model for Boston housing price prediction. Modeling the problem with a neural network comes down to three key points:
- Build the network: initialize the parameters w and b, and define how the predictions and the loss function are computed.
- Pick a random starting point, and define how the gradients are computed and how the parameters are updated.
- Draw part of the full dataset as a mini_batch, compute the gradients and update the parameters, and iterate until the loss essentially stops decreasing.
Reposted from: https://blog.csdn.net/weixin_43468161/article/details/106115607