1. Python code
#!/usr/bin/env python3
# encoding: utf-8
'''
@file: WineCV.py
@time: 2020/6/13 0013 18:51
@author: Jack
@contact: jack18588951684@163.com
'''
import urllib.request
import numpy as np
from sklearn import datasets, linear_model
from sklearn.linear_model import LassoCV
from math import sqrt
import matplotlib.pyplot as plt
## Read the dataset
target_url = ("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv")
data = urllib.request.urlopen(target_url)
xList = []
labels = []
names = []
firstLine = True
for line in data:
    if firstLine:
        # The first line holds the column names
        names = str(line, encoding='utf-8').strip().split(";")
        firstLine = False
    else:
        # Split each data row on semicolons
        row = str(line, encoding='utf-8').strip().split(";")
        # The last field is the label (wine quality)
        labels.append(float(row[-1]))
        row.pop()
        floatRow = [float(num) for num in row]
        xList.append(floatRow)
nrows = len(xList)
ncols = len(xList[0])
## Compute the mean and standard deviation of each column
xMean = []
xSD = []
for i in range(ncols):
    col = [xList[j][i] for j in range(nrows)]
    mean = sum(col) / nrows
    xMean.append(mean)
    colDiff = [(xList[j][i] - mean) for j in range(nrows)]
    # Use k here so the comprehension does not shadow the column index i
    sumSq = sum([colDiff[k] * colDiff[k] for k in range(nrows)])
    stdDev = sqrt(sumSq / nrows)
    xSD.append(stdDev)
## Normalize the attributes
xNormalized = []
for i in range(nrows):
    rowNormalized = [(xList[i][j] - xMean[j]) / xSD[j] for j in range(ncols)]
    xNormalized.append(rowNormalized)
## Normalize the labels
meanLabel = sum(labels) / nrows
sdLabel = sqrt(sum([(labels[i] - meanLabel) * (labels[i] - meanLabel) for i in range(nrows)]) / nrows)
labelNormalized = [(labels[i] - meanLabel) / sdLabel for i in range(nrows)]
## Un-normalized labels
Y = np.array(labels)
## Normalized labels
# Y = np.array(labelNormalized)
## Un-normalized X's
X = np.array(xList)
## Normalized X's (this assignment overrides the one above)
X = np.array(xNormalized)
## Call LassoCV from sklearn.linear_model
wineModel = LassoCV(cv=10).fit(X, Y)
## Display the results
plt.figure()
plt.plot(wineModel.alphas_, wineModel.mse_path_, ':')
plt.plot(wineModel.alphas_, wineModel.mse_path_.mean(axis=-1),
         label='Average MSE Across Folds', linewidth=2)
plt.axvline(wineModel.alpha_, linestyle='--',
            label='CV Estimate of Best alpha')
plt.semilogx()
plt.legend()
ax = plt.gca()
ax.invert_xaxis()
plt.xlabel('alpha')
plt.ylabel('Mean Square Error')
plt.axis('tight')
plt.show()
# Print the value of alpha that minimizes the CV error
print("alpha Value that Minimizes CV Error ", wineModel.alpha_)
print("Minimum MSE ", min(wineModel.mse_path_.mean(axis=-1)))
alpha Value that Minimizes CV Error 0.010948337166040092
Minimum MSE 0.4338019871536978
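As a rough illustration of where these two numbers come from: the listing prints the alpha with the lowest fold-averaged error (wineModel.alpha_) and the minimum of mse_path_.mean(axis=-1). The sketch below repeats that selection in plain Python on a small made-up error table. The alphas and mse_path values here are hypothetical and are not results from the wine data.

```python
from statistics import fmean

# Hypothetical candidate alphas, largest (most regularized) first
alphas = [1.0, 0.1, 0.01, 0.001]

# Hypothetical per-fold MSE: one row per alpha, one column per fold
mse_path = [
    [0.66, 0.62, 0.70],
    [0.48, 0.45, 0.51],
    [0.43, 0.41, 0.46],
    [0.44, 0.42, 0.47],
]

# Average the error across folds for each alpha
mean_mse = [fmean(row) for row in mse_path]

# Pick the alpha whose fold-averaged error is smallest
best_index = min(range(len(alphas)), key=lambda k: mean_mse[k])
best_alpha = alphas[best_index]
min_mse = mean_mse[best_index]

print("alpha Value that Minimizes CV Error ", best_alpha)
print("Minimum MSE ", min_mse)
```

In this toy table the error drops as alpha shrinks and then rises again slightly, so the selected alpha sits in the interior of the grid rather than at either end.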
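2. Code explanation
The code above runs 10-fold cross-validation and plots the results. The first part reads the data from the UCI website, converts it into a list of lists, and then normalizes the attributes and the labels. The lists are then converted into the numpy array X (the attribute matrix) and the array Y (the label vector). Both X and Y are defined in two versions: one uses the normalized values and the other uses the un-normalized values. As listed, the code uses normalized attributes and un-normalized labels; commenting or uncommenting the corresponding lines and rerunning the code shows the actual effect of normalizing the attributes or the labels. A single line of code sets the number of cross-validation folds (10) and trains the model. The program then plots the error as a function of α for each of the 10 folds, together with the average error across the 10 folds.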
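The normalization step described above can also be written more compactly. The following is a minimal sketch using only the standard library, not the original author's code. It assumes the same convention as the listing, namely dividing by the population standard deviation (sum of squares over nrows). The function name normalize_columns and the toy data are hypothetical.

```python
from statistics import fmean, pstdev


def normalize_columns(rows):
    """Shift each column to mean 0 and scale it to unit population SD."""
    columns = list(zip(*rows))               # transpose rows into columns
    means = [fmean(col) for col in columns]
    sds = [pstdev(col) for col in columns]   # population SD, divides by n
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Hypothetical toy data: 3 rows, 2 attributes
toy_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
toy_normalized = normalize_columns(toy_rows)

for row in toy_normalized:
    print(row)
```

After this transformation every column has mean 0 and standard deviation 1, so attributes measured on very different scales are penalized comparably by the Lasso. With NumPy arrays, the default behaviour of the std method (ddof=0) uses the same population formula.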
Reposted from: https://blog.csdn.net/u013010473/article/details/106736800