Machine Learning Exercise 1 - Univariate Linear Regression (Andrew Ng)

2020-04-11 09:32

Contents

Machine Learning Exercise 1 - Linear Regression

1. Univariate linear regression
2. Batch gradient descent
3. Derivation

Machine Learning Exercise 1 - Linear Regression

1. Univariate linear regression

Import the required packages.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

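Load the dataset. Note: the data file ex1data1.txt must be in the same folder as the script; otherwise, use an absolute path to the file.
pd.read_csv reads a CSV file into a DataFrame. Its first argument is the path. header selects the row used as column names; when no names are given the first row is used, so a file without a header row needs header=None.
names gives the column names as a list. It is typically used to add column names when the file has no header row, that is, together with header=None.
By default, head() shows the first 5 rows.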

path =  'ex1data1.txt'
data = pd.read_csv(path,header=None,names = ['population','profit'])
data.head()
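
To make the effect of header=None and names concrete, here is a minimal sketch that reads two illustrative rows from an in-memory string instead of ex1data1.txt. The names raw, demo, and wrong are assumptions made for this example.

```python
import io
import pandas as pd

# Two illustrative rows in "population,profit" layout, with no header line.
raw = "6.1101,17.592\n5.5277,9.1302\n"

# header=None: the first line is data, not column names.
# names=[...]: we supply the column names ourselves.
demo = pd.read_csv(io.StringIO(raw), header=None, names=['population', 'profit'])

# With the defaults, the first data row is consumed as the header instead.
wrong = pd.read_csv(io.StringIO(raw))

print(demo.shape)    # both rows are kept as data
print(wrong.shape)   # one row was used up as column names
```

If a row seems to be missing after loading, checking the column names of the DataFrame is a quick way to spot this mistake.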

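For numeric data, the index of the result includes the count, mean, standard deviation, minimum, maximum, and the lower, 50th, and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.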

data.describe()
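
As a small illustration of the rows that describe() returns, the sketch below uses a made-up five-value column (not the exercise data) and compares the '50%' row with the median.

```python
import pandas as pd

# Made-up values, chosen so that the mean and the median differ.
demo = pd.DataFrame({'profit': [1.0, 2.0, 3.0, 4.0, 10.0]})
summary = demo.describe()

print(list(summary.index))
# The '50%' row is the median of the column.
print(summary.loc['50%', 'profit'], demo['profit'].median())
```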

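Visualize the data with a scatter plot. The kind argument selects the plot type, for example 'line' (the default) or 'scatter'. The figure size can be set with the optional figsize argument.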

data.plot(kind='scatter',x = 'population',y = 'profit')
plt.show()

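Now let's implement linear regression with gradient descent to minimize the cost function.

np.power(x1,x2) raises each element of the array x1 to the power x2. x2 can be a single number or an array; if it is an array, its shape must be compatible with that of x1 (equal, or broadcastable to it).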

def computeCost(X,y,theta):
    # squared error of each prediction X*theta.T against y
    inner = np.power(((X*theta.T)-y),2)
    # J(theta) = sum of squared errors / (2m), where m = len(X)
    return np.sum(inner)/(2*len(X))
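
The cost that computeCost returns is the sum of squared prediction errors divided by 2m. The sketch below checks that formula by hand on toy data. It uses plain NumPy arrays rather than np.matrix, and compute_cost_ndarray is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def compute_cost_ndarray(X, y, theta):
    """Hypothetical ndarray counterpart: J = sum((X @ theta - y)^2) / (2m)."""
    residual = X @ theta - y                      # one error per sample, shape (m,)
    return np.sum(np.power(residual, 2)) / (2 * len(X))

# np.power with a scalar exponent and with an array exponent of the same shape.
print(np.power(np.array([1, 2, 3]), 2))
print(np.power(np.array([1, 2, 3]), np.array([1, 2, 3])))

# Toy data: three samples; the first column is the constant x0 = 1.
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

cost_zero = compute_cost_ndarray(X_toy, y_toy, np.array([0.0, 0.0]))  # (1 + 4 + 9) / 6
cost_fit = compute_cost_ndarray(X_toy, y_toy, np.array([0.0, 1.0]))   # y = x fits exactly
print(cost_zero, cost_fit)
```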

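Let's add a column to the training set so that we can use a vectorized solution to compute the cost and the gradient. We insert a column of ones on the left of the training set, which corresponds to x0 = 1.
In the call below, loc is 0, the column name is 'Ones', and the value is 1.

data.insert(0,'Ones',1)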
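
A minimal sketch of what DataFrame.insert does, on a made-up two-row frame (demo and rerun_failed are names assumed for this example). It also shows a practical detail: insert modifies the frame in place, so running the same cell a second time fails because the column already exists.

```python
import pandas as pd

demo = pd.DataFrame({'population': [6.1, 5.5], 'profit': [17.6, 9.1]})

# loc=0 puts the new column first; the scalar 1 is repeated for every row.
demo.insert(0, 'Ones', 1)
print(list(demo.columns))

# Inserting a column name that already exists raises ValueError by default.
try:
    demo.insert(0, 'Ones', 1)
    rerun_failed = False
except ValueError:
    rerun_failed = True
print(rerun_failed)
```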

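Now let's initialize some variables. .shape[0] is the length of the first dimension (the number of rows) and .shape[1] is the length of the second dimension (the number of columns). In pandas, .iloc selects data by position: the part before the comma selects rows and the part after the comma selects columns. At this point the data has three columns.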

# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1] # X: all rows, every column except the last
y = data.iloc[:,cols-1:cols] # y: all rows, only the last column
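
The slicing above can be tried on a small made-up frame. The sketch below (demo, X_demo, and y_demo are assumed names) also shows why y is selected with the slice cols-1:cols rather than a single integer: the slice keeps y two-dimensional.

```python
import pandas as pd

demo = pd.DataFrame({'Ones': [1, 1], 'population': [6.1, 5.5], 'profit': [17.6, 9.1]})

cols = demo.shape[1]                    # number of columns
X_demo = demo.iloc[:, 0:cols-1]         # all rows, columns 0 and 1
y_demo = demo.iloc[:, cols-1:cols]      # all rows, last column, still a DataFrame

print(X_demo.shape, y_demo.shape)
# A single integer instead of a slice returns a one-dimensional Series.
print(demo.iloc[:, cols-1].shape)
```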

Check that X (the training set) and y (the target variable) look correct.

X.head()

y.head()

The cost function expects NumPy matrices, so we need to convert X and y before we can use them. We also need to initialize theta, setting all of its elements to 0.

X = np.matrix(X.values)
y = np.matrix(y.values)
# initialize theta to zeros, one entry per column of X
theta = np.matrix(np.array([0,0]))

theta is a (1,2) matrix.

theta

Check the dimensions.

X.shape ,theta.shape,y.shape
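
The shapes matter because, for np.matrix objects, * means matrix multiplication. The sketch below uses three made-up rows to show how X*theta.T lines up with y. Note that the NumPy documentation no longer recommends np.matrix for new code; it is used here only to match the article.

```python
import numpy as np

X_demo = np.matrix([[1.0, 6.1], [1.0, 5.5], [1.0, 8.5]])   # shape (3, 2)
y_demo = np.matrix([[17.6], [9.1], [13.7]])                # shape (3, 1)
theta_demo = np.matrix(np.array([0, 0]))                   # shape (1, 2)

# (3, 2) times (2, 1) gives one prediction per row: shape (3, 1), same as y.
predictions = X_demo * theta_demo.T
print(predictions.shape, (predictions - y_demo).shape)
```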

Compute the cost function (with theta at its initial value of 0).

computeCost(X,y,theta)

2. Batch gradient descent

def gradientDescent(X,y,theta,alpha,iters):
    temp = np.matrix(np.zeros(theta.shape)) # zero matrix that holds the updated parameters
    parameters = int(theta.ravel().shape[1]) # number of parameters; ravel gives a 1 x n matrix here
    cost = np.zeros(iters) # array of iters zeros, one cost per iteration

    for i in range(iters):
        # prediction error for every training example, computed from the current theta
        error = (X*theta.T) - y
        for j in range(parameters):
            # element-wise product (h(x) - y) * x_j
            term = np.multiply(error,X[:,j])
            temp[0,j] = theta[0,j] - ((alpha/len(X)) * np.sum(term))
        # adopt the updated parameters, then record the cost
        theta = temp
        cost[i] = computeCost(X,y,theta)
    return theta,cost
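
The inner loop over j can also be written as a single matrix product. The following is a sketch of that vectorized form using plain NumPy arrays, run on synthetic noise-free data generated from y = 1 + 2x. The function name gradient_descent_ndarray, the data, and the chosen alpha and iters are all assumptions for illustration, not part of the exercise.

```python
import numpy as np

def gradient_descent_ndarray(X, y, theta, alpha, iters):
    """Hypothetical vectorized variant: theta <- theta - (alpha/m) * X^T (X theta - y)."""
    m = len(X)
    theta = theta.astype(float).copy()
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta - y                          # shape (m,)
        theta = theta - (alpha / m) * (X.T @ error)    # all parameters updated together
        cost[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, cost

# Synthetic data from y = 1 + 2x, so the expected parameters are known in advance.
x_syn = np.linspace(0.0, 1.0, 20)
X_syn = np.column_stack([np.ones_like(x_syn), x_syn])
y_syn = 1.0 + 2.0 * x_syn

theta_fit, cost_syn = gradient_descent_ndarray(X_syn, y_syn, np.zeros(2), alpha=0.5, iters=2000)
print(theta_fit)
print(cost_syn[0], cost_syn[-1])
```

Because the data is noise-free, the fitted parameters should approach intercept 1 and slope 2, which makes this a convenient sanity check before moving to real data.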

Initialize some additional variables: the learning rate alpha and the number of iterations to run.

alpha = 0.01
iters = 1000

g,cost = gradientDescent(X,y,theta,alpha,iters)
g

Finally, we can compute the cost (error) of the trained model using the fitted parameters.

computeCost(X,y,g)

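Now let's plot the linear model together with the data to see the fit visually. In the code below, fig is the whole figure and ax is the Axes instance that the plot is drawn on.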

x = np.linspace(data.population.min(),data.population.max(),100) # 100 evenly spaced points across the population range
f = g[0,0] + (g[0,1]*x) # g[0,0] is theta0, g[0,1] is theta1

fig ,ax = plt.subplots(figsize=(12,8))
ax.plot(x,f,'r',label='prediction')
ax.scatter(data.population,data.profit,label='Training Data')
ax.legend(loc=4) # legend position; 4 is the lower right corner
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
plt.show()

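Because the gradient descent function also returns a vector with the cost at every training iteration, we can plot that as well. Note that the cost decreases at every iteration, as long as the learning rate is small enough. This is an example of a convex optimization problem.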
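
The claim about a steadily decreasing cost can be checked numerically. The sketch below runs batch gradient descent on synthetic data (an assumption for illustration, not ex1data1.txt) with two learning rates: one small enough for the cost to fall at every step, and one too large, where the cost grows instead. The helper name cost_history and both alpha values are hypothetical choices for this data.

```python
import numpy as np

def cost_history(X, y, alpha, iters):
    """Run batch gradient descent from theta = 0 and return the cost per iteration."""
    m = len(X)
    theta = np.zeros(X.shape[1])
    history = np.zeros(iters)
    for i in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        history[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return history

# Synthetic data on a scale loosely similar to a population/profit plot.
x_chk = np.linspace(5.0, 25.0, 50)
X_chk = np.column_stack([np.ones_like(x_chk), x_chk])
y_chk = -4.0 + 1.2 * x_chk

small = cost_history(X_chk, y_chk, alpha=0.001, iters=200)
large = cost_history(X_chk, y_chk, alpha=0.02, iters=50)

print(np.all(np.diff(small) <= 0))   # True when the cost never increases
print(large[-1] > large[0])          # True when the cost has grown, i.e. divergence
```

If the cost curve ever rises, lowering alpha is usually the first thing to try.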

fig,ax = plt.subplots(figsize=(12,8))
ax.plot(np.arange(iters),cost,'r')
ax.set_xlabel('Iterations')
ax.set_ylabel('Cost')
ax.set_title('Error vs. Training Epoch')
plt.show()

3. Derivation
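
The derivation itself is not preserved in this copy of the article. What follows is a reconstruction of the standard derivation of the update rule that gradientDescent implements, written to match the code above (m is the number of training examples, x_0 = 1 is the inserted column of ones); it is not the author's original text.

```latex
% Hypothesis for one feature, with x_0 = 1:
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1

% Cost function (what computeCost returns):
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

% Partial derivative with respect to \theta_j, by the chain rule
% (the factor 2 from the square cancels the 1/2):
\frac{\partial J}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

% Batch gradient descent update, applied to all j simultaneously:
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

In the code, error holds the values h(x) - y for all examples, np.multiply(error, X[:,j]) forms the products inside the sum, and alpha/len(X) is the factor alpha/m.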

Data and source code: https://github.com/XiangLinPro/MLPractice
Every coincidence is either fate or someone quietly working hard.


Love is patient, love is kind. It does not envy, it does not boast, it is not proud. (The Bible)

Written at 00:20 on April 11, 2020, in Chengkou, Chongqing.
Study hard and improve every day; it will pay off in the end.

Reposted from: https://blog.csdn.net/youif/article/details/105445185
