Feidao's Blog (飞道的博客)

Machine Learning Experiment 5: Regularization Explained, with Code

  1. Why introduce regularization?

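When fitting linear or logistic regression we can run into overfitting: the error on the training set is small, but the error on the test set is large. Adding a regularization term to the cost function counters overfitting, so that performance on the test set comes closer to performance on the training set.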

For example, for linear regression, consider hypotheses that are polynomials of different degrees:

Fitting them to the training data gives the following results:

Clearly, a low-degree polynomial tends to underfit, while a high-degree polynomial tends to overfit.

We therefore add a regularization term to the cost function:
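For reference, here is a minimal statement of the penalty. It assumes the L2 (squared-norm) form that the MATLAB code below implements, with m training examples, parameters θ0, …, θn, and J(θ) the unregularized cost:

```latex
J_{\text{reg}}(\theta) = J(\theta) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}
```

The sum starts at j = 1, so the intercept θ0 is not penalized. A larger λ pushes θ1, …, θn toward zero.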

Other forms of the regularization term are also used.

  2. Regularized linear regression

(1) First, plot the data:

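As the plot shows, there are only 7 data points, so overfitting happens easily (in general, the larger the training set, the less prone a model is to overfitting).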

(2) Fit a fifth-degree polynomial using linear regression:

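This is still linear regression because the hypothesis h(x) = θ0 + θ1·x + θ2·x^2 + θ3·x^3 + θ4·x^4 + θ5·x^5 is a linear combination of the powers of x. The original x is a single (one-dimensional) feature, so we rebuild it as the six-dimensional vector [1, x, x^2, x^3, x^4, x^5]:

```matlab
m = length(y);
x = [ones(m,1), x, x.^2, x.^3, x.^4, x.^5];   % m x 6 design matrix
```

With this construction each column of x acts as a separate feature. The columns are linearly independent as long as there are at least six distinct x values, and h(x) is a linear combination of them. The problem is therefore an ordinary multivariate linear regression.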
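Here is the same feature construction sketched in plain Python, for readers without MATLAB. The function name `poly_design_matrix` is hypothetical and does not come from the original post. Each scalar x becomes the row [1, x, x^2, ..., x^5], matching the MATLAB line above:

```python
def poly_design_matrix(xs, degree=5):
    """Build one row [1, x, x^2, ..., x^degree] per scalar input x.

    With degree = 5 this mirrors the MATLAB line
        x = [ones(m,1), x, x.^2, x.^3, x.^4, x.^5];
    """
    return [[float(x) ** p for p in range(degree + 1)] for x in xs]


if __name__ == "__main__":
    for row in poly_design_matrix([2.0, -0.5]):
        print(row)
```

For x = 2 the row is [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]. Every row has six entries, one per coefficient θ0, …, θ5.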

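(3) Cost function

The cost is the squared error plus a penalty on θ1, …, θ5, weighted by the regularization parameter λ.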
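Below is a standard form of this cost, written out. It is consistent with the normal equation solved in step (4), assuming a squared-error loss plus an L2 penalty that skips θ0, with m training examples and n = 5:

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta\big(x^{(i)}\big) - y^{(i)}\right)^{2} + \lambda\sum_{j=1}^{n}\theta_j^{2}\right],
\qquad h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^{2} + \dots + \theta_5 x^{5}
```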

(4) Solve with the normal equation

Note: in the matrix multiplied by λ, the entry for θ0 is 0, so θ0 is not penalized.
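Here is a short derivation sketch of that closed form. It assumes J(θ) = (1/2m)[‖Xθ − y‖² + λ Σ_{j≥1} θj²], where X is the m×6 design matrix, y is the target vector, and D = diag(0, 1, …, 1). With these definitions, λD is exactly the `Lambda` matrix built in the code:

```latex
\nabla_\theta J = \frac{1}{m}\left[X^{\top}(X\theta - y) + \lambda D\theta\right] = 0
\quad\Longrightarrow\quad
\theta = \left(X^{\top}X + \lambda D\right)^{-1}X^{\top}y,
\qquad D = \operatorname{diag}(0, 1, 1, 1, 1, 1)
```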

If λ is too large, every parameter except θ0 is pushed toward zero and the model degenerates into the constant θ0. In other words, it underfits.
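To see the shrinkage numerically, here is a plain-Python sketch of the regularized normal equation from step (4). The seven (x, y) points are made up for illustration and are not the ex5Linx.dat/ex5Liny.dat data. `solve` is a hand-written Gaussian elimination, used only to keep the example dependency-free:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def ridge_normal_equation(X, y, lam):
    """theta = (X^T X + lam * D)^(-1) X^T y, with D = diag(0, 1, ..., 1)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for j in range(1, p):  # start at 1: theta_0 is not penalized
        A[j][j] += lam
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(A, b)


if __name__ == "__main__":
    xs = [-1.0, -0.6, -0.3, 0.0, 0.3, 0.6, 1.0]   # hypothetical inputs
    ys = [0.1, 0.5, 0.3, 0.6, 0.4, 0.9, 0.7]      # hypothetical targets
    X = [[x ** p for p in range(6)] for x in xs]  # 5th-degree features
    for lam in (0, 1, 10, 1000, 1e8):
        theta = ridge_normal_equation(X, ys, lam)
        penalty = sum(t * t for t in theta[1:])
        print(f"lambda={lam:g}  theta0={theta[0]:.4f}  sum(theta_j^2, j>=1)={penalty:.6f}")
```

As λ grows, the sum of squared θ1, …, θ5 never increases. For a very large λ, θ0 approaches the mean of y (0.5 for these made-up points) and the remaining coefficients are essentially zero, which is the constant model described above.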

(5) Implementation (minx, maxx and the raw data x0, y0 are defined in the full listing in the appendix):

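```matlab
lambda = 1;
Lambda = lambda.*eye(6);
Lambda(1) = 0;                        % theta_0 is not penalized
theta = (x'*x + Lambda) \ (x'*y)      % regularized normal equation (result displayed)

figure;
x_ = (minx:0.01:maxx)';               % dense grid over the data range
x_1 = [ones(size(x_)), x_, x_.^2, x_.^3, x_.^4, x_.^5];
hold on
plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);   % raw data
plot(x_, x_1*theta, '--b', 'LineWidth', 2);                   % fitted curve
legend({'data', '5th-order fit'})
title('\lambda=1')
xlabel('x')
ylabel('y')
hold off
```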


λ is set to 0, 1 and 10 in turn. The results are as follows:

Computed parameters:

| Parameter | λ = 0 | λ = 1 | λ = 10 |
|---|---|---|---|
| θ0 | 0.4725 | 0.3976 | 0.5205 |
| θ1 | 0.6814 | -0.4207 | -0.1825 |
| θ2 | -1.3801 | 0.1296 | 0.0606 |
| θ3 | -5.9777 | -0.3975 | -0.1482 |
| θ4 | 2.4417 | 0.1753 | 0.0743 |
| θ5 | 4.7371 | -0.3394 | -0.1280 |

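With λ=0 the curve follows the data points closely but clearly overfits. With λ=1 the points are spread fairly evenly on both sides of the curve. With λ=10 the model visibly underfits.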

  3. Regularized logistic regression

(1) Plot the raw data

Here '+' marks positive examples and 'o' marks negative examples.

Plotting code:

```matlab
pos = find(y); neg = find(y == 0);   % indices of positive and negative examples
plot(x(pos,1), x(pos,2), '+')
hold on
plot(x(neg,1), x(neg,2), 'o')
```

(2) Hypothesis and feature transformation

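Note: x is a two-dimensional vector. We map it to a higher-dimensional vector whose entries are all monomials of the two features up to degree 6, which gives 28 features including the constant term. The feature-mapping function is: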

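```matlab
function out = map_feature(feat1, feat2)
    % Map two feature columns to all monomials feat1^(i-j) .* feat2^j, 0 <= j <= i <= degree
    degree = 6;
    out = ones(size(feat1(:,1)));   % constant term
    for i = 1:degree
        for j = 0:i
            out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
        end
    end
end
```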
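Here is a plain-Python version of the same mapping, useful for checking the feature count and term order. It is a sketch that mirrors the MATLAB loop for a single pair of scalars and, unlike the original, is not vectorized:

```python
def map_feature(u, v, degree=6):
    """Return every monomial u^(i-j) * v^j with 0 <= j <= i <= degree.

    The term order follows the MATLAB loops (i = 1..degree, j = 0..i),
    starting from the constant 1, so degree 6 yields 28 features.
    """
    out = [1.0]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            out.append(float(u) ** (i - j) * float(v) ** j)
    return out


if __name__ == "__main__":
    feats = map_feature(2.0, 3.0)
    print(len(feats))   # 28
    print(feats[:6])    # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

The count follows from 1 + (2 + 3 + 4 + 5 + 6 + 7) = 28, which matches the 117x28 matrix noted in the iteration code.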

(3) Regularized cost function
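Here is the cost written out so that it matches the `J(i)` line in the iteration code below. It uses h_θ(x) = 1/(1 + e^(−θᵀx)) and θ = (θ0, …, θn), with n = 27 for the 28 mapped features. As in the linear case, θ0 is left out of the penalty:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big)\log\Big(1 - h_\theta\big(x^{(i)}\big)\Big)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}
```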

(4) Update rule for θ (Newton's method): θ is updated by subtracting H⁻¹ times the gradient of J, where H is the Hessian matrix of J.
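Here is the update in matrix form, matching the `grad`, `H` and update lines of the iteration code below. In it, X is the m×28 mapped feature matrix, h = g(Xθ) is the vector of predictions, and D = diag(0, 1, …, 1), so θ0 is not regularized:

```latex
\nabla_\theta J = \frac{1}{m}X^{\top}(h - y) + \frac{\lambda}{m}D\theta,
\qquad
H = \frac{1}{m}X^{\top}\operatorname{diag}(h)\operatorname{diag}(1 - h)\,X + \frac{\lambda}{m}D,
\qquad
\theta := \theta - H^{-1}\nabla_\theta J
```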

(5) Iterative solution

```matlab
[m, n] = size(x);                   % x is 117x28 after map_feature
theta = zeros(n, 1);
g = @(z) 1.0 ./ (1.0 + exp(-z));    % sigmoid
lambda = 0;
iteration = 20;
J = zeros(iteration, 1);
for i = 1:iteration
    z = x*theta;                    % x: 117x28, theta: 28x1
    h = g(z);                       % hypothesis h = sigmoid(x*theta)

    % Calculate J (for testing convergence); theta(1), i.e. theta_0, is not penalized
    J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
           (lambda/(2*m))*norm(theta(2:end))^2;   % norm: Euclidean norm

    % Calculate gradient and Hessian
    G = (lambda/m).*theta; G(1) = 0;      % regularization part of the gradient
    L = (lambda/m).*eye(n); L(1) = 0;     % regularization part of the Hessian

    grad = ((1/m).*x'*(h - y)) + G;
    H = ((1/m).*x'*diag(h)*diag(1-h)*x) + L;

    % Here is the actual update
    theta = theta - H\grad;
end
```

Once θ has been computed, plot the decision boundary to visualize the result.
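The MATLAB loop above can be mirrored in plain Python as a sketch. It runs on a tiny made-up 2-D dataset with only an intercept and the two raw features, not the 28 mapped features or the ex5Logx/ex5Logy data. The helpers `solve`, `cost` and `gradient_and_hessian` are hypothetical names introduced here:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(X, y, theta, lam):
    """Regularized cross-entropy; theta[0] is excluded from the penalty."""
    m = len(X)
    h = [sigmoid(dot(theta, row)) for row in X]
    data = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi) for yi, hi in zip(y, h)) / m
    return data + lam / (2 * m) * sum(t * t for t in theta[1:])


def gradient_and_hessian(X, y, theta, lam):
    """grad = (1/m) X^T (h - y) + (lam/m) D theta; H = (1/m) X^T diag(h(1-h)) X + (lam/m) D."""
    m, n = len(X), len(X[0])
    h = [sigmoid(dot(theta, row)) for row in X]
    err = [hi - yi for hi, yi in zip(h, y)]
    grad = [dot(err, [row[j] for row in X]) / m + (lam / m * theta[j] if j > 0 else 0.0)
            for j in range(n)]
    H = [[sum(hi * (1 - hi) * row[a] * row[c] for hi, row in zip(h, X)) / m
          + (lam / m if a == c and a > 0 else 0.0) for c in range(n)]
         for a in range(n)]
    return grad, H


def newton_logistic(X, y, lam, iterations=20):
    """Newton updates theta <- theta - H^{-1} grad, starting from theta = 0."""
    theta = [0.0] * len(X[0])
    costs = []
    for _ in range(iterations):
        costs.append(cost(X, y, theta, lam))   # J before this update
        grad, H = gradient_and_hessian(X, y, theta, lam)
        step = solve(H, grad)
        theta = [t - s for t, s in zip(theta, step)]
    return theta, costs


if __name__ == "__main__":
    # Hypothetical (x1, x2, label) points, chosen so the classes overlap.
    pts = [(0.0, 0.0, 0), (1.0, 0.5, 0), (0.5, 1.0, 1), (1.5, 1.5, 1),
           (2.0, 0.5, 1), (0.2, 1.8, 0), (1.8, 1.9, 1), (0.8, 0.2, 0)]
    X = [[1.0, a, b] for a, b, _ in pts]      # intercept column + raw features
    y = [float(c) for _, _, c in pts]
    theta, costs = newton_logistic(X, y, lam=1.0)
    print("theta:", [round(t, 4) for t in theta])
    print("J for the first iterations:", [round(j, 6) for j in costs[:5]])
```

At θ = 0 every prediction is 0.5, so the first recorded cost is log 2 ≈ 0.6931. Newton's method typically reaches a vanishing gradient within a handful of iterations, which is why 20 iterations suffice in the original code.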

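(6) Results

λ again takes the values 0, 1 and 10.

Note: the decision boundary is drawn as the zero-level contour of θᵀx using MATLAB's contour function. Every grid point (u, v) must go through the same map_feature transformation as the training data:

```matlab
% Define the ranges of the grid
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);

% Initialize space for the values to be plotted
z = zeros(length(v), length(u));

% Evaluate z = theta'*x over the grid
for i = 1:length(u)
    for j = 1:length(v)
        % Notice the order of j, i here: rows follow v (y-axis) and columns follow u (x-axis),
        % which is already the layout contour(u, v, z, [0, 0]) expects, so z is not transposed
        z(j,i) = map_feature(u(i), v(j))*theta;
    end
end
```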
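The row/column convention for contour is easy to get wrong. Here is a small plain-Python sketch that builds z in the layout contour(u, v, z) expects: one row per v value and one column per u value. The helper name `evaluate_grid` is hypothetical:

```python
def evaluate_grid(f, u, v):
    """Return z with len(v) rows and len(u) columns, z[j][i] = f(u[i], v[j]).

    Rows follow the y-axis values v and columns follow the x-axis values u,
    the layout MATLAB's contour(u, v, z) expects. Transposing z afterwards
    would swap the roles of u and v.
    """
    return [[f(ui, vj) for ui in u] for vj in v]


if __name__ == "__main__":
    u = [0.0, 1.0, 2.0]                                # x-axis samples
    v = [10.0, 20.0]                                   # y-axis samples
    z = evaluate_grid(lambda a, b: a - 2 * b, u, v)    # asymmetric test function
    print(len(z), "rows x", len(z[0]), "columns")      # 2 rows x 3 columns
    print(z)                                           # [[-20.0, -19.0, -18.0], [-40.0, -39.0, -38.0]]
```

If z is filled this way, pass it to contour without an additional z = z' transpose. Because u and v share the same range here, transposing would mirror the boundary across the line u = v.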

The resulting plots:

As before, λ=0 overfits and λ=10 underfits.

Appendix: Source Code
1. Linear regression + regularization

```matlab
% Regularized linear regression (ex5Linx.dat / ex5Liny.dat)
clc, clear
x = load('ex5Linx.dat');
y = load('ex5Liny.dat');
x0 = x; y0 = y;                                  % keep the raw data for plotting

figure;
plot(x, y, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
title('training data')
xlabel('x')
ylabel('y')

minx = min(x);
maxx = max(x);
m = length(y);
x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];     % m x 6 design matrix

x_ = (minx:0.01:maxx)';                          % dense grid for the fitted curve
x_1 = [ones(size(x_)), x_, x_.^2, x_.^3, x_.^4, x_.^5];

for lambda = [0, 1, 10]
    Lambda = lambda.*eye(6);
    Lambda(1) = 0;                               % theta_0 is not penalized
    theta = (x'*x + Lambda) \ (x'*y)             % regularized normal equation (displayed)

    figure;
    hold on
    plot(x0, y0, 'o', 'MarkerFaceColor', 'r', 'MarkerSize', 8);
    plot(x_, x_1*theta, '--b', 'LineWidth', 2);
    legend({'data', '5th-order fit'})
    title(sprintf('\\lambda=%g', lambda))
    xlabel('x')
    ylabel('y')
    hold off
end
```
2. Logistic regression + regularization

```matlab
% Regularized logistic regression with Newton's method (ex5Logx.dat / ex5Logy.dat)
clc, clear
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
x0 = x; y0 = y;                        % keep the raw 2-D data for plotting

x = map_feature(x(:,1), x(:,2));       % all monomials up to degree 6
[m, n] = size(x);                      % 117 x 28
g = @(z) 1.0 ./ (1.0 + exp(-z));       % sigmoid
iteration = 20;

% Grid on which the decision boundary is evaluated
u = linspace(-1, 1.5, 200);
v = linspace(-1, 1.5, 200);
pos = find(y0); neg = find(y0 == 0);

for lambda = [0, 1, 10]
    theta = zeros(n, 1);               % restart from zero for each lambda
    J = zeros(iteration, 1);
    for i = 1:iteration
        h = g(x*theta);
        % Regularized cost (for testing convergence); theta(1) is not penalized
        J(i) = -(1/m)*sum(y.*log(h) + (1-y).*log(1-h)) + ...
               (lambda/(2*m))*norm(theta(2:end))^2;
        G = (lambda/m).*theta; G(1) = 0;       % regularization part of the gradient
        L = (lambda/m).*eye(n); L(1) = 0;      % regularization part of the Hessian
        grad = (1/m).*x'*(h - y) + G;
        H = (1/m).*x'*diag(h)*diag(1-h)*x + L;
        theta = theta - H\grad;                % Newton update
    end

    % Evaluate theta'*map_feature(u, v) on the grid. Rows of z follow v (y-axis)
    % and columns follow u (x-axis), the layout contour(u, v, z) expects,
    % so z is not transposed.
    z = zeros(length(v), length(u));
    for i = 1:length(u)
        for j = 1:length(v)
            z(j,i) = map_feature(u(i), v(j))*theta;
        end
    end

    figure;
    plot(x0(pos,1), x0(pos,2), '+')
    hold on
    plot(x0(neg,1), x0(neg,2), 'o')
    contour(u, v, z, [0, 0], 'LineWidth', 2)   % decision boundary theta'*x = 0
    hold off
    xlim([-1.00 1.50])
    ylim([-0.8 1.20])
    legend({'y=1', 'y=0', 'Decision Boundary'})
    title(sprintf('\\lambda=%g', lambda))
    xlabel('u')
    ylabel('v')
end

% Local function; in a script it must come last (supported since MATLAB R2016b)
function out = map_feature(feat1, feat2)
    degree = 6;
    out = ones(size(feat1(:,1)));
    for i = 1:degree
        for j = 0:i
            out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
        end
    end
end
```

 


Reposted from: https://blog.csdn.net/IT_flying625/article/details/105742243