如何进行多元逻辑回归

可以使用阶梯函数通过逐步过程确定多元逻辑回归。此函数选择模型以最小化AIC。

通常建议不要盲目地遵循逐步程序，而是要使用拟合统计（AIC，AICc，BIC）比较模型，或者根据生物学或科学上合理的可用变量建立模型。

多元相关是研究潜在自变量之间关系的一种工具。例如，如果两个独立变量彼此相关，可能在最终模型中都不需要这两个变量，但可能有理由选择一个变量而不是另一个变量。

多元相关

创建数值变量的数据框


  
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Status = as.numeric（Data.num $ Status）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Length = as.numeric（Data.num $ Length）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Migr = as.numeric（Data.num $ Migr）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Insect = as.numeric（Data.num $ Insect）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Diet = as.numeric（Data.num $ Diet）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Broods = as.numeric（Data.num $ Broods）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data。 num $ Wood = as.numeric（Data.num $ Wood）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Upland = as.numeric（Data.num $ Upland）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Water = as.numeric（Data.num $ Water）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Release = as.numeric（Data.num $ Release）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Data.num $ Indiv = as.numeric（Data.num $ Indiv）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      ###检查新数据框
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      headtail（Data.num）
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1 
      1 
      1520 
      9600.
      0 
      1.
      21 
      1 
      12 
      2 
      6.
      0 
      1 
      0 
      0 
      1 
      6 
      29
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2 
      1 
      1250 
      5000.
      0 
      0.
      56 
      1 
      0 
      1 
      6.
      0 
      1 
      0 
      0 
      1 
      10 
      85
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      3 
      1 
      870 
      3360.
      0 
      0.
      07 
      1 
      0 
      1 
      4.
      0 
      1 
      0 
      0 
      1 
      3 
      8
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      77 
      0 
      170 
      31.
      0 
      0.
      55 
      3 
      12 
      2 
      4.
      0 NA 
      1 
      0 
      0 
      1 
      2
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      78 
      0 
      210 
      36.
      9 
      2.
      00 
      2 
      8 
      2 
      3.
      7 
      1 
      0 
      0 
      1 
      1 
      2
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      79 
      0 
      225 
      106.
      5 
      1.
      20 
      2 
      12 
      2 
      4.
      8 
      2 
      0 
      0 
      0 
      1 
      2
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      ###检查变量之间的相关性
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      ###这里使用了Spearman相关性

多元逻辑回归的例子

在此示例中，数据包含缺失值。在R中缺失值用NA表示。SAS通常会无缝地处理缺失值。虽然这使用户更容易，但可能无法确保用户了解这些缺失值的作用。在某些情况下，R要求用户明确如何处理缺失值。处理多元回归中的缺失值的一种方法是从数据集中删除具有任何缺失值的所有观察值。这是我们在逐步过程之前要做的事情，创建一个名为Data.omit的数据框。但是，当我们创建最终模型时，我们只想排除那些在最终模型中实际包含的变量中具有缺失值的观察。为了测试最终模型的整体p值，绘制最终模型，或使用glm.compare函数，我们将创建一个名为Data.final的数据框，只排除那些观察结果。

尽管二项式和poission系列中的模型应该没问题，但是对于使用某些glm拟合的步骤过程存在一些注意事项。

用逐步回归确定模型

最终模型


  
   
    
     
    
    
     
      summary(model.final)
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Coefficients:
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
                   
      Estimate 
      Std. 
      Error 
      z 
      value 
      Pr(>|z|)   
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      (Intercept) 
      -3.5496482  
      2.0827400  
      -1.704 
      0.088322 
      . 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Upland      
      -4.5484289  
      2.0712502  
      -2.196 
      0.028093 
      * 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Migr        
      -1.8184049  
      0.8325702  
      -2.184 
      0.028956 
      * 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Mass         
      0.0019029  
      0.0007048   
      2.700 
      0.006940 
      **
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Indiv        
      0.0137061  
      0.0038703   
      3.541 
      0.000398 
      ***
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Insect       
      0.2394720  
      0.1373456   
      1.744 
      0.081234 
      . 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Wood         
      1.8134445  
      1.3105911   
      1.384 
      0.166455

伪R方


  
   
    
     
    
    
     
      $Pseudo.R.squared.for.model.vs.null
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
                                  
      Pseudo.R.squared
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      McFadden                             
      0.700475
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Cox 
      and Snell (ML) 0.637732
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Nagelkerke 
      (Cragg and Uhler) 0.833284

模型总体p值

在最终模型中创建包含变量的数据框，并省略NA。

偏差表分析


  
   
    
     
    
    
     
      Analysis of Deviance Table
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 
      1: Status ~ Upland + Migr + Mass + Indiv + Insect + Wood
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 
      2: Status ~ 
      1
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
       
      Resid. Df Resid. Dev Df Deviance  Pr(>Chi)   
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1        
      63     
      30.
      392                         
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2        
      69     
      93.
      351 -
      6  -
      62.
      959 
      1.
      125e-
      11 ***

似然比检验


  
   
    
     
    
    
     
      Likelihood ratio test
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
       
      #Df LogLik Df Chisq Pr(>Chisq) 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1   
      7 -
      15.
      196                        
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2   
      1 -
      46.
      675 -
      6 
      62.
      959  
      1.
      125e-
      11 ***

标准化残差图

简单的预测值图

在最终模型中创建包含变量的数据框，并在NA中省略

过度离散检验

过度离散是glm的deviance残差相对于自由度较大的情况。这些值显示在模型的摘要中。一个指导原则是，如果deviance残差与剩余自由度的比率超过1.5，则模型过度离散。过度离散表明模型不能很好地拟合数据：解释变量可能无法很好地描述因变量，或者可能无法为这些数据正确指定模型。如果存在过度离散，一种可能的解决方案是在glm中使用quasibinomial family选项。


  
   
    
     
    
    
     
      Null deviance: 
      93.351  
      on 
      69  
      degrees 
      of 
      freedom
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Residual deviance: 
      30.392  
      on 
      63  
      degrees 
      of 
      freedom
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      deviance 
      /   
      df.residual
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      [
      1] 
      0.482417

评估模型的替代方法

使用逐步程序的替代或补充是将模型与拟合统计进行比较。我的compare.glm 函数将为glm模型显示AIC，AICc，BIC和伪R平方。使用的模型应该都拟合相同的数据。也就是说，如果数据集中的不同变量包含缺失值，则应该谨慎使用。如果您对使用哪种拟合统计数据没有任何偏好，您希望在最终模型中使用较少的术语，我可能会推荐AICc或BIC。

一系列模型可以与标准的anova 功能进行比较。模型应嵌套在先前模型中或anova函数列表中的下一个模型中; 和模型应该拟合相同的数据。在比较多个回归模型时，通常放宽p值为0.10或0.15。

在以下示例中，使用通过逐步过程选择的模型。请注意，虽然模型9最小化了AIC和AICc，但模型8最小化了BIC。anova结果表明模型8不是对模型7的显着改进。这些结果支持选择模型7,8或9中的任何一个。


  
   
    
     
    
    
     
      compareGLM(model.1, 
      model.2, 
      model.3, 
      model.4, 
      model.5, 
      model.6,
     
    
   
    
     
    
    
                
      model.7, 
      model.8, 
      model.9)
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      $Models
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
       
      Formula                                                  
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1 
      "Status ~ 1"                                             
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2 
      "Status ~ Release"                                       
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      3 
      "Status ~ Release + Upland"                               
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      4 
      "Status ~ Release + Upland + Migr"                       
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      5 
      "Status ~ Release + Upland + Migr + Mass"                
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      6 
      "Status ~ Release + Upland + Migr + Mass + Indiv"        
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      7 
      "Status ~ Release + Upland + Migr + Mass + Indiv + Insect"
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      8 
      "Status ~ Upland + Migr + Mass + Indiv + Insect"         
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      9 
      "Status ~ Upland + Migr + Mass + Indiv + Insect + Wood"  
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      $Fit.criteria
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
       
      Rank 
      Df.res   
      AIC  
      AICc   
      BIC 
      McFadden 
      Cox.and.Snell 
      Nagelkerke   
      p.value
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1    
      1     
      66 
      94.34 
      94.53 
      98.75   
      0.0000        
      0.0000     
      0.0000       
      Inf
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2    
      2     
      65 
      62.13 
      62.51 
      68.74   
      0.3787        
      0.3999     
      0.5401 
      2.538e-09
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      3    
      3     
      64 
      56.02 
      56.67 
      64.84   
      0.4684        
      0.4683     
      0.6325 
      3.232e-10
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      4    
      4     
      63 
      51.63 
      52.61 
      62.65   
      0.5392        
      0.5167     
      0.6979 
      7.363e-11
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      5    
      5     
      62 
      50.64 
      52.04 
      63.87   
      0.5723        
      0.5377     
      0.7263 
      7.672e-11
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      6    
      6     
      61 
      49.07 
      50.97 
      64.50   
      0.6118        
      0.5618     
      0.7588 
      5.434e-11
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      7    
      7     
      60 
      46.42 
      48.90 
      64.05   
      0.6633        
      0.5912     
      0.7985 
      2.177e-11
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      8    
      6     
      61 
      44.71 
      46.61 
      60.14   
      0.6601        
      0.5894     
      0.7961 
      6.885e-12
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      9    
      7     
      60 
      44.03 
      46.51 
      61.67   
      0.6897        
      0.6055     
      0.8178 
      7.148e-12
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Analysis 
      of 
      Deviance 
      Table
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 1: 
      Status 
      ~ 
      1
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 2: 
      Status 
      ~ 
      Release
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 3: 
      Status 
      ~ 
      Release 
      + 
      Upland
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 4: 
      Status 
      ~ 
      Release 
      + 
      Upland 
      + 
      Migr
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 5: 
      Status 
      ~ 
      Release 
      + 
      Upland 
      + 
      Migr 
      + 
      Mass
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 6: 
      Status 
      ~ 
      Release 
      + 
      Upland 
      + 
      Migr 
      + 
      Mass 
      + 
      Indiv
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 7: 
      Status 
      ~ 
      Release 
      + 
      Upland 
      + 
      Migr 
      + 
      Mass 
      + 
      Indiv 
      + 
      Insect
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 8: 
      Status 
      ~ 
      Upland 
      + 
      Migr 
      + 
      Mass 
      + 
      Indiv 
      + 
      Insect
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      Model 9: 
      Status 
      ~ 
      Upland 
      + 
      Migr 
      + 
      Mass 
      + 
      Indiv 
      + 
      Insect 
      + 
      Wood
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
       
      Resid. 
      Df 
      Resid. 
      Dev 
      Df 
      Deviance 
      Pr(>Chi)   
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      1        
      66     
      90.343                        
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      2        
      65     
      56.130  
      1   
      34.213 
      4.94e-09 
      ***
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      3        
      64     
      48.024  
      1    
      8.106 
      0.004412 
      **
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      4        
      63     
      41.631  
      1    
      6.393 
      0.011458 
      * 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      5        
      62     
      38.643  
      1    
      2.988 
      0.083872 
      . 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      6        
      61     
      35.070  
      1    
      3.573 
      0.058721 
      . 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      7        
      60     
      30.415  
      1    
      4.655 
      0.030970 
      * 
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      8        
      61     
      30.710 
      -1   
      -0.295 
      0.587066   
     
    
   
    
     
    
    
      
     
    
   
    
     
    
    
     
      9        
      60     
      28.031  
      1    
      2.679 
      0.101686