方差计算的简单实现
在概率统计中,方差用于衡量一组数据的离散程度,相关的计算公式如下(总体方差):
μ = 1 N ∑ i = 1 N x i σ 2 = 1 N ∑ i = 1 N ( x i − μ ) 2
μ
=
1
N
∑
i
=
1
N
x
i
σ
2
=
1
N
∑
i
=
1
N
(
x
i
−
μ
)
2
\begin{aligned}&\mu = \frac{1}{N}\sum_{i = 1}^{N}x_i \\&\sigma^2 = \frac{1}{N}\sum_{i = 1}^{N}(x_i - \mu)^2\end{aligned}
μ = N 1 i = 1 ∑ N x i σ 2 = N 1 i = 1 ∑ N ( x i − μ ) 2
其中
μ
μ
\mu
μ 为数据的平均值, 而
σ 2
σ
2
\sigma^2
σ 2 即是(总体)方差.
相应的实现代码如下:
-- Lua
function average(values, count)
local sum = 0
for i = 1, count do
sum = sum + values[i]
end
return sum / count
end
function variance(values, count)
local average = average(values, count)
local variance = 0
for i = 1, count do
local delta = values[i] - average
variance = variance + delta * delta
end
return variance / count
end
通常我们需要在获取新样本数据时更新方差 ,简单的方法就是按照上述公式重新计算一遍,我们可以通过计算数据子集方差 的方式来模拟这个过程:
-- Lua
function variance_list(values)
local ret = {}
for i = 1, #values do
ret[i] = variance(values, i)
end
return ret
end
更好的一种方式是通过递推来计算数据子集的方差,这需要对方差的计算公式做一些变形:
σ 2 = 1 N ∑ i = 1 N ( x i − μ ) 2 ⟹ σ 2 = 1 N ∑ i = 1 N ( x 2 i + μ 2 − 2 x i μ ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i + ∑ i = 1 N μ 2 − ∑ i = 1 N 2 x i μ ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i + N μ 2 − 2 μ ∑ i = 1 N x i ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i + N μ 2 − 2 N μ 2 ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i − N μ 2 ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i − N ( ∑ N i = 1 x i N ) 2 ) ⟹ σ 2 = 1 N ( ∑ i = 1 N x 2 i − ( ∑ N i = 1 x i ) 2 N )
σ
2
=
1
N
∑
i
=
1
N
(
x
i
−
μ
)
2
⟹
σ
2
=
1
N
∑
i
=
1
N
(
x
i
2
+
μ
2
−
2
x
i
μ
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
+
∑
i
=
1
N
μ
2
−
∑
i
=
1
N
2
x
i
μ
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
+
N
μ
2
−
2
μ
∑
i
=
1
N
x
i
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
+
N
μ
2
−
2
N
μ
2
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
−
N
μ
2
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
−
N
(
∑
i
=
1
N
x
i
N
)
2
)
⟹
σ
2
=
1
N
(
∑
i
=
1
N
x
i
2
−
(
∑
i
=
1
N
x
i
)
2
N
)
\begin{aligned}&\sigma^2 = \frac{1}{N}\sum_{i = 1}^{N}(x_i - \mu)^2 \implies \\&\sigma^2 = \frac{1}{N}\sum_{i = 1}^{N}(x_i^2 + \mu^2 - 2x_i\mu) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 + \sum_{i = 1}^{N}\mu^2 - \sum_{i = 1}^{N}2x_i\mu) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 + N\mu^2 - 2\mu\sum_{i = 1}^{N}x_i) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 + N\mu^2 - 2N\mu^2) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 - N\mu^2) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 - N(\frac{\sum_{i=1}^{N}x_i}{N})^2) \implies \\&\sigma^2 = \frac{1}{N}(\sum_{i = 1}^{N}x_i^2 - \frac{(\sum_{i=1}^{N}x_i)^2}{N})\end{aligned}
σ 2 = N 1 i = 1 ∑ N ( x i − μ ) 2 ⟹ σ 2 = N 1 i = 1 ∑ N ( x i 2 + μ 2 − 2 x i μ ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 + i = 1 ∑ N μ 2 − i = 1 ∑ N 2 x i μ ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 + N μ 2 − 2 μ i = 1 ∑ N x i ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 + N μ 2 − 2 N μ 2 ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 − N μ 2 ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 − N ( N ∑ i = 1 N x i ) 2 ) ⟹ σ 2 = N 1 ( i = 1 ∑ N x i 2 − N ( ∑ i = 1 N x i ) 2 )
基于此,我们就可以递推的计算数据子集的方差了,相关的计算复杂度则降低了一个数量级(
O ( n 2 ) ⟹ O ( n )
O
(
n
2
)
⟹
O
(
n
)
O(n^2) \implies O(n)
O ( n 2 ) ⟹ O ( n ) ):
-- lua
function variance_list_recurrence(values)
local ret = {}
local pre_square_sum = 0
local pre_sum = 0
for i = 1, #values do
local val = values[i]
pre_square_sum = pre_square_sum + val * val
pre_sum = pre_sum + val
ret[i] = (pre_square_sum - (pre_sum * pre_sum / i)) / i
end
return ret
end
转载:
https://blog.csdn.net/tkokof1/article/details/104344680