如何解释逻辑回归的系数

html5 • 2022年9月21日 pm3:07 • 问答

我试图弄清楚具有多项式项的逻辑回归系数与预测之间的关系。具体来说，我对 x 轴上预测最高的位置感兴趣。下面的例子：

set.seed(42)

# Setup some dummy data
x <- 1:200
y <- rep(0, length(x))
y[51:150] <- rbinom(100, 1, 0.5)

# Fit a model
family <- binomial()
model  <- glm(y ~ poly(x, 2), family = family)

# Illustrate model
plot(x, y)
lines(x, family$linkinv(predict(model)), col = 2)

上面的模型给了我这些系数：

coef(model)
#> (Intercept) poly(x, 2)1 poly(x, 2)2 
#>   -1.990317   -3.867855  -33.299893

^{由reprex 包(v1.0.0)于 2021 年 8 月 3 日创建}

手册页poly()说明如下：

正交多项式由系数概括，可用于通过 Kennedy & Gentle (1980, pp. 343–4) 中给出的三项递归对其进行评估，并用于代码的预测部分。

但是，我无法访问这本书，也无法从predict.glmS3 方法中辨别出这些系数是如何处理的。有没有办法仅从系数（即不使用predict()找到最大值）重建顶峰的位置（在示例中约为 100 ）？

回答

从正交多项式的理论表达式推导出预测最大值的位置

我得到了Kennedy 和 Gentle (1982)的“统计计算”一书的副本，在文档中引用poly，现在分享我关于正交多项式计算的发现，以及我们如何使用它们来找到最大值的位置 x预测值。

书中介绍的正交多项式（第 343-4 页）是单数的（即最高阶系数始终为 1），并通过以下递归程序获得：

其中q是考虑的正交多项式的数量。

请注意上述术语与文档的以下关系poly：

您的问题中包含的摘录中出现的“三项递归”是第三个表达式的 RHS，它恰好具有三个项。
第三个表达式中的rho(j+1)系数称为“居中常数”。
第三个表达式中的gamma(j)系数在文档中没有名称，但它们与“归一化常数”直接相关，如下所示。

作为参考，我在这里粘贴poly文档“值”部分的相关摘录：

一个矩阵，行对应于 x 中的点，列对应于度数，属性“度”指定列的度数和（除非原始 = TRUE）“coefs”，其中包含用于构造正交多项式的居中和归一化常数

回去复发，我们可以的参数导出的值（J + 1）RHO和伽玛（J）由上施加的正交条件从第三表达式P（J + 1） WRT P（j）的和P（J- 1）。
（重要的是要注意正交性条件不是积分，而是n 个观察到的x点的总和，因此多项式系数取决于数据！--例如 Tchebyshev 正交多项式的情况并非如此）。

参数的表达式变为：

对于回归中使用的 1 阶和 2 阶多项式，我们得到以下表达式，已经用 R 代码编写：

# First we define the number of observations in the data
n = length(x)

# For p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# For p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = sum( x * (x - mean(x))^2 ) / (n*gamma1)

我们得到：

> c(rho1, rho2, gamma1)
[1]  100.50  100.50 3333.25

请注意，的coefs属性poly(x,2)是：

> attr(poly(x,2), "coefs")
$alpha
[1] 100.5 100.5

$norm2
[1]          1        200     666650 1777555560

其中$alpha包含中心常数，即rho值（与我们的一致 - 顺便说一下，当 x 的分布对于任何q对称时，所有中心常数都等于 x 的平均值！（观察和证明）），并$norm2包含归一化常数（在本例中为p(-1,x)、p(0,x)、p(1,x)和p(2,x)），即常数c(j)将多项式归一化递推公式——通过将它们除以sqrt(c(j)) ——，得到多项式r(j,x)满足sum_over_i{ r(j,x_i)^2 } = 1 ; 请注意，r(j,x)是存储在由返回的对象中的多项式poly()。

从上面已经给出的表达式，我们观察到gamma(j)正是两个连续归一化常数的比率，即：gamma(j) = c(j) / c(j-1)。

我们可以gamma1通过计算来检查我们的值是否与这个比率一致：

gamma1 == attr(poly(x,2), "coefs")$norm2[3] / attr(poly(x,2), "coefs")$norm2[2]

返回TRUE.

回到寻找模型预测的最大值的问题，我们可以：

将预测值表示为 r(1,x) 和 r(2,x) 以及逻辑回归系数的函数，即：

pred(x) = beta0 + beta1 * r(1,x) + beta2 * r(2,x)
导出表达式 wrt x，将其设置为 0 并求解 x。

在 R 代码中：

# Get the normalization constants alpha(j) to obtain r(j,x) from p(j,x) as
# r(j,x) = p(j,x) / sqrt( norm(j) ) = p(j,x) / alpha(j)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs from the expression that is obtained
# by deriving the predicted expression pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2)

这使：

> xmax
[1] 97.501114

即使用我之前的答案中描述的其他“经验”方法获得的相同值。

从您提供的代码开始，获取最大预测值的位置 x的完整代码是：

# First we define the number of observations in the data
n = length(x)

# Parameters for p1(x):
# p1(x) = (x - rho1) p0(x)      (since p_{-1}(x) = 0)
rho1 = mean(x)

# Parameters for p2(x)
# p2(x) = (x - rho2) p1(x) - gamma1
gamma1 = var(x) * (n-1)/n
rho2 = mean( x * (x - mean(x))^2 ) / gamma1

# Get the normalization constants alpha(j) to obtain r(j,x) from p(j,x) as
# r(j,x) = p(j,x) / sqrt( norm(j) ) = p(j,x) / alpha(j)
alpha1 = sqrt( attr(poly(x,2), "coefs")$norm2[3] )
alpha2 = sqrt( attr(poly(x,2), "coefs")$norm2[4] )

# Get the logistic regression coefficients (beta1 and beta2)
coef1 = as.numeric( model$coeff["poly(x, 2)1"] )
coef2 = as.numeric( model$coeff["poly(x, 2)2"] )

# Compute the x at which the maximum occurs from the expression that is obtained
# by deriving the predicted expression pred(x) = beta0 + beta1*r(1,x) + beta2*r(2,x)
# w.r.t. x and setting the derivative to 0.
( xmax = ( alpha2^-1 * coef2 * (rho1 + rho2) - alpha1^-1 * coef1 ) / (2 * alpha2^-1 * coef2) )

以上是如何解释逻辑回归的系数的全部内容。

THE END

二维码

使用try-with-resources或在“finally”子句中关闭此“Stream”

< <上一篇

为什么glm变换函数“向后”应用？

下一篇>>

搜索内容

如何解释逻辑回归的系数

回答

从正交多项式的理论表达式推导出预测最大值的位置

目录

目录

推荐文章

最新文章