Getting the dispersion parameter from a negative binomial regression in Python


R code to fit the negative binomial regression model and get its dispersion parameter:
library(MASS)  # glm.nb() comes from the MASS package
mod.Syn.L.plusGamma <- glm.nb(Syn ~ offset(I(1*log(L))), data = Lang.data)
r <- mod.Syn.L.plusGamma$theta  # estimated dispersion parameter theta

In R, glm.nb estimates the dispersion parameter theta as part of the fit. Model summary:
Call:
glm.nb(formula = Syn ~ offset(I(1 * log(L))), data = Lang.data,
init.theta = 71.78887386, link = log)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -11.2986     0.1231  -91.77   <2e-16 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(71.7889) family taken to be 1)

    Null deviance: 494.73  on 3397  degrees of freedom
Residual deviance: 494.73  on 3397  degrees of freedom
AIC: 630.26

Number of Fisher Scoring iterations: 1

          Theta:  72 
      Std. Err.:  1537 

Warning while fitting theta: iteration limit reached

2 x log-likelihood: -626.26
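When comparing the two outputs, it may help to note that R's glm.nb and statsmodels write the NB2 variance with reciprocal parameters. As far as I know the mapping is the following, so R's estimate corresponds to alpha ≈ 1/71.7889 ≈ 0.0139, whereas the statsmodels GLM family uses alpha = 1 unless told otherwise:

```latex
\begin{aligned}
\log \mu_i &= \beta_0 + \log L_i \\
\operatorname{Var}(Y_i) &= \mu_i + \frac{\mu_i^2}{\theta} && \text{(R, glm.nb)} \\
\operatorname{Var}(Y_i) &= \mu_i + \alpha\,\mu_i^2 && \text{(statsmodels)} \\
\alpha &= \frac{1}{\theta} \approx \frac{1}{71.7889} \approx 0.0139
\end{aligned}
```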

Trying to replicate the same model in Python:
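import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def model_3(data):
    formula = "Syn ~ 1"
    data['log_L'] = np.log(data['L'])  # log(L) enters as an offset, as in the R model
    model = smf.glm(formula=formula, data=data, family=sm.families.NegativeBinomial(),
                    offset=data['log_L']).fit()
    return model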

Model summary of the regression model in Python:
                 Generalized Linear Model Regression Results

Dep. Variable:                    Syn   No. Observations:                 3398
Model:                            GLM   Df Residuals:                     3397
Model Family:        NegativeBinomial   Df Model:                            0
Link Function:                    Log   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -313.43
Date:                Tue, 11 Jun 2024   Deviance:                       445.59
Time:                        14:20:10   Pearson chi2:                 3.37e+03
No. Iterations:                     6   Pseudo R-squ. (CS):              0.000
Covariance Type:            nonrobust

                 coef    std err          z      P>|z|      [0.025      0.975]
Intercept    -11.2997      0.125    -90.506      0.000     -11.544     -11.055
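For comparison with the GLM summary above, which has no alpha row: statsmodels also provides a discrete-model negative binomial regression, `smf.negativebinomial`, which, as far as I know, estimates alpha by maximum likelihood together with the coefficients. That is closer to what glm.nb does, and theta is then 1/alpha. A minimal sketch, assuming an NB2 model with log(L) as the offset; `fit_nb_with_alpha` is a hypothetical helper and the synthetic data (true theta = 5) is invented only to exercise the code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_nb_with_alpha(data: pd.DataFrame):
    """Fit an intercept-only NB2 model with offset log(L), estimating alpha by MLE."""
    log_L = np.log(data["L"])
    # Discrete-model NB regression: alpha is a free parameter here,
    # unlike the GLM NegativeBinomial family where it is a fixed input
    model = smf.negativebinomial("Syn ~ 1", data=data, offset=log_L,
                                 loglike_method="nb2")
    result = model.fit(disp=0)
    alpha = result.params["alpha"]
    return result, alpha, 1.0 / alpha  # theta = 1 / alpha, as in R's glm.nb


# Hypothetical synthetic data standing in for Lang.data
rng = np.random.default_rng(0)
n, true_theta = 3000, 5.0
L = rng.uniform(1.0, 100.0, size=n)
mu = 0.5 * L                                # log(mu) = log(0.5) + log(L)
p = true_theta / (true_theta + mu)          # numpy's (n, p) parameterization of NB
demo = pd.DataFrame({"L": L, "Syn": rng.negative_binomial(true_theta, p)})

result, alpha_hat, theta_hat = fit_nb_with_alpha(demo)
print(result.params)
print("alpha:", alpha_hat, "theta:", theta_hat)
```

On the real data, `theta_hat` is the number to compare with R's theta. Given R's warning that the theta iteration limit was reached and its large standard error (1537), the two may not agree closely.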

The intercept and its standard error are nearly identical to the values produced by the R model; however, the dispersion parameter is not estimated during the fit. Some sources say that the dispersion parameter theta and the shape parameter alpha are combined, so that their value equals one (the reported scale), but I am not sure that is entirely true. What is clear is that neither alpha nor theta is directly accessible from the fitted model.

Are there other ways to estimate the dispersion parameter, for example from other model quantities such as the variance and mu?

JasonFred Ngwa is a new contributor to this site.