Regression within each group

I am trying to run a regression within each group and year. Here is what I have done:

import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Group": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "ID": [1, 2, 3, 5, 2, 1, 2, 3],
    "Value 1": [40, 20, 30, 45, 22, 34, 11, 88],
    "Value 2": [3, 22, 11, 55, 5, 9, 4, 15],
})

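def func(row, var):
    # Despite the name, `row` is the sub-DataFrame for one (Year, Group) group
    X = row[var]  # independent variable
    y = row["Value 2"]  # dependent variable
    X = sm.add_constant(X)  # add the intercept column
    # Fit OLS within the group and store the residuals
    row["Residual"] = sm.OLS(y, X, missing="drop").fit().resid
    return row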

data.groupby(["Year", "Group"], group_keys=False).apply(func, var="Value 1")

It works just fine. However, my real dataset is huge. Is there a way to run this more efficiently, for example with matrix operations? I keep getting the following error, and it is pretty slow.

“MemoryError: Unable to allocate 5.73 GiB for an array with shape (229, 3359769) and data type float64”
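
One possible direction, sketched here under the assumption that there is a single independent variable plus a constant (as in the call with var="Value 1"): the residuals of a simple regression have a closed form, so they can be computed with grouped transforms instead of fitting one statsmodels model per group. The helper name groupwise_residuals and its parameters are hypothetical, and this is not a drop-in replacement for models with several regressors.

```python
import numpy as np
import pandas as pd


def groupwise_residuals(df, keys, x_col, y_col):
    """Residuals of y ~ const + x, fitted separately within each group.

    Hypothetical helper: valid only for one regressor plus an intercept.
    """
    # Keep only rows where both variables are present, similar in spirit
    # to missing="drop"; other rows get NaN residuals.
    valid = df[[x_col, y_col]].notna().all(axis=1)
    x = df[x_col].where(valid)
    y = df[y_col].where(valid)

    # Group labels aligned with df's index
    grouper = [df[k] for k in keys]

    # Demean both variables within each group
    xd = x - x.groupby(grouper).transform("mean")
    yd = y - y.groupby(grouper).transform("mean")

    # Per-group slope = sum(xd * yd) / sum(xd ** 2), broadcast back to rows
    sxy = (xd * yd).groupby(grouper).transform("sum")
    sxx = (xd * xd).groupby(grouper).transform("sum")
    slope = sxy / sxx

    # Residual = (y - mean(y)) - slope * (x - mean(x))
    return yd - slope * xd


# Usage on a small frame with the same column names as the question
data = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Value 1": [40.0, 20.0, 30.0, 22.0, 34.0, 11.0],
    "Value 2": [3.0, 22.0, 11.0, 5.0, 9.0, 4.0],
})
data["Residual"] = groupwise_residuals(
    data, ["Year", "Group"], "Value 1", "Value 2"
)
```

Because every step is a column-wise operation or a grouped transform, no per-group design matrix is built, which may help with both speed and memory. Groups where the independent variable does not vary produce NaN residuals here, since the slope is undefined.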
