I’m using sklearn 1.4.1 but random forest still cannot handle missing values


I’ve read that the random forest estimators in scikit-learn 1.4 and later should be able to handle NaN values. I’ve checked that I have the latest version of scikit-learn installed:

! pip install --upgrade scikit-learn

import sklearn
print(sklearn.__version__)
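A version printout alone may not tell the whole story in a notebook: `! pip` can install into a different environment than the one the kernel runs in, and a module imported before the upgrade stays in memory until the kernel restarts. A minimal sketch for comparing what is installed on disk with what the running interpreter actually loaded (the helper name `describe_install` is made up for illustration, it is not part of scikit-learn):

```python
import importlib
import importlib.metadata
import sys


def describe_install(dist_name="scikit-learn", module_name="sklearn"):
    """Report what the running interpreter sees for a package (hypothetical helper)."""
    info = {
        "python": sys.executable,  # interpreter the kernel/script is using
        "installed": None,         # version recorded in the metadata on disk
        "imported": None,          # version of the module actually loaded
        "location": None,          # file the module was loaded from
    }
    try:
        info["installed"] = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        pass  # not installed for this interpreter
    try:
        module = importlib.import_module(module_name)
        info["imported"] = getattr(module, "__version__", None)
        info["location"] = getattr(module, "__file__", None)
    except ImportError:
        pass  # cannot be imported from this interpreter
    return info


for key, value in describe_install().items():
    print(f"{key}: {value}")
```

If `installed` and `imported` differ, or `location` points at an unexpected environment, restarting the kernel (or installing with `%pip` instead of `! pip` in Jupyter) is probably the first thing to try.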

However, I still get this error:

ValueError: Input X contains NaN.
RandomForestClassifier does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

Why? Should I import something else? I’m confused.
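For reference, the imputer route that the error message suggests can be sketched as follows. This is only an illustration on made-up toy data, and the median strategy is an arbitrary choice. It sidesteps the NaN error on any scikit-learn version but does not use the native missing-value support:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Toy data (invented for illustration) with a few missing entries.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
    [7.0, 8.0],
    [np.nan, 1.0],
])
y = np.array([0, 0, 1, 1, 1, 0])

# Fill NaNs with the per-column median, then fit the forest on the result.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=10, random_state=0),
)
model.fit(X, y)

predictions = model.predict(X)
print(predictions.shape)
```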
