Tag Archive for xgboost

How does XGBoost calculate base_score?

Since XGBoost 2.0, base_score is calculated automatically from the data if it is not specified when initialising an estimator. I naively assumed it would simply be the mean of the target, but that does not seem to be the case.
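One way to inspect the fitted value, assuming the scikit-learn wrapper and some synthetic data, is to read it out of the booster's saved config and compare it with the target mean. A minimal sketch:

    import json
    import numpy as np
    import xgboost as xgb

    # Synthetic regression data; names and sizes are illustrative only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))
    y = 2.0 * X[:, 0] + rng.normal(size=1000)

    model = xgb.XGBRegressor(n_estimators=10)
    model.fit(X, y)

    # The auto-estimated intercept lives in the booster's saved config,
    # not in the estimator's constructor parameters.
    config = json.loads(model.get_booster().save_config())
    base_score = float(config["learner"]["learner_model_param"]["base_score"])

    print("auto base_score:", base_score)
    print("target mean:    ", y.mean())  # often close, but not guaranteed equal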

The XGBoost model only assigns importance to the first feature

When I use XGBoost for regression, changing the order of the features greatly affects the model's accuracy, and the feature importances single out whichever feature comes first: if I place a first, a scores high; if I place b first, b does, while the rest stay very low. Simply swapping two feature columns (say, x1 and x2) also changes the accuracy significantly. Why does this happen, and how can I solve it?
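One common cause of symptoms like this is column misalignment between training and prediction rather than anything inside XGBoost itself. A minimal sketch with synthetic data (scikit-learn wrapper assumed) of how one might check:

    import numpy as np
    import xgboost as xgb

    # Synthetic data: column 0 ("a") is strongly predictive, column 1 ("b") barely.
    rng = np.random.default_rng(0)
    a = rng.normal(size=500)
    b = rng.normal(size=500)
    X = np.column_stack([a, b])                   # training order: [a, b]
    y = 3.0 * a + 0.1 * b + rng.normal(size=500)

    model = xgb.XGBRegressor(n_estimators=50).fit(X, y)

    # Plain arrays are matched purely by position, so swapping the columns at
    # prediction time silently feeds each split the wrong values:
    print(model.predict(X[:3]))          # columns in training order
    print(model.predict(X[:3, ::-1]))    # columns swapped: silently wrong

    # Training and predicting with a pandas DataFrame instead records column
    # names, so a reordering can be caught by XGBoost's feature-name check.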

XGBoost (via XGBoost4J) is using only about 1 or 2 cores, instead of my 300+ cores

I’m calling XGBoost through the Java library XGBoost4J on a dataset of ~9 million rows for a binary classification problem in which 95% of the rows are the “negative control”, but I’m interested in predicting the “positive control” (specifically, in maximizing the number of predicted positive controls at a 1% false discovery rate). However, I noticed that only about 1, or at most 2, CPU cores are being used. Is there a way to use all my other cores? (I have about 300 more on the cloud VM I’m using!) The parameters I’m using are below.
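The knob that controls training parallelism is the general XGBoost parameter nthread; in XGBoost4J the same key goes into the params Map passed to XGBoost.train. A minimal sketch, shown through the Python API on synthetic data (the 32-thread value is illustrative only):

    import numpy as np
    import xgboost as xgb

    # Synthetic stand-in for the ~9M-row problem; sizes are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200_000, 20))
    y = (rng.random(200_000) < 0.05).astype(np.int32)  # ~95% negatives

    dtrain = xgb.DMatrix(X, label=y, nthread=32)  # parallel DMatrix construction
    params = {
        "objective": "binary:logistic",
        "tree_method": "hist",  # the histogram method parallelizes well
        "nthread": 32,          # cores for training; the default uses all available
    }
    booster = xgb.train(params, dtrain, num_boost_round=20)

If nthread is already unset or high, the bottleneck may sit outside the core training loop: very shallow trees or single-threaded data loading can keep utilization near one core regardless of the setting.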