How does XGBoost calculate base_score?
Since XGBoost 2.0, base_score is calculated automatically if it is not specified when initialising an estimator. I naively thought it would simply use the mean of the target, but this does not seem to be the case.
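For anyone investigating the same thing, here is a minimal sketch (with made-up data; the JSON path below reflects recent 2.x releases) that extracts the intercept XGBoost actually fitted, so it can be compared with the target mean directly:

```python
import json

import numpy as np
import xgboost as xgb

# Made-up regression data, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(loc=10.0, size=1000)

# base_score is deliberately left unspecified, so XGBoost estimates it.
model = xgb.XGBRegressor(n_estimators=10)
model.fit(X, y)

# The fitted intercept is stored in the booster's config JSON.
config = json.loads(model.get_booster().save_config())
base_score = float(config["learner"]["learner_model_param"]["base_score"])
print(f"auto base_score: {base_score:.6f}, target mean: {y.mean():.6f}")
```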
The XGBoost model only focuses on the first feature indicator
When I use XGBoost for regression, changing the order of the features greatly affects the model's accuracy, and the feature-importance scores flag only the first feature as important (for example, if I place a first, a scores high; if I place b first, b scores high), while the rest are very low. Likewise, swapping feature indicators (such as exchanging x1 and x2) significantly changes the model's accuracy. Why does this happen, and how can I solve it?
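One common cause worth ruling out, sketched below with made-up data: when features are passed as plain arrays, XGBoost matches them purely by position, so any reordering between training and prediction silently misaligns every column. Passing DataFrames with stable column names lets recent versions catch the mismatch instead:

```python
import numpy as np
import pandas as pd
import xgboost as xgb

# Made-up data where only x1 truly matters.
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
y = 3.0 * X["x1"] + 0.1 * X["x2"] + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=50)
model.fit(X, y)

# Same column order as training: predictions line up with the features.
print(model.predict(X)[:3])

# Swapped column order: because the DataFrame carries feature names,
# XGBoost's validation raises instead of silently mis-scoring.
try:
    model.predict(X[["x2", "x1"]])
except ValueError as err:
    print(err)
```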
How to Modify XGBoost to Apply a Custom Factor to Predicted Values Exceeding a Threshold?
I am using XGBoost in Python to build a predictive model. My goal is to modify the standard XGBoost behavior such that if the predicted time exceeds a certain threshold, the prediction is multiplied by a factor. This factor and threshold should be hyperparameters that can be optimized.
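One way to make this tunable, assuming the multiplier can be applied as a post-processing step rather than inside the training objective: wrap the regressor in a small scikit-learn-compatible estimator so the threshold and the factor become ordinary estimator parameters. ThresholdScaledXGB and all parameter values here are hypothetical:

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

class ThresholdScaledXGB(BaseEstimator, RegressorMixin):
    """Multiply predictions above `threshold` by `factor` (hypothetical wrapper)."""

    def __init__(self, threshold=100.0, factor=1.0, n_estimators=100):
        self.threshold = threshold
        self.factor = factor
        self.n_estimators = n_estimators

    def fit(self, X, y):
        # Train a plain XGBoost regressor; the scaling happens at predict time.
        self.model_ = XGBRegressor(n_estimators=self.n_estimators)
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        pred = self.model_.predict(X)
        return np.where(pred > self.threshold, pred * self.factor, pred)

# Because threshold and factor are estimator parameters, any standard
# tuner can search them alongside the booster's own hyperparameters.
search = GridSearchCV(
    ThresholdScaledXGB(),
    param_grid={"threshold": [50.0, 100.0], "factor": [0.9, 1.0, 1.1]},
    scoring="neg_mean_absolute_error",
)
# search.fit(X_train, y_train); then inspect search.best_params_
```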
XGBoost (via XGBoost4J) is using only about 1 or 2 cores, instead of my 300+ cores
I’m calling XGBoost through the Java library XGBoost4J on a dataset of ~9 million rows for a binary classification problem in which 95% of the rows are the “negative control”, but I’m interested in predicting the “positive control” (in particular, I want to maximize the number of predicted positive controls at a 1% false discovery rate). However, I noticed that only about one, or at most two, CPU cores are being used. Is there a way to use all my other cores? (I have about 300 more cores on the cloud VM I’m using!) (The parameters I’m using are below.)
Caused by: java.lang.ClassNotFoundException: ml.dmlc.xgboost4j.scala.Booster
Spark version: 3.0
XGBoost4J jar: xgboost4j_2.12-1.7.1