Converting Pandas DF to Spark Pool Data
I am trying to train a CatBoostClassifier model with catboost_spark, starting from a Pandas DataFrame. All of the examples I’ve found build the data Pool from dummy data using Vector or VectorAssembler (example 1, example 2). Is there an easy way to use a Pandas DataFrame to train a Spark model, or a way to convert a Pandas DataFrame into a Pool?
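One possible route, sketched here rather than confirmed by the examples linked above: convert the Pandas DataFrame to a Spark DataFrame with `spark.createDataFrame`, assemble the feature columns into a single vector column with `VectorAssembler`, and pass the result to `catboost_spark.Pool`. The helper names `feature_columns` and `pandas_to_pool` are hypothetical. The sketch assumes every feature column is numeric, that the Pool reads its default `features` and `label` columns, and that the Spark session was started with a catboost-spark package matching your Spark and Scala versions.

```python
def feature_columns(columns, label_col):
    """Return every column name except the label, preserving order."""
    return [c for c in columns if c != label_col]


def pandas_to_pool(spark, pdf, label_col):
    """Build a catboost_spark.Pool from a Pandas DataFrame (hypothetical helper).

    Assumes all non-label columns are numeric features.
    """
    # Imported lazily so this module still loads where Spark is not installed.
    from pyspark.ml.feature import VectorAssembler
    import catboost_spark

    # Pandas -> Spark DataFrame.
    sdf = spark.createDataFrame(pdf)

    # Pack the feature columns into one vector column named "features".
    assembler = VectorAssembler(
        inputCols=feature_columns(list(pdf.columns), label_col),
        outputCol="features",
    )
    assembled = assembler.transform(sdf).withColumnRenamed(label_col, "label")

    # Keep only the two columns the Pool is assumed to read by default.
    return catboost_spark.Pool(assembled.select("features", "label"))
```

With an active session `spark` and a Pandas DataFrame `df` whose label column is, say, `target`, training would then look like `catboost_spark.CatBoostClassifier().fit(pandas_to_pool(spark, df, "target"))`. Categorical features are not handled in this sketch.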
PySpark ML CrossValidator cannot load serialized CrossValidator because it cannot find CatBoostRegressor class
I asked the question here, but no one has answered.