Converting Pandas DF to Spark Pool Data

I am trying to train a CatBoostClassifier model with catboost_spark, starting from a Pandas DataFrame. All of the examples I’ve found create a data pool from dummy data built with Vector or VectorAssembler (example 1, example 2). Is there an easy way to use a Pandas df to train a Spark model, or a way to convert a Pandas df into a catboost_spark Pool?
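One possible route, sketched here as an illustration rather than a confirmed answer: let Spark build a DataFrame from the Pandas one, pack the feature columns into a vector column with VectorAssembler, and wrap the result in a Pool. The helper names `feature_columns` and `pandas_to_catboost_pool` are hypothetical, as is the `label_col` parameter. The sketch assumes that `spark` is a SparkSession started with the catboost-spark package available, that every feature column is numeric (categorical features are not handled), and that the Pool reads its data from columns named "features" and "label".

```python
def feature_columns(columns, label_col):
    """Return every column name except the label, preserving order."""
    return [c for c in columns if c != label_col]


def pandas_to_catboost_pool(spark, pandas_df, label_col="label"):
    """Convert a Pandas DataFrame with numeric features into a catboost_spark.Pool.

    Assumption: `spark` is a SparkSession with the catboost-spark package
    loaded, and all non-label columns are numeric.
    """
    # Imported here so feature_columns() stays usable without Spark installed.
    from pyspark.ml.feature import VectorAssembler
    import catboost_spark

    # Spark can build a DataFrame directly from a Pandas DataFrame.
    spark_df = spark.createDataFrame(pandas_df)

    # Pack the feature columns into a single vector column named "features".
    assembler = VectorAssembler(
        inputCols=feature_columns(list(pandas_df.columns), label_col),
        outputCol="features",
    )
    assembled = assembler.transform(spark_df).select("features", label_col)

    # Assumption: the Pool looks for a column named "label" by default.
    if label_col != "label":
        assembled = assembled.withColumnRenamed(label_col, "label")

    return catboost_spark.Pool(assembled)
```

Under the same assumptions, training would then look like `catboost_spark.CatBoostClassifier().fit(pandas_to_catboost_pool(spark, df, label_col="target"))`, where `df` and `"target"` stand in for your own DataFrame and label column.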
