Relative Content

Tag Archive for apache-spark, pyspark, decision-tree

Do Spark / Pyspark ML tree-based algorithms require one-hot encoding?

Tree-based algorithms can, in principle, handle nominal data without one-hot encoding, but whether a given library supports this is implementation-specific. I found old Stack Overflow answers saying that the legacy MLlib tree algorithms could use the metadata attached by a StringIndexer to treat indexed columns as categorical. Is that still the case in the modern pyspark.ml API? And does VectorAssembler preserve that metadata?
