Relative Content

Tag Archive for python, python-3.x, pyspark, rdd, tf-idf

Saving and Loading RDD (pyspark) to pickle file is changing order of SparseVectors

I trained tf-idf on a pre-tokenized (unigram tokenizer) dataset, which I converted from list[list(token1, token2, token3, ...)] to an RDD, using pyspark's HashingTF and IDF implementations. I then saved the RDD of tf-idf values to a file and loaded it back from that file. The loaded RDD contains the same SparseVectors as the one I saved, but in a different order: a seemingly random vector now comes first, and the rest follow in their proper order after it.
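One possible workaround, as a minimal sketch rather than a confirmed fix: attach each vector's position with zipWithIndex before saving, then sort by that index after loading, so the result no longer depends on the order in which the saved parts are read back. The helper names save_with_index, load_in_original_order, and demo are hypothetical, as are the sample documents and the temporary output path; the RDD calls themselves (zipWithIndex, saveAsPickleFile, pickleFile, sortByKey, values) are standard pyspark.

```python
def save_with_index(rdd, path):
    """Save an RDD as a pickle file, keyed by each element's original position.

    Hypothetical helper: `path` must not already exist, as Spark requires.
    """
    # zipWithIndex yields (element, index); swap to (index, element) so the
    # index can be used as a sort key after loading.
    indexed = rdd.zipWithIndex().map(lambda pair: (pair[1], pair[0]))
    indexed.saveAsPickleFile(path)


def load_in_original_order(sc, path):
    """Load an RDD written by save_with_index and restore its original order."""
    # Sorting by the stored index makes the order independent of how the
    # part files happen to be read back.
    return sc.pickleFile(path).sortByKey().values()


def demo():
    """Round-trip a small tf-idf RDD (needs a local Spark install to run)."""
    import os
    import tempfile

    from pyspark import SparkContext
    from pyspark.mllib.feature import HashingTF, IDF

    sc = SparkContext.getOrCreate()

    # Made-up pre-tokenized documents standing in for the real dataset.
    docs = sc.parallelize([
        ["spark", "rdd", "pickle"],
        ["tf", "idf", "spark"],
        ["sparse", "vector", "order"],
    ])

    tf = HashingTF().transform(docs)
    tf.cache()
    tfidf = IDF().fit(tf).transform(tf)

    # Illustrative output location inside a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "tfidf_pickle")
    save_with_index(tfidf, path)

    restored = load_in_original_order(sc, path)
    for original, loaded in zip(tfidf.collect(), restored.collect()):
        print(original == loaded)
```

Calling demo() on a machine with Spark available prints one comparison per document. The extra sortByKey costs a shuffle on load, which is the price of not relying on read order.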

Saving and Loading RDD (pyspark) to pickle file is randomizing SparseVectors
