How to load a CSV file onto a Spark standalone cluster without HDFS?


I’m working with a Spark standalone cluster and I need to load a file onto the workers.

I’ve used SparkFiles, but my file still isn’t visible. It’s present on the master, but not distributed to the workers. How can I resolve this issue?

Ways I've tried to add the file:

pyspark --master spark://ip-address:7077 --files data.csv

from pyspark import SparkFiles
spark.sparkContext.addFile("data.csv")
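
For context, here is a minimal sketch of how a file shipped with `--files` or `addFile` can be read on a worker. It assumes `data.csv` has a header row and has already been added by one of the two methods above. The helper names `parse_csv_records` and `load_distributed_csv` are hypothetical, not part of PySpark. The idea is that `SparkFiles.get` is called inside a task, so it resolves to the copy on whichever node runs that task, rather than to a driver-side path that may not exist on the workers. This may or may not be the cause of the problem in the screenshots.

```python
import csv


def parse_csv_records(lines):
    """Turn an iterable of CSV text lines (header first) into a list of dicts."""
    return [dict(record) for record in csv.DictReader(lines)]


def load_distributed_csv(spark, filename="data.csv"):
    """Hypothetical helper: build a DataFrame from a file shipped via addFile/--files.

    Assumes the file was already added to the SparkContext and is small enough
    to be parsed by a single task. All columns come back as strings.
    """
    # Imported here so parse_csv_records can be used without Spark installed.
    from pyspark import SparkFiles
    from pyspark.sql import Row

    def read_local_copy(_):
        # Runs inside a task, so SparkFiles.get() points at the copy of the
        # file on the node executing this task.
        with open(SparkFiles.get(filename), newline="") as handle:
            return parse_csv_records(handle)

    sc = spark.sparkContext
    # A single-element, single-partition RDD so the file is read exactly once.
    records = sc.parallelize([0], numSlices=1).flatMap(read_local_copy)
    return spark.createDataFrame(records.map(lambda rec: Row(**rec)))
```

In the `pyspark` shell this could be used as `df = load_distributed_csv(spark, "data.csv")` after `spark.sparkContext.addFile("data.csv")`. For larger files, a path that every node can reach (a shared mount, or the same local path on each machine) is probably a better fit than parsing in one task.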

Screenshots (images not preserved): trying to read the file on the cluster; reading the file on the master.
