I’m working with a Spark standalone cluster and I need to load a file onto the workers.
I’ve used SparkFiles, but my file still isn’t visible. It’s present on the master, but not distributed to the workers. How can I resolve this issue?
I have tried two ways of adding the file:
pyspark --master spark://ip-address:7077 --files data.csv
from pyspark import SparkFiles
spark.sparkContext.addFile("data.csv")
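To check whether the file actually reached the workers, one option is to look it up from inside a task rather than on the driver, since SparkFiles.get returns a path that is local to whichever node calls it. Below is a minimal sketch of such a check, assuming the file was added as data.csv and that `spark` is the existing session; the helper names `probe_file` and `check_file_on_executors` are made up for illustration and are not part of PySpark.

```python
import os
import socket


def probe_file(resolve, file_name):
    """Return (hostname, local_path, exists) for file_name on the current machine.

    `resolve` maps a file name to a local path; inside a task this is SparkFiles.get.
    """
    local_path = resolve(file_name)
    return (socket.gethostname(), local_path, os.path.exists(local_path))


def check_file_on_executors(spark, file_name="data.csv", num_tasks=4):
    """Run probe_file inside Spark tasks so the lookup happens on the workers."""

    def task(_rows):
        # Imported inside the task so the path is resolved on the executor,
        # not on the driver.
        from pyspark import SparkFiles

        yield probe_file(SparkFiles.get, file_name)

    # One partition per task; the values themselves are ignored.
    rdd = spark.sparkContext.parallelize(range(num_tasks), num_tasks)
    return sorted(set(rdd.mapPartitions(task).collect()))
```

In the pyspark shell, calling `check_file_on_executors(spark)` after `addFile` should return one tuple per distinct host and path that ran a task. If `exists` is True there, the file was distributed, and the problem is more likely in how the path is passed to the reader (for example, a path resolved on the driver being reused on the workers).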
Screenshots (images not available): trying to read the file on the cluster; reading the file on the master.