When I submit Python code from PyCharm’s remote interpreter to a Spark cluster on a Linux server, why does it throw an error?
【This is my code:】
from pyspark import SparkConf, SparkContext

if __name__ == '__main__':
    conf = SparkConf().setMaster("local[*]").setAppName("WordCountHelloWorld")
    sc = SparkContext(conf=conf)

    # Read the input file from HDFS and count word occurrences
    file_rdd = sc.textFile("hdfs://node1:8020/input/words.txt")
    words_rdd = file_rdd.flatMap(lambda line: line.split(" "))
    words_with_one_rdd = words_rdd.map(lambda x: (x, 1))
    result_rdd = words_with_one_rdd.reduceByKey(lambda a, b: a + b)

    print(result_rdd.collect())
【And this is the error information:】
ssh://root@node1:22/export/server/anaconda3/envs/pyspark/bin/python -u /tmp/pycharm_project_111/00_exampl/HelloWorld.py
/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/bin/spark-class: line 71: export/server/jdk/bin/java: No such file or directory
/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/bin/spark-class: line 96: CMD: bad array subscript
Traceback (most recent call last):
  File "/tmp/pycharm_project_111/00_exampl/HelloWorld.py", line 6, in <module>
    sc = SparkContext(conf=conf)
  File "/export/server/anaconda3/envs/pyspark/lib/python3.8/site-packages/pyspark/context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
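From the error output, spark-class seems to resolve java as export/server/jdk/bin/java, without a leading slash, which makes me suspect JAVA_HOME is unset or misconfigured in the non-interactive SSH environment that PyCharm's remote interpreter uses. Would setting it explicitly in the script before creating the SparkContext be the right fix? A minimal sketch of what I mean (assuming the JDK actually lives at /export/server/jdk on node1):

import os
from pyspark import SparkConf, SparkContext

# Assumed JDK path, taken from the path fragment shown in the error message
os.environ['JAVA_HOME'] = '/export/server/jdk'

if __name__ == '__main__':
    conf = SparkConf().setMaster("local[*]").setAppName("WordCountHelloWorld")
    # spark-class launched by PySpark should now find ${JAVA_HOME}/bin/java
    sc = SparkContext(conf=conf)

Or should JAVA_HOME instead be set in /etc/profile on the server, or in the PyCharm run configuration's environment variables? I'm not sure which of these the remote interpreter actually picks up.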