Cannot execute: spark-submit
I’m trying to automate a PySpark job using Airflow (I’m running everything in Docker), and I’m getting the following error: `Cannot execute: spark-submit`.
I’m using the SparkSubmitOperator to submit PySpark applications from Airflow; the applications load data from a volume mount.
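A minimal sketch of the kind of DAG I’m running (the `dag_id`, `conn_id`, and application path below are placeholders, not my real values):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_submit_example",  # placeholder name
    start_date=datetime(2024, 1, 1),
    schedule=None,  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    submit_job = SparkSubmitOperator(
        task_id="submit_pyspark_job",
        # Airflow connection pointing at the Spark master running in the
        # other container (e.g. host spark://spark-master, port 7077).
        conn_id="spark_default",
        # Script path on the shared volume; the same path exists in both
        # the Airflow and Spark containers.
        application="/opt/shared/scripts/etl_job.py",
    )
```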
I am running Spark and Airflow as separate Docker containers. My Spark scripts and the data they load are mounted as a volume at the same path in both the Spark and Airflow containers. My Airflow Dockerfile installs the apache-airflow-providers-apache-spark package, which pulls in pyspark as a dependency, so I didn’t download the Spark binaries separately.
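Since pyspark only arrives as a pip dependency, I’m not sure its spark-submit script ends up on the PATH that Airflow sees. A quick diagnostic I can run inside the Airflow container to check (a hypothetical snippet, not part of my DAG):

```python
# Run inside the Airflow container: checks whether the pip-installed
# pyspark distribution put a spark-submit script on PATH.
import shutil

import pyspark

print("pyspark version:", pyspark.__version__)
print("spark-submit on PATH:", shutil.which("spark-submit"))  # None if not found
```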