AWS Glue script to run SQL commands in PySpark
I need to execute two commands against an Aurora MySQL database that already has a Glue connection in place. The first command is a TRUNCATE TABLE and the second a LOAD DATA FROM S3 into a table. I know I could easily do this with a Python Lambda function, but its timeout limit (15 minutes) is not enough, since the data I need to load is an 11 GB text file. I've also read that I could perform the TRUNCATE TABLE as a preaction.
I don't think this needs source and target nodes like a regular ETL job; all I really need is a way to run a SQL command over an existing connection.
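For reference, here is a minimal sketch of what I have in mind, e.g. as a Glue Python shell job: build the two statements (LOAD DATA FROM S3 is Aurora MySQL's own syntax, and it requires the cluster to have an IAM role with S3 access) and run them over a direct connection using credentials taken from the existing Glue connection. The table name, S3 path, and field/line terminators below are placeholder assumptions, and the actual database calls are left as comments since they need a live cluster and a driver like pymysql supplied to the job.

```python
# Sketch only: placeholder table/S3 names; the execution part is commented
# out because it needs a live Aurora cluster and the pymysql package.

def build_statements(table: str, s3_uri: str) -> list[str]:
    """Build the TRUNCATE and Aurora-specific LOAD DATA FROM S3 statements."""
    return [
        f"TRUNCATE TABLE `{table}`",
        (
            f"LOAD DATA FROM S3 '{s3_uri}' "
            f"INTO TABLE `{table}` "
            "FIELDS TERMINATED BY ',' "      # assumed CSV-style delimiter
            "LINES TERMINATED BY '\\n'"
        ),
    ]

statements = build_statements("my_table", "s3://my-bucket/data.txt")

# With host/user/password pulled from the Glue connection (for example via
# boto3's glue.get_connection), the statements would then be executed
# roughly like this:
#
#   import pymysql
#   conn = pymysql.connect(host=..., user=..., password=..., database=...)
#   with conn.cursor() as cur:
#       for stmt in statements:
#           cur.execute(stmt)
#   conn.commit()
```

Because the job only issues SQL, no Spark source/target nodes are involved; the Glue connection is used purely for network placement and credentials.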