Tags: apache-spark, azure-databricks, mssql-jdbc

DataFrame write to Azure SQL: row-by-row insert performance

We are using Spark on Azure Databricks to write data to an Azure SQL database. Last week we switched from runtime 9.1 (Spark 3.1) to the newer 14.3 (Spark 3.5), using Spark's native JDBC driver. However, when we write data, it appears that Spark JDBC now creates an individual “insert into” statement for each row, which results in large DB overhead (especially for large tables) and makes the DB audit log grow enormously.
For example, when we insert 10k rows with 3 columns, it creates 10k insert statements, which amounts to approx. 8 MB of audit log file on blob storage.
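
To make the setup concrete, here is a minimal sketch of the kind of write described above, assuming PySpark and Spark's generic JDBC data source. The helper names (`build_jdbc_options`, `write_to_azure_sql`) and the server, database, table, and credential values are hypothetical placeholders, not taken from the actual job. `batchsize` is Spark's documented JDBC write option for the number of rows sent per round trip (default 1000); whether it changes what the audit log records is not established here.

```python
def build_jdbc_options(server, database, table, user, password, batch_size=1000):
    """Build the option map for Spark's JDBC data source.

    All argument values are placeholders supplied by the caller.
    """
    return {
        # Standard SQL Server JDBC URL layout for an Azure SQL endpoint.
        "url": f"jdbc:sqlserver://{server}.database.windows.net:1433;database={database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        # Rows per JDBC batch; Spark expects option values as strings.
        "batchsize": str(batch_size),
    }


def write_to_azure_sql(df, options, mode="append"):
    """Write a Spark DataFrame through the generic JDBC data source."""
    df.write.format("jdbc").options(**options).mode(mode).save()
```

On a cluster this would be called as `write_to_azure_sql(df, build_jdbc_options(...))`, with `df` being the 10k-row, 3-column DataFrame from the example.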