Slow Databricks query when filtering by one column and sorting by another
We have a huge table that stores information about blockchain blocks; of particular interest are the block number and its timestamp. Suppose we need to map a timestamp to a block number, that is, to answer “give me the highest block before or at a given timestamp”. This task translates into SQL:
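The asker's query is cut off in this excerpt. As a rough sketch only, a query of the shape the title describes (filter on one column, sort on another) might look like the following. The table name `blocks` and the columns `block_number` and `block_timestamp` are hypothetical, as is the timestamp literal.

```sql
-- Hypothetical schema: blocks(block_number BIGINT, block_timestamp TIMESTAMP).
-- Filter on the timestamp, sort on the block number, keep the top row.
SELECT block_number
FROM blocks
WHERE block_timestamp <= TIMESTAMP'2023-01-01 00:00:00'
ORDER BY block_number DESC
LIMIT 1;

-- The same lookup written as an aggregate, which avoids the sort.
-- It returns a single NULL row instead of no rows when nothing matches.
SELECT max(block_number) AS block_number
FROM blocks
WHERE block_timestamp <= TIMESTAMP'2023-01-01 00:00:00';
```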
Databricks Notebook (SQL) – Set Time Zone
We are trying to deal with an issue in Databricks where the logging framework we have put in place logs the wrong time (using GETDATE() or CURRENT_TIMESTAMP()). We attempted to resolve this by using ‘SET TIME ZONE’, but it appears to be ignored in this SQL statement.
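For reference, a minimal sketch of setting and then checking the session time zone in Databricks SQL. The zone name is only an example, and this does not diagnose why the setting seemed to be ignored in the asker's statement, which is not shown here.

```sql
-- Set the time zone for the current session (example zone name).
SET TIME ZONE 'America/Los_Angeles';

-- Check which zone the session is using and what the current timestamp looks like.
SELECT current_timezone() AS session_tz,
       current_timestamp() AS now_ts;

-- Alternative that does not rely on the session setting: shift a timestamp,
-- treated as UTC, to the wall-clock time in a named zone.
SELECT from_utc_timestamp(current_timestamp(), 'America/Los_Angeles') AS now_local;
```

Running the check in the same session as the logging statement is one way to see whether the setting was actually applied there.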
Databricks: managed tables vs. external tables
Managed tables are fully managed by the Databricks workspace: Databricks handles the storage and metadata of the table, including the lifecycle of the data. When you create or delete a managed table, Databricks automatically manages the underlying data files.
External tables, on the other hand, store their data outside of the Databricks-managed storage, often in external systems like cloud storage (e.g., AWS S3 or Azure Blob Storage). While Databricks manages the metadata for external tables, the actual data remains in the specified external location, providing flexibility and control over the data storage lifecycle. This setup allows users to leverage existing data storage infrastructure while utilizing Databricks’ processing capabilities.
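The difference can be sketched in Databricks SQL as follows. The table names, columns, and the storage path are placeholders, and the external location is assumed to be one the workspace is already allowed to access.

```sql
-- Managed table: no LOCATION clause, so Databricks chooses and owns the storage.
CREATE TABLE IF NOT EXISTS sales_managed (id BIGINT, amount DOUBLE);

-- External table: LOCATION points at storage you manage yourself (placeholder path).
CREATE TABLE IF NOT EXISTS sales_external (id BIGINT, amount DOUBLE)
LOCATION 's3://example-bucket/path/to/sales';

-- The detailed table information includes a Type row (MANAGED or EXTERNAL).
DESCRIBE TABLE EXTENDED sales_external;

-- Dropping a managed table removes the metadata, and Databricks removes the
-- underlying data files (possibly after a retention period).
DROP TABLE sales_managed;

-- Dropping an external table removes only the metadata; the files stay in place.
DROP TABLE sales_external;
```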
Dynamic Transpose of Rows to Columns in Databricks SQL
I have just started working with Databricks SQL.
I have a table that contains, for each article, a list of attributes and values with the following structure:
Databricks Merge destination only supports Delta Sources
I have created two Databricks Delta sources using Databricks SQL as follows:
T-SQL Conversion to Databricks SQL with CONV Function
I’m trying to convert the following T-SQL to Databricks SQL:
How to parse CSV using from_csv with schema_of_csv?
I want to parse CSV data that is stored as the value of a column in another table. The problem is that I don’t know the schema of this CSV data.
Currently, I am trying a query like this: SELECT from_csv(csv_col, schema_of_csv(csv_col)) AS csv_data FROM csv_data_table;
but it looks like schema_of_csv does not accept column names.
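One possible workaround, sketched under assumptions: schema_of_csv expects a constant string rather than a column reference, so the schema has to come from a literal sample row or be written out by hand. The sample row '1,abc,2.5' and the column names in the explicit schema are made up for illustration, and this only helps when every row shares one layout.

```sql
-- Infer a schema from a constant sample row (hypothetical sample).
SELECT schema_of_csv('1,abc,2.5') AS inferred_schema;

-- Use that inference inline: the schema argument is a constant expression,
-- while the data argument is still the column.
SELECT from_csv(csv_col, schema_of_csv('1,abc,2.5')) AS csv_data
FROM csv_data_table;

-- Or spell the schema out as a DDL string (hypothetical column names).
SELECT from_csv(csv_col, 'id INT, name STRING, score DOUBLE') AS csv_data
FROM csv_data_table;
```

If the layout really varies from row to row, a single from_csv call with one schema cannot describe it, and splitting the string (for example with split) may be a better fit.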