Clickhouse query isssues

  Kiến thức lập trình

I’m trying to do a complicated query in Clikchouse and I’m struggling to see where my error is.

I have three tables within Clickhouse:

  • jump_news_sentiment.news_sentiment_after with columns: assetid, date, agg_sentiment, agg_sentiment_rw, and n
  • jump_news_sentiment.news_sentiment_before with columns: assetid, date, agg_sentiment, agg_sentiment_rw, and n
  • jump_news_sentiment.date_next with columns: date, and date_next

What I try to achieve:

  • Prepare sentiment data by shifting dates forward in the table news_sentiment_after
  • Combine sentiment data from before and after specific dates for each asset, handling nulls and summing up values as necessary.
  • Further process the merged data by filtering based on intervals defined in another table and aggregating sentiment scores and counts within these intervals for each asset.

The query that I use is:

WITH shifted_dates AS (
        SELECT 
            assetid, 
            dateAdd(day, 1, date) AS date, 
            agg_sentiment, 
            agg_sentiment_rw, 
            n
        FROM jump_news_sentiment.news_sentiment_after
    ), merged_data AS (
        SELECT 
            coalesce(b.assetid, a.assetid) AS assetid, 
            coalesce(b.date, a.date) AS date, 
            coalesce(b.agg_sentiment, 0) + coalesce(a.agg_sentiment, 0)  AS agg_sentiment,
            coalesce(b.agg_sentiment_rw, 0) + coalesce(a.agg_sentiment_rw, 0) AS agg_sentiment_rw,
            coalesce(a.n, 0) + coalesce(b.n, 0) AS n
        FROM shifted_dates a
        FULL OUTER JOIN jump_news_sentiment.news_sentiment_before b USING (date, assetid)
    ),
    aggregated_data AS (
        SELECT 
            m.assetid AS assetid,
            max(m.date) AS date,
            sum(m.agg_sentiment) AS agg_sentiment,
            sum(m.agg_sentiment_rw) AS agg_sentiment_rw,
            sum(m.n) AS n
        FROM merged_data m
        JOIN jump_news_sentiment.date_next e ON m.date > e.date AND m.date <= e.date_next
        GROUP BY m.assetid, e.date_next
    )
    SELECT * FROM aggregated_data;

However, when I do this I get the following error:

JOIN  merged_data AS __table2 ALL INNER JOIN jump_news_sentiment.date_next AS __table6 ON (__table2.date > __table6.date) AND (__table2.date <= __table6.date_next) join expression contains column from left and right table.

LEAVE A COMMENT