SQL STUFF with condition to PySpark

I have this SQL query:

SELECT COUNT(Id) AS [dup_count], table.Name, Number
    ,[Ids]= STUFF(
    (SELECT ', ' + cast([Id] as varchar)
    FROM table t1
    WHERE t1.Name = table.Name AND t1.Number = table.Number
    FOR XML PATH (''))
    , 1, 1, '')
FROM table
GROUP BY
    table.Name, Number
HAVING
    COUNT(table.EntityId) > 1

I’m trying to write it in Glue PySpark, but I don’t know how to rewrite the WHERE condition inside STUFF: “WHERE t1.Name = table.Name AND t1.Number = table.Number”. I ended up with this, and I’m not sure how to include that condition.

SELECT COUNT(Id) AS dup_count, table.Name, Number
    ,concat_ws(', ', collect_set(Id)) AS Ids
FROM table
GROUP BY
    table.Name, Number
HAVING
    COUNT(table.Id) > 1
