SQL to get column value in GROUP BY based on value of another column

I’m struggling to articulate this problem in a concise title, but it will be easily explained with a simple example.

Say we have a single database table, named plum_tree_inspections, that records how many plums were on a plum tree when it was inspected on a specific date. Imagine our table looks like this:

id  tree_id  date         height  plum_count
1   1234     12 Mar 2021  1.3     19
2   678      13 Mar 2021  2.8     16
3   1234     10 Mar 2022  1.5     26
4   678      11 Mar 2022  3.2     22
5   1234     20 Mar 2023  1.4     32
6   678      21 Mar 2023  3.6     20

Then I want to have a query that consolidates this table into a summary table (a view, in fact) that gives you info about each distinct tree. This should have columns for the earliest count date, latest count date, the maximum height the tree ever reached (maybe the top branch fell off in a storm one year), average plum count and the plum count from the most recent count.

I can use a simple aggregate function for all of these columns except the one that gets the plum count from the most recent count. The SQL would be something like:

SELECT DISTINCT 
    tree_id,
    min(date) AS earliest_date,
    max(date) AS latest_date,
    max(height) AS max_height,
    avg(plum_count) AS avg_plum_count,
    ????? AS latest_plum_count
FROM plum_tree_inspections
GROUP BY tree_id

Is there some simple SQL for the ????? in the query above? Or does this need to be rewritten in a more involved query?

Note that the real-world database I’m talking about has about a million records in the inspections table, so any solution needs to be pretty performant (but it can be as complex as it needs to be).
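
For reference, one common way to fill in the ????? is to number each tree's rows from newest to oldest with a window function and then pick the newest row's value inside the aggregate. The following is only a sketch under some assumptions: that the database supports window functions (for example PostgreSQL, SQL Server, Oracle, MySQL 8+ or SQLite 3.25+), that date is stored as a real date type rather than text like "12 Mar 2021", and that ties on the same date can be broken by the higher id. The names ranked and rn are made up for illustration.

```sql
-- Number each tree's inspections from most recent (rn = 1) to oldest.
-- Assumes "date" is a true DATE column so that ORDER BY sorts chronologically;
-- id is used only as a tie-breaker for two inspections on the same date.
WITH ranked AS (
    SELECT
        tree_id,
        date,
        height,
        plum_count,
        ROW_NUMBER() OVER (
            PARTITION BY tree_id
            ORDER BY date DESC, id DESC
        ) AS rn
    FROM plum_tree_inspections
)
SELECT
    tree_id,
    MIN(date)       AS earliest_date,
    MAX(date)       AS latest_date,
    MAX(height)     AS max_height,
    AVG(plum_count) AS avg_plum_count,
    -- Exactly one row per tree has rn = 1; the CASE yields NULL for every
    -- other row, and MAX ignores NULLs, so this returns that row's plum_count.
    MAX(CASE WHEN rn = 1 THEN plum_count END) AS latest_plum_count
FROM ranked
GROUP BY tree_id;
```

With the sample rows above, this should give a latest_plum_count of 32 for tree 1234 and 20 for tree 678. The DISTINCT from the original query is left out because GROUP BY tree_id already yields one row per tree. On a table of around a million rows, an index on (tree_id, date) may help the window sort, but that depends on the database and is worth checking against the actual query plan.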
