Question

I’m struggling to articulate this problem in a concise title, but it will be easily explained with a simple example.

Say we have a single database table, named plum_tree_inspections, that records how many plums there were on a plum tree when it was inspected on a specific date. Imagine our table looks like this:

id	tree_id	date	height	plum_count
1	1234	12 Mar 2021	1.3	19
2	678	13 Mar 2021	2.8	16
3	1234	10 Mar 2022	1.5	26
4	678	11 Mar 2022	3.2	22
5	1234	20 Mar 2023	1.4	32
6	678	21 Mar 2023	3.6	20

Then I want a query that consolidates this table into a summary (a view, in fact) with one row per distinct tree. It should have columns for the earliest inspection date, the latest inspection date, the maximum height the tree ever reached (maybe the top branch fell off in a storm one year), the average plum count, and the plum count from the most recent inspection.

I can use a simple aggregate function for all of these columns except the one that gets the plum count from the most recent inspection. The SQL would be something like:

SELECT
    tree_id,
    min(date) AS earliest_date,
    max(date) AS latest_date,
    max(height) AS max_height,
    avg(plum_count) AS avg_plum_count,
    ????? AS latest_plum_count
FROM plum_tree_inspections
GROUP BY tree_id

Is there some simple SQL for the ????? in the query above? Or does this need to be rewritten in a more involved query?

Note that the real-world database I’m talking about has about a million records in the inspections table, so any solution needs to perform well (but it can be as complex as it needs to be).
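To make the desired output concrete, here is a minimal sketch of one possible approach: aggregate per tree first, then join back to the inspections table on the latest date to pick up that row's plum count. It is wrapped in Python's sqlite3 module only so it can be run as-is. Several things here are assumptions rather than part of the real schema: dates are stored as ISO strings so that min/max sort them chronologically, each tree has at most one inspection per date, and the index name idx_tree_date and the helper summarize are hypothetical. Whether this is fast enough on a million rows would depend on the database engine and its indexes.

```python
import sqlite3

# Sample rows from the table above, with dates rewritten as ISO strings
# (an assumption: the real column may be a proper DATE type).
ROWS = [
    (1, 1234, "2021-03-12", 1.3, 19),
    (2, 678, "2021-03-13", 2.8, 16),
    (3, 1234, "2022-03-10", 1.5, 26),
    (4, 678, "2022-03-11", 3.2, 22),
    (5, 1234, "2023-03-20", 1.4, 32),
    (6, 678, "2023-03-21", 3.6, 20),
]

# Aggregate per tree in a subquery, then join back on (tree_id, latest_date)
# to fetch plum_count from the most recent inspection.
# Assumes at most one inspection per tree per date; duplicates would
# produce more than one output row for that tree.
SUMMARY_SQL = """
SELECT s.tree_id,
       s.earliest_date,
       s.latest_date,
       s.max_height,
       s.avg_plum_count,
       i.plum_count AS latest_plum_count
FROM (
    SELECT tree_id,
           min(date) AS earliest_date,
           max(date) AS latest_date,
           max(height) AS max_height,
           avg(plum_count) AS avg_plum_count
    FROM plum_tree_inspections
    GROUP BY tree_id
) AS s
JOIN plum_tree_inspections AS i
  ON i.tree_id = s.tree_id AND i.date = s.latest_date
ORDER BY s.tree_id
"""


def summarize(rows):
    """Load rows into an in-memory SQLite table and return the summary rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plum_tree_inspections ("
        "id INTEGER PRIMARY KEY, tree_id INTEGER, date TEXT, "
        "height REAL, plum_count INTEGER)"
    )
    # Hypothetical composite index intended to help both the grouping and the join.
    conn.execute(
        "CREATE INDEX idx_tree_date ON plum_tree_inspections (tree_id, date)"
    )
    conn.executemany(
        "INSERT INTO plum_tree_inspections VALUES (?, ?, ?, ?, ?)", rows
    )
    result = conn.execute(SUMMARY_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

For the sample data, this should give a latest_plum_count of 20 for tree 678 and 32 for tree 1234. On engines that support window functions, something like ROW_NUMBER() partitioned by tree_id and ordered by date descending might be an alternative worth comparing.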

SQL to get column value in GROUP BY based on value of another column
