Use one query that fetches redundant data from linked tables, or multiple queries whose results are joined in code?

I have three tables with a chain of two one-to-many relationships. I can think of two ways of fetching the data based on the primary key of the first table.

  1. JOIN the tables (or LEFT JOIN if there may be nonexistent rows) in a single query (SELECT * FROM t1, t2, t3 WHERE t1.id = 123 AND t2.t1_id = t1.id AND t3.t2_id = t2.id).
  2. Run three queries, one per table, each linking back to the first table’s primary key (e.g. for the third table: SELECT t3.* FROM t3, t2, t1 WHERE t3.t2_id = t2.id AND t2.t1_id = t1.id AND t1.id = 123). Keep the database connection open until the data has been received from all three tables.

A single JOIN returns redundant data, and the special case of zero rows on the right side has to be handled separately. Running three queries means the database engine has to do very similar work three times, which could increase latency. Is it possible to say which one is “better”? Does it depend on the situation, e.g. the size of the data being retrieved, or whether the database is on the same server/network as the calling code?
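To make the trade-off concrete, here is a minimal sketch of option 1 using Python's built-in sqlite3 module and an in-memory database. The columns `name`, `label` and `note` and the sample rows are made up for illustration; only `t1`, `t2`, `t3` and the key columns come from the question.

```python
import sqlite3

# Throwaway in-memory database; the non-key columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER REFERENCES t1(id), label TEXT);
    CREATE TABLE t3 (id INTEGER PRIMARY KEY, t2_id INTEGER REFERENCES t2(id), note TEXT);
    INSERT INTO t1 VALUES (123, 'parent'), (124, 'childless');
    INSERT INTO t2 VALUES (1, 123, 'a'), (2, 123, 'b');
    INSERT INTO t3 VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z');
""")

JOINED = """
    SELECT t1.id, t1.name, t2.id, t2.label, t3.id, t3.note
    FROM t1
    LEFT JOIN t2 ON t2.t1_id = t1.id
    LEFT JOIN t3 ON t3.t2_id = t2.id
    WHERE t1.id = ?
    ORDER BY t2.id, t3.id
"""

# The t1 columns are repeated on every row (the redundant data).
rows_with_children = conn.execute(JOINED, (123,)).fetchall()

# A t1 row without children still yields one row, padded with NULLs
# (None in Python); this is the special case the client has to handle.
rows_without_children = conn.execute(JOINED, (124,)).fetchall()

for row in rows_with_children + rows_without_children:
    print(row)
```

With this sample data the parent's columns come back three times, once per t3 row, while the childless parent produces a single row whose t2 and t3 columns are all NULL.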


Questions of this kind cannot be answered sensibly in the abstract, without the surrounding context. One has to try it out on the real system: a real database, filled with real data, a real network, a real client machine running a real client application.

Then it will be possible to profile both approaches and compare them. Keep in mind that performance might not be the only factor, or even the primary one, in deciding which approach to choose. For example, if the first approach turns out to be faster, but the second one keeps the client code simpler and is still fast enough for your use case (or vice versa), you should probably pick the simpler solution.
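As a rough idea of what such a comparison could look like, here is a small timing harness, again sketched with Python's sqlite3 module. Everything in it is an assumption for illustration: the `payload` column, the row counts, and the helper names `build_db`, `single_join`, `three_queries` and `time_it`. An in-memory SQLite database has no network round trips, so the numbers it prints say nothing about your system; only the shape of the measurement carries over.

```python
import sqlite3
import time


def build_db(n_t2=20, n_t3_per_t2=20):
    """Create a throwaway database with one t1 row and its descendants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, payload TEXT);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER, payload TEXT);
        CREATE TABLE t3 (id INTEGER PRIMARY KEY, t2_id INTEGER, payload TEXT);
        CREATE INDEX t2_parent ON t2 (t1_id);
        CREATE INDEX t3_parent ON t3 (t2_id);
    """)
    conn.execute("INSERT INTO t1 VALUES (123, ?)", ("x" * 200,))
    for i in range(n_t2):
        conn.execute("INSERT INTO t2 VALUES (?, 123, ?)", (i, "y" * 200))
        for j in range(n_t3_per_t2):
            conn.execute(
                "INSERT INTO t3 VALUES (?, ?, ?)",
                (i * n_t3_per_t2 + j, i, "z" * 200),
            )
    return conn


def single_join(conn, t1_id):
    """Approach 1: one query, parent columns repeated on every row."""
    return conn.execute("""
        SELECT * FROM t1
        LEFT JOIN t2 ON t2.t1_id = t1.id
        LEFT JOIN t3 ON t3.t2_id = t2.id
        WHERE t1.id = ?
    """, (t1_id,)).fetchall()


def three_queries(conn, t1_id):
    """Approach 2: one query per table, no repeated columns."""
    t1 = conn.execute("SELECT * FROM t1 WHERE id = ?", (t1_id,)).fetchall()
    t2 = conn.execute("SELECT * FROM t2 WHERE t1_id = ?", (t1_id,)).fetchall()
    t3 = conn.execute("""
        SELECT t3.* FROM t3
        JOIN t2 ON t3.t2_id = t2.id
        WHERE t2.t1_id = ?
    """, (t1_id,)).fetchall()
    return t1, t2, t3


def time_it(fn, *args, repeat=50):
    """Return the best wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


conn = build_db()
joined_seconds = time_it(single_join, conn, 123)
split_seconds = time_it(three_queries, conn, 123)
print(f"single join:   {joined_seconds * 1e6:.0f} us")
print(f"three queries: {split_seconds * 1e6:.0f} us")
```

On a real system you would point the same two functions at the production-like database and run them from the actual client machine, so that network latency and transfer size are part of what gets measured.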

If you want a recommendation on which one to implement first: start with the solution that you “feel” makes the code (SQL + client code in total) simpler. Optimize only if it turns out not to be fast enough for your use case.


You should use joins, limit the columns you select, and run multiple SELECT statements in a single batch. (I changed your structure slightly to shorten the example code.)

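-- Parents: one row per t1. No join is needed here; joining t2 would
-- repeat each t1 row once per child.
SELECT t1.*
FROM t1
WHERE t1.id > 123 AND t1.id < 321;

-- Children of those parents.
SELECT t2.*
FROM t1
INNER JOIN t2
    ON t2.t1_id = t1.id
WHERE t1.id > 123 AND t1.id < 321;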

Your code can then iterate over the first result set to populate the t1 objects, and over the second result set to add each child to the matching t1 object that is now in memory:

// Index the parents by id, then attach each child to its parent.
var t1s = new Dictionary<int, T1>();
foreach (var t1 in dataSet1)
{
    t1s.Add(t1.Id, t1);
}
foreach (var t2 in dataSet2)
{
    t1s[t2.T1Id].Children.Add(t2);
}
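For anyone who wants to run the idea end to end, here is a minimal sketch of the same two-result-set approach with Python's sqlite3 module. The `T1` and `T2` dataclasses and the sample rows are hypothetical. Note also that sqlite3's `execute` sends one statement at a time, so the two SELECTs go out separately here rather than as a single batch.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class T2:
    id: int
    t1_id: int


@dataclass
class T1:
    id: int
    children: list = field(default_factory=list)


# Throwaway in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER REFERENCES t1(id));
    INSERT INTO t1 VALUES (200), (201), (400);
    INSERT INTO t2 VALUES (1, 200), (2, 200), (3, 400);
""")

low, high = 123, 321

# Result set 1: each parent exactly once, indexed by id.
t1s = {
    row[0]: T1(id=row[0])
    for row in conn.execute("SELECT id FROM t1 WHERE id > ? AND id < ?", (low, high))
}

# Result set 2: all children of those parents, fetched in one query.
child_rows = conn.execute("""
    SELECT t2.id, t2.t1_id
    FROM t1
    INNER JOIN t2 ON t2.t1_id = t1.id
    WHERE t1.id > ? AND t1.id < ?
    ORDER BY t2.id
""", (low, high))

# Attach each child to its parent, which is already in memory.
for child_id, parent_id in child_rows:
    t1s[parent_id].children.append(T2(id=child_id, t1_id=parent_id))

for parent in t1s.values():
    print(parent.id, [child.id for child in parent.children])
```

No parent column is transferred more than once, and a parent without children simply ends up with an empty list, so the zero-rows case needs no special handling.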

The SQL can be optimized with temporary tables, etc., if required.

Note that this gives only slightly better performance than running the two queries separately, provided that your child query has the form 'WHERE t2.t1_id IN (1, 2, 3, 4, 5, ...)' rather than being run in a loop, once per id.
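The difference between the IN form and the per-id loop can be sketched as follows, again with Python's sqlite3 module. The helper `run`, its statement counter and the sample rows are assumptions made for this illustration; the point is only how many statements each variant sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, t1_id INTEGER);
    INSERT INTO t2 VALUES (1, 1), (2, 1), (3, 2), (4, 5);
""")

statements_sent = 0


def run(sql, params=()):
    """Execute a query and count how many statements were sent."""
    global statements_sent
    statements_sent += 1
    return conn.execute(sql, params).fetchall()


parent_ids = [1, 2, 3, 4, 5]

# Per-id loop: one statement (and one round trip) per parent.
statements_sent = 0
looped = []
for pid in parent_ids:
    looped.extend(
        run("SELECT id, t1_id FROM t2 WHERE t1_id = ? ORDER BY id", (pid,))
    )
loop_count = statements_sent

# Set-based: a single statement with an IN list. The list is built from
# placeholders so the ids are still passed as bound parameters.
statements_sent = 0
placeholders = ", ".join("?" for _ in parent_ids)
batched = run(
    f"SELECT id, t1_id FROM t2 WHERE t1_id IN ({placeholders}) ORDER BY t1_id, id",
    parent_ids,
)
batch_count = statements_sent

print(loop_count, batch_count)
```

Both variants return the same rows, but the loop sends one statement per parent id while the IN form sends one in total. Against a remote database, each extra statement usually costs a network round trip.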
