I am working on an existing database design. There is one table in which each row is created with only half the columns populated (the remainder are initially all
NULL, except for an
is_populated column), and later a single
UPDATE to populate the remaining columns (usually weeks to a month later). None of the data in any of the columns are intended to change after being populated (i.e. when they’re no longer
NULL). Even though the system is quite old there seem to be many more non-updated rows than updated rows.
Let’s say that initially there are 5 populated columns and 5
NULL columns. After the
UPDATE all 10 columns are populated. It’s not allowed for the data to become unpopulated after the first
Is this truly a one-to-one relationship? Should both halves of the data have been initially stored in a single table or should they have been split into two? Are there any negative performance implications if doing a
LEFT JOIN on the first table’s primary key (which would end up returning exactly the same structure) instead of querying a single table? If I was to create a similar structure in the future should I follow this design or separate the two concerns?
This is an optional 1 to 1 relationship or a 1 to (0,1) relationship. A true identity relationship would be 1 to (1,1).
I find this notation useful in understanding the scale of the relationship. A one to many relationship could be 1 to (0,10), 1 to (1,5), 1 to (1, *). The fist digit is always 0 (optional) or 1 (mandatory) while the second specifies an upper boundary or unlimited/unspecified.
A many to many relationship resolves to two 1 to relationships to the required join table.
I generally find it is not useful to move optional columns out of the table. In this case, all 10 columns are required, but it is possible to create a record without having the full set of data.
The one case where I have seen identity relationships make sense is inventory-like tables, where there is tombstone data which doesn’t change much and frequently changing counts. The access rights to the tombstone data are often different as well. As the system scales out this often becomes a one to many relationship with inventory counts for each location. Joins are created to only one of the tables with the same identity, if the required data is in only one of the tables.
I’d leave them in a single table if that’s where they are now.
Splitting the table might make sense if you have some BLOB fields you want to isolate, but you didn’t mention that. I’ve seen tables like this (with many columns that are mostly NULL) with almost 100 columns. They’re ugly to work with, but you should ask yourself
- how much labor would it take to make this cleaner?
- how much labor would it take to fix all the bugs introduced by making it cleaner?
- what other tasks wouldn’t get done because this took priority?
- will the users notice or care if this is done?
- will the business be willing to pay for this labor?
- what’s the worst that can happen if this effort fails?
As for your other questions:
Is this truly a one-to-one relationship?
Not as long as all the columns are in the same table. There’s no term for it – they’re just ‘columns in a table’.
Should both halves of the data be stored in a single table or should they be split into two?
As noted above, leave them alone.
Are there any negative performance implications if doing a LEFT JOIN on the first table’s primary key (which would end up returning exactly the same structure) instead of querying a single table?
For all practical purposes, “no” – unless you have a billion rows or are running on an old XP machine or you have really good timing software. (Technically there’s a tiny difference, but you can safely ignore it).
One last thing – the term “one-to-none relationship” does not make sense to me. It’s implying that a single row in table A is guaranteed NOT to be related to 1 or more rows in table B. That’s pretty much the case for any two tables that don’t have relationship.