I have to clarify first that the status column is not intended to reflect the status of a real-world item represented by the record (row) in the table. Rather, it is intended to show the status of the record itself.
It can be as simple as Active/Inactive or complicated like Approved/Deleted/Locked/Pending/Rejected, etc. The status can be stored on a boolean/short integer column or a single-character column, with mappings like
1 = Active or
A = Approved.
The basic idea is to have a recycle bin/trash-like recovery support in the application (and simulate it in the database). If there is a front-end GUI or other interface that can supposedly let a user “delete” records, it doesn’t actually delete the record in the table, but simply changes the record status to Inactive or Deleted. When the interface fetches records, it always gets the records that only match the condition that the status is Active or Approved.
If the user makes a mistake and the “deleted” record (in user’s perspective) needs to be recovered, a DBA can easily patch the record back to being Active or Approved, which would be better than searching for backups and hopefully finding the original record there. Or the interface itself can let the user view deleted records in a separate view, and restore them as needed, or even permanently delete them (deleting the actual record).
- Is this a good practice, or a bad practice?
- Does it affect normalization of the data?
- What are the potential pitfalls?
- Is there any alternative method of achieving the same goal? (see note)
- How can you have the database enforce unique constraints on the data for a certain status only (but allow any number of duplicates for other statuses)?
- Why don’t databases provide a “recycle bin”-like feature or table-tracking/recovery natively, so we can let interfaces delete the actual records without worry?
Note: I read about maintaining a separate history table but that seems worse in terms of storage and having to generate triggers and keep the triggers up-to-date with the tracked table’s schema.
I know this as a “Soft Delete”; just marking a record as “deleted”, even though it really isn’t.
Is this a good practice, or a bad practice?
If this is something that your users need [a lot] then it’s probably a good thing. In the vast majority of cases, though, I would argue that it’s adding [a lot of] overhead for little benefit.
Does it affect normalization of the data?
No, but it will affect your Indexing of that data.
Ensure that you include the “deleted” column in your indexes, so that these rows get excluded as early as possible in your queries.
What are the potential pitfalls?
Your data becomes a little more complex. Everything that goes anywhere near the data needs to “know” about these extra, “not-really-there” records. Or, you have to create Views on those tables that exclude these rows and use these views in, say, your Reporting Tool of Choice.
Your database may increase in size. If you’re not really deleting these rows then they’re still there, taking up space. This may or may not be an issue, especially since you’ve included them in your indexes, so the space they consume is multiplied up.
Is there any alternative method of achieving the same goal? (see note)
Not really, no.
How can you have the database enforce unique constraints on the data for a certain status only (but allow any number of duplicates for other statuses)?
Not easily. Declarative Referential Integrity (foreign key clauses) is the cleanest way to implement this and its easy for things like Reporting tools to pick up on these rules to determine the relationships between tables. Such rules apply to all records, regardless of “status” (and there’s no way around that).
The alternative is to use Triggers, snippets of procedural code that enforce the referential integrity between tables and do all the clever, conditional stuff that you need. That’s good for your particular case, but most of the benefits of Declarative R.I. go out of the window – there’s no [externally] detectable relationships between your tables; that’s all “hidden” in the triggers.
Why don’t databases provide a “recycle bin”-like feature or table-tracking/recovery natively, so we can let interfaces delete the actual records without worry?
Why would they?
These are databases, after all, not file systems or spreadsheets.
What they do, they [can] do very, very well.
What they don’t do, there probably hasn’t been much demand for.
It’s a practice. Whether it is good or bad depends heavily on your application and how commonly you’re really going to need/ want to do an “undelete”. I’d be pretty dubious of a plan to put that sort of column of every table in the system– it seems highly unlikely that you would really bother implementing undelete on every table in the system. And it requires implementation– in the vast majority of cases, you’re not undeleting a single row from a single table, you have to walk through child tables undeleting rows and updating related tables.
For most of the rest of the questions, it’s highly implementation dependent. For example, Oracle provides different methods to track all changes to a table– Flashback Data Archive (FDA also known as Total Recall) being the most recent approach to maintaining a full history of every version of a row and in-database archiving for implementing the soft delete pattern. Other databases may provide other ways to implement the pattern. Depending on the database and how you implement the soft delete, there will be various impacts to performance, whether and how constraints can be enforced, etc. If we’re talking Oracle, you can do a lot with function-based indexes, for example, in SQL Server you can often use filtered indexes for similar purposes.
It is very common to use a “flagged for deletion” field in MRP/ERP systems.
For instance, one might want to mark a part or inventory record that is no longer sold as inactive, but there are still outstanding orders associated with it. Doing a real delete on the record could affect orders that have not shipped yet, ledger entries that have not posted yet, history tables that will not be built until the month end, etc. Many systems will disallow a record deletion unless it passes a series of validations against other tables. If you are cascading deletions through your relationships, a real delete can be even more destructive.
Instead, by flagging it for deletion, you put a clear marker of intent on the record and later a scheduled task can delete the record if it verifies that all the related tables are no longer referencing it.
A similar case could be made for this feature on a customer table and other “long-term” tables. It even makes sense on more volatile tables like orders, although the name of the flag may become something like “shipped” or “cancelled”. It serves the same function: don’t delete it this second, but use it as a flag for the purge program so it attempts to validate the record’s deletion in the future.
As an alternative solution, the use of event sourcing allows similar goals without complicating table structure, although it does make code for modifying your data a little more complex, as you have to write the modification into an event that can be persisted to an event history. This then allows you to recreate the database as it was at any given time, which can be a very useful feature.
(I don’t believe this is what you meant by “history table”, which I think you meant simply copying modified or deleted records into another table prior to changing them)
I see and use this pattern frequently for these use-cases:
- metadata where you only wish to display the values that are in effect today. For example to pick from a list of car manufacturers in a drop down list where enabled = 1 the tables values for ID,VALUE,ENABLED are 1, ‘Ford’,1 and 2, ‘Edsel’, 0, 3, ‘Toyota’,1 gives only the choices of Ford and Toyota
- for a case management system where the paradigm is that a case can only be in one state at a time. In this case the toggle column was called CURRENT with values of 0 or 1 enforced by check constraints. As a case moves from one state to another the application updates the CURRENT flag of the old state to 0 and the new to 1
The problem is to enforce data integrity if more than one application or web service is writing to tables. How do you ensure that for a case there is only one current state? As Justin Cave points out this can be done in Oracle by creating a virtual index based on a function but this extra overhead for what originally seemed a simple concept.
Its a good practice if you plan to use your data for reporting (any large enough application would need to have reports).
In order to speed up your application, you should really not let reporting tools run on your database. As such you’d need to do a copy/sync to another database.
recordStatus of only two states
CANCELLED in combination with a
lastUpdatedOn timestamp. I use
recordStatus rather than
status which usually has a business meaning.
When I am syncing the reporting database with the application, I do a filter on
lastUpdatedOn to know which ones I am going to replace on the reporting side.
On the reporting side I won’t have the
lastUpdatedOn fields since it’s generally not going to be reported on. As such when I see a
CANCELLED status I would delete the record from the reporting side that way it only has active records.
This can be expanded to other types of stores such as archives or backups where almost full synchronization is required. However, reporting is the more common purpose.
Note your example of
Pending is NOT a good idea to put as a common field as that has a business meaning it should go only to where it makes sense business wise.
As for locked, use
versionNo which provides an optimistic lock for your record.
Another option instead of
recordActive and have it stored as a
boolean which takes up less space and less indexing, but I would be concerned about future needs that you may not foresee.