How to avoid mixing surrogate key with the rest of the data

Suppose you have a very simple CRUD where you are just storing a simple tuple-like type of data, but its natural primary key is cumbersome for the user to remember (very long, difficult to write, etc.). Further, suppose I want to let users refer to the records using a handle such as an autoincremented integer value (which will then act as the primary key, but is surrogate, rather than natural).

They create the records without specifying the handle, but, when they request a list of all records, I want to give them the records with the handle attached to them, so that they can later easily refer to them.

If, in my application, the record is represented as a simple value object, (which is its own DTO), which is created and passed to the database gateway, and returned from the database gateway… how could I properly do the returning of the handle as well?

I don’t want to return a different object (something like a record decorated with the handle) when reading from the database,
because I believe the handle is just a presentation detail demanded by the requirements of this specific user interface, and may be useless for different user interfaces. Sticking the handle to the existing record object makes no sense because it is not really part of the data type.

One thing I could do is have the presentation separately ask for the handle of each record. That would mean two trips to the database but at least it would be clear that the handle is not part of the record.

But I’m positive there must be a better way. Any ideas?

For example

A very simple book catalog. The data is Book, which is represented by tuples of the form (author, title, year). We suppose (author, title) is a natural primary key (in all rigor, this may not be the case but just take this as an assumption of the system). I only have an UI, which is a command-line interface to my system. Here, requiring the users to refer to books by providing the natural primary key (author, title) is cumbersome, hence I decide to link each record to a dumb autoincremented integer which would act as a surrogate primary key. That way, users of the command-line interface can refer to records by their associated integer handle. This link may be implemented as a sparate association table mapping natural PKs to surrogate PKs.

The whole implementation of the system consists of a data type, which may be a tuple/struct/hash/custom plain old object, and a single CRUD repository where I can persist these objects, and read them by their natural primary key. I don’t want to return an object which also has a ‘handle’ member containing its associated surrogate primary key because that data has nothing to do with the rest of the data and is there just as a requirement of the particular UI.

So I’d propose having a method on the repository (or a different repository altogether) which gives me handles of records, providing the natural primary key. With this, this specific UI would use the repository (or those two repositories) to first retrieve the record and then retrieve the handle, in two steps.

5

I don’t see why you feel this way.

“Sticking the handle to the existing record object makes no sense
because it is not really part of the data type.”

If the user needs to identify a record by an arbitrary or incremental number, then it is data. You can even give it some generic name like “ID Number.” Many purchase order (PO Number) and billing systems (Invoice ID) have these numbers. In some places, keeping track of these sequential numbers is very important in an audit. You may not rely on this for an update or delete (It’s not the Primary Key), that’s fine. Don’t send it back to the db if you don’t want to.

6

Been there, done that.

I’ve programmed CLI apps that interact with database items where the natural key is cumbersome to type.

What I did is that I designed my app so that the use case is always as follows:

  • User types the command followed by a string.
  • That string is something the user remembers from the description of the item represented in the database row ( examples: a movie, a person, a book, etc )
  • The app filters the database with LIKE
  • Other parameters and switches can make the query narrow down the number of rows the LIKE command has to act on.
  • If a manageable (for a CLI) number of rows, say 15 at most, is retrieved, a menu is presented. The menu consists in printing each row (a selected group of columns) preceded by a number, but that number is not the surrogate, it’s the line number of a temp file created after the search with the surrogates of the rows. The user will always select from a menu and the numbers will always be 1~15 at most.
  • If the number of rows more than 15 I print a message stating so and that they should refine the search.
  • Notice that here, like in GUIs you are offering the user a way to chose a record without knowing the surrogate (in GUIs the user clicks on the item o selects it from a drop down).

In another CLI app, I had to show the surrogate so users could use it as a parameter for the program. In that case I didn’t use the menu approach because the app was a maintenance CLI tools for the maintainers of the system, doing a set of very complex operations on the data, and they could and should get the surrogate to pass it as as parameter to the program.

Bottom-line:

I don’t want to return an object which also has a ‘handle’ member
containing its associated surrogate primary key because that data has
nothing to do with the rest of the data and is there just as a
requirement of the particular UI.

Surrogate keys are not a requirements of UIs, they are used because the natural key is non-existing, bad (changes often) or cumbersome (too many columns)

Try the menu approach, if it doesn’t suit you, (like in the second example I gave you) then show the surrogate. If the natural key is so cumbersome to type then the surrogate is part of the data.

1

I don’t want to return an object which also has a ‘handle’ member containing its associated surrogate primary key because that data has nothing to do with the rest of the data and is there just as a requirement of the particular UI.

The design of your system becomes much simpler if you just change your mind on this. Having data objects with a 1:1 correspondence between the object’s attributes and the database columns is a straightforward approach which serves most systems I have seen in the past very well. It makes it very easy to generate the whole CRUD code and the code for the data objects themselves from a meta description or from the database, instead of writing it manually.

And yes, these objects might have a dozen attributes, of which a particular UI form, dialog or list will only show a six, seven or ten, and another form will show or use three different ones. The fact the one attribute you do not want to show in one form is a surrogate key, or some kind of business data, or another technical item which has nothing to do with the UI, is irrelevant.

Of course, I have also seen systems where one could omit some columns and allow the retrieval of “partial” objects, where some of the attributes are not initialized after the first object creation, and might be loaded later by a second query. AFAIK some ORMs provide some automatic “lazy loading” mechanisms which work exactly that way. But for most situations, this is an unnecessary optimization, because database query performance is mainly dominated by the number of roundtrips of a SELECT. It makes often no measurable difference if you select 3, 6 or 12 columns from a table, except one of the left-out columns is a BLOB field or huge text field.

So do yourself a favor and try not to overengineer your system by tailoring your data objects to each and every requirement of a specific part of the UI. Start with one data object per table, containing all columns, and one set of CRUD operations, even if that means to include a surrogate key attribute you don’t need everywhere. If you really run into trouble with that, optimize later when you arrive at that situation.

2

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *