Getting the same data in multiple ways

Say I want to get the same data from an SQL database, but want to have multiple ways of getting that data. For example, say I want to be able to search a database for users based on username, real name and date of birth. In my data access layer I would have GetUserByUsername(), GetUserByRealName() and GetUserByDOB(). However, when I have a lot of different ways to potentially search for a lot of different objects in the database (say, for example 15 completely unrelated objects to search for and 5 completely unique ways to search for them each) this very quickly gets messy. Is there a better way to do it?

2

Two options immediately come to mind.

A general search which returns any users with a matching attribute:

GetUsersByValue(value)

GetUsersByValue("1916-09-10")
  -> bbrown|Bob Brown|1916-09-10", schan|Sally Chan|1916-09-10

Whilst useful, this could be very inefficient, and could generate a lot of noise if you have fields which reasonably contain the same sort of data types. eg social security numbers and phone numbers. So you might prefer:

An arbitrary attribute search which returns any users matching the value of the specified attribute:

GetUsersByAttribute(name, value)

GetUsersByAttribute('DOB', '1916-09-10')
  -> bbrown|Bob Brown|1916-09-10", schan|Sally Chan|1916-09-10

This can be solved by providing a generic function like

 GetUsersByWhereCondition(condition)

where condition is an arbitrary string to be used as part of an SQL “WHERE” clause like this one:

 GetUsersByWhereCondition("DOB=#1916-09-10# AND Realname='Foo'")

This will allow you to create almost any query condition you like at run time dynamically, even if it is a complex condition with subselects, combined boolean logic or anything else which is supported by SQL.

But beware, this solution has two potential drawbacks:

  • the implementation of the method needs to append the WHERE condition to a SELECT SQL clause, without the possibility of easily checking the SQL first (at least, not without implementing something like an SQL parser). This makes it somewhat prone to SQL injection attacks. So it is important one knows for sure the caller of the method does not pass any unsanitized input through the method. In some contexts, that is easy to guarantee, in others it could become really hard.

  • this has the potential to make the parts of the program which call that function dependent from the SQL dialect of your database. If those parts are also in the DB access layer, this will probably not be a big problem. If the parts are in the business layer and you want to keep that independent from the SQL dialect, it could be.

So, if you can live with that drawbacks, this solution is very powerful and maybe what you are looking for. Otherwise use a less generic solution like the one from @JohnMee.

4

I had a similar problem for address data once, our solution was to create a single column containing all searchable data.

You will find that indexing all the different columns you want to search on will bloat your DB size by quite a lot. This way reduces that because you only have 1 searchable index, although it duplicates data in a ‘messy’ search column. It does, however, allow you to use fuzzy searches on the data though – eg you store names as soundex (or double-metaphone) values and search on those instead of the full name if your users have difficulty spelling them accurately.

You may want to modify your data with some prefixes (eg SSNs might be prefixed with the token ‘SSN’ or a token delimiter before and after) and spaces will have to be removed or replaced, though YMMV.

You will have to manage the column data as users are created or updated which may be too difficult – for us we only had to do this on creation. You will also have to handle spaces – but you might not, if you want to search for surnames and first names without middle names getting in the way.

You are duplicating data by creating such a special ‘search column’ with this approach, but it worked very well. You could alternatively look into creating a single index from multiple columns (some DBs allow this with features such as free-text search indexing) and simply searching that instead.

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *