I read an article on BBC. One of the examples they said was that people with surname ‘Null’ are having problems with entering their details in some websites.
No explanation is given about the error they are facing.
But as far as I know the string ‘Null’ and the actual Null value is completely different (from a database point of view).
Why would this cause problems in a database?
It doesn’t cause database problems. It causes problems in applications written by developers that don’t understand databases. At the root of the problem is that much database-related software displays a NULL record as the string
NULL. When an application then relies on the string form of a NULL record (likely also using case-insensitive comparison operations), then such an application will consider any
"null" string to be NULL. Consequently a name Null would be considered to not exist by that application.
The solution is to declare non-null columns as
NOT NULL in the database, and to not apply string operations to database records. Most languages have excellent database APIs that make string-level interfaces unnecessary. They should always be preferred, also since they make other mistakes such as SQL injection less likely.
To answer your specific question there are many steps along the chain of events between a web form and the database. If the last name
Null is erroneously interpreted as a
NULL value then the system may reject a perfectly valid name as being invalid. This can happen at the database layer as explained by amon. Incidentally if this is the specific issue then the database is also probably open to SQL injection AKA the Bobby Tables attack. Another step in the chain that could be causing problems is the serialization process.
Overall the article was about a bigger problem. The world is a big messy place that doesn’t always conform to our assumptions. This is especially apparent when you try to internationalize your application. At the end of the day we need to ensure our applications handle and encode our data properly. It is up to the business to decide how many resources we dedicate towards supporting increasingly complicated edge cases. While I fully support being inclusive, I will understand if the business decides that “the artist formally known as Prince” needs to use a Unicode character to represent his name in our database.
Most likely its a programming issue. If you look at this answer here on how NULLs are being passed you could easily cause some undesired behavior if you were “Mr. Null”.
You can see that if some data element was passed as NULL the data would be interpolated as a database null in the database.
“NULL” != Database Null
Some use cases and related behavior…
Let’s say last name was marked in the database as not null, now when data is inserted it will be interpreted as a NULL and fail the insert.
Another case is let’s say the last name was nullable in the database. Mr. NULL is inserted and is transformed into DBNull.Value which is not the same as “NULL”. After the insert we can’t find Mr. Null because his last name is not “NULL” but in reality a database null value.
So, those would be 2 cases of problems. As @Amon points out, databases themselves have no issues with nulls, although one should understand how nulls are handled in each RDMS instance as there will be differences between different vendors.
I would attribute the problem to sloppy programming and poor design of some implementations of SQL. “Null” the name should always be presented and interpreted with quotes. null, the database value, should always be presented without quotes; but when writing ad-hoc code, it’s
easy to slip into the “anything will do” paradigm and accept things believed to be a string in unquoted form.
This is compounded by the fact that other types of data; numbers for example,
can and are accepted in either form because the interpretation is unambiguous.
A problem, fundamentally, is that the term “null” is applied two different database concepts, sometimes using context to distinguish between them:
- Something doesn’t have a known value
- Something is known to have no value
While context can sometimes suffice to distinguish between those concepts, there are times when it really doesn’t. If one is using a record to hold a search query, for example, there should be a difference between saying “I want someone by the name of [whatever], with no last name”, versus “I want someone whose first name is [whatever] but whose last name is unknown.” Many database engines have a bias toward one meaning or the other, but they’re not all the same. Code which is expecting a database engine to work one way may malfunction if run on a different engine that runs differently.
Most of the existing answers focus on the non-SQL parts of an application, but there may be a problem in SQL too:
If instructed to filter out records where a user’s last name is not available, someone who doesn’t understand SQL very well may write a filter
WHERE u.lastname != 'NULL'. Because of the way SQL works, this will appear to check whether
u.lastname IS NOT NULL: all
NULL records get filtered out. All non-
NULL records remain.
Except of course for records where
u.lastname == 'NULL', but there may not have been any such record available during testing.
This becomes more likely if the SQL is generated by some sort of framework, where that framework doesn’t expose an easily accessible way to check for non-
NULL-ness with parameters, and someone notices “hey, if I pass in the string
NULL, it does exactly what I want!”