The most of the database management systems use B-tree
as a data structure for increasing the performance.
Let’s imagine that we have a table users
with the following columns: id (int)
, name (string)
, email (string)
, password (string)
, age (int)
. We want to select only ones that meet the condition: (name.contains("Bob") and not name.contains("A")) or (email.starts_with("angel") and (age >= 30 or age <= 20))
. It’s a really complicated condition. How do the database management systems builds the B-tree
and how do they move inside them to select the data? Which key should be used inside B-tree
? If it’s achievable with another data structure type, it is also welcomed!
I am creating my own RDBMS and faced with this problem 🙂
3
When talking about databases we usually talk about indexes, they are often implemented using a B-Tree, but does not have to be, there are a wide variety of search structures that could be used, depending on the specific use case, hash tables, binary trees, R-Trees, and many others. Typically the primary key would always have a index, but other columns would have indexes created by the user.
Not all queries will be able to use indexes. A complicated query like yours might be able to use the index for the age to narrow down the number of items, and do a linear scan for the other conditions. If you make a sufficiently complicated query the database engine might decide to just do a full table scan. Indexes supporting queries like name.contains("A")
would also need to be more complicated than a regular index, and may have limited availability.
There are composite indexes that can be used when querying multiple columns. I’m no database expert, but I would guess they just start using another column for the key at some depth level of the b-tree.
I would note that creating a general purpose RDBMS system is hard. So you need to carefully consider what your system will do that existing systems do not. It might also be easier to create a specialized system for your specific needs.
1