The most of the database management systems use
B-tree as a data structure for increasing the performance.
Let’s imagine that we have a table
users with the following columns:
age (int). We want to select only ones that meet the condition:
(name.contains("Bob") and not name.contains("A")) or (email.starts_with("angel") and (age >= 30 or age <= 20)). It’s a really complicated condition. How do the database management systems builds the
B-tree and how do they move inside them to select the data? Which key should be used inside
B-tree? If it’s achievable with another data structure type, it is also welcomed!
I am creating my own RDBMS and faced with this problem 🙂
When talking about databases we usually talk about indexes, they are often implemented using a B-Tree, but does not have to be, there are a wide variety of search structures that could be used, depending on the specific use case, hash tables, binary trees, R-Trees, and many others. Typically the primary key would always have a index, but other columns would have indexes created by the user.
Not all queries will be able to use indexes. A complicated query like yours might be able to use the index for the age to narrow down the number of items, and do a linear scan for the other conditions. If you make a sufficiently complicated query the database engine might decide to just do a full table scan. Indexes supporting queries like
name.contains("A") would also need to be more complicated than a regular index, and may have limited availability.
There are composite indexes that can be used when querying multiple columns. I’m no database expert, but I would guess they just start using another column for the key at some depth level of the b-tree.
I would note that creating a general purpose RDBMS system is hard. So you need to carefully consider what your system will do that existing systems do not. It might also be easier to create a specialized system for your specific needs.