Why is SQL the only database query language? [duplicate]

For general-purpose programming there are literally hundreds of programming languages. But for interacting/querying the databases, why is SQL pretty much the only used language?

In addition to Basile’s answer, please also recognize that SQL is not a language like you would think of an object-oriented language or procedural language. In many ways the ANSI SQL standard is more like a protocol or a series of generally accepted statements based on mathematical principles of set theory, predicate logic and relational algebra.

But how individual RDBMS developers implement these standards vary significantly across various proprietary software enough so to almost classify each individual implementation as a language of its own.

For instance, Oracle SQL is quite different from Microsoft SQL Server SQL which is different to MySQL, etc. On top of this, each company implements their own unique functions, database engine and (in some cases) procedural languages on top of the traditional ANSI SQL standard statements. Some even choose to abandon the standard at times in favor of their own personal implementation.

The history of this marriage of SQL to the relational model is primarily due to the fact that the technology and language were developed side-by-side in the 1970s by E.F. Codd, Donald D. Chamberlin and Raymond F. Boyce. There are some pretty decent articles on Wikipedia around the topic if you have an opportunity to read about it.

First SQL is mostly for relational databases. And it has multiple variants or dialects.

So non-relational databases have other query languages. Read about NoSQL. Look into MongoDB for example. See also Query language wikipage listing other query languages.

SQL is not a “language” per se, so much as it is a standard for making a language. See The SQL Timeline for reference. Multiple parties have implemented specific database query languages based on the standard, which vary in the degree to which the standard is adhered to.

In that sense, it is false to say that SQL is the only database query language, as there is no such thing as “the SQL language”; rather you have multiple language implementations, such as Transact-SQL (e.g. as used by Microsoft SQL Server), MySQL, PostgreSQL, Oracle SQL, and so on, which are similar in many aspects, but are largely not compatible with each other due to being designed for the domain of a specific relational database engine.

As Basile Starynkevitch correctly pointed out, there are also multiple “NoSQL” database languages which can be implemented via APIs that can be used by most traditional programming languages.

Good ol’ Uncle Bob

I’d encourage you to consider what Robert “Uncle Bob” Martin has to say on databases. I find it interesting, with regards to some considerations he discusses regarding SQL and NoSQL.

TLDR

SQL popularity was influenced by corporate entities, the open source community, technology stacks like LAMP, and physical storage limitations of hard drives. As of 2016, your top three database systems are all SQL-based (Oracle 12c, Microsoft SQL Server, and MySQL). It was also pointed out in the comments below that SQL has been an approved standard for both ANSI,(American National Standards Institute), and the ISO International Standards Organization for decades, which has vast impacts on its predictability and quality – a very important business consideration for many enterprises.

Physical Storage Considerations

Until the wide-spread appearance of SSDs and cloud computing within the last decade, SQL was the predominant leader in database languages; with these new technologies, however, a new evolution of database languages are appearing as well. There have been some articles discussing the closing gap in price and increase in both popularity and accessibility of SSDs over HDDs. There’s a great website for ranking of database technologies available, which might interest you. You can see that SQL and relational databases still have a firm foot-hold, but NoSQL alternatives such as MongoDB are creeping up as well.

The Goliaths

SQL has gained increasing popularity over the past several decades since it really took off in the mid 1980s. Today, the “concrete” nature of its popularity can be considered with regards to two of the most influential companies on the planet, whose enterprise products rely largely on solutions revolving around the SQL language. Those two companies are: Oracle and Microsoft. Countless small, mid, and large-sized companies use Oracle and Microsoft products, such as Oracle 12c, Microsoft SQL Server and MySQL. It’s also important to note the impact of the open-source community as well, which over the past 20 years or so in which it was very common to see many developers using PHP and MySQL together through things like the LAMP, MAMP, XAMPP, and WAMP stacks.

Both companies, Microsoft and Oracle, are software goliaths, and the OOP programming languages they provide are two of the most popular: Java and C#.

Java, being acquired by Oracle about 6 years ago, is one of the most popular programming languages in the world. Hence, Oracle has a very vested interest in making sure that its programming language is used in tandem with MySQL, which is the source of a very high revenue for the company.

C#, being owned by Microsoft, integrates very well with SQL Server through Microsoft’s various database offerings, such as through cloud storage in Azure as well as local installations of Windows Server, which typically runs with an instance of SQL Server Management Studio.

Other popular languages like PHP also integrate well with MySQL, due to the open-source nature of those languages. The LAMP (Linux, Apache, MySQL, and PHP), as well as MAMP, XAMP, and WAMP, also provided a foray into a mix of easy-to-use out-of-the-box open-source technologies that were all the rave about 20 years ago (and still are very popular to this day).

The open-source community around these languages (in particular, SQL) are huge, so it is quite easy to find support.

The language itself is also very easy to understand and read, given that it is very close to how you would structure a logical English sentence (if there is such a thing!).

Some Competition

However, while SQL and other relational databases are holding strong, other alternative document store technologies, like NoSQL, MongoDB, CouchDB and Microsoft’s NoSQL implementations are becoming increasingly popular as well.

Physical Tech Advances

Part of this increase in the use of NoSQL-esque platforms may be due to the physical hardware technologies that are settling into the market. As SSDs and HDDs are becoming more closely comparable in price, storing data in the form of large JSON documents and in repository and object storage systems, such as Amazon S3 Storage becomes relatively less expensive, both physically and fiscally, whether you are considering storing it on your own hard drive locally or somewhere in the cloud.

Cloud Offerings

Cloud offerings are becoming cheaper by the day, and 1 TB of storage in the cloud is practically dirt cheap as well. I personally have 1 TB of Google Drive storage for $1.99 per month (about $25 per year).

Some Conclusions…

Storage space (physically) used to be a crucial issue, which is where many SQL databases stood to serve a need for a data language that could be used to keep data storage as efficient as possible. This is where you get the CRUD style applications (Create, Read, Update, Delete). When storing massive amounts of data in a SQL database, you want to avoid duplication as much as possible through a process known as normalization, which has to do with designing the database in such a way as to have as efficient of a relational mapping as possible. This would help the physical lookup speed (also improved through indexing), but all of which was centered around a major concern, which was physical space. You simply couldn’t afford to have massive amounts of duplicate data. Not to mention, the need for database replication and backup, to protect against things like natural disasters or power outages, terrorists, etc.

Huge companies with massive amounts of credit card data or health information couldn’t afford to go down for even a minute, so they would need to replicate and distribute their data centers geographically to protect their users’ data. This was enormously expensive, because every database needed a physical endpoint, a computer and a hard drive to store the mass of information.

Fast forward to today, the need for physical size is less expensive. We’ll likely see a shift away from SQL over the coming years, but it has such a large user base, and many large companies are so firmly rooted, that this will likely take a while.

However, new industries, such as infrastructure as a service, provided by companies like Amazon with their Amazon Web Services (AWS) – as well as Microsoft Azure, large scalable datacenters filled to the brim with high-capacity, lower-power consuming SSD hard drives are making it increasingly possible for those large companies to port over to newer, more efficient technologies. As a result, we will likely see the change in database language to reflect new physical, logical, and other scientific limitations.

It’s not the only query language… Query languages have different flavours of implementation. Basically, SQL is taken as reference standards or internally other languages convert to SQL.

Some of the query languages are listed here.

C# developers use the LINQ query language very often.

That is a very good question, but you could more generally ask:

Why are all relational databases so remarkably similar?

Apart from the smaller variants that come with a GUI such as Access and Filemaker and the somewhat special single-user engines, relational databases all share an astounding number of peculiarities:

As is already part of your question, they all use the same (old and flawed) query language to which they all add new features. It’s the worst of both worlds: No one dares to innovate away from SQL, but the dialects are still not compatible.
Even though the database schema is also always defined using this language (with DDL), databases don’t return the current schema in that form. Rather, they usually return it in the form of various forms of tabular data. Many have client-side utilities that can then convert that information back to DDL. Which, of course, is a pain to do diffs on programmatically for updates. Bizarre, but that’s how they all do it.
They can’t give you a live view of a query: The result set together with a real-time feed of the changes. That’s why database applications re-query everything all the time.
Even though they all support the concept of constraints, especially in the form of foreign and unique keys, none of them provide calling code with enough information for that calling code to understand which of those constraints was violated when that happens. That is why database-backed software still needs to do explicit uniqueness and foreign key checking – a violation of DRY and a pain.
They are bad at giving information about why queries are slow, which is why database applications usually just hang without explanation when anything locky or expensive happens. Why can’t I get a live view of the execution plan with progress indicators on the nodes (with links to locking queries if they come up)? [EDIT: Sql Server now indeed has “Live Query Statistics”, very cool)
They all have b-tree indexes for the common stuff and then they all have another two-dimensional index based on floating-point values for geographical applications. But only floating point, and only two dimensions. Then they usually have some full-text search capability extra. And more than one has extra index support for xml data the resides in string columns.
They all have a method to measure the similarity of two words in English.
They all can do aggregates, but none of them can keep aggregate partial values in b-tree branch nodes to do it fast.
As a special instance of the previous point, none of them can efficiently seek to a row by row count. In particular, the common scroll-grids that lazily load the slice of data a user scrolled to are never efficient for a large number of rows – on none of the relational database servers. They all have to scan through all the data from the beginning of the result set to the first returned row. Let that sink in.

In short, even though they are all both incredibly amazing and indispensable for many data storage and processing tasks, they are also all confusing and quirky in a surprisingly similar way. You can see how after the first one was written everybody just copied it, and as time went by people added just the feature that was cool at the time and that then got copied to all players.

This is very different from programming platforms, where there are widely different paradigms and even widely different syntaxes within each paradigm. New ideas get hacked onto older language also, but more often people actually make new languages and platforms. Java was a step forward in 1990. .NET was a step forward in 2002. A language about as ancient as the relational database would be C++, and that shows. Only with relational databases, one has no real choice but to use them.

I know there are “NoSQL” databases, but the relational paradigm is still immensely important. It’s sad that there is so little pluralism there.

Filed under: softwareengineering - @ 12:11

Thẻ: database, programming-languages, sql

Thiết kế website giá rẻ

Danh mục