This is probably a dumb question, but how do programming languages work on a low level? If you go to the Go language GitHub page here, it says almost 90% of the source files are Go files. How is it possible that a programming language is made up of itself, especially 90% of itself? I can kind of understand how a language such as Lua is written in C, but Go is made up of mostly Go files. It doesn’t make much sense to me. How do the developers use Go to create Go?
A compiler or interpreter is a program just like any other program. You can write programs in any programming language you want, including Go. Ergo, you can write a compiler in Go. Including a Go compiler.
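To make that concrete, here is a minimal sketch of a compiler written in Go. It is not how the real Go compiler works; it assumes a made-up source language (integers joined by `+` and `*`, no spaces or parentheses) and a made-up stack machine as the target. The names `Compile` and `Run` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Compile translates an expression such as "2+3*4" into instructions for a
// hypothetical stack machine: "push N", "add", "mul".
// Toy grammar: expr = term {"+" term}; term = number {"*" number}.
func Compile(src string) ([]string, error) {
	var out []string
	for i, term := range strings.Split(src, "+") {
		for j, num := range strings.Split(term, "*") {
			if _, err := strconv.Atoi(num); err != nil {
				return nil, fmt.Errorf("bad number %q", num)
			}
			out = append(out, "push "+num)
			if j > 0 {
				out = append(out, "mul") // multiply the two topmost values
			}
		}
		if i > 0 {
			out = append(out, "add") // add this term to the running sum
		}
	}
	return out, nil
}

// Run executes the instructions produced by Compile.
func Run(prog []string) (int, error) {
	var stack []int
	for _, ins := range prog {
		switch {
		case strings.HasPrefix(ins, "push "):
			n, err := strconv.Atoi(strings.TrimPrefix(ins, "push "))
			if err != nil {
				return 0, err
			}
			stack = append(stack, n)
		case ins == "add" || ins == "mul":
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow at %q", ins)
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if ins == "add" {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("unknown instruction %q", ins)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("expected one result, got %d", len(stack))
	}
	return stack[0], nil
}

func main() {
	prog, err := Compile("2+3*4")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(prog, "; ")) // push 2; push 3; push 4; mul; add
	result, err := Run(prog)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // 14
}
```

Nothing in there is special: it reads text, processes it, and produces output. Swap the toy language for Go and the toy instructions for machine code, and you have (a very large version of) the same kind of program.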
Of course, in order to actually use that compiler or interpreter, you need to have a compiler or interpreter for that language as well.
In the case of Go, for example, the first several versions of the compiler were written in C. So, they already had a working Go compiler. Then, it’s no problem to write a Go compiler in Go, since you already have a working Go compiler written in C, which you can use to compile the Go compiler written in Go. And now, you have a compiled version of your Go compiler written in Go, and you can use that to compile future versions of your Go compiler written in Go.
This is called “bootstrapping” (after the old tale of Baron Münchhausen, who pulled himself out of the mud by his own bootstraps).
Note that for Go specifically, there are multiple different compilers for Go, and at least gccgo continues to be written in C++; there is no Go code in gccgo. So, you can always use gccgo to re-start the bootstrapping process, should you ever lose your compiled Go binary.
A compiler that is written in the language it compiles, and that is capable of compiling itself, is called “self-hosting”. There are a couple of advantages to a self-hosting compiler:
- When working on a compiler, you need to know three languages: the language you are compiling (the source language), the language you are compiling to (the target language), and the language you are writing the compiler in (the implementation language). Self-hosting allows you to get rid of one of them. This increases the number of people able to work on the compiler by reducing the amount of knowledge a potential contributor needs to possess.
- Production-grade industrial-strength high-performance compilers are large, complex, resource-intensive programs. They are a good test for your language (can your language’s abstraction features handle such a large and complex project?) and your compiler (if the compiler can compile itself, then it probably also can compile other large, complex programs).
- If your compiler is very simple, the code of the compiler can serve as a specification of the language’s behavior. (In general, production-grade compilers aren’t simple and simple compilers aren’t production-grade, though. Also, this shouldn’t be your only specification, otherwise you’ll never be able to tell whether or not your compiler is correct.)
- Self-hosting is considered to be an important milestone for a language.
There are also some disadvantages:
- The complex bootstrap process.
- If the compiler writer is also the language designer, there is the danger that he will add features that he can use while writing the compiler, and leave out features that are hard to write a compiler for, thus ending up with a language that is only good for writing compilers and nothing else. (That’s not necessarily a bad thing if you are designing a language for writing compilers.)
In one of his articles, Prof. Niklaus Wirth gave a nice example of the latter: when designing the Oberon language, he wrote the compiler at the same time he was designing the language. The system he was writing on only had an obscure proprietary dialect of Fortran. After some time, he realized that he had subconsciously left out or changed features that would make it easier to write programs in Oberon because he couldn’t think of a nice way to implement them in the obscure Fortran dialect. So, he threw away the compiler, re-examined his design and started a new compiler in Oberon itself. Oberon was intended to be a systems programming language, so writing a compiler and a standard library in it was a natural choice.
Now, the question is, how did he solve the bootstrap problem? Well, he was a professor, after all: he handed out portions of the compiler to his students to translate into Fortran by hand.
This is a very simple question with a very deep answer. A full explanation is beyond the scope of a site like this, but I’ll give you enough to get started learning about it.
First, what is a programming language? Or, better put, what defines a programming language? Most professional developers would agree that a language is defined by two primary things: the compiler/interpreter and the standard library. The compiler sets out the syntactical and semantic rules of how the language works, and the standard library helps to establish paradigms and idioms for what the language is most useful for.
Understanding that, we can answer your question. The quick version is that if a programming language is capable of basic file IO and data processing, it’s capable of implementing a compiler for a language, including its own language. How do they write Go in Go? They wrote a Go compiler in Go, of course!
But that’s not the thing you really want to know, is it? You’re asking about the chicken-and-egg problem inherent in that statement: how can you write the first compiler in a language that doesn’t have a compiler for itself yet?
Obviously you can’t, so you write the first compiler in another language. This is known as “bootstrapping.” The bootstrap compiler doesn’t even need to implement the entire language, though; just enough that you’d be able to compile a Go compiler that can do the same things as your bootstrap compiler can do (i.e., compile itself).
At that point, you have a working compiler in Go, and you can then build new features into the compiler, compile them, and have a new and improved compiler, and so on, until you’ve built up the full language as designed.
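Here is a toy model of that staged build-up, as a rough sketch only. It does not compile anything; it just tracks which language features each compiler can handle. The types `Compiler` and `Source`, the function `Build`, and the feature names are all made up for illustration.

```go
package main

import "fmt"

// Compiler is a toy stand-in for a compiler binary: all we track is which
// language features it is able to compile.
type Compiler struct {
	Name     string
	Supports map[string]bool
}

// Source is a toy stand-in for compiler source code: the features its own
// code uses, and the features the resulting compiler will be able to compile.
type Source struct {
	Name       string
	Uses       []string
	Implements []string
}

// Build "compiles" src with c. It fails if src uses a feature c cannot compile.
func Build(c Compiler, src Source) (Compiler, error) {
	for _, f := range src.Uses {
		if !c.Supports[f] {
			return Compiler{}, fmt.Errorf("%s cannot build %s: missing %q", c.Name, src.Name, f)
		}
	}
	out := Compiler{Name: src.Name, Supports: map[string]bool{}}
	for _, f := range src.Implements {
		out.Supports[f] = true
	}
	return out, nil
}

func main() {
	// Stage 0: the bootstrap compiler, written in another language,
	// covering only a subset of the language.
	stage0 := Compiler{Name: "bootstrap", Supports: map[string]bool{"funcs": true, "structs": true}}

	// v1 is written in the subset and implements the same subset.
	v1 := Source{"v1", []string{"funcs", "structs"}, []string{"funcs", "structs"}}
	// v2 is still written in the subset, but adds a new feature.
	v2 := Source{"v2", []string{"funcs", "structs"}, []string{"funcs", "structs", "generics"}}
	// v3 is free to use the new feature in its own source code.
	v3 := Source{"v3", []string{"funcs", "structs", "generics"}, []string{"funcs", "structs", "generics"}}

	c := stage0
	for _, src := range []Source{v1, v2, v3} {
		next, err := Build(c, src)
		if err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("%s built %s\n", c.Name, src.Name)
		c = next
	}

	// Skipping stages does not work: the bootstrap compiler lacks a
	// feature that v3's own source code relies on.
	_, err := Build(stage0, v3)
	fmt.Println(err)
}
```

The point of the sketch is the ordering: a new feature first has to be implemented by a compiler that does not use it, and only the version after that can rely on it in its own source.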
In the same way that I can use English to describe and define English.
Programming languages aren’t written as such. Instead they are more of a description of how they should work.
The compiler is the actual implementation. Originally I think it was written in C. But once you can compile the language, you can then create a compiler with that language.
Similar question: When someone writes a new programming language, what do they write it IN?