I have been programming in higher-level languages (Python, C#, VBA, VB.NET) for around 10 years, and I have absolutely no understanding of what’s going on “under the hood.”
I am wondering what the benefits of learning assembly are, and how it will aid me as a programmer. Can you please point me to a resource that shows exactly how what I write in higher-level code connects to what happens in assembly?
Because you’ll understand how it really works.
- You’ll understand that function calls are not for free and why the call stack can overflow (e.g., in recursive functions). You’ll understand how arguments are passed to function parameters and the ways in which it can be done (copying memory, pointing to memory).
- You’ll understand that memory is not for free and how valuable automatic memory management is. Memory is not something that you “just have”; in reality it needs to be managed, taken care of and, most importantly, not forgotten (because you need to free it yourself).
- You’ll understand how control flow works at the most fundamental level.
- You’ll appreciate the constructs in higher-level programming languages more.
What it boils down to is that all the things we write in C# or Python need to be translated into a sequence of basic actions that a computer can execute. It’s easy to think of a computer in terms of classes, generics and list comprehensions but these only exist in our high-level programming languages.
We can think of language constructs that look really nice but that don’t translate very well to a low-level way of doing things. By knowing how it really works, you’ll understand better why things work the way they do.
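The point about function calls and the call stack can be seen without leaving Python. This is only a sketch: the function names are made up for the illustration, and the depth at which CPython gives up depends on whatever `sys.getrecursionlimit()` returns on your machine.

```python
import sys

def sum_recursive(n):
    # Every call adds a frame to the call stack until n reaches 0.
    return 0 if n == 0 else n + sum_recursive(n - 1)

def sum_iterative(n):
    # Same result with a single frame and one running total.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def overflows(n):
    # True if recursing n levels deep exhausts the interpreter's stack budget.
    try:
        sum_recursive(n)
    except RecursionError:
        return True
    return False

# A depth that is deliberately beyond the current limit.
DEEP = sys.getrecursionlimit() + 100
```

Calling `overflows(DEEP)` should report that the recursive version runs out of stack, while `sum_iterative(DEEP)` finishes normally because it never grows the stack.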
It will give you a better understanding of what is “happening under the hood”: how pointers work, what register variables mean, and how the architecture works in general (memory allocation and management, parameter passing by value or by reference, etc.).
For a quick peek with C, how’s this?
#include <stdio.h>

int main(void)
{
    puts("Hello World.");
    return 0;
}
Compile with gcc -S so.c and take a look at the assembly output in so.s:
$ cat so.s
        .file   "so.c"
        .section        .rodata
.LC0:
        .string "Hello World."
        .text
.globl main
        .type   main, @function
main:
        pushl   %ebp
        movl    %esp, %ebp
        andl    $-16, %esp
        subl    $16, %esp
        movl    $.LC0, (%esp)
        call    puts
        movl    $0, %eax
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3"
        .section        .note.GNU-stack,"",@progbits
I think the answer you seek is here: http://www.codeproject.com/Articles/89460/Why-Learn-Assembly-Language
A quote from the article:
Though it’s true, you probably won’t find yourself writing your next customer’s app in assembly, there is still much to gain from learning assembly. Today, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems.
The fact of the matter is, the more complex high level languages become, and the more ADT (abstract data types) that are written, the more overhead is incurred to support these options. In the instances of .NET, perhaps bloated MSIL. Imagine if you knew MSIL. This is where assembly language shines.
Assembly language is as close to the processor as you can get as a programmer so a well designed algorithm is blazing — assembly is great for speed optimization. It’s all about performance and efficiency. Assembly language gives you complete control over the system’s resources. Much like an assembly line, you write code to push single values into registers, deal with memory addresses directly to retrieve values or pointers.
To write in assembly is to understand exactly how the processor and memory work together to “make things happen”. Be warned, assembly language is cryptic, and the applications source code size is much much larger than that of a high-level language. But make no mistake about it, if you are willing to put in the time and the effort to master assembly, you will get better, and you will become a stand out in the field.
Additionally, I’d recommend this book because it has a simplified version of computer architecture:
Introduction to Computing Systems: From Bits and Gates to C and Beyond, 2/e
Yale N. Patt, University of Texas at Austin
Sanjay J. Patel, University of Illinois at Urbana/Champaign
In my humble opinion, it doesn’t help much.
I used to know x86 assembly very well. It helped a little when assembly came up in my courses, it came up once during an interview, and it helped me prove that a compiler (Metrowerks) was generating bad code. It’s fascinating how the computer actually works, and I feel intellectually richer for having learned it. It was also very fun to play with at the time.
However, today’s compilers are better at generating assembly than almost anyone on almost any piece of code. Unless you’re writing a compiler or checking that your compiler is doing the right thing, you are probably wasting your time by learning it.
I admit that many questions that C++ programmers still usefully ask are informed by knowing assembly. For example: Should I use stack or heap variables? Should I pass by value or by const reference? In almost all cases, however, I think that these choices should be made based on code readability rather than computational time savings. (E.g., use stack variables whenever you want to limit a variable to a scope.)
My humble suggestion is to focus on skills that really matter: software design, algorithm analysis, and problem solving. With experience developing big projects, your intuition will improve, which increases your value much more than knowing assembly (in my opinion).
You should be familiar with one level ‘deeper’ in the system than the one you are working at. Skipping too far down in one go isn’t bad, but it may not be as helpful as one would desire.
A programmer in a high level language should learn a lower level language (C is an excellent option). You don’t need to go all the way to assembly to have an appreciation of what goes on under the covers when you tell the computer to instantiate an object, or create a hash table, or a set – but you should be able to code them.
For a Java programmer, learning some C would help with memory management and argument passing. Writing some of the extensive Java library in C would go a long way toward understanding when to use which implementation of Set (do you want a hash or a tree?). Dealing with char* in a threaded environment will help in understanding why String is immutable.
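To give a feel for what “being able to code them” means, here is a rough sketch of a hash-based set using separate chaining. The answer suggests doing this in C; Python is used here only to keep it short. The class name, the starting bucket count and the resize rule are arbitrary choices for the illustration, not how any real library does it.

```python
class ChainedHashSet:
    """Toy hash set using separate chaining (illustrative only)."""

    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]
        self._size = 0

    def _bucket(self, item):
        # hash() maps the item to an integer; the modulo picks a bucket.
        return self._buckets[hash(item) % len(self._buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)
            self._size += 1
            # Keep chains short by doubling the table when it gets crowded.
            if self._size > 2 * len(self._buckets):
                self._resize()

    def _resize(self):
        old_items = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for item in old_items:
            self._bucket(item).append(item)

    def __contains__(self, item):
        return item in self._bucket(item)

    def __len__(self):
        return self._size
```

Lookups only scan one bucket, which is where the speed comes from, but nothing here keeps the items in sorted order. A tree-based set makes the opposite trade: ordered traversal at the cost of slower lookups.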
Taken to the next level… A C programmer should have some familiarity with assembly, and assembly types (oft found in embedded systems shops) would likely do well with understanding things at the level of gates. Those who work with gates should know some quantum physics. And those quantum physicists, well, they are still trying to figure out what the next abstraction is.
Since you didn’t mention C or C++ in the list of languages you know, I would STRONGLY recommend learning them well before even thinking about assembly. C or C++ will teach you all the basic concepts that managed languages hide from you, and you will understand most of the concepts mentioned on this page through one of the most important languages you could use in real-world projects. It adds real value to your programming skills. Please be aware that assembly is used in very specific areas and is not nearly as useful as C or C++.
I would even go further and say that you should not dive into assembly before understanding how unmanaged languages work. It is almost a prerequisite.
You should learn assembly if you want to go even further down and know exactly how each and every construct of the language is built. It is informative, but it is a whole different level of complexity.
If you know a language well, you should have at least basic knowledge of the technology one level of abstraction lower.
Why? When things go wrong, knowledge of the underlying mechanics makes it far easier to debug strange problems, and you naturally write more efficient code.
Using Python (CPython) as an example, if you start getting weird crashes or poor performance, knowing how to debug C code can be very useful, as can knowledge of its reference-counting memory management. This would also help you know when (or whether) to write something as a C extension, and so on.
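As a small illustration of the reference counting mentioned above, CPython lets you look at the count directly with `sys.getrefcount`. The helper name below is made up, and the absolute numbers are an implementation detail of CPython, so the sketch only looks at how much the count grows.

```python
import sys

def refcount_growth(extra_references):
    # Compare the reference count CPython reports for a fresh object
    # before and after storing it in a list `extra_references` times.
    obj = object()
    before = sys.getrefcount(obj)
    holders = [obj] * extra_references   # each list slot is one more reference
    after = sys.getrefcount(obj)
    return after - before
```

On CPython, `refcount_growth(3)` should come out as 3: one extra reference per list slot. When the last reference goes away the object is freed immediately, which is the mechanism behind a lot of “why is my memory not being released” debugging.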
To answer your question in this case, knowledge of assembly really wouldn’t help an experienced Python developer: it’s too many steps down in abstraction, and anything done in Python results in many, many assembly instructions.
But if you are experienced with C, then knowing “the next level down” (assembly) would indeed be useful.
Similarly, if you are using CoffeeScript then it’s (very) useful to know JavaScript. If you are using Clojure, knowledge of Java and the JVM is useful.
This idea also works outside of programming languages: if you are using assembly, it’s a good idea to be familiar with how the underlying hardware functions. If you are a web designer, it’s a good idea to know how the web application is implemented. If you are a car mechanic, it’s a good idea to have some knowledge of physics.
Write a small C program and disassemble the output. That’s all. However, be prepared for a greater or lesser degree of “housekeeping” code that is added for the benefit of the operating system.
Assembly helps you understand what’s going on under the hood because it deals directly with memory, processor registers and the like.
If you really want to go bare-metal without all of the complexity of an operating system complicating things, try programming an Arduino in assembly language.
There is no definitive answer, since programmers are not all of a type. Do you NEED to know what lurks underneath? If so, then learn it. Do you merely want to learn it, out of curiosity? If so, then learn it. If it will have no practical benefit to you, then why bother? Does one need a mechanic’s level of knowledge just to drive a car? Does a mechanic need an engineer’s level of knowledge just to work on a car? This is a serious analogy. A mechanic can be a very good, productive mechanic without diving to an engineer’s depth of understanding of the vehicles he maintains. Same for music. Do you really need to plumb the complexities of melody, harmony and rhythm to be a good singer or player? No. Some exceptionally talented musicians can’t read a lick of sheet music, let alone tell you the difference between Dorian and Lydian modes. If you want to, fine, but no, you don’t need to. If you are a web developer, assembly has no practical use that I can think of. If you are in embedded systems or something really specialized, then it might be necessary, but if it were, you’d know it.
Here’s Joel’s take on the value of learning a non-high-level language:
http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html
Actually, what would probably be best for you would be a class that doesn’t (to my knowledge) exist anywhere: It would be a class that combines a brief overview of machine/assembler language and storage addressing concepts with a tour through compiler construction, code generation, and runtime environments.
The problem is that with a high-level, far-away-from-the-hardware language like C# or Python you don’t really appreciate the fact that every move you make turns into hundreds if not thousands of machine instructions, and you don’t tend to comprehend how a few lines of a high-level language can cause vast amounts of storage to be accessed and modified. It’s not so much that you need to know precisely what is going on “beneath the covers”, but you need to have an appreciation for the scope of what’s happening, and a general conception of the types of things that occur.
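One rough way to get a feel for that scope from inside Python is to measure how much memory a single innocent-looking line allocates. The sketch below uses the standard `tracemalloc` module; the helper names and the list size are arbitrary, and the exact byte count will differ between interpreter versions and platforms.

```python
import tracemalloc

def peak_allocation(build):
    # Run build() under tracemalloc and report (item count, peak bytes traced).
    tracemalloc.start()
    try:
        result = build()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(result), peak

def squares():
    # One line of high-level code that touches a lot of storage.
    return [i * i for i in range(100_000)]
```

Calling `peak_allocation(squares)` returns the number of items built and the peak number of bytes allocated while building them, which for this one-liner is far more than the line itself suggests.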
My answer to this question has evolved relatively recently. The existing answers cover what I would have said in the past. Actually, this is still covered by the top answer – the “appreciate the constructs in higher-level programming” point, but it’s a special-case that I think is worth mentioning…
According to this Jeff Atwood blog post, which references a study, understanding assignment is a key issue in understanding programming. Learner programmers either understand that the notation just represents steps that the computer follows, and reason in terms of those steps, or else get perpetually confused by misleading analogies to mathematical equations etc.
Well, if you understand the following from 6502 assembler…
LDA variable
CLC
ADC #1
STA variable
That really is just the steps. Then when you learn to translate that to an assignment statement…
variable = variable + 1;
You don’t need a misleading analogy to a mathematical equation: you already have a correct mental model to map it to.
EDIT: of course, if the explanation you get of LDA variable is basically ACCUMULATOR = variable, which is exactly what you get from some tutorials and references, you end up back where you started and it’s no help at all.
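For someone coming from Python, a similar “just the steps” view is available through the standard `dis` module, which lists the bytecode CPython compiles a function into. This is bytecode for Python’s virtual machine rather than real assembly, and the instruction names change between Python versions, so the helpers below (hypothetical names) only look for a load, an arithmetic step and a store, in that order.

```python
import dis

def increment(variable):
    variable = variable + 1
    return variable

def opnames(func):
    # Names of the bytecode instructions CPython compiled func into.
    return [ins.opname for ins in dis.get_instructions(func)]

def first_index(names, prefix):
    # Position of the first instruction whose name starts with prefix.
    return next(i for i, name in enumerate(names) if name.startswith(prefix))
```

Running `dis.dis(increment)` prints the full listing. The shape mirrors the 6502 sequence: fetch the value, add one, store it back.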
I learned 6502 assembler as my second language, the first being Commodore Basic, and I hadn’t really learned much of that at the time – partly because there was so little to learn, but also because assembler just seemed so much more interesting back then. Partly the times, partly because I was a 14 year old geek.
I don’t recommend doing what I did, but I wonder if studying a few very simple examples in a very simple assembler language might be a worthwhile preliminary to learning higher-level languages.
Unless you are a compiler writer, or need something highly optimized (like a data processing algorithm), learning assembly coding will provide you no benefits.
Writing and maintaining code written in assembly is very difficult; therefore, even if you know assembly language very well, you shouldn’t use it unless there is no other way.
The “Optimizing for SSE: A Case Study” article shows what is possible to do if you go to assembly. The author managed to optimize the algorithm from 100 cycles/vector to 17 cycles/vector.
Writing in assembly will not give you a magic increase in speed: because of the amount of detail you have to manage (register allocation etc.), you will probably end up writing the most trivial algorithm possible.
Additionally, with modern processors (read: designed after the 70s-80s), assembly will not give you enough detail to know what is going on (that is, on most processors). Modern processing units (CPUs and GPUs) are quite complex as far as scheduling instructions goes. Knowing the basics of assembly (or pseudo-assembly) will allow you to understand computer architecture books and courses, which provide further knowledge (caches, out-of-order execution, MMU etc.). Usually you don’t need to know a complex ISA to understand them (MIPS 5 is quite popular IIRC).
Why understand the processor? It might give you much more understanding of what’s going on. Let’s say you write matrix multiplication in the naive way:
for i from 0 to N
    for j from 0 to N
        for k from 0 to N
            A[i][j] += B[i][k] * C[k][j]
It may be ‘good enough’ for your purpose (if it is a 4×4 matrix it might be compiled to vector instructions anyway). However, there are quite important programs that multiply massive arrays: how do you optimize them? If you write the code in assembly you might get a few % of improvement (unless you do as most people do and write it in the naive way too, underutilizing registers, loading/storing to memory constantly and in effect ending up with a slower program than in a high-level language).
However, you can swap two lines and magically gain performance (why? I leave it as ‘homework’); IIRC, depending on various factors, for large matrices it can be even 10x.
for i from 0 to N
    for k from 0 to N
        for j from 0 to N
            A[i][j] += B[i][k] * C[k][j]
That said, there is work on compilers that can do this themselves (Graphite for GCC and Polly for anything using LLVM). They are even capable of transforming it into (sorry, I’m writing the blocking from memory):
for i from 0 to N
    for K from 0 to N/n
        for J from 0 to N/n
            for kk from 0 to n
                for jj from 0 to n
                    k = K*n + kk
                    j = J*n + jj
                    A[i][j] += B[i][k] * C[k][j]
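If you want to convince yourself that the loop orders compute the same product, here is a Python transcription of the three variants. It is a sketch for checking equivalence only: the function names are made up, it assumes square matrices, the blocked version assumes the size is a multiple of the block size, and nested Python lists will not reproduce the speed differences you would see in compiled code.

```python
def matmul_ijk(B, C):
    # Naive loop order: i, then j, then k.
    n = len(B)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                A[i][j] += B[i][k] * C[k][j]
    return A

def matmul_ikj(B, C):
    # Same arithmetic with the two inner loops swapped.
    n = len(B)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            for j in range(n):
                A[i][j] += B[i][k] * C[k][j]
    return A

def matmul_blocked(B, C, block):
    # Blocked variant; assumes len(B) is a multiple of block.
    n = len(B)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for K in range(n // block):
            for J in range(n // block):
                for kk in range(block):
                    for jj in range(block):
                        k = K * block + kk
                        j = J * block + jj
                        A[i][j] += B[i][k] * C[k][j]
    return A
```

All three visit exactly the same (i, j, k) triples, just in a different order, which is why a compiler is allowed to pick whichever order suits the hardware best.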
To summarise: knowing the basics of assembly allows you to dig into various ‘details’ of processor design, which in turn allows you to write faster programs. It might be good to know the differences between RISC/CISC or VLIW/vector processor/SIMD/… architectures. However, I would not start with x86, as it tends to be quite complicated (possibly ARM too); knowing what a register is etc. is IMHO sufficient for a start.
Normally it’s VERY important for debugging purposes. What do you do when the system breaks in the middle of an instruction and the error makes no sense? It’s much less of an issue with .NET languages so long as you’re only using safe code–the system will almost always shield you from what’s going on under the hood.
In short I think the answer is because you can do more if you learn assembly. Learning assembly grants access to the realms of embedded device programming, security penetration and circumvention, reverse engineering and system programming which are very hard to work in if you don’t know assembler.
As for learning it to improve program performance, this is doubtful in applications programming. Most of the time there are so many things to focus on before ever hitting this level of optimization, like optimizing your I/O access on both disk and network, optimizing how you build the GUI, choosing the right algorithms, maxing out all your cores, running on the best hardware money can buy and switching from interpreted to compiled languages. Unless you’re creating software for other end users, hardware is cheap compared to a programmer’s hourly wage, especially with cloud availability.
Also, you have to weigh increased program execution speed with readability of your code after you get hit by a bus, quit or come back to the code base to change it a year after you wrote the last version.
I would recommend learning algorithms: sorting, linked lists, binary trees, hashing, etc.
Also learn Lisp; see Structure and Interpretation of Computer Programs (groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures). This video course will teach you everything you need to know, including algorithms (how to do everything based on a few primitive commands, one Lisp primitive and some assembler provocatives).
Finally, if you must learn assembler, learn an easy one like ARM (it is also used in about 4 times more devices than x86).
Well, the answer is simply that the language you are using must be interpreted or compiled into assembler in the end, no matter the language or the machine.
The design of languages derives from the way the CPU works, more so for low-level languages, less so for high-level ones.
I will end by saying that you need to know not only a little assembler but also CPU architecture, which you learn by learning assembler.
Some examples: there are many Java programmers who do not understand why this does not work, and even fewer who know what happens when you run it.
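String a = "X";
String b = new String("X"); // same characters, but a separate object
if (a == b)                 // compares the references, not the contents
    return true;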
If you knew a little assembler, you would always know that the content of a memory location is not the same thing as the number in the pointer variable that “points” to that location.
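The same distinction shows up in Python, where `==` compares contents and `is` compares identity (roughly, whether two names hold the same pointer). The sketch below uses lists rather than strings on purpose, because runtimes are free to share one object between identical string literals, which hides the effect. The names are made up for the illustration.

```python
def compare(a, b):
    # Returns (same contents?, same object in memory?).
    return a == b, a is b

first = ["X"]
second = ["X"]      # equal contents, but a separately allocated list
alias = first       # a second name for the very same object
```

Here `compare(first, second)` gives equal contents but different identity, while `compare(first, alias)` gives the same on both counts.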
Even worse, even in published books you will read something like “in Java, primitives are passed by value and objects by reference”, which is completely incorrect. All arguments in Java are passed by value, and Java can NOT pass objects to functions, only pointers, which are passed by value.
If you know assembler, it’s obvious what’s going on; if not, it is so complicated to explain that most authors just give you a pious lie.
Of course, the ramifications of this are subtle but can get you in real trouble later on. If you know assembler it’s a non-issue; if not, you are in for a long, long night of debugging.
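Python passes arguments the same way (a copy of the reference), so the consequence can be sketched there too. The function names are hypothetical; the point is only the difference between reassigning the parameter and changing the object it points to.

```python
def rebind(items):
    # Assigning to the parameter only changes the local copy of the reference.
    items = ["replaced"]
    return items

def mutate(items):
    # Following the reference and changing the object is visible to the caller.
    items.append("added")

original = ["X"]
rebind(original)
after_rebind = list(original)   # snapshot: the caller's list is untouched
mutate(original)
after_mutate = list(original)   # snapshot: the caller's list has grown
```

If objects were truly passed by reference, `rebind` would have replaced the caller’s list. It does not, because only the pointer was copied in.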