Do programmers need to manually implement optimizations such as loop unrolling when writing Python code?
I have recently been learning some HPC topics and found out that modern C/C++ compilers can detect places where an optimization is applicable and apply it using techniques such as SIMD vectorization and loop unrolling, especially under the -O3 flag, trading longer compile times and larger object files for better runtime performance.
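
To make the question concrete, here is a minimal sketch of what manual loop unrolling would look like in Python. The function names `sum_plain` and `sum_unrolled` and the unroll factor of 4 are illustrative assumptions, and nothing here claims the unrolled version is actually faster on any given interpreter.

```python
def sum_plain(values):
    """Straightforward loop: one addition per iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled(values):
    """Hand-unrolled loop (hypothetical factor of 4): four additions per iteration."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 that is <= n

    # Main unrolled body: processes four elements per iteration.
    for i in range(0, limit, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]

    # Remainder loop: handles the 0 to 3 leftover elements.
    for i in range(limit, n):
        total += values[i]
    return total


data = list(range(10))
print(sum_plain(data), sum_unrolled(data))
```

Both functions compute the same result; the only difference is how the work is split across loop iterations, which is the kind of transformation a C/C++ compiler may do automatically at -O3.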