What is the advantage of log file rotation?

  softwareengineering

I understand that log file rotation is changing the log file you used when (1) one gets big enough or (2) at EOD, but I’m not sure I understand the reason for (1). I have never had any issues with large files and cannot think of reasons why we might set an arbitrary limit (for example, the limit we were given was 10 MiB).

20

“I have never had any issues with large files” is not synonymous with “there is never a problem with large files”. Your lack of experience with problems does not prove that there aren’t any, it just proves that you haven’t experienced them if they exist.
Anecdotally, I’ve had to work with >2GB text-based log files, and that was back in the 32-bit x86 era, when addressable RAM was capped at 4 GiB (including your OS). Files that large become effectively impossible to handle, making them useless as a troubleshooting tool.

The short and sweet answer here is that performance is impacted by file size once you get to non-trivial file sizes. As a very simple example, how are you going to easily look through a file whose size is larger than the RAM of your machine?

10MB is a very low cutoff point, but I suspect that this limit may have been chosen partly in relation to the physical size of the file in terms of manually scrolling through it and reading the logs. We can argue about what a more sensible cutoff would be, but it should be clear that some cutoff based on file size makes sense, so as to avoid the problems that arise when files get too big.
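If you accept that a size cutoff makes sense, you rarely need to implement it yourself. As a sketch, Python’s standard library ships a size-based rotating handler; the 10 MiB cap and filenames here are illustrative, matching the limit from the question:

```python
import logging
import logging.handlers

# Size-based rotation: when app.log exceeds maxBytes, it is renamed
# app.log.1 (older backups shift to .2, .3, ...) and a fresh file begins.
handler = logging.handlers.RotatingFileHandler(
    "app.log",
    maxBytes=10 * 1024 * 1024,  # the 10 MiB cutoff from the question
    backupCount=5,              # keep at most 5 rotated files, drop older ones
)
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("service started")
```

With `backupCount` set, the handler also caps the total disk footprint at roughly `maxBytes * (backupCount + 1)`, which anticipates the disk-space arguments in the other answers.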

12

Log file rotation is not only about technical issues with the file itself. It’s also about operational considerations, like:

  • being able to incrementally back up the logs (especially in the case of transaction logs)
  • minimizing the risk of losing logged events (files held open during a hard system crash are at risk)
  • passing log files to secure archival storage to prevent tampering, or to a SIEM system for threat detection.

While dealing with large files is less problematic nowadays, large files (constantly reopened) still work against all of these needs.

If you only log debugging information, you can ignore these constraints. But for a long-lived, large system, especially one with binary logs, the question is not IF you’ll face byte corruption but WHEN.

8

The other answers already provided very useful reasons, but there is one more I’m surprised I haven’t seen outside of a comment yet:

To be able to delete old log files to free disk space

A very busy application logging all its processes will do a lot of logging. A month’s worth of logs might be hundreds of megabytes.

If you close your log file when it hits a certain size or at end of day, your log folder will contain many files, each with a date when it was written. When your administrator gets an automated warning that the disk is starting to run out of space, he can simply go into the log folder and delete (or move to some cheaper archive) everything that’s from last year or older. Hit delete and enter, or ctrl-X and ctrl-V, done.

If you just keep writing into one master log file, then even if none of the problems mentioned in the other answers apply (and they will) your administrator would have to follow a more complicated process when disk space runs out. Best case, he’ll rename the logfile (hoping that no write action overlaps exactly that moment), your application will see “no file with the right name” and create a new log, then he can… hmm.

Can he delete the old logfile? No, it contains logs up to just before he renamed it. It might still be needed for days, weeks or months!

Can he move it to archive? He can, but then he’ll have to make sure any developer who has to look at the logs has access to that.

Most likely, he’ll have to rename it, wait a few months (during which time hopefully the disk does not fill up completely), and THEN move it. Two processes instead of one.
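The “delete everything from last year or older” step the administrator wants is then a trivial loop over date-stamped files. A hypothetical sketch, assuming the rotated files live in one directory and a one-year cutoff:

```python
import time
from pathlib import Path

def purge_old_logs(log_dir: str, max_age_days: int = 365) -> list[str]:
    """Delete rotated log files whose modification time is older than the cutoff.

    Returns the names of the deleted files, for the admin's records.
    """
    cutoff = time.time() - max_age_days * 86400
    deleted = []
    for path in Path(log_dir).glob("*.log*"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

In practice a tool like logrotate (or a cron job around a script like this) does the same thing; the point is that per-period files make retention a filename problem instead of a “surgery on a live file” problem.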

Time rotation: to make it easier to find the right log file for an incident at a known time

Rotating your log files every month/day/whatever also helps you find the right log file to read if you have to investigate an incident. A user reports “the application crashed on 2024-04-13 at around 11 am” and you can just go to the logfile that is named “app-log-2024-04-13” (or similar) instead of having to sort all your log files by creation date and scroll until you find the right one.
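With a date-stamped naming scheme like that, finding the file for an incident is pure string construction. A minimal sketch — the “app-log-2024-04-13” pattern comes from the paragraph above; the helper name is made up:

```python
from datetime import date

def log_file_for_incident(incident_day: date, prefix: str = "app-log") -> str:
    """Map an incident date straight to the rotated file that covers it."""
    return f"{prefix}-{incident_day.isoformat()}"

# A user reports a crash on 2024-04-13 around 11 am:
print(log_file_for_incident(date(2024, 4, 13)))  # app-log-2024-04-13
```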

3

“never had any issues with large files”

“and cannot think of reasons why we might set an arbitrary limit”

Every system I have ever ssh’d into has had limited budget and limited attached storage.
If you run $ yes > big.txt you’ll eventually see “FS full”,
and similarly for a busy syslogd.

Often we value uptime, and are loathe to see app errors
caused by a full partition.
An app might typically log messages at a limited rate,
say an average of 10 lines / second.
Sometimes the internet clients stimulating the
SUT (system under test)
will change their behavior, so we see 100 lines / sec.
Or the stimulus is ordinary but a local “verbosity” config change
makes us log at an unusually rapid rate.

Given a 10x increase, a daily log rolling policy would be
slow to respond, and by end-of-month the steady state
size of /var/log would expand to ten times its usual size.
OTOH, a size-oriented policy would respond immediately,
and keep the total size of /var/log essentially constant
(± 10 MiB in your example).
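The arithmetic behind that comparison can be made concrete. Assuming roughly 100 bytes per log line and a 30-file retention window (both numbers are illustrative, not from the answer above):

```python
BYTES_PER_LINE = 100   # assumed average line length
RETAIN = 30            # keep 30 rotated files in /var/log

def daily_policy_bytes(lines_per_sec: float) -> float:
    """Total bytes retained when rotating once per day and keeping RETAIN days."""
    return lines_per_sec * 86400 * BYTES_PER_LINE * RETAIN

def size_policy_bytes(cap_bytes: int = 10 * 1024**2) -> int:
    """Total bytes retained when rotating at a fixed size cap, keeping RETAIN files.

    Note there is no rate parameter: the footprint is constant by construction.
    """
    return cap_bytes * RETAIN

normal = daily_policy_bytes(10)    # ~2.6 GB retained at 10 lines/sec
surge = daily_policy_bytes(100)    # ~26 GB once a 10x surge persists a month
print(f"daily policy: {normal:.2e} -> {surge:.2e} bytes after a 10x surge")
print(f"size policy:  {size_policy_bytes():.2e} bytes at any rate")
```

The daily policy’s footprint scales linearly with the logging rate, while the size policy pins /var/log at cap × retention, which is exactly the tradeoff the answer describes.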

The tradeoff, of course, is we’d have just one-tenth
the forensic evidence available to investigate the
anomaly, so for a delayed investigation it’s likely
we wouldn’t see events near the transition point,
as they have already been aged out and deleted.
C’est la vie!

Many people would be willing to accept that tradeoff,
if it means never hearing “My daemon died when it
failed to allocate space for a temp file!”

9

I worked on a bug last year where an action in production was suddenly failing due to a timeout. Fast, near instantaneous in dev and QA, failing in production. Took weeks of on again off again work before someone with the right rights noticed the huge log file. Writing to the log file was the source of the problem.

Now, you might say that was just a bad way to write to the log, but it’s a fact that log rolling (which was available, just not turned on) would have saved us a lot of time.

Just because it’s never been a problem for you, doesn’t mean it’s not a problem.

1

Limit total amount of resources on system

In addition to some of the other points mentioned, I would like to highlight that some systems take care to limit the total amount of resources they use:
not only disk space, but also memory, database usage and other aspects.

Examples may include large scale critical applications for controlling factories, power plants, electricity grids, but also the opposite end, for instance a computer in your car.

Everything that should be able to run for years without intervention and should never fail for some weird reason.

Using log file rotation is one of several things to consider if one aims to achieve the above.

Obviously, on large-scale systems this is now less of an issue than it was some years (or decades) ago, but one may still want to consider it.

The original reason for using size as the rotation criterion almost certainly came from the limitations of 32-bit operating systems, where most applications (and maybe the OS itself) couldn’t handle files larger than 2GB, so it was necessary to rotate log files before they reached this size. Even if the system where the logging is done can handle larger files, you may want to limit them in case there’s a need to copy the logfile to a system that can’t (e.g. for archiving, or analysis).

And even when there’s no hard technical limit to file size, working with extremely large files can be inconvenient. Many file operations that you’re likely to do with log files are O(n): E.g. loading the file into a text editor or searching for messages matching a pattern or containing a particular string.

There’s little downside to limiting logfiles to a reasonable size, and it usually makes things easier, so it’s a common practice.

Ideally, if you follow the Twelve-Factor application design, you would be writing logs directly to stdout. Let the log collection agent (either attached as a sidecar on a containerized application, or just another background process for a standard server-side application) handle gathering logs from all the processes’ stdout streams and forwarding them to a centralized logging solution.
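A minimal sketch of that stdout-first setup in Python — the logger name and format string are assumptions; the point is that the process itself never opens a log file, so rotation becomes the platform’s job:

```python
import logging
import sys

# Twelve-factor style: the app writes its event stream to stdout and lets
# the platform's log collector worry about files, rotation, and shipping.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("request handled")
```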

If this is not possible for your application (you don’t have a log forwarding solution in place), log rotation is useful for general filesystem usage maintenance. Sure, your application right now might not be generating a lot of logs, but imagine a much larger application that processes hundreds of requests per second, all being logged with their connection metadata and the application flow for each transaction. You can imagine how large a single log file can get in this instance (especially with DEBUG logs turned on), and therefore it needs to be rotated.

The exact threshold for rotating a log file depends on the application. If an arbitrary rotation threshold is X (can be measured in bytes or simply lines of logs), and your application only generates enough logs to reach X in a week, then you can rotate your logs weekly. In turn, if your application reaches X in a day, rotate daily – an hour, rotate hourly, etc.
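That rule of thumb — rotate at whatever interval your traffic takes to reach X — can be sketched as a small helper. The rates and the line-count threshold below are hypothetical:

```python
def rotation_interval(lines_per_day: float, threshold_lines: float) -> str:
    """Pick the coarsest standard interval at which a log file stays under X lines."""
    days_to_reach_x = threshold_lines / lines_per_day
    if days_to_reach_x >= 7:
        return "weekly"
    if days_to_reach_x >= 1:
        return "daily"
    return "hourly"

# An app producing 100k lines/day against a 1M-line threshold X: rotate weekly.
print(rotation_interval(100_000, 1_000_000))  # weekly
```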

It’s up to you and your team to determine what makes sense as your logging threshold X in order to help you debug an issue. Obviously, the larger your threshold, the more lines you have to search through when debugging some logged transaction, but in general, text search utilities such as grep should be just fine for debugging an issue within a given time period under threshold X.

TL;DR: log directly to stdout if you can, assuming you have a log forwarding agent installed that is reading your application’s stream. If that is not possible, determine some threshold X (e.g. number of log lines) that would be the maximum size of a log file you would be comfortable sifting through while debugging an issue, and rotate your logs at the general time interval that they reach X.

13
