Why do binary files load quicker than alphanumeric text files?

I’ve noticed that when I load and store large data files in a binary format, the program runs much faster than when I load the same data from an ASCII-encoded text file.

Why is this the case? The data in my case is plain, with little parsing involved beyond read() or fscanf().

Using fscanf() probably explains most of it. fscanf() has to interpret the format string you pass it, then scan the input stream character by character, trying to match the specified pattern and convert the matched text into a value. That is a lot of work compared with read(), which just copies the requested number of bytes from the file without parsing anything. fgets() sits in between: it does a little more work than read(), since it has to watch for newlines, but much less than fscanf().
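A minimal C sketch of the three approaches, assuming the data is a flat array of int values. It uses stdio’s fread() as a portable stand-in for POSIX read(), and tmpfile() so no file path is needed; all function names (write_text, read_fscanf, read_fgets, round_trip and so on) are hypothetical. It only checks that each method gets the values back; to compare speed you would time the readers on your own data, for example with clock().

```c
#include <stdio.h>
#include <stdlib.h>

/* Text format: one decimal integer per line. */
static int write_text(FILE *f, const int *v, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (fprintf(f, "%d\n", v[i]) < 0) return -1;
    return 0;
}

/* fscanf: interprets "%d" on every call, skips whitespace,
   and examines each character while converting it. */
static size_t read_fscanf(FILE *f, int *v, size_t n) {
    size_t i = 0;
    while (i < n && fscanf(f, "%d", &v[i]) == 1) i++;
    return i;
}

/* fgets + strtol: no format string, but still a scan for '\n'
   plus a digit-by-digit conversion per line. */
static size_t read_fgets(FILE *f, int *v, size_t n) {
    char line[32];
    size_t i = 0;
    while (i < n && fgets(line, sizeof line, f) != NULL)
        v[i++] = (int)strtol(line, NULL, 10);
    return i;
}

/* Binary format: the ints' in-memory bytes, copied as-is. */
static int write_binary(FILE *f, const int *v, size_t n) {
    return fwrite(v, sizeof *v, n, f) == n ? 0 : -1;
}

/* One bulk copy for the whole array, no parsing at all. */
static size_t read_binary(FILE *f, int *v, size_t n) {
    return fread(v, sizeof *v, n, f);
}

enum mode { MODE_FSCANF, MODE_FGETS, MODE_BINARY };

/* Write n values to a temporary file, read them back with the chosen
   method, and return how many values were read (0 on error). */
static size_t round_trip(enum mode m, const int *in, int *out, size_t n) {
    FILE *f = tmpfile();
    if (f == NULL) return 0;
    int rc = (m == MODE_BINARY) ? write_binary(f, in, n) : write_text(f, in, n);
    size_t got = 0;
    if (rc == 0) {
        rewind(f); /* required between writing and reading an update stream */
        switch (m) {
        case MODE_FSCANF: got = read_fscanf(f, out, n); break;
        case MODE_FGETS:  got = read_fgets(f, out, n);  break;
        case MODE_BINARY: got = read_binary(f, out, n); break;
        }
    }
    fclose(f);
    return got;
}
```

The binary reader is a single fread() for the whole array, while both text readers loop over the values and look at every character. The trade-off is that the binary file is only meaningful on a machine with the same int size and byte order.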

It’s difficult to say without knowing what you’re parsing, but…

Text files are generally bigger. A 32-bit integer always takes 4 bytes in binary, but as text it can take up to 10 digits plus a sign, depending on the value, and twice that if the file is stored as UTF-16 (what Windows calls “Unicode”).
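To make the size difference concrete, here is a small C sketch, assuming the values are 32-bit integers written one per line in a single-byte encoding such as ASCII. The helper names are hypothetical; calling snprintf() with a NULL buffer and size 0 is standard C99 and only reports how many characters the value would need.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Characters needed to print v in decimal, sign included, no terminator.
   snprintf(NULL, 0, ...) writes nothing and just returns the length. */
static size_t text_len(int32_t v) {
    return (size_t)snprintf(NULL, 0, "%" PRId32, v);
}

/* Bytes for n values stored as "digits + newline" in ASCII.
   Double the result for UTF-16 text. */
static size_t text_file_bytes(const int32_t *v, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += text_len(v[i]) + 1; /* +1 for the '\n' separator */
    return total;
}

/* Bytes for the same n values stored as raw 32-bit integers. */
static size_t binary_file_bytes(size_t n) {
    return n * sizeof(int32_t);
}
```

For the values 123, 4567 and INT32_MIN this gives 21 bytes of text (4 + 5 + 12, counting the newlines) against 12 bytes of binary, and more bytes means more to read from disk before any parsing even starts.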

Both parsing and formatting can be very slow. For example, to write the integer 123 in binary, you just dump its 4 bytes and you’re done. To read it back, you read 4 bytes; problem solved.
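“Just dump 4 bytes” can be sketched in C as a memcpy() of the integer’s in-memory representation, assuming a 32-bit int32_t. The function names are hypothetical. The second pair of functions is an assumption beyond the answer: it fixes the byte order to little-endian so the file also reads correctly on a machine with different endianness, and it still involves no per-digit work.

```c
#include <stdint.h>
#include <string.h>

/* Native order: copy the 4 bytes exactly as they sit in memory.
   The byte order is whatever the host CPU uses. */
static void dump_i32(unsigned char out[4], int32_t v) {
    memcpy(out, &v, sizeof v);
}

static int32_t load_i32(const unsigned char in[4]) {
    int32_t v;
    memcpy(&v, in, sizeof v);
    return v;
}

/* Hypothetical portable variant: always little-endian
   (least significant byte first), whatever the host CPU. */
static void put_u32_le(unsigned char out[4], uint32_t v) {
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

static uint32_t get_u32_le(const unsigned char in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Either way, storing or loading a value costs a fixed 4 bytes and a copy or a few shifts, no matter how many digits the number has.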

In text, you first have to convert the binary value 123 into the characters “123” (by repeated division by 10), which is comparatively time-consuming, and then write them. To read it back, you have to find the text between whitespace and process each character in turn: 1 is read, giving 1; then 2 is read, and the value becomes 1 × 10 + 2 = 12; then 3 is read, and the value becomes 12 × 10 + 3 = 123. I hope you can see how that can be much slower than simply reading in 4 bytes.
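The digit-by-digit work described above can be sketched in C as follows. This is a simplified illustration, not what any particular C library does internally: the function names are hypothetical, only non-negative values are handled, and there is no overflow checking (real code would use strtoul() or strtol()).

```c
#include <ctype.h>
#include <stddef.h>

/* Parse decimal digits left to right: for "123" the value goes
   0 -> 1 -> 1*10 + 2 = 12 -> 12*10 + 3 = 123.
   Stops at the first non-digit and reports where via *end. */
static unsigned long parse_decimal(const char *s, const char **end) {
    unsigned long value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (unsigned long)(*s - '0');
        s++;
    }
    if (end != NULL) *end = s;
    return value;
}

/* Read whitespace-separated numbers into out (at most max of them).
   Returns how many were parsed; stops at the first non-digit token. */
static size_t parse_all(const char *s, unsigned long *out, size_t max) {
    size_t n = 0;
    while (n < max) {
        while (isspace((unsigned char)*s)) s++;
        if (!isdigit((unsigned char)*s)) break;
        out[n] = parse_decimal(s, &s);
        n++;
    }
    return n;
}

/* Format by repeated division by 10. Digits come out least significant
   first, so they are collected in tmp and copied to buf reversed.
   buf must hold at least 21 chars; returns the number of digits. */
static size_t format_decimal(unsigned long v, char *buf) {
    char tmp[24]; /* enough for a 64-bit unsigned long (20 digits) */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++) buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```

Every input character costs a comparison, a subtraction, a multiply and an add, and formatting needs a division per digit; the binary case skips all of this per-digit work.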
