Design optimal read operation in my service

  softwareengineering

—- Description

I’d like to optimize read operation in my service. It has basically put and get methods.

On put I accept some numbers in a string format and put them into thread-safe cache with timed expiration policy. On get I have to perform some math.

Put has concurrent nature and get is non-concurrent, each entry in cache expires within 2 min after it has been written, read performes every 2 sec.

—- Issue

When I put something into my cache I don’t want to convert string to number, because of the performance hit. BUT when I read from cache I have to do this.

And here is the problem! I don’t want to perform double work – convert same numbers over and over again in sequential readings. I want to do this only once.

Seems like I have to add a new cache with mapping string to parsed number! Maybe with the same eviction policy as in the first cache..

—- Question

Do you think introducing additional cache here is a good idea overall?

Or maybe such ideas can be called “over engineering”?

8

Cache invalidation is one of the hardest problems in computer science. Tread lightly here.

Performance is a very popular way to waste time playing with working code rather than move on to new challenges. Tread lightly here.

That said, if you have access to a good hashing data structure that you could test for previously converted numbers before converting you might be able to see some gains. It really depends on the usage. CPU’s are fast and memory fetches are slow so don’t expect this to pay off in every case. You’re betting that the hash lookup will take less time than the conversion. You’re betting that the hash look up will succeed often enough to save more time then is lost when it fails.

If you’re into Big O it’s easy to jump on this saying O(1) is better than O(n) but keep in mind in the real world n can be small making Big O meaningless.

Also understand that this is a space time trade off. Even if you can make this faster that doesn’t mean you got that speed for free. CPU cycles are a limited resource but so is memory. This is going to cost memory. Think about how many numbers you’re going to allow this thing to hash. If users get to decide these numbers think about what a hacker can now do to your memory foot print.

Even if you’re confident that you have a winner because your math and performance tests say this is the way to go you can deploy only to find out that something is different between your test and operational enviroments that is causing a difference in the number of cache misses. Which means the computer happened to leave the number you want in some slower memory. This is hard to test for, hard to predict, and can mess with your performance numbers.

Just trying to answer the question: “which is better?” can be considered over engineering. But if you have tests that show this is where most time is being spent and making it faster here truly is important it might be worth a try.

However, before you whip out that hash table make sure you’ve asked: “Why are we converting the same string over and over anyway?” It might be a problem you can solve with a little change to your architecture. If so then slapping this on would just be a kludge. Make sure you really need to do this before you invest to much in doing it.

4

Your service seems to be based on HTTP GET and PUT requests. And to make them happen, the HTTP libraries will do lots of string/number conversions (and other things) under the hood, so just one more string-to-number conversion inside the PUT implementation body will not be noticable (unless it’s really big numbers with hundreds of digits).

5000 conversions from string to 32-bit integer will take a very small fraction of a second, no matter if you do it in Java, C++ or whatever decent language you choose.

If you really experience performance problems, use a profiler (there are many good options available for Java) to find the bottleneck. I bet it won’t be the string-to-number conversion.

LEAVE A COMMENT