I’m implementing data processing software. The software receives thousands of events from the network that must be processed according to rules. I implemented a multi-threaded service which receives each event and inserts it into the processing engine.
This means each incoming event creates a new runnable task and returns control to the main thread. The secondary thread inserts the data and dies, which takes only a few seconds.
This approach creates lots of threads (between 150 and 260), and I got “complaints” that this is not the right way to do it; I should limit the threads to 5 or 10. To me, the way I implemented it is the usual way. So my questions are:
Should I limit the threads?
Is there an established “right way” to do this kind of thing?
Note: the threads are currently being created via a Java thread pool:
Thread creation and management is actually quite an expensive operation. Creating more threads than you have CPU cores is usually counter-productive due to thread-management overhead, even more so when your threads are short-lived, which means they are created and destroyed by the operating system all the time.
A better approach for parallelizing a large number of small tasks is to create a fixed number of threads once and then process the queue by giving a new task to each thread when it has finished the last one. This sounds difficult to implement, but doing it in Java is actually really simple because there is already a class for this: ThreadPoolExecutor. It accepts tasks in the form of objects that implement Runnable. Runnable is a functional interface, so with Java 8 you can just pass a lambda or method reference.
The size of the thread pool should be chosen so that you don’t create more threads than you have cores. A good way to do this is to use the Executors.newFixedThreadPool helper method in combination with Runtime.getRuntime().availableProcessors():
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() - 1)
The - 1 is there because you need an additional CPU core for the main thread of your program.
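Putting the pieces above together, a minimal sketch could look like this (the class name, event count, and the trivial "processing" task are illustrative placeholders, not part of the original question):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class EventPool {
    // Process n dummy events on a fixed-size pool and return how many ran.
    static int runEvents(int n) throws InterruptedException {
        // Reserve one core for the main (receiving) thread; never go below 1.
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            // Runnable is a functional interface, so a method reference is enough.
            pool.submit(processed::incrementAndGet);
        }

        pool.shutdown();                              // stop accepting new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for the backlog to drain
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runEvents(1000)); // prints 1000
    }
}
```

The pool threads are created once and reused, so submitting a task is cheap no matter how many events arrive.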
Creating as many threads as incoming events is generally a bad idea if the number of events is “large”. 150 threads is not so problematic in itself, but imagine you receive events more and more frequently: you cannot keep creating threads indefinitely. So the fact that you already have 150 threads is evidence of something that could go wrong.
Have a look at the ThreadPoolExecutor class; it may be what you are looking for. Basically, it creates a pool of threads (you can use a fixed-size pool of 5 threads, for instance) and takes a blocking queue.
When you receive a new event, just put it in the blocking queue (which can have a limited size, so that you cannot fill your entire memory with events; several overload policies are available).
Each thread from the pool continuously asks the queue for a new event. If an event is in the queue, the thread retrieves it and can start its job. If the queue is empty, the call blocks. So basically, each thread just asks for an event and ‘waits’ until one arrives in the queue.
Edit: @Philipp beat me to it, sorry about that.
Please also note that, since Java 7, the ForkJoinPool class can compensate for blocked threads, so that you really have N active (unblocked) threads at any time. Guessing the actual number of live threads (including the blocked ones) therefore becomes a tough game.
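To make the compensation point concrete, here is a minimal sketch (the class name and the artificial latch-based "blocking work" are illustrative): a task announces that it is about to block via ForkJoinPool.managedBlock, which lets the pool start a spare thread to keep its target parallelism.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

public class ManagedBlockDemo {
    // Run one deliberately blocking task inside a ForkJoinPool, announcing
    // the block via ManagedBlocker so the pool can compensate for it.
    static boolean runBlockingTask() throws Exception {
        ForkJoinPool pool = new ForkJoinPool(2); // target parallelism: 2
        CountDownLatch latch = new CountDownLatch(1);

        pool.submit(() -> {
            try {
                // Tell the pool this task is about to block; the pool may
                // start a compensating thread to keep 2 tasks runnable.
                ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                    public boolean block() throws InterruptedException {
                        latch.await();
                        return true;
                    }
                    public boolean isReleasable() {
                        return latch.getCount() == 0;
                    }
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        latch.countDown();  // release the blocked task
        pool.shutdown();
        return pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBlockingTask()); // prints true
    }
}
```

Because the pool may create such compensating threads behind the scenes, counting "living" threads tells you little about how many are actually doing work.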
No, that seems like a fairly dicey way to do it.
Inserting into a queue is a very speedy operation (assuming the queue is in memory, or at least on the same machine). Creating a thread, or even pulling one from a pool, is going to take more time. If you’re using a common message queue on some other machine, then it may or may not make sense.
The best way to know is to test it. Make a version that uses threads and one that does not and put them under load. How does the response rate change as you add load? When does the system start to fail? Do you encounter concurrency bugs?