I realize there are a lot of questions asking why multithreading can slow down a program, but it always seems to come down to threads aggressively locking shared state. In my case I'm not using any locks.
I have a TCP client that receives a byte array containing a multi-channel time series. The samples are interleaved by channel (1st C1 sample, 1st C2 sample, 2nd C1 sample, 2nd C2 sample, ...) and encoded as little-endian shorts, and I want to store each channel in its own std::vector. I'm receiving a lot of data from several hundred channels, so I thought I could divide the channels up and multithread the work, but using multiple threads actually seems to slow the code down.
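For concreteness, this is roughly the per-channel decoding I have in mind, shown single-threaded (the deinterleave name and the raw/nbytes parameters are only for illustration, not my real client code):

#include <cstddef>
#include <vector>

// Single-threaded sketch: channel c's k-th sample starts at byte
// 2 * (k * nchans + c), low byte first (little endian).
std::vector<std::vector<double>> deinterleave(const unsigned char* raw,
                                              std::size_t nbytes,
                                              std::size_t nchans)
{
    std::vector<std::vector<double>> out(nchans);
    for (std::size_t c = 0; c < nchans; ++c) {
        for (std::size_t i = 2 * c; i + 1 < nbytes; i += 2 * nchans) {
            short sample = static_cast<short>(raw[i] | (raw[i + 1] << 8));
            out[c].push_back(sample);
        }
    }
    return out;
}

The reproduction below does the same work, but splits the channels into bands and gives each band to its own thread.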
I've written the following reproduction. The average time spent in each thread increases as I reduce band_size (i.e. as I spawn more threads).
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <vector>

// 1 MiB of dummy bytes standing in for the interleaved TCP payload.
char data[1048576];
int main()
{
    // Fill the buffer with arbitrary sample bytes.
    for (auto& d : data) {
        d = std::rand();
    }

    auto nchans = 769;
    std::vector<std::vector<double>> out(nchans, std::vector<double>());

    // Each thread decodes a contiguous band of channels.
    auto band_size = nchans / 12;
    std::vector<std::thread> threads;
    std::atomic<long long> total_ns{0};

    for (auto c = 0; c < nchans; c += band_size) {
        threads.emplace_back([&, c] {
            auto start = std::chrono::steady_clock::now();
            for (auto subc = 0; subc < band_size && c + subc < nchans; ++subc) {
                auto channel = c + subc;
                // Samples for one channel sit 2 * nchans bytes apart.
                for (std::size_t i = 2 * (c + subc); i + 1 < sizeof(data); i += 2 * nchans) {
                    // Little-endian decode; go through unsigned char so a
                    // negative low byte can't corrupt the value.
                    short sample = static_cast<short>(
                        static_cast<unsigned char>(data[i]) |
                        (static_cast<unsigned char>(data[i + 1]) << 8));
                    out[channel].push_back(sample);
                }
            }
            total_ns += (std::chrono::steady_clock::now() - start).count();
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }

    std::cout << "done " << (total_ns / threads.size()) / 1e6 << "ms\n";
}