How to divide work to a network of computers?

Imagine the following scenario: let's say you have a central computer which generates a lot of data. This data must go through some processing, which unfortunately takes longer than generating it. In order for the processing to keep up with real time, we plug in more slave computers.

Further, we must take into account the possibility of slaves dropping out of the network mid-job as well as additional slaves being added. The central computer should ensure that all jobs are finished to its satisfaction, and that jobs dropped by a slave are retasked to another.

The main question is: What approach should I use to achieve this?

But perhaps the following would help me arrive at an answer:
Is there a name or design pattern for what I am trying to do?

What domain of knowledge do I need in order to get these computers to talk to each other? (e.g. will a database, which I have some knowledge of, be enough, or will this involve sockets, which I have not yet learned?)

Are there any examples of such a system? The main question is a bit general so it would be good to have a starting point/reference point.

Note that I am assuming the constraints of C++ and Windows, so solutions pointing in that direction would be appreciated.

Are there any examples of such a system?

Yes. This pattern is known as distributed computing (or distributed programming, or whatever cool word you want to put after “distributed”). My suggestion is not to build this in-house before looking at existing solutions. You can look at this Stack Overflow question for various options, and then make a calculated decision.

As noted by other answers, this field has been known as distributed computing, grid computing, cluster computing, and high performance computing.

Let me add the distinction that, when a system can be resized after start to match the workload, it is said to be “elastic”, and this is different from traditional grid computing. That is one of the (non-marketing) reasons for the term “cloud computing”: the user does not need to plan for capacity, and the number and location of the machines carrying out the computation remain featureless to him as a cloud.

Also, your requirement that the master re-schedules failed tasks is called the “fault tolerance” property of that system. (Mandatory link to this cartoon)
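To make the re-scheduling idea concrete, here is a minimal sketch of one common way to do it: the master hands out each job with a lease (a deadline), and any job whose lease expires without a completion report goes back into the pending queue. This is only an illustration under stated assumptions: `JobTracker`, `acquire`, `complete` and `requeue_expired` are hypothetical names, not part of any library, jobs are plain integers, and everything lives in one process, with no networking.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical sketch: all names here are made up for illustration.
using Clock = std::chrono::steady_clock;

class JobTracker {
public:
    explicit JobTracker(std::chrono::milliseconds lease) : lease_(lease) {
        assert(lease.count() > 0);  // a zero lease would expire immediately
    }

    // Master side: register a new unit of work.
    void submit(int job_id) { pending_.push_back(job_id); }

    // A worker asks for work. The job is leased until now + lease_.
    std::optional<int> acquire(const std::string& worker, Clock::time_point now) {
        if (pending_.empty()) return std::nullopt;
        int job = pending_.front();
        pending_.pop_front();
        in_flight_[job] = Lease{worker, now + lease_};
        return job;
    }

    // A worker reports success. Returns false when that worker no longer
    // holds the lease, e.g. a late result for a job that was retasked.
    bool complete(int job_id, const std::string& worker) {
        auto it = in_flight_.find(job_id);
        if (it == in_flight_.end() || it->second.worker != worker) return false;
        in_flight_.erase(it);
        ++done_;
        return true;
    }

    // Master calls this periodically: jobs whose lease has expired are
    // put back in the pending queue. Returns how many were requeued.
    std::size_t requeue_expired(Clock::time_point now) {
        std::size_t requeued = 0;
        for (auto it = in_flight_.begin(); it != in_flight_.end();) {
            if (it->second.deadline <= now) {
                pending_.push_back(it->first);
                it = in_flight_.erase(it);
                ++requeued;
            } else {
                ++it;
            }
        }
        return requeued;
    }

    std::size_t pending() const { return pending_.size(); }
    std::size_t in_flight() const { return in_flight_.size(); }
    std::size_t done() const { return done_; }

private:
    struct Lease {
        std::string worker;
        Clock::time_point deadline;
    };

    std::chrono::milliseconds lease_;
    std::deque<int> pending_;
    std::unordered_map<int, Lease> in_flight_;
    std::size_t done_ = 0;
};
```

The time is passed in explicitly so the behaviour is easy to test. A worker that drops out simply never calls `complete`, and the next `requeue_expired` call makes its job available to whichever worker asks next. Frameworks with built-in fault tolerance do this bookkeeping for you.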

What approach should you use to build your own, private cloud? In my order of preference:

1. Don’t build your own cloud; use the infrastructure provided by others. Amazon calls this Virtual Private Cloud, Rackspace just Private Cloud; I am sure you can find other offers and compare.
2. Don’t build your own distributed computing engine; use an engine provided by others. If you insist on using your own machines, at least use as much software as possible that is provided and tested by others. You can use Hadoop from C++ via the Pipes interface or from any executable via the Streaming API. There is a similar Streaming interface on Spark.
3. Don’t code all components from scratch; use components from the community. If, for some reason, you have read this far and still want to roll your own cloud components, don’t start from C++’s standard library alone. The main components you will need are:
   - a queuing system, as noted in a comment, to send tasks from the master to the processing nodes, and to send result confirmations from the processing nodes to the master
   - a distributed file system, so that processing nodes can access the data to operate on.
   There are many alternatives for both. For queuing, RabbitMQ has a Windows installer, as does ZeroMQ. For distributed file systems, I really don’t have enough experience on Windows: it looks like you can organize SMB shares into a DFS, but I can’t give you any hints here. As noted in another answer, you could also consider a distributed database such as MongoDB for the data; it does run on Windows.
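The two-queue data flow described above (tasks out, results back) can be sketched in a single process. This is only a toy model under stated assumptions: threads stand in for the processing nodes, a hand-written `BlockingQueue` stands in for a real broker such as RabbitMQ or ZeroMQ, and `BlockingQueue`, `Result` and `run_jobs` are hypothetical names. It is meant to show the shape of the interaction, not to replace a tested queuing system.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-process stand-in for a message broker queue.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!closed_);  // pushing after close() is a usage error
            q_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available, or returns std::nullopt once the
    // queue has been closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return std::nullopt;
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

struct Result {
    int task;
    int value;
};

// The master pushes tasks; each worker thread pops a task, processes it,
// and pushes a result confirmation back for the master to collect.
std::vector<Result> run_jobs(const std::vector<int>& tasks, std::size_t workers,
                             const std::function<int(int)>& process) {
    BlockingQueue<int> task_q;
    BlockingQueue<Result> result_q;

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < workers; ++i) {
        pool.emplace_back([&] {
            while (auto task = task_q.pop()) {
                result_q.push(Result{*task, process(*task)});
            }
        });
    }

    for (int t : tasks) task_q.push(t);
    task_q.close();                   // workers exit once the queue is drained
    for (auto& th : pool) th.join();
    result_q.close();

    std::vector<Result> results;
    while (auto r = result_q.pop()) results.push_back(*r);
    return results;
}
```

For example, `run_jobs({1, 2, 3, 4, 5}, 3, [](int x) { return x * x; })` returns five results, in whatever order the workers happened to finish. With a real broker, the master and the workers would be separate programs on separate machines, and the broker, not your code, would own the queues.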

You could also think about using MPI (usually the Open MPI implementation, commonly through the Boost.MPI wrapper), but note that MPI programs are neither elastic nor fault tolerant per se; you need to take care of that yourself (though they do at least provide some mechanisms to achieve this). That is why I would recommend that you first evaluate a distribution framework that does have these properties.

Filed under: softwareengineering - @ 01:23

Tags: c++, networks, windows
