# Predicting financial transactions

I would like to build software to help folks with their personal finances. One area I would like to tackle is trying to predict future transactions based in past transactions. For instance you might see the electric bill being paid near a certain day of the month. The algorithm would then be able to create future predicted transactions in a forecast that would show that bill being paid on that day. It would do that for all the bills and be able to lay out a cash flow forecast based on it.

By cash flow I mean a listing of transactions that are expected to occur in the future based on past patterns. So as in my example, any bills found to pay with certain patterns would continue to follow that pattern.

I think machine learning (ML) is the right tool. I’ve done a lot of learning in the area but have yet to see something that seems to fit this kind of problem. Any ideas as to what algorithms would be applied? I’m just looking for some nudges in the right direction as to which ML algorithms would be best to look into.

I’ll back up and try to give more context to the problem. So today if you want to get the benefit of any kind of forecasting tool for your personal finances, you generally are manually entering and then maintaining a list of bills. I would like to eliminate this need. If you look at a checking account, it probably would be evident to you that certain bills pay on certain schedules, even if you weren’t familiar with the account. So pattern recognition that humans currently would do fairly easily in simpler cases. Sounds classically ML to me. So I would like to try to apply that. The end desired result from this is a projected cash flow showing what the future checking account might look like if the current trends continued.

Please note that the sentiment of not using ML at all is well understood. I disagree and I’m looking for help with the ML. General lectures about the nature of ML and other suggestions about simple averages do not address the question at all.

The overarching principle behind machine learning is that it consumes a data set composed of inputs and historical outputs, updates its internal state, and, when given a new and novel set of inputs, can guess an output that falls into the historical pattern as demonstrated in the historical data. In order for the machine to determine whether an output is valid, it needs feedback– e.g. you run the machine on historical inputs and see if it can compute historical outputs.

If the computed output differs from the actual output, the difference is called a cost function… the machine would then update its internal state variables (typically some sort of weights or biases) to minimize cost with repeated exposure to more data.

In a typical experimental design, the inputs and outputs used for learning come from a different data set from the inputs and outputs used to test validity. E.g. if you expose a machine to John’s inputs and outputs, see if it can learn enough to compute Bob’s outputs if given Bob’s inputs. So you are going to need a lot of data!

Ask yourself what are the inputs and outputs and what exactly you want your “machine” to be able to compute.

For example

Inputs: An array of financial transactions, for a certain period of interest. Each item in the array is composed of the following attributes: date, amount, and type.

Output: Total cash flow for the period of interest.

The above can be accomplished via simple math, e.g. add up the amount field from all of the inputs– the result is the output.

That is it.

Now, if you insist on overthinking this, I suppose you could go to a great deal of trouble to train a neural network to figure out the right gradient values to be able to perform addition without explicitly telling it that it has to do addition (this is not trivial!) OR you could simply code the machine to perform addition, because you know that is what has to be done. In which case you don’t need machine learning at all.

I’m guessing that the problem you are trying to solve does not need machine learning– rather, what you need is a static model, which you yourself would design rather than the machine.

A simple moving average might work. If you want to go a little crazy you could arrange your data in a general linear model which could predict values quite nicely, assuming that the overall mechanism is not nonlinear. So far in your question you have not demonstrated the need for anything more sophisticated than that.