I'm just starting out with machine learning. I would like to estimate the price of an item on a given date. For example, if I know the prices from 07/01/24 to 07/11/24, what will the price be on 07/12/24?
I tried linear and polynomial regression, RandomForestRegressor and GradientBoostingRegressor. These models give very accurate results on the data provided, but when I try to use them outside that range, the result is either the same as the last value or an impossible one (I sometimes end up with negative prices).
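To illustrate the "same as the last value" behaviour, here is a minimal reproduction on synthetic data (the series below is made up and is not my real dataset; the hyperparameters are arbitrary). As far as I understand, a tree-based model can only split on values it saw during training, so every date past the end of the training range lands in the same leaves:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic, made-up series: a rising trend with a little noise (not the real data).
rng = np.random.default_rng(0)
t_train = np.arange(100).reshape(-1, 1)  # time steps 0..99
y_train = 1.0 + 0.02 * t_train.ravel() + rng.normal(0, 0.01, 100)

model = GradientBoostingRegressor(n_estimators=200, random_state=0)
model.fit(t_train, y_train)

# Predict at the last training step and at steps beyond the training range.
t_future = np.array([[99], [100], [150], [1000]])
future_predictions = model.predict(t_future)

# All four predictions are identical: the trend is not continued.
print(future_predictions)
```

The prediction for step 1000 is the same as for step 99, even though the underlying trend keeps rising.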
Does anyone know how to do this? Should I use another method?
For example, I tried a GradientBoosting on the price of gasoline between 01/1992 and 05/2024:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
data = pd.read_csv("essence2.csv", sep=";").iloc[::-1]
x = data["date"].values.reshape(-1, 1)
y = data["price"].values
x_train, x_test, y_train, y_test = train_test_split(
    x,
    y,
    test_size=0.01,
    random_state=2,
)
model = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=1,
    random_state=2,
)
model.fit(x_train, y_train)
y_predictions = np.round(model.predict(x), 4)
residues = y_predictions - y
fig, ax = plt.subplots(2)
ax[0].plot(x, y, color="blue", label="Actual price")
ax[0].plot(x, y_predictions, color="orange", linestyle="--", label="Gradient Boosting Regressor")
ax[0].xaxis.set_major_locator(MaxNLocator(8))
ax[0].tick_params(axis="x", rotation=45)
ax[0].legend()
ax[1].plot(x, residues, color="orange", label="Residuals")
ax[1].xaxis.set_major_locator(MaxNLocator(8))
ax[1].tick_params(axis="x", rotation=45)
ax[1].legend()
plt.tight_layout()
plt.show()
[Image: gasoline price predictions]
How can I predict the price for 06/2024?
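One direction I am considering, sketched under assumptions: instead of using the date itself as the feature, use the previous few prices (lag features) to predict the next one. In the sketch below, `make_lagged`, `n_lags = 3`, and the synthetic `prices` array are all hypothetical choices for illustration; the data is not essence2.csv, and I do not know whether a linear model on lags is appropriate for real gasoline prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lagged(series, n_lags):
    """Build (X, y) where each row of X holds the n_lags values preceding y."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


# Synthetic stand-in for monthly prices (made-up numbers, not the real data).
rng = np.random.default_rng(0)
prices = 1.0 + 0.005 * np.arange(120) + rng.normal(0, 0.02, 120)

n_lags = 3  # hypothetical choice; would need tuning on real data
X, y = make_lagged(prices, n_lags)
model = LinearRegression().fit(X, y)

# Forecast the next (unseen) month from the last n_lags observed prices.
next_price = model.predict(prices[-n_lags:].reshape(1, -1))[0]
print(next_price)
```

With this framing, the model input for 06/2024 would be the prices of the last few known months rather than a date value outside the training range. Is this a reasonable approach, or is there a more standard method?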