Which hyperparameters should I adjust to improve accuracy?

I would like to know how to increase the accuracy score and lower the loss in a multilabel classification problem.

The scikit-learn reference covers multilabel classification in its "Multiclass and multioutput algorithms" section, and I am currently testing the approaches described there.
(https://scikit-learn.org/stable/modules/multiclass.html)

The sample data was generated with make_multilabel_classification from sklearn.datasets using 10 features, and I created several datasets by varying n_classes.

With two classes (labels), the accuracy and loss are reasonably satisfactory.

from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# define dataset
X, y = make_multilabel_classification(n_samples=10000, n_features=10, n_classes=2, random_state=1)

# summarize dataset shape
print(X.shape, y.shape)
# summarize first few examples
for i in range(10):
    print(X[i], y[i])

# hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

# standardize features using statistics from the training split only
scaler = StandardScaler()
scaler.fit(X_train)
print(scaler.mean_)
print(scaler.var_)

x_train_std = scaler.transform(X_train)
x_test_std = scaler.transform(X_test)

# fit a 3-nearest-neighbours classifier (handles multilabel targets directly)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train_std, y_train)

pred = knn.predict(x_test_std)

# for multilabel targets, accuracy_score is the exact-match (subset) accuracy
print(accuracy_score(y_test, pred))
print(hamming_loss(y_test, pred))

accuracy_score: 0.8345, hamming_loss: 0.08875
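
To make clear what these two numbers measure, here is a minimal sketch on hand-made labels (the arrays are invented for illustration and are not taken from my dataset). For multilabel targets, accuracy_score counts a sample as correct only if every one of its labels matches, while hamming_loss is the fraction of individual labels that are wrong.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Hand-made toy labels (illustrative only): 4 samples, 3 labels each.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 1],   # all labels correct
                   [0, 1, 1],   # one label wrong
                   [1, 1, 0],   # all labels correct
                   [0, 1, 1]])  # one label wrong

subset_acc = accuracy_score(y_true, y_pred)  # fraction of rows matched exactly
h_loss = hamming_loss(y_true, y_pred)        # fraction of individual labels wrong

print(subset_acc)   # 2 of 4 rows match exactly -> 0.5
print(h_loss)       # 2 of 12 labels are wrong
print(1 - h_loss)   # per-label accuracy
```

Because a single wrong label makes the whole row count as a miss, the exact-match accuracy can be much lower than the per-label accuracy, and the gap tends to widen as more labels are added.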

However, once the number of classes reaches 3 or more, the accuracy score steadily decreases and the Hamming loss increases.

# define dataset
X, y = make_multilabel_classification(n_samples=10000, n_features=10, n_classes=3, random_state=1)

n_classes=3 -> accuracy_score: 0.772, hamming_loss: 0.116

n_classes=4 -> accuracy_score: 0.4875, hamming_loss: 0.194125
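
To make the comparison easier to reproduce, the same experiment can be wrapped in a loop over n_classes. This is only a sketch of my setup: the helper name evaluate_knn is something I made up for this post, and I use make_pipeline here instead of calling the scaler by hand. Exact numbers may vary with the scikit-learn version.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_knn(n_classes, n_samples=10000):
    """Fit the same scaled 3-NN model for a given number of labels (hypothetical helper)."""
    X, y = make_multilabel_classification(
        n_samples=n_samples, n_features=10, n_classes=n_classes, random_state=1
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=101
    )
    # The pipeline fits the scaler on the training split only, as in the code above.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return accuracy_score(y_test, pred), hamming_loss(y_test, pred)


results = {k: evaluate_knn(k) for k in (2, 3, 4)}
for k, (acc, loss) in results.items():
    print(f"n_classes={k}: accuracy_score={acc:.4f}, hamming_loss={loss:.4f}")
```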

The results are similar when I use RandomForestClassifier or MLPClassifier, as shown in the reference, and when I use ClassifierChain(estimator=SVC) to wrap an algorithm that does not natively support multilabel classification.
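
For reference, the ClassifierChain variant looks roughly like the sketch below. Several things here are assumptions for illustration rather than my exact run: the smaller n_samples (only to keep it quick), the default SVC settings, and the order="random" and random_state=0 arguments. The base estimator is passed positionally so the sketch does not depend on the keyword name.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Smaller sample size than in the runs above, only to keep the sketch fast.
X, y = make_multilabel_classification(
    n_samples=2000, n_features=10, n_classes=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)

# One SVC per label; each one also sees the labels earlier in the chain as features.
chain = ClassifierChain(SVC(), order="random", random_state=0)
model = make_pipeline(StandardScaler(), chain)
model.fit(X_train, y_train)

# Cast to int so the predictions have the same dtype as y_test.
chain_pred = model.predict(X_test).astype(int)

chain_acc = accuracy_score(y_test, chain_pred)
chain_loss = hamming_loss(y_test, chain_pred)
print(chain_acc, chain_loss)
```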

Which hyperparameters should I adjust to improve accuracy?
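
For concreteness, this is the kind of search I have in mind for the KNN case. It is only a sketch: the candidate values in param_grid are arbitrary starting points that I picked, not recommendations, and the smaller n_samples is there just to keep it fast. With scoring="accuracy" the search optimizes the exact-match accuracy.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_multilabel_classification(
    n_samples=2000, n_features=10, n_classes=4, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101
)

# Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Candidate values are arbitrary starting points, not recommendations.
param_grid = {
    "knn__n_neighbors": [3, 5, 11, 21],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)            # best combination found on the grid
print(search.best_score_)             # mean cross-validated exact-match accuracy
print(search.score(X_test, y_test))   # exact-match accuracy on the held-out split
```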
