Changing Keras/Tensorflow “visible devices” makes the model less accurate

I am running a CNN built in Keras/TensorFlow. Because I'm running on my school's remote server, which other people also use, I typically set the "visible devices" to limit the number of GPUs my job uses rather than taking all of them. Here is the code I use to do so:

import tensorflow as tf

def set_gpus(num_gpus=4):
    print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Restrict TensorFlow to the first num_gpus GPUs
        try:
            tf.config.set_visible_devices(gpus[:num_gpus], 'GPU')
            logical_gpus = tf.config.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Visible devices must be set before GPUs have been initialized
            print(e)
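
For reference, the same restriction can also be applied through the CUDA_VISIBLE_DEVICES environment variable before TensorFlow is imported, so the limit is in place before any GPU gets initialized (a minimal sketch; the device indices are just an example):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # expose only the first four GPUs to TensorFlow

import tensorflow as tf
print("Visible GPUs:", tf.config.list_physical_devices('GPU'))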

I discovered that the model performs much better when I have not run set_gpus than when I have. With only 4 GPUs visible, the training accuracy reached 88% after 100 epochs, but the validation and testing accuracy both stayed around 50%, and the per-class F1 scores were about 66% for one class and about 3% for the other. In contrast, when I ran the code without calling set_gpus() first, the validation and testing accuracy were around 70%, and the per-class F1 scores were 56% and 77%.

I've tried changing num_gpus when I call set_gpus, but I still get the same results as with 4 GPUs. Even when I set num_gpus to 8, the total number of GPUs on the server, the results do not change. Does anyone know what could be causing this?
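
The only thing I seed in my code is the random_state of the train/test splits, so run-to-run randomness (weight initialization, shuffling) is not controlled, and some of the difference between the two setups could be noise. A minimal sketch of how that could be pinned down before training, in case it matters (the seed value is arbitrary, and forcing deterministic ops may slow training):

import keras
import tensorflow as tf

keras.utils.set_random_seed(42)                 # seeds the Python, NumPy and TF random generators
tf.config.experimental.enable_op_determinism()  # request deterministic GPU kernels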

My code is running in a Jupyter Notebook with Python 3.12.2, Keras 3.0.5, and TensorFlow 2.16.1. Here is additional code for my CNN model, in case it helps:

import numpy as np
import keras
from keras import layers
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def convolution_encoder(inputs, kernels, kernel_size=[3,3], stride=[1,1], data_format='channels_last'):
    # Batch normalization, then convolution and max pooling
    x = layers.BatchNormalization(axis=-1)(inputs)
    x = layers.Conv2D(filters=kernels, activation='relu', kernel_size=kernel_size, strides=stride, data_format=data_format)(x)
    x = layers.MaxPooling2D(data_format=data_format)(x)
    return x
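
As a quick sanity check, the encoder block can be applied to a dummy input and its output shape printed (the 128×128×1 shape here is only an illustrative example):

dummy = keras.Input(shape=(128, 128, 1))
print(convolution_encoder(dummy, 16, [5, 5], [2, 2]).shape)  # (None, 31, 31, 16) with these example settings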

def build_model_predict(
    input_shape,
    output_dim=2,
    mode='single-channel',
    data_format='channels_last'
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    x = convolution_encoder(x, 16, [5,5], [2,2], data_format=data_format)
    x = convolution_encoder(x, 32, [3,3], [1,1], data_format=data_format)
    x = convolution_encoder(x, 64, [3,3], [1,1], data_format=data_format)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(units=256, activation='sigmoid')(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(output_dim, activation="softmax")(x)
    return keras.Model(inputs, outputs)
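
For reference, the architecture can be inspected before training with summary() (the input shape below is only an example):

model = build_model_predict((128, 128, 1), output_dim=2)
model.summary()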

def run_CNN(x, y, seeded=False, seed=42, epoch_num=50):
    if seeded:
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=seed, stratify=y)
        # Hold out 10% of the training data as a validation set
        x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.10, random_state=seed, stratify=y_train)
    else:
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, stratify=y)
        x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.10, stratify=y_train)
    keras.backend.clear_session()
    model = build_model_predict(
        x_train.shape[1:],
        output_dim=y.shape[-1])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=5e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-08),  # Optimizer
        # Loss function to minimize
        loss=keras.losses.CategoricalFocalCrossentropy(),
        # List of metrics to monitor
        metrics=[keras.metrics.CategoricalAccuracy()],
    )
    print("Fit model on training data")
    history = model.fit(
        x_train,
        y_train,
        batch_size=64,
        epochs=epoch_num,
        # Validation data is passed so validation loss and metrics
        # are monitored at the end of each epoch
        validation_data=(x_val, y_val),
        initial_epoch=0
        #callbacks= callbacks_list
    )
    results = model.evaluate(x_test, y_test, batch_size=128)
    print('Training F1 score by class:')
    y_train_predict = model.predict(x_train)
    score = f1_score(np.argmax(y_train, axis=1), np.argmax(y_train_predict, axis=1), average=None)
    print(score)
    print('Testing F1 score by class:')
    y_predict = model.predict(x_test)
    score = f1_score(np.argmax(y_test, axis=1), np.argmax(y_predict, axis=1), average=None)
    print(score)
    return model

And here is the actual code block that runs the model:

set_gpus()
model = run_CNN(x_stft, y_short, seeded=True, epoch_num=100)
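
For comparison, the better-performing run is identical except that set_gpus() is never called first:

model = run_CNN(x_stft, y_short, seeded=True, epoch_num=100)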
