Unidentified TensorFlow retracing leading to ResourceExhaustedError

My Encoder below is a U-Net CNN that aims to invisibly embed a static watermark onto a cover image.

from tensorflow.keras.layers import Layer, Conv2D, MaxPooling2D

class Encoder(Layer):
    def __init__(self):
        super().__init__()

        self.conv1_1 = Conv2D(32, (3, 3), padding='same', activation='relu')
        self.conv1_2 = Conv2D(32, (3, 3), padding='same', activation='relu')
        self.pool1 = MaxPooling2D(pool_size=(2, 2))

        self.conv2_1 = Conv2D(64, (3, 3), padding='same', activation='relu')
        self.conv2_2 = Conv2D(64, (3, 3), padding='same', activation='relu')
        self.pool2 = MaxPooling2D(pool_size=(2, 2))

        # and so on ...
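
To make the sizes in the encoder excerpt concrete, here is a rough pure-Python sketch (no TensorFlow needed) that traces the output shape, parameter count, and activation size of the six layers shown. It assumes that (400, 560) is (height, width), that the input has 3 RGB channels, and that activations are float32; none of these are stated in the post, and the helper names are made up for illustration.

```python
def conv2d_same(shape, filters, kernel=3):
    """Output shape and parameter count of a stride-1 Conv2D with padding='same'."""
    h, w, c_in = shape
    params = kernel * kernel * c_in * filters + filters  # weights + biases
    return (h, w, filters), params


def max_pool(shape, pool=2):
    """Output shape of MaxPooling2D with pool_size=(2, 2) and default padding."""
    h, w, c = shape
    return (h // pool, w // pool, c)


BATCH = 4                # batch size stated in the post
shape = (400, 560, 3)    # assumption: (height, width) with 3 RGB channels
BYTES_PER_FLOAT32 = 4    # assumption: float32 activations

# Layer names and filter counts are taken from the Encoder excerpt.
layers = [("conv1_1", 32), ("conv1_2", 32), ("pool1", None),
          ("conv2_1", 64), ("conv2_2", 64), ("pool2", None)]

report = []
for name, filters in layers:
    if filters is None:
        shape, params = max_pool(shape), 0
    else:
        shape, params = conv2d_same(shape, filters)
    h, w, c = shape
    activation_bytes = BATCH * h * w * c * BYTES_PER_FLOAT32
    report.append((name, shape, params, activation_bytes))
    print(f"{name}: shape={shape} params={params} "
          f"activations={activation_bytes / 2**20:.1f} MiB")
```

Under these assumptions the first two convolutions each produce about 109 MiB of activations per batch of 4, and these are kept alive for the backward pass. The classifier sees the concatenated batch of 8 images, so its per-layer activations would be twice as large for the same layer shape.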

In the following, Classifier() is a model that combines a Spatial Transformer Network (STN), which rectifies the perspective warp, with a CNN that detects the presence of a watermark (a sigmoid output gives the probability). NoiseAndDistortion() is a layer that applies distortions such as perspective warps and blurs using OpenCV (cv2).

from tensorflow.keras.optimizers import Adam

encoder = Encoder()
classifier = Classifier()
noise_and_distortion = NoiseAndDistortion()
optimizer = Adam(learning_rate=LEARNING_RATE)

And this is an excerpt of my training code:

X_batch = load_batch(current_batch_image_paths, batch_size)
X_batch = tf.convert_to_tensor(X_batch)

with tf.GradientTape() as tape:
    watermarked_images = encoder(X_batch)

    noisy_watermarked_images = apply_distortions(watermarked_images)
    noisy_unmarked_images = apply_distortions(X_batch)

    combined_images = tf.concat([noisy_watermarked_images, noisy_unmarked_images], axis=0)
    combined_labels = tf.concat([tf.ones((BATCH_SIZE, 1)), tf.zeros((BATCH_SIZE, 1))], axis=0)

    classifier_outputs = classifier(combined_images)

    loss = combined_loss(X_batch, watermarked_images, combined_labels, classifier_outputs)

# Compute and apply the gradients outside the tape context so these ops are not recorded.
trainable_variables = encoder.trainable_variables + classifier.trainable_variables
gradients = tape.gradient(loss, trainable_variables)
optimizer.apply_gradients(zip(gradients, trainable_variables))

The apply_distortions() function uses the NoiseAndDistortion() layer above to apply simulated noise to each image in the current batch tensor, while combined_loss() uses TensorFlow's SSIM and BinaryCrossentropy to evaluate the Encoder and the Classifier, respectively.
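
For readers unfamiliar with the classifier half of the loss, the following is a small pure-Python sketch of mean binary cross-entropy on sigmoid probabilities. It is not the combined_loss() from the post (whose SSIM term and weighting are not shown); the function name, the example probabilities, and the eps clipping value are assumptions made for illustration.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-7):
    """Mean binary cross-entropy; eps clipping (assumed value) avoids log(0)."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Labels follow the post: 1 for watermarked images, 0 for unmarked ones.
labels = [1, 1, 0, 0]
confident = binary_cross_entropy(labels, [0.9, 0.8, 0.1, 0.2])  # mostly correct
undecided = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])  # pure guessing
print(f"confident={confident:.4f} undecided={undecided:.4f}")
```

A classifier that outputs 0.5 for everything scores ln 2 (about 0.693), so a loss near that value would suggest the classifier has not yet learned to separate watermarked from unmarked images.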

My image size is (400, 560) with a batch size of 4. Everything runs smoothly on a Google Colab T4 GPU instance from the start through the gradient computation, using about 11.2 of the 15.0 GB of GPU memory. When execution reaches the last line, optimizer.apply_gradients(......), the following warning and error occur:

WARNING:tensorflow:5 out of the last 5 calls to <function _BaseOptimizer._update_step_xla at 0x79d5f0f07370> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.

ResourceExhaustedError: Out of memory while trying to allocate 3670016132 bytes.
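
To put the number in the error message into perspective, this short sketch converts the requested allocation into larger units and compares it with the GPU memory that was still free according to the usage figure quoted above. Whether Colab's "GB" means 10^9 or 2^30 bytes is not stated, so the comparison is only approximate.

```python
requested_bytes = 3_670_016_132   # from the error message
used_gb, total_gb = 11.2, 15.0    # from the GPU usage figure quoted in the post

requested_gib = requested_bytes / 2**30   # binary gibibytes
requested_gb = requested_bytes / 10**9    # decimal gigabytes
whole_mib, extra_bytes = divmod(requested_bytes, 2**20)
headroom_gb = total_gb - used_gb          # memory not yet in use

print(f"requested: {requested_gb:.2f} GB = {requested_gib:.2f} GiB "
      f"({whole_mib} MiB + {extra_bytes} bytes)")
print(f"headroom before the failing call: about {headroom_gb:.1f} GB")
```

The request is a single buffer of 3500 MiB plus 132 bytes, which is close to everything that was still free (about 3.8 GB). A large but finite one-off allocation like this would be enough to trigger the error on its own.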

I tried to solve this with GPT, but that didn't help either. Following the hints in the warning message:

  1. I did not create any @tf.function in a loop;
  2. I'm not sure what this hint refers to, but I'm confident that the shapes are consistent.
  3. I have double-checked: everything used within the tf.GradientTape() context is converted to tensors rather than Python objects.
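
Hints (2) and (3) in the warning both come down to how a traced function decides whether it can reuse an earlier trace. The following is a toy pure-Python model of that rule as described in the guide linked from the warning: tensors are keyed by shape and dtype, plain Python values by the value itself. It is not TensorFlow code; FakeTensor, cache_key, and ToyTracedFunction are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: only shape and dtype matter for the key.
FakeTensor = namedtuple("FakeTensor", ["shape", "dtype"])


def cache_key(args):
    """Key tensors by (shape, dtype) and plain Python values by the value itself."""
    return tuple(
        ("tensor", a.shape, a.dtype) if isinstance(a, FakeTensor) else ("python", a)
        for a in args
    )


class ToyTracedFunction:
    """Toy model of a traced function: one 'trace' per distinct cache key."""

    def __init__(self):
        self.traces = {}

    def __call__(self, *args):
        key = cache_key(args)
        retraced = key not in self.traces
        if retraced:
            self.traces[key] = len(self.traces) + 1
        return retraced  # True when this call needed a new trace


step = ToyTracedFunction()
results = [
    step(FakeTensor((4, 400, 560, 3), "float32")),  # first call: trace
    step(FakeTensor((4, 400, 560, 3), "float32")),  # same shape and dtype: reuse
    step(FakeTensor((2, 400, 560, 3), "float32")),  # hint (2): new shape, retrace
    step(0.001),                                    # hint (3): Python float, trace
    step(0.002),                                    # hint (3): new value, retrace
]
print(results, len(step.traces))
```

In this model, five calls produce four traces. The warning in the post names _BaseOptimizer._update_step_xla, so the calls being counted are inside the optimizer rather than in the training code shown, which may be why checking the inputs to the tf.GradientTape() block turns up nothing.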

I believe something is wrong that sends the execution into an internal infinite loop (judging by the fact that it tries to allocate 3670016132 bytes). Please bear in mind that this is my first time working with TF and Keras, so I'm sure there are many mistakes here. I think this should be enough to find the problem, but if I can provide anything else that would be useful, please let me know.

I would really appreciate any help troubleshooting this. Thanks in advance.
