How can the dimensions of a Conv2D layer be calculated?


I’m trying to understand the dimensions of the output of the Generator of my GAN. The dimensions of the result after each layer are as follows:

Start: torch.Size([128, 74, 1, 1])  
After block1: torch.Size([128, 256, 3, 3])  
After block2: torch.Size([128, 128, 6, 6])  
After block3: torch.Size([128, 64, 13, 13])  
After block4: torch.Size([128, 1, 28, 28])

The Generator code is below. Here z_dim is 74: the original noise dimension of 64 is concatenated with 10 one-hot class labels, as shown below.

fake_noise = get_noise(cur_batch_size, z_dim, device=device) 
noise_and_labels = combine_vectors(fake_noise, one_hot_labels)
fake = gen(noise_and_labels)
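
combine_vectors is not shown here; a minimal sketch of what it is assumed to do (concatenating the noise and one-hot labels along dim=1) would be:

import torch

def combine_vectors(x, y):
    # Assumed helper: concatenate the (batch, 64) noise tensor with the
    # (batch, 10) one-hot label tensor along dim=1, yielding (batch, 74).
    return torch.cat((x.float(), y.float()), dim=1)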
class Generator(nn.Module):
    '''
    Generator Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        im_chan: the number of channels of the output image, a scalar
              (MNIST is black-and-white, so 1 channel is your default)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, z_dim=10, im_chan=1, hidden_dim=64):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        # Build the neural network
        self.block1 = self.make_gen_block(z_dim, hidden_dim * 4)
        self.block2 = self.make_gen_block(hidden_dim * 4, hidden_dim * 2, kernel_size=4, stride=1)
        self.block3 = self.make_gen_block(hidden_dim * 2, hidden_dim)
        self.block4 = self.make_gen_block(hidden_dim, im_chan, kernel_size=4, final_layer=True)

    def make_gen_block(self, input_channels, output_channels, kernel_size=3, stride=2, padding=1, final_layer=False):
        '''
        Function to return a sequence of operations corresponding to a generator block of DCGAN;
        a transposed convolution, a batchnorm (except in the final layer), and an activation.
        Parameters:
            input_channels: how many channels the input feature representation has
            output_channels: how many channels the output feature representation should have
            kernel_size: the size of each convolutional filter, equivalent to (kernel_size, kernel_size)
            stride: the stride of the convolution
            final_layer: a boolean, true if it is the final layer and false otherwise
                      (affects activation and batchnorm)
        '''
        if not final_layer:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.BatchNorm2d(output_channels),
                nn.LeakyReLU(0.2, inplace=True),
            )
        else:
            return nn.Sequential(
                nn.ConvTranspose2d(input_channels, output_channels, kernel_size, stride),
                nn.Tanh(),
            )

    def forward(self, noise):
        '''
        Function for completing a forward pass of the generator: Given a noise tensor,
        returns generated images.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, input_dim)
        '''
        x = noise.view(len(noise), self.z_dim, 1, 1)
        print(f'Gen: {x.shape}')
        x = self.block1(x)
        print(f'After block1: {x.shape}')
        x = self.block2(x)
        print(f'After block2: {x.shape}')
        x = self.block3(x)
        print(f'After block3: {x.shape}')
        x = self.block4(x)
        print(f'After block4: {x.shape}')
        return x

def get_noise(n_samples, z_dim, device='cpu'):
    '''
    Function for creating noise vectors: Given the dimensions (n_samples, z_dim)
    creates a tensor of that shape filled with random numbers from the normal distribution.
    Parameters:
      n_samples: the number of samples to generate, a scalar
      z_dim: the dimension of the noise vector, a scalar
      device: the device type
    '''
    return torch.randn(n_samples, z_dim, device=device)

According to the formula here, the result after the first block should be (1 + 2·0 − 1·(3 − 1) − 1)/2 + 1 = 0, but it shows 3×3. What am I doing wrong here?

Short answer: Transposed convolutions and (regular) convolutions are not the same thing, so equations to determine the output shape for the latter do not apply to the former.

Long answer

You are asking for the shape of a Conv2d result, and you are using the equation from the torch.nn.Conv2d documentation. However, your code is using transposed convolutions (sometimes also misleadingly called “deconvolutions”), namely torch.nn.ConvTranspose2d layers, which are a whole different thing from (regular) convolutions. See, for example, here for a visual demonstration of various kinds of convolutions, including regular and transposed convolutions.
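
For reference, the output-height equation for a regular Conv2d (the one you applied) is

H_out = floor((H_in + 2 · padding[0] - dilation[0] · (kernel_size[0] - 1) - 1) / stride[0] + 1)

which comes from the torch.nn.Conv2d documentation and simply does not describe ConvTranspose2d layers.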

Using the equations for the output shapes provided in the ConvTranspose2d documentation, you have

H_out = (H_in - 1) · stride[0] - 2 · padding[0] +
        dilation[0] · (kernel_size[0] - 1) +
        output_padding[0] + 1

Thus, for the first block, with H_in=1, stride=2, padding=0¹,
dilation=1, kernel_size=3, output_padding=0, you will have

H_out = (1 - 1) · 2 - 2 · 0 + 1 · (3 - 1) + 0 + 1
      = 0 - 0 + 2 + 0 + 1
      = 3

which is exactly what you saw. (The same equation applies to the width accordingly.)
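
If you want to verify this empirically, a minimal sketch that reproduces the block-by-block spatial sizes by instantiating the same ConvTranspose2d layers (batchnorm and the activations are omitted, since they do not change the shape):

import torch
import torch.nn as nn

x = torch.randn(128, 74, 1, 1)
# Same (in_channels, out_channels, kernel_size, stride) as the four blocks above;
# padding is left at ConvTranspose2d's default of 0 (see the footnote below).
layers = [
    nn.ConvTranspose2d(74, 256, kernel_size=3, stride=2),   # -> [128, 256, 3, 3]
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=1),  # -> [128, 128, 6, 6]
    nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2),   # -> [128, 64, 13, 13]
    nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2),     # -> [128, 1, 28, 28]
]
for layer in layers:
    x = layer(x)
    print(x.shape)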

Conceptually, the layer-by-layer increase in spatial size through transposed convolutions is exactly what you want here, given that it is the role of the generator in a GAN to produce image-sized samples from lower-dimensional input noise.

¹) Note that the padding=1 argument in make_gen_block()’s signature is never passed on to ConvTranspose2d in the method’s body (in fact, it is not used at all), so ConvTranspose2d’s default of padding=0 is applied instead.
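
For comparison, a minimal sketch of what the first block would produce if padding=1 were actually forwarded to ConvTranspose2d:

import torch
import torch.nn as nn

x = torch.randn(128, 74, 1, 1)
# With padding=1 the same layer yields H_out = (1 - 1)·2 - 2·1 + 1·(3 - 1) + 0 + 1 = 1,
# i.e. a 1x1 output instead of the 3x3 observed above.
layer = nn.ConvTranspose2d(74, 256, kernel_size=3, stride=2, padding=1)
print(layer(x).shape)  # torch.Size([128, 256, 1, 1])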


