#### PyTorch Gaussian random weight initialization and L2-normalization

I have a linear/fully-connected torch layer which accepts a `latent_dim`-dimensional input. The number of neurons in this layer is `height * width`:

```
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define hyper-parameters for the current layer
height = 20
width = 20
latent_dim = 128

# Initialize the linear layer's weight matrix: one row per neuron
linear_wts = nn.Parameter(data = torch.empty(height * width, latent_dim), requires_grad = True)

'''
torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)

Fills the input Tensor with values drawn from the normal distribution
N(mean, std^2)
'''
nn.init.normal_(tensor = linear_wts, mean = 0.0, std = 1 / np.sqrt(latent_dim))

print(f'1/sqrt(d) = {1 / np.sqrt(latent_dim):.4f}')

print(f'Random wts; min = {linear_wts.min().item():.4f} &'
      f' max = {linear_wts.max().item():.4f}')
print(f'Random wts; mean = {linear_wts.mean().item():.4f} &'
      f' std-dev = {linear_wts.std().item():.4f}')

# 1/sqrt(d) = 0.0884
# Random wts; min = -0.4051 & max = 0.3483
# Random wts; mean = 0.0000 & std-dev = 0.0880
```
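
For reference, applying these weights as the layer's forward pass would look like the sketch below (the input name `x` and the batch size of 4 are assumptions for illustration, not part of the original code):

```
# Hypothetical input batch of shape (batch_size, latent_dim)
x = torch.randn(4, latent_dim)

# F.linear computes x @ linear_wts.T -> shape (batch_size, height * width)
out = F.linear(input = x, weight = linear_wts)
print(out.shape)  # torch.Size([4, 400])
```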

Question-1: For a std-dev of approximately 0.0884, the minimum and maximum values of -0.4051 and 0.3483 correspond to roughly -4.58 and +3.94 standard deviations from the mean of 0. Is this a correct understanding? I was assuming that the weights are sampled from within +3 and -3 std-dev of the mean.
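
For context, `nn.init.normal_` samples from an unbounded Gaussian, so with `400 * 128 = 51,200` draws, values beyond 3 std-dev are expected: the maximum |z|-score of n standard-normal samples grows roughly like sqrt(2 ln n), which is about 4.7 here. If a hard cut-off is wanted, `nn.init.trunc_normal_` clips samples to an interval `[a, b]`. A minimal sketch contrasting the two, reusing the shapes from above:

```
torch.manual_seed(0)  # assumed seed, for reproducibility only
std = 1 / np.sqrt(latent_dim)

# Unbounded Gaussian: extreme |z|-scores grow with the number of draws
t = torch.empty(height * width, latent_dim)
nn.init.normal_(t, mean = 0.0, std = std)
print((t / std).abs().max().item())  # typically > 4 for ~51k draws

# Truncated Gaussian: values hard-clipped to [a, b], here +/- 3 std-dev
u = torch.empty(height * width, latent_dim)
nn.init.trunc_normal_(u, mean = 0.0, std = std, a = -3 * std, b = 3 * std)
print((u / std).abs().max().item())  # <= 3 by construction
```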

Question-2: I want the output of this linear layer to be L2-normalized, such that it lies on a unit hyper-sphere. For that, there seem to be two options:

1. Perform a one-time action of `linear_wts.data.copy_(F.normalize(input = linear_wts.data, p = 2.0, dim = 1))` and then train as usual
2. Get the output of the layer as `F.relu(F.linear(x, linear_wts))` and then perform L2-normalization at each train step: `F.normalize(input = F.relu(F.linear(x, linear_wts)), p = 2.0, dim = 1)`

I think that option 2 is more correct. Thoughts? A sketch contrasting both options follows below.
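
A minimal sketch of the two options, reusing the dummy input `x` from above and treating `linear_wts` as the layer's weight matrix; this is an illustration under those assumptions, not a definitive recipe:

```
# Option 1: one-time re-normalization of the weight rows, then ordinary training.
# Nothing keeps the weights (or the layer's outputs) on the unit sphere afterwards.
with torch.no_grad():
    linear_wts.copy_(F.normalize(input = linear_wts, p = 2.0, dim = 1))

# Option 2: re-project the activations onto the unit hyper-sphere at every
# forward pass, so the constraint holds at each training step.
x = torch.randn(4, latent_dim)
out = F.normalize(input = F.relu(F.linear(x, linear_wts)), p = 2.0, dim = 1)
print(out.norm(p = 2, dim = 1))  # ~1.0 for every row with a nonzero activation
```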
