I followed this link and wrote the code below. As you can see, my code also stores the intermediate outputs, and I want higher-order derivatives (order = 2, 3) of one_hot with respect to them. But I'm getting:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

However, I clearly construct the graph using one_hot, so the differentiated tensors should be in the graph. Since I want higher-order derivatives of one_hot with respect to all the intermediate outputs, is there an efficient way to compute them with as little computation as possible, rather than computing them for each intermediate output separately (for example, reusing the derivatives of interme_outs[3] when computing the derivatives of interme_outs[0])? Is there also an option to skip the cross-derivatives? Any help is highly appreciated. Thanks in advance.

import torch
import torch.nn as nn
import numpy as np
from torch.autograd import grad

class Model(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear1 = nn.Linear(10,20)
        self.linear2 = nn.Linear(20,30)
        self.linear3 = nn.Linear(30,40)
        self.linear4 = nn.Linear(40,50)
        self.interme_outs = [[],[],[],[]]

    def save(self,x,i):
        self.interme_outs[i] = x

    def get(self,i):
        return self.interme_outs[i]

    def forward(self,x):
        x = self.linear1(x)
        self.save(x,0)
        x = self.linear2(x)
        self.save(x,1)
        x = self.linear3(x)
        self.save(x,2)
        x = self.linear4(x)
        self.save(x,3)
        return x

model = Model()
inputs = torch.rand(1,10)
output = model(inputs)

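# Build a one-hot mask that selects the top-scoring class, then reduce
# the masked output to a scalar that can be differentiated.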
index = np.argmax(output.detach().cpu().numpy(), axis=-1)
b = output.size()[0]
one_hot = np.zeros((b, output.size()[-1]), dtype=np.float32)
one_hot[np.arange(b), index] = 1
one_hot = torch.from_numpy(one_hot).to(output.device).requires_grad_(True)
one_hot = torch.sum(one_hot * output)

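# Repeatedly differentiate the scalar w.r.t. interme_outs[0], reducing the
# gradient back to a scalar each time, to get higher-order derivatives.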
for i in range(3):
    H = grad(one_hot,model.interme_outs[0],create_graph=True)[0]
    one_hot = H.sum()
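
To make the second part of my question concrete, here is a rough sketch of what I am hoping for instead of the loop above (the names grads, first_order_sum, and second_order are just illustrative, and I have not verified that this sidesteps the error): a single torch.autograd.grad call that takes all four intermediate outputs at once so the backward pass is shared, with allow_unused=True so tensors that drop out of the graph come back as None instead of raising.

# Gradients of the scalar w.r.t. every stored intermediate output in one call,
# sharing a single backward pass instead of four separate ones.
grads = grad(one_hot, model.interme_outs, create_graph=True, allow_unused=True)

# Next order: differentiate a scalar reduction of the first-order gradients
# w.r.t. the same intermediate outputs. With allow_unused=True, any tensor
# that no longer appears in the new graph comes back as None instead of
# triggering the RuntimeError.
first_order_sum = sum(g.sum() for g in grads if g is not None)
second_order = grad(first_order_sum, model.interme_outs,
                    create_graph=True, allow_unused=True)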
