PyTorch Basics

PyTorch is a computing library with a focus on deep learning. It provides three major submodules that make deep learning easy:

  • A high-performance tensor computing package, torch.Tensor
  • A computational graph that is built on the fly as you carry out your computations, torch.autograd
  • Classes that package computations into modules and collect their parameters hierarchically, torch.nn

Tensor computing with torch.Tensor

Tensors can be created from numpy arrays:

>>> import numpy as np
>>> import torch
>>> a = torch.from_numpy(np.random.randn(10,10,10,10))
>>> b = torch.DoubleTensor(np.random.randn(10,10,10,10))
>>> (a + b).mean()
-0.006650402067610719
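
Note that torch.from_numpy does not copy the data: the tensor and the numpy array share the same memory, so an in-place change to one is visible in the other:

>>> arr = np.zeros(3)
>>> t = torch.from_numpy(arr)
>>> t += 1.0 # an in-place change to the tensor ...
>>> arr      # ... is visible in the numpy array
array([1., 1., 1.])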

Automatic differentiation with torch.autograd

>>> a = torch.autograd.Variable(torch.from_numpy(np.random.randn(10,10,10,10)), requires_grad=True)
>>> b = torch.autograd.Variable(torch.from_numpy(np.random.randn(10,10,10,10)), requires_grad=True)
>>> c = (a + b).mean()
>>> c.backward(retain_graph=True) # adds the backpropagated gradient to the gradient buffers
>>> a.grad
# tensor filled with 1/10000: each of the 10*10*10*10 elements contributes 1/N to the mean
>>> b.grad
# tensor filled with 1/10000
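
Since backward() accumulates into the gradient buffers instead of overwriting them, a second backward pass adds another 1/10000 to each entry; the buffers have to be reset in place before an independent pass (a short sketch):

>>> c.backward(retain_graph=True) # the second pass accumulates on top of the first
>>> a.grad
# tensor filled with 2/10000
>>> a.grad.data.zero_() # reset the buffer in place

(On newer PyTorch versions, a tensor created with requires_grad=True can be used directly in place of the Variable wrapper.)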

Model building with torch.nn
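
torch.nn packages computations into torch.nn.Module subclasses: submodules and parameters assigned as attributes are registered automatically and can be collected hierarchically. A minimal sketch (an illustrative model, not part of convis):

import torch
import torch.nn.functional as F

class TwoLayerNet(torch.nn.Module):
    def __init__(self, n_in=100, n_hidden=50, n_out=10):
        super(TwoLayerNet, self).__init__()
        # assigning submodules registers their parameters with this module
        self.lin1 = torch.nn.Linear(n_in, n_hidden)
        self.lin2 = torch.nn.Linear(n_hidden, n_out)
    def forward(self, x):
        return self.lin2(F.relu(self.lin1(x)))

net = TwoLayerNet()
list(net.parameters()) # the parameters of both Linear layers, collected hierarchically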

PyTorch Extensions in Convis

Layer

Layers are extensions of `torch.nn.Module`s. They behave very similarly, but have a few additional features:

  • a Layer knows if it accepts 1d, 3d, or 5d time sequence input and can broadcast the input accordingly if it has too few dimensions
  • instead of running the model on the complete time series, the input can be chunked automatically by using the .run(.., dt=chunk_length) method instead of calling the Layer directly (see the sketch after this list)
  • a Layer can create its own optimizer
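
As a sketch of how this looks in practice (assuming the convis.filters.Conv3d Layer mentioned at the end of this section, with a torch.nn.Conv3d-like signature):

import numpy as np
import convis

model = convis.filters.Conv3d(1, 1, (10, 1, 1)) # a Layer wrapping a 3d convolution
inp = np.random.randn(200, 20, 20)              # 3d input is broadcast to 5d automatically
o = model.run(inp, dt=50)                       # processed in chunks of 50 frames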

Output

The Output class collects the outputs of a Layer, so that Layers with more than one output can still return a single object.
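
As a simplified stand-in (not the actual convis implementation), such a container might hold the output tensors and expose them by index:

class SimplifiedOutput(object):
    """Simplified stand-in for convis' Output: holds several output tensors."""
    def __init__(self, outs):
        self.outs = list(outs)
    def __getitem__(self, i):
        return self.outs[i]
    def __len__(self):
        return len(self.outs)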

Extending Conv3d

To make the differences between convis and PyTorch apparent, we will first implement a custom convolution Layer that wraps the PyTorch 3d convolution.

To create an output that has the same shape as the input, we need to pad the input at both sides of the x and y dimensions, with either a constant, a mirror, or a replicating border condition, and we need to remember the last slices of the previous input, so that we can continuously take in input without losing frames between chunks.
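
Note that torch.nn.functional.pad lists the padding for the last dimension first, so for a 5d (batch, channel, time, x, y) input the six entries are (y_left, y_right, x_left, x_right, time_left, time_right); a quick check:

>>> import torch
>>> x = torch.rand(1, 1, 4, 2, 2) # (batch, channel, time, x, y)
>>> torch.nn.functional.pad(x, (1, 1, 1, 1, 0, 0), 'replicate').shape
torch.Size([1, 1, 4, 4, 4])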

So what we want the layer to do in its forward pass is:

def forward(self, x):
    if not ... :
        # for the case that we have no input state yet
        # or the input state does not match the shape of x
        self.input_state = torch.autograd.Variable(torch.zeros(...))
        # using eg. the first slice of the input initially
        self.input_state[:,:,-self.filter_length:,:,:] = x[:,:,:self.filter_length,:,:]
    x_pad = torch.cat([self.input_state, x], dim=2) # input padded in time
    self.input_state = x_pad[:,:,-(self.filter_length):,:,:]
    # finally, pad the x and y dimensions
    x_pad = torch.nn.functional.pad(x_pad, self.kernel_padding, 'replicate')
    return self.conv(x_pad)

A full implementation can look something like this:

import math
import numpy as np
import torch
import convis

TIME_DIMENSION = 2 # in the (batch, channel, time, x, y) layout, time is dimension 2

class MyMemoryConv(convis.Layer):
    def __init__(self,in_channels=1,out_channels=1,kernel_dim=(1,1,1), bias = False):
        self.dim = 5
        self.autopad = True
        super(MyMemoryConv, self).__init__()
        self.conv = torch.nn.Conv3d(in_channels, out_channels, kernel_dim, bias = bias)
        self.input_state = None
    @property
    def filter_length(self):
        """The length of the filter in time"""
        return self.conv.weight.data.shape[2] - 1
    @property
    def kernel_padding(self):
        """The x and y dimension padding"""
        k = np.array(self.conv.weight.data.shape[2:])
        return (int(math.floor(k[2]/2.0))-1,
                int(math.ceil(k[2]/2.0)),
                int(math.floor(k[1]/2.0))-1,
                int(math.ceil(k[1]/2.0)),
                0,0)
    def set_weight(self,w,normalize=False):
        if type(w) in [int,float]:
            self.conv.weight.data = torch.ones(self.conv.weight.data.shape) * w
        else:
            if len(w.shape) == 1:
                w = w[None,None,:,None,None]
            if len(w.shape) == 2:
                w = w[None,None,None,:,:]
            if len(w.shape) == 3:
                w = w[None,None,:,:,:]
            self.conv.weight.data = torch.Tensor(w)
            self.conv.kernel_size = self.conv.weight.data.shape[2:]
        if normalize:
            self.conv.weight.data = self.conv.weight.data / self.conv.weight.data.sum()
    def forward(self, x):
        if (self.input_state is None or
               self.input_state.size()[:2] != x.size()[:2] or
               self.input_state.size()[-2:] != x.size()[-2:]):
            # no state yet, or batch/channel or spatial dimensions changed:
            # start from the current input as its own history
            self.input_state = x.detach()
        # prepend the remembered end of the previous chunk along the time dimension
        if self.filter_length > 0:
            if self._use_cuda:
                x_pad = torch.cat([self.input_state[:,:,-(self.filter_length):,:,:].cuda(), x.cuda()], dim=TIME_DIMENSION)
                self.conv.cuda()
            else:
                x_pad = torch.cat([self.input_state[:,:,-(self.filter_length):,:,:], x], dim=TIME_DIMENSION)
        else:
            x_pad = x
        self.input_state = x.detach() # remember this chunk for the next call
        x_pad = torch.nn.functional.pad(x_pad, self.kernel_padding, 'replicate')
        return self.conv(x_pad)

Now this convolution layer already does most of the hard work of padding the input and remembering a state between chunks. A similar Layer is already implemented in convis under convis.filters.
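
As a quick (hypothetical) usage sketch of the class above, chunked processing matches processing the whole sequence except where the initial state differs; the chunk size of 20 frames is arbitrary:

conv = MyMemoryConv(kernel_dim=(5, 3, 3))
conv.set_weight(1.0, normalize=True)

inp = torch.rand(1, 1, 100, 10, 10)
out_whole = conv(inp)

conv.input_state = None # forget the state so both runs start fresh
chunks = [conv(inp[:, :, t:t+20, :, :]) for t in range(0, 100, 20)]
out_chunks = torch.cat(chunks, dim=TIME_DIMENSION)
# apart from the first filter_length output frames, which depend on
# the initial state, out_whole and out_chunks agree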