
Due Friday, Feb. 16 at 11:59 pm EST on Gradescope. Please follow the general guidelines regarding homework assignments.

1. Revealing feedforward architecture by a topological sort.

Consider a neural network with the synaptic weight matrix \[ \begin{pmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix} \] We follow the convention that \(W_{ij}\) denotes the strength of the \(i\leftarrow j\) connection, and a strength of \(0\) means there is no connection. Show that this is a feedforward net.

  • Show it by drawing a diagram of the network graph.
  • Show it by sorting the neurons, i.e., renumbering them so that all nonzero connections go to a neuron with a higher number. In other words, any nonzero connection \(i\leftarrow j\) should satisfy \(i>j\) after renumbering. (Computer scientists call this a topological sort; a sketch for checking such an ordering appears after this list.)
  • Write the weight matrix after the sort.
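
Not required for submission, but if you want to sanity-check an ordering, here is a minimal NumPy sketch; the permutation order below is a placeholder for the one you find:

    import numpy as np

    # Weight matrix from the problem; W[i, j] is the strength of the
    # i <- j connection (0-indexed here, vs. 1-indexed in the problem).
    W = np.array([[0, 1, 1, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 0],
                  [0, 1, 1, 0]])

    # Placeholder renumbering: order[k] is the old index of the neuron that
    # receives new number k. Replace it with the ordering you find.
    order = [0, 1, 2, 3]

    W_sorted = W[np.ix_(order, order)]  # permute rows and columns together

    # Feedforward under this ordering iff every nonzero entry lies strictly
    # below the diagonal, i.e., every connection i <- j has i > j.
    print(np.all(np.triu(W_sorted) == 0))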

2. Representational power of multilayer perceptrons.

Construct a multilayer perceptron (MLP) that computes the sum of three binary inputs \(x_{1}\), \(x_{2}\), \(x_{3}\). The two MLP outputs \(y_{1}\) and \(y_{2}\) should represent the two bits of the sum written as a binary number, i.e., the possible sums 0, 1, 2, 3 written as 00, 01, 10, 11. All neurons in the network should have binary outputs, i.e., use the Heaviside step function. Explain briefly and precisely why your construction works.
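
Not required for the write-up, but if you want to test a construction numerically, here is a minimal NumPy sketch of a single Heaviside-step neuron; the example weights are placeholders, not the solution:

    import numpy as np

    def step(z):
        # Heaviside step function: 1 if z >= 0, else 0.
        return int(z >= 0)

    def neuron(x, w, b):
        # Binary output of one unit: step(w . x + b).
        return step(np.dot(w, x) + b)

    # Placeholder example (not the answer): a unit that fires iff all
    # three binary inputs are on.
    print(neuron(np.array([1, 0, 1]), np.array([1, 1, 1]), -3))  # prints 0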

3. Classifying MNIST images with a multilayer perceptron.

Work through the MultilayerPerceptron notebook. The code trains a two-layer perceptron with 10 output neurons, and the learning curves show moving averages over the last 50 examples. The weight vectors of the first layer start out random, come to resemble combinations of digits early in training, and then become harder to interpret. You can also watch the hidden layer activity and delta.

The gradients are computed auto-magically by PyTorch. nn.Linear includes a default initialization of the parameters (weights and biases). Uncomment the following code block, which overrides the default initialization by zeroing out all parameters:

    with torch.no_grad():  # don't record this in-place change in autograd
        for p in model.parameters():
            p *= 0  # zero every weight and bias in both layers

Rerun the notebook. You will notice that almost no learning takes place. Submit your answers to the following questions. Be concise but also precise.

  • Which parameters are learning, and which are not?
  • Explain why this is the case.
  • What happens if you initialize only the second layer parameters to zero? Why?
  • What happens if you initialize only the first layer parameters to zero? Why? (A sketch for zeroing a single layer follows this list.)
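
For the last two questions, you can zero a single layer rather than the whole model. A sketch, assuming the notebook exposes the two nn.Linear modules as attributes (the name layer2 below is hypothetical; substitute whatever name the notebook actually defines):

    with torch.no_grad():
        # 'layer2' is a hypothetical attribute name for the second nn.Linear.
        for p in model.layer2.parameters():
            p *= 0  # zero only the second-layer weights and biases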

4. Hand-designed convolutional nets.

Open the ConvolutionBasics notebook, and work through it. Do the exercise at the end and submit your edited notebook.
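
For extra intuition about what hand-designed means here, the following minimal PyTorch sketch applies a hand-picked 3×3 Laplacian kernel, a simple edge detector, with conv2d; the random image is just a stand-in for real input:

    import torch
    import torch.nn.functional as F

    # Hand-designed 3x3 kernel: the discrete Laplacian, which responds to edges.
    kernel = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).reshape(1, 1, 3, 3)

    # Stand-in input: one grayscale image with batch and channel dimensions.
    image = torch.rand(1, 1, 28, 28)

    edges = F.conv2d(image, kernel, padding=1)  # padding keeps the spatial size
    print(edges.shape)  # torch.Size([1, 1, 28, 28])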

Additional practice not for submission.

Run the LeNet notebook and watch the insides of a convolutional net change as it learns. The end of the MultilayerPerceptron notebook mentions three ways of improving the code.

  • You can implement minibatch training in the MultilayerPerceptron notebook, following the example in the LeNet notebook (a minimal sketch appears after this list).
  • You can try the other two ways either with the MLP or with LeNet.
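
For the first item, here is a minimal sketch of a minibatch training loop; the synthetic data and tiny model are stand-ins for the notebook's MNIST data and MLP (the LeNet notebook shows the actual version used in the course):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Synthetic stand-ins so the sketch runs on its own; in the notebook,
    # use its MNIST dataset, model, and loss function instead.
    data = TensorDataset(torch.rand(256, 784), torch.randint(0, 10, (256,)))
    model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
    loss_fn = nn.CrossEntropyLoss()

    loader = DataLoader(data, batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for images, labels in loader:
        optimizer.zero_grad()  # clear gradients from the previous minibatch
        loss = loss_fn(model(images), labels)
        loss.backward()        # gradients are averaged over the minibatch
        optimizer.step()       # one parameter update per minibatch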

This last practice is for your own education and will not be graded.