Homework 3

Table of contents

  1. November 27 Update
  2. Starter Code and Assignment File
  3. Preparation: Setting up AWS
  4. Submission

This homework is due on Thursday Nov 30 2023 at 2pm.

November 27 Update

The starter code, assignment file and latex template have been updated. Here is a summary of the main changes:

  • Added generate.py to the starter code. You will need to complete this file for Question 5.0 in the assignment.
  • The expected submission format has changed (see assigment file for details).
  • Updated latex template to match the latest assignment file.

Starter Code and Assignment File

The starter code for this homework is available here.
The assignment file is available here.
The latex template for this assignment is available here.

Preparation: Setting up AWS

To complete this homework, you will need access to GPU resources for training models.

The TAs have compiled step-by-step setup instructions with screenshots here: AWS setup guide


For homework 3, we will be using Gradescope for code and written answers and Canvas for model checkpoints and data. Please use the link from the Canvas coursepage.
Two assignments have been created on Gradescope: Homework 3 - PDF and Homework 3 - Code.

Homework 3 - PDF will be manually graded by the instructors.
Please submit your a pdf file [andrew-id].pdf with answers to the written questions, and annotate the locations of each question’s solution in your PDF.

Homework 3 - Code contains an autograder that will grade your code with the same unit tests provided in the homework.
We will be evaluating code in Section 4 of the assignment through the autograder in Gradescope, but you should submit all the code that you used to complete the assignment.

For the autograder unit tests, please submit the following files on Gradescope with these exact filenames. Do not place your files in a directory, as the tests will not run properly.

  • cleaning.py
  • tokenization.py

We will also be manually grading and inspecting other parts of your code. To make manual grading possible, please submit the following (also on Gradescope):

  • model.py (if using your HW2 code, include the implementation of your model)
  • config.yaml (if using your HW2 code, include the config file to load your model)
  • generate.py
  • test_other.py (your custom test cases for Q3.3.4)
  • other_scripts.zip (optional, but if other scripts are used, please zip and include a README indicating which scripts correspond to each question)

Since Gradescope has a 100MB size limit, we will submit model checkpoints and code through Canvas: Homework 3 - Model Checkpoints and Data. Please submit these files to Canvas, with these exact filenames:

  • cleaned_data.arrow.zip
  • model-original.pt (weights from your model trained on uncleaned data)
  • model-clean.pt (weights from your model trained on cleaned data)