Important Details

  • Location: Baker Hall A51
  • Time: Tuesdays and Thursdays 2 PM - 3:20 PM
  • Instructor email: llms-11-667 @

Course Description

Large Language Models Methods and Applications (11-667) is a graduate-level course that aims to provide a holistic view of the current state of large language models. The first half of this course starts with the basics of language models, including network architectures, training, inference, and evaluation. It then discusses the interpretation (or attempts thereof), alignment, and emergent capabilities of large language models, followed by their popular applications in language tasks and new uses beyond text. In the second half, this course first presents techniques for scaling up language model pretraining and recent approaches for making the pretraining and deployment of large models more efficient. It then discusses various concerns surrounding the deployment of large language models and wraps up with the challenges and frontiers of LLM development.

This course is designed to give graduate-level students an overview of the techniques behind LLMs and a thorough grounding in the fundamentals and cutting-edge developments of LLMs, preparing them for further research or applied work in this new AI era.

Learning Goals

Students who successfully complete this course will be able to:

  • Compare and contrast different models in the LLM ecosystem in order to determine the best model for a given task.
  • Implement and train a neural language model from scratch in PyTorch.
  • Use open-source libraries to fine-tune and run inference with popular pre-trained language models.
  • Understand how to apply LLMs to a variety of downstream applications, and how decisions made during pre-training affect suitability for these tasks.
  • Read and comprehend recent academic papers on LLMs and know the common terms used in them (alignment, scaling laws, RLHF, prompt engineering, instruction tuning, etc.).
  • Design new methodologies to leverage existing large-scale language models in novel ways.
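To make the "train a neural language model from scratch" goal concrete, here is a minimal sketch of the core training loop: a character-level bigram language model fit by gradient descent on cross-entropy loss. It is written in dependency-free Python purely for illustration; the course homeworks use PyTorch, where autograd replaces the hand-derived gradient. All names here (the logit table `W`, the toy corpus, the learning rate) are illustrative choices, not course-specified values.

```python
# Minimal character-level bigram language model trained from scratch.
# The model is a V x V logit table: W[i][j] scores character j following
# character i. Training minimizes average cross-entropy over all adjacent
# character pairs in a toy corpus, with the softmax gradient (p - onehot)
# derived by hand instead of autograd.
import math
import random

text = "hello world hello world "
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
V = len(vocab)

random.seed(0)
W = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(V)]

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

# Training data: every (current char, next char) pair in the corpus.
pairs = [(stoi[a], stoi[b]) for a, b in zip(text, text[1:])]

def avg_loss():
    # Average negative log-likelihood of the true next character.
    total = 0.0
    for i, j in pairs:
        total -= math.log(softmax(W[i])[j])
    return total / len(pairs)

loss_before = avg_loss()
lr = 0.5
for _ in range(200):                  # epochs of per-pair gradient descent
    for i, j in pairs:
        p = softmax(W[i])
        for k in range(V):
            # dL/dW[i][k] = p[k] - 1[k == j], averaged over the dataset
            W[i][k] -= lr * (p[k] - (1.0 if k == j else 0.0)) / len(pairs)
loss_after = avg_loss()
```

After training, `loss_after` is well below the initial near-uniform loss of about ln(V), approaching the conditional entropy of the bigram distribution. The PyTorch version covered in the course swaps the table for an `nn.Embedding` plus deeper layers and lets `loss.backward()` compute the same gradients.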


Prerequisites

Students should have a basic understanding of machine learning, equivalent to the material covered by 10-301/10-601, and be familiar with concepts in natural language processing, equivalent to those covered by 11-411/11-611.

Students are expected to be fluent in Python. Familiarity with deep learning frameworks such as PyTorch will also be helpful.

Class Format

Classes will be in person, every Tuesday and Thursday, 2:00 PM - 3:20 PM, at Baker Hall A51.

Readings: There will be reading materials for each lecture, which students are required to read before class.

Quizzes: Each class will start with an in-person quiz on the reading materials for the lecture or on material from previous lectures.

Interactive Activities: There will be ungraded, interactive activities interspersed through the lectures. These will be things like discussing a topic from the class with those sitting near you or answering questions via polling software.

Homework: There will be six homework assignments, to be completed individually.

Exams: There will be a midterm exam and a final exam.


Grading

  • 60%: Homeworks
    • Each homework is worth 10% of your grade.
  • 20%: Midterm exam
    • Date TBD
  • 20%: Final exam
    • Date TBD

Late Policy

Each student has six free late days to use across the six homeworks. If you are out of late days, you will not receive credit for subsequent late homeworks. One “day” is defined as anything between 1 second and 24 hours after the homework deadline. The intent of the late-day policy is to allow you to take extra time due to unforeseen circumstances like illness. To use your late days on a homework, you MUST fill out this form.

In the event of a medical emergency, please make your personal health, physical and mental, your first priority. Seek help from medical and care providers such as University Health Services. Students can request medical extensions afterwards with a proof/note from their providers; these will not count toward your six late days. For other emergencies and absences, students can request extensions with corresponding documentation on a case-by-case basis with the instructors.


Accommodations

If you have a disability and require accommodations, please contact Catherine Getchell, Director of Disability Resources, 412-268-6121. If you have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate.

Policy on Missing Class

Classes will not be recorded. If you must miss class, you should arrange to get notes from a friend.

Academic Integrity

Please take some time to read through CMU’s Academic Integrity Policy. Students who violate this policy will be subject to the disciplinary actions described in the Student Handbook.

Collaboration on Homeworks

The six homeworks should be completed individually. However, we encourage you to ask questions on Piazza and in office hours. While you may discuss strategies amongst yourselves, all experiments and analyses should be your own.

Use of Language Models

Using a language model to generate any part of a homework answer without attribution will be considered a violation of academic integrity. This means that if you use ChatGPT or Copilot to assist you on a homework, you must say so explicitly within your response. On each homework, you will be asked to attest to whether you used AI systems to assist on the homework, and if so, in what manner. If you have used AI systems to generate any part of your homework, you must submit the prompts/instructions/inputs you used to obtain the generated output. Your grade will be based on both the correctness of your homework response and the quality of your prompts/instructions. Errors in the generated outputs that appear in your homework response, and uninteresting prompts (e.g., merely feeding the homework questions to the language model), do not reflect intellectual effort and are unlikely to receive a good grade.

Course Topics

  • Building blocks of modern LLMs

  • Transformer architecture and pre-training learning objectives

  • Pre-training data curation and tokenization

  • Architecture advancements since 2019: mixture-of-experts, layer norms, relative attention, RoPE embeddings

  • Automatic evaluation of LLMs

  • Task-oriented finetuning and parameter-efficient tuning methods

  • Instruction tuning and chain-of-thought reasoning

  • Alignment, RLHF, jail-breaking

  • Retrieval-augmented generation

  • Guest lecture

  • Tool use

  • Popular benchmarks and their brittleness

  • Interpretability methods

  • Human evaluation

  • Fall break - no class

  • Fall break - no class


  • Bias and ethical issues

  • Scaling laws

  • Optimization and parallel training

  • Democracy day - no class

  • Long-sequence models

  • Flash attention and sparse models

  • Chatbots and AI agents

  • Embedding learning

  • Synthetic data generation, self-play, models teaching models

  • Legal considerations

  • Thanksgiving - no class

  • Guest lecture

  • Guest lecture