Syllabus - Large Language Models: Methods and Applications / Fall 2025

Important Details

Location: Baker Hall A51
Time: Tuesdays and Thursdays 2 PM - 3:20 PM
Instructor email: llms-11-667 @ andrew.cmu.edu

Course Description

Large Language Models Methods and Applications (11-667) is a graduate-level course that aims to provide a holistic view of the current state of large language models (LLMs). The first half of this course starts with the basics of language models, including network architectures, training, inference, and evaluation. Then it discusses the interpretation (or attempts of), alignments, and emergent capabilities of LLMs, followed by its popular applications in language tasks and new utilizations beyond texts. In the second half, this course first presents the techniques of scaling up language model pretraining and recent approaches in making the pretraining of LLMs and their deployment more efficient. It then discusses various concerns surrounding the deployment of LLMs and wraps up with the challenges and frontiers of recent developments.

The course is designed to give graduate-level students an overview of the techniques behind LLMs and a thorough grounding on the fundamentals and cutting-edge developments of LLMs, to prepare them for further research or applied endeavors in this domain.

Learning Goals

Students who successfully complete this course will be able to:

Compare and contrast different models in the LLM ecosystem in order to determine the best model for any given task.
Implement and train a neural language model from scratch in Pytorch.
Utilize open-source libraries to finetune and do inference with popular pre-trained language models.
Apply LLMs to a variety of downstream applications.
Observe how decisions made during pre-training affect suitability for downstream tasks.
Explain the common terms used in academic papers on LLMs (e.g., alignment, scaling laws, RLHF, prompt engineering, instruction tuning).
Read and comprehend recent, academic papers on LLMs.
Design new methodologies to leverage existing large scale language models in novel ways.

Prerequisites

Students should have a basic understanding of machine learning, equivalent to the material covered by 10-301/10-601, and be familiar with concepts in natural language processing, equivalent to those covered by 11-411/11-611.

Students are expected to be fluent in Python. Familiarity with deep learning frameworks such as PyTorch is desirable but not strictly required.

Class Format

Classes will be in person, every Tuesday and Thursday 2 PM - 3:20 PM at Baker Hall A51.

Readings: There will be reading materials for each lecture, which students are required to read through before the class.

Interactive Activities: There will be ungraded, interactive activities interspersed through the lectures. These may be activities such as discussing a topic from the class with those sitting near you or answering questions via polling software.

Homework: There will be six homework assignments, to be completed individually.

Exams: There will be a midterm exam and a final exam.

Grading

40%: Homeworks
- Each of the five homeworks is worth 8% of your grade.
20%: Mini-project
- Presented in class during one of the last lectures.
20%: Midterm exam
- Date: 10/21/2025 (in class)
20%: Final exam
- Date TBD (in class)

Late Policy

Each student has six free late days to use across the six homeworks. If you are out of late days, then you will not be able to get credit for subsequent late homeworks. One “day” is defined as anytime between 1 second and 24 hours after the homework deadline. The intent of the late day policy is to allow you to take extra time due to unforseen circumstances such as illnesses. To use your late days on a homework, you MUST fill out this form.

In the event of a medical emergency, please make your personal health, physical and mental, your first priority. Seek help from medical and care providers such as University Health Services. Students can request medical extensions afterwards with proof/note from providers. These will not count toward your six days. For other emmergencies and absences, students can request extensions with corresponding documentation on a case-by-case basis with instructors.

Accomodations

If you have a disability and require accommodations, please contact Catherine Getchell, Director of Disability Resources (412-268-6121, getchell@cmu.edu). If you have an accommodations letter from the Disability Resources office, we encourage you to discuss your accommodations and needs with us as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate.

Policy on Missing Class

We will try to record classes, but do not offer a guarantee of this. If you must miss class, you should arrange to get notes from a friend. Hwoever, we plan to make classes interactive, so please try to attend.

Policy on Missing Exams

The final exam cannot be missed. If you have a valid reason for not being able to make the midterm (for example, presenting a paper at a conference or a medical emergency), you should let us know as soon as you are aware of the conflict, and we will discuss accomodations.

Academic Integrity

Please take some time to read through CMU’s Academic Integrity Policy. Students who violate this policy will be subject to the disciplinary actions described in the Student Handbook.

Collaboration on Homeworks

The six homeworks should be completed individually. However, we encourage you to ask questions on Piazza and in office hours. While you may discuss strategies amongst yourselves, all experiments and analyses should be your own.

Use of Language Models

Using a language model to generate any part of a homework answer without attribution will be considered a violation of academic integrity. This means that if you use ChatGPT or CoPilot to assist you on a homework, you must state so explicitly within your response. On each homework, you will be asked to attest to whether you used AI systems to assist on the homework, and if so, in what manner. If you have used AI systems to generate any part of your homework, you must submit the prompts/instructions/inputs you used to obtain the generated output. Your grading will be based on both the correctness of your homework response and the quality of your prompts/instructions. Errors in the generated outputs that appear in your homework response, and non-interesting prompts, e.g., merely putting in the homework questions to the language model, are not intellectual efforts and are unlikely to receive a good grade.