Welcome to CS 294-43: Large Scale Vision and Language Models (Fall, 2024)
Course Logistics
- Lectures: M 1:00PM - 3:00PM PDT, Location Berkeley Way West, Room 8019
- Lecture Videos / Remote Participation: Lecture/seminar information will be posted on the course website, but to encourage discussion, lectures will not be recorded. We will provide a way for remote students to participate in the live lectures.
- Contact: Please use email for all course-related questions. If you need to contact the course staff privately, please use the email addresses listed below.
Course Description
Grounded perception is a key feature of future AI, enabling machines to understand and interact with the world through semantic cues and effective communication with humans. The interplay of vision and language has long tantalized AI researchers, and has recently begun to bear considerable fruit. As time and the interest of participants permit, this course will delve into advancements in vision and language models, focusing on the development of large vision and language models. This course will explore cutting-edge techniques for reinforcement learning from human feedback (RLHF), multimodal dialog models, and text-to-image and text-to-video generation; examine the scaling laws for large models; and study continual and long-context learning. Additionally, the course will cover retrieval augmented generation and the synergy between vision, language, and robotics, and will address critical ethical issues, interpretability, and explainability of AI systems. The overarching goal of the course is to equip students with the background to contribute to research in the rapidly evolving field of foundational vision and language models.
Previous Offerings
Prerequisites
Permission of instructor required for all students, including auditors. Students are expected to have BOTH completed graduate computer vision and/or NLP courses and be engaged in active research on related topics. Limited to 30 participants, with preference given to those actively researching in the area with the most prior course and publishing experience. Please fill out this request form to summarize your background and express your interest in joining the course either for credit or as an auditor. Permission codes will be sent to selected students to register for the course. To ensure full consideration for participation in the course, please fill out the form before 8/20/2024.
Course Format
Each weekly meeting will be structured as a seminar-style discussion, with half of the time dedicated to an overview of the week's area and the other half dedicated to a deep dive into a few papers. The course will be organized as follows:
- Lead Presentations: Each week, one or two students will lead the discussion on the week's topic, providing an overview of the area and highlighting key papers. These students will be responsible for preparing a 20-minute overview presentation and leading the discussion.
- Weekly Readings / Paper Deep Dives: Each week, the leads will help to select 2-3 papers for a deep dive, with the rest of the class reading these papers in advance. Students will volunteer to present the papers (1 volunteer presenter per paper) in detail, followed by a group discussion.
- Spotlights: Each week, we will also spotlight a few other recent papers in the area, with a brief overview and discussion.
Meeting Format / Times:
- Introduction / background [20 min] [4+ papers]
- 2-3 papers presented in detail [20 min presentation + 10 min discussion each]
- Spotlights of other latest work [20 min] [2-4 papers]
Coursework
The following are the requirements for students taking the course for credit.
Points
Every student will be required to earn a total of at least two points throughout the semester. Points can be earned in the following ways:
- Leading a discussion on a week's topic (2 points)
- Presenting a paper in detail (1 point)
- Additional points may be granted by the course instructor based on class participation, helping with organization, etc. This gives the instructor flexibility to recognize and reward contributions beyond leading and presenting.
For all students
In addition to earning at least two points, all students are required to complete the following:
- Active participation in class discussions
- Completion of a short response form before each class, summarizing the key idea of one or two assigned key papers and either asking one critical question or suggesting an extension to the work. Additional optional papers will also be covered each week, but no response form is required for them.
For students taking the course for more than two units
In addition to the above requirements, students taking the course for more than two units will be required to complete a course project. The project can be completed individually or in groups of up to three students. The project will be graded based on the following criteria:
- [For 3 units] A course project which is one of the following types: new research results and report judged suitable for submission to a CV, NLP, or NeurIPS workshop, a solid replication or reimplementation of existing work, evaluation of existing work on a new dataset, or a literature survey. (Or other format with permission of instructor.)
- [For 4 units] A course project with new research results and report judged suitable for acceptance at a top CV or NLP conference or journal venue, or a major new open source repository or dataset with high impact for the community.
Auditing
Unfortunately, we do not have the capacity to accommodate auditors in the course; however, course materials will be made available online after each meeting. You are welcome to take the course for two credits as an S/U course, which has a limited workload (see above for coursework requirements).
All students are welcome
We are committed to doing what we can to work for equity and to create an inclusive learning environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the course. It is our expectation that all interactions with course staff and other students will demonstrate appropriate respect, consideration, and compassion for others. Please remember to be friendly and thoughtful; our community draws from a wide spectrum of valuable experiences. For further reading, please reference the Berkeley Principles of Community and Berkeley Campus Code of Student Conduct.
Special Accommodations
We will provide appropriate accommodations to all students enrolled in Berkeley's Disabled Students Program (DSP). To ensure that you receive the appropriate accommodations, have your DSP specialist submit a letter confirming your status and accommodations. If you're not enrolled in DSP, or are in the process of being onboarded by DSP, you may still be eligible for accommodations (such as extended time on exams or extended deadlines). You may also be eligible for accommodations if serious extenuating circumstances should come up during the semester. If you believe you may require accommodations, please contact us. All DSP and accommodations-related materials for this course are kept in a repository separate from the rest of the course materials that is visible only to the instructors, selected staff, and staff course managers. For any DSP and accommodations-related communications, please reach out to an instructor directly.
Well-Being and Mental Health
If you are experiencing personal, academic, or relationship problems and would like to talk to someone with training and experience, reach out to Counseling and Psychological Services (CAPS) on campus. CAPS is the university's counseling center dedicated to student mental health and wellbeing. Phone appointments can be made at CAPS by calling (510) 642-9494, or for more information, please visit the webpage at https://uhs.berkeley.edu/counseling. If you are in crisis, please call the 24/7 crisis line at (855) 817-5667.
AI Tools and Ethics
We expect that all material generated in this class, including code, reports, and presentations, will adhere to the ACL policy on publication ethics. In particular, authors are responsible for all content submitted, and any use of generative AI tools and technologies to create content should be fully disclosed in the Acknowledgements section, for instance, "Section 3 was written with inputs from ChatGPT."
Schedule
Date | Description | Deadlines | Discussed Papers |
---|---|---|---|
Planning Meeting - 08/26 (Optional) | (Remote) Meeting to discuss course logistics and topics | | |
No Class - Labour Day | | | |
Week 2 - 09/09 | Introduction to Large Vision + Language Models | | Overview of Vision + Language Models; Visual Encoders (CLIP, SigLIP, CoCa); LLaVA (and variants, Prismatic/Cambrian) |
Week 3 - 09/16 | Introduction to Large Vision + Language Models (Cont.) | | Flamingo/Chameleon; Idefics 1/2/3; Transfusion; Video LLaMA |
Week 4 - 09/23 | Long Context Learning | | Long Context Learning; Looong-Llava; Visual Haystacks; Large World Models |
Week 5 - 09/30 | Project Proposals / Research Pitches | | |
Week 6 - 10/07 | Vision + Language + Robotics | | Overview |
Week 7 - 10/14 | Vision + Language + Robotics (Cont.) | | Overview |
Week 8 - 10/21 | Interpretability and Explainability | Project Proposals Due | LVLM Interpretability; Text-Based Interpretability; Multimodal ICL; Concept Editing |
Week 9 - 10/28 | Text to Image Models | | Overview; Prompt to Prompt; Imagen; Latent Diffusion Models |
Week 10 - 11/04 | Text to Video Models | | |
No Class - Veterans Day | | | |
Week 12 - 11/18 | Instruction Tuning | | |
Week 13 - 11/25 | Policy, Regulation and Ethical Considerations | | |
Week 14 - 12/02 | Project Presentations | | |
(Finals) - 12/15 | - | Final Project Reports Due | |
Contact
Please contact us by email:
Instructor
- Prof. Trevor Darrell (trevordarrell@berkeley.edu)
Discussant Provocateur
- Dr. David Chan (davidchan@berkeley.edu)
Seminar Coordinator
- XuDong Wang (xdwang@eecs.berkeley.edu)