Welcome to CS 294-43: Large Scale Vision and Language Models (Fall 2024)


Course Logistics


Course Description

Grounded perception is a key feature of future AI. It enables machines to understand and interact with the world through semantic cues and to communicate effectively with humans. The interplay of vision and language has long tantalized AI researchers and has recently begun to bear considerable fruit. As time and the interests of participants permit, this course will delve into recent advances in vision and language models, with a focus on the development of large vision and language models. The course will explore cutting-edge techniques in reinforcement learning from human feedback (RLHF), multimodal dialog models, and text-to-image and text-to-video generation. It will also examine the scaling laws of large models and study continual and long-context learning. Additionally, the course will cover retrieval-augmented generation and the synergy between vision, language, and robotics. It will also address critical ethical issues, as well as the interpretability and explainability of AI systems. The overarching goal of the course is to equip students with the background to contribute to research in the rapidly evolving field of foundation vision and language models.

Previous Offerings

Prerequisites

Permission of the instructor is required for all students, including auditors. Students are expected to have BOTH completed graduate courses in computer vision and/or NLP and be engaged in active research on related topics. Enrollment is limited to 30 participants. Preference is given to those actively researching in the area and to those with the most prior coursework and publishing experience. Please fill out this request form to summarize your background and express your interest in joining the course either for credit or as an auditor. Permission codes will be sent to selected students so they can register for the course. To ensure full consideration for participation in the course, please fill out the form before 8/20/2024.

Course Format

Each weekly meeting will be structured as a seminar-style discussion, with half of the time dedicated to an overview of the week's area and the other half dedicated to a deep dive into a few papers. The course will be organized as follows:

Meeting Format / Times:


Coursework

The following are the requirements for students taking the course for credit.

Points

Every student will be required to earn a total of at least two points throughout the semester. Points can be earned in the following ways:

For all students

In addition to earning at least two points, all students are required to complete the following:

For students taking the course for more than two units

In addition to the above requirements, students taking the course for more than two units will be required to complete a course project. The project can be completed individually or in groups of up to three students. The project will be graded based on the following criteria:

Auditing

Unfortunately, we do not have the capacity to accommodate auditors in the course; however, course materials will be made available online after each meeting. You are welcome to take the course for two units as an S/U course, which has a limited workload (see above for coursework requirements).

All students are welcome

We are committed to doing what we can to work for equity and to create an inclusive learning environment that actively values the diversity of backgrounds, identities, and experiences of everyone in the course. It is our expectation that all interactions with course staff and other students will demonstrate appropriate respect, consideration, and compassion for others. Please remember to be friendly and thoughtful; our community draws from a wide spectrum of valuable experiences. For further reading, please reference the Berkeley Principles of Community and Berkeley Campus Code of Student Conduct.

Special Accommodations

We will provide appropriate accommodations to all students enrolled in Berkeley's Disabled Students Program (DSP). To ensure that you receive the appropriate accommodations, have your DSP specialist submit a letter confirming your status and accommodations. If you're not enrolled in DSP, or are in the process of being onboarded by DSP, you may still be eligible for accommodations (such as extended time on exams or extended deadlines). You may also be eligible for accommodations if serious extenuating circumstances should come up during the semester. If you believe you may require accommodations, please contact us. All DSP and accommodations-related materials for this course are kept in a repository separate from the rest of the course materials that is visible only to the instructors, selected staff, and staff course managers. For any DSP and accommodations-related communications, please reach out to an instructor directly.

Well-Being and Mental Health

If you are experiencing personal, academic, or relationship problems and would like to talk to someone with training and experience, reach out to Counseling and Psychological Services (CAPS) on campus. CAPS is the university's counseling center dedicated to student mental health and wellbeing. Phone appointments can be made at CAPS by calling (510) 642-9494. For more information, please visit the webpage at https://uhs.berkeley.edu/counseling. If you are in crisis, please call the 24/7 crisis line at (855) 817-5667.

AI Tools and Ethics

We expect that all material generated in this class, including code, reports, and presentations, will adhere to the ACL policy on publication ethics. In particular, authors are responsible for all content submitted. Any use of generative AI tools and technologies to create content should be fully disclosed in the Acknowledgements section; for instance, "Section 3 was written with inputs from ChatGPT."


Schedule

Date Description Deadlines Discussed Papers
Planning Meeting - 08/26 (Optional) (Remote) Meeting to discuss course logistics and topics
Week 1 - 09/02 No Class - Labor Day
Week 2 - 09/09 Introduction to Large Vision + Language Models - Overview of Vision + Language Models
- Visual Encoders (CLIP, SigLIP, CoCa)
- LLaVA (and variants, Prismatic/Cambrian)
Week 3 - 09/16 Introduction to Large Vision + Language Models (Cont.) - Flamingo/Chameleon
- Idefics 1/2/3
- Transfusion
- Video LLaMA
Week 4 - 09/23 Long Context Learning - Long Context Learning
- Looong-Llava
- Visual Haystacks
- Large World Models
Week 5 - 09/30 Project Proposals / Research Pitches
Week 6 - 10/07 Vision + Language + Robotics - Overview
Week 7 - 10/14 Vision + Language + Robotics (Cont.) - Overview
Week 8 - 10/21 Interpretability and Explainability (Deadline: Project Proposals Due) - LVLM Interpretability
- Text-Based Interpretability
- Multimodal ICL
- Concept Editing
Week 9 - 10/28 Text to Image Models - Overview
- Prompt to Prompt
- Imagen
- Latent Diffusion Models
Week 10 - 11/04 Text to Video Models
Week 11 - 11/11 No Class - Veterans Day
Week 12 - 11/18 Instruction Tuning
Week 13 - 11/25 Policy, Regulation and Ethical Considerations
Week 14 - 12/02 Project Presentations
(Finals) - 12/15 - Final Project Reports Due

Contact

Please contact us by email:

Instructor

Discussant Provocateur

Seminar Coordinator