DSC 291 Safety in Generative AI (Fall 2025)

Quarter: Sep 30 → Dec 4, 2025 · Tue/Thu · 80 minutes each

Course Overview

Instructor: Prof. Yu-Xiang Wang
Teaching Assistants: Yingyu Lin, Erchi Wang
TA Office Hours: Thursday 11 AM - 12 PM at HDSI 155 (EW); Thursday 3 - 4 PM at HDSI 239 (YL)
Lectures: Tuesday and Thursday 9:30 AM - 10:50 AM at PCYNH 121

Acknowledgments

We gratefully acknowledge Thinking Machines for providing "Tinker" access to the students. Their support helps enable the hands‑on homework, model fine‑tuning demos, and student projects at scale.

Sponsor recognition does not influence grading, content, or academic policies.

Important Dates

  • Thu 9/25: No class (instructor traveling)
  • Tue 11/11: No class – Veterans Day
  • Thu 11/27: No class – Thanksgiving
  • Tue 12/2: No class – instructor at NeurIPS
  • Thu 12/4: In-class Quiz
  • Mon 12/8: Mini-Symposium – Project presentations

Your Weekly Routine

  • Lectures: In-person attendance required.
  • Weekly homework: Done in groups of 4. Due each Saturday.
  • Readings: Due before the Thursday lecture each week. Update your reading log with a one-paragraph summary of what you read.

Project Timeline

  • Weeks 1–2: team formation & idea brainstorming
  • Thu 10/30: Project Proposal Due (2 pages)
  • Thu 11/20: Midterm Project Update (5‑minute check‑in)
  • Mon 12/8: Final Presentation + Written Report Due

Evaluation Breakdown

  • 5% Weekly reading log
  • 5% Class Attendance / Participation
  • 40% Homework (5% each, 8 total)
  • 25% Project
  • 25% Final Quiz
  • 5% Bonus: In-class presentation (limited spots; sign up early)
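
To see how the weights above combine, here is a minimal sketch of the arithmetic. It assumes each component is scored as a fraction between 0 and 1 and that the presentation bonus is added on top of the 100% base; the syllabus does not state how the bonus is applied, so treat that as an assumption. The names `WEIGHTS`, `BONUS_WEIGHT`, and `course_total`, and the example scores, are hypothetical and for illustration only.

```python
# Weights from the Evaluation Breakdown (percent of the final grade).
WEIGHTS = {
    "reading_log": 5,
    "participation": 5,
    "homework": 40,   # 8 homeworks at 5% each
    "project": 25,
    "final_quiz": 25,
}
BONUS_WEIGHT = 5  # in-class presentation bonus


def course_total(scores, bonus=0.0):
    """Return the weighted course total in percent.

    scores: component name -> fraction earned, in [0, 1].
    bonus:  fraction of the presentation bonus earned, in [0, 1].
    Assumption: the bonus is added on top of the 100% base.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    base = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return base + BONUS_WEIGHT * bonus


# Hypothetical student; these scores are made up for illustration.
example = {
    "reading_log": 1.0,
    "participation": 1.0,
    "homework": 0.9,
    "project": 0.8,
    "final_quiz": 0.8,
}
print(course_total(example))
print(course_total(example, bonus=1.0))
```

The five base weights sum to 100, so a full bonus can raise the total by at most 5 points under this assumption.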

Weekly Schedule

Pre‑Week · Thu 9/25 – No Lecture (instructor traveling)

Week 1 · Foundations of GenAI (Tue 9/30, Thu 10/2)

Reading (due before Thursday lecture):

Tue 9/30 – Course Intro, Deep Learning Refresher [slides]

  • Mini‑lecture: course overview, deep learning basics.
  • Activities: (1) discussion on AI safety videos, (2) vibe‑coding deep learning from scratch.

Thu 10/2 – Foundation Models & Emerging Abilities [slides]

  • Mini‑lecture: zero‑shot, few‑shot / in‑context learning; emergence; prompt engineering (with math + coding).
  • Activities: (1) groups explain one emerging ability from readings; (2) vibe‑code a simple in‑class game; (3) groups explain one potential vulnerability.

Homework (due Saturday):

Week 2 · LLMs & Diffusion Models (Tue 10/7, Thu 10/9)

Reading (due before Thursday lecture):

Tue 10/7 – LLM Fundamentals, Pretraining [slides]

  • Mini‑lecture: LLMs, attention mechanism, pretraining, scaling laws.
  • Discussion: building an LLM from scratch.

Thu 10/9 – Post‑Training & Safety Alignment [slides]

  • Mini‑lecture: SFT, RLHF, safety alignment.
  • Student Presentation #1: reflections on the GPT “Spec”. [slides]

Homework (due Saturday):

Week 3 · GenAI as Agents (Tue 10/14, Thu 10/16)

Reading (due before Thursday lecture):

Tue 10/14 – VAEs & Diffusion Models

  • Mini‑lecture: VAE & diffusion architectures; training basics. [slides]
  • Activity: “Can you identify AI‑generated images?” (class game).

Thu 10/16 – LLM as Agents

  • Mini‑lecture: AI agent → LLM agent; capabilities & risks. [slides]
  • Discussion: propose an LLM‑agent idea and share.
  • Student Presentations: #2 Computer‑Use Agent [slides]; #3 Coding Agent [slides].

Homework (due Saturday):

Week 4 · Inference‑Time Adversarial Attacks (Tue 10/21, Thu 10/23)

Reading (due before Thursday lecture):

Tue 10/21 – Adversarial Attacks on Images

  • Mini‑lecture: adversarial attacks on image classifiers; how/why they work. [slides]
  • Experiment: implement a gradient‑descent attack in class.
  • Student Presentation: #5 Randomized Smoothing [slides].

Thu 10/23 – Jailbreaking & Prompt Injection

  • Mini‑lecture: jailbreaking and prompt injection; attacks and defenses. [slides]
  • Student Presentations: #6 Complexity of Jailbreaking attack in the wild [slides]; #7 WASP: a benchmark for realistic prompt injection attacks [slides].
  • Experiment: jailbreak "TritonGPT".

Homework (due Saturday):

Week 5 · Training/Post‑Training Time Attacks (Tue 10/28, Thu 10/30)

Reading (due before Thursday lecture):

Tue 10/28 – Data Poisoning Attacks

  • Mini‑lecture: threat models, feasibility, and harms of poisoning. [slides]
  • Discussion: preventing data poisoning (process & technical mitigations).
  • Student Presentations: #4 Adversarially Robust Training [slides]

Thu 10/30 – Model Collapse

  • Mini‑lecture: data poisoning attacks (Part 2) and training on synthetic data [slides]
  • Student Presentations: #8 Model Collapse [slides]; #9 Preventing Collapse. [slides]

Homework (due Saturday):

Week 6 · Societal Risks (Tue 11/4, Thu 11/6)

Reading (due before Thursday lecture):

Tue 11/4 – Data Privacy & Copyright

  • Mini‑lecture: differential privacy. [slides]
  • Student Presentations: #10 Privacy attack [slides]; #11 Does differential privacy solve copyright? [slides]

Thu 11/6 – Existential Threats of AI

  • Mini‑lecture: existential risk landscape. [slides]
  • AI Doomers vs. Boomers. Student Presentation: #12 AI is an existential threat and it's coming quicker than you think. [slides]
  • Discussion: What are your opinions (after learning how GenAI works)?

Homework (due Saturday):

Week 7 · Deepfakes, Plagiarism & AI Detectors (Tue 11/11, Thu 11/13)

Reading (due before Thursday lecture):

Tue 11/11 – Veterans Day

  • No class.

Thu 11/13 – AI Detectors

  • Mini‑lecture: distinguishing AI from human text/media. [slides]
  • Student Presentation: #14 State-of-the-Art AI detectors. [slides]

Homework (due Saturday):

Week 8 · Watermarking GenAI (Tue 11/18, Thu 11/20)

Reading (due before Thursday lecture):

Tue 11/18 – Watermarking LLMs (Part 1)

  • Mini‑lecture: history of watermarking; modern approaches. [slides]
  • Discussion: pros/cons; where watermarking fits in the safety stack.

Thu 11/20 – Watermarking LLMs (Part 2)

  • Mini‑lecture: distortion-free watermarks. [slides]
  • Student Presentation: #15 "On the reliability of Watermarks for LLMs"

Homework (due Saturday):

Week 9 · Watermarking LLMs (Part 3) and Beyond (Tue 11/25, Thu 11/27)

Reading (due before Thursday lecture):

Tue 11/25 – Beyond AI Content Watermarking

  • Mini‑lecture: undetectable watermarks and beyond. [slides]
  • Student Presentations: #16 "Watermarking in the Sand" [slides]; #17 Watermark detected, but so what? Technical vs. regulatory concerns [slides].

Thu 11/27 – Thanksgiving

  • No class.

Homework:

  • No new HW assigned due to the holiday (use time for projects).

Week 10 · NeurIPS in San Diego (Tue 12/2, Thu 12/4)

Reading:

  • No readings assigned; focus on finalizing projects.

Tue 12/2 – Instructor at NeurIPS

  • No class.

Thu 12/4 – Quiz

  • In-class closed-book quiz on the course materials.

Week 11 · Mini-Symposium on GenAI Safety (Exam Week, Mon 12/8)

Mini-Symposium on GenAI Safety (Project Presentations)

  • Keynote: Adam Dziedzic and Franziska (CISPA Helmholtz Center for Information Security)
  • Team presentations: 15 minutes + 5 minutes Q&A per team.

Project report

  • Submit final report by the end of the day.