CSE 132C – Database System Implementation

Lectures: MWF 4-4:50pm PT at WLH 2205

Instructor: Arun Kumar

  • Email: akk018 [at] ucsd.edu

  • Office Hours: Mon 5-6pm PT @ 3218 CSE

Discussions: Fri 5-6pm PT (used only occasionally)

Teaching Assistant: Kyle Luoma

  • Email: kluoma [at] ucsd.edu

  • Office Hours: Wed 11am-12pm PT on Zoom only (link posted on Piazza)

  • Extra office hours: Apr 25 9am-2pm PT; May 22 10-11am PT; May 25 9am-1pm PT

Piazza: CSE 132C Spring 2023

  • There is no class on Mon, Apr 3. The introductory lecture will be on Wed, Apr 5.

Course Goals and Content

This is a hands-on, systems-focused course on the implementation of a database management system (DBMS), especially a relational DBMS (RDBMS). RDBMSs are the cornerstone of large-scale data management in numerous application domains that define our modern world, including finance, insurance, retail, logistics, telecommunications, healthcare, governance, and education. Furthermore, concepts developed in the context of RDBMSs underpin the so-called "Big Data" and "NoSQL" systems built for new applications such as Web search, e-commerce, and social media analytics, as well as emerging systems for scalable ML/AI and data science.

This course will cover key systems topics in implementing an RDBMS: data storage, buffer management, indexing, sorting, relational operator implementations, a bit of query optimization, and the implementation of so-called "Big Data" systems such as MapReduce/Hadoop and Spark. Cutting-edge topics such as cloud-native RDBMSs and ML for RDBMSs will also be covered.

A major component of this course is hands-on C++ programming to implement two key components of an RDBMS, a buffer manager and a B+ Tree index, on top of a basic RDBMS skeleton that will be provided.

Course Format and Instructions

  • Lectures and Discussions:

    • The class meets 3 times a week for 50-minute lectures.

    • All lectures will be held in person only. The lectures will be automatically podcast and available online for asynchronous viewing.

    • The discussion slot will be used only occasionally, including once before each exam for a review discussion.

    • Attending the lectures and discussions is not mandatory but highly encouraged.

    • Familiarize yourself with this course website and Piazza. All class announcements and asynchronous discussions will be on Piazza.

  • 2 C++ programming projects.

    • Students can work on projects individually or in teams of 2.

    • See the projects page for more details, including all dates/deadlines.

    • There are no late days for the projects; plan your work accordingly.

    • Your (team's) code submission must be entirely your (team's) own. The projects page offers more guidance on what level of discussion outside your team is allowed, as well as the policy on the use of Copilot, ChatGPT, or other LLM-based tools.

  • 2 Quizzes and 2 Exams:

    • This course has 2 progress quizzes, a midterm exam, and a cumulative final exam. All of them will be held in person only on pre-announced dates.

    • The exams will have a mix of multiple choice questions (MCQ), quantitative questions, and essay questions. Some questions may have partial credit. The quizzes will likely have only MCQ.

    • The guideline for time per question is a maximum of 1 min per point. The points of each question will be calibrated accordingly.

    • If you miss a quiz or an exam, you will get no credit for it unless you notify the instructor in advance with a certifiable medical or emergency reason and receive a makeup exam slot.

    • Both the quizzes and exams are closed notes/books/electronics/Web. For all of them, you should neither give nor receive help from anyone by any means.

  • 6 Peer Instruction Activities:

    • They will be held live in class using Google Forms, spread randomly across the quarter.

    • Each activity will have 1 MCQ. You must first answer individually. Then you can discuss the question with your neighbor(s). After that, you can answer the same question again.

    • These activities are open books/notes/Web.

    • Grading is based on earnest participation in the whole activity.

    • If you miss an activity, you will get no credit for it unless you notify the instructor in advance with a university-approved reason.

    • You can miss up to 1 activity out of the 6 without losing credit.

  • There will be 2 Peer Evaluation Activities delivered via Canvas only. These will be related to the invited industry guest lectures. I will announce more details in due course.

  • I will release some ungraded exercises on Canvas Files throughout the quarter. These questions will act as practice for the quizzes and exams.

  • I will use the discussion slot only twice, for a review discussion before each exam. The TA might also use it to present each project.

Prerequisites

  • CSE 132A (DB Systems Principles) or DSC 102 (Systems for Scalable Analytics) is necessary. It will also be helpful if you have taken CSE 120 (Operating Systems) or CSE 132B (DB Systems Applications) but these are not necessary.

  • You should know, or be willing to learn quickly by yourself, the programming language C++ for the projects. Here is a good C++ tutorial.

Textbooks

  • Recommended: Database Management Systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke (aka the "cow book").

  • Additional (optional): Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

Quiz and Exam Dates

  • Quiz 1: Mon, May 1; in class

  • Midterm Exam: Mon, May 15; in class

  • Quiz 2: Fri, May 26; in class

  • Final Exam: Thu, Jun 15; 3-6pm PT; room TBD

Grading

  • Project 1: 7%

  • Project 2: 20%

  • Quizzes: 10% (2 x 5%)

  • Midterm Exam: 15%

  • Cumulative Final Exam: 40%

  • Peer Instruction Activities: 5% (5 x 1%)

  • Peer Evaluation Activities: 3% (2 x 1.5%)
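
As an illustration of how the weights above combine, here is a minimal C++ sketch of the total score computation. The struct and function names are hypothetical, and two points are assumptions rather than stated policy: each graded component is expressed as a fraction in [0, 1], and the peer activities are credited per activity completed (at most 5 of the 6 Peer Instruction Activities count, matching the 5 x 1% weight).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical container for one student's results.
// Fractions are in [0, 1]; the two counts are numbers of activities completed.
struct ComponentScores {
    double project1;
    double project2;
    double quiz1;
    double quiz2;
    double midterm;
    double finalExam;
    int peerInstructionCompleted;  // out of 6 held in class
    int peerEvaluationCompleted;   // out of 2 on Canvas
};

// Returns the total score on a 0-100 scale using the syllabus weights.
double totalScore(const ComponentScores& s) {
    // Only 5 of the 6 Peer Instruction Activities carry credit (5 x 1%).
    const int piCredited = std::min(std::max(s.peerInstructionCompleted, 0), 5);
    // Assumption: Peer Evaluation credit is per activity completed (2 x 1.5%).
    const int peCredited = std::min(std::max(s.peerEvaluationCompleted, 0), 2);

    return 7.0 * s.project1        // Project 1: 7%
         + 20.0 * s.project2       // Project 2: 20%
         + 5.0 * s.quiz1           // Quiz 1: 5%
         + 5.0 * s.quiz2           // Quiz 2: 5%
         + 15.0 * s.midterm        // Midterm Exam: 15%
         + 40.0 * s.finalExam      // Cumulative Final Exam: 40%
         + 1.0 * piCredited        // Peer Instruction: 1% each
         + 1.5 * peCredited;       // Peer Evaluation: 1.5% each
}

// Sanity check: the weights sum to 100, and missing one
// Peer Instruction Activity does not reduce the total.
void checkWeights() {
    const ComponentScores allSix{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 6, 2};
    const ComponentScores missedOne{1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5, 2};
    assert(std::fabs(totalScore(allSix) - 100.0) < 1e-9);
    assert(std::fabs(totalScore(missedOne) - 100.0) < 1e-9);
}
```

Under these assumptions, a student with full marks everywhere reaches 100 whether they complete 5 or all 6 Peer Instruction Activities.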

Cutoffs

The grading scheme is a hybrid of absolute and relative grading. The absolute cutoffs are based on your absolute total score. The relative bins are based on your position in the total score distribution of the class. The better of the two grades (absolute and relative) will likely be your final grade.

Grade   Absolute Cutoff (>=)   Relative Bin (Use strictest)
A+      95                     Highest 5%
A       90                     Next 10% (5-15)
A-      85                     Next 15% (15-30)
B+      80                     Next 15% (30-45)
B       75                     Next 15% (45-60)
B-      70                     Next 15% (60-75)
C+      65                     Next 5% (75-80)
C       60                     Next 5% (80-85)
C-      55                     Next 5% (85-90)
D       50                     Next 5% (90-95)
F       < 50                   Lowest 5%


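Example: Suppose the total score is 82 and the percentile is 33. A percentile of 33 places the score 67% of the way down from the top of the class, which falls in the 60-75 bin, so the relative grade is B-. The absolute grade is B+, since 82 meets the cutoff of 80. The final grade then is B+.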
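
The hybrid scheme can be sketched in C++ as follows. This is only an illustration, not the official grading procedure: the function names are hypothetical, the percentile is assumed to mean the percentage of the class scoring below you, and a position exactly on a bin boundary is assumed to fall into the lower bin. The syllabus also says only that the better grade will "likely" be the final grade.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Grades ordered from best (index 0) to worst (index 10).
const std::array<const char*, 11> kGrades = {
    "A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"};

// Minimum total score for each grade from A+ down to D; below 50 is F.
const std::array<double, 10> kAbsoluteCutoffs = {
    95.0, 90.0, 85.0, 80.0, 75.0, 70.0, 65.0, 60.0, 55.0, 50.0};

// Upper edge of each relative bin, measured as percent from the top of the class.
const std::array<double, 10> kRelativeUpperBounds = {
    5.0, 15.0, 30.0, 45.0, 60.0, 75.0, 80.0, 85.0, 90.0, 95.0};

// Index into kGrades based on the absolute total score.
std::size_t absoluteIndex(double total) {
    for (std::size_t i = 0; i < kAbsoluteCutoffs.size(); ++i) {
        if (total >= kAbsoluteCutoffs[i]) return i;
    }
    return kGrades.size() - 1;  // F
}

// Index into kGrades based on the percentile (percent of the class below you).
std::size_t relativeIndex(double percentile) {
    const double percentFromTop = 100.0 - percentile;
    for (std::size_t i = 0; i < kRelativeUpperBounds.size(); ++i) {
        // Assumption: a boundary value belongs to the lower bin.
        if (percentFromTop < kRelativeUpperBounds[i]) return i;
    }
    return kGrades.size() - 1;  // F
}

// The better (smaller index) of the absolute and relative grades.
std::string finalGrade(double total, double percentile) {
    return kGrades[std::min(absoluteIndex(total), relativeIndex(percentile))];
}

// Reproduces the worked example from the text above.
void checkSyllabusExample() {
    assert(std::string(kGrades[absoluteIndex(82.0)]) == "B+");
    assert(std::string(kGrades[relativeIndex(33.0)]) == "B-");
    assert(finalGrade(82.0, 33.0) == "B+");
}
```

With these assumptions, `finalGrade(82.0, 33.0)` matches the example above: the absolute grade B+ beats the relative grade B-, so the result is B+.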

Non-Letter Grade Options: You have the option of taking this course for a non-letter grade. A P in the P/F option requires a letter grade of C- or better; an S in the S/U option requires a letter grade of B- or better.

Classroom Rules

  • There are no late days for submitting the programming projects. Plan your work well in advance.

  • Students are encouraged to ask questions and participate in the discussions in class and also on Piazza. Please raise your hand before speaking and the instructor will call on you to speak.

  • Please review UCSD's honor code and policies and procedures on academic integrity on this website. If plagiarism is detected in your code, or if we detect collusion on the quizzes or exams, or if any other form of academic integrity violation is identified, you will get zero for that component of your score and get downgraded substantially. I will also notify the University authorities for appropriate disciplinary action to be taken, up to and including expulsion from the University.

  • Please review UCSD's principles of community and our commitment to creating an inclusive learning environment on this website.

  • Harassment or intimidation of any form against any student will not be tolerated in class or on Piazza. Please review UCSD's policies on dealing with harassment and discrimination on this website.