CSE 132C – Database System Implementation (Online-Only Edition)

!!! This website is archived. Please see the website of the latest edition of this course among the links listed here. !!!

Lectures: MWF 1:00-1:50pm PT @ Zoom only (link posted on Piazza page)

Instructor: Arun Kumar

  • Email: arunkk [at] eng.ucsd.edu

  • Office Hours: Wed 3:00-4:00pm PT @ Zoom/phone only (link posted on Piazza page)

Teaching Assistants:

  • Palash Chauhan

    • Email: p1chauha [at] eng.ucsd.edu

    • Office Hours: Fri 3:30-4:30pm PT @ Zoom/phone only (link posted on Piazza page)

    • Palash handles questions/doubts regarding the programming projects.

  • Vraj Shah

    • Email: vps002 [at] eng.ucsd.edu

    • Office Hours: Thu 4:30-5:00pm PT @ Zoom/phone only (link posted on Piazza page)

    • Vraj handles questions/doubts regarding the lecture materials/quizzes/exams.

Piazza: CSE 132C (Requires access code posted on Canvas)

Announcements

  • New! Final exam scores and solutions have been released on Canvas. Score statistics, as well as overall grade statistics for the whole class, are provided in the last Piazza post.

Course Goals and Content

This is a hands-on, systems-focused course on the implementation of a database management system (DBMS), especially a relational DBMS (RDBMS). RDBMSs are the cornerstone of large-scale data management in numerous application domains that define our modern world, including finance, insurance, retail, logistics, telecommunications, healthcare, governance, and education. Furthermore, concepts developed in the context of RDBMSs underpin the so-called Big Data and NoSQL systems that were developed for new applications such as Web search, e-commerce, social media analytics, and large-scale machine learning.

This course will cover key systems topics in implementing an RDBMS: data storage, buffer management, indexing, sorting, relational operator implementations, a bit of query optimization, and a bit of transaction management and concurrency control. The implementation of newer Big Data systems such as Spark and MapReduce/Hadoop, as well as of distributed NoSQL/key-value stores and in-memory RDBMSs, will likely be covered too.

A major component of this course is hands-on C++ programming to implement two key components of an RDBMS, a buffer manager and a B+ Tree index, on top of a basic RDBMS skeleton that will be provided.

Course Format and Online-only Modality Instructions

  • The class meets 3 times a week for 50-minute lectures.

    • All lectures will be held via a Zoom video conference call, during which I will play a recorded video of my lecture. You can interrupt to ask doubts/questions live during this call. Major Q&A from lectures may be summarized in a Piazza post. All lecture videos will also be made available online for asynchronous viewing by students, with links posted on the schedule page.

    • You must join the class Piazza page (see link above) and follow class announcements and discussions. You must familiarize yourself with Canvas for this course. You are also highly encouraged to install and familiarize yourself with Zoom, which is UCSD's recommended video conferencing software.

    • Students are NOT required to have webcams, but microphones are highly encouraged. All Zoom meetings can be joined via phone as well.

  • 2 C++ programming projects.

    • Students can work on projects either in teams of 2 or individually (teams of 1).

    • Students should email their team decisions to the TA before 11:59pm PT Friday 4/10. All remaining students will be assigned to teams randomly by the TA.

    • See the projects page for more details.

  • 6 short online quizzes on Canvas.

    • Each quiz will be up to 15min long and will have only multiple choice questions (MCQ). Quantitative/longer problems may appear, but only the final answer needs to be selected. No partial credit. No negative points.

    • To help enforce academic integrity, the following features of Canvas Quizzes will be used: group questions for random subsets, one question at a time, and answer lock-in.

    • The guideline for time per question is at most 45sec to 1min per point; the points of each question will be calibrated accordingly.

    • The quizzes will be available on Canvas for a fixed time window (e.g., 2 hours). You must take the quiz within this time window; note that the quiz's time limit still applies.

    • If you fail to take a quiz, you will get no credit by default. If you miss a quiz due to a pre-notified and certifiable medical or emergency reason, that quiz will be waived for you and your score will be reweighted accordingly. At the end of the class, only your 5 best quiz scores will be used for grading.

    • The quizzes are open book/notes/Web. The only requirement is that you neither give nor receive help from any other person.

  • A midterm exam and a cumulative final exam.

    • These will also be delivered as Canvas Quizzes, exactly like the quizzes above, except the midterm exam will be 50min long and the final exam, 180min long. The time windows for taking these exams online will also be longer.

    • If you miss an exam, you will get no credit for it unless you notify the instructor in advance with a certifiable medical or emergency reason.

    • Like the quizzes, both exams are open book/notes/Web. The only requirement is that you neither give nor receive help from any other person.

  • I will also release some ungraded exercises on the docs page throughout the quarter. These questions will act as practice for the graded quizzes and exams.

Prerequisites

  • CSE 132A (DB Systems Principles) is necessary. It will also be helpful if you have taken CSE 120 (Operating Systems) or DSC 102 (Systems for Scalable Analytics) but these are not necessary.

  • You should know, or be willing to learn quickly by yourself, the programming language C++ for the projects. Here is a good C++ tutorial.

Textbooks

  • Recommended: Database Management Systems (3rd edition), by Raghu Ramakrishnan and Johannes Gehrke (aka the "cow book").

  • Additional (optional): Database Systems: The Complete Book (2nd edition), by Hector Garcia-Molina, Jennifer Widom, and Jeffrey Ullman.

Exam Dates

  • Midterm Exam: Wednesday, 4/29; preferred slot: 1:00pm to 1:50pm PT; time window TBD

  • Final Exam: Thursday, 6/11; preferred slot: 11:30am to 2:30pm PT; time window TBD

Grading

  • Project 1: 15%

  • Project 2: 25%

  • Quizzes: 10%

  • Midterm Exam: 15%

  • Final Exam: 35%

Cutoffs

Since this is the very first online-only edition of this course, the grading scheme is a hybrid of absolute and relative grading to mitigate the "cold start" issue. The absolute cutoffs are based on your absolute total score. The relative bins are based on your position in the total score distribution of the class. The better grade among the two (absolute-based and relative-based) will be your final grade.

Grade | Absolute Cutoff (total score >=) | Relative Bin (% of class from the top; use strictest)
A+ | 95 | Highest 5%
A | 90 | Next 10% (5-15)
A- | 85 | Next 15% (15-30)
B+ | 80 | Next 15% (30-45)
B | 75 | Next 15% (45-60)
B- | 70 | Next 15% (60-75)
C+ | 65 | Next 5% (75-80)
C | 60 | Next 5% (80-85)
C- | 55 | Next 5% (85-90)
D | 50 | Next 5% (90-95)
F | < 50 | Lowest 5%


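Example: Suppose your total score is 82 and your percentile is 33, i.e., about 67% of the class scored at or above you, which places you in the 60-75 bin. So, the relative grade is B-, while the absolute grade is B+ (82 >= 80). The final grade, the better of the two, is then B+.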

Non-Letter Grade Options: You certainly have the option of taking this course for a non-letter grade. As per the CSE department's guidelines, the policy for P in a P/F option is a pass-equivalent letter grade, i.e., D or better; the policy for S in an S/U option is a letter grade of B- or better.

Classroom Rules

  • No late days for submitting the programming projects. Partial credit is possible, as per the TA's assessment. Plan your work well in advance accordingly.

  • Students are encouraged to ask questions and participate in discussions during the live lecture slot and on Piazza. To ask a question during a lecture, enter your name in the Zoom chat or click "raise your hand" on Zoom; the instructor will pause and ask you to speak or type your question.

  • Please review UCSD's honor code and policies and procedures on academic integrity here. If plagiarism is detected in your project code, if we detect collusion on the graded quizzes or exams, or if any other form of academic integrity violation is identified, University authorities will be notified for appropriate disciplinary action to be taken. You will also receive a 0 for that component of your score, and your overall grade will be lowered substantially.

  • Harassment or intimidation of any form against any student will not be tolerated during the calls or on Piazza.