DSC 102: Systems for Scalable Analytics
This website is archived. Please see the website of the latest edition of this course among the links listed here.
Administrivia
Lectures: MWF 1:00-1:50pm PT at WLH 2005
Instructor: Arun Kumar
Discussions: Fri 4:00-4:50pm PT at MANDE B-210 or Zoom (only occasionally)
TAs:
Piazza: DSC 102 (access code posted on Canvas)
Course Goals and Content
This course covers the principles of computing systems and tools for scaling data analytics to large datasets. Scalable analytics systems are a central part of modern data science in numerous application domains, spanning enterprise business intelligence, Web search, e-commerce, social media, natural and social sciences, healthcare, digital humanities, e-governance, Internet of Things, and more. Topics include computer organization, memory hierarchy, basics of operating systems, scalable and parallel computing, cloud computing, design and use of parallel dataflow systems (MapReduce/Hadoop and Spark), machine learning systems, and the use of deep learning tools. The course will cover how relational algebra, SQL, linear algebra, and more general dataflow operations in such systems can be used to perform data preparation and feature engineering for machine learning (ML) at scale, how to scale ML training, how to perform ML model selection and deployment at scale, and how to handle data heterogeneity. It will also introduce the implementation of such data systems and touch upon the latest research in this space. A major component of this course is hands-on Python programming to implement data exploration, data preparation, and model selection pipelines on large real-world data using scalable analytics tools and cloud resources, on both the Amazon Web Services (AWS) public cloud and SDSC's private cloud.
Course Format and Mixed Modality Instructions
Pre-requisites
Suggested Textbooks
Grading
Components
Cutoffs
The grading scheme is a hybrid of absolute and relative grading. The absolute cutoffs are based on your absolute total score. The relative bins are based on your position in the total score distribution of the class. The better grade among the two (absolute-based and relative-based) will be your final grade.
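The "better of the two" rule can be sketched in Python as follows. This is only an illustration: the letter-grade scale in `GRADE_ORDER` and the function name `better_grade` are assumptions made for this sketch, and the actual cutoffs and bins are not reproduced here.

```python
# Assumed letter-grade scale, ordered from best to worst (hypothetical;
# the course's actual scale and cutoffs are not listed on this page).
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D", "F"]


def better_grade(absolute_grade: str, relative_grade: str) -> str:
    """Return whichever of the two letter grades ranks higher on the scale."""
    # A smaller index in GRADE_ORDER means a better grade.
    return min(absolute_grade, relative_grade, key=GRADE_ORDER.index)


if __name__ == "__main__":
    # A student whose absolute score maps to B+ but whose class rank maps
    # to A- would receive the A- under this rule.
    print(better_grade("B+", "A-"))
```

Under this sketch, a student can never do worse than the absolute cutoffs alone would give, since the relative bin only ever raises the final grade.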
Non-Letter Grade Options: You have the option of taking this course for a non-letter grade. A P in the P/F option requires a letter grade of C- or better; an S in the S/U option requires a letter grade of B- or better.
Exam Dates
Classroom Rules