Internet Data Science for Cybersecurity
CSE 291-e, Winter 2023
Professor kc claffy
TA: Ben Du | Office Hours: Friday 12-1pm, CSE B250A
Overview
This course will present a data-science framing for conversations about the role of Internet measurement and data science in a range of public policy issues, with an emphasis on cybersecurity. We will focus on the Internet as a data transport service, and vulnerabilities specific to interdomain routing (BGP), naming (Domain Name System), and certificate management. Persistent security challenges at this layer affect every application that operates over the Internet. This course will provide a data-oriented background on these critical dimensions of the Internet infrastructure.
This course will assume some technical knowledge of how the Internet works, and augment that knowledge with an understanding of how the global Internet is structured, managed, and financed. Understanding these dimensions, as well as interdependencies across layers, is critical to evaluating approaches to improve the security and trustworthiness of Internet infrastructure.
Structure
The class will combine two learning approaches:
- data analysis (coding) assignments to analyze various data sources that reveal characteristics of Internet structure and interdependencies;
- critical reading of recent peer-reviewed research studies that apply such data sources to analysis of the most severe threats to today's Internet infrastructure.
We will review recent research advances in understanding the security of the Internet’s transport systems, what data was used, and opportunities to overcome long-standing barriers to security advances.
The course will consist of lectures, paper discussions, individual data analysis assignments, and a group data science project.
See Canvas for access to assignments and other information.
- Required Knowledge: Some technical understanding of Internet protocols.
- Enforced Prerequisite: None.
- Recommended Preparation for Those Without Required Knowledge:
  - Computer Networking: A Top-Down Approach by J. Kurose and K. Ross. ISBN-13 978-0136681557.
  - Computer Networks: A Systems Approach by L. Peterson and B. Davie. ISBN-13 978-0128182000.
Learning Goals
- Familiarity with data sets that describe Internet topology structure and naming systems
- Ability to read scientific studies that measure Internet vulnerabilities and describe the vulnerability studied, the data set used, and the strengths and weaknesses of that data set for studying that specific vulnerability
- Ability to survey a set of measurement papers that study a specific vulnerability and identify consistencies or contradictions in their results
- Ability to reproduce a previous measurement study of an Internet infrastructure vulnerability, or conduct an original one, extending results in the literature
Paper Summary Questions
For each paper, write 2-3 paragraphs explaining the essence of the paper and addressing the questions below. Please submit your answers on Gradescope, which you can access through Canvas. Each summary is due at 5pm the day before the lecture in which the paper is discussed; late submissions are accepted until 11:59pm that same day. You will have one paper summary per week.
- What did they investigate?
- (Why) is it important?
- What data sets did they use?
- What are the limitations of the data for their question?
- How did they investigate it? What are the limitations of the method?
- What is the essence of what they found?
- What would you have done differently?
- What was surprising to you?
- Anything else you think should be mentioned
Syllabus
See the course syllabus.
Grading
Your final grade will be calculated based on the following items and weights:
- Class attendance and participation (30%)
  - Questions and thoughts about papers, data sets, and assignments.
- Paper summaries (written) (20%)
  - For each paper, submit a two- to three-paragraph analysis of the paper.
  - One summary per week.
  - Lowest score will be dropped.
- Programming assignments on Gradescope (20%)
  - 4 programming assignments to learn to use Internet measurement tools and datasets.
- Final project options (30%)
  - Complete 5 more assignments.
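As a rough illustration of how the weights above combine, the following sketch computes a weighted final score. It is not an official grade calculator: the 0-100 score scale, the simple averaging within each component, and all function and variable names are assumptions made for this example. Only the weights and the drop-the-lowest-summary rule come from the list above.

```python
# Weights taken from the grading list; everything else here is an assumption.
WEIGHTS = {
    "participation": 0.30,
    "paper_summaries": 0.20,
    "programming": 0.20,
    "final_project": 0.30,
}


def mean(scores):
    """Plain arithmetic mean of a non-empty list of scores."""
    return sum(scores) / len(scores)


def drop_lowest(scores):
    """Return the scores with the single lowest one removed.

    If there is only one score, it is kept (assumption for this sketch).
    """
    if len(scores) <= 1:
        return list(scores)
    return sorted(scores)[1:]


def final_grade(participation, summaries, programming, final_project):
    """Weighted final score, assuming every input is on a 0-100 scale."""
    components = {
        "participation": participation,
        "paper_summaries": mean(drop_lowest(summaries)),
        "programming": mean(programming),
        "final_project": final_project,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(final_grade(
        participation=100,
        summaries=[50, 70, 90],
        programming=[100, 100, 100, 100],
        final_project=80,
    ))
```

In this hypothetical example the summary score of 50 is dropped before averaging, so the paper-summary component is the mean of 70 and 90.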
Academic Integrity
“Academic Integrity is expected of everyone at UC San Diego. This means that you must be honest, fair, responsible, respectful, and trustworthy in all of your actions. Lying, cheating or any other forms of dishonesty will not be tolerated because they undermine learning and the University’s ability to certify students’ knowledge and abilities. Thus, any attempt to get, or help another get, a grade by cheating, lying or dishonesty will be reported to the Academic Integrity Office and will result in sanctions. Sanctions can include an F in this class and suspension or dismissal from the University. So, think carefully before you act by asking yourself: a) is what I’m about to do or submit for credit an honest, fair, respectful, responsible & trustworthy representation of my knowledge and abilities at this time and, b) would my instructor approve of my action? You are ultimately the only person responsible for your behavior. So, if you are unsure, don’t ask a friend—ask your instructor, instructional assistant, or the Academic Integrity Office. You can learn more about academic integrity at academicintegrity.ucsd.edu.”
Related Courses
Course materials at other web sites you may find interesting:
- The Modern Internet, Stanford CS 249i, Professor Zakir Durumeric
- Internet Data Science, Georgia Tech CS 8803, Professor Alberto Dainotti
- Internet Measurement, Columbia ELEN 6774, Professor Ethan Katz-Bassett
- Network Startup Resource Center: BGP for All
Acknowledgements
This material is based on research sponsored by the National Science Foundation (NSF) grant OAC-2131987. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.