Internet Data Science for Cybersecurity
CSE 291-e, Winter 2023
This course will present a data-science framing for conversations about the role of Internet measurement and data science in a range of public policy issues, with an emphasis on cybersecurity. We will focus on the Internet as a data transport service, and vulnerabilities specific to interdomain routing (BGP), naming (Domain Name System), and certificate management. Persistent security challenges at this layer affect every application that operates over the Internet. This course will provide a data-oriented background on these critical dimensions of the Internet infrastructure.
This course will assume some technical knowledge of how the Internet works, and augment that knowledge with an understanding of how the global Internet is structured, managed, and financed. Understanding these dimensions, as well as interdependencies across layers, is critical to evaluating approaches to improve the security and trustworthiness of Internet infrastructure.
The class will combine two learning approaches:
- data analysis (coding) assignments to analyze various data sources that reveal characteristics of Internet structure and interdependencies;
- critical reading of recent peer-reviewed research studies that apply such data sources to analysis of the most severe threats to today's Internet infrastructure.
We will review recent research advances in understanding the security of the Internet’s transport systems, what data was used, and opportunities to overcome long-standing barriers to security advances.
The course will consist of lectures, paper discussions, individual data analysis assignments, and a group data science project.
See Canvas for access to assignments and other information.
- Required Knowledge: Some technical understanding of Internet protocols.
- Enforced Prerequisite: None.
- Recommended Preparation for Those Without Required Knowledge:
- Familiarity with data sets that describe Internet topology structure and naming systems
- Ability to read scientific studies that measure Internet vulnerabilities and describe the vulnerability studied, the data set used, and the strengths and weaknesses of that data set for studying that specific vulnerability
- Ability to survey a set of measurement papers that study a specific vulnerability and identify consistencies or contradictions in their results
- Ability to reproduce a previous measurement study, or conduct an original one, of an Internet infrastructure vulnerability, extending results in the literature
Paper Summary Questions
For each paper, write 2-3 paragraphs explaining the essence of the paper and addressing the questions below. Please submit your answers on Gradescope, which you can access through Canvas. Each summary is due at 5pm the day before the lecture in which the paper is discussed; late submissions are accepted until 11:59pm that same day. You will have one paper summary per week.
- What did they investigate?
- (Why) is it important?
- What data sets did they use?
- What are the limitations of the data for their question?
- How did they investigate it? What are the limitations of the method?
- What is the essence of what they found?
- What would you have done differently?
- What was surprising to you?
- Anything else you think should be mentioned
See the course syllabus.
Grading
Your final grade will be calculated based on the following items and weights:
- Class attendance and participation (30%)
- Questions and thoughts about papers, data sets, and assignments.
- Paper summaries (written) (20%)
- For each paper, submit a two- to three-paragraph analysis of the paper.
- One summary per week.
- Lowest score will be dropped.
- Programming assignments on Gradescope (20%)
- 4 programming assignments to learn to use Internet measurement tools and datasets.
- Final project options (30%)
- Complete 5 more assignments!
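As a rough illustration of how the weights above combine, here is a minimal Python sketch. It assumes every item is scored on a 0-100 scale and that items within a category are averaged equally; neither detail is stated on this page. The function names are hypothetical and this is not the official grade calculation.

```python
# Category weights as listed in the grading breakdown above.
WEIGHTS = {
    "participation": 0.30,
    "paper_summaries": 0.20,
    "programming": 0.20,
    "final_project": 0.30,
}


def mean(scores):
    """Plain average of a non-empty list of scores."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def summary_average(scores):
    """Average the paper-summary scores after dropping the single lowest one."""
    if len(scores) <= 1:
        return mean(scores)  # nothing to drop when there is only one score
    return mean(sorted(scores)[1:])


def final_grade(participation, summaries, programming, final_project):
    """Weighted final grade (assumption: all inputs are on a 0-100 scale)."""
    components = {
        "participation": participation,
        "paper_summaries": summary_average(summaries),
        "programming": mean(programming),  # assumption: equal weight each
        "final_project": final_project,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(final_grade(90, [50, 80, 100], [70, 90], 80))
```

In this made-up example the summary score of 50 is dropped, so the paper-summary component averages the remaining 80 and 100.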
Course materials at other web sites you may find interesting:
- The Modern Internet Stanford CS 249i Professor Zakir Durumeric
- Internet Data Science Georgia Tech CS 8803 Professor Alberto Dainotti
- Internet Measurement Columbia ELEN 6774 Professor Ethan Katz-Bassett
- Network Startup Resource Center BGP for All
This material is based on research sponsored by the National Science Foundation (NSF) grant OAC-2131987. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF.