CSE 291-F: Graph Mining and Network Analysis

Course Overview

General Info: Graduate-level course, CSE Dept., UC San Diego, Spring quarter 2017

Lecture hours: Tuesday and Thursday, 18:30 - 19:50
Lecture room: PCYNH 106

Instructor: Fragkiskos Malliaros
Email: fmalliaros [at] eng.ucsd.edu
Office hours: Friday, 10am - 12pm at Atkinson Hall, Room 4111 (or send me an email and we will find a good time to meet)

TA: Mohammad Motiei
Email: mmotiei [at] eng.ucsd.edu
Office hours: Wednesday, 4pm - 6pm at CSE Building, Room 4154

Piazza: piazza.com/ucsd/spring2017/cse291f/home


Networks (or graphs) have become ubiquitous, as data from diverse disciplines can naturally be mapped to graph structures. Social networks, such as academic collaboration networks and interaction networks on online social networking applications, represent and model the social ties among individuals. Information networks, including the hyperlink structure of the Web and blog networks, have become crucial media for information dissemination, offering an effective way to represent content and navigate through it. Technological networks, including the Internet, power grids, telephone networks and road networks, are an important part of everyday life. Extracting meaningful information from large-scale graph data efficiently and effectively has become both crucial and challenging, with several important applications, and graph mining and analysis methods are prominent tools for this task. The goal of this course is to present recent and state-of-the-art methods and algorithms for exploring, analyzing and mining large-scale networks, as well as their practical applications in various domains (e.g., social science, the Web, biology).





Schedule and Lectures

The topics of the lectures are subject to change (the following schedule outlines the topics that will be covered in the course). The slides for each lecture will be posted on Piazza just before the start of the class.


Week | Date | Topic | Material | Assignments/Project
1 | April 4 | Introduction | Lecture 1 | -
1 | April 6 | Graph theory and linear algebra recap; basic network properties | Lecture 2 | -
2 | April 11 | Random graphs and the small-world phenomenon | Lecture 3 | -
2 | April 13 | Power-law degree distribution and the Preferential Attachment model | Lecture 4 | -
3 | April 18 | Time-evolving graphs and network models | Lecture 5 | Assignment 1 out
3 | April 20 | Centrality criteria and link analysis algorithms | Lecture 6 | -
4 | April 25 | Project proposal short presentations (all teams) | Lecture 7 | Project proposal slides due on April 24
4 | April 27 | No class. Traveling to SDM 2017 | - | Project proposal due
5 | May 2 | Graph clustering and community detection (Part I) | Lecture 8 | Assignment 1 due
5 | May 4 | Graph clustering and community detection (Part II) | Lecture 9 | -
6 | May 9 | Graph clustering and community detection (Part III) | Lecture 10 | Assignment 2 out
6 | May 11 | Link prediction | Lecture 11 | -
7 | May 16 | Graph similarity | Lecture 12 | -
7 | May 18 | Graph sampling and summarization | Lecture 13 | -
8 | May 23 | Cascading behavior in networks | Lecture 14 | Project progress report due
8 | May 25 | Influence maximization | Lecture 15 | -
9 | May 30 | Representation learning in graphs | Lecture 16 | Assignment 2 due
9 | June 1 | Core decomposition in graphs | Lecture 17 | -
10 | June 6 | Review of topics | Lecture 18 | -
10 | June 8 | No class (work on projects) | - | Project final report due on June 11; presentations on June 12 and June 13



[April 4] Lecture 1: Introduction

Introduction to graph mining and network analysis, administrivia, course structure and overview of the topics that will be covered in the course.

Reading:

[April 6] Lecture 2: Graph theory and linear algebra recap; basic network properties

Presentation of basic concepts in graph theory, linear algebra and spectral graph theory that will be used throughout the course. Basic network properties: degree distribution, clustering coefficient and shortest path length.
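
As a rough illustration of the three basic properties named above, the following sketch computes them on a small undirected graph stored as a dictionary of neighbor sets. The function names and the toy graph are assumptions made for this example, not course material; in practice a library such as NetworkX would typically be used.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Return {degree: fraction of nodes with that degree}."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant path 2-3-4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(degree_distribution(toy))
print(local_clustering(toy, 2))
print(shortest_path_lengths(toy, 0))
```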

Reading: Additional:

[April 11] Lecture 3: Random graphs and the small-world phenomenon

The Erdos-Renyi random graph model and its basic properties. Comparison to the properties of real networks. The small-world phenomenon and the small-world model.
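
A minimal sketch of the G(n, p) variant of the Erdos-Renyi model, in which each of the n(n-1)/2 possible edges is included independently with probability p, so the expected degree of a node is p(n-1). The function names and parameter values below are illustrative assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """Sample an undirected G(n, p) graph as a dict of neighbor sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        # Each possible edge is included independently with probability p.
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


# Compare the empirical average degree with the expected value p * (n - 1).
n, p = 400, 0.05
graph = erdos_renyi(n, p, seed=1)
print("empirical:", average_degree(graph), "expected:", p * (n - 1))
```

This pairwise loop takes quadratic time in n, which is fine for small examples but not for large sparse graphs.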

Reading: Additional:
  • Random graphs, lecture notes by Aaron Clauset (CU Boulder)
  • Diameter on d-regular random graphs, lecture notes by Yaron Singer (Harvard University)
  • Networks: An Introduction (Chapter 12)
  • P. Erdos and A. Renyi. On Random Graphs I. Publicationes Mathematicae (6) 290-297, 1959
  • P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Koezl., 1960
  • D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-42, 1998
  • P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301, 2003
  • D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002
  • M. E. J. Newman. Models of the Small World: A Review. J. Stat. Physics, 2000
  • J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. ACM Symposium on Theory of Computing, 2000
  • L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012
  • J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. arXiv, 2012

[April 13] Lecture 4: Power-law degree distribution and the Preferential Attachment model

Power-law degree distribution in real networks. How to analyze and visualize power-law distributions. The Preferential Attachment model. Consequences of skewed degree distribution in the robustness of real networks.
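
The sketch below shows one common way to simulate a preferential-attachment process in the style of Barabasi and Albert: each new node connects to m existing nodes chosen with probability proportional to their current degree. The initial clique on m + 1 nodes and the function name are assumptions of this example; other seed graphs are equally valid.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph on n nodes; each new node adds m edges."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed seed graph: a clique on the first m + 1 nodes.
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # A node appears in `stubs` once per incident edge, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


graph = preferential_attachment(2000, 3, seed=7)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
# A few hubs have degrees far above the minimum degree m.
print("largest degrees:", degrees[:5], "smallest degree:", degrees[-1])
```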

Reading: Additional:
  • A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009
  • Networks, crowds, and markets (Chapter 18)
  • Graph Mining: Laws, Tools, and Case Studies (Chapter 2 and 9)
  • B. Bollobas, O. Riordan, J. Spencer, and G. Tusnady. The degree sequence of a scale-free random graph process. Random Structures and Algorithms 18(3), 2001
  • M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics 1(2), pp. 226-251, 2004
  • M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. In SIGCOMM, 1999.
  • R. Albert, H. Jeong, and A.-L. Barabasi. The diameter of the world wide web. Nature 401, 130-131, 1999
  • A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286, 1999

[April 18] Lecture 5: Time-evolving graphs and network models

Properties of time-evolving graphs. The Forest-Fire and Kronecker graph models.

Reading: Additional:

[April 20] Lecture 6: Centrality criteria and link analysis algorithms

Centrality criteria in graphs (degree, closeness, betweenness, eigenvector, Katz). Link analysis ranking algorithms (HITS and PageRank).
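
As an illustration of link analysis ranking, here is a minimal power-iteration sketch of PageRank. The damping factor 0.85, the uniform redistribution of rank from dangling nodes, and the function signature are assumptions of this example; it also assumes every link target appears as a key of the input dictionary.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a directed graph given as {node: list of successors}."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * rank[v] / len(out_links[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical four-page web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```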

Reading: Additional:

[April 25] Lecture 7: Project proposal short presentations

Presentation of the project proposal (all teams).


[May 2] Lecture 8: Graph clustering and community detection (Part I)

Strength of weak ties. Community detection in networks. Girvan-Newman algorithm. Modularity and modularity optimization (greedy, spectral, Louvain method).
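
For reference, the Newman-Girvan modularity of a partition of an undirected, unweighted graph can be written as Q = sum over communities c of [ l_c / m - (d_c / 2m)^2 ], where m is the number of edges, l_c the number of edges inside c, and d_c the total degree of the nodes in c. The sketch below evaluates this quantity for a given partition; the function name and the toy graph are assumptions of this example, and it does not perform any optimization.

```python
def modularity(adj, community):
    """Modularity of a partition; `community` maps each node to a label."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    intra = {}       # number of edges with both endpoints in the community
    degree_sum = {}  # total degree of the nodes in the community
    for u, nbrs in adj.items():
        c = community[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        for v in nbrs:
            if community[v] == c:
                # Each intra-community edge is visited from both endpoints.
                intra[c] = intra.get(c, 0) + 0.5
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in toy}
print(modularity(toy, two_groups), modularity(toy, one_group))
```

Placing every node in a single community always gives Q = 0, which is why the two-triangle split scores higher here.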

Reading: Additional:

[May 4] Lecture 9: Graph clustering and community detection (Part II)

Graph partitioning. Spectral clustering. Community evaluation criteria.

Reading: Additional:

[May 9] Lecture 10: Graph clustering and community detection (Part III)

Community detection in directed networks. Overlapping community detection. Community structure of large scale networks.

Reading: Additional:

[May 11] Lecture 11: Link prediction

Node similarity measures. Link prediction in networks.
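
Three widely used neighborhood-based similarity scores (common neighbors, the Jaccard coefficient and Adamic-Adar) can be sketched as follows. Ranking all non-adjacent pairs by such a score is one simple approach to link prediction; the helper `rank_candidates` and the toy graph are assumptions of this example.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by 1 / log(degree); rare neighbors count more."""
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def rank_candidates(adj, score):
    """Sort all non-adjacent node pairs by decreasing similarity score."""
    nodes = sorted(adj)
    pairs = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if v not in adj[u]
    ]
    return sorted(pairs, key=lambda e: score(adj, *e), reverse=True)


# Hypothetical toy graph.
toy = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
for name, fn in [("CN", common_neighbors), ("Jaccard", jaccard), ("AA", adamic_adar)]:
    print(name, rank_candidates(toy, fn))
```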

Reading: Additional:

[May 16] Lecture 12: Graph similarity

Graph similarity. Graph kernels.

Reading: Additional:

[May 18] Lecture 13: Graph sampling and summarization

Graph sampling. Graph sparsification for community detection. Graph summarization.

Reading: Additional:

[May 23] Lecture 14: Cascading behavior in networks

Cascading behavior. Models of virus and information propagation.
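
One standard family of propagation models is the SIR (susceptible, infected, recovered) epidemic model. The sketch below is a simple discrete-time version on a graph: in every step each infected node infects each susceptible neighbor with probability beta and then recovers with probability delta. The parameter names, the update order and the toy graph are assumptions of this example rather than a definitive formulation.

```python
import random


def simulate_sir(adj, seeds, beta, delta, seed=None, max_steps=10_000):
    """Run one discrete-time SIR simulation; return the set of ever-infected nodes."""
    rng = random.Random(seed)
    infected = set(seeds)
    recovered = set()
    for _ in range(max_steps):
        if not infected:
            break
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                susceptible = v not in infected and v not in recovered
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        # Infected nodes recover (and become immune) with probability delta.
        newly_recovered = {u for u in infected if rng.random() < delta}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
    return recovered | infected


# Hypothetical toy graph: a path 0-1-2-3-4-5.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
sizes = [len(simulate_sir(path, [0], beta=0.5, delta=0.5, seed=s)) for s in range(1000)]
print("mean outbreak size:", sum(sizes) / len(sizes))
```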

Reading: Additional:

[May 25] Lecture 15: Influence maximization

Influence maximization in social networks. The Greedy algorithm. Outbreak detection in networks.
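
A minimal sketch of the greedy approach to influence maximization: in each of k rounds, add the node that yields the largest estimated spread. The spread is estimated here by Monte Carlo simulation under the independent cascade model with a single activation probability p; the choice of diffusion model, the parameter defaults and the function names are assumptions of this example.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """One cascade: each newly active node gets one chance to activate each neighbor."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(adj, seeds, p, runs, rng):
    """Average number of activated nodes over `runs` simulated cascades."""
    return sum(independent_cascade(adj, seeds, p, rng) for _ in range(runs)) / runs


def greedy_influence_maximization(adj, k, p=0.1, runs=200, seed=None):
    """Pick k seed nodes, each time adding the node with the best estimated spread."""
    if k > len(adj):
        raise ValueError("k cannot exceed the number of nodes")
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for v in adj:
            if v in chosen:
                continue
            spread = estimate_spread(adj, chosen + [v], p, runs, rng)
            if spread > best_spread:
                best, best_spread = v, spread
        chosen.append(best)
    return chosen


# Hypothetical toy graph: a 4-node path and a separate 2-node component.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(greedy_influence_maximization(toy, k=2, p=0.5, runs=500, seed=0))
```

Each round re-simulates the spread for every candidate node, so this naive version is expensive on large graphs.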

Reading:
Additional:

[May 30] Lecture 16: Representation learning in graphs

Methods for learning node embeddings in graphs (LINE, DeepWalk and node2vec).

Reading:
Additional:

[June 1] Lecture 17: Core decomposition in graphs

Core decomposition and algorithms. Applications in dense subgraph detection, community detection, identification of influential spreaders and NLP.
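
The core number of a node is the largest k such that the node belongs to a subgraph in which every node has degree at least k. It can be computed by repeatedly removing a node of minimum remaining degree, as sketched below. This version favors clarity and takes quadratic time in the number of nodes; a bucket-based ordering of nodes by degree brings the peeling procedure down to linear time. The function name and the toy graph are assumptions of this example.

```python
def core_numbers(adj):
    """Peeling algorithm: return {node: core number} for an undirected graph."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Remove a node of minimum remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


# Hypothetical toy graph: a 4-clique (0-3) with a tail 3-4-5.
toy = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
print(core_numbers(toy))
```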

Reading:
  • M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature Physics 6, 888-893, 2010
  • C. Giatsidis, D. Thilikos, and M. Vazirgiannis. D-cores: Measuring Collaboration of Directed Graphs Based on Degeneracy. In ICDM, 2011

Additional:

[June 6] Lecture 18: Review of topics

Review of topics covered in the course and Q&A.



Course Structure

Learning objectives

The course aims to introduce students to the field of graph mining and network analysis by:
  • Covering a wide range of topics, methodologies and related applications.
  • Giving the students the opportunity to obtain hands-on experience with graph data and graph mining tasks.
We expect that by the end of the course, the students will have a thorough understanding of various graph mining and learning tasks, and will be able to analyze large-scale graph data and to formulate and solve problems that involve graph structures.


Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

  • Have basic knowledge of graph theory and linear algebra.
  • Be familiar with fundamental data mining and machine learning tasks (e.g., CSE 258).
  • Be familiar with at least one programming language (e.g., Python or any language of their preference).
In the second lecture, we will review basic concepts in graph theory, linear algebra and machine learning.


Reading material

Most of the material of the course is based on research articles. Some of the topics are also covered by the following books:



Evaluation

The evaluation of the course will be based on the following:

  1. Two assignments: the assignments will include theoretical questions as well as hands-on practical questions and will familiarize the students with basic graph mining and analysis tasks.
  2. Project: this will be the main component of the course evaluation. The students are expected to form groups of 2-3 people, propose a topic for their project, and submit a final project report (it would also be interesting to organize a poster session at the end of the quarter). Please read the project section for more details.

The grading will be as follows:

Assignment 1 (individually): 20%
Assignment 2 (groups of 3-4 students): 20%
Project (groups of 3-4 students): 60%


Academic integrity

UCSD's and CSE's policies on academic integrity will be strictly enforced. In particular, all of your work must be your own. Some relevant excerpts from UCSD's policies are:

  • Don't copy another student's assignment, in part or in total, and submit it as your own work.
  • Acknowledge and cite source material in your papers or assignments.




Project

Details about the course project can be found on Piazza.




Resources

Datasets


Software tools

  • NetworkX: Python software package for graph analytics
  • igraph: collection of software packages for graph theory and network analysis (Python, C++ and R)
  • SNAP: high-performance system for the analysis of large networks (C++ and Python)
  • Gephi: graph visualization and exploration software

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of data mining, social network analysis and the Web). We provide the DBLP website of each venue where you can access the proceedings (papers, tutorials, etc).

Check out the website of each conference (e.g., KDD 2016) for more information.