CSE 291-F: Graph Mining and Network Analysis

Course Overview

General Info: Graduate-level course, CSE Dept., UC San Diego, Spring quarter 2017

Lecture hours: Tuesday and Thursday, 18:30 - 19:50
Lecture room: PCYNH 106

Instructor: Fragkiskos Malliaros
Email: fmalliaros [at] eng.ucsd.edu
Office hours: Friday, 10am - 12pm at Atkinson Hall, Room 4111 (or send me an email and we will find a good time to meet)

TA: Mohammad Motiei
Email: mmotiei [at] eng.ucsd.edu
Office hours: Wednesday, 4pm - 6pm at CSE Building, Room 4154

Piazza: piazza.com/ucsd/spring2017/cse291f/home

Networks (or graphs) have become ubiquitous as data from diverse disciplines can naturally be mapped to graph structures. Social networks, such as academic collaboration networks and interaction networks over online social networking applications are used to represent and model the social ties among individuals. Information networks, including the hyperlink structure of the Web and blog networks, have become crucial mediums for information dissemination, offering an effective way to represent content and navigate through it. A plethora of technological networks, including the Internet, power grids, telephone networks and road networks are an important part of everyday life. The problem of extracting meaningful information from large scale graph data in an efficient and effective way has become crucial and challenging with several important applications and towards this end, graph mining and analysis methods constitute prominent tools. The goal of this course is to present recent and state-of-the-art methods and algorithms for exploring, analyzing and mining large-scale networks, as well as their practical applications in various domains (e.g., social science, the web, biology).

Schedule and Lectures

The topics of the lectures are subject to change (the following schedule outlines the topics that will be covered in the course). The slides for each lecture will be posted in piazza just before the start of the class.

Week	Date	Topic	Material	Assignments/Project
1	April 4	Introduction	Lecture 1
	April 6	Graph theory and linear algebra recap; basic network properties	Lecture 2
2	April 11	Random graphs and the small-world phenomenon	Lecture 3
	April 13	Power-law degree distribution and the Preferential Attachment model	Lecture 4
3	April 18	Time-evolving graphs and network models	Lecture 5	Assignment 1 out
	April 20	Centrality criteria and link analysis algorithms	Lecture 6
4	April 25	Project proposal short presentations (all teams)	Lecture 7	Project proposal slides due on April 24
	April 27	No class. Traveling to SDM 2017		Project proposal due
5	May 2	Graph clustering and community detection (Part I)	Lecture 8	Assignment 1 due
	May 4	Graph clustering and community detection (Part II)	Lecture 9
6	May 9	Graph clustering and community detection (Part III)	Lecture 10	Assignment 2 out
	May 11	Link prediction	Lecture 11
7	May 16	Graph similarity	Lecture 12
	May 18	Graph sampling and summarization	Lecture 13
8	May 23	Cascading behavior in networks	Lecture 14	Project progress report due
	May 25	Influence maximization	Lecture 15
9	May 30	Representation learning in graphs	Lecture 16	Assignment 2 due
	June 1	Core decomposition in graphs	Lecture 17
10	June 6	Review of topics	Lecture 18
	June 8	No class (work on projects)		Project final report due on June 11 Presentations on June 12 and June 13

[April 4] Lecture 1: Introduction

Introduction to graph mining and network analysis, administrivia, course structure and overview of the topics that will be covered in the course.

Reading:

Networks, crowds, and markets (Chapter 1 and 13)
The structure and function of complex networks (Sections I and II)
Network science (Chapter 1)

[April 6] Lecture 2: Graph theory and linear algebra recap; basic network properties

Presentation of basic concepts in graph theory, linear algebra and spectral graph theory that will be used throughout the course. Basic network properties: degree distribution, clustering coefficient and shortest path length.

Reading:

Networks, crowds, and markets (Chapter 2)
Social media mining (Chapters 2 and 5)
Notes in linear algebra, probability, and proof techniques, by Jessica Su (Stanford University)

Additional:

Networks: An Introduction (Chapters 6, 8 and 9)
SVD and Low Rank Matrix Approximations, lecture notes by Tim Roughgarden and Gregory Valiant (Stanford University)
Lectures on Spectral Graph Theory, by Fan Chung (Chapter 2: Eigenvalues and the Laplacian of a graph)
J. Leskovec and E. Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW, 2008
Lecture 3, Lecture 4 and Lecture 5 from CSE 258 (background knowledge in data mining and machine learning)

[April 11] Lecture 3: Random graphs and the small-world phenomenon

The Erdos-Renyi random graph model and its basic properties. Comparison to the properties of real networks. The small-world phenomenon and the small-world model.

Reading:

Network science (Chapter 3)
Networks, crowds, and markets (Chapter 20)

Additional:

Random graphs, lecture notes by Aaron Clauset (CU Boulder)
Diameter on d-regular random graphs, lecture notes by Yaron Singer (Harvard University)
Networks: An Introduction (Chapter 12)
P. Erdos and A. Renyi. On Random Graphs I. Publicationes Mathematicae (6) 290-297, 1959
P. Erdos and A. Renyi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Koezl., 1960
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature 393:440-42, 1998
P. S. Dodds, R. Muhamad, D. J. Watts. An Experimental Study of Search in Global Social Networks. Science 301, 2003
D. J. Watts, P. S. Dodds, M. E. J. Newman. Identity and Search in Social Networks. Science, 296, 1302-1305, 2002
M. E. J. Newman. Models of the Small World: A Review., J. Stat. Physics 2000
J. Kleinberg. The small-world phenomenon: An algorithmic perspective. Proc. ACM Symposium on Theory of Computing, 2000
L. Backstrom, P. Boldi, M. Rosa, J. Ugander, and S. Vigna. Four Degrees of Separation. ACM Web Science Conference. 2012
J. Ugander, B. Karrer, L. Backstrom, and C. Marlow. The Anatomy of the Facebook Social Graph. arXiv, 2012

[April 13] Lecture 4: Power-law degree distribution and the Preferential Attachment model

Power-law degree distribution in real networks. How to analyze and visualize power-law distributions. The Preferential Attachment model. Consequences of skewed degree distribution in the robustness of real networks.

Reading:

Network science (Chapter 4, 5)
Graph Mining: Laws, Tools, and Case Studies (9)
R. Albert, H Jeong, and A.-L. Barabasi. Error and attack tolerance of complex networks. Nature 406, 378-382, 2000

Additional:

A. Clauset, C.R. Shalizi, and M.E.J. Newman. Power-law distributions in empirical data. SIAM Review 51(4), 661-703, 2009
Networks, crowds, and markets (Chapter 18)
Graph Mining: Laws, Tools, and Case Studies (Chapter 2 and 9)
Bela Bollobas, Oliver Riordan, Joel Spencer and Gabor Tusnady. The degree sequence of a scale-free random graph process. Journal Random Structures and Algorithms 18(3), 2001
M. Mitzenmacher. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics 1(2), pp. 226-251, 2004
M. Faloutsos, P. Faloutsos, C. Faloutsos. On Power-Law Relationships of the Internet Topology. In SIGCOMM, 1999.
R. Albert, H Jeong, and A.-L. Barabasi. The diameter of the world wide web. Nature 401, 130-131, 1999
A.L Barabasi, R. Albert. Emergence of scaling in random networks. Science, 286, 1999

[April 18] Lecture 5: Time-evolving graphs and network models

Properties of time-evolving graphs. The Forest-Fire and Kronecker graph models.

Reading:

Graph Mining: Laws, Tools, and Case Studies (Chapter 2 and 3)
J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker Graphs: An approach to modeling networks. Journal of Machine Learning Research (JMLR) 11:985-1042, 2010

Additional:

J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph Evolution: Densification and Shrinking Diameters. ACM TKDD, 2007
J. Leskovec, D. Chakrabarti, J. Kleinberg and C. Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. Proc. European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 2005.
C. Seshadhri, A. Pinar, and T. G. Kolda. An In-Depth Analysis of Stochastic Kronecker Graphs. Journal of the ACM, 2013
M. Mahdian and Y. Xu. Stochastic Kronecker Graphs. Proc. 5th Workshop on Algorithms and Models for the Web Graph (WAW), 2007
Graph Mining: Laws, Tools, and Case Studies (Chapter 11)
A. Pinar, C. Seshadhri and T. G. Kolda. The Similarity between Stochastic Kronecker and Chung-Lu Graph Models. In Proc. SDM, 2012
M. Kim, J. Leskovec. Multiplicative attribute graph model of real-world networks. Internet Mathematics, 2012
E. Zheleva, H. Sharara, and L. Getoor. Co-evolution of social and affiliation networks. Proc. KDD, 2009

[April 20] Lecture 6: Centrality criteria and link analysis algorithms

Centrality criteria in graphs (degree, closeness, betweenness, eigenvector, Katz). Link analysis ranking algorithms (HITS and PageRank).

Reading:

Social media mining (Chapters 2 and 5)
Networks, crowds, and markets (Chapter 1)
S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In Proc. of the 7th International World Wide Web Conference (WWW), 1998
J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. of the 9th ACM-SIAM Symposium on Discrete Algorithms (SODA), 1998

Additional:

Networks: An Introduction (Chapter 7, Sections 7.1 - 7.7)
F. Bonchi, G. De Francisci Morales, and M. Riondato. Centrality Measures on Big Graphs: Exact, Approximated, and Distributed Algorithms. Tutorial at WWW, 2016.
U Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology 25(2), 163-177, 2001
T. H. Haveliwala. Topic-Sensitive PageRank. In Proc 11th International World Wide Web Conference, 2002
G. Jeh and J. Widom. Scaling Personalized Web Search. In Proc. roceedings of the 12th international conference on World Wide Web (WWW), 2003
A. Borodin, J. S. Rosenthal, G. O. Roberts, and P. Tsaparas. Finding Authorities and Hubs From Link Structures on the World Wide Web. In Proc. 10th International World Wide Web Conference, 2001
P. Berkhin. A Survey on PageRank Computing. Internet Mathematics 2(1), pages 73-120, 2005

[April 25] Lecture 7: Project proposal short presentations

Presentation of the project proposal (all teams).

[May 2] Lecture 8: Graph clustering and community detection (Part I)

Strength of weak ties. Community detection in networks. Girvan-Newman algorithm. Modularity and modularity optimization (greedy, spectral, Louvain method).

Reading:

Networks, crowds, and markets (Chapter 3)
M. Girvan and M.E.J. Newman. Community structure in social and biological networks. PNAS 99, 2002
M.E.J. Newman. Modularity and community structure in networks. PNAS, 2006

Additional:

J.-P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, A.L. Barabasi. Structure and tie strengths in mobile communication networks. PNAS, 2007
M.E.J. Newman, M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113, 2004
A. Clauset, M.E.J. Newman, C. Moore. Finding community structure in very large networks. Phys. Rev. E 70, 066111, 2004
V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008
S. Fortunato. Community detection in graphs. Physics Reports, 2010
Social media mining (Chapters 6)
U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski, and D. Wagner. On Modularity Clustering. IEEE TKDE 20(2), 2008
S. Fortunato and S. Barthelemy. Resolution limit in community detection. Proc. Natl. Acad. Sci., 2007
L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proc. KDD, 2006

[May 4] Lecture 9: Graph clustering and community detection (Part II)

Graph partitioning. Spectral clustering. Community evaluation criteria.

Reading:

U. von Luxburg. Tutorial on spectral clustering. Statistics and Computing 17(4), 2007
Networks: An Introduction (Sections 11.5, 11.6)

Additional:

J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-Truth. In ICDM, 2012
S. Fortunato. Community detection in graphs. Physics Reports, 2010
Social media mining (Chapters 6)
J. Shi and J. Malik. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis And Machine Intelligence, vol. 22, no. 8, 2000
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm . In NIPS, 2001

[May 9] Lecture 10: Graph clustering and community detection (Part III)

Community detection in directed networks. Overlapping community detection. Community structure of large scale networks.

Reading:

G. Palla, I. Derenyi, I. Farkas, T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814-818, 2005
J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, 2009

Additional:

F. D. Malliaros and M. Vazirgiannis. Clustering and Community Detection in Directed Networks: A Survey. Physics Reports, 533(4): 95-142, 2013

[May 11] Lecture 11: Link prediction

Node similarity measures. Link prediction in networks.

Reading:

D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. In CIKM, 2003
R. Lichtenwalter, J.T. Lussier, and N.V. Chawla. New perspectives and methods in link prediction. In KDD, 2010
L. Backstrom, J. Leskovec. Supervised Random Walks: Predicting and Recommending Links in Social Networks. In WSDM, 2011

Additional:

G. Jeh and J. Widom. SimRank: A Measure of Structural-Context Similarity. In KDD, 2002
P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R.Zadeh. WTF: The Who to Follow Service at Twitter. In WWW, 2013
A. Clauset, C. Moore, M.E.J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 2008
L.A. Adami and E. Adar. Friends and neighbors on the Web. Social Networks 25, 211-230, 2003
L. Lua and T. Zhoua. Link prediction in complex networks: A survey. Physica A: Statistical Mechanics and its Applications 390(6), 1150-1170, 2011

[May 16] Lecture 12: Graph similarity

Graph similarity. Graph kernels.

Reading:

D. Koutra, N. Shah, J. Vogelstein, B. Gallagher, and C. Faloutsos. DeltaCon: A Principled Massive-Graph Similarity Function with Attribution. ACM TKDD, 2015
S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt. Graph Kernels. JMLR, 2010

Additional:

P. Papadimitriou, A. Dasdan and H. Garcia-Molina. Web Graph Similarity for Anomaly Detection. Journal of Internet Services and Applications 1(1), pp. 19-30, 2010
S. Soundarajan, T. Eliassi-Rad, and B. Gallagher. A Guide to Selecting a Network Similarity Method. In SDM, 2014
K. M. Borgwardt and H. Kriegel. Shortest-path kernels on graphs. In ICDM, 2005
N. Shervashidze, T. Petri, K. Mehlhorn, K. M. Borgwardt, and S. Vishwanathan. Efficient Graphlet Kernels for Large Graph Comparison. In AISTATS, 2009
T. Gartner, P. Flach, and S. Wrobel. On Graph Kernels: Hardness Results and Efficient Alternatives. Learning Theory and Kernel Machines, 2003
X. Gao, B. Xiao, D. Tao, and X. Li. A survey of graph edit distance. Pattern Analysis and Applications 13(1), 113-129, 2010
R.C. Wilson and P. Zhu. A Study of Graph Spectra for Comparing Graphs and Trees. Journal of Pattern Recognition 41(9), 2833-2841, 2008

[May 18] Lecture 13: Graph sampling and summarization

Graph sampling. Graph sparsification for community detection. Graph summarization.

Reading:

J. Leskovec and C. Faloutsos. Sampling from Large Graphs. In KDD, 2016
A.S. Maiya and T.Y. Berger-Wolf. Sampling Community Structure. In WWW, 2010
V. Satuluri, S. Parthasarathy, and Y. Ruan. Local graph sparsification for scalable clustering. In SIGMOD, 2011

Additional:

P. Hu and W. C. Lau. A Survey and Taxonomy of Graph Sampling. arXiv, 2013
M. Gjoka, M. Kurant, C.T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. In INFOCOM, 2010
C. Hubler, H.-P. Kriegel, K. Borgwardt, and Z. Ghahramani. Metropolis Algorithms for Representative Subgraph Sampling. In ICDM, 2008
A.S. Maiya and T.Y. Berger-Wolf. Benefits of Bias: Towards Better Characterization of Network Sampling. In KDD, 2011
K. LeFevre and E. Terzi. GraSS: Graph Structure Summarization. In SDM, 2010
Y. Liu, A. Dighe, T. Safavi, and D. Koutra. Graph Summarization: A Survey. arXiv, 2016

[May 23] Lecture 14: Cascading behavior in networks

Cascading behavior. Models of virus and information probagation.

Reading:

Networks, crowds, and markets (Chapter 21)
J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst. Cascading Behavior in Large Blog Graphs. In SDM, 2007

Additional:

Networks: An Introduction (Chapter 17: 17.1 - 17.4)
Network science (Chapter 10)
D. Chakrabarti, Y. Wang, C. Wang, J. Leskovec, and C. Faloutsos. Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): 1:1-1:26, 2008
B.A. Prakash, D. Chakrabarti, N.C. Valler, M. Faloutsos, and C. Faloutsos. Threshold conditions for arbitrary cascade models on arbitrary networks. Knowledge and Information Systems 33 (3), 2012
M. Goetz, J. Leskovec, M. Mcglohon, and C. Faloutsos. Modeling blog dynamics. In ICWSM, 2009
E. Adar and L. Adamic. Tracking information epidemics in blogspace. In Web Intelligence, 2005
J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural Diversity in Social Contagion. PNAS 109 (16), 2012
A. Goyal, F. Bonchi, and L.V.S. Lakshmanan. Learning influence probabilities in social networks. In WSDM, 2010

[May 25] Lecture 15: Influence maximization

Influence maximization in social networks. The Greedy algorithm. Outbreak detection in networks.

Reading:

D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the Spread of Influence through a Social Network. In KDD, 2003

Additional:

A. Goyal, W. Lu, and L. S.V. Lakshmanan. SIMPATH: An Efﬁcient Algorithm for Inﬂuence Maximization under the Linear Threshold Model. In ICDM, 2011
J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective Outbreak Detection in Networks. In KDD, 2007
W. Chen, Y. Wang, and S. Yang. Efficient Influence Maximization in Social Networks. In KDD, 2009
W. Chen, Y. Yuan, and L. Zhang. Scalable Influence Maximization in Social Networks under the Linear Threshold Model. In ICDM, 2010
A. Goyal, W. Lu, and L. V.S. Lakshmanan. CELF++: optimizing the greedy algorithm for influence maximization in social networks. In WWW, 2011
E. Cohen, D. Delling, T. Pajor, and R.F. Werneck. Sketch-based Influence Maximization and Computation: Scaling up with Guarantees. In CIKM, 2014
Y. Wang, G. Cong, G. Song, and K. Xie. Community-based greedy algorithm for mining top-K influential nodes in mobile social networks. In KDD, 2010
M. Richardson and P. Domingos. Mining the Network Value of Customers. In KDD, 2001
M. Richardson and P. Domingos. Mining Knowledge-Sharing Sites for Viral Marketing. In KDD, 2002
J. Leskovec, L. Adamic, and B. Huberman. The Dynamics of Viral Marketing. TWEB, 2007
A. Goyal, F. Bonchi, and K. V.S. Lakshmanan. A Data-Based approach to Social Influence Maximization. In PVLDB, 2012

[May 30] Lecture 16: Representation learning in graphs

Methods for learning node embeddings in graphs (LINE, DeepWalk and node2vec).

Reading:

J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. LINE: Large-scale information network embedding. In WWW, 2015
B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online Learning of Social Representations. In KDD, 2014
A. Grover and J. Leskovec. node2vec: Scalable Feature Learning for Networks. In KDD, 2016

Additional:

J. B. Tenenbaum, V. De Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290:5500, pp. 2319-2323, 2000
S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290:5500, pp. 2323-2326, 2000
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2001
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv, 2013
T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013
S. Cao, W. Lu, and Q. Xu. GraRep: Learning Graph Representations with global structural information. In CIKM, 2015
S. Cao, W. Lu, and Q. Xu. Deep Neural Networks for Learning Graph Representations. In AAAI, 2016
P. Goyal and E. Ferrara. Graph Embedding Techniques, Applications, and Performance: A Survey. arXiv, 2017
A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, and S. Saminathan. subgraph2vec: Learning Distributed Representations of Rooted Sub-graphs from Large Graphs. In KDD (MLG Workshop), 2016
M. Niepert, M. Ahmed, and K. Kutzkov. Learning Convolutional Neural Networks for Graphs. In ICML, 2016

[June 1] Lecture 17: Core decomposition in graphs

Core decomposition and algorithms. Applications in dense subgraph detection, community detection, identification of influential spreaders and NLP.

Reading:

M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature Physics 6, 888-893, 2010
C. Giatsidis, D. Thilikos, and M. Vazirgiannis. D-cores: Measuring Collaboration of Directed Graphs Based on Degeneracy. In ICDM, 2011

Additional:

S.B. Seidman. aNetwork Structure and Minimum Degree. Social Networks, 1983
F.D. Malliaros, A.N. Papadopoulos, and M. Vazirgiannis. Core Decomposition in Graphs: Concepts, Algorithms and Applications. In EDBT, 2016
V. Batagelj and M. Zaversnik. Generalized Cores. arXiv, 2002
C. Giatsidis, B. Cautis, S. Maniu, D.M. Thilikos, and M. Vazirgiannis. Quantifying trust dynamics in signed graphs, the S-Cores approach. In SDM, 2014
R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009
C. Giatsidis, F.D. Malliaros, D.M. Thilikos, and M. Vazirgiannis. CoreCluster: A Degeneracy Based Graph Clustering Framework. In AAAI, 2014
M.P. O'Brien, B.D. Sullivan. Locally estimating core numbers. In ICDM, 2014
F.D. Malliaros, M.-E.G. Rossi, and M. Vazirgiannis. Locating Influential Nodes in Compex Networks. Scientific Reports 6(19307), 2016
J. Wang and J. Cheng. Truss decomposition in massive networks. Proc. VLDB Endow., 5(9):812-823, 2012
S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, and E. Shir. A Model of Internet Topology using k-shell Decomposition. PNAS 104(27), 2007
J.I. Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. In NIPS, 2006
F.D. Malliaros and M. Vazirgiannis. To stay or not to stay: modeling engagement dynamics in social graphs. In CIKM, 2013
K. Shin, T. Eliassi-Rad, and C. Faloutsos. CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms. In ICDM, 2016
A. J.-P. Tixier, F. D. Malliaros, and M. Vazirgiannis. A Graph Degeneracy-based Approach to Keyword Extraction. In EMNLP, 2016
F. Rousseau and M. Vazirgiannis. Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction. In ECIR, 2015
P. Meladianos, G. Nikolentzos, F. Rousseau, Y. Stavrakas, and M. Vazirgiannis. Degeneracy-Based Real-Time Sub-Event Detection in Twitter Stream. In ICWSM, 2015

[June 6] Lecture 18: Review of topics

Review of topics covered in the course and Q&A.

Course Structure

Learning objectives

The course aims to introduce students to the field of graph mining and network analysis by:

Covering a wide range of topics, methodologies and related applications.
Giving the students the opportunity to obtain hands-on experience on dealing with graph data and graph mining tasks.

We expect that by the end of the course, the students will have a thorough understanding of various graph mining and learning tasks, will be able to analyze large-scale graph data as well as to formulate and solve problems that involve graph structures.

Prerequisites

There is no official prerequisite for this course. However, the students are expected to:

Have basic knowledge of graph theory and linear algebra.
Be familiar with fundamental data mining and machine learning tasks (e.g., CSE 258).
Be familiar with at least one programming language (e.g., Python or any language of their preference).

In the second lecture, we will review basic concepts in graph theory, linear algebra and machine learning.

Reading material

Most of the material of the course is based on research articles. Some of the topics are also covered by the following books:

David Easley and Jon Kleinberg. Networks, Crowds, and Markets. Cambridge University Press, 2010.
Mark E.J. Newman. Networks: An Introduction. Oxford University Press, 2010.
Deepayan Chakrabarti and Christos Faloutsos. Graph Mining: Laws, Tools, and Case Studies. Synthesis Lectures on Data Mining and Knowledge Discovery, Morgan and Claypool Publishers, 2012.
Reza Zafarani, Mohammad Ali Abbasi and Huan Liu. Social Media Mining. Cambridge University Press, 2014.
Albert-Laszlo Barabasi. Network Science. Cambridge University Press, 2016.

Evaluation

The evaluation of the course will be based on the following:

Two assignments: the assignments will include theoretical questions as well hands-on practical questions and will familiarize the students with basic graph mining and analysis tasks.
Project: this will be the main component for the evaluation of the course. The students are expected to form groups of 2-3 people, propose a topic for their project, and submit a final project report (it would also be interesting to organize a poster session at the end of the quarter). Please, read the project section for more details.

The grading will be as follows:

Assignment 1 (individually):	20%
Assignment 2 (groups of 3-4 students):	20%
Project (groups of 3-4 students):	60%

Academic integrity

UCSD and CSE's policies on academic integrity will be strictly enforced (please see here and here). In particular, all of your work must be your own. Some relevant excerpts from UCSD's policies are:

Don't copy another student's assignment, in part or in total, and submit it as your own work.
Acknowledge and cite source material in your papers or assignments.

Project

Details about the project of the course can be found on piazza.

Resources

Datasets

Stanford Network Analysis Project (SNAP)
The Koblenz Network Collection (KONECT)
The Colorado Index of Complex Networks (ICON)
The Network Data Repository with interactive graph analytics and visualization.
Social Computing Data Repository at Arizona State University
Datasets from the Social Computing Research group at MPI-SWS.
Datasets from the AMiner academic social network.
Datasets from the Complex Networks and Soft Matter Lab (Hernan Makse) at CUNY
Datasets from papers published in the International AAAI Conference on Web and Social Media (ICWSM). Also, check the previous ICWSM conferences.
Datasets from Jure Leskovec's class at Stanford.
Datasets of online social networks by Athina Markopoulou (UC Irvine).
Social media (Instagram and Facebook) datasets by Emilio Ferrara (USC).
KDD cup archives, uncluding a competition about Tencent Weibo (KDD is one of the premier data mining conferences). For this year (2017), the competition is currenty active and the topic is highway tollgates traffic flow prediction (using also the road network topology in the target area, among other data sources).
Graph datasets by Yibo Yao (WSU).
Sina Weibo dataset by Weiboscope (University of Hong Kong). Another Sina Weibo dataset by Tianchi can be found here.
Foursquare dataset by Dingqi Yang (University of Fribourg).
Datasets from the Spatial and Urban Networks group at CEA (Paris).
Yelp dataset challenge
Quora question pairs competition in Kaggle (currently active).
IlliMine repository from the Data Mining Research Group at UIUC (some network datasets are also included).

Software tools

NetworkX: Python software package for graph analytics
igraph: collection of software packages for graph theory and network analysis (Python, C++ and R)
SNAP: high performance system for the analysis of large network (C++ and Python)
Gephi: graph visualization and exploration software

Related conferences
Please find below a list of conferences related to the contents of the course (mostly in the field of data mining, social network analysis and the Web). We provide the DBLP website of each venue where you can access the proceedings (papers, tutorials, etc).

Check out the website of each conference (e.g., KDD 2016 ) for more information.