## Abstracts

## Optimal Embedding of Heterogeneous Graph Data with Edge Crossing Constraints
We propose a novel approach to visualization of heterogeneous data characterized by both a relationship graph structure and intrinsic features. Each data point is a node in a graph with a given structure. Each data point is also associated with a set of features that have a corresponding distance or similarity measure. A successful visualization accurately captures the desired proximity structure as measured by some embedding objective while simultaneously optimizing an aesthetic criterion: no edge crossings. The edge-crossing constraint is expressed as a nonlinear constraint that has an intuitive geometric interpretation closely related to support vector machine classification. The approach can be generalized to remove intersections of general convex polygons, including node-edge and node-node intersections. We demonstrate the approach on multidimensional scaling, or equivalently Kamada-Kawai force-directed graph layout, by modifying the stress majorization algorithm to include penalized edge crossings. The resulting Expectation-Maximization-like algorithm can be readily adapted to other supervised and unsupervised optimization-based embedding or dimensionality reduction methods. The method is demonstrated on a problem in tuberculosis molecular epidemiology: creating spoligo forests for visualizing genetic relatedness between strains of the Mycobacterium tuberculosis complex, characterized by a phylogenetic forest and multiple biomarkers with a corresponding non-metric genetic distance.

## Visualization in Low Dimensional Space by Tessellation of Linear Subspaces
The problem of nonlinear dimensionality reduction is most often formulated as an instance of semidefinite programming (SDP). However, the effectiveness of the SDP-based algorithms is limited by the computational complexity of the SDP solvers. Due to this limitation, some large-scale variations of the SDP-based methods have been proposed, which are relatively fast and scalable; however, they reduce the size of the SDP by some heuristic approximation, which results in sub-optimal solutions. In this paper, we propose a novel method, which significantly reduces the size and the number of constraints of the SDP problem. We show that the search space of the SDP problem can be restricted to a small face of the semidefinite cone. Crucially, the proposed formulation is not an approximation, and very large SDP problems can be solved exactly and efficiently by this technique. In addition, unlike most methods, this algorithm provides a parametric mapping that allows out-of-sample test points to be sensibly mapped into the embedded space.

## Adaptive Visualization of Text Documents Incorporating Domain Knowledge
We present a method for visualizing text corpora that are assumed to contain labeled and unlabeled documents. Our method learns data mappings of labeled documents, identifying the terms that are most relevant for label discrimination. We can use this information to visualize mapped unlabeled documents as well. We also show how this method allows the inclusion of user feedback. This feedback is supplied in an iterative process, so that the user can use the output of the method to provide their domain knowledge of the data. At the same time, this technique is well suited for providing a new low-dimensional space where traditional clustering or classification methods can be applied. Although our approach can deal with document labels that are discrete classes, continuous values, or associated vectors, we confine the experiments of this article to labels that represent non-overlapping topics. This approach is evaluated using a set of short and noisy documents, which is considered a challenging task in the text mining literature.

## A Psychophysical Investigation of Dimensionality Reduction
A cornucopia of dimensionality reduction techniques has emerged over the past decade, leaving data analysts with a wide variety of choices for reducing their data. Means of evaluating and comparing low-dimensional embeddings useful for visualization, however, are very limited. When proposing a new technique, it is common to simply show rival embeddings side-by-side and let human judgment determine which embedding is superior. This study investigates whether such human embedding evaluations are reliable, i.e., whether humans tend to agree on the quality of an embedding. Our results reveal that, although experts are reasonably consistent in their evaluation of embeddings, novices generally disagree on the quality of an embedding. We discuss the impact of this result on the way dimensionality reduction researchers should present their results, and on the applicability of dimensionality reduction outside of machine learning.

## The Nyström approximation for relational generative topographic mappings
Relational generative topographic mappings (RGTM) provide a statistically motivated data inspection and visualization tool for pairwise dissimilarities by fitting a constrained Gaussian mixture model to the data. Since it is based on pairwise dissimilarities of data, the method scales quadratically with the number of training samples, making it infeasible for large data sets. In this contribution, we transfer the Nyström approximation to RGTM and investigate its effect on the method. This leads to a linear-complexity method whose reliability depends on the intrinsic dimensionality of the dissimilarity matrix.

## Inside the Selection Box: Visualising active learning selection strategies
Visualisations can be used to provide developers with insights into the inner workings of interactive machine learning techniques. In active learning, an inherently interactive machine learning technique, the design of selection strategies is the key research question, and this paper demonstrates how spring-model-based visualisations can be used to provide insight into the precise operation of various selection strategies. Using sample datasets, this paper provides detailed examples of the differences between a range of selection strategies.

## Probabilistic Spectral Dimensionality Reduction
We introduce a new perspective on spectral dimensionality reduction which views these methods as Gaussian random fields (GRFs). Our unifying perspective is based on the maximum entropy principle, which is in turn inspired by maximum variance unfolding. The resulting probabilistic models are based on GRFs and constitute a nonlinear generalization of principal component analysis. We show that parameter fitting in locally linear embedding is approximate maximum likelihood in these models. We develop new algorithms that directly maximize the likelihood and show that these new algorithms are competitive with the leading spectral approaches on a robot navigation visualization and a human motion capture data set. Finally, the maximum likelihood perspective allows us to introduce a new approach to dimensionality reduction based on L1 regularization of the GRF via the graphical lasso.

## Fast Optimization for t-SNE
The paper presents an alternative optimization technique for t-SNE that is orders of magnitude faster than the original optimization technique, and that produces results that are at least as good.

## The Topic Browser: An Interactive Tool for Browsing Topic Models
Topic models have been shown to reveal the semantic content in large corpora. Many individualized visualizations of topic models have been reported in the literature, showing the potential of topic models to give valuable insight into a corpus. However, good, general tools for browsing the entire output of a topic model along with the analyzed corpus have been lacking. We present an interactive tool that incorporates both prior work in displaying topic models as well as some novel ideas that greatly enhance the visualization of these models.

## Kick-starting GPLVM Optimization via a Connection to Metric MDS
The Gaussian Process Latent Variable Model (GPLVM) is an attractive model for dimensionality reduction, but the optimization of the GPLVM likelihood with respect to the latent point locations is difficult, and prone to local optima. Here we start from the insight that in the GPLVM we should have k(x_i, x_j) ≈ s_ij, where k is the kernel function evaluated at latent points x_i and x_j, and s_ij is the corresponding estimate obtained from the data. For an isotropic covariance function this relationship can be inverted to yield an estimate of the interpoint distances in the latent space, and these can be fed into a multidimensional scaling (MDS) algorithm. This yields an initial estimate of the latent locations, which can be subsequently optimized in the usual GPLVM fashion. We compare two variants of this approach to the standard PCA initialization and to the ISOMAP algorithm, and show that our initialization converges to the best GPLVM likelihoods on all six tested motion capture data sets.

## On the Effect of Clustering on Quality Assessment Measures for Dimensionality Reduction
Visualization techniques constitute important interfaces between the rapidly increasing digital information available today and the human user. In this context, numerous dimensionality reduction models have been proposed in the literature. This fact recently motivated the development of frameworks for a reliable and automatic quality assessment of the produced results. These specialized measurements represent a means to objectively assess an overall qualitative change under spatial transformation. The rapidly increasing size of data sets, however, creates the need not only to map given data points to a low-dimensional space, but also to compress the available information beforehand, e.g., by means of clustering techniques. While a standard evaluation measure for clustering is given by the quantization error, its counterpart, the reconstruction error, is usually infeasible for dimensionality reduction, so one has to rely on alternatives. In this paper, we investigate to what extent two existing quality measures used for dimensionality reduction are appropriate for evaluating the quality of clustering outputs, such that a single quality measure can be used for the full visualization process in the context of large data sets. We present empirical results from different basic clustering scenarios as a first study to analyze how structural characteristics of the data are reflected in the quality assessment.

## Data Triangulation, Information Visualization and Ethnomining: Understanding Everyday Interactions with Technologies
As networked technologies increasingly become part of our everyday lives, more and more data about their usage are being collected. Privacy issues notwithstanding, these data provide an excellent way to develop models of how, when, where and why people use networked technologies, services and applications. By taking what has been dubbed an “ethnomining” approach - that is, datamining with an ethnographic lens - I suggest there are excellent opportunities for a highly productive collaboration between datamining, information visualization and social science. In this talk, I will walk through the kinds of research questions that are posed by social scientists and ethnographers, the kinds of research methods that are used, and the kinds of data that are generated. I will conclude with a vision for a fruitful collaboration between dataminers, information visualization experts and social scientists interested in building socio-technical models of human behaviour. I will illustrate with case studies and examples.

## Information Visualization and Retrieval
There are plenty of good algorithms for manifold learning that are nevertheless not good for data visualization. The reason is that in visualization the dimensionality may need to be compressed below the inherent dimensionality of the data, which the standard methods have not been designed for. We have formulated data visualization as a visual information retrieval problem, where the data display serves as a tool to find similar data points. The retrieval formalization highlights the necessary tradeoff any algorithm needs to make between precision and recall in finding the truly similar data points or, equivalently, between false positives and misses. Given a relative cost between the two, the total cost can be optimized, resulting in a data visualization method we call NeRV (neighbor retrieval visualizer). SNE is a special case that maximizes recall. The visualizations are directly suited as interfaces for retrieving relevant molecular biological experiments, for instance, and have been used for visualizing graphs as well.

## Challenges in Information Visualization
While information visualization (infovis) is producing a lot of useful applications, there is an increasing awareness of its lack of a foundational theory and overarching principles. While we can draw from the theory gathered in statistics, perception, cognition, computing, etc., we need to develop our own set of principles on which to build future visualization systems. In this talk, I will describe some recent research results that highlight gaps in our knowledge, and discuss our first steps towards building a new body of theory for infovis.

## Bayesian Inference with Kernels
An embedding of probability distributions into a reproducing kernel Hilbert space (RKHS) has been introduced: like the characteristic function, this provides a unique representation of a probability distribution in a high-dimensional feature space. This representation forms the basis of an inference procedure on graphical models, where the likelihoods are represented as RKHS functions. The resulting algorithm is completely nonparametric: all aspects of the model are represented implicitly, and learned from a training sample. Both exact inference on trees and loopy BP on pairwise Markov random fields are demonstrated. Kernel message passing can be applied to general domains where kernels are defined, handling challenging cases such as discrete variables with huge domains, or very complex, non-Gaussian continuous distributions. We apply kernel message passing and competing approaches to cross-language document retrieval, depth prediction from still images, protein configuration prediction, and paper topic inference from citation networks: these are all large-scale problems, with continuous-valued or structured random variables having complex underlying probability distributions. In all cases, kernel BP performs outstandingly, being orders of magnitude faster than state-of-the-art nonparametric alternatives, and returning more accurate results. Joint work with Danny Bickson, Kenji Fukumizu, Carlos Guestrin, Yucheng Low, and Le Song.

## Visual Analytics for Relational Data
TBA

## Kernelized Sorting for Data Visualization: Globby Search Engine
Bridging the gap between keyword-based and content-based search engines is the next big thing in enhancing the user experience and serving up more systematic search results. Globby achieves this by returning images relevant to a keyword-based query as a set of pages, where each page contains several images, with images of similar content placed at proximal locations. Globby is based on Kernelized Sorting, a provably locally convergent statistical method, to arrange similar images close to each other in predefined but arbitrary structures, with an additional ranking constraint that the image most relevant to the query is placed at, for example, the top-left corner.
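The arrangement step described above, matching items to slots in a predefined structure so that similar items land in nearby slots, can be sketched as an iterated linear assignment problem. The following is a minimal illustration, not Globby's actual implementation: the RBF kernel, the feature-space centering, and the identity initialization are assumptions based on the standard kernelized sorting formulation, which locally maximizes the dependence objective sum over i, j of K[i, j] * L[perm[i], perm[j]].

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def rbf_kernel(Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def center(K):
    """Center a kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernelized_sorting(K, L, n_iter=100):
    """Assign each object i (kernel K over objects) to layout slot perm[i]
    (kernel L over slot positions) by locally maximizing
        sum_{i,j} K[i, j] * L[perm[i], perm[j]]
    via a sequence of linear assignment problems."""
    n = K.shape[0]
    perm = np.arange(n)  # start from the identity matching
    for _ in range(n_iter):
        # Linearize the quadratic objective around the current matching:
        # M[i, a] = sum_j K[i, j] * L[perm[j], a] is the payoff of placing
        # object i into slot a while all other objects stay put.
        M = K @ L[perm]
        rows, cols = linear_sum_assignment(M, maximize=True)
        new_perm = cols[np.argsort(rows)]
        if np.array_equal(new_perm, perm):
            break  # fixed point reached: a local optimum
        perm = new_perm
    return perm
```

Each iteration can only improve the linearized objective, so the procedure reaches a fixed point that is a local optimum, consistent with the "provably locally convergent" characterization in the abstract. In a layout application, L would be the kernel over the predefined page-grid positions, and a ranking constraint like Globby's could be imposed by pinning the top-ranked image's slot before solving each assignment.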