SoCal ML Symposium

Accepted Posters

Conditional Generative Adversarial Networks (cGANs) for Near Real-Time Precipitation Estimation from Multispectral GOES-16 Satellite Imageries PDF
Negin Hayatbini, Bailey Kong and Kuolin Hsu

Abstract: In this study, we present a state-of-the-art precipitation estimation framework that leverages advances in satellite remote sensing as well as Deep Learning (DL). The framework takes advantage of the improvements in spatial, spectral and temporal resolutions of the Advanced Baseline Imager (ABI) onboard the GOES-16 platform, along with elevation information, to improve the precipitation estimates. The procedure first derives a Rain/No Rain (R/NR) binary mask through classification of the pixels and then applies regression to estimate the amount of rainfall for rainy pixels. A Fully Convolutional Network is used as a regressor to predict precipitation estimates. The network is trained using the non-saturating conditional Generative Adversarial Network (cGAN) and Mean Squared Error (MSE) loss terms to generate results that better learn the complex distribution of precipitation in the observed data. Common verification metrics such as Probability Of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), Bias, Correlation and MSE are used to evaluate the accuracy of both R/NR classification and real-valued precipitation estimates. Statistics and visualizations of the evaluation measures show improvements in the precipitation retrieval accuracy of the proposed framework compared to the baseline models trained using conventional MSE loss terms. This framework is proposed as an augmentation for the PERSIANN-CCS (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System) algorithm for estimating global precipitation.
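
The categorical verification metrics named in this abstract (POD, FAR, CSI) have standard contingency-table definitions. The sketch below computes them for a pair of Rain/No Rain masks. The function name and the toy masks are hypothetical and are not taken from the poster; the formulas are the usual hits / misses / false-alarms definitions.

```python
def contingency_scores(observed, predicted):
    """Return (POD, FAR, CSI) for two equal-length rain/no-rain masks.

    observed, predicted: sequences of truthy/falsy values (1 = rain).
    """
    if len(observed) != len(predicted):
        raise ValueError("masks must have the same length")

    pairs = list(zip(observed, predicted))
    hits = sum(1 for o, p in pairs if o and p)
    misses = sum(1 for o, p in pairs if o and not p)
    false_alarms = sum(1 for o, p in pairs if p and not o)

    def ratio(numerator, denominator):
        # Undefined scores (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    pod = ratio(hits, hits + misses)                 # Probability Of Detection
    far = ratio(false_alarms, hits + false_alarms)   # False Alarm Ratio
    csi = ratio(hits, hits + misses + false_alarms)  # Critical Success Index
    return pod, far, csi


# Toy masks (made up for illustration): 3 hits, 1 miss, 1 false alarm.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
pod, far, csi = contingency_scores(observed, predicted)
```

Higher POD and CSI and lower FAR indicate a better R/NR classification.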

@inproceedings{scmls20_1,
title={Conditional Generative Adversarial Networks (cGANs) for Near Real-Time Precipitation Estimation from Multispectral GOES-16 Satellite Imageries},
author={Negin Hayatbini and Bailey Kong and Kuolin Hsu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

BooST: Boosting Smooth Transition Regression Trees for Partial Effect Estimation in Nonlinear Regressions PDF
Gabriel Vasconcelos, Marcelo Medeiros, Alvaro De Lima Veiga Filho and Yuri Fonseca

Abstract:In this paper, we introduce a new machine learning (ML) model for nonlinear regression called the Boosted Smooth Transition Regression Trees (BooST), which is a combination of boosting algorithms with smooth transition regression trees. The main advantage of the BooST model is the estimation of the derivatives (partial effects) of very general nonlinear models. Therefore, the model can provide more interpretation about the mapping between the covariates and the dependent variable than other tree-based models, such as Random Forests. We present several examples with both simulated and real data.
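
To illustrate why a smooth transition tree yields derivatives where a hard-split tree does not, here is a minimal sketch of a single smooth split. It assumes a logistic transition function, a common choice in smooth transition regression; the abstract does not give the exact parameterization used in BooST, so all names and parameters below are hypothetical.

```python
import math


def logistic_transition(x, gamma, c):
    """Smooth replacement for the hard split indicator 1[x > c]."""
    return 1.0 / (1.0 + math.exp(-gamma * (x - c)))


def smooth_stump(x, gamma, c, beta_left, beta_right):
    """One-split smooth tree: a weighted blend of the two leaf values."""
    g = logistic_transition(x, gamma, c)
    return beta_left * (1.0 - g) + beta_right * g


def smooth_stump_derivative(x, gamma, c, beta_left, beta_right):
    """Analytic partial effect d(prediction)/dx of the smooth stump."""
    g = logistic_transition(x, gamma, c)
    return (beta_right - beta_left) * gamma * g * (1.0 - g)


# Hypothetical parameters, evaluated at the split location x = c.
value = smooth_stump(0.0, gamma=2.0, c=0.0, beta_left=1.0, beta_right=3.0)
slope = smooth_stump_derivative(0.0, gamma=2.0, c=0.0, beta_left=1.0, beta_right=3.0)
```

Because each split is differentiable in x, a boosted sum of such trees is differentiable as well, which is what makes partial effects available in closed form.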

@inproceedings{scmls20_2,
title={BooST: Boosting Smooth Transition Regression Trees for Partial Effect Estimation in Nonlinear Regressions},
author={Gabriel Vasconcelos and Marcelo Medeiros and Alvaro De Lima Veiga Filho and Yuri Fonseca},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Deep Learning-Based Navigation Solution for Autonomous Aerial Refueling PDF
Jorge Alberto Bañuelos Garcia and Ahmad Bani Younes

Abstract: Unmanned Aerial Vehicles (UAVs) have been developed in recent decades to meet some of the demands that typical military aircraft and their human crews could not carry out safely and effectively. In many dangerous situations that could jeopardize the safety of a crew, a human crew cannot fly as long as a UAV. Because of their relatively small size, however, UAVs cannot carry much fuel on board for an extended mission. To be successful, a UAV must be able to refuel in the air to carry out more extended missions. One of the most difficult challenges a human pilot faces is probe-and-drogue aerial refueling, which requires great skill to dock one aircraft to another. In this method, a refueling aircraft deploys a long hose with a receptacle (the drogue), and the receiving aircraft maneuvers a probe into the receptacle. An automatic mechanism locks the probe onto the drogue, and refueling begins. Maneuvering the probe into the drogue requires a highly skilled pilot who must account for the dynamics of the aircraft before and after refueling while dealing with a drogue under turbulence. It therefore becomes imperative to develop a method to estimate the drogue's position and attitude relative to the receiving aircraft. This paper develops a navigation solution based on Deep Learning object detection algorithms to provide accurate 6-degree-of-freedom (6-DoF) information of the drogue relative to a monocular camera on board a flying UAV. An object detector provides the information an autonomous vehicle needs to dock and refuel without human intervention. The object detector is trained using 8746 images of a mock drogue to detect eight different beacons.
Once these beacons are detected, a non-linear least-squares algorithm that uses the collinearity equations as a system model takes the locations of the beacons in the captured image to provide an accurate 6-DoF navigation solution. These navigation solutions from the object detector are evaluated on multiple metrics and then compared to navigation solutions provided by a VICON motion tracking system. Finally, Monte Carlo analysis is performed using the collinearity equations as a system model to evaluate the performance of an object detector with various degrees of noise.

@inproceedings{scmls20_3,
title={Deep Learning-Based Navigation Solution for Autonomous Aerial Refueling},
author={Jorge Alberto Bañuelos Garcia and Ahmad Bani Younes},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Where is the World Headed? Trajectory Prediction for Interacting Agents PDF
Nitin Kamra, Hao Zhu, Dweep Trivedi, Ming Zhang and Yan Liu

Abstract:Trajectory prediction for scenes with multiple agents is an important problem for traffic prediction, pedestrian tracking and path planning. We present a novel relational neural network model to address this problem, which flexibly models interaction between agents by making fuzzy decisions and combining the corresponding responses with a fuzzy operator. Our approach shows significant performance gains over many existing state-of-the-art predictive models in diverse domains such as human crowd trajectories, US freeway traffic and physics datasets.

@inproceedings{scmls20_4,
title={Where is the World Headed? Trajectory Prediction for Interacting Agents},
author={Nitin Kamra and Hao Zhu and Dweep Trivedi and Ming Zhang and Yan Liu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods PDF
Dylan Slack

Abstract: As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous.

@inproceedings{scmls20_5,
title={Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods},
author={Dylan Slack},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

A Stochastic Time Series Model for Predicting Financial Trends with NLP PDF
Pratyush Muthukumar

Abstract:We consider a stochastic time series model for predicting the risk of a corporation’s assets through linguistic analysis of earnings conference calls (ECC). ECCs are vital performance indicators of a company’s assets over the fiscal year. By translating specific phrases into word vectors and discerning similarities between related concepts and phrases, we can improve the current strategies of time series forecasting. We also introduce a novel method for time series forecasting by computing the stochastic volatilities of stocks based on the subtleties of ECC sessions. Our neural network model has similarities to the structure of a convolutional character decoder, calculates along a time series of data, and can accurately predict the stochastic risk-reward payoff of a company by factoring in human sentiments gathered through conference call analysis. We hope the model can open up new discussion on predicting the human sentiment portion of stochastic noise that is so widely unclear within the financial prediction field. We gratefully acknowledge support from CSU-LSAMP, supported by the National Science Foundation under Grant # HRD-1302873 and the CSU Office of the Chancellor.

@inproceedings{scmls20_6,
title={A Stochastic Time Series Model for Predicting Financial Trends with NLP},
author={Pratyush Muthukumar},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Does Knowledge Transfer Always Help to Learn a Better Policy? PDF
Fei Feng, Wotao Yin and Lin Yang

Abstract: One of the key approaches to saving samples when learning a policy for a reinforcement learning problem is to use knowledge from an approximate model such as its simulator. However, does knowledge transfer from approximate models always help to learn a better policy? Despite numerous empirical studies of transfer reinforcement learning, an answer to this question is still elusive. In this paper, we provide a strong negative result, showing that even full knowledge of an approximate model may not help reduce the number of samples for learning an accurate policy of the true model. We construct an example of reinforcement learning models and show that the complexity with or without knowledge transfer has the same order. On the bright side, effective knowledge transfer is still possible under additional assumptions. In particular, we demonstrate that knowing the (linear) bases of the true model significantly reduces the number of samples for learning an accurate policy.

@inproceedings{scmls20_7,
title={Does Knowledge Transfer Always Help to Learn a Better Policy?},
author={Fei Feng and Wotao Yin and Lin Yang},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Provably Efficient Exploration for RL with Unsupervised Learning PDF
Fei Feng, Ruosong Wang, Wotao Yin, Simon Du and Lin Yang

Abstract:We study how to use unsupervised learning for efficient exploration in reinforcement learning with rich observations generated from a small number of latent states. We present a novel algorithmic framework built upon two components: an unsupervised learning algorithm and a no-regret reinforcement learning algorithm. We show that our algorithm provably finds a near-optimal policy with sample complexity polynomial in the number of latent states, which is significantly smaller than the number of possible observations. Our results give theoretical justification to the prevailing paradigm of using unsupervised learning for efficient exploration [tang2017exploration,bellemare2016unifying].

@inproceedings{scmls20_8,
title={Provably Efficient Exploration for RL with Unsupervised Learning},
author={Fei Feng and Ruosong Wang and Wotao Yin and Simon Du and Lin Yang},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Implicit competitive regularization in GANs PDF
Florian Schaefer, Hongkai Zheng and Anima Anandkumar

Abstract:To improve GANs, we need to understand why they can produce realistic samples. Presently, GANs are understood as the generator minimizing a divergence given by the optimal discriminator. We point out a fundamental flaw of this interpretation that precludes it from explaining why GANs work in practice. Instead, we argue that the performance of GANs is due to the implicit competitive regularization (ICR) arising from the simultaneous optimization of generator and discriminator. We show that opponent-aware modelling of generator and discriminator, as present in competitive gradient descent (CGD), can significantly strengthen ICR and thus stabilize GAN training without explicit regularization. In our experiments, we use an existing implementation of WGAN-GP and show that by training it with CGD we can improve the inception score (IS) on CIFAR10 for a wide range of scenarios, without any hyperparameter tuning. The highest IS is obtained by combining CGD with the WGAN-loss, without any explicit regularization.

@inproceedings{scmls20_9,
title={Implicit competitive regularization in GANs},
author={Florian Schaefer and Hongkai Zheng and Anima Anandkumar},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Deep Learning Tubes for Tube MPC PDF
David D. Fan, Ali-Akbar Agha-Mohammadi and Evangelos Theodorou

Abstract:Learning-based control aims to construct models of a system to use for planning or trajectory optimization, e.g. in model-based reinforcement learning. In order to obtain guarantees of safety in this context, uncertainty must be accurately quantified. This uncertainty may come from errors in learning (due to a lack of data, for example), or may be inherent to the system. Propagating uncertainty in learned dynamics models is a difficult problem. Common approaches rely on restrictive assumptions of how distributions are parameterized or propagated in time. In contrast, in this work we propose using deep learning to obtain expressive and flexible models of how these distributions behave, which we then use for nonlinear Model Predictive Control (MPC). We introduce a deep quantile regression framework for control which enforces probabilistic quantile bounds and quantifies epistemic uncertainty. Using our method we explore different approaches for learning tubes which contain the possible trajectories of the system, and demonstrate how to use them in a Tube MPC scheme. Furthermore, we prove these schemes are recursively feasible and satisfy constraints with a desired margin of probability. Finally, we present experiments in simulation on a nonlinear quadrotor system, demonstrating the practical efficacy of these ideas.
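
Quantile regression is commonly trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically so that the minimizer is the requested quantile. The sketch below shows that loss in plain Python. It is an illustration of the general technique only: the abstract does not state the exact loss used in the poster, and the function name and toy data are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss of predictions y_pred for quantile level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (diff > 0) costs tau * diff;
        # over-prediction (diff < 0) costs (1 - tau) * |diff|.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


# Toy targets and a constant prediction, scored as an upper (0.9) quantile.
loss_upper = pinball_loss([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], tau=0.9)
```

With tau = 0.9, predicting too low is nine times as costly as predicting too high, which pushes the learned bound toward the upper edge of the data. Training one network output at a high tau and one at a low tau gives an interval that could serve as a tube cross-section.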

@inproceedings{scmls20_10,
title={Deep Learning Tubes for Tube MPC},
author={David D. Fan and Ali-Akbar Agha-Mohammadi and Evangelos Theodorou},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Finding Social Media Trolls: Dynamic Keyword Selection Methods for Rapidly-Evolving Online Debates PDF
Maya Srikanth, Anqi Liu, Nicholas Adams-Cohen, Betty Wang, Michael Alvarez and Anima Anandkumar

Abstract: Online harassment is a significant social problem. Prevention of online harassment requires rapid detection of harassing, offensive, and negative social media posts. In this paper, we propose the use of word embedding models to identify offensive and harassing social media messages in two aspects: detecting fast-changing topics for more effective data collection and representing word semantics in different domains. We demonstrate with preliminary results that using the GloVe (Global Vectors for Word Representation) model facilitates the discovery of new and relevant keywords to use for data collection and trolling detection. Our paper concludes with a discussion of a research agenda to further develop and test word embedding models for identification of social media harassment and trolling.

@inproceedings{scmls20_11,
title={Finding Social Media Trolls: Dynamic Keyword Selection Methods for Rapidly-Evolving Online Debates},
author={Maya Srikanth and Anqi Liu and Nicholas Adams-Cohen and Betty Wang and Michael Alvarez and Anima Anandkumar},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems
Yoshitomo Matsubara, Sabur Baidya, Davide Callegaro, Marco Levorato and Sameer Singh

Abstract:Offloading the execution of complex Deep Neural Networks (DNNs) to compute-capable devices at the network edge, that is, edge servers, can significantly reduce capture-to-output delay. We propose a framework to split DNNs for image processing and minimize capture-to-output delay in a wide range of network conditions and computing parameters, and distill the head portion of the DNN to reduce its computational complexity and introduce a bottleneck, thus minimizing processing load at the mobile device as well as the amount of wirelessly transferred data.

@inproceedings{scmls20_12,
title={Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems},
author={Yoshitomo Matsubara and Sabur Baidya and Davide Callegaro and Marco Levorato and Sameer Singh},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Compressing Variational Posteriors PDF
Yibo Yang, Robert Bamler and Stephan Mandt

Abstract:Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as VAEs, in post-processing. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits posterior uncertainty. Our approach separates model design and training from the compression task, and thus allows for various rate-distortion trade-offs with a single trained model, eliminating the need to train multiple models for different bit rates. We obtain promising experimental results on compressing Bayesian neural word embeddings, and outperform JPEG on image compression over a wide range of bit rates using only a single standard VAE.

@inproceedings{scmls20_13,
title={Compressing Variational Posteriors},
author={Yibo Yang and Robert Bamler and Stephan Mandt},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Neural Contextual Bandits with UCB-based Exploration PDF
Dongruo Zhou, Lihong Li and Quanquan Gu

Abstract:We study the stochastic contextual bandit problem, where the reward is generated from an unknown function with additive noise. No assumption is made about the reward function other than boundedness. We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration. We prove that, under standard assumptions, NeuralUCB achieves $\tilde O(\sqrt{T})$ regret, where $T$ is the number of rounds. To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.

@inproceedings{scmls20_14,
title={Neural Contextual Bandits with UCB-based Exploration},
author={Dongruo Zhou and Lihong Li and Quanquan Gu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

NetTailor: Tuning the architecture, not just the weights PDF
Pedro Morgado and Nuno Vasconcelos

Abstract:Real-world applications of object recognition often require the solution of multiple tasks in a single platform. Under the standard paradigm of network fine-tuning, an entirely new CNN is learned per task, and the final network size is independent of task complexity. This is wasteful, since simple tasks require smaller networks than more complex tasks, and limits the number of tasks that can be solved simultaneously. To address these problems, we propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. Besides minimizing classification error, the new network is trained to mimic the internal activations of a strong unconstrained CNN, and minimize its complexity by the combination of 1) a soft-attention mechanism over blocks and 2) complexity regularization constraints. In this way, NetTailor can adapt the network architecture, not just its weights, to the target task. Experiments show that networks adapted to simple tasks, such as character or traffic sign recognition, become significantly smaller than those adapted to hard tasks, such as fine-grained recognition. More importantly, due to the modular nature of the procedure, this reduction in network complexity is achieved without compromise of either parameter sharing across tasks, or classification accuracy.

@inproceedings{scmls20_16,
title={NetTailor: Tuning the architecture, not just the weights},
author={Pedro Morgado and Nuno Vasconcelos},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

BRGAN: Generating Graphs of Bounded Rank PDF
William Shiao and Evangelos Papalexakis

Abstract:Graph generation is a task that has been explored with a wide variety of methods. Recently, several papers have applied Generative Adversarial Networks (GANs) to this task, but most of these methods result in graphs of full or unknown rank. However, generating graphs of low rank can be useful. Many real-world graphs have low rank, which roughly translates to the number of communities in that graph. In this paper, we propose BRGAN: a GAN architecture that generates synthetic graphs, which in addition to having realistic graph features, also have bounded (low) rank. We also evaluate the generated graphs and show that they are effectively bounded.

@inproceedings{scmls20_17,
title={BRGAN: Generating Graphs of Bounded Rank},
author={William Shiao and Evangelos Papalexakis},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Revisiting Evaluation of Knowledge Base Completion Models PDF
Pouya Pezeshkpour, Yifan Tian and Sameer Singh

Abstract: Representing knowledge graphs (KGs) by learning embeddings for entities and relations has provided accurate models for existing KG completion benchmarks. Although extensive research has been carried out on KG completion, because of the open-world assumption of existing KGs, previous studies rely on ranking metrics and triple classification with negative samples for the evaluation and are unable to directly assess the models on the goal of the task, completion. In this paper, we first study the shortcomings of these evaluation metrics. More specifically, we demonstrate that these metrics 1) are unreliable for estimating calibration, 2) make strong assumptions that are often violated, and 3) do not sufficiently, and consistently, differentiate embedding methods from simple approaches and from each other. To address these issues, we provide a semi-complete KG using a randomly sampled subgraph from test and validation data of YAGO3-10, allowing us to compute accurate triple accuracy on this data. Conducting thorough experiments on existing models, we provide new insights and directions for KG completion research.

@inproceedings{scmls20_18,
title={Revisiting Evaluation of Knowledge Base Completion Models},
author={Pouya Pezeshkpour and Yifan Tian and Sameer Singh},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Feature Interaction Interpretability and Beyond PDF
Michael Tsang and Yan Liu

Abstract: We demonstrate the various advantages of interpreting feature interactions in modern prediction models. We leverage our ongoing work on Neural Interaction Detection (NID) to identify interactions on feature perturbations and their inferences through black-box models. As part of this process, we propose an alternate form of NID called GradientNID, which exactly detects relevant interactions in neural network explainer models. Across diverse application domains like image, text, and DNA modeling, we showcase new insights brought by feature interaction interpretability. We then focus on a specific application of increasing real-world importance: transparency in ad-targeting. In particular, we show that feature interactions not only explain online ad targeting behavior, but also have high commercial utility in automatic feature engineering.

@inproceedings{scmls20_19,
title={Feature Interaction Interpretability and Beyond},
author={Michael Tsang and Yan Liu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Toward an Efficient and Online SLAM Solution by Online EM Algorithm over HMMs PDF
Tsang-Kai Chang and Ankur Mehta

Abstract: An autonomous agent relies on a SLAM algorithm to establish the relationship between itself and its surrounding environment. Even though several SLAM algorithms have been proposed for various scenarios, they are either offline or inefficient in the number of landmarks. Given the streaming nature of the data and the state dependency, we model the SLAM problem as a hidden Markov model and apply the online EM algorithm to solve it. This first attempt at online EM provides an online and efficient framework, but its performance is sensitive to the parameter choice. We look to variants of the online EM algorithm to mitigate this problem. Ultimately, an efficient and online SLAM algorithm ensures the spatial autonomy of an agent, and also serves as the basis for Markov decision process-based control and learning frameworks.

@inproceedings{scmls20_21,
title={Toward an Efficient and Online SLAM Solution by Online EM Algorithm over HMMs},
author={Tsang-Kai Chang and Ankur Mehta},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

AllenNLP Interpret: Explaining Predictions of NLP Models PDF
Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner and Sameer Singh

Abstract: Neural NLP models are increasingly accurate but are imperfect and opaque: they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for specific model predictions. Unfortunately, existing interpretation codebases make it difficult to apply these methods to new models and tasks, which hinders adoption for practitioners and burdens interpretability researchers. We introduce AllenNLP Interpret, a flexible framework for interpreting NLP models. The toolkit provides interpretation primitives (e.g., input gradients) for any AllenNLP model and task, a suite of built-in interpretation methods, and a library of front-end visualization components. We demonstrate the toolkit’s flexibility and utility by implementing live demos for five interpretation methods (e.g., saliency maps and adversarial attacks) on a variety of models and tasks (e.g., masked language modeling using BERT and reading comprehension using BiDAF). These demos, alongside our code and tutorials, are available at https://allennlp.org/interpret. A video that walks through various use cases of our toolkit is available as well.

@inproceedings{scmls20_22,
title={AllenNLP Interpret: Explaining Predictions of NLP Models},
author={Eric Wallace and Jens Tuyls and Junlin Wang and Sanjay Subramanian and Matt Gardner and Sameer Singh},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Active Bayesian Assessment for Black-Box Classifiers PDF
Disi Ji, Robert Logan, Padhraic Smyth and Mark Steyvers

Abstract: Recent advances in machine learning have led to increased deployment of black-box classifiers across a wide variety of applications. In many such situations there is a crucial need to assess the performance of these pre-trained models, for instance to ensure sufficient predictive accuracy, or that class probabilities are well-calibrated. Furthermore, since labeled data may be scarce or costly to collect, it is desirable for such assessment to be performed in an efficient manner. In this paper, we introduce a Bayesian approach for model assessment that satisfies these desiderata. We develop inference strategies to quantify uncertainty for common assessment metrics (accuracy, misclassification cost, expected calibration error), and propose a framework for active assessment using this uncertainty to guide efficient selection of instances for labeling. We illustrate the benefits of our approach in experiments assessing the performance of modern neural classifiers (e.g., ResNet and BERT) on several standard image and text classification datasets.
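
As a concrete picture of what quantifying uncertainty in an assessment metric can mean, the sketch below places a Beta prior on a classifier's accuracy and updates it with labeled outcomes. The Beta-Bernoulli model, the uniform Beta(1, 1) prior, and the toy outcomes are assumptions chosen for illustration; the abstract does not specify the inference strategy used in the poster.

```python
def accuracy_posterior(correct_flags, prior_alpha=1.0, prior_beta=1.0):
    """Beta posterior over accuracy after observing correct/incorrect labels.

    Returns (alpha, beta, posterior_mean, posterior_variance).
    """
    n_correct = sum(1 for flag in correct_flags if flag)
    n_wrong = len(correct_flags) - n_correct

    # Conjugate update: successes add to alpha, failures add to beta.
    alpha = prior_alpha + n_correct
    beta = prior_beta + n_wrong

    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1.0))
    return alpha, beta, mean, variance


# Toy outcomes: 8 correct and 2 incorrect predictions on labeled instances.
flags = [True] * 8 + [False] * 2
alpha, beta, mean, variance = accuracy_posterior(flags)
```

The posterior variance shrinks as more instances are labeled. An active assessment scheme could, for example, request labels where this variance is largest, though the selection rule in the poster itself may differ.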

@inproceedings{scmls20_23,
title={Active Bayesian Assessment for Black-Box Classifiers},
author={Disi Ji and Robert Logan and Padhraic Smyth and Mark Steyvers},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

AReN: Assured ReLU NN Architecture for Model Predictive Control of LTI Systems PDF
James Ferlez and Yasser Shoukry

Abstract: One of the outstanding problems in data-trained neural networks (NNs) is the design of the NN’s architecture: that is, the number of neurons and their connections. Current state-of-the-art practices typically choose an NN architecture either according to heuristics or via computationally expensive schemes that iteratively adapt the architecture and re-train the NN. Besides being computationally taxing, neither of these provides any assurance that the resultant architecture is sufficient to permit adequate performance of the final trained NN. Indeed, the absence of such a guarantee on the architecture necessarily precludes such a guarantee on the trained network. We recently addressed the problem of automatically designing a regression NN architecture to generate (control) actions for a linear dynamical system under specified performance objectives. Specifically, we proposed AReN, an algorithm that generates assured Rectified Linear Unit (ReLU) NN architectures: given a linear dynamical system, AReN designs a ReLU NN architecture with the assurance that there exist network weights that exactly implement a Model Predictive Control (MPC) expert controller. AReN thus offers new insight into the design of ReLU NN architectures for the control of linear systems and Deep Reinforcement Learning, where MPC experts are commonly used. Instead of the computationally intensive or heuristic-driven methods described above, AReN can provide an adequate NN architecture before training begins.

@inproceedings{scmls20_24,
title={AReN: Assured ReLU NN Architecture for Model Predictive Control of LTI Systems},
author={James Ferlez and Yasser Shoukry},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual method PDF
Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang and Mihailo Jovanovic

Abstract: We study a distributed policy evaluation problem in which a group of agents with jointly observed states and private local actions and rewards collaborate to learn the value function of a given policy via local computation and communication. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. We develop and analyze a new distributed temporal-difference learning algorithm that minimizes the mean-square projected Bellman error. Our approach is based on a stochastic primal-dual method and we improve the best-known convergence rate from $O(1/\sqrt{T})$ to $O(1/T)$, where $T$ is the total number of iterations. Our analysis explicitly takes into account the Markovian nature of the sampling and addresses a broader class of problems than the commonly-used i.i.d. sampling scenario.

@inproceedings{scmls20_25,
title={Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual method},
author={Dongsheng Ding and Xiaohan Wei and Zhuoran Yang and Zhaoran Wang and Mihailo Jovanovic},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Quantifying Gender Bias Over Time Using Dynamic Word Embeddings PDF
Aodong Li, Robert Bamler and Stephan Mandt

Abstract:Dynamic word embeddings are a powerful tool to measure the evolution of word semantics over time, but have not been exploited to date due to a lack of available software implementations. In this work, we use them to quantify gender bias over time. By identifying a gender direction, we find that certain words dramatically change their orientation along this direction in the 1960s, which could be attributed to the Women’s Rights Movement in those years. We specifically demonstrate shifts of gender bias over time in three corpora, proving the versatility of dynamic word embeddings as a tool for the social sciences and humanities.
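
The projection idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes the gender direction is taken as the normalized difference between two anchor-word vectors, and the 3-dimensional embeddings, years, and function names are all made up for the example.

```python
import math

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def gender_direction(male_vec, female_vec):
    """Unit vector pointing from the female anchor to the male anchor (an assumed definition)."""
    return normalize(subtract(male_vec, female_vec))

def bias_over_time(word_vecs_by_year, direction_by_year):
    """Cosine-style projection of a word's per-year embedding onto that year's direction."""
    return {
        year: dot(normalize(vec), direction_by_year[year])
        for year, vec in word_vecs_by_year.items()
    }

# Toy, hand-written embeddings for two time slices (not real data).
anchors = {
    1950: {"he": [1.0, 0.0, 0.2], "she": [-1.0, 0.0, 0.2]},
    1970: {"he": [1.0, 0.1, 0.2], "she": [-1.0, 0.1, 0.2]},
}
word = {1950: [0.8, 0.5, 0.1], 1970: [0.1, 0.9, 0.1]}

directions = {year: gender_direction(a["he"], a["she"]) for year, a in anchors.items()}
scores = bias_over_time(word, directions)
for year in sorted(scores):
    print(year, round(scores[year], 3))
```

In this toy setup the word's score along the direction drops between the two slices, which is the kind of shift the abstract describes measuring on real corpora.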

@inproceedings{scmls20_26,
title={Quantifying Gender Bias Over Time Using Dynamic Word Embeddings},
author={Aodong Li and Robert Bamler and Stephan Mandt},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Fashion Compatibility Recommendation via Unsupervised Metric Graph Learning PDF
Jiali Duan, Xiaoyuan Guo, Son Tran and C.-C. Jay Kuo

Abstract:In the task of fashion compatibility prediction, the goal is to pick an item from a candidate list to complement a partial outfit in the most appealing manner. Existing fashion compatibility recommendation work comprehends clothing images in a single metric space and lacks detailed understanding of users’ preferences in different contexts. To address this problem, we propose a novel Metric-Aware Explainable Graph Network (MAEG). In MAEG, we propose an unsupervised approach to obtain representations of items in a metric-aware latent semantic space. Then, we develop a graph filtering network and a Pairwise Preference Attention module to model the interactions between users’ preferences and contextual information. Experiments on a real-world dataset reveal that MAEG not only outperforms the state-of-the-art methods, but also provides interpretable insights by highlighting the role of semantic attributes and contextual relationships among items.

@inproceedings{scmls20_27,
title={Fashion Compatibility Recommendation via Unsupervised Metric Graph Learning},
author={Jiali Duan and Xiaoyuan Guo and Son Tran and C.-C. Jay Kuo},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

On Background Knowledge and Robustness
Yizuo Chen, Arthur Choi and Adnan Darwiche

Abstract:We consider the role that background knowledge can play in the robustness of a classifier. It stands to reason that encoding knowledge into a classifier would make it more robust to certain adversarial attacks. For example, if we can encode the fact that a stop sign is red and octagonal, then we might expect that it becomes much more difficult to trick the classifier into thinking it is a speed limit sign, just by perturbing some pixels. Towards this goal, we propose to use Testing Bayesian Networks (TBNs), which facilitate the encoding of background knowledge (like Bayesian networks), but are also universal approximators (like neural networks).

@inproceedings{scmls20_28,
title={On Background Knowledge and Robustness},
author={Yizuo Chen and Arthur Choi and Adnan Darwiche},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Preference-Based Bayesian Optimization in High Dimensions with Human Feedback PDF
Myra Cheng, Ellen Novoseller, Maegan Tucker, Richard Cheng, Yisong Yue and Joel Burdick

Abstract:Human-in-the-loop learning algorithms have shown significant promise in improving robotic assistive devices to maximize utility to the user. In particular, interactive preference-based methods have been used to learn optimal device parameters by allowing users to try different parameter combinations and give feedback on which they prefer. Across settings that rely on subjective human feedback, pairwise preferences are a more reliable measure of system performance than absolute numerical scores. Existing preference-based learning methods have only explored low-dimensional domains due to computational limitations. However, robotic systems often have many tunable parameters. Our algorithm, LINESPAR, enables optimization over many more parameters by taking advantage of low-dimensional structure in the high-dimensional search space. It performs Bayesian optimization in high dimensions by iteratively exploring one-dimensional subspaces rather than the entire space of possible parameters at once. This technique also decreases the number of iterations necessary for the algorithm to converge, which is valuable in human trials where the preference data is expensive and difficult to obtain. The LINESPAR algorithm enables faster convergence to user-preferred parameters using only pairwise preference feedback. More importantly, it allows optimization over higher-dimensional spaces, which is not feasible for existing preference-based algorithms. We empirically verify its performance in both simulation and human trials. To the best of our knowledge, this is the first work on high-dimensional preference-based Bayesian optimization.

@inproceedings{scmls20_29,
title={Preference-Based Bayesian Optimization in High Dimensions with Human Feedback},
author={Myra Cheng and Ellen Novoseller and Maegan Tucker and Richard Cheng and Yisong Yue and Joel Burdick},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Employing geometry for rescuing neural networks PDF
Guruprasad Raghavan, Jiayi Li and Matt Thomson

Abstract:Living neural networks in the brain and artificial networks engineered on neuromorphic chips give systems the ability to perform multiple cognitive tasks. However, both kinds of networks experience a wide range of physical perturbations, ranging from damage to edges of the network to complete node deletions, that ultimately could lead to network failure. A critical question is to understand how the computational properties of neural networks change in response to node damage and whether there exist strategies to repair these networks in order to compensate for performance degradation. Here, we study the damage-response characteristics of two classes of neural networks, namely multilayer perceptrons (MLPs) and convolutional neural networks (CNNs), trained to classify images from the MNIST and CIFAR-10 datasets respectively. We also propose a new framework to discover efficient repair strategies to rescue damaged neural networks. The framework involves defining damage and repair operators for dynamically traversing the neural network's loss landscape, with the goal of mapping its salient geometric features. Using this strategy, we discover features that resemble path-connected attractor sets in the loss landscape. We also identify that a dynamic recovery scheme, where networks are constantly damaged and repaired, produces a group of networks that are resilient to damage because they can be quickly rescued. Broadly, our work shows that we can design fault-tolerant networks by applying on-line retraining consistently during damage for real-time applications in biology and machine learning.

@inproceedings{scmls20_30,
title={Employing geometry for rescuing neural networks},
author={Guruprasad Raghavan and Jiayi Li and Matt Thomson},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds PDF
Zixiang Chen, Yuan Cao, Quanquan Gu and Tong Zhang

Abstract:A recent line of work in deep learning theory has utilized the mean-field analysis to demonstrate the global convergence of noisy (stochastic) gradient descent for training over-parameterized two-layer neural networks. However, existing results in the mean-field setting do not provide the convergence rate of neural network training, and the generalization error bound is largely missing. In this paper, we provide a mean-field analysis in a generalized neural tangent kernel regime, and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior. This implies that the training loss converges linearly up to a certain accuracy in such a regime. We also establish a generalization error bound for two-layer neural networks trained by noisy gradient descent with weight decay. Our results shed light on the connection between mean-field analysis and the neural tangent kernel based analysis.

@inproceedings{scmls20_32,
title={Mean-Field Analysis of Two-Layer Neural Networks: Non-Asymptotic Rates and Generalization Bounds},
author={Zixiang Chen and Yuan Cao and Quanquan Gu and Tong Zhang},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Generating Factual Documents by Synthesizing Knowledge Sources PDF
Shuyang Li, Jianmo Ni, Henry Mao and Julian McAuley

Abstract:From youth, humans can read and process large amounts of information to write articles and book reports and to conduct deep conversations. Existing large-scale language models are not yet capable of such meaningful generation. We propose a knowledge-grounded document writing task for pre-training an encoder-decoder language model to enable such knowledge synthesis. We will pre-train a model on networks of knowledge-grounded documents from encyclopedias and news, leveraging high-quality source citations common in these fields. We present the datasets that we have collected thus far, methods for large-context knowledge-grounded synthesis, and preliminary results indicating the applicability of our framework.

@inproceedings{scmls20_33,
title={Generating Factual Documents by Synthesizing Knowledge Sources},
author={Shuyang Li and Jianmo Ni and Henry Mao and Julian McAuley},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Evaluating Question Answering Evaluation PDF
Anthony Chen, Gabriel Stanovsky, Sameer Singh and Matt Gardner

Abstract:Current metrics for evaluating question answering (QA) datasets are based on n-gram matching, which has a number of known shortcomings. In this work, we examine the quality of current metrics by how well they correlate with human judgements across three diverse QA datasets. Our work indicates that current metrics do reasonably well in evaluating current datasets, but as QA datasets require more abstract generative answering, metrics that go beyond n-gram matching will be required.
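
To make the shortcoming of n-gram matching concrete, here is a minimal sketch of a token-overlap (unigram) F1 score. The abstract does not name the specific metrics studied, so this metric, the function name `token_f1`, and the example strings are illustrative assumptions only.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Unigram-overlap F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A near-verbatim answer scores well.
exact = token_f1("the cat sat", "the cat")
# A reasonable paraphrase shares no tokens with the reference and scores zero.
paraphrase = token_f1("he was exhausted", "the man felt very tired")
print(exact, paraphrase)
```

The second call illustrates why overlap-based scores can disagree with human judgement on free-form generative answers.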

@inproceedings{scmls20_34,
title={Evaluating Question Answering Evaluation},
author={Anthony Chen and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Dueling Posterior Sampling for Interactive Preference-Based Learning PDF
Ellen Novoseller, Maegan Tucker, Yibing Wei, Yanan Sui, Aaron Ames, Yisong Yue and Joel Burdick

Abstract:In many domains, from clinical trials to autonomous driving to human-robot interaction, a reinforcement learning (RL) agent seeks to optimize its behavior while interacting with a human. While many RL algorithms assume the existence of a numerical reward signal, in settings involving humans, it is often unclear how to define a reward signal that accurately reflects intended system performance. For instance, in autonomous driving and robotics, users have been shown to have difficulty with both specifying numerical reward functions and providing demonstrations of desired behavior. Moreover, misspecified reward functions can result in "reward hacking," in which the algorithm finds loopholes in the reward structure, such that undesirable behaviors achieve high rewards. In such situations, while handcrafted numerical reward signals can be problematic, the user's qualitative feedback may more reliably measure her intentions.

@inproceedings{scmls20_35,
title={Dueling Posterior Sampling for Interactive Preference-Based Learning},
author={Ellen Novoseller and Maegan Tucker and Yibing Wei and Yanan Sui and Aaron Ames and Yisong Yue and Joel Burdick},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Deepformers: Training Very Deep Transformers via Dynamical Isometry PDF
Thomas Bachlechner, Bodhisattwa Prasad Majumder, Huanru Henry Mao and Julian McAuley

Abstract:Neural networks have achieved increasing performance by leveraging the exponential representational capacity of deeper models, with some using thousands of layers. This trend has dominated many convolutional neural networks but has yet to dominate NLP architectural design, such as Transformers. The Transformer self-attention architecture that achieves state-of-the-art performance in many NLP tasks usually has fewer than 24 layers, and we find that trying to train deeper models leads to either convergence difficulties or slow training times. The theoretical study of neural networks with random parameters has revealed a maximum penetration depth that depends on the initialization scheme and the model architecture, which limits the number of layers that can be effectively trained. A network that allows an infinitesimal input perturbation to propagate to the output layer unimpeded in magnitude is said to satisfy the property of dynamical isometry. Fully connected, convolutional and recurrent networks can be initialized to satisfy dynamical isometry, which enables effective training of deep models. In this paper, we aim to leverage dynamical isometry to construct viable deep Transformer-architecture inspired models (Deepformers) specially targeted for generative modeling tasks. As we analyze signal propagation through Transformers, we find that two components prohibit dynamical isometry in these models: (1) self-attention (unmasked) and (2) Layer Normalization, allowing only a low-dimensional subspace of the input signal to propagate, rendering dynamical isometry impossible. We propose a simple modification to the standard Transformer architecture and show that it enables us to train much deeper networks for downstream language modeling tasks as compared to the standard ones.

@inproceedings{scmls20_36,
title={Deepformers: Training Very Deep Transformers via Dynamical Isometry},
author={Thomas Bachlechner and Bodhisattwa Prasad Majumder and Huanru Henry Mao and Julian McAuley},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks? PDF
Zixiang Chen, Yuan Cao, Difan Zou and Quanquan Gu

Abstract:A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high-degree polynomial of the training sample size $n$ and the inverse of the target accuracy $\epsilon^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees. Very recently, it was shown that under a certain margin assumption on the training data, a polylogarithmic width condition suffices for two-layer ReLU networks to converge and generalize (Ji and Telgarsky, 2019). However, how much over-parameterization is sufficient to guarantee optimization and generalization for deep neural networks still remains an open question. In this work, we establish sharp optimization and generalization guarantees for deep ReLU networks. Under various assumptions made in previous work, our optimization and generalization guarantees hold with network width polylogarithmic in $n$ and $\epsilon^{-1}$. Our results push the study of over-parameterized deep neural networks towards more practical settings.

@inproceedings{scmls20_37,
title={How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?},
author={Zixiang Chen and Yuan Cao and Difan Zou and Quanquan Gu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Distilling Task-Specific Knowledge from BERT via Adversarial Belief Matching PDF
Huanru Henry Mao, Bodhisattwa Prasad Majumder, Garrison Cottrell and Julian McAuley

Abstract:Large pre-trained language models such as BERT [1] have achieved strong results when fine-tuned on a variety of natural language tasks but are cumbersome to deploy. Applying knowledge distillation (KD) [2] to compress these pre-trained models for a specific downstream task is challenging due to the small amount of task-specific labeled data, resulting in poor performance by the compressed model. Considerable efforts have been spent to improve the distillation process for BERT, involving techniques such as leveraging intermediate hints [3], student pre-training [4] and data augmentation [5]. The success of these methods leads us to hypothesize that the core issue of distilling fine-tuned models is the small quantity of data used during the distillation phase, which prevents the student from seeing sufficient variety in the teacher’s output distribution. Although broadly applicable to KD [6], this problem is exacerbated in the fine-tuned setting because much of the teacher’s knowledge is learned in the pre-training phase and may not be accessible by the student through task-specific data. Using a small number of samples leads the student to see only a small portion of the teacher’s knowledge, making the knowledge transfer process incomplete. TinyBERT [4] mitigated this by applying a general pre-training stage on the student using BERT’s masked language objective. However, this pre-training step is computationally expensive, requires obtaining a large amount of pre-training data, and is not optimally tailored to the downstream task. Furthermore, in some scenarios the pre-training data may be inaccessible or the pre-training objective may be unknown. In this paper, we focus on tackling task-specific distillation in the low data setting. We aim to distill a BERT model that is fine-tuned on some downstream task (referred to as the teacher) into a smaller student model.
Instead of pre-training the student using a general pre-training method, we propose to tailor our pre-training to the task by training the student on adversarially generated data. To complement learning, we also incorporate intermediate hints [6] and Sobolev distillation [7] into our learning objective, which enables us to extract more information from the teacher per example.
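
For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the generic temperature-softened distillation loss (KL divergence between teacher and student output distributions). It is not the adversarial belief matching objective of this poster, whose details the abstract does not give; the logits, temperature value, and function names are assumptions for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student that reproduces the teacher's logits incurs zero loss.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
# A student that ranks the classes in reverse incurs a positive loss.
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
print(matched, mismatched)
```

Because the loss is computed per example, few training examples expose only a small part of the teacher's output distribution, which is the data-scarcity issue the abstract raises.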

@inproceedings{scmls20_38,
title={Distilling Task-Specific Knowledge from BERT via Adversarial Belief Matching},
author={Huanru Henry Mao and Bodhisattwa Prasad Majumder and Garrison Cottrell and Julian McAuley},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Personalizing Marked Temporal Point Process Models PDF
Alex Boyd, Padhraic Smyth, Robert Bamler and Stephan Mandt

Abstract:Recurrent neural network (RNN) models have shown considerable promise in recent work for modeling of marked temporal point processes (MTPPs). These models provide flexible characterizations of marked conditional intensity functions, allowing for more effective modeling of complex interactions between events compared to traditional statistical approaches, at the cost of some interpretability. However, a common limitation of these approaches is that they do not account for data heterogeneity within and between sources in an adequate manner. We address this issue by developing a new framework for neural MTPP models that can encode within-source and between-source information via a variational mixture-of-experts autoencoder. Experimental results on three large real-world event datasets illustrate that the proposed approach can leverage source heterogeneity to systematically outperform more traditional models, even with few to no events to condition on when making predictions.

@inproceedings{scmls20_39,
title={Personalizing Marked Temporal Point Process Models},
author={Alex Boyd and Padhraic Smyth and Robert Bamler and Stephan Mandt},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Double Explore-then-Commit: Asymptotic Optimality and Beyond PDF
Tianyuan Jin, Pan Xu, Xiaokui Xiao and Quanquan Gu

Abstract:We study the two-armed bandit problem with sub-Gaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision applications. Nevertheless, it has been shown in [GLK16] that ETC is suboptimal in the asymptotic sense as the horizon grows, and thus is worse than fully sequential strategies such as Upper Confidence Bound (UCB). In this paper, we argue that a variant of the ETC algorithm can actually achieve the asymptotically optimal regret bounds for multi-armed bandit problems as UCB-type algorithms do. Specifically, we propose a double explore-then-commit (DETC) algorithm that has two exploration and exploitation phases. We prove that DETC achieves the asymptotically optimal regret bound as the time horizon goes to infinity. To our knowledge, DETC is the first non-fully-sequential algorithm that achieves such asymptotic optimality. In addition, we extend DETC to batched bandit problems, where (i) the exploration process is split into a small number of batches and (ii) the round complexity is of central interest. We prove that a batched version of DETC can achieve the asymptotic optimality with only constant round complexity. This is the first batched bandit algorithm that can attain asymptotic optimality in terms of both regret and round complexity.
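
As background for this abstract, here is a minimal sketch of the plain explore-then-commit baseline for two arms: explore both arms a fixed number of times, then commit to the empirically better one. It is not the DETC algorithm, whose phase lengths the abstract does not specify; the function names, the Gaussian reward model, and the parameter values are assumptions for illustration.

```python
import random

def explore_then_commit(pull, horizon, explore_per_arm):
    """Plain two-armed ETC. Assumes horizon >= 2 * explore_per_arm."""
    totals = [0.0, 0.0]
    reward = 0.0
    # Exploration phase: sample each arm explore_per_arm times.
    for _ in range(explore_per_arm):
        for arm in (0, 1):
            r = pull(arm)
            totals[arm] += r
            reward += r
    # Commit to the arm with the higher empirical total (ties go to arm 0).
    best = 0 if totals[0] >= totals[1] else 1
    # Exploitation phase: play the committed arm for all remaining rounds.
    for _ in range(horizon - 2 * explore_per_arm):
        reward += pull(best)
    return best, reward

def gaussian_bandit(means, seed=0):
    """Hypothetical reward model: unit-variance Gaussian rewards around each arm's mean."""
    rng = random.Random(seed)
    return lambda arm: rng.gauss(means[arm], 1.0)

best_arm, total_reward = explore_then_commit(
    gaussian_bandit([0.0, 1.0]), horizon=1000, explore_per_arm=50
)
print(best_arm, round(total_reward, 1))
```

The fixed exploration length is what makes plain ETC non-adaptive: once committed, it never revisits the other arm, which is the limitation the abstract's second exploration phase is meant to address.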

@inproceedings{scmls20_40,
title={Double Explore-then-Commit: Asymptotic Optimality and Beyond},
author={Tianyuan Jin and Pan Xu and Xiaokui Xiao and Quanquan Gu},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

Bio-Inspired Hashing for Unsupervised Similarity Search PDF
Chaitanya Ryali, John Hopfield, Leopold Grinberg and Dmitry Krotov

Abstract:The fruit fly Drosophila's olfactory circuit has inspired a new locality sensitive hashing (LSH) algorithm, FlyHash. In contrast with classical LSH algorithms that produce low-dimensional hash codes, FlyHash produces sparse high-dimensional hash codes and has also been shown to have superior empirical performance compared to classical LSH algorithms in similarity search. However, FlyHash uses random projections and cannot learn from data. Building on inspiration from FlyHash and the ubiquity of sparse expansive representations in neurobiology, our work proposes a novel hashing algorithm, BioHash, that produces sparse high-dimensional hash codes in a data-driven manner. We show that BioHash outperforms previously published benchmarks for various hashing methods. Since our learning algorithm is based on a local and biologically plausible synaptic plasticity rule, our work provides evidence for the proposal that LSH might be a computational reason for the abundance of sparse expansive motifs in a variety of biological systems. We also propose a convolutional variant, BioConvHash, that further improves performance. From the perspective of computer science, BioHash and BioConvHash are fast, scalable and yield compressed binary representations that are useful for similarity search.
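
The "sparse high-dimensional hash code" idea can be sketched as a random expansion followed by a winner-take-all step. This is a rough illustration in the spirit of the random-projection scheme the abstract attributes to FlyHash, not the learned BioHash algorithm, whose plasticity rule the abstract does not describe; all names, dimensions, and the input vector are assumptions.

```python
import random

def make_projection(input_dim, code_dim, inputs_per_unit, seed=0):
    """Each of the code_dim output units sums a random subset of input coordinates."""
    rng = random.Random(seed)
    return [rng.sample(range(input_dim), inputs_per_unit) for _ in range(code_dim)]

def sparse_hash(x, projection, num_active):
    """Expand x to len(projection) activations and keep the top num_active as 1s."""
    activations = [sum(x[i] for i in idx) for idx in projection]
    # Winner-take-all: indices of the num_active largest activations.
    order = sorted(range(len(activations)), key=lambda j: activations[j], reverse=True)
    code = [0] * len(activations)
    for j in order[:num_active]:
        code[j] = 1
    return code

# Expand an 8-dimensional input to a 40-bit code with only 4 active bits.
projection = make_projection(input_dim=8, code_dim=40, inputs_per_unit=3)
x = [0.9, 0.1, 0.4, 0.0, 0.7, 0.2, 0.3, 0.5]
code = sparse_hash(x, projection, num_active=4)
print(sum(code), len(code))
```

Similar inputs tend to activate overlapping sets of winning units, so the overlap between two codes can serve as a similarity score; a data-driven method would replace the random projection with learned weights.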

@inproceedings{scmls20_43,
title={Bio-Inspired Hashing for Unsupervised Similarity Search},
author={Chaitanya Ryali and John Hopfield and Leopold Grinberg and Dmitry Krotov},
booktitle={Southern California Machine Learning Symposium (SCMLS)},
year={2020}
}

SCMLS CANCELLED DUE TO COVID-19

Confirmed Keynotes

Yizhou Sun

Matt Gardner

Yuandong Tian

Phebe Vayanos

Schedule

This event has been cancelled

Workshop Organizers

Julian McAuley

Jingbo Shang

Hao Su

Bodhi P. Majumder