Tutorial: 3D Deep Learning

Talk at Qualcomm by Hao Su, Jiayuan Gu, and Minghua Liu. (March 31st, 2020)
Tutorial on datasets, classification, segmentation, detection, and reconstruction in 3D deep learning.

Learning-based 3D Capturing

Talk at Qualcomm by Rui Chen, Songfang Han, and Shuo Cheng. (March 31st, 2020)
Multi-view Stereo (MVS) is playing an increasingly important role in various fields, e.g., AR/VR and autonomous driving. In this session, we introduce the theory and applications of MVS, analyze classical and recent learning-based MVS methods, and present three papers from our team. Finally, we discuss possible future directions for our research on MVS.
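
The talk covers the theory in depth; as a rough illustration of what "learning-based MVS" means in practice, below is a minimal sketch of the plane-sweep cost volume at the heart of MVSNet-style pipelines. All function names and tensor shapes here are our assumptions for illustration, not code from the talk.

```python
# Minimal sketch of a plane-sweep cost volume for learning-based MVS.
# Shapes and names are illustrative assumptions, not the talk's code.
import torch
import torch.nn.functional as F

def homography_warp(src_feat, K_src, K_ref, R, t, depth):
    """Warp source-view features into the reference view for one
    fronto-parallel depth plane (a single plane of the sweep).

    src_feat: (B, C, H, W) source-view feature map
    K_src, K_ref: (B, 3, 3) intrinsics
    R, t: (B, 3, 3), (B, 3, 1) relative pose (reference -> source)
    depth: scalar depth of the hypothesis plane
    """
    B, C, H, W = src_feat.shape
    device = src_feat.device
    # Pixel grid of the reference view in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).reshape(1, 3, -1)
    # Back-project onto the depth plane, transform, and re-project.
    cam = torch.inverse(K_ref) @ pix * depth        # (B, 3, H*W)
    cam = R @ cam + t                               # source camera frame
    proj = K_src @ cam
    xy = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6) # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    gx = 2.0 * xy[:, 0] / (W - 1) - 1.0
    gy = 2.0 * xy[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_feat, grid, align_corners=True)

def cost_volume(ref_feat, src_feats, K_ref, K_srcs, Rs, ts, depths):
    """Variance-based cost volume over a set of depth hypotheses."""
    slices = []
    for d in depths:
        views = [ref_feat] + [homography_warp(f, K, K_ref, R, t, d)
                              for f, K, R, t in zip(src_feats, K_srcs, Rs, ts)]
        stack = torch.stack(views, dim=0)           # (V, B, C, H, W)
        slices.append(stack.var(dim=0, unbiased=False))
    return torch.stack(slices, dim=2)               # (B, C, D, H, W)
```

In a typical pipeline, a 3D CNN then regularizes this cost volume and regresses per-pixel depth; the papers discussed in the session build on and refine this basic recipe.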

Learning for Interaction

Talk at Qualcomm by Fanbo Xiang, Yuzhe Qin, Zhiao Huang, and Fangchen Liu. (March 31st, 2020)
Artificial intelligence not only needs to perceive the world but also needs to interact with the environment to accomplish specific goals. For example, tightly coupling perception and interaction helps robots or autonomous vehicles make decisions by modeling the complex world. We emphasize the importance of understanding environment structure for interaction tasks. We first discuss how we help agents interact with the environment by understanding the structure of the environment state: by properly abstracting the state space, we show that combining search algorithms with reinforcement learning (sketched below) can substantially improve generalization and data efficiency compared to previous methods. Next, we discuss how learning methods are applied to real-world problems. We have developed SAPIEN, a robotics research platform that provides rich physical simulations and scenarios. Finally, we show that we can analyze 3D scenes directly through supervised learning for the robot grasping problem.
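
As a purely hypothetical illustration of the search-plus-RL idea mentioned above, the following sketch shows one way a learned state abstraction, graph search over abstract states, and a low-level policy could fit together. None of these names (abstract_fn, policy, neighbors, env) come from the talk.

```python
# Hypothetical sketch: search over an abstract state graph provides
# subgoals; a learned low-level RL policy executes each subgoal.
from collections import deque

def plan_abstract_path(start, goal, neighbors):
    """Breadth-first search over the abstract state graph."""
    frontier, parents = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parents[s]
            return path[::-1]
        for n in neighbors(s):
            if n not in parents:
                parents[n] = s
                frontier.append(n)
    return None  # goal unreachable in the abstract graph

def run_episode(env, abstract_fn, neighbors, policy, goal, max_steps=500):
    """Follow the searched subgoal sequence with the learned policy."""
    obs = env.reset()
    path = plan_abstract_path(abstract_fn(obs), goal, neighbors)
    if path is None:
        return False
    for subgoal in path[1:]:
        for _ in range(max_steps):
            obs, _, done, _ = env.step(policy(obs, subgoal))
            if abstract_fn(obs) == subgoal:
                break  # subgoal reached; move on to the next one
            if done:
                return abstract_fn(obs) == goal
    return abstract_fn(obs) == goal
```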

Concepts and Graph-based Reasoning

Talk at Qualcomm by Jiayuan Gu, Tongzhou Mu, and Hao Tang. (March 31st, 2020)
A good object-centric abstraction can enable an agent, such as an autonomous vehicle or a domestic robot, to adapt to a new environment quickly, without intensive offline re-training. To this end, (I) we propose a novel framework, Task-driven Entity Abstraction (TEA), to learn task-relevant entities from raw visual observations in an unsupervised fashion. TEA provides high-quality object discovery results, which in turn benefit solving new tasks in terms of compositional and spatial generalizability. (II) Current graph neural networks (GNNs) lack generalizability with respect to scale (graph size, graph diameter, edge weights, etc.). We therefore propose an architecture, IterGNN, to approximate iterative graph algorithms without supervision on iteration counts during training. On the shortest-path-length problem, the final model generalizes to graphs of diameter as large as 1000 while being trained only on graphs of diameter less than 30, far exceeding existing GNN methods.
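
The abstract does not spell out IterGNN's mechanism, so the sketch below is only our reading of the general idea: one shared GNN layer applied repeatedly, with a learned halting score, so the number of iterations can adapt to the input graph at test time. Module names and the halting scheme are illustrative, not the released code.

```python
# Illustrative sketch of an iterative GNN with learned halting.
import torch
import torch.nn as nn

class IterativeGNN(nn.Module):
    def __init__(self, layer, halt_head, max_iters=1000):
        super().__init__()
        self.layer = layer        # one shared message-passing layer
        self.halt = halt_head     # maps graph embedding -> halting score
        self.max_iters = max_iters

    def forward(self, x, adj):
        # x: (N, C) node features; adj: (N, N) adjacency.
        out, remain = torch.zeros_like(x), x.new_ones(())
        for _ in range(self.max_iters):
            x = self.layer(x, adj)
            p = torch.sigmoid(self.halt(x.mean(dim=0)))  # halting prob
            out = out + remain * p * x  # expected output over stop times
            remain = remain * (1 - p)
            if remain.item() < 1e-3:    # effectively halted
                break
        return out
```

Because the loop length is decided at inference time rather than fixed by the architecture, a model of this shape can in principle run for many more iterations on large test graphs than it ever needed during training.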

SU Lab Research Report of 2018-2019 (Understanding 3D Environments for Interactions)

Edited from invited talks for CVPR2019/RSS2019 (updated on July 3, 2019)
The mission and big picture of research happening in SU Lab: learning to interact with the environment. It describes the extension of SU Lab's research focus from deep 3D representation learning to broader topics of artificial intelligence for interacting with the environment. Not all papers published in the year are included in the report; missing topics are binary neural networks and adversarial defense.

Towards Attack-Agnostic Defense for 2D and 3D Recognition

Invited talk at the AdvML workshop at CVPR2019 (updated on July 3, 2019)
A summary of the work on 2D/3D adversarial defense in 2018-2019. The main messages are: (1) lower-dimensional data seems to be easier to defend; and (2) defending at lower resolution seems to be more attack-agnostic.

Synthesize for Learning

Invited talk at the 3DV workshop on Understanding 3D and Visuo-Motor Learning (updated in September 2016)
Using synthetic data to train learning algorithms for applications such as viewpoint estimation, human pose estimation, and robot perception. Based on five of my recent papers.