Learning Generic and Generalizable Object Manipulation Policies

Talk at UCSD AI Seminar by Hao Su
To build robots with general task-solving abilities like humans, a prerequisite is that robots possess a diverse set of object manipulation skills (generic), and that these skills apply even to unseen objects and configurations (generalizable). To foster reproducible, low-cost, and fast-cycle research, Su Lab has been developing an open-source task suite, ManiSkill, as a community service. Prof. Su first introduces the ManiSkill project and then presents a series of algorithms for manipulation skill learning, including how to solve difficult RL problems at scale and how to achieve efficient reinforcement learning when the input is 3D data.

Deep Learning on Point Clouds

Talk at Symposium on Geometry Processing (SGP) 2022 by Hao Su
Point clouds are an important type of geometric data structure. They are simple, unified structures that avoid the combinatorial irregularities and complexities of meshes. These properties make point clouds widely used in 3D reconstruction and visual understanding applications, such as AR, autonomous driving, and robotics. This course teaches how to apply deep learning methods to point cloud data. We will cover the following topics in this short course and end with some open problems.
  • Basic neural architectures to process point clouds as input or generate point clouds as output
  • Scene-level understanding of static and dynamic point clouds
  • Point cloud based inverse graphics
  • Learning to convert point clouds to other 3D representations
  • Learning to map point clouds to data in other modalities (images, languages)
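The first topic, architectures that consume point clouds as input, hinges on permutation invariance: a point cloud is an unordered set, so the network's output must not depend on point ordering. A minimal sketch of this idea (in the style of PointNet, with hypothetical randomly initialized weights in NumPy, not the actual course code) is a shared per-point MLP followed by symmetric max pooling:

```python
import numpy as np

# Hypothetical minimal sketch of a PointNet-style set function:
# the same MLP is applied to every point independently, then a
# symmetric (max) pooling collapses the set into one global vector,
# making the result invariant to point ordering.

rng = np.random.default_rng(0)

# Randomly initialized weights for a tiny shared MLP (3 -> 16 -> 32).
W1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(32)

def point_features(points):
    """Apply the shared MLP to each point: (N, 3) -> (N, 32)."""
    h = np.maximum(points @ W1 + b1, 0.0)   # ReLU
    return np.maximum(h @ W2 + b2, 0.0)

def global_feature(points):
    """Max-pool per-point features into one permutation-invariant vector."""
    return point_features(points).max(axis=0)

cloud = rng.normal(size=(128, 3))            # a toy point cloud
shuffled = cloud[rng.permutation(128)]       # same points, new order

# The global feature is unchanged by reordering the points.
assert np.allclose(global_feature(cloud), global_feature(shuffled))
```

The max pooling is the key design choice: any symmetric aggregator (max, sum, mean) would preserve invariance, and real architectures add learned transforms and hierarchy on top of this skeleton.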

3D Learning for Manipulation: Simulation, Benchmark, and Learning

Talk at the NeurIPS Robotics Learning Workshop by Hao Su, Kaichun Mo, and Fanbo Xiang.

Compositional Generalizability in Geometry, Physics, and Policy Learning

Talk at UPenn by Hao Su (December 5th, 2020)
It is well known that deep neural networks are universal function approximators and generalize well when the training and test datasets are sampled from the same distribution. Most deep learning applications and theories of the past decade rest on this setup. While the view of learning function approximators has been rewarding to the community, we are seeing more and more of its limitations when dealing with real-world problem spaces that explode combinatorially. In this talk, I will discuss a possible shift of view, from learning function approximators to learning algorithm approximators, through some preliminary work in my lab. Our ultimate goal is to achieve generalizability when learning in a problem space of combinatorial complexity; we refer to this desired generalizability as compositional generalizability. Toward this goal, we take important problems in geometry, physics, and policy learning as testbeds. In particular, I will introduce how we build algorithms with state-of-the-art compositional generalizability on these testbeds, following bottom-up and modularization principles.

Tutorial: 3D Deep Learning

Talk at Qualcomm by Hao Su, Jiayuan Gu, and Minghua Liu. (March 31st, 2020)
Tutorial on datasets, classification, segmentation, detection, and reconstruction in 3D deep learning.

Learning-based 3D Capturing

Talk at Qualcomm by Rui Chen, Songfang Han, and Shuo Cheng. (March 31st, 2020)
Multi-view stereo (MVS) plays an increasingly important role in various fields, e.g., AR/VR and autonomous driving. In this session, we will introduce the theory and applications of MVS, analyze classical and recent learning-based MVS methods, and present three papers from our team. Finally, we will discuss possible future directions for our research on MVS.

Learning for Interaction

Talk at Qualcomm by Fanbo Xiang, Yuzhe Qin, Zhiao Huang, and Fangchen Liu. (March 31st, 2020)
Artificial intelligence needs not only to perceive the world but also to interact with the environment to accomplish specific goals. For example, the tight coupling of perception and interaction will help robots or autonomous vehicles make decisions by modeling the complex world. We emphasize the importance of understanding environment structure for interaction tasks. We first discuss how we help agents interact with the environment by understanding the structure of the environment state. By properly abstracting the state space, we show that combining search algorithms with reinforcement learning can greatly improve generalization ability and data efficiency compared to previous methods. Next, we discuss how learning methods are applied to real-world problems. We have developed SAPIEN, a robotics research platform that provides rich physical simulations and scenarios. Finally, we show that we can analyze 3D scenes directly through supervised learning for the robot grasping problem.

Concepts and Graph-based Reasoning

Talk at Qualcomm by Jiayuan Gu, Tongzhou Mu, and Hao Tang. (March 31st, 2020)
A good object-centric abstraction can enable an agent, such as an autonomous vehicle or a domestic robot, to adapt to a new environment quickly, without intensive offline re-training. To this end, (I) we propose a novel framework, Task-driven Entity Abstraction (TEA), to learn task-relevant entities from raw visual observations in an unsupervised fashion. TEA provides high-quality object discovery results, which in turn benefit solving new tasks in terms of compositional and spatial generalizability. (II) Current graph neural networks (GNNs) lack generalizability with respect to scale (graph sizes, graph diameters, edge weights, etc.). Therefore, we propose an architecture, IterGNN, to approximate iterative graph algorithms without supervision on iteration numbers during training. When solving the shortest path length problem, the final model generalized impressively to graphs of diameter as large as 1000 while being trained only on graphs of diameter less than 30, far surpassing existing GNN methods.
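The iterative-algorithm view can be made concrete with a hand-coded example. The sketch below is not IterGNN itself but the classical iteration it learns to approximate: shortest path lengths emerge from repeatedly min-aggregating neighbor messages (Bellman-Ford). Convergence needs as many rounds as the graph diameter, which is exactly why a fixed-depth GNN fails to generalize to larger graphs and why adaptive iteration counts matter.

```python
# A hand-coded illustration (not IterGNN itself) of the iterative graph
# algorithm that IterGNN is trained to approximate: shortest path lengths
# computed by repeated min-aggregation of neighbor messages (Bellman-Ford).

INF = float("inf")

def shortest_paths(n, edges, source):
    """edges: list of (u, v, weight) on an undirected graph with n nodes;
    returns the list of shortest path lengths from source."""
    dist = [INF] * n
    dist[source] = 0.0
    # Each round is one "message passing" step. A fixed-depth GNN performs
    # a fixed number of rounds, but convergence requires roughly as many
    # rounds as the graph diameter, hence the need for adaptive iteration.
    for _ in range(n - 1):
        updated = False
        for u, v, w in edges:
            for a, b in ((u, v), (v, u)):   # messages flow both ways
                if dist[a] + w < dist[b]:
                    dist[b] = dist[a] + w
                    updated = True
        if not updated:                      # converged: stop iterating
            break
    return dist

# A path graph 0-1-2-3 plus a shortcut edge 0-3.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.5)]
print(shortest_paths(4, edges, 0))  # [0.0, 1.0, 2.0, 2.5]
```

The early-exit on convergence mirrors the adaptive stopping that IterGNN must learn without iteration-count supervision.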

SU Lab Research Report of 2018-2019 (Understanding 3D Environments for Interactions)

Edited from invited talks for CVPR 2019/RSS 2019 (updated on July 3, 2019)
The mission and big picture of the research happening in SU Lab: learning to interact with the environment. The report describes the extension of SU Lab's research focus from deep 3D representation learning to broader topics in artificial intelligence for interacting with the environment. Not all papers published in the year are included in the report; missing topics are binary neural networks and adversarial defense.

Towards Attack-Agnostic Defense for 2D and 3D Recognition

Invited talk at the Workshop on AdvML at CVPR 2019 (updated on July 3, 2019)
A summary of our work on 2D/3D adversarial defense in 2018-2019. The main messages are: (1) lower-dimensional data seems easier to defend; and (2) defending at lower resolution seems more attack-agnostic.

Synthesize for Learning

Invited talk at the 3DV workshop on Understanding 3D and Visuo-Motor Learning (updated in September 2016)
Using synthetic data to train learning algorithms for applications such as viewpoint estimation, human pose estimation, and robot perception. Based on five recent papers of mine.