Learning Generic and Generalizable Object Manipulation Policies
Talk at UCSD AI Seminar by Hao Su
To build robots with human-level, general task-solving abilities, a prerequisite is that robots possess a diverse set of object manipulation skills (generic), and that these skills apply even to unseen objects and configurations (generalizable). To foster reproducible, low-cost, and fast-cycle research, Su Lab has been developing the open-source task suite ManiSkill as a community service. Prof. Su first introduces the ManiSkill project and then presents a series of algorithms for manipulation skill learning, including how to solve difficult RL problems at scale and how to achieve efficient reinforcement learning when the input is 3D data.
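As a rough illustration of the benchmark-driven workflow described above, the sketch below shows a standard Gym-style interaction loop with a ManiSkill task using point-cloud observations. The import path, environment id ("PickCube-v0"), and keyword arguments follow my recollection of the ManiSkill2 documentation and may differ across ManiSkill versions; the random policy is a placeholder for a learned one.

```python
# Hedged sketch of a Gym-style loop for a ManiSkill manipulation task with
# point-cloud observations. Environment id and kwargs are assumptions based
# on the ManiSkill2 docs and may need adjusting for other versions.
import gymnasium as gym
import mani_skill2.envs  # noqa: F401  (importing registers the environments)

env = gym.make(
    "PickCube-v0",
    obs_mode="pointcloud",           # observations include a fused point cloud
    control_mode="pd_ee_delta_pose", # end-effector delta-pose control
)

obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```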
Deep Learning on Point Clouds
Talk at Symposium on Geometry Processing (SGP) 2022 by Hao Su
Point clouds are an important type of geometric data structure. They are simple, unified structures that avoid the combinatorial irregularities and complexities of meshes. These properties make point clouds widely used in 3D reconstruction and visual understanding applications such as AR, autonomous driving, and robotics. This short course will teach how to apply deep learning methods to point cloud data. We will cover the following topics and end with some open problems:
- Basic neural architectures to process point clouds as input or to generate point clouds as output (see the sketch after this list)
- Scene-level understanding of static and dynamic point clouds
- Point cloud based inverse graphics
- Learning to convert point clouds to other 3D representations
- Learning to map point clouds to data in other modalities (images, languages)
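As a minimal illustration of the first topic, here is a PointNet-style classifier sketch in PyTorch: a shared per-point MLP followed by a symmetric (max) pooling, so the prediction is invariant to point ordering. The layer sizes and class count are illustrative choices, not values from the course.

```python
# Minimal PointNet-style classifier: shared per-point MLP + max pooling.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Shared MLP applied independently to every point: (B, N, 3) -> (B, N, 256)
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        # Classifier on the pooled global feature
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3)
        per_point = self.point_mlp(points)     # (B, N, 256)
        global_feat, _ = per_point.max(dim=1)  # order-invariant (symmetric) pooling
        return self.head(global_feat)          # (B, num_classes) logits

logits = TinyPointNet()(torch.randn(4, 1024, 3))  # e.g. 4 clouds of 1024 points
```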
3D Learning for Manipulation: Simulation, Benchmark, and Learning
Talk at the NeurIPS robotics learning workshop by Hao Su, Kaichun Mo, and Fanbo Xiang.
Compositional Generalizability in Geometry, Physics, and Policy Learning
It is well known that deep neural networks are universal function approximators and generalize well when the training and test datasets are sampled from the same distribution. Most deep learning applications and theories of the past decade are built on this setup. While the view of learning function approximators has been rewarding for the community, we are seeing more and more of its limitations when dealing with real-world problem spaces that are combinatorially large.
In this talk, I will discuss a possible shift of view, from learning function approximators to learning algorithm approximators, illustrated by some preliminary work in my lab. Our ultimate goal is to achieve generalizability when learning in a problem space of combinatorial complexity; we refer to this desired generalizability as compositional generalizability. Toward this goal, we take important problems in geometry, physics, and policy learning as testbeds. In particular, I will introduce how we build algorithms with state-of-the-art compositional generalizability on these testbeds, following a bottom-up principle and a modularization principle.
Tutorial: 3D Deep Learning
Talk at Qualcomm by Hao Su, Jiayuan Gu, and
Minghua Liu. (March 31st, 2020)
Tutorial on datasets, classification, segmentation, detection, and
reconstruction in 3D deep learning.
Learning-based 3D Capturing
Talk at Qualcomm by Rui Chen, Songfang Han, and
Shuo Cheng. (March 31st, 2020)
Multi-view stereo (MVS) is playing an increasingly important role in various fields, e.g., AR/VR and autonomous driving. In this session, we will introduce the theory and applications of MVS, analyze classical and recent learning-based MVS methods, and present three papers from our team. Finally, we discuss possible future directions for our research on MVS.
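To make the classical baseline behind these learning-based methods concrete, below is a minimal plane-sweep stereo sketch in NumPy/SciPy: for each fronto-parallel depth hypothesis, the source image is warped into the reference view through the plane-induced homography and scored by photometric consistency. All cameras, images, and depth hypotheses are hypothetical inputs; this illustrates the general MVS recipe, not the methods presented in the session.

```python
# Brute-force plane-sweep stereo: build a cost volume over depth hypotheses
# and take the winner-take-all depth per pixel.
import numpy as np
from scipy.ndimage import map_coordinates

def plane_sweep_depth(ref_img, src_img, K_ref, K_src, R, t, depths):
    """ref_img, src_img: (h, w) grayscale images; K_ref, K_src: (3, 3) intrinsics;
    R (3, 3), t (3,): transform mapping reference-frame points into the source frame;
    depths: 1D array of depth hypotheses in the reference frame."""
    ref = np.asarray(ref_img, dtype=np.float64)
    src = np.asarray(src_img, dtype=np.float64)
    depths = np.asarray(depths, dtype=np.float64)
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ref_pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)  # homogeneous pixels
    n = np.array([0.0, 0.0, 1.0])  # fronto-parallel plane normal in the reference frame

    costs = []
    for d in depths:
        # Homography induced by the plane z = d, mapping reference pixels to source pixels.
        H_d = K_src @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)
        src_pix = H_d @ ref_pix
        sx = src_pix[0] / src_pix[2]
        sy = src_pix[1] / src_pix[2]
        # Bilinearly sample the source image at the warped locations.
        warped = map_coordinates(src, [sy, sx], order=1, cval=np.nan)
        cost = np.abs(warped.reshape(h, w) - ref)      # photometric error
        costs.append(np.nan_to_num(cost, nan=np.inf))  # out-of-view pixels -> infinite cost
    cost_volume = np.stack(costs)                      # (num_depths, h, w)
    return depths[np.argmin(cost_volume, axis=0)]      # winner-take-all depth map
```

Learning-based MVS methods typically keep this cost-volume structure but replace raw intensities with learned features and regularize the volume with a 3D CNN before estimating depth.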
Learning for Interaction
Talk at Qualcomm by Fanbo Xiang, Yuzhe Qin, Zhiao Huang, and Fangchen Liu. (March 31st, 2020)
Artificial intelligence not only needs to perceive the world but also needs to interact with the environment to accomplish specific goals. For example, the tight coupling of perception and interaction will help robots or autonomous vehicles make decisions by modeling the complex world. We emphasize the importance of understanding the structure of the environment for interaction tasks. We first talk about how we help agents interact with the environment by understanding the structure of the environment state. By properly abstracting the state space, we show that combining search algorithms and reinforcement learning can greatly improve generalization ability and data efficiency compared to previous methods. Next, we will talk about how learning methods are applied to real-world problems. We have developed SAPIEN, a robotics research platform that provides rich physical simulation and scenarios. Finally, we will show that we can analyze 3D scenes directly through supervised learning for the robot grasping problem.
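As a hedged sketch of one way search can be combined with a learned value function over abstract states, in the spirit of the state-abstraction idea above, consider the one-step lookahead below. The abstraction, successor model, and value network are placeholders, not the methods from the talk.

```python
# One-step lookahead: rank actions by a learned value of their abstracted successors.
import torch

def lookahead_action(state, actions, successor_fn, abstract_fn, value_net):
    """successor_fn(state, a) -> next state   (assumed environment/model hook)
    abstract_fn(state)        -> feature tensor (the learned/engineered abstraction)
    value_net                 -> torch.nn.Module scoring abstract states (scalar output)"""
    best_a, best_v = None, -float("inf")
    with torch.no_grad():
        for a in actions:
            phi = abstract_fn(successor_fn(state, a))  # abstract the successor state
            v = value_net(phi).item()
            if v > best_v:
                best_a, best_v = a, v
    return best_a
```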
Concepts and Graph-based Reasoning
Talk at Qualcomm by Jiayuan Gu, Tongzhou Mu, and
Hao Tang. (March 31st, 2020)
A good object-centric abstraction can enable an agent, such as an autonomous vehicle or a domestic robot, to adapt to a new environment quickly, without intensive offline re-training. To this end, (I) we propose a novel framework, Task-driven Entity Abstraction (TEA), to learn task-relevant entities from raw visual observations in an unsupervised fashion. TEA provides high-quality object discovery results, which in turn benefit solving new tasks in terms of compositional and spatial generalizability. (II) Current graph neural networks (GNNs) lack generalizability with respect to scale (graph size, graph diameter, edge weights, etc.). We therefore propose an architecture, IterGNN, that approximates iterative graph algorithms without supervision on the number of iterations during training. On the shortest-path-length problem, the final model generalizes to graphs of diameter as large as 1000 while being trained only on graphs of diameter less than 30, far exceeding existing GNN methods.
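To illustrate the idea of approximating an iterative graph algorithm, here is a minimal PyTorch sketch in which one shared message-passing layer is applied repeatedly under a learned stopping score, so the number of iterations can adapt to the input graph at test time. This is a conceptual sketch, not the IterGNN architecture from the talk; the layer design, stopping head, and threshold are assumptions.

```python
# Apply one shared message-passing layer repeatedly with a learned stopping score.
import torch
import torch.nn as nn

class SharedMessagePassing(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h, edge_index):
        src, dst = edge_index  # (2, E) long tensors of endpoint indices
        m = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum incoming messages per node
        return self.upd(agg, h)

class IterativeGNN(nn.Module):
    def __init__(self, dim: int, max_iters: int = 100):
        super().__init__()
        self.layer = SharedMessagePassing(dim)   # one layer, shared across iterations
        self.stop = nn.Linear(dim, 1)            # learned stopping score
        self.max_iters = max_iters

    def forward(self, h, edge_index):
        for _ in range(self.max_iters):
            h = self.layer(h, edge_index)
            # Halt once the graph-level stopping score is confident enough.
            if torch.sigmoid(self.stop(h.mean(dim=0))) > 0.5:
                break
        return h
```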
SU Lab Research Report of 2018-2019 (Understanding 3D Environments for Interactions)
Edited from Invited talks for CVPR2019/RSS2019 (updated on July 3, 2019)
The mission and big picture of research happening in SU Lab --- learning to interact with the environment.
It describes the extension of SU Lab's research focus from deep 3D representation learning to the broader topic of artificial intelligence for interacting with the environment. Not all papers published in the year are included in the report; missing topics are binary neural networks and adversarial defense.
Towards Attack-Agnostic Defense for 2D and 3D Recognition
Invited talk at the Workshop of AdvML in CVPR2019 (updated on July 3, 2019)
A summary of the work on 2D/3D adversarial defense in 2018-2019. The main messages are: (1) lower-dimensional data seems to be easier to defend, and (2) defending at lower resolution seems to be more attack-agnostic.
Synthesize for Learning
Invited talk at the 3DV workshop on Understanding 3D and Visuo-Motor Learning (updated in September 2016)
Use synthetic data to train learning algorithms for applications such as viewpoint estimation, human pose estimation, and robot perception. Based upon 5 recent papers of mine.