About me

My current work mainly involves deep generative models for image inpainting, amodal scene parsing, and occlusion reasoning. My PhD dissertation focused on object recognition and parsing with weak supervision, including weakly supervised learning, domain adaptation (learning from synthetic data), and few-shot/zero-shot learning.


  • Ph.D. in Computer Science, Johns Hopkins University
  • M.S.E. in Computer Science, Johns Hopkins University
  • M.S. in Molecular and Cell Biology, The University of Texas at Dallas
  • B.S. in Chemistry and Psychology, Peking University


Recent Projects

CGPart: A Part Segmentation Dataset Based on 3D Computer Graphics Models

Part segmentations provide a rich and detailed part-level description of objects, but their annotation requires an enormous amount of work. In this paper, we introduce CGPart, a comprehensive part segmentation dataset that provides detailed annotations on 3D CAD models, synthetic images, and real test images. To illustrate the value of CGPart, we apply it to image part segmentation through unsupervised domain adaptation (UDA). We evaluate several baseline methods by adapting top-performing UDA algorithms from related tasks to part segmentation. Moreover, we introduce a new method called Geometric-Matching Guided domain adaptation (GMG), which leverages the spatial object structure to guide the knowledge transfer from the synthetic to the real images. Experimental results demonstrate the advantage of our algorithm and reveal insights for future improvement.

Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency

Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches that rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions. We show that these issues can be better addressed by training with weakly labeled videos instead of images, and we are the first to explore the use of video signals to tackle weakly supervised instance segmentation. First, we adapt the inter-pixel relation network (IRN) to effectively incorporate motion information during training. Second, we introduce a new MaskConsist module, which addresses the problem of missing object instances by transferring stable predictions between neighboring frames during training. We demonstrate that the two approaches together improve the instance segmentation metric AP50 on video frames of two datasets, YouTube-VIS and Cityscapes, by 5% and 3% respectively.
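The core idea behind transferring stable predictions between frames can be illustrated with a minimal sketch. This is not the paper's implementation: masks are simplified to sets of pixel coordinates, and greedy IoU matching stands in for the actual matching (which would typically warp masks with optical flow first); the function names are hypothetical.

```python
def mask_iou(a, b):
    """IoU between two instance masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def transfer_stable_masks(prev_masks, curr_masks, iou_thresh=0.5):
    """Keep all current-frame masks; transfer any previous-frame mask
    that has no sufficiently overlapping match in the current frame,
    recovering instances the current frame missed."""
    merged = list(curr_masks)
    for pm in prev_masks:
        if all(mask_iou(pm, cm) < iou_thresh for cm in curr_masks):
            merged.append(pm)  # instance missing in current frame: carry it over
    return merged
```

For example, if the previous frame detects two instances but the current frame only one (with high overlap to the first), the unmatched second instance is transferred forward, addressing the missing-prediction failure mode.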

Incremental Few-Shot Meta-Learning via Indirect Discriminant Alignment

We propose a method to train a model so it can learn new classification tasks while improving with each task solved. This amounts to combining meta-learning with incremental learning. Different tasks can have disjoint classes, so one cannot directly align different classifiers as done in model distillation. On the other hand, simply aligning features shared by all classes does not allow the base model sufficient flexibility to evolve to solve new tasks. We therefore indirectly align features relative to a minimal set of "anchor classes". Such indirect discriminant alignment (IDA) adapts a new model to old classes without the need to re-process old data, while leaving maximum flexibility for the model to adapt to new tasks. This process enables incrementally improving the model by processing multiple learning episodes, each representing a different learning task, even with few training examples. Experiments on few-shot learning benchmarks show that this incremental approach performs favorably compared to training the model with the entire dataset at once.
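The indirect alignment step can be sketched as follows. This is a simplified illustration, not the paper's implementation: both models' features are compared through softmax similarities to a shared set of anchor-class centroids, and a KL divergence ties the new model's discriminants to the old model's. All names and the distance-based logits are assumptions for illustration.

```python
import numpy as np

def anchor_discriminants(features, anchor_centroids):
    """Softmax over (negative squared) distances from each feature
    to the anchor-class centroids: a discriminant w.r.t. the anchors."""
    diffs = features[:, None, :] - anchor_centroids[None, :, :]
    logits = -(diffs ** 2).sum(axis=-1)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def ida_loss(new_feats, old_feats, anchor_centroids, eps=1e-12):
    """KL divergence between old- and new-model anchor discriminants,
    averaged over samples. Zero when the two embeddings agree on how
    samples relate to the anchor classes."""
    p_old = anchor_discriminants(old_feats, anchor_centroids)
    p_new = anchor_discriminants(new_feats, anchor_centroids)
    kl = (p_old * (np.log(p_old + eps) - np.log(p_new + eps))).sum(axis=1)
    return float(kl.mean())
```

Because the loss only consumes the two models' features on current data plus the anchor centroids, old training data never needs to be reprocessed, which is the point of aligning indirectly rather than matching classifiers head-on.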

Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval

We investigate the problem of zero-shot sketch-based image retrieval (ZS-SBIR) from the viewpoint of domain adaptation. Based on a framework that starts with a model pre-trained on ImageNet and fine-tunes it on the training set of an SBIR benchmark, we advocate the importance of preserving previously acquired knowledge, e.g., the rich discriminative features learned from ImageNet, so as to improve the model's transfer ability. Zero-shot experiments on two extended SBIR datasets verify the superior performance of our approach. Extensive diagnostic experiments validate that the preserved knowledge benefits SBIR in zero-shot settings, as a large fraction of the performance gain comes from a more properly structured feature embedding for photo images.


Academic Services

  • Reviewer: CVPR 2022, AAAI 2022, WACV 2022, CVPR 2021, ICCV 2021, WACV 2021, ICCV 2019
  • Teaching Assistant: EN.601.461/661: Computer Vision.
    Johns Hopkins University - Spring 2020

Work Experience

  • Research Scientist/Engineer, Adobe: 2021/11-Present
  • Research Intern, Facebook AI: 2020/05-2020/09
  • Applied Scientist Intern, Amazon AWS Rekognition: 2019/06-2019/09
  • Applied Scientist Intern, Amazon Transaction Risk Management Systems: 2018/05-2018/08