                Longlong Jing
               
I am a fifth-year Ph.D. student in the Media Lab, Department of Electrical Engineering, at The City College of New York, CUNY, advised by Professor Ying-Li Tian.
My current research focuses on video analysis, including human action recognition and self-supervised video feature learning. Prior to CUNY, I received my Bachelor's degree from Central South University.
                  
               
Email / Google Scholar / Biography / LinkedIn
                News:
                
 
Our work on 3D semi-supervised learning has been accepted to WACV 2023.
Our work on monocular depth estimation has been accepted to ECCV 2022.
Our work on semi-supervised video classification has been accepted to CVPR 2022.
Our work on monocular long-range depth estimation has been accepted to ICLR 2022.
Our work on monocular 3D detection and tracking has been accepted to ICRA 2022.
          
Research
I am interested in computer vision, machine learning, and image processing. Most of my research concerns video analysis, such as human action recognition, self-supervised video feature learning, and video feature learning from noisy data. I have also worked on weakly supervised semantic segmentation and on lung nodule segmentation in CT scans with generative adversarial networks.
Cross-Modal Center Loss
Longlong Jing*, Elahe Vahdani*, Jiaxing Tan, Yingli Tian
CVPR, 2021
PDF
We propose a novel loss function, cross-modal center loss, to learn modality-invariant features with minimal modality discrepancy for multi-modal data.
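Below is a minimal sketch of what such a loss can look like, assuming learnable per-class centers shared across all modalities; the class name, arguments, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a cross-modal center loss: per-class centers are shared
# across modalities, so features from every modality of the same class are
# pulled toward one common center.
import torch
import torch.nn as nn

class CrossModalCenterLoss(nn.Module):  # illustrative name, not the paper's code
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable center per class, shared by all modalities.
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats_per_modality, labels):
        # feats_per_modality: list of (batch, feat_dim) tensors, one per modality.
        # labels: (batch,) class indices, identical for every modality.
        loss = 0.0
        for feats in feats_per_modality:
            centers = self.centers[labels]  # (batch, feat_dim)
            loss = loss + ((feats - centers) ** 2).sum(dim=1).mean()
        return loss / (2 * len(feats_per_modality))
```

Minimizing this distance pulls, say, image, point cloud, and mesh features of the same class toward a single center, which is one way to reduce modality discrepancy.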
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Longlong Jing, Yingli Tian
TPAMI, 2020
PDF
A comprehensive survey of self-supervised visual feature learning with deep ConvNets. Slides will be released soon.
Self-supervised Modal and View Invariant Feature Learning
Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian
arXiv, 2020
PDF
We propose to jointly learn modal-invariant and view-invariant features for different 3D data modalities, including images, point clouds, and meshes, with heterogeneous networks.
Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences
Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian
arXiv, 2020
PDF
We propose a novel and effective self-supervised learning approach that jointly learns 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences, without using any human-annotated labels.
VideoSSL: Semi-Supervised Learning for Video Classification
Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang
WACV, 2021
PDF
We propose VideoSSL, a semi-supervised learning approach for video classification using convolutional neural networks (CNNs).
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
Longlong Jing, Xiaodong Yang, Jingen Liu, Yingli Tian
arXiv, 2019
PDF
We show that spatiotemporal features can be learned from large-scale unlabeled videos with 3D convolutional neural networks by predicting video rotations.
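As a rough illustration of the rotation pretext task (a sketch under my own assumptions, not necessarily the paper's exact pipeline), free supervision can be generated by rotating clips and asking the network which rotation was applied:

```python
# Hedged sketch of the rotation-prediction pretext task: rotate each clip by
# 0/90/180/270 degrees and train a 3D ConvNet to classify the rotation.
# Function name and tensor layout are illustrative; assumes square spatial
# crops so rotated clips keep the same shape and can be concatenated.
import torch

def make_rotation_batch(clips: torch.Tensor):
    """clips: (batch, channels, time, height, width) unlabeled video clips."""
    rotated, labels = [], []
    for k in range(4):  # 0, 90, 180, 270 degrees
        # Rotate every frame in the spatial plane (dims 3 and 4 are H and W).
        rotated.append(torch.rot90(clips, k, dims=(3, 4)))
        labels.append(torch.full((clips.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)
```

The rotation labels come for free, so a 3D ConvNet trained to classify them learns spatiotemporal features without any human annotation.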
Multi-camera Vehicle Tracking and Re-identification on AI City Challenge 2019
Yucheng Chen, Longlong Jing, Elahe Vahdani, Ling Zhang, Mingyi He, Yingli Tian
CVPR AI City Workshop, 2019
PDF / Slides / Poster
Our solutions to the image-based vehicle re-identification track and the multi-camera vehicle tracking track of the AI City Challenge 2019 (AIC2019). Our proposed framework outperforms the state-of-the-art vehicle ReID method by 16.3% on the VeRi dataset.
Recognizing American Sign Language Manual Signs from RGB-D Videos
Longlong Jing*, Elahe Vahdani*, Yingli Tian, Matt Huenerfauth
Under review, 2020
PDF / Project Page / Dataset
We propose a 3D ConvNet based multi-stream framework that recognizes American Sign Language (ASL) manual signs in real time from RGB-D videos.
Coarse-to-fine Semantic Segmentation from Image-level Labels
Longlong Jing*, Yucheng Chen*, Yingli Tian
TIP, 2019
PDF
We propose a novel recursive coarse-to-fine semantic segmentation framework based on only image-level category labels.
LGAN: Lung Segmentation in CT scans using Generative Adversarial Network
Jiaxing Tan*, Longlong Jing*, Yingli Tian, Oguz Akin, Yumei Huo
CMIG, 2020
PDF
We propose a novel GAN-based deep learning lung segmentation scheme for CT scans that redesigns the discriminator's loss function, leading to more accurate results.
Video You Only Look Once: Overall Temporal Convolutions for Action Recognition
Longlong Jing, Xiaodong Yang, Yingli Tian
JVCI, 2018
PDF
We propose an efficient and straightforward approach that captures the overall temporal dynamics of an entire video in a single pass for action recognition.
3D Convolutional Neural Network with Multi-Model Framework for Action Recognition
Longlong Jing, Yuancheng Ye, Xiaodong Yang, Yingli Tian
ICIP, 2017
PDF / Code
We propose an efficient and effective action recognition framework that combines multiple feature models, computed from dynamic images, optical flow, and raw frames, with 3D ConvNets.
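As a loose sketch of the multi-stream idea (late fusion by averaging softmax scores is my own illustrative assumption; the paper's fusion may differ), each stream's 3D ConvNet produces class scores that are then combined:

```python
# Hedged sketch: late fusion of per-stream predictions (dynamic images,
# optical flow, raw frames), each produced by its own 3D ConvNet.
import torch

def fuse_streams(logits_dynamic, logits_flow, logits_rgb):
    # Each input: (batch, num_classes) logits from one stream's 3D ConvNet.
    probs = [torch.softmax(l, dim=1)
             for l in (logits_dynamic, logits_flow, logits_rgb)]
    return torch.stack(probs).mean(dim=0)  # averaged class probabilities
```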