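A collection of CVPR 2022 papers and open-source projects (papers with code)!
CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
Note 1: Issues sharing CVPR 2022 papers and open-source projects are welcome!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
To keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic exchange group. Let's learn from each other and improve together.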
- Backbone
- CLIP
- GAN
- GNN
- NAS
- OCR
- NeRF
- Visual Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Knowledge Distillation
- Object Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Few-Shot Classification
- Few-Shot Segmentation
- Video Understanding
- Image Editing
- Low-level Vision
- Super-Resolution
- Deblur
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Human Pose Estimation
- 3D Semantic Scene Completion
- 3D Reconstruction
- Camouflaged Object Detection
- Depth Estimation
- Stereo Matching
- Lane Detection
- Image Inpainting
- Image Retrieval
- Face Recognition
- Crowd Counting
- Medical Image
- Scene Graph Generation
- Referring Video Object Segmentation
- Style Transfer
- Adversarial Examples
- Weakly Supervised Object Localization
- Radar Object Detection
- Hyperspectral Image Reconstruction
- Image Stitching
- Watermarking
- Action Counting
- Grounded Situation Recognition
- Zero-shot Learning
- Datasets
- New Tasks
- Others
A ConvNet for the 2020s
- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Chinese explainer: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
MPViT: Multi-Path Vision Transformer for Dense Prediction
- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT
- Chinese explainer: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
Mobile-Former: Bridging MobileNet and Transformer
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
MetaFormer is Actually What You Need for Vision
Shunted Self-Attention via Multi-Scale Token Aggregation
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
HairCLIP: Design Your Hair by Text and Reference Image
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
Style Transformer for Image Inversion and Editing
Unsupervised Image-to-Image Translation with Generative Prior
- Homepage: https://www.mmlab-ntu.com/project/gpunit/
- Paper: https://arxiv.org/abs/2204.03641
- Code: https://github.com/williamyang1991/GP-UNIT
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf
- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
- Homepage: https://jonbarron.info/mipnerf360/
Point-NeRF: Point-based Neural Radiance Fields
- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
Urban Radiance Fields
- Homepage: https://urban-radiance-fields.github.io/
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
MPViT: Multi-Path Vision Transformer for Dense Prediction
MetaFormer is Actually What You Need for Vision
Mobile-Former: Bridging MobileNet and Transformer
- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese explainer: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
Shunted Self-Attention via Multi-Scale Token Aggregation
- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
- Chinese explainer: https://zhuanlan.zhihu.com/p/476056546
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
Restormer: Efficient Transformer for High-Resolution Image Restoration
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Accelerating DETR Convergence via Semantic-Aligned Matching
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Style Transformer for Image Inversion and Editing
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Mask Transfiner for High-Quality Instance Segmentation
Language as Queries for Referring Video Object Segmentation
- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
- Chinese explainer: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
AdaMixer: A Fast-Converging Query-Based Object Detector
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
Omni-DETR: Omni-Supervised Object Detection with Transformers
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
Collaborative Transformers for Grounded Situation Recognition
Conditional Prompt Learning for Vision-Language Models
Bridging Video-text Retrieval with Multiple Choice Question
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
- Paper: https://arxiv.org/abs/2203.06965
- Code: None
Crafting Better Contrastive Views for Siamese Representation Learning
- Paper: https://arxiv.org/abs/2202.03278
- Code: https://github.com/xyupeng/ContrastiveCrop
- Chinese explainer: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
HCSC: Hierarchical Contrastive Selective Coding
- Homepage: https://github.com/gyfastas/HCSC
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMix: Improving representation by interpolating aligned features
- Paper: https://arxiv.org/abs/2103.15375
- Code: None
Decoupled Knowledge Distillation
- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Chinese explainer: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese explainer: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
Accelerating DETR Convergence via Semantic-Aligned Matching
Localization Distillation for Dense Object Detection
- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Chinese explainer: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
Focal and Global Knowledge Distillation for Detectors
- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- Chinese explainer: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
A Dual Weighting Label Assignment Scheme for Object Detection
AdaMixer: A Fast-Converging Query-Based Object Detector
- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
Omni-DETR: Omni-Supervised Object Detection with Transformers
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
- Paper(Oral): https://arxiv.org/abs/2203.06398
- Code: https://github.com/CityU-AIM-Group/SIGMA
Correlation-Aware Deep Tracking
- Paper: https://arxiv.org/abs/2203.01666
- Code: None
TCTrack: Temporal Contexts for Aerial Tracking
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Learning of Global Objective for Network Flow in Multi-Object Tracking
- Paper: https://arxiv.org/abs/2203.16210
- Code: None
Novel Class Discovery in Semantic Segmentation
- Homepage: https://ncdss.github.io/
- Paper: https://arxiv.org/abs/2112.01900
- Code: https://github.com/HeliosZhao/NCDSS
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2106.05095
- Code: https://github.com/LiheYoung/ST-PlusPlus
- Chinese explainer: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
- Homepage: https://haochen-wang409.github.io/U2PL/
- Paper: https://arxiv.org/abs/2203.03884
- Code: https://github.com/Haochen-Wang409/U2PL
- Chinese explainer: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
- Homepage: https://jerryxu.net/GroupViT/
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
- Paper: https://arxiv.org/abs/2202.12181
- Code: None
Efficient Video Instance Segmentation via Tracklet Query and Proposal
- Homepage: https://jialianwu.com/projects/EfficientVIS.html
- Paper: https://arxiv.org/abs/2203.01853
- Demo: https://youtu.be/sSPMzgtMKCE
Integrative Few-Shot Learning for Classification and Segmentation
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Integrative Few-Shot Learning for Classification and Segmentation
Self-supervised Video Transformer
- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2204.03646
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
- Paper(Oral): https://arxiv.org/abs/2204.02148
- Code: None
Spatio-temporal Relation Modeling for Few-shot Action Recognition
End-to-End Semi-Supervised Learning for Video Action Detection
- Paper: https://arxiv.org/abs/2203.04251
- Code: None
Style Transformer for Image Inversion and Editing
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
- Homepage: https://semanticstylegan.github.io/
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Paper: https://arxiv.org/abs/2111.15362
- Code: None
Restormer: Efficient Transformer for High-Resolution Image Restoration
Learning the Degradation Distribution for Blind Image Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
- Paper: https://arxiv.org/abs/2104.13371
- Code: https://github.com/open-mmlab/mmediting
- Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
- Chinese explainer: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
Learning to Deblur using Light Field Generated and Real Defocus Images
- Homepage: http://lyruan.com/Projects/DRBNet/
- Paper(Oral): https://arxiv.org/abs/2204.00442
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
- Homepage: https://point-bert.ivg-research.xyz/
A Unified Query-based Paradigm for Point Cloud Understanding
- Paper: https://arxiv.org/abs/2203.01252
- Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
BoxeR: Box-Attention for 2D and 3D Transformers
- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese explainer: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
Embracing Single Stride 3D Object Detector with Sparse Transformer
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- Paper: https://arxiv.org/abs/2204.05599
- Code: None
Scribble-Supervised LiDAR Semantic Segmentation
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- Paper: https://arxiv.org/abs/2203.00859
- Code: None
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
- Paper: https://arxiv.org/abs/2203.07697
- Code: None
- Chinese explainer: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw
MonoScene: Monocular 3D Semantic Scene Completion
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
- Homepage: https://banmo-www.github.io/
- Paper: https://arxiv.org/abs/2112.12761
- Code: https://github.com/facebookresearch/banmo
- Chinese explainer: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
- Paper: https://arxiv.org/abs/2203.01502
- Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- Paper: https://arxiv.org/abs/2203.00838
- Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Rethinking Efficient Lane Detection via Curve Modeling
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Correlation Verification for Image Retrieval
- Paper(Oral): https://arxiv.org/abs/2204.01458
- Code: https://github.com/sungonce/CVNet
AdaFace: Quality Adaptive Margin for Face Recognition
- Paper(Oral): https://arxiv.org/abs/2204.00964
- Code: https://github.com/mk-minchul/AdaFace
Leveraging Self-Supervision for Cross-Domain Crowd Counting
- Paper: https://arxiv.org/abs/2103.16291
- Code: None
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
- Paper: https://arxiv.org/abs/2203.02533
- Code: None
Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
SGTR: End-to-end Scene Graph Generation with Transformer
- Paper: https://arxiv.org/abs/2112.12970
- Code: None
Language as Queries for Referring Video Object Segmentation
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
- Paper: https://arxiv.org/abs/2203.16768
- Code: None
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
- Homepage: https://lukashoel.github.io/stylemesh/
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
LAS-AT: Adversarial Training with Learnable Attack Strategy
- Paper(Oral): https://arxiv.org/abs/2203.06616
- Code: https://github.com/jiaxiaojunQAQ/LAS-AT
Weakly Supervised Object Localization as Domain Adaption
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
- Paper: https://arxiv.org/abs/2204.01184
- Code: None
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Deep Rectangling for Image Stitching: A Learning Baseline
- Paper(Oral): https://arxiv.org/abs/2203.03831
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
- Paper: https://arxiv.org/abs/2104.13450
- Code: None
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
Collaborative Transformers for Grounded Situation Recognition
Unseen Classes at a Later Time? No Problem
- Paper: https://arxiv.org/abs/2203.16517
- Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- Paper: https://arxiv.org/abs/2112.02306
- Code: None
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese explainer: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
Scribble-Supervised LiDAR Semantic Segmentation
Deep Rectangling for Image Stitching: A Learning Baseline
- Paper(Oral): https://arxiv.org/abs/2203.03831
- Code: https://github.com/nie-lang/DeepRectangling
- Dataset: https://github.com/nie-lang/DeepRectangling
- Chinese explainer: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
- Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
- Paper: https://arxiv.org/abs/2204.02389
- Dataset: https://github.com/rhgao/ObjectFolder
- Demo: https://youtu.be/e5aToT3LkRA
Shape from Polarization for Complex Scenes in the Wild
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
- Paper(Oral): https://arxiv.org/abs/2204.01018
- Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
- Code: https://github.com/SvipRepetitionCounting/TransRAC
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2204.03646
- Dataset: https://github.com/xujinglin/FineDiving
- Code: https://github.com/xujinglin/FineDiving
- Chinese explainer: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
Paper download link:
Language-based Video Editing via Multi-Modal Multi-Level Transformer
- Paper: https://arxiv.org/abs/2104.01122
- Code: None
It's About Time: Analog Clock Reading in the Wild
- Homepage: https://charigyang.github.io/abouttime/
- Paper: https://arxiv.org/abs/2111.09162
- Code: https://github.com/charigyang/itsabouttime
- Demo: https://youtu.be/cbiMACA6dRc
Splicing ViT Features for Semantic Appearance Transfer
- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice
Kubric: A scalable dataset generator
- Paper: https://arxiv.org/abs/2203.03570
- Code: https://github.com/google-research/kubric
- Chinese explainer: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Balanced MSE for Imbalanced Visual Regression
- Paper(Oral): https://arxiv.org/abs/2203.16427
- Code: https://github.com/jiawei-ren/BalancedMSE
SNUG: Self-Supervised Neural Dynamic Garments
- Homepage: http://mslab.es/projects/SNUG/
- Paper(Oral): https://arxiv.org/abs/2204.02219
- Code: https://github.com/isantesteban/snug
Shape from Polarization for Complex Scenes in the Wild
- Homepage: https://chenyanglei.github.io/sfpwild/index.html
- Paper: https://arxiv.org/abs/2112.11377
- Code: https://github.com/ChenyangLEI/sfp-wild
LASER: LAtent SpacE Rendering for 2D Visual Localization
- Paper(Oral): https://arxiv.org/abs/2204.00157
- Code: None
Single-Photon Structured Light
- Paper(Oral): https://arxiv.org/abs/2204.05300
- Code: None
3DeformRS: Certifying Spatial Deformations on Point Clouds
- Paper: https://arxiv.org/abs/2204.05687
- Code: None
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
- Paper: https://arxiv.org/abs/2204.02701
- Dataset: https://github.com/yizhiwang96/TextLogoLayout
- Code: https://github.com/yizhiwang96/TextLogoLayout
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Robust and Accurate Superquadric Recovery: a Probabilistic Approach