CVPR 2022 Paper List

Time: 2022-08-27 15:00:00

Cascade Transformers for End-to-End Person Search
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Long-Tailed Recognition via Weight Balancing
InfoGCN: Representation Learning for Human Skeleton-based Action Recognition
Interactive Geometry Editing of Neural Radiance Fields
MLSLT: Towards Multilingual Sign Language Translation
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
Generating Diverse and Natural 3D Human Motions from textual descriptions
Masked-attention Mask Transformer for Universal Image Segmentation
Pointly-Supervised Instance Segmentation
A Closer Look at Few-shot Image Generation
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
Neural 3D Scene Reconstruction with the Manhattan-world Assumption
Masked Autoencoders Are Scalable Vision Learners
De-rendering 3D Objects in the Wild
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
Finding Badly Drawn Bunnies
GradViT: Gradient Inversion of Vision Transformers
On the Importance of Asymmetry for Siamese Representation Learning
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
Rethinking Efficient Lane Detection via Curve Modeling
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
Learning Fair Classifiers with Partially Annotated Group Labels
Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training?
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
A ConvNet for the 2020s
Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
Connecting the Complementary-view Videos: Joint Camera Identification and Subject Association
Decoupled Knowledge Distillation
Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
Compound Domain Generalization via Meta-Knowledge Encoding
Bilateral Video Magnification Filter
EDTER: Edge Detection with Transformer
Structure-Aware Motion Transfer with Deformable Anchor Model
Attentive Fine-Grained Structured Sparsity for Image Restoration
Sign Language Video Retrieval with Free-Form Textual Queries
SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
Neural Mean Discrepancy for Efficient Out-of-Distribution Detection
LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
Focal and Global Knowledge Distillation for Detectors
Enhancing Adversarial Robustness for Deep Metric Learning
Novel Class Discovery in Semantic Segmentation
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
HyperDet3D: Learning a Scene-Conditioned 3D Object Detector
Deep Decomposition for Stochastic Normal-Abnormal Transport
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
Self-supervised Video Transformers
HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging
φ-SfT: Shape-from-Template with a Physics-based Deformation Model
Boosting View Synthesis with Residual Transfer
DINE: Domain Adaptation from Single and Multiple Black-box Predictors
Occluded Human Mesh Recovery
Understanding Uncertainty Maps in Vision with Statistical Testing
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets
Learning from Pixel-Level Label Noise: A New Perspective for Light Field Salient Object Detection
Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation with Reliable Voted Pseudo Labels
Towards An End-to-End Framework for Flow-Guided Video Inpainting
E-CIR: Event-Enhanced Continuous Intensity Recovery
Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization using Satellite Image
Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
Forward Propagation, Backward Regression and Pose Association for Hand Tracking in the Wild
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
Efficient Neural Radiance Fields
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements
HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs
Attributable Visual Similarity Learning
Efficient Multi-view Stereo by Iterative Dynamic Cost Volume
Replacing Labeled Real-image Datasets with Auto-generated Contours
SOMSI: Spherical Novel View Synthesis with Soft Occlusion Multi-Sphere Images
AutoSDF: Shape Priors for 3D Completion, Reconstruction, and Generation
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
DST: Dynamic Substitute Training for Data-free Black-box Attack
HCSC: Hierarchical Contrastive Selective Coding
Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis
Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Interactiveness Field of Human-Object Interactions
Learning Memory-Augmented Unidirectional Metrics for Cross-modality Person Re-identification
Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
Surface Reconstruction from Point Clouds by Learning Predictive Context Priors
Active Teacher for Semi-Supervised Object Detection
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
RCL: Recurrent Continuous Localization for Temporal Action Detection
GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning
SPAMs: Structured Implicit Parametric Models
A Keypoint-based Global Association Network for Lane Detection
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Investigating Tradeoffs in Real-World Video Super-Resolution
OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
Sparse Instance Activation for Real-Time Instance Segmentation
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
Unsupervised Image-to-Image Translation with Generative Prior
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Versatile Multi-Modal Pre-Training for Human-Centric Perception
Instance-wise Occlusion and Depth Orders in Natural Scenes
Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Multi-Dimensional with Intensity: A Crowd-sourced Method for Measuring the Perception of Facial Expression
Class-Incremental Learning with Strong Pretrained Models
A Patch-centric Error Analysis of Image Super-Resolution
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
3D-aware Image Synthesis via Learning Structural and Textural Representations
DeeCap: Dynamic Early Exiting for Efficient Image Captioning
GAN-Supervised Dense Visual Alignment
Multilayer GAN Inversion and Editing
On Aliased Resizing and Surprising Subtleties in GAN Evaluation
Learning Pixel Trajectories with Multiscale Contrastive Random Walks
Comparing Correspondences: Video Prediction with Correspondences-wise Losses
Mix and Localize: Localizing Sound Sources from Mixtures
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time
Point Cloud Pre-training with Natural 3D Structures
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error
Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Reversible Vision Transformers
RigNeRF: Fully Controllable Neural 3D Portraits
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
Integrative Few-Shot Learning for Classification and Segmentation
Learning Affordance Grounding from Exocentric Images
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Exploring Geometry Consistency for monocular 3D object detection
Visual Abductive Reasoning
Putting People in their Place: Monocular Regression of 3D People in Depth
Exploiting Explainable Metrics for Augmented SGD
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
A Hybrid Quantum-Classical Algorithm for Robust Fitting
Dataset Distillation by Matching Training Trajectories
DiLiGenT10^2: A Photometric Stereo Benchmark Dataset with Controlled Shape and Material Variation
Scene Representation Transformer
ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Injecting Visual Concepts into End-to-End Image Captioning
Learning Neural Light Fields with Ray-Space Embedding Networks
What’s in your hands? 3D Reconstruction of Generic Objects in Hands
Virtual Correspondences: Human as a Cue for Extreme-View Geometry
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches
GroupViT: Zero-Shot Transfer to Semantic Segmentation with Text Supervision
LSVC: A Learning-based Stereo Video Compression Framework
BEHAVE: Dataset and Method for Tracking Human Object Interactions
Learning to Align Sequential Actions in the Wild
Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
Simulated Adversarial Testing of Face Recognition Models
GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
Ensembling Off-the-shelf Models for GAN Training
Global Tracking Transformers
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Joint Global and Local Hierarchical Priors for Learned Image Compression
D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
Human-Aware Object Placement for Visual Environment Reconstruction
Dual-path Image Inpainting with Auxiliary GAN Inversion
Accurate 3D Body Shape Regression using Metric and Semantic Attributes
BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information
Capturing and Inferring Dense Full-Body Human-Scene Contact
Not All Labels Are Equal: Rationalizing The Labeling Costs for Training Object Detection
Background Activation Suppression for Weakly Supervised Object Localization
Attribute Group Editing for Reliable Few-shot Image Generation
Negative-aware Attention for Image-Text Matching
Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
gDNA: Towards Generative Detailed Neural Avatars
CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
BACON: Band-limited Coordinate Networks for Multiscale Scene Representation
Revisiting Near/Remote Sensing with Geospatial Attention
Simple multi-dataset detection
Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
Online Convolutional Re-parameterization
Neural Inertial Localization
MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution
Unsupervised Pre-training for Temporal Action Localization Tasks
Augmented Geometric Distillation for Data-Free Incremental Person ReID
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
ContrastMask: Contrastive Learning to Segment Every Thing
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
A Comprehensive Study of End-to-End Temporal Action Detection
Rethinking Image Cropping: Exploring Diverse Compositions from Global Views
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Asynchronous Event-based Graph-Neural Networks
RAMA: A Rapid Multicut Algorithm on GPU
EvUnroll: Neuromorphic Events based Rolling Shutter Image Correction
Cycle-Consistent Counterfactuals by Latent Transformations
Understanding 3D Object Articulation in Internet Videos
Synthetic Generation of Face Videos with Plethysmograph Physiology
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
Neural Architecture Search with Representation Mutual Information
Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots
Semi-Supervised Object Detection via Multi-instance Alignment with Global Class Prototypes
Fine-Grained Predicates Learning for Scene Graph Generation
Meta Distribution Alignment for Generalizable Person Re-Identification
Align Representations with Base: A New Approach to Self-Supervised Learning
Style-Based Global Appearance Flow for Virtual Try-On
Learning Semantic Associations for Mirror Detection
Task Decoupled Framework for Reference-based Super-Resolution
Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras
Fast and Unsupervised Action Boundary Detection for Action Segmentation
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
Unified Transformer Tracker for Object Tracking
NeuralHOFusion: Neural Volumetric Rendering under Human-object Interactions
H$^2$FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection
ICON: Implicit Clothed humans Obtained from Normals
Semantic-Aware Domain Generalized Segmentation
ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
Detecting Deepfakes with Self-Blended Images
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
FreeSOLO: Learning to Segment Objects without Annotations
Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
Differentially Private Federated Learning with Local Regularization and Sparsification
Modeling 3D Layout For Group Re-Identification
DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning
Structured Local Radiance Fields for Human Avatar Modeling
Contrastive Regression for Domain Adaptation on Gaze Estimation
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
Learning Second Order Local Anomaly for General Face Forgery Detection
LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Audio-Adaptive Activity Recognition Across Video Domains
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Omnivore: A Single Model for Many Visual Modalities
Multi-Frame Self-Supervised Depth with Transformers
Voice-Face Homogeneity Tells Deepfake
Representation Compensation Networks for Continual Semantic Segmentation
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
FLAVA: A Foundational Language And Vision Alignment Model
Vision Prompt Tuning
Vehicle trajectory prediction works, but not everywhere
Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
DATA: Domain-Aware and Task-Aware Self-supervised Learning
Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
Balanced MSE for Imbalanced Visual Regression
The Devil Is in the Details: Window-based Attention for Image Compression
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Video Frame Interpolation Transformer
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
LASER: LAtent SpacE Rendering for 2D Visual Localization
LaTr: Layout-Aware Transformer for Scene-Text VQA
Universal Photometric Stereo Network using Global Lighting Contexts
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
AdaViT: Adaptive Tokens for Efficient Vision Transformer
Neural Template: Topology-aware Reconstruction and Disentangled Generation of 3D Meshes
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
Cross-Modal Transferable Adversarial Attacks from Images to Videos
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation
Object Localization under Single Coarse Point Supervision
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Reinforced Structured State-Evolution for Vision-Language Navigation
Learning to Anticipate Future with Dynamic Context Removal
Learning Program Representations for Food Images and Cooking Recipes
Transferability Estimation using Bhattacharyya Class Separability
LiDAR Snowfall Simulation for Robust 3D Object Detection
Masked Feature Prediction for Vision Self-Supervised Pre-Training
Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors
Shape from Polarization for Complex Scenes in the Wild
PhotoScene: Physically-Based Material and Lighting Transfer for Indoor Scenes
Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
Selective-Supervised Contrastive Learning with Noisy Labels
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Leveraging Self-Supervision for Cross-Domain Crowd Counting
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences
DIFNet: Boosting Visual Information Flow for Image Captioning
ScaleNet: A Shallow Architecture for Scale Estimation
HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
Density-preserving Deep Point Cloud Compression
Exploring Dual-task Correlation for Pose Guided Person Image Generation
Exploring Endogenous Shift for Cross-domain Detection: A Large-scale Benchmark and Perturbation Suppression Network
Transferability metrics for selecting Source Model Ensembles
The Auto Arborist Dataset: A Large-Scale Benchmark for Multimodal Urban Forest Monitoring Under Domain Shift
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
Learning from Temporal Gradient for Semi-supervised Action Recognition
JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Defensive Patches for Robust Recognition in the Physical World
UniCoRN: A Unified Conditional Image Repainting Network
APES: Articulated Part Extraction from Sprite Sheets
Learning Deep Implicit Functions for 3D Shapes with Dynamic Code Clouds
Neural Rays for Occlusion-aware Image-based Rendering
DisARM: Displacement Aware Relation Module for 3D Detection
A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
Weakly Supervised Object Localization as Domain Adaption
Reflash Dropout in Image Super-Resolution
Semantic Segmentation by Early Region Proxy
EyePAD++: A Distillation-based approach for joint Eye Authentication and Presentation Attack Detection using Periocular Images
Online Learning of Reusable Abstract Models for Object Goal Navigation
Time Microscope: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
Localization Distillation for Dense Object Detection
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Cross-Image Relational Knowledge Distillation for Semantic Segmentation
Trustworthy Long-tailed Classification
Episodic Memory Question Answering
REX: Reasoning-aware and Grounded Explanation
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
LOLNerf: Learn from One Look
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CoNeRF: Controllable Neural Radiance Fields
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization SpaceVision Transformer Slimming：连续优化空间中的多维搜索
UnweaveNet: Unweaving Activity StoriessUnweaveNet：解开活动故事
MeMOT: Multi-Object Tracking with MemoryMeMOT：带内存的多对象跟踪
VisualHow: Multimodal Problem SolvingVisualHow：多模式问题解决
Affine Medical Image Registration with Coarse-to-Fine Vision Transformer使用粗到精视觉变压器的仿射医学图像配准
Unpaired Deep Image Deraining Using Dual Contrastive Learning使用双重对比学习的非配对深度图像去雨
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
Mask Transfiner for High-Quality Instance Segmentation
GLASS: Geometric Latent Augmentation for Shape Spaces
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning
Multi-modal Extreme Classification
CodedVTR: Codebook-Based Sparse Voxel Transformer in Geometric Regions
Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization
Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection
Cross-modal Representation Learning for Zero-shot Action Recognition
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation
AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis
Bijective Mapping Network for Shadow Removal
ObjectFormer for Image Manipulation Detection and Localization
GraFormer: Graph-oriented Transformer for 3D Pose Estimation
Multi-Granularity Alignment Domain Adaptation for Object Detection
Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection
Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors
3D Scene Painting via Semantic Image Synthesis
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
One-bit Active Query with Contrastive Pairs
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
Leveraging Object-Level Rotation Equivariance for 3D Object Detection
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
JIFF: Jointly-aligned Implicit Face Function for High Fidelity Single View Clothed Human Reconstruction
Prompt Distribution Learning
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?
Interactive Image Synthesis with Panoptic Layout Generation
Learning to Find Good Models in RANSAC
Meta-attention for ViT-backed Continual Learning
Deep Anomaly Discovery from Unlabeled Videos via Normality Advantage and Self-Paced Refinement
Improving neural implicit surfaces geometry with patch warping
Rope3D: Take A New Look from the 3D Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
AME: Attention and Memory Enhancement in Hyper-Parameter Optimization
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Automated Progressive Learning for Efficient Training of Vision Transformers
Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions
Towards Implicit Text-Guided 3D Shape Generation
Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation
Revisiting skeleton-based action recognition
Mutual Quantization for Cross-Modal Search with Noisy Labels
Revisiting Temporal Alignment for Video Restoration
Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Video Frame Interpolation with Transformer
Autofocus for Event Cameras
Event-based Direct Sparse Odometry
OpenTAL: Towards Open Set Temporal Action Localization
Programmatic Concept Learning for Human Motion Description and Synthesis
MAXIM: Multi-Axis MLP for Image Processing
Temporal Alignment Networks for Long-term Video
Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches
Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images
Progressive End-to-End Object Detection in Crowded Scenes
Object-aware Video-language Pre-training for Retrieval
Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection
Surface Representation for Point Clouds
Context-Aware Video Reconstruction for Rolling Shutter Cameras
MonoScene: Monocular 3D Semantic Scene Completion
Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts
Point Cloud Color Constancy
HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
iPLAN: Interactive and Procedural Layout Planning
End-to-End Multi-Person Pose Estimation with Transformers
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Adversarial Eigen Attack on Black-Box Models
Domain-Aware Representation Learning for Unsupervised Domain Generalization
Sub-word Level Lip Reading With Visual Attention
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Towards cross-modal pose localization from text-based position descriptions
Opening up Open World Tracking
Dynamic Clustering Mask Transformers for Panoptic Segmentation
Compressive Single-Photon 3D Cameras
Style-ERD: Responsive and Coherent Online Motion Style Transfer
MixFormer: Mixing Features across Windows and Dimensions
Robust Image Forgery Detection over Online Social Network Shared Images
Semantic-aligned Fusion Transformer for One-shot Object Detection
Long-term Video Frame Interpolation Via Feature Propagation
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
ETHSeg: An Amodel Instance Segmentation Network and a Real-world Dataset for X-Ray Waste Inspection
SEEG: Semantic Energized Co-speech Gesture Generation
Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation
Acquiring a Dynamic Light Field through a Single-Shot Coded Image
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting
FaceVerse: a Fine-grained and Detail-changeable 3D Neural Face Model from a Hybrid Dataset
Learning Where to Learn in Cross-View Self-Supervised Learning
Automatic Relation-aware Graph Network Proliferation
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised Learning
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot Learning
Unsupervised Learning of Accurate Siamese Tracking
Accelerating DETR Convergence via Semantic-Aligned Matching
Co-advise: Cross Inductive Bias Distillation
Medial Spectral Coordinates for 3D Shape Analysis
Coupled Iterative Refinement for 6D Multi-Object Pose Estimation
DeepCurrents: Learning Implicit Representations of Shapes with Boundaries
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
Playable Environments: Video Manipulation in Space and Time
Unified Contrastive Learning in Image-Text-Label Space
Many-to-many Splatting for Efficient Video Frame Interpolation
Uncertainty-Aware Deep Multi-View Photometric Stereo
Multi-Robot Active Mapping via Neural Bipartite Graph Matching
Location-free Human Pose Estimation
Multiview Transformers for Video Recognition
RIO: Rotation-equivariance supervised learning of robust inertial odometry
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Pop-Out Motion: 3D-Aware Image Deformation via Learning Shape Laplacian
On the Road to Online Adaptation for Semantic Image Segmentation
Generalized Binary Search Network for Highly-Efficient Multi-View Stereo
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens
Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
DLFormer: Discrete Latent Transformer for Video Inpainting
Continuous Scene Representations for Embodied AI
vCLIMB: A Novel Video Class Incremental Learning Benchmark
NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image Registration
ONCE-3DLanes: Building Monocular 3D Lane Detection
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
HairMapper: Removing Hair from Portraits Using GANs
Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective
Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
Interactive Multi-Class Tiny-Object Detection
Generalizable Human Pose Triangulation
Towards Discriminative Representation: Multi-view Trajectory Contrastive Learning for Online Multi-object Tracking
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
Learning to Learn by Jointly Optimizing Neural Architecture and Weights
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Learning Soft Estimator of Keypoint Scale and Orientation with Probabilistic Covariant Loss
Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
Depth-Aware Generative Adversarial Network for Talking Head Video Generation
OccAM’s Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
Improving Adversarially Robust Few-shot Image Classification with Generalizable Representations
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Stable Long-
