layout: post
title: Artificial Intelligence Paper Review
tags: ai, ml, paper review, Segmentation, Classification, Inpainting, Image Editing, Face Swap, Video Generation, Diffusion Model, Volume Rendering, Virtual Try On, Voice Conversion, tts, Large Language Model, speech recognition, Object Detection, Fundamental, RAG
comments: false

Retrieval-Augmented Generation
    Text Embeddings by Weakly-Supervised Contrastive Pre-training
    (DRAGIN) Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
    (DeepRAG) Thinking to Retrieval Step by Step for Large Language Models

Large Language Model
    (ChipNeMo) Domain-Adapted LLMs for Chip Design
    CONTINUAL PRE-TRAINING OF LANGUAGE MODELS
    (Reuse, Don’t Retrain) A Recipe for Continued Pretraining of Language Models
    (SteerLM) Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
    Nemotron-4 340B Technical Report
    (LogParser-LLM) Advancing Efficient Log Parsing with Large Language Models
    Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces
    (Penetrative AI) Making LLMs Comprehend the Physical World
    Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies
    DeepSeek-V3 Technical Report
    (DeepSeek-R1) Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Visual Language Model
    (Video-LLaVA) Learning United Visual Representation by Alignment Before Projection
    (VILA) On Pre-training for Visual Language Models
    Sigmoid Loss for Language Image Pre-Training
    (NVILA) Efficient Frontier Visual Language Models
    (Template Matters) Understanding the Role of Instruction Templates in Multimodal Language Model Evaluation and Training
    (DEPLOT) One-shot visual language reasoning by plot-to-table translation
    (MATCHA) Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering
    (Pix2Struct) Screenshot Parsing as Pretraining for Visual Language Understanding

Large Model Optimization
    (AWQ) ACTIVATION-AWARE WEIGHT QUANTIZATION FOR ON-DEVICE LLM COMPRESSION AND ACCELERATION
    Speculative Decoding

Computer Vision

Segmentation
    Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement
    PortraitNet: Real-time Portrait Segmentation Network for Mobile Device
    Real-time Hair Segmentation and Recoloring on Mobile GPUs
    TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss
    SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder
    (PP-LiteSeg) A Superior Real-Time Semantic Segmentation Model
    (SemPLeS) Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation

Object Detection
    Scaled-YOLOv4: Scaling Cross Stage Partial Network

Pose Estimation
    (OpenPose) Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Image Classification
    (Background Splitting) Finding Rare Classes in a Sea of Background

Image Inpainting
    (PiiGAN) Generative Adversarial Networks for Pluralistic Image Inpainting
    Recurrent Feature Reasoning for Image Inpainting

Image Editing
    Spatially-invariant Style-codes Controlled Makeup Transfer
    Adaptive semantic attribute decoupling for precise face image editing
    (Arbitrary Facial Attribute Editing) Only Change What You Want

Face Swap
    (SimSwap) An Efficient Framework For High Fidelity Face Swapping
    (MobileFaceSwap) A Lightweight Framework for Video Face Swapping
    (MobileFSGAN) MIGRATING FACE SWAP TO MOBILE DEVICES: A LIGHTWEIGHT FRAMEWORK AND A SUPERVISED TRAINING SOLUTION
    (A new face swap method for image and video domains) a technical report
    (Smooth-Swap) A Simple Enhancement for Face-Swapping with Smoothness
    Region-Aware Face Swapping
    GHOST — A New Face Swap Approach for Image and Video Domains

Video Generation
    PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering
    (MakeItTalk) Speaker-Aware Talking-Head Animation
    First Order Motion Model for Image Animation
    (DaGAN) Depth-Aware Generative Adversarial Network for Talking Head Video Generation
    Thin-Plate Spline Motion Model for Image Animation
    (SadTalker) Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Diffusion Model
    (InstructPix2Pix) Learning to Follow Image Editing Instructions
    High-Resolution Image Synthesis with Latent Diffusion Models
    Null-text Inversion for Editing Real Images using Guided Diffusion Models

Volume Rendering
    (NeRF) Representing Scenes as Neural Radiance Fields for View Synthesis
    (R2L) Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
    Real-Time Neural Light Field on Mobile Devices
    (Instant-NGP) Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
    (MobileNeRF) Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
    (Re-ReND) Real-time Rendering of NeRFs across Devices
    (BakedSDF) Meshing Neural SDFs for Real-Time View Synthesis

Virtual Try On
    (ARShoe) Real-Time Augmented Reality Shoe Try-on System on Smartphones

Natural Language

Text-to-Speech
    (YourTTS) Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
    (VITS) Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
    (NaturalSpeech2) Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
    (NaturalSpeech) End-to-End Text to Speech Synthesis with Human-Level Quality

Voice Conversion
    Voice Conversion With Just Nearest Neighbors
    LOW-LATENCY REAL-TIME VOICE CONVERSION ON CPU
    (QuickVC) Any-To-Many Voice Conversion Using Inverse Short-Time Fourier Transform for Faster Conversion

Speech Recognition
    (Whisper) Robust Speech Recognition via Large-Scale Weak Supervision
    (WhisperX) Time-Accurate Speech Transcription of Long-Form Audio

Music Fingerprinting
    (SpectroMap) Peak detection algorithm for audio fingerprinting
    MUSIC AUGMENTATION AND DENOISING FOR PEAK-BASED AUDIO FINGERPRINTING

Fundamental
    Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
    Searching for MobileNetV3
    Supervised Contrastive Learning
    (Wavelet Knowledge Distillation) Towards Efficient Image-to-Image Translation
    (Teachers Do More Than Teach) Compressing Image-to-Image Models
    Coordinate Attention for Efficient Mobile Network Design
    Image Augmentations for GAN Training
    Improved Consistency Regularization for GANs
    (GraN-GAN) Piecewise Gradient Normalization for Generative Adversarial Networks
    TOWARDS FASTER AND STABILIZED GAN TRAINING FOR HIGH-FIDELITY FEW-SHOT IMAGE SYNTHESIS
    (GAN Compression) Efficient Architectures for Interactive Conditional GANs
    Improving GANs with A Dynamic Discriminator
    Systematic Analysis and Removal of Circular Artifacts for StyleGAN