The two-volume set LNCS 13141 and LNCS 13142 constitutes the proceedings of the 28th International Conference on MultiMedia Modeling, MMM 2022, which took place in Phu Quoc, Vietnam, during June 6-10, 2022. The 107 papers presented in these proceedings were carefully reviewed and selected from a total of 212 submissions. They focus on topics related to multimedia content analysis; multimedia signal processing and communications; and multimedia applications and services.
BEST PAPER SESSION
- Real-time detection of tiny objects based on a weighted bi-directional FPN
- Multi-Modal Fusion Network for Rumor Detection with Texts and Images
- PF-VTON: Toward High-Quality Parser-Free Virtual Try-On Network
- MF-GAN: Multi-conditional fusion Generative Adversarial Network for Text-to-Image Synthesis

APPLICATIONS 1
- Learning to classify weather conditions from single images without labels
- Learning Image Representation via Attribute-aware Attention Networks for Fashion Classification
- Toward Detail-Oriented Image-Based Virtual Try-On with Arbitrary Poses
- Parallel DBSCAN-Martingale estimation of the number of concepts for automatic satellite image clustering

MULTIMEDIA APPLICATIONS - PERSPECTIVES, TOOLS & APPLICATIONS (Special Session) & BRAVE NEW IDEAS
- AI for the Media Industry: Application Potential and Automation Level
- Color the Word: Leveraging Web Images for Machine Translation of Untranslatable Words

ACTIVITIES & EVENTS
- MGMP: Multimodal Graph Message Propagation Network for Event Detection
- Pose-Enhanced Relation Feature for Action Recognition in Still Images
- Prostate Segmentation of Ultrasound Images based on Interpretable-guided Mathematical Model
- Spatiotemporal Perturbation Based Dynamic Consistency for Semi-Supervised Temporal Action Detection

MULTIMEDIA DATASETS FOR REPEATABLE EXPERIMENTATION (Special Session)
- A Task Category Space for User-Centric Comparative Multimedia Search Evaluations
- GPR1200: A Benchmark for General-Purpose Content-Based Image Retrieval
- LLQA - Lifelog Question Answering Dataset

LEARNING
- Category-sensitive Incremental Learning For Image-based 3D Shape Reconstruction
- AdaConfigure: Reinforcement Learning-based Adaptive Configuration for Video Analytics Services
- Mining Minority-class Examples With Uncertainty Estimates
- Conditional Context-aware Feature Alignment for Domain Adaptive Detection Transformer

MULTIMEDIA for MEDICAL APPLICATIONS (Special Session)
- Human activity recognition with IMU and vital signs feature fusion
- On Identifying Pareidolia Phenomenon by Emulating Patient Behavior
- Using Explainable AI to Identify Differences between Clinical and Experimental Pain Detection Models Based on Facial Expressions

APPLICATIONS 2
- Double Granularity Relation Network with Self-Criticism for Occluded Person Re-Identification
- A Complementary Fusion Strategy for RGB-D Face Recognition
- Multi-scale Cross-modal Transformer Network for RGB-D Object Detection
- Joint Re-Detection and Re-Identification for Multi-Object Tracking

MULTIMEDIA ANALYTICS for CONTEXTUAL HUMAN UNDERSTANDING (Special Session)
- An Investigation into Keystroke Dynamics and Heart Rate Variability as Indicators of Stress
- Fall detection using multimodal data
- Prediction of Blood Glucose using Contextual LifeLog Data
- Multimodal Embedding for Lifelog Retrieval

APPLICATIONS 3
- A Multiple Positives Enhanced NCE Loss for Image-Text Retrieval
- SAM: Self Attention Mechanism for Scene Text Recognition based on Swin Transformer
- JVCSR: Video Compressive Sensing Reconstruction with Joint In-loop Reference Enhancement and Out-loop Super-resolution
- Point Cloud Upsampling via a Coarse-to-fine Network

IMAGE ANALYTICS
- Arbitrary Style Transfer With Adaptive Channel Network
- Fast Single Image Dehazing Using Morphological Reconstruction and Saturation Compensation
- One-Stage Image Inpainting with Hybrid Attention
- Real-time FPGA Design for OMP Targeting 8K Image Reconstruction

SPEECH & MUSIC
- Time-Frequency Attention For Speech Emotion Recognition With Squeeze-and-Excitation Blocks
- SPEECH INTELLIGIBILITY ENHANCEMENT BY NON-PARALLEL SPEECH STYLE CONVERSION USING CWT AND iMetricGAN BASED CycleGAN
- A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody

MULTIMODAL ANALYTICS
- Bi-attention modal separation network for multimodal video fusion
- Combining Knowledge and Multi-modal Fusion for Meme Classification
- Non-Uniform Attention Network for Multi-modal Sentiment Analysis
- Multimodal Unsupervised Image-to-Image Translation Without Independent Style Encoder