Last Modified: April 16, 2021
_____________________________________________________________________

Multimodal Audio-Visual Detection (MAVD) dataset

@author Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav
hurtadoj@cs.uni-freiburg.de
http://multimodal-distill.cs.uni-freiburg.de
_____________________________________________________________________

There are 50 directories named `drive_*`. Their contents are explained in the `Dataset Structure` section of this readme.

Each drive is a recording of a 15-minute car drive through diverse regions of Freiburg im Breisgau and nearby towns. The drives include day, night, and poor-light recordings.

We split the data using a 60 (train) / 20 (validation) / 20 (test) scheme; the splits can be found under the `SPLITS` directory. We recorded each location twice, for a total of approximately 24 different scenes.

Refer to our paper for more details: https://arxiv.org/abs/2103.01353
_____________________________________________________________________

## Dataset Structure

The folder structure for each drive is as follows:

```
{drive_name}/{modality}/*.ext
```

where drive_name = drive__

Internally, each drive contains:

    audio/audio__.mp3
    fl_ir_aligned/fl_ir_aligned_.jpg
    fl_rgb/fl_rgb_.jpg
    fl_rgb_depth/fl_rgb_depth_.jpg

All modalities are aligned under a TIMESTAMP. For more details on the alignment technique, please refer to our paper.
_____________________________________________________________________

CITATION

If you use this dataset or report results on it, please consider citing our paper below:

Francisco Rivera Valverde, Juana Valeria Hurtado, Abhinav Valada, "There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

@inproceedings{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
_____________________________________________________________________

CHANGELOG

- v0.1 Initial public release
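_____________________________________________________________________

The folder layout described in the `Dataset Structure` section can be traversed with a short script. The following is a minimal sketch, not part of the dataset tooling: the helper names `list_drives` and `index_drive` are hypothetical, and the code assumes only the modality folder names and file extensions listed in this readme. It does not parse file names, since the exact naming pattern is not spelled out here.

```python
from pathlib import Path

# Modality folder names and file extensions as listed in this readme.
MODALITIES = {
    "audio": ".mp3",
    "fl_ir_aligned": ".jpg",
    "fl_rgb": ".jpg",
    "fl_rgb_depth": ".jpg",
}


def list_drives(root):
    """Return the drive_* directories directly under root, sorted by name."""
    return sorted(p for p in Path(root).glob("drive_*") if p.is_dir())


def index_drive(drive_dir):
    """Map each modality name to the sorted list of its files in one drive."""
    drive_dir = Path(drive_dir)
    index = {}
    for modality, ext in MODALITIES.items():
        folder = drive_dir / modality
        # A missing modality folder yields an empty list instead of an error.
        index[modality] = sorted(folder.glob(f"*{ext}")) if folder.is_dir() else []
    return index
```

Calling `index_drive` on each path returned by `list_drives` and taking the length of each list gives a per-modality file count, which is a quick way to check that a download is complete before training.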