Last Modified: April 16, 2021
_____________________________________________________________________

Multimodal Audio-Visual Detection (MAVD) dataset

@author
    Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav
    hurtadoj@cs.uni-freiburg.de
    http://multimodal-distill.cs.uni-freiburg.de
_____________________________________________________________________

There are 50 directories named `drive_*`; their contents are described in the `Dataset Structure` section of this readme. Each drive records a 15-minute car drive through diverse regions of Freiburg im Breisgau and nearby towns. The drives include day, night, and poor-light recordings. We split the data using a 60/20/20 (train/validation/test) scheme; the splits can be found under the `SPLITS` directory. We recorded each location twice, for a total of approximately 24 different scenes.

Refer to our paper for more details: https://arxiv.org/abs/2103.01353
_____________________________________________________________________

## Dataset Structure

The folder structure for each drive is as follows:
```
{drive_name}/{modality}/*.ext
```
where drive_name = drive_<night|day>_<timestamp>
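
As a minimal sketch, a drive directory name can be split into its recording condition and timestamp with a regular expression. The helper name `parse_drive_name` and the `DriveInfo` type are hypothetical, and the timestamp is treated as an opaque string because its exact format is not specified in this readme.

```python
import re
from typing import NamedTuple, Optional

# Naming scheme from this readme: drive_<night|day>_<timestamp>.
# Assumption: the timestamp is kept as an opaque string (format not specified here).
DRIVE_NAME_PATTERN = re.compile(r"^drive_(?P<condition>night|day)_(?P<timestamp>.+)$")


class DriveInfo(NamedTuple):
    condition: str  # "day" or "night"
    timestamp: str  # everything after the condition, unparsed


def parse_drive_name(name: str) -> Optional[DriveInfo]:
    """Return the parsed drive name, or None if it does not follow the scheme."""
    match = DRIVE_NAME_PATTERN.match(name)
    if match is None:
        return None
    return DriveInfo(match.group("condition"), match.group("timestamp"))
```

Names that do not match the scheme, such as `SPLITS`, yield `None`, so the helper can also be used to filter a directory listing down to drives.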

Internally, each drive contains:
    audio/audio_<mic_number_from_0_to_7>_<TIMESTAMP>.mp3
    fl_ir_aligned/fl_ir_aligned_<TIMESTAMP>.jpg
    fl_rgb/fl_rgb_<TIMESTAMP>.jpg
    fl_rgb_depth/fl_rgb_depth_<TIMESTAMP>.jpg

All modalities are aligned under a TIMESTAMP. For more details on the alignment technique, please refer to our paper.
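
The following sketch shows one possible way to collect the files of a single drive and group them by TIMESTAMP. It assumes that files of different modalities belonging to the same sample carry an identical TIMESTAMP string in their names; the function name `group_by_timestamp` is hypothetical and not part of any released tooling.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# File name patterns taken from the listing in this readme.
# Assumption: the TIMESTAMP is compared as an opaque string.
AUDIO_PATTERN = re.compile(r"^audio_(?P<mic>[0-7])_(?P<timestamp>.+)\.mp3$")
IMAGE_MODALITIES = ("fl_ir_aligned", "fl_rgb", "fl_rgb_depth")


def group_by_timestamp(drive_dir: Path) -> Dict[str, Dict[str, List[Path]]]:
    """Map TIMESTAMP -> modality -> list of files found for that timestamp."""
    samples = defaultdict(lambda: defaultdict(list))

    # Audio: up to eight files (microphones 0 to 7) per timestamp.
    for path in sorted((drive_dir / "audio").glob("audio_*.mp3")):
        match = AUDIO_PATTERN.match(path.name)
        if match:
            samples[match.group("timestamp")]["audio"].append(path)

    # Images: one file per modality and timestamp, named <modality>_<TIMESTAMP>.jpg.
    for modality in IMAGE_MODALITIES:
        prefix = modality + "_"
        for path in sorted((drive_dir / modality).glob(prefix + "*.jpg")):
            timestamp = path.stem[len(prefix):]
            samples[timestamp][modality].append(path)

    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {ts: dict(mods) for ts, mods in samples.items()}
```

A call such as `group_by_timestamp(Path("drive_day_<timestamp>"))` returns a dictionary in which a timestamp lacking some modality simply has no key for it, which makes incomplete samples easy to detect.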
_____________________________________________________________________

CITATION

If you use this dataset or report results on it, please consider citing our paper below:

Francisco Rivera Valverde, Juana Valeria Hurtado, Abhinav Valada, "There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge",
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

@inproceedings{riverahurtado2021mmdistillnet,
  title={There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge},
  author={Rivera Valverde, Francisco and Valeria Hurtado, Juana and Valada, Abhinav},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

_____________________________________________________________________

CHANGELOG

- v0.1 Initial public release