Youtube-360

A dataset of uncurated 360 video content with spatial audio.

Spatial Media

360 video+FOA

88,000+ Clips (10 s each)

246 Hours

We collected a dataset of 360° video with first-order ambisonics (FOA) from YouTube, containing clips from a diverse set of topics such as musical performances, vlogs, and sports. The dataset was cleaned by removing videos that 1) did not contain valid ambisonics, 2) contained only still images, or 3) contained a significant amount of post-production sound such as voice-overs and background music. To evaluate a model's ability to localize objects in a 360° scene, we also provide semantic segmentation predictions from a state-of-the-art ResNet101 Panoptic FPN model trained on the MS-COCO dataset. For more information about the dataset, please check our paper.

Download

We provide YouTube URLs and segment timestamps. If you experience issues downloading and processing the dataset, please email the authors for assistance.

Training set (IDs)

Test set (IDs)

Segments

Semantic Segmentation
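
Since the release consists of YouTube URLs and segment timestamps rather than media files, each clip has to be cut from a locally downloaded video. The following is a minimal sketch of that step, not the authors' processing pipeline. The helper names `youtube_url` and `ffmpeg_trim_command` are hypothetical, the segment start time is assumed to be given in seconds, and the 10-second default comes from the clip length stated above. The layout of the ID and segment files is not described on this page, so parsing them is left out.

```python
from typing import List

# Clip length stated on this page (10 s each).
CLIP_SECONDS = 10


def youtube_url(video_id: str) -> str:
    """Build a standard YouTube watch URL from a video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


def ffmpeg_trim_command(src: str, start: float, dst: str,
                        duration: float = CLIP_SECONDS) -> List[str]:
    """Return an ffmpeg argument list that cuts one segment from `src`.

    Assumption: `start` is the segment start time in seconds.
    `-c copy` copies the video and audio streams without re-encoding,
    so the audio channel layout of the source is left untouched.
    """
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",    # seek to the segment start
        "-t", f"{duration:.3f}",  # keep `duration` seconds
        "-i", src,
        "-c", "copy",             # stream copy, no re-encoding
        dst,
    ]


if __name__ == "__main__":
    # Hypothetical video ID and file names, for illustration only.
    print(youtube_url("VIDEO_ID"))
    print(" ".join(ffmpeg_trim_command("VIDEO_ID.mkv", 30, "VIDEO_ID_30.mkv")))
```

The command list can be passed to `subprocess.run` once the source video has been downloaded. With stream copy, cuts snap to keyframes, so clip boundaries are approximate; re-encoding instead of using `-c copy` gives exact boundaries at the cost of processing time.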

Acknowledgements