Five unique human-annotated descriptions for every audio clip.
Cite the original paper: Drossos, K., Lipping, S., & Virtanen, T. (2020). "Clotho: An Audio Captioning Dataset." Proc. IEEE ICASSP, pp. 736-740.
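The five-captions-per-clip structure mentioned above can be checked with a short script before any experiments are run. The sketch below is illustrative only: it assumes the captions are stored in a CSV with one row per clip, a `file_name` column, and columns `caption_1` through `caption_5`. The sample row and the helper names `load_captions` and `check_five_unique` are hypothetical, so the column names should be adjusted to match the files actually downloaded.

```python
import csv
import io

# Hypothetical sample in an ASSUMED layout: one row per clip, five caption columns.
SAMPLE_CSV = """file_name,caption_1,caption_2,caption_3,caption_4,caption_5
rain.wav,Rain falls on a roof.,Water drips steadily.,A storm passes overhead.,Rain hits a window.,Heavy rain pours down.
"""


def load_captions(handle, n_captions=5):
    """Map each clip's file name to its list of captions."""
    captions = {}
    for row in csv.DictReader(handle):
        captions[row["file_name"]] = [
            row[f"caption_{i}"] for i in range(1, n_captions + 1)
        ]
    return captions


def check_five_unique(captions, n_captions=5):
    """Return the file names whose captions are not n_captions distinct strings."""
    return [
        name for name, caps in captions.items()
        if len(set(caps)) != n_captions
    ]


# Parse the in-memory sample; a real file would be opened with open(path, newline="").
captions = load_captions(io.StringIO(SAMPLE_CSV))
```

Calling `check_five_unique(captions)` returns a list of clips that fail the check, which is a convenient thing to report in the dataset section of a paper.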
If you are writing a technical report or paper using this data, ensure you include these standard sections: