Post dataset #40

Open
ZFTurbo opened this issue Jul 21, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@ZFTurbo
Owner

ZFTurbo commented Jul 21, 2024

We can try to gather all public and private datasets in one place. To post a dataset, please fill in the form:

Dataset name:
Description:
Instruments:
Format: sample rate and compression
Volume: number of tracks
Size: in GB
Download link:
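To fill in the Format, Volume and Size fields from a local copy, something like the following could help. It is a minimal sketch using only the Python standard library. `summarize_wav_folder` is a hypothetical helper, it assumes every folder containing WAV files is one track (true for per-song folder layouts, not for every dataset), and the stdlib `wave` module only reads integer PCM WAV, so float WAV or compressed formats would need another reader.

```python
import os
import wave
from collections import Counter


def summarize_wav_folder(root):
    """Collect the facts the form asks for from every .wav file under `root`.

    Hypothetical helper: each folder that contains WAV files counts as one track.
    Only integer PCM WAV is supported by the stdlib `wave` module.
    """
    rates = Counter()       # sample rate (Hz) -> number of files
    bit_depths = Counter()  # bits per sample -> number of files
    track_dirs = set()
    n_files = 0
    total_bytes = 0
    total_seconds = 0.0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".wav"):
                continue
            path = os.path.join(dirpath, name)
            with wave.open(path, "rb") as w:
                rate = w.getframerate()
                rates[rate] += 1
                bit_depths[8 * w.getsampwidth()] += 1
                total_seconds += w.getnframes() / rate
            track_dirs.add(dirpath)
            n_files += 1
            total_bytes += os.path.getsize(path)
    return {
        "sample_rates_hz": dict(rates),
        "bit_depths": dict(bit_depths),
        "wav_files": n_files,
        "track_folders": len(track_dirs),
        "hours_of_audio": total_seconds / 3600,
        "size_gb": total_bytes / 1e9,
    }
```

Usage: `print(summarize_wav_folder("path/to/dataset"))`. Note that `hours_of_audio` sums every file, stems included, so for multitrack datasets it is larger than the total song duration.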
@ZFTurbo ZFTurbo added the enhancement New feature or request label Jul 21, 2024
@ZFTurbo ZFTurbo pinned this issue Jul 21, 2024
@ZFTurbo
Owner Author

ZFTurbo commented Jul 21, 2024

Dataset name: MUSDB18-HQ
Description: Most popular dataset for Music Source Separation
Instruments: bass, drums, vocals, other
Format: Sample rate 44100 Hz, wav.
Volume: 100 train songs, 50 test songs
Size: 22.7GB
Download link: https://zenodo.org/records/3338373
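After downloading, a quick completeness check can save a failed training run. A minimal sketch, assuming the usual layout of the unzipped MUSDB18-HQ archive (`train/` and `test/` folders, one sub-folder per song holding `mixture.wav` plus the four stem WAVs); `check_musdb_layout` is a hypothetical helper, not part of any official tool.

```python
import os

# File names expected in each track folder of the MUSDB18-HQ WAV release.
MUSDB_FILES = ("mixture.wav", "bass.wav", "drums.wav", "vocals.wav", "other.wav")


def check_musdb_layout(root):
    """Count tracks per subset and report tracks that are missing any file.

    Returns (counts, problems); a complete download should give
    counts == {"train": 100, "test": 50} and an empty problems dict.
    """
    counts = {}
    problems = {}
    for subset in ("train", "test"):
        subset_dir = os.path.join(root, subset)
        if not os.path.isdir(subset_dir):
            counts[subset] = 0
            problems[subset] = ["<subset folder not found>"]
            continue
        tracks = sorted(
            t for t in os.listdir(subset_dir)
            if os.path.isdir(os.path.join(subset_dir, t))
        )
        counts[subset] = len(tracks)
        for track in tracks:
            track_dir = os.path.join(subset_dir, track)
            missing = [f for f in MUSDB_FILES
                       if not os.path.isfile(os.path.join(track_dir, f))]
            if missing:
                problems[f"{subset}/{track}"] = missing
    return counts, problems
```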

@jarredou
Contributor

jarredou commented Jul 23, 2024

Dataset name: StemGMD
Description: A Large-Scale Audio Dataset of Isolated Drum Stems for Deep Drums Demixing
Instruments: Drums
[Kick Drum, Snare, High Tom, Low-Mid Tom, High Floor Tom, Closed Hi-Hat, Open Hi-Hat, Crash Cymbal, Ride Cymbal]

Format: 16-bit/44.1 kHz stereo WAV file using ten realistic-sounding acoustic drum kits sourced from the Logic Pro X sample libraries

Volume: StemGMD contains 1224 hours of audio, which correspond to more than 136 hours of full-kit mixtures
Size: 1.13 TB
Download link: https://zenodo.org/records/7860223 and https://zenodo.org/records/7882857
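A quick sanity check on the Volume line, assuming every full-kit mixture is rendered as one isolated track for each of the nine classes listed under Instruments:

```math
9 \times 136\,\text{h} = 1224\,\text{h}
```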

Thoughts: Even though it was made alongside LarsNet by the same authors, it may not be ideal for source separation because it really lacks sound diversity (only 10 different drum kits). However, elements can be mixed with elements from other kits (and, from a mixing engineer's point of view, it could also have been much better).
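One way to do the cross-kit mixing mentioned above, as a minimal sketch: each drum piece of a pattern is taken from a randomly chosen kit and the chosen stems are summed into a new mixture. It assumes the stems are already loaded as NumPy arrays in a hypothetical `kits[kit][piece]` dictionary, that every kit provides every requested piece, and that all kits were rendered from the same MIDI performance so the pieces line up in time. `mix_across_kits` is an illustration, not part of StemGMD or LarsNet.

```python
import numpy as np


def mix_across_kits(kits, pieces, rng):
    """Build one training mixture where each drum piece comes from a random kit.

    kits:   {kit_name: {piece_name: float array of shape (samples, channels)}}
            (hypothetical in-memory layout; all kits render the same pattern)
    pieces: piece names to include, e.g. ["kick", "snare"]
    rng:    numpy Generator, e.g. np.random.default_rng(0)
    Returns (mixture, stems, chosen_kit_per_piece).
    """
    kit_names = sorted(kits)
    stems, chosen = {}, {}
    for piece in pieces:
        kit = kit_names[rng.integers(len(kit_names))]
        stems[piece] = kits[kit][piece]
        chosen[piece] = kit
    # Trim to the shortest stem so they can be summed sample by sample.
    length = min(s.shape[0] for s in stems.values())
    stems = {p: s[:length] for p, s in stems.items()}
    mixture = np.stack(list(stems.values())).sum(axis=0)
    # Scale mixture and stems together so the mixture stays within [-1, 1]
    # while remaining the exact sum of the target stems.
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture = mixture / peak
        stems = {p: s / peak for p, s in stems.items()}
    return mixture, stems, chosen
```

Keeping the stems and the mixture scaled by the same factor means the separation targets still add up to the input exactly.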

@happyTonakai

Dataset name: MoisesDB
Description: MoisesDB is a comprehensive multitrack dataset for source separation beyond 4 stems, comprising 240 previously unreleased songs by 47 artists spanning twelve high-level genres. The total duration of the dataset is 14 hours, 24 minutes and 46 seconds, with an average recording length of 3 minutes 36 seconds. MoisesDB is offered free of charge for non-commercial research use only and includes baseline performance results for two publicly available source separation methods.
Instruments: Bass, bowed strings, drums, guitar, other, other keys, other plucked, percussion, piano, vocals, wind
Format: Not verified yet (I haven't downloaded it), but it appears to be 44100 Hz WAV
Volume: 240 songs in 14 hours 24 minutes and 46 seconds
Size: 82.7 GB
Download link: https://developer.moises.ai/research, https://github.com/moises-ai/moises-db
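Since MoisesDB goes beyond 4 stems, it can also be used for MUSDB18-HQ-style training by folding its stems into bass/drums/vocals/other. A minimal sketch, assuming the stems are already loaded as equal-length NumPy arrays keyed by the instrument names above in lower case with underscores (e.g. `bowed_strings`). Sending `percussion` to drums is a judgment call, and this is independent of the official `moises-db` loader.

```python
import numpy as np

FOUR_STEMS = ("bass", "drums", "vocals", "other")
# Assumed mapping; any stem not listed (guitar, piano, wind, ...) goes to "other".
MOISESDB_TO_MUSDB = {
    "bass": "bass",
    "drums": "drums",
    "percussion": "drums",  # judgment call: could also be folded into "other"
    "vocals": "vocals",
}


def fold_to_four_stems(stems):
    """stems: {moisesdb_stem_name: array}; returns {bass, drums, vocals, other}."""
    first = next(iter(stems.values()))
    out = {name: np.zeros(first.shape, dtype=first.dtype) for name in FOUR_STEMS}
    for name, audio in stems.items():
        out[MOISESDB_TO_MUSDB.get(name, "other")] += audio
    return out
```

Stems missing from a song simply stay silent in the output, so the four outputs always sum to the original mix of the provided stems.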

@deton24

deton24 commented Aug 7, 2024

Here you can find more (but not sorted as neat as in the form above): https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit#heading=h.k3cm3bvgsf4j

@ZFTurbo
Owner Author

ZFTurbo commented Aug 28, 2024

Dataset name: The MAESTRO v3.0.0
Description: MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) is a dataset composed of about 200 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms.
Instruments: piano
Format: Sample rate 44100 Hz, wav.
Volume: 1286 songs (~200 hours)
Size: 120 GB
Download link: https://magenta.tensorflow.org/datasets/maestro#v300
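The ~3 ms alignment figure is easier to relate to the audio when expressed in samples. A minimal sketch with hypothetical helper names, assuming the audio is loaded at the 44100 Hz listed in the Format line and note times from the labels are given in seconds:

```python
SAMPLE_RATE = 44100  # Hz, as in the Format line above


def seconds_to_samples(t, sample_rate=SAMPLE_RATE):
    """Convert a time in seconds (e.g. a note onset from the labels) to a sample index."""
    return round(t * sample_rate)


def note_slice(onset, offset, sample_rate=SAMPLE_RATE):
    """Sample range covering one labelled note, for slicing an audio array."""
    return slice(seconds_to_samples(onset, sample_rate),
                 seconds_to_samples(offset, sample_rate))


# ~3 ms of label/audio misalignment expressed in samples at 44.1 kHz.
ALIGNMENT_SAMPLES = seconds_to_samples(0.003)
```

At 44.1 kHz the 3 ms tolerance is about 132 samples, so a note cut with `audio[note_slice(onset, offset)]` can be off by roughly that much at either edge.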


4 participants