Recent advances in image detection through deep algorithms such as Faster-RCNN or YOLO have led to significant improvements in the state of the art. These algorithms excel at subdividing images into regions of interest and then recognising the key elements composing the original image.
The goal of this project is to transpose this approach to audio. More specifically, we would like to create a 2D audio map along the x and y axes. We therefore have two main goals: 1) localising the audio sources, and 2) recognising the type of each audio source.
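As an illustration of the localisation goal, one classical building block is estimating the time difference of arrival (TDOA) between two microphones, for instance with GCC-PHAT cross-correlation. The sketch below is only a minimal example of this standard technique; the project itself may use a different localisation method.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (in seconds) of `sig` relative to `ref`
    using GCC-PHAT: cross-power spectrum normalised to unit magnitude,
    so only phase (i.e. delay) information remains."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15              # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    # Re-centre the correlation so index max_shift corresponds to lag 0
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Synthetic example: the same noise burst reaches mic 2 five samples later
fs = 16000
burst = np.random.default_rng(0).standard_normal(1024)
mic1 = np.concatenate((burst, np.zeros(64)))
mic2 = np.concatenate((np.zeros(5), burst, np.zeros(59)))
tau = gcc_phat(mic2, mic1, fs)          # ~ 5 / fs seconds
```

Given the delay and the microphone geometry, the source direction (and, with more microphones, a 2D position) can then be triangulated.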
During the eNTERFACE project we will focus on two main categories of audio sources: home environment (closing doors, vacuum cleaner, blender, etc.) and crowd behaviour (applause, booing, cheering), as our goal is to reuse the obtained results in two other larger projects: IGLU and DeepSport.
- Gueorgui Pironkov (Gueorgui.PIRONKOV@umons.ac.be)
- Stéphane Dupont (Stephane.DUPONT@umons.ac.be)
- Deliverable 1: Small audio database annotated both in terms of localisation and source.
- Deliverable 2: Code for the audio scene analysis.
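To make Deliverable 1 concrete, each audio event in the database could carry both a source label and a position. The entry below is purely hypothetical (the field names and the example file are illustrative, not the project's actual schema):

```python
# Hypothetical annotation entry for a single audio event.
# Field names and values are illustrative only.
annotation = {
    "file": "kitchen_blender_001.wav",  # example recording name
    "label": "blender",                 # audio source category
    "onset_s": 1.25,                    # event start time (seconds)
    "offset_s": 4.80,                   # event end time (seconds)
    "position_xy_m": [2.1, 0.7],        # location on the 2D audio map (metres)
}

duration = annotation["offset_s"] - annotation["onset_s"]
```

Storing one such record per event keeps both project goals (localisation and source recognition) covered by the same annotation file.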