Abstract
- Principal Mel-Spectrum Components (Feature)
- Temporal Pooling Functions (Model)
- Single Hidden Layer Neural Network, i.e. a Multi-Layer Perceptron (Classifier)
Audio Preprocessing
Feature: PMSC (Principal Mel-Spectrum Components)
- Original Data: 30 s, 22.05 kHz, mono, WAV
- Process Steps:
  - DFT (spectral domain): we compute DFTs over windows of 1024 samples on audio at 22.05 kHz (i.e. roughly 46 ms) with a frame step of 512 samples.
  - Mel-Compression: we run the spectral amplitudes through a set of 256 mel-scaled triangular filters to obtain a set of spectral energy bands.
  - Principal Component Analysis whitening (PCA whitening): we compute the principal components of a random sub-sample of the training set. To obtain features with unit variance, we multiply each component by the inverse square root of its eigenvalue.
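The three preprocessing steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the submission's actual code: the Hann window, the 2595·log10(1 + f/700) mel formula, the small `eps` added to the eigenvalues, and the choice to keep only the top `n_components` principal components are all assumptions that the notes do not specify. The function names are hypothetical.

```python
import numpy as np

SR = 22050     # sample rate from the notes (22.05 kHz)
N_FFT = 1024   # DFT window length in samples (roughly 46 ms)
HOP = 512      # frame step in samples
N_MELS = 256   # number of mel-scaled triangular filters


def hz_to_mel(f):
    # Assumed mel formula (one common variant); the notes do not name one.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SR):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # DFT bin frequencies
    edges = mel_to_hz(np.linspace(0.0, float(hz_to_mel(sr / 2.0)), n_mels + 2))
    lower, centre, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lower) / (centre - lower)
    falling = (upper - freqs) / (upper - centre)
    # Very narrow low-frequency filters may contain no DFT bin and stay all-zero.
    return np.clip(np.minimum(rising, falling), 0.0, None)  # (n_mels, n_bins)


def frame_signal(x, n_fft=N_FFT, hop=HOP):
    """Slice a 1-D signal into overlapping frames of n_fft samples."""
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_spectrum(x):
    frames = frame_signal(x) * np.hanning(N_FFT)      # window choice is an assumption
    amplitudes = np.abs(np.fft.rfft(frames, axis=1))  # spectral amplitudes
    return amplitudes @ mel_filterbank().T            # (n_frames, N_MELS)


def fit_pca_whitening(X, n_components, eps=1e-8):
    """Return the mean and a projection that yields unit-variance components."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    scale = 1.0 / np.sqrt(eigvals[order] + eps)       # inverse square root
    return mean, eigvecs[:, order] * scale


def pmsc(X, mean, W):
    return (X - mean) @ W


rng = np.random.default_rng(0)
audio = rng.standard_normal(3 * SR)  # 3 s of noise standing in for a real clip
M = mel_spectrum(audio)
mean, W = fit_pca_whitening(M, n_components=20)
Z = pmsc(M, mean, W)
print(M.shape, Z.shape, np.var(Z, axis=0, ddof=1).round(3))
```

In practice the whitening transform would be fitted once on a random sub-sample of training frames, as the notes describe, and then applied unchanged to every clip.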
Model
PFC (Pooled Features Classifier)
- Pooling Operation: the model applies a given set of pooling functions (how many?) to the PMSC features and sends the pooled features to a classifier (an MLP with a hidden layer of 2000 units, sigmoid activation, L2 weight decay and cross-entropy cost).
- Classify: each pooling window is treated as a training example for the classifier, and the classifier's predictions are averaged over all the windows of a given clip to obtain the final classification (what is the rule?).
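The pooling and clip-level averaging described above can be sketched as follows. Several choices here are assumptions made only for illustration, since the notes leave them open: the set of pooling functions (mean, variance, max, min), the window length and step, the softmax output layer, and the argmax decision rule applied to the averaged predictions. The weights are random, so the output demonstrates shapes and data flow only.

```python
import numpy as np

# Hypothetical pooling functions; the notes do not say which or how many are used.
POOLING_FUNCTIONS = (np.mean, np.var, np.max, np.min)


def pool_windows(features, window, hop):
    """Apply every pooling function over each window of frames.

    features: (n_frames, n_dims) PMSC frames of one clip.
    Returns an array of shape (n_windows, n_dims * len(POOLING_FUNCTIONS)).
    """
    starts = range(0, features.shape[0] - window + 1, hop)
    return np.stack([
        np.concatenate([f(features[s:s + window], axis=0) for f in POOLING_FUNCTIONS])
        for s in starts
    ])


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mlp_forward(x, params):
    """Single sigmoid hidden layer, then a softmax over classes (assumed output)."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(x @ W1 + b1)
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cost(x, labels, params, weight_decay):
    """Mean cross-entropy plus an L2 penalty on both weight matrices."""
    W1, _, W2, _ = params
    probs = mlp_forward(x, params)
    nll = -np.log(probs[np.arange(len(labels)), labels]).mean()
    return nll + weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))


def classify_clip(features, params, window, hop):
    """Average the per-window predictions; argmax is an assumed decision rule."""
    window_probs = mlp_forward(pool_windows(features, window, hop), params)
    clip_probs = window_probs.mean(axis=0)
    return clip_probs, int(np.argmax(clip_probs))


rng = np.random.default_rng(0)
n_dims, n_hidden, n_classes = 20, 2000, 10  # 2000 hidden units as in the notes
n_in = n_dims * len(POOLING_FUNCTIONS)
params = (
    0.01 * rng.standard_normal((n_in, n_hidden)), np.zeros(n_hidden),
    0.01 * rng.standard_normal((n_hidden, n_classes)), np.zeros(n_classes),
)
clip = rng.standard_normal((1290, n_dims))  # stand-in for the frames of one 30 s clip
pooled = pool_windows(clip, window=64, hop=32)
window_labels = np.full(len(pooled), 3)     # every window inherits the clip label
print(cost(pooled, window_labels, params, weight_decay=1e-4))
clip_probs, predicted = classify_clip(clip, params, window=64, hop=32)
print(clip_probs.round(3), predicted)
```

Because every window inherits its clip's label during training, one clip contributes many training examples, and the averaging step turns the window-level outputs back into a single clip-level prediction.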
Tasks
- Classification (train/test task): the MLP outputs an affinity prediction for each class (each pooling window is treated as a training example).
- Tagging
  - Affinity: the affinity scores for a song are thus directly the output of the MLP.
  - Binary Classification: choose the threshold that optimizes the F1-score on the validation set.
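The threshold selection for binary tagging can be illustrated with a small exhaustive search. This is a sketch under assumptions: the notes do not say whether one threshold is chosen per tag or globally, nor how candidate thresholds are enumerated. Here a single tag is considered and every distinct validation score is tried as a threshold. The function names and the toy data are hypothetical.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 from boolean arrays of true labels and predicted labels."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def best_threshold(scores, y_true):
    """Return the threshold on `scores` that maximises F1, and that F1 value."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    candidates = np.unique(scores)  # each distinct score is a candidate threshold
    f1s = [f1_score(y_true, scores >= t) for t in candidates]
    best = int(np.argmax(f1s))
    return float(candidates[best]), float(f1s[best])


# Toy validation data for one tag: MLP affinity scores and ground-truth labels.
val_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
val_labels = [0, 0, 1, 1, 1, 0]
threshold, f1 = best_threshold(val_scores, val_labels)
print(threshold, round(f1, 3))  # 0.35 0.857
```

The chosen threshold is then applied unchanged to the test-set affinities, predicting a tag whenever its score is at or above the threshold.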
Tools
- Theano: Theano is a numerical computation library for Python. In Theano, computations are expressed using a NumPy-like syntax and compiled to run efficiently on either CPU or GPU architectures.
PH_Pooled Features Classification MIREX 2011 Submission
Time: 2024-10-25 04:38:28