PH_Pooled Features Classification MIREX 2011 Submission


  1. Principal Mel-Spectrum Components (Feature)

  2. Temporal Pooling Functions (Model)

  3. Single-hidden-layer neural network, i.e. a multi-layer perceptron (MLP)

Audio Preprocessing

Feature: PMSC (Principal Mel-Spectrum Components)

  1. Original Data:
     30 s, 22.05 kHz, mono, WAV

  2. Process

    1. DFT (spectral
      We compute DFTs over windows of 1024 samples of audio sampled at
      22.05 kHz (i.e. roughly 46 ms per window), with a frame step of
      512 samples.

    2. Mel-Compression
      Run the spectral amplitudes through a set of 256
      mel-scaled triangular filters to obtain a set of spectral energy

    3. Principal Component Analysis whitening (PCA whitening)
      We compute the principal components of a random sub-sample of the
      training set. To obtain features with unit variance, we multiply each
      component by the inverse square root of its eigenvalue. This is PCA
      whitening.
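The three preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration, not the submission's actual code: the Hann window, the HTK-style mel formula, the helper names and the `rel_tol` cut-off for near-zero eigenvalues are all assumptions that do not appear in the notes. Each principal component is scaled by 1/sqrt(eigenvalue), which is the scaling that gives unit variance.

```python
import numpy as np

SR = 22050    # sample rate (from the notes)
WIN = 1024    # DFT window length in samples (from the notes)
HOP = 512     # frame step in samples (from the notes)
N_MEL = 256   # number of mel filters (from the notes)


def frame_signal(x, win=WIN, hop=HOP):
    """Slice a 1-D signal into overlapping frames of `win` samples."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def mel_filterbank(n_filters=N_MEL, n_fft=WIN, sr=SR):
    """Triangular filters evenly spaced on the mel scale (HTK-style formula, an assumption)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_filters + 2 edges: each triangle spans three consecutive edges.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lower) / (center - lower)
    falling = (upper - freqs[None, :]) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def mel_spectrum(x, fb):
    """Steps 1 and 2: DFT amplitudes per frame, then mel compression."""
    frames = frame_signal(x) * np.hanning(WIN)        # Hann window: an assumption
    amplitudes = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, WIN // 2 + 1)
    return amplitudes @ fb.T                          # shape (n_frames, n_filters)


def fit_pca_whitening(X, rel_tol=1e-6):
    """Step 3: return the mean and a projection that whitens X.

    Components whose eigenvalue is below rel_tol * (largest eigenvalue) are
    dropped to avoid dividing by ~0 (rel_tol is a hypothetical parameter).
    """
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    keep = eigvals > rel_tol * eigvals.max()
    # Divide each kept eigenvector (column) by the square root of its eigenvalue.
    return mean, eigvecs[:, keep] / np.sqrt(eigvals[keep])
```

In use, the whitening would be fitted once on mel spectra from a random sub-sample of the training set, e.g. `mean, W = fit_pca_whitening(train_mel)`, and then applied to every clip as `(mel_spectrum(x, fb) - mean) @ W`.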


PFC (Pooled Features

  1. Pooling Operation
    The model applies a given set of pooling functions (how many?) to the
    PMSC features and sends the pooled features to a classifier (an MLP with
    a hidden layer of 2000 units, sigmoid activation, L2 weight decay and a
    cross-entropy cost).

  2. Classify
    Each pooling window is treated as one training example for the
    classifier, and the classifier's predictions are averaged over all the
    windows of a given clip to obtain the final classification (what is the
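A rough sketch of the pooling and classification steps, under stated assumptions: the notes do not say which pooling functions are used (nor how many), so mean, variance, max and min are picked here purely for illustration. The sigmoid output layer, the weight initialisation and the helper names are also assumptions; only the 2000 hidden units, sigmoid activation, L2 weight decay and cross-entropy cost come from the notes.

```python
import numpy as np

# Hypothetical set of pooling functions; the notes do not list them.
POOLING_FUNCTIONS = (np.mean, np.var, np.max, np.min)


def pool_windows(features, window, hop):
    """(n_frames, n_dims) -> (n_windows, n_dims * number of pooling functions)."""
    starts = range(0, features.shape[0] - window + 1, hop)
    return np.array([
        np.concatenate([f(features[s:s + window], axis=0) for f in POOLING_FUNCTIONS])
        for s in starts
    ])


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_params(n_in, n_out, n_hidden=2000, seed=0):
    """Small random weights; the initialisation scheme is an assumption."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))
    W2 = rng.normal(0.0, 0.01, (n_hidden, n_out))
    return W1, np.zeros(n_hidden), W2, np.zeros(n_out)


def mlp_forward(X, params):
    """One sigmoid hidden layer; sigmoid outputs are assumed (one score per class)."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)


def cost(X, Y, params, weight_decay):
    """Cross-entropy averaged over windows, plus an L2 penalty on the weights."""
    W1, _, W2, _ = params
    P = mlp_forward(X, params)
    cross_entropy = -np.mean(np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P), axis=1))
    return cross_entropy + weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))


def predict_clip(features, params, window, hop):
    """Score every pooling window, then average the scores over the clip."""
    pooled = pool_windows(features, window, hop)
    return mlp_forward(pooled, params).mean(axis=0)
```

During training each row of `pool_windows(...)` would be one example carrying its clip's label; at test time `predict_clip` returns one averaged score per class for the whole clip.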


  1. Classification (train/test task)
    The MLP outputs an affinity prediction
    for each class (pooling functions treat each pooling window as a training

  2. Tagging

    1. Affinity
      The affinity scores for a song are
      thus directly the output of the MLP.

    2. Binary Classification
      Choose the threshold that optimizes the
      F1-score on the validation set.
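The threshold selection for binary tagging can be illustrated with the sketch below. It assumes a simple exhaustive search over the affinity values observed on the validation set. The notes do not say whether one threshold is shared by all tags or chosen per tag, so the function handles a single array of scores and can be called once per tag if needed. The function names are hypothetical.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for boolean arrays; 0.0 if undefined."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def best_threshold(affinities, labels, candidates=None):
    """Return the candidate threshold with the highest F1 on validation data.

    A tag is predicted as present when its affinity is >= the threshold.
    By default every distinct affinity value is tried as a candidate.
    """
    affinities = np.asarray(affinities, dtype=float)
    labels = np.asarray(labels).astype(bool)
    if candidates is None:
        candidates = np.unique(affinities)
    scores = [f1_score(labels, affinities >= t) for t in candidates]
    return candidates[int(np.argmax(scores))]
```

For example, with validation affinities `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, the search returns 0.35: that threshold recovers both positive examples at the cost of one false positive, for an F1 of 0.8.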


  1. Theano: Theano is a numerical computation library for Python. In
    Theano, computations are expressed using a NumPy-like syntax and
    compiled to run efficiently on either CPU or GPU architectures.



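MIREX is the most authoritative international evaluation campaign for audio retrieval, yet a Baidu search turns up no introduction to it at all, only a few results about the rankings achieved by the likes of Sogou and Tencent. By comparison, TRECVID gets far more attention... Since my graduate research is mainly in the audio field, I need a rough overview of the whole area, and MIREX seems a good place to start, so I am taking this opportunity to share what I find. MIREX stands for Music Information Retrieval Evaluation eXchange, i.e. music information retrieval evaluation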

