PH_Pooled Features Classification MIREX 2011 Submission

Abstract

  1. Principal Mel-Spectrum Components (Feature)

  2. Temporal Pooling Functions (Model)

  3. Single-hidden-layer neural network, i.e. a multi-layer perceptron
    (Classifier)

Audio Preprocessing

Feature: PMSC (Principal Mel-Spectrum Components)

  1. Original Data:
     30 s, 22.05 kHz, mono, WAV

  2. Process Steps:

    1. DFT (spectral domain)
      We compute DFTs over windows of 1024 samples on audio at 22.05 kHz
      (i.e. roughly 46 ms) with a frame step of 512 samples.

    2. Mel-Compression
      We run the spectral amplitudes through a set of 256 mel-scaled
      triangular filters to obtain a set of spectral energy bands.

    3. Principal Component Analysis whitening (PCA whitening)
      We compute the principal components of a random sub-sample of the
      training set. To obtain features with unit variance, we multiply each
      component by the inverse square root of its eigenvalue. This is PCA
      whitening.
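The three steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the submission's code: the Hann window, the HTK-style mel formula, the small `eps` regulariser and all function names are choices made for this sketch and are not given in the notes.

```python
import numpy as np

def frame_signal(x, win=1024, hop=512):
    """Slice a 1-D signal into overlapping frames of `win` samples."""
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def magnitude_spectrum(frames):
    """DFT amplitude of each frame (the Hann window is an assumption)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1))

def mel_filterbank(n_filters=256, n_fft=1024, sr=22050):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    lower, centre, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs[None, :] - lower) / (centre - lower)
    falling = (upper - freqs[None, :]) / (upper - centre)
    return np.maximum(0.0, np.minimum(rising, falling))

def mel_spectrum(x, fbank, win=1024, hop=512):
    """Steps 1 and 2: DFT amplitudes projected onto the mel filters."""
    return magnitude_spectrum(frame_signal(x, win, hop)) @ fbank.T

def fit_pca_whitening(X, eps=1e-8):
    """Step 3: return (mean, W) so that (X - mean) @ W has unit variance."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal direction by 1 / sqrt(eigenvalue).
    W = eigvecs / np.sqrt(np.maximum(eigvals, 0.0) + eps)
    return mean, W
```

In use, `fit_pca_whitening` would be fitted once on a random sub-sample of training frames and the resulting `mean` and `W` applied to every clip. With 256 filters and only 513 DFT bins, some of the lowest filters can be narrower than one bin and come out empty, which is one reason the sketch clips eigenvalues and adds `eps`.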

Model

PFC (Pooled Features Classifier)

  1. Pooling Operation
    The model applies a given set of pooling functions (how many?) to the
    PMSC features and sends the pooled features to a classifier (an MLP with
    a hidden layer of 2000 units, sigmoid activation, L2 weight decay and
    cross-entropy cost).

  2. Classify
    Each pooling window is treated as a training example for the classifier.
    The classifier's predictions are averaged over all the windows of a given
    clip to obtain the final classification (what is the rule?).
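A minimal sketch of the pooling and clip-level averaging described above. Several details are assumptions, since the notes leave them open: the pooling set (mean, variance, max, min), the softmax output layer, the weight-decay coefficient, and the use of argmax over the averaged predictions as the final rule. The hidden-layer size is taken from the shape of `W1` (2000 units in the submission).

```python
import numpy as np

# Assumed pooling set; the notes do not say which functions are used.
POOLING_FUNCTIONS = (np.mean, np.var, np.max, np.min)

def pool_windows(features, window, hop):
    """Summarise each window of frames with every pooling function."""
    pooled = []
    for start in range(0, len(features) - window + 1, hop):
        chunk = features[start:start + window]
        pooled.append(np.concatenate([f(chunk, axis=0) for f in POOLING_FUNCTIONS]))
    return np.array(pooled)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, params):
    """One sigmoid hidden layer followed by a softmax output (assumed)."""
    W1, b1, W2, b2 = params
    hidden = sigmoid(X @ W1 + b1)
    logits = hidden @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def cost(probs, labels, params, weight_decay=1e-4):
    """Cross-entropy plus L2 weight decay on both weight matrices."""
    W1, _, W2, _ = params
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return nll + weight_decay * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

def classify_clip(features, params, window, hop):
    """Average the per-window predictions; argmax is an assumed final rule."""
    probs = mlp_forward(pool_windows(features, window, hop), params)
    clip_probs = probs.mean(axis=0)
    return int(np.argmax(clip_probs)), clip_probs
```

During training every row returned by `pool_windows` would be one example carrying the label of its clip; only at prediction time are the rows of a clip averaged.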

Tasks

  1. Classification (train/test task)
    The MLP outputs an affinity prediction for each class (each pooling
    window is treated as a training example).

  2. Tagging

    1. Affinity
      The affinity scores for a song are directly the output of the MLP.

    2. Binary Classification
      Choose the threshold that optimizes the F1-score on the validation
      set.
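The threshold selection for binary tagging can be sketched as below. The notes do not say whether one threshold is chosen per tag or globally, nor which candidate values are searched; this sketch assumes a single tag and tries every distinct validation affinity as a cut-off. The function names are hypothetical.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for boolean arrays."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def best_threshold(affinity, y_true):
    """Pick the cut-off on validation affinities that maximises F1."""
    affinity = np.asarray(affinity, dtype=float)
    y_true = np.asarray(y_true, dtype=bool)
    candidates = np.unique(affinity)
    scores = [f1_score(y_true, affinity >= t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])
```

A tag is then assigned to a test song whenever its affinity is at or above the threshold found on the validation set.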

Tools

  1. Theano: Theano is a numerical computation library for Python. In
    Theano, computations are expressed using a NumPy-like syntax and compiled
    to run efficiently on either CPU or GPU architectures.

    Source: <http://en.wikipedia.org/wiki/Theano_(software)>


Time: 2024-08-09 22:10:24
