[Repost] Sharing a Review: Modeling Temporal Dependencies in High-Dimensional Sequences

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to
Polyphonic Music Generation and Transcription

by Nicolas Boulanger-Lewandowski, Yoshua Bengio and Pascal Vincent at ICML 2012

We investigate the problem of modeling symbolic sequences
of polyphonic music in a completely general piano-roll representation. We
introduce a probabilistic model based on distribution estimators conditioned on
a recurrent neural network that is able to discover temporal dependencies in
high-dimensional sequences. Our approach outperforms many traditional models of
polyphonic music on a variety of realistic datasets. We show how our musical
language model can serve as a symbolic prior to improve the accuracy of
polyphonic transcription.
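
For readers unfamiliar with the representation: a piano-roll encodes a piece as
a binary matrix with one row per timestep and one column per pitch, a 1 marking
an active note. A minimal sketch in Python/NumPy (the 88-key range MIDI 21-108
matches the piano range used in the paper's datasets; the helper below is
illustrative, not the authors' code):

    import numpy as np

    def to_piano_roll(notes, n_steps, low=21, high=108):
        """Binary piano-roll: one row per timestep, one column per pitch.

        `notes` is a list of (onset, offset, midi_pitch) tuples with times
        already quantized to integer timesteps.
        """
        roll = np.zeros((n_steps, high - low + 1), dtype=np.int8)
        for onset, offset, pitch in notes:
            if low <= pitch <= high:
                roll[onset:offset, pitch - low] = 1
        return roll

    # A C-major triad held for 4 steps, then a single G for 4 steps.
    roll = to_piano_roll([(0, 4, 60), (0, 4, 64), (0, 4, 67), (4, 8, 67)], 8)
    print(roll.sum(axis=1))  # -> [3 3 3 3 1 1 1 1] active notes per step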

Review Comments

Posted on behalf of an anonymous ICML reviewer.

Summary:
The paper's main strengths are the generality of the proposed
methods, their clear improvement of the state of the art, the clarity of the
presentation, and the thoroughness of the experimental results.

Perhaps the greatest weakness is the degree of novelty: the only new model is
the RNN-RBM, which is an incremental change from the RTRBM. However, some of
the model combinations (such as RNN-NADE) appear to be novel, as is the
application to music.
--------------------------------------------------------
Detailed Comments:
The paper addresses the problem of modelling polyphonic musical
notes. More specifically it aims to learn a predictive distribution over musical
notes at the next timestep, given those at previous timesteps. As well as being
interesting in its own right, the task is compelling because it combines
(relatively) high dimensional, unconstrained density modelling with long-range
sequence modelling. Unlike most existing approaches (which use domain-specific
representations to reduce the output space) the proposed system allows any note
to be emitted at any time, which suggests that it will transfer well to other
kinds of high-dimensional sequence.
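
To make the prediction task concrete, here is a minimal sketch (not the
authors' model) of next-step prediction over piano-roll frames: a plain RNN
reads the frame at time t and emits an independent Bernoulli probability for
each note at the next timestep. All parameter shapes are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n_notes, n_hidden = 88, 100

    Wxh = rng.normal(0, 0.01, (n_hidden, n_notes))   # input  -> hidden
    Whh = rng.normal(0, 0.01, (n_hidden, n_hidden))  # hidden -> hidden
    Why = rng.normal(0, 0.01, (n_notes, n_hidden))   # hidden -> output
    bh, by = np.zeros(n_hidden), np.zeros(n_notes)

    def predict_next(frames):
        """Return p(v_t = 1 | v_1..v_{t-1}) for every timestep t."""
        h, probs = np.zeros(n_hidden), []
        for v in frames:
            probs.append(1.0 / (1.0 + np.exp(-(Why @ h + by))))  # before seeing v
            h = np.tanh(Wxh @ v + Whh @ h + bh)
        return np.stack(probs)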

The paper also demonstrates that the predictive model can be used to improve
the performance of polyphonic transcription from audio data - the musical
equivalent of combining a language model with an audio model in speech
recognition. The combination is performed in a somewhat ad hoc manner and
requires the audio data to be preprocessed. However, it clearly outperforms the
baseline HMM smoothing model.
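
One simple way to realize such a combination (a sketch under assumptions; the
paper's exact search procedure may differ) is to rescore candidate note
configurations frame by frame, adding the acoustic log-likelihood to the
symbolic prior's log-probability with a mixing weight:

    import numpy as np

    def rescore(candidates, acoustic_loglik, prior_loglik, alpha=0.5):
        """Pick the note configuration maximizing a weighted score.

        `candidates`: binary note vectors proposed for the current frame;
        `acoustic_loglik(v)`, `prior_loglik(v)`: log-probabilities from the
        transcription front-end and the symbolic model. Both the weight
        `alpha` and the greedy frame-by-frame search are assumptions.
        """
        scores = [acoustic_loglik(v) + alpha * prior_loglik(v) for v in candidates]
        return candidates[int(np.argmax(scores))]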

The system is based on various flavours of recurrent neural network. This is
attractive for several reasons: RNNs place no restrictions on the amount of
previous context used; they can be easily adapted to different kinds of
sequential data; and they allow the density modelling and the sequence modelling
to be jointly optimised. As well as evaluating several RNN variants found in the
literature, the paper introduces a new RNN: the RNN-RBM. Although this is only
a slight modification of an existing architecture (the RTRBM), the change is
clearly motivated and its advantages are demonstrated by the experiments.
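
For context, the RNN-RBM idea can be sketched as follows: a deterministic RNN
summarizes the past, and its state sets the time-varying biases of an RBM that
models the distribution of the current frame. The sketch below samples from
such a model; parameter names and sizes are illustrative, and training
(contrastive divergence plus backpropagation through time) is omitted:

    import numpy as np

    rng = np.random.default_rng(0)
    nv, nh, nu = 88, 150, 100  # visible notes, RBM hiddens, RNN units
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))

    W   = rng.normal(0, 0.01, (nv, nh))  # RBM weights, shared across time
    Wuv = rng.normal(0, 0.01, (nv, nu))  # RNN state -> visible biases
    Wuh = rng.normal(0, 0.01, (nh, nu))  # RNN state -> hidden biases
    Wvu = rng.normal(0, 0.01, (nu, nv))
    Wuu = rng.normal(0, 0.01, (nu, nu))
    bv, bh, bu = np.zeros(nv), np.zeros(nh), np.zeros(nu)

    def generate(n_steps, gibbs_steps=25):
        u, frames = np.zeros(nu), []
        for _ in range(n_steps):
            bv_t, bh_t = bv + Wuv @ u, bh + Wuh @ u   # conditional RBM biases
            v = (rng.random(nv) < 0.5).astype(float)  # random initialization
            for _ in range(gibbs_steps):              # block Gibbs sampling
                h = (rng.random(nh) < sig(W.T @ v + bh_t)).astype(float)
                v = (rng.random(nv) < sig(W @ h + bv_t)).astype(float)
            frames.append(v)
            u = np.tanh(Wvu @ v + Wuu @ u + bu)       # deterministic RNN update
        return np.stack(frames)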

The experimental results in Section 6 are exceptionally thorough, comparing
the six proposed variants of RNN models with seven common baselines across four
different datasets. This makes it possible to tease out the relative pros and
cons of the different architectures. I also liked the comparison with the
non-temporal frame-level models, which clearly shows how much benefit the
sequential models bring. The best proposed systems outperform the baselines on
all the datasets, often by a large margin.

The paper is very clearly written, well organized, and commendably concise,
given the large amount of material covered.

Suggestions:

Abstract:
'in a completely general representation' doesn't say much.
Either remove this or reword it to be more specific.
', that is appropriate to discover temporal dependencies in such
high-dimensional sequences with the help of Hessian-free optimization and
pretraining techniques.' -> 'that is able to discover temporal dependencies
in high-dimensional sequences.'

Introduction:
'designed for the multiple classification task': does this
mean 1-of-K classification? Please clarify.

Section 2:
Line 217: If possible, put the exact gradient into the paper to
keep it self-contained.
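
(For reference, the exact gradient in question is presumably the standard RBM
log-likelihood gradient; for any parameter \Theta of the energy E(v, h) it has
the form

    \frac{\partial \log p(v)}{\partial \Theta}
      = \mathbb{E}_{h \mid v}\!\left[-\frac{\partial E(v,h)}{\partial \Theta}\right]
      - \mathbb{E}_{v,h}\!\left[-\frac{\partial E(v,h)}{\partial \Theta}\right],

where the second expectation is over the model distribution, intractable in
general and usually approximated with contrastive divergence.)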

Section 4.1:
Is the cross-entropy cost in equation (12) the objective
function used for the RNN and RNN (HF) models in Section 6? Please clarify.
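
(For reference, the cost being referred to is, in its standard framewise form,
reproduced here as an assumption rather than a quote from the paper:

    L = -\sum_{t}\sum_{i} \left[ v_i^{(t)} \log \hat{v}_i^{(t)}
        + (1 - v_i^{(t)}) \log (1 - \hat{v}_i^{(t)}) \right],

where v^{(t)} is the binary piano-roll frame at time t and \hat{v}^{(t)} the
model's predicted note probabilities.)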

Section 5:
What does Figure 3 add besides an obligatory receptive field
picture? I think the space could be better used.

Section 6:
Line 560: What does 'transposed in a common tonality' mean?
Lines 592-598: This sentence isn't clear to me. Which datasets are
complex and which are simple, and why?
Line 629: There's an additional n in 'additionnal'.
Line 632: Why do you say RNN-NADE is more robust? RNN-RBM has
the best log-loss on 2/4 datasets and the best accuracy on 3.
Table 1:
Put the best score in each column in bold. Maybe also reorder the table
according to some overall performance measure... average log-loss?
How big was N for the note N-gram and N-gram models, and how many different
N-grams were needed? I would like to see this recorded for each dataset.
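
(To make the request above concrete: the table size can be reported by counting
distinct N-grams directly on the training sequences. A sketch, treating each
frame, i.e. the set of simultaneously sounding notes, as one symbol; the helper
is hypothetical:)

    from collections import Counter

    def count_ngrams(sequences, n):
        """Count distinct N-grams over sequences of note sets."""
        counts = Counter()
        for seq in sequences:
            symbols = [tuple(sorted(frame)) for frame in seq]
            for i in range(len(symbols) - n + 1):
                counts[tuple(symbols[i:i + n])] += 1
        return counts

    # len(count_ngrams(train_sequences, 3)) -> number of distinct trigrams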

Conclusions:
Make sure you take out the bold text for the final
version.

Original link

http://icml.cc/discuss/2012/590.html
