[Repost] Sharing a Review: Modeling Temporal Dependencies in High-Dimensional Sequences

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to
Polyphonic Music Generation and Transcription

by Nicolas Boulanger-Lewandowski, Yoshua Bengio and Pascal Vincent at ICML 2012

We investigate the problem of modeling symbolic sequences
of polyphonic music in a completely general piano-roll representation. We
introduce a probabilistic model based on distribution estimators conditioned on
a recurrent neural network that is able to discover temporal dependencies in
high-dimensional sequences. Our approach outperforms many traditional models of
polyphonic music on a variety of realistic datasets. We show how our musical
language model can serve as a symbolic prior to improve the accuracy of
polyphonic transcription.
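
For readers unfamiliar with the representation: a piano-roll encodes a piece as
a binary matrix with one row per timestep and one column per pitch, a 1 marking
an active note. A minimal sketch in Python/NumPy (the 88-key range MIDI 21-108
matches the piano range used in the paper's datasets; the helper below is
illustrative, not the authors' code):

    import numpy as np

    def to_piano_roll(notes, n_steps, low=21, high=108):
        """Binary piano-roll: one row per timestep, one column per pitch.

        `notes` is a list of (onset, offset, midi_pitch) tuples with times
        already quantized to integer timesteps.
        """
        roll = np.zeros((n_steps, high - low + 1), dtype=np.int8)
        for onset, offset, pitch in notes:
            if low <= pitch <= high:
                roll[onset:offset, pitch - low] = 1
        return roll

    # A C-major triad held for 4 steps, then a single G for 4 steps.
    roll = to_piano_roll([(0, 4, 60), (0, 4, 64), (0, 4, 67), (4, 8, 67)], 8)
    print(roll.sum(axis=1))  # -> [3 3 3 3 1 1 1 1] active notes per step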

Review Comments

Posted on behalf of an anonymous ICML reviewer.

Summary:
The paper's main strengths are the generality of the proposed
methods, their clear improvement of the state of the art, the clarity of the
presentation, and the thoroughness of the experimental results.

Perhaps the greatest weakness is the degree of novelty: the only new model is
the RNN-RBM, which is an incremental change from the RTRBM. However, some of
the model combinations (such as RNN-NADE) appear to be novel, as is the
application to music.
--------------------------------------------------------
Detailed Comments:
The paper addresses the problem of modelling polyphonic musical
notes. More specifically it aims to learn a predictive distribution over musical
notes at the next timestep, given those at previous timesteps. As well as being
interesting in its own right, the task is compelling because it combines
(relatively) high dimensional, unconstrained density modelling with long-range
sequence modelling. Unlike most existing approaches (which use domain-specific
representations to reduce the output space) the proposed system allows any note
to be emitted at any time, which suggests that it will transfer well to other
kinds of high-dimensional sequence.
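
To make the prediction task concrete, here is a minimal sketch (not the
authors' model) of next-step prediction over piano-roll frames: a plain RNN
reads the frame at time t and emits an independent Bernoulli probability for
each note at the next timestep. All parameter shapes are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    n_notes, n_hidden = 88, 100

    Wxh = rng.normal(0, 0.01, (n_hidden, n_notes))   # input  -> hidden
    Whh = rng.normal(0, 0.01, (n_hidden, n_hidden))  # hidden -> hidden
    Why = rng.normal(0, 0.01, (n_notes, n_hidden))   # hidden -> output
    bh, by = np.zeros(n_hidden), np.zeros(n_notes)

    def predict_next(frames):
        """Return p(v_t = 1 | v_1..v_{t-1}) for every timestep t."""
        h, probs = np.zeros(n_hidden), []
        for v in frames:
            probs.append(1.0 / (1.0 + np.exp(-(Why @ h + by))))  # before seeing v
            h = np.tanh(Wxh @ v + Whh @ h + bh)
        return np.stack(probs)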

The paper also demonstrates that the predictive model can be used to improve
the performance of polyphonic transcription from audio data - the musical
equivalent of combining a language model with an audio model in speech
recognition. The combination is performed in a somewhat ad hoc manner and
requires the audio data to be preprocessed. However, it clearly outperforms the
baseline HMM smoothing model.
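
One simple way to realize such a combination (a sketch under assumptions; the
paper's exact search procedure may differ) is to rescore candidate note
configurations frame by frame, adding the acoustic log-likelihood to the
symbolic prior's log-probability with a mixing weight:

    import numpy as np

    def rescore(candidates, acoustic_loglik, prior_loglik, alpha=0.5):
        """Pick the note configuration maximizing a weighted score.

        `candidates`: binary note vectors proposed for the current frame;
        `acoustic_loglik(v)`, `prior_loglik(v)`: log-probabilities from the
        transcription front-end and the symbolic model. Both the weight
        `alpha` and the greedy frame-by-frame search are assumptions.
        """
        scores = [acoustic_loglik(v) + alpha * prior_loglik(v) for v in candidates]
        return candidates[int(np.argmax(scores))]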

The system is based on various flavours of recurrent neural network. This is
attractive for several reasons: RNNs place no restrictions on the amount of
previous context used; they can be easily adapted to different kinds of
sequential data; and they allow the density modelling and the sequence modelling
to be jointly optimised. As well as evaluating several RNN variants found in the
literature, the paper introduces a new RNN: the RNN-RBM. Although this is only
a slight modification of an existing architecture (the RTRBM), the change is
clearly motivated and its advantages are demonstrated by the experiments.
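
For context, the RNN-RBM idea can be sketched as follows: a deterministic RNN
summarizes the past, and its state sets the time-varying biases of an RBM that
models the distribution of the current frame. The sketch below samples from
such a model; parameter names and sizes are illustrative, and training
(contrastive divergence plus backpropagation through time) is omitted:

    import numpy as np

    rng = np.random.default_rng(0)
    nv, nh, nu = 88, 150, 100  # visible notes, RBM hiddens, RNN units
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))

    W   = rng.normal(0, 0.01, (nv, nh))  # RBM weights, shared across time
    Wuv = rng.normal(0, 0.01, (nv, nu))  # RNN state -> visible biases
    Wuh = rng.normal(0, 0.01, (nh, nu))  # RNN state -> hidden biases
    Wvu = rng.normal(0, 0.01, (nu, nv))
    Wuu = rng.normal(0, 0.01, (nu, nu))
    bv, bh, bu = np.zeros(nv), np.zeros(nh), np.zeros(nu)

    def generate(n_steps, gibbs_steps=25):
        u, frames = np.zeros(nu), []
        for _ in range(n_steps):
            bv_t, bh_t = bv + Wuv @ u, bh + Wuh @ u   # conditional RBM biases
            v = (rng.random(nv) < 0.5).astype(float)  # random initialization
            for _ in range(gibbs_steps):              # block Gibbs sampling
                h = (rng.random(nh) < sig(W.T @ v + bh_t)).astype(float)
                v = (rng.random(nv) < sig(W @ h + bv_t)).astype(float)
            frames.append(v)
            u = np.tanh(Wvu @ v + Wuu @ u + bu)       # deterministic RNN update
        return np.stack(frames)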

The experimental results in Section 6 are exceptionally thorough, comparing
the six proposed variants of RNN models with seven common baselines across four
different datasets. This makes it possible to tease out the relative pros and
cons of the different architectures. I also liked the comparison with the
non-temporal frame-level models, which clearly shows how much benefit the
sequential models bring. The best proposed systems outperform the baselines on
all the datasets, often by a large margin.

The paper is very clearly written, well organized, and commendably concise,
given the large amount of material covered.

Suggestions:

Abstract:
'in a completely general representation' doesn't say much.
Either remove this or reword it to be more specific.
', that is appropriate to discover temporal dependencies in such
high-dimensional sequences with the help of Hessian-free optimization and
pretraining techniques.' -> 'that is able to discover temporal dependencies
in high-dimensional sequences.'

Introduction:
'designed for the multiple classification task': does this
mean 1-of-K classification? Please clarify.

Section 2:
Line 217: If possible, put the exact gradient into the paper to
keep it self-contained.
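
(For reference, the exact gradient in question is presumably the standard RBM
log-likelihood gradient; for any parameter \Theta of the energy E(v, h) it has
the form

    \frac{\partial \log p(v)}{\partial \Theta}
      = \mathbb{E}_{h \mid v}\!\left[-\frac{\partial E(v,h)}{\partial \Theta}\right]
      - \mathbb{E}_{v,h}\!\left[-\frac{\partial E(v,h)}{\partial \Theta}\right],

where the second expectation is over the model distribution, intractable in
general and usually approximated with contrastive divergence.)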

Section 4.1:
Is the cross-entropy cost in equation (12) the objective
function used for the RNN and RNN (HF) models in Section 6? Please clarify.
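
(For reference, the cost being referred to is, in its standard framewise form,
reproduced here as an assumption rather than a quote from the paper:

    L = -\sum_{t}\sum_{i} \left[ v_i^{(t)} \log \hat{v}_i^{(t)}
        + (1 - v_i^{(t)}) \log (1 - \hat{v}_i^{(t)}) \right],

where v^{(t)} is the binary piano-roll frame at time t and \hat{v}^{(t)} the
model's predicted note probabilities.)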

Section 5:
What does Figure 3 add besides an obligatory receptive field
picture? I think the space could be better used.

Section 6:
Line 560: What does 'transposed in a common tonality' mean?
Lines 592-598: This sentence isn't clear to me. Which datasets are
complex and which are simple, and why?
Line 629: There's an additional n in 'additionnal'.
Line 632: Why do you say RNN-NADE is more robust? RNN-RBM has
the best log-loss on 2/4 datasets and the best accuracy on 3.
Table 1:
Put the best score in each column in bold. Maybe also reorder the table
according to some overall performance measure... average log-loss?
How big was N for the note N-gram and N-gram models, and how many different
N-grams were needed? I would like to see this recorded for each dataset.
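
(To make the request above concrete: the table size can be reported by counting
distinct N-grams directly on the training sequences. A sketch, treating each
frame, i.e. the set of simultaneously sounding notes, as one symbol; the helper
is hypothetical:)

    from collections import Counter

    def count_ngrams(sequences, n):
        """Count distinct N-grams over sequences of note sets."""
        counts = Counter()
        for seq in sequences:
            symbols = [tuple(sorted(frame)) for frame in seq]
            for i in range(len(symbols) - n + 1):
                counts[tuple(symbols[i:i + n])] += 1
        return counts

    # len(count_ngrams(train_sequences, 3)) -> number of distinct trigrams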

Conclusions:
Make sure you take out the bold text for the final
version.

Original link

http://icml.cc/discuss/2012/590.html
