[1] The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to varying degrees, the different explanatory factors of variation behind the data.
[2] For that reason, much of the actual effort in deploying machine learning algorithms goes into the design of preprocessing pipelines and data transformations that result in a representation of the data that can support effective machine learning.
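To make [2] concrete, here is a minimal sketch of such a preprocessing pipeline using scikit-learn; the library, the estimator choices, and the toy data are my own illustration, not from the quoted paper:

```python
# Illustrative only: chaining a data transformation with a learner, so the
# model sees a better-conditioned representation of the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),    # transformation: zero mean, unit variance
    ("clf", LogisticRegression()),  # the actual learner
])
pipe.fit(X, y)
print(pipe.score(X, y))
```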
[3] However, the one-hot representation of a word suffers from data sparsity: for words that are rare in the labeled training data, the corresponding model parameters will be poorly estimated. Moreover, at test time, the model cannot handle words that do not appear in the labeled training data. These limitations of one-hot word representations have prompted researchers to investigate unsupervised methods for inducing word representations over large unlabeled corpora.
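A minimal sketch of the sparsity/OOV problem described in [3]; the vocabulary and sentences are made-up assumptions for illustration:

```python
# One-hot word vectors are indexed by the training vocabulary, so any word
# unseen in training has no index at all: the model cannot represent it.
import numpy as np

train_words = ["the", "cat", "sat", "on", "the", "mat"]
vocab = {w: i for i, w in enumerate(sorted(set(train_words)))}

def one_hot(word):
    """Map a word to a |V|-dimensional indicator vector."""
    if word not in vocab:
        raise KeyError(f"OOV word: {word!r}")
    vec = np.zeros(len(vocab))
    vec[vocab[word]] = 1.0
    return vec

print(one_hot("cat"))        # works: "cat" was seen during training
try:
    print(one_hot("dog"))    # "dog" never appeared in the training data
except KeyError as e:
    print("cannot represent:", e)
```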
[4] With the increase in available data, parallel machine learning has become an increasingly pressing problem.
[5] Given that the bandwidth of storage and network per computer has not been able to keep up with the growth in data, it has become ever more pressing to design data analysis algorithms that can perform most steps in a distributed fashion, without tight constraints on communication.
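One communication-light scheme in the spirit of [5] is one-shot parameter averaging: each worker runs SGD on its own data shard with no communication during training, and the local models are averaged in a single round at the end. The sketch below is illustrative only; the data, learning rate, and shard sizes are assumptions, and it is not presented as the method of any specific paper quoted here:

```python
# A minimal sketch of one-shot parameter averaging for linear regression:
# each worker fits its own shard with local SGD; one averaging step follows.
import numpy as np

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)  # hypothetical ground-truth weights

def make_shard(n=1000):
    """Generate one worker's private data shard (synthetic)."""
    X = rng.normal(size=(n, 5))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

def local_sgd(X, y, epochs=5, lr=0.01):
    """Plain SGD on the squared loss, entirely local to one worker."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Four workers train independently; communication is a single averaging round.
local_models = [local_sgd(*make_shard()) for _ in range(4)]
w_avg = np.mean(local_models, axis=0)
print("parameter error:", np.linalg.norm(w_avg - w_true))
```

The design point is that the only communication cost is one vector per worker at the end, which is why schemes of this shape tolerate high latency between machines.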
[6] Three recent papers attempted to break this parallelization barrier, each of them with mixed success.
[7] Unfortunately, these algorithms are not applicable to a MapReduce setting, since the latter is fraught with considerable latency and bandwidth constraints between computers.