Wide & Deep Learning Model

Generalized linear models with nonlinear feature transformations are widely used for large-scale regression and classification problems with sparse inputs. Memorization of feature interactions through a wide set of cross-product feature transformations is effective and interpretable, while generalization requires more feature engineering effort.

With less feature engineering, deep neural networks can generalize better to unseen feature combinations through low-dimensional dense embeddings learned for the sparse features. However, deep neural networks with embeddings can over-generalize and recommend less relevant items when the user-item interactions are sparse and high-rank.

Wide & Deep learning jointly trains wide linear models and deep neural networks, combining the benefits of memorization and generalization for recommender systems.

The Wide Component

The wide component is a generalized linear model of the form $y = w^T x + b$, as illustrated in Figure 1 (left). Here $y$ is the prediction, $x = [x_1, x_2, ..., x_d]$ is a vector of $d$ features, $w = [w_1, w_2, ..., w_d]$ are the model parameters, and $b$ is the bias. The feature set includes raw input features and transformed features (e.g., cross-product feature transformations).
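
As a concrete illustration, a binary cross-product feature transformation fires only when all of its constituent features are active, e.g., AND(gender=female, language=en). Below is a minimal sketch in Python; the feature names and the `cross_product` helper are illustrative, not from the paper:

```python
# Minimal sketch of a binary cross-product feature transformation:
# the transformed feature fires (1.0) only when every constituent
# binary feature is active, e.g. AND(gender=female, language=en).

def cross_product(features, keys):
    """Return 1.0 if all binary features named in `keys` are active."""
    return 1.0 if all(features.get(k, 0) == 1 for k in keys) else 0.0

x = {"gender=female": 1, "language=en": 1, "language=fr": 0}
print(cross_product(x, ["gender=female", "language=en"]))  # 1.0
print(cross_product(x, ["gender=female", "language=fr"]))  # 0.0
```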

The Deep Component

The deep component is a feed-forward neural network, as shown in Figure 1 (right). For categorical features, the original inputs are feature strings (e.g., “language=en”). Each of these sparse, high-dimensional categorical features is first converted into a low-dimensional and dense real-valued vector, often referred to as an embedding vector. The dimensionality of the embeddings is usually on the order of O(10) to O(100). The embedding vectors are initialized randomly and then the values are trained to minimize the final loss function during model training. These low-dimensional dense embedding vectors are then fed into the hidden layers of a neural network in the forward pass. Specifically, each hidden layer performs the following computation:

$a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})$

where $l$ is the layer number and $f$ is the activation function, often a rectified linear unit (ReLU). $a^{(l)}$, $b^{(l)}$, and $W^{(l)}$ are the activations, bias, and model weights at the $l$-th layer.
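
A NumPy sketch of this forward pass (the vocabulary size, embedding dimension, layer widths, and initialization scheme are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 1000 sparse categorical values, each mapped to a
# dense embedding of dimension 32 (on the order of O(10) to O(100)).
vocab_size, embed_dim = 1000, 32
embeddings = rng.normal(scale=0.01, size=(vocab_size, embed_dim))

# Hidden layers computing a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})
# with ReLU activations; the layer widths are arbitrary choices here.
layer_dims = [embed_dim, 64, 32]
weights = [rng.normal(scale=0.1, size=(layer_dims[i + 1], layer_dims[i]))
           for i in range(len(layer_dims) - 1)]
biases = [np.zeros(d) for d in layer_dims[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def forward(feature_id):
    a = embeddings[feature_id]      # a^{(0)}: embedding lookup
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)         # one hidden-layer computation
    return a                        # final activations a^{(l_f)}

print(forward(42).shape)  # (32,)
```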

Joint Training of Wide & Deep Model

The wide component and deep component are combined using a weighted sum of their output log odds as the prediction, which is then fed to one common logistic loss function for joint training. Note that there is a distinction between joint training and ensembling. In an ensemble, individual models are trained separately without knowing each other, and their predictions are combined only at inference time, not at training time. In contrast, joint training optimizes all parameters simultaneously by taking both the wide and deep part, as well as the weights of their sum, into account at training time. For joint training, the wide part only needs to complement the weaknesses of the deep part with a small number of cross-product feature transformations, rather than a full-size wide model.

Joint training of a Wide & Deep Model is done by backpropagating the gradients from the output to both the wide and deep part of the model simultaneously using mini-batch stochastic optimization. In the experiments, we used the Follow-the-regularized-leader (FTRL) algorithm [3] with L1 regularization as the optimizer for the wide part of the model, and AdaGrad [1] for the deep part.
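
The paper's implementation was open-sourced in TensorFlow; a rough sketch of this optimizer split using the TF 1.x estimator API (feature names, vocabularies, and hyperparameters are illustrative assumptions, and this API is deprecated in TF 2):

```python
import tensorflow as tf  # assumes the TF 1.x tf.estimator / tf.feature_column API

# Illustrative sparse categorical features.
language = tf.feature_column.categorical_column_with_vocabulary_list(
    "language", ["en", "fr", "zh"])
gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["male", "female"])

# Wide part: raw sparse features plus a cross-product transformation.
wide_columns = [
    language, gender,
    tf.feature_column.crossed_column(["language", "gender"],
                                     hash_bucket_size=100),
]

# Deep part: low-dimensional dense embeddings of the sparse features.
deep_columns = [
    tf.feature_column.embedding_column(language, dimension=8),
    tf.feature_column.embedding_column(gender, dimension=8),
]

# Joint training: FTRL with L1 regularization for the wide part and
# AdaGrad for the deep part, as in the paper's experiments.
model = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    linear_optimizer=tf.train.FtrlOptimizer(
        learning_rate=0.1, l1_regularization_strength=1.0),
    dnn_feature_columns=deep_columns,
    dnn_optimizer=tf.train.AdagradOptimizer(learning_rate=0.05),
    dnn_hidden_units=[64, 32])
```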

The combined model is illustrated in Figure 1 (center). For a logistic regression problem, the model’s prediction is:

$P(Y = 1 | x) = \sigma(w_{wide}^T [x, \phi(x)] + w_{deep}^T a^{(l_f)} + b)$

where $Y$ is the binary class label, $\sigma(\cdot)$ is the sigmoid function, $\phi(x)$ are the cross-product transformations of the original features $x$, and $b$ is the bias term. $w_{wide}$ is the vector of all wide model weights, and $w_{deep}$ are the weights applied on the final activations $a^{(l_f)}$.
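
Putting the pieces together, a NumPy sketch of this prediction (all weights, dimensions, and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative tensors: 3 raw binary features x, 1 cross-product
# transformation phi(x), and final deep activations a^{(l_f)} of size 4.
x = np.array([1.0, 0.0, 1.0])
phi_x = np.array([1.0])                   # e.g. AND of two raw features
a_lf = np.array([0.3, 0.0, 1.2, 0.7])     # final hidden-layer activations

w_wide = np.array([0.5, -0.2, 0.1, 0.8])  # weights on [x, phi(x)]
w_deep = np.array([0.4, -0.1, 0.2, 0.3])  # weights on a^{(l_f)}
b = -0.5                                  # bias term

logit = w_wide @ np.concatenate([x, phi_x]) + w_deep @ a_lf + b
print(sigmoid(logit))                     # P(Y = 1 | x)
```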
