L1 and L2 regularization add a penalty on large weights to the loss in order to reduce overfitting. L1 regularization penalizes the sum of the absolute values of the weights, while L2 regularization penalizes the sum of their squares. Compared to L2, L1 tends to set more weights to exactly 0 (it places more mass on zero weights).
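For concreteness, the two penalized objectives can be written as follows. The symbols are assumptions introduced here for illustration: L(w) is the unregularized loss, w_i are the weights, and lambda >= 0 is the regularization strength. Some texts scale the L2 term by 1/2, which only rescales lambda.

```latex
\begin{align}
J_{\text{L1}}(w) &= L(w) + \lambda \sum_{i} |w_i| \\
J_{\text{L2}}(w) &= L(w) + \lambda \sum_{i} w_i^{2}
\end{align}
```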
Difference between L1 and L2
L2 (Ridge) shrinks every coefficient toward zero but eliminates none; when the features are orthonormal, all coefficients shrink by the same proportion. L1 (Lasso) can shrink some coefficients to exactly zero, thereby performing variable selection.
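A minimal sketch of this contrast, under stated assumptions: the features are orthonormal (so each coefficient can be handled on its own), the lasso objective is half the squared error plus lam * |b|, and the ridge objective is half the squared error plus (lam / 2) * b^2. Under those assumptions ridge rescales each least-squares coefficient by 1 / (1 + lam), while lasso applies soft-thresholding. The function names and sample values are hypothetical, chosen only for illustration.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: proportional shrinkage."""
    return [b / (1.0 + lam) for b in beta_ols]


def soft_threshold(b, lam):
    """Move b toward zero by lam, clipping to exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(beta_ols, lam):
    """Lasso under an orthonormal design: soft-threshold each coefficient."""
    return [soft_threshold(b, lam) for b in beta_ols]


# Illustrative least-squares coefficients and penalty strength.
beta_ols = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)  # every entry scaled by 1 / 1.5
lasso = lasso_shrink(beta_ols, lam)  # entries with |b| <= 0.5 become 0.0

print("ridge:", ridge)
print("lasso:", lasso)
```

In this example ridge keeps all four coefficients nonzero, while lasso zeroes the two smallest ones and reduces the others by a fixed amount rather than a fixed proportion.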
Which to use
If all the features are correlated with the label, ridge tends to outperform lasso, because ridge never sets a coefficient to zero and so keeps every feature in the model. If only a subset of the features is correlated with the label, lasso tends to outperform ridge, because lasso can shrink the coefficients of the irrelevant features to exactly zero.
Reference: http://www.quora.com/What-is-the-difference-between-L1-and-L2-regularization