[Machine Learning for Trading] {ud501} Lesson 23: 03-03 Assessing a learning algorithm | Lesson 24: 03-04 Ensemble learners, bagging and boosting

A closer look at KNN solutions

What happens as K varies

What happens as D varies

Metric 1: RMS error

In-sample vs out-of-sample

Which is worse?

Cross validation

5-fold cross validation

Roll forward cross validation
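The roll-forward idea can be sketched in a few lines: train on an earlier window, test on the window immediately after it, then roll both windows forward so the test data always lies in the future relative to the training data. The function name and window sizes below are illustrative, not from the lecture.

```python
def roll_forward_splits(n, train_size, test_size):
    """Yield (train_indices, test_indices) pairs where the test window
    always comes strictly after the training window in time."""
    start = 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # roll both windows forward by one test window

for train, test in roll_forward_splits(10, train_size=4, test_size=2):
    print(train, "->", test)
```

Unlike plain 5-fold cross validation, no split ever trains on data that comes after its test window, which is what prevents peeking into the future with financial data.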

Metric 2: correlation

Correlation and RMS error

In most cases, as RMS error increases, correlation goes down. But there are cases where the opposite happens, e.g. when there is a large bias: predictions that track the shape of the actual values but are offset by a constant can have high correlation and a large RMS error at the same time.

So it can be hard to say for sure.
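Both metrics, and the bias case just mentioned, are easy to check numerically. A short sketch (assuming numpy; the helper names are my own):

```python
import numpy as np

def rms_error(y_true, y_pred):
    """Root-mean-square error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def correlation(y_true, y_pred):
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_good = y_true + 0.1    # small error, perfectly correlated
y_bias = y_true + 10.0   # large constant bias: RMS error is 10, yet
                         # correlation is still 1.0
```

The `y_bias` case is exactly the exception above: the predictions move in lockstep with the actual values, so correlation stays at 1.0 even though the RMS error is large.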

Overfitting

Overfitting Quiz

When k = 1, the model fits the training data perfectly, so in-sample error is zero; out-of-sample error, however, can be quite high.

As k increases, the model becomes more generalized, thus out-of-sample error decreases at the cost of slightly increasing in-sample error.

After a certain point, the model becomes too general and starts performing worse on both training and test data.
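To make the k-versus-error story concrete, here is a tiny hand-rolled 1-D kNN regressor (an illustrative sketch, not the course's implementation): with k = 1 the in-sample error is exactly zero, and it grows as k increases and the model averages over wider neighborhoods.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Predict each query point as the mean label of its k nearest
    training points (1-D, brute force)."""
    preds = []
    for xq in x_query:
        idx = np.argsort(np.abs(x_train - xq))[:k]  # indices of k nearest
        preds.append(y_train[idx].mean())
    return np.array(preds)

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(40)       # noisy target

in_sample_rmse = {}
for k in (1, 3, 9):
    pred = knn_predict(x, y, x, k)                  # query = training set
    in_sample_rmse[k] = float(np.sqrt(np.mean((y - pred) ** 2)))
```

With k = 1 every training point is its own nearest neighbor, so the fit is exact; larger k smooths the fit and in-sample error rises, which is the left-to-right movement along the overfitting curve.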

A Few other considerations

Ensemble learners

How to build an ensemble

If we combine several models of different types (here, parameterized polynomial models and non-parametric kNN models), we can avoid being biased toward any one approach.

This typically results in less overfitting, and thus better predictions in the long run, especially on unseen data.
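The combination step itself is simple: train different learner types on the same data, then average their predictions. In this sketch (assuming numpy) a degree-3 polynomial fit and a 1-D kNN stand in for the lecture's parameterized and non-parametric learners; the function name is my own.

```python
import numpy as np

def ensemble_predict(x_train, y_train, x_query):
    """Average the predictions of two different learner types."""
    # learner 1: parameterized polynomial model
    coeffs = np.polyfit(x_train, y_train, deg=3)
    poly_pred = np.polyval(coeffs, x_query)
    # learner 2: non-parametric kNN model (k = 3)
    knn_pred = np.array([
        y_train[np.argsort(np.abs(x_train - xq))[:3]].mean()
        for xq in x_query
    ])
    # combine by simple averaging
    return (poly_pred + knn_pred) / 2.0
```

Each learner's individual quirks (the polynomial's global shape assumptions, kNN's local flatness) are damped by the average, which is where the reduction in overfitting comes from.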

Bootstrap aggregating (bagging)

Correction: In the video (around 02:06), the professor mentions that n’ should be set to about 60% of n, the number of training instances. It is more accurate to say that in most implementations, n’ = n. Because the training data is sampled with replacement, about 63% (1 − 1/e) of the instances in each bag are unique.
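That 1 − 1/e figure is easy to verify numerically: sample n indices with replacement and count how many are distinct. A minimal sketch, assuming numpy:

```python
import numpy as np

def make_bag(n, rng):
    """One bag: n indices sampled with replacement from range(n)."""
    return rng.integers(0, n, size=n)

rng = np.random.default_rng(42)
n = 10_000
fractions = [len(np.unique(make_bag(n, rng))) / n for _ in range(20)]
mean_unique = sum(fractions) / len(fractions)   # close to 1 - 1/e ~ 0.632
```

The probability that a given instance is never drawn in n tries is (1 − 1/n)^n → 1/e, so the expected unique fraction per bag approaches 1 − 1/e ≈ 63.2%.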

Overfitting

Yes, as we saw earlier, a 1NN model (kNN with k = 1) matches the training data exactly, thus overfitting.

An ensemble of such learners trained on slightly different datasets will at least be able to provide some generalization, and typically less out-of-sample error.

Correction: For the question "Which is most likely to overfit?", the correct option is the first one (a single 1NN learner). The second option (an ensemble of 10) was mistakenly marked in the video, but the intent is the same.

Bagging example

instances that were modeled poorly by the overall system before

Overfitting

As m (the number of boosting rounds) increases, AdaBoost assigns more and more weight to the specific data points that earlier learners got wrong, so subsequent learners concentrate on modeling the difficult examples.

Thus, compared to simple bagging, it may result in more overfitting.
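The reweighting mechanism behind this can be sketched in a few lines. The update below is the standard discrete-AdaBoost form, reduced to a single round to show the mechanism only; the function name is my own.

```python
import math

def reweight(weights, correct, error):
    """One AdaBoost round: `correct` flags whether each example was
    classified correctly; `error` is the learner's weighted error
    (0 < error < 0.5)."""
    alpha = 0.5 * math.log((1 - error) / error)   # learner's vote strength
    new_w = [w * math.exp(-alpha if ok else alpha)
             for w, ok in zip(weights, correct)]
    total = sum(new_w)
    return [w / total for w in new_w]             # renormalize to sum to 1

w = [0.25, 0.25, 0.25, 0.25]
w = reweight(w, correct=[True, True, True, False], error=0.25)
# the one misclassified example now carries half the total weight
```

After the update, the misclassified examples always carry exactly half the total weight, which is why each successive learner is pulled toward the hard cases, and why boosting can overfit them when m grows large.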

Summary

Original post: https://www.cnblogs.com/ecoflex/p/10977437.html

Posted: 2024-11-05 14:38:00
