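The main goal of Chapter 1 is to introduce the basic concepts, such as what machine learning is and what unsupervised and supervised learning are.
I. What is machine learning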
1. Machine learning is a relatively new field of study in which computers gain the ability to learn without being explicitly programmed:
"Field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)
2. "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1998)
Question:
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam. What is the task T in this setting?
A. Classifying emails as spam or not spam. (T, the task: this is the correct answer.)
B. Watching you label emails as spam or not spam. (E, the experience)
C. The number (or fraction) of emails correctly classified as spam/not spam. (P, the performance measure)
D. None of the above—this is not a machine learning problem.
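To make T, E and P concrete for the spam example, here is a minimal Python sketch. The emails, the word-scoring rule and the function names (`train`, `classify`, `accuracy`) are all hypothetical, chosen only for illustration; the rule is a deliberately simple word-counting heuristic, not a real spam filter or an algorithm from the course. It only shows that P, measured on held-out emails, can improve as the amount of experience E grows.

```python
from collections import Counter

# Hypothetical toy emails (1 = spam, 0 = not spam), invented for illustration.
# E (experience): emails the user has already labeled.
TRAIN = [
    ("win money now", 1),
    ("meeting at noon", 0),
    ("cheap money offer", 1),
    ("lunch meeting tomorrow", 0),
    ("win a cheap prize", 1),
    ("project notes for tomorrow", 0),
]
# Held-out emails, used only to measure P.
TEST = [
    ("win cheap money", 1),
    ("prize offer today", 1),
    ("project meeting notes", 0),
    ("lunch tomorrow", 0),
]


def train(examples):
    """Learn from E: each word gets +1 per spam occurrence, -1 per non-spam occurrence."""
    scores = Counter()
    for text, label in examples:
        for word in text.split():
            scores[word] += 1 if label == 1 else -1
    return scores


def classify(scores, text):
    """T: label an email as spam (1) if its total word score is positive, else 0."""
    return 1 if sum(scores[w] for w in text.split()) > 0 else 0


def accuracy(scores, examples):
    """P: fraction of emails that are classified correctly."""
    correct = sum(classify(scores, text) == label for text, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    # Give the learner more and more experience and measure P on the test set.
    for n in range(0, len(TRAIN) + 1, 2):
        print(n, accuracy(train(TRAIN[:n]), TEST))
```

On this toy data the test accuracy goes 0.5, 0.75, 1.0, 1.0 for 0, 2, 4 and 6 training emails: performance on T, as measured by P, improves with experience E.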
II. Machine learning algorithms
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
4. Recommender systems
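III. Supervised learning
Key feature of supervised learning: the training examples are labeled, i.e. the "right answer" is given for each one.
1. Regression: predict a continuous output value for a given (test) example.
2. Classification: predict the discrete label of a given (test) example. For example, in the tumor problem, 1 means the tumor is malignant and 0 means it is benign.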
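The difference between the two problem types shows up in the type of output. The minimal Python sketch below, on made-up numbers, fits a straight line by least squares (a regression model that outputs a real number) and a one-dimensional threshold rule for the tumor example (a classifier that outputs 0 or 1). The data, units and function names are hypothetical; the threshold rule (halfway between the two class means) is just one simple choice and assumes that larger tumors are more likely to be malignant.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = theta0 + theta1 * x for one feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    theta1 = cov / var
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


def fit_threshold(sizes, labels):
    """Classification: put the decision threshold halfway between the class means."""
    benign = [s for s, y in zip(sizes, labels) if y == 0]
    malignant = [s for s, y in zip(sizes, labels) if y == 1]
    return (sum(benign) / len(benign) + sum(malignant) / len(malignant)) / 2


def predict_label(threshold, size):
    """Return 1 (malignant) if the tumor is larger than the threshold, else 0 (benign)."""
    return 1 if size > threshold else 0


if __name__ == "__main__":
    # Hypothetical house sizes (1000 sq ft) and prices ($1000s).
    theta0, theta1 = fit_line([1.0, 2.0, 3.0, 4.0], [150.0, 250.0, 350.0, 450.0])
    print(theta0 + theta1 * 5.0)  # a real-valued prediction
    # Hypothetical tumor sizes (cm) with labels: 1 = malignant, 0 = benign.
    t = fit_threshold([1.0, 1.5, 2.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
    print(predict_label(t, 2.5), predict_label(t, 4.5))  # discrete labels
```

Here the regression model predicts a price of 550.0 for a size of 5.0, while the classifier can only ever answer 0 or 1.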
Question:
Problem 1: You have a large inventory of identical items. You want to predict how many of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each account decide if it has been hacked/compromised.
Should you treat these as classification or as regression problems?
A. Treat both as classification problems.
B. Treat problem 1 as a classification problem, problem 2 as a regression problem.
C. Treat problem 1 as a regression problem, problem 2 as a classification problem.
D. Treat both as regression problems.
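IV. Unsupervised learning
Key feature of unsupervised learning: the examples have no labels. Clustering, shown in the figure below, is a classic unsupervised learning task.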
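Clustering can be illustrated with a minimal k-means sketch in Python; note that the algorithm only ever sees the points, never any labels. The 2-D points are made up, and the centroids are initialized from the first and last data point purely so the run is reproducible (k-means is normally initialized randomly, often with several restarts). `kmeans` and `sq_dist` are hypothetical helper names, not a library API.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iters=100):
    """Plain k-means: alternate assigning points and moving centroids until stable."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical unlabeled 2-D points: no "right answers" are given.
    pts = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
    labels, centroids = kmeans(pts, [pts[0], pts[-1]])
    print(labels)
    print(centroids)
```

On these points the two groups are separated, giving labels [0, 0, 0, 1, 1, 1]; the cluster numbers carry no meaning beyond "same group" versus "different group", which is exactly the kind of structure unsupervised learning discovers without labels.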
Question:
Of the following examples, which would you address using an unsupervised learning algorithm? (Check all that apply.)
A. Given emails labeled as spam/not spam, learn a spam filter.
B. Given a set of news articles found on the web, group them into sets of articles about the same story.
C. Given a database of customer data, automatically discover market segments and group customers into different market segments.
D. Given a dataset of patients diagnosed as either having diabetes or not, learn to classify new patients as having diabetes or not.