Chapter 9 Linear Predictors

In this chapter we will study the family of linear predictors, one of the most useful families of hypothesis classes. Many learning algorithms that are being widely used in practice rely on linear predictors, first and foremost because of the ability to learn them efficiently in many cases. In addition, linear predictors are intuitive, are easy to interpret, and fit the data reasonably well in many natural learning problems.

We will introduce several hypothesis classes belonging to this family – halfspaces, linear regression predictors, and logistic regression predictors – and present relevant learning algorithms: linear programming and the Perceptron algorithm for the class of halfspaces and the Least Squares algorithm for linear regression. This chapter is focused on learning linear predictors using the ERM approach; however, in later chapters we will see alternative paradigms for learning these hypothesis classes.

First, we define the class of affine functions as

L_d = \{ h_{w,b} : w \in \mathbb{R}^d, b \in \mathbb{R} \},    (1)

where

h_{w,b}(x) = \langle w, x \rangle + b = \left( \sum_{i=1}^{d} w_i x_i \right) + b.    (2)

It will be convenient also to use the notation

L_d = \{ x \mapsto \langle w, x \rangle + b : w \in \mathbb{R}^d, b \in \mathbb{R} \},    (3)

which reads as follows: L_d is a set of functions, where each function is parameterized by w \in \mathbb{R}^d and b \in \mathbb{R}, and each function takes as input a vector x and returns as output the scalar \langle w, x \rangle + b.

The different hypothesis classes of linear predictors are compositions of a function \phi : \mathbb{R} \to \mathcal{Y} on L_d. For example, in binary classification, we can choose \phi to be the sign function, and for regression problems, where \mathcal{Y} = \mathbb{R}, \phi is simply the identity function.
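As an illustrative sketch (not part of the text, with the weight values chosen arbitrarily), the two compositions above can be written in a few lines of Python using NumPy:

```python
import numpy as np

def affine(w, b, x):
    # h_{w,b}(x) = <w, x> + b
    return np.dot(w, x) + b

def halfspace(w, b, x):
    # binary classification: compose the sign function with the affine map
    return np.sign(affine(w, b, x))

def regressor(w, b, x):
    # regression: phi is the identity, so the predictor is the affine map itself
    return affine(w, b, x)

w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 3.0])
print(halfspace(w, b, x))  # sign(2 - 3 + 0.5) = -1.0
print(regressor(w, b, x))  # -0.5
```

The same affine map thus serves both tasks; only the outer function \phi changes.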

It may be more convenient to incorporate b, called the bias, into w as an extra coordinate and add an extra coordinate with a value of 1 to all x \in \mathcal{X}; namely, let w' = (b, w_1, w_2, \ldots, w_d) \in \mathbb{R}^{d+1} and let x' = (1, x_1, x_2, \ldots, x_d) \in \mathbb{R}^{d+1}. Therefore,

h_{w,b}(x) = \langle w, x \rangle + b = \langle w', x' \rangle.    (4)

It follows that each affine function in \mathbb{R}^d can be rewritten as a homogeneous linear function in \mathbb{R}^{d+1} applied over the transformation that appends the constant 1 to each input vector. Therefore, whenever it simplifies the presentation, we will omit the bias term and refer to L_d as the class of homogeneous linear functions of the form x \mapsto \langle w, x \rangle.
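As a quick numerical sanity check (a sketch with arbitrary values, not from the text), the homogenization trick can be verified directly: prepending b to w and prepending a 1 to x turns the affine value into a single inner product:

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])
b = -3.0
x = np.array([1.0, 4.0, 2.0])

affine_value = np.dot(w, x) + b  # <w, x> + b

# homogenize: w' = (b, w_1, ..., w_d), x' = (1, x_1, ..., x_d)
w_prime = np.concatenate(([b], w))
x_prime = np.concatenate(([1.0], x))
homogeneous_value = np.dot(w_prime, x_prime)  # <w', x'>

print(affine_value, homogeneous_value)  # both equal -4.0
```

Note that the homogenized vectors live in \mathbb{R}^{d+1}, which is why the bias term can be omitted without loss of generality.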
