【Coursera - machine learning】 Linear regression with one variable-quiz

Question 1

Consider the problem of predicting how well a student does in their second year of college/university, given how well they did in their first year. Specifically, let x be equal to the number of "A" grades (including A-, A and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of "A" grades they get in their second year (sophomore year).

Questions 1 through 4 will use the following training set of a small sample of different students' performances. Here each row is one training example. Recall that in linear regression, our hypothesis is $h_θ(x) = θ_0 + θ_1 x$, and we use m to denote the number of training examples.

x y
5 4
3 4
0 1
4 3

For the training set given above, what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).

Answer

m is the number of training examples. In this example, we have m=4 examples.

4
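
For concreteness, here is a minimal Python sketch (variable names are illustrative) of this training set and the count m:

```python
# Training set from Question 1: x = "A" grades in year one, y = "A" grades in year two
x = [5, 3, 0, 4]
y = [4, 4, 1, 3]

m = len(x)  # number of training examples
print(m)    # 4
```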

Question 2

Many substances that can burn (such as gasoline and alcohol) have a chemical structure based on carbon atoms; for this reason they are called hydrocarbons. A chemist wants to understand how the number of carbon atoms in a molecule affects how much energy is released when that molecule combusts (meaning that it is burned). The chemist obtains the dataset below. In the column on the right, "kJ/mol" is the unit measuring the amount of energy released.

You would like to use linear regression ($h_θ(x) = θ_0 + θ_1 x$) to estimate the amount of energy released (y) as a function of the number of carbon atoms (x). Which of the following do you think will be the values you obtain for θ_0 and θ_1? You should be able to select the right answer without actually implementing linear regression.

Answer

Since the released energy (y) decreases as the number of carbon atoms (x) increases, θ_1 must be negative. θ_0 acts as the offset (the value of the hypothesis at x = 0); looking at the table, it should be higher than −1000, which points to θ_0 = −569.6 rather than −1780.0.

  • θ_0=−1780.0, θ_1=−530.9
  • θ_0=−569.6, θ_1=−530.9 ← correct
  • θ_0=−1780.0, θ_1=530.9
  • θ_0=−569.6, θ_1=530.9

Question Explanation

We can give an approximate estimate of the θ_0 and θ_1 values by observing the trend of the data in the training set. We see that the y values decrease quite regularly as the x values increase, so θ_1 must be negative. θ_0 is the value that the hypothesis takes when x is equal to zero, therefore it must be greater than y(1) in order to satisfy the decreasing trend of the data. Among the proposed answers, the only one that meets both conditions is h_θ(x) = −569.6 − 530.9x. These considerations are easier to appreciate by plotting the training data together with the fitted regression line.
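
As a quick numeric check (plain Python, using only the parameter values from the selected answer), evaluating h_θ(x) = −569.6 − 530.9x shows the predicted energy decreasing as x grows, with θ_0 lying above h(1) as the explanation requires:

```python
theta0, theta1 = -569.6, -530.9  # the selected answer for Question 2

def h(x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

for carbons in range(1, 5):
    print(carbons, round(h(carbons), 1))
# 1 -1100.5
# 2 -1631.4
# 3 -2162.3
# 4 -2693.2
# The values decrease with x, and theta0 (-569.6) is greater than h(1) (-1100.5).
```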


Question 3

Suppose we set θ_0=−1, θ_1=0.5. What is h_θ(4)?

Answer

h_θ(x) = θ_0 + θ_1x
h_θ(x) = −1 + 0.5x
h_θ(4) = −1 + 0.5 × 4
h_θ(4) = 1
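
The same calculation as a minimal Python sketch (the function name is just illustrative):

```python
def h(x, theta0=-1.0, theta1=0.5):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for Question 3."""
    return theta0 + theta1 * x

print(h(4))  # -1 + 0.5 * 4 = 1.0
```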

Question 4

Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)

Answer

  • If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.

    • The updates to θ0 and θ1 are different (even though we're doing simultaneous updates), so there's no particular reason to expect them to be the same after one iteration of gradient descent.
  • Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
    • If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, so this would actually slow down (rather than speed up) the convergence of the algorithm.
  • If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
    • If alpha were small enough, then gradient descent should always take a tiny step downhill and decrease f(θ0,θ1) at least a little bit. If gradient descent instead increases the objective value, that means alpha is too large (or you have a bug in your code!). The sketch after this list illustrates both failure modes.
  • If the learning rate is too small, then gradient descent may take a very long time to converge.
    • If the learning rate is small, gradient descent ends up taking an extremely small step on each iteration, and therefore can take a long time to converge.
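
A hedged sketch of these points in Python: the function f below is made up purely for illustration (it is not the linear-regression cost), and the learning rates and iteration counts are arbitrary, but it shows a simultaneous update and the effect of too small or too large an α:

```python
def grad_f(theta0, theta1):
    # Gradient of an illustrative smooth function f(theta0, theta1) = (theta0 - 3)**2 + (theta1 + 1)**2
    return 2 * (theta0 - 3), 2 * (theta1 + 1)

def gradient_descent(theta0, theta1, alpha, iters):
    for _ in range(iters):
        # Simultaneous update: both partial derivatives are computed before either parameter changes
        d0, d1 = grad_f(theta0, theta1)
        theta0 = theta0 - alpha * d0
        theta1 = theta1 - alpha * d1
    return theta0, theta1

print(gradient_descent(0.0, 0.0, alpha=0.1, iters=100))    # close to the minimum at (3, -1)
print(gradient_descent(0.0, 0.0, alpha=0.001, iters=100))  # still far from (3, -1): alpha too small, slow convergence
print(gradient_descent(0.0, 0.0, alpha=1.5, iters=100))    # huge values: alpha too large, f increases and diverges
```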

Question 5

Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0. Which of the statements below must then be true? (Check all that apply.)

Answer

  • For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0

    • If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data. There's no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples).
  • Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
    • If J(θ0,θ1)=0, that means the line defined by the equation "y=θ0+θ1x" perfectly fits all of our data (the sketch after this list shows this numerically).
  • For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.
    • So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all of our examples.
  • We can perfectly predict the value of y even for new examples that we have not yet seen. (e.g., we can perfectly predict prices of even new houses that we have not yet seen.)
    • Even though we can fit our training set perfectly, this does not mean that we'll always make perfect predictions on houses in the future/on houses that we have not yet seen.
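
A small sanity-check sketch (NumPy; the training set below is made up so that it lies exactly on a straight line) showing that the squared-error cost J(θ0,θ1) is zero exactly when the line fits every example perfectly:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1 / (2m)) * sum((h(x) - y)**2)."""
    m = len(x)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Illustrative training set lying exactly on the line y = 2 + 3x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 + 3 * x

print(cost(2.0, 3.0, x, y))  # 0.0 -- perfect fit, so J(theta0, theta1) = 0
print(cost(0.0, 1.0, x, y))  # > 0 -- any other line leaves residual error
```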