Machine Learning Notes (Washington University) - Regression Specialization - Week Three - 白红宇

Published: 2019-06-25

1. Training Error

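Define a loss function, for example the squared error:

$$L(y, f_{\hat{w}}(x)) = (y - f_{\hat{w}}(x))^2$$

The training error is defined as the average loss over the houses in the training set:

$$\text{training error}(\hat{w}) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f_{\hat{w}}(x_i))$$

With squared error as the loss, RMSE is simply the square root of the average loss:

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - f_{\hat{w}}(x_i))^2}$$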

The training error decreases as model complexity increases.

The training error is overly optimistic: because the weights were trained to fit the training data, it is not a good measure of predictive performance.
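The following is a minimal sketch of how training error and RMSE behave as complexity grows, assuming squared error as the loss and polynomial degree as the measure of complexity. The synthetic house data and the helper names (`squared_loss`, `training_error`, `rmse`) are hypothetical and are not taken from the course.

```python
import numpy as np

def squared_loss(y, y_hat):
    # Per-example squared error (assumed loss for this sketch).
    return (y - y_hat) ** 2

def training_error(y, y_hat):
    # Average loss over the training set.
    return float(np.mean(squared_loss(y, y_hat)))

def rmse(y, y_hat):
    # Square root of the average squared loss.
    return float(np.sqrt(training_error(y, y_hat)))

# Hypothetical training data: square footage (in thousands) and price.
rng = np.random.default_rng(0)
sqft = rng.uniform(0.5, 3.5, size=30)
price = 100 + 80 * sqft + 15 * np.sin(3 * sqft) + rng.normal(0, 10, size=30)

# Fit polynomials of increasing degree and record the training error of each.
train_errors = {}
for degree in (1, 2, 4, 6):
    coeffs = np.polyfit(sqft, price, deg=degree)
    train_errors[degree] = training_error(price, np.polyval(coeffs, sqft))

for degree, err in train_errors.items():
    print(f"degree {degree}: training error {err:.2f}, RMSE {err ** 0.5:.2f}")
```

Because each higher-degree polynomial contains the lower-degree ones as special cases, the least-squares training error cannot go up as the degree increases, which is why it says little about performance on new houses.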

2. Generalization error

Suppose that we could enumerate all possible pairs of square footage and house price in a distribution. The generalization error is the loss averaged over all such pairs, weighted by how likely each pair is under the distribution.

As model complexity increases, the generalization error first goes down and then goes up.

We cannot compute the generalization error, because doing so would require knowing the true distribution over all possible pairs.
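In symbols, the weighted average described above can be sketched as an expectation. The notation is an assumption rather than something given in these notes: $L$ is the loss, $f_{\hat{w}}$ is the fitted model, and $p(x, y)$ is the true (unknown) joint distribution of square footage $x$ and price $y$.

$$\text{generalization error}(\hat{w}) = \mathbb{E}_{x,y}\left[ L(y, f_{\hat{w}}(x)) \right] = \iint L(y, f_{\hat{w}}(x))\, p(x, y)\, dx\, dy$$

Since $p(x, y)$ is never known in practice, this quantity can only be approximated, for example by the average loss on data that was not used for fitting.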

3. Three errors

Noise:

It is inherent in the data and cannot be reduced by choosing a better model.

Bias:

Over all possible training sets of size N, the bias is the difference between the average fit and the true relationship.

A low-complexity model has high bias: it is not flexible enough to represent the true relationship.

For a high-complexity model, the average fit is closer to the true relationship, so the bias is low.

Variance:

For a high-complexity model, the difference between the fits obtained from different training sets is larger, so the variance is high.

Tradeoff:

MSE = bias^2 + variance (we cannot compute bias and variance, because they are defined using the true function)

The goal is to find the minimum point on the MSE curve.
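Bias and variance cannot be computed on real data, but they can be estimated in a simulation where the true function is chosen by us. The sketch below does that under stated assumptions: the true relationship `true_fn`, the noise level, and the use of polynomial degree as complexity are all hypothetical choices made for illustration.

```python
import numpy as np

def true_fn(x):
    # Hypothetical true relationship; unknown in any real problem.
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_sets=200, n_points=40, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given
    degree by refitting it on many simulated training sets."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.1, 0.9, 50)  # points where the fits are compared
    preds = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        # Draw a fresh training set of size n_points.
        x = rng.uniform(0, 1, n_points)
        y = true_fn(x) + rng.normal(0, noise_sd, n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_grid)
    avg_fit = preds.mean(axis=0)
    # Squared bias: gap between the average fit and the true function.
    bias_sq = float(np.mean((avg_fit - true_fn(x_grid)) ** 2))
    # Variance: how much individual fits scatter around the average fit.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

low = bias_variance(degree=1)   # low-complexity model
high = bias_variance(degree=6)  # high-complexity model
print("degree 1: bias^2 = %.4f, variance = %.4f" % low)
print("degree 6: bias^2 = %.4f, variance = %.4f" % high)
```

In this setup the straight line should show the larger squared bias and the degree-6 polynomial the larger variance, which is the tradeoff described above.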

4. Amount of data

If the model complexity is fixed, the true error decreases as the number of data points increases, and it flattens out at bias^2 + noise, because our model may not be flexible enough to capture the true relationship between x and y.

The training error increases with the number of data points and flattens out at nearly the same level as the true error.
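The two curves can be illustrated with a small simulation. This is only a sketch under assumptions: the true function `true_fn`, the noise level, and the fixed straight-line model are hypothetical, and the true error is approximated by the average loss on a large fresh sample rather than computed exactly.

```python
import numpy as np

def true_fn(x):
    # Hypothetical true relationship between x and y.
    return np.sin(2 * np.pi * x)

def learning_curve(n_values, degree=1, noise_sd=0.3, n_repeats=200, seed=0):
    """For each training-set size, return (n, mean training error,
    mean approximate true error) for a fixed-complexity polynomial."""
    rng = np.random.default_rng(seed)
    # Large fresh sample standing in for the full distribution.
    x_big = rng.uniform(0, 1, 20000)
    y_big = true_fn(x_big) + rng.normal(0, noise_sd, x_big.size)
    rows = []
    for n in n_values:
        train_err, true_err = [], []
        for _ in range(n_repeats):
            x = rng.uniform(0, 1, n)
            y = true_fn(x) + rng.normal(0, noise_sd, n)
            coeffs = np.polyfit(x, y, degree)
            train_err.append(np.mean((y - np.polyval(coeffs, x)) ** 2))
            true_err.append(np.mean((y_big - np.polyval(coeffs, x_big)) ** 2))
        rows.append((n, float(np.mean(train_err)), float(np.mean(true_err))))
    return rows

curve = learning_curve([5, 20, 100, 1000])
for n, tr, te in curve:
    print(f"N = {n:4d}: training error {tr:.3f}, true error {te:.3f}")
```

With few points the line fits its own training set closely but predicts poorly; as N grows the two errors approach a common level that stays above the noise variance because a straight line cannot represent the curved relationship.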

5. Validation set

To tune the model complexity, a validation set is needed. If we used only the test set, the model complexity would be selected to minimize the test error, so the test error would be an overly optimistic estimate. So we need a training set, a validation set, and a test set.

The validation set is used to choose the model complexity.

The test set is used to approximate the generalization error.
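The three-way split can be sketched as follows. The synthetic data, the 60/20/20 split proportions, and the use of polynomial degree as the complexity being tuned are assumptions made for illustration, not details from the course.

```python
import numpy as np

def mse(y, y_hat):
    # Average squared error.
    return float(np.mean((y - y_hat) ** 2))

# Hypothetical data set.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)

# Shuffle once, then split into training, validation, and test sets.
idx = rng.permutation(x.size)
train, valid, test = idx[:180], idx[180:240], idx[240:]

# Fit each candidate complexity on the training set only,
# and score it on the validation set.
models, val_errors = {}, {}
for degree in range(1, 10):
    models[degree] = np.polyfit(x[train], y[train], degree)
    val_errors[degree] = mse(y[valid], np.polyval(models[degree], x[valid]))

# Choose the complexity with the lowest validation error.
best_degree = min(val_errors, key=val_errors.get)

# Touch the test set exactly once, after the choice has been made.
test_error = mse(y[test], np.polyval(models[best_degree], x[test]))
print(f"chosen degree: {best_degree}, test error: {test_error:.3f}")
```

Because the test set plays no part in choosing `best_degree`, `test_error` remains a fair approximation of the generalization error of the selected model.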

Reposted from: https://www.cnblogs.com/climberclimb/p/6810245.html
