Machine Learning Notes (Washington University) - Regression Specialization - Week Three
Published: 2019-06-25


1. Training Error

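Define a loss function that measures the cost of predicting f(x) when the true value is y, for example the squared error:

L(y, f(x)) = (y - f(x))^2

The training error is defined as the average loss over the N houses in the training set:

training error = (1/N) * sum over i = 1..N of L(y_i, f(x_i))

With the squared error loss, RMSE is simply the square root of this average loss:

RMSE = sqrt(training error)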

The training error decreases as the model complexity increases.

The training error is overly optimistic: because the weights were trained to fit the training data, it is not a good measure of predictive performance.
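
To make the definitions concrete, here is a minimal sketch that computes the training error and RMSE for polynomial models of increasing degree. The synthetic data (a noisy sine curve standing in for square footage and price), the helper names, and the choice of polynomial features are all assumptions made for illustration; they are not from the course material.

```python
import numpy as np


def squared_loss(y, y_hat):
    """Squared error loss for each example."""
    return (y - y_hat) ** 2


def training_error(y, y_hat):
    """Average loss over the training set."""
    return float(np.mean(squared_loss(y, y_hat)))


def rmse(y, y_hat):
    """Square root of the average squared loss."""
    return float(np.sqrt(training_error(y, y_hat)))


def fit_and_predict(x, y, degree):
    """Least-squares polynomial fit; returns predictions on the training inputs."""
    X = np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ w


# Hypothetical synthetic data standing in for (square footage, price) pairs.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=30)

train_errors = []
for degree in range(6):
    y_hat = fit_and_predict(x, y, degree)
    train_errors.append(training_error(y, y_hat))
    print(f"degree={degree}  training error={train_errors[-1]:.4f}  RMSE={rmse(y, y_hat):.4f}")
```

Because each higher-degree model contains the lower-degree ones, the least-squares training error printed here never goes up as the degree grows, which is why it cannot be used on its own to pick the model complexity.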

 

2. Generalization error

Suppose we could enumerate every possible pair of square footage and house price under the true distribution. The generalization error is the average loss over all such pairs, weighted by how likely each pair is under that distribution.

As the model complexity increases, the generalization error first goes down and then goes up.

We cannot actually compute the generalization error, because we never observe every possible pair or the true distribution.
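
On real data the generalization error is out of reach, but in a simulation where we choose the distribution ourselves it can be approximated by averaging the loss over a very large fresh sample. The sketch below does that under assumed conditions: the sine-plus-noise distribution, the sample sizes, and the helper names `sample`, `fit`, and `mse` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)


def sample(n):
    """Draw n (x, y) pairs from an assumed synthetic distribution."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


def fit(x, y, degree):
    """Least-squares polynomial weights for the given degree."""
    X = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, x, y):
    """Average squared loss of the polynomial with weights w on (x, y)."""
    X = np.vander(x, len(w), increasing=True)
    return float(np.mean((y - X @ w) ** 2))


x_train, y_train = sample(20)        # small training set
x_fresh, y_fresh = sample(100_000)   # large fresh sample approximating the distribution

results = {}
for degree in (1, 3, 9):
    w = fit(x_train, y_train, degree)
    results[degree] = (mse(w, x_train, y_train), mse(w, x_fresh, y_fresh))
    print(f"degree={degree}  training error={results[degree][0]:.4f}  "
          f"approx. generalization error={results[degree][1]:.4f}")
```

In this setup the approximate generalization error cannot drop below the noise level (0.3^2 = 0.09), no matter how small the training error becomes.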

 

3. Three errors

Noise:

It is inherent in the data and cannot be reduced by choosing a better model.

 

Bias:

Consider the fits obtained from all possible training sets of size N and average them. The bias is the difference between this average fit and the true relationship.

A low-complexity model has high bias: it is not flexible enough to represent the true relationship.

For a high-complexity model, the average fit is closer to the true relationship, so the bias is low.

 

Variance:

Variance measures how much the fits differ from one training set to another. For a high-complexity model, the difference between fits is larger.

 

Tradeoff:

MSE = bias^2 + variance (we cannot compute bias and variance, because they are defined using the true function).

The goal is to find the minimum point on the MSE curve.
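
Bias and variance cannot be computed on real data, but they can be estimated in a simulation where the true function is known. The following sketch assumes a sine curve as the true function, keeps the training inputs fixed, and redraws only the noise for each training set; these choices and all names are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)


def true_f(x):
    """Assumed true relationship (known only because this is a simulation)."""
    return np.sin(2 * np.pi * x)


x_train = np.linspace(0.0, 1.0, 30)  # fixed training inputs (simplifying assumption)
x_grid = np.linspace(0.0, 1.0, 50)   # points where the fits are compared
n_sets, noise_sd = 200, 0.3


def bias_variance(degree):
    """Estimate squared bias, variance, and MSE against the true function."""
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_grid = np.vander(x_grid, degree + 1, increasing=True)
    fits = np.empty((n_sets, x_grid.size))
    for i in range(n_sets):
        # A new training set: same inputs, fresh noise.
        y = true_f(x_train) + rng.normal(0.0, noise_sd, size=x_train.size)
        w, *_ = np.linalg.lstsq(X_train, y, rcond=None)
        fits[i] = X_grid @ w
    average_fit = fits.mean(axis=0)
    bias_sq = float(np.mean((average_fit - true_f(x_grid)) ** 2))
    variance = float(np.mean(fits.var(axis=0)))
    mse = float(np.mean((fits - true_f(x_grid)) ** 2))
    return bias_sq, variance, mse


summary = {degree: bias_variance(degree) for degree in (0, 1, 3, 5)}
for degree, (bias_sq, variance, mse) in summary.items():
    print(f"degree={degree}  bias^2={bias_sq:.4f}  variance={variance:.4f}  MSE={mse:.4f}")
```

The MSE column equals bias^2 plus variance for every degree, with the squared bias shrinking and the variance growing as the degree increases.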

 

4. Amount of data

If the model complexity is fixed, the true error decreases as the number of data points increases, and it flattens out at bias^2 + noise, because our model may not be flexible enough to capture the true relationship between x and y.

The training error increases as the number of data points increases and flattens out at nearly the same level as the true error.
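
The two curves can be reproduced with a small simulation. This sketch fixes the model to a straight line, fits it on training sets of growing size, and approximates the true error with a large fresh sample. The synthetic distribution, the sizes, the number of repeats, and the helper names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)


def sample(n):
    """Draw n (x, y) pairs from an assumed synthetic distribution."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


def design(x):
    """Features of the fixed-complexity model: intercept and slope."""
    return np.column_stack([np.ones_like(x), x])


def mse(w, x, y):
    """Average squared loss of the line with weights w on (x, y)."""
    return float(np.mean((y - design(x) @ w) ** 2))


x_fresh, y_fresh = sample(50_000)  # stands in for the true distribution
sizes = (5, 20, 100, 1000, 5000)
n_repeats = 100                    # average over several training sets per size

curve = {}
for n in sizes:
    train_errs, true_errs = [], []
    for _ in range(n_repeats):
        x, y = sample(n)
        w, *_ = np.linalg.lstsq(design(x), y, rcond=None)
        train_errs.append(mse(w, x, y))
        true_errs.append(mse(w, x_fresh, y_fresh))
    curve[n] = (float(np.mean(train_errs)), float(np.mean(true_errs)))
    print(f"n={n:5d}  training error={curve[n][0]:.4f}  approx. true error={curve[n][1]:.4f}")
```

With few points the line fits its training set well but predicts poorly; as n grows, the two averages approach a common floor that a straight line cannot get below on this data.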

 

5. Validation set

A validation set is needed to tune the model complexity. If we used only the test set, the model complexity would be selected to minimize the test error, which would make the test error an overly optimistic estimate. So we need a training set, a validation set, and a test set.

The validation set is used to choose the model complexity.

The test set is used to approximate the generalization error.
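
The three-way split can be sketched as follows. The 60/20/20 split, the polynomial degree as the measure of model complexity, the synthetic data, and the helper names are assumptions for illustration, not a prescribed recipe.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical synthetic data set of 300 examples.
x = rng.uniform(0.0, 1.0, size=300)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=300)

# Shuffle once, then split 60% / 20% / 20%.
idx = rng.permutation(300)
train_idx, valid_idx, test_idx = idx[:180], idx[180:240], idx[240:]


def fit(x, y, degree):
    """Least-squares polynomial weights for the given degree."""
    X = np.vander(x, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, x, y):
    """Average squared loss of the polynomial with weights w on (x, y)."""
    X = np.vander(x, len(w), increasing=True)
    return float(np.mean((y - X @ w) ** 2))


# Fit on the training set only; compare candidate complexities on the validation set.
val_errors = {}
for degree in range(10):
    w = fit(x[train_idx], y[train_idx], degree)
    val_errors[degree] = mse(w, x[valid_idx], y[valid_idx])

best_degree = min(val_errors, key=val_errors.get)

# The test set is touched once, after the complexity has been chosen.
w_best = fit(x[train_idx], y[train_idx], best_degree)
test_error = mse(w_best, x[test_idx], y[test_idx])
print(f"chosen degree={best_degree}  validation error={val_errors[best_degree]:.4f}  "
      f"test error={test_error:.4f}")
```

Since the test set played no part in choosing `best_degree`, `test_error` is a fairer approximation of the generalization error than the validation error of the chosen model.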

 

Reposted from: https://www.cnblogs.com/climberclimb/p/6810245.html
