Week 1 (9/18): Multiple Linear Regression and Cross-Validation

Multiple Linear Regression

Multiple linear regression is a statistical technique that uses several independent (explanatory) variables to predict the value of a single response (dependent) variable.

The formula for multiple linear regression is:

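$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i$$
where, for each observation i = 1, …, n:
$y_i$ = dependent (response) variable
$x_{ij}$ = value of the j-th independent (explanatory) variable, j = 1, …, p
$\beta_0$ = y-intercept
$\beta_j$ = slope coefficient of $x_j$: the expected change in y for a one-unit increase in $x_j$, holding the other variables fixed
$\varepsilon_i$ = error term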
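The coefficients $\beta_0, \dots, \beta_p$ are typically estimated by ordinary least squares, which chooses the β values that minimize the sum of squared errors. The following is a minimal NumPy sketch on synthetic data; the helper name `fit_ols`, the sample size and the true coefficients are all made up for illustration:

```python
import numpy as np

def fit_ols(X, y):
    """Estimate [beta_0, beta_1, ..., beta_p] by ordinary least squares.

    X has shape (n, p): one row per observation, one column per predictor.
    """
    # Prepend a column of ones so the first coefficient is the intercept beta_0
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq finds the beta that minimizes ||design @ beta - y||^2
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic example (assumption): n = 200 observations, p = 3 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, 2.0, -0.5, 0.3])   # made-up beta_0 .. beta_3
noise = rng.normal(scale=0.1, size=200)       # plays the role of the error term
y = true_beta[0] + X @ true_beta[1:] + noise

beta_hat = fit_ols(X, y)
print(np.round(beta_hat, 2))  # estimates should be close to true_beta
```

Because the noise is small relative to the sample size, the estimated coefficients should land close to the made-up `true_beta`.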

For our project we have only three variables: obesity, inactivity and diabetes. One of them is the response and the other two are the predictors, so the model reduces to
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
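A sketch of this two-predictor model, assuming NumPy and synthetic stand-in data (the project's actual data is not reproduced here). Treating diabetes as the response and obesity and inactivity as the predictors is an assumption for illustration, as are the coefficients used to generate the data. The sketch also computes R², the share of the variance in y explained by the model:

```python
import numpy as np

# Hypothetical, synthetic data standing in for the project data.
# Assumption for illustration: x1 = obesity %, x2 = inactivity %, y = diabetes %.
rng = np.random.default_rng(1)
n = 300
obesity = rng.uniform(15, 40, size=n)
inactivity = rng.uniform(10, 35, size=n)
diabetes = 1.5 + 0.10 * obesity + 0.15 * inactivity + rng.normal(scale=0.5, size=n)

# Design matrix columns [1, x1, x2] -> coefficients [beta_0, beta_1, beta_2]
X = np.column_stack([np.ones(n), obesity, inactivity])
beta, *_ = np.linalg.lstsq(X, diabetes, rcond=None)

# R^2 = 1 - SS_res / SS_tot
fitted = X @ beta
ss_res = np.sum((diabetes - fitted) ** 2)
ss_tot = np.sum((diabetes - diabetes.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("beta_0, beta_1, beta_2 =", np.round(beta, 3))
print("R^2 =", round(r_squared, 3))
```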

Overfitting

Overfitting occurs when a model fits the data it was built on too closely, capturing the noise in that data as well as the real relationship. The model appears to work well and predicts those values accurately, but when new data is added or substituted it no longer performs well and its results become misleading: statistics such as the R² value and the p-values change substantially.

On the model's plots, the new data points fall away from the fitted line or curve instead of following it.
To detect and limit this problem, we use cross-validation.
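Overfitting is easiest to see with a very flexible model. The sketch below swaps in a one-predictor polynomial fit (a polynomial in x is itself a linear regression on the predictors x, x², …, x¹¹) on synthetic data whose true relationship is a straight line; the data, seed and degrees are assumptions for illustration. The flexible model should fit its 12 training points almost perfectly yet do worse than the straight line on new data:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data (assumption): the true relationship is y = 2x + noise
rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(0, 1, size=n)
    y = 2 * x + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(12)   # small training set
x_new, y_new = make_data(200)      # "new data" the model has never seen

def mse(model, x, y):
    # Mean squared error of the model's predictions
    return np.mean((model(x) - y) ** 2)

# Simple model: a straight line (2 coefficients)
simple = Polynomial.fit(x_train, y_train, deg=1)
# Overly flexible model: 12 coefficients, enough to pass through all 12 points
flexible = Polynomial.fit(x_train, y_train, deg=11)

print(f"training MSE: simple={mse(simple, x_train, y_train):.3f}  "
      f"flexible={mse(flexible, x_train, y_train):.3f}")
print(f"new-data MSE: simple={mse(simple, x_new, y_new):.3f}  "
      f"flexible={mse(flexible, x_new, y_new):.3f}")
```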

Cross-Validation

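In cross-validation, the data is split at random: part of it is used to build the model and the held-out part is used to test it. The model we built earlier was fitted on the data in its original sequence, whereas here the observations are picked at random. Repeating the split over several rounds (as in k-fold cross-validation, where each of k random folds is held out once) shows how the model performs on data it has not seen, which exposes and helps limit the overfitting issue.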
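A minimal k-fold cross-validation sketch in NumPy, assuming synthetic data in which y truly depends on only two predictors. The helper names (`fit_ols`, `predict`, `k_fold_mse`), the 5 folds and the 25 pure-noise columns are hypothetical choices for illustration. Adding the noise columns lowers the training error, but the cross-validated error, computed only on held-out folds, should rise, which is how cross-validation flags overfitting:

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares coefficients; the column of ones adds the intercept beta_0
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict(beta, X):
    return beta[0] + X @ beta[1:]

def k_fold_mse(X, y, k=5, seed=0):
    """Average held-out mean squared error over k random folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))      # random order instead of the original sequence
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ols(X[train], y[train])   # build the model without the held-out fold
        errors.append(np.mean((predict(beta, X[test]) - y[test]) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumption): y depends on two predictors only
rng = np.random.default_rng(3)
n = 40
X_real = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X_real[:, 0] - 1.0 * X_real[:, 1] + rng.normal(scale=0.5, size=n)

# A second model that also receives 25 columns of pure noise as "predictors"
X_extra = np.column_stack([X_real, rng.normal(size=(n, 25))])

for name, X in [("2 predictors", X_real), ("27 predictors", X_extra)]:
    beta = fit_ols(X, y)
    train_mse = np.mean((predict(beta, X) - y) ** 2)
    print(f"{name}: training MSE={train_mse:.3f}  5-fold CV MSE={k_fold_mse(X, y):.3f}")
```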
