Multiple Linear Regression
Multiple linear regression is a statistical technique that uses several independent (explanatory) variables to predict the outcome of a dependent (response) variable.
The formula for multiple linear regression
yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + βₚxᵢₚ + εᵢ
Where, for observations i = 1, …, n:
yᵢ = dependent variable (response variable)
xᵢ₁, …, xᵢₚ = independent variables (explanatory variables)
β₀ = y-intercept
β₁, …, βₚ = slope coefficients, one for each explanatory variable
εᵢ = error term
For our project, we have only three variables: obesity, inactivity, and diabetes. With one response variable and two explanatory variables,
we use the formula below:
y = β₀ + β₁x₁ + β₂x₂ + ε
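As a rough illustration of the two-variable formula, the sketch below estimates β₀, β₁, and β₂ by least squares with NumPy. The data is synthetic, the coefficient values used to generate it are made up, and treating diabetes as the response with obesity and inactivity as the explanatory variables is an assumption made only for this example. The helper names fit_ols and predict are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Return [b0, b1, ..., bp] that minimise the squared error of y ≈ b0 + X @ b."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate b0 + b1*x1 + ... + bp*xp for every row of X."""
    return coef[0] + X @ coef[1:]


# Synthetic data (assumption): not the project's real measurements.
rng = np.random.default_rng(0)
n = 200
obesity = rng.uniform(15, 40, n)      # x1
inactivity = rng.uniform(10, 35, n)   # x2
noise = rng.normal(0, 0.5, n)         # the error term
diabetes = 1.0 + 0.2 * obesity + 0.1 * inactivity + noise  # y

X = np.column_stack([obesity, inactivity])
coef = fit_ols(X, diabetes)
print("intercept and slopes:", coef)
print("first three predictions:", predict(coef, X[:3]))
```

The estimated slopes should land close to the values used to generate the data, because the noise here is small relative to the signal.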
Overfitting
Overfitting is a condition where the model works well on the data it was built from and predicts those values well, but when that data is replaced with new data, or new data is added, the model no longer works and the results are misleading. Dependent statistics such as the R² value and the p-values all change.
In the model graphs, the newly added data points are either left out or do not fit the plotted model.
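One way to see this effect is to compare R² on the data used to build the model with R² on new data. The sketch below is a made-up scenario, not the project's data: it fits 15 explanatory variables on only 20 observations, where only the first variable actually influences the response. The helper names fit_ols, r_squared, and make_data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y ≈ b0 + X @ b."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def r_squared(coef, X, y):
    """R² = 1 - SS_res / SS_tot, evaluated on the given data."""
    pred = coef[0] + X @ coef[1:]
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def make_data(rng, n, p):
    """Synthetic data (assumption): only the first column affects y."""
    X = rng.normal(size=(n, p))
    y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)
    return X, y


rng = np.random.default_rng(1)
X_train, y_train = make_data(rng, 20, 15)   # few rows, many variables
X_new, y_new = make_data(rng, 200, 15)      # data the model has not seen

coef = fit_ols(X_train, y_train)
r2_train = r_squared(coef, X_train, y_train)
r2_new = r_squared(coef, X_new, y_new)
print("R² on the data used to build the model:", r2_train)
print("R² on new data:", r2_new)
```

A large gap between the two values is the usual sign of overfitting: the model has memorised noise in the original observations rather than the underlying relationship.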
To overcome this issue, we use cross-validation.
Cross-Validation
Cross-validation means that, to test the model, we randomly pick part of the data and use it to build the model, then check the model against the data that was left out. The previously created model used the data in its original sequence; here we pick the data at random to build the model, which limits the overfitting issue.
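The random-selection idea can be sketched as k-fold cross-validation: shuffle the row indices, split them into k groups, and let each group take one turn as the held-out test set. This is a minimal NumPy sketch under assumptions, not necessarily the procedure used in the project: the data is synthetic, k = 5 is an arbitrary choice, and k_fold_r2 is a hypothetical helper name.

```python
import numpy as np


def k_fold_r2(X, y, k=5, seed=0):
    """Return the held-out R² of a linear model for each of k random folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))     # random order, not the original sequence
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Build the model on the training rows only.
        design = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(design, y[train_idx], rcond=None)
        # Score it on the rows that were left out.
        pred = coef[0] + X[test_idx] @ coef[1:]
        ss_res = np.sum((y[test_idx] - pred) ** 2)
        ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)


# Synthetic data (assumption): two explanatory variables and one response.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(15, 40, n)
x2 = rng.uniform(10, 35, n)
y = 1.0 + 0.2 * x1 + 0.1 * x2 + rng.normal(0, 0.5, n)
X = np.column_stack([x1, x2])

scores = k_fold_r2(X, y, k=5)
print("R² per fold:", scores)
print("mean held-out R²:", scores.mean())
```

If the held-out R² values are close to the R² obtained on the full data set, the model is probably not overfitting; a much lower held-out score suggests that it is.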