Month: December 2020
Why do we use adjusted R-squared?
The adjusted R-squared statistic is used to ensure that an increase in the magnitude of R-squared reflects a genuine improvement in model fit and not merely an increase in the number of variables.
What is the adjusted R-squared statistic?
Scientists observed that the value of the R-squared statistic rises as the number of independent variables increases, even when the added variables contribute little explanatory power, which casts doubt on the reliability of R-squared. The adjusted R-squared statistic was introduced to remedy this vulnerability: its formula penalizes, or counters, the rise in R-squared caused by adding variables by including the number of predictors in the denominator of the calculation.
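For reference, the standard adjusted R-squared formula, with n observations and p predictors, is:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

Because n - p - 1 shrinks as predictors are added, the penalty grows with every extra variable, so adjusted R-squared rises only when a new variable improves the fit by more than the penalty.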
What is the R-squared statistic?
The R-squared statistic tells us how much of the variation in the dependent variable our model is able to explain.
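The usual definition in terms of sums of squares is:

```latex
R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

A value of 0.8, for example, means the model explains about 80% of the variation in the dependent variable.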
What are the remedies for multicollinearity?
Three approaches can be used to deal with multicollinearity (a brief sketch follows the list):
- Drop the variable(s) responsible for multicollinearity
- Create a new variable by combining the correlated variables
- Leave it as is.
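A minimal pandas sketch of the first two remedies, assuming a DataFrame `X` of independent variables with two correlated columns named `height_cm` and `height_in` (the data and column names are hypothetical):

```python
import pandas as pd

# Hypothetical data: height_cm and height_in carry the same information.
X = pd.DataFrame({
    "height_cm": [170, 165, 180, 175],
    "height_in": [66.9, 65.0, 70.9, 68.9],
    "weight_kg": [68, 60, 80, 75],
})

# Remedy 1: drop one of the correlated variables.
X_dropped = X.drop(columns=["height_in"])

# Remedy 2: combine the correlated variables into a single feature
# (here, a simple average after converting to a common unit).
X_combined = X.copy()
X_combined["height"] = (X["height_cm"] + X["height_in"] * 2.54) / 2
X_combined = X_combined.drop(columns=["height_cm", "height_in"])
```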
How to detect multicollinearity?
The two prominent ways to detect multicollinearity are (1) plotting a correlation heat map and (2) calculating the variance inflation factor (VIF) of each independent variable.
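A short sketch of both checks, assuming a pandas DataFrame `X` that holds only the independent variables (statsmodels and seaborn are used here; the column names and values are placeholders):

```python
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# X is assumed to be a DataFrame of independent variables only.
X = pd.DataFrame({
    "tv_spend":     [230, 44, 17, 151, 180],
    "radio_spend":  [38, 39, 45, 41, 11],
    "online_spend": [69, 45, 69, 58, 58],
})

# (1) Correlation heat map of the independent variables.
sns.heatmap(X.corr(), annot=True, cmap="coolwarm")

# (2) VIF of each independent variable. A constant is added so the
# auxiliary regressions include an intercept; a common rule of thumb
# treats VIF > 5 (or > 10) as a sign of problematic multicollinearity.
X_const = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print(vif)
```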
What is multicollinearity?
Multicollinearity is observed when the independent variables are correlated with one another, i.e. the change in one independent variable can be explained, to some extent, by the others.
What is correlation?
Correlation is a statistical relationship between two variables in which variation in one is accompanied by variation in the other, whether or not both share the same source/cause of change.
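The most commonly used measure is the Pearson correlation coefficient, which for two variables x and y is:

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}
```

r ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).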
In regression analysis, what does the term residuals signify?
A residual is the difference between the actual value of the dependent variable and its predicted value.
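In symbols, for the i-th observation, with y_i the actual value and \hat{y}_i the model's prediction:

```latex
e_i = y_i - \hat{y}_i
```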
What is gradient descent, and why is it used?
The gradient descent update rule relates the current values of the slope (say m) and intercept (say c) of the regression line to their next values through the error (cost) they produce, so that the error decreases from one iteration to the next.
The error term is differentiated (partial derivatives) with respect to m and c to obtain the gradient, and m and c are then moved a small step (scaled by a learning rate) in the direction opposite to the gradient. The process repeats until the first-order derivatives are close to zero, which happens at the minimum of the error, and the resulting values of m and c give the best-fit line.
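A minimal gradient-descent sketch for simple linear regression, assuming mean squared error as the cost; the data, learning rate, and iteration count below are illustrative placeholders:

```python
import numpy as np

# Illustrative data: y is roughly 2x + 1 with a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

m, c = 0.0, 0.0          # initial slope and intercept
learning_rate = 0.01
n = len(x)

for _ in range(5000):
    y_pred = m * x + c
    error = y_pred - y
    # Partial derivatives of the mean squared error with respect to m and c.
    dm = (2.0 / n) * np.dot(error, x)
    dc = (2.0 / n) * error.sum()
    # Step in the direction opposite to the gradient.
    m -= learning_rate * dm
    c -= learning_rate * dc

print(f"best-fit line: y = {m:.2f}x + {c:.2f}")
```

Each iteration nudges m and c a little further downhill on the error surface, so after enough steps they settle near the values that give the best-fit line.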