Computer Programming

Machine Learning | Multiple Linear Regression

In the last post we learnt about linear regression with one variable. Its hypothesis function was:

h_\theta(x) = \theta_0 + \theta_1 x

which is a straight line.

What if we have more than one independent variable or feature? Then our hypothesis function could look like this, with each power of x treated as a separate feature:

h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3
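As a minimal sketch of this idea, each power of x can be treated as one feature, and the hypothesis is then the sum of each parameter times its feature. The helper names `polynomial_features` and `predict` and the parameter values are made up for illustration; they do not come from any library.

```python
def polynomial_features(x, degree):
    """Return [1, x, x^2, ..., x^degree]; the leading 1 pairs with theta_0."""
    return [x ** j for j in range(degree + 1)]


def predict(theta, x):
    """Evaluate h_theta(x) = theta_0 + theta_1*x + theta_2*x^2 + ..."""
    features = polynomial_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, features))


# Hypothetical parameter values, chosen only to show the arithmetic.
theta = [1.0, 2.0, 0.5, 0.25]
print(polynomial_features(2.0, 3))  # [1.0, 2.0, 4.0, 8.0]
print(predict(theta, 2.0))          # 1 + 4 + 2 + 2 = 9.0
```

Adding another term to the hypothesis only means adding one more entry to `theta`; the model stays linear in its parameters.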

And our graphs (y vs x plots) would look like:

As we can see, with more features our model can fit the training data exactly. This is called overfitting. With too few features the model doesn't fit the data well; this is called underfitting. Underfitting results in wrong predictions, but overfitting is not good for our machine learning model either. Although an overfitted model matches the training set well, it might not work well with test or real-world data. It is biased towards our training set and only works well with the same or similar input, so it performs poorly on data that differs from the training set.
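The effect can be tried out with a small experiment. The sketch below is only an illustration under assumed conditions: the data is synthetic (a noisy sine curve, chosen arbitrarily), and the polynomials are fitted with NumPy's `polyfit`. It compares the error on the training points with the error on separate test points for hypotheses of degree 1, 3 and 9.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_function(x):
    # Assumed underlying relationship, used only to generate toy data.
    return np.sin(2 * np.pi * x)


# 10 noisy training points and 50 noisy test points on the same interval.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_function(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = true_function(x_test) + rng.normal(scale=0.2, size=x_test.size)


def mse(coeffs, x, y):
    """Mean squared error of the polynomial with these coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

With 10 training points, the degree 9 polynomial has enough parameters to pass through every point, so its training error is essentially zero. The straight line (degree 1) cannot follow the curve and underfits. The test error is the number to watch: a very low training error combined with a much higher test error is the usual sign of overfitting.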

How to choose the number of parameters?

To solve underfitting, we can introduce more features. But how do we solve overfitting?

To avoid overfitting:

  1. One natural way is to reduce the number of parameters, so that the model doesn't just work well with the given set of data.
  2. The second way is regularisation: reducing the magnitude of the parameters \theta_j while keeping all the features.

Regularisation

For instance, suppose our hypothesis function is:

h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \dots

and we want to keep \theta_3 and the remaining higher-order parameters small. To do so, we add a penalty on the parameters to the cost function, which becomes:

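J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]

In this formula, m is the number of training examples, n is the number of features, and y^{(i)} is the known output for the i-th training example.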

Here, \lambda \sum_{j=1}^{n} \theta_j^2 is the regularisation term. Remember, our goal is to minimise the cost function. By introducing regularisation we increase the cost for large values of the parameters \theta_j, so minimising the cost pushes them towards smaller values.
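The cost function can be written out in a few lines of plain Python. This is a sketch under stated assumptions: it uses the usual squared-error form of the cost, a polynomial hypothesis, and leaves \theta_0 out of the penalty because the regularisation sum starts at j = 1. The function names and the toy data are hypothetical.

```python
def hypothesis(theta, x):
    """Polynomial hypothesis: theta[0] + theta[1]*x + theta[2]*x**2 + ..."""
    return sum(t * x ** j for j, t in enumerate(theta))


def regularised_cost(theta, xs, ys, lam):
    """Squared-error cost plus lam * (sum of squared parameters), over 2m."""
    m = len(xs)
    squared_error = sum((hypothesis(theta, x) - y) ** 2 for x, y in zip(xs, ys))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta_0 is not penalised
    return (squared_error + penalty) / (2 * m)


# Toy data lying exactly on y = 1 + 2x.
xs = [0, 1, 2]
ys = [1, 3, 5]

print(regularised_cost([1, 2, 0], xs, ys, lam=0))  # perfect fit, no penalty
print(regularised_cost([1, 2, 0], xs, ys, lam=3))  # same fit, penalty added
print(regularised_cost([1, 2, 1], xs, ys, lam=3))  # extra x^2 term costs more
```

With `lam=0` the perfect fit costs nothing. With `lam=3` the same parameters cost 3 * (2^2 + 0^2) / (2 * 3) = 2.0, purely because of their size, which is the pressure that keeps the parameters small during minimisation.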

Choosing the Regularisation factor (\lambda)

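If \lambda is too large, it may cause underfitting, because the parameters are pushed too close to zero. If \lambda is too small, regularisation may have almost no effect and the model can still overfit. So we have to choose the regularisation factor according to our data.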
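To see how \lambda pulls a parameter towards zero, consider a deliberately simplified case: a model with a single parameter and no intercept, h_\theta(x) = \theta x (an assumption made here only to keep the algebra short). Setting the derivative of the regularised squared-error cost to zero gives \theta = \sum x y / (\sum x^2 + \lambda). The sketch below evaluates this for a few values of \lambda; the function name and the data are hypothetical.

```python
def fit_theta(xs, ys, lam):
    """Minimiser of (1/2m) * [sum((theta*x - y)^2) + lam * theta^2]."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + lam)


# Toy data lying exactly on y = 2x, so the unregularised answer is theta = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 1000.0):
    print(f"lambda = {lam:7.1f} -> theta = {fit_theta(xs, ys, lam):.4f}")
```

With \lambda = 0 the fit recovers \theta = 2. With \lambda = 14 it is halved to 1, and with \lambda = 1000 it is close to 0, so the model predicts almost nothing: that is the underfitting caused by a regularisation factor that is too large.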

NOTE: Certain ML libraries such as XGBoost have two regularisation factors: L1 (\alpha) and L2 (\lambda). The squared penalty on \theta_j used in the cost function above is the L2 form.

Reference:

Coursera