Hello students! I apologize if this is boring for some of you, but I feel it's so helpful to try to explain even the most basic things (because you realize that maybe you don't even understand something yourself until you try to explain it), so I will continue. Feel free (obviously) to skip if you're not at all interested and maybe next time I'll post some pictures of mountains or French cakes or something (stay tuned to this post however and you might be surprised! hey.. you can't just cheat and scroll down looking for the pictures right away.. that's not how this thing works.)
One thing you may be asking yourself is why we need any math at all to predict y given x. We can easily plot the data and just predict a y value for any x by inspection. We could just take a look at the data and draw a line ourselves. Well, we started with a super easy example! Normally, we will have more than one feature. Consider whether a bank is going to accept a loan that you propose. The output y is YES or NO, and the input x should generally consist of more than just the amount of the loan. They probably will want to consider the amount, your salary, your credit history, what the loan is for, etc. All of these are called.... that's right, features. Let's go back to those (grrrrr) cats. Say we still want to predict the weight of the heart, but now instead of just knowing the body weight, we also know the sex (seems reasonable, right?). We need a way for the computer to understand the feature 'sex', so let's just say that for a female cat, sex=1, and for a male cat, sex=2. Then we can plot this data, but we need a 3d plot now because the response (the heart weight) will be on the vertical axis (the height), and the other two variables get the remaining two axes (think of them as the length and depth of the box). It looks something like this:
So here the red points near the 'front' of the screen are female cats and the black points near the 'back' of the screen are males. We could still draw some kind of predictor, but now it wouldn't just be a line, it would have to be a plane (like a flat rectangle). We can still learn this using our linear regression, where the predictor will now be h = f(b,s) = w0 + w1*b + w2*s (I changed the function name from h to f because here h stands for the heart weight, the 'height' on the plot, and we don't want to get confused). So, given the body weight b and the sex s, we want to predict the heart weight h. When we run our gradient descent algorithm, we get the following learned parameters:
w0 = -0.4150
w1 = 4.0758
w2 = -0.0821
So, things are almost the same as before for the first two parameters, but the new parameter w2 shows that (because we chose s=1 for females and s=2 for males) we'll adjust the estimate of the heart weight down for males twice as much as we do for females: the sex term contributes w2*1 = -0.0821 to the prediction for a female but w2*2 = -0.1642 for a male. This also makes some intuitive sense because males may be bigger in general, but we might expect heart weights to be more uniform. We can still graph the predictor (a plane now) and it looks like this:
Note that this is not a great example because one of the axes (sex) is pretty limited in range (it only takes two values), but it helps to illustrate the point (I hope). If we had just one more feature, we could no longer visualize the data very easily, and you start to see why we need computational approaches. Even in the single-feature case, although we could draw a line through the data by eye that approximates the relationship, if we want to be precise and formal about things, getting the "best" fit is done rigorously this way.
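By the way, if you'd like to see roughly what the gradient descent updates look like with two features, here's a minimal sketch in Python. The numbers in the arrays are made up just for illustration (they're not the real cat data), and the learning rate and number of steps are guesses you'd normally have to tune:

import numpy as np

# Made-up stand-ins for the cat data: body weight, sex (1=female, 2=male),
# and heart weight. A real run would use all of the training examples.
body = np.array([2.0, 2.4, 3.0, 3.5, 2.7, 3.9])
sex = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
heart = np.array([7.8, 9.1, 11.6, 13.7, 10.5, 15.2])

w0, w1, w2 = 0.0, 0.0, 0.0  # start all the parameters at zero
lr = 0.01                   # learning rate (step size) -- a guess, would need tuning

for step in range(20000):
    pred = w0 + w1 * body + w2 * sex   # current predictions of the plane f(b, s)
    err = pred - heart                 # how far off we are on each example
    # gradients of the mean squared error with respect to each parameter
    g0 = err.mean()
    g1 = (err * body).mean()
    g2 = (err * sex).mean()
    # take a small step downhill
    w0 -= lr * g0
    w1 -= lr * g1
    w2 -= lr * g2

print(w0, w1, w2)   # the learned parameters of the plane

With more features you'd do exactly the same thing, just with the features stored as the columns of a matrix so the updates can be written as vector operations.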
Ok, I'll give one more example that we can visualize (2 features), this time where both features can take lots of different values. Here, we want to predict the gas mileage (miles per gallon) of a car given its weight in tons and the horsepower of the engine. Note that generally we would use even more features (for example, maybe the number of cylinders in the engine), but we want to be able to visualize it for now, just for our edification. Here's the data along with the predictor plane fit using our linear model:
Not bad, right? Here we can really see how you need a plane and how both features contribute to the overall prediction. For a little more "understanding" of what this gives us in terms of the features, here are the parameters for the weight and for the engine horsepower:
weight = -3.87783
hp = -0.03177
So, we can see that the higher the weight, the lower the miles per gallon, and the higher the horsepower, the lower the miles per gallon. Again, intuitive. We also see that the weight parameter's magnitude is much bigger, which suggests weight matters more per unit than horsepower does, though to compare the two fairly we'd really want to put the features on a common scale first (horsepower comes in much bigger numbers than weight, so its parameter naturally comes out smaller).
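Just to make the plane concrete, here's how you'd actually use parameters like these to predict the mileage of a particular car. There's also an intercept w0 in the fit that I didn't list above, so the value below is just a stand-in for illustration:

# Using the fitted plane mpg = w0 + w_wt*weight + w_hp*hp to make a prediction.
w0 = 37.0        # intercept: a stand-in value, not the one from the actual fit
w_wt = -3.87783  # parameter for weight
w_hp = -0.03177  # parameter for horsepower

def predict_mpg(weight, hp):
    return w0 + w_wt * weight + w_hp * hp

print(predict_mpg(3.0, 120))   # a hypothetical car -> about 21.5 mpg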
When we're performing much more complex regression with many (maybe hundreds or even thousands) of features, of course there's no way that we can visualize it perfectly. But, we can determine a predictor function and a lot of the time do quite well.
One other thing you may be wondering about is why we have to use an incremental approach to learn the parameters. Well, in most interesting applications we do have to, but here we actually don't (I just showed it because it's the more general approach). It turns out that for linear regression we can get a closed-form (exact) solution without having to incrementally get there. I won't go through the math now, but if all of our training data is stored in a matrix X, where each row is a training example and each column represents a single feature (plus an extra column of all 1's so we also get the intercept w0), and y is the vector of values we want to predict, then we can get the exact values for the parameters (the w's) as follows:
w = (XᵀX)⁻¹Xᵀy
Cool, eh?
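Here's what that looks like in code, as a minimal numpy sketch (again with made-up numbers standing in for real data):

import numpy as np

# Made-up training data: each row is one example, each column is one feature.
X = np.array([[2.0, 1.0],
              [2.4, 1.0],
              [3.0, 1.0],
              [3.5, 2.0],
              [2.7, 2.0],
              [3.9, 2.0]])
y = np.array([7.8, 9.1, 11.6, 13.7, 10.5, 15.2])   # the values we want to predict

# prepend a column of ones so the first parameter plays the role of w0
X1 = np.column_stack([np.ones(len(X)), X])

# the closed-form solution w = (X^T X)^{-1} X^T y
# (np.linalg.solve is a numerically nicer way of applying the inverse)
w = np.linalg.solve(X1.T @ X1, X1.T @ y)
print(w)   # -> [w0, w1, w2]

Up to numerical noise, these are the same parameters that gradient descent would eventually crawl to, just computed in one shot.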
Perhaps next time we'll look at either binary classification (perhaps given the body and heart weight of a cat, predict its sex) or maybe some more goodies on linear regression! EXCITEMENT AWAITS!!!