Assignment 4

Due Saturday 11:59 pm (Week 10)

In this assignment, you will research the Decision Tree Regressor and other common regression models such as Ridge Regression, Lasso Regression, and Logistic Regression.

As in Assignment 3, you will need to research these regression models, but you do not need to find their mathematical formulas. You are only required to know what they are and how and when to use them.

1. Research (20 points)

Find the answers to the following questions to help you understand how these regression models work. Write your answer to each question in your write-up.

• What is a Decision Tree Regressor?

• What is the difference between a Decision Tree Regressor and a Decision Tree Classifier?

• What is feature importance in a Decision Tree Regressor?

• What is Ridge Regression?

• What is Lasso Regression?

• What is Logistic Regression?

2. Use the Boston housing data again (Assignment 2). Since we already did the EDA of this dataset in Assignment 2, we can focus on applying each regression model discussed above. (50 points; each regression counts as 10 points.)

• For Linear Regression, Ridge Regression, Lasso Regression, and Logistic Regression, find the correlations between all the independent variables and the dependent variable. Select the feature variables that correlate with the house price. (To use the logistic regression model, you may have to separate the house price into low, medium, and high categories.)

For the Decision Tree Regressor, use all features to predict the Boston house price.

• Apply Linear Regression, Ridge Regression, Lasso Regression, Logistic Regression, and the Decision Tree Regressor to the data. Your assignment should have at least 5 models.

• Compare the MSE, RMSE, and accuracy of the models.

• Choose the model(s) that you think are appropriate and predict the house price.

• For the Decision Tree Regressor only, visualize the tree and plot the feature importances. Identify which feature has the highest importance and which has the second highest.

• Interpret the results.
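The steps above can be sketched in Python with scikit-learn as follows. This is only a minimal illustration under stated assumptions, not a required solution. The data is a small synthetic stand-in, not the real Boston dataset: the column names RM, LSTAT, and MEDV are borrowed from it, while NOISE is made up. The 0.3 correlation threshold, the alpha values, the tree depth, and the split of prices into three equal-sized groups (low, medium, high) are arbitrary choices that you should replace with your own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic placeholder data (ASSUMPTION: replace with your Boston housing DataFrame).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n),     # rooms per dwelling
    "LSTAT": rng.uniform(2, 35, n),    # % lower-status population
    "NOISE": rng.normal(0, 1, n),      # hypothetical irrelevant feature
})
df["MEDV"] = 5 * df["RM"] - 0.6 * df["LSTAT"] + rng.normal(0, 2, n)

# Select features by absolute correlation with the price (threshold is arbitrary).
corr = df.corr()["MEDV"].drop("MEDV")
selected = corr[corr.abs() > 0.3].index.tolist()

X_all = df.drop(columns="MEDV")
y = df["MEDV"]
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y, test_size=0.25, random_state=0
)

# Four regressors: the linear models use the selected features,
# the decision tree uses all features.
regressors = {
    "Linear": (LinearRegression(), selected),
    "Ridge": (Ridge(alpha=1.0), selected),
    "Lasso": (Lasso(alpha=0.1), selected),
    "DecisionTree": (DecisionTreeRegressor(max_depth=4, random_state=0),
                     list(X_all.columns)),
}
results = {}
for name, (model, cols) in regressors.items():
    model.fit(X_train[cols], y_train)
    mse = mean_squared_error(y_test, model.predict(X_test[cols]))
    results[name] = {"MSE": mse, "RMSE": np.sqrt(mse)}

# Logistic Regression is a classifier, so bin the price into
# 0 = low, 1 = medium, 2 = high using cut points from the training set.
edges = y_train.quantile([1 / 3, 2 / 3]).to_numpy()
y_train_cls = np.digitize(y_train, edges)
y_test_cls = np.digitize(y_test, edges)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train[selected], y_train_cls)
acc = accuracy_score(y_test_cls, clf.predict(X_test[selected]))

# Feature importance and a text rendering of the fitted tree.
tree = regressors["DecisionTree"][0]
importance = pd.Series(
    tree.feature_importances_, index=X_all.columns
).sort_values(ascending=False)

print(pd.DataFrame(results).T)
print("Logistic Regression accuracy:", acc)
print(importance.head(2))
print(export_text(tree, feature_names=list(X_all.columns)))
```

Note that MSE and RMSE apply to the four regressors, while accuracy applies only to the Logistic Regression (classification) model. For a graphical tree and a bar chart of the importances, `sklearn.tree.plot_tree` and `importance.plot.bar()` with matplotlib are one option.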

General Requirements for all your assignments.

You will need to write up your findings, interpretations, and results (30 points) for this assignment. Use the Machine Learning Workflow from Week 6 as a guideline for your assignment. It is a good idea to include screenshots of your code, results, and graphs so that you can explain your findings alongside them. (It is also easier for me to follow you when I read your paper.) A PDF file is required. There is no page limit, but try to be straightforward with your answers.

Also submit the .py file that you used to complete your assignment. (It may duplicate, or partly duplicate, the screenshots that you inserted in your paper, but that is okay. I would like to look over your code.)