Background

· For this assignment, you will use the Cleansing_Week4.R script and the data.csv file.

· The code is written in the R programming language. Open RStudio, then open the file Cleansing_Week4.R. Follow the steps in the code and answer each of the questions below.

· Some manipulation and rework of the code is required. The steps are explained in detail in the code.

· Complete Step 0 through Step 5, inclusive.

Instructions

You should complete all the steps provided in the code and answer the following questions in a report.

After you complete the readings and watch the required videos, proceed with this implementation and report.

1.  Introduction

· Provide information about the language, GUI, and data file you are using in this assignment. Use references to support the importance of the language, its advantages and disadvantages, and how it relates to other languages used in data science.

· Provide the value stored in the variable Randomizer in your code, along with your Student ID, in this section. Take a screenshot of the output in your Console and paste it here.

2.  Data Presentation before Cleansing

Run Step 0 and answer the following questions.

A. What is the data file format, and which command did you use to read the data? Does the file have headers?

B. How many observations are there?

C. How many variables are in the data?

D. What is the purpose of the command str(df)? Take a screenshot of the output in your Console and paste it here.

E. What does the command summary(df) do? Find out what its output means and explain it in your paper.

F. Answer the following questions:

a. What types of variables does your file include?

b. What are the specific data types?

c. Are they read properly?

d. Are there any issues?

e. Does your file include both NAs and blanks? How did you identify them?

f. How many NAs do you have?

g. How many blanks do you have?
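
The questions in this section can be explored with a few base R calls. The sketch below is only an illustration: it reads a small made-up CSV from a string instead of data.csv, and the column names (id, age, city, income) are hypothetical. The actual Step 0 code in Cleansing_Week4.R may differ.

```r
# Toy stand-in for data.csv (hypothetical columns); the real file will differ.
csv_text <- "id,age,city,income
1,34,Boston,52000
2,NA,,61000
3,45,Denver,NA
4,29,,48000
5,NA,Austin,75000"

# read.csv() treats the first line as headers by default (header = TRUE).
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(df)      # compact structure: dimensions, column types, first values
summary(df)  # per-column summary; numeric columns also report NA counts

n_obs  <- nrow(df)  # number of observations (rows)
n_vars <- ncol(df)  # number of variables (columns)

# NAs per column: is.na() gives a logical matrix, colSums() counts the TRUEs.
na_per_col <- colSums(is.na(df))

# Blanks per column: empty strings that are not already NA.
blank_per_col <- sapply(df, function(x) sum(!is.na(x) & x == ""))

total_na    <- sum(na_per_col)
total_blank <- sum(blank_per_col)
```

With the default settings, read.csv() turns empty fields in numeric columns into NA but leaves empty fields in character columns as empty strings, which is why NAs and blanks have to be counted separately here.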

3.  Data Preprocessing

A. Summarize the preprocessing steps you expect to complete before you run the subsequent steps in your code. Recommend methods of imputing NAs in each of the variables, and/or observations, where needed. Review the literature and suggest methods of imputation for categorical and numeric variables.

B. Run Step 1 in your code. How did this step affect the NAs and the blanks in your variables? (You can run summary(df) to determine this.) Take a screenshot of the output in your Console and paste it here.

C. For each of the numeric variables, record the mean and the median; for the categorical variables, record the counts. Present them in a table in your paper.

D. Run Steps 2, 3, and 4. How many observations include NAs? How many variables include NAs? What percentage of rows and columns have NAs? If we were to eliminate those, what would be the approximate size of the remaining dataset? Is this a proper method of handling the missing values?

E. Run Step 5 and answer the following questions:

a. What is the method of imputation that is described? What does linear interpolation mean? Research and discuss whether this is an appropriate method. This method of imputation has now changed some of the statistics of your variables.

· Run summary(df) and compare the output with the previous statistics. Take a screenshot of the output in your Console and paste it here.

· Do you observe any undesired changes? Explain in detail. How could you have avoided them?

· Are there any NAs left in your file?
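
The ideas behind items D and E can be sketched on a toy data frame. This is a hedged illustration, not the code from Cleansing_Week4.R: the data and column names are made up, the helper interpolate_na() is hypothetical, and it uses base R's approx() for linear interpolation, whereas the script may rely on a different function or package.

```r
# Toy data (hypothetical); blanks in 'city' stand in for missing text values.
df <- data.frame(
  age    = c(30, NA, 40, NA, 60),
  income = c(50, 55, NA, 65, 70),
  city   = c("Boston", "", "Denver", "Austin", ""),
  stringsAsFactors = FALSE
)

# Recode blanks as NA so every missing value is counted the same way.
df$city[df$city == ""] <- NA

# Rows and columns containing at least one NA, and their share of the data.
rows_with_na <- sum(!complete.cases(df))
cols_with_na <- sum(colSums(is.na(df)) > 0)
pct_rows <- 100 * rows_with_na / nrow(df)
pct_cols <- 100 * cols_with_na / ncol(df)

# What would remain after deleting every row that has an NA.
remaining <- na.omit(df)

# Linear interpolation of interior NAs in a numeric vector: each gap is
# filled from the straight line joining its nearest known neighbours.
# Needs at least two known values; leading or trailing NAs stay NA.
interpolate_na <- function(x) {
  idx   <- seq_along(x)
  known <- !is.na(x)
  approx(idx[known], x[known], xout = idx)$y
}

mean_before <- mean(df$age, na.rm = TRUE)
df$age    <- interpolate_na(df$age)
df$income <- interpolate_na(df$income)
mean_after <- mean(df$age)
```

In this toy case, deleting incomplete rows would leave a single observation, and interpolation shifts the mean of age slightly, which is the kind of change summary(df) can reveal. Interpolation applies only to numeric columns, so the NAs in city remain and would need a different method, such as imputing the most frequent category.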

Length: This assignment must be 4-5 pages (excluding the title and reference pages).
