Grand Canyon University Data Analytics Naïve Analysis & Summary
The Universal Bank data set will be needed for this assignment. To access the data set, review the “Universal Bank” topic Resource.
Part 1: Using R, complete all portions of Problem 1 in Chapter 8. Note that you can use # comments in your code to address the narrative parts of the problem. Be sure to include your R code and R output as a .txt file with your submission.
Problems
Personal Loan Acceptance. The file UniversalBank.csv contains data on 5000 customers of Universal Bank. The data include customer demographic information (age, income, etc.), the customer’s relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign. In this exercise, we focus on two predictors: Online (whether or not the customer is an active user of online banking services) and Credit Card (abbreviated CC below; whether or not the customer holds a credit card issued by the bank), and the outcome Personal Loan (abbreviated Loan below).
Partition the data into training (60%) and validation (40%) sets.
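A minimal sketch of a 60/40 random partition in base R follows. It uses a synthetic data frame in place of UniversalBank.csv so that it runs on its own; the column names Loan, Online, and CC, the generating probabilities, and the seed are assumptions made for illustration and will not match the real file.

```r
set.seed(1)  # fixed seed so the split is reproducible (arbitrary choice)

# Synthetic stand-in for UniversalBank.csv (assumption: three 0/1 columns;
# the real file has more columns and possibly different names)
bank <- data.frame(
  Loan   = rbinom(5000, 1, 0.096),
  Online = rbinom(5000, 1, 0.60),
  CC     = rbinom(5000, 1, 0.30)
)

# Draw 60% of the row indices at random for training; the rest is validation
train_idx <- sample(nrow(bank), size = round(0.6 * nrow(bank)))
train <- bank[train_idx, ]
valid <- bank[-train_idx, ]

dim(train)
dim(valid)
```

With the real data, the synthetic data frame would be replaced by a call such as read.csv() on the file, and the column names adjusted to whatever the file actually contains.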
(a) Create a pivot table for the training data with Online as a column variable, CC as a row variable, and Loan as a secondary row variable. The values inside the table should convey the count. In R use functions melt() and cast(), or function table().
(b) Consider the task of classifying a customer who owns a bank credit card and is actively using online banking services. Looking at the pivot table, what is the probability that this customer will accept the loan offer? [This is the probability of loan acceptance (Loan = 1) conditional on having a bank credit card (CC = 1) and being an active user of online banking services (Online = 1)].
(c) Create two separate pivot tables for the training data. One will have Loan (rows) as a function of Online (columns) and the other will have Loan (rows) as a function of CC.
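The pivot tables described here can be sketched with table() and ftable() from base R. The example below runs on synthetic training data, so the counts it prints are not the assignment's answers; the column names Loan, Online, and CC and the generating probabilities are assumptions.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic stand-in for the training partition (assumption: 0/1 columns
# named Loan, Online, and CC; the real file's column names may differ)
train <- data.frame(
  Loan   = rbinom(3000, 1, 0.10),
  Online = rbinom(3000, 1, 0.60),
  CC     = rbinom(3000, 1, 0.30)
)

# Three-way counts: CC and Loan as row variables, Online as the column variable
counts <- table(CC = train$CC, Loan = train$Loan, Online = train$Online)
ftable(counts, row.vars = c("CC", "Loan"), col.vars = "Online")

# P(Loan = 1 | CC = 1, Online = 1) read directly from the three-way counts:
# acceptors among customers with CC = 1 and Online = 1
p_exact <- counts["1", "1", "1"] / sum(counts["1", , "1"])
p_exact

# Two-way tables: Loan (rows) by Online, and Loan (rows) by CC
loan_by_online <- table(Loan = train$Loan, Online = train$Online)
loan_by_cc     <- table(Loan = train$Loan, CC = train$CC)
loan_by_online
loan_by_cc
```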
(d) Compute the following quantities [P(A | B) means “the probability of A given B”]:
- P(CC = 1 | Loan = 1) (the proportion of credit card holders among the loan acceptors)
- P(Online = 1 | Loan = 1)
- P(Loan = 1) (the proportion of loan acceptors)
- P(CC = 1 | Loan = 0)
- P(Online = 1 | Loan = 0)
- P(Loan = 0)
(e) Use the quantities computed above to compute the naive Bayes probability P(Loan = 1 | CC = 1, Online = 1).
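Under the naive Bayes assumption that CC and Online are independent given Loan, the six quantities combine as: numerator = P(CC = 1 | Loan = 1) × P(Online = 1 | Loan = 1) × P(Loan = 1), and denominator = numerator + P(CC = 1 | Loan = 0) × P(Online = 1 | Loan = 0) × P(Loan = 0). The sketch below computes this on synthetic data; the column names, probabilities, and variable names are assumptions, and the printed value is not the assignment's answer.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic stand-in for the training partition (assumed column names)
train <- data.frame(
  Loan   = rbinom(3000, 1, 0.10),
  Online = rbinom(3000, 1, 0.60),
  CC     = rbinom(3000, 1, 0.30)
)

loan1 <- train$Loan == 1  # loan acceptors
loan0 <- train$Loan == 0  # non-acceptors

# The six quantities, estimated as training-set proportions
p_cc_l1 <- mean(train$CC[loan1] == 1)      # P(CC = 1 | Loan = 1)
p_on_l1 <- mean(train$Online[loan1] == 1)  # P(Online = 1 | Loan = 1)
p_l1    <- mean(loan1)                     # P(Loan = 1)
p_cc_l0 <- mean(train$CC[loan0] == 1)      # P(CC = 1 | Loan = 0)
p_on_l0 <- mean(train$Online[loan0] == 1)  # P(Online = 1 | Loan = 0)
p_l0    <- mean(loan0)                     # P(Loan = 0)

# Naive Bayes estimate of P(Loan = 1 | CC = 1, Online = 1)
numerator   <- p_cc_l1 * p_on_l1 * p_l1
denominator <- numerator + p_cc_l0 * p_on_l0 * p_l0
p_nb <- numerator / denominator
p_nb
```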
(f) Compare this value with the one obtained from the pivot table in (b). Which is a more accurate estimate?
(g) Which of the entries in this table are needed for computing P(Loan = 1 | CC = 1, Online = 1)? In R, run naive Bayes on the data. Examine the model output on training data, and find the entry that corresponds to P(Loan = 1 | CC = 1, Online = 1). Compare this to the number you obtained in (e).
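One way to run naive Bayes in R is the naiveBayes() function from the e1071 package. The assignment does not name a package, so that choice is an assumption, as are the synthetic data and the column names below. The sketch fits the model with the predictors stored as factors and then asks for the posterior probability for a customer with CC = 1 and Online = 1.

```r
library(e1071)  # assumption: the e1071 package is installed

set.seed(1)  # reproducible synthetic data
n <- 3000

# Synthetic stand-in for the training partition; factors so that
# naiveBayes() treats every variable as categorical
train <- data.frame(
  Loan   = factor(rbinom(n, 1, 0.10), levels = c(0, 1)),
  Online = factor(rbinom(n, 1, 0.60), levels = c(0, 1)),
  CC     = factor(rbinom(n, 1, 0.30), levels = c(0, 1))
)

nb <- naiveBayes(Loan ~ Online + CC, data = train)
nb$apriori  # class counts for Loan = 0 and Loan = 1
nb$tables   # conditional probability tables, P(predictor | Loan)

# Posterior class probabilities for a customer with Online = 1 and CC = 1
new_customer <- data.frame(
  Online = factor(1, levels = c(0, 1)),
  CC     = factor(1, levels = c(0, 1))
)
post <- predict(nb, new_customer, type = "raw")
post[, "1"]  # model estimate of P(Loan = 1 | CC = 1, Online = 1)
```

The value in the "1" column of the posterior is the one to set against the hand-computed naive Bayes probability from (e).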
Part 2: How can the bank use the information about online customers and those with credit cards to inform its strategy for increasing the number of personal loans accepted by customers? Present your findings and recommendations to management in the form of an executive summary that includes relevant data, charts, and tables in Microsoft Word.