ID3 Algorithm – R Programming

School of Computer & Information Sciences

ITS 836 Data Science and Big Data Analytics

 


HW07 Q1 Apply the ID3 algorithm to demonstrate the decision tree for the data set


http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html

Select  Size    Color  Shape
yes     medium  blue   brick
yes     small   red    sphere
yes     large   green  pillar
yes     large   green  sphere
no      small   red    wedge
no      large   red    wedge
no      large   red    pillar
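ID3 chooses each split by information gain. A minimal base-R sketch of that computation for the table above (the data frame is transcribed from the table; the helper names are my own):

```r
# Shannon entropy of a vector of class labels, in bits
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

# Information gain of splitting `data` on `attr` with respect to `target`
info_gain <- function(data, attr, target) {
  branches <- split(data[[target]], data[[attr]])
  cond <- sum(sapply(branches, function(b) length(b) / nrow(data) * entropy(b)))
  entropy(data[[target]]) - cond
}

# The seven records from the slide
shapes <- data.frame(
  Select = c("yes", "yes", "yes", "yes", "no", "no", "no"),
  Size   = c("medium", "small", "large", "large", "small", "large", "large"),
  Color  = c("blue", "red", "green", "green", "red", "red", "red"),
  Shape  = c("brick", "sphere", "pillar", "sphere", "wedge", "wedge", "pillar")
)

sapply(c("Size", "Color", "Shape"), function(a) info_gain(shapes, a, "Select"))
```

Shape comes out with the largest gain, so ID3 splits on it first, matching the worked example at the link above.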

Back to HW07 Overview

HW07 Q2

Analyze the R code in section 7_1 to create the decision tree classifier for the dataset bank-sample.csv

 

Create and explain all plots and results

 

 

 


# install packages rpart, rpart.plot
# put this code into the RStudio source pane and execute lines via Ctrl+Enter
library("rpart")
library("rpart.plot")
setwd("c:/data/rstudiofiles/")
banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")

## drop a few columns to simplify the tree
drops <- c("age", "balance", "day", "campaign", "pdays", "previous", "month")
banktrain <- banktrain[, !(names(banktrain) %in% drops)]
summary(banktrain)

# Make a simple decision tree by only keeping the categorical variables
fit <- rpart(subscribed ~ job + marital + education + default + housing +
               loan + contact + poutcome,
             method="class", data=banktrain,
             control=rpart.control(minsplit=1),
             parms=list(split='information'))
summary(fit)

# Plot the tree
rpart.plot(fit, type=4, extra=2, clip.right.labs=FALSE, varlen=0, faclen=3)
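One way to produce the "results" the question asks about is to score the tree and tabulate a confusion matrix. A sketch, assuming `fit` and `banktrain` from the code above are still in the workspace:

```r
# assumes 'fit' and 'banktrain' exist from the rpart code above
tree_pred <- predict(fit, banktrain, type = "class")   # class predictions
conf <- table(predicted = tree_pred, actual = banktrain$subscribed)
conf                                                   # confusion matrix
sum(diag(conf)) / sum(conf)                            # training accuracy
```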


 


HW07 Q3

Explain how the random forest algorithm works
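In brief: a random forest grows many decision trees, each on a bootstrap sample of the training data, and each split considers only a random subset of the predictors; the forest classifies by majority vote over the trees, which reduces the variance of any single tree. A sketch using the `randomForest` package on the same bank data (the package choice and file name are assumptions carried over from the earlier slides):

```r
# install.packages("randomForest")  # if not already installed
library(randomForest)

banktrain <- read.table("bank-sample.csv", header = TRUE, sep = ",")
banktrain$subscribed <- as.factor(banktrain$subscribed)  # ensure classification

# 500 bootstrap trees; a random subset of predictors is tried at each split
rf <- randomForest(subscribed ~ ., data = banktrain,
                   ntree = 500, importance = TRUE)
rf             # OOB error estimate and confusion matrix
varImpPlot(rf) # which variables the forest relies on
```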


Use a decision tree classifier and a random forest

Attributes: sepal length, sepal width, petal length, petal width

All flowers contain a sepal and a petal

The iris flowers fall into three species (Setosa, Versicolor, Virginica), each with different measurements

R.A. Fisher, 1936


HW07 Q4 Using Iris Dataset
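A sketch of Q4 on the built-in `iris` data, fitting both classifiers on a train/test split (the seed and the 70/30 fraction are my choices):

```r
library(rpart)
library(randomForest)

set.seed(42)                                   # reproducible split
idx   <- sample(nrow(iris), 0.7 * nrow(iris))  # 70/30 train/test
train <- iris[idx, ]
test  <- iris[-idx, ]

# Decision tree classifier
tree      <- rpart(Species ~ ., data = train, method = "class")
tree_pred <- predict(tree, test, type = "class")
table(predicted = tree_pred, actual = test$Species)

# Random forest
rf      <- randomForest(Species ~ ., data = train, ntree = 100)
rf_pred <- predict(rf, test)
table(predicted = rf_pred, actual = test$Species)
```

Comparing the two confusion matrices shows how the forest's vote smooths out single-tree errors.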


Get data and the e1071 package

sample <- read.table("sample1.csv", header=TRUE, sep=",")
traindata <- as.data.frame(sample[1:14, ])
testdata <- as.data.frame(sample[15, ])
traindata  # lists train data
testdata   # lists test data, no Enrolls variable

install.packages("e1071", dep = TRUE)
library(e1071)  # contains the naiveBayes function
model <- naiveBayes(Enrolls ~ Age + Income + JobSatisfaction + Desire, traindata)
model  # generates model output
results <- predict(model, testdata)
results  # provides test prediction


HW07 Q5 Section 7.2 Naïve Bayes in R


7.3 Classifier Performance

# install some packages
install.packages("ROCR")
library(ROCR)
library(e1071)  # provides naiveBayes

# training set
banktrain <- read.table("bank-sample.csv", header=TRUE, sep=",")

# drop a few columns
drops <- c("balance", "day", "campaign", "pdays", "previous", "month")
banktrain <- banktrain[, !(names(banktrain) %in% drops)]

# testing set
banktest <- read.table("bank-sample-test.csv", header=TRUE, sep=",")
banktest <- banktest[, !(names(banktest) %in% drops)]

# build the naïve Bayes classifier
nb_model <- naiveBayes(subscribed ~ ., data=banktrain)

 

 


# perform on the testing set
nb_prediction <- predict(nb_model,
                         # remove column "subscribed"
                         banktest[, -ncol(banktest)],
                         type='raw')
score <- nb_prediction[, c("yes")]
actual_class <- banktest$subscribed == 'yes'
pred <- prediction(score, actual_class)
perf <- performance(pred, "tpr", "fpr")

plot(perf, lwd=2, xlab="False Positive Rate (FPR)",
     ylab="True Positive Rate (TPR)")
abline(a=0, b=1, col="gray50", lty=3)

## corresponding AUC score
auc <- performance(pred, "auc")
auc <- unlist(slot(auc, "y.values"))
auc


7.3 Diagnostics of Classifiers

We cover three classifiers:

Logistic regression, decision trees, and naïve Bayes

Tools to evaluate classifier performance:

Confusion matrix
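From a confusion matrix, the rates that the ROC code in 7.3 plots can be read off directly. A base-R illustration with made-up counts (not the bank results):

```r
# Confusion matrix: rows = predicted class, columns = actual class
# (the counts 3, 8, 2, 87 are invented for illustration)
conf <- matrix(c(3, 8,
                 2, 87),
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("yes", "no"),
                               actual    = c("yes", "no")))
TP <- conf["yes", "yes"]; FP <- conf["yes", "no"]
FN <- conf["no",  "yes"]; TN <- conf["no",  "no"]

accuracy <- (TP + TN) / sum(conf)
tpr <- TP / (TP + FN)   # true positive rate (recall)
fpr <- FP / (FP + TN)   # false positive rate
c(accuracy = accuracy, TPR = tpr, FPR = fpr)
```

Each (FPR, TPR) pair is one point on the ROC curve; the ROCR code sweeps the score threshold to trace out the whole curve.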

 


7.3 Diagnostics of Classifiers

Bank marketing example

Training set of 2,000 records

Test set of 100 records

 


HW07 Q7 Review calculations for the ID3 and naïve Bayes algorithms


Record  OUTLOOK   TEMPERATURE  HUMIDITY  WINDY  PLAY GOLF
X0      Rainy     Hot          High      False  No
X1      Rainy     Hot          High      True   No
X2      Overcast  Hot          High      False  Yes
X3      Sunny     Mild         High      False  Yes
4       Sunny     Cool         Normal    False  Yes
5       Sunny     Cool         Normal    True   No
6       Overcast  Cool         Normal    True   Yes
7       Rainy     Mild         High      False  No
8       Rainy     Cool         Normal    False  Yes
9       Sunny     Mild         Normal    False  Yes
10      Rainy     Mild         Normal    True   Yes
11      Overcast  Mild         High      True   Yes
12      Overcast  Hot          Normal    False  Yes
X13     Sunny     Mild         High      True   No
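To review both calculations on the table above, a base-R sketch: the PLAY GOLF entropy and the OUTLOOK information gain for ID3, and a naïve Bayes posterior for one new day (Rainy, Cool, High humidity, windy is my choice of query):

```r
# The fourteen records from the slide
golf <- data.frame(
  Outlook  = c("Rainy","Rainy","Overcast","Sunny","Sunny","Sunny","Overcast",
               "Rainy","Rainy","Sunny","Rainy","Overcast","Overcast","Sunny"),
  Temp     = c("Hot","Hot","Hot","Mild","Cool","Cool","Cool","Mild","Cool",
               "Mild","Mild","Mild","Hot","Mild"),
  Humidity = c("High","High","High","High","Normal","Normal","Normal","High",
               "Normal","Normal","Normal","High","Normal","High"),
  Windy    = c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
               TRUE,TRUE,FALSE,TRUE),
  Play     = c("No","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes",
               "Yes","Yes","Yes","No")
)

entropy <- function(x) { p <- table(x) / length(x); -sum(p * log2(p)) }
entropy(golf$Play)                 # overall class entropy, about 0.940 bits

# ID3: information gain of OUTLOOK (branch weights times branch entropies)
w    <- table(golf$Outlook) / nrow(golf)
cond <- sapply(split(golf$Play, golf$Outlook), entropy)
entropy(golf$Play) - sum(w * cond) # about 0.247

# Naive Bayes for a new day: Rainy, Cool, High, Windy = TRUE
posterior_num <- function(cls) {
  d <- golf[golf$Play == cls, ]
  mean(d$Outlook == "Rainy") * mean(d$Temp == "Cool") *
    mean(d$Humidity == "High") * mean(d$Windy) * mean(golf$Play == cls)
}
post <- c(Yes = posterior_num("Yes"), No = posterior_num("No"))
post / sum(post)                   # normalized posterior; No wins for this day
```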


 

Questions?
