
Introduction to Machine Learning (CS 5710)

Assignment 2

Due by 26th September (Thursday) 11:59pm

In this assignment, you need to complete the following four sections:

  1. KNN
  2. Linear regression
  3. Logistic regression
  4. Regularization

Submission guideline

  1. Open this notebook with Jupyter Notebook and start writing your code.
  2. After finishing writing your codes, click the Save button at the top of the Jupyter Notebook.
  3. Please make sure to have entered your UCM ID below.
  4. Select Cell -> All Output -> Clear. This will clear all the outputs from all cells (but will keep the content of all cells).
  5. Select Cell -> Run All. This will run all the cells in order.
  6. Once you’ve rerun everything, select File -> Download as -> HTML or PDF via LaTeX
  7. Look at the HTML/PDF file and make sure all your solutions are there, displayed correctly.
  8. Zip BOTH the HTML/PDF file and this .ipynb notebook (updated with your code).
  9. Submit your zipped file.

# Please Write Your UCM ID Here:

Section 1. KNN [30 pts]

The following KNN assignment is modified from Stanford CS231n. Please complete and hand in this completed worksheet.

In [1]:

# Run some setup code for this notebook.

from __future__ import print_function
import random
import numpy as np
from data_utils import load_CIFAR10
import matplotlib.pyplot as plt


# This is a bit of magic to make matplotlib figures appear inline in the notebook
# rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

Download data:

Once you have the starter code, you will need to download the CIFAR-10 dataset. Run the following from the assignment directory:

cd data
./get_datasets.sh

In [2]:

# Load the raw CIFAR-10 data.
cifar10_dir = 'data/cifar-10-batches-py'

# Cleaning up variables to prevent loading data multiple times (which may cause memory issue)
try:
   del X_train, y_train
   del X_test, y_test
   print('Clear previously loaded data.')
except:
   pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

In [4]:

# Subsample the data for more efficient code execution in this exercise
num_training = 5000
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]

num_test = 500
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]

In [5]:

# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)
(5000, 3072) (500, 3072)

To make things more structured, we now put everything together into the KNearestNeighbor class. You don't need to implement any function in this class yet; later, you will come back and implement the requested functions, per the instructions below.

In [6]:

import numpy as np
class KNearestNeighbor(object):
  """ a kNN classifier with L2 distance """

  def __init__(self):
    pass

  def train(self, X, y):
    """
    Train the classifier. For k-nearest neighbors this is just 
    memorizing the training data.

    Inputs:
    - X: A numpy array of shape (num_train, D) containing the training data
      consisting of num_train samples each of dimension D.
    - y: A numpy array of shape (N,) containing the training labels, where
         y[i] is the label for X[i].
    """
    self.X_train = X
    self.y_train = y
    
  def predict(self, X, k=1, num_loops=0):
    """
    Predict labels for test data using this classifier.

    Inputs:
    - X: A numpy array of shape (num_test, D) containing test data consisting
         of num_test samples each of dimension D.
    - k: The number of nearest neighbors that vote for the predicted labels.
    - num_loops: Determines which implementation to use to compute distances
      between training points and testing points.

    Returns:
    - y: A numpy array of shape (num_test,) containing predicted labels for the
      test data, where y[i] is the predicted label for the test point X[i].  
    """
    if num_loops == 0:
      dists = self.compute_distances_no_loops(X)
    elif num_loops == 1:
      dists = self.compute_distances_one_loop(X)
    elif num_loops == 2:
      dists = self.compute_distances_two_loops(X)
    else:
      raise ValueError('Invalid value %d for num_loops' % num_loops)

    return self.predict_labels(dists, k=k)

  def compute_distances_two_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a nested loop over both the training data and the 
    test data.

    Inputs:
    - X: A numpy array of shape (num_test, D) containing test data.

    Returns:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      is the Euclidean distance between the ith test point and the jth training
      point.
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
      for j in range(num_train):
        #####################################################################
        # TODO:                                                             #
        # Compute the l2 distance between the ith test point and the jth    #
        # training point, and store the result in dists[i, j]. You should   #
        # not use a loop over dimension.                                    #
        #####################################################################

        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
    return dists

  def compute_distances_one_loop(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using a single loop over the test data.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
      #######################################################################
      # TODO:                                                               #
      # Compute the l2 distance between the ith test point and all training #
      # points, and store the result in dists[i, :].                        #
      #######################################################################
      #######################################################################
      #######################################################################
    return dists

  def compute_distances_no_loops(self, X):
    """
    Compute the distance between each test point in X and each training point
    in self.X_train using no explicit loops.

    Input / Output: Same as compute_distances_two_loops
    """
    num_test = X.shape[0]
    num_train = self.X_train.shape[0]
    dists = np.zeros((num_test, num_train)) 
    #########################################################################
    # TODO:                                                                 #
    # Compute the l2 distance between all test points and all training      #
    # points without using any explicit loops, and store the result in      #
    # dists.                                                                #
    #                                                                       #
    # You should implement this function using only basic array operations; #
    # in particular you should not use functions from scipy.                #
    #                                                                       #
    # HINT: Try to formulate the l2 distance using matrix multiplication    #
    #       and two broadcast sums.                                         #
    #########################################################################

    #########################################################################
    #                         END OF YOUR CODE                              #
    #########################################################################
    return dists

  def predict_labels(self, dists, k=1):
    """
    Given a matrix of distances between test points and training points,
    predict a label for each test point.

    Inputs:
    - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
      gives the distance between the ith test point and the jth training point.

    Returns:
    - y: A numpy array of shape (num_test,) containing predicted labels for the
      test data, where y[i] is the predicted label for the test point X[i].  
    """
    num_test = dists.shape[0] #num_test has same no of elements as testing data
    y_pred = np.zeros(num_test)
    for i in range(num_test):
      # A list of length k storing the labels of the k nearest neighbors to
      # the ith test point.

      #########################################################################
      # TODO:                                                                 #
      # Use the distance matrix to find the k nearest neighbors of the ith    #
      # testing point, and use self.y_train to find the labels of these       #
      # neighbors. Store these labels in closest_y.                           #
      # Hint: Look up the function numpy.argsort.                             #
      #########################################################################
      #########################################################################
      # TODO:                                                                 #
      # Now that you have found the labels of the k nearest neighbors, you    #
      # need to find the most common label in the list closest_y of labels.   #
      # Store this label in y_pred[i]. Break ties by choosing the smaller     #
      # label.                                                                #
      #########################################################################
      #########################################################################
      #                           END OF YOUR CODE                            # 
      #########################################################################

    return y_pred

In [7]:

# Create a kNN classifier instance. 
# Remember that training a kNN classifier is a noop: 
# the Classifier simply remembers the data and does no further processing 
classifier = KNearestNeighbor()
classifier.train(X_train, y_train)

We would now like to classify the test data with the kNN classifier. Recall that we can break down this process into two steps:

  1. First we must compute the distances between all test examples and all train examples.
  2. Given these distances, for each test example we find the k nearest examples and have them vote for the label.

Let's begin by computing the distance matrix between all training and test examples. For example, if there are Ntr training examples and Nte test examples, this stage should result in an Nte x Ntr matrix where each element (i, j) is the distance between the i-th test example and the j-th training example.

First, implement the function compute_distances_two_loops in the KNearestNeighbor class above. It uses a (very inefficient) double loop over all pairs of (test, train) examples and computes the distance matrix one element at a time.
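
As a rough illustration (not the only valid formulation), the element-wise computation can avoid a loop over the pixel dimension by using vectorized NumPy operations. A minimal sketch, assuming the variable names from the class template above:

import numpy as np

# Hedged sketch: Euclidean (L2) distance between two flattened images of shape (D,).
def l2_distance(test_point, train_point):
    return np.sqrt(np.sum((test_point - train_point) ** 2))

# Inside the double loop of compute_distances_two_loops this would read roughly:
#   dists[i, j] = np.sqrt(np.sum((X[i] - self.X_train[j]) ** 2))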

In [8]:

# Implement compute_distances_two_loops.

# Test your implementation:
dists = classifier.compute_distances_two_loops(X_test)
print(dists.shape)
(500, 5000)

In [9]:

# We can visualize the distance matrix: each row is a single test example and
# its distances to training examples
plt.imshow(dists, interpolation='none')
plt.show()

Second, implement the function predict_labels in the KNearestNeighbor class above, which predicts a label for each test point.
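
For orientation, the voting step can be written with numpy.argsort and numpy.bincount; np.argmax on the bincount result already breaks ties toward the smaller label. A hedged sketch for a single test point, assuming dists, self.y_train, and k as in the class template above (your solution may differ):

import numpy as np

def vote(dist_row, y_train, k):
    # Indices of the k smallest distances for this test point.
    nearest = np.argsort(dist_row)[:k]
    closest_y = y_train[nearest].astype(int)   # labels assumed to be non-negative integers
    # Most common label; np.argmax returns the first maximum, i.e. the smallest label on ties.
    return np.argmax(np.bincount(closest_y))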

In [10]:

# Now implement the function predict_labels and run the code below:
# We use k = 1 (which is Nearest Neighbor).
y_test_pred = classifier.predict_labels(dists, k=1)

# Compute and print the fraction of correctly predicted examples
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
Got 137 / 500 correct => accuracy: 0.274000

You should expect to see approximately 27% accuracy. Now let's try a larger k, say k = 5:

In [11]:

y_test_pred = classifier.predict_labels(dists, k=5)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
Got 139 / 500 correct => accuracy: 0.278000

In [12]:

# Now lets speed up distance matrix computation by using partial vectorization
# with one loop. Implement the function compute_distances_one_loop and run the
# code below:
dists_one = classifier.compute_distances_one_loop(X_test)

# To ensure that our vectorized implementation is correct, we make sure that it
# agrees with the naive implementation. There are many ways to decide whether
# two matrices are similar; one of the simplest is the Frobenius norm of their
# difference. In case you haven't seen it before, it is the square root of the
# sum of squared differences of all elements; in other words, reshape the
# matrices into vectors and compute the Euclidean distance between them.
difference = np.linalg.norm(dists - dists_one, ord='fro')
print('Difference was: %f' % (difference, ))
if difference < 0.001:
    print('Good! The distance matrices are the same')
else:
    print('Uh-oh! The distance matrices are different')
Difference was: 0.000000
Good! The distance matrices are the same
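
For reference, the partially vectorized row computation usually relies on broadcasting one test point against every training row at once. A hedged sketch of the body of compute_distances_one_loop for row i (illustrative only):

import numpy as np

def row_distances(x_i, X_train):
    # x_i has shape (D,); X_train has shape (num_train, D).
    # Broadcasting subtracts x_i from every training row in one step.
    return np.sqrt(np.sum((X_train - x_i) ** 2, axis=1))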

In [13]:

# Now implement the fully vectorized version inside compute_distances_no_loops
# and run the code
dists_two = classifier.compute_distances_no_loops(X_test)

# check that the distance matrix agrees with the one we computed before:
difference = np.linalg.norm(dists - dists_two, ord='fro')
print('Difference was: %f' % (difference, ))
if difference < 0.001:
    print('Good! The distance matrices are the same')
else:
    print('Uh-oh! The distance matrices are different')
Difference was: 0.000005
Good! The distance matrices are the same
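
A fully vectorized version typically uses the expansion $||a-b||^2 = ||a||^2 - 2a^Tb + ||b||^2$, which matches the hint in the class template (matrix multiplication plus two broadcast sums). A hedged sketch under that identity, not necessarily the expected solution:

import numpy as np

def all_pairs_distances(X_test, X_train):
    # Squared norms of every test and training point.
    test_sq = np.sum(X_test ** 2, axis=1).reshape(-1, 1)    # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1).reshape(1, -1)  # (1, num_train)
    cross = X_test.dot(X_train.T)                           # (num_test, num_train)
    # Clip tiny negative values caused by floating-point error before the sqrt.
    return np.sqrt(np.maximum(test_sq - 2 * cross + train_sq, 0))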

In [14]:

# Let's compare how fast the implementations are
def time_function(f, *args):
    """
    Call a function f with args and return the time (in seconds) that it took to execute.
    """
    import time
    tic = time.time()
    f(*args)
    toc = time.time()
    return toc - tic

two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
print('Two loop version took %f seconds' % two_loop_time)

one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
print('One loop version took %f seconds' % one_loop_time)

no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
print('No loop version took %f seconds' % no_loop_time)

# you should see significantly faster performance with the fully vectorized implementation
Two loop version took 65.921167 seconds
One loop version took 149.449537 seconds
No loop version took 0.807989 seconds

Cross-validation

We have implemented the k-Nearest Neighbor classifier but we set the value k = 5 arbitrarily. We will now determine the best value of this hyperparameter with cross-validation.
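
A hedged sketch of the splitting-and-evaluation loop asked for in the cell below, assuming np.array_split and the KNearestNeighbor class defined earlier (one possible arrangement, not the required one):

import numpy as np

def cross_validate(X_train, y_train, k_choices, num_folds=5):
    # Split the data and labels into num_folds roughly equal folds.
    X_folds = np.array_split(X_train, num_folds)
    y_folds = np.array_split(y_train, num_folds)
    k_to_accuracies = {k: [] for k in k_choices}
    for k in k_choices:
        for i in range(num_folds):
            # Fold i is held out for validation; the rest is used for training.
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            clf = KNearestNeighbor()
            clf.train(X_tr, y_tr)
            y_pred = clf.predict(X_folds[i], k=k)
            k_to_accuracies[k].append(np.mean(y_pred == y_folds[i]))
    return k_to_accuracies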

In [15]:

num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and    #
# y_train_folds should each be lists of length num_folds, where                #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].     #
# Hint: Look up the numpy array_split function.                                #
################################################################################
################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation. After running cross-validation,
# k_to_accuracies[k] should be a list of length num_folds giving the different
# accuracy values that we found when using that value of k.


################################################################################
# TODO:                                                                        #
# Perform k-fold cross validation to find the best value of k. For each        #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,   #
# where in each case you use all but one of the folds as training data and the #
# last fold as a validation set. Store the accuracies for all fold and all     #
# values of k in the k_to_accuracies dictionary.                               #
################################################################################

################################################################################
#                                 END OF YOUR CODE                             #
################################################################################

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
k = 1, accuracy = 0.263000
k = 1, accuracy = 0.257000
k = 1, accuracy = 0.264000
k = 1, accuracy = 0.278000
k = 1, accuracy = 0.266000
k = 3, accuracy = 0.239000
k = 3, accuracy = 0.249000
k = 3, accuracy = 0.240000
k = 3, accuracy = 0.266000
k = 3, accuracy = 0.254000
k = 5, accuracy = 0.248000
k = 5, accuracy = 0.266000
k = 5, accuracy = 0.280000
k = 5, accuracy = 0.292000
k = 5, accuracy = 0.280000
k = 8, accuracy = 0.262000
k = 8, accuracy = 0.282000
k = 8, accuracy = 0.273000
k = 8, accuracy = 0.290000
k = 8, accuracy = 0.273000
k = 10, accuracy = 0.265000
k = 10, accuracy = 0.296000
k = 10, accuracy = 0.276000
k = 10, accuracy = 0.284000
k = 10, accuracy = 0.280000
k = 12, accuracy = 0.260000
k = 12, accuracy = 0.295000
k = 12, accuracy = 0.279000
k = 12, accuracy = 0.283000
k = 12, accuracy = 0.280000
k = 15, accuracy = 0.252000
k = 15, accuracy = 0.289000
k = 15, accuracy = 0.278000
k = 15, accuracy = 0.282000
k = 15, accuracy = 0.274000
k = 20, accuracy = 0.270000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.282000
k = 20, accuracy = 0.285000
k = 50, accuracy = 0.271000
k = 50, accuracy = 0.288000
k = 50, accuracy = 0.278000
k = 50, accuracy = 0.269000
k = 50, accuracy = 0.266000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.270000
k = 100, accuracy = 0.263000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.263000

In [16]:

# plot the raw observations
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)

# plot the trend line with error bars that correspond to standard deviation
accuracies_mean = np.array([np.mean(v) for k,v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k,v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()

In [17]:

# Based on the cross-validation results above, choose the best value for k,   
# retrain the classifier using all the training data, and test it on the test
# data. You should be able to get above 28% accuracy on the test data.
best_k = 10

classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
y_test_pred = classifier.predict(X_test, k=best_k)

# Compute and display the accuracy
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
Got 141 / 500 correct => accuracy: 0.282000

Section 2. Linear Regression [25 pts]

The following linear regression assignment is modified from Stanford CS229. Please complete and hand in this completed worksheet.

Linear regression with one variable

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot.)

The dataset is loaded from the data file into the variables X and y:

In [18]:

data = np.loadtxt('data/ex1data1.txt', delimiter=",") # read comma separated data
m = data.shape[0]                                     # number of training example
X = data[:,0].reshape(m,1)
y = data[:,1].reshape(m,1)                             
print (X.shape)
print (y.shape)
(97, 1)
(97, 1)

In [19]:

plt.plot(X,y, 'rx')                         # Plot the data
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

In this part, you will fit the linear regression parameters $\theta$ to our dataset using gradient descent.

The objective of linear regression is to minimize the cost function \begin{equation*} J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^2 \end{equation*}

where the hypothesis $h_\theta(x)$ is given by the linear model \begin{equation*} h_{\theta}(x^{(i)}) = \theta^Tx^{(i)} = \theta_0 + \theta_1 x_1^{(i)} \end{equation*}

Recall that the parameters of your model are the $\theta_j$ values. These are the values you will adjust to minimize cost $J(\theta)$. One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update \begin{equation*} \theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)}) x_j^{(i)} \end{equation*}

With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that will achieve the lowest cost $J(\theta)$.

As you perform gradient descent to minimize the cost function J(θ), it is helpful to monitor the convergence by computing the cost. In this section, you will implement a function to calculate J(θ) so you can check the convergence of your gradient descent implementation.

Your next task is to complete the compute_cost function, which computes J(θ). As you are doing this, remember that the variables X and y are not scalar values, but matrices whose rows represent the examples from the training set.
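
As an illustration only (your implementation may differ), a vectorized form of the cost above, assuming X already carries a leading column of ones and y, theta are column vectors:

import numpy as np

def compute_cost_sketch(X, y, theta):
    # J(theta) = 1/(2m) * sum((X theta - y)^2)
    m = len(y)
    errors = X.dot(theta) - y
    return float(np.sum(errors ** 2) / (2 * m))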

In [20]:

def compute_cost(X, y, theta):
    m = len(y)
    # You need to return the following variables correctly
    J = 0
    #####################################################################
    # Compute the cost of a particular choice of theta.                 #
    # You should set J to the cost.                                     #
    #####################################################################

    #####################################################################
    #                       END OF YOUR CODE                            #
    #####################################################################
    return J

In [21]:

X = np.concatenate((np.ones((m, 1)), data[:,0].reshape(m,1)), axis=1)
theta = np.zeros((2, 1)) 

compute_cost(X, y, theta)

Out[21]:

32.072733877455676

You should expect to see a cost of 32.07.

Next, you will implement the gradient descent function. The loop structure has been written for you, and you only need to supply the updates to θ within each iteration.

As you program, make sure you understand what you are trying to optimize and what is being updated. Keep in mind that the cost J(θ) is parameterized by the vector θ, not X and y. That is, we minimize the value of J(θ) by changing the values of the vector θ, not by changing X or y.

A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. The starter code calls compute_cost on every iteration and prints the cost. Assuming you have implemented gradient descent and compute_cost correctly, your value of J(θ) should never increase, and should converge to a steady value by the end of the algorithm.
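
For reference, a hedged sketch of a single vectorized update, equivalent to the per-coordinate rule above (illustrative; assumes theta is a column vector):

import numpy as np

def gradient_step(X, y, theta, alpha):
    # theta := theta - alpha * (1/m) * X^T (X theta - y)
    m = len(y)
    gradient = X.T.dot(X.dot(theta) - y) / m
    return theta - alpha * gradient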

In [22]:

def gradient_descent(X, y, theta, alpha, num_iters):
    # GRADIENTDESCENT Performs gradient descent to learn theta
    # theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
    # taking num_iters gradient steps with learning rate alpha

    # Initialize some useful values
    m = len(y)
    J_history = []

    
    for iter in range(num_iters):

        
        #####################################################################
        # Instructions: Perform a single gradient step on the parameter     #
        #               vector theta.                                       #
        #                                                                   #      
        # Hint: While debugging, it can be useful to print out the values   #
        #       of the cost function (compute_cost) and gradient here.       # 
        #####################################################################
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################


        # Save the cost J in every iteration 
        J = compute_cost(X, y, theta)
        J_history.append(J)
    
    return theta, J_history

Now let’s find the parameter θ and plot the linear fit.

In [23]:

print('Running Gradient Descent ...\n')

X = np.concatenate((np.ones((m, 1)), data[:,0].reshape(m,1)), axis=1) # Add a column of ones to x
theta = np.zeros((2, 1))                                              # initialize fitting parameters

# Some gradient descent settings
iterations = 1500
alpha = 0.01

# gradient descent
theta, J_history = gradient_descent(X, y, theta, alpha, iterations)
print('Theta found by gradient descent: ')
print(theta[0], theta[1])


plt.plot(X[:,1], y, 'rx')                         # Plot the data
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')

plt.plot(X[:,1], np.dot(X, theta), '-')
plt.show()
Running Gradient Descent ...

Theta found by gradient descent: 
[-3.63029144] [1.16636235]

Linear regression with multiple variables

In this part, you will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices.

The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

In [24]:

data = np.loadtxt('data/ex1data2.txt', delimiter=",") # read comma separated data
m = data.shape[0]                                     # number of training example
X = data[:,0:2].reshape(m,2)
y = data[:,2].reshape(m,1)   

By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.
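
One common way to scale features is to standardize each column. A minimal sketch, assuming the column-wise mean/std convention described in the cell below (not necessarily the exact expected code):

import numpy as np

def normalize_sketch(X):
    # Subtract the per-feature mean and divide by the per-feature standard deviation.
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma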

In [25]:

def feature_normalize(X):
    
    # FEATURENORMALIZE Normalizes the features in X 
    #   FEATURENORMALIZE(X) returns a normalized version of X where the mean value of each
    #   feature is 0 and the standard deviation is 1. This is often a good preprocessing 
    #   step to do when working with learning algorithms.

    # You need to set these values correctly
    X_norm = X
    mu     = 0
    sigma  = 0

    #####################################################################
    # Instructions: First, for each feature dimension, compute the mean #
    #               of the feature and subtract it from the dataset,    #
    #               storing the mean value in mu. Next, compute the     #
    #               standard deviation of each feature and divide       #
    #               each feature by its standard deviation, storing     #
    #               the standard deviation in sigma.                    #
    #                                                                   #
    #               Note that X is a matrix where each column is a      #
    #               feature and each row is an example. You need        #
    #               to perform the normalization separately for         #
    #               each feature.                                       #
    #                                                                   #
    # Hint: You might find the 'mean' and 'std' functions useful.       #
    #####################################################################
    #####################################################################
    #                       END OF YOUR CODE                            #
    #####################################################################


    return X_norm, mu, sigma

Previously, you implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

You should complete the function gradient_descent_multi to implement gradient descent for linear regression with multiple variables.

Make sure your code supports any number of features and is well-vectorized.

In [26]:

X = np.concatenate((np.ones((m, 1)),feature_normalize(data[:,0:2].reshape(m,2))[0]), axis=1)
theta = np.zeros((3, 1)) 

compute_cost(X, y, theta)

Out[26]:

65591548106.45744

You should expect to see a cost of 65591548106.

Next, you will implement the gradient descent function with multiple variables.

In [27]:

def gradient_descent_multi(X, y, theta, alpha, num_iters):
    #GRADIENTDESCENTMULTI Performs gradient descent to learn theta
    #   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
    #   taking num_iters gradient steps with learning rate alpha

    # Initialize some useful values
    m = len(y)
    J_history = []
    
    
    for iter in range(num_iters):

        
        #####################################################################
        # Instructions: Perform a single gradient step on the parameter     #
        #               vector theta.                                       #
        #                                                                   #      
        # Hint: While debugging, it can be useful to print out the values   #
        #       of the cost function (compute_cost) and gradient here.      # 
        #####################################################################
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################


        # Save the cost J in every iteration 
        J = compute_cost(X, y, theta)
        print(J)
        J_history.append(J)
    
    return theta, J_history

Now let’s find the parameter θ and plot the linear fit.

In [28]:

alpha = 0.01
num_iters = 400
print(np.dot(X,theta).shape)
theta = np.zeros((3, 1))
theta, J_history = gradient_descent_multi(X, y, theta, alpha, num_iters)
(47, 1)
64297776251.62011
63031018305.52132
61790694237.53249
60576236901.991035
59387091739.9886
58222716488.38939
57082580895.8954
55966166445.97885
54872966086.50778
53802483965.89506
52754235175.605446
51727745498.85994
50722551165.380974
49738198612.02588
48774244249.16026
47830254232.6268
46905804241.168976
46000479259.1725
45113873364.59137
44245589521.92844
43395239380.144295
42562443075.371216
41746829038.312386
40948033806.20948
40165701839.264984
39399485341.40871
38649044085.30025
37914045241.46274
37194163211.44539
36489079464.91514
35798482380.58049
35122067090.852936
34459535330.1538
33810595286.77683
33174961458.219242
32552354509.895863
31942501137.15358
31345133930.505222
30759991244.004223
30186817066.683086
29625360896.981297
29075377620.08948
28536627388.13903
28008875503.167988
27491892302.795826
26985453048.54132
26489337816.71971
26003331391.85654
25527223162.557613
25060807019.775593
24603881257.415604
24156248475.223606
23717715483.902462
23288093212.402443
22867196617.33388
22454844594.4512
22050859892.15871
21655069026.99004
21267302201.013813
20887393221.120007
20515179420.14188
20150501579.77011
19793203855.216385
19443133701.585064
19100141801.912357
18764081996.833694
18434811215.84058
18112189410.089695
17796079486.72735
17486347244.693893
17182861311.972996
16885493084.252026
16594116664.960405
16308608806.65347
16028848853.710594
15754718686.316616
15486102665.696718
15222887580.575449
14964962594.831402
14712219196.319695
14464551146.835112
14221854433.189379
13984027219.376747
13750969799.802755
13522584553.551311
13298775899.666433
13079450253.424904
12864515983.577183
12653883370.534124
12447464565.477882
12245173550.375597
12046926098.875242
11852639738.063328
11662233711.064756
11475628940.465513
11292747992.539381
11113515042.260374
10937855839.082855
10765697673.47193
10596969344.166983
10431601126.161716
10269524739.384388
10110673318.062336
9954981380.75535
9802384801.04264
9652820778.848698
9506227812.393553
9362545670.75337
9221715367.017591
9083679132.02917
8948380388.694815
8815763726.852383
8685774878.682936
8558360694.655188
8433469119.990486
8311049171.636583
8191050915.738868
8073425445.597918
7958124860.102508
7845102242.627467
7734311640.386057
7625708044.226663
7519247368.864025
7414886433.535263
7312582943.071296
7212295469.37446
7113983433.293285
7017607086.885628
6923127496.061648
6830506523.598125
6739706812.515992
6650691769.813038
6563425550.543964
6477873042.240136
6393999849.661561
6311772279.873769
6231157327.642514
6152122661.139253
6074636607.950605
5998668141.385206
5924186867.071307
5851163009.838901
5779567400.880069
5709371465.181493
5640547209.223233
5573067208.9379015
5506904597.924616
5442033055.912168
5378426797.46598
5316060560.933568
5254909597.623319
5194949661.2115345
5136156997.3727865
5078508333.628749
5021980869.410768
4966552266.331585
4912200638.661632
4858904544.005542
4806642974.174524
4755395346.250381
4705141493.837055
4655861658.495639
4607536481.3589525
4560146994.921772
4513674615.002972
4468101132.875878
4423408707.563237
4379579858.293242
4336597457.113228
4294444721.657557
4253105208.066543
4212562804.053023
4172801722.1135597
4133806492.8811145
4095561958.6161766
4058053266.8334436
4021265864.061104
3985185489.7299514
3949798170.1895227
3915090212.8486094
3881048200.437472
3847658985.3891406
3814909684.337368
3782787672.7286496
3751280579.545978
3720376282.141932
3690062901.1787825
3660328795.6733546
3631162558.1444583
3602553009.8606644
3574489196.1863756
3546960382.0240526
3519956047.350615
3493465882.846027
3467479785.6120825
3441987854.979558
3416980388.401809
3392447877.4330564
3368381003.789511
3344770635.491656
3321607823.085977
3298883795.944417
3276589958.64
3254717887.3969865
3233259326.613998
3212206185.458606
3191550534.531864
3171284602.6013308
3151400773.401179
3131891582.497912
3112749714.2204375
3093967998.653024
3075539408.689931
3057457057.1503615
3039714193.9525356
3022304203.3455877
3005220601.1981487
2988457032.342386
2972007267.972377
2955865203.095665
2940024854.0369077
2924480355.9925337
2909225960.6353436
2894256033.7680163
2879565053.0245204
2865147605.618414
2850998386.1370893
2837112194.380988
2823483933.2468657
2810108606.654196
2796981317.513811
2784097265.7379103
2771451746.29061
2759040147.2781234
2746857948.0778456
2734900717.505465
2723164112.0193577
2711643873.9614825
2700335829.8340216
2689235888.6110344
2678340040.0844083
2667644353.24338
2657144974.6869745
2646838127.0686274
2636720107.572393
2626787286.4200387
2617036105.408408
2607463076.476443
2598064780.3012333
2588837864.9225197
2579779044.395034
2570885097.4681597
2562152866.292284
2553579255.1513553
2545161229.221069
2536895813.352176
2528780090.878393
2520811202.4484067
2512986344.881492
2505302770.046249
2497757783.761987
2490348744.7222986
2483073063.440365
2475928201.215545
2468911669.120834
2462021027.0107284
2455253882.5491185
2448607890.2567806
2442080750.5780616
2435670208.9663877
2429374054.9881926
2423190121.444897
2417116283.5125785
2411150457.8989606
2405290602.017369
2399534713.177319
2393880827.7913885
2388327020.598047
2382871403.900107
2377512126.818505
2372247374.561065
2367075367.7059803
2361994361.4996643
2357002645.168741
2352098541.2458296
2347280404.908872
2342546623.333738
2337895615.0598025
2333325829.3682785
2328835745.673001
2324423872.9234447
2320088749.0197077
2315828940.2392287
2311643040.674992
2307529671.684991
2303487481.3527303
2299515143.958527
2295611359.4614096
2291774852.991374
2288004374.351835
2284298697.5320034
2280656620.2290435
2277076963.379789
2273558570.701808
2270100308.243673
2266701063.944202
2263359747.200523
2260075288.4447656
2256846638.7292147
2253672769.319745
2250552671.297394
2247485355.1678667
2244469850.4788623
2241505205.445017
2238590486.5803514
2235724778.33804
2232907182.7573757
2230136819.1177626
2227412823.5996304
2224734348.952087
2222100564.1672115
2219510654.1608334
2216963819.4596634
2214459275.894684
2211996254.3006086
2209574000.221364
2207191773.6214075
2204848848.6028104
2202544513.1279545
2200278068.747763
2198048830.3353295
2195856125.8248405
2193699295.9557047
2191577694.021751
2189490685.625428
2187437648.4368777
2185417971.9578023
2183431057.290015
2181476316.9086003
2179553174.4395676
2177661064.4419203
2175799432.194059
2173967733.484414
2172165434.4062343
2170392011.156456
2168646949.838549
2166929746.269276
2165239905.7892857
2163576943.0774593
2161940381.9689326
2160329755.27673
2158744604.616923
2157184480.237262
2155648940.849195
2154137553.4632096
2152649893.227437
2151185543.269453
2149744094.541205
2148325145.667005
2146928302.7945316
2145553179.4487762
2144199396.388881
2142866581.4677956
2141554369.494718
2140262402.10025
2138990327.6042066
2137737800.8860512
2136504483.257877
2135290042.3399014
2134094151.9384081
2132916491.9261124
2131756748.124874
2130614612.1907258
2129489781.5011702
2128381959.0446966
2127290853.3124804
2126216178.1922035
2125157652.8639822
2124115001.6983278
2123087954.1561348
2122076244.6906202
2121079612.6512039
2120097802.1892736
2119130562.1658094
2118177646.0608199
2117238811.884559
2116313822.09049
2115402443.4899635
2114504447.1685586
2113619608.404084
2112747706.5861764
2111888525.137485
2111041851.4363952
2110207476.7412794
2109385196.1162214
2108574808.3582084
2107776115.925742
2106988924.8688555
2106213044.760492
2105448288.6292474

Let’s plot the convergence graph

In [29]:

plt.plot(list(range(0, len(J_history))), J_history, '-b')                         # Plot the data
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.show()

Section 3. Logistic Regression [25 pts]

The following logistic regression assignment is modified from Stanford CS229. Please complete and hand in this completed worksheet.

Logistic Regression

In this section, you need to implement logistic regression to solve a binary classification problem. Let's first get our data ready:

In [30]:

# Only use the first 70 samples for training (and validation),
# and treat the rest of them as hold-out testing set.
X = np.loadtxt('data/logistic_x_.txt') 
y = np.loadtxt('data/logistic_y_.txt').reshape(-1, 1) 


X, mu, std = feature_normalize(X)

# Add a column of ones to X for the bias weight.
m = len(X)
X = np.concatenate((np.ones((m, 1)), X), axis=1)

Here, the input $x^{(i)}\in\mathbb{R}^2$ and $y^{(i)}\in\{-1, 1\}$. As we have mentioned, it is helpful to visualize the data before you start working on it.

In [31]:

# Plot the feature according to their class label.
# Note that we exclude column 0, which is the column we padded with ones in the previous block.
plt.plot(X[np.where(y==1), 1], X[np.where(y==1), 2], 'rx')
plt.plot(X[np.where(y==-1), 1], X[np.where(y==-1), 2], 'bo')  
plt.xlabel('x1')
plt.ylabel('x2')
plt.show()

In the following, you need to implement logistic regression. Recall that when $y^{(i)}\in\{-1,1\}$, the objective function for binary logistic regression can be expressed as: \begin{equation*} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\log{\left(1+e^{-y^{(i)}\theta^Tx^{(i)}}\right)}=-\frac{1}{m}\sum_{i=1}^m\log{\left(h_{\theta}(y^{(i)}x^{(i)})\right)} \end{equation*} where the hypothesis is the sigmoid function: \begin{equation*} h_\theta(y^{(i)}x^{(i)})=\frac{1}{1+e^{-y^{(i)}\theta^{T}x^{(i)}}} \end{equation*} which we have seen in class (and assignment 0). Similar to the previous section, we can minimize the objective function $J(\theta)$ using batch gradient descent: \begin{equation*} \theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}h_\theta(-y^{(i)}x^{(i)})(-y^{(i)}x_j^{(i)}) \end{equation*}

Now, your task is to complete the functions sigmoid, compute_cost, compute_gradient, and the gradient_descent_logistic loop for logistic regression.
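
For orientation only, a hedged sketch of these pieces under the $y^{(i)}\in\{-1,1\}$ convention above, assuming X carries a bias column and y, theta are column vectors (the exact formulation you use may differ):

import numpy as np

def sigmoid_sketch(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_sketch(X, y, theta):
    # J(theta) = (1/m) * sum(log(1 + exp(-y * theta^T x)))
    margins = y * X.dot(theta)
    return float(np.mean(np.log1p(np.exp(-margins))))

def gradient_sketch(X, y, theta):
    # (1/m) * sum over i of h_theta(-y x) * (-y x), vectorized.
    m = len(y)
    weights = sigmoid_sketch(-(y * X.dot(theta)))
    return X.T.dot(-y * weights) / m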

In [32]:

def sigmoid(z):
    #####################################################################
    # Instructions: Implement sigmoid function g                        #
    #####################################################################

    #####################################################################
    #                       END OF YOUR CODE                            #
    #####################################################################
    return g

def compute_cost(X, y, theta):
    
    # You need to return the following variables correctly 
    J = 0
    #####################################################################
    # Instructions: Implement the objective function J(theta)           #
    #####################################################################
    #####################################################################
    #                       END OF YOUR CODE                            #
    #####################################################################
    return J

def compute_gradient(X, y, theta):
    #####################################################################
    # Instructions: Implement gradient function gradient_               #
    #####################################################################
    #####################################################################
    #                       END OF YOUR CODE                            #
    #####################################################################
    return gradient_


def gradient_descent_logistic(X, y, theta, alpha, num_iters):
    m = len(y)
    J_history = []
    for iter in range(num_iters):

        #####################################################################
        # Instructions: Perform a single gradient step on the parameter     #
        #               vector theta using the implemented compute_gradient #
        #                                                                   #      
        # Hint: While debugging, it can be useful to print out the values   #
        #       of the cost function (compute_cost) and gradient here.      # 
        #####################################################################
        
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################


        # Save the cost J in every iteration 
        J = compute_cost(X, y, theta)
        print(J)
        J_history.append(J)
    
    return theta, J_history

Now, fit your model, and see if it is learning.

In [33]:

# Train your model.
theta = np.zeros((X.shape[1], 1))
alpha = 0.1
num_iters = 400
theta, J_history = gradient_descent_logistic(X, y, theta, alpha, num_iters)
0.6767318709516239
0.6612759836177738
0.6467213868522412
0.6330122173560181
0.6200951054326651
0.6079193220430319
0.5964368590615188
0.5856024539525343
0.5753735694532695
0.565710337891584
0.5565754786361017
0.547934195984304
0.5397540636253004
0.5320049007207323
0.5246586436612436
0.5176892166916711
0.5110724038582442
0.5047857241105058
0.4988083108795615
0.49312079704042866
0.48770520583666754
0.4825448480874091
0.47762422579843516
0.4729289421494971
0.4684456177202448
0.46416181273904
0.4600659550858426
0.4561472737467817
0.4523957373993958
0.448801997800213
0.44535733764742047
0.44205362259850367
0.4388832571341371
0.43583914397384127
0.4329146467649332
0.43010355578324566
0.4274000564013932
0.4247987000975499
0.42229437779447193
0.4198822953346259
0.417557950912616
0.4153171143005741
0.4131558077157223
0.41107028819193164
0.4090570313288082
0.40711271630263857
0.40523421203348253
0.4034185644118437
0.401662984496735
0.39996483760461843
0.3983216332157216
0.3967310156306258
0.3951907553158623
0.3936987408825787
0.3922529716471801
0.39085155072727046
0.389492678630239
0.3881746472954943
0.386895834554683
0.38565469897726545
0.3844497750715729
0.38327966881399933
0.3821430534812575
0.38103866576272566
0.3799653021318076
0.378921815456965
0.37790711183465914
0.3769201476278858
0.3759599266953023
0.3750254977971449
0.37411595216524024
0.37323042122541344
0.3723680744615167
0.3715281174111411
0.3707097897838466
0.3699123636934485
0.3691351419965436
0.36837745673005506
0.3676386676411173
0.36691816080311873
0.36621534731218414
0.3655296620587962
0.3648605625696431
0.3642075279151391
0.36357005767838924
0.36294767098167463
0.3623399055668146
0.36174631692601383
0.36116647748004677
0.3605999758008458
0.36004641587576497
0.35950541641097494
0.35897661017162175
0.35845964335653885
0.3579541750054525
0.3574598764367555
0.3569764307140532
0.3565035321398041
0.35604088577448395
0.3555882069798088
0.35514522098464013
0.35471166247229
0.3542872751880217
0.3538718115656143
0.3534650323719397
0.3530667063685542
0.3526766099893788
0.3522945270335911
0.3519202483729106
0.35155357167250495
0.3511943011247944
0.3508422471954714
0.3504972263810946
0.3501590609776562
0.34982757885955074
0.3495026132684144
0.34918400261132765
0.348871590267909
0.348565224405848
0.34826475780445815
0.3479700476858499
0.34768095555334516
0.3473973470367804
0.34711909174436023
0.3468460631207451
0.3465781383110707
0.34631519803061844
0.34605712643986486
0.34580381102465674
0.34555514248127206
0.345311014606137
0.3450713241899841
0.34483597091624574
0.3446048572634892
0.3443778884117099
0.34415497215230506
0.3439360188015665
0.34372094111753104
0.34350965422004076
0.3433020755138719
0.34309812461479494
0.3428977232784407
0.3427007953318474
0.34250726660757574
0.34231706488027847
0.3421301198056233
0.3419463628614649
0.3417657272911741
0.341588148049033
0.3414135617476077
0.34124190660702
0.34107312240603516
0.3409071504348955
0.3407439334498255
0.34058341562914135
0.34042554253089924
0.34027026105202324
0.34011751938884927
0.33996726699903296
0.33981945456476426
0.3396740339572391
0.3395309582023386
0.33939018144746774
0.3392516589295106
0.33911534694385576
0.33898120281445493
0.33884918486487225
0.33871925239028705
0.33859136563041586
0.33846548574331686
0.3383415747800449
0.3382195956601256
0.3380995121478175
0.33798128882913353
0.3378648910895934
0.33775028509268207
0.3376374377589863
0.337526316745985
0.3374168904284732
0.3373091278795917
0.33720299885244454
0.33709847376228236
0.33699552366923224
0.3368941202615529
0.3367942358393999
0.336695843299079
0.33659891611777665
0.33650342833874436
0.33640935455692644
0.3363166699050138
0.3362253500399096
0.33613537112959346
0.3360467098403689
0.33595934332448446
0.335873249208112
0.33578840557967354
0.3357047909785037
0.3356223843838368
0.3355411652041074
0.33546111326655587
0.33538220880712655
0.3353044324606509
0.33522776525130665
0.33515218858334067
0.3350776842320533
0.3350042343350292
0.33493182138361116
0.33486042821460876
0.33479003800223184
0.33472063425024473
0.3346522007843335
0.3345847217446783
0.3345181815787264
0.33445256503415804
0.3343878571520404
0.3343240432601633
0.33426110896655176
0.3341990401531485
0.33413782296966316
0.33407744382758214
0.33401788939433474
0.3339591465876101
0.3339012025698215
0.33384404474271323
0.3337876607421052
0.3337320384327727
0.3336771659034561
0.3336230314619962
0.33356962363059456
0.33351693114119074
0.3334649429309573
0.33341364813790614
0.3333630360966051
0.3333130963339994
0.3332638185653382
0.3332151926901997
0.33316720878861517
0.33311985711728553
0.3330731281058928
0.3330270123534994
0.33298150062503384
0.332936583847863
0.3328922531084448
0.33284849964906227
0.33280531486463416
0.33276269029960326
0.3327206176448954
0.332679088734953
0.33263809554483686
0.33259763018739597
0.33255768491050325
0.3325182520943564
0.33247932424884014
0.3324408940109503
0.33240295414227644
0.33236549752654304
0.3323285171672061
0.33229200618510496
0.33225595781616735
0.33222036540916666
0.33218522242352894
0.3321505224271905
0.3321162590945021
0.3320824262041812
0.332049017637309
0.33201602737537256
0.3319834494983492
0.3319512781828345
0.33191950770021006
0.33188813241485315
0.33185714678238315
0.33182654534794825
0.33179632274454823
0.3317664736913936
0.3317369929923008
0.3317078755341208
0.33167911628520264
0.33165071029388815
0.3316226526870411
0.33159493866860495
0.3315675635181929
0.331540522589707
0.3315138113099868
0.33148742517748564
0.3314613597609752
0.33143561069827676
0.33141017369501896
0.3313850445234213
0.33136021902110263
0.33133569308991445
0.33131146269479794
0.33128752386266447
0.331263872681299
0.3312405052982859
0.33121741791995624
0.33119460681035684
0.3311720682902398
0.3311497987360723
0.3311277945790667
0.33110605230422985
0.3310845684494312
0.3310633396044891
0.33104236241027657
0.33102163355784303
0.33100114978755546
0.33098090788825313
0.33096090469642286
0.3309411370953882
0.330921602014514
0.33090229642842806
0.33088321735625664
0.3308643618608752
0.33084572704817367
0.3308273100663361
0.3308091081051331
0.330791118395229
0.3307733382075021
0.33075576485237684
0.3307383956791697
0.3307212280754471
0.3307042594663958
0.33068748731420383
0.3306709091174556
0.3306545224105354
0.3306383247630441
0.33062231377922613
0.33060648709740686
0.33059084238944064
0.3305753773601688
0.33056008974688883
0.33054497731883087
0.3305300378766469
0.3305152692519064
0.3305006693066033
0.33048623593267107
0.33047196705150633
0.33045786061350146
0.3304439145975868
0.3304301270107778
0.33041649588773486
0.33040301929032695
0.33038969530720547
0.33037652205338547
0.330363497669833
0.3303506203230612
0.33033788820473337
0.3303252995312721
0.3303128525434763
0.3303005455061445
0.3302883767077052
0.3302763444598531
0.33026444709719155
0.3302526829768828
0.3302410504783019
0.3302295480026991
0.3302181739728658
0.330206926832808
0.33019580504742474
0.3301848071021919
0.33017393150285146
0.33016317677510665
0.33015254146432144
0.3301420241352258
0.330131623371626
0.33012133777611874
0.3301111659698119
0.33010110659204867
0.3300911583001365
0.3300813197690814
0.33007158969132516
0.3300619667764893
0.33005244975112075
0.33004303735844304
0.33003372835811245
0.33002452152597644
0.3300154156538369
0.33000640954921795
0.32999750203513634
0.3299886919498766
0.32997997814676927
0.32997135949397327
0.3299628348742615
0.32995440318480934
0.32994606333698856
0.3299378142561623
0.3299296548814842
0.32992158416570166
0.32991360107496076
0.3299057045886162
0.32989789369904143
0.3298901674114456
0.3298825247436895
0.3298749647261081
0.3298674864013327
0.32986008882411827
0.32985277106117206
0.3298455321909866
0.3298383713036725
0.32983128750079727
0.3298242798952241

Again, plot and check to see if the model is converging.

In [34]:

plt.plot(list(range(0, len(J_history))), J_history, '-b')  
plt.xlabel('Number of iterations')
plt.ylabel('Cost J')
plt.show()
print (theta)

[[-0.03801384]
 [ 1.38454246]
 [ 1.91783357]]

Decision Boundary

In addition to checking the convergence graph and accuracy, we can also plot the decision boundary to see what the model actually learns.
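
Since the boundary is the set of points where $\theta^Tx = 0$, one hedged way to obtain two end points for plt.plot is sketched below (illustrative; assumes theta is the learned 3-element parameter vector from above):

import numpy as np

def boundary_points(theta, x1_min, x1_max):
    # Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2 at the ends of the x1 range.
    x1 = np.array([x1_min, x1_max])
    x2 = -(theta[0] + theta[1] * x1) / theta[2]
    return x1, x2

# Example usage (hypothetical plotting range):
#   x1, x2 = boundary_points(theta.ravel(), X[:, 1].min(), X[:, 1].max())
#   plt.plot(x1, x2, 'g-')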

In [35]:

# Plot the feature according to their class label.
# Note that we exclude column 0, which is the column we padded with ones in the previous block.
plt.plot(X[np.where(y==1), 1], X[np.where(y==1), 2], 'rx')
plt.plot(X[np.where(y==-1), 1], X[np.where(y==-1), 2], 'bo')

#####################################################################
# Instructions: Plot out the decision boundary.                     #
# Hint: To plot the boundary, which is a straight line in our case, #
#       you need to find the two ends of the line and plot it with  #
#       plt.plot(). Note that the decision boundary is the line     #
#       where theta^T x = 0.                                        #
#####################################################################
#####################################################################
#                       END OF YOUR CODE                            #
#####################################################################

plt.xlabel('x1')
plt.ylabel('x2')
plt.show()

Section 4. Regularization [30 pts]

In this section, you need to incorporate L2 regularization into your logistic regression.

L2 Regularization

Overfitting is a notorious problem in the world of machine learning. One simple way to counter this issue is to put constraints on your model weights $\theta$, as we have discussed in class. In this section, you need to modify the objective function to impose L2 regularization on the logistic regression: \begin{equation*} J(\theta) = -\frac{1}{m}\sum_{i=1}^m\log{\left(h_{\theta}(y^{(i)}x^{(i)})\right)} + \lambda\vert\vert\theta\vert\vert_2^2 \end{equation*} Derive the gradient of this new objective and incorporate it into your logistic regression model.
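
For reference, one common form of the resulting update is shown below (a sketch that assumes the bias weight $\theta_0$ is regularized together with the other weights; some formulations exclude it): \begin{equation*} \theta_j := \theta_j - \alpha\left(\frac{1}{m}\sum_{i=1}^{m}h_\theta(-y^{(i)}x^{(i)})(-y^{(i)}x_j^{(i)}) + 2\lambda\theta_j\right) \end{equation*}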

To make things more structured, we now put everything together into a class. Please use the class template below to implement your logistic regression. Note that you can add your own class methods if needed.

In [36]:

class LogisticRegression(object):
    
    def __init__(self, alpha=0.1, lamb=0.1, regularization=None):
        # setting the class attribute.
        self.alpha = alpha                   # Set up your learning rate alpha.
        self.lamb = lamb                     # Strength of regularization.
        self.regularization = regularization 
        assert regularization == 'l2' or regularization == None # we only consider these two cases
    
    def _compute_cost(self, X, y):
        #####################################################################
        # Instructions: Compute the cost function here.                     #
        #               You need to handle both the cases with, and without #
        #               regularization here.                                #
        #####################################################################
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
        return J
        
    def _compute_gradient(self, X, y):
        #####################################################################
        # Instructions: Compute the gradient here.                          #
        #               You need to handle both the cases with, and without #
        #               regularization here.                                #
        #####################################################################
        #####################################################################
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
        return gradient

    def fit(self, X, y, num_iter=5):
        self.theta = np.zeros((X.shape[1], 1))
        m = len(y)
        J_history = []
        #####################################################################
        #####################################################################
       
        #                       END OF YOUR CODE                            #
        #####################################################################
        return J_history
    
    def predict(self, X):
        #####################################################################
        # Instructions: Use your hypothese to make predictions.             #
        #####################################################################
        #####################################################################
        #                       END OF YOUR CODE                            #
        #####################################################################
        return y_hat

Load the wine dataset, in which each input $x^{(i)}\in\mathbb{R}^{12}$ contains 12 attributes of a wine, and $y^{(i)}\in\{-1,1\}$ is the class label (red or white wine).

In [37]:

# Load dataset
X_train = np.loadtxt('data/wine_train_X.txt')
y_train = np.loadtxt('data/wine_train_y.txt').reshape(-1, 1)
X_test = np.loadtxt('data/wine_test_X.txt')
y_test = np.loadtxt('data/wine_test_y.txt').reshape(-1, 1)

X_train, mu, std = feature_normalize(X_train)
X_test, mu, std = feature_normalize(X_test)


X_train = np.concatenate((np.ones((X_train.shape[0], 1)), X_train), axis=1)
X_test = np.concatenate((np.ones((X_test.shape[0], 1)), X_test), axis=1)

Now, let’s train two different logistic regression models: one with, and one without regularization.

In [38]:

log_reg = LogisticRegression(alpha=0.1) # Without regularization
log_reg_l2 = LogisticRegression(alpha=0.1, lamb=1.0, regularization='l2') # With L2 regularization

J_history = log_reg.fit(X_train, y_train, num_iter=500)
J_history_l2 = log_reg_l2.fit(X_train, y_train, num_iter=500)

Next, we evaluate the accuracy for each method:

In [39]:

def evaluate_accuracy(X, y, model):
    y_pred = model.predict(X)
    y_pred[y_pred > 0.5] = 1
    y_pred[y_pred <= 0.5] = -1
    return np.mean(y_pred == y)

print("Accuracy on training set: ", evaluate_accuracy(X_train, y_train, log_reg))
print("Accuracy on testing set: ", evaluate_accuracy(X_test, y_test, log_reg))
print("Accuracy w/ L2 training set: ", evaluate_accuracy(X_train, y_train, log_reg_l2))
print("Accuracy w/ L2 testing set: ", evaluate_accuracy(X_test, y_test, log_reg_l2))
Accuracy on training set:  0.9925
Accuracy on testing set:  0.9925
Accuracy w/ L2 training set:  0.9925
Accuracy w/ L2 testing set:  0.9925

To see the effect of regularization on $\theta$, we can plot each $\theta_j$ for different values of $\lambda$.
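
A hedged sketch of that sweep, reusing the LogisticRegression class above, is given here as one possible way to fill in the TODO two cells below; the num_iter value is an assumption:

# Hedged sketch: train one L2-regularized model per lambda value and stack the
# learned weight vectors so plot_theta can draw each theta_j versus lambda.
lamb_values = [0.1, 1, 10, 100, 1000]
thetas = []
for l in lamb_values:
    model = LogisticRegression(alpha=0.1, lamb=l, regularization='l2')
    model.fit(X_train, y_train, num_iter=500)
    thetas.append(model.theta.ravel())
# plot_theta(np.array(thetas), lamb_values)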

In [40]:

def plot_theta(theta, lamb):
    """
    Helper function for plotting out the value of theta with respect to different lambda.
    theta (list): list of theta vectors, one for each lambda value.
    lamb  (list): list of lambda values you tried.
    """
    plt.hlines(y=0, xmin=0, xmax=np.max(lamb), color='red', linewidth = 2, linestyle = '--')
    for i in range(theta.shape[1]):
        plt.plot(lamb, theta[:,i])
    plt.ylabel('theta')
    plt.xlabel('lambda')
    plt.xscale('log')
    plt.show()

In [41]:

lamb = [0.1, 1, 10, 100, 1000]
theta = []

#####################################################################
# Instructions: For each value in lamb, try a model for it, and     #
#               append the trained weights into the theta           #
#####################################################################
#####################################################################
#                       END OF YOUR CODE                            #
#####################################################################

plot_theta(np.array(theta), lamb)
