MS CS – Machine Learning Assignment Help

CS Machine Learning

Homework 1 – Theory

Keywords: Boolean functions, mistake bounds, PAC learning

Instructions: Please either typeset your answers (LaTeX recommended) or write them very clearly and legibly and scan them, then upload the PDF on edX. Legibility and clarity are critical for fair grading.

1. Let D be an arbitrary distribution on the domain {−1, 1}^n, and let f, g : {−1, 1}^n → {−1, 1} be two Boolean functions. Prove that

Px∼D[f(x) ≠ g(x)] = (1 − Ex∼D[f(x)g(x)]) / 2.

Would this still be true if the domain were some other domain (such as R^n, where R denotes the real numbers, with, say, the Gaussian distribution) instead of {−1, 1}^n? If yes, justify your answer. If not, give a counterexample.
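As a quick sanity check (not part of the required proof), the identity can be verified numerically under the uniform distribution by exhausting all 2^n inputs; the random lookup-table representation of f and g below is an illustrative choice:

```python
import itertools
import random

# Sanity-check sketch: verify Pr[f(x) != g(x)] = (1 - E[f(x)g(x)]) / 2
# under the uniform distribution on {-1, 1}^n for random Boolean f, g.
n = 4
points = list(itertools.product([-1, 1], repeat=n))
f = {x: random.choice([-1, 1]) for x in points}
g = {x: random.choice([-1, 1]) for x in points}

# Left-hand side: fraction of inputs where f and g disagree.
prob_disagree = sum(f[x] != g[x] for x in points) / len(points)
# Right-hand side: (1 - E[f(x)g(x)]) / 2 under the uniform distribution.
expectation = sum(f[x] * g[x] for x in points) / len(points)

assert abs(prob_disagree - (1 - expectation) / 2) < 1e-12
```

The check works because f(x)g(x) is −1 exactly on disagreements and +1 on agreements, which is the heart of the proof the problem asks for.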

2. Let f be a decision tree with t leaves over the variables x = (x1, . . . , xn) ∈ {−1, 1}^n. Explain how to write f as a multivariate polynomial p(x1, . . . , xn) such that f(x) = p(x) for every input x ∈ {−1, 1}^n. (You may interpret −1 as FALSE and 1 as TRUE or the other way round, at your preference.) (Hint: try to come up with an "indicator polynomial" for every leaf, i.e. one that evaluates to the leaf's value when x is such that the leaf's path is taken, and to 0 otherwise.)
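The hint can be made concrete numerically. Assuming a root-to-leaf path constrains some variables x_i to values b_i ∈ {−1, 1}, the product of the factors (1 + b_i·x_i)/2 equals 1 exactly when every constraint holds and 0 otherwise; the function name and encoding below are illustrative:

```python
# Sketch of an "indicator polynomial" for one root-to-leaf path.
# If the path requires x_i = b_i for each (i, b_i) in `constraints`
# (with b_i in {-1, 1}), the product below is 1 exactly when the
# path is taken and 0 otherwise.
def path_indicator(x, constraints):
    """x: tuple in {-1,1}^n; constraints: list of (index, required value)."""
    prod = 1.0
    for i, b in constraints:
        prod *= (1 + b * x[i]) / 2  # 1 if x[i] == b, else 0
    return prod

# Path requiring x1 = 1 and x2 = -1 (0-based indices 0 and 1):
print(path_indicator((1, -1, 1), [(0, 1), (1, -1)]))   # -> 1.0
print(path_indicator((-1, -1, 1), [(0, 1), (1, -1)]))  # -> 0.0
```

Summing leaf-value-weighted indicators over all leaves then yields the polynomial p the problem asks for.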

3. Compute a depth-two decision tree for the training data in Table 1 using the Gini function C(a) = 2a(1 − a), as described in class. What is the tree's overall accuracy on the training data?

X  Y  Z | Number of positive examples | Number of negative examples
0  0  0 |             10              |             20
0  0  1 |             25              |              5
0  1  0 |             35              |             15
0  1  1 |             35              |              5
1  0  0 |              5              |             15
1  0  1 |             30              |             10
1  1  0 |             10              |             10
1  1  1 |             15              |              5

Table 1: decision tree training data
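As a computational aid (not a substitute for showing the work), the weighted Gini impurity of each candidate root split can be tabulated directly from Table 1; the dictionary encoding and helper names below are illustrative:

```python
# Counts from Table 1, keyed by (X, Y, Z): (positives, negatives).
data = {
    (0, 0, 0): (10, 20), (0, 0, 1): (25, 5),
    (0, 1, 0): (35, 15), (0, 1, 1): (35, 5),
    (1, 0, 0): (5, 15),  (1, 0, 1): (30, 10),
    (1, 1, 0): (10, 10), (1, 1, 1): (15, 5),
}

def gini(a):
    """The Gini function C(a) = 2a(1 - a) from the problem statement."""
    return 2 * a * (1 - a)

def split_score(var):
    """Weighted Gini impurity after splitting on variable index var
    (0 = X, 1 = Y, 2 = Z); lower is better."""
    total = sum(p + n for p, n in data.values())
    score = 0.0
    for side in (0, 1):
        pos = sum(p for k, (p, n) in data.items() if k[var] == side)
        neg = sum(n for k, (p, n) in data.items() if k[var] == side)
        weight = (pos + neg) / total
        score += weight * gini(pos / (pos + neg))
    return score

for name, var in (("X", 0), ("Y", 1), ("Z", 2)):
    print(name, round(split_score(var), 4))
```

Comparing the three scores identifies the root variable; the same helper can then be reused on each restricted half of the data for the depth-two children.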

4. Suppose the domain X is the real line R and the labels lie in Y = {−1, 1}. Let C be the concept class consisting of simple threshold functions hθ for θ ∈ R, where hθ(x) = −1 for all x ≤ θ and hθ(x) = 1 otherwise. Give a simple and efficient PAC learning algorithm for C that uses only m = O((1/ε) log(1/δ)) training examples to output a classifier with error at most ε with probability at least 1 − δ.
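One natural candidate learner, shown here as a hedged sketch rather than the required algorithm-plus-proof: place the learned threshold at the largest training point labeled −1. The fallback for an all-positive sample below is an illustrative choice.

```python
import random

def learn_threshold(examples):
    """Sketch: return the largest training point labeled -1 as the threshold.
    examples: list of (x, y) pairs with y in {-1, +1}."""
    negatives = [x for x, y in examples if y == -1]
    if negatives:
        return max(negatives)
    # Illustrative fallback: no negatives seen, classify everything +1.
    return min(x for x, _ in examples) - 1.0

def h(theta, x):
    """The threshold classifier h_theta from the problem statement."""
    return -1 if x <= theta else 1

# Toy run: target threshold 0.3, samples uniform on [0, 1].
random.seed(0)
target = 0.3
sample = [(x, h(target, x)) for x in (random.random() for _ in range(200))]
theta_hat = learn_threshold(sample)
assert theta_hat <= target  # this learner never overshoots the target
```

Note the one-sided behavior: the learned threshold never exceeds the true one, so the error region is the single interval (theta_hat, target], which is what the sample-complexity analysis should bound.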


5. In this problem we will show that mistake-bounded learning is stronger than PAC learning, which should help crystallize both definitions. Let C be a function class with domain X = {−1, 1}^n and labels Y = {−1, 1}. Assume that C can be learned with mistake bound t using algorithm A. (You may also assume that at each iteration A runs in time polynomial in n, and that A updates its state only when it gets an example wrong.) The concrete goal of this problem is to show how a learner, given A, can PAC-learn the concept class C with respect to any distribution D on {−1, 1}^n. The learner can use A as part of its output hypothesis and should run in time polynomial in n, 1/ε, and 1/δ.

To achieve this concrete goal in steps, we will break down this problem into a few parts. Fix some distribution D on X, and say the examples are labeled by an unknown c ∈ C. For a hypothesis (i.e. function) h : X → Y, let err(h) = Px∼D[h(x) ≠ c(x)].

(a) Fix a hypothesis h : X → Y. If err(h) > ε, what is the probability that h gets k random examples all correct? How large does k need to be for this probability to be at most δ′? (The contrapositive view would be: unless the data is highly misleading, which happens with probability at most δ′, it must be the case that err(h) ≤ ε. Make sure this makes sense.)

(b) As we feed examples to A, how many examples do we need to see before we can be sure of getting a block of k examples all correct? (This doesn’t mean the hypothesis needs to be perfect; it just needs to get a block of k all correct. Think about dividing the stream of examples into blocks of size k, and exploit the mistake bound. How many different hypotheses could A go through?)

(c) Put everything together and fully describe (with proof) a PAC learner that, with probability of failure at most δ, outputs a hypothesis with error at most ε. How many examples does the learner need to use (as a function of ε, δ, and t)?
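The construction that parts (a)-(c) build toward can be sketched in code. Everything here is illustrative: the predict/update interface for the mistake-bounded learner is an assumed one, and the toy constant-function learner (mistake bound t = 1) exists only to make the sketch runnable.

```python
import math

def mb_to_pac(A, stream, eps, delta, t):
    """Sketch: run mistake-bounded learner A on labeled examples until its
    current hypothesis survives a block of k consecutive examples, then
    output it. k is chosen per part (a) with delta' = delta / (t + 1),
    since A goes through at most t + 1 distinct hypotheses (part (b))."""
    k = math.ceil(math.log((t + 1) / delta) / eps)
    streak = 0
    for x, y in stream:
        if A.predict(x) == y:
            streak += 1
            if streak == k:      # a full block correct: accept (part (c))
                return A.predict
        else:
            A.update(x, y)       # at most t of these can ever happen
            streak = 0
    return A.predict             # stream exhausted; unlikely if long enough

class ConstantLearner:
    """Toy mistake-bounded learner for constant targets (mistake bound 1):
    guesses +1 until its first mistake, then switches permanently."""
    def __init__(self):
        self.guess = 1
    def predict(self, x):
        return self.guess
    def update(self, x, y):
        self.guess = y

h = mb_to_pac(ConstantLearner(), [(i, -1) for i in range(100)],
              eps=0.1, delta=0.1, t=1)
assert h(0) == -1
```

The block structure mirrors part (b): each mistake starts a new block, and after at most t mistakes some block of k examples must come back all correct, giving the sample bound asked for in part (c).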


 