Discrete Mathematics Vignettes: Counting

This spring semester I am co-teaching a first-year undergraduate course in computer science at Rice University. The name of the course is COMP 182: Algorithmic Thinking. In reality the name of the course can be somewhat misleading, as I would rather title it “Introduction to Discrete Mathematics and Algorithm Design and Analysis”, but that doesn’t quite roll off the tongue as easily. However, this is not a post about the course or my teaching experience; it is a post about one of the upcoming topics in the class: counting.

Introduction

What is counting? In general in discrete mathematics counting concerns deriving mathematical expressions which capture the total number of certain objects that exist. Of course, since we can always make an identical copy of an object and thus increase the count by one, what we really mean is counting the objects up to an appropriate notion of isomorphism, or put plainly, counting distinct objects up to whatever notion of being distinct we choose.

While deriving expressions that capture the counts of objects is an important task, what I want to talk about today goes one step beyond the basics of counting, and aims to provide you with some applied tools you can use in the future (of course, in the spirit of Alexander Razborov, here by applied I mean applied to the study of mathematics). However, in order to make this primer relatively complete we will start with some basic results in counting, and then switch to the applications of counting techniques and ideas to more interesting problems.

Permutations

This is the classic problem with which most counting modules of discrete math books start: Find the number of ways in which n distinct objects can be arranged in a line. Key information here is that we are talking about distinct objects and an arrangement in a line. It is easy to show that the total number of such permutations is n!=1\cdot 2\cdot 3\cdot\ldots\cdot n. We can do so by observing that given an ordering of n-1 objects we have n places to put the new object in, and this holds for every ordering of n-1 objects. More formally, if we denote the number of permutations of n objects by P(n) then we have P(n)=nP(n-1). Also, while we are here, it is useful to note that there is exactly one way of putting no objects down at all, or in other words 0!=1.
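The recurrence P(n)=nP(n-1) with P(0)=1 is easy to check by brute force for small n. The following is a minimal sketch; the function name count_permutations is only an illustration and not part of any library.

```python
from itertools import permutations
from math import factorial


def count_permutations(n):
    """Count arrangements of n distinct objects via P(n) = n * P(n-1), P(0) = 1."""
    count = 1  # P(0) = 1: there is exactly one way to arrange nothing
    for k in range(1, n + 1):
        # Each arrangement of k-1 objects has k places for the k-th object.
        count *= k
    return count


for n in range(6):
    # Brute force: actually list every arrangement and count them.
    brute_force = sum(1 for _ in permutations(range(n)))
    assert count_permutations(n) == brute_force == factorial(n)
```

Note that the brute-force count for n=0 is 1 as well, since the empty arrangement is the only one.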

Now, we can take a look at the number of permutations in a slightly different way. For the sake of brevity we will denote the set of the first n positive integers by [n]=\{1, 2, 3, \ldots, n\}. Consider the set of functions f:[n]\to[n]; we will call this set \mathcal{F}_n=\{f|f:[n]\to[n]\}. Recall that a function is an assignment of outputs to inputs such that exactly one output value is assigned to each input value. Thus, we can see that |\mathcal{F}_n|=n^n: for any given input i\in[n] we have n choices of the output value, and since all choices are independent of each other we have a total of n^n functions. A reasonable follow-up question is to ask how many of those functions are bijections. Recall that we call a function a bijection iff for any two distinct input values it produces distinct outputs, and for every possible output value there exists an input value that maps to it via our function. In other words, when we look at \mathcal{F}_n, bijections are precisely the functions that assign to each input i\in[n] a distinct output j\in[n], i.e. for any pair i_1\neq i_2 we have f(i_1)\neq f(i_2). Note that since the functions are from [n] to [n], it follows that such an assignment has to use all possible output values. If we look closer at the structure of such a bijection, we realize that it is essentially a permutation of n distinct objects, since we can think of input values as positions on the line, and outputs as the distinct objects we place in those positions. Hence, it follows that there are n! bijections in \mathcal{F}_n.

What we saw above is the fundamental use of counting in combinatorics: we find a combinatorial object of interest (say bijections on a set of n elements) and we count the number of such objects. Often we will see seemingly unrelated families of objects that result in the same total counts, offering us different combinatorial interpretations of the same formula. Furthermore, once we are able to count the number of objects with a given property (say being a bijection) among the total collection of objects (say functions between two sets of n objects), we immediately get the probability that an object selected uniformly at random possesses the property of interest. In the case of the bijections above we have that Pr(f\in\mathcal{F}_n,\,f\text{ is a bijection})=\frac{P(n)}{|\mathcal{F}_n|}=\frac{n!}{n^n}. We will see why such simple observations can be of use a bit later in the post.
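For small n we can enumerate \mathcal{F}_n directly and confirm both counts. In the sketch below a function f:[n]\to[n] is represented as a tuple of its outputs (f(1), ..., f(n)); the helper name count_bijections is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def count_bijections(n):
    """Enumerate all functions [n] -> [n] and return (#bijections, #functions)."""
    total = 0
    bijections = 0
    # outputs[i - 1] plays the role of f(i)
    for outputs in product(range(1, n + 1), repeat=n):
        total += 1
        if len(set(outputs)) == n:  # all outputs distinct, hence a bijection
            bijections += 1
    return bijections, total


for n in range(1, 6):
    bijections, total = count_bijections(n)
    assert total == n ** n
    assert bijections == factorial(n)
    # Probability that a uniformly random function is a bijection: n! / n^n
    print(n, Fraction(bijections, total))
```

The printed fractions shrink quickly, which matches the intuition that a random function on a large set is very unlikely to be a bijection.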

Count once, count twice

Whenever we count the number of certain objects, we can sometimes derive the formula in more than one way. This fact becomes useful when we want to show equality between two expressions. In other words, if we want to prove that two expressions are equal, one way to do so is to express the count of a particular combinatorial object in two ways, each corresponding to one of the expressions. A quick simple example of this proof technique is given by the handshake lemma in graph theory. Consider a simple graph G=(V, E). We want to evaluate the sum of degrees of all vertices \sum_{v\in V}\mathrm{deg}(v). In order to do so, we can think of a counting argument. What is the degree of any given vertex? It is exactly the number of edges incident to that vertex. Hence, if we sum the degrees of all the vertices in a graph, it follows that we are indirectly counting the edges of the graph. However, each edge in a simple graph is incident to exactly two vertices. This means that when we summed the degrees, we actually counted each edge exactly twice (once per endpoint). Thus, it follows that \sum_{v\in V}\mathrm{deg}(v)=2|E|. While this result at first might not appear to be a counting argument, if you think about it for a bit you can recognize that we essentially counted the edges of the graph in two ways: once by considering degrees of vertices, and once by simply counting all edges.
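The handshake lemma can be sanity-checked on random simple graphs. This is only an illustrative sketch: the graph generator, the edge probability 0.4 and the seed are arbitrary choices made here, not something taken from the course.

```python
import random
from itertools import combinations


def random_simple_graph(n, p, rng):
    """Return (vertices, edges); each possible edge is included with probability p."""
    vertices = list(range(n))
    edges = [pair for pair in combinations(vertices, 2) if rng.random() < p]
    return vertices, edges


def degree_sum(vertices, edges):
    """Sum of degrees, computed by walking over the endpoints of every edge."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1  # the edge contributes once to each endpoint
        degree[v] += 1
    return sum(degree.values())


rng = random.Random(182)
for _ in range(100):
    V, E = random_simple_graph(8, 0.4, rng)
    assert degree_sum(V, E) == 2 * len(E)
```

The loop inside degree_sum mirrors the proof: every edge adds exactly 2 to the total.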

Another common example for double counting comes from the problem of finding the number of subsets of a set with n elements. Specifically, on one hand each subset is determined by whether each element is in or out of it, i.e. for each element in [n] we have two possible choices either it is in the subset (1) or it is not (0), thus the total number of possible subsets is the number of all possible strings of n characters over the alphabet of {0, 1}, which is 2^n. On the other hand, each subset has some size k where 0\leq k\leq n, and we know that the number of ways to pick k elements from n is \binom{n}{k}. Hence, we can conclude that \sum_{k=0}^n\binom{n}{k}=2^n.
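Both sides of this identity can be seen in a few lines of code: we enumerate subsets as 0/1 strings (the first way of counting) and group them by size (the second way). The name subset_sizes is hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def subset_sizes(n):
    """Enumerate subsets of [n] as 0/1 strings and tally how many have each size."""
    sizes = Counter()
    for bits in product((0, 1), repeat=n):
        sizes[sum(bits)] += 1  # the number of 1s is the size of the subset
    return sizes


n = 6
sizes = subset_sizes(n)
assert all(sizes[k] == comb(n, k) for k in range(n + 1))  # grouped by size
assert sum(sizes.values()) == 2 ** n                      # all 0/1 strings
```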

A fun extension of the argument above gives us a way to count a slightly more complicated set of objects. We are now interested in how many subsets of a set of n elements (with n\geq 1) have an even number of elements in them. On one hand we can express this number as the sum of \binom{n}{2k} over 0\leq k\leq n/2, while on the other hand we can consider the following argument: take an arbitrary subset of a set with n-1 elements; there is a unique way to extend it to an even-sized subset of the n-element set. Namely, if our starting subset has an even number of elements, we have to exclude the last element of the n-element set, i.e. assign it the value 0; if the starting subset has an odd number of elements, then we need to add the last element to it to make the size even, i.e. we assign it the value 1. It follows that for any of the 2^{n-1} subsets of an (n-1)-element set, there is a unique extension to an even-sized subset of the n-element set, and every even-sized subset arises from exactly one starting subset (just drop the last element). Hence, we can conclude that \sum_{k=0}^{\lfloor n/2\rfloor}\binom{n}{2k}=2^{n-1}, or in other words, exactly half of all subsets have an even number of elements in them. This is a nice confirmation of an intuitive guess we might have had originally, but sadly the argument does not extend to the number of subsets with a multiple of 3 elements in them. Let’s formulate this more general version of the question. Let S(n, k) be the number of subsets of an n-element set that have size divisible by k. We have from our previous work the values S(n, 1)=2^n and S(n, 2)=2^{n-1}. It would be nice if S(n, 3) was simply 2^n/3, but sadly that is not an integer. However, with a little bit of elbow grease and use of the binomial theorem we can show that |S(n, 3)-\frac{2^n}{3}|< 1. I will leave the solution to this problem as an exercise for a curious reader.
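Without spoiling the exercise, the three claims about S(n, k) can at least be checked numerically. The sketch below computes S(n, k) straight from the definition as a sum of binomial coefficients; the bound for k=3 is tested in integer arithmetic to avoid floating-point issues.

```python
from math import comb


def S(n, k):
    """Number of subsets of an n-element set whose size is divisible by k."""
    return sum(comb(n, size) for size in range(0, n + 1, k))


for n in range(1, 25):
    assert S(n, 1) == 2 ** n
    assert S(n, 2) == 2 ** (n - 1)
    # |S(n, 3) - 2^n / 3| < 1 is equivalent to |3 * S(n, 3) - 2^n| < 3
    assert abs(3 * S(n, 3) - 2 ** n) < 3
```

This is a numerical check for small n only, not a proof.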

To summarize, we just saw how counting the same combinatorial objects from two perspectives can serve us as a proof technique for showing equalities between expressions. Thus, we are starting to explore the field of applications that counting techniques provide us.

It exists! But I can’t show you an example

Recall how in the bijection counting question we brought up the probability that a randomly picked function from [n] to [n] is a bijection. Besides the classical uses of probabilities, such as analyses of randomized algorithms, we can also wield probability as a tool for showing existence of objects with desirable properties. This technique is called the probabilistic method, and it was pioneered in combinatorics by Paul Erdős. The premise is quite simple at first glance: in order to show that an object with a desirable property exists, we show that a random element x picked from our universe \Omega belongs to the set A of elements with the desirable property with probability greater than 0.

In order to make this more concrete we will work through an application of this method to a graph theoretic question. Consider a complete graph on n vertices K_n. We will call a tournament an orientation of the complete graph, i.e. for every edge {u, v} in a complete graph we will pick a direction and replace it with a directed edge u\to v or v\to u. It is easy to check that the total number of tournaments on n vertices is 2^{\binom{n}{2}}=2^{\frac{n(n-1)}{2}}. Now, we are interested in determining whether for n\geq 3 there exists a tournament with n vertices that contains at least (n-1)!/2^n Hamiltonian cycles.

To start, consider the sample space \Omega of all tournaments on n vertices and assume the uniform distribution on it. Now, we want to introduce a random variable X that counts the number of Hamiltonian cycles in a tournament (recall that a random variable is a function from the sample space to some set; in our case we have X:\Omega\to\mathbb{N}). Fix a node and consider some permutation of the remaining nodes in the tournament (we have a total of (n-1)! such permutations); we want to know when there is a Hamiltonian cycle that starts at the fixed node and visits the remaining nodes in the order given by that permutation. Clearly we need to have a directed edge from every predecessor to its successor in the cycle, hence in total we need n edges to point in the correct direction determined by our permutation. Now, let’s define an indicator random variable X_\sigma which is equal to 1 whenever the Hamiltonian cycle defined by the permutation \sigma of the remaining n-1 nodes exists in a tournament, and 0 otherwise. We want to compute Pr(X_\sigma =1). In order to do so, we note that we need exactly n edges to have a fixed orientation, and hence the probability of such an event occurring is 1/2^n (we can think of flipping a fair coin for the orientation of each edge; it’s up to the reader to check that this process indeed results in the uniform distribution over tournaments). Now, by construction we have X=\sum_{\sigma}X_\sigma where \sigma goes over all possible permutations of the n-1 remaining nodes. Hence, by the linearity of expectation we have that \mathbb{E}[X]=\sum_{\sigma}\mathbb{E}[X_\sigma]=\sum_{\sigma}Pr(X_\sigma =1). Now, using our previous result we can conclude that \mathbb{E}[X]=(n-1)!/2^n. Finally, a random variable cannot be strictly smaller than its expected value on every outcome, so there exists some tournament that has at least (n-1)!/2^n Hamiltonian cycles in it.
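For very small n the whole sample space fits in memory, so the expectation can be computed exactly by brute force. The sketch below enumerates all 2^{\binom{n}{2}} tournaments, counts directed Hamiltonian cycles in each (fixing vertex 0 as the start, as in the argument above), and compares the average with (n-1)!/2^n. The function names are made up for this illustration, and the approach is only feasible for n up to about 5 or 6.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial


def hamiltonian_cycles(n, arcs):
    """Count directed Hamiltonian cycles, fixing vertex 0 as the starting point."""
    count = 0
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest
        # All n arcs along the cycle (including the one back to 0) must be present.
        if all((cycle[i], cycle[(i + 1) % n]) in arcs for i in range(n)):
            count += 1
    return count


def tournament_stats(n):
    """Return (average, maximum) number of Hamiltonian cycles over all tournaments."""
    pairs = list(combinations(range(n), 2))
    total = 0
    best = 0
    # One coin flip per edge of K_n decides its orientation.
    for flips in product((False, True), repeat=len(pairs)):
        arcs = {(v, u) if flip else (u, v) for (u, v), flip in zip(pairs, flips)}
        cycles = hamiltonian_cycles(n, arcs)
        total += cycles
        best = max(best, cycles)
    return Fraction(total, 2 ** len(pairs)), best


for n in (3, 4, 5):
    average, best = tournament_stats(n)
    assert average == Fraction(factorial(n - 1), 2 ** n)
    assert best >= average  # some tournament reaches at least the expected value
```

The last assertion is the probabilistic method in miniature: the maximum over the sample space can never fall below the average.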

While it is a stretch to claim that this result is purely enabled by our ability to count combinatorial objects, a lot of key ingredients to the proof rely on our ability to count. Furthermore, this showcases the power of probabilistic method as a non-constructive proof technique for showing existence. In other words, if we can properly count the number of certain objects, we may have a hope of proving interesting results about existence of objects with certain properties.

Instead of conclusion

Of course this is a cursory and incomplete account of counting techniques and applications in discrete mathematics. We haven’t touched upon some key counting ideas such as the inclusion-exclusion principle, nor did we discuss any of the more advanced approaches such as generating functions. Furthermore, my account of the probabilistic method was purposely short and lacked some of the proper rigor required. However, my aim here was to showcase some of the fun applications of counting techniques and to kindle the spark of interest in the reader. I am not certain whether I will write a follow-up to this post any time soon, but stay tuned in case I do.

Cheers, and don’t forget to count your chickens both before and after they hatch! 🐣

Week(ly quick)ies: Random walks

Introduction

Since my blog has been experiencing a shortage of content (mostly due to the fact that I am busy with my current research work), I decided to try out a new format that will hopefully motivate me to post more often. “Weekies” i.e. “weekly quickies” are going to be weekly posts of small puzzles that I come up with or come across on the web. I plan to follow up each “weeky” with a longer write-up of solution and discussion roughly within a month or so of the original post. In the meantime do not hesitate to post solutions and discuss the puzzle in the comments under the original post.

Random walks

Consider the K4 graph/tetrahedron depicted below. The following questions concern different random walks starting at the vertex A.

Q1. Assume that at any time step we will move to a random neighboring vertex with the equal probability of 1/3. What’s the probability that we will visit all vertices of the graph before returning to A?

Q2. Given the same walk as in question 1, what’s the probability that we will visit any one vertex out of {B, C, D} at least n times before returning to A?

Q3. Now, let’s change our walk to have a 1/2 probability of remaining at the current vertex and probabilities of 1/6 to move to each of the neighboring vertices. How do answers to questions 1 and 2 change for this walk?

Q4. With the walk described in question 3, what is the expected number of steps we need to take to visit all vertices at least once?

Q5. Let us record the sequence of visited vertices as a string of characters A, B, C, D. Consider a random walk in question 3 and record its first 10 vertices. What’s the probability that the string “ABCD” occurs as a subsequence in the recorded sequence?

Q6. We can generalize the walk described in question 3 using the parameter q ∈ [0, 1] by letting the probability of staying at the current vertex be q and probability of going to any given neighbor be (1-q)/3. Additionally, let the start vertex be picked uniformly at random. Let n be the length of the sequence of vertices that we record, analogously to question 5. Let P_{q,n} be the distribution on the Ω = {A, B, C, D}^n generated by the described process. Let U_n be the uniform probability distribution on the Ω. Let D be the total variation distance function and let us denote d(q, n) = D(P_{q,n}, U_n). Investigate behavior of the function d : [0, 1] ⨉ ℕ → [0, 1].
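For readers who would like to check their answers empirically before the write-up, here is a rough Monte Carlo sketch for questions 1 and 3. It rests on one assumption about the wording: for the lazy walk, staying put at A does not count as "returning to A"; only a move into A from another vertex does. The function names and the stay parameter are introduced here for illustration, and stay must be strictly less than 1.

```python
import random

# K4: every vertex is adjacent to the other three.
NEIGHBORS = {v: [u for u in "ABCD" if u != v] for v in "ABCD"}


def visits_all_before_return(rng, stay=0.0):
    """Run one walk from A; True if B, C and D are all seen before re-entering A."""
    current = "A"
    seen = set()
    while True:
        if rng.random() < stay:
            continue  # lazy step: the walk stays at the current vertex
        current = rng.choice(NEIGHBORS[current])
        if current == "A":
            return False  # came back to A before seeing everything
        seen.add(current)
        if len(seen) == 3:
            return True


def estimate(trials, stay=0.0, seed=0):
    """Estimate the probability from question 1 (stay=0) or question 3 (stay=0.5)."""
    rng = random.Random(seed)
    hits = sum(visits_all_before_return(rng, stay) for _ in range(trials))
    return hits / trials


print(estimate(100_000))            # walk from question 1
print(estimate(100_000, stay=0.5))  # walk from question 3
```

A simulation like this only gives an estimate, so it is best used to catch mistakes in an exact derivation rather than to replace it.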

Sets

General definitions

From Wikipedia:

In mathematics, a set is a collection of distinct objects, considered as an object in its own right. For example, the numbers 2, 4, and 6 are distinct objects when considered separately, but when they are considered collectively they form a single set of size three, written {2, 4, 6}. The concept of a set is one of the most fundamental in mathematics. Developed at the end of the 19th century, set theory is now a ubiquitous part of mathematics, and can be used as a foundation from which nearly all of mathematics can be derived. In mathematics education, elementary topics from set theory such as Venn diagrams are taught at a young age, while more advanced concepts are taught as part of a university degree.

To break this down into simpler terms there are two important aspects of what constitutes a set:

  1. A set is a collection of distinct objects.
  2. A set itself constitutes an object, i.e. we can think of it as a tangible collection.

An example of a set can be pizza offerings at Giordano’s (a pizzeria in Chicago). This set contains distinct elements: Pepperoni pizza, Supreme pizza, Goat cheese and spinach pizza, Italian sausage pizza, Margherita pizza; and is in itself an object: a pizza menu.

The code below illustrates how we can declare a set in Python.

my_set = {0,3,4,0,7,9,13,35,0}
print(my_set)
{0, 3, 4, 35, 7, 9, 13}

We can see that in fact, even if we declared some non-distinct (i.e. repeated) elements, the set doesn’t contain them, as evidenced by the print() function.

Set Membership and Subsets

Given an object and a set we can test whether this object belongs to the given set. This is a check for set membership. We can also verify if an object does not belong to a set.

Given a set $A$ and an object $x$, we use the notation $x\in A$ to denote that $x$ is an element of $A$. We also use notation $x\notin A$ to denote that $x$ is not an element of $A$.

The code below illustrates how we can test these conditions in Python.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
print(3 in A)                         # Will print True, because 3 is an element of A.
print(6 in A)                         # Will print False, because 6 is not an element of A.
print(7 not in A)                     # Will print False, because 7 is an element of A.    (note the use of "not")
print(8 not in A)                     # Will print True, because 8 is not an element of A. (note the use of "not")
True
False
False
True

Another important relation is that of being a subset. If membership is a relation between an object and a set, then being a subset is a relation between two sets. Namely we say that $B$ is a subset of $A$, denoted $B\subseteq A$, if every element of $B$ is also an element of $A$. We also will say that $B$ is a proper subset of $A$, denoted $B\subset A$, if every element of $B$ is also an element of $A$, but there are elements in $A$ that are not in $B$.

The code below illustrates how we can test these relations in Python, and provides some examples of subsets and proper subsets.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 3, 5, 7}                      # B is the set of prime numbers less than 10.

print(B.issubset(A))                  # Check if B is a subset of A. Will print True.
print(B <= A)                         # Check if B is a subset of A. Will print True.
print(B < A)                          # Check if B is a proper subset of A. Will print True,
                                      # since all elements of B are in A (subset condition),
                                      # but 11 is in A, and not in B (proper condition).
        
print(A.issubset(A))                  # Check if A is a subset of A. Will print True. 
print(A <= A)                         # Check if A is a subset of A. Will print True.
print(A < A)                          # Check if A is a proper subset of A. Will print False,
                                      # since all elements of A are in A.
                                      # Note: a set is always a subset of itself.
True
True
True
True
True
False

Set Operations

Now, let us take a look at some common set operations. As with many things in mathematics, these concepts become more natural when visualized. Hence, let us briefly introduce the idea of Venn diagrams.

A Venn diagram is a schematic representation of a set and its possible relations with other sets. We usually use (possibly misshapen) circles to denote the sets, and colors or the elements themselves to denote the elements of each set. The few examples below illustrate this idea.

Venn diagram of letters

Venn's four ellipses diagram

Set Union

The first set operation we will look at is set union. We can think of it as addition for the sets. The result of a set union is the set containing elements that appear in either of the sets. The following Venn diagram shows in red the union of sets $A$ and $B$, denoted $A\cup B$.

A union B

We can compute a union of two sets in Python by using the union method or by using | operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A | B
print(C)
{2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.union(B)
print(C)
{2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
# Note that any possible "overlapping" elements will be only accounted for once,
# and thus the result will be a set (elements are distinct). This can be seen from
# the Venn diagram (the intersecting region is covered once) and from the example
# below.
A = {4, 5, 6, 7, 8, 9}
B = {6, 7, 8, 9, 10, 11}
C = A | B
print(C)
{4, 5, 6, 7, 8, 9, 10, 11}

Set Intersection

The next operation is set intersection. A set intersection is a set (possibly an empty one) that contains elements that appear in both sets. In other words, intersection is the overlap of the original sets. The following diagram shows the intersection of sets $A$ and $B$, denoted $A\cap B$.

A intersection B

We can compute an intersection of two sets in Python by using the intersection method or by using & operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A & B
print(C)
{2}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.intersection(B)
print(C)
{2}
# Only the elements present in BOTH sets get into the intersection. Thus in some
# cases the intersection can be empty. A Venn diagram for this case would be two
# non-overlapping circles.
A = {4, 5, 6, 7, 8, 9}
B = {10, 11, 12, 13, 14}
C = A & B
print(C)
set()

Set Difference

The next operation we will look at is the set difference. It is often useful to know which elements belong to one set but not the other. The set difference is the set that contains the elements of the first set that are not in the second one. The following diagram shows the difference of sets $A$ and $B$, denoted $A - B$ or $A\setminus B$.

A difference B

We can compute a difference between two sets in Python by using the difference method or by using - operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A - B
print(C)
{3, 5, 7, 11, 13, 17, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.difference(B)
print(C)
{3, 5, 7, 11, 13, 17, 19}
# Note that just like the difference of two numbers depends on the order, the difference
# of two sets also depends on which one we want to subtract from. The example below
# illustrates this idea.
A = {4, 5, 6, 7, 8, 9}
B = {7, 8, 9, 10, 11}
C = A - B
D = B - A
print("A - B is {}".format(C))
print("B - A is {}".format(D))
A - B is {4, 5, 6}
B - A is {10, 11}

Set Symmetric Difference

The last set operation we will talk about is the symmetric difference. There are several ways you can think about the symmetric difference, but all of them encapsulate the same idea. We want a set containing the elements that appear in either $A$ or $B$, but not in both. Using the notation defined above we can write this as $(A\cup B) - (A\cap B)$ (the union/sum of the sets minus their intersection) or alternatively as $(A - B) \cup (B - A)$ ($A$ without $B$, union $B$ without $A$). The following diagram shows the symmetric difference of sets $A$ and $B$, denoted $A \Delta B$.

A symmetric difference B

We can compute the symmetric difference between two sets in Python by using the symmetric_difference method or by using ^ operation on sets. The code below illustrates this.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A ^ B
print(C)
{3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = A.symmetric_difference(B)
print(C)
{3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 19}
# The symmetric difference, unlike the regular set difference is symmetric. 
# Which means that the order of sets does not matter, the result will be
# the same, as illustrated by code below.
A = {4, 5, 6, 7, 8, 9}
B = {6, 7, 8, 9, 10, 11}
C = A ^ B
D = B ^ A
print(C)
print(D)
{4, 5, 10, 11}
{4, 5, 10, 11}
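The two formulas for the symmetric difference given earlier, $(A\cup B) - (A\cap B)$ and $(A - B) \cup (B - A)$, can be checked against the built-in ^ operator. The helper name below is just an illustration.

```python
def symmetric_difference_three_ways(A, B):
    """Return the symmetric difference via each formula and via the ^ operator."""
    via_union = (A | B) - (A & B)        # union minus intersection
    via_differences = (A - B) | (B - A)  # A without B, union B without A
    return via_union, via_differences, A ^ B


A = {4, 5, 6, 7, 8, 9}
B = {6, 7, 8, 9, 10, 11}
first, second, builtin = symmetric_difference_three_ways(A, B)
print(first == second == builtin)  # all three agree
```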

Additional remarks

Just as arithmetic operations are defined on two numbers but can be extended to longer expressions, set operations can be applied to multiple sets. In Python the easiest way to express this is to use the set operators |, &, -, ^ together with parentheses () to group those operations. The examples below illustrate this idea.

A = {2, 3, 5, 7, 11, 13, 17, 19}      # A is the set of prime numbers less than 20.
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # B is the set of even numbers > 0 and < 20.
C = {3, 5, 7, 9, 11, 13, 15, 17, 19}  # C is the set of odd numbers > 1 and < 20.
D = {3, 6, 9, 12, 15, 18}             # D is the set of numbers divisible by 3, > 0 and < 20.
# This set is the union of C and D minus A. 
# Effectively it will contain numbers divisible by 3 or odd, that lie between 1 and 20,
# but will not contain the numbers that are prime.
E = (C | D) - A 
print(E)
{6, 9, 12, 15, 18}
# This set is the symmetric difference of A, B and the difference between
# C and D.
# Effectively it will contain numbers that are prime, or even, or
# odd and not divisible by 3. However, it will not contain numbers that satisfy
# exactly two of those conditions at the same time (e.g. it won't contain 11,
# since it is both prime and odd and not divisible by 3).
F = A ^ B ^ (C - D)
print(F)
{3, 4, 6, 8, 10, 12, 14, 16, 18}
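A useful way to read a chain of ^ operations: an element ends up in the result exactly when it belongs to an odd number of the sets involved. The sketch below checks this for the sets from the example; in_odd_number_of_sets is a hypothetical helper written for this check.

```python
def in_odd_number_of_sets(x, *sets):
    """True if x belongs to an odd number of the given sets."""
    return sum(x in s for s in sets) % 2 == 1


A = {2, 3, 5, 7, 11, 13, 17, 19}
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}
C = {3, 5, 7, 9, 11, 13, 15, 17, 19}
D = {3, 6, 9, 12, 15, 18}

parts = (A, B, C - D)
F = A ^ B ^ (C - D)
universe = A | B | C | D

# The chained symmetric difference agrees with the parity rule.
assert F == {x for x in universe if in_odd_number_of_sets(x, *parts)}
```

For instance, 2 is both prime and even (two sets), so it is left out, while 3 is only in A (one set), so it stays.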

Use Cases

We went through the trouble of learning the definitions for basic set arithmetic (the Operations section) and membership and subset relations (the Membership and Subsets section), so now is a good time to present some use cases for these structures and operations.

Besides being an essential building block in modern mathematics, sets often present a highly convenient data structure in programming. The examples below will guide you through some useful applications of sets in programming. Some of these examples are inspired by real production code.

Filtering down unwieldy lists

Sometimes we are faced with the problem of filtering a rather large list to only show unique values. A few common examples include the following:

  1. Identifying the unique caller IDs for a large list of phone calls.
  2. Identifying categories of the items carried by a store from the full inventory list.

Below we will address both of the problems by leveraging the property that a set contains distinct elements, and hence will effectively filter out only the unique elements.

# Problem 1.
# ----------
# Write a function that takes in a list of phone numbers (as strings),
# and returns a list containing the unique phone numbers from the original
# list.
#
# Input: list of phone numbers.
# 
# Output: list of unique phone numbers.
def phone_id_unique(numbers):
    unique_numbers_set = set(numbers)
    unique_numbers_list = list(unique_numbers_set)
    return unique_numbers_list
# Problem 1.
# ----------
# Tests:
#
# 1. Input:  ["800-000-0000" repeated 1 000 000 times]
#    Output: ["800-000-0000"]
test_input = ["800-000-0000"] * 1000000
print(phone_id_unique(test_input))

# 2. Input:  ["800-100-0000" repeated 1 000 000 times, "800-200-0000" repeated 1 000 000 times, ...,
#             "800-900-0000" repeated 1 000 000 times]
#    Output: ["800-100-0000", "800-200-0000", ..., "800-900-0000"] (in arbitrary order)
test_input = []
for i in range(1, 10):
    test_input = test_input + ["800-{}00-0000".format(i)] * 1000000
print(phone_id_unique(test_input))

# 3. Input: ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
#    Output: ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
test_input = ["800-000-0000", "800-010-0000", "800-020-0000", "800-030-0000"]
print(phone_id_unique(test_input))
['800-000-0000']
['800-200-0000', '800-500-0000', '800-700-0000', '800-300-0000', '800-600-0000', '800-400-0000', '800-800-0000', '800-100-0000', '800-900-0000']
['800-020-0000', '800-000-0000', '800-010-0000', '800-030-0000']
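As the outputs above show, converting to a set and back does not preserve the order of the original list. If the order of first appearance matters, one possible alternative (not the approach used above) is to rely on the fact that dictionaries keep insertion order in Python 3.7 and later. The function name is hypothetical.

```python
def phone_id_unique_ordered(numbers):
    """Return the unique phone numbers in order of first appearance."""
    # dict.fromkeys keeps one key per distinct number, in insertion order.
    return list(dict.fromkeys(numbers))


test_input = ["800-020-0000", "800-000-0000", "800-020-0000", "800-010-0000"]
print(phone_id_unique_ordered(test_input))
```

The trade-off is small: both versions do a single pass over the list, but the set-based one states the intent (uniqueness only) more directly.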
# Problem 2.
# ----------
# Write a function that takes in a list of products carried by a store (as dictionaries),
# and returns a list containing the product categories that appear in the original
# list.
#
# Input: list of items.
# 
# Output: list of product categories.
def product_categories(items):
    categories_set = {item["category"] for item in items}
    categories = list(categories_set)
    return categories
# Before testing we will load some data from .csv files. These files should be put into
# the same directory as the notebook. CSV stands for comma-separated values, and is a 
# common standard for representing data in text format.
items = []

import csv
with open("data_produce.csv", "r") as f:
    reader = csv.reader(f, delimiter = ",")
    for line in reader:
        items.append({"id": line[0], 
                      "category": line[1],
                      "stock": line[2],
                      "price": line[3]})

# Tests:
#
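# 1. Input:  [1000000 items from 8 categories]
#    Output: ["perishables", "water", "kitchen", "furniture", "electronics", "paper", "pantry", "misc"]
#            (in arbitrary order; the printed names keep a leading space when the
#            fields in the file are separated by ", ")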
print(product_categories(items))
[' pantry', ' furniture', ' water', ' kitchen', ' perishables', ' paper', ' electronics', ' misc']

Implementing common logical operations

Mathematical logic and set arithmetic are tightly connected. This allows us to use set arithmetic to model common logical operations, which in turn can easily encapsulate some everyday tasks we want to perform with our data.

Set union is analogous to logical OR operation, set intersection to logical AND, and the symmetric difference is analogous to logical XOR (exclusive OR) operation. Thus, we can use these operations to translate common tasks into set operations. Let us look at some of the examples below.
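To make the analogy concrete, the sketch below compares each set operation with the corresponding Boolean operation on membership tests, element by element. The function name and the choice of universe are assumptions made for this illustration.

```python
def check_logic_correspondence(A, B, universe):
    """Check that set operations match Boolean operations on membership."""
    for x in universe:
        p, q = x in A, x in B
        assert (x in (A | B)) == (p or q)        # union        <-> OR
        assert (x in (A & B)) == (p and q)       # intersection <-> AND
        assert (x in (A ^ B)) == (p != q)        # sym. diff.   <-> XOR
        assert (x in (A - B)) == (p and not q)   # difference   <-> AND NOT
    return True


A = {2, 3, 5, 7, 11, 13, 17, 19}      # primes less than 20
B = {2, 4, 6, 8, 10, 12, 14, 16, 18}  # even numbers > 0 and < 20
print(check_logic_correspondence(A, B, range(1, 20)))
```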

# Problem 3.
# ----------
# Write a function that takes in a set of items on mom's shopping list,
# a set of items on dad's shopping list, a set of items already bought by
# mom, a set of items already bought by dad, and finally a set of items
# that are currently in the fridge. The output should be a consolidated
# shopping list, i.e. it should only include the items that are not in the
# fridge and are not yet bought. 
#
# Input: 5 sets of items as described above.
# 
# Output: list of items that need to be procured.
def shopping_list_cons(mom_to_buy, dad_to_buy,
                       mom_bought, dad_bought,
                       in_fridge):
    all_to_buy = (mom_to_buy | dad_to_buy)
    all_bought = (mom_bought | dad_bought)
    to_buy = list(all_to_buy - all_bought - in_fridge)
    return to_buy
# Problem 3.
# ----------
# Tests:
#
# 1. Input:  mom_to_buy = {"apples", "candy", "chicken", "beef"}
#            dad_to_buy = {"candy", "beef", "bread", "cola"}
#            mom_bought = {"bread", "biscuits"}
#            dad_bought = {"milk", "coffee"}
#            in_fridge  = {"eggs", "chicken"}
#    Output: ["apples", "candy", "beef", "cola"]
mom_to_buy = {"apples", "candy", "chicken", "beef"}
dad_to_buy = {"candy", "beef", "bread", "cola"}
mom_bought = {"bread", "biscuits"}
dad_bought = {"milk", "coffee"}
in_fridge  = {"eggs", "chicken"}
print(shopping_list_cons(mom_to_buy, dad_to_buy, mom_bought, dad_bought, in_fridge))
['apples', 'candy', 'beef', 'cola']

Here we take advantage of some set operations to solve the problem. First, we consolidate both “to buy” lists by taking their union, which ensures that all items are accounted for and none is counted twice. Then we consolidate the already bought items by taking another union. Finally, we take the difference between what we need to buy and what is already bought or already in the fridge.
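
As a quick sanity check of the last step, removing the bought items and then the fridge items one after another should give the same result as removing their union in one go. The sketch below reuses the test data from Problem 3; the names `chained` and `grouped` are introduced here only for illustration.

```python
mom_to_buy = {"apples", "candy", "chicken", "beef"}
dad_to_buy = {"candy", "beef", "bread", "cola"}
mom_bought = {"bread", "biscuits"}
dad_bought = {"milk", "coffee"}
in_fridge  = {"eggs", "chicken"}

all_to_buy = mom_to_buy | dad_to_buy

# Two successive differences ...
chained = all_to_buy - (mom_bought | dad_bought) - in_fridge
# ... versus one difference with the union of everything we already have.
grouped = all_to_buy - (mom_bought | dad_bought | in_fridge)

print(chained == grouped)
# Sets are unordered, so we sort before printing to get a stable result.
print(sorted(chained))
```

Sorting the result is a design choice for readability: a list built directly from a set may come out in a different order from run to run.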

In the next problem we will take a look at some applications of the symmetric difference.

# Problem 4.
# ----------
# Students at Chicken Soup High School are offered two
# options for Calculus classes. There is an "Intro to Calculus"
# class and a "Calculus" class. Some students take only
# the first class during their time in high school,
# some take only the second class by placing out of the
# first one, and finally some students take both classes
# as a sequence.
# At the end of each year, an average grade for these classes
# is computed to evaluate the effectiveness of instructors.
# The average is computed according to a strange formula,
# because the statistics and performance department of
# Chicken Soup High loves hard-to-understand numbers.
# You are provided with the formula and the list of 
# students in each class and their grades. To protect
# students' privacy you are given unique StudentIDs.
# 
# AVG = (avg grade for students who took exactly one of the two classes)
#     + 1.75 * (avg grade for students who took both)
#
# Write a function that takes in two dictionaries of 
# StudentIDs and grades and computes the average according
# to the given formula.
#
# Input: a dictionary for students who took "Intro to Calculus",
#        a dictionary for students who took "Calculus".
# 
# Output: the average grade.
import numpy as np

def chicken_soup_high_avg(intro, calc):
    # Students who took exactly one of the two classes.
    one_course = set(intro) ^ set(calc)
    # Students who took both classes.
    both_courses = set(intro) & set(calc)

    one_grades = []
    for studentID in one_course:
        # Exactly one of the two dictionaries contains this student.
        if studentID in intro:
            one_grades.append(intro[studentID])
        else:
            one_grades.append(calc[studentID])

    both_grades = []
    for studentID in both_courses:
        both_grades.append(intro[studentID])
        both_grades.append(calc[studentID])

    avg = np.mean(one_grades) + 1.75 * np.mean(both_grades)
    return avg
# Problem 4.
# ----------
# Tests:
#
# 1. Input: {"1": 4.0, "2": 3.75, "3": 3.4},
#           {"2": 3.8, "3": 3.0, "4": 4.0}
#    Output: 10.10
print("{:.2f}".format(chicken_soup_high_avg({"1": 4.0, "2": 3.75, "3": 3.4},
                                            {"2": 3.8, "3": 3.0, "4": 4.0})))
10.10
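
To see why the symmetric difference is the right tool here, the sketch below checks two equivalent ways of writing it on the StudentIDs from the test above. It also checks that the "exactly one class" and "both classes" groups split the full set of students without overlap. The variable names are illustrative only.

```python
intro = {"1": 4.0, "2": 3.75, "3": 3.4}
calc  = {"2": 3.8, "3": 3.0, "4": 4.0}

intro_ids, calc_ids = set(intro), set(calc)

one_course   = intro_ids ^ calc_ids   # took exactly one of the classes
both_courses = intro_ids & calc_ids   # took both classes

# The symmetric difference is "everything in either set, minus the overlap" ...
print(one_course == (intro_ids | calc_ids) - (intro_ids & calc_ids))
# ... or, equivalently, "only in the first" together with "only in the second".
print(one_course == (intro_ids - calc_ids) | (calc_ids - intro_ids))

# The two groups are disjoint and together cover every student.
print(one_course.isdisjoint(both_courses))
print(one_course | both_courses == intro_ids | calc_ids)
```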

Further remarks

While not strictly necessary, sets can make several classic graph algorithms easier to write and explain. We will cover those in the graphs section.

Downloads