Pages

Sunday, 19 January 2014

Paul Revere and Graph Theory

Constructing a graph based on historical data, Kieran Healy explains how we can use the graph to "find" Paul Revere.
I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.

Saturday, 4 January 2014

When to set office hours

This past term, I was assisting as a tutor for one section of a math course (I am teaching this course in the present term).  The students in the class wanted to have some extra tutorial time.  One of the students collected information about when everyone was available.  It turned out that there was no common period where all interested students could come, so there ended up being no extra tutorial.

This was between the professor and the students.  I didn't find out about it until after the fact.  I assume from the conversation, though, that there was going to be at most one extra tutorial, because the proposal was nixed once it was revealed that there was no common period.

But suppose that the professor was willing to offer a number of tutorials limited only by what periods where either he or I would have been available to lead one (assuming that the tutorials would be held during the day).  Then it may still be possible to find a way to accommodate everyone.  In fact, it's pretty easy to tell whether or not it is.

We already have a list of which periods students are available.  Since at least one of the available tutors (in this case, me and the professor) must be at the tutorial, we can strike from that list all periods where none are available.  If everyone is available in at least one of the remaining periods, then all students can be accommodated.  Otherwise it's not.

One way to do this is simply hold tutorials in each of those remaining periods.  However, professors have many responsibilities besides teaching ("publish or perish" and all that), so it's desirable to use the fewest tutorial periods possible.  So how can we do that?

It turns out that this is an application of a well studied problem in combinatorics, known as the set cover problem.  Suppose we are given a collection of sets, say $S_1,S_2,\ldots, S_k$.  Let $U=\cup_{i=1}^k S_i.$  Suppose that $I$ is a subset of $\{1,2,\ldots,k\}$ and $U=\cup_{i\in I}S_i$.  Then $I$ is called a {\em set cover} of $U$.  The set cover problem is to minimize the size of $I$.

Now, suppose that the periods are numbered $1,2,\ldots, k$.  For each $i$, let $S_i$ be the set of students who are available during period $i$.  Then the we can find the fewest tutorial periods needed by finding the smallest set cover of $U$.

Unfortunately, the set cover problem is NP-complete.  The formal definition of what this means takes a fair bit of work to explain, so I won't get into it here (if you're interested, follow the link to the topic in the Wikipedia article on the set cover problem).  In practical terms, though, it means that there is no currently known way to find the smallest set cover efficiently (and there probably won't ever be).  Perhaps the number of students and number of periods is small enough that it is still manageable by computer, or there is some approximation algorithm that will give a near-optimal solution.