
Wednesday 7 February 2018

Seconds behind

The common instruction for keeping a safe distance behind another vehicle is to stay 2 seconds behind it (I don't think it's actually a law, at least not in Ontario -- correct me if I'm wrong). The common method for checking this is to pick a large object on the side of the road, ahead of the car in front of you, wait until that car passes the object, then count the number of seconds until you pass it. If it's less than 2 Mississippis, you're too close.

I've tried this a few times. It's not quite satisfying. First off, it requires some low-grade distracted driving. I should be paying attention to the road, especially the car ahead of me, as well as the other cars around me and the traffic signs, but instead I'm counting as I watch things pass by on the side of the road. Because of this divided focus, the method is also prone to inaccuracy.

After seeing enough accidents that were almost certainly made worse by drivers following too closely, and reading about a few more, I became curious what the actual safe distances are. If I am driving at a certain speed behind another car driving the same speed, exactly how many metres does a 2 second gap translate to?

The instruction is common enough that I figured I should be able to Google it and find these values for some common speed limits. All I found was the same instruction, whether from government transportation agencies or driving enthusiasts. No values were given, just the same method for computing them, if anything.

There is another method we can use, though: math. In particular, the kind I learned in high school chemistry class, namely dimensional analysis. Using this technique, it's pretty easy to convert from kilometres per hour to metres per second (details below). Multiply by 2 seconds and you get the distance.

In Canada, speed limits typically run from 40 km/hr to 120 km/hr, usually in increments of 10. These speed limits, their equivalents in metres per second, and the resulting minimum safe distances are summarized in the following table (rounded to the nearest tenth).

km/hr     m/sec     metres behind
40        11.1      22.2
50        13.9      27.8
60        16.7      33.3
70        19.4      38.9
80        22.2      44.4
90        25.0      50.0
100       27.8      55.6
110       30.6      61.1
120       33.3      66.7

Here's a similar chart for imperial measurements, with miles in place of kilometres and feet instead of metres.

miles/hr     feet/sec     feet behind
20           29.3         58.7
25           36.7         73.3
30           44.0         88.0
35           51.3         102.7
40           58.7         117.3
45           66.0         132.0
50           73.3         146.7
55           80.7         161.3
60           88.0         176.0
65           95.3         190.7
70           102.7        205.3
75           110.0        220.0
80           117.3        234.7

Having these numbers isn't a perfect solution to knowing how far back to stay behind other vehicles either. It's not like we can get out and measure while we're driving, and I don't think we're great judges of distance, especially while driving. Nevertheless, I think the numbers are still informative.

For example, suppose we are driving at the lowest speed on the first chart (the typical speed through a school zone in Ontario). The bumper-to-bumper length of a Honda Accord, a mid-sized sedan, is a little under 5 metres. At that speed, the space between me and the car in front of me should be at least big enough to fit an Accord, and there should be plenty of room to fit another in front and another behind (of course, it may be difficult to manoeuvre those cars into that space at that speed, and if we did, we'd be well out of the realm of safe distances).

At highway speeds (100 to 120 km/hr), the numbers more than double. The gap between cars should be longer than an Olympic-sized swimming pool. And yet, it's not uncommon to see cars on the highway driving so close that there isn't room for even a single car between them. In other words, people aren't even keeping the distance that is supposed to be safe in a school zone.

I was pretty surprised to see these numbers. I generally thought I was keeping safe distances. It's certainly true that most of the time I'm keeping a safer distance than a lot of vehicles I encounter. But it's clear, after doing the calculations, that I'm not keeping as far back as I should be.

There is a third option to consider for calculating safe distances: don't require drivers to do it themselves. The required technology has been around for a long time. Though I expect it will be an essential component of self-driving cars, it has rarely been applied to human-driven cars.

------

There are 1000 metres per kilometre, 60 minutes per hour, and 60 seconds per minute. If we are driving X kilometres per hour, that works out to $(X km/hr) \times (1000 m/km) \times (1 hr/60 min) \times (1 min/60 sec) = (1000/3600)\times X m/sec = (5/18)\times X m/sec.$

The calculations for feet per second are similar, with 5280 feet per mile in place of 1000 metres per kilometre, yielding
$(X mi/hr) \times (5280 ft/mi) \times (1 hr/60 min) \times (1 min/60 sec) = (5280/3600)\times X ft/sec = (22/15)\times X ft/sec.$
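For anyone who wants to reproduce the tables, here's a minimal Python sketch of the same conversions (the function names are mine, not from any library):

    def two_second_gap_metric(speed_kmh, gap_seconds=2):
        """Return (metres per second, metres behind) for a speed in km/hr."""
        mps = speed_kmh * 5 / 18  # km/hr -> m/sec
        return mps, mps * gap_seconds

    def two_second_gap_imperial(speed_mph, gap_seconds=2):
        """Return (feet per second, feet behind) for a speed in miles/hr."""
        fps = speed_mph * 22 / 15  # miles/hr -> ft/sec
        return fps, fps * gap_seconds

    for speed in range(40, 130, 10):
        mps, gap = two_second_gap_metric(speed)
        print(f"{speed} km/hr = {mps:.1f} m/sec -> stay {gap:.1f} m behind")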

Saturday 7 June 2014

Friends in high density places

Last August, I found the map below.  2% of the Australian population lives in the yellow area and the remaining 98% lives in the white area.



Like Australia, Canada is geographically large but sparsely populated.  Canada's population density is a little higher, though not by much: on average, there are 3.56 people per square kilometre in Canada compared to Australia's 3.06.  Also like Australia, our population is concentrated in a small number of urban areas.  The three most populous metropolitan areas, Toronto, Montreal, and Vancouver, alone make up more than a third of our population, the top 10 more than half, and the top 100 nearly 80% (mind you, most of the cities in that top 100 list have fewer than 100,000 people, and some, such as Timmins and Cape Breton, are quite large in area).  (Population data culled from various Wikipedia pages.)

So I wondered what a similar map for Canada would look like.

Unfortunately, there doesn't seem to be much information out there about how the map was created.  Within each of the yellow and white areas, we see other borders drawn in.  I'm assuming that these represent some national subdivisions, perhaps municipal boundaries or census regions.  So whoever made the map made a choice of which regions to colour yellow and which regions to colour white.

There are many choices that would split the population between 2% and 98%, though.  For example, to construct a region containing 98% of the population, like the one coloured white here, one could start with the sparsely populated inland regions and add increasingly denser regions until 98% is reached.  The result would not be very interesting.  Australia's population is roughly 23.5 million.  2% of that is 470,000, only a bit smaller than the population of Tasmania, the country's smallest state, and roughly 10% of the population of Sydney, the country's largest city.

So we could create a map where all of Australia other than some parts of Tasmania is coloured white, or where all but a few urban specks of Sydney are coloured white.  In the latter case, the yellow parts would probably be so small that it would be difficult or impossible to see them.  The map would hardly be different from a plain map of Australia.  What makes the existing map interesting, though, is how small the area containing 98% of the population is.

To make the map as interesting as possible, then, we would want the white part to be as small as possible.  As I said above, however, there is no information about how the map was created.  Is the white region as small as it could be, based on the subdivisions used, or is there some other choice that would lead to an even smaller region?  Did the creator simply pick the $m$ sparsest or least populous subdivisions, or was something more sophisticated used?

I found it through Twisted Sifter, which refers back to a Reddit thread.  This thread links to an image hosted on imgur, which I embedded above.  After that, the trail goes cold.  One of the comments from the original poster suggests that he or she is the one who created it, but there is no follow-up, and the user's online profile doesn't offer any direction either.

Without knowing, then, we'd have to come up with our own method.  One simple method is to rank the subdivisions by population and choose the first $m$ that give 98% of the population.  Another would be to rank by area and choose the first $m$ that give 98% of the population.  A third option would be to rank by density and choose the first $m$ that give 98% of the population.  None of these is guaranteed to give the smallest area that includes 98% of the population (or, equivalently, the largest area that includes 2% of the population).
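Here's a minimal sketch of those greedy rankings in Python; the region records and field names are purely illustrative, not taken from the original map:

    def greedy_pick(regions, target=0.98, key=lambda r: r["population"], reverse=True):
        """Sort subdivisions by `key` and keep adding them until the accumulated
        population share reaches `target`.  Returns the chosen regions, the share
        of the population they hold, and their total area."""
        chosen, share = [], 0.0
        for region in sorted(regions, key=key, reverse=reverse):
            if share >= target:
                break
            chosen.append(region)
            share += region["population"]
        return chosen, share, sum(r["area"] for r in chosen)

    # Toy data: each subdivision has a population share (fraction of the
    # national total) and an area in square kilometres.
    regions = [
        {"name": "A", "population": 0.40, "area": 1_500},
        {"name": "B", "population": 0.35, "area": 2_000},
        {"name": "C", "population": 0.20, "area": 5_000},
        {"name": "D", "population": 0.04, "area": 900_000},
        {"name": "E", "population": 0.01, "area": 2_500_000},
    ]

    # Rank by population (largest first), by density (densest first), and so on.
    by_population = greedy_pick(regions)
    by_density = greedy_pick(regions, key=lambda r: r["population"] / r["area"])
    print(by_population[1], by_population[2])
    print(by_density[1], by_density[2])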

Another, less obvious, way to solve the problem is through a mathematical technique known as linear programming.  We create a function that gives the total area of a chosen set of subdivisions, then try to maximize that area under the condition that the percentage of the population living in the chosen subdivisions is less than or equal to 2%.  Ideally, we'd insist that the population be exactly 2%, but this might not be possible.  For our purposes that shouldn't be a problem, since a number smaller than 2% would only further illustrate the sparseness.

Now let's write this in mathematical notation.  Suppose that ${\bf c}$ is a vector where entry $i$ of ${\bf c}$ is the area of subdivision $i$.  Let $A$ be a row matrix where entry $i$ of the only row of $A$ is the proportion of the population living in subdivision $i$.  Let ${\bf x}$ be a vector of variables where entry $i$ of ${\bf x}$ is the variable $x_i$.  If $x_i=1$ when subdivision $i$ is chosen and $0$ otherwise, then ${\bf c}\cdot {\bf x}$ gives the total area for that choice of subdivisions and $A{\bf x}$ is the corresponding proportion of the population.  Then we want the following:
  • maximize $f({\bf x})={\bf c}\cdot {\bf x}$ such that
  • $A{\bf x}\leq 0.02(=2\%)$ and
  • $x_i$ is 0 or 1 for each $i$
This is not exactly a linear programming problem.  The third condition makes it a binary integer programming problem, because each $x_i$ can take only one of two integer values, 0 or 1.  To make it a linear programming problem, we replace the third condition with the following:
  • $x_i\geq 0$ for each $i$.
 This could allow for the possibility that a region is partially included.  That is, only a certain percentage of a region's population counts towards the total percentage.  That means that we also want $x_i\leq 1$, since we can't include more than 100% of a region's population.  In that case, the problem is
  • maximize $f({\bf x})={\bf c}\cdot {\bf x}$ such that
  • $A{\bf x}\leq 0.02$ and
  • $0\leq x_i\leq 1$ for each $i$.
As stated, this is not quite in the standard form of a linear programming problem.  However, we can easily rewrite it in that form.  Suppose that there are $n$ subdivisions in total.  Let $B$ be the $(n+1)\times n$ matrix whose first row is $A$ and whose remaining $n$ rows are the $n\times n$ identity matrix.  Let ${\bf b}$ be the vector with $n+1$ components whose first entry is $0.02$ and whose remaining entries are all 1.  Then we can rewrite the above as
  • maximize $f({\bf x})={\bf c}\cdot {\bf x}$ such that
  • $B{\bf x}\leq {\bf b}$ and
  • $x_i\geq 0$ for each $i$.
This is now a linear programming problem.  Unfortunately, the solution to the linear programming problem is not necessarily what we want, although it has the upside of being easier to find.  In general, solving a binary integer programming problem is not easy.  So we may be able to solve the linear programming problem and use its solution to get a good solution to our problem (for example, by rounding), even though it might not be the optimal one.  Note that if we restrict to integer solutions, we get the binary integer programming problem back.  There are some special cases where integer linear problems are easy to solve.  Perhaps this is one.  (Who knows? Maybe one of the three simple methods I gave above will turn out to produce the optimal solution, though I'm sceptical any of them would.)
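As a concrete (if toy) illustration, here's how the relaxation could be handed to an off-the-shelf solver.  This is only a sketch using scipy.optimize.linprog, with made-up data; linprog minimizes, so the areas are negated to turn the maximization into a minimization:

    import numpy as np
    from scipy.optimize import linprog

    # Toy data (hypothetical): area and population share (fraction of the
    # national total) for each subdivision.  Real data would come from a
    # census or GIS source.
    areas = np.array([2_500_000.0, 900_000.0, 5_000.0, 2_000.0, 1_500.0])
    pop_share = np.array([0.01, 0.04, 0.20, 0.35, 0.40])

    result = linprog(
        c=-areas,                          # maximize c.x by minimizing its negative
        A_ub=pop_share.reshape(1, -1),     # the single constraint A x <= 0.02
        b_ub=[0.02],
        bounds=[(0.0, 1.0)] * len(areas),  # the relaxed condition 0 <= x_i <= 1
        method="highs",
    )
    print(result.x)  # fractional choices; rounding entries down to 0 or 1 gives a
                     # feasible, though possibly suboptimal, 0/1 selection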

Seeing a version of this map for Canada would be most interesting to me, because that's where I'm from and because of the geographical similarities.  But maps of this sort could be interesting for other countries, or cities, provinces or states, continents, or even the whole inhabitable world.  And of course, we need not insist on a split between 2% and 98%.  Perhaps a 50/50 split would be interesting.  Maybe there is some other interesting percentage, such as a percentage $X$ for which $X$ percent of the population lives on $100-X$ percent of the land (if that number even exists).

If the binary integer programming problem could be solved, it would be a simple (though possibly tedious) matter of collecting the required geographic data and plugging it into some map making software.

Update: After finishing this post, I emailed a colleague of mine who has much more experience in linear programming than I do to ask if he had any insights.  He told me that this is a knapsack problem.  I had heard of this problem before, but only informally, so I didn't clue in that it was a special case of integer linear programming.  Oddly, neither the linear programming page nor the integer linear programming page on Wikipedia refers to the knapsack problem, nor does the knapsack page refer to either of the other two.

In any case, this is sort of good news, but only good in the sense of "maybe less bad."  The general binary integer programming problem is NP-hard, whereas the knapsack problem is NP-complete.  NP-hard problems are at least as hard to solve as NP-complete problems.  The reason it's only "maybe less bad" is that it is not currently known whether or not there are efficient algorithms for solving NP-complete problems.  According to the Wikipedia page on the knapsack problem, though, there are efficient algorithms for finding approximate solutions.  Maybe these would be good enough.  Indeed, our eyes may not be able to discern between the optimal solution and an approximate one.  In that case, I guess it's more or less good news.
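For what it's worth, if the population figures are rounded to integers (say, thousands of people), the exact problem can be attacked with the textbook 0/1 knapsack dynamic program.  The sketch below is just that standard recurrence with made-up data, not anything recovered from the original map:

    def knapsack_max_area(areas, populations, capacity):
        """0/1 knapsack: maximize total area subject to total population <= capacity.
        `populations` and `capacity` must be non-negative integers (e.g. thousands
        of people), which is what makes the dynamic program work."""
        best = [0.0] * (capacity + 1)
        for area, pop in zip(areas, populations):
            # Iterate downward so each subdivision is used at most once.
            for budget in range(capacity, pop - 1, -1):
                best[budget] = max(best[budget], best[budget - pop] + area)
        return best[capacity]

    # Toy example: a population capacity of 470 (thousand) corresponds to 2%
    # of a 23.5 million person country.
    print(knapsack_max_area([2_500_000, 900_000, 5_000], [100, 350, 4_700], 470))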

Notes:
  • For more on linear programming, see here and here or, better yet, pick up a book, such as this one, which is where I first learned about linear programming.
  • I don't currently have the resources at my disposal to do any of the above, but the population density map below from Statistics Canada probably gives a rough idea of what the result would be.

Thursday 22 May 2014

Teaching Math with PowerPoint - a post mortem

I taught a second year linear algebra course in the fall term of the 2013 - 2014 school year.  The first few lectures were review of the prerequisite course.  My normal teaching practice consists almost exclusively of writing on the board when I'm not directly interacting with students.  But I figured that since my students had learned the material already, they didn't need to take notes and I could run through it a little faster than if they were seeing it for the first time.

So I put all of the review notes on PDF slides (not actually PowerPoint, despite what I wrote in the title), using the Beamer package for LaTeX.  The first week went fairly well, so instead of switching back to my usual practice at the end of the review stage, I decided to continue using slides.  I didn't intend to use them for the rest of the term, but in the end, that's what I did.

Overall, the approach was well received by the students.  Direct feedback was positive, and I received no complaints on course evaluations.  One complaint that could have been levelled at me is that the slides were not always perfect; sometimes there were typos, for example.  Perhaps the students were content with the corrected versions that I later posted on the course website.

Student performance was consistent with other courses I've taught.  A couple of students failed, but one of them was also doing poorly in other classes, and there were extenuating circumstances to explain the other student's failure.

It would be irresponsible to draw conclusions about the effectiveness of the practice at this stage, mind you.  It was the first and only time I've taught this course and the first time I've used slides to this extent.  Nevertheless, it went well enough that I plan on doing it again in the upcoming year.

Although the course was far more heavily oriented towards slides than any other course I've taught, I still relied heavily on the blackboard.  I believe it's better to work out examples with the students, so I would put only the statements of the examples on the slides and work out the solutions on the blackboard (often putting the final answer on the slides too, though not always).

I also made heavy use of Beamer's "pause" command.  That way, I could present the text of a given slide gradually, one line at a time or even one word at a time if I wanted to, rather than all at once.  To some extent, this mimics writing on the board, where the content is revealed chalk stroke by chalk stroke.  I think this makes it easier to absorb: rather than trying to take in a whole slide's worth of information, only one line needs to be comprehended at a time.  I also think it helps to keep the pace of the lecture from getting too fast, an easy temptation to give in to with slides (or overheads, in the olden days).

Switching back and forth between the blackboard and the screen was a bit frustrating at times, since it seems that most classrooms equipped with projectors (which is most classrooms, if not all, these days) are not designed to mix the two methods. 

I taught the course in three different classrooms.  Each room had a different layout with different upsides and downsides.

One of the classrooms had the projector screen covering up one panel of the chalkboard.  This was fine as long as I could fit everything on the other panel.  Occasionally, I needed both, which meant rolling up the screen and turning off the projector, then, once I was done with the other panel, pulling the screen back down and waiting for the projector to turn back on.  None of these are onerous tasks, but they interrupt the flow of the lecture and use up time.

The other two classrooms had their screen covering the middle of the blackboard, with the edges of the boards still exposed to write on.  In one room, the exposed edges were quite large, about the size of the uncovered panel in the other classroom.  In the other, however, they were fairly narrow.  The annoyance in both cases was that if an example took up more than one board, I had to finish it on the other side where it was hard to see what I had written on the first side because the screen was in the way.

Another issue was that screen real estate is quite limited.  Some things could not fit onto a single slide, and something on one slide might refer to something on another slide.  On a blackboard, one can use two different panels for the two things, but there is only one screen.

The other problem I ran into was malfunctioning technology.  In one room, the projector bulb burnt out.  In another, I could not access the network to log in to the computer.  Fortunately, computer services was quick to replace the bulb in the former case, and there was another open classroom with a working projector in the latter case.


If I had my way, each room would be equipped with two, or even three, projector screens: one for the current slide, and the other(s) for showing preceding slides or calling up older results or examples.  There would also be a chalkboard with at least two panels free to write on, even when all screens are in use.  Of course, this wouldn't be feasible in any but the largest classrooms, and even if it were, it could end up being the classroom version of The Homer.  I'd settle for an effort on the part of classroom designers to consider that someone might want to use both the blackboard and the projector at the same time.

Despite the space taken up by the above complaints, they were all fairly minor in the grand scheme of the course.  They don't add up to enough to discourage me from doing it again, looking for places to improve, and maybe thinking about trying it for other courses.

Wednesday 30 April 2014

Pythagorean Statistics

Suppose that $X_1$ and $X_2$ are independent random variables such that $X_1+X_2$ is defined.  Let $\sigma_1$, $\sigma_2$, and $\sigma$ be the standard deviations of $X_1$, $X_2$, and $X_1+X_2$ respectively.  Then $\sigma^2=\sigma_1^2+\sigma_2^2$ (the numbers $\sigma_1^2$, $\sigma_2^2$, and $\sigma^2$ are called the variances of $X_1$, $X_2$, and $X_1+X_2$ respectively).  This looks very much like the Pythagorean Theorem (PT).  In fact, at least one person calls it the Pythagorean Theorem of Statistics (PToS).
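For the record, the identity follows quickly from the expansion of the variance of a sum, with independence killing the cross term:
$$\sigma^2=\operatorname{Var}(X_1+X_2)=\operatorname{Var}(X_1)+\operatorname{Var}(X_2)+2\operatorname{Cov}(X_1,X_2)=\sigma_1^2+\sigma_2^2.$$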

In the link, the author gives a proof of the PToS, but the proof doesn't look much like the proof for the PT.  Nevertheless, the similarity is hard to ignore.  So I wonder, is the PToS just the PT dressed up in statistical clothing, or is it merely a coincidence that the similarity exists?

I suspect it's the former, but I don't quite see the connection yet.  The PT is about right triangles in a plane, and I don't see what that plane would be for the PToS, nor what the triangle is, nor why the third side of that triangle should be related to $X_1+X_2$.  The other author doesn't seem to be aware of a connection either, since none of the reasons he gives for calling it the PToS are "because it is the Pythagorean Theorem."

Update:

My initial instinct was to represent $X_1$ and $X_2$ with a pair of orthogonal axes using $(\mu_1,\mu_2)$ as the origin, where $\mu_1$ and $\mu_2$ are the means of $X_1$ and $X_2$ respectively.  If we let $\vec{x}_1=(\sigma_1,0)$ and $\vec{x}_2=(0,\sigma_2)$, then we could represent $X_1+X_2$ with the line through $(\mu_1,\mu_2)$ with direction vector $\vec{x}_1+\vec{x}_2$.  The length of $\vec{x}_1+\vec{x}_2$ is $\sigma=\sqrt{\sigma_1^2+\sigma_2^2}$.  Therefore, $\sigma^2=\sigma_1^2+\sigma_2^2$.  So we get the Pythagorean identity.

This isn't a geometric proof of the Pythagorean Theorem of Statistics, though.  At best, it is an illustration of the Pythagorean Theorem of Statistics by the Pythagorean Theorem from geometry.  It is perhaps natural to represent $X_1$ and $X_2$ by orthogonal axes.  Representing $X_1+X_2$ by the line in the direction of $\vec{x}_1+\vec{x}_2$, however, was forced on me to make the geometry work; it's not as natural.  The more natural thing to do would be to represent $X_1+X_2$ by a third axis orthogonal to both $X_1$ and $X_2$.  Also, I do not see how the statistical theorem would follow from the geometrical illustration.

Sunday 9 March 2014

Competition and the Logistic Distribution

In my recent post on the other bell curves, I wondered if there was some familiar process underlying the logistic distribution that would parallel coin flipping for the normal distribution.  I noted there that the logistic distribution works better for the ELO rating system than the normal distribution does.

I also noted that the curve that describes the logistic distribution arises from population growth models that account for competition for resources.  The ELO rating system is also used in the context of competition, but for games.  I wondered whether there was something to this.  Was it a mere linguistic coincidence, or was there a deeper connection?

Well, I still don't have a complete answer, but I can see at least one connection between the two scenarios.  In both cases, there is competition for a finite resource.  For the population growth model it is food supply, and for games, it is wins. 

The parallel is not exact, however.  In particular, the logistic model is used to describe the size of a population, but for a game, the logistic distribution is only used to compute the probable outcomes within a set of competitors, not the number of competitors. So the connection is not complete.

Thursday 6 March 2014

Understanding Self-Information

I've never taken a course in information theory, but recently I've been trying to learn more about it on my own.  I reached a satisfactory understanding of most of the concepts, but an acceptable understanding of self-information proved elusive for quite a while.  If it were just another term of many, I might have been content to ignore it, but the definitions of all of the other terms rest on this concept, so I didn't consider ignoring it to be an option.

Most introductory texts on information theory that I've seen take the definition as axiomatic.  If an event occurs with probability $p$, then the self-information of that event is defined to be $-\log p$, where the logarithm is to some chosen base (often 2).  What I wanted to understand better is why self-information is defined that way.  In my experience so far, this is rarely explained and never explained well.

It did make some intuitive sense, as various authors claim.  One property of this definition is that the self-information of an event increases as the probability decreases.  This is a reasonable property (I would even say necessary) for any self-information function.  Suppose, for example, that we are playing a game of hangman.  The letter "z" is far less frequent in English than the letter "s", so if a "z" is showing, then it is easier to narrow down the word than it would be if an "s" were showing.  Thus, the rare event "contains the letter 'z'" gives more information than the common event "contains the letter 's'."

But $-\log p$ is not the only function that works like that.  We could also use $1-p,$ which is far simpler and avoids the dreaded logarithms.  Simple is always preferred to complicated, so there must be a reason why $-\log p$ is the function of choice.

Well, in order for information to be transmitted, there must be some way to represent it.  Spoken languages use sounds.  Written languages use words built from the letters of some alphabet.  Computers use strings of 0s and 1s, which we can think of as words from a two-letter alphabet.

The number of possible strings (words) of length $k$ from an alphabet with $n$ symbols is $n^k$.  Let's represent this number by $w$, so $w=n^k$.  We could also write this relationship between $w$, $n$, and $k$ as $k=\log_n w$.  Although this equation is equivalent to $w=n^k$, we can interpret it differently, as follows.  If we want to assign unique labels to $w$ different objects using strings of the same length from an alphabet of size $n$ (that's a mouthful!), then $k$ is the smallest possible length of the strings (rounding up to a whole number if $\log_n w$ is not an integer).

Now, if one of these $w$ strings is chosen randomly, each with equal probability, say $p$, then $p=1/w$.  Therefore, $w=1/p$, and so $k=\log_n \frac{1}{p}=-\log_n p$.  Aside from the base appearing explicitly in this expression for $k$, it is identical to the self-information.  So the self-information of an event with probability $p$ can be interpreted as the length of the string needed to represent that event, if it is one of many events that all occur with equal probability.  Of course, some other event could have a different probability, but that doesn't change the amount of information provided by the occurrence of this one.
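A quick numerical check of that interpretation, in Python (the function name is mine):

    import math

    def self_information(p, base=2):
        # Self-information of an event with probability p, in units set by the
        # base (bits for base 2).
        return -math.log(p, base)

    # With w equally likely outcomes, each has probability 1/w, and the
    # self-information matches the string length k = log_n(w) needed to give
    # every outcome a distinct label over an alphabet of size n = base.
    w = 16
    print(self_information(1 / w, base=2))  # 4.0
    print(math.log(w, 2))                   # 4.0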

I don't know if this (rather long-winded) explanation makes the definition any more sensible for you, the reader.  But once I arrived at this realization, I finally felt like I understood why self-information is defined the way it is.

-----------------

Another difficulty that I had with the definition is the probabilistic model through which it is usually explained.  I can see why using this model is sensible from the perspective of a working information theorist.  But as someone trying to learn about it, I couldn't see how a randomly transmitted symbol could provide any information at all.  If I flip a coin and it shows heads, so what, even if it's an unfair coin and heads is rare?  And information is not usually transmitted randomly anyway.

Tuesday 25 February 2014

The other bell curves

After lines, conic sections (circles, parabolas, etc.), and graphs of trigonometric functions, the most familiar curve has to be the bell curve.  Perhaps it is even more famous than some or all of those other curves, especially amongst students who just finished an exam that they don't feel good about.

Explaining exactly why we use the bell curve takes some work, but a rough explanation can be given in terms of coin flips.  If you flip a coin, say, 20 times, then you can expect to get 20 heads with a certain probability, 19 heads and 1 tail with a certain probability, 18 heads and 2 tails with a certain probability, and so on.  This is an example of a binomial distribution.  If you plot these probabilities, you get a shape that looks roughly like the bell curve.  For 30 coin flips, the shape looks even more like the bell curve.

In fact, you can make the shape look as close to the bell curve as you want by considering a large enough number of coin flips.  The coin doesn't even need to be fair.  That is, it could favour one of heads or tails more than the other, and with enough coin flips, you could still make the shape look as much like the bell curve as you like.
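Here's a small numerical illustration of that convergence, using scipy's binomial and normal distributions (the choice of 30 flips is arbitrary):

    import numpy as np
    from scipy.stats import binom, norm

    # Compare the probabilities for n coin flips with the matching normal curve
    # (same mean n*p and standard deviation sqrt(n*p*(1-p))).
    n, p = 30, 0.5
    k = np.arange(n + 1)
    coin_flip_probs = binom.pmf(k, n, p)
    bell_curve = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    print(np.max(np.abs(coin_flip_probs - bell_curve)))  # small, and it shrinks as n grows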

So lurking behind many things that give a shape that looks like the bell curve, there is a process something like a coin flip going on.  This is an oversimplification, of course.  There are other ways to reach the bell curve besides coin flips.  The whole story would take a while to explain, however.  The point is that we can arrive at the bell curve through a simple, familiar process.

The most famous bell curve is not the only bell curve, though.  The one I'm talking about above is known as the Normal Curve or Gaussian Curve [1], but there are other curves that have a bell shape without being Normal Curves.

One such curve comes from the logistic distribution.  Another comes from the Cauchy distribution.  What I wonder about these curves is this: is there some way to arrive at either of these distributions from a simple process, in the same way that we can arrive at the normal distribution through coin flips?

One place where the logistic distribution crops up is in the ELO rating system.  This system was developed by a physicist named Arpad Elo (for some reason, his name gets capitalized, as if it's an acronym, when it comes to the rating system) to rate chess players, though it can be used for any type of competition, from individual games like chess to team sports like soccer.  Each competitor has a rating, and the probability that one competitor will beat another can be calculated from their ratings.  The calculation depends on a probability distribution.  In the original scheme, a normal distribution was used, but it was discovered that a logistic distribution gave more accurate predictions.
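For the curious, the usual logistic form of that calculation, on the standard scale of 400 rating points per factor of ten in the odds, looks like this (a small sketch, not taken from Elo's original work):

    def elo_expected_score(rating_a, rating_b):
        # Expected score of player A against player B under the logistic Elo model.
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    print(elo_expected_score(1600, 1400))  # roughly 0.76: A wins about three games in four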

In fact, it was this application of the logistic distribution that led me to my question above.  The empirical evidence suggests that it's the better choice, but I want to know if there is a reason why.  Is there some sort of logistic equivalent to coin flipping for the normal distribution that explains why predictions are better with the logistic distribution?


Logistic curves also pop up in some population growth models.  There are continuous and discrete versions of these models, which in a sense parallel the normal curve (continuous) and the coin flips (discrete).  Perhaps the answer to my question lies here, but I can't see the connection between population growth and games.  (Both do involve competition, I suppose, but it's not clear to me whether or not this is a meaningful connection, or just a linguistic coincidence.)

The Wikipedia article for the Cauchy distribution says that "it has the distribution of a random variable that is the ratio of two independent standard normal random variables."  That suggests that ratios of two independent binomial random variables (coin flips, for example) could be an avenue to explore.
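That ratio-of-normals fact is easy to check numerically; here's a quick simulation sketch:

    import numpy as np
    from scipy.stats import cauchy

    rng = np.random.default_rng(0)
    z1 = rng.standard_normal(100_000)
    z2 = rng.standard_normal(100_000)
    ratio = z1 / z2

    # The quantiles of the ratio should be close to those of the standard Cauchy.
    qs = [0.1, 0.25, 0.5, 0.75, 0.9]
    print(np.quantile(ratio, qs))
    print(cauchy.ppf(qs))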

It's also not hard to create other bell curves.  Take any function $f(x)$ that is positive, symmetric about some vertical axis, and concave up.  Then $1/f(x)$ will give a bell curve.  I expect that most of these aren't interesting, though.  Are there any others, besides the 3 mentioned above, that are?

[1] It's actually a class of curves that are related to each other by stretching and shifting vertically and horizontally.