(Updated, see below)
The idea for this post came from a book that I picked up and put down at a Chapters/Indigo store some time ago. I don't recall the title or author of the book, unfortunately. I've gone to both locations in town to find the it back again, but can't. So instead I'm going to try to recreate the argument that I didn't have time to read in full when I first saw it.
One thing I do recall from the book is that Apple got a bit of bad press because people didn't think their randomization algorithm was truly random. The time it took to hear a repeat was too short according to listeners. Suppose for example, that I have 100 tracks on my iPod. Then shouldn't I hear each track once in every 100 tracks? On average, yes. If the selection is truly random and I listen to enough tracks consecutively, then the average gap between each repeat of that track will be 100 (or 99, depending on whether or not you count the second occurrence towards the gap length).
However, the average doesn't describe what always happens, just what happens overall. It's possible, although unlikely, for the first two tracks to be the same, so we get a repeat right away. It's also possible, although very unlikely, to be able to listen to all 100 tracks straight, without a single repeat, so we don't get our first repeat until the 101st track.
Note that we must have a repeat by track 101. For suppose that we listened to 100 tracks straight with no repeat. Then we have heard all 100 tracks. Thus, the 101st track must be a repeat. This is an application of the Pigeonhole Principle, where each track is a "pigeonhole" and there is one "pigeon" in it for each time we hear the track. Therefore, we only need to concern ourselves with the chance of a repeat in the first 101 tracks.
We'll answer the question of the average time before a repeat in stages. First we'll determine the probabilities that the next track is or isn't a repeat, assuming that no repeats have occurred. Next we'll determine the probabilities that the first repeat occurs at track 1, track 2, etc. Finally, we'll use these to calculate the average track number at which the first repeat occurs.
Probability that the next track is a repeat
First, we consider the probabilities of a repeat and non-repeat, assuming that previous track is not a repeat. Obviously, there is a $0\%$ chance that the first track is a repeat, because there are no tracks before it that it can be a repeat of.
After listening to the first randomly chosen track, there are 99 other tracks that we haven't heard. Since each track is equally likely to occur, the chance that the second track is same as the first is $\frac{1}{100}$, and the chance that it is different is $\frac{99}{100}$.
Suppose that the first two tracks are different, i.e. that the second track was not a repeat. There are now 2 tracks we have heard, and 98 that we haven't. Thus, there is a $\frac{2}{100}$ chance that the next track is a repeat and a $\frac{98}{100}$ chance that it is not.
Similarly, if there is no repeat by the 3rd track, then there is a $\frac{3}{100}$ chance that the next track is a repeat and a $\frac{97}{100}$ chance that it is not. If there is no repeat by the 4th track, then there is a $\frac{4}{100}$ chance that the next track is a repeat and a $\frac{96}{100}$ chance that it is not, and so on.
In general, if we have listened to $k$ tracks without a repeat, then the chance that the next track is a repeat is $\frac{k}{100}$ and the chance that it is not is $\frac{100-k}{100}$.
Probability that a specific track is the first repeat
The next things we need to know are the chances that we will hear the first repeat at track 1, track 2, track 3, etc.
The chance of hearing the first repeat at txrack 1 is 0, as explained above. The chance of hearing the first repeat at track 2 is $\frac{1}{100}$.
In order to hear the first repeat at track 3, tracks 1 and 2 must be distinct, and track 3 must be the same as either track 1 or 2. As explained above, the probability that 1 and 2 are distinct is $\frac{99}{100}$ and the probability that 3 is the same as 1 or 2, assuming 1 and 2 are distinct, is $\frac{2}{100}$. Thus, the probability that the first two tracks are distinct but the third is the same as one of the first two is
$$\frac{99}{100}\frac{2}{100}$$
In order to hear the first repeat at track 4, tracks 1 through 3 must be distinct. The probability that second track is different from the first is $99/100$ and, assuming the the first two are distinct, the probability that the third is also distinct is $98/100$. Thus, the probability that the first 3 tracks are distinct is
$$\frac{99}{100}\frac{98}{100}$$
Since the first three tracks are distinct, the probability that the fourth track is a repeat is $3/100$, so the probability that track 4 is the first repeat is
$$\frac{99}{100}\frac{98}{100}\frac{3}{100}$$
Similarly, the probability that track 5 is the first repeat is
$$\frac{99}{100}\frac{98}{100}\frac{97}{100}\frac{4}{100}$$
In general, the probability that the first repeat happens at track $k$ is
$$\frac{99}{100}\frac{98}{100}\cdots \frac{(100-k+2)}{100}\frac{k-1}{100}=\frac{(k-1)99!}{100^{k-1}(100-k+1)!}$$
where $n!=n(n-1)(n-2)\cdots (2)(1)$ (read "$n$ factorial")
Average track of the first repeat
To get the average track number at which the first repeat happens, we multiply each number $k$ between 1 and 100 by the probability that the track $k$ is the first repeat, then add the results. That is, the average track number for the first repeat is
$$0(1)+\frac{1}{100}(2)+\frac{99}{100}\frac{2}{100}(3)+\frac{99}{100}\frac{98}{100}\frac{3}{100}(4)+\cdots+\frac{(101-1)99!}{100^{101-1}(100-101+1)!}(101)$$
It's far too much work to calculate this number by hand, but plugging this into a computer gives track 13.21 (rounded to two decimal places). So we can expect to get through only a small fraction of my music collection before we start to hear repeats.
If we use the average length of a pop song of between 4 and 4.5 minutes, that means we can expect to hear the first repeat at somewhere between 53.8 minutes and and 59.4 minutes.
The other averages
Typically when we talk about average of some values, we mean the mean, i.e. the sum of the values divided by the number of values. There are other types of averages, though. The median and the mode are probably the most common. You probably learned about them in school, but forgot about them because they're not as widely used, although they are sometimes situations where one of them might be more suitable.
The median, for example, is the middle value (if the numbers are ordered). For example, suppose our values are $1,2,2,3,4$. There are 5 values in total, so the third value is the middle value. Therefore, the median of this list of values is 2. The median is preferable to the mean when talking about average incomes, because the very small number of people with very large incomes drives up the mean, giving a skewed perception of the typical income. The median, on the other hand, is more stable. For example, the $4$ could be changed to $40$ or $4,000$, but the medians in both cases would still be 2, while the means change to 9.6 and 801.6 respectively.
The mode is the most frequently occurring value. In each of the lists of values in the preceding paragraph, the number $2$ appears twice, while each other number only appears once, so the mode is 2. The mode is useful when our values aren't numbers. For example, you could take a poll on people's favourite colour. The mode is the colour that is picked by the most people. The median and the mean are not defined in this case, because colours are not numbers (they can be described numerically, but the numerical description would not be of much use in this case).
A more detailed discussion on mean, median, and mode can be found here.
The median for number of tracks before a repeat in this case is 12 or 13. This means that roughly half of the time, the first repeat will occur on or before the 12th track, and roughly half of the time, the first repeat will occur on or after the 13th track.
The mode in this case is 11. That means that the first repeat happens at track 11 more often than it happens at any other track. The probability that it will happen at track 11 is quite small at only 6.28\%. However, the probability that it will happen at any other track is even smaller (and the further from 11 we get in either direction, the lower the probability is).
Conclusion
All three of these averages are pretty close to each other, relative to the size of the music collection, and they all represent a pretty small fraction of the collection. To make matters worse, more than half the time, a repeat is going to happen before the average.
Now, a music collection of only 100 tracks is rather small. My own currently has around 400 tracks, and I'm not even an avid purchaser of music. The mean track number of the first repeat for a 400 track collection is 25.73, the median is 24, and the mode is 21. Thus, the different averages just barely double (if that), even though the music collection is 4 times as large.
If we multiply by 4 again, for a collection of 1,600 tracks, the mean is 50.8, the median 48, and the mode 41, again, roughly only doubling the number of tracks before a repeat.
Thus, even for large collections, no matter how much Apple made sure that tracks were chosen randomly, we would be bound to hear repeats after only a small proportion of the tracks. So we don't actually want true randomness. Apple solves this problem by choosing a fixed ordering of the tracks each time the iPod (or iPhone, iTunes, etc) is started, much like shuffling a deck of cards, thus the name "shuffle".
Admittedly, I'm using a bit of verbal sleight of hand here, taking advantage of the ambiguity of the word "random". There is an element of randomness in the shuffle too. After all, the category of games of chance includes games played with a shuffled deck of cards. To form the list, the first track can be chosen at random from the collection, and each subsequent track can be chosen randomly from the tracks that haven't been chosen yet, but this is not typically what's meant by the word "random" when it appears without any qualifiers.
Despite Apple's attempts to fix the problem, however, there are still skeptics.
Update. It's been about a year since I first wrote this. I never did find the book that I referred to above. However, I did find a related problem in another book called Duelling Idiots and Other Probability Puzzlers by Paul Nahin. In one of the latter chapters, the author discusses the NBA draft where a similar problem shows up.
Randy Elzinga's mathematics blog. Graph theory, algebra, and real life. Not peer reviewed.