This post is motivated by the March INFORMS blog challenge: O.R. and Sports.
A couple or so years ago, someone contacted me about a model to assign children to sports teams. My correspondent volunteered for a youth recreation league in his home town. Two things are significant in that sentence. The first is "volunteered", which comes from the Latin phrase meaning "budget = zero". So whatever solution we came up with had to be cost-free. The second is "youth". In recreational sports leagues (at least for the youngest competitors), the tendency is for the parents to enroll their children in a league, rather than a specific team. The league then has to form rosters, and the goal is not to produce winners (which would imply losers as well), but to form teams that are as close to competitive parity as possible.
The approach taken by this league was to try to get the average (mean) value of various attributes as close to constant across teams as possible, subject to some constraints (minimum and maximum team sizes, for instance). For qualitative attributes, such as gender, we can treat this as averaging indicator variables (1=girl, 0=boy or 1=experienced, 0=inexperienced). So far, this looks like a modified version of the Hitchcock (transportation) problem, with kids as unit-supply "sources" and teams as "sinks". The modifications include some additional constraints (a utilized "sink" has to contain at least enough players to field a full team) and a rather funky (and, as it turns out, nonlinear) objective function. The nonlinearity occurs because attribute averages involve dividing by team size, which is not constant.
Anyone who has designed a model for a real-world (not textbook) problem is probably painfully aware that the problem you are initially asked to model winds up not being the problem you actually have to model. Sure enough, additional wrinkles creep in. Parents with twins want/need to have the kids on the same team (to avoid car-pool nightmares). Some kids may need to be separated (to avoid fights). Some kids have parents who volunteer (that word again) to be assistant coaches, but only for the team their kid is on; there are not enough assistant coaches to go around, so we really don't want two on the same team, which means separating the kids with volunteer parents. And so on.
I won't go into mind-numbing details (I'll save that for students that can't outrun me :-)), but the code I ended up writing (called Parity Builder) is available open-source. All it requires for infrastructure is Sun Java (1.6 or higher), and it comes with a tutorial.
One last point, which I find interesting. One of my colleagues in organizational behaviour (the "OB world" in the blog title) pointed me to some literature alleging that competitive performance may depend more on the within-team variance of certain attributes than the mean. If true (and if it applies to any of the attributes used by the recreational league), perhaps the approach we took does not really optimize balance. On the other hand, it almost surely optimizes the appearance of balance, and the true objective function was probably "minimize parental bitching". So maybe we're good after all.