Monday, February 10, 2025

CPLEX Drops Documentation

CPLEX Studio 22.1.2 is now available for download, at least on most (possibly not all) supported platforms. The previous version was 22.1.1, and judging from the numbering I assume this is a fairly minor update. That's the good news. The bad news is that the new version no longer installs documentation.

Previous versions created a folder named "doc" in the main folder (parallel to "concert", "cplex", "cpoptimizer" etc.). Under the "doc" folder, if you drilled down a couple of levels, were all sorts of manuals. Particularly important to me were the "refjavacplex" and "refjavacpoptimizer" folders, which contained Javadoc documentation that could be integrated into an IDE (NetBeans in my case) for Java programming. It was also convenient to have the reference manuals installed locally, so that I could bookmark them in my browser and access them even if I was working offline.

Version 22.1.2 does not install the "doc" folder. The reference manuals are still available online if you know where to look (https://www.ibm.com/docs/en/icos/22.1.2), but I do not see a way to make that work as Javadoc in an IDE (and, of course, it is only available when you are online, unless you want to web-scrape it to get a local copy).

I've submitted an "idea" to restore the documentation to the download. If you too want it back, please vote for the suggestion. I would have no objection to it being a separate download, but taking away the Javadoc really seems a bit unfriendly to me.

Meanwhile, I am linking my IDE to the Javadoc for version 22.1.1 and hoping that nothing much has changed.

Wednesday, January 29, 2025

Minimizing Flow Support (II)

In yesterday's post I described a graph optimization problem (from a post on OR Stack Exchange) and how to generate random test instances. You are given a digraph with a single commodity flowing through it. There may be multiple supply and demand nodes (with supply and demand in balance), and arcs have neither costs nor capacity limits. The problem is to find a subgraph with all the original nodes and a minimal number of arcs (the "support" of the flow) such that there is a feasible flow pattern to satisfy all demands. In this post, I will discuss a couple of mixed integer programming (MIP) models for the problem.

I will use the following notation. The nodes are $n_1, \dots, n_N$, and the supply or demand at node $n_i$ is given by $s_i,$ where $s_i > 0$ at supply nodes and $s_i < 0$ at demand nodes. The total supply is given by $F.$ The set of arcs is $A,$ and at each node $n_i$ we denote by $\delta^+(n_i)$ and $\delta^-(n_i)$ the sets of arcs flowing into and out of $n_i,$ respectively. The common elements of my two MIP models are the following.

  • Variable $x_a \in \lbrace 0,1\rbrace$ is 1 if and only if arc $a\in A$ is selected for the subgraph.
  • Variable $y_a \in [0, F]$ is the flow volume over arc $a \in A.$
  • The objective is to minimize the total number of arcs selected:
       
    $$\textrm{minimize }\sum_{a\in A} x_a.$$
  • For each node $n_i$ we require flow balance: the flow out of the node minus the flow into it must equal its supply (or negative demand):
       
    $$\sum_{a \in \delta^-(n_i)} y_a - \sum_{a\in \delta^+(n_i)} y_a = s_i.$$

The models differ in the remaining requirement, that there be no flow on any arc that is not selected (i.e., $y_a = 0$ if $x_a = 0$). Those constraints are added as follows (a Java sketch of both variants appears after the list).

  • In the "big M" model, for each $a\in A$ we add the constraint $y_a \le Fx_a.$
  • In the "indicators" model, for each $a\in A$ we add an if-then constraint $x_a = 0 \implies y_a = 0.$

I tested both models using two different solvers, IBM CPLEX 22.1.1 and FICO Xpress 9.5 Optimizer (version 44.01.01). Before running a "production" problem I ran all four combinations of model and solver on a small instance using the solvers' respective tuning routines. In three cases, the default solver settings seemed best. In one case (CPLEX on the big M model), the tuner suggested a couple of nondefault parameter settings, but they fared poorly on the production problem. So I used default parameter settings on all the production runs.

The production problem had 25 supply nodes, 34 demand nodes, 41 transit nodes (nodes where supply was 0), and 526 arcs. Total supply was $F=1000.$ I gave each combination of model and solver a one-hour time limit (on my slightly vintage PC). I was mainly curious about how the two models would compare, and secondarily about how the two solvers would compare. Of course, running one test instance for one hour per combination is far from probative, but my curiosity has its bounds. Here is what I found.


Solver   Model        Incumbent   Lower bound   Gap (%)
CPLEX    big M        55          49.5          10
CPLEX    indicators   54          48            11
Xpress   big M        57          49            16
Xpress   indicators   58          49            18

There are some differences among combinations in the results (which, again, might not bear up under multiple tests), but what I found a bit interesting was that the gap never made it below 10% in any combination, even though I consider the test problem to be not particularly large. (Also slightly interesting was that only roughly 10% of the arcs in the graph were needed.)

I will note one difference between the solvers. I'm not sure whether it is a function of different default parameter settings or different memory management. Within the one-hour run time limit, neither attempt with Xpress ran into memory issues. In contrast, one of the CPLEX runs exhausted system memory (and hung the system) before the hour was up. So I did the other CPLEX run with a limit of 9500 MB on the tree size (set via the parameter CPXPARAM_MIP_Limits_TreeMemory), and that run ended due to memory exhaustion before the hour was up. (Both CPLEX runs lasted only about 53 minutes.)
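
For what it's worth, I believe the equivalent setting in the Java API looks something like the following (with cplex being the IloCplex instance).

```java
// Cap the branch-and-bound tree at roughly 9500 MB -- the Java API counterpart
// of CPXPARAM_MIP_Limits_TreeMemory.
cplex.setParam(IloCplex.Param.MIP.Limits.TreeMemory, 9500.0);
```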

One of the main reasons I ran the tests was to see whether the model with indicators would be tighter than the big M model. A bit of wisdom I received a long time ago was that big M was better than indicators if you have insights into the model that allow you to use a not terribly large value of $M.$ Here, the worst case would be if the entire flow volume $F$ passed through a single arc, which lets me use $F$ (1000 in the test problem) as my value of $M.$ The big M runs did produce smaller gaps than their indicator counterparts, but not by much, and possibly not by a "statistically significant" amount (if you can even mention "statistical significance" while working with a sample size of 1 🥲).

Still, for now I will stick to preferring big M constraints with not-so-big values of $M$ over indicator constraints.

As mentioned in the previous post, you can find my code here, including a README.md file that explains the code structure.

Tuesday, January 28, 2025

Minimizing Flow Support (I)

This is the first of hopefully two posts related to a question posted on Operations Research Stack Exchange. You are given a digraph through which a single commodity flows. Arcs have neither costs nor capacity limits. Each node $n_i$ is either a supply node (with supply $s_i>0$), a demand node (with demand $s_i<0$ treated as a "negative supply" in the optimization models), or what I will call a "transit node" ($s_i =0$) with neither supply nor demand. Key assumptions are that total supply equals total demand and that it is possible to find paths through the digraph satisfying all demands (and thus consuming all supplies).

The problem is to select a minimal number of arcs such that the reduced digraph (using all the original nodes but just the selected arcs) contains routes fulfilling all demands. In this post, I'll describe one way to generate random test instances of the problem. The following post will discuss modeling and solving the problem. I have Java code demonstrating both parts in my university GitLab repository.

My approach to generating a test instance starts with specification of the number of nodes in the graph and a lower bound for the number of arcs. (The lower bound might need to be exceeded in order to ensure that a feasible flow satisfying all demands exists.) My code also asks the user to specify an integer value $F$ for total supply (and total demand). I used integer flows just to make printing the supply/demand at each node and flow on each arc neat. You might prefer to use real valued flows and (since the problem is invariant with respect to the flow volume) just set the total supply/demand at 1.

In what follows, I will use the terms "upstream" and "downstream" to refer to nodes from which there are directed paths to a given node (upstream) or to which there are directed paths from a given node (downstream). To simplify the explanation, I will treat demands as positive values. The construction process starts by creating the desired number of nodes and partitioning them into supply, demand and transit nodes. Since you need at least one supply node and at least one demand node, the first node is assigned as a supply node and the second as a demand node. (My code also assigns one node as a transit node, but if you do not care whether there are any transit nodes you can skip that.) The remaining nodes are randomly classified as supply, demand or transit. Since my code uses integer flow values, each supply (demand) node needs a supply (demand) of at least 1. If the partitioning process creates more than $F$ supply or demand nodes, the excess nodes are reclassified as transit nodes. If you are using real flow values, this last step can be skipped.
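
A minimal sketch of the partitioning step might look like the following. The names and structure here are illustrative inventions of mine, not copied from the repository code.

```java
import java.util.List;
import java.util.Random;

// Illustrative fragment (not the repository code): partition node indices
// 0..nNodes-1 into supply, demand and transit sets. The first three nodes are
// forced to be supply, demand and transit, respectively; the rest are random.
static void partitionNodes(int nNodes, Random rng, List<Integer> supplyNodes,
                           List<Integer> demandNodes, List<Integer> transitNodes) {
  supplyNodes.add(0);
  demandNodes.add(1);
  transitNodes.add(2);
  for (int i = 3; i < nNodes; i++) {
    switch (rng.nextInt(3)) {
      case 0  -> supplyNodes.add(i);
      case 1  -> demandNodes.add(i);
      default -> transitNodes.add(i);
    }
  }
  // With integer flows, any supply or demand nodes in excess of F would be
  // reclassified as transit nodes at this point.
}
```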

The next step is to allocate total supply (total demand) randomly across the supply (demand) nodes. Again, since I am using integer flows, my code first allocates one unit of supply or demand to each non-transit node, then allocates the remaining supply (demand) one unit at a time, randomly choosing with replacement a supply (demand) node to receive the next unit.
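
As a hedged illustration (again, not the actual repository code), the unit-by-unit allocation could be as simple as this, with demands allocated the same way and then negated.

```java
import java.util.Arrays;
import java.util.Random;

// Spread "total" integer units across "nodeCount" nodes: one unit to each node
// first, then the remaining units assigned uniformly at random with replacement.
static int[] allocate(int total, int nodeCount, Random rng) {
  int[] amount = new int[nodeCount];
  Arrays.fill(amount, 1);                  // every node gets at least one unit
  for (int unit = nodeCount; unit < total; unit++) {
    amount[rng.nextInt(nodeCount)]++;      // remaining units, one at a time
  }
  return amount;
}
```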

With the nodes created, it is time to move on to creating arcs. My code allocates to each node of any type two initially empty sets of nodes, those upstream and downstream, and two initially empty sets of arcs, those entering and those leaving the node. It also assigns a temporary variable containing the excess supply if the node is a supply node or the unmet demand if the node is a demand node. Two sets of nodes are created, one containing supply nodes with unused supply (initially, all the supply nodes) and the other containing demand nodes with unmet demands (initially, all the demand nodes).
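
One possible per-node bookkeeping record is sketched below. This is my own illustrative class, not the one in the repository; "remaining" holds unused supply (positive) or unmet demand (negative) during construction, and is zero for transit nodes.

```java
import java.util.HashSet;
import java.util.Set;

class NodeInfo {
  final int id;
  int remaining;                                     // unused supply / unmet demand
  final Set<Integer> upstream   = new HashSet<>();   // nodes with a directed path to this one
  final Set<Integer> downstream = new HashSet<>();   // nodes reachable from this one
  final Set<int[]>   arcsIn     = new HashSet<>();   // arcs entering this node
  final Set<int[]>   arcsOut    = new HashSet<>();   // arcs leaving this node

  NodeInfo(int id, int supplyOrDemand) {
    this.id = id;
    this.remaining = supplyOrDemand;
  }
}
```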

A list of all possible arcs (excluding arcs from a node to itself) is created and randomly shuffled. Arcs are now added from that list until enough directed paths exist to ensure that all demands can be met. As each arc $(a, b)$ is added to the digraph, it is added to the set of arcs exiting the tail node $a$ and to the set of arcs entering the head node $b.$ The set of nodes upstream of $b$ is updated to include $a$ and all nodes upstream of $a,$ and the set of nodes downstream of $a$ is updated to include $b$ and all nodes downstream of $b.$ Finally, nodes upstream of $b$ with unused supply and nodes downstream of $a$ with unmet demand are paired up. The lesser of the unused supply and unmet demand is subtracted from both the supply of the upstream node and the demand of the downstream node, and whichever has zero supply/demand left is removed from the set of nodes with unused supply/unmet demand. It is possible (and certain when the very last drop of supply/demand is accounted for) that both nodes will be removed from their respective sets.
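
Continuing with the hypothetical NodeInfo class above, processing a single arc $(a, b)$ from the shuffled list might look roughly like this. It is a simplified sketch: in the full bookkeeping, the upstream/downstream updates would also be propagated to the other affected nodes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Add arc (a, b), update the upstream/downstream sets, and pair unused supply
// upstream of b with unmet demand downstream of a. "unusedSupply" and
// "unmetDemand" hold the ids of nodes with supply or demand still unaccounted for.
static void addArc(NodeInfo a, NodeInfo b, Map<Integer, NodeInfo> nodes,
                   Set<Integer> unusedSupply, Set<Integer> unmetDemand) {
  int[] arc = {a.id, b.id};
  a.arcsOut.add(arc);
  b.arcsIn.add(arc);
  // Nodes now upstream of b: a plus everything upstream of a (and symmetrically below).
  Set<Integer> newUpstream = new HashSet<>(a.upstream);
  newUpstream.add(a.id);
  Set<Integer> newDownstream = new HashSet<>(b.downstream);
  newDownstream.add(b.id);
  b.upstream.addAll(newUpstream);
  a.downstream.addAll(newDownstream);
  // Pair up unused supply with unmet demand along the newly created paths.
  for (int s : newUpstream) {
    if (!unusedSupply.contains(s)) continue;
    for (int d : newDownstream) {
      if (!unmetDemand.contains(d)) continue;
      NodeInfo sup = nodes.get(s), dem = nodes.get(d);
      int moved = Math.min(sup.remaining, -dem.remaining);   // dem.remaining < 0
      sup.remaining -= moved;
      dem.remaining += moved;
      if (dem.remaining == 0) unmetDemand.remove(d);
      if (sup.remaining == 0) { unusedSupply.remove(s); break; }
    }
  }
}
```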

Once there are no nodes left with unused supply or unmet demand, we can be certain the digraph contains at least one feasible solution. All that remains is to randomly add arcs until the user's specified minimum number of arcs is reached.



Friday, January 17, 2025

Mint Madness

I'm a long-time, and generally quite content, user of the Linux Mint operating system. For quite a while now, it has had a very nice Software Manager program that lets you install or uninstall programs, as well as an older program named Synaptic that provides more fine-grained capabilities. Synaptic has not been updated in almost two years. To a lot of computer nerds, that means it needs to be replaced. To me, it's a reminder of the first sentence of the "Red-neck Repair Manual": "If it ain't broke, don't fix it."

Today I upgraded Mint from version 22 (Wilma) to version 22.1 (Xia). To my surprise, the upgrade uninstalled Synaptic and did not reinstall it. There was no warning in the Mint 22.1 release notes that this was going to happen (?!). Fortunately, it remains available in a repository and I was able to reinstall it (somewhat ironically, via Software Manager, its ostensible replacement).

I looked at the Mint message boards, and there was a fair bit of confusion and concern about this, along with some relief when users found out they could reinstall it. Looking for a rationale for the removal, all I could find were some vague assertions about there being better alternatives (Software Manager?) than the aged Synaptic, and speculation that the Mint developers were perhaps trying to push users to Software Manager or something new.

Here's the problem with this: Software Manager is not a replacement for Synaptic. It's fine for installing programs, but as best I can tell it is useless for installing libraries. For example, suppose that I want to install a program or library that will mess around with PDFs, and the installer balks because it cannot find the libpoppler library (or cannot find the correct version of it). With Synaptic, I can search "libpoppler" and see which versions I have installed and which are out there but not installed. Odds are a third-party program or library looking for it wants libpoppler-dev, and if I don't already have it, it's a couple of clicks to install it with Synaptic. If I search Software Manager, it won't find what I need. (As of today, at least, it just suggests Ruby-poppler, which it says contains Ruby bindings for libpoppler, as opposed to libpoppler itself.) There's a switch in Software Manager labeled "Search in package descriptions (even slower search)". I tried that and I can confirm that "even slower" was a massive understatement. "Glacial" might be more accurate. The extra time was for naught; it found an R library that uses libpoppler and one other similarly irrelevant hit, but nothing related to installing libpoppler.

I am at a total loss as to why the Mint folks would take away a convenient way to install libraries (not included in programs). There are other ways to install libraries (including directly from the command line), but I am not aware of any as easy to use as Synaptic ... which I fortunately still have.