Saturday, September 16, 2023

Unnecessary LyX Backup Files

If you edit a file from an older version of LyX and then save it, LyX will create an extra backup file with a lengthy name containing something about "version". As with other LyX backups, the file extension is ".lyx~". I believe this is a defense against the possibility that changes to the file format might muck up the file, and I have absolutely no problem with it. Let's call this the version backup.

That said, for a long time now it has been the case that every time I created a new LyX file, saved it, then edited it and saved the edit, I would get both the standard backup file and (after the first edit only) the version backup. Subsequent saves would only produce the usual backup. This was a minor annoyance for me, since I was perpetually tracking down and deleting the version backups to reduce drive clutter (and cut down the time it took to backup my hard drive, which I do regularly).

Turns out there was a simple explanation (and fix) for this, which I found in a thread on a support forum. Like many users, at some point I created a new file, customized a few things, and then in Document > Settings..., clicked "Save as Document Defaults" to make those settings the defaults for all new documents. Doing so creates a file named "defaults.lyx" in the templates directory (which, on my Linux Mint system, is ~/.lyx/templates). That file used the file version in effect when I set it as defaults, meaning every new file I've created since then has started with that version and then been upgraded to whatever the current version is ... leading to all those version backups.

The fix was simple. I just created a new document and set it as the default template, which updated the template to the current LyX format (544 as of this writing). Now I just have to remember to repeat this whenever a new LyX version comes out with a new file format. (At my age, "I just have to remember" should not be taken lightly.)

Tuesday, September 12, 2023

Randomly Generating a Connected Graph

Someone posted a question on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a planar graph, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.

I proposed two solutions, both of which I coded and tested in Java. One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly. 

That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$

The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).

The edge removal heuristic is simpler to explain, does not require a MIP solver and likely is faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP is done after solving once, whereas there is a possibility that the edge removal heuristic fails to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.

Java code for both methods is in my code repository.