Someone posted a question on Operations Research Stack Exchange about generating random test instances of a connected graph. (They specified a planar graph, but I am going to ignore that restriction, which I suspect was unintended and which I am pretty sure would make the problem much harder.) The premise is fairly straightforward. You start with a fixed set of nodes, partitioned into three subsets (supply, transshipment and demand nodes). The goal is to select a random set of edges of fixed cardinality such that (a) the resulting graph is connected and (b) no edge connects two supply nodes or a supply node and a demand node. Setting the number of edges too low will obviously make the problem infeasible. The author specified that the graph could be either directed or undirected. My solutions are coded for undirected graphs but can be adapted to directed graphs.

I proposed two solutions, both of which I coded and tested in Java. One uses a mixed integer programming (MIP) model, which my code solves using CPLEX. Let $E$ be the set of all valid edges, $N$ the number of nodes and $K$ the desired number of edges. The MIP model uses binary variables $x_{i,j}$ for all $(i,j)\in E.$ The requirement to include $K$ edges is handled by the constraint $$\sum_{(i,j)\in E} x_{i,j} = K$$ and randomness is achieved by minimizing $$\sum_{(i,j)\in E} c_{i,j} x_{i,j}$$ where the cost coefficients $c_{i,j}$ are generated randomly.

That leaves the matter of forcing the graph to be connected. To do that, we introduce forward and backward flow variables $f_{i,j} \ge 0$ and $r_{i,j} \ge 0$ for each edge $(i,j),$ where $f_{i,j}$ is interpreted as flow from $i$ to $j$ and $r_{i,j}$ as flow from $j$ to $i.$ The concept is to single out one node (call it $s$) as a source for $N-1$ units of flow of some mystery commodity and assign every other node a demand of one unit of the commodity. For $i\neq s,$ the flow conservation constraint is $$\sum_{(j,i)\in E} (f_{j,i} - r_{j,i}) + \sum_{(i,j)\in E} (r_{i,j} - f_{i,j}) = 1,$$ which says that flow in minus flow out equals 1. To ensure that flow occurs only on selected edges, we add the constraints $$f_{i,j} \le (N-1) x_{i,j}$$ and $$r_{i,j} \le (N-1) x_{i,j}$$ for all $(i,j)\in E.$

The edge removal heuristic is a bit simpler. We start by selecting all eligible edges and shuffling the list randomly. While the number of edges in the graph exceeds $K,$ we pop the next edge from the list, remove it, and test whether the graph remains connected. If yes, the edge is permanently gone. If no, we replace the edge in the graph (but not on the list) and move to the next edge in the list. To confirm that removing edge $(i,j)$ leaves a connected graph, we start from node $i$ and find all nodes that can be reached from $i$ in one step (i.e., remain adjacent). Then we find all nodes adjacent to those nodes, and continue until either we encounter node $j$ (in which case the graph is still connected) or run out of nodes to test (in which case the graph is now disconnected).

The edge removal heuristic is simpler to explain, does not require a MIP solver and likely is faster. There are two potential disadvantages compared to the MIP approach. One is that the MIP is done after solving once, whereas there is a possibility that the edge removal heuristic fails to produce a connected graph due to an "unlucky" choice of which edges to remove, requiring one or more restarts with different random number seeds. The other is that if $K$ is set too low (making a connected graph impossible), the MIP model will detect that the problem is infeasible. With the edge removal heuristic, you would not be able to distinguish infeasibility from bad luck (although if the heuristic failed multiple times with different random seeds, infeasibility would start to look likely). In very very limited testing of my code, the edge removal heuristic was definitely faster than the MIP and achieved a valid result in the first pass each time.

Java code for both methods is in my code repository.