A question on OR Stack Exchange asked about an optimization model (specifically a MILP) for finding the "smallest hull" containing a finite set of points in the plane. The phrase "smallest hull" is rather fuzzy. In a follow up comment, the author of the question clarified that "smallest" means smallest area. That leaves the meaning of "hull". Note that any Hamiltonian path through the points would contain all the points (with all of them on its "boundary") and would have area zero. Based on a sample illustration added by the author, I will assume that "hull" here means polygon (but not necessarily a convex one).

I'm pretty sure that a MILP model could be devised, but I suspect it would be rather large and might solve slowly. So I looked at the possibility of a heuristic solution, and came up with something that seems to do pretty well but definitely does not produce an optimal solution in all cases. I got it working in R. You can see the full notebook (which includes the code) here. (Warning: It's almost 6MB, due to the presence of some graphs.) The notebook uses three R libraries: ggplot2 and plotly, for plotting; and interp, for generating a tesselation (more about that below). The plotly library is used just once, to generate an interactive plot with "tooltips" identifying each point, which is mainly useful for debugging. So if you don't feel like installing it, you can just delete any references to it and still run the important parts of the code.

Suppose we start with the following 2D point set.

Point set (with convex hull) |

Delaunay triangulation |

The heuristic now takes a lap around the boundary of the convex hull. For each line segment $[a,b]$, it locates the (unique) triangle containing that edge. Let's say the third vertex of that triangle is $c$. If $c$ is not currently on the boundary, the heuristic adds it to the boundary, replacing the segment $[a,b]$ with the segments $[a,c]$ and $[c,b]$. It then deletes the $a-b-c$ triangle from the polygon, shaving off some area, and moves on to the next boundary segment.

This loop is repeated until a full circuit around the boundary fails to make any changes, at which point the heuristic terminates. Here is what the final polygon looks like.

Final polygon |

What appear to be line segments sticking out in a few places are actually very skinny triangles. Clearly the heuristic did not select this polygon for its aesthetics.

The points were randomly generated in the unit square, which of course has area 1. For this example, the convex hull has area 0.865, while the final polygon has area 0.511, a reduction of almost 41%. At the same time, it is clear from the final plot that the solution is not optimal. There is one point left in the interior of the polygon, around $(0.493, 0.378)$. The heuristic cannot do anything about it because none of the triangles to which it belongs have an edge on the boundary. We could certainly add line segments from it to two boundary points, at $(0.483, 0.280)$ and $(0.450, 0.175)$, that currently share an edge. That would form a new triangle containing no other points, and we could drop the existing edge, adding the two new edges to boundary, and shed a little more area.

So the heuristic does not produce "nice looking" polygons, nor does it find an optimal solution, but my intuition after running a few examples is that it probably does reasonably well.

FYI folks in GIS call this generating the alpha hull of a set of points, see for example one solution https://cran.r-project.org/web/packages/alphahull/index.html or https://grass.osgeo.org/grass78/manuals/addons/v.concave.hull.html.

ReplyDeleteThanks! That's very interesting. I'm going to try out the R package and see how it does area-wise.

DeleteI played a bit with the R alphahull package. The alpha-convex hull is not only not convex, it has curved sides, so it's not a polygon. That might or might not matter to the person who posted the original question. It also can produce "donuts" (regions with interior holes), which again may or may not work for the original poster.

DeleteFitting the alpha hull requires a trial-and-error selection of the parameter $\alpha > 0$. For very large $\alpha$, you get a slightly curvy approximation of the convex hull. As $\alpha$ decreases, curvature increases and enclosed area decreases, but at some point you start getting a disjoint union of regions (rather than a single region). Shrinking $\alpha$ further, some (eventually all) of the regions appear to become singleton points, defeating the purpose of the exercise.

What I found most interesting was that, on a few test cases, I could not reduce the area of the alpha hull to match what my heuristic produced without creating multiple disjoint regions. Whether this is a consequence of the method used to compute the alpha hull, or the function that computes (approximates?) its area, or what I do not know. It did surprise me a bit, partly because the alpha hull has more flexibility in a sense (curved sides, donut holes) and partly because my heuristic is not based on any sophisticated mathematical insights.

IIRC the R package is alittle bit different than the other GIS implementations I have seen in what it returns. But the implementation under the hood is similar.

DeleteSo I think most libraries just calculate the points that count as the exterior and return those points (akin to the chull R function). So most GIS libraries don't return donuts and don't return the buffers. See https://gis.stackexchange.com/q/1200/751 for other library examples.

Your algorithm reminds me more of the Douglas Peucker algorithm in reverse.

My knowledge of GIS is pretty much limited to being able to expand the acronym, so I'll take your word for this. :-) Thanks for the link. I didn't realize there was an SE site devoted to GIS questions.

Delete