Saturday, December 12, 2020

Incorrect Program Listings in MythTV

I've been using MythTV as a personal video recorder (PVR) since 2012, and for the most part I have been quite satisfied ... but there have definitely been adventures. You can type Mythbuntu or MythTV in the blog search box to see a litany of posts about things that did not work as expected (or at all) (or threatened to implode the universe) and fixes I found. Today's post is a two part adventure regarding program listings.

I have an account with Schedules Direct, which lets me download XML files containing program listings for my cable provider. I access it from two different machines. On my main PC, I have FreeGuide, an open-source TV guide program, installed. Once a week I download the latest listings and use FreeGuide to decide what I want to record during the coming week. On the PC that acts as my PVR, I have MythTV set up to download the same files from Schedules Direct when I tell it to refill the program database, which I again do once a week when programming that week's recordings.

The first part of today's adventure has been going on for a long time. When programming the week's recording schedule, I would occasionally run into discrepancies between the two machines. That is, in certain time slots FreeGuide would report one program and the listings on the PVR would report a different program, even though both were working from identical downloaded data files. When this happened, the listings in FreeGuide were invariably correct and the listings on the PVR incorrect. The obvious workaround was to use the FreeGuide listings, but that meant that when I wanted to record a program that FreeGuide said was there and the PVR said was not, I had to set up a manual recording rule, which is doable but somewhat inconvenient.

Eventually I figured out what was going on (probably with the help of considerable googling). I download 14 days of listings at a time from Schedules Direct, on both machines. Since I do this weekly, there is a one week overlap between the previously downloaded data and the newly downloaded data. FreeGuide replaces the old listings with the fresh download, but apparently MythTV only downloads the second week of data to its database and ignores the first week of the new download. The discrepancies occurred when the contents of a time slot in the overlap week changed between downloads. FreeGuide went with the newer data, but MythTV went with the older (incorrect) data.

I confirmed this by setting up a two line script on the PVR to let me download schedule data manually and overwrite the old data. The script is:

#!/bin/sh
mythfilldatabase --dd-grab-all

Note the option -dd-grab-all, which signals that all 14 days are to be downloaded and added to the database, updating any existing data. Running this from a terminal eliminated the inconsistencies between machines.

This brings me to the second part of today's adventure. I normally update the listings on the PVR machine by choosing the menu option to grab EPG (electronic program guide) data from the MythWelcome user interface. That was set up, back when I first installed MythTV, to run mythfilldatabase without any optional settings. I wanted to update that setting to add the -dd-grab-all option. The problem was, I could not find where to make the change. I did some googling (of course), and every post I found led to the same solution posted in the MythTV wiki: run mythtv-setup; go to the "General" section, and within that to the "Program Schedule Downloading Options" section; then use the second of the six settings there ("Guide Data Program") to set up the program or script to download the guide data. That sounds simple enough, but when I run mythtv-setup and go to that section only the first entry (the toggle for automatically updating listings) is present. The other five are nowhere to be found. I'm pretty sure they were there when I first installed MythTV, but they do not show up when I run the setup program on a machine that has already been configured. Possibly I need to run setup as a different user (the MythTV account "owner"?).

Anyway, I found a simple solution. The PVR machine runs MythWeb, a web interface to MythTV. I use MythWeb to program recordings from my main PC. It also has the ability to access settings (by clicking the button whose icon shows a key and a wrench). In the settings editor, I picked the button labeled "MythTV" and did some serious scrolling. Fortunately the settings are in alphabetic order. The one labeled "MythFillDatabasePath" has the path to the mythfilldatabase program. I added the --dd-grab-all option there, clicked the "Save" button at the bottom of the page, and that (hopefully) fixes the problem. Time will tell.



Thursday, December 3, 2020

New CPLEX MIP Emphasis

Brace yourself (or flee now) -- this is a rather long post.

Introduction

IBM has announced the next version of CPLEX Studio (20.1), with planned availability around December 11, 2020. As to why the version number is jumping from 12.10 to 20.1, I have no idea ... but this is 2020, and I have no explanation for pretty much everything that has happened this year.

Among the changes in version 20.1, they have added a new value to the MIP emphasis parameter. Prior to 20.1, there were five possible values for the emphasis parameter:

  • 0 = Balance optimality and feasibility(default)
  • 1 = Emphasize feasibility
  • 2 = Emphasize proven optimality
  • 3 = Emphasize improving the best bound
  • 4 = Emphasize finding "hidden" feasible solutions.

They have added the following new value:

  • 5 = Emphasize heuristics (what Xavier Nodet calls "heuristics on steroids").

The motivation for this is fairly clear: commercial (i.e., paying) customers with difficulty MIP models are frequently less concerned about provable optimality than with getting the best solution they can find within some limited amount of run time. Exactly how the new setting differs from setting 4 (and, for that matter, how setting 4 differs from setting 1) is unclear to me, which is OK. (I'm worried that if I really understood all the nuances, my brain would explode.)

I've been part of the beta test program for 20.1, and I've tried the new setting on a few MIP models. Going in, I expected it to slow down throughput (the number of nodes digested per minute), since running lots of heuristics means spending more time at a typical node. The question is whether the extra time per node pays for itself by reducing sufficiently the number of nodes required to find a solution of a specified quality.

My first attempt was on a difficult problem that arose in some recently published research, and on that problem the setting was definitely not helpful. In fairness, though, there may be a good reason for that. The solution approach involves a variant of Benders decomposition, so the extra time spent on heuristics will frequently produce a "good" solution only to see it shot down by the subproblem (producing a feasibility cut violated by the solution). So the remainder of my tests were on MIP models that did not involve decomposition.

 

Test case 1: Partition

The first test case is a MIP model that glues together sets to minimize the range of set sizes in a partition of a master set. It was originally posted here in August, 2020. The test problem is actually quite easy to solve, with an optimal value of 1 (meaning the cardinalities of the sets formed differ by at most 1).

I ran the problem with a 90 second time limit (irrelevant in most cases), using each of the emphasis settings 0, 1, 4 and 5. The following plot (a log-log plot to enhance readability) shows the progress under each setting.

Progress on partitioning problem


MIPEmphasis 1 ("Feasibility") makes the earliest progress but does not reach the optimal value of 1 within 90 seconds. (At that point, the incumbent value is 5.) Although just shy of one second some of the other levels do a little better than default, overall the default setting reaches the optimal solution fastest and the new setting is worse than the "Hidden Feasibility" setting. We can check the time at which each run (other than with emphasis 1) finds the optimal solution to confirm this.

MIPEmphasis
Time
Default5.50
Hidden Feasibility21.19
Heuristics40.85 

 

Test case 2: Typewriter

The second test case is a MIP model for laying out the keyboard of a hypothetical 19th century typewriter. The problem was featured in a series of posts, and the model used here appeared in the last of those posts. As I noted in that post, I was unable to find a provably optimal solution in large part due to a slow moving best bound, so for this demonstration I set a 60 second run limit. The problem seeks to minimize a distance measure. Once again, I'll use a log-log plot to show progress.

Progress on the typewriter example


All the emphasis settings produce a rapid reduction in the objective function early on. After about a second or so, emphasis 1 (feasibility) seems to do a bit better than the others. Settings 4 and 5 seem to lag a bit. Looking at the final objective values (at the 60 second cutoff), however, it seems that setting 4 (hidden feasibility) did best, and setting 5 (heuristics) slightly outperformed the other settings.

MIPEmphasis
Best
Default5650882
Feasibilty5660625
Heuristics5640159
Hidden Feasibility5517363

We can also look at node throughput. As a general rule, we would expect that increased use of heuristics would slow down node throughput. One possible exception would be settings that encouraged "diving" (local depth-first search), which might speed up processing of nodes during a dive.

Typewriter problem node throughput


The "heuristics" and "hidden feasibility" settings do in fact process fewer nodes in 60 seconds than does the default setting. The "feasibility" setting process about twice as many nodes as does the default setting, which may mean it does a fair bit of diving.

 

Test case 3: Group Selection

The last example is a group selection problem from yet another earlier post. I tested two slightly different MIP models with five minute time limits. The first variant uses continuous variables for some inherently boolean quantities, while the second variant makes those variables explicitly integer-valued. The second variant seems to be a bit harder to solve, even though they are mathematically equivalent.

The problem is a maximization problem, and none of the runs come remotely near proof of optimality. As noted in the earlier post, nonlinear approaches yielded an objective value of 889.3463, which is apparently optimal.

Looking at progress in the incumbent value, we see that all methods make substantial progress at the root node but shortly after the root node appear to bog down. In the first model, there is not much difference among the emphasis settings.

Progress on first group selection model


In the second model, the feasibility setting is a bit faster than the other to reach its maximum, and the heuristics setting is slower.

In both cases, though, the new "heuristics" setting produces the best objective value after 300 seconds.


MIPEmphasis
Best
Default885.7781
Feasibilty885.7781
Heuristics889.3451
Hidden Feasibility889.3130

MIPEmphasis
Best
Default884.6917
Feasibilty884.6917
Heuristics889.3392
Hidden Feasibility889.3130

As for node throughput, the next two plots show that node throughput is clearly greater in the first variant (where inherently boolean variables are treated a continuous with domain [0, 1]), and the "feasibility" setting is again fastest in both variants, while the new "heuristics" setting is slowest.

 

Group Selection Model 1 Node Througput


Group Selection Model 2 Node Througput

 

Conclusion

Testing on a small set of examples does not tell us much. On the group selection models, where progress is hard to come by after a short time, the new setting produced the best results, but was not much better than the old "hidden feasibility" setting. The same was true on the typewriter problem. So I am still waiting to encounter a problem instance where the new setting will provide a substantial improvement.