Sunday, March 31, 2013

Auto-collapsing Tree in Java

I'm writing a program in Java using Swing to build the user interface. Programming is not exactly my strong suit, and building graphical user interfaces is pretty much my Kryptonite. So it's no big shock that progress is slow.

One of the controls in my interface is a tree (instance of the JTree class). To keep the interface clean, I want the act of expanding any node to automatically collapse any currently open sibling node (and any open descendants of that sibling). In other words, only one path from the root node to a leaf node should be expanded at any given time.

This seems both simple and something that would be commonplace, so I naively assumed there would be some property setting in JTree to enforce this. Not only did I not find such a property (or method), but I pretty much wore out one of Google's servers looking in vain for any discussion or sample code relating to this. I did not even find any unanswered questions about it online. So either it is not as common a requirement as I thought or my search technique is atrophying.

I eventually found a way that I think works, although it may be a bit inefficient (a hallmark of my coding). Here's a snippet that demonstrates it, applied to an instance mainTree of the JTree class that is defined elsewhere.

    // needs java.util.{ArrayList, Collections, Comparator, Enumeration},
    // javax.swing.event.{TreeExpansionEvent, TreeWillExpandListener} and
    // javax.swing.tree.{ExpandVetoException, TreePath}

    // listen for tree expansion and collapse the previously open path
    mainTree.addTreeWillExpandListener(new TreeWillExpandListener() {
      @Override
      public void treeWillExpand(TreeExpansionEvent event)
                  throws ExpandVetoException {
        TreePath target = event.getPath();  // the path that will expand
        TreePath parent = target.getParentPath();  // parent of the target
        // get the currently expanded descendants of the parent node
        Enumeration<TreePath> expanded = mainTree.getExpandedDescendants(parent);
        // copy the enumeration to a nonvolatile list (collapsing things
        // will alter the enumeration on the fly)
        ArrayList<TreePath> open = new ArrayList<>();
        while (expanded != null && expanded.hasMoreElements()) {
          open.add(expanded.nextElement());
        }
        // no reason to collapse the parent; it will just reexpand when
        // the target expands
        open.remove(parent);
        // sort the list so that longer paths (nodes deeper in the tree) are
        // closed first -- this prevents closed nodes from reopening as their
        // descendants are closed
        Collections.sort(open, new Comparator<TreePath>() {
          @Override
          public int compare(TreePath o1, TreePath o2) {
            return -Integer.compare(o1.getPathCount(), o2.getPathCount());
          }
        });
        // now collapse open paths, starting at their lowest levels and
        // working back up the tree toward the common parent
        for (TreePath p : open) {
          mainTree.collapsePath(p);
        }
      }

      @Override
      public void treeWillCollapse(TreeExpansionEvent event)
                  throws ExpandVetoException {
        // intentionally empty: collapsing needs no special handling here
      }
    });

I'll point out a few key features:
  • I'm attaching a TreeWillExpandListener, which is called after the click that tells Swing a node needs to be expanded but before the expansion actually takes place.
  • The getExpandedDescendants method returns an enumeration of expanded nodes (in the form of TreePath instances). From the Java 7 documentation for this method:
"If you expand/collapse nodes while iterating over the returned Enumeration this may not return all the expanded paths, or may return paths that are no longer expanded."
I speak from experience: they're not kidding. In order to collapse everything in the enumeration, I first convert it into a list (ArrayList).
  • I remove the parent node from the results of the enumeration. It's harmless to collapse the parent, but also pointless: the parent will re-expand when the target child expands.
  • If you need to collapse a path that descends more than one level from the common parent, you need to do it in reverse order (from the lowest expanded node back toward the parent). Otherwise, as you collapse descendants of some node, Java will re-expand the ancestor. In genealogical terms, if your sister and nephew are currently expanded, and it's your turn to expand, the order of collapse has to be nephew first, then sister. Otherwise, if your sister is collapsed first, the act of collapsing the nephew appears to cause Swing to expand the sister. So I sort the list of TreePaths to collapse using an anonymous comparator class that sorts in reverse length order (longest path first -- don't miss the minus sign).
  • The listener listens for both will-expand and will-collapse signals. I left the will-collapse part empty because it's irrelevant to my application.
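
For anyone who wants to try the idea without a full GUI, here is a minimal, self-contained sketch of the same technique. It is an illustration, not the code from my program: the class name AutoCollapseDemo, the helpers install, buildSampleTree and pathTo, and the little sample tree are all made up for the demo, and it assumes Java 8 or later (for Collections.list combined with List.sort and Comparator.comparingInt).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.List;
import javax.swing.JTree;
import javax.swing.event.TreeExpansionEvent;
import javax.swing.event.TreeWillExpandListener;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.TreePath;

class AutoCollapseDemo {

  // Hypothetical helper: adds the collapse-the-siblings behavior to a tree.
  static void install(final JTree tree) {
    tree.addTreeWillExpandListener(new TreeWillExpandListener() {
      @Override
      public void treeWillExpand(TreeExpansionEvent event) {
        TreePath parent = event.getPath().getParentPath();
        Enumeration<TreePath> expanded = tree.getExpandedDescendants(parent);
        if (expanded == null) {
          return;  // target is the root, or the parent is not expanded
        }
        // snapshot the enumeration before collapsing anything
        List<TreePath> open = Collections.list(expanded);
        open.remove(parent);  // the parent would just re-expand anyway
        // deepest paths first, so collapsed ancestors are not reopened
        open.sort(Comparator.comparingInt(TreePath::getPathCount).reversed());
        for (TreePath p : open) {
          tree.collapsePath(p);
        }
      }

      @Override
      public void treeWillCollapse(TreeExpansionEvent event) {
        // nothing to do when a node collapses
      }
    });
  }

  // Sample tree (made up): root -> A -> A1 -> A1x, and root -> B -> B1.
  static JTree buildSampleTree() {
    DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
    DefaultMutableTreeNode a = new DefaultMutableTreeNode("A");
    DefaultMutableTreeNode a1 = new DefaultMutableTreeNode("A1");
    DefaultMutableTreeNode b = new DefaultMutableTreeNode("B");
    a1.add(new DefaultMutableTreeNode("A1x"));
    a.add(a1);
    b.add(new DefaultMutableTreeNode("B1"));
    root.add(a);
    root.add(b);
    return new JTree(root);
  }

  // Looks up the path to the node carrying the given label (null if absent).
  static TreePath pathTo(JTree tree, String label) {
    DefaultMutableTreeNode root =
        (DefaultMutableTreeNode) tree.getModel().getRoot();
    Enumeration<?> nodes = root.depthFirstEnumeration();
    while (nodes.hasMoreElements()) {
      DefaultMutableTreeNode n = (DefaultMutableTreeNode) nodes.nextElement();
      if (label.equals(n.getUserObject())) {
        return new TreePath(n.getPath());
      }
    }
    return null;
  }

  public static void main(String[] args) {
    JTree tree = buildSampleTree();
    install(tree);
    tree.expandPath(pathTo(tree, "A"));
    tree.expandPath(pathTo(tree, "A1"));
    tree.expandPath(pathTo(tree, "B"));  // the listener closes A1, then A
    for (String label : new String[] {"A", "A1", "B"}) {
      System.out.println(label + " expanded: "
          + tree.isExpanded(pathTo(tree, label)));
    }
  }
}
```

The logic is the same as in the snippet above; the only real differences are that the enumeration is copied with Collections.list and the reverse sort is written with a method reference instead of an anonymous comparator.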

Friday, March 29, 2013

Justice Flunks Math ... Or Not

Catching up on some online reading, I just chanced upon a New York Times op-ed piece titled "Justice Flunks Math". It deals with the Amanda Knox case. The authors' argument for their thesis (captured well by the title) centers around the following:
One of the major pieces of evidence was a knife collected from Mr. Sollecito’s apartment, which according to a forensic scientist contained a tiny trace of DNA from the victim. Even though the identification of the DNA sample with Ms. Kercher seemed clear, there was too little genetic material to obtain a fully reliable result — at least back in 2007.
By the time Ms. Knox’s appeal was decided in 2011, however, techniques had advanced sufficiently to make a retest of the knife possible, and the prosecution asked the judge to have one done. But he refused. His reasoning? If the scientific community recognizes that a test on so small a sample cannot establish identity beyond a reasonable doubt, he explained, then neither could a second test on an even smaller sample.
Whatever concerns the judge might have had regarding the reliability of DNA tests, he demonstrated a clear mathematical fallacy: assuming that repeating the test could tell us nothing about the reliability of the original results. In fact, doing a test twice and obtaining the same result would tell us something about the likely accuracy of the first result. Getting the same result after a third test would give yet more credence to the original finding.
Imagine, for example, that you toss a coin and it lands on heads 8 or 9 times out of 10. You might suspect that the coin is biased. Now, suppose you then toss it another 10 times and again get 8 or 9 heads. Wouldn’t that add a lot to your conviction that something’s wrong with the coin? It should.
My answer to the final (rhetorical?) question is yes: my conviction that the coin was biased would increase, because the second test is plausibly independent of the first test. Whether that same reasoning applied to a retest of DNA evidence would depend on whether the retest would be probabilistically independent or, if not, how strongly the two test results would covary.

Suppose, hypothetically, that we have a test that is sometimes accurate, sometimes inaccurate, but infallibly produces the same result (right or wrong) on a given sample. No number of retests will improve the accuracy of the test.

So the use of the coin flip analogy is somewhat facile. (I can understand the temptation to use it, though. The authors were writing for a general audience, not the more mathematically sophisticated -- not to mention orders of magnitude smaller -- audience for this blog.) Retrials of the DNA test are likely to be neither independent nor identical, but somewhere in between. So a retest might add some information, but might well not alter our confidence in the original test enough to justify it. Bear in mind that retesting has both monetary and evidentiary expenses (it consumes portions of a finite, irreplaceable sample).
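
To put rough numbers on the "somewhere in between" case, here is a small sketch of a Bayesian update under a toy model. Everything in it is an assumption made for illustration, not something taken from the case or the op-ed: a 50% prior that the sample is a true match, a test that is right 80% of the time either way, and a "dependence" parameter giving the probability that a retest simply repeats the first result rather than being an independent draw.

```java
class RetestDemo {

  // Posterior probability of a true match after one positive test.
  static double afterOneTest(double prior, double accuracy) {
    double hit = prior * accuracy;                     // match, test says match
    double falseAlarm = (1 - prior) * (1 - accuracy);  // no match, test says match
    return hit / (hit + falseAlarm);
  }

  // Posterior after a second positive result. With probability `dependence`
  // the retest just repeats the first result; otherwise it is an independent
  // test with the same accuracy. (Toy model, assumed for illustration.)
  static double afterRetest(double prior, double accuracy, double dependence) {
    double p1 = afterOneTest(prior, accuracy);
    double ifMatch = dependence + (1 - dependence) * accuracy;
    double ifNoMatch = dependence + (1 - dependence) * (1 - accuracy);
    return p1 * ifMatch / (p1 * ifMatch + (1 - p1) * ifNoMatch);
  }

  public static void main(String[] args) {
    double prior = 0.5;     // hypothetical
    double accuracy = 0.8;  // hypothetical
    for (double d : new double[] {0.0, 0.5, 1.0}) {
      System.out.printf("dependence %.1f -> posterior %.4f%n",
          d, afterRetest(prior, accuracy, d));
    }
  }
}
```

With these made-up inputs, one positive test gives a posterior of 0.80. A second positive raises it to about 0.94 if the retest is fully independent, to about 0.86 at dependence 0.5, and leaves it at 0.80 if the retest always repeats the first result.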

I'm inclined to believe that the second DNA test should have been done, not because a repeated test would necessarily raise confidence substantially, but because technology had "advanced" -- but only if there were expert testimony that the technological improvements justified consumption of more of the sample.

Tuesday, March 26, 2013

Farewell to Google Reader

Google's announcement that it would be ending Google Reader service on July 1 caused considerable wailing and gnashing of teeth ... and that was just me. A lot of other folks are also inconvenienced, to put it mildly. I have no intention of slamming Google over the decision. They provided the service at no charge to me, and I'm grateful to have had the use of it. Now it's time to move on.

John D. Cook did a couple of blog posts about alternatives (see here and here), and there is no shortage of web pages devoted to the subject. I've spent more time than I care to think about shopping for a solution. My requirements, in descending order of importance, are as follows.
  1. The reader must be accessible both on my Linux PC, through a web browser (preferred) or a desktop client, and on Android, through a native client (preferred) or a browser application.
  2. The reader must synchronize between my Android tablet and my Linux PC.
  3. I must be able to import my Google Reader subscriptions (preferably including the folders into which they are organized).
  4. The reader should have straightforward navigation, including the ability to flag articles as read. (I'm not worried about liking, +1-ing or other social features.)
  5. There should be easy (one click) linking from the reader summary of an article to the source (original web page) in a browser.
  6. I would rather not have a magazine-style interface. For me, it's just unnecessary clutter, and somewhat inappropriate. Some of my subscriptions are blogs, but I also use RSS to subscribe to forums and Twitter feeds, which just look dopey in a magazine layout.
It turns out that synchronization (my second priority) is an issue. Some alternatives currently sync very well, but they use Google's Reader back-end to do it. That leaves them scrambling to find alternatives by the end of June. Some do not sync at all, which is a deal-breaker for me. I'm busy enough that it's a bit of a struggle to keep up with the feeds to which I subscribe, and I really do not have time to spend flagging articles that I've already read on another device.

I thought that Dropbox might be an easy syncing solution. Brent Simmons, who I believe authored the NetNewsWire reader for Apple devices, argues in a blog post that syncing through a cloud file service like Dropbox is unlikely to work. So much for the easy way out.

Since Feedly grabbed far and away the lion's share of recommendations on several sites I checked, I tried that first. Importing my Google Reader subscriptions was trivial. The interface took a bit of getting used to, which is going to be an issue with any alternative to Google Reader. On the PC (in Firefox, using their extension), some articles were "featured" (displayed in larger boxes than others). I found it easy to mark as read those that were not featured, but for the life of me I could not find an easy way to mark the featured ones read. It was either click on them and read them, or click on a different article to make it "featured" and then, with the original article no longer featured, mark it as read. That's a bit inefficient.

The Feedly Android client was actually a bit easier to navigate, once I learned that (a) swiping horizontally was the way to mark an article read or unread and (b) I needed to be very careful about not using too long a swipe. (A long swipe marks everything on screen read/unread.) Synchronization worked, although I found that I had to log out and log back in at least once on my desktop browser in order to catch changes from the Android client.

Unfortunately, one glitch in the Android application proved to be a deal-breaker. Tapping a link provided with each synopsis let me read the article in a browser, which was embedded in the Feedly client. In many cases, the article is a post on a forum, to which I want to respond. Finger-painting a response on an Android device is painful as it is. In at least a couple of cases, though, Feedly ate my response. After laboriously typing in the answer, I had to scroll up or down to access the button to submit the message. Feedly apparently interpreted the vertical swipe to mean "go back to the previous screen", losing my work in the process. I could not find a setting that would compel Feedly to send me to an external browser (either the default Android browser or Firefox) to read the full article, so that was the end of my Feedly trial.

I'm currently trying Netvibes. There is (as yet) no native Android application, so I access it via web browser on all devices. I can live with that. Synchronization seems to work (knock on virtual wood). Importing my Google Reader subscriptions (including folders), while not as easy as with Feedly, went fairly smoothly, although previous posts in some cases came in with very incorrect dates (as in, all posts from one source were dated seven minutes prior to import). I spent a bit of time marking things read, but that is a one-time phenomenon. The interface is quite clean. I find the "widgets view" more visually appealing but, for busy feeds, the "reader view" more functional.
[Screenshot: Widgets View]
[Screenshot: Reader View]

Update: After a week plus of use, I've posted my impressions of Netvibes.

Update #2: According to eWeek, Feedly has grabbed some 3 million Google Reader users (and counting) and is adding/improving features.

Update #3: I've now switched to Inoreader, mainly because I'm a bit more comfortable with it on mobile devices. After about a month of use, I'm quite happy with it.

Tuesday, March 5, 2013

The Value of Knowing the Value of Your Degree

Fellow blogger Laura McLay wrote today about a push in Congress to require someone (apparently states) to report statistics on graduate earnings by college/university and major. (See this report at Inside Higher Ed for more details; tip of the hat to Laura for the link.) Laura raises some excellent points about the pitfalls of this, and I will try not to duplicate her analysis.

Proponents of this sort of disclosure throw the word "transparency" around a fair bit, and in general I'm in favor of transparency (possible exceptions being clothing and curtains). Those of us associated with analytics are unlikely to argue against the provision of data (and, hopefully, some statistical analysis of it). Anyone who has used or taught decision analysis knows that, under typical assumptions, the expected value of imperfect information is nonnegative. In other words, it can't hurt to know. Those "typical assumptions", while mathematically mild, are important, and include the following:
  1. we have some ability to assess the general accuracy (or inaccuracy) of the information; and
  2. we make rational use of it.
Tips about the stock market, for example, point to the importance of the first assumption. Stock tips are never fully accurate, and stock tips from your halfwit brother-in-law may be chronically inaccurate (which raises the value of the information as long as you realize that -- just do the opposite of what he suggests). Buy recommendations from a generally reliable broker who just happens to have a big fish client trying to dump a chunk of that particular security, though, are problematic because you cannot assess even their approximate accuracy.
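
Here is a minimal sketch of that calculation for a toy decision, to show both the nonnegativity and the point about the brother-in-law. The setup is entirely hypothetical: you either buy a stock (winning `gain` if it goes up, losing `loss` if it goes down) or stay out (payoff zero), and a tip says "up" or "down" with a known accuracy. The class and method names are mine.

```java
class TipValueDemo {

  // Expected value of imperfect information for the toy buy / stay-out
  // decision: (expected payoff acting on the tip) - (expected payoff without).
  static double evii(double priorUp, double accuracy, double gain, double loss) {
    // without a tip: buy only if buying has positive expected payoff
    double buyNow = priorUp * gain - (1 - priorUp) * loss;
    double withoutTip = Math.max(0.0, buyNow);

    // with a tip: for each possible tip, take the better of buying and
    // staying out; the terms are already weighted by the joint probability
    // of (market direction, tip), so no division by P(tip) is needed
    double buyAfterUpTip = priorUp * accuracy * gain
        - (1 - priorUp) * (1 - accuracy) * loss;
    double buyAfterDownTip = priorUp * (1 - accuracy) * gain
        - (1 - priorUp) * accuracy * loss;
    double withTip = Math.max(0.0, buyAfterUpTip)
        + Math.max(0.0, buyAfterDownTip);

    return withTip - withoutTip;
  }

  public static void main(String[] args) {
    // hypothetical numbers: 50/50 market, win or lose one unit
    for (double acc : new double[] {0.9, 0.5, 0.1}) {
      System.out.printf("tip accuracy %.1f -> value %.3f%n",
          acc, evii(0.5, acc, 1.0, 1.0));
    }
  }
}
```

With a 50/50 prior and symmetric payoffs, a tip that is right 90% of the time and a tip that is wrong 90% of the time are both worth about 0.4 per decision, while a coin-flip tip is worth nothing. The catch is the first assumption: the function needs `accuracy` as an input, and that is exactly what you lack with the conflicted broker.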

The second issue -- can we make rational use of the information (and will we) -- is one reason doctors are not always supportive of genetic testing and sometimes even follow-up tests for marginal positive results. Will information about the patient's risk of some relatively unlikely or slowly progressing condition unduly depress the patient, cause the patient to embark on expensive, invasive and/or risky tests or procedures, or otherwise push the patient to do something that might not be entirely reasonable?

So my first reaction to the notion of making information available to potential college students about career prospects (placement rates, starting salaries, salaries five years out etc.) as they relate to the nature and source of the college degree is positive: more information is better. My second reaction is that it needs to be information, not just data, meaning that someone reliable (knows analytics) and trustworthy (not out to recruit students) needs to process the data and translate it into actionable knowledge. Moreover, it needs to be communicated to prospective students in a way that lets them understand both the implications and the limitations of the information. So we need statisticians or analytics professionals involved, and we need communications professionals involved.

I'll end with a few specific comments:
  • As Laura mentions, the analysis should at minimum provide ranges and not just averages. Those of us with analytics training are only too aware of the Flaw of Averages.
  • Salaries are one way to look at the value of a degree. Break-even analysis (the time required to earn enough to pay off the cost of the degree, including lost earnings for the time spent in college) is another, but it is trickier to compute.
  • Some nontrivial statistical modeling may be required to account for various factors other than school and major that might influence earning power. For example, some schools have an explicit pre-med major, while at some schools pre-med students major in chemistry, biology or biochemistry, and at some schools they major in something unrelated. When I was an undergraduate, the student living next door to me was a pre-med who majored in English. If that were true across the board (and I have no idea if it was, but at minimum it was not an anomaly), then our English majors probably out-earned English majors at schools with explicit pre-med majors.
  • There is more to a career than salary, and that needs to be conveyed to consumers of this information. Before the markets tanked in '07-'08, finance majors hired by Wall Street trading firms enjoyed rather high salaries (higher than what finance majors earned in corporate finance positions, and certainly higher than many other majors). They also "enjoyed" a high cost of living, ungodly work hours and high stress. My impression is that aggressive personalities tended to fare better than less aggressive ones. So that high salary figure for finance majors at schools that fed the Wall Street mill needed to be tempered by an understanding of those other factors.
  • Laura mentions the effect of time. Widespread dissemination of salary data might lead to gluts in the better-paying fields, driving down salaries in those fields. At the same time, demographic, economic or technological trends might augur for higher salaries down the road in fields that recently have not paid that well. (I'd mention gerontology, but someone might read something personal into it.)
  • There are risk factors involved in the decision to attend college, the choice of the college to attend, and the choice of major. Major A might pay more after graduation than major B, but if majoring in A makes it likely you will fail to graduate and B is safer (given your particular skill set and inclinations), maybe B is really the better deal.
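
As a back-of-the-envelope illustration of the break-even idea in the list above, here is the crudest possible version. All the figures are invented, and the sketch deliberately ignores interest, taxes, raises, financial aid and the chance of not graduating, which are some of the things that make the real calculation tricky.

```java
class BreakEvenDemo {

  // Years of post-graduation work needed for the salary premium to repay
  // the cost of the degree plus the earnings given up while in college.
  // Returns infinity if the degree carries no salary premium.
  static double breakEvenYears(double annualCost, int yearsInCollege,
                               double salaryWithDegree, double salaryWithout) {
    double investment = yearsInCollege * (annualCost + salaryWithout);
    double annualPremium = salaryWithDegree - salaryWithout;
    if (annualPremium <= 0) {
      return Double.POSITIVE_INFINITY;
    }
    return investment / annualPremium;
  }

  public static void main(String[] args) {
    // hypothetical: $20,000 per year for four years, $50,000 with the
    // degree versus $30,000 without it
    double years = breakEvenYears(20_000, 4, 50_000, 30_000);
    System.out.println("Break-even after " + years + " years");
  }
}
```

With these made-up inputs the total investment is $200,000 (tuition plus four years of forgone salary), the premium is $20,000 a year, and the break-even point comes ten years after graduation.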