## Monday, July 25, 2016

I recently picked up a pair of Bluetooth headphones (Mixcder ShareMe 7) for use with my laptop (which runs Linux Mint). Getting them to connect properly was a bit of an adventure. After I had things (mostly) sorted out, I decided to script the steps necessary to get them working so that I could just double-click a script file and let it do the bulk of the work for me. I thought I'd pass along my script (and some background) here in case it helps anyone else. Note that what follows works on Mint, and probably (maybe? hopefully?) on other Debian-based distributions of Linux. Windows and Mac users may need to look elsewhere.

### Preliminaries

I already had Bluetooth support installed on the laptop. (It did not come with the basic Mint installation.) On top of the drivers, I installed the blueman and bluez-tools packages from the Canonical repositories (using Synaptic). The former provides a nice graphical user interface for managing Bluetooth connections, while the latter provides some handy command line tools.

The first step was to pair the headphones with the laptop. It's fairly easy to do using blueman, and I won't bother with the details here (especially since I've already forgotten them). One thing I will mention is that, during the pairing, I was asked for a PIN code for the headphones. I couldn't find it in the otherwise extensive (and multilingual) manual, though I may have missed it. At any rate, the common default choice for headsets and many other Bluetooth devices worked: 0000.

A couple of other things need to be noted that might be peculiar to my setup. First, I have a command that runs at startup and turns off the Bluetooth service. I do that to conserve battery power, since I don't use Bluetooth all that often on the laptop. Second, my laptop has better-than-average battery life but isn't exactly a portable supercomputer. I had to stick a pause in the script below due to a timing issue. Whether others would have the same timing issue, and whether they would need a pause of the same duration, a longer one or a shorter one, is an empirical question. (In other words, caveat emptor.)

### Setup script

I run this script (by double-clicking it, or invoking it in a terminal, depending on my mood) after turning on the headphones:

```bash
#!/bin/bash
# Script to connect Mixcder headphones as audio sink.
# Note: turn on the headphones first!
#
# Unblock the Bluetooth adapter (blocked by startup script)
gksudo rfkill unblock bluetooth
# Load the Bluetooth discovery module.
pactl load-module module-bluetooth-discover
# Pause a few seconds -- attempting to connect immediately fails
# (timing issue?)
sleep 5
# (Ignore a "did not receive a reply" error -- seems harmless.)
bt-audio -c "Mixcder ShareMe 7"
# Make sure the headphones use the audio sink (A2DP) profile, not the
# telephony profile (or the turned off "profile")
pactl set-card-profile bluez_card.E8_99_FF_22_76_44 a2dp
```

• The call to rfkill (which requires superuser privileges) turns Bluetooth back on. You will not need that if you leave Bluetooth on by default.
• The next line loads the Bluetooth discovery module, which is apparently necessary for Bluetooth to find the headphones.
• After that, I have the script take a five-second nap to let the discovery module do some discovering. (The default time unit for sleep is seconds.) As I mentioned above, five seconds seems to work for my laptop; your mileage may vary (or you may not need this at all).
• The call to bt-audio connects the headset. Substitute the name of your device, in quotes, if you're not using a Mixcder ShareMe 7. (The name is the one shown in the list of paired devices, for instance in blueman or in the output of 'bt-device -l'.)
• Finally, the last line sets the profile for the headphones to be a high quality audio sink (meaning the microphone is turned off and the headphones act like stereo speakers). The "a2dp" is what specifies that profile. If you want your headset to act like a telephone (microphone on, lower quality sound), change "a2dp" to "hsp". Note that the "card" name is "bluez_card." followed by the device's MAC address (which 'bt-device -l' shows next to the device name) with the colons converted to underscores (no idea why).

If you have problems getting the correct profile name (i.e., neither "a2dp" nor "hsp" does what you want), try running 'pactl list' in a terminal. The section of output labeled "Profiles:" lists the available profiles, giving for each one the name (e.g., "a2dp") followed by a colon and a description of it.
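A possible refinement, offered as a sketch and not as part of the script above: the card name can be built from the MAC address instead of being typed by hand, and the fixed pause can be replaced by a few retries. The function names `mac_to_card` and `retry` are made up for illustration. The retry idea assumes that bt-audio exits with a nonzero status when the connection fails, which is not confirmed here (recall that a "did not receive a reply" message may appear even when things work).

```bash
#!/bin/bash
# Sketch only: two hypothetical helpers (not part of the original script).

# Build the PulseAudio card name from a Bluetooth MAC address:
# prefix "bluez_card." and turn every colon into an underscore.
mac_to_card() {
    printf 'bluez_card.%s\n' "$(printf '%s' "$1" | tr ':' '_')"
}

# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    retry_max="$1"
    retry_pause="$2"
    shift 2
    retry_count=1
    while [ "$retry_count" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$retry_count" -lt "$retry_max" ]; then
            sleep "$retry_pause"
        fi
        retry_count=$((retry_count + 1))
    done
    return 1
}

mac_to_card "E8:99:FF:22:76:44"   # prints bluez_card.E8_99_FF_22_76_44

# Possible use in the script (assumes bt-audio exits nonzero on failure):
#   retry 5 2 bt-audio -c "Mixcder ShareMe 7"
#   pactl set-card-profile "$(mac_to_card "E8:99:FF:22:76:44")" a2dp
```

If bt-audio turns out to report success even when the connection fails, the retry loop gains nothing and the plain sleep is the safer choice.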

## Tuesday, July 19, 2016

### Finding the Kernel of a Matrix

I'm working on an optimization problem (coding in Java) in which, should various celestial bodies align the wrong way, I may need to compute the rank of a real matrix $A$ and, if it's less than full rank, a basis for its kernel. (Actually, I could get by with just one nonzero vector in the kernel, but I'm greedy.)

So I spent a couple of days doing Google searches to see which open-source linear algebra libraries do what. My ideal library would be easy to install (no compiling from source), would support sparse matrices (which mine will be), would make it easy to find the kernel (which turned out not to be a given), and would run fast (without using my computer's GPU, which some of the libraries do). My search led me to install and test several libraries, only to discover that some did not support sparse matrices and some did certain things in rather peculiar ways, designed to make it hard if not impossible to drill down to the basis of the kernel. (One particularly annoying library threw an exception because the matrix I had just constructed, in the immediately preceding line using a sparse matrix constructor, was not considered sparse.) I found something that works, and I thought I'd document it here. If I find something better in the future, I'll post that as well.

Before proceeding, let me point out the Java Matrix Benchmark, which provides useful benchmarking information on (and links to) a number of linear algebra packages.

What works for me, at least for now, involves the Apache Commons Mathematics library. This is one of the more commonly used (no pun intended) mathematics libraries in Java-land. Commons Math supports sparse matrices, at least for storage. (I'm not sure if computational operations, other than add/subtract/multiply, exploit sparsity.) It also does both QR and SVD decompositions. A number of responses on Q&A sites suggested using either QR (faster) or SVD (more numerically stable) decomposition of the matrix $A$ to get to its kernel. I opted for a QR decomposition. As I found out the hard way, though, not all QR decompositions are created equal.

Cutting to the chase scene, the key is to do a "rank-revealing" QR decomposition, which means using the RRQRDecomposition class, not the QRDecomposition class. What you decompose is actually $A^T$, the transpose of $A$. So if $A$ is an $m \times n$ matrix, the decomposition looks like

$$A^T P = Q R,$$

where
• $P$ is an $m \times m$ pivot matrix,
• $Q$ is an $n \times n$ orthonormal matrix (i.e., $Q^T Q = I$), and
• $R$ is an $n \times m$ upper triangular matrix.
If $A$ has rank $r$, the last $n - r$ columns of $Q$ provide a basis for the kernel of $A$. (If $r = n$, $A$ is full column rank and the kernel is just $\{0\}$.)
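
For anyone wondering why the trailing columns of $Q$ do the job, here is a short derivation. It assumes the rank-revealing pivoting does what it is supposed to do, namely push all the nonzero rows of $R$ to the top; the block names $Q_1$, $Q_2$ and $R_1$ are just notation introduced here. Partition

$$Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}, \qquad R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix},$$

where $Q_1$ is $n \times r$, $Q_2$ is $n \times (n - r)$ and $R_1$ is $r \times m$ with rank $r$. Then $A^T P = Q_1 R_1$, and since $P P^T = I$, transposing gives

$$A = P R_1^T Q_1^T.$$

The columns of $Q$ are orthonormal, so $Q_1^T Q_2 = 0$ and therefore

$$A Q_2 = P R_1^T Q_1^T Q_2 = 0.$$

So every column of $Q_2$ lies in the kernel of $A$. There are $n - r$ of them, they are linearly independent (being orthonormal), and the kernel of $A$ has dimension $n - r$, so they form a basis.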

I wrote a little test program (one short Java file) to make sure I was doing things correctly. It generates a random matrix, decomposes it, and confirms that the last however many columns of $Q$ really belong to the kernel of the matrix. If you want to see things in action, you can get the code from the blog's GitLab repository. You'll need to have a recent version of the Commons Math library (I used 3.6.1) on your class path. There are various parameters you can play with: a random seed; the dimensions of $A$; how dense $A$ should be; a rounding tolerance (how close to 0 counts as 0); and a flag which, if set true, tells the matrix generator to replace one column of $A$ with a random linear combination of the others (just to ensure that $A$ does not have full column rank).

## Tuesday, July 5, 2016

### Over- and Underfitting

I just read a nice post by Jean-François Puget, suitable for readers not terribly familiar with the subject, on overfitting in machine learning. I was going to leave a comment mentioning a couple of things, and then decided that with minimal padding I could make it long enough to be a blog post.

I agree with pretty much everything J-F wrote about overfitting. He mentioned cross-validation as a tool for combating the tendency to overfit. It is always advisable to partition your sample into a training set (observations used to compute parameters of a model) and a testing set (used to assess the true accuracy of the model). The rationale is that a trained model tends to look more accurate on the training data than it truly is. In cross-validation, you repeatedly divide the original sample (differently each time), repeating the training and testing.

A related approach, perhaps better suited to "big data" situations, is to split your (presumably large) sample into three subsamples: training, testing and validation. Every model under consideration is trained on the same training set, and then tested on the same testing set. Note that if your model contains a tunable parameter, such as the weight assigned to a regularization term, versions of the same basic model with different (user-chosen) values of the tuning parameter are treated as distinct models for our purposes here. Since the testing data is used to choose among models, the danger of the results on the training set looking better than they really are now morphs into the danger of the results on the testing set for the "winning" model looking better than they really are. Hence the third (validation) sample is used to get a more reliable estimate of how good the final model really is.

One statement by J-F with which I disagree, based on a combination of things I've read and my experiences teaching statistics to business students, is the following:

> Underfitting is quite easy to spot: predictions on train[ing] data aren't great.

My problem with this is that people building machine learning models (or basic regression models, for that matter) frequently enter the process with a predetermined sense of either how accurate the model should be or how accurate they need it to be (to appease journal reviewers or get the boss off their backs). If they don't achieve this desired accuracy, they will decide (consistent with J-F's statement) that predictions "aren't great" and move to a different (most likely more complex or sophisticated) model. In the "big data" era, it's disturbingly easy to throw in more variables, but that was a danger even in the Dark Ages (i.e., when I was teaching).

I recall one team of MBAs working on a class project requiring them to build a predictive model for demand of some product. I gave every team the same time series for the dependent variable and told them to pick whatever predictors they wanted (subject, of course, to availability of data). This particular team came up with a reasonably accurate, reasonably plausible model, but it temporarily lost accuracy on observations from the early 1980s. So they stuck in an indicator variable for whether Ronald Reagan was president of the US, and instantly got better accuracy on the training data. I'm inclined to think this was overfitting, and it was triggered because they thought their model needed to be more accurate than it realistically could be. (It was interesting to hear them explain the role of this variable in class.)

When I taught regression courses, I always started out by describing data as a mix of "pattern" and "noise", with "noise" being a relative concept. I defined it as "stuff you can't currently explain or predict", leaving the door open to some future combination of better models, greater expertise and/or more data turning some of the "noise" into "pattern". Overfitting occurs when your model "predicts" what is actually noise. Underfitting occurs when it claims part of the pattern is noise. The problem is that the noise content of the data is whatever the universe / the economy / Loki decided it would be. The universe does not adjust the noise level of the data based on what predictive accuracy you want or need. So calling a model underfitted just because you fell short of the accuracy you thought you should achieve (or needed to achieve) amounts to underestimating the relative noise content, and is both unreliable and likely to induce you to indulge in overfitting.
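
One way to make the pattern/noise split concrete is the usual textbook setup, which is an assumption added here and not something taken from J-F's post: the noise is additive, has mean zero and variance $\sigma^2$, and is independent of the predictors. Writing $f$ for the "pattern" and $\varepsilon$ for the "noise",

$$y = f(x) + \varepsilon, \qquad E[\varepsilon] = 0, \qquad \mathrm{Var}(\varepsilon) = \sigma^2.$$

For any fitted model $\hat{f}$, evaluated on a fresh observation that played no part in the fitting, the cross term vanishes and

$$E\left[(y - \hat{f}(x))^2\right] = E\left[(f(x) - \hat{f}(x))^2\right] + \sigma^2 \ge \sigma^2.$$

Under those assumptions, no choice of model pushes the expected squared error on new data below $\sigma^2$, and $\sigma^2$ is set by the data-generating process, not by the accuracy target you walked in with.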