Showing posts with label Python. Show all posts
Showing posts with label Python. Show all posts

Sunday, June 16, 2019

R v. Python

A couple of days ago, I was having a conversation with someone that touched on the curriculum for a masters program in analytics. One thing that struck me was requirement of one semester each of R and Python programming. On the one hand, I can see a couple of reasons for requiring both: some jobs will require the worker to deal with both kinds of code; and even if you're headed to a job where one of them suffices, you don't know which one it will be while you're in the program. On the other hand, I tend to think that most people can get by nicely with just one or the other (in my case R), if not neither. Also, once you've learned an interpreted language (plus the general concepts of programming), I tend to think you can learn another one on your own if you need it. (This will not, however, get you hired if proficiency in the one you don't know is an up-front requirement.)

I don't really intend to argue the merits of requiring one v. both here, nor the issue of whether a one-semester deep understanding of two languages is as good as a two-semester deep understanding of one language. Purely by coincidence, that conversation was followed by my discovering R vs. Python for Data Science, a point-by-point comparison of the two by Norm Matloff (a computer science professor at UC Davis). If you are interested in data science and trying to decide which one to learn (or which to learn first), I think Matloff's comparison provides some very useful information. With the exception of "Language unity", I'm inclined to agree with his ratings.

Matloff calls language unity a "horrible loss" for R, emphasizing a dichotomy between conventional/traditional R and the so-called "Tidyverse" (which is actually a set of packages). At the same time, he calls the transition form Python 2.7 to 3.x "nothing too elaborate". Personally, I use Tidyverse packages when it suits me and not when it doesn't. There's a bit of work involved in learning each new Tidyverse package, but that's not different from learning any other package. He mentions tibbles and pipes. Since a tibble is a subclass of a data frame, I can (and often do) ignore the differences and just treat them as data frames. As far as pipes go, they're not exactly a Tidyverse creation. The Tidyverse packages load the magrittr package to get pipes, but I think that package predates the Tidyverse, and I use it with "ordinary" R code. Matloff also says that "... the Tidyverse should be considered advanced R, not for beginners." If I were teaching an intro to R course, I think I would introduce the Tidyverse stuff early, because it imposes some consistency on the outputs of R functions that is sorely lacking in base R. (If you've done much programming in R, you've probably had more than a few "why the hell did it do that??" moments, such as getting a vector output when you expected a scalar or vice versa.) Meanwhile, I've seen issues with software that bundled Python scripts (and maybe libraries), for instance because users who came to Python recently and have only ever installed 3.x discover that the bundled scripts require 2.x (or vice versa).

Anyway, that one section aside (where I think Matloff and I can agree to disagree), the comparison is quite cogent, and makes for good reading.

Saturday, June 30, 2012

The White Knight Returns

Back in May, on his excellent blog The Endeavor, John D. Cook posed (and subsequently answered) the following problem: if a knight begins in one corner of an empty chessboard and moves randomly, each time choosing uniformly over the available moves from its current position, what is the mean number of moves until it returns to the corner from which it started?

Several people posed exact or approximate answers in the comments. The approximate answers were based on simulation. The most elegant solution among the comments, posted by 'deinst', modeled the problem as a reversible Markov chain on a graph. John Cook's answer used a rather powerful theorem about a random walk on a graph, which I think boils down to the same logic used by 'deinst'. (Cook also includes Python code for a simulation.)

When I read the problem, my first reaction was also to model it as a Markov chain, but sadly I've forgotten most of what I knew about MC's, including the concept of reversibility. I did, at least, recall the essentials of first passage time. The question reduces to computing the expected first passage time from the corner square to itself. The Wikipedia link for first passage times is not entirely helpful, as it is not specific to Markov chains, but the concept is easy to explain. Let the corner in which the knight starts be position $s$, and let $\mu_j$ denote the expected number of moves required until the knight first returns to position $s$, given that it is currently at position $j$ (i.e., the expected first passage time from $j$ to $s$). The answer to the puzzle is $\mu_s$, the expected first passage time from $s$ to itself. Further, let $M_j$ denote the set of positions that the knight can reach in one move, given that it is currently at position $j$. Letting $p_{jk}$ denote the probability that a knight at position $j$ next moves to position $k$, the first passage times $\mu_\cdot$ satisfy the following system of linear equations:\[ \mu_{j}=1+\sum_{k\in M_{j}\backslash\{s\}}p_{jk}\mu_{k}\quad\forall j. \] In our case, $p_{jk} = 1/|M_j|$ if $k\in M_j$ and 0 otherwise. The solution to this system has $\mu_s=168$ (matching the answers of John Cook and deinst).

Other than a quick refresher on Markov chains, my main interest in this was that I'm teaching myself Python while also trying to get a handle on using Sage. So I wrote a Sage notebook to set up and solve the first passage time equations. Here is the content of that notebook, in case anyone is curious. (Please bear in mind I'm new to Python.)

Define the board

board = [(i, j) for i in range(8) for j in range(8)]

Assign a variable for the expected first passage time (to the lower left corner) from each square of the board

vlist = {b : var('r_' + str(b[0]) + '_' + str(b[1])) for b in board}

List the eight moves a knight can make

moves = [(i, j) for i in [-2, -1, 1, 2] for j in [-2, -1, 1, 2] if abs(i) + abs(j) == 3]

Define a function that takes a position and move and returns the new position if the move is legal, None otherwise

def move(position, change): 
    if (position in board) and (change in moves): 
        landing = (position[0] + change[0], position[1] + change[1])
        return landing if landing in board else None
    else:
        return None

Compute all legal moves from each board position

legal = {p : [move(p, m) for m in moves if not(move(p, m) is None)] for p in board}

Set up the first passage time equations

eqns = [vlist[b] == 1.0 + (1.0/len(legal[b]))*sum([vlist[x] for x in legal[b] if x != (0, 0)]) for b in board]

Solve the equations

sols = solve(eqns, vlist.values(), solution_dict = True)

Verify that the solution is unique

len(sols) == 1

True

Find the mean first passage time from position (0,0) to itself

sols[0][vlist[(0,0)]]

168

One last note: I had originally intended to give the post a Christian Bale buzz by titling it "The Dark Night Returns". Turns out DC Comics beat me to that title. Oh, well.