Thursday 29 March 2012

Getting straight single quotes for code/verbatim in Sweave/knitr

Update 16 April 2012: Yihui fixed this as of knitr 0.5! Thanks!

I've recently started using knitr to write reports, mainly about code I've written in R.

One thing that I insist upon in code documentation is that code snippets within the document can be copied and pasted easily so the user can follow along.

This is why I don't like having the following in my documents:

> words <- c('Hello','world!')
> paste(words)
[1] "Hello"  "world!"

If the user wants to run the code I've just written, they can't simply select everything and copy-paste; the > symbols are going to get in the way.

I much prefer something like this:

words <- c('Hello','world!')
paste(words)
## [1] "Hello"  "world!"

This is why I like knitr as opposed to Sweave on which it is based; knitr seems to be more flexible in suppressing the leading > in input commands, and putting comments (##) in front of the outputs.
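
For reference, the prompt-free style above corresponds to two knitr chunk options, prompt and comment. This is a minimal sketch, assuming the knitr package is installed; as far as I know these values are already knitr's defaults, so setting them explicitly only matters if something else has changed them.

```r
library(knitr)

# Drop the leading "> " from echoed input and prefix output lines with "##".
opts_chunk$set(prompt = FALSE, comment = "##")

# Check which values are currently in effect.
opts_chunk$get("prompt")
opts_chunk$get("comment")
```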

However, knitr has an annoying drawback that Sweave doesn't when it comes to typesetting code: a single quote mark/apostrophe ' in Sweave will stay as such in the output; in knitr, it will be converted to a left or right single quote.

By default, LaTeX will change "straight" single quotes ' into left and right single quotes that are curled depending on whether the quote is open or closed.

If you try to copy-paste these into a terminal you will run into trouble: R will complain about 'unexpected input in "??"', where the "??" may be a funny-looking symbol (depending on your terminal) that basically means "I don't understand this fancy symbol you gave me!"
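
To see the problem without going through a PDF, the sketch below feeds R's parser the same assignment twice, once with straight quotes and once with curly ones. try_parse is a hypothetical helper written just for this illustration, and the exact error message R prints may differ between versions and terminals.

```r
# The same assignment twice: once with straight quotes, once with the
# curly quotes (U+2018 and U+2019) that end up in the typeset output.
straight <- "words <- c('Hello','world!')"
curly    <- "words <- c(\u2018Hello\u2019,\u2018world!\u2019)"

# try_parse() is a hypothetical helper: TRUE if R can parse the text,
# FALSE if the parser raises an error.
try_parse <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

try_parse(straight)  # parses fine
try_parse(curly)     # the parser rejects the curly quotes
```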

Now, Sweave has some way of dealing with this. It converts all of these funky quotes into normal straight quotes that can be safely copy-pasted into R. Knitr doesn't.

[Image: weird curly quotes in knitr]

How to fix this? Well, there is a LaTeX package upquote that converts all single quotes that occur in a verbatim environment (or \verb commands) from left/right single quotes into straight single quotes.

It uses the textcomp package to access the command \textquotesingle which is the straight single quote (it also does backticks via \textasciigrave). The upquote package basically says "if you encounter a quote in a verb-like environment, make sure it's \textquotesingle!".

So how does this tie into getting straight quotes in Sweave/knitr? Easy: add \usepackage{upquote} and a fairly arcane command to your preamble:

\documentclass{article}
\usepackage{upquote} % to convert funny quotes to straight quotes
\setbox\hlnormalsizeboxsinglequote=\hbox{\normalsize\verb.'.}%
\begin{document}
<<eval=TRUE,echo=TRUE,tidy=FALSE>>=
    words <- c('Hello','world!')
@
\end{document}

Now you can just run knitr on this and then pdflatex, and voilà! Straight quotes.

[Image: straight quotes in knitr!]

How does this work?

For the more TeX-inclined among you, this is why it works.

First of all, when knitr processes an Rnw file (and makes a tex file as output), it defines a whole bunch of individual characters and uses them in the output. Have a look at the preamble of a knitted document and you will see a whole bunch of:


\newsavebox{\hlnormalsizeboxclosebrace}%
\newsavebox{\hlnormalsizeboxopenbrace}%
....
\setbox\hlnormalsizeboxopenbrace=\hbox{\begin{normalsize}\verb.{.\end{normalsize}}%
\setbox\hlnormalsizeboxclosebrace=\hbox{\begin{normalsize}\verb.}.\end{normalsize}}%

There are lots and lots of these definitions. There appears to be one for each punctuation character and text size.

In particular there is one for the single quote mark, called \hlnormalsizeboxsinglequote. Every single time you have a single quote in a code chunk, knitr replaces this single quote with \usebox{\hlnormalsizeboxsinglequote}. Every single time you use any punctuation character at all within a code chunk, knitr will replace it with the relevant \usebox{\hlnormalsizebox[charactername]}. It's bizarre, and leads to very ugly code!

For example, the simple chunk in the example above gets rendered (in the tex document) like so:


\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{flushleft}
\ttfamily\noindent
{\ }{\ }{\ }{\ }\hlsymbol{words}{\ }\hlassignement{\usebox{\hlnormalsizeboxlessthan}-}{\ }\hlfunctioncall{c}\hlkeyword{(}\hlstring{\usebox{\hlnormalsizeboxsinglequote}Hello\usebox{\hlnormalsizeboxsinglequote}}\hlkeyword{,}\hlstring{\usebox{\hlnormalsizeboxsinglequote}world!\usebox{\hlnormalsizeboxsinglequote}}\hlkeyword{)}\mbox{}
\normalfont
\end{flushleft}
\end{kframe}
\end{knitrout}

How gross!

Anyhow, remember that knitr inserts all the savebox commands before the preamble you put in your Rnw document. Well, \setbox operates such that it calculates the contents of the box straight away and saves it to the box register, and then forgets the definition of the box (i.e. the \normalsize\verb.'.).

What this means is that \hlnormalsizeboxsinglequote gets defined before the upquote package is even loaded, and hence the effect of upquote (redefining ' to \textquotesingle within verbatim commands) happens too late to affect the \verb.'. that occurs in \hlnormalsizeboxsinglequote.

To fix this, we would like to retrieve the definition of the \hlnormalsizeboxsinglequote command after we load the upquote package so that its definition gets re-parsed. Then we'd just have to type something like \edef\hlnormalsizeboxsinglequote\hlnormalsizeboxsinglequote to say "set \hlnormalsizeboxsinglequote to what it used to be, but re-read the definition first".

Unfortunately there appears to be no way to do this. Hence the only fix is to look up how knitr defines \hlnormalsizeboxsinglequote by grabbing it out of the preamble of a knitted document, and copy its definition into the preamble of the source document.

This works for now, but it means that if the knitr package changes how it defines \hlnormalsizeboxsinglequote (maybe in one revision they decide to make all quotes blue in colour), it is up to you to make sure that the redefinition of \hlnormalsizeboxsinglequote in your Rnw file matches the one used by knitr.
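
If you'd rather not hunt through the knitted preamble by eye, a few lines of R can pull out the definition for you. This is only a sketch: find_box_definition is a hypothetical helper (not part of knitr), and the sample lines stand in for a real knitted file. The single-quote line in the sample is my assumption, written by analogy with the brace definitions shown above.

```r
# find_box_definition() is a hypothetical helper, not part of knitr.
# It returns the lines of a knitted .tex file that set the given box.
find_box_definition <- function(tex_lines, box = "hlnormalsizeboxsinglequote") {
  needle <- paste0("\\setbox\\", box, "=")
  # fixed = TRUE: match the backslashes literally, not as a regex.
  tex_lines[grepl(needle, tex_lines, fixed = TRUE)]
}

# Stand-in for readLines() on your own knitted .tex file.
tex_lines <- c(
  "\\newsavebox{\\hlnormalsizeboxsinglequote}%",
  "\\setbox\\hlnormalsizeboxopenbrace=\\hbox{\\begin{normalsize}\\verb.{.\\end{normalsize}}%",
  "\\setbox\\hlnormalsizeboxsinglequote=\\hbox{\\begin{normalsize}\\verb.'.\\end{normalsize}}%"
)

# Print the matching definition, ready to paste after \usepackage{upquote}.
writeLines(find_box_definition(tex_lines))
```

Re-running this after a knitr upgrade is a quick way to check whether the definition you copied into your preamble has gone stale.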

Wednesday 7 March 2012

Be a NethackR!

NetHack is one of the most amazing games of all time.

It's a roguelike game, where the quest is to retrieve the Amulet of Yendor from the bottom of the dungeon and bring it back up to the top in order to sacrifice it to your deity and achieve immortal fame & glory, etc. Along the way, one must avoid the many (and I mean many) ways to die, including at the hands of the evil Wizard of Yendor, also known as Rodney.

[Image: Adventuring through the dungeon (aww, I died)]

There are many, many, many ways to die in NetHack. Also, there's no saving except to resume your game later - once you die, you die. You have to restart the game from scratch. Finally, the game comes with a small hints book to get you started, but no real instructions (like "don't look at Medusa or you'll die! Don't touch a cockatrice or you'll turn to stone! Don't eat too much or you'll die of overeating (not kidding!)").

These factors all make NetHack a very, very hard game. And yet addictive! I have yet to win the game after a couple of years of on-and-off playing, but I still love it.

Anyhow, I decided to write an R package that would let me play NetHack in R (terminal version, of course! I wouldn't play the graphics version unless I didn't have a keyboard!).

Why would I want to play NetHack from R? Well ... why not? :D

You can download it from here - either go to the 'Downloads' page and grab the .zip file and install within R (Packages -> Install package(s) from local zip files... OR install.packages('nethackR_1.0.1.zip',repos=NULL)), or if you feel hackerish and are running Cygwin or Linux (or Mac? haven't tested it there), you can grab the source, unzip, and type:

make
make install

After that, go into R, read the help file, and start a game!

library(nethackR)
?nethackR  # read some help files
?nethack   # read some help files
nethack()  # start a game!

You can even feed in nethack options:

nethack(dogname='Indy',catname='lolcatz',hilite_pet=TRUE,time=TRUE)

Enjoy! (and let me know of bugs, I'm sure there are some).

As a note - the package comes bundled with the NetHack executable already. You may not feel secure running an exe just because the package author (me) guarantees that it is the actual NetHack.exe and not one filled with viruses. If so, download NetHack yourself and place it within the bin/your_OS-type folder in the nethackR folder of your R library. your_OS-type is either 'unix' (for Linux and Mac) or 'windows' (for Windows). That way you can be sure the executable is safe.

Onward NethackRs!

[Image: RIP yet another character!]

Extra rambling (mainly for R people):

This was mostly an exercise in writing R packages - it was the first one I ever wrote, and I wanted something fun to motivate me.

It turned out to be much easier than I thought - you can just call system('nethack'), and R takes care of the rest, even the interactive part - it's as if I'd just run nethack from the terminal instead.

However, I then took this to Windows to test, and if I used the GUI console for R (Rgui.exe as opposed to Rterm.exe), NetHack would start but hang my system until I forcibly closed the NetHack.exe process using the Task Manager.

I figured out the solution today by looking at the help file for system in R on Windows (the help file is different on Linux and didn't include this all-important information): interactive (text) programs can't be run from Rgui; it just doesn't work.

So instead, if the user uses Rgui, the package will launch a command prompt from which the user can play.
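
Roughly, that check could look like the sketch below. This is not the package's actual code: needs_own_console and launch_game are hypothetical names, exe is assumed to be a path (without spaces) to a NetHack executable, and the "cmd /c start" call is just one plausible way to open a new command prompt window on Windows.

```r
# needs_own_console() is a hypothetical helper: TRUE when R is running in
# the Windows GUI, where interactive text programs can't share the console.
needs_own_console <- function(gui = .Platform$GUI) {
  identical(gui, "Rgui")
}

# launch_game() is likewise hypothetical; `exe` is assumed to be the path
# to a NetHack executable and to contain no spaces.
launch_game <- function(exe, gui = .Platform$GUI) {
  if (needs_own_console(gui)) {
    # Open a separate command prompt window and don't wait for it.
    system(paste("cmd /c start", exe), wait = FALSE)
  } else {
    # In a terminal, the game simply takes over the current session.
    system(exe)
  }
}

needs_own_console("Rgui")   # the Windows GUI console
needs_own_console("RTerm")  # R running in a Windows terminal
```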