Wednesday 29 May 2013

Searching for a Syntax Highlighter for R

I've recently been looking at various syntax highlighters I could add to my blog to prettify it a bit more. My only criteria were that it had to be easy for me to incorporate into this blog (me being an interwebs ignoramus), and that it should support syntax highlighting for R and JavaScript, the two languages I post a lot of snippets of.

At a (very) cursory glance, I found a few options:

R-Highlight

To use the highlight package for R:

  • include jQuery, the highlight script and CSS stylesheet;
  • initiate highlighting by using jQuery to select the nodes we wish to highlight and calling r_syntax_highlight();
  • has a list of recognised functions that will be highlighted (no regex there);
  • can even cause each function to link to its documentation (!)

However, the script seems to support the R language only. (The highlight package, when used from R, appears to support many languages, but the script for webpages seems to expose only R syntax highlighting. I could be wrong here.)

SyntaxHighlighter

To use SyntaxHighlighter:

  • include the script shCore.js and stylesheets shCore.css, shThemeDefault.css, plus the script for each language you wish to syntax highlight;
  • initiate highlighting by giving each <pre> tag to be highlighted the attribute class="brush: <language>", and add a call to SyntaxHighlighter.all();
  • does not support R out of the box, but Yihui Xie has written a language definition for it (regex-based).

PrismJS

To use PrismJS:

  • include the script prism.js and stylesheet prism.css;
  • mark code to be highlighted using <code class="language-<language>">;
  • no need to call any functions to initiate highlighting;
  • does not support R out of the box, but languages can be defined using regexes.
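
To give a flavour of what "defined using regexes" means, here is a minimal sketch of a Prism-style language definition for R. A Prism grammar is an object that maps token names to regular expressions, registered under Prism.languages. The token names and patterns below are illustrative assumptions rather than a complete or official R grammar, and the sketch assumes prism.js has already been loaded as the global Prism.

```javascript
// Hypothetical, deliberately small grammar for R: token name -> regex.
// The patterns are assumptions for illustration and cover only the basics.
const rGrammar = {
  comment: /#.*/,
  string: /(['"])(?:\\.|(?!\1)[^\\\r\n])*\1/,
  boolean: /\b(?:TRUE|FALSE)\b/,
  keyword: /\b(?:if|else|repeat|while|function|for|in|next|break)\b/,
  number: /\b\d+(?:\.\d+)?(?:[eE][+-]?\d+)?\b/,
  operator: /<-|->|[-+*\/^=<>!&|~]/,
  punctuation: /[(){}\[\],;]/
};

// Register the grammar only if Prism is present on the page,
// so this file is harmless when loaded without prism.js.
if (typeof Prism !== 'undefined') {
  Prism.languages.r = rGrammar;
}
```

With something like this registered, code marked up as <code class="language-r"> would be picked up without any further calls. Because the patterns are plain regexes, a # inside a string would be mistaken for a comment; a real definition needs more care than this sketch takes.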

Conclusion

They all seem pretty awesome, and I absolutely love R-highlight's ability to link to a function's documentation as well as marking it up. All three are fairly easily themeable. I strongly recommend you check them all out.

However, in the end I went with PrismJS, because

  • I couldn't work out how to use R-highlight for languages other than R (I often blog with JavaScript snippets).
  • SyntaxHighlighter, while very popular, required me to host and link to many JavaScript files (one per language). I didn't feel like doing this.
  • I love how PrismJS requires the class="language-X" part to be in the code tag, not the pre tag. A language is an attribute of code, not of a preformatted block, and as such you should mark a code block's language in the code tag, not the pre tag. Plus, this way of hinting the code language is recommended in the HTML5 specification.
  • PrismJS requires me to include just two files: the script and the stylesheet. In addition, it's tiny! With JavaScript, R, HTML/XML, CSS and Bash highlighting support, the file is all of 7.8 kB. <3

I wrote an R syntax definition for PrismJS; I'll post it tomorrow (or whenever I get round to it). Here's a sneak preview:

# iterate a dis/like of green eggs and ham
helloSam <- function (times=10, like=FALSE) {
    str <- paste('I', ifelse(like, 'like', 'do not like'), 'green eggs and ham!')
    for (i in seq_len(times)) {
        message(str)
    }
}

Sunday 26 May 2013

R gotcha - regular expressions

Just a quick post --- I came across this today and thought it was worth mentioning.

By default, R's regular expression functions (grep, gsub, regexpr, etc.) use extended regular expressions. Passing perl=TRUE as an argument makes them use Perl-compatible regular expressions instead.

Note that in R's extended regular expressions, the . character matches the newline character '\n'. In Perl regular expressions, by default, it doesn't.

grep('.', '\n')
## [1] 1
grep('.', '\n', perl=TRUE)
## integer(0)

This is something to keep in mind if you use regular expressions in R on strings with embedded newlines and are getting puzzling results.