John Myles White

Statistics hacker, PhD student


Who are you, and what do you do?

I'm John Myles White. I'm one of the authors of Machine Learning for Hackers, and I'm a Ph.D. student in psychology. I spend most of my time trying to help people use statistics to understand the world around us.

What hardware do you use?

I mostly use either my 2008 MacBook, which has 4 GB of RAM and a 2.4 GHz Intel Core 2 Duo processor, or my first-generation iPad. I've got a 2007 iMac that I mostly use to watch movies, although it's also acted as a server at times.

And what software?

To stay in touch with the world, I use Safari, the Mac Twitter client and Reeder. I write notes to myself using Evernote. I write all my blog posts using MarsEdit.

I use Hazel to keep my files organized and I keep my PDF library organized using Papers 2. My only complaint about Papers 2 is that it doesn't do the best job of figuring out when I have duplicates in my library, but I have the same complaint about iTunes. I also use the Kindle app on my iPad to do most of my pleasure reading. I'm trying to move away from paper books to save space and the environment.

I write all my code using TextMate 2. The alpha release is a little brittle, but I've been a huge fan of TextMate ever since I first started building sites using Ruby on Rails, so I was really excited to see TextMate 2 leave the world of vaporware. If I'm working only at the command line, I'll use Emacs.

Most of my work involves programming, so programming languages and their libraries are the bulk of the software I use. I primarily program in R, but, if the situation calls for it, I'll use Matlab, Ruby or Python. Lately I've been programming a lot in a new language called Julia. It hasn't even reached a 0.1 release yet, but it's often nearly as fast as C while still being as readable as Ruby. I'm hoping that in the near future we'll have Julia ready to take over the computationally heavy programming that R isn't well suited for.

That said, for me the specific language I use is much less important than the libraries available for that language. In R, I do most of my graphics using ggplot2, and I clean my data using plyr, reshape, lubridate and stringr. I do most of my analysis using rjags, which interfaces with JAGS, and I'll sometimes use glmnet for regression modeling. And, of course, I use ProjectTemplate to organize all of my statistical modeling work. To do text analysis, I'll use the tm and lda packages.

I've got a huge music library that keeps me using iTunes a lot, although I've moved some of my listening over to Spotify.

I'm a big believer in MySQL and like to use Sequel Pro as a convenient GUI for editing databases.

To keep all of my files synchronized across the machines I own, I use Unison, a great program that seems to have never gained the traction it deserves. I also use Dropbox to collaborate with other people, although I've started to use Google Docs for a bunch of collaborative editing of Word documents. When I need to write something mathematical, I use MacTeX.

And, for version control, I use Git. I'm moving increasingly towards keeping all the work I do in Git, including all the text I write.

What would be your dream setup?

For me, the main limiting factor in my work is always memory: either RAM or hard disk space. I'm working at MSR this summer and there's an urban legend that there's a machine with 2 TB of RAM here that some of the researchers have access to. Having a machine with that kind of power is really my ideal, although I could also benefit from more hard disk space. My other dream situation would be to have a small Hadoop cluster running at home that's only for my own work.