Machine Learning for Hackers by Drew Conway and John Myles White, O’Reilly Media


Machine Learning for Hackers gets you started using R for machine learning. The book does a good job of telling you how to install R and where to find help.

All the code and data for this book are on GitHub: https://github.com/johnmyleswhite/ML_for_Hackers.git

Sadly there is not an R package.

There are lots of examples of how to explore data using ggplot2. Other packages covered include plyr, which the authors liken to MapReduce; tm, which is used in the chapter on polynomial regression; glmnet and its lambda parameter; and the k-nearest neighbors algorithm, which uses the class package.

There is also good information on how to work with APIs and JSON using RCurl, RJSONIO, and igraph.

This book is written for hackers, people who already know how to code. The theory is found in other books. More detail on specific techniques and R code is in other books. This book is a good starting point for machine learning and R.

Fisher, Neyman, and the Creation of Classical Statistics


I took a break from trying to figure out how to get the data that goes along with the books I am reading, to read a Springer book: Fisher, Neyman, and the Creation of Classical Statistics by Erich Lehmann.
The book was a nice break. I enjoyed reading about the human traits of the founders of modern classical statistics. The author put a lot of work into finding and citing the writings of Fisher and Neyman.
I learned that Ronald Aylmer Fisher was a wrangler, a student who did the best in examinations. I had been puzzled by the term data wrangler, thinking about rodeos and the West. Being the best student makes more sense, although a lasso might come in handy when fetching data.
It was fun to read about Neyman’s “Silver Jubilee of My Dispute with Fisher.” Twenty-five years of arguments. Wow, that is a conflict.
The book ends with a discussion on the irony of Bayesian inference.
This is a well-done book that I recommend reading. I also think it would make a great graphic novel.

Things to Remember When Updating R

Recently I updated R to 3.0.1 Good Sport. I wanted to download a package that wasn’t available in 2.4 toasted marshmallows. The book said that the package works with 2.5. I guess that it only works with 2.5, because it doesn’t work in 3.0.1 either.
The point of this post is to remind myself to keep a list of the packages that I am using. When I upgraded, R didn’t keep all the packages. At first I was puzzled and surprised. Then I figured out that upgrading into a new folder was part of the problem.
I am going to solve the problem by starting up my other computer, the MacBook, and comparing packages. I try to keep my Windows and Mac R environments the same.
Next time I upgrade, I am going to write down a list of packages.
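
Writing down the list can be done from R itself. This is a minimal sketch using only base R; the function names and the file name are made up for illustration, and the file should live somewhere outside the R installation folder so it survives the upgrade.

```r
# Save the names of all installed packages to a text file.
# 'path' is a hypothetical file name chosen by the caller.
save_package_list <- function(path) {
  pkgs <- sort(rownames(installed.packages()))
  writeLines(pkgs, path)
  invisible(pkgs)
}

# After upgrading, report which saved packages are not installed yet.
missing_packages <- function(path) {
  wanted <- readLines(path)
  setdiff(wanted, rownames(installed.packages()))
}

# Before the upgrade:
#   save_package_list("installed_packages.txt")
# After the upgrade:
#   install.packages(missing_packages("installed_packages.txt"))
```

The same file can be copied to a second computer to compare the two environments.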

R for Business Analytics published by Springer

I am enjoying reading this book, authored by A Ohri. I like the short interviews with people like Hadley Wickham, author of ggplot2 (ch 5.10), and James Dixon, founder of Pentaho (ch 4.6.6.1).
We discussed Pentaho at a recent Quantified Self Meet-up. After learning about Pentaho, I was pleased to find a section on it in this book.
Along with how to work with R and every current database, cloud service, APIs, and JSON, there is a section on PostgreSQL, my favorite database.
After reading this book I feel more confident about getting data into R.

The information on graphics covers just about everything. Chapter 5 has code for pie charts and Venn diagrams, even code for a word cloud.

Chapter 6, Building Regression Models, covers multicollinearity and heteroscedasticity, something that I don’t think is talked about often.

A note about the code in this book: the author uses = as the assignment operator, not <-.
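
For ordinary top-level assignments the two operators give the same result. Here is a small sketch of one place where they differ, with made-up variable names: inside a function call, = names an argument, while <- assigns a variable as a side effect.

```r
# Run inside local() so the example does not touch the global workspace.
demo <- local({
  a = 5     # assignment with =
  b <- 5    # assignment with <-

  m1 <- median(x = 1:10)   # x is only the argument name; no variable is created
  x_after_equals <- exists("x", inherits = FALSE)

  m2 <- median(x <- 1:10)  # assigns 1:10 to x, then passes it to median()
  x_after_arrow <- exists("x", inherits = FALSE)

  list(same_value = identical(a, b),
       same_median = identical(m1, m2),
       x_after_equals = x_after_equals,
       x_after_arrow = x_after_arrow)
})

str(demo)
```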

Each chapter has a summary at the end listing all the packages and functions used in the chapter. I am finding this to be a very useful book on business analytics. ISBN 978-1-4614-4342-1

Numerical Analysis for Statisticians

ISBN 978-1-4419-5944-7

Numerical Analysis for Statisticians by Kenneth Lange, 2010
Although this book doesn’t have any code in it, it is still useful. The theory and equations are well defined and easy enough to read.
I went to a talk on FFT and Python at OSCON 2013: Sound Analysis with the Fourier Transform and Python, given by Caleb Madrigal.
Chapter 19, on Fourier transforms, goes along nicely with the talk. Caleb presented the formulas and talked about which ones to use. This book gives you all the details you need for choosing formulas and libraries when implementing Fourier transforms.

A Short History of Random Numbers, and Why You Need to Care, given by Matthew Garrett, was another talk that I went to. Chapter 22, Generating Random Deviates, is a nice overview of some of the material covered in the talk.
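
To show what generating a random deviate looks like in practice, here is a minimal sketch of the inverse transform method for exponential deviates. It is a standard textbook technique picked for illustration, not code from the book or the talk; the function name, seed, and rate are made up.

```r
# Inverse transform method: if U is uniform on (0, 1), then
# -log(1 - U) / rate follows an exponential distribution with that rate.
rexp_inverse <- function(n, rate = 1) {
  u <- runif(n)
  -log(1 - u) / rate
}

set.seed(2013)                        # fixed seed so the run is repeatable
x <- rexp_inverse(100000, rate = 2)
mean(x)                               # should be close to 1 / rate = 0.5
```

R already has rexp() for this; the point is only to show how a formula for a distribution turns into a few lines of code.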

In general this is a good book. I just wish that it had some code examples, pseudocode, algorithms, etc. It is not easy to take equations and turn them into code.
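
As an example of turning an equation into code, here is a sketch of the discrete Fourier transform written straight from its usual definition, X[k] = sum over n of x[n] * exp(-2 * pi * i * k * n / N), and checked against R’s built-in fft(). The function name and the test signal are made up; this is the slow textbook form, not an FFT.

```r
# Discrete Fourier transform computed directly from the definition.
# This takes on the order of N^2 operations, so it is only for small inputs.
naive_dft <- function(x) {
  N <- length(x)
  n <- 0:(N - 1)
  sapply(0:(N - 1), function(k) sum(x * exp(-2i * pi * k * n / N)))
}

# Test signal: a sine wave that completes 3 cycles over 16 samples.
x <- sin(2 * pi * 3 * (0:15) / 16)

# The direct sum and the built-in FFT should agree up to rounding error.
all.equal(naive_dft(x), fft(x))

# Position of the largest magnitude in the first half of the spectrum.
# R indexes from 1, so frequency k sits at position k + 1.
which.max(Mod(fft(x))[1:8])
```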

Instant PostgreSQL Starter

By Daniel K. Lyons, published by Packt Publishing

http://www.packtpub.com/instant-postgresql-starter/book
I wish that this book had been available when I first started using PostgreSQL; it would have saved me a lot of trouble.
The installation instructions are straightforward. The quick start section has clear SQL instructions.
The “Top 9 features you need to know about” section covers things like properly storing passwords, encryption using pgcrypto, and backup and restore, which are necessary for all databases.

XLConnect

Wow, this works sweet. Thank you, Six Sigma with R.
I have an Excel worksheet that I need to analyze. Excel files are not always the smoothest thing to read into R.
I just downloaded and used XLConnect. On the first try I got exactly what I wanted.

Example code:
library(XLConnect)
wb <- loadWorkbook("toyprob.xls")              # open the workbook
data.toyprob <- readWorksheet(wb, sheet = 3)   # read the third sheet into a data frame
str(data.toyprob)                              # check the structure of the result

The name on the left of <- is the object; the right side is what is assigned to it.

R Error Messages

I spent a good part of yesterday trying to figure out what an error message meant. I was trying to draw a classification tree. I kept getting an environment error message. I couldn’t figure out what was wrong. I searched for an answer, only to find nothing useful. Then I remembered about vectorization and turning my data into a data object. I didn’t think I needed it here since I was following the example exactly. But I did.
Useful information on data objects is in Six Sigma with R, Emilio Cano, Javier Moguerza and Andres Redchuk; Chapter 2.4.

Useful information on subsetting is in R Cookbook, Paul Teetor; Chapter 5.24.

Examples of what worked:
toycat <- subset(datatoycat, select = c(animal, eye, fur, legs))

toy <- rpart(toycat, method = "class")
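
Here is a minimal sketch of the same two steps with an explicit formula, which spells out which column is being predicted. The column names come from the lines above. The toy values, the extra price column, and the control settings are made up so the example runs on a tiny data set.

```r
library(rpart)

# Hypothetical toy data; only the column names match the post.
datatoycat <- data.frame(
  animal = factor(c("cat", "cat", "cat", "dog", "dog", "bird", "bird", "bird")),
  eye    = factor(c("green", "yellow", "green", "brown", "brown",
                    "black", "black", "black")),
  fur    = factor(c("yes", "yes", "yes", "yes", "yes", "no", "no", "no")),
  legs   = c(4, 4, 4, 4, 4, 2, 2, 2),
  price  = c(5, 6, 7, 8, 9, 3, 4, 2)
)

# Keep only the columns used for the tree.
toycat <- subset(datatoycat, select = c(animal, eye, fur, legs))

# Confirm the input really is a data frame before fitting.
stopifnot(is.data.frame(toycat))

# Fit a classification tree predicting animal from the other columns.
# The control values are lowered because the data set has only 8 rows;
# xval = 0 turns off cross-validation.
toy <- rpart(animal ~ eye + fur + legs, data = toycat, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1, xval = 0))

print(toy)
# To draw the tree: plot(toy); text(toy)
```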

Ignite OSCON

I am presenting at Ignite OSCON 2013: Is There a Cat in Here, Data Mining with Toys. I am busy working on my slides. It is difficult to condense data mining into 20 slides in five minutes. I am having fun doing this. I have lots of great pictures for my slides. Books that I have been using for the theory and practice of data mining are The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman, Springer, and one that I now owe the library fines on, Introduction to Algorithms by Thomas Cormen.