R for Business Analytics published by Springer

I am enjoying reading this book, authored by A Ohri. I like the short interviews with people like Hadley Wickham, author of ggplot2 (ch 5.10), and James Dixon, founder of Pentaho, ch
We discussed Pentaho at a recent Quantified Self Meet-up. After learning about Pentaho, I was pleased to find a section on it in this book.
Along with how to work with R and every current database, cloud service, APIs and JSON, there is a section on PostgreSQL, my favorite database.
After reading this book I feel more confident about getting data into R.

The information about graphics covers just about everything. Chapter 5 has code for pie charts and Venn diagrams, even code for a word cloud.

Chapter 6, Building Regression Models, covers multicollinearity and heteroscedasticity, something that I don’t think is talked about often.
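Here is a minimal sketch of how one might check for multicollinearity and heteroscedasticity in base R. It is not code from the book. The built-in mtcars data, the choice of predictors, and the helper name vif_manual are all my own assumptions for illustration. The heteroscedasticity check is the studentized (Koenker) form of the Breusch-Pagan idea, computed by hand.

```r
# Fit a linear model on the built-in mtcars data (chosen only for illustration)
predictors <- c("wt", "hp", "disp")
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Variance inflation factor for one predictor: 1 / (1 - R^2),
# where R^2 comes from regressing that predictor on the other predictors
vif_manual <- function(predictor, others, data) {
  aux_formula <- reformulate(others, response = predictor)
  r2 <- summary(lm(aux_formula, data = data))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(predictors, function(p) {
  vif_manual(p, setdiff(predictors, p), mtcars)
})
print(vifs)  # larger values suggest stronger multicollinearity

# Heteroscedasticity check: regress the squared residuals on the predictors,
# then compare n * R^2 to a chi-squared distribution
aux <- lm(resid(fit)^2 ~ wt + hp + disp, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_pvalue <- pchisq(bp_stat, df = length(predictors), lower.tail = FALSE)
print(c(statistic = bp_stat, p.value = bp_pvalue))
```

A small p-value here would hint that the residual variance changes with the predictors.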

A note about the code in this book: the author uses = as the assignment operator, not <-.
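A small sketch of where the two operators behave the same and where they differ. The variable names are made up for illustration. At the top level both assign, but inside a function call = names an argument while <- creates a variable.

```r
# At the top level, = and <- both assign a value to a name
a = 5
b <- 5
same_value <- identical(a, b)

# Inside a function call, = only names the argument; no variable x is created
m1 <- mean(x = c(1, 2, 3))
x_exists_after_equals <- exists("x", inherits = FALSE)

# Inside a function call, <- assigns x first and then passes it along
m2 <- mean(x <- c(1, 2, 3))
x_exists_after_arrow <- exists("x", inherits = FALSE)

print(c(same_value, x_exists_after_equals, x_exists_after_arrow))
```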

Each chapter has a summary at the end listing all the packages and functions used in the chapter. I am finding this to be a very useful book on business analytics. ISBN 978-1-4614-4342-1

Numerical Analysis for Statisticians


Numerical Analysis for Statisticians by Kenneth Lange 2010
Although this book doesn’t have any code in it, it is still useful. The theory and equations are well defined and easy enough to read.
I went to a talk on FFT and Python at OSCON 2013: Sound Analysis with the Fourier Transform and Python, given by Caleb Madrigal.
Chapter 19 on Fourier Transforms goes along nicely with the talk. Caleb presented the formulas and talked about which ones to use. This book gives you all the details you need for choosing formulas and libraries when implementing Fourier Transforms.
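To make the Fourier transform idea concrete, here is a minimal sketch in R using the built-in fft() function. It is not taken from the book or the talk. The 5 Hz sine wave and the sample rate are values I made up so the answer is easy to check.

```r
# Sample a 5 Hz sine wave for one second at 100 samples per second
# (both numbers are invented for illustration)
sample_rate <- 100
t <- (0:(sample_rate - 1)) / sample_rate
signal <- sin(2 * pi * 5 * t)

# fft() returns complex coefficients; Mod() gives their magnitudes
spectrum <- Mod(fft(signal))

# For a real-valued signal only the first half of the spectrum is unique
n <- length(signal)
half <- spectrum[1:(n / 2)]
freqs <- (0:(n / 2 - 1)) * sample_rate / n

# The frequency bin with the largest magnitude is the dominant frequency
dominant_freq <- freqs[which.max(half)]
print(dominant_freq)
```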

A Short History of Random Numbers, and Why You Need to Care, given by Matthew Garrett, was another talk that I went to. Chapter 22, Generating Random Deviates, is a nice overview of some of the material covered in the talk.
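One standard way to generate random deviates is the inverse transform method, sketched below for the exponential distribution. I am not claiming this is how the book or the talk presents it. The rate, the seed, and the sample size are arbitrary choices for illustration.

```r
set.seed(42)       # arbitrary seed so the run is repeatable
rate <- 2          # arbitrary rate parameter
u <- runif(10000)  # uniform deviates on (0, 1)

# The exponential CDF is F(x) = 1 - exp(-rate * x).
# Solving u = F(x) for x gives the inverse used here.
deviates <- -log(1 - u) / rate

# The sample mean should be close to the theoretical mean 1 / rate
sample_mean <- mean(deviates)
print(sample_mean)
```

Comparing the result with rexp(10000, rate = 2) is an easy sanity check.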

In general this is a good book. I just wish that it had some code examples, pseudocode, algorithms, etc. It is not easy to take equations and turn them into code.

Instant PostgreSQL Starter

Author Daniel K Lyons, published by Packt Publishing

I wish that this book had been available when I first started using PostgreSQL. It would have saved me a lot of trouble.
The installation instructions are straightforward. The quick start section has clear SQL instructions.
The section Top 9 features you need to know about covers things like properly storing passwords, encryption using pgcrypto, and backup and restore, which are necessary for all databases.


Wow, this works sweet. Thank you, Six Sigma with R.
I have an Excel worksheet that I need to analyze. Excel worksheets are not always the smoothest thing to read into R.
I just downloaded and used XLConnect. On the first try it did exactly what I wanted.

Dummy code:
library(XLConnect)
wb <- loadWorkbook("toyprob.xls")             # open the workbook
data.toyprob <- readWorksheet(wb, sheet = 3)  # read the third sheet into a data frame
str(data.toyprob)                             # check the structure of the result

The object name goes on the left of <- and the value assigned to it goes on the right.

R Error Messages

I spent a good part of yesterday trying to figure out what an error message meant. I was trying to draw a classification tree and kept getting an environmental error message. I couldn’t figure out what was wrong. I searched for an answer, only to find nothing useful. Then I remembered about vectorization and turning my data into a data object. I didn’t think I needed it here since I was following the example exactly. But I did.
Useful information on Data objects is in Six Sigma with R, Emilio Cano, Javier Moguerza and Andres Redchuk; Chapter 2.4.

Useful information on subsetting is in R Cookbook, Paul Teetor; Chapter 5.24

Examples of what worked:
library(rpart)
toycat <- subset(datatoycat, select = c(animal, eye, fur, legs))  # keep only the columns for the tree

toy <- rpart(toycat, method = "class")  # fit a classification tree
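For anyone who wants to try a classification tree without my data, here is a self-contained sketch. The data frame below is invented for illustration and is not my actual toy data. It uses the formula interface to rpart, and the control settings are my assumption so that such a tiny data set is allowed to split.

```r
library(rpart)

# Hypothetical toy data: every value here is made up for illustration
toys <- data.frame(
  animal = factor(c("cat", "cat", "cat", "dog", "dog", "bird", "bird", "bird")),
  eye    = factor(c("green", "green", "yellow", "brown", "brown",
                    "black", "black", "black")),
  fur    = factor(c("yes", "yes", "yes", "yes", "yes", "no", "no", "no")),
  legs   = c(4, 4, 4, 4, 4, 2, 2, 2)
)

# Fit a classification tree; the response goes on the left of the formula.
# minsplit, minbucket and cp are loosened because the data set is tiny,
# and xval = 0 turns off cross-validation.
fit <- rpart(animal ~ eye + fur + legs, data = toys, method = "class",
             control = rpart.control(minsplit = 2, minbucket = 1,
                                     cp = 0, xval = 0))
print(fit)

# Predict the class for each row of the training data
predicted <- predict(fit, toys, type = "class")
print(table(actual = toys$animal, predicted = predicted))
```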

Ignite OSCON

I am presenting at Ignite OSCON 2013. My talk is "Is There a Cat in Here, Data Mining with Toys". I am busy working on my slides. It is difficult to condense data mining into 20 slides in five minutes, but I am having fun doing this. I have lots of great pictures for my slides. Books that I have been using for the theory and practice of data mining are The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman, published by Springer, and one that I now owe the library fines on, Introduction to Algorithms by Thomas Cormen.

Open Source Bridge 2013

I spoke at Open Source Bridge this year. My slides:

Study Design OSB – https://docs.google.com/presentation/d/1DK2Y7SWKyNljORjHY8KaAsU3o72zo52uS9wkRa-gs94/edit?usp=sharing

Open Source Bridge is a great conference. We were sad to see it over. Registration is already open for next year. I hope to see you there. http://opensourcebridge.org/

Instant R Starter, the missing piece

Instant R Starter from Packtpub.com is a book that I wish had been available a while ago. The book has information that I had to dig for when I needed it. It covers the special values (NA, NaN, Inf) and how they are used in R. NA is a missing value. NaN is not a number.
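A quick sketch of how the special values behave in R. This is my own example, not code from the book, and the variable names are made up.

```r
# NA marks a missing value and propagates through calculations
values <- c(1, NA, 3)
mean_with_na <- mean(values)                   # the result is NA
mean_without_na <- mean(values, na.rm = TRUE)  # drop NA before averaging

# NaN means "not a number", for example the result of 0 / 0
not_a_number <- 0 / 0
nan_is_nan <- is.nan(not_a_number)
nan_is_na <- is.na(not_a_number)   # NaN also counts as missing
na_is_nan <- is.nan(NA)            # but NA is not NaN

# Inf is what dividing a positive number by zero gives
positive_inf <- 1 / 0
inf_is_infinite <- is.infinite(positive_inf)

print(c(mean_with_na, mean_without_na))
print(c(nan_is_nan, nan_is_na, na_is_nan, inf_is_infinite))
```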

It has clear directions on working with vectors and on how vectors can be used as arguments of functions.
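As a small illustration of vectors as function arguments, here is a sketch of my own. The numbers and the function name to_fahrenheit are invented for the example.

```r
# Arithmetic on a vector is applied element by element
heights_cm <- c(150, 165, 180)
heights_in <- heights_cm / 2.54

# A function written as if for one value also accepts a whole vector
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
temps_f <- to_fahrenheit(c(0, 37, 100))

# Many built-in functions take a vector and return a vector or a summary
roots <- sqrt(c(4, 9, 16))
total <- sum(c(4, 9, 16))

print(temps_f)
print(roots)
print(total)
```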

A clear, concise book. It covers R programming with code and examples of loops and how to make your own functions. It is the missing piece in my library of R books.

Six Sigma with R


Six Sigma with R

Statistical Engineering for Process Improvement
Cano, Emilio L.; Martinez Moguerza, Javier; Redchuk, Andrés
Publication year 2012
I am a Six Sigma Black Belt. Six Sigma with R is a straightforward book that seamlessly matches my other Six Sigma books. The R code is understandable and easy to reuse. I am using the book to help me write a talk for Open Source Bridge 2013.