Building a Recommendation System with R

Written by Suresh K. Gorakala and Michele Usuelli, published by Packt Publishing, 2015.

This is a whole book on a topic that is often only a single chapter in a book. It is a book for people who already know R and machine learning.

The book uses math equations, not just code, to teach the concepts.

The book covers the confusion matrix for classification, along with sensitivity and specificity. There are lots of details about type one and type two errors. This clearly written section will help you understand what the two types of error are and why you don’t want either of them.
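Here is a small sketch of my own in base R, not taken from the book, showing how a confusion matrix, sensitivity and specificity fit together. The labels are made up for illustration.

```r
# Hypothetical true and predicted labels (1 = positive, 0 = negative).
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

# Confusion matrix: rows are predictions, columns are the truth.
cm <- table(predicted = predicted, actual = actual)
print(cm)

tp <- sum(predicted == 1 & actual == 1)  # true positives
fn <- sum(predicted == 0 & actual == 1)  # false negatives (type two errors)
fp <- sum(predicted == 1 & actual == 0)  # false positives (type one errors)
tn <- sum(predicted == 0 & actual == 0)  # true negatives

sensitivity <- tp / (tp + fn)  # share of real positives that were caught
specificity <- tn / (tn + fp)  # share of real negatives that were cleared
```

With these made-up labels there is one type two error and two type one errors, so sensitivity is 3/4 and specificity is 4/6.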

Classification similarity measures include Euclidean Distance, Cosine Distance and Pearson Correlation.
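The three measures can be computed in a few lines of base R. This is my own sketch with made-up vectors, not code from the book.

```r
# Two hypothetical rating vectors that move in opposite directions.
a <- c(1, 2, 3)
b <- c(3, 2, 1)

# Euclidean distance: straight-line distance between the two points.
euclidean <- sqrt(sum((a - b)^2))

# Cosine similarity: cosine of the angle between the vectors.
cosine_sim  <- sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine_dist <- 1 - cosine_sim

# Pearson correlation: cosine similarity after centering each vector.
pearson <- cor(a, b)
```

The vectors point in broadly the same direction, so cosine similarity is positive (10/14), yet they are perfectly negatively correlated, so Pearson gives -1. The measures answer different questions.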

Dimensionality reduction techniques include Principal Component Analysis.
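As a rough illustration of Principal Component Analysis, here is a sketch using prcomp() from base R. The built-in iris data set is my choice for the example, not the book's.

```r
# PCA on the four numeric iris measurements, centered and scaled.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Share of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Keep only the first two components as the reduced data set.
reduced <- pca$x[, 1:2]
```

The reduced matrix has the same 150 rows but only two columns, which is the whole point of the technique.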

Data Mining techniques include K-means clustering and Support Vector Machine.
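K-means clustering is available in base R as kmeans(). The sketch below is my own and again uses the built-in iris data as a stand-in; the choice of three clusters and the seed are assumptions for the example.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Cluster the scaled measurements into three groups.
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 20)

# Compare the clusters found with the known species.
print(table(cluster = km$cluster, species = iris$Species))
```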

Recommender systems covered include collaborative filtering and content-based filtering.


The R package for the book is recommenderlab.

recommenderlab: Lab for Developing and Testing Recommender Algorithms, by Michael Hahsler.

Other packages used are lsa, e1071 and cluster.


On Meaningful Scientific Laws


Written by Jean-Claude Falmagne and Christopher Doble, published by Springer, 2015.

This is a concise book full of proofs. The authors define a scientific law in the Preface: an equation in which the variables represent quantities that are physical or geometrical. Meaningfulness is described in Chapter 5, Defining meaningfulness.

I read this book to take a break from the tangled data messes I am straightening out. It was a pleasant break. This is a well written book with concise definitions and well constructed proofs. I think a patient beginner could understand the book.

There are useful ideas that would keep data from getting tangled in the first place, like ratio scales in Chapter 2.

Chapter 9 covers Dimensional Invariance and Dimensional Analysis. You have to read through the whole book to get to this valuable part, because each section builds on the previous ones. It is worth it for the knowledge.

Now I am ready to get back to untangling data.

Sample Bias or is this a Classification Problem?


I am very frustrated with the hiring process. This post is going to focus on logic. Suppose that a diverse workforce is wanted. Take 500 diverse, qualified applicants and screen them down to 50, then to 5. One screening criterion is the phone screen.

The chance of making it through the process is .01.
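The .01 comes from multiplying the pass rates of the two screening stages. A quick check in R, using only the numbers above:

```r
applicants   <- 500
after_first  <- 50   # left after the first screen
after_second <- 5    # left after the second screen

p_first   <- after_first / applicants    # pass rate of the first screen
p_second  <- after_second / after_first  # pass rate of the second screen
p_overall <- p_first * p_second          # chance of passing both screens
print(p_overall)
```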
Let us go back to the screening criterion, a cell phone.

A successful phone screen requires a quality cell phone and a quiet place to chat.
Did the diverse workforce even make it to the start of the process? Or does the screening start earlier, making sure that you only get people like you?

Data Manipulation with R second edition

Data Manipulation with R, second edition, written by Jaynal Abedin and Kishor Kumar Das, published by Packt Publishing, March 2015.
I found the first edition useful, and the second edition continues to be very useful. Chapter 5 covers R and databases, including the limits of working in memory in R and practical workarounds such as the ff and filehash packages.

Chapter 4 covers the melt function in the reshape2 package.
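For readers who have not met melt before, here is a tiny sketch of what it does. The data frame is made up, and the example assumes the reshape2 package is installed; the guard simply skips it otherwise.

```r
# Assumes reshape2 is installed; skipped quietly if it is not.
if (requireNamespace("reshape2", quietly = TRUE)) {
  # Hypothetical wide data: one row per person, one column per measure.
  wide <- data.frame(
    id     = 1:3,
    height = c(150, 160, 170),
    weight = c(50, 60, 70)
  )

  # melt stacks the measure columns into name/value pairs, one row each.
  long <- reshape2::melt(wide, id.vars = "id",
                         variable.name = "measure", value.name = "value")
  print(long)
}
```

Three rows and two measures in the wide form become six rows in the long form.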

A lot of useful information for the hard work of getting data in shape.

Indirect Questioning in Sample Surveys

Indirect Questioning in Sample Surveys, written by Arijit Chaudhuri and Tasos C. Christofides. Published by Springer, 2013.

Some topics are extremely important to have information on, but so sensitive and touchy that it is hard to get people to talk about them.

This is a good book to help with gathering information about sensitive characteristics in surveys.

There are lots of math proofs to explain the techniques, such as the Item Count Technique and the Three Card Method.
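To get a feel for the idea, here is a simulation of the basic item count technique as I understand it: one group counts how many items on a list of innocuous items apply to them, a second group gets the same list plus the sensitive item, and the difference in mean counts estimates the prevalence. This is my own sketch with made-up numbers; the versions developed in the book may differ in detail.

```r
set.seed(2013)

n               <- 5000  # respondents per group (hypothetical)
true_prevalence <- 0.30  # hypothetical share with the sensitive trait
p_innocuous     <- 0.5   # chance that each of 3 innocuous items applies

# Control group: reports only a count of innocuous items.
control <- rbinom(n, size = 3, prob = p_innocuous)

# Treatment group: same list plus the sensitive item, still only a count.
treatment <- rbinom(n, size = 3, prob = p_innocuous) +
  rbinom(n, size = 1, prob = true_prevalence)

# No one reveals the sensitive answer, yet the difference in means
# estimates its prevalence.
estimate <- mean(treatment) - mean(control)
print(estimate)
```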

Chapter 4.6.2 is about the Crossed Model for the case of two stigmatizing characteristics.

Chapter 7 is on protection of privacy.

This is a good book. I look forward to digging into it and figuring out how to write computer programs for some of the techniques.

Beginning Data Science with R


Beginning Data Science with R written by Manas A. Pathak, published by Springer Publishing 2014.
ISBN 978-3-319-12065-2

Code examples at

This book is written for people who already know how to code and want to learn R for data science.

The book covers how to install and use R, but not an IDE like RStudio.

Chapter 2 includes control structures and functions. It explains that functions in R are treated as first class objects, a fundamental property of functional programming languages.
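A short sketch of my own, not from the book, of what first class means in practice: a function can be stored in a variable, passed to another function, and returned from one.

```r
# A function is a value like any other and can be assigned to a name.
square <- function(x) x^2

# A function can take another function as an argument.
apply_twice <- function(f, x) f(f(x))
result_twice <- apply_twice(square, 3)  # square(square(3))

# A function can build and return a new function.
make_adder <- function(n) function(x) x + n
add_five <- make_adder(5)
result_add <- add_five(10)

# Built-in tools such as sapply() rely on the same property.
squares <- sapply(1:3, square)
```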

Chapter 3 is on getting data into R. How to get data into R is a common question. Years ago I was puzzled about it myself. I didn’t want to type it all into an array. You don’t have to type in the data; R will read, pull and connect to all sorts of data sources.
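The simplest case is a CSV file. The sketch below is my own, with made-up data; it writes a small file to a temporary location and reads it back with read.csv() so that it runs anywhere.

```r
# Hypothetical data, written to a temporary CSV file for the demo.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,90", "Grace,85", "Linus,78"), csv_path)

# read.csv() turns the file into a data frame; no typing into arrays.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)
str(scores)

mean_score <- mean(scores$score)
```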

Chapter 4 is a nice overview of data visualization.

The book goes on to cover necessary topics and techniques in data science. What I want to point out is that Chapter 7.3.1 on nearest neighbors uses a package that I haven’t used before, kknn. The package is straightforward to use. The author, Pathak, has written an easy to grasp explanation of the technique.
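To show the idea behind nearest neighbors without relying on any package, here is a plain base R sketch. It does not use kknn and is not the book's code; the helper knn_predict, the iris data, the seed and k = 5 are all my own choices for illustration.

```r
# Hypothetical helper: classify one new point by majority vote of its
# k nearest training rows, using Euclidean distance.
knn_predict <- function(train_x, train_y, new_x, k = 5) {
  diffs <- sweep(as.matrix(train_x), 2, unlist(new_x))  # row minus new point
  d <- sqrt(rowSums(diffs^2))
  nearest <- order(d)[seq_len(k)]
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Hold out 30 iris rows and predict their species from the rest.
set.seed(1)
test_idx <- sample(nrow(iris), 30)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

pred <- sapply(seq_len(nrow(test)), function(i) {
  knn_predict(train[, 1:4], train$Species, test[i, 1:4], k = 5)
})

accuracy <- mean(pred == test$Species)
print(accuracy)
```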

This is a good book to get you started coding in R for data science.

R for Cloud Computing


R for Cloud Computing, An Approach for Data Scientists
Written by A. Ohri, published by Springer Science+Business Media, 2014.

This is a useful book on how to do cloud computing with R. It covers how to set up your accounts and use OAuth to access services.

Chapter 8.1, page 237, shows how to ensure your R code doesn’t contain your login keys.
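I am not reproducing the book's approach here, but one common way to keep keys out of code is to read them from environment variables. In this sketch the helper get_secret and the variable name MY_CLOUD_API_KEY are hypothetical.

```r
# Hypothetical helper: fetch a secret from the environment, fail loudly
# if it is missing, and never write the value into the script itself.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# For the demo only: normally this is set outside R, for example in
# a .Renviron file that is kept out of version control.
Sys.setenv(MY_CLOUD_API_KEY = "not-a-real-key")

api_key <- get_secret("MY_CLOUD_API_KEY")
```

The script can then be shared or committed without exposing the key.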

The book covers the major services available: Amazon AWS, Google Cloud and Microsoft Azure.

AWS needs a credit card even for the free services.

There are nice data visualization services. Google Vis has an R package, googleVis. And Plotly has an R library, package plotly, that you install with devtools from GitHub. Directions are on the site under r/getting started.

Some things change between when a book is published and when you go to use it. Be flexible and search for what is similar and works.

Google Code is gone. There is now a package on CRAN to interface with Google Analytics called RGoogleAnalytics. Information is on the Google developer site.

In addition to being a good overview of what is out there, the book has interviews that are fun to read and a nice table of my first 25 R commands.

Practical Tools for Designing and Weighting Survey Samples

Practical Tools for Designing and Weighting Survey Samples, written by Richard Valliant, Jill Dever and Frauke Kreuter. Published by Springer Science, 2013.

This useful book covers code in R and SAS, along with Excel.

Some of the R packages used are alabama, doBy, quadprog and survey.

Disposition codes are covered in Chapter 6.1, page 164, which describes several different versions in use. These useful codes help you get the most information out of your data when defining outcome rates. Table 6.3 shows the concordance between the disposition codes of AAPOR (American Association for Public Opinion Research) and the American Time Use Survey, with descriptions. There are codes like 27: unknown eligibility, privacy detector. They are useful for using all of the information you acquire instead of throwing a lot of it out.

Read this book before you design your next survey or questionnaire, so that you get the results you are looking for with an appropriate sample size and power.

color in R

There are 101 numbered shades of gray in R (gray0 through gray100), along with lightgray, lightslategray, slategray, darkgray and darkslategray. That is way more shades of gray than I will ever use. I think I will try lavenderblush4 and chocolate4.

[1] “white” “aliceblue” “antiquewhite” “antiquewhite1”
[5] “antiquewhite2” “antiquewhite3” “antiquewhite4” “aquamarine”
[9] “aquamarine1” “aquamarine2” “aquamarine3” “aquamarine4”
[13] “azure” “azure1” “azure2” “azure3”
[17] “azure4” “beige” “bisque” “bisque1”
[21] “bisque2” “bisque3” “bisque4” “black”
[25] “blanchedalmond” “blue” “blue1” “blue2”
[29] “blue3” “blue4” “blueviolet” “brown”
[33] “brown1” “brown2” “brown3” “brown4”
[37] “burlywood” “burlywood1” “burlywood2” “burlywood3”
[41] “burlywood4” “cadetblue” “cadetblue1” “cadetblue2”
[45] “cadetblue3” “cadetblue4” “chartreuse” “chartreuse1”
[49] “chartreuse2” “chartreuse3” “chartreuse4” “chocolate”
[53] “chocolate1” “chocolate2” “chocolate3” “chocolate4”
[57] “coral” “coral1” “coral2” “coral3”
[61] “coral4” “cornflowerblue” “cornsilk” “cornsilk1”
[65] “cornsilk2” “cornsilk3” “cornsilk4” “cyan”
[69] “cyan1” “cyan2” “cyan3” “cyan4”
[73] “darkblue” “darkcyan” “darkgoldenrod” “darkgoldenrod1”
[77] “darkgoldenrod2” “darkgoldenrod3” “darkgoldenrod4” “darkgray”
[81] “darkgreen” “darkgrey” “darkkhaki” “darkmagenta”
[85] “darkolivegreen” “darkolivegreen1” “darkolivegreen2” “darkolivegreen3”
[89] “darkolivegreen4” “darkorange” “darkorange1” “darkorange2”
[93] “darkorange3” “darkorange4” “darkorchid” “darkorchid1”
[97] “darkorchid2” “darkorchid3” “darkorchid4” “darkred”
[101] “darksalmon” “darkseagreen” “darkseagreen1” “darkseagreen2”
[105] “darkseagreen3” “darkseagreen4” “darkslateblue” “darkslategray”
[109] “darkslategray1” “darkslategray2” “darkslategray3” “darkslategray4”
[113] “darkslategrey” “darkturquoise” “darkviolet” “deeppink”
[117] “deeppink1” “deeppink2” “deeppink3” “deeppink4”
[121] “deepskyblue” “deepskyblue1” “deepskyblue2” “deepskyblue3”
[125] “deepskyblue4” “dimgray” “dimgrey” “dodgerblue”
[129] “dodgerblue1” “dodgerblue2” “dodgerblue3” “dodgerblue4”
[133] “firebrick” “firebrick1” “firebrick2” “firebrick3”
[137] “firebrick4” “floralwhite” “forestgreen” “gainsboro”
[141] “ghostwhite” “gold” “gold1” “gold2”
[145] “gold3” “gold4” “goldenrod” “goldenrod1”
[149] “goldenrod2” “goldenrod3” “goldenrod4” “gray”
[153] “gray0” “gray1” “gray2” “gray3”
[157] “gray4” “gray5” “gray6” “gray7”
[161] “gray8” “gray9” “gray10” “gray11”
[165] “gray12” “gray13” “gray14” “gray15”
[169] “gray16” “gray17” “gray18” “gray19”
[173] “gray20” “gray21” “gray22” “gray23”
[177] “gray24” “gray25” “gray26” “gray27”
[181] “gray28” “gray29” “gray30” “gray31”
[185] “gray32” “gray33” “gray34” “gray35”
[189] “gray36” “gray37” “gray38” “gray39”
[193] “gray40” “gray41” “gray42” “gray43”
[197] “gray44” “gray45” “gray46” “gray47”
[201] “gray48” “gray49” “gray50” “gray51”
[205] “gray52” “gray53” “gray54” “gray55”
[209] “gray56” “gray57” “gray58” “gray59”
[213] “gray60” “gray61” “gray62” “gray63”
[217] “gray64” “gray65” “gray66” “gray67”
[221] “gray68” “gray69” “gray70” “gray71”
[225] “gray72” “gray73” “gray74” “gray75”
[229] “gray76” “gray77” “gray78” “gray79”
[233] “gray80” “gray81” “gray82” “gray83”
[237] “gray84” “gray85” “gray86” “gray87”
[241] “gray88” “gray89” “gray90” “gray91”
[245] “gray92” “gray93” “gray94” “gray95”
[249] “gray96” “gray97” “gray98” “gray99”
[253] “gray100” “green” “green1” “green2”
[257] “green3” “green4” “greenyellow” “grey”
[261] “grey0” “grey1” “grey2” “grey3”
[265] “grey4” “grey5” “grey6” “grey7”
[269] “grey8” “grey9” “grey10” “grey11”
[273] “grey12” “grey13” “grey14” “grey15”
[277] “grey16” “grey17” “grey18” “grey19”
[281] “grey20” “grey21” “grey22” “grey23”
[285] “grey24” “grey25” “grey26” “grey27”
[289] “grey28” “grey29” “grey30” “grey31”
[293] “grey32” “grey33” “grey34” “grey35”
[297] “grey36” “grey37” “grey38” “grey39”
[301] “grey40” “grey41” “grey42” “grey43”
[305] “grey44” “grey45” “grey46” “grey47”
[309] “grey48” “grey49” “grey50” “grey51”
[313] “grey52” “grey53” “grey54” “grey55”
[317] “grey56” “grey57” “grey58” “grey59”
[321] “grey60” “grey61” “grey62” “grey63”
[325] “grey64” “grey65” “grey66” “grey67”
[329] “grey68” “grey69” “grey70” “grey71”
[333] “grey72” “grey73” “grey74” “grey75”
[337] “grey76” “grey77” “grey78” “grey79”
[341] “grey80” “grey81” “grey82” “grey83”
[345] “grey84” “grey85” “grey86” “grey87”
[349] “grey88” “grey89” “grey90” “grey91”
[353] “grey92” “grey93” “grey94” “grey95”
[357] “grey96” “grey97” “grey98” “grey99”
[361] “grey100” “honeydew” “honeydew1” “honeydew2”
[365] “honeydew3” “honeydew4” “hotpink” “hotpink1”
[369] “hotpink2” “hotpink3” “hotpink4” “indianred”
[373] “indianred1” “indianred2” “indianred3” “indianred4”
[377] “ivory” “ivory1” “ivory2” “ivory3”
[381] “ivory4” “khaki” “khaki1” “khaki2”
[385] “khaki3” “khaki4” “lavender” “lavenderblush”
[389] “lavenderblush1” “lavenderblush2” “lavenderblush3” “lavenderblush4”
[393] “lawngreen” “lemonchiffon” “lemonchiffon1” “lemonchiffon2”
[397] “lemonchiffon3” “lemonchiffon4” “lightblue” “lightblue1”
[401] “lightblue2” “lightblue3” “lightblue4” “lightcoral”
[405] “lightcyan” “lightcyan1” “lightcyan2” “lightcyan3”
[409] “lightcyan4” “lightgoldenrod” “lightgoldenrod1” “lightgoldenrod2”
[413] “lightgoldenrod3” “lightgoldenrod4” “lightgoldenrodyellow” “lightgray”
[417] “lightgreen” “lightgrey” “lightpink” “lightpink1”
[421] “lightpink2” “lightpink3” “lightpink4” “lightsalmon”
[425] “lightsalmon1” “lightsalmon2” “lightsalmon3” “lightsalmon4”
[429] “lightseagreen” “lightskyblue” “lightskyblue1” “lightskyblue2”
[433] “lightskyblue3” “lightskyblue4” “lightslateblue” “lightslategray”
[437] “lightslategrey” “lightsteelblue” “lightsteelblue1” “lightsteelblue2”
[441] “lightsteelblue3” “lightsteelblue4” “lightyellow” “lightyellow1”
[445] “lightyellow2” “lightyellow3” “lightyellow4” “limegreen”
[449] “linen” “magenta” “magenta1” “magenta2”
[453] “magenta3” “magenta4” “maroon” “maroon1”
[457] “maroon2” “maroon3” “maroon4” “mediumaquamarine”
[461] “mediumblue” “mediumorchid” “mediumorchid1” “mediumorchid2”
[465] “mediumorchid3” “mediumorchid4” “mediumpurple” “mediumpurple1”
[469] “mediumpurple2” “mediumpurple3” “mediumpurple4” “mediumseagreen”
[473] “mediumslateblue” “mediumspringgreen” “mediumturquoise” “mediumvioletred”
[477] “midnightblue” “mintcream” “mistyrose” “mistyrose1”
[481] “mistyrose2” “mistyrose3” “mistyrose4” “moccasin”
[485] “navajowhite” “navajowhite1” “navajowhite2” “navajowhite3”
[489] “navajowhite4” “navy” “navyblue” “oldlace”
[493] “olivedrab” “olivedrab1” “olivedrab2” “olivedrab3”
[497] “olivedrab4” “orange” “orange1” “orange2”
[501] “orange3” “orange4” “orangered” “orangered1”
[505] “orangered2” “orangered3” “orangered4” “orchid”
[509] “orchid1” “orchid2” “orchid3” “orchid4”
[513] “palegoldenrod” “palegreen” “palegreen1” “palegreen2”
[517] “palegreen3” “palegreen4” “paleturquoise” “paleturquoise1”
[521] “paleturquoise2” “paleturquoise3” “paleturquoise4” “palevioletred”
[525] “palevioletred1” “palevioletred2” “palevioletred3” “palevioletred4”
[529] “papayawhip” “peachpuff” “peachpuff1” “peachpuff2”
[533] “peachpuff3” “peachpuff4” “peru” “pink”
[537] “pink1” “pink2” “pink3” “pink4”
[541] “plum” “plum1” “plum2” “plum3”
[545] “plum4” “powderblue” “purple” “purple1”
[549] “purple2” “purple3” “purple4” “red”
[553] “red1” “red2” “red3” “red4”
[557] “rosybrown” “rosybrown1” “rosybrown2” “rosybrown3”
[561] “rosybrown4” “royalblue” “royalblue1” “royalblue2”
[565] “royalblue3” “royalblue4” “saddlebrown” “salmon”
[569] “salmon1” “salmon2” “salmon3” “salmon4”
[573] “sandybrown” “seagreen” “seagreen1” “seagreen2”
[577] “seagreen3” “seagreen4” “seashell” “seashell1”
[581] “seashell2” “seashell3” “seashell4” “sienna”
[585] “sienna1” “sienna2” “sienna3” “sienna4”
[589] “skyblue” “skyblue1” “skyblue2” “skyblue3”
[593] “skyblue4” “slateblue” “slateblue1” “slateblue2”
[597] “slateblue3” “slateblue4” “slategray” “slategray1”
[601] “slategray2” “slategray3” “slategray4” “slategrey”
[605] “snow” “snow1” “snow2” “snow3”
[609] “snow4” “springgreen” “springgreen1” “springgreen2”
[613] “springgreen3” “springgreen4” “steelblue” “steelblue1”
[617] “steelblue2” “steelblue3” “steelblue4” “tan”
[621] “tan1” “tan2” “tan3” “tan4”
[625] “thistle” “thistle1” “thistle2” “thistle3”
[629] “thistle4” “tomato” “tomato1” “tomato2”
[633] “tomato3” “tomato4” “turquoise” “turquoise1”
[637] “turquoise2” “turquoise3” “turquoise4” “violet”
[641] “violetred” “violetred1” “violetred2” “violetred3”
[645] “violetred4” “wheat” “wheat1” “wheat2”
[649] “wheat3” “wheat4” “whitesmoke” “yellow”
[653] “yellow1” “yellow2” “yellow3” “yellow4”
[657] “yellowgreen”
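A listing like the one above comes from the colors() function in base R. The sketch below shows how to produce it and how to count the grays rather than scrolling; the variable names are my own.

```r
# All built-in color names known to R.
all_colors <- colors()
n_colors <- length(all_colors)

# The numbered grays, gray0 through gray100.
numbered_grays <- grep("^gray[0-9]+$", all_colors, value = TRUE)
n_grays <- length(numbered_grays)

# Every name containing "gray" or "grey", including the slate variants.
any_gray <- grep("gr[ae]y", all_colors, value = TRUE)

# Red, green and blue values for the two colors I want to try.
rgb_values <- col2rgb(c("lavenderblush4", "chocolate4"))
print(rgb_values)
```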

CellularAutomaton R Package

CellularAutomaton is an R package for working with cellular automata. The package looks like it has all the functions needed. The documentation is light on how and what to do. It is going to take more time and research to come up with an explanation for general audiences. Until then, enjoy this picture and example code.

library(CellularAutomaton)
# Build the automaton, then plot it in white and purple
ca <- CellularAutomaton(n = 110, t = 100, seed = c(0, 0, 1, 0, 0, 0), bg = -1)
ca$plot(col = c("white", "purple"))
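Until I have a proper explanation, here is a base R sketch of what an elementary cellular automaton does. It does not use the CellularAutomaton package, and I am assuming that n = 110 in the call above refers to rule 110. The function run_rule, the wrap-around edges and the single starting cell in the center are my own choices.

```r
# Hypothetical helper: run an elementary cellular automaton.
# Each cell's next state depends on itself and its two neighbors.
run_rule <- function(rule = 110, width = 101, steps = 100) {
  # Bit i of the rule number is the new state for neighborhood value i (0-7).
  rule_bits <- as.integer(intToBits(rule))[1:8]

  grid <- matrix(0L, nrow = steps, ncol = width)
  grid[1, (width + 1) %/% 2] <- 1L  # start with one live cell in the center

  for (t in seq_len(steps - 1) + 1) {
    prev  <- grid[t - 1, ]
    left  <- c(prev[width], prev[-width])  # left neighbor, wrapping around
    right <- c(prev[-1], prev[1])          # right neighbor, wrapping around
    neighborhood <- 4L * left + 2L * prev + right
    grid[t, ] <- rule_bits[neighborhood + 1L]
  }
  grid
}

grid <- run_rule(110, width = 101, steps = 100)

# Draw time running from top to bottom, in the same two colors.
image(t(grid[nrow(grid):1, ]), col = c("white", "purple"), axes = FALSE)
```

Each row of the matrix is one time step, so the picture is the history of a single line of cells.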
