Beginning Data Science with R


Beginning Data Science with R, written by Manas A. Pathak, published by Springer, 2014.
ISBN 978-3-319-12065-2

Code examples at

This book is written for people who already know how to code and want to learn R for data science.

The book covers how to install and use R, but not an IDE like RStudio.

Chapter 2 covers control structures and functions. It points out that functions in R are treated as first-class objects, a fundamental property of functional programming languages.
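Here is a minimal sketch of what first-class functions look like in practice. It is my own illustration, not an example from the book, and the names square, apply_twice and make_adder are made up for the purpose.

```r
# Functions are ordinary values: they can be stored in variables,
# passed as arguments, and returned from other functions.
square <- function(x) x^2

# apply_twice (hypothetical helper) takes a function as an argument.
apply_twice <- function(f, x) f(f(x))

# make_adder (hypothetical helper) returns a new function that remembers n.
make_adder <- function(n) function(x) x + n
add_five <- make_adder(5)

apply_twice(square, 3)    # square(square(3)), which is 81
sapply(1:3, add_five)     # 6 7 8
```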

Chapter 3 is on getting data into R. How to get data into R is a common question. Years ago I was puzzled about it myself; I didn’t want to type everything into an array. You don’t have to type in the data: R will read from, pull from, or connect to all sorts of data sources.
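As a small illustration of not typing data in by hand, this sketch reads CSV text with base R's read.csv. The data and column names are invented, and the CSV is held in a string only so the example runs on its own; normally the first argument would be a file path or URL.

```r
# A tiny made-up CSV held in a string so the example is self-contained.
csv_text <- "name,height_cm\nAda,165\nGrace,170\nLinus,180"

# read.csv parses the text into a data frame, guessing column types.
people <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(people)               # inspect the columns and their types
mean(people$height_cm)    # summarise a numeric column
```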

Chapter 4 is a nice overview of data visualization.

The book goes on to cover the necessary topics and techniques in data science. What I want to point out is that Chapter 7.3.1, on nearest neighbors, uses a package I hadn’t used before, kknn. The package is straightforward to use, and Pathak has written an easy-to-grasp explanation of the technique.
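To give a feel for kknn, here is a rough sketch using the built-in iris data. It is not the book's example; the train/test split, the seed and k = 7 are my own choices, and it assumes the kknn package is installed from CRAN.

```r
# Assumes install.packages("kknn") has been run already.
library(kknn)

set.seed(42)                          # reproducible split
test_idx <- sample(nrow(iris), 50)    # hold out 50 of the 150 rows
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# kknn fits on the training rows and predicts the test rows in one call.
model <- kknn(Species ~ ., train, test, k = 7)
predicted <- fitted(model)

# Compare actual and predicted species.
table(actual = test$Species, predicted = predicted)
```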

This is a good book to get you started coding in R for data science.

R for Cloud Computing


R for Cloud Computing, An Approach for Data Scientists
A. Ohri, published by Springer Science+Business Media, 2014

This is a useful book on how to do cloud computing with R, including how to set up your accounts and use OAuth to access services.

Chapter 8.1, page 237, shows how to ensure your R code doesn’t contain your login keys.
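One common way to keep keys out of a script is to read them from an environment variable. This is only a sketch of that general idea, not necessarily the approach the book takes; MY_SERVICE_KEY and get_api_key are hypothetical names.

```r
# MY_SERVICE_KEY is a hypothetical variable, normally set outside the
# script (for example in ~/.Renviron) so it never lands in source control.
get_api_key <- function(var = "MY_SERVICE_KEY") {
  key <- Sys.getenv(var, unset = NA)
  if (is.na(key) || !nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# For demonstration only: set a dummy value for this session.
Sys.setenv(MY_SERVICE_KEY = "dummy-value")
key <- get_api_key()
nchar(key)    # use the key without printing it
```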

The book covers the major services available: Amazon AWS, Google Cloud and Microsoft Azure.

AWS needs a credit card even for the free services.

There are nice data visualization services. Google Vis has an R package, googleVis. Plotly has an R library, the plotly package, that you install with devtools from GitHub. Directions are on the site under r/getting started.

Some things change between when a book is published and when you go to use it. Be flexible and search for what is similar and works.

Google Code is gone. There is now a package on CRAN for interfacing with Google Analytics called RGoogleAnalytics. Information is on the Google developer site.

In addition to being a good overview of what is out there, the book has interviews that are fun to read and a nice table of my first 25 R commands.

Practical Tools for Designing and Weighting Survey Samples

Practical Tools for Designing and Weighting Survey Samples, written by Richard Valliant, Jill Dever and Frauke Kreuter. Published by Springer Science, 2013.

This useful book covers code in R and SAS, along with Excel.

Some of the R packages used are alabama, doBy, quadprog and survey.

Chapter 6.1, page 164, covers disposition codes and several different versions in use. These codes help get the most information out of your data when defining outcome rates. Table 6.3 gives the concordance between the AAPOR (American Association for Public Opinion Research) and American Time Use Survey disposition codes, with descriptions, for example code 27: unknown eligibility, privacy detector. The codes are useful for using all of the information you acquire rather than throwing a lot of it out.

Read this book before you design your next survey or questionnaire, so that you get the results you are looking for with an appropriate sample size and power.

Color in R

There are 101 numbered shades of gray in R, gray0 through gray100, along with lightgray, lightslategray, slategray, darkgray and darkslategray. That is way more shades of gray than I will ever use. I think I will try lavenderblush4 and chocolate4. The full list from colors():

[1] "white" "aliceblue" "antiquewhite" "antiquewhite1"
[5] "antiquewhite2" "antiquewhite3" "antiquewhite4" "aquamarine"
[9] "aquamarine1" "aquamarine2" "aquamarine3" "aquamarine4"
[13] "azure" "azure1" "azure2" "azure3"
[17] "azure4" "beige" "bisque" "bisque1"
[21] "bisque2" "bisque3" "bisque4" "black"
[25] "blanchedalmond" "blue" "blue1" "blue2"
[29] "blue3" "blue4" "blueviolet" "brown"
[33] "brown1" "brown2" "brown3" "brown4"
[37] "burlywood" "burlywood1" "burlywood2" "burlywood3"
[41] "burlywood4" "cadetblue" "cadetblue1" "cadetblue2"
[45] "cadetblue3" "cadetblue4" "chartreuse" "chartreuse1"
[49] "chartreuse2" "chartreuse3" "chartreuse4" "chocolate"
[53] "chocolate1" "chocolate2" "chocolate3" "chocolate4"
[57] "coral" "coral1" "coral2" "coral3"
[61] "coral4" "cornflowerblue" "cornsilk" "cornsilk1"
[65] "cornsilk2" "cornsilk3" "cornsilk4" "cyan"
[69] "cyan1" "cyan2" "cyan3" "cyan4"
[73] "darkblue" "darkcyan" "darkgoldenrod" "darkgoldenrod1"
[77] "darkgoldenrod2" "darkgoldenrod3" "darkgoldenrod4" "darkgray"
[81] "darkgreen" "darkgrey" "darkkhaki" "darkmagenta"
[85] "darkolivegreen" "darkolivegreen1" "darkolivegreen2" "darkolivegreen3"
[89] "darkolivegreen4" "darkorange" "darkorange1" "darkorange2"
[93] "darkorange3" "darkorange4" "darkorchid" "darkorchid1"
[97] "darkorchid2" "darkorchid3" "darkorchid4" "darkred"
[101] "darksalmon" "darkseagreen" "darkseagreen1" "darkseagreen2"
[105] "darkseagreen3" "darkseagreen4" "darkslateblue" "darkslategray"
[109] "darkslategray1" "darkslategray2" "darkslategray3" "darkslategray4"
[113] "darkslategrey" "darkturquoise" "darkviolet" "deeppink"
[117] "deeppink1" "deeppink2" "deeppink3" "deeppink4"
[121] "deepskyblue" "deepskyblue1" "deepskyblue2" "deepskyblue3"
[125] "deepskyblue4" "dimgray" "dimgrey" "dodgerblue"
[129] "dodgerblue1" "dodgerblue2" "dodgerblue3" "dodgerblue4"
[133] "firebrick" "firebrick1" "firebrick2" "firebrick3"
[137] "firebrick4" "floralwhite" "forestgreen" "gainsboro"
[141] "ghostwhite" "gold" "gold1" "gold2"
[145] "gold3" "gold4" "goldenrod" "goldenrod1"
[149] "goldenrod2" "goldenrod3" "goldenrod4" "gray"
[153] "gray0" "gray1" "gray2" "gray3"
[157] "gray4" "gray5" "gray6" "gray7"
[161] "gray8" "gray9" "gray10" "gray11"
[165] "gray12" "gray13" "gray14" "gray15"
[169] "gray16" "gray17" "gray18" "gray19"
[173] "gray20" "gray21" "gray22" "gray23"
[177] "gray24" "gray25" "gray26" "gray27"
[181] "gray28" "gray29" "gray30" "gray31"
[185] "gray32" "gray33" "gray34" "gray35"
[189] "gray36" "gray37" "gray38" "gray39"
[193] "gray40" "gray41" "gray42" "gray43"
[197] "gray44" "gray45" "gray46" "gray47"
[201] "gray48" "gray49" "gray50" "gray51"
[205] "gray52" "gray53" "gray54" "gray55"
[209] "gray56" "gray57" "gray58" "gray59"
[213] "gray60" "gray61" "gray62" "gray63"
[217] "gray64" "gray65" "gray66" "gray67"
[221] "gray68" "gray69" "gray70" "gray71"
[225] "gray72" "gray73" "gray74" "gray75"
[229] "gray76" "gray77" "gray78" "gray79"
[233] "gray80" "gray81" "gray82" "gray83"
[237] "gray84" "gray85" "gray86" "gray87"
[241] "gray88" "gray89" "gray90" "gray91"
[245] "gray92" "gray93" "gray94" "gray95"
[249] "gray96" "gray97" "gray98" "gray99"
[253] "gray100" "green" "green1" "green2"
[257] "green3" "green4" "greenyellow" "grey"
[261] "grey0" "grey1" "grey2" "grey3"
[265] "grey4" "grey5" "grey6" "grey7"
[269] "grey8" "grey9" "grey10" "grey11"
[273] "grey12" "grey13" "grey14" "grey15"
[277] "grey16" "grey17" "grey18" "grey19"
[281] "grey20" "grey21" "grey22" "grey23"
[285] "grey24" "grey25" "grey26" "grey27"
[289] "grey28" "grey29" "grey30" "grey31"
[293] "grey32" "grey33" "grey34" "grey35"
[297] "grey36" "grey37" "grey38" "grey39"
[301] "grey40" "grey41" "grey42" "grey43"
[305] "grey44" "grey45" "grey46" "grey47"
[309] "grey48" "grey49" "grey50" "grey51"
[313] "grey52" "grey53" "grey54" "grey55"
[317] "grey56" "grey57" "grey58" "grey59"
[321] "grey60" "grey61" "grey62" "grey63"
[325] "grey64" "grey65" "grey66" "grey67"
[329] "grey68" "grey69" "grey70" "grey71"
[333] "grey72" "grey73" "grey74" "grey75"
[337] "grey76" "grey77" "grey78" "grey79"
[341] "grey80" "grey81" "grey82" "grey83"
[345] "grey84" "grey85" "grey86" "grey87"
[349] "grey88" "grey89" "grey90" "grey91"
[353] "grey92" "grey93" "grey94" "grey95"
[357] "grey96" "grey97" "grey98" "grey99"
[361] "grey100" "honeydew" "honeydew1" "honeydew2"
[365] "honeydew3" "honeydew4" "hotpink" "hotpink1"
[369] "hotpink2" "hotpink3" "hotpink4" "indianred"
[373] "indianred1" "indianred2" "indianred3" "indianred4"
[377] "ivory" "ivory1" "ivory2" "ivory3"
[381] "ivory4" "khaki" "khaki1" "khaki2"
[385] "khaki3" "khaki4" "lavender" "lavenderblush"
[389] "lavenderblush1" "lavenderblush2" "lavenderblush3" "lavenderblush4"
[393] "lawngreen" "lemonchiffon" "lemonchiffon1" "lemonchiffon2"
[397] "lemonchiffon3" "lemonchiffon4" "lightblue" "lightblue1"
[401] "lightblue2" "lightblue3" "lightblue4" "lightcoral"
[405] "lightcyan" "lightcyan1" "lightcyan2" "lightcyan3"
[409] "lightcyan4" "lightgoldenrod" "lightgoldenrod1" "lightgoldenrod2"
[413] "lightgoldenrod3" "lightgoldenrod4" "lightgoldenrodyellow" "lightgray"
[417] "lightgreen" "lightgrey" "lightpink" "lightpink1"
[421] "lightpink2" "lightpink3" "lightpink4" "lightsalmon"
[425] "lightsalmon1" "lightsalmon2" "lightsalmon3" "lightsalmon4"
[429] "lightseagreen" "lightskyblue" "lightskyblue1" "lightskyblue2"
[433] "lightskyblue3" "lightskyblue4" "lightslateblue" "lightslategray"
[437] "lightslategrey" "lightsteelblue" "lightsteelblue1" "lightsteelblue2"
[441] "lightsteelblue3" "lightsteelblue4" "lightyellow" "lightyellow1"
[445] "lightyellow2" "lightyellow3" "lightyellow4" "limegreen"
[449] "linen" "magenta" "magenta1" "magenta2"
[453] "magenta3" "magenta4" "maroon" "maroon1"
[457] "maroon2" "maroon3" "maroon4" "mediumaquamarine"
[461] "mediumblue" "mediumorchid" "mediumorchid1" "mediumorchid2"
[465] "mediumorchid3" "mediumorchid4" "mediumpurple" "mediumpurple1"
[469] "mediumpurple2" "mediumpurple3" "mediumpurple4" "mediumseagreen"
[473] "mediumslateblue" "mediumspringgreen" "mediumturquoise" "mediumvioletred"
[477] "midnightblue" "mintcream" "mistyrose" "mistyrose1"
[481] "mistyrose2" "mistyrose3" "mistyrose4" "moccasin"
[485] "navajowhite" "navajowhite1" "navajowhite2" "navajowhite3"
[489] "navajowhite4" "navy" "navyblue" "oldlace"
[493] "olivedrab" "olivedrab1" "olivedrab2" "olivedrab3"
[497] "olivedrab4" "orange" "orange1" "orange2"
[501] "orange3" "orange4" "orangered" "orangered1"
[505] "orangered2" "orangered3" "orangered4" "orchid"
[509] "orchid1" "orchid2" "orchid3" "orchid4"
[513] "palegoldenrod" "palegreen" "palegreen1" "palegreen2"
[517] "palegreen3" "palegreen4" "paleturquoise" "paleturquoise1"
[521] "paleturquoise2" "paleturquoise3" "paleturquoise4" "palevioletred"
[525] "palevioletred1" "palevioletred2" "palevioletred3" "palevioletred4"
[529] "papayawhip" "peachpuff" "peachpuff1" "peachpuff2"
[533] "peachpuff3" "peachpuff4" "peru" "pink"
[537] "pink1" "pink2" "pink3" "pink4"
[541] "plum" "plum1" "plum2" "plum3"
[545] "plum4" "powderblue" "purple" "purple1"
[549] "purple2" "purple3" "purple4" "red"
[553] "red1" "red2" "red3" "red4"
[557] "rosybrown" "rosybrown1" "rosybrown2" "rosybrown3"
[561] "rosybrown4" "royalblue" "royalblue1" "royalblue2"
[565] "royalblue3" "royalblue4" "saddlebrown" "salmon"
[569] "salmon1" "salmon2" "salmon3" "salmon4"
[573] "sandybrown" "seagreen" "seagreen1" "seagreen2"
[577] "seagreen3" "seagreen4" "seashell" "seashell1"
[581] "seashell2" "seashell3" "seashell4" "sienna"
[585] "sienna1" "sienna2" "sienna3" "sienna4"
[589] "skyblue" "skyblue1" "skyblue2" "skyblue3"
[593] "skyblue4" "slateblue" "slateblue1" "slateblue2"
[597] "slateblue3" "slateblue4" "slategray" "slategray1"
[601] "slategray2" "slategray3" "slategray4" "slategrey"
[605] "snow" "snow1" "snow2" "snow3"
[609] "snow4" "springgreen" "springgreen1" "springgreen2"
[613] "springgreen3" "springgreen4" "steelblue" "steelblue1"
[617] "steelblue2" "steelblue3" "steelblue4" "tan"
[621] "tan1" "tan2" "tan3" "tan4"
[625] "thistle" "thistle1" "thistle2" "thistle3"
[629] "thistle4" "tomato" "tomato1" "tomato2"
[633] "tomato3" "tomato4" "turquoise" "turquoise1"
[637] "turquoise2" "turquoise3" "turquoise4" "violet"
[641] "violetred" "violetred1" "violetred2" "violetred3"
[645] "violetred4" "wheat" "wheat1" "wheat2"
[649] "wheat3" "wheat4" "whitesmoke" "yellow"
[653] "yellow1" "yellow2" "yellow3" "yellow4"
[657] "yellowgreen"
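The names above are what colors() returns. Rather than counting grays by eye, a short sketch with grep can do it; the variable names here are my own.

```r
all_colors <- colors()
length(all_colors)    # total number of named colors, 657

# Names that are exactly "gray" followed by a number: gray0 ... gray100.
numbered_grays <- grep("^gray[0-9]+$", all_colors, value = TRUE)
length(numbered_grays)    # 101

# Every name containing "gray" or "grey", such as lightgray or slategrey.
any_gray <- grep("gr[ae]y", all_colors, value = TRUE)
head(any_gray)

# Look up the red, green and blue values of the two colors I want to try.
col2rgb(c("lavenderblush4", "chocolate4"))
```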

CellularAutomaton R Package

CellularAutomaton is an R package for cellular automata. The package looks like it has all the functions needed to do cellular automata, but the documentation is light on how and what to do. It is going to take more time and research to come up with an explanation for general audiences. Until then, enjoy this picture and example code.

library(CellularAutomaton)
ca = CellularAutomaton(n = 110, t = 100, seed = c(0, 0, 1, 0, 0, 0), bg = -1)
ca$plot(col = c("white", "purple"))
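To see what an elementary cellular automaton does under the hood, here is a small base R sketch that does not use the package. It assumes n = 110 in the call above selects Wolfram's rule 110; the helper names, the wrap-around edges and the single starting cell are my own choices, not the package's behavior.

```r
# Turn a rule number (0-255) into a lookup table of 8 output bits.
rule_table <- function(rule) as.integer(intToBits(rule))[1:8]

# Advance one generation; cells wrap around at the edges (an assumption).
ca_step <- function(cells, table) {
  n <- length(cells)
  left  <- cells[c(n, 1:(n - 1))]
  right <- cells[c(2:n, 1)]
  # Read (left, centre, right) as a 3-bit number between 0 and 7.
  idx <- 4 * left + 2 * cells + right
  table[idx + 1]
}

run_ca <- function(rule, width = 61, steps = 30) {
  table <- rule_table(rule)
  grid <- matrix(0L, nrow = steps, ncol = width)
  grid[1, width] <- 1L    # start with a single live cell at the right edge
  for (i in 2:steps) grid[i, ] <- ca_step(grid[i - 1, ], table)
  grid
}

grid <- run_ca(110)
# Print each generation as one row of text.
rows <- apply(grid, 1, function(r) paste(ifelse(r == 1, "#", "."), collapse = ""))
cat(rows, sep = "\n")
```

Each row is one time step, and each cell's next value depends only on itself and its two neighbors, which is the whole idea of a one-dimensional cellular automaton.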

[Picture: cellular automaton plot]


Cellular Automata in Image Processing and Geometry

Cellular Automata in Image Processing and Geometry

Edited by Paul Rosin, Andrew Adamatzky and Xianfang Sun

Published by Springer March 2014

This morning I went looking for a book to explain the topic of cellular automata. Last night at Ruby Brightnight, which is a code challenge group, I found that I couldn’t adequately explain cellular automata. Our challenge was to code the Game of Life in Ruby.

This book looks helpful, especially chapter 13, Interactive Cellular Automata Systems for Creative Projects, written by Angus Graeme Forbes. The chapter discusses the Game of Life, then goes into Fluid Automata, a very pretty algorithm presented with pseudocode.

This is an interesting book worth digging into.

Applied Predictive Modeling

Max Kuhn and Kjell Johnson; Applied Predictive Modeling, published by Springer, 2013

This is such a good book that it has taken me a while to work through it, all the while finding examples of why people should read it.

The summary in 2.3 does a good job of explaining why this subject is so important. It is easy to pick a model, but hard to get it correct with reliable, trustworthy results.

I was asked what models were in the book. It has all the commonly used ones, like K-Nearest Neighbors, plus models like Multivariate Adaptive Regression Splines and Cubist Regression Trees for Regression Models.

The Classification Models include Nearest Shrunken Centroid and Nonlinear Classification Models.

The examples are well thought out, with the R packages and example code.

Take your time and work through this book.




Git Suddenly Wanted My Password

I have been feeling pretty good about using git. Then when I tried to push back to GitHub, git asked for a password for GitHub. Frustrating: every time I feel comfortable coding without training wheels, something happens. I solved the problem by following the directions on GitHub and installing osxkeychain. Git push worked the next time.

How You Can Improve Open Source Documentation


I gave a talk at the end of the Ascend Project about my troubles with Open Source Documentation. I called the talk Cookie Crumbs. Maybe I should have named it after the three Bears and Goldilocks. Some documentation is too little, some too much. Here is what you can do to make it just right. I am assuming that you have a GitHub account and use git.

Find a project repository on GitHub. Read the documentation.

Fork the repo to your GitHub account.

In a terminal window on your own computer, make a folder for the cloned repo.

mkdir name_of_folder

cd into that folder

cd name_of_folder

In that directory, at the command prompt, type git clone followed by the text from the SSH clone URL box on the forked repository page.

git clone your_fork_SSH_clone_url_code

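Git clone creates a new folder named after the repository, so cd into it first. Next, so that you can push code back to GitHub, you want to check out a branch. Give the branch the same name as the folder.

cd name_of_cloned_repo

git checkout -b name_of_branch

This command creates the branch and also switches to it, an extra bonus.

This optional command deletes your local copy of the master branch, so that you are left working only on your new branch.

git branch -D master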

Open the documentation that came with the cloned repo in a text editor. Documentation file names often end in .md for Markdown. You want to use GitHub Flavored Markdown.

Write or code whatever you think is necessary to fix the documentation. It can be as little as fixing spelling and grammar errors or localization, or as big as a fancy new tutorial. When you are finished, make sure you save the file into the folder for the checked out branch.

Now you are ready to push the code back to GitHub.

Before you push, check git status, then add and commit anything that needs committing.

git status

git add name_of_file

git commit -m "message"

git push -u origin name_of_branch

Go to your forked repository on GitHub and issue a pull request for the pushed, committed code back to the original repository. Hopefully your pull request is accepted. If not, the documentation is still in your forked repository, ready to be found and used by someone.