Pet Containers

On GitHub, hackoregon/data-science-pet-containers provides some very useful Docker containers. The tools included are PostgreSQL, PostGIS, Anaconda and RStudio. Add data and you are all set for exploring and developing. If you have any problems, file an issue on GitHub. I am looking for what is missing from the directions.

# Tag Archives: R

# Data Visualisation with R

Data Visualisation with R

written by Thomas Rahlf

published by Springer International Publishing 2017

Originally published as Datendesign mit R, 2014

www.datavisualisation-r.com

This is a well written book for designers. Part one of the book, Basics and Techniques, covers more than the basics. Fig. 2.1 shows the elements of a figure. R has the commands to put all of these things on a graph.

Typefaces, fonts and symbols again get more coverage than I usually see in an R book.

Part two is the examples. 100 examples are on their web site. The examples talk about good design layout and readability.

One of my favorites is figure 6.3.7, Tree Map. A tree map is a good way to see proportions, such as how much each part of a budget is. Small important items do not disappear.

Enjoy this book. I am having fun getting the code to work on other data.

# ggplot2

Latest edition of Hadley Wickham’s book ggplot2

Springer International Publishing 2016

This is a major update. I spent a lot of time going over the last chapters in the book.

Part 3, Data Analysis, covers a different way of using ggplot2. Instead of doing the analysis and then plotting, do both at the same time using ggplot2 and other new, useful packages.

Chapter 9 covers tidy data. Tidy data has variables in columns and observations in rows. That is straightforward, but the data doesn’t always come that way. The packages tidyr and dplyr help with tidying up data.
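A small sketch of what tidying can look like, assuming a current version of tidyr that provides pivot_longer(). The messy table and its column names are made up for illustration and are not from the book.

```r
library(tidyr)

# Hypothetical messy table: one column per year instead of a year variable
messy <- data.frame(
  state = c("OR", "WA"),
  `2015` = c(10, 20),
  `2016` = c(12, 25),
  check.names = FALSE
)

# Tidy form: one row per observation, one column per variable
tidy <- pivot_longer(messy, cols = c("2015", "2016"),
                     names_to = "year", values_to = "count")
```

Each state and year pair now sits on its own row, which is the shape ggplot2 expects.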

One of the things covered in Chapter 10 is pipes and the package magrittr. Using pipes makes for cleaner code.
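To show the difference, here is a minimal comparison of a nested call and the same steps written with the magrittr pipe. The numbers are arbitrary and only there for illustration.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested call: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: reads left to right, one step at a time
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value
same <- identical(nested, piped)
```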

Chapter 11, Modelling for Visualization, introduces the new package called broom. The broom package takes the messy output of model functions such as lm, glm and anova and makes it tidy.
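A short sketch of broom on a linear model. The model of mpg against wt in the built-in mtcars data is an assumption chosen for illustration, not an example from the book.

```r
library(broom)

# Fit an ordinary linear model on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns one row per coefficient as a data frame
coefs <- tidy(fit)

# glance() returns a one-row summary of the whole model
model_summary <- glance(fit)
```

Because both results are plain data frames, they can be passed straight to ggplot2 or dplyr.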

The beginning of the book covers aes(), which you need for your plot, and the geom functions such as geom_point(), which you keep adding as layers.

This is a good book for learning how to use ggplot2 and new techniques for analyzing data.
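The layering idea can be sketched like this, using the mpg data set that ships with ggplot2. The choice of variables and geoms is just for illustration.

```r
library(ggplot2)

# aes() maps data columns to visual properties; each geom adds a layer
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +        # layer 1: points coloured by class
  geom_smooth(method = "lm", se = FALSE)   # layer 2: straight-line fit

# print(p) draws the plot
```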

# Separating Data in R

I had some messy data to turn tidy: a column of data that needed to be separated into two columns. All the directions were obscure and not helpful. Try searching for a regular expression on the web.

One of the things I was puzzled over was \\.+, which I found out goes with gsub(), a much easier thing to search on. In an R string, \\.+ is a regular expression that matches one or more literal periods. The delimiter was another puzzling thing until I realized that I could treat it the same as when I read csv files. This is the R code that worked.

library(dplyr)

library(tidyr)

tidymessydata <- separate(messydata, State.ZIP, into = c("State", "Zip"), sep = " ")

separate() is the tidyr function that does the splitting.

messydata is the data.frame and State.ZIP is the column that should be two.

into gives the new column names.

sep is the delimiter argument; a space is what the column was separated on. I pressed the space bar between the quotation marks.

Hopefully this is clearer than what I found for directions.
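For anyone who wants to try this without the original file, here is a self-contained sketch. The sample rows are made up to stand in for the real messydata, and the gsub() line at the end is only there to show what \\.+ does.

```r
library(tidyr)

# Made-up sample data standing in for the real messydata
messydata <- data.frame(
  City = c("Portland", "Seattle"),
  State.ZIP = c("OR 97201", "WA 98101"),
  stringsAsFactors = FALSE
)

# Split State.ZIP on the single space into two new columns
tidymessydata <- separate(messydata, State.ZIP,
                          into = c("State", "Zip"), sep = " ")

# \\.+ matches one or more literal periods, so this swaps
# each run of periods for a single space
cleaned <- gsub("\\.+", " ", c("State.ZIP", "a...b"))
# cleaned is c("State ZIP", "a b")
```

Note that Zip stays a character column, which keeps any leading zeros.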

# Text Analysis with R for Students of Literature

Text Analysis with R for Students of Literature by Matthew L. Jockers, published by Springer.

This is a well written book on the topic of text analysis. There is enough information to give you a good start using R, followed by easy-to-understand details about text analysis.

Chapter 6 covers the type-token ratio (TTR).

Chapter 7 covers hapax legomena, words that appear only once.
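Both measures are easy to sketch in base R. The sentence below is made up, and the tokenizing is a simple split on non-word characters, which may differ from the approach in the book.

```r
text <- "the cat sat on the mat and the dog sat too"

# Tokens: every word occurrence, lower-cased
tokens <- strsplit(tolower(text), "\\W+")[[1]]
tokens <- tokens[tokens != ""]

# Types: the distinct words
types <- unique(tokens)

# Type-token ratio: distinct words divided by total words
ttr <- length(types) / length(tokens)

# Hapax legomena: words with a frequency of exactly one
freqs <- table(tokens)
hapax <- names(freqs[freqs == 1])
```

Here there are 11 tokens and 8 types, and six of the words occur only once.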

Chapter 8 covers KWIC, keyword in context, including how to make a corpus.
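A keyword-in-context listing can be sketched with a small helper. The function name kwic and its window argument are hypothetical, written for this illustration rather than taken from the book.

```r
# Return each occurrence of keyword with `window` words on either side
kwic <- function(tokens, keyword, window = 2) {
  hits <- which(tokens == keyword)
  vapply(hits, function(i) {
    start <- max(1, i - window)
    end <- min(length(tokens), i + window)
    paste(tokens[start:end], collapse = " ")
  }, character(1))
}

tokens <- c("the", "cat", "sat", "on", "the", "mat",
            "and", "the", "dog", "sat", "too")

contexts <- kwic(tokens, "sat")
# contexts is c("the cat sat on the", "the dog sat too")
```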

Chapter 11 covers clustering. Chapter 12, on classification, shows how to do crosstabs with the xtabs function and also covers SVM, the support vector machine.
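As a rough illustration of xtabs in a classification setting, the sketch below cross-tabulates true labels against predicted labels. The authors and predictions are invented for the example.

```r
# Hypothetical classification results: true author vs predicted author
results <- data.frame(
  author    = c("A", "A", "B", "B", "B"),
  predicted = c("A", "B", "B", "B", "A")
)

# The formula has no left-hand side, so xtabs counts rows
confusion <- xtabs(~ author + predicted, data = results)
```

The diagonal of the table holds the correct predictions.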

Chapter 13 covers topic modeling.

This is a good book to have if you are doing text analysis.

# Building Interactive Graphs with ggplot2 and Shiny

Packt Publishing recently released a video called Building Interactive Graphs with ggplot2 and Shiny.

The video is available here: bit.ly/1kEqYFZ

ggplot2 is a plotting system for R.

Shiny is a web application framework for R from RStudio, Inc.

The video consists of talking slides and code demos. The site has code that can be downloaded to follow along with the video. The code runs. Packt Publishing is really good about providing code that runs without editing. The video is clear about what needs to be downloaded to run the code and where to find it.

This video is good for beginners. There is enough information and links to more info to keep you from getting lost and puzzled.

I learned more about how to use ggplot2. I have a better understanding about code elements like Aesthetics.

I have now set up a Shiny web server. The video has plenty of suggestions on how to go about hosting and sharing your Shiny app.

This video has given me lots of ideas of pretty graphs and plots to develop.

# R Statistical Application Development by Example Beginner’s Guide

R Statistical Application Development by Example Beginner’s Guide by Prabhanjan Narayanachar Tattar, ISBN 1849519447, published by packtpub.com, 2013

This book doesn’t do everything for you. It gets you started on the topics covered in each chapter, then gives you open-ended problems to solve. It took me a while to work through the book. The Time for action exercises are worth the effort to puzzle through and play with. The start of the book is good for beginners. The rest of the book has more advanced topics, like CART and ridge regression.

# R By Example

R by Example by Jim Albert and Maria Rizzo. Published by Springer, 2013.

The thoughtfulness of this book demonstrates the authors’ statement that this book was written to answer students’ questions.

The data sets used are varied, both old and newer. They include horse kicks to Prussian army officers (my great-great-grandpa Peter was in the Prussian Army) and, in Chapter 13.1, estimating when Sam will meet Annie from Sleepless in Seattle, using the Monte Carlo method for computing intervals.

Chapter 3.4 shows how to make a contingency table in R, something that I wish there was a good package for.
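In base R a contingency table can be sketched as follows. The built-in mtcars data set is used here only for illustration and is not the example from the book.

```r
# Two-way table of cylinder count against number of gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Add row and column totals
with_totals <- addmargins(tab)

# Row proportions: the share of each gear count within a cylinder group
row_share <- prop.table(tab, margin = 1)
```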

Chapter 11, on Simulating Experiments, tells where the term Monte Carlo came from, then continues on to show by example how to implement the code.
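The basic Monte Carlo idea is to repeat a random experiment many times and use the observed proportion as an estimate. The dice example below is an assumption chosen for illustration, not one of the experiments from the chapter.

```r
set.seed(1)  # make the simulation repeatable

n <- 10000

# Simulate n rolls of two dice and add them up
totals <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)

# Estimated probability that the total is 7 (the exact value is 1/6)
estimate <- mean(totals == 7)
```

Increasing n makes the estimate settle closer to the exact value.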

Section 11.5, on patterns of dependence in a sequence, has good information and R code for computing the significance of a streak. It is demonstrated with winning streaks in baseball.

Appendix A covers arrays, vectors and matrices.
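One way to judge a streak, sketched here as a permutation test, is to shuffle the same wins and losses many times and see how often a streak at least as long turns up by chance. The win/loss sequence and the helper longest_streak are made up for illustration, and this is not necessarily the code used in the book.

```r
set.seed(42)  # make the shuffles repeatable

# Length of the longest run of wins (1s) in a 0/1 sequence
longest_streak <- function(x) {
  runs <- rle(x)
  wins <- runs$lengths[runs$values == 1]
  if (length(wins) == 0) 0 else max(wins)
}

# Hypothetical season: 1 = win, 0 = loss
games <- c(1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0)

observed <- longest_streak(games)

# Shuffle the games to break any dependence, keeping the win total fixed
simulated <- replicate(5000, longest_streak(sample(games)))

# Share of shuffles with a streak at least as long as the observed one
p_value <- mean(simulated >= observed)
```

A small p_value would suggest the streak is longer than chance alone tends to produce.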

# True or False

This is something that I keep tripping up on in R programming. TRUE is all caps when used as a logical value. Same with FALSE. I type TRUE when I want to know if something is true or to set it to true. Same for false: type FALSE. And don’t leave the caps lock on.
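A quick sketch of the point; the variable names are arbitrary.

```r
x <- c(3, 8, 12)

# Comparisons return the logical values TRUE and FALSE
over_five <- x > 5
# over_five is c(FALSE, TRUE, TRUE)

# Setting a flag uses the all-caps constant
flag <- TRUE

# Other spellings are not built in, so R treats them as unknown names
lower_defined <- exists("true")
mixed_defined <- exists("True")
```

In a fresh session both exists() calls give FALSE, which is why typing true or True produces an "object not found" error.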

# Machine Learning for Hackers by Drew Conway and John Myles White, O’Reilly Media

Machine Learning for Hackers gets you started using R for machine learning. The book does a good job telling you how to install R and where to find help.

All the code and data for this book are on https://github.com/johnmyleswhite/ML_for_Hackers.git

Sadly there is not an R package.

There are lots of examples on how to explore data using ggplot2. Other packages covered include plyr, which they liken to MapReduce; the tm text mining package, which appears in the chapter covering polynomial regression; glmnet and its lambda parameter; and the k-nearest neighbor algorithm, which uses the class package.

There is also good information on how to work with APIs and JSON using RCurl, RJSONIO and igraph.
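As one illustration of the packages listed above, k-nearest neighbor classification with the class package might look like this. The iris data set, the 100-row training split and k = 5 are assumptions for the sketch and are not taken from the book.

```r
library(class)

set.seed(1)  # make the train/test split repeatable

# Use 100 random rows for training and the remaining 50 for testing
train_rows <- sample(nrow(iris), 100)
train <- iris[train_rows, 1:4]
test <- iris[-train_rows, 1:4]

# Classify each test row by a vote among its 5 nearest training rows
predicted <- knn(train, test, cl = iris$Species[train_rows], k = 5)

# Share of test rows that were labelled correctly
accuracy <- mean(predicted == iris$Species[-train_rows])
```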

This book is written for hackers, people who already know how to code. The theory is found in other books. More detail on specific techniques and R code is in other books. This book is a good starting point for machine learning and R.