Category Archives: Book Review

Data Visualisation with R

written by Thomas Rahlf

published by Springer International Publishing 2017

Originally published as Datendesign mit R, 2014

www.datavisualisation-r.com

This is a well-written book for designers. Part one, Basics and Techniques, covers more than the basics. Fig. 2.1 shows the elements of a figure, and R has commands to put all of these on a graph.

The section on typefaces, fonts and symbols again gives more information than I usually see in an R book.

Part two is the examples; 100 examples are on the book's web site. The examples discuss good design, layout and readability.

One of my favorites is Figure 6.3.7, Tree Map. A tree map is a good way to see proportions, such as how much of a budget each part takes up. Small but important items do not disappear.

Enjoy this book. I am having fun getting the code to work on other data.

Data the World’s Most Valuable Resource

I just read an article in the May 6th 2017 issue of The Economist. Briefing: The data economy, Fuel of the future.

It is an interesting thought that data is the oil of this century.

“Data are to this century what oil was to the last one: a driver of growth and change.” page 19

This idea gives you a lot to think about.

One thing to think about is the lack of fungibility for data.

And who owns the data?

Lots to think about.

ggplot2

ggplot2: Elegant Graphics for Data Analysis

The latest edition of Hadley Wickham’s book ggplot2

Springer International Publishing 2016

This is a major update. I spent a lot of time going over the last chapters in the book.

Part 3, Data Analysis, covers a different way of using ggplot2: instead of doing the analysis and then plotting, do both at the same time using ggplot2 and other useful new packages.

Chapter 9 covers tidy data. Tidy data has variables in columns and observations in rows. That sounds straightforward, but data doesn’t always come that way. The packages tidyr and dplyr help with tidying up data.
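Here is a minimal sketch of what tidying can look like. The small table is made up for illustration, and I use pivot_longer() from a current version of tidyr, which is an assumption on my part and not necessarily the function the book's edition uses.

```r
library(tidyr)

# Hypothetical "wide" table: the year variable is hidden in the column names.
wide <- data.frame(
  country = c("A", "B"),
  `2015` = c(10, 20),
  `2016` = c(12, 25),
  check.names = FALSE
)

# Tidy version: one column per variable (country, year, value),
# one row per observation.
long <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")
```

Each country and year pair now has its own row, which is the shape ggplot2 and dplyr expect.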

One of the things covered in Chapter 10 is pipes and the package magrittr. Using pipes makes for cleaner code.
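To show why pipes read more cleanly, here is a small sketch of the same calculation written both ways. The vector x is made up for illustration.

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested calls have to be read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# The piped version reads left to right, one step at a time.
piped <- x %>% sqrt() %>% mean() %>% round(1)
```

Both versions give the same number; the pipe only changes how the steps are written.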

Chapter 11, Modelling for Visualization, introduces a new package called broom. The broom package takes the messy output of model functions such as lm, glm and anova and makes it tidy.
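A short sketch of what broom does with a model, using the built-in mtcars data. The choice of data set and formula is mine and is only for illustration.

```r
library(broom)

# Fit an ordinary linear model on a built-in data set.
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns one row per model term (intercept and wt)
# with columns such as term, estimate and p.value.
coefs <- tidy(fit)

# glance() returns a one-row summary of the whole model.
model_summary <- glance(fit)
```

Because both results are ordinary data frames, they can go straight into dplyr or ggplot2.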

The beginning of the book covers aes(), which you need to map your data to the plot, and the geoms, which you keep adding as layers.

This is a good book for learning how to use ggplot2 and new techniques for analyzing data.
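As a rough sketch of the layering idea, here is a plot built from the mtcars data that ships with R. The data set and the two geoms are my own choices for illustration.

```r
library(ggplot2)

# aes() maps columns of the data to the x and y axes.
# Each geom added with + becomes another layer on the same plot.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                           # layer 1: the raw points
  geom_smooth(method = "lm", se = FALSE)   # layer 2: a fitted straight line

# Typing p (or print(p)) at the console draws the plot.
```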

The Cox Model and Its Applications

The Cox Model and Its Applications was published in SpringerBriefs in Statistics in 2016. It was written by Mikhail Nikulin and Hong-Dar Isaac Wu.

I enjoyed reading this book although it has no code examples. I think I can figure out the code from the precise equations.

The Cox proportional hazards model is a type of survival analysis model. It was put forward by Sir David Cox in 1972.
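Since the book has no code, here is a minimal sketch of fitting the basic model, h(t | x) = h0(t) exp(b'x), in R. It uses coxph() and the lung data set from the survival package; the data set and covariates are my assumptions for illustration and do not come from the book.

```r
library(survival)

# Surv() pairs each follow-up time with its event indicator,
# which is how censored observations are handled.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Exponentiated coefficients are hazard ratios:
# values below 1 mean a lower hazard, values above 1 a higher one.
hazard_ratios <- exp(coef(fit))
```

Calling summary(fit) prints the coefficients, hazard ratios and tests in one table.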

Chapter 2 covers the basic concepts for models, including classical parametric models and how to handle censored data.

Chapter 3 covers the Cox proportional hazards model, including the tampered failure time model.

Chapter 5 is about Cross-effect Models of Survival Functions.

Section 5.2 covers Parametric Weibull Regression with Heteroscedastic Shape Parameter.

There are lots more models. I recommend reading the book with a card beside you on which you have written, in a way you understand, the definitions and symbols used in the book.

 

Web Application Development with R using Shiny

Chris Beeley wrote Web Application Development with R using Shiny, second edition, published by Packt Publishing in January 2016.

Shiny is based on Bootstrap.

This is a good book to read even if you were not planning on using Shiny because it covers a lot about web app development.

I have found R and Shiny to be useful tools for data scientists to communicate with developers. They make great mock-ups.

The code for the book is on packtpub.com.

jQuery Essentials

Troy Miles wrote jQuery Essentials, published by Packt Publishing in 2016. This is a good enough book that twice I started a review of it. The code for this book is available to download on packtpub.com.

The book has good coverage of the DOM (Document Object Model). I like the section in Chapter 9 about never modifying the DOM in a loop.

Chapter 8, about separation of concerns, covers unit tests. It tells you how to use events to decouple code and how to break the code into logical units. Separation of concerns is a useful software architecture pattern.

The book covers the key fundamentals of jQuery.

Practical DevOps

Practical DevOps by Joakim Verona, published by Packt Publishing, 2016.

I am taking a DevOps class through Hack Oregon. I found this book useful and recommended it to my class. We are learning how to use Ansible for provisioning, and this book was most helpful. Chapter Seven has code for using Ansible and Docker together. I am working on getting this to work.

Python Data Science Essentials

Authors: Alberto Boschetti and Luca Massaron. Published by Packt, April 2015.

I am a Data Scientist who usually codes in R. It was a challenge to get comfortable enough in Python code to review the book. Python comes in a lot of flavors. I used Anaconda Launcher to run Jupyter notebooks. The code is on the publisher’s page.

In broad strokes, the book's six chapters cover the fundamentals of Data Science using Python. The pretty blue mosaic tile swirl on the cover catches your eye.

My favorite chapter is chapter five on Social Network Analysis. I like the table on graph types, nodes and edges. For example, Twitter is a directed graph: people are nodes and follower relationships are edges. It is a very useful table for writing code.

Get the code, run the notebooks, have fun.

 

 

Mastering Social Media Mining with R

By Sharan Kumar Ravindran, published by Packt, September 2015.

A useful R book that covers current social media and data science techniques.

My favorite library in this book is from chapter six, SocialMediaMineR.

The function get_facebook from the SocialMediaMineR package takes a URL and returns a data frame of shares, likes, etc. The function is easy to use. You do not need OAuth, just a link. It works like this:

> library(SocialMediaMineR)
> get_facebook("https://www.packtpub.com")
trying URL 'http://curl.haxx.se/ca/cacert.pem'
length 256338 bytes (250 Kb)
opened URL
downloaded 250 Kb

                       url            normalized_url
1 https://www.packtpub.com https://www.packtpub.com/
  share_count like_count comment_count total_count click_count
1         432        361           155         948           0
      comments_fbid commentsbox_count
1 10150745127795008                 0
>

This one function could keep you occupied for a long time.

But there are other useful libraries in this book: ROAuth for OAuth, twitteR for Twitter, Rfacebook for Facebook, and rgithub for GitHub.

The book covers exploratory data analysis (EDA) in the chapter on GitHub.

Sentiment analysis is covered in the chapter on Twitter.

The book briefly covers a lot. There are many other books that cover a single topic in more detail. Read this book to discover what you want to explore.