All posts by Mary Anne

Data Scientist

Documentation Needed

Chocolate cupcakes

I write cooking recipes for publication, and writing documentation is a lot like writing recipes. Recipes are written for all types of cooks, beginners and experts alike. Most of the documentation I find is sparse and lacking in detail, written for the person who wrote the code. It is hard to write directions for people who don't already know how to use the code. I have been trying to expand the documentation for some Docker containers, and I can't find a new user who can tell me what is missing. From experts, all I hear is that everything is fine.

Here are some suggested documentation requirements, borrowed from recipe submission requirements:

Title. Give it a real title, something other than "Documentation."

Tell us about your recipe. This needs to say a little more than "it is a Docker container."

Ingredients. Let people know what is needed: software, libraries, and how much memory. Give people an idea of whether they can even do the task with the system they have. Do not send people to the grocery store for a missing ingredient or to borrow a cup of sugar from a neighbor. This is especially important when people come to a half-day workshop only to find out they can't participate.
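One way to spare people the trip to the store is to hand them a quick pantry check to run before the workshop. Here is a minimal Python sketch, assuming the ingredients are command-line programs that should be on the PATH. The list `docker`, `docker-compose`, `git` is a hypothetical example, not the documented requirements of any particular project, and the script does not check memory or libraries.

```python
import shutil


def missing_ingredients(commands):
    """Return the commands from the list that cannot be found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # Hypothetical ingredient list for a Docker-based workshop; adjust it to your project.
    needed = ["docker", "docker-compose", "git"]
    missing = missing_ingredients(needed)
    if missing:
        print("Missing before the workshop:", ", ".join(missing))
    else:
        print("All ingredients found.")
```

Shipping a check like this with the directions lets people find out at home, not at the workshop, that something is missing.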

Prep method. For recipes, this is the equipment needed: stove top, oven, grill, blender. What are we doing? Baking? Spinning up a web server or a database? Be clear: are we frying or blending?

After clearly stating what we are doing, write out the directions. Include prep and cook time, for example: it takes three hours to code this, and the build takes half a day. Recipes state what occasion or holiday they are for, what category and course they belong to (breakfast, lunch, dinner), and how many they serve. GitHub tags make categories easy, so use them. Starting from these ideas, think your documentation through until it becomes more useful. Stack Overflow should not be your only documentation.

Pet Containers

On GitHub, hackoregon/data-science-pet-containers provides some very useful Docker containers. Tools included are PostgreSQL, PostGIS, Anaconda and RStudio. Add data and you are all set for exploring and developing. If you have any problems, file an issue on GitHub. I am looking for what is missing from the directions.

curious white dog

Technically Wrong

Technically Wrong, written by Sara Wachter-Boettcher, is the best popular-press book about misclassification and algorithms.

The chapters in the book cover common issues and problems that should not be happening.

From Chapter 10, "Technically Dangerous," page 196:

Software is designed and coded by people who do not represent the general population. “The narrower those people’s perspectives are, the more they design and code like themselves and shrug off any responsibility for outcome, the more inequality, insensitivity and hate can thrive.”

People die from classification errors.

Small choices matter.

SPSS and RStudio

What if you have a lot of old data files in SPSS format and you do not have the SPSS software? You are in luck if you have RStudio. Current versions of RStudio can import SPSS data files from a menu, with no code to write. There are also R packages for this task, such as haven and foreign.

To load SPSS data:

Open RStudio. On the menu bar, choose File, scroll to Import Dataset, and choose From SPSS.

A window opens to load the data.

Enter the file name or URL in the File/URL box.

Check the data in the preview window. If it looks okay, press the Import button, and the data is loaded into RStudio, ready for use.

This is a very useful feature of RStudio.

A Crash Course in Statistics

A Crash Course in Statistics, written by Ryan J. Winter and published by Sage Publications in 2018, is an easy-to-read, short, concise book. It is just the right book when you want a quick overview of what you have forgotten about statistics. The code in the book is SPSS, and it is available on the book's website at study.sagepub.com/winter.

It covers descriptive statistics, chi-square, t-tests, and ANOVA.

The book can be used as a textbook, since it has quizzes at the end of each chapter.

If you do not have SPSS, RStudio will load the SPSS data files, ready to use.
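The book works its examples in SPSS. As a quick refresher on one of its topics, here is a minimal Python sketch (my own swap, not the book's code) of the Pearson chi-square statistic for a contingency table. Each cell's expected count is its row total times its column total divided by the grand total, and the statistic sums (observed - expected)² / expected over all cells. No continuity correction is applied, and the 2×2 counts are made up for illustration.

```python
def chi_square(observed):
    """Pearson chi-square statistic (no continuity correction) for a table given as rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(observed):
    """Degrees of freedom for the table: (rows - 1) * (columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


if __name__ == "__main__":
    table = [[10, 20], [20, 10]]  # made-up counts for illustration
    print(chi_square(table), degrees_of_freedom(table))
```

For the made-up table, every expected count is 15, so each cell contributes 25/15 and the statistic is 20/3 (about 6.67) with 1 degree of freedom. Turning that into a p-value still takes a chi-square table or a statistics library.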

Statistical Analysis with Measurement Error or Misclassification

Statistical Analysis with Measurement Error or Misclassification, written by Grace Y. Yi and published by Springer Science+Business Media LLC in 2017, is a treasure of a book to pair with a coding book. It gives the what, why, and how of missing data, measurement error, and misclassification.

Chapter 2 covers measurement error, that is, incorrect readings of a precise measurement, for example reading a three as an eight.

Systematic error, sampling error, statistical bias: each type of error has its own way of being handled, and often the data contains more than one type of error.

Naive estimators incur larger bias than estimators obtained from valid methods, but the latter entail more variation than the naive estimators.
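This trade-off can be seen in a small simulation. It is a minimal Python sketch under assumptions of my own, not an example from the book: a straight-line model Y = beta * X + e, an error-prone measurement W = X + U whose error variance is known, and a simple method-of-moments correction that subtracts that known variance from the variance of W. All parameter values are made up for illustration.

```python
import random
import statistics


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator (matches statistics.variance)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(beta=2.0, sigma_u=1.0, sigma_e=0.5, n=500, reps=200, seed=1):
    """Compare the naive slope with a moment-corrected slope over many simulated data sets.

    True model: Y = beta * X + e, with X ~ N(0, 1) and e ~ N(0, sigma_e**2).
    We only observe W = X + U, where U ~ N(0, sigma_u**2) and sigma_u is assumed known.
    """
    rng = random.Random(seed)
    naive, corrected = [], []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        w = [xi + rng.gauss(0, sigma_u) for xi in x]
        y = [beta * xi + rng.gauss(0, sigma_e) for xi in x]
        cov_wy = sample_cov(w, y)
        var_w = statistics.variance(w)
        # Naive: treat the error-prone W as if it were the true X.
        naive.append(cov_wy / var_w)
        # Corrected: remove the known measurement-error variance from var(W).
        corrected.append(cov_wy / (var_w - sigma_u ** 2))
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate()
    for name, values in (("naive", naive), ("corrected", corrected)):
        print(f"{name:>9}: mean {statistics.fmean(values):.3f}, sd {statistics.stdev(values):.3f}")
```

With these settings the reliability ratio is var(X) / (var(X) + sigma_u²) = 1/2, so the naive slopes average close to 1.0, half the true slope of 2.0. The corrected slopes center near 2.0 but have a noticeably larger standard deviation, which is the extra variation described above.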

There is a lot to think about; Chapter 9 asks a lot of good questions.

Use the most plausible method to handle missing, misclassified, and error-prone data. The methods are well covered in the book.

This is a statistics book; the key to the symbols is at the beginning of the book.

It is known that ignoring measurement error can lead to misleading results.

Hidden Inequalities in the Workplace

Publisher: Springer International Publishing, 2018
Hidden Inequalities in the Workplace
Editors: Valerie Caven and Stefanos Nachmias

I have been commissioned by AONW to do a study on ageism in the workplace. I am glad I found this book while doing research. It is a timely book on difficult topics.

The book makes a business case for diversity: the real benefit attributed to diversity management is gaining competitive advantage and enhancing performance through human capital.

The Quality of Work Among Older Workers
Chapter 5, page 91
Written by Christopher Lawton and Daniel Wheatly

This chapter sheds light on this under-explored area of the labor market. It concludes that working into later life can bring benefits to society, including higher national output, lower unemployment, lower welfare costs, and reduced health spending.

Cognitive Biases in Recruitment, Selection and Promotion: The Risk of Subconscious Discrimination
Written by Zara Whysall

This chapter notes that despite the documented benefits of workplace diversity, progress in achieving it has been slow.

This book has given me a lot to think about and a lot more to explore.

Tables with R

Thanksgiving table, 11/26/2009

Cirque du Soleil's Kurios show has an act where performers mirror a table. It is amazing to see people upside down, mirroring a table.

The R programming language has several packages for making tables. Base R has a function called table, which is often good enough. Sometimes you want more. At a meeting last night, someone said pander was the best package. Someone else said they liked htmlTable better. There are also xtable and tables; tables was written to be like SAS PROC TABULATE. With so many choices, pick the one whose directions you understand and that meets your publishing needs. Which is better depends on your point of view.

table

tables

xtable

htmlTable

pander
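At its simplest, R's table counts how often each combination of values occurs. For readers without R at hand, here is a rough Python sketch of the same idea. Python is a swapped-in language here, and this is not how any of the R packages above work internally: it counts (row, column) pairs with collections.Counter and lays them out as plain text. The course and dish data are hypothetical. Like table on character data, it sorts the levels alphabetically.

```python
from collections import Counter


def cross_table(rows, cols):
    """Count each (row value, column value) pair, similar in spirit to R's table(x, y)."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Counter returns 0 for pairs that never occur, so empty cells show as 0.
    cells = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, cells


def format_table(row_levels, col_levels, cells):
    """Lay the counts out as right-aligned plain text."""
    labels = [str(v) for v in row_levels + col_levels]
    numbers = [str(n) for row in cells for n in row]
    width = max(len(s) for s in labels + numbers)
    header = " " * width + " " + " ".join(str(c).rjust(width) for c in col_levels)
    lines = [header]
    for level, row in zip(row_levels, cells):
        lines.append(str(level).rjust(width) + " " + " ".join(str(n).rjust(width) for n in row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data: which course each dish was served at.
    course = ["dinner", "dinner", "lunch", "dinner"]
    dish = ["pie", "turkey", "pie", "pie"]
    print(format_table(*cross_table(course, dish)))
```

The packages in the list above go further than this, adding captions, formatting, and HTML or LaTeX output for publishing.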

Thanksgiving table right side up