Color-coding groups for plots: the string.to.colors function in the fifer package

I used to hate color-coding plots. 'Twas a big pain. Let's say we're trying to plot the relationship between awesomeness and R know-how in one group versus another. First, let's read in the R know-how and awesomeness dataset.

Let's peek under the “head”, shall we?

require(fifer)
d = read.csv("Awesomeness_Rknowhow.csv")
head(d)
##   Number.of.Friends R.Know.how           Club
## 1           0.46841    -0.7202 Mat Black Labs
## 2           0.01932     0.3678 Mat Black Labs
## 3           0.68488    -0.9006 Mat Black Labs
## 4          -1.15628     1.8706 Mat Black Labs
## 5          -0.76576    -0.1406 Mat Black Labs
## 6           0.15211    -1.4046 Mat Black Labs

Nice! And let's peek under the “tail.” (Okay, bad joke).

tail(d)
##     Number.of.Friends R.Know.how    Club
## 95            -0.6766    -1.4407 Rs-R-Us
## 96            -0.3809    -1.0437 Rs-R-Us
## 97             1.5243     2.4358 Rs-R-Us
## 98             0.3377    -0.3047 Rs-R-Us
## 99            -1.3506    -1.0900 Rs-R-Us
## 100           -1.4359    -1.6769 Rs-R-Us

Let's say we're at an uber geek convention where SAS, R, and Matlab users alike meet to…er…mingle and speak of common interests. Being the research-minded student you are, you decide to measure three traits of the convention participants: how many friends they have (okay, I know you can't have negative friends and you can't have a fraction of a friend. Stop being so critical!), how much they know about R, and which club they belong to: the Mat Black Labs or the Rs-R-Us(es). You then plot the relationship betwixt the two quantitative traits:


#### d[,1:2] puts column 1 (friends) on x and column 2 (R know-how) on y
plot(d[,1:2], xlab="Number of Friends", 
    ylab="R Know-How", xaxt="n", yaxt="n")

[Figure: scatterplot of the two traits, all points the same color]

What a jumbled mess! Then you remember that you also recorded which of the two groups each person belongs to…but how to show that in the plot? Why, let's color-code them!

This is where the string.to.colors function comes in. It requires a vector of strings as input (and an optional vector of colors, one for each unique grouping value) and it will output a vector of colors (the same length as the original vector). Let's take a look:

#### let's look at that vector of strings (or factors)
d$Club
##   [1] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##   [5] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##   [9] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [13] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [17] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [21] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [25] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [29] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [33] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [37] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [41] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [45] Mat Black Labs Mat Black Labs Mat Black Labs Mat Black Labs
##  [49] Mat Black Labs Mat Black Labs Rs-R-Us        Rs-R-Us       
##  [53] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [57] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [61] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [65] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [69] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [73] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [77] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [81] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [85] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [89] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [93] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
##  [97] Rs-R-Us        Rs-R-Us        Rs-R-Us        Rs-R-Us       
## Levels: Mat Black Labs Rs-R-Us

And now let's see what string.to.colors does:

string.to.colors(d$Club, col=c("red", "blue"))
## colors colors colors colors colors colors colors colors colors colors 
##  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red" 
## colors colors colors colors colors colors colors colors colors colors 
##  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red" 
## colors colors colors colors colors colors colors colors colors colors 
##  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red" 
## colors colors colors colors colors colors colors colors colors colors 
##  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red" 
## colors colors colors colors colors colors colors colors colors colors 
##  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red"  "red" 
## colors colors colors colors colors colors colors colors colors colors 
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" 
## colors colors colors colors colors colors colors colors colors colors 
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" 
## colors colors colors colors colors colors colors colors colors colors 
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" 
## colors colors colors colors colors colors colors colors colors colors 
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" 
## colors colors colors colors colors colors colors colors colors colors 
## "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue" "blue"

So all it does is replace all the values of “Rs-R-Us” with “blue” and all the values of “Mat Black Labs” with “red.”
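If you're curious what that kind of replacement looks like in plain base R, here's a minimal sketch. To be clear, map_groups is a hypothetical helper I made up for illustration, not the code inside fifer, and matching colors to groups in order of first appearance is my assumption (it agrees with the output above, but so would alphabetical order). The clubs vector is a made-up stand-in for d$Club.

```r
# Hypothetical helper: NOT the fifer implementation, just the same idea in base R.
map_groups <- function(groups, values) {
  groups <- as.character(groups)   # treat factors and strings alike
  keys <- unique(groups)           # groups in order of first appearance (assumed)
  stopifnot(length(values) == length(keys))  # need one value per group
  values[match(groups, keys)]      # swap each element for its group's value
}

# Made-up stand-in for d$Club
clubs <- rep(c("Mat Black Labs", "Rs-R-Us"), each = 3)
map_groups(clubs, c("red", "blue"))
## [1] "red"  "red"  "red"  "blue" "blue" "blue"
```

The whole trick is the match() lookup: each element is turned into the position of its group, and that position picks the color.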

Now we can put that into the plot to tell R how we wanna display it:


plot(d[,1:2], xlab="Number of Friends", ylab="R Know-How", 
    xaxt="n", yaxt="n",
    col = string.to.colors(d$Club, col=c("orange", "purple")))
legend("topleft", legend=c("Mat Black Labs", "Rs-R-Us"), 
    text.col=c("orange", "purple"), bty="n")

[Figure: the same scatterplot with the two clubs in orange and purple]
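One thing to watch in the code above: the club names and colors are typed once for the points and again for the legend, so the two can drift apart. Here's a hedged base-R sketch that keeps them in a single named lookup instead. The data frame d_sim is simulated (the real CSV isn't reproduced here), and club_cols is just a name I picked for illustration.

```r
set.seed(1)
# Simulated stand-in for the dataset; the values are made up
d_sim <- data.frame(
  Number.of.Friends = rnorm(20),
  R.Know.how        = rnorm(20),
  Club              = rep(c("Mat Black Labs", "Rs-R-Us"), each = 10)
)

# One named lookup (club -> color) shared by the points and the legend
club_names <- unique(as.character(d_sim$Club))
club_cols  <- setNames(c("orange", "purple"), club_names)

plot(d_sim$Number.of.Friends, d_sim$R.Know.how,
     xlab = "Number of Friends", ylab = "R Know-How",
     col = club_cols[as.character(d_sim$Club)])   # index the lookup by club name
legend("topleft", legend = names(club_cols),
       text.col = club_cols, bty = "n")
```

Change a color in club_cols and both the points and the legend follow along.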

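We can also “cheat” and use the string.to.colors function to assign different plotting symbols! Pass the pch codes where the colors would go, then wrap the result in as.numeric() so that pch gets numbers: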

plot(d[,1:2], xlab="Number of Friends", 
    ylab="R Know-How", xaxt="n", yaxt="n",
    pch = as.numeric(string.to.colors(d$Club, col=c(11, 16))))
legend("topleft", legend=c("Mat Black Labs", "Rs-R-Us"), 
    pch=c(11, 16), bty="n")

[Figure: the same scatterplot with a different plotting symbol for each club]
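For comparison, here's a small base-R sketch of the same symbol trick that never leaves numbers, so no as.numeric() is needed. It indexes by factor level codes, which means the symbols follow the (alphabetical) level order rather than the order of appearance; for these two clubs that comes out the same. The clubs vector is a made-up stand-in for d$Club, and symbols and pch_by_club are illustrative names.

```r
# Made-up stand-in for d$Club
clubs <- factor(rep(c("Mat Black Labs", "Rs-R-Us"), each = 3))

symbols <- c(11, 16)                        # one pch code per factor level
pch_by_club <- symbols[as.integer(clubs)]   # level codes (1, 2) index into symbols
pch_by_club
## [1] 11 11 11 16 16 16
```

The result can go straight into plot(..., pch = pch_by_club).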

Neato, eh?
