Basic Commands

Scalars, Vectors, and Matrices

Let me start by pointing out how much you already know. You already know what a scalar and a vector are. Remember the variable “hello”? That was a scalar. A scalar is a variable that contains only one value (in that case it was 107). The variable “weight,” on the other hand, was a vector: it contained multiple values. There is still one other type of variable: a matrix. Aside from being a deceptive reality that people like Neo and Morpheus battle to overthrow, a matrix also has special meaning in R. When you think of a matrix, think of a spreadsheet. If you’re unfamiliar with what a spreadsheet is, think of a table with columns and rows. A vector doesn’t have columns and rows; it’s just a sequence of numbers. A matrix, on the other hand, does.

Let’s try to get a visual of what a matrix looks like. In the file that you created (the one that I called “weight”), write the following after the line that plots the data:

weightData = c(143, 137, 137, 131, 129, 125, 125, 124, 124, 120)
matrixWeightData = matrix(weightData, nrow=5, ncol=2)
matrixWeightData

After that, highlight the lines that you just wrote, then hit command-enter. On the left window, you should see the following

##      [,1] [,2]
## [1,]  143  125
## [2,]  137  125
## [3,]  137  124
## [4,]  131  124
## [5,]  129  120

Before you get overwhelmed by what I just did, let me talk in R. Here’s what R heard, “Alright R, I want you to create a new vector called weightData. Give it the values 143, 137, 137, 131, 129, 125, 125, 124, 124, and 120. Then, R, I want you to create a matrix called matrixWeightData that contains the exact same information as weightData, but I want you to give the matrix 5 rows and 2 columns. Got it R? Ok, then I want you to show me what matrixWeightData looks like. Now, go!”

Notice that the variables “weight” and “matrixWeightData” contain the exact same information. The only difference is that weight is a vector, and matrixWeightData is a matrix. So that you can see them right next to each other, type the following. Notice how I’m typing it into the left window because I feel no need to save this information, but you can if you want to.

weight
##  [1] 143 137 137 131 129 125 125 124 124 120
matrixWeightData
##      [,1] [,2]
## [1,]  143  125
## [2,]  137  125
## [3,]  137  124
## [4,]  131  124
## [5,]  129  120

Again, the variables have exactly the same information, but one is contained as a vector and the other is contained as a matrix.
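If you want R itself to confirm the difference, here is a small sketch using the same ten numbers. It relies only on the built-in functions length, dim, and matrix. The variable byRowData is a name I made up for illustration; it is not part of the original script.

```r
# Same ten values as in the text
weightData = c(143, 137, 137, 131, 129, 125, 125, 124, 124, 120)
matrixWeightData = matrix(weightData, nrow=5, ncol=2)

# A vector has a length but no dimensions; a matrix has both
length(weightData)        # 10
dim(weightData)           # NULL
dim(matrixWeightData)     # 5 2

# A vector needs one index; a matrix needs a row and a column
weightData[6]             # 125
matrixWeightData[1, 2]    # 125 (row 1, column 2)

# byrow=TRUE fills across rows instead of down columns
byRowData = matrix(weightData, nrow=5, ncol=2, byrow=TRUE)
byRowData[1, 2]           # 137
```

Notice that R fills a matrix one column at a time unless you tell it otherwise, which is why the sixth value of the vector ends up at the top of the second column.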

Typing data in by hand only gets you so far, so let’s read a dataset in from a file. First, you need to download the workout dataset. Store that file in the same folder that you saved your “weight” script in. Now, make sure R is open, then click on the menu called “Misc,” then click on “Change Working Directory,” just like the picture below.

Next, navigate to the location where you have been using your R files. For me, that was located in the documents folder. If you’re on a PC, you’ll have to first click on the left window, then click on File->Change Dir…

Here’s what you just did. By default, if you ask R to find a particular file, it will search wherever its default directory is. That’s a problem when it doesn’t default to the folder where your file is. All we did was change its default directory. Now when you tell R to find the file “workout.csv,” it will know where to find it.
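The menu is not the only way to do this. As a rough sketch, the same thing can be done in code with the built-in functions getwd and setwd. I’m using tempdir() below only as a stand-in so the example runs anywhere; in practice you would put the path to your own folder there. The name oldDir is just something I made up.

```r
# Remember where R is currently looking so we can go back afterwards
oldDir = getwd()

# tempdir() stands in for your own folder (an assumption for this example)
setwd(tempdir())
getwd()                        # now reports the temporary folder

# Check whether a file is visible from the working directory
file.exists("workout.csv")     # TRUE only if the file is in this folder

# Return to the original working directory
setwd(oldDir)
```

If file.exists returns FALSE for your own file, R is looking in the wrong folder.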

Now that we’ve changed the default directory, let’s tell R to import the file. I’ll be using my right window so I can edit it later. You can continue using the same script that you used before, or you can create a new one.

weightLoss = read.csv("workout.csv")
head(weightLoss)

Here’s what I’m telling R for the first line of code: “Hey R, in the default directory you should find a file called ‘workout.csv.’ Open that file and put all of its contents in a variable called weightLoss.”

The second line of code (i.e., head(weightLoss)) simply tells R to return the first 6 rows of the data. Now if I run the code, the left window will show

weightLoss = read.csv("workout.csv")
head(weightLoss)
##   ExerciseHours WeightLoss
## 1           6.1        2.7
## 2           4.8        2.7
## 3           4.6        2.7
## 4           4.1        1.2
## 5           4.3        3.5
## 6           4.9        2.5

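So now we’ve got a matrix called “weightLoss” that has two columns: the first records how many hours a week a person exercised, and the second records their weight loss for that week. This is a matrix because it has both rows and columns. (Strictly speaking, R stores what read.csv returns as a “data frame,” a close cousin of the matrix, but for now you can treat the two the same way.)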
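A quick way to see what kind of object you have is the class function. The sketch below rebuilds the first six rows from the output above by hand, so it runs without the csv file. The name weightMatrix is my own invention for the example.

```r
# First six rows shown in the output above, typed in by hand
weightLoss = data.frame(
  ExerciseHours = c(6.1, 4.8, 4.6, 4.1, 4.3, 4.9),
  WeightLoss    = c(2.7, 2.7, 2.7, 1.2, 3.5, 2.5)
)

class(weightLoss)            # "data.frame"
is.matrix(weightLoss)        # FALSE

# It still has rows and columns, just like a matrix
dim(weightLoss)              # 6 2

# Columns of a data frame can be pulled out by name with $
weightLoss$ExerciseHours

# Converting to a true matrix is possible when every column is numeric
weightMatrix = as.matrix(weightLoss)
is.matrix(weightMatrix)      # TRUE
```

For everything in this section the distinction doesn’t matter much, but the dollar sign only works on the data frame version.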

You can always use R to read in data if it comes in csv form. If your data do not come in csv form, then you’ll have to use Excel to convert them to csv. On its own, R doesn’t handle Excel files very well.
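If you don’t have workout.csv handy, here is a sketch of the whole round trip using a temporary file. The file location (csvPath), the data frame called example, and the last two rows of numbers are all made up for illustration; only the first six rows come from the output shown earlier.

```r
# Hypothetical file in a temporary location, standing in for workout.csv
csvPath = tempfile(fileext=".csv")
example = data.frame(
  ExerciseHours = c(6.1, 4.8, 4.6, 4.1, 4.3, 4.9, 5.0, 5.5),
  WeightLoss    = c(2.7, 2.7, 2.7, 1.2, 3.5, 2.5, 3.0, 3.4)
)
write.csv(example, csvPath, row.names=FALSE)

# Read it back in exactly as in the text
weightLoss = read.csv(csvPath)

head(weightLoss)         # first 6 rows by default
head(weightLoss, n=3)    # ask for a different number of rows with n
nrow(weightLoss)         # 8
```

A csv file is nothing more than plain text with commas between the columns, which is why R reads it so easily.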

Regression

Let’s continue to work with the dataset you imported (i.e., weightLoss). First let’s compute the mean of the two variables

mean(weightLoss$ExerciseHours)
## [1] 4.953
mean(weightLoss$WeightLoss)
## [1] 3.09

Be careful to watch capitalization–R is case sensitive. Also, you’ll notice that I’m only showing you the output (i.e., the left-side window). I’m actually writing it in my right window, but am only showing what happens in the left window to save space.

So, the mean amount of time spent exercising is around 4.95 hours. Also, the average amount of weight loss was around 3.09 pounds. Let’s see how large the sample is. To do that, we’ll use the function “nrow,” which is short for number of rows (which is the sample size).

nrow(weightLoss)

So, there are 30 individuals who participated in this study. Let’s compute the correlation. To do so, we’ll use the function “cor.”

cor(weightLoss)

That function returns a correlation matrix. The correlation is moderately high, with a value around 0.58.
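To see what a correlation matrix contains, here is a small sketch with made-up numbers (they are not the workout data). The names hours, loss, toy, and corMatrix are all hypothetical.

```r
# Made-up numbers for illustration only, not the workout data
hours = c(2, 3, 4, 5, 6)
loss  = c(1.0, 2.5, 2.0, 4.0, 3.5)
toy = data.frame(hours, loss)

# On a data frame, cor returns a matrix of every pairwise correlation
corMatrix = cor(toy)
corMatrix

# The diagonal is always 1 (each variable correlated with itself)
diag(corMatrix)

# On two vectors, cor returns the single correlation between them
cor(hours, loss)

# That value matches the off-diagonal entries of the matrix
corMatrix["hours", "loss"]
```

With only two variables the matrix is mostly redundant, so the number you care about is the one off the diagonal.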

Now let’s go ahead and run a regression analysis. To do that we write

lm(WeightLoss~ExerciseHours, data=weightLoss)
##
## Call:
## lm(formula = WeightLoss ~ ExerciseHours, data = weightLoss)
##
## Coefficients:
##   (Intercept)  ExerciseHours
##        -0.913          0.808

When you run a function without saving the result, R just prints its default display, which for lm is only the call and the coefficients. We can be a little more specific about what we want by assigning the regression model to an object. For example

model = lm(WeightLoss~ExerciseHours, data=weightLoss)

Now R stores all the information about the regression into an object called “model.” We can now ask R to report several things such as

##### output a summary of the model
summary(model)
##
## Call:
## lm(formula = WeightLoss ~ ExerciseHours, data = weightLoss)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.6893 -0.4752 -0.0505  0.6501  1.2723
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)     -0.913      1.066   -0.86  0.39901
## ExerciseHours    0.808      0.213    3.79  0.00073 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.781 on 28 degrees of freedom
## Multiple R-squared:  0.339,  Adjusted R-squared:  0.316
## F-statistic: 14.4 on 1 and 28 DF,  p-value: 0.000735
##### give me just the intercept and slope of the model
model$coefficients
##   (Intercept) ExerciseHours
##       -0.9126        0.8081
##### give me the residual standard error (the square root of the conditional variance)
summary(model)$sigma
## [1] 0.7814

Notice the use of the pound signs (#####). That tells R to ignore everything after them on that line. In other words, they are simply comments to myself.
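There are a few other things you can pull out of a fitted model. The sketch below uses a small made-up dataset (called toy, not the workout data) so it runs on its own, and it sticks to the built-in functions coef, summary, and residuals.

```r
# Made-up data for illustration only, not the workout dataset
toy = data.frame(
  ExerciseHours = c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5),
  WeightLoss    = c(1.4, 2.2, 2.1, 3.0, 2.9, 3.8, 3.6, 4.5)
)
model = lm(WeightLoss~ExerciseHours, data=toy)

# coef() is the standard accessor for the intercept and slope
coef(model)

# sigma is the residual standard error; squaring it gives the residual variance
s = summary(model)$sigma
s^2

# r.squared is the "Multiple R-squared" line of the summary
summary(model)$r.squared

# residuals() returns one residual per observation
length(residuals(model))    # 8
```

Each of these is a piece of the big summary printout, just returned by itself so you can use it in later calculations.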

Rounding the coefficients to four decimal places makes them easier to report:

cf = round(model$coefficients, digits=4)

Using the results from model$coefficients, we see that the best-fitting regression equation is $$\widehat{\text{Weight Loss}} = -0.9126 + 0.8081\times\text{Exercise Hours}$$. In other words, with no exercise, we’re expected to lose approximately -0.9 pounds (i.e., we’re expected to gain a little bit). For every additional hour we exercise, we’re expected to lose about 0.8 more pounds.
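To make the equation concrete, here is a sketch that plugs a few values into it. The intercept and slope are the ones reported above; the function name predictLoss is hypothetical, something I wrote just for this example.

```r
# Coefficients reported in the text
intercept = -0.9126
slope     =  0.8081

# Hypothetical helper: predicted weekly weight loss for a given number of hours
predictLoss = function(hours) {
  intercept + slope * hours
}

predictLoss(0)        # -0.9126, the intercept
predictLoss(5)        # 3.1279
predictLoss(4.953)    # about 3.09, the mean weight loss

# Exercising one more hour always adds the slope
predictLoss(6) - predictLoss(5)    # 0.8081
```

Notice that plugging in the mean number of exercise hours (4.953) gives back the mean weight loss (about 3.09). That is not a coincidence: a least-squares regression line always passes through the point of means.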

Let’s go ahead and look at a scatterplot of the data with a regression line in red.

plot(weightLoss)
abline(model, col="red")

The first line tells R to plot the pairs of datapoints. The second line (abline…) tells R to add a straight line to the plot. The name comes from the equation of a line, y = a + bx, where a is the intercept and b is the slope (hence, abline). Here the line is based on the object called “model.” Remember that this object (model) contains the results of the regression. When abline is handed a fitted model, it pulls the intercept and slope out of it and draws the corresponding line. Then, I told it to plot the line in red.
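Here is a sketch showing that handing abline the model and handing it the intercept and slope yourself draw the same line. The data are made up (called toy) so the example runs without the csv file, and the blue dashed line is my own addition for comparison.

```r
# Made-up data for illustration only, not the workout dataset
toy = data.frame(
  ExerciseHours = c(3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5),
  WeightLoss    = c(1.4, 2.2, 2.1, 3.0, 2.9, 3.8, 3.6, 4.5)
)
model = lm(WeightLoss~ExerciseHours, data=toy)

plot(toy)

# Passing the model lets abline look up the coefficients itself
abline(model, col="red")

# Supplying the intercept (a) and slope (b) by hand draws the same line
a = coef(model)[["(Intercept)"]]
b = coef(model)[["ExerciseHours"]]
abline(a=a, b=b, col="blue", lty=2)
```

The dashed blue line should sit exactly on top of the red one.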

Functions

A function is a set of instructions to the computer. It receives input, then spits out output. For example, we used the function “mean” to compute the mean of the weight dataset. It received an input (the vector called weight) and spit out an output (the mean). Also the plot function received an input (a vector or a matrix) and returned an output (a graph).

Sometimes a function, such as “lm,” returns multiple outputs. (Recall that it spit out the slope and intercept parameters, the residual standard error, a summary, etc.) Also, sometimes functions require multiple inputs. Again, “lm” is one such example (we had to input a regression equation and the dataset).
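You can also write your own functions. As a rough sketch, here is a made-up function called describe (not something built into R) that takes two inputs and returns several outputs bundled together in a list.

```r
# Hypothetical function: takes two inputs, returns several outputs in a list
describe = function(x, digits=2) {
  list(
    mean = round(mean(x), digits),
    sd   = round(sd(x), digits),
    n    = length(x)
  )
}

weight = c(143, 137, 137, 131, 129, 125, 125, 124, 124, 120)
result = describe(weight, digits=1)

# Pull out individual outputs with $, just like model$coefficients
result$mean    # 129.5
result$n       # 10
```

This is the same pattern lm follows: several outputs packed into one object, each reachable with a dollar sign.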

The Table below lists some of the functions we have learned so far. In one column we show what the inputs are and in the other we show what the outputs are.

Sometimes, however, you may forget what the inputs and/or outputs are. There’s a simple way to access that information. Let’s see what the inputs/outputs are for the cor function.

?cor

Notice when you did that, either a window popped up or a new webpage in your browser appeared. It should look like this:

Whenever you put a question mark in front of a function name then run the command, R will automatically bring up the documentation for it. The Description section is self-explanatory: it tells you what the function does and may give some other relevant information. The Usage section tells what arguments the function takes. You’ll notice it says under the cor function “x, y = NULL,” etc. If you’re ever confused about what an argument means, then you can read about it in the Arguments section. For example, if we were unsure of what “x” was supposed to be, we would read, “a numeric vector, matrix, or data frame.” Notice that, although we only supplied one argument before (i.e., the weight vector or the weightLoss matrix), there are several more arguments we could have passed it. If you’re interested in what those arguments are, please read on.

If you scroll down, you will notice another section called “Examples”. Not surprisingly, this section will give you examples about how to use the “cor” function.

The important point to take away from this is that any function you use has arguments (inputs) and returns a result (outputs). If you ever have questions about what inputs/outputs are attached to a function or how to use it, type a question mark before the function and run the command.
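If you only want a quick reminder of the arguments without opening the whole help page, the built-in functions args and formals will do it. Here is a short sketch; the two commented-out lines open the help page and run its examples, so uncomment them when you want to try them.

```r
# args() prints a function's arguments and their defaults
args(cor)

# formals() returns the same information as a named list
names(formals(cor))      # "x" "y" "use" "method"

# help("cor") is the long form of ?cor
# help("cor")

# example() runs the code from the Examples section of the help page
# example(cor)
```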

Next, R Packages.
