
Let’s start with a dataframe for training purposes: the built-in mtcars dataset.

df = mtcars

Show the first rows

We use the head(df, n) function, where df is the dataframe and n is the number of rows we want to show from the top.

head(df, n = 5)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

Show the last rows

We use the tail(df, n) function, where df is the dataframe and n is the number of rows we want to show from the bottom.

tail(df, n = 5)
##                 mpg cyl  disp  hp drat    wt qsec vs am gear carb
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
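
As a small sketch of how n behaves in both functions: when n is omitted, head() and tail() return 6 rows, and a negative n means "everything except that many rows" counted from the opposite end. The name cars_df is only an illustrative copy of mtcars, not something used elsewhere on this page.

```r
# Work on a fresh copy of the built-in dataset (cars_df is an illustrative name)
cars_df = mtcars

# Without n, head() and tail() return 6 rows
default_rows = nrow(head(cars_df))

# A negative n drops rows from the opposite end:
# head(..., n = -27) keeps all rows except the last 27
all_but_last = head(cars_df, n = -27)

# tail(..., n = -27) keeps all rows except the first 27
all_but_first = tail(cars_df, n = -27)

print(default_rows)
print(rownames(all_but_last))
print(rownames(all_but_first))
```

Since mtcars has 32 rows, dropping 27 of them leaves 5 rows in each case.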

Select a cell

We can use df[row, col], where row and col are the row number and the column number.

df[1,1]
## [1] 21

We can also use the row name and the column name.

df["Mazda RX4","mpg"]
## [1] 21
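
Numbers and names can also be mixed, and a cell can be reached by extracting the column first. The sketch below assumes a fresh copy of mtcars under the illustrative name cars_df; the three expressions should all return the same value.

```r
# Fresh copy of the built-in dataset (cars_df is an illustrative name)
cars_df = mtcars

# Mix a row number with a column name
by_mixed_index = cars_df[1, "mpg"]

# Extract the column as a vector with $, then pick the first element
by_dollar = cars_df$mpg[1]

# Same idea with [[ ]], which takes the column name as a string
by_double_bracket = cars_df[["mpg"]][1]

print(c(by_mixed_index, by_dollar, by_double_bracket))
```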

Select Rows or Columns

We use df[row, column], where row and column can each be a number, a name, a vector of numbers, or a vector of names. For instance, to show rows 9 and 12 of columns 1 and 2:

df[c(9,12), c(1,2)]
##             mpg cyl
## Merc 230   22.8   4
## Merc 450SE 16.4   8

We can achieve the same result with the row names and column names.

df[c("Merc 230", "Merc 450SE"), c("mpg","cyl")]
##             mpg cyl
## Merc 230   22.8   4
## Merc 450SE 16.4   8
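
To select entire rows or entire columns, one of the two indexes can be left empty. The following is a minimal sketch on a fresh copy of mtcars (cars_df is an illustrative name). It also shows that a single column comes back as a plain vector unless drop = FALSE is passed.

```r
# Fresh copy of the built-in dataset (cars_df is an illustrative name)
cars_df = mtcars

# Leave the column index empty to get every column of the chosen rows
two_rows = cars_df[c(9, 12), ]

# Leave the row index empty to get every row of the chosen columns
two_cols = cars_df[, c("mpg", "cyl")]

# A single column is simplified to a plain vector by default...
mpg_vector = cars_df[, "mpg"]

# ...unless drop = FALSE asks for a one-column dataframe
mpg_frame = cars_df[, "mpg", drop = FALSE]

print(dim(two_rows))
print(dim(two_cols))
print(class(mpg_vector))
print(class(mpg_frame))
```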

Rename columns

We use the colnames(df) function. Assigning a vector of names to it changes all the column names at once, as shown in the example below.

colnames(df) = c("MPG", "CYL", "DISP", "HP", "DRAT", "WT", "QSEC", "VS", "AM", "GEAR", "CARB")
head(df, n = 5)
##                    MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

We can also change a specific column name using the column number.

colnames(df)[2] = "CYL"
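
Column positions can shift when columns are added or removed, so it may be safer to look a column up by its current name. The sketch below starts from a fresh copy of mtcars (cars_df is an illustrative name) and also shows toupper() as a shortcut for upper-casing every name without typing the full vector.

```r
# Fresh copy of the built-in dataset, with its original lowercase names
cars_df = mtcars

# Rename one column by matching its current name instead of its position
colnames(cars_df)[colnames(cars_df) == "cyl"] = "CYL"

# Upper-case every column name in one step
colnames(cars_df) = toupper(colnames(cars_df))

print(colnames(cars_df))
```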

Convert row names into a column

We create a new column named CAR ($ separates the dataframe name from the column name) and copy the row names into it using the row.names() function. Then we set the row names to NULL, which replaces them with the default row numbers.

df$CAR = row.names(df)
rownames(df) = NULL
head(df, n = 5)
##    MPG CYL DISP  HP DRAT    WT  QSEC VS AM GEAR CARB               CAR
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
## 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
## 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
## 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
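
The new column is appended at the end. If we prefer it as the first column, one possible approach is to reorder the columns by name. This is a sketch on a fresh copy of mtcars (cars_df is an illustrative name); setdiff() returns the remaining column names in their existing order.

```r
# Fresh copy of the built-in dataset (cars_df is an illustrative name)
cars_df = mtcars

# Same steps as above: copy the row names into a column, then reset them
cars_df$CAR = row.names(cars_df)
rownames(cars_df) = NULL

# Put CAR first, followed by every other column in its current order
cars_df = cars_df[, c("CAR", setdiff(colnames(cars_df), "CAR"))]

print(head(cars_df[, 1:3], n = 3))
```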

Apply filter

We use the which function in the row index of df to apply a condition. which returns the positions of the rows where the condition is TRUE. In the example we select the rows where MPG is greater than 25.

df2 = df[which(df$MPG > 25),]
df2
##     MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB            CAR
## 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1       Fiat 128
## 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2    Honda Civic
## 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota Corolla
## 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1      Fiat X1-9
## 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2  Porsche 914-2
## 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2   Lotus Europa

If we only need the car models with MPG greater than 25, we add the column name:

df3 = df[which(df$MPG > 25), "CAR"]
df3
## [1] "Fiat 128"       "Honda Civic"    "Toyota Corolla" "Fiat X1-9"     
## [5] "Porsche 914-2"  "Lotus Europa"

If we need the car models with MPG greater than 25 and GEAR equal to 5, we combine the two conditions with &:

df4 = df[which(df$MPG > 25 & df$GEAR == 5), "CAR"]
df4
## [1] "Porsche 914-2" "Lotus Europa"
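
One reason to wrap the condition in which is missing values. mtcars has none, so the sketch below uses a tiny made-up dataframe (toy, with invented values) to compare plain logical indexing, which, and subset.

```r
# Tiny illustrative dataframe with one missing MPG value (made-up data)
toy = data.frame(CAR = c("A", "B", "C"), MPG = c(30, NA, 20))

# Plain logical indexing: the NA comparison produces an extra row full of NAs
with_na = toy[toy$MPG > 25, ]

# which() keeps only the positions that are TRUE, so the NA row disappears
with_which = toy[which(toy$MPG > 25), ]

# subset() also treats NA as "not selected"
with_subset = subset(toy, MPG > 25)

print(nrow(with_na))
print(nrow(with_which))
print(nrow(with_subset))
```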

Sort dataframe

We use the order function to sort the rows. We can sort by several columns at once, and we can sort a numeric column in descending order by putting a - sign in front of it, as shown in the example below (GEAR ascending, then HP descending).

df2 = df[which(df$MPG > 25),]
df2 = df2[order(df2$GEAR, -df2$HP),]
df2
##     MPG CYL  DISP  HP DRAT    WT  QSEC VS AM GEAR CARB            CAR
## 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1       Fiat 128
## 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1      Fiat X1-9
## 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 Toyota Corolla
## 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2    Honda Civic
## 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2   Lotus Europa
## 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2  Porsche 914-2
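
The - sign only works on numeric columns. For a text column such as the car name, one option is the decreasing argument of order, which accepts one value per sort key when method = "radix" is used. The sketch below rebuilds the filtered data from a fresh copy of mtcars, so the names cars_df and efficient are illustrative and the original lowercase column names apply.

```r
# Rebuild the filtered data from a fresh copy (illustrative names)
cars_df = mtcars
cars_df$CAR = row.names(cars_df)
rownames(cars_df) = NULL
efficient = cars_df[which(cars_df$mpg > 25), ]

# One decreasing flag per sort key: gear ascending, CAR descending.
# A vector for decreasing requires method = "radix".
sorted = efficient[order(efficient$gear, efficient$CAR,
                         decreasing = c(FALSE, TRUE), method = "radix"), ]

print(sorted[, c("gear", "CAR")])
```

Note that the radix method compares strings byte by byte (C locale), which can differ from the locale-aware ordering for names with mixed case or accents.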