Merge dataframe

In this article, we will use the merge function to join 2 tables (dataframe). This is an equivalent of a join in SQL or “a kind of” vlookup in Excel. The 2 tables need a key (identifier) upon which the merge will operate. The objective is not to address all the possible uses of this function but just the most common usages.

Merge function

The main arguments of the merge function are:

x: name of the first table
y: name of the second table
by: name of the column containing the key. If the 2 tables have different column names, we will use by.x and by.y.
The type of join:
- if all.x is TRUE (left outer join), the ouput will contains all the table x rows even if there is no match with table y (red and purple).
- if all.y is TRUE (right outer join), the ouput will contains all the table y rows even if there is no match with table x (purple and blue).
- if all is TRUE (full outer join), the output will contains all the rows from table x and y even the rows with no match between the 2 tables (red, purple and blue).
- by default (inner join), the output will only contains rows with a match between the 2 tables (purple).

Example

Initial Tables

Let’s consider a first Table (Opportunity) containing a list of sales opportunities with Opportunity Number, Opportunity Value and the ID of the employee in charge.

##   Opportunity.Nb Employee.ID Opportunity.Value
## 1           8123         111              1000
## 2           8124         112              1500
## 3           8125         111             10000
## 4           8126         110                99

Let’s consider a second Table (Employee) containing the Employee Name and Employee ID. We ensured this table does not contain any duplicate.

##   Employee.ID Employee.Name
## 1         111          John
## 2         112          Mike
## 3         113          Mary
## 4         114          Jane
## 5         115           Bob

Merge all.x

If we need to find the name the employee in charge of each opportunity, we perform a merge of Opportunity and Employee keeping all the Opportunity rows (all.x = TRUE and all.y = FALSE). We note Employee does not contain any Employee ID 110, so no employee name was found for Opportunity Number 8126.

Table.all.x <- merge(x = Opportunity, y = Employee, by = "Employee.ID", all.x = TRUE,  all.y = FALSE)
Table.all.x

##   Employee.ID Opportunity.Nb Opportunity.Value Employee.Name
## 1         110           8126                99          <NA>
## 2         111           8123              1000          John
## 3         111           8125             10000          John
## 4         112           8124              1500          Mike

Merge all.y

If we keep all Employee rows (all.x = FALSE and all.y = TRUE), we obtain the opportunities for which the employee ID was found (Opportunity number 8126 is not in the output) but all employees contained in Employee are present in the output even if there is no corresponding opportunity in Opportunity.

Table.all.y <- merge(x = Opportunity, y = Employee, by = "Employee.ID", all.x = FALSE, all.y = TRUE )
Table.all.y

##   Employee.ID Opportunity.Nb Opportunity.Value Employee.Name
## 1         111           8123              1000          John
## 2         111           8125             10000          John
## 3         112           8124              1500          Mike
## 4         113           <NA>                NA          Mary
## 5         114           <NA>                NA          Jane
## 6         115           <NA>                NA           Bob

Merge all

If we keep all Opportunity and Employee rows (all = TRUE), the output contains all the opportunities and all the employees.

Table.all   <- merge(x = Opportunity, y = Employee, by = "Employee.ID", all   = TRUE                )
Table.all

##   Employee.ID Opportunity.Nb Opportunity.Value Employee.Name
## 1         110           8126                99          <NA>
## 2         111           8123              1000          John
## 3         111           8125             10000          John
## 4         112           8124              1500          Mike
## 5         113           <NA>                NA          Mary
## 6         114           <NA>                NA          Jane
## 7         115           <NA>                NA           Bob