CSV (comma-separated values) file are very common to exchange data from one system or application to another. The data is stored a plain text with no compression and no size limitations compared to some other formats.
Let’s use the read.csv
function with three arguments:
"D:\\Data\\Example.csv"
)header = TRUE
)na.strings = c("", "NaN", "#N/A", "#REF!"))
)Note:
na.strings
list without it (it is part of the default list otherwise).MyDataframe = read.csv("D:\\Data\\Example.csv", header = TRUE, na.strings = c("", "NaN", "#N/A", "#REF!"))
If we want the file to be selected during the execution of the script, we can replace the path of the file by the command file.choose()
as shown below. A pop-up window will invite the user to select the file.
MyDataframe = read.csv(file.choose(), header = TRUE, na.strings = c("", "NaN", "#N/A", "#REF!"))
Col A | Col B | Col C | 1 Col D |
---|---|---|---|
a | b | 3 | |
f | d | 4 |
Col.A | Col.B | Col.C | X1.Col.D |
---|---|---|---|
a | b | 3 | NA |
f | d | 4 | NA |
After importing a CVS file, we should verify the numbers have been correctly interpreted (it works fine in most cases).
To export a dataframe to a CVS file, let’ use the write.csv
function with three arguments:
MyDataFrame
)"D:\\Data\\Export.csv"
)row.names = FALSE
)If we did not rename the columns, the csv file will use the the column names of the dataframe (with “.” replacing spaces, etc.) and the blank cells will contain “NA”.
write.csv(MyDataframe, "D:\\Data\\Export.csv", row.names = FALSE)
Tips: we can add the date to the file name by pasting the system date. For that we will use:
paste
function to paste strings: the folder, the date and the file nameSys.time
function to obtain the date from the system and use the “YYYY-MM-DD” naming conventionwrite.csv(MyDataframe,
paste("D:\\Data\\", format(Sys.time(), "%Y-%m-%d"), " Export.csv", sep = ""),
row.names = FALSE)
To import an Excel spreadsheet let’s use the read_excel
function part of the readxl
package with the following arguments:
library(readxl)
MyDataframe2 = read_excel("D:\\Data\\Example.xlsx", sheet = 1)
When we import from an Excel spreadsheet, the column names have not changed: they still contains spaces and they can start with a digit. The blank cells are marked NA.
Col A | Col B | Col C | 1 Col D |
---|---|---|---|
a | b | 3 | NA |
f | d | 4 | NA |
In the example above, we imported the first tab of the file but we could have used the name of the tab as shown below.
library(readxl)
MyDataframe2 = read_excel("D:\\Data\\Example.xlsx", sheet = "Example")
We will simply use the write_xlsx
from the writexl
package with the following arguments:
library(writexl)
write_xlsx(MyDataframe, "D:\\Data\\Export.xlsx")