Create folders and files with for-loop

As an environmentalist, I have worked with historical data and modeling. Some programs will need you to have individual folders and individual data files with certain extensions for each day, month, year, or study location that you want to analyze. What if your data is in a single file 🤯. Making 10files will be ok, but 2000 will take a while nonetheless; thanks to the power of R, we can do that in a few minutes ⏱.

What you will learn:

  • For-loops
  • Work with excel files
  • Create folder
  • Subset data
  • Write and save text files

Libraries

Generally, I like to work with CSV or TXT files. Nonetheless, it is common to have data in excel files. To avoid changing the format, I will use the package readxl to import .xlsx files.

#install.packages("readxl")
library(readxl)
library(tidyverse)
library(utils)

Excel files

The example file will have two locations, one sheet for each site. For each area, there are multiple variables with daily data between 1979 and 2100. You can get the Dataset for this tutorial

Objective: Have individual text files for each year and locations stored in the corresponding year folder.

Read excel files

Let’s begin by reading the file

data<- readxl::read_excel("climate_data.xlsx",sheet = 1,skip = 0)

With the read_excel you can specify the sheet as well as skip a few rows.

Subset your data

Now, we need to filter the data. We will select the first 4 columns and data from 2000.

dt1 <- dplyr::filter(data[,1:4],Year==2002) #if you get some problems when filtering years, you can use Year=factor(year)
head(dt1)
## # A tibble: 6 x 4
##   Year  Day   Maxtemp Mintemp
##   <chr> <chr>   <dbl>   <dbl>
## 1 2002  1       -15.6   -23.2
## 2 2002  2       -14.7   -21.5
## 3 2002  3       -11     -18.3
## 4 2002  4       -11.2   -19.4
## 5 2002  5       -11.2   -18
## 6 2002  6        -8.3   -17.5

There are many ways to subset your data, but one of the most straightforward ways is using filter from dplyr. As I did with data[,1:4], this specifies the dimensions that I want to use data[rows,collumns]. Tip: To remember which one goes first, think of a Red Chicken (Rows, Columns).

This is also one of my favorite packages because you can integrate extended code for data analysis and prosses.

Now that we selected our year, we can write the text file that we need.

write.table(dt1[,2:4], #remove additional variables from your filtered data
            file="station_1_year_2000.txt", #filemane of your file.
            #if nested folders then add the folder_name/file_name for each file
            col.names=T,row.names = F,quote = FALSE) #remove all the row.names and " " from col.names

For-loop

As I said before, we will have many years and locations, so we want to run that for all our data. When something is the same repeated, then for-loops are a great alternative.

study_year<-min(data$Year):max(data$Year) #store all the years in a new vector
tail(study_year) # have a look at last five variables
## [1] 2095 2096 2097 2098 2099 2100

Create folders

Let’s create our folders and have a simple example of what we will do with the for-loop. The structure that we need to follow is this for (variable in vector).

#for (variable in vector)
for(year in # year can be any text
    study_year[1:2]){ #our vector
  print(as.character(year)) # show me which year the loop is running
  dir.create(path = paste(year)) # create the folder named by year
}

Magic

It is common to use guides with for-loops to know which iteration the loop is doing; I used print to display a message.

for (j in 1:2) {  # how many pages you have 1:n pages
  data<-readxl::read_excel("climate_data.xlsx",sheet = j)
  filename = paste("station ",j,".txt",sep = "") #file name for each i year

  for (i in c(1:length(study_year))){ #i in as many years we got
  dt1 <- dplyr::filter(data[,1:4],Year==study_year[i]) #filter the data
  write.table(dt1[,2:4], #remove additional variables from your filtered data
            file="station_1_year_2000.txt", #filemane of your file.
            #if nested folders then add the folder_name/file_name for each file
            col.names=T,row.names = F,quote = FALSE) #remove all the row.names and " " from col.names
  print(paste("done with year",study_year[i],"Station",j)) # message
  }
}

Summary

Writing files is easy with R and will save you a lot of time. Some of the take away messages and points to follow are:

  • Read your long files 📚.
  • Filter your information 📋.
  • Create folders 📂.
  • Use a for-loop to filter and save to the right folder ✅.

These were some of the steps that I followed to create over 2000 files. You can replace the extension make sure to change the function, e.g., write.csv(). Hopefully, you have a long data set and you need to create new files soon. You got it under control, and it will just take a few minutes. If you liked this post, share it with your friends and on your social media. Constantly check my page for more practical and efficient R tutorials 👏.

Roberto Supe
Roberto Supe
PhD Student of Environmental Science

My research interests include data analysis, environmental pollution, risk assessment, climate change, and ecology.