Processing CTD measurements in R with Oce and tidyverse packages

In the previous post, I illustrate how to process ship-based CTD data with oce¹ package in R² enviroment. We saw the power of this package in reading, summarizing and visualizing CTD data. The downside of oce package in my opinion is its strickest nature of relying on R base for data processing and plotting—preventing customization. Therefore, in this post I will illustrate how to tranforming oce dataset format into a tibble—a modern data frame. Once we have CTD measurements, we can use the power of tidyverse³ package to easy tidy⁴, manipulate⁵ CTD data and visualize⁶ the results in a more elegent way.

You need to install several packages (if not yet installed) that I am goint to use in this post.

require(oce)
require(ocedata)
require(sf)
require(leaflet)
require(tidyverse)

Since I need to process all the CTD stations at once, I loop them. Before I loop the files, the path of each file must be determined, which can easily done in R with the dir() function.

### Identify the list in the directory
files = dir(path = "./ctd18/", pattern = ".cnv", full.names = TRUE )

The loop through to read the CTD files from the nine stations. However, before we loop, a container to store the files during the looping process is required. I preallocated the file using the list() and then loop the process with a FOR function

## preallocate the container to store the individual 
ctd = list()

### Loop the CNV file and create a list of casted CTD
for (i in 1:length(files)){
  
ctd[[i]]  = read.ctd(files[i])%>%
  ctdTrim(method = "downcast")%>% # select downcast
  ctdDecimate(p = 1) # align to the same standard pressure
}

Once the list of CTD stations are created, each station can individually separated from a list and transform it from oce format to a tibble. This involves extracting CTD measurement of the first station, then use mutate() from dpyr package to adding station name, date of cast, longitude and latitude of station. I used the pipe operator (%>%) to chain the process.

Table 1 show the the randomly selected cleaned CTD measurements of Station AM00882 casted. For simplist, the data were aligned in the standard pressure interval of 20 meters.

## select variables of interest
ctd.tb = ctd.tb%>%
  select(cruise, station, date, lon,lat, 
         pressure, depth, temperature, 
         conductivity, salinity, oxygen, 
         fluorescence, turbidity = turbidity2)

## summarize the table of station 1
knitr::kable(ctd.tb%>%
               dplyr::slice(seq(1,500,20))%>%
               select(-c(cruise, station, depth, turbidity))%>%
               mutate(date = as.Date(date)), 
             digits = 2, caption = "Summary of CTD measurement at station AM00882 spaced at 20 meter interval", 
             col.names = c("Date", "Longitude", "Latitude","Pressure", "Temperature",
                           "Conductivity","Salinity", "O2", "Fluorescence"),
             align = "c")

Table 1: Summary of CTD measurement at station AM00882 spaced at 20 meter interval
Date	Longitude	Latitude	Pressure	Temperature	Conductivity	Salinity	O2	Fluorescence
2018-06-18	39.29	-5.02	0	NA	NA	NA	NA	NA
2018-06-18	39.29	-5.02	20	26.93	5.51	34.96	3.96	0.34
2018-06-18	39.29	-5.02	40	26.92	5.50	34.96	3.94	0.34
2018-06-18	39.29	-5.02	60	26.90	5.50	34.96	3.95	0.31
2018-06-18	39.29	-5.02	80	26.89	5.50	34.96	3.94	0.33
2018-06-18	39.29	-5.02	100	26.87	5.50	34.96	3.92	0.35
2018-06-18	39.29	-5.02	120	23.28	5.16	35.17	3.34	0.17
2018-06-18	39.29	-5.02	140	18.19	4.64	35.22	2.55	0.06
2018-06-18	39.29	-5.02	160	17.45	4.56	35.20	2.53	0.06
2018-06-18	39.29	-5.02	180	16.66	4.49	35.19	2.49	0.02
2018-06-18	39.29	-5.02	200	16.51	4.47	35.19	2.52	0.02
2018-06-18	39.29	-5.02	220	15.49	4.37	35.19	2.58	0.02
2018-06-18	39.29	-5.02	240	14.61	4.28	35.15	2.62	0.01
2018-06-18	39.29	-5.02	260	12.44	4.06	35.07	3.17	0.02
2018-06-18	39.29	-5.02	280	12.10	4.03	35.04	3.27	0.02
2018-06-18	39.29	-5.02	300	12.02	4.02	35.03	3.32	0.01
2018-06-18	39.29	-5.02	320	11.74	3.99	35.01	3.41	0.01
2018-06-18	39.29	-5.02	340	11.65	3.98	35.00	3.39	0.01
2018-06-18	39.29	-5.02	360	11.34	3.95	34.97	3.35	0.01
2018-06-18	39.29	-5.02	380	11.14	3.93	34.95	3.36	0.03
2018-06-18	39.29	-5.02	400	10.92	3.91	34.93	3.36	0.04

I then used ggplot2 package to make standard oceanographic profiles of temperature, salinity and oxygen (Figure 1).

temp = ggplot(data = ctd.tb, aes(x = temperature, y = pressure))+
  geom_path(col = "red")+
  scale_y_reverse()+
  scale_x_continuous(position = "top", breaks = seq(12,30,6))+
  theme_bw()+
  theme(axis.text = element_text(colour = "black", size = 10),
        axis.title = element_text(colour = "black", size = 12),
        panel.grid = element_blank(),
        panel.background = element_rect(colour = "black", fill = NULL))+
    labs(y="Pressure (dbar)",
         x=expression(Temperature~(~degree~C)))

 salinity = ggplot(data = ctd.tb, aes(x = salinity, y = pressure))+
  geom_path(col = "red")+
  scale_y_reverse()+
  scale_x_continuous(position = "top", breaks = seq(34.8,35.4,0.1))+
  theme_bw()+
  theme(axis.text = element_text(colour = "black", size = 10),
        axis.title = element_text(colour = "black", size = 12),
        panel.grid = element_blank(),
        panel.background = element_rect(colour = "black", fill = NULL))+
    labs(y="Pressure (dbar)",
         x= "Salinity")
 
 oxygen = ggplot(data = ctd.tb, aes(x = oxygen, y = pressure))+
  geom_path(col = "red")+
  scale_y_reverse()+
  scale_x_continuous(position = "top", breaks = seq(2.2,4,.5))+
  theme_bw()+
  theme(axis.text = element_text(colour = "black", size = 10),
        axis.title = element_text(colour = "black", size = 12),
        panel.grid = element_blank(),
        panel.background = element_rect(colour = "black", fill = NULL))+
    labs(y="Pressure (dbar)",
         x=expression(Oxygen~(~mlL^-1)))
 
 cowplot::plot_grid(temp, salinity, oxygen, nrow = 1)

Figure 1: Profile of Temperature, salinity and oxygen collected at Station AM00882

That is for a single station, but hydrographic section requires more than one stations stitched together to render smoothed profiles based on either time, distance, latitude or longitude. Therefore, I am going to loop through to make a tibble of several stations and then plot the hydrogphic section with ggplot2 package. In a nutshell, the process involves three steps. First, create empty container with NULL() function that will store individual tibble generate. Second, a dplyr’s bind_row() function was used to stitch to the bottom each generated file in a sequential order. Third, dplyr’s select() function was used to pick variable of interest and drop the rest. Note that the selection process is done outside the loop boundary, because this is done after the looping process is complete. You can also chain the selection process inside the loop and you will have no problem.

Table 2 show the stations, number of scan, the minimum and maximum depth in each station.

ctd.tb.all.summary = ctd.tb.all%>%
  group_by(station)%>%
  summarise(count = n(), 
            min.depth = min(depth, na.rm = TRUE), 
            max.depth = max(depth, na.rm = TRUE))

knitr::kable(ctd.tb.all.summary, digits = 2, align = "c", 
             col.names = c("Station", "Counts", 
                           "Minimum Depth (m)", 
                           "Maximum Depth (m)"),
             caption = "The summary of the nine stations")

Table 2: The summary of the nine stations
Station	Counts	Minimum Depth (m)	Maximum Depth (m)
AM00882	409	3.46	405.35
AM00883	489	3.14	484.76
AM00884	354	2.24	350.57
AM00885	66	2.36	64.94
AM00886	45	3.10	43.66
AM00887	28	2.45	26.77
AM00888	28	2.07	26.90
AM00889	47	3.44	45.68
AM00890	104	3.91	102.19

Using the distinct() function from dplyr package, I was able to identify the unique date and time of CTD cast and the longitude and latitude of each station (Table 3).

ctd.tb.all.info = ctd.tb.all%>%
  distinct(station, lon, lat, date)%>%
  separate(date, c("date", "time"), sep = " ")

knitr::kable(ctd.tb.all.info, digits = 2, align = "c", 
             col.names = c("Station", "Date of cast", "Time of cast", "Longitude", "Latitude"),
             caption = "The geographical locations, the date and time of casts of the nine stations")

Table 3: The geographical locations, the date and time of casts of the nine stations
Station	Date of cast	Time of cast	Longitude	Latitude
AM00882	2018-06-18	05:27:07	39.29	-5.02
AM00883	2018-06-18	13:32:50	39.24	-5.26
AM00884	2018-06-18	19:22:04	39.19	-5.44
AM00885	2018-06-19	00:03:26	39.11	-5.71
AM00886	2018-06-19	05:32:23	39.07	-5.93
AM00887	2018-06-19	09:54:46	38.99	-6.14
AM00888	2018-06-19	13:35:21	39.01	-6.26
AM00889	2018-06-19	18:28:52	39.24	-6.49
AM00890	2018-06-19	23:17:05	39.35	-6.67

Leaflet⁷ package developed by Cheng and others (2017) was used to create an interactive map using the geographical positions (longitude and latitude) of the CTD casts in coastal water of the Pemba and Zanzibar channel (Figure ??). Because the map is interactive, you can zoom and pan. You can also click on the symbol and a station name will popup.

leaflet(data = ctd.tb.all.info)%>%
  addTiles()%>%
  addMarkers(lng = ~lon, lat = ~lat, popup = ~station)

Figure ?? show the vertical structure of temperature for the nine stations. BUt the the hydrographic section leaves the gap between the stations. This is because the geom_raster() requirest equally spaced interval. To achive this, I used akima package to interpolate temperature based on equal spaced latitude and pressure. The figur ?? show the vertical structure of interpolated temperature from the surface to 40 meter deep of the the nine stations.

ggplot(data = ctd.tb.all%>%na.omit(), 
       aes(x = lat, y = pressure))+
  geom_raster(aes(fill = temperature), interpolate = FALSE)+
  scale_y_reverse()+
  scale_fill_gradientn(colours = oceColorsJet(120), name = "Temperature")+
  theme_bw()+
  theme(legend.key.height = unit(3.5, "lines"), 
        axis.text = element_text(colour = 1, size = 10),
        axis.title = element_text(colour = 1, size = 12),
        panel.grid = element_blank())+
  labs(x = "Latitude", y = "Pressure (dba)")

# temp.interp = interpBarnes(x = ctd.tb.all$lat, 
#                            y = ctd.tb.all$pressure, 
#                            z = ctd.tb.all$temperature)

ctd.tb.all = ctd.tb.all%>%na.omit()

temp.interp = akima::interp(x = ctd.tb.all$lat, 
              y = ctd.tb.all$pressure, 
              z = ctd.tb.all$temperature, 
              duplicate = "mean", nx = 200, ny = 200)

temp.interp = akima::interp2xyz(temp.interp)%>%as.tibble()%>%rename(latitude = x, pressure = y, temperature = z)%>%na.omit()%>%filter(pressure <=40 & pressure > 4)
  
  
  ggplot(data = temp.interp, aes(x = latitude, y = pressure))+
    geom_raster(aes(fill = temperature), interpolate = TRUE)+
    geom_contour(aes(z = temperature), col = 1)+
    scale_fill_gradientn(colours = oce.colorsJet(120), name = "Temperature")+
    scale_y_reverse()+
    theme_bw()+
    theme(legend.key.height = unit(3.5, "lines"), 
          axis.text = element_text(colour = 1, size = 10),
          axis.title = element_text(colour = 1, size = 12),
          panel.grid = element_blank())+
    labs(x = "Latitude", y = "Pressure (dba)")

Conclusion

This post illustrate the processing of oceanographic data in R using a combination of packages to achieve a standard plot of temperature profile and section.

Cited Articles

Kelley, D., & Richards, C. (2017). oce: Analysis of Oceanographic Data. R package version 0.9-22.↩
eam, R. C. (2015). R: A language and environment for statistical computing.↩
Wickham, H. (2017). Tidyverse: Easily install and load’tidyverse’packages. R package version, 1(1).↩
Wickham, H. (2016). tidyr: Easily Tidy Data with spread () and gather () Functions. Version 0.6. 0.↩
Wickham, H., Francois, R., Henry, L., & Müller, K. (2015). dplyr: A grammar of data manipulation. R package version 0.4, 3.↩
Wickham, H. (2016). ggplot2: elegant graphics for data analysis. Springer.↩
Cheng, J., Karambelkar, B., Xie, Y., Wickham, H., Russell, K., & Johnson, K. Leaflet: Create Interactive Web Maps with the JavaScript “Leaflet” Library (2017). URL https://CRAN. R-project. org/package= leaflet. R package version, 1(1), 134.↩