I’m super stoked to announce {geotargets} version 0.2.0! The {geotargets} package extends {targets} to work with geospatial data formats.
I’d like to firstly acknowledge the strong work by Eric Scott on getting this release ready. I do want to emphasise that while post is on my website, this project is very much a team effort.
You can download {geotargets} from the R universe like so:
install.packages("geotargets", repos = c("https://njtierney.r-universe.dev", "https://cran.r-project.org"))
Why should I use geotargets and targets?
You could benefit from using targets and geotargets if you do geospatial data analysis involving rasters or shapefiles, specifically with terra or stars R packages. For example, if you are doing large downloads of rasters, then operations like cropping, reprojection, resampling, and masking.
The main benefit to using targets and geotargets is your analysis will only rerun when you change relevant parts of your data analysis. For example, you might do a lot of geospatial data processing that feeds downstream into a machine learning model to make predictions on bushfire risk. Writing with targets and geotargets means if you change the parts of the code that related to the machine learning components, then only the relevant parts with machine learning would change. This means you can save time by avoiding running computational expensive spatial data processing.
For more details on what targets is, and why we need geotargets, I would recommend reading the 0.1.0 release blog post, as well as reading the {targets} manual. The “Get started in 4 minutes” guide to targets is also excellent.
Main changes in 0.2.0?
In addition to smaller changes and improvements, there are three main additions in this release:
- Support for dynamic branching.
- Preserving spatRaster metadata.
- Support of
stars
andstars_proxy
.
And very in very exciting news, we have a new hex sticker!
Thanks to Hubert Hałun for their work on getting this together, we are really happy with the new sticker!
Dynamic Branching
The main addition in this release is a demonstration of using dynamic branching using a new “target factory” function, tar_terra_tiles()
. This allows you to break raster operations into tiles, and then perform these operations on the tiles and combine them together. This means we can break computationally intensive raster operations that work in pixel-wise manner over tiled subsets of the raster. This is useful when, for example, loading an entire raster into memory and doing computations on it results in out of memory errors.
As part of this addition, we created helper functions:
These help us define different extents that we can pass along as different parts of the dynamic branches. You can think of these as tools that we can use to specify how to slice, or tile up, a raster into smaller pieces that we can then do analysis on separately and combine later.
Let’s briefly unpack these, and then show how these would be used in dynamic branching. First let’s read in some example elevation data from terra and plot it:
f <- system.file("ex/elev.tif", package="terra")
r <- rast(f)
plot(r)
tile_n()
We can use tile_n()
, which is the simplest of the three. It produces about n
tiles in a grid.
r_tile_4 <- tile_n(r, 4)
#> creating 2 * 2 = 4 tile extents
r_tile_4
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.141667 49.816667 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 6.141667 6.533333 49.816667 50.191667
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.141667 49.441667 49.816667
#>
#> [[4]]
#> xmin xmax ymin ymax
#> 6.141667 6.533333 49.441667 49.816667
# some plot helpers
rect_extent <- function(x, ...) {
rect(x[1], x[3], x[2], x[4], ...)
}
plot_extents <- function(x, ...) {
invisible(lapply(x, rect_extent, border = "hotpink", lwd = 2))
}
plot(r)
plot_extents(r_tile_4)
plot(r)
tile_n(r, 6) |> plot_extents()
#> creating 2 * 3 = 6 tile extents
tile_grid()
For more control, use tile_grid()
, which allows specification of the number of rows and columns to split the raster into. Here we are specify that we want three columns and 1 row:
r_grid_3x1 <- tile_grid(r, ncol = 3, nrow = 1)
r_grid_3x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.008333 49.441667 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 6.008333 6.266667 49.441667 50.191667
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 6.266667 6.533333 49.441667 50.191667
plot(r)
plot_extents(r_grid_3x1)
plot(r)
tile_grid(r, ncol = 2, nrow = 3) |> plot_extents()
tile_blocksize()
The third included helper is tile_blocksize()
, which tiles by file block size. The block size is a property of raster files, and is the number of pixels (in the x and y direction) that is read into memory at a time. Tiling by multiples of block size may therefore be more efficient because only one block should need to be loaded to create each tile target. You can find the blocksize with fileBlocksize
:
fileBlocksize(r)
#> rows cols
#> [1,] 43 95
This tells us that it reads in the raster in 43x95 pixel sizes.
The tile_blocksize
function is similar to tile_grid
, except instead of saying how many rows and columns, we specify in units of blocksize.
If we just run tile_blocksize()
on r
we get the extents of the specified blocksize:
tile_blocksize(r)
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.833333 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 49.833333
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
Which is the same as specifying blocksize for row and column at unit 1:
r_block_size_1x1 <- tile_blocksize(r, n_blocks_row = 1, n_blocks_col = 1)
r_block_size_1x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.833333 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 49.833333
#>
#> [[3]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_1x1)
Here the block size is the same size for the first two blocks, and then a much more narrow block. This is different to the two other tile methods.
Here the column block size is the full width of the raster.
So we could instead have the blocksize extent be written out to 2 blocks in a row, and 1 block size for the columns:
r_block_size_2x1 <- tile_blocksize(r, n_blocks_row = 2, n_blocks_col = 1)
r_block_size_2x1
#> [[1]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.475000 50.191667
#>
#> [[2]]
#> xmin xmax ymin ymax
#> 5.741667 6.533333 49.441667 49.475000
plot(r)
plot_extents(r_block_size_2x1)
This only works when the SpatRaster
points to a file—in-memory rasters have no inherent block size.
sources(r)
#> [1] "/Users/nick/Library/R/arm64/4.4/library/terra/ex/elev.tif"
#force into memory
r2 <- r + 0
sources(r2)
#> [1] ""
#this now errors
tile_blocksize(r2)
#> Error: [aggregate] values in argument 'fact' should be > 0
For more detail on using this in targets, please see the geotargets vignette, “Dynamic branching with raster tiles”
Preserving spatRaster metadata
tar_terra_rast()
gains a preserve_metadata
option that when set to "zip"
reads/writes targets as zip archives that include aux.json “sidecar” files sometimes written by terra
(#58).
Support of stars
and stars_proxy
We have created tar_stars()
and tar_stars_proxy()
that create stars
and stars_proxy
objects, respectively. These are currently experimental.
Minor changes in 0.2.0
Other changes include:
- Created utility function
set_window()
mostly for internal use withintar_terra_tiles()
. - Removes the
iteration
argument from alltar_*()
functions.iteration
now hard-coded as"list"
since it is the only option that works (for now at least). - Added the
description
argument to alltar_*()
functions which is passed totar_target()
. - Suppressed the warning “[rast] skipped sub-datasets” from
tar_terra_sprc()
, which is misleading in this context (#92, #104). - Requires GDAL 3.1 or greater to use “ESRI Shapefile” driver in
tar_terra_vect()
(#71, #97) geotargets
now requirestargets
version 1.8.0 or higherterra
(>= 1.7.71),withr
(>= 3.0.0), andzip
are now required dependencies ofgeotargets
(moved fromSuggests
toImports
)
What’s next?
We have finished developing the main milestones for geotargets, but will continue actively developing it. Soon we will be submitting the package for review by rOpenSci, and subsequently submit the work to the Journal of Open Source Software (JOSS), and then submit to CRAN.
Currently, the next release will focus on adding support for:
You can see the full list of issues for more detail on what we are working on.
Thanks
We would like to thank the R Consortium for generously supporting this project, “{geotargets}: Enabling geospatial workflow management with {targets}".
We would also like to thank Michael Sumner, Anthony North, and Miles McBain for their helpful discussions throughout, as well as Will Landau for writing targets, and being incredibly responsive and helpful to the issues and questions we have asked as we wrote {geotargets}.