Title: | Create, Stow, and Read Data Packages |
---|---|
Description: | Data frame, tibble, or tbl objects are converted to data package objects using specific metadata labels (name, version, title, homepage, description). A data package object ('dpkg') can be written to disk as a 'parquet' file or released to a 'GitHub' repository. Data package objects can be read into R from online repositories and downloaded files are cached locally across R sessions. |
Authors: | Cole Brokamp [aut, cre, cph] |
Maintainer: | Cole Brokamp <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.0.9000 |
Built: | 2024-11-05 05:25:12 UTC |
Source: | https://github.com/cole-brokamp/dpkg |
Convert a data frame into a data package (dpkg) by providing specific metadata in the arguments.
as_dpkg(
  x,
  name = deparse(substitute(x)),
  version = "0.0.0.9000",
  title = character(),
  homepage = character(),
  description = character()
)
x |
a tibble or data frame |
name |
a lowercase character string consisting of only
|
version |
a character string representing a semantic version (e.g., "0.2.1") |
title |
a character string that is a title of the data package for humans |
homepage |
a valid URL that links to a webpage with code or descriptions related to creation of the data package |
description |
a character string (markdown encouraged!) of more details about how the data was created, including the data sources, references to code or packages used, relevant details for any specific columns, and notes about (mis)usage of the data |
name should be specified; if it is not, it is deparsed from the code defining x, which might not result in a valid name (e.g., when piping code to create a data frame).
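To see why the default can yield an unusable name, the following base R sketch mimics the deparse(substitute(x)) default with a hypothetical helper, default_name(), which is not part of dpkg. It only illustrates how R deparses the argument expression; how as_dpkg() then validates the result is not shown here.

```r
# Hypothetical helper (not part of dpkg) that mimics the default
# `name = deparse(substitute(x))` argument of as_dpkg()
default_name <- function(x, name = deparse(substitute(x))) {
  name
}

# A bare object deparses to its own name
simple_name <- default_name(mtcars)

# The native pipe rewrites this to default_name(head(mtcars)),
# so the whole call is deparsed instead of an object name
piped_name <- mtcars |> head() |> default_name()

simple_name
piped_name
```

Here simple_name is "mtcars", while piped_name is "head(mtcars)", which is the text of a call rather than a simple lowercase name. Passing name explicitly avoids the problem.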
a dpkg object
x <- as_dpkg(mtcars, name = "mtcars", title = "Motor Trend Road Car Tests")
attr(x, "description") <- "This is a data set all about characteristics of different cars"
attr(x, "homepage") <- "https://github.com/cole-brokamp/dpkg"
x
The release will be tagged at the current commit and named according to the name and version of the dpkg.
The GITHUB_PAT environment variable must be set and the working directory must be inside of a git repository with a GitHub remote.
Trying to create more than one release from the current commit will result in an error.
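As a rough illustration of the naming scheme, the example output in this section shows a release tagged mtcars-v0.0.0.9000 for a dpkg named mtcars at version 0.0.0.9000. The sketch below reproduces that pattern with a hypothetical helper, release_tag(); the pattern is inferred from that output and is not taken from the package's own code.

```r
# Hypothetical helper (not part of dpkg): builds a release tag following
# the "<name>-v<version>" pattern seen in the example output
release_tag <- function(name, version) {
  paste0(name, "-v", version)
}

tag <- release_tag("mtcars", "0.0.0.9000")
tag
```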
dpkg_gh_release(x, draft = TRUE, generate_release_notes = FALSE)
x |
a data package ( |
draft |
logical; mark release as draft? |
generate_release_notes |
logical; include GitHub's auto-generated release notes below the description in the release body? |
the URL to the release (invisibly)
## Not run:
dpkg_gh_release(
  as_dpkg(mtcars,
    version = "0.0.0.9000",
    title = "Foofy Cars",
    homepage = "https://github.com/cole-brokamp/dpkg",
    description =
      paste("# Foofy Cars\n",
        "This is a test for the [dpkg](https://github.com/cole-brokamp/dpkg) package.",
        collapse = "\n"
      )
  ),
  draft = FALSE
)
## End(Not run)
#> created release at: https://github.com/cole-brokamp/dpkg/releases/tag/mtcars-v0.0.0.9000
get the metadata associated with a data package
dpkg_meta(x)
x |
a dpkg object |
a list of metadata key value pairs
x <- as_dpkg(mtcars, name = "mtcars", title = "Motor Trend Road Car Tests")
attr(x, "description") <- "This is a data set all about characteristics of different cars"
attr(x, "homepage") <- "https://github.com/cole-brokamp/dpkg"
x
dpkg_meta(x)
read (meta)data from dpkg on disk
read_dpkg_metadata(x)
read_dpkg(x)
x |
path to data package ( |
for read_dpkg(), a dpkg object; for read_dpkg_metadata(), a list of metadata
d <- as_dpkg(mtcars, version = "0.1.0", title = "Motor Trend Road Car Tests")
attr(d, "description") <- "This is a data set all about characteristics of different cars"
attr(d, "homepage") <- "https://github.com/cole-brokamp/dpkg"

write_dpkg(d, dir = tempdir()) |>
  read_dpkg()

# geo objects are supported via the `geoarrow_vctr` in the geoarrow package
library(geoarrow)
sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf")) |>
  as_dpkg(name = "nc_data") |>
  write_dpkg(tempdir())
d <- read_dpkg(fs::path_temp("nc_data-v0.0.0.9000.parquet"))
d

# as a simple features collection
d$geom <- sf::st_as_sfc(d$geom)
sf::st_as_sf(d)

# read just the metadata
read_dpkg_metadata(fs::path_temp("nc_data-v0.0.0.9000.parquet"))
stow R user directory
Use stow to abstract away the process of downloading a file or a GitHub release asset to a user's data directory, only downloading files that have not already been downloaded.
stow_gh_release(owner, repo, dpkg, overwrite = FALSE)
stow(uri, overwrite = FALSE)
stow_url(url, overwrite = FALSE)
owner |
string of repo owner |
repo |
string of repo name |
dpkg |
string of gh release tag (will be the same as the filename without the |
overwrite |
logical; re-download the remote file even though a local file with the same name exists? |
uri |
character string universal resource identifier; currently, must begin
with |
url |
a URL string starting with |
Supported URI prefixes include:
https://, http://: download a file from a URL
gh://: download a GitHub release asset, formatted as gh://owner/repo/name
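To make the gh:// format concrete, the sketch below splits such a URI into its three parts using base R. The helper parse_gh_uri() is hypothetical and written only for illustration; it is not how stow() is known to parse its input.

```r
# Hypothetical helper (not part of dpkg): split "gh://owner/repo/name"
# into its owner, repo, and release name components
parse_gh_uri <- function(uri) {
  if (!startsWith(uri, "gh://")) {
    stop("uri must begin with gh://")
  }
  parts <- strsplit(sub("^gh://", "", uri), "/", fixed = TRUE)[[1]]
  if (length(parts) != 3L) {
    stop("expected a uri formatted as gh://owner/repo/name")
  }
  list(owner = parts[1], repo = parts[2], name = parts[3])
}

gh_parts <- parse_gh_uri("gh://cole-brokamp/dpkg/mtcars-v0.0.0.9000")
str(gh_parts)
```

The three components line up with the owner, repo, and dpkg arguments of stow_gh_release().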
Stow downloads files to the user's data directory; see ?tools::R_user_dir.
Specify an alternative download location by setting the R_USER_DATA_DIR environment variable.
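The effect of R_USER_DATA_DIR can be checked directly with tools::R_user_dir() (available in R 4.0.0 and later). In this sketch, passing "stow" as the package argument is an assumption made for illustration; the directory name the package actually uses is best confirmed with stow_path().

```r
# Remember the current setting so it can be restored afterwards
old <- Sys.getenv("R_USER_DATA_DIR", unset = NA)

# Point the user data directory at a temporary location
base_dir <- tempfile("stow")
Sys.setenv(R_USER_DATA_DIR = base_dir)

# "stow" as the package name is an assumption for illustration
stow_dir <- tools::R_user_dir("stow", which = "data")
stow_dir

# Restore the previous environment
if (is.na(old)) {
  Sys.unsetenv("R_USER_DATA_DIR")
} else {
  Sys.setenv(R_USER_DATA_DIR = old)
}
```

The returned path sits underneath the directory given in R_USER_DATA_DIR, so changing the variable relocates every stowed file for that session.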
The stow cache works by name only; that is, if a file with the same URI has already been downloaded, it will not be downloaded again (unless overwrite = TRUE).
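The caching behavior described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: stow_sketch() and fake_fetch() are hypothetical names, the cache key is assumed to be the file name at the end of the URI, and a stand-in fetcher replaces the real download so the example runs offline.

```r
# Hypothetical sketch of a name-based download cache (not dpkg's code).
# `fetch` is any function(url, destfile) that writes the file to disk.
stow_sketch <- function(uri, dir, overwrite = FALSE,
                        fetch = utils::download.file) {
  dest <- file.path(dir, basename(uri))
  # Only the file name is checked; remote content is never compared
  if (file.exists(dest) && !overwrite) {
    return(dest)
  }
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  fetch(uri, dest)
  dest
}

# Offline demonstration with a stand-in fetcher that counts its calls
n_fetches <- 0
fake_fetch <- function(url, destfile) {
  n_fetches <<- n_fetches + 1
  writeLines(url, destfile)
}

cache_dir <- tempfile("stow")
uri <- "https://example.com/data/file.rds"
first <- stow_sketch(uri, cache_dir, fetch = fake_fetch) # fetches
second <- stow_sketch(uri, cache_dir, fetch = fake_fetch) # reuses the file
third <- stow_sketch(uri, cache_dir, overwrite = TRUE, fetch = fake_fetch) # fetches again
n_fetches
```

Because only the name is compared, a file that has changed at the same URI is not picked up until overwrite = TRUE is used.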
path to the stowed file or URL to the GitHub release
Sys.setenv(R_USER_DATA_DIR = tempfile("stow"))

# get by using URL
stow("https://github.com/geomarker-io/appc/releases/download/v0.1.0/nei_2020.rds",
  overwrite = TRUE
) |>
  readRDS()

# will be faster (even in later R sessions) next time
stow("https://github.com/geomarker-io/appc/releases/download/v0.1.0/nei_2020.rds") |>
  readRDS()

# get a data package from a GitHub release
stow("gh://cole-brokamp/dpkg/mtcars-v0.0.0.9000", overwrite = TRUE) |>
  arrow::read_parquet()

stow("gh://cole-brokamp/dpkg/mtcars-v0.0.0.9000") |>
  arrow::read_parquet()
get info about stowed files
get the path to a stowed file (or the stow directory)
test if a stowed file (or the stow directory) exists
get the size of a stowed file
remove a stowed file (or the entire stow directory)
stow_info(filename = NULL)
stow_path(filename = NULL)
stow_exists(filename = NULL)
stow_size(filename = NULL)
stow_remove(filename = NULL, .delete_stow_dir_confirm = FALSE)
filename |
character filename of stowed file; if NULL, then information about all stowed files or the directory where files are stowed is returned |
.delete_stow_dir_confirm |
set to TRUE in order to delete the entire stow directory without interactive user confirmation |
for stow_info(), a tibble of file or folder information;
for stow_path(), a character path to the stowed file or stow directory;
for stow_exists(), a logical;
for stow_size(), a fs::
Sys.setenv(R_USER_DATA_DIR = tempfile("stow"))

stow_path()

stow("https://github.com/geomarker-io/appc/releases/download/v0.1.0/nei_2020.rds")

stow_path("nei_2020.rds")

stow_exists("nei_2020.rds")

stow_size("nei_2020.rds")

stow("https://github.com/geomarker-io/appc/releases/download/v0.1.0/nei_2017.rds")

stow_info("nei_2017.rds")

stow_info()

stow_size()

stow_remove(.delete_stow_dir_confirm = TRUE)
The badge relies on shields.io for the images, which will always display the most recently released version and will link to the releases specific to the dpkg name.
use_dpkg_badge(x)
x |
a data package ( |
Note that this relies on the structure of the release created with dpkg_gh_release(), but takes a dpkg object before it is released.
Release badges and links will be broken until an initial dpkg release is created with dpkg_gh_release().
character string of markdown
## Not run:
as_dpkg(mtcars,
  version = "0.0.0.9000",
  title = "Foofy Cars",
  homepage = "https://github.com/cole-brokamp/dpkg",
  description =
    paste("# Foofy Cars\n",
      "This is a test for the [dpkg](https://github.com/cole-brokamp/dpkg) package.",
      collapse = "\n"
    )
) |>
  use_dpkg_badge()
## End(Not run)
write dpkg to disk
write_dpkg(x, dir)
x |
a data package ( |
dir |
path to directory where dpkg parquet file will be written |
path to the written file, invisibly