Introduction
The goal of codeminer is to simplify working with clinical codes for research using electronic health records. The workflow is as follows:
- Create a local resource containing lookup and mapping tables for various clinical coding systems (e.g. ICD-10 and Read codes)
- Build clinical code lists for conditions of interest by querying this resource
This vignette demonstrates the above using dummy data included with
the package. You can try out the steps either locally, by installing
codeminer on your own machine, or online using RStudio Cloud, by
opening this Rmd file in the 'vignettes' directory of the project.
Also included are functions for mapping between different clinical
coding systems, and for using Phecodes (Denny,
Bastarache, and Roden 2016; Wu et al. 2019) with UK Biobank data.
See vignette('MAP'), vignette('caliber') and vignette('phecodes')
for further information.
Build a local clinical codes lookup and mappings resource
The first step is to create a local database containing lookup and
mapping tables for various clinical coding systems using
build_database().
By default this will download the following resources:
- UK Biobank resource 592 (Clinical coding classification systems and maps)
- UK Biobank data codings file
- Phecode lookup and mapping files (for ICD-9 and ICD-10 to phecode)
The tables are imported into R, reformatted, and then added to the database. In this vignette, create_dummy_database() builds a temporary database from the dummy data included with the package:
# Create a temporary database with dummy data
(db_path <- create_dummy_database())
#> Creating new database at /tmp/Rtmp1fov3a/file24944e063b9b.duckdb
#> Reading 17 selected tables from UKB Resource 592
#>
#> Extending read_v2_drugs_bnf with BNF hierarchy and descriptions
#> Extending read_v2_icd10 by expanding ICD-10 code ranges
#> Adding tables to database
#> ✔ Lookup table BNF_UKB v4 added successfully.
#> ✔ Relationship table BNF_relationship_UKB v4 added successfully.
#> ✔ Lookup table DM+D_UKB v4 added successfully.
#> ✔ Lookup table ICD-9_UKB v4 added successfully.
#> ✔ Relationship table ICD-9_relationship_UKB v4 added successfully.
#> ✔ Lookup table ICD-10_UKB v4 added successfully.
#> ✔ Relationship table ICD-10_relationship_UKB v4 added successfully.
#> ✔ Mapping table ICD-9_ICD-10_UKB v4 added successfully.
#> ✔ Lookup table Read 2_UKB v4 added successfully.
#> ✔ Relationship table Read 2_relationship_UKB v4 added successfully.
#> ✔ Lookup table Read 2, drugs_UKB v4 added successfully.
#> ✔ Mapping table Read 2, drugs_BNF_UKB v4 added successfully.
#> ✔ Mapping table Read 2_ICD-9_UKB v4 added successfully.
#> ✔ Mapping table Read 2_ICD-10_UKB v4 added successfully.
#> ✔ Mapping table Read 2_OPCS4_UKB v4 added successfully.
#> ✔ Mapping table Read 2_Read 3_UKB v4 added successfully.
#> ✔ Lookup table Read 3_UKB v4 added successfully.
#> ✔ Mapping table Read 3_ICD-9_UKB v4 added successfully.
#> ✔ Mapping table Read 3_ICD-10_UKB v4 added successfully.
#> ✔ Mapping table Read 3_OPCS4_UKB v4 added successfully.
#> ✔ Mapping table Read 3_Read 2_UKB v4 added successfully.
#> ✔ Dummy database ready to use!
#> [1] "/tmp/Rtmp1fov3a/file24944e063b9b.duckdb"
Sys.getenv("CODEMINER_DB_PATH")
#> [1] "/tmp/Rtmp1fov3a/file24944e063b9b.duckdb"
As shown above, create_dummy_database() also sets the
CODEMINER_DB_PATH environment variable for the current session,
which ensures that all subsequent codeminer calls will use this
database.
To persist the database across sessions, set the
CODEMINER_DB_PATH environment variable to a path on your
system, e.g. using usethis::edit_r_environ(scope = "project"):
# ./.Renviron
CODEMINER_DB_PATH=/path/to/codeminer-database.duckdb
Alternatively, if the environment variable is not set,
codeminer will store the database in a default location,
determined by rappdirs::user_data_dir().
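A .Renviron file is only read when R starts. To point codeminer at a different database within a running session, you can set the variable with base R's Sys.setenv(). The sketch below assumes, as stated above, that codeminer reads CODEMINER_DB_PATH each time it is called; the file name codeminer-database.duckdb is a hypothetical placeholder. The previous value is saved and restored so that the dummy database used in this vignette stays active.

```r
# Remember the current setting (an empty string means "not set")
old_path <- Sys.getenv("CODEMINER_DB_PATH")

# Point codeminer at another database for this session only.
# Hypothetical placeholder path: replace it with your own file.
new_path <- file.path(tempdir(), "codeminer-database.duckdb")
Sys.setenv(CODEMINER_DB_PATH = new_path)

# Confirm the change took effect
is_set <- identical(Sys.getenv("CODEMINER_DB_PATH"), new_path)
print(is_set)

# codeminer calls made here would use new_path (assumption, see above)

# Restore the previous setting, unsetting the variable if it was not set
if (nzchar(old_path)) {
  Sys.setenv(CODEMINER_DB_PATH = old_path)
} else {
  Sys.unsetenv("CODEMINER_DB_PATH")
}
```

Restoring with Sys.unsetenv() rather than setting an empty string keeps "not set" distinct from "set to nothing", so the default-location fallback described above can still apply.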
The database is a DuckDB database and can be inspected directly using the DBI package:
# Connect to the DuckDB database (read-only)
con <- DBI::dbConnect(duckdb::duckdb(), db_path, read_only = TRUE)
DBI::dbListTables(con)
#> [1] "BNF_UKB v4" "BNF_relationship_UKB v4"
#> [3] "DM+D_UKB v4" "ICD-10_UKB v4"
#> [5] "ICD-10_relationship_UKB v4" "ICD-9_ICD-10_UKB v4"
#> [7] "ICD-9_UKB v4" "ICD-9_relationship_UKB v4"
#> [9] "Read 2, drugs_BNF_UKB v4" "Read 2, drugs_UKB v4"
#> [11] "Read 2_ICD-10_UKB v4" "Read 2_ICD-9_UKB v4"
#> [13] "Read 2_OPCS4_UKB v4" "Read 2_Read 3_UKB v4"
#> [15] "Read 2_UKB v4" "Read 2_relationship_UKB v4"
#> [17] "Read 3_ICD-10_UKB v4" "Read 3_ICD-9_UKB v4"
#> [19] "Read 3_OPCS4_UKB v4" "Read 3_Read 2_UKB v4"
#> [21] "Read 3_UKB v4" "_lookup_metadata"
#> [23] "_mapping_metadata" "_relationship_metadata"
# Close the connection when you're done
DBI::dbDisconnect(con)
Note that manual interaction with the database should not normally be
necessary; codeminer takes care of this for you.
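If you do query the tables directly, note that their names contain spaces, hyphens and commas (e.g. "Read 2, drugs_UKB v4"), so they must be quoted as SQL identifiers. The minimal, self-contained sketch below uses an in-memory DuckDB database and a hypothetical toy table named like one of the codeminer lookup tables; its rows are made up and are not taken from UK Biobank resource 592. DBI::dbQuoteIdentifier() supplies the quoting.

```r
# Requires the DBI and duckdb packages
con <- DBI::dbConnect(duckdb::duckdb())  # in-memory database

# Hypothetical toy table, named like a codeminer lookup table
DBI::dbWriteTable(
  con,
  "ICD-10_UKB v4",
  data.frame(
    ICD10_CODE = c("E10", "E11", "I10"),
    DESCRIPTION = c("toy description 1", "toy description 2", "toy description 3"),
    stringsAsFactors = FALSE
  )
)

# The name contains '-' and a space, so quote it before building SQL
tbl_sql <- as.character(DBI::dbQuoteIdentifier(con, "ICD-10_UKB v4"))

# Bind the code as a parameter rather than pasting it into the query
res <- DBI::dbGetQuery(
  con,
  paste0("SELECT * FROM ", tbl_sql, " WHERE ICD10_CODE = ?"),
  params = list("E11")
)
print(res)

# dbReadTable() quotes the table name itself when reading a whole table
n_rows <- nrow(DBI::dbReadTable(con, "ICD-10_UKB v4"))

DBI::dbDisconnect(con)
```

The same quoting applies when connecting read-only to the real database file at db_path, as in the example above.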
Build a clinical code list
Explore codes
Codes may be explored with:
- CODES(): look up descriptions for a set of codes of the given code type
CODES(
codes = c("E10", "E11"),
code_type = "ICD-10"
)
#> ℹ Using 'UKB v4' as latest version
#> # A tibble: 2 × 14
#> code description ICD10_CODE USAGE USAGE_UK MODIFIER_4 MODIFIER_5 QUALIFIERS
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 E10 Type 1 diabe… E10 DEFA… 3 NA NA NA
#> 2 E11 Type 2 diabe… E11 DEFA… 3 NA NA NA
#> # ℹ 6 more variables: GENDER_MASK <chr>, MIN_AGE <chr>, MAX_AGE <chr>,
#> # TREE_DESCRIPTION <chr>, code_type <chr>, preferred_description <lgl>
- DESCRIPTION(): search for codes whose descriptions match a pattern
DESCRIPTION(pattern = "cyst", code_type = "ICD-10")
#> ℹ Using 'UKB v4' as latest version
#> ℹ Using 'UKB v4' as latest version
#> # A tibble: 2 × 14
#> code description ICD10_CODE USAGE USAGE_UK MODIFIER_4 MODIFIER_5 QUALIFIERS
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 L721 Trichilemmal… L72.1 DEFA… 3 NA NA NA
#> 2 N330 Tuberculous … N33.0 ASTE… 2 NA NA NA
#> # ℹ 6 more variables: GENDER_MASK <chr>, MIN_AGE <chr>, MAX_AGE <chr>,
#> # TREE_DESCRIPTION <chr>, code_type <chr>, preferred_description <lgl>
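A code list is usually assembled by combining several such searches. The sketch below imitates that step in base R on a small hypothetical lookup data frame with the code, description and code_type columns seen in the output above; the codes and descriptions are made up. It assumes, which this vignette does not confirm, that DESCRIPTION() treats pattern as a case-insensitive regular expression. Under that assumption, a bare pattern such as "cyst" also matches longer words, and word boundaries narrow it.

```r
# Hypothetical lookup table with made-up codes and descriptions
lkp <- data.frame(
  code = c("T01", "T02", "T03", "T04"),
  description = c("Renal cyst", "Cystitis", "Cystic fibrosis", "Type 2 diabetes"),
  code_type = "ICD-10",
  stringsAsFactors = FALSE
)

# A bare pattern matches anywhere in the text, so "Cystitis" is included
# (assumes regex, case-insensitive matching, as stated above)
broad <- lkp[grepl("cyst", lkp$description, ignore.case = TRUE), ]

# Word boundaries (\\b) restrict matches to the whole word "cyst"
narrow <- lkp[grepl("\\bcyst\\b", lkp$description, ignore.case = TRUE), ]

# Add codes selected explicitly by code (as with CODES()),
# then drop duplicate rows so each code appears once
by_code <- lkp[lkp$code %in% c("T01", "T04"), ]
code_list <- unique(rbind(narrow, by_code))
code_list <- code_list[order(code_list$code), ]
print(code_list)
```

With the real functions, the same pattern would apply to the tibbles returned by DESCRIPTION() and CODES(), which share the code, description and code_type columns.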