Introduction
The goal of codeminer is to simplify working with clinical codes for research using electronic health records. The workflow is as follows:
- Create a local resource containing lookup and mapping tables for various clinical coding systems (e.g. ICD10 and Read codes)
- Build clinical code lists for conditions of interest by querying this resource
This vignette demonstrates the above using dummy data included with the package.
Also included are functions for mapping between different clinical
coding systems, and for using Phecodes (Denny,
Bastarache, and Roden 2016; Wu et al. 2019) with UK Biobank data.
See vignette('MAP'),
vignette('caliber') and vignette('phecodes')
for further information.
Build a local clinical codes lookup and mappings resource
The first step is to create a local database containing lookup and
mapping tables for various clinical coding systems using
build_database().
By default this will download the following resources:
- UK Biobank resource 592 (Clinical coding classification systems and maps)
- UK Biobank data codings file
- Phecode lookup and mapping files (for ICD9 and ICD10 to phecode)
The tables are imported into R, reformatted, and stored as a named list of data frames:
# Create a temporary database with dummy data
(db_path <- create_dummy_database())
#> ✔ Dummy database ready to use!
#> [1] "/tmp/RtmpJKRKUk/file255d350072c5.duckdb"
Sys.getenv("CODEMINER_DB_PATH")
#> [1] "/tmp/RtmpJKRKUk/file255d350072c5.duckdb"
codeminer resolves the database location using the
following precedence:
- The CODEMINER_DB_PATH environment variable, if set
- A default location determined by rappdirs::user_data_dir()
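The precedence above can be sketched in a few lines of base R. This is only an illustration: resolve_db_path() and its default_path argument are hypothetical names, not part of codeminer, and the placeholder default below stands in for the location the package derives via rappdirs::user_data_dir().

```r
# Hypothetical helper (not part of codeminer) illustrating the precedence:
# 1. CODEMINER_DB_PATH, if set and non-empty; 2. a default location.
resolve_db_path <- function(default_path) {
  env_path <- Sys.getenv("CODEMINER_DB_PATH", unset = "")
  if (nzchar(env_path)) env_path else default_path
}

# Placeholder default; codeminer itself derives this via rappdirs::user_data_dir()
default_path <- file.path(tempdir(), "codeminer-database.duckdb")

# Remember the current value so it can be restored afterwards
old <- Sys.getenv("CODEMINER_DB_PATH", unset = NA)

Sys.setenv(CODEMINER_DB_PATH = "/path/to/codeminer-database.duckdb")
with_env <- resolve_db_path(default_path)    # environment variable wins

Sys.unsetenv("CODEMINER_DB_PATH")
without_env <- resolve_db_path(default_path) # falls back to the default

# Restore the original environment state
if (!is.na(old)) Sys.setenv(CODEMINER_DB_PATH = old)
```

Treating an empty value the same as an unset one is a design choice of this sketch; the package may handle that case differently.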
To persist the database location across sessions, set
CODEMINER_DB_PATH in your .Renviron,
e.g. using usethis::edit_r_environ(scope = "project"):
# ./.Renviron
CODEMINER_DB_PATH=/path/to/codeminer-database.duckdb
Alternatively, you can point codeminer at a specific
database file with codeminer_connect():
codeminer_connect(main = "/path/to/codeminer-database.duckdb")
The database is a DuckDB database.
codeminer manages the database connection automatically,
so you don't need to connect or disconnect manually. You can check the
current connection status with codeminer_status():
codeminer_status()
#> ℹ Workbench active
#> Main: /tmp/RtmpJKRKUk/file255d350072c5.duckdb
#> Extra: not attached
Build a clinical code list
Explore codes
Codes may be explored with:
- CODES(): look up descriptions for a set of codes in the given code system type
CODES(
  codes = c("E10", "E11"),
  type = "ICD-10"
)
#> ℹ Using 'UKB v4' as latest version
#> <codeminer_codelist>: 2 codes
#>
#> Code type: "ICD-10"
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 E10 Type 1 diabetes mellitus ICD-10
#> 2 E11 Type 2 diabetes mellitus ICD-10
- DESCRIPTION(): search for codes that match a description
DESCRIPTION(pattern = "cyst", type = "ICD-10")
#> <codeminer_codelist>: 2 codes
#> Code type: "ICD-10"
#>
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 L721 Trichilemmal cyst ICD-10
#> 2 N330 Tuberculous cystitis ICD-10
Managing tables
Adding tables
You can add custom lookup, mapping, and relationship tables to the
database with add_lookup_table(),
add_mapping_table(), and
add_relationship_table(). Each requires a data frame and a
metadata object created with the corresponding *_metadata()
constructor:
custom_lookup <- data.frame(
  code = c("CUSTOM1", "CUSTOM2"),
  description = c("Custom code 1", "Custom code 2")
)
add_lookup_table(
  custom_lookup,
  lookup_metadata("custom_codes", lookup_version = "v1")
)
#> ✔ Lookup table custom_codes_v1 added successfully.
CODES("all", type = "custom_codes")
#> ℹ Using 'v1' as latest version
#> <codeminer_codelist>: 2 codes
#>
#> Code type: "custom_codes"
#> # A tibble: 2 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 CUSTOM1 Custom code 1 custom_codes
#> 2 CUSTOM2 Custom code 2 custom_codes
Removing tables
To remove a table, use the corresponding
remove_*_table() function with the same identifying
keys:
remove_lookup_table("custom_codes", "v1")
#> ✔ Lookup table custom_codes_v1 removed.
Removing a table deletes both the data table and its metadata entry. After removal, the same code type and version can be re-added.
Viewing metadata
Use get_codeminer_metadata() to inspect the tables
currently in the database:
get_codeminer_metadata("lookup")
#> lookup_table_name code_type lookup_version lookup_code_col
#> 1 BNF_UKB v4 BNF UKB v4 BNF_Code
#> 2 DM+D_UKB v4 DM+D UKB v4 concept_id
#> 3 ICD-9_UKB v4 ICD-9 UKB v4 ICD9
#> 4 ICD-10_UKB v4 ICD-10 UKB v4 ALT_CODE
#> 5 Read 2_UKB v4 Read 2 UKB v4 read_code
#> 6 Read 2, drugs_UKB v4 Read 2, drugs UKB v4 read_code
#> 7 Read 3_UKB v4 Read 3 UKB v4 read_code
#> lookup_description_col lookup_source
#> 1 Description https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 2 term https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 3 DESCRIPTION_ICD9 https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 4 DESCRIPTION https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 5 term_description https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 6 term_description https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> 7 term_description https://biobank.ndph.ox.ac.uk/ukb/refer.cgi?id=592
#> preferred_description_col preferred_description_indicator col_filters
#> 1 <NA> <NA> <NA>
#> 2 <NA> <NA> <NA>
#> 3 <NA> <NA> <NA>
#> 4 <NA> <NA> <NA>
#> 5 term_code 00 <NA>
#> 6 <NA> <NA> <NA>
#> 7 description_type P <NA>
Version pinning
When multiple versions of a lookup, mapping, or relationship table
are available, codeminer resolves "latest"
automatically. The first time a query function resolves
"latest" for a given code type, the resolved version is
cached for the remainder of the session. This avoids repeated
informational messages and ensures consistent version usage across a
workflow.
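The session-level caching described above could work roughly like the sketch below. It is not codeminer's implementation: version_cache, resolve_version() and the stand-in find_latest() are hypothetical names, and how the package actually decides which version is the latest is not modelled here.

```r
# Illustrative only: a session-level cache for resolved "latest" versions.
# `version_cache` and `resolve_version()` are hypothetical names.
version_cache <- new.env(parent = emptyenv())

resolve_version <- function(code_type, version = "latest", find_latest) {
  # Explicit versions bypass the cache entirely
  if (!identical(version, "latest")) return(version)

  if (is.null(version_cache[[code_type]])) {
    # First resolution for this code type: look it up, cache it, tell the user
    version_cache[[code_type]] <- find_latest(code_type)
    message("Using '", version_cache[[code_type]], "' as latest version")
  }
  version_cache[[code_type]]
}

# Stand-in for a metadata lookup; counts how often it is called
n_lookups <- 0
find_latest <- function(code_type) {
  n_lookups <<- n_lookups + 1
  "UKB v4"
}

first    <- resolve_version("ICD-10", find_latest = find_latest) # lookup + message
second   <- resolve_version("ICD-10", find_latest = find_latest) # cached, silent
explicit <- resolve_version("ICD-10", "v1", find_latest = find_latest)
```

In this sketch the informational message appears only on the first call, and the lookup runs once per code type for the rest of the session.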
You can override this for the current session with
codeminer_set_version():
# Pin lookup and relationship versions for a code type
codeminer_set_version(
  lookup = c("ICD-10" = "UKB v4"),
  relationship = c("ICD-10" = "UKB v4")
)
# Pin a mapping version (use "from > to" format for the key)
codeminer_set_version(
  mapping = c("Read 3 > ICD-10" = "UKB v4")
)
Pins only affect the default "latest" resolution.
Explicit version arguments always take precedence:
# This uses the pinned version for ICD-10:
CODES("E10", type = "ICD-10")
#> <codeminer_codelist>: 1 code
#> Code type: "ICD-10"
#>
#> # A tibble: 1 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 E10 Type 1 diabetes mellitus ICD-10
# This ignores the pin and uses "UKB v4" directly:
CODES("E10", type = "ICD-10", lookup_version = "UKB v4")
#> <codeminer_codelist>: 1 code
#> Code type: "ICD-10"
#>
#> # A tibble: 1 × 3
#> code description code_type
#> <chr> <chr> <chr>
#> 1 E10 Type 1 diabetes mellitus ICD-10
To clear all version selections and return to automatic
"latest" resolution:
codeminer_clear_versions()
You can also clear versions selectively by code type:
# Clear only the ICD-10 lookup version
codeminer_clear_versions(lookup = "ICD-10")
# Clear lookup and relationship for SNOMED CT
codeminer_clear_versions(
  lookup = "SNOMED CT",
  relationship = "SNOMED CT"
)
codeminer_status() shows any active versions alongside
the connection info.
Storing version settings
For reproducible analysis, you can store your version pins in a configuration file and load them at the start of a session.
CSV format (one row per code type, columns for each table type):
code_type,lookup,relationship
ICD-10,UKB v4,UKB v4
Read 3,UKB v4,UKB v4
SNOMED CT,GPS v1,GPS v1
cfg <- read.csv("codeminer_versions.csv")
codeminer_set_version(
  lookup = setNames(cfg$lookup, cfg$code_type),
  relationship = setNames(cfg$relationship, cfg$code_type)
)
Mapping pins use a "from > to" key format and are
best stored in a separate file or in JSON:
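For the separate-file option, one possible layout is a CSV with one row per mapping. The sketch below only builds the named vector in the "from > to" key format; the file layout, the column names from, to and version, and the example rows are assumptions for illustration, not a format codeminer prescribes.

```r
# Assumed layout: one row per mapping; column names and rows are illustrative
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c(
    "from,to,version",
    "Read 3,ICD-10,UKB v4",
    "Read 2,ICD-10,UKB v4"
  ),
  csv_path
)

cfg <- read.csv(csv_path, stringsAsFactors = FALSE, strip.white = TRUE)

# Build "from > to" keys, e.g. "Read 3 > ICD-10"
mapping_pins <- setNames(cfg$version, paste(cfg$from, cfg$to, sep = " > "))
mapping_pins
```

The resulting vector has the same shape as the mapping argument shown earlier, so it could be passed as codeminer_set_version(mapping = mapping_pins).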
JSON format:
{
  "lookup": {"ICD-10": "UKB v4", "Read 3": "UKB v4"},
  "relationship": {"ICD-10": "UKB v4"},
  "mapping": {"Read 3 > ICD-10": "UKB v4"}
}
cfg <- jsonlite::fromJSON("codeminer_versions.json")
codeminer_set_version(
  lookup = unlist(cfg$lookup),
  relationship = unlist(cfg$relationship),
  mapping = unlist(cfg$mapping)
)
Column filters
Some tables contain rows that should be excluded by default — for
example, inactive SNOMED CT concepts or approximate code mappings.
Column filters (col_filters) let table authors declare
which columns are filterable, what values are available, and which
values should be selected by default.
How filters are defined
Filters are stored in table metadata as a JSON specification. Each
filterable column has an entry with values (all valid
options) and defaults (applied when no override is
given):
# When adding a lookup table with filters
add_lookup_table(
  my_snomed_lookup,
  lookup_metadata(
    "SNOMED CT",
    lookup_version = "v1",
    col_filters = list(
      active_concept = list(
        values = c("0", "1"),
        defaults = c("1")
      )
    )
  )
)
Query-time behaviour
All query functions accept a col_filters parameter with
three options:
- "default" (the default): apply filters from session pin or metadata defaults
- NULL: no filtering; return all rows regardless of column values
- A named list: apply explicit filters for this call only
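How these three options might resolve to a row filter can be sketched in base R. This is not codeminer's implementation: apply_col_filters(), the spec object and the toy data frame are hypothetical, and session pins are left out for brevity.

```r
# Toy lookup table and a filter spec in the values/defaults shape
lookup <- data.frame(
  code = c("A1", "A2", "A3"),
  active_concept = c("1", "0", "1"),
  stringsAsFactors = FALSE
)
spec <- list(active_concept = list(values = c("0", "1"), defaults = c("1")))

# Hypothetical helper: resolve `col_filters`, then keep matching rows
apply_col_filters <- function(df, spec, col_filters = "default") {
  if (is.null(col_filters)) return(df)             # NULL: no filtering
  if (identical(col_filters, "default")) {
    col_filters <- lapply(spec, `[[`, "defaults")  # fall back to metadata defaults
  }
  for (col in names(col_filters)) {
    # Reject values the spec does not declare for this column
    stopifnot(all(col_filters[[col]] %in% spec[[col]]$values))
    df <- df[df[[col]] %in% col_filters[[col]], , drop = FALSE]
  }
  df
}

default_rows  <- apply_col_filters(lookup, spec)
all_rows      <- apply_col_filters(lookup, spec, col_filters = NULL)
inactive_rows <- apply_col_filters(
  lookup, spec, col_filters = list(active_concept = "0")
)
```

Validating requested values against the declared values is an assumption of this sketch; the package may treat undeclared values differently.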
# Default: only active concepts (from metadata defaults)
CODES("all", type = "SNOMED CT")
# Override: return all rows including inactive
CODES("all", type = "SNOMED CT", col_filters = NULL)
# Custom: only inactive concepts
CODES("all", type = "SNOMED CT", col_filters = list(active_concept = c("0")))
Filters are per-table-type
An important design point: each table type has its own
independent col_filters. This matters most for
MAP(), which touches two table types:
- The mapping table (e.g., Read 3 → ICD-10) may have filters like mapping_status
- The target lookup table (e.g., ICD-10) may have filters like active_concept
When you call MAP(col_filters = ...), this controls only
the mapping table. The target lookup table uses its own default filters
when looking up descriptions. This is intentional — the two tables have
different filterable columns and different semantics.
To override filters on the target lookup as well, use session pinning:
# Pin lookup filters for SNOMED CT
codeminer_set_col_filters(
  lookup = list("SNOMED CT" = list(active_concept = c("0", "1")))
)
# Now MAP() will use the pinned lookup filters for the target table
MAP("24700007", from = "SNOMED CT", to = "ICD-10")
Session pinning
Like version pinning, you can pin col_filters for the entire session:
# Pin: include inactive SNOMED concepts
codeminer_set_col_filters(
  lookup = list("SNOMED CT" = list(active_concept = c("0", "1")))
)
# Clear all filter pins
codeminer_clear_col_filters()
Temporary overrides
For a scoped override, use with_col_filters():
# Temporarily include inactive concepts for this block only
result <- with_col_filters(
  {
    CODES("all", type = "SNOMED CT")
  },
  lookup = list("SNOMED CT" = list(active_concept = c("0", "1")))
)
# Outside the block, default filters apply again
Updating filters after table creation
If you need to add or change filters on an existing table without re-adding the data:
update_lookup_metadata(
  "SNOMED CT",
  col_filters = list(
    active_concept = list(values = c("0", "1"), defaults = c("1"))
  )
)
Discovering available filters
get_col_filters() returns all registered filters, useful
for building UIs:
# Just defaults (for applying)
get_col_filters(defaults_only = TRUE)
# Full spec with all available values (for checkboxes in a Shiny app)
get_col_filters(defaults_only = FALSE)