Determine first isolates of all microorganisms of every patient per episode and (if needed) per specimen type. These functions support all four methods as summarised by Hindler et al. in 2007 (doi:10.1086/511864
). To determine patient episodes not necessarily based on microorganisms, use is_new_episode()
that also supports grouping with the dplyr
package.
Usage
first_isolate(
x = NULL,
col_date = NULL,
col_patient_id = NULL,
col_mo = NULL,
col_testcode = NULL,
col_specimen = NULL,
col_icu = NULL,
col_keyantimicrobials = NULL,
episode_days = 365,
testcodes_exclude = NULL,
icu_exclude = FALSE,
specimen_group = NULL,
type = "points",
method = c("phenotype-based", "episode-based", "patient-based", "isolate-based"),
ignore_I = TRUE,
points_threshold = 2,
info = interactive(),
include_unknown = FALSE,
include_untested_sir = TRUE,
...
)
filter_first_isolate(
x = NULL,
col_date = NULL,
col_patient_id = NULL,
col_mo = NULL,
episode_days = 365,
method = c("phenotype-based", "episode-based", "patient-based", "isolate-based"),
...
)
Source
Methodology of this function is strictly based on:
M39 Analysis and Presentation of Cumulative Antimicrobial Susceptibility Test Data, 5th Edition, 2022, Clinical and Laboratory Standards Institute (CLSI). https://clsi.org/standards/products/microbiology/documents/m39/.
Hindler JF and Stelling J (2007). Analysis and Presentation of Cumulative Antibiograms: A New Consensus Guideline from the Clinical and Laboratory Standards Institute. Clinical Infectious Diseases, 44(6), 867-873. doi:10.1086/511864
Arguments
- x
a data.frame containing isolates. Can be left blank for automatic determination, see Examples.
- col_date
column name of the result date (or date that is was received on the lab) - the default is the first column with a date class
- col_patient_id
column name of the unique IDs of the patients - the default is the first column that starts with 'patient' or 'patid' (case insensitive)
- col_mo
column name of the names or codes of the microorganisms (see
as.mo()
) - the default is the first column of classmo
. Values will be coerced usingas.mo()
.- col_testcode
column name of the test codes. Use
col_testcode = NULL
to not exclude certain test codes (such as test codes for screening). In that casetestcodes_exclude
will be ignored.- col_specimen
column name of the specimen type or group
- col_icu
column name of the logicals (
TRUE
/FALSE
) whether a ward or department is an Intensive Care Unit (ICU). This can also be a logical vector with the same length as rows inx
.- col_keyantimicrobials
(only useful when
method = "phenotype-based"
) column name of the key antimicrobials to determine first isolates, seekey_antimicrobials()
. The default is the first column that starts with 'key' followed by 'ab' or 'antibiotics' or 'antimicrobials' (case insensitive). Usecol_keyantimicrobials = FALSE
to prevent this. Can also be the output ofkey_antimicrobials()
.- episode_days
episode in days after which a genus/species combination will be determined as 'first isolate' again. The default of 365 days is based on the guideline by CLSI, see Source.
- testcodes_exclude
a character vector with test codes that should be excluded (case-insensitive)
- icu_exclude
a logical to indicate whether ICU isolates should be excluded (rows with value
TRUE
in the column set withcol_icu
)- specimen_group
value in the column set with
col_specimen
to filter on- type
type to determine weighed isolates; can be
"keyantimicrobials"
or"points"
, see Details- method
the method to apply, either
"phenotype-based"
,"episode-based"
,"patient-based"
or"isolate-based"
(can be abbreviated), see Details. The default is"phenotype-based"
if antimicrobial test results are present in the data, and"episode-based"
otherwise.- ignore_I
logical to indicate whether antibiotic interpretations with
"I"
will be ignored whentype = "keyantimicrobials"
, see Details- points_threshold
minimum number of points to require before differences in the antibiogram will lead to inclusion of an isolate when
type = "points"
, see Details- info
a logical to indicate info should be printed - the default is
TRUE
only in interactive mode- include_unknown
a logical to indicate whether 'unknown' microorganisms should be included too, i.e. microbial code
"UNKNOWN"
, which defaults toFALSE
. For WHONET users, this means that all records with organism code"con"
(contamination) will be excluded at default. Isolates with a microbial ID ofNA
will always be excluded as first isolate.- include_untested_sir
a logical to indicate whether also rows without antibiotic results are still eligible for becoming a first isolate. Use
include_untested_sir = FALSE
to always returnFALSE
for such rows. This checks the data set for columns of classsir
and consequently requires transforming columns with antibiotic results usingas.sir()
first.- ...
arguments passed on to
first_isolate()
when usingfilter_first_isolate()
, otherwise arguments passed on tokey_antimicrobials()
(such asuniversal
,gram_negative
,gram_positive
)
Value
A logical vector
Details
To conduct epidemiological analyses on antimicrobial resistance data, only so-called first isolates should be included to prevent overestimation and underestimation of antimicrobial resistance. Different methods can be used to do so, see below.
These functions are context-aware. This means that the x
argument can be left blank if used inside a data.frame call, see Examples.
The first_isolate()
function is a wrapper around the is_new_episode()
function, but more efficient for data sets containing microorganism codes or names.
All isolates with a microbial ID of NA
will be excluded as first isolate.
Different methods
According to Hindler et al. (2007, doi:10.1086/511864 ), there are different methods (algorithms) to select first isolates with increasing reliability: isolate-based, patient-based, episode-based and phenotype-based. All methods select on a combination of the taxonomic genus and species (not subspecies).
All mentioned methods are covered in the first_isolate()
function:
Method | Function to apply |
Isolate-based | first_isolate(x, method = "isolate-based") |
(= all isolates) | |
Patient-based | first_isolate(x, method = "patient-based") |
(= first isolate per patient) | |
Episode-based | first_isolate(x, method = "episode-based") , or: |
(= first isolate per episode) | |
- 7-Day interval from initial isolate | - first_isolate(x, method = "e", episode_days = 7) |
- 30-Day interval from initial isolate | - first_isolate(x, method = "e", episode_days = 30) |
Phenotype-based | first_isolate(x, method = "phenotype-based") , or: |
(= first isolate per phenotype) | |
- Major difference in any antimicrobial result | - first_isolate(x, type = "points") |
- Any difference in key antimicrobial results | - first_isolate(x, type = "keyantimicrobials") |
Isolate-based
This method does not require any selection, as all isolates should be included. It does, however, respect all arguments set in the first_isolate()
function. For example, the default setting for include_unknown
(FALSE
) will omit selection of rows without a microbial ID.
Patient-based
To include every genus-species combination per patient once, set the episode_days
to Inf
. Although often inappropriate, this method makes sure that no duplicate isolates are selected from the same patient. In a large longitudinal data set, this could mean that isolates are excluded that were found years after the initial isolate.
Episode-based
To include every genus-species combination per patient episode once, set the episode_days
to a sensible number of days. Depending on the type of analysis, this could be 14, 30, 60 or 365. Short episodes are common for analysing specific hospital or ward data, long episodes are common for analysing regional and national data.
This is the most common method to correct for duplicate isolates. Patients are categorised into episodes based on their ID and dates (e.g., the date of specimen receipt or laboratory result). While this is a common method, it does not take into account antimicrobial test results. This means that e.g. a methicillin-resistant Staphylococcus aureus (MRSA) isolate cannot be differentiated from a wildtype Staphylococcus aureus isolate.
Phenotype-based
This is a more reliable method, since it also weighs the antibiogram (antimicrobial test results) yielding so-called 'first weighted isolates'. There are two different methods to weigh the antibiogram:
Using
type = "points"
and argumentpoints_threshold
(default)This method weighs all antimicrobial drugs available in the data set. Any difference from I to S or R (or vice versa) counts as
0.5
points, a difference from S to R (or vice versa) counts as1
point. When the sum of points exceedspoints_threshold
, which defaults to2
, an isolate will be selected as a first weighted isolate.All antimicrobials are internally selected using the
all_antimicrobials()
function. The output of this function does not need to be passed to thefirst_isolate()
function.Using
type = "keyantimicrobials"
and argumentignore_I
This method only weighs specific antimicrobial drugs, called key antimicrobials. Any difference from S to R (or vice versa) in these key antimicrobials will select an isolate as a first weighted isolate. With
ignore_I = FALSE
, also differences from I to S or R (or vice versa) will lead to this.Key antimicrobials are internally selected using the
key_antimicrobials()
function, but can also be added manually as a variable to the data and set in thecol_keyantimicrobials
argument. Another option is to pass the output of thekey_antimicrobials()
function directly to thecol_keyantimicrobials
argument.
The default method is phenotype-based (using type = "points"
) and episode-based (using episode_days = 365
). This makes sure that every genus-species combination is selected per patient once per year, while taking into account all antimicrobial test results. If no antimicrobial test results are available in the data set, only the episode-based method is applied at default.
Examples
# `example_isolates` is a data set available in the AMR package.
# See ?example_isolates.
example_isolates[first_isolate(info = TRUE), ]
#> ℹ Determining first isolates using an episode length of 365 days
#> ℹ Using column 'date' as input for col_date.
#> ℹ Using column 'patient' as input for col_patient_id.
#> ℹ Basing inclusion on all antimicrobial results, using a points threshold
#> of 2
#> ℹ Excluding 16 isolates with a microbial ID 'UNKNOWN' (in column 'mo')
#> => Found 1,387 'phenotype-based' first isolates (69.4% of total where a
#> microbial ID was available)
#> # A tibble: 1,387 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 3 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 4 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 5 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
#> 6 2002-01-17 495616 67 M Clinical B_STPHY_EPDR R NA S NA
#> 7 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 8 2002-01-21 462081 75 F Clinical B_CTRBC_FRND R NA NA R
#> 9 2002-01-22 F35553 50 M ICU B_PROTS_MRBL R NA NA NA
#> 10 2002-02-03 481442 76 M ICU B_STPHY_CONS R NA S NA
#> # ℹ 1,377 more rows
#> # ℹ 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
# \donttest{
# get all first Gram-negatives
example_isolates[which(first_isolate(info = FALSE) & mo_is_gram_negative()), ]
#> ℹ Using column 'mo' as input for mo_is_gram_negative()
#> # A tibble: 441 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 3 2002-01-21 462081 75 F Clinical B_CTRBC_FRND R NA NA R
#> 4 2002-01-22 F35553 50 M ICU B_PROTS_MRBL R NA NA NA
#> 5 2002-02-05 067927 45 F ICU B_SERRT_MRCS R NA NA R
#> 6 2002-02-27 066895 85 F Clinical B_KLBSL_PNMN R NA NA R
#> 7 2002-03-08 4FC193 69 M Clinical B_ESCHR_COLI R NA NA R
#> 8 2002-03-16 4FC193 69 M Clinical B_PSDMN_AERG R NA NA R
#> 9 2002-04-01 496896 46 F ICU B_ESCHR_COLI R NA NA NA
#> 10 2002-04-23 EE2510 69 F ICU B_ESCHR_COLI R NA NA NA
#> # ℹ 431 more rows
#> # ℹ 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
if (require("dplyr")) {
# filter on first isolates using dplyr:
example_isolates %>%
filter(first_isolate(info = TRUE))
}
#> ℹ Determining first isolates using an episode length of 365 days
#> ℹ Basing inclusion on all antimicrobial results, using a points threshold
#> of 2
#> ℹ Excluding 16 isolates with a microbial ID 'UNKNOWN' (in column 'mo')
#> => Found 1,387 'phenotype-based' first isolates (69.4% of total where a
#> microbial ID was available)
#> # A tibble: 1,387 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 3 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 4 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 5 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
#> 6 2002-01-17 495616 67 M Clinical B_STPHY_EPDR R NA S NA
#> 7 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 8 2002-01-21 462081 75 F Clinical B_CTRBC_FRND R NA NA R
#> 9 2002-01-22 F35553 50 M ICU B_PROTS_MRBL R NA NA NA
#> 10 2002-02-03 481442 76 M ICU B_STPHY_CONS R NA S NA
#> # ℹ 1,377 more rows
#> # ℹ 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
if (require("dplyr")) {
# short-hand version:
example_isolates %>%
filter_first_isolate(info = FALSE)
}
#> # A tibble: 1,387 × 46
#> date patient age gender ward mo PEN OXA FLC AMX
#> <date> <chr> <dbl> <chr> <chr> <mo> <sir> <sir> <sir> <sir>
#> 1 2002-01-02 A77334 65 F Clinical B_ESCHR_COLI R NA NA NA
#> 2 2002-01-07 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 3 2002-01-14 462729 78 M Clinical B_STPHY_AURS R NA S R
#> 4 2002-01-16 067927 45 F ICU B_STPHY_EPDR R NA R NA
#> 5 2002-01-17 858515 79 F ICU B_STPHY_EPDR R NA S NA
#> 6 2002-01-17 495616 67 M Clinical B_STPHY_EPDR R NA S NA
#> 7 2002-01-19 738003 71 M Clinical B_ESCHR_COLI R NA NA NA
#> 8 2002-01-21 462081 75 F Clinical B_CTRBC_FRND R NA NA R
#> 9 2002-01-22 F35553 50 M ICU B_PROTS_MRBL R NA NA NA
#> 10 2002-02-03 481442 76 M ICU B_STPHY_CONS R NA S NA
#> # ℹ 1,377 more rows
#> # ℹ 36 more variables: AMC <sir>, AMP <sir>, TZP <sir>, CZO <sir>, FEP <sir>,
#> # CXM <sir>, FOX <sir>, CTX <sir>, CAZ <sir>, CRO <sir>, GEN <sir>,
#> # TOB <sir>, AMK <sir>, KAN <sir>, TMP <sir>, SXT <sir>, NIT <sir>,
#> # FOS <sir>, LNZ <sir>, CIP <sir>, MFX <sir>, VAN <sir>, TEC <sir>,
#> # TCY <sir>, TGC <sir>, DOX <sir>, ERY <sir>, CLI <sir>, AZM <sir>,
#> # IPM <sir>, MEM <sir>, MTR <sir>, CHL <sir>, COL <sir>, MUP <sir>, …
if (require("dplyr")) {
# flag the first isolates per group:
example_isolates %>%
group_by(ward) %>%
mutate(first = first_isolate(info = TRUE)) %>%
select(ward, date, patient, mo, first)
}
#> ℹ Determining first isolates using an episode length of 365 days
#> ℹ Basing inclusion on all antimicrobial results, using a points threshold
#> of 2
#>
#> Group: ward = "Clinical"
#> ℹ Excluding 9 isolates with a microbial ID 'UNKNOWN' (in column 'mo')
#> => Found 865 'phenotype-based' first isolates (70.1% of total where a
#> microbial ID was available)
#>
#> Group: ward = "ICU"
#> ℹ Excluding 6 isolates with a microbial ID 'UNKNOWN' (in column 'mo')
#> => Found 452 'phenotype-based' first isolates (70.0% of total where a
#> microbial ID was available)
#>
#> Group: ward = "Outpatient"
#> ℹ Excluding 1 isolates with a microbial ID 'UNKNOWN' (in column 'mo')
#> => Found 99 'phenotype-based' first isolates (82.5% of total where a
#> microbial ID was available)
#> # A tibble: 2,000 × 5
#> # Groups: ward [3]
#> ward date patient mo first
#> <chr> <date> <chr> <mo> <lgl>
#> 1 Clinical 2002-01-02 A77334 B_ESCHR_COLI TRUE
#> 2 Clinical 2002-01-03 A77334 B_ESCHR_COLI FALSE
#> 3 ICU 2002-01-07 067927 B_STPHY_EPDR TRUE
#> 4 ICU 2002-01-07 067927 B_STPHY_EPDR FALSE
#> 5 ICU 2002-01-13 067927 B_STPHY_EPDR FALSE
#> 6 ICU 2002-01-13 067927 B_STPHY_EPDR FALSE
#> 7 Clinical 2002-01-14 462729 B_STPHY_AURS TRUE
#> 8 Clinical 2002-01-14 462729 B_STPHY_AURS FALSE
#> 9 ICU 2002-01-16 067927 B_STPHY_EPDR TRUE
#> 10 ICU 2002-01-17 858515 B_STPHY_EPDR TRUE
#> # ℹ 1,990 more rows
# }