How to predict antimicrobial resistance
Source:vignettes/resistance_predict.Rmd
resistance_predict.Rmd
Needed R packages
As with many uses in R, we need some additional packages for AMR data
analysis. Our package works closely together with the tidyverse packages dplyr
and ggplot2
. The
tidyverse tremendously improves the way we conduct data science - it
allows for a very natural way of writing syntaxes and creating beautiful
plots in R.
Our AMR
package depends on these packages and even
extends their use and functions.
Prediction analysis
Our package contains a function resistance_predict()
,
which takes the same input as functions for other
AMR data analysis. Based on a date column, it calculates cases per
year and uses a regression model to predict antimicrobial
resistance.
It is basically as easy as:
# resistance prediction of piperacillin/tazobactam (TZP):
resistance_predict(tbl = example_isolates, col_date = "date", col_ab = "TZP", model = "binomial")
# or:
example_isolates %>%
resistance_predict(
col_ab = "TZP",
model = "binomial"
)
# to bind it to object 'predict_TZP' for example:
predict_TZP <- example_isolates %>%
resistance_predict(
col_ab = "TZP",
model = "binomial"
)
The function will look for a date column itself if
col_date
is not set.
When running any of these commands, a summary of the regression model
will be printed unless using
resistance_predict(..., info = FALSE)
.
This text is only a printed summary - the actual result (output) of
the function is a data.frame
containing for each year: the
number of observations, the actual observed resistance, the estimated
resistance and the standard error below and above the estimation:
predict_TZP
#> # A tibble: 34 × 7
#> year value se_min se_max observations observed estimated
#> * <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 2002 0.2 NA NA 15 0.2 0.0562
#> 2 2003 0.0625 NA NA 32 0.0625 0.0616
#> 3 2004 0.0854 NA NA 82 0.0854 0.0676
#> 4 2005 0.05 NA NA 60 0.05 0.0741
#> 5 2006 0.0508 NA NA 59 0.0508 0.0812
#> 6 2007 0.121 NA NA 66 0.121 0.0889
#> 7 2008 0.0417 NA NA 72 0.0417 0.0972
#> 8 2009 0.0164 NA NA 61 0.0164 0.106
#> 9 2010 0.0566 NA NA 53 0.0566 0.116
#> 10 2011 0.183 NA NA 93 0.183 0.127
#> # ℹ 24 more rows
The function plot
is available in base R, and can be
extended by other packages to depend the output based on the type of
input. We extended its function to cope with resistance predictions:
plot(predict_TZP)
This is the fastest way to plot the result. It automatically adds the right axes, error bars, titles, number of available observations and type of model.
We also support the ggplot2
package with our custom
function ggplot_sir_predict()
to create more appealing
plots:
ggplot_sir_predict(predict_TZP)
# choose for error bars instead of a ribbon
ggplot_sir_predict(predict_TZP, ribbon = FALSE)
Choosing the right model
Resistance is not easily predicted; if we look at vancomycin resistance in Gram-positive bacteria, the spread (i.e. standard error) is enormous:
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "binomial") %>%
ggplot_sir_predict()
Vancomycin resistance could be 100% in ten years, but might remain very low.
You can define the model with the model
parameter. The
model chosen above is a generalised linear regression model using a
binomial distribution, assuming that a period of zero resistance was
followed by a period of increasing resistance leading slowly to more and
more resistance.
Valid values are:
Input values | Function used by R | Type of model |
---|---|---|
"binomial" or "binom" or
"logit"
|
glm(..., family = binomial) |
Generalised linear model with binomial distribution |
"loglin" or "poisson"
|
glm(..., family = poisson) |
Generalised linear model with poisson distribution |
"lin" or "linear"
|
lm() |
Linear model |
For the vancomycin resistance in Gram-positive bacteria, a linear model might be more appropriate:
example_isolates %>%
filter(mo_gramstain(mo, language = NULL) == "Gram-positive") %>%
resistance_predict(col_ab = "VAN", year_min = 2010, info = FALSE, model = "linear") %>%
ggplot_sir_predict()
The model itself is also available from the object, as an
attribute
:
model <- attributes(predict_TZP)$model
summary(model)$family
#>
#> Family: binomial
#> Link function: logit
summary(model)$coefficients
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -200.67944891 46.17315349 -4.346237 1.384932e-05
#> year 0.09883005 0.02295317 4.305725 1.664395e-05