Antimicrobial Resistance Genes and Mutations Detection#

We will use two different tools and their associated databases to detect genes and mutations that could confer antimicrobial resistance to the bacteria carrying them. These tools are AMRFinderPlus and CARD RGI (the Resistance Gene Identifier).

In this practical you will learn to run these tools and their associated databases, to inspect the output files, and to visualise the results graphically.

Prepare our computing environment#

We will first run the appropriate srun command to reserve computing cores (CPUs) on the cluster.

Tip

You need to ask the teacher which partition to use!

srun -p SELECTED_PARTITION --cpus-per-task 2 --pty bash -i

You are now on a computing node, with 2 CPUs reserved for you. That way, you can run commands interactively.
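
To check that the reservation is active, you can for example inspect the SLURM environment (SLURM_CPUS_PER_TASK is set by SLURM when --cpus-per-task is used):

hostname                     # should print the name of a computing node, not the login node
echo $SLURM_CPUS_PER_TASK    # should print 2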

If you want to exit the srun interactive mode, press CTRL+D or type exit

First, you need to identify, in your results directory, the FASTA file corresponding to the Unicycler hybrid assembly.

cd results
ls -l
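
If your assemblies follow the naming used later in this practical (an assumption; adapt the pattern to your actual file names), you can list them directly:

ls -l K*_unicycler_scaffolds.fasta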

AMRFinderPlus#

First we will open a shell in the AMRFinderPlus container, then check the options and the different organism datasets available for resistance mutation detection:

module load system/singularity/3.6.0

singularity shell path/to/amrfinder_container

amrfinder -h
amrfinder --list_organisms
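
The organism list can be long; if you wish, you can filter it for the Klebsiella entry (the 2>&1 is there only in case your AMRFinderPlus version prints the list on stderr):

amrfinder --list_organisms 2>&1 | grep -i klebsiella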

Now that we have identified the --organism option value to use, we can run the command:

amrfinder --nucleotide K2_unicycler_scaffolds.fasta -o K2_unicycler_AMRfinder.txt --organism Klebsiella --plus --mutation_all K2_unicycler_AMRfinder_all_mut.txt

Question

Inspect both output files and comment. Which resistances could be identified?
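
To get a more readable view of the tab-delimited reports in the terminal, you can for example align the columns (a generic shell trick, not an AMRFinderPlus feature):

column -t -s$'\t' K2_unicycler_AMRfinder.txt | less -S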

Now that we are more comfortable with the command line, we can prepare a generic script, amrfinder_script.sh, to run AMRFinderPlus on all our strains' draft genomes, using a loop to sbatch the jobs. Create it in the scripts directory. Here is an example script that we will explain, and in which you need to adapt some paths...

#!/bin/bash
## SLURM CONFIG ##

#SBATCH --job-name=amrfinder
#SBATCH --output=%x.%j.out
#SBATCH --cpus-per-task 4
#SBATCH --time=24:00:00
#SBATCH -p SELECTED_PARTITION
#SBATCH --mail-type=FAIL,END
#SBATCH --mem-per-cpu=4G

###
HELP="USAGE: amrfinderplus.sh contigs.fasta"
# If we didn't get any arguments, print help and exit
if [[ $# -lt 1 ]]
then
    echo "$HELP"
    exit 0
fi

# variables: strip the .fasta suffix from the input to build the output prefix
suffix=".fasta"
prefix1="${1%$suffix}"

##
module load system/singularity/3.6.0

# run AMRFinderPlus inside the container on the assembly passed as first argument
singularity run path/to/amrfinder_container \
  amrfinder --nucleotide ${1} -o ${prefix1}_AMRfinder.txt --organism Klebsiella --plus --mutation_all ${prefix1}_AMRfinder_all_mut.txt
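
The ${1%$suffix} construct is plain bash parameter expansion: it strips the .fasta suffix from the first argument so that the output files share the input's base name. A quick illustration you can run in any shell:

# hypothetical example, outside the script
input=K2_unicycler_scaffolds.fasta
echo "${input%.fasta}"    # prints K2_unicycler_scaffolds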

Once your script seems ready, exit the container, and exit the srun session if you are still on a computing node. Then you are ready to loop over the Unicycler FASTA files to run the script with sbatch.

First, we can test it on one strain:

sbatch ../scripts/amrfinder_script.sh K1_unicycler_scaffolds.fasta
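
You can then follow the job and inspect its log; the log file name comes from the --output=%x.%j.out line in the script (job name, then job ID):

squeue -u $USER                # check whether the job is pending or running
less amrfinder.JOBID.out       # replace JOBID with the number printed by sbatch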

If the test has worked, you can run the loop:

for i in K*_unicycler_scaffolds.fasta; do
  sbatch ../scripts/amrfinder_script.sh "$i"
done

CARD RGI#

Now that we are more familiar with preparing SLURM scripts, we will write card_script.sh to run CARD RGI on all samples.

#!/bin/bash
## SLURM CONFIG ##

#SBATCH --job-name=card
#SBATCH --output=%x.%j.out
#SBATCH --cpus-per-task 4
#SBATCH --time=24:00:00
#SBATCH -p highmemplus
#SBATCH --mail-type=FAIL,END
#SBATCH --mem-per-cpu=4G

###

HELP="
USAGE: card_script.sh contigs.fasta
"
# If we didn't get any arguments, print help and exit
if [[ $# -lt 1 ]]
then
    echo "$HELP"
    exit 0
fi

# variables: strip the .fasta suffix from the input to build the output prefix
suffix=".fasta"
prefix1="${1%$suffix}"

##
module load system/singularity/3.6.0

# loading the CARD database
singularity run /path/to/containers/rgi_6.0.0--pyha8f3691_0.sif \
  rgi load --card_json /path/to/databases/CARD/card.json

# Running RGI Main => output json for Heatmap plotting
singularity run /path/to/containers/rgi_6.0.0--pyha8f3691_0.sif \
  rgi main -i ${1} -o ${prefix1}_RGI_main --debug -a BLAST -d wgs -n 4

# Preparing tabular file for import to excel or R
singularity run /path/to/containers/rgi_6.0.0--pyha8f3691_0.sif \
  rgi tab -i ${prefix1}_RGI_main.json

# clean up temporary files created by RGI
rm ${1}.temp.*
rm ${1}.temp

Make a test run on one sample before running the loop.
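
As for AMRFinderPlus, the test can simply be (assuming the same file naming as above):

sbatch ../scripts/card_script.sh K1_unicycler_scaffolds.fasta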

If the test has worked, you can run the loop:

for i in K*_unicycler_scaffolds.fasta; do
  sbatch ../scripts/card_script.sh "$i"
done
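
Before moving on, check that one JSON file was produced per strain (the names follow the -o option used in the script):

ls -l *_RGI_main.json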

Once we have all the json files from CARD RGI, we will place them in the same directory.

mkdir card_json
mv *_RGI_main.json card_json

After that, we can run the RGI heatmap program to cluster the results. Run the srun command if you are not already on a computing node:

srun -p SELECTED_PARTITION --cpus-per-task 2 --pty bash -i

Run RGI heatmap:

module load system/singularity/3.6.0

singularity shell path/to/card_container

#Generate a heat map from pre-compiled RGI main JSON files, samples clustered by similarity of resistome and AMR genes organized by Drug Class
rgi heatmap -i card_json -cat drug_class -o hm_drug_class -clus samples

Tip

You can cluster/classify the resistance genes in other ways, as in the following examples:

#Generate a heat map from pre-compiled RGI main JSON files, samples clustered by similarity of resistome and AMR genes organized by resistance mechanism
rgi heatmap -i card_json -cat resistance_mechanism -o hm_resistance_mechanism -clus samples
#Generate a heat map from pre-compiled RGI main JSON files, samples clustered by similarity of resistome and AMR genes organized by AMR gene family
rgi heatmap -i card_json -cat gene_family -o hm_genefamily_samples -clus samples

#Generate a heat map from pre-compiled RGI main JSON files, samples clustered by similarity of resistome (with histogram used for abundance of identical resistomes) and AMR genes organized by distribution among samples:
rgi heatmap -i card_json -o cluster_both_frequency -f -clus both

Then you can scp all the pictures to your computer and inspect them.
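
For example, from your local machine (the login, cluster address and results path below are placeholders to adapt):

scp "YOUR_LOGIN@CLUSTER_ADDRESS:/path/to/results/hm_*.png" .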