Installing with Docker (recommended)¶
The only dependency for running ImmuneDB in Docker is Docker itself, installed locally.
Pulling the Docker Image¶
With Docker installed, run one of the following commands:
# (Recommended) Pulls a specific release version
$ docker pull arosenfeld/immunedb:v0.24.1

# Pulls the most recent stable, but not yet released, version
$ docker pull arosenfeld/immunedb

# Pulls the development version
$ docker pull arosenfeld/immunedb:develop
Running the Container¶
To start a shell session within the container, run (replacing v0.24.1 with the version you pulled previously):
$ docker run -it arosenfeld/immunedb:v0.24.1
This will start a shell with ImmuneDB and accessory scripts pre-installed. The locations of important files are:
/root/germlines: IMGT aligned germlines for IGH, TCRA, and TCRB.
clearcut: The executable for generating lineages. This file is in the container’s $PATH.
/usr/local/sbin/serve_immunedb.sh: A helper script to serve the ImmuneDB web interface. This file is in the container’s $PATH.
/apps/bowtie2/bowtie2: The local-alignment tool Bowtie2. This file is in the container’s $PATH.
/share/configs: The recommended directory to store ImmuneDB configurations generated by immunedb_admin.
/share/mysql_data: The location MySQL (specifically MariaDB) will store its data.
/example: A set of example input data to familiarize yourself with ImmuneDB.
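Because /share holds both the generated configurations and the MySQL data, a common pattern is to bind-mount a host directory over it so the database survives container restarts. A hedged sketch follows: the host directory name immunedb_share is an arbitrary choice, and the docker command is printed rather than executed so the sketch runs anywhere.

```shell
# Sketch: persist the container's /share directory on the host via a
# bind mount. "immunedb_share" is an arbitrary example directory name.
mkdir -p "$HOME/immunedb_share"

# Printed rather than executed so this sketch is runnable anywhere.
CMD="docker run -it -v $HOME/immunedb_share:/share arosenfeld/immunedb:v0.24.1"
echo "$CMD"
```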
Running the Example Pipeline¶
To get started, two input FASTQ files and an associated metadata file are included at /example. We recommend running through this example before analyzing your own data to become familiar with the ImmuneDB pipeline.
First, create a database for the example. Note that the database’s root user does not have a password, so we specify a blank one.
$ immunedb_admin create example /share/configs/ --admin-pass ""
2018-06-08 17:44:20 [INFO] Creating user "example"
2018-06-08 17:44:20 [INFO] Creating database "example"
2018-06-08 17:44:20 [INFO] Creating config at /share/configs/example.json
2018-06-08 17:44:20 [INFO] Initializing tables
2018-06-08 17:44:21 [INFO] Success!
Now that a database has been created, run the V- and J-gene identification step:
$ immunedb_identify /share/configs/example.json \
      ~/germlines/imgt_human_ighv.fasta \
      ~/germlines/imgt_human_ighj.fasta \
      /example
2018-06-08 17:52:28 [INFO] Starting sample Donor1_Colon
# ... output truncated ...
2018-06-08 17:52:33 [INFO] Completed sample Donor1_Spleen in 0.1m - 1458/1470 (99%) identified
Then collapse the sequences across the samples:
$ immunedb_collapse /share/configs/example.json
2018-06-08 17:58:05 [INFO] Resetting collapse info for subject 1
# ... output truncated ...
2018-06-08 17:58:06 [INFO] Worker 2: Committing collapsed sequences
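Conceptually, collapsing merges reads with identical sequences across a subject’s samples into a single record with a total copy count. The idea can be sketched with standard shell tools (a toy illustration only, not ImmuneDB’s actual implementation; the sequences below are made up):

```shell
# Toy illustration of sequence collapsing (not ImmuneDB's algorithm):
# identical sequences observed in different samples are merged into one
# record with a total copy count. Sequences here are made-up examples.
printf '%s\t%s\n' \
    sample1 CASSLGTDTQYF \
    sample1 CASSLGTDTQYF \
    sample2 CASSLGTDTQYF \
    sample2 CASSPGQGYEQYF > reads.tsv

# Copy count of each unique sequence across all samples
cut -f2 reads.tsv | sort | uniq -c | sort -rn
```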
We will then infer clones using the CDR3 similarity method with all default parameters:
$ immunedb_clones /share/configs/example.json similarity
2018-06-08 18:00:31 [INFO] Generating task queue for subject 1
# ... output truncated ...
2018-06-08 18:00:34 [INFO] Skipping subclones
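Broadly, the similarity method groups sequences into clones that share gene calls and have highly similar CDR3s (see the ImmuneDB documentation for the exact criteria). The string comparison at the heart of that idea can be sketched in a few lines of awk; the hamming helper and the sequences are our own illustrative examples, not ImmuneDB’s code:

```shell
# Toy illustration of CDR3 similarity (not ImmuneDB's code): count the
# positions at which two equal-length CDR3 strings differ.
hamming() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = 0
    for (i = 1; i <= length(a); i++)
      if (substr(a, i, 1) != substr(b, i, 1)) n++
    print n
  }'
}

hamming CASSLGTDTQYF CASSLGNDTQYF   # prints 1
```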
We then calculate per-sample clone statistics:
$ immunedb_clone_stats /share/configs/example.json
2018-06-08 18:01:38 [INFO] Creating task queue to generate stats for 236 clones.
# ... output truncated ...
2018-06-08 18:01:43 [INFO] Worker 2: Clone 236
Optionally, we can also generate a lineage for each clone. To reduce the influence of sequencing error, we pass --min-count 2 to include only mutations that occur at least twice:
$ immunedb_clone_trees /share/configs/example.json --min-count 2
2018-06-08 15:12:07 [INFO] Creating task queue for clones
# ... output truncated ...
2018-06-08 15:12:08 [INFO] Worker 5: Running clone 236
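The effect of --min-count can be illustrated with a toy mutation list: a mutation seen only once is dropped as likely sequencing error, while one seen at least twice is kept. This is an illustrative sketch only; the mutation labels are made up and this is not ImmuneDB’s code:

```shell
# Toy illustration of the --min-count 2 idea (not ImmuneDB's code):
# a mutation observed in only one sequence may be sequencing error,
# so only mutations seen at least twice are kept.
printf '%s\n' A123G A123G C200T G45A G45A G45A > mutations.txt

# Keep mutations whose count is >= 2 (C200T is dropped)
sort mutations.txt | uniq -c | awk '$1 >= 2 { print $2 }'
```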
Another optional step is to use BASELINe to calculate selection pressure for each clone. Note that this is a relatively slow process, even for this small dataset:
$ immunedb_clone_pressure /share/configs/example.json \
      /apps/baseline/Baseline_Main.r
2018-06-08 23:34:32 [INFO] Creating task queue to calculate selection pressure for 236 clones.
# ... output truncated ...
2018-06-09 00:35:46 [INFO] Worker 4: Clone 236
The last step of the pipeline is to calculate statistics for each sample in the dataset:
$ immunedb_sample_stats /share/configs/example.json
2018-06-08 18:04:58 [INFO] Creating task queue to generate stats for sample 1.
# ... output truncated ...
2018-06-08 18:04:59 [INFO] Worker 1: Processing clones for sample 2, include_outliers False, only_full_reads False
At this point the database is fully populated and you can use the web interface and export data. First, let’s export the data in AIRR format and move it to /share/export so it is available to the host system:
$ mkdir /share/export
$ cd /share/export
$ immunedb_export /share/configs/example.json airr
2018-06-08 18:09:41 [INFO] Exporting subject D1
There should now be a D1.airr.tsv file in the container’s /share/export directory (and in the linked directory on the host, if you mounted /share). There is only one file because the AIRR export writes one file per subject, and this example has only the subject D1.
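AIRR rearrangement files are plain TSV, so standard command-line tools work for quick inspection. A sketch on a made-up two-row file follows; the column names are from the AIRR Rearrangement schema, but the rows are invented examples, not real export output:

```shell
# Build a tiny mock AIRR TSV (made-up rows; real exports contain many
# more columns defined by the AIRR Rearrangement schema).
printf '%s\t%s\t%s\t%s\n' \
    sequence_id v_call j_call junction_aa \
    seq1 IGHV1-2 IGHJ4 CASSLGTDTQYF \
    seq2 IGHV3-23 IGHJ6 CASSPGQGYEQYF > D1.airr.tsv

# Count rearrangements per V gene, skipping the header row
tail -n +2 D1.airr.tsv | cut -f2 | sort | uniq -c
```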
Finally, let’s view the data in the web interface using the included helper script. This takes a moment, so wait for the message webpack: Compiled successfully. before proceeding:
$ serve_immunedb.sh /share/configs/example.json
Running for database /share/configs/example.json
# ... output truncated ...
webpack: Compiled successfully.
You should now be able to navigate to the address the script prints to view the web interface.
At this point you’ve completed the example pipeline. For details on creating your own metadata file and tweaking the pipeline to your needs see Running the Data Analysis Pipeline and Command Line Reference.