Data Models — immunedb.common.models

class immunedb.common.models.Clone(**kwargs)

A group of sequences likely originating from the same germline

Parameters:
  • id (int) – An auto-assigned unique identifier for the clone
  • functional (bool) – If the clone is functional
  • v_gene (str) – The V-gene assigned to the clone
  • j_gene (str) – The J-gene assigned to the clone
  • cdr3_nt (str) – The consensus nucleotides for the clone
  • cdr3_num_nts (int) – The number of nucleotides in the group’s CDR3
  • cdr3_aa (str) – The amino-acid sequence of the group’s CDR3
  • subject_id (int) – The ID of the subject to which the clone belongs
  • subject (Relationship) – Reference to the associated Subject instance
  • germline (str) – The germline sequence for this clone
  • tree (str) – The textual representation of the clone’s lineage tree
  • parent_id (int) – The (possibly null) ID of the clone’s parent
consensus_germline

Returns the consensus germline for the clone

regions

Returns the IMGT region boundaries for the clone

class immunedb.common.models.CloneStats(**kwargs)

Stores statistics for a given clone and sample. If sample is null the statistics are for the specified clone in all samples.

Parameters:
  • clone_id (int) – The clone ID
  • clone (Relationship) – Reference to the associated Clone instance
  • functional (bool) – If the associated clone is functional. This is a denormalized field.
  • sample_id (int) – The sample ID
  • sample (Relationship) – Reference to the associated Sample instance
  • unique_cnt (int) – The number of unique sequences in the clone in the sample
  • total_cnt (int) – The number of total sequences in the clone in the sample
  • mutations (str) – A JSON stanza of mutation count information
  • selection_pressure (str) – A JSON stanza of selection pressure information
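
As an illustration of the "null sample means all samples" convention described above, the following is a minimal sketch using plain dictionaries rather than real CloneStats instances. The rows, the helper overall_stats, and the layout of the mutations JSON are all hypothetical; the actual JSON structure is not documented here.

```python
import json

# Hypothetical rows mimicking CloneStats fields. The content of the
# `mutations` JSON is invented purely for illustration.
rows = [
    {"clone_id": 1, "sample_id": None, "unique_cnt": 5, "total_cnt": 12,
     "mutations": json.dumps({"example_key": 7})},
    {"clone_id": 1, "sample_id": 10, "unique_cnt": 3, "total_cnt": 8,
     "mutations": json.dumps({"example_key": 4})},
    {"clone_id": 1, "sample_id": 11, "unique_cnt": 2, "total_cnt": 4,
     "mutations": json.dumps({"example_key": 3})},
]


def overall_stats(rows, clone_id):
    """Return the row covering all samples (sample_id is None), if any."""
    for row in rows:
        if row["clone_id"] == clone_id and row["sample_id"] is None:
            return row
    return None


overall = overall_stats(rows, 1)
# The JSON fields are stored as strings and must be decoded before use.
overall_mutations = json.loads(overall["mutations"])
```

Against a real database the same selection would presumably be a query filtering on a null sample_id, but that query is not shown here.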
class immunedb.common.models.DuplicateSequence(**kwargs)

A sequence which is a duplicate of a Sequence. This is used to minimize the size of the sequences table. The copy_number attribute of Sequence instances is equal to the number of its duplicate sequences plus one.

Parameters:
  • pk (int) – A primary key for this duplicate sequence
  • seq_id (str) – A unique identifier for the sequence as output by the sequencer
  • duplicate_seq_ai (str) – The auto-increment value of the Sequence in the same sample that has the identical sequence
  • duplicate_seq (Relationship) – Reference to the associated Sequence instance of which this is a duplicate
  • sample_id (int) – The ID of the sample from which this sequence came
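
The relationship between duplicates and copy_number stated above (copy number equals the number of duplicates plus one) can be sketched in plain Python. This is not ImmuneDB's collapsing code: the reads, the helper collapse_reads, and the use of exact string equality to detect duplicates are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (seq_id, nucleotide sequence) reads from a single sample.
reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "TTGA"), ("r4", "ACGT")]


def collapse_reads(reads):
    """Keep the first read of each distinct sequence; the rest are duplicates."""
    representative = {}  # sequence -> seq_id of the retained read
    duplicates = []      # (duplicate seq_id, seq_id of the retained read)
    for seq_id, seq in reads:
        if seq in representative:
            duplicates.append((seq_id, representative[seq]))
        else:
            representative[seq] = seq_id
    dup_counts = Counter(kept for _, kept in duplicates)
    # copy_number = number of duplicates + 1 (the retained read itself)
    copy_number = {kept: dup_counts[kept] + 1
                   for kept in representative.values()}
    return copy_number, duplicates


copy_number, duplicates = collapse_reads(reads)
```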
class immunedb.common.models.ModificationLog(**kwargs)

A log message for a database modification

Parameters:
  • id (int) – The ID of the log message
  • datetime (datetime) – The date and time of the message
  • action_type (str) – A short string representing the action
  • info (str) – A JSON stanza with log message information
class immunedb.common.models.NoResult(**kwargs)

A sequence which could not be matched with a V or J gene.

Parameters:
  • pk (int) – A primary key for this no result
  • seq_id (str) – A unique identifier for the sequence as output by the sequencer
  • sample_id (int) – The ID of the sample from which this sequence came
  • sample (Relationship) – Reference to the associated Sample instance
  • sequence (str) – The sequence of the non-identifiable input
  • quality (str) – The quality of the non-identifiable input
class immunedb.common.models.Sample(**kwargs)

A sample taken from a single subject, tissue, and subset.

Parameters:
  • id (int) – An auto-assigned unique identifier for the sample
  • name (str) – A unique name for the sample as defined by the experimenter
  • info (str) – Optional information about the sample
  • date (date) – The date the sample was taken
  • study_id (int) – The ID of the study under which the subject was sampled
  • study (Relationship) – Reference to the associated Study instance
  • subject_id (int) – The ID of the subject from which the sample was taken
  • subject (Relationship) – Reference to the associated Subject instance
  • subset (str) – The tissue subset of the sample
  • tissue (str) – The tissue of the sample
  • ig_class (str) – The class of cells of the sample (e.g. IgA)
  • disease (str) – The known disease(s) present in the sample
  • lab (str) – The lab which acquired the sample
  • experimenter (str) – The name of the experimenter who took the sample
  • v_primer (str) – A description of the V gene primer used (if any)
  • j_primer (str) – A description of the J gene primer used (if any)
  • v_ties_mutations (float) – Average mutation rate of sequences in the sample
  • v_ties_len (float) – Average length of sequences in the sample
class immunedb.common.models.SampleStats(**kwargs)

Aggregate statistics for a sample. This exists to reduce the time queries take for a sample.

Parameters:
  • sample_id (int) – The ID of the sample for which the statistics were generated
  • sample (Relationship) – Reference to the associated Sample instance
  • filter_type (str) – The type of filter for the statistics (e.g. functional)
  • outliers (bool) – If outliers were included in the statistics
  • full_reads (bool) – If only full reads were included in the statistics
  • v_identity_dist (str) – Distribution of V gene identity
  • v_match_dist (str) – Distribution of V gene match count
  • v_length_dist (str) – Distribution of V gene total length
  • j_match_dist (str) – Distribution of J gene match count
  • j_length_dist (str) – Distribution of J gene total length
  • v_gene_dist (str) – Distribution of V-gene assignments
  • j_gene_dist (str) – Distribution of J-gene assignments
  • copy_number_dist (str) – Distribution of copy number
  • cdr3_length_dist (str) – Distribution of CDR3 lengths
  • sequence_cnt (int) – The total number of sequences
  • in_frame_cnt (int) – The number of in-frame sequences
  • stop_cnt (int) – The number of sequences containing a stop codon
  • functional_cnt (int) – The number of functional sequences
  • no_result_cnt (int) – The number of invalid sequences
class immunedb.common.models.Sequence(**kwargs)

Represents a single unique sequence.

Parameters:
  • ai (int) – An auto-incremented value for the sequence
  • subject_id (int) – The ID of the subject for this sequence
  • seq_id (str) – A unique identifier for the sequence as output by the sequencer
  • sample_id (int) – The ID of the sample from which this sequence came
  • sample (Relationship) – Reference to the associated Sample instance
  • partial (bool) – If the sequence is a partial read
  • probable_indel_or_misalign (bool) – If the sequence likely has an indel or is a bad alignment
  • v_gene (str) – The V-gene assigned to the sequence
  • j_gene (str) – The J-gene assigned to the sequence
  • num_gaps (int) – Number of inserted gaps
  • pad_length (int) – The number of pad nucleotides added to the V end of the sequence
  • v_match (int) – The number of V-gene nucleotides matching the germline
  • v_length (int) – The length of the V-gene segment prior to a streak of mismatches in the CDR3
  • j_match (int) – The number of J-gene nucleotides matching the germline
  • j_length (int) – The length of the J-gene segment after a streak of mismatches in the CDR3
  • removed_prefix (str) – The sequence (if any) which was removed from the beginning of the sequence during alignment. Possibly used during indel correction
  • removed_prefix_qual (str) – The quality (if any) which was removed from the beginning of the sequence during alignment. Possibly used during indel correction
  • pre_cdr3_length (int) – The length of the V-gene prior to the CDR3
  • pre_cdr3_match (int) – The number of V-gene nucleotides matching the germline prior to the CDR3
  • post_cdr3_length (int) – The length of the J-gene after the CDR3
  • post_cdr3_match (int) – The number of J-gene nucleotides matching the germline after the CDR3
  • in_frame (bool) – If the sequence’s CDR3 has a length divisible by 3
  • functional (bool) – If the sequence is functional
  • stop (bool) – If the sequence contains a stop codon
  • copy_number (int) – Number of reads in the sample which collapsed to this sequence
  • cdr3_num_nts (int) – The number of nucleotides in the CDR3
  • cdr3_nt (str) – The nucleotides comprising the CDR3
  • cdr3_aa (str) – The amino-acids comprising the CDR3
  • sequence (str) – The (possibly-padded) sequence
  • quality (str) – Optional Phred quality score (in Sanger format) for each base in sequence
  • germline (str) – The germline sequence for this sequence
  • clone_id (int) – The clone ID to which this sequence belongs
  • clone (Relationship) – Reference to the associated Clone instance
  • mutations_from_clone (str) – A JSON stanza with mutation information
clone_sequence

Gets the sequence within the context of the associated clone by adding insertions from other sequences to this one.

get_v_extent(in_clone)

Returns the estimated V length, including the portion in the CDR3

original_quality

Returns the original quality given with the J end trimmed to the germline

original_sequence

Returns the original sequence given with the J end trimmed to the germline

regions

Returns the IMGT region boundaries for the sequence
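
Two of the Sequence fields above have simple definitions that can be sketched directly: quality holds Phred scores in Sanger format (each character's ASCII code minus 33), and in_frame is true when the CDR3 length is divisible by 3. The helper names phred_scores and is_in_frame are hypothetical and are not part of immunedb.common.models.

```python
def phred_scores(quality):
    """Decode a Sanger-format quality string (ASCII offset 33) to integers."""
    return [ord(ch) - 33 for ch in quality]


def is_in_frame(cdr3_num_nts):
    """Mirror the documented in_frame rule: CDR3 length divisible by 3."""
    return cdr3_num_nts % 3 == 0


scores = phred_scores("I5!")   # one score per base
frame_ok = is_in_frame(45)     # 45 nucleotides = 15 codons
```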

class immunedb.common.models.SequenceCollapse(**kwargs)

A one-to-many table that links sequences from different samples that collapse to one another. This is used instead of a field in Sequence for performance reasons.

Parameters:
  • sample_id (int) – The ID of the sample with the sequence being collapsed
  • seq_ai (int) – The auto-increment value of the sequence being collapsed
  • clone (Relationship) – Reference to the associated Sequence instance being collapsed
  • collapse_to_subject_sample_id (int) – The ID of the sample to which the collapsed-to sequence belongs
  • collapse_to_subject_seq_ai (int) – The auto-increment value of the sequence being collapsed to
  • collapse_to_subject_seq_id (int) – The sequence ID of the sequence being collapsed to. This is a denormalized field.
  • instances_in_subject (int) – The number of instances of the sequence in the subject
  • copy_number_in_subject (int) – The aggregate copy number of the sequence in the subject
collapse_to_seq

Returns the sequence being collapsed to
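
To make the subject-level fields concrete, here is a rough sketch of how sequences from several samples might be grouped and summarized. It is not ImmuneDB's collapsing algorithm: the records, the helper collapse_across_samples, the exact-match grouping, and the choice of the highest-copy-number record as the collapse target are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical per-sample records: (sample_id, ai, sequence, copy_number)
records = [
    (1, 100, "ACGT", 3),
    (2, 200, "ACGT", 2),
    (2, 201, "GGCC", 1),
]


def collapse_across_samples(records):
    """Group identical sequences and summarize each group at its target."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[2]].append(rec)

    summary = {}
    for members in groups.values():
        # Assumption: the record with the highest copy number is the target.
        target = max(members, key=lambda r: r[3])
        summary[(target[0], target[1])] = {
            "collapsed": [(m[0], m[1]) for m in members],
            "instances_in_subject": len(members),
            "copy_number_in_subject": sum(m[3] for m in members),
        }
    return summary


summary = collapse_across_samples(records)
```

The keys of summary play the role of (collapse_to_subject_sample_id, collapse_to_subject_seq_ai) in this sketch.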

class immunedb.common.models.Study(**kwargs)

A study which aggregates related samples.

Parameters:
  • id (int) – An auto-assigned unique identifier for the study
  • name (str) – A unique name for the study
  • info (str) – Optional information about the study
class immunedb.common.models.Subject(**kwargs)

A subject which was sampled for a study.

Parameters:
  • id (int) – An auto-assigned unique identifier for the subject
  • identifier (str) – An identifier for the subject as defined by the experimenter
  • study_id (int) – The ID of the study under which the subject was sampled
  • study (Relationship) – Reference to the associated Study instance
immunedb.common.models.check_string_length(cls, key, inst)

Checks if a string can properly fit into a given field. If it is too long, a ValueError is raised. This prevents MySQL from truncating fields that are too long.
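
The idea of failing loudly instead of letting the database truncate a value can be sketched independently of any ORM. The helper ensure_fits below is hypothetical and uses a different signature from check_string_length; how the real function obtains the column's maximum length is not shown in this reference.

```python
def ensure_fits(field_name, value, max_length):
    """Raise ValueError if `value` is longer than `max_length` characters."""
    if value is not None and len(value) > max_length:
        raise ValueError(
            "Value for {} is {} characters long; the maximum is {}".format(
                field_name, len(value), max_length))
    return value


# A value within the limit passes through unchanged.
name = ensure_fits("name", "sample_A", 16)

# An over-long value raises instead of being silently truncated.
try:
    ensure_fits("name", "x" * 17, 16)
    too_long_rejected = False
except ValueError:
    too_long_rejected = True
```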