MHC Format
We single out the requirements for the MHC format used in
the mhc column because multiple nomenclatures exist
in this field.
In general, the format expected by our algorithm is similar
to what you will find on the Immune Epitope Database And
Analysis Resource website (IEDB). For the requirements on
the MHC sequences in the mhcseq column, please refer to
Amino Acids Sequences Format.
Overall
Our algorithm uses information provided in the mhc column
in conjunction with the pmhc_species column to determine the
MHC classes as well as other information. Therefore, before you
get started:
Warning
Make sure that the information provided in the pmhc_species column
is compatible with the corresponding information in the
mhc column. Otherwise, the program will raise an exception.
Prefix
Our program is quite flexible about the input format of MHC
names. You may prepend HLA- to a human HLA (or H2- to a mouse
MHC), or omit the prefix entirely. Whitespace anywhere in the
string is also accepted, and H-2- may be used in place of H2-.
| Input Examples | Acceptable |
|---|---|
| HLA-A*01:01; HLA A*01:01; HLA- A*01:01; HLAA*01:01; A*01:01 | Yes |
| Human A*01:01; MHC-A*01:01 | No |
| H2-Db; H-2-Db; H2 Db; Db; H2Db | Yes |
| Mouse-Db; MHC-Db | No |
Details
We have already computed the ESM embeddings of around 20,000 MHCs. They are
stored as key-value pairs, so if a value provided in the mhc column has no
exact match among the keys, our algorithm assumes that it is a “new” MHC and
invokes the ESM2 algorithm to encode the corresponding sequences. This can
significantly slow down the entire process. If the sequences are not
provided, the program halts and throws an error.
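The lookup behaviour described above can be illustrated with a short Python sketch. Everything here is hypothetical: get_mhc_embedding, the dictionary cache, and the encode callable standing in for ESM2 are illustrative names, and the choice of ValueError is an assumption, since the exact exception type is not stated.

```python
from typing import Callable, Dict, List, Optional

Embedding = List[float]


def get_mhc_embedding(
    name: str,
    cache: Dict[str, Embedding],
    sequence: Optional[str] = None,
    encode: Optional[Callable[[str], Embedding]] = None,
) -> Embedding:
    """Exact-match lookup with a slow fallback for names missing from the cache."""
    # Fast path: the name matches a precomputed key exactly.
    if name in cache:
        return cache[name]
    # A "new" MHC needs its sequence (and an encoder) to be embedded.
    if sequence is None or encode is None:
        raise ValueError(f"no precomputed embedding for {name!r} and no sequence given")
    # Slow path: encode the sequence (assumption: we also store the result).
    embedding = encode(sequence)
    cache[name] = embedding
    return embedding


if __name__ == "__main__":
    toy_cache = {"A*01:01": [0.1, 0.2]}

    def toy_encoder(seq: str) -> Embedding:
        # Stand-in for the real encoder; returns a dummy vector.
        return [float(len(seq))]

    print(get_mhc_embedding("A*01:01", toy_cache))
    print(get_mhc_embedding("A*99:99", toy_cache, "MAVMAPRTLL", toy_encoder))
```

Because matching is exact, a small spelling difference in the mhc value is enough to trigger the slow path, which is why normalising names beforehand pays off.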
mhc requirements
- Human Class I
One name for the HLA suffices. Our program uses the input value as the name of the Alpha chain and imputes the Beta chain as human_microglobulin.
- Human Class II HLA that starts with DP or DQ
Names for both chains must be provided. The format we assume is MHC Alpha, followed by a forward slash /, followed by MHC Beta.
- Human Class II HLA that starts with DR
We take two scenarios into account. If the user provides information on both chains, the inference method follows that of HLA DP and DQ. If only the Beta chain is supplied, our program uses the input value as the name of the Beta chain and imputes the Alpha chain as DRA*01:01.
- Mouse Class I
One name for the MHC suffices. Our program uses the input value as the name of the Alpha chain and imputes the Beta chain as mouse_microglobulin.
- Mouse Class II
One name for the MHC suffices. Our program automatically extracts the Alpha and Beta chain sequences from our database.
| Class | mhc |
|---|---|
| Human Class I | A*01:01 |
| Human Class II: Only DRB | DRB1*01:01 |
| Human Class II: DRA and DRB | DRA*01:01/DRB1*01:01 |
| Human Class II: DP | DPA1*04:02/DPB1*01:01 |
| Human Class II: DQ | DQA1*06:04/DQB1*02:07 |
| Mouse Class I | H-2-Db |
| Mouse Class II | H-2-IAk |
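As an illustration of the human rules listed above, the following Python sketch maps an mhc value to an (Alpha, Beta) pair of chain names. It is a simplified reading of those rules, not the program's implementation: resolve_human_chains is a hypothetical name, the input is assumed to have had any HLA- prefix removed already, and mouse names are deliberately left out.

```python
from typing import Tuple


def resolve_human_chains(mhc: str) -> Tuple[str, str]:
    """Return (alpha, beta) chain names for a prefix-free human mhc value."""
    name = mhc.strip()
    if "/" in name:
        # Two-chain form: MHC Alpha / MHC Beta (Class II only).
        alpha, beta = (part.strip() for part in name.split("/", 1))
        if not alpha or not beta or not alpha.startswith(("DP", "DQ", "DR")):
            raise ValueError(f"malformed two-chain name: {mhc!r}")
        return alpha, beta
    if name.startswith(("DP", "DQ")):
        # DP and DQ need both chains spelled out.
        raise ValueError(f"both chains are required for {mhc!r}")
    if name.startswith("DRB"):
        # Beta chain only: impute the Alpha chain.
        return "DRA*01:01", name
    if name.startswith("DR"):
        raise ValueError(f"a DR Beta chain is required for {mhc!r}")
    # Class I: the input names the Alpha chain; the Beta chain is imputed.
    return name, "human_microglobulin"


if __name__ == "__main__":
    for value in ["A*01:01", "DRB1*01:01", "DRA*01:01/DRB1*01:01",
                  "DPA1*04:02/DPB1*01:01", "DQA1*06:04/DQB1*02:07"]:
        print(value, "->", resolve_human_chains(value))
```

The sketch only checks the shape of the name; whether a given allele actually exists in the database is a separate question.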
Note
If you are still not sure whether the information you supplied conforms to our standard, we also provide some rudimentary functionality to help you. Please refer to Data Curation, where we guide you through the process.