MHC Format

We single out the requirements for MHC format used in the mhc column as we are aware that there are multiple nomenclatures in this field. In general, the format expected by our algorithm is similar to what you will find on the Immune Epitope Database And Analysis Resource website (IEDB). For the requirements on the MHC sequences in the mhcseq column, please refer to Amino Acids Sequences Format.

Overall

Our algorithm uses information provided in the mhc column in conjunction with the pmhc_species column to determine the MHC classes as well as other information. Therefore, before you get started:

Warning

Make sure that the information provided in the pmhc_species column is compatible with the corresponding information in the mhc column. Otherwise, the program will raise an exception.

Prefix

Our program is quite flexible on the input format of the names of MHCs. It is completely acceptable if you want to prepend HLA- to a human HLA (or H2- a mouse MHC) or ignore that prefix. It’s even fine if you have whitespaces: some where in the string or use H-2- instead of H2-.

Sample Input

Input Examples

Acceptable

HLA-A*01:01; HLA A*01:01; HLA- A*01:01; HLAA*01:01; A*01:01

Yes

Human A*01:01; MHC-A*01:01

No

H2-Db; H-2-Db; H2 Db; Db; H2Db

Yes

Mouse-Db; MHC-Db

No

Details

We have already computed the ESM embeddings of around 20,000 MHCs. Since they are stored using key-value pairs, if a value provided in the mhc column can not find an exact match, our algorithm will assume that it is a “new” MHC and invoke the ESM2 algorithm to encode the corresponding sequences. This could significantly slow down the entire process. Or the program will halt and throw an error if the sequences are not provided.

mhc requirements

Human Class I

One name for the HLA would suffice. Our program will use input value as the name for the Alpha chain and impute the Beta chain using human_microglobulin.

Human Class II HLA that starts with DP or DQ

Names for both chains should be provided. The format we assume is MHC Alpha followed by a forward slash /, which is then followed by MHC Beta.

Human Class II HLA that starts with DR

There are two possible scenarios that we take into account. If both the user provided information on both chains, then the inference method follows that of the HLA DP and DQ. On the other hand, if only the information on Beta chain is supplied, then our program use the input value as the name for the Beta chain and impute the Alpha chain as DRA*01:01.

Mouse Class I

One name for the MHC would suffice. Our program will use input value as the name for the Alpha chain and impute the Beta chain using mouse_microglobulin.

Mouse Class II

One name for the MHC would suffice. Our program will automatically extract the alpha and beta chain sequences from our database.

Sample Input

Class

mhc

Human Class I

A*01:01

Human Class II: Only DRB

DRB1*01:01

Human Class II: DRA and DRB

DRA*01:01/DRB1*01:01

Human Class II: DP

DPA1*04:02/DPB1*01:01

Human Class II: DQ

DQA1*06:04/DQB1*02:07

Mouse Class I

H-2-Db

Mouse Class II

H-2-IAk

Note

If you are still not sure whether or not the information you supplied conforms with our standard, we also provided some rudimentary functionalities to help you. Please refer to Data Curation where we guide you through the process.