API Reference

This page documents the data curation functions.

pMTnet_Omni_Document.data_curation.check_column_names(df: DataFrame) → DataFrame[source]

Check if the column names are correct

The main purpose of this function is to make sure that the dataframe provided by the user contains the necessary columns so that it can be used by subsequent functions.

This function will NOT create mhca, mhcb, mhcaseq, or mhcbseq. It keeps the original columns mhc and mhcseq.

Parameters: df (pd.DataFrame) – A pandas dataframe containing pairing data
Returns: A pandas dataframe with corrected column names
Return type: pd.DataFrame
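
The kind of check described above can be sketched in plain Python. This is only an illustration and not the package's implementation: the helper find_missing_columns and the REQUIRED_COLUMNS list are hypothetical, and the real function may require a different set of columns (only mhc, mhcseq, and pmhc_species are named on this page).

```python
from typing import Iterable, List

# Hypothetical required columns, for illustration only.
REQUIRED_COLUMNS = ["mhc", "mhcseq", "pmhc_species"]


def find_missing_columns(columns: Iterable[str],
                         required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required names that are absent, ignoring case and surrounding spaces."""
    present = {name.strip().lower() for name in columns}
    return [name for name in required if name not in present]


# Column names as a user might supply them.
user_columns = ["MHC", " mhcseq ", "peptide"]
missing = find_missing_columns(user_columns)
print(missing)  # ['pmhc_species']
```

With a real dataframe, the same helper could be called on df.columns before passing the data to the later curation steps.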

pMTnet_Omni_Document.data_curation.check_species(df: DataFrame) → DataFrame[source]

Check the TCR species and pMHC species

Parameters: df (pd.DataFrame) – A pandas dataframe containing pairing data
Returns: A pandas dataframe with curated data
Return type: pd.DataFrame

pMTnet_Omni_Document.data_curation.check_v_gene_allele(df: DataFrame, a_reference_df: DataFrame, b_reference_df) → DataFrame[source]

pMTnet_Omni_Document.data_curation.check_va_vb(df: DataFrame, background_tcrs_dir: str = './validation_data/') → Tuple[DataFrame, DataFrame][source]

Check VA and VB

Parameters:

df (pd.DataFrame) – A pandas dataframe containing pairing data
background_tcrs_dir (str, optional) – The path to background tcrs data, by default “./validation_data/”

Returns:

A pandas dataframe with curated data and a pandas dataframe with invalid data

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

pMTnet_Omni_Document.data_curation.infer_mhc_info(df: DataFrame) → DataFrame[source]

Infer the MHC classes and create columns mhca and mhcb

The input df should be the output of the check_column_names function.

Parameters: df (pd.DataFrame) – A pandas dataframe containing mhc, mhcseq, and pmhc_species
Returns: A pandas dataframe with the inferred MHC classes and the MHCs on the alpha chain and the beta chain
Return type: pd.DataFrame

pMTnet_Omni_Document.data_curation.check_mhc(df: DataFrame, mhc_path: str = './validation_data/valid_mhc.txt') → Tuple[DataFrame, DataFrame, DataFrame][source]

Check mhc

This function will check if the data format conforms to what our model expects.

Parameters:

df (pd.DataFrame) – A pandas dataframe containing pairing data
mhc_path (str) – The file path to valid mhcs

Returns:

Four pandas dataframes containing the curated pairing data, pairs with peptides longer than 30, problematic mhca, and problematic mhcb

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]

pMTnet_Omni_Document.data_curation.check_peptide(df: DataFrame) → Tuple[DataFrame, DataFrame][source]

Check peptide columns

Parameters: df (pd.DataFrame) – A pandas dataframe with pairing data
Returns: A pandas dataframe with curated data and a dataframe with dropped data
Return type: Tuple[pd.DataFrame, pd.DataFrame]
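
The curated/dropped return pattern can be sketched on plain Python rows. This is an assumption-laden illustration, not the package's logic: the peptide column name, the helper split_by_peptide_length, and the length limit of 30 are all hypothetical here (this page only mentions a peptide length of 30 in the description of check_mhc).

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Assumed length limit, for illustration only.
MAX_PEPTIDE_LENGTH = 30


def split_by_peptide_length(rows: List[Row],
                            max_length: int = MAX_PEPTIDE_LENGTH) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, dropped) by whether the peptide is non-empty and short enough."""
    kept: List[Row] = []
    dropped: List[Row] = []
    for row in rows:
        peptide = row.get("peptide", "")
        if 0 < len(peptide) <= max_length:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


rows = [
    {"peptide": "GILGFVFTL"},  # 9 residues, kept
    {"peptide": "A" * 31},     # longer than the assumed limit, dropped
    {"peptide": ""},           # empty, dropped
]
kept, dropped = split_by_peptide_length(rows)
print(len(kept), len(dropped))  # 1 2
```

Returning the dropped rows alongside the curated ones lets the caller report which pairs were excluded rather than losing them silently.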

pMTnet_Omni_Document.data_curation.check_amino_acids(df_column: DataFrame) → DataFrame[source]

Check amino acids are valid

This function checks if the amino acids in one column of a dataframe are valid amino acids.

Parameters: df_column (pd.DataFrame) – One column of a dataframe
Returns: Curated column with invalid amino acids replaced by “_”
Return type: pd.DataFrame
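
The replacement described above can be illustrated on a single sequence. This is a minimal sketch under stated assumptions: the valid alphabet is taken to be the 20 standard one-letter amino acid codes, and mask_invalid_residues is a hypothetical helper, not a function of this package. The real function operates on a dataframe column and may accept a different alphabet.

```python
# Assumed alphabet: the 20 standard amino acids (one-letter codes).
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def mask_invalid_residues(sequence: str, placeholder: str = "_") -> str:
    """Replace every character outside the assumed alphabet with the placeholder."""
    return "".join(ch if ch in VALID_AMINO_ACIDS else placeholder for ch in sequence)


masked = mask_invalid_residues("CASSXB*F")
print(masked)  # CASS___F
```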

pMTnet_Omni_Document.data_curation.check_amino_acids_columns(df: DataFrame) → DataFrame[source]

Check all columns with AA sequences

Parameters: df (pd.DataFrame) – A pandas dataframe with pairing data
Returns: A pandas dataframe with curated data
Return type: pd.DataFrame

class pMTnet_Omni_Document.data_curation.NumpyArrayEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
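
The class name suggests that default is overridden so that NumPy arrays can be written to JSON. A minimal sketch of that pattern follows, assuming the conversion is done through the object's tolist() method. ArrayLikeEncoder is a hypothetical name and this is not the package's code. To keep the example free of third-party imports, it uses the standard library's array type, which also provides tolist().

```python
import json
from array import array


class ArrayLikeEncoder(json.JSONEncoder):
    """Serialize any object exposing tolist() (as NumPy arrays do) as a JSON list."""

    def default(self, o):
        tolist = getattr(o, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class raise TypeError for unsupported objects.
        return super().default(o)


payload = {"embedding": array("d", [0.5, 1.5])}
encoded = json.dumps(payload, cls=ArrayLikeEncoder)
print(encoded)  # {"embedding": [0.5, 1.5]}
```

Passing the encoder through the cls argument of json.dumps keeps the conversion in one place instead of converting each array by hand before serialization.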

pMTnet_Omni_Document.data_curation.encode_mhc_seq(df: DataFrame) → dict[source]

Encode MHC sequences

Parameters: df (pd.DataFrame) – A pandas dataframe containing pairing data
Returns: A dictionary of the mhc sequences and their ESM embeddings
Return type: dict

pMTnet_Omni_Document.data_curation.read_file(file_path: str, save_results: bool = False, output_folder_path: Optional[str] = None, **kwargs) → Tuple[DataFrame, dict][source]

Reads in user dataframe and performs some basic data curation

Parameters:

file_path (str) – Path to the file containing the dataframe
save_results (bool, optional) – Whether or not to save the results, by default False
output_folder_path (str, optional) – The path to the output folder, by default None
**kwargs – Other arguments taken by the read_csv function

Returns: A curated pandas dataframe
Return type: pd.DataFrame