API Reference

Here you will find all the details of the data curation functions.

pMTnet_Omni_Document.data_curation.check_column_names(df: DataFrame) DataFrame[source]

Check if the column names are correct

The main purpose of this function is to make sure that the dataframe provided by the user contains the columns required by subsequent functions.

This function will NOT create mhca, mhcb, mhcaseq, or mhcbseq. It keeps the original columns mhc and mhcseq.

Parameters:

df (pd.DataFrame) – A pandas dataframe containing pairing data

Returns:

A pandas dataframe with corrected column names

Return type:

pd.DataFrame
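
The kind of check described above can be sketched in plain Python, without pandas, as a test for missing column names. This is only an illustration: the helper name `missing_columns`, the choice of `mhc` and `mhcseq` as the required set, and the whitespace and case normalization are all assumptions, not the behavior of check_column_names itself.

```python
from typing import Iterable, List, Sequence


def missing_columns(
    columns: Iterable[str],
    required: Sequence[str] = ("mhc", "mhcseq"),  # hypothetical required set
) -> List[str]:
    """Return the required column names that are absent from `columns`."""
    # Assumption: names are compared after trimming whitespace and lowercasing.
    present = {name.strip().lower() for name in columns}
    return [name for name in required if name not in present]


# "MHC " normalizes to "mhc", so only "mhcseq" is reported as missing.
report = missing_columns(["MHC ", "peptide"])
```

With a pandas dataframe, the same helper could be called on `df.columns`.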

pMTnet_Omni_Document.data_curation.check_species(df: DataFrame) DataFrame[source]

Check the TCR species and pMHC species

Parameters:

df (pd.DataFrame) – A pandas dataframe containing pairing data

Returns:

A pandas dataframe with curated data

Return type:

pd.DataFrame

pMTnet_Omni_Document.data_curation.check_v_gene_allele(df: DataFrame, a_reference_df: DataFrame, b_reference_df) DataFrame[source]

pMTnet_Omni_Document.data_curation.check_va_vb(df: DataFrame, background_tcrs_dir: str = './validation_data/') Tuple[DataFrame, DataFrame][source]

Check VA and VB

Parameters:
  • df (pd.DataFrame) – A pandas dataframe containing pairing data

  • background_tcrs_dir (str, optional) – The path to background tcrs data, by default “./validation_data/”

Returns:

A pandas dataframe with curated data and a pandas dataframe with invalid data

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

pMTnet_Omni_Document.data_curation.infer_mhc_info(df: DataFrame) DataFrame[source]

Infer the MHC classes and create columns mhca and mhcb

The input df should be the output of the check_column_names function.

Parameters:

df (pd.DataFrame) – A pandas dataframe containing mhc, mhcseq, and pmhc_species

Returns:

A pandas dataframe with the inferred MHC classes and the MHCs on the alpha chain and the beta chain

Return type:

pd.DataFrame

pMTnet_Omni_Document.data_curation.check_mhc(df: DataFrame, mhc_path: str = './validation_data/valid_mhc.txt') Tuple[DataFrame, DataFrame, DataFrame][source]

Check mhc

This function will check if the data format conforms to what our model expects

Parameters:
  • df (pd.DataFrame) – A pandas dataframe containing pairing data

  • mhc_path (str, optional) – The file path to valid mhcs, by default “./validation_data/valid_mhc.txt”

Returns:

Four pandas dataframes containing curated pairing data, pairs with peptides longer than 30, problematic mhca, and problematic mhcb

Return type:

Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]
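
The return description mentions separating out pairs with peptides longer than 30. A minimal, dependency-free sketch of that kind of split is shown below, using a list of dictionaries in place of a dataframe. The function name `split_by_peptide_length` and the column name `peptide` are assumptions for illustration; the actual implementation of check_mhc is not shown in this reference.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def split_by_peptide_length(
    rows: List[Row], max_length: int = 30
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (kept, too_long) by the length of the peptide string."""
    kept: List[Row] = []
    too_long: List[Row] = []
    for row in rows:
        # Assumption: the peptide sequence lives under the key "peptide".
        if len(row["peptide"]) > max_length:
            too_long.append(row)
        else:
            kept.append(row)
    return kept, too_long


rows = [
    {"peptide": "A" * 9},
    {"peptide": "A" * 30},  # exactly 30 is not "longer than 30", so it is kept
    {"peptide": "A" * 31},
]
kept, too_long = split_by_peptide_length(rows)
```

Returning the rejected rows alongside the curated ones, rather than dropping them silently, lets the caller inspect what was removed.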

pMTnet_Omni_Document.data_curation.check_peptide(df: DataFrame) Tuple[DataFrame, DataFrame][source]

Check peptide columns

Parameters:

df (pd.DataFrame) – A pandas dataframe with pairing data

Returns:

A pandas dataframe with curated data and a dataframe with dropped data

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

pMTnet_Omni_Document.data_curation.check_amino_acids(df_column: DataFrame) DataFrame[source]

Check amino acids are valid

This function checks if the amino acids in one column of a dataframe are valid amino acids

Parameters:

df_column (pd.DataFrame) – One column of a dataframe

Returns:

Curated column with invalid amino acids replaced by “_”

Return type:

pd.DataFrame
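
The replacement of invalid residues with “_” can be sketched on a single string as follows. This is an illustration under stated assumptions: the helper name `mask_invalid_amino_acids` is hypothetical, and the set of accepted characters is taken to be the 20 standard one-letter codes in upper case, which may differ from what the library accepts.

```python
# Assumption: only the 20 standard amino acids, in upper case, count as valid.
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def mask_invalid_amino_acids(sequence: str, placeholder: str = "_") -> str:
    """Replace every character that is not a standard amino acid code."""
    return "".join(
        residue if residue in VALID_AMINO_ACIDS else placeholder
        for residue in sequence
    )


# "X" and "B" are not among the 20 standard codes, so both are masked.
masked = mask_invalid_amino_acids("CASSXLB")
```

On a pandas column, a function like this could be applied element-wise, for example with `Series.map`.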

pMTnet_Omni_Document.data_curation.check_amino_acids_columns(df: DataFrame) DataFrame[source]

Check all columns with AA sequences

Parameters:

df (pd.DataFrame) – A pandas dataframe with pairing data

Returns:

A pandas dataframe with curated data

Return type:

pd.DataFrame

class pMTnet_Omni_Document.data_curation.NumpyArrayEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

default(obj)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
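
For array-like data specifically, an encoder of this kind might convert the object to a list before serialization. The sketch below is an assumption about the general approach, not the source of NumpyArrayEncoder: the class names `ArrayLikeEncoder` and `FakeArray` are hypothetical, and `FakeArray` stands in for a NumPy array (which also exposes a `tolist()` method) so that the example runs with the standard library alone.

```python
import json


class ArrayLikeEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes objects exposing tolist() as lists."""

    def default(self, o):
        tolist = getattr(o, "tolist", None)
        if callable(tolist):
            return tolist()
        # Let the base class raise the TypeError for unsupported objects.
        return super().default(o)


class FakeArray:
    """Stand-in for an array type; only provides tolist()."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


encoded = json.dumps({"embedding": FakeArray([0.5, 1.0])}, cls=ArrayLikeEncoder)
```

Passing the encoder through the `cls` argument of `json.dumps` keeps the conversion logic in one place instead of converting every value by hand before serialization.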

pMTnet_Omni_Document.data_curation.encode_mhc_seq(df: DataFrame) dict[source]

Encode MHC sequences

Parameters:

df (pd.DataFrame) – A pandas dataframe containing pairing data

Returns:

A dictionary of the mhc sequences and their EMS embeddings

Return type:

dict

pMTnet_Omni_Document.data_curation.read_file(file_path: str, save_results: bool = False, output_folder_path: Optional[str] = None, **kwargs) Tuple[DataFrame, dict][source]

Reads in user dataframe and performs some basic data curation

Parameters:
  • file_path (str) – Path to the dataframe

  • save_results (bool, optional) – Whether or not to save the results, by default False

  • output_folder_path (str, optional) – The path to the output folder, by default None

  • **kwargs – Other arguments taken by the read_csv function

Returns:

A curated pandas dataframe and a dictionary

Return type:

Tuple[pd.DataFrame, dict]