Drug: chebi, 2024-07-27

!lamin load laminlabs/bionty-assets
💡 connected lamindb: laminlabs/bionty-assets
import lamindb as ln
import bionty as bt
import pandas as pd

ln.context.uid = "fQpBV2oEQUFi0000"
ln.track()

new_ontology = ln.ULabel.filter(name="new_ontology").one()
ln.context.run.transform.ulabels.add(new_ontology)
💡 connected lamindb: laminlabs/bionty-assets
💡 notebook imports: bionty==0.48.2 lamindb==0.76.0 pandas==2.2.2
WARNING: Skipping /home/zeth/miniconda3/envs/lamindb/lib/python3.11/site-packages/jupyterlab_widgets-3.0.10.dist-info due to invalid metadata entry 'name'

💡 loaded Transform('fQpBV2oEQUFi0000') & created Run('2024-08-20 10:05:14.415182+00:00')

Curate source

The ChEBI OWL file contains only ChEBI IDs. UniChem provides mappings between ChEBI and ChEMBL, which we add to the ChEBI DataFrame as an extra column. The source table in https://ftp.ebi.ac.uk/pub/databases/chembl/UniChem/data/table_dumps/ shows that source 1 corresponds to ChEMBL and source 7 to ChEBI, so we take the src1-to-src7 mapping from https://ftp.ebi.ac.uk/pub/databases/chembl/UniChem/data/wholeSourceMapping/.

# The parquet file was obtained by loading http://purl.obolibrary.org/obo/chebi/236/chebi.owl with Bionty Drug
drug_df = pd.read_parquet("chebi_2024-07-27.parquet")
drug_df.head()
ontology_id   name                                               definition                         synonyms                                           parents
CHEBI:10      (+)-Atherospermoline                               None                               (+)-Atherospermoline                               [CHEBI:133004]
CHEBI:100     (-)-medicarpin                                     The (-)-Enantiomer Of Medicarpin.  (-)-Medicarpin|(-)-medicarpin|(6aR,11aR)-9-met...  [CHEBI:16114]
CHEBI:10000   Vismione D                                         None                               Vismione D                                         [CHEBI:46955]
CHEBI:100000  (2S,3S,4R)-3-[4-(3-cyclopentylprop-1-ynyl)phen...  None                               None                                               [CHEBI:36820, CHEBI:22712, CHEBI:38777]
CHEBI:100001  N-[(2R,3S,6R)-2-(hydroxymethyl)-6-[2-[[oxo-[4-...  None                               None                                               [CHEBI:20857]
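def read_mapping_file(file_path: str) -> dict[str, str]:
    """Map ChEBI IDs (e.g. "CHEBI:16273") to ChEMBL IDs from a src1-to-src7 mapping file."""
    chebi_to_chembl = {}

    with open(file_path, "r") as file:
        next(file)  # skip the header line

        for line in file:
            if not line.strip():
                continue  # ignore blank lines
            # Column 1 is the ChEMBL ID (source 1), column 2 the bare ChEBI number (source 7)
            fromsrc1, tosrc7 = line.strip().split()
            # If a ChEBI number occurs more than once, the last row wins
            chebi_to_chembl[f"CHEBI:{tosrc7}"] = fromsrc1

    return chebi_to_chembl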


src_mapping = read_mapping_file("src1src7.txt")
first_key = next(iter(src_mapping))
print(f"First element of the mapping: {first_key}: {src_mapping[first_key]}")
First element of the mapping: CHEBI:16273: CHEMBL46810
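
As a cross-check on the parsing step, the same file layout can also be read with pandas. This is a minimal sketch on toy data, not the code that was run here: the header text of the toy file is an assumption, the three data rows reuse ID pairs that appear in this notebook, and the names toy_file, mapping_df, has_duplicates, and toy_mapping are hypothetical.

```python
import io

import pandas as pd

# Toy stand-in for src1src7.txt. The header text is an assumption;
# the data rows reuse ChEMBL/ChEBI pairs shown elsewhere in this notebook.
toy_file = io.StringIO(
    "From src:'1'\tTo src:'7'\n"
    "CHEMBL46810\t16273\n"
    "CHEMBL500609\t10\n"
    "CHEMBL238845\t100\n"
)

# Skip the header row and read both columns as strings so that the
# ChEBI numbers are not converted to integers.
mapping_df = pd.read_csv(
    toy_file,
    sep=r"\s+",
    skiprows=1,
    names=["chembl_id", "chebi_number"],
    dtype=str,
)

# A dict keeps only one ChEMBL ID per ChEBI ID, so it helps to know
# whether any ChEBI number occurs more than once.
has_duplicates = bool(mapping_df["chebi_number"].duplicated().any())

# Build the same "CHEBI:<number>" -> ChEMBL ID mapping as read_mapping_file.
toy_mapping = dict(
    zip("CHEBI:" + mapping_df["chebi_number"], mapping_df["chembl_id"])
)

print(has_duplicates)
print(toy_mapping["CHEBI:16273"])
```

If has_duplicates were True for the real file, the dictionary would silently keep only the last ChEMBL ID listed for each ChEBI ID.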
drug_df["chembl_id"] = drug_df.index.map(src_mapping.get)
drug_df
ontology_id   name                                               definition                         synonyms                                           parents                                  chembl_id
CHEBI:10      (+)-Atherospermoline                               None                               (+)-Atherospermoline                               [CHEBI:133004]                           CHEMBL500609
CHEBI:100     (-)-medicarpin                                     The (-)-Enantiomer Of Medicarpin.  (-)-Medicarpin|(-)-medicarpin|(6aR,11aR)-9-met...  [CHEBI:16114]                            CHEMBL238845
CHEBI:10000   Vismione D                                         None                               Vismione D                                         [CHEBI:46955]                            CHEMBL487795
CHEBI:100000  (2S,3S,4R)-3-[4-(3-cyclopentylprop-1-ynyl)phen...  None                               None                                               [CHEBI:36820, CHEBI:22712, CHEBI:38777]  None
CHEBI:100001  N-[(2R,3S,6R)-2-(hydroxymethyl)-6-[2-[[oxo-[4-...  None                               None                                               [CHEBI:20857]                            None
...           ...                                                ...                                ...                                                ...                                      ...
CHEBI:99995   2-[(2S,4aS,12aS)-5-methyl-6-oxo-8-[(1-oxo-2-ph...  None                               None                                               [CHEBI:22160]                            None
CHEBI:99996   N-[(1S,3S,4aR,9aS)-3-[2-[(2,5-difluorophenyl)m...  None                               None                                               [CHEBI:74927]                            None
CHEBI:99997   N-[(2S,4aS,12aS)-2-[2-(cyclohexylmethylamino)-...  None                               None                                               [CHEBI:17792, CHEBI:36586]               None
CHEBI:99998   N-[[(3S,9S,10R)-16-(dimethylamino)-12-[(2S)-1-...  None                               None                                               [CHEBI:52898, CHEBI:24995]               CHEMBL1903737
CHEBI:99999   N-[(5S,6S,9S)-5-methoxy-3,6,9-trimethyl-2-oxo-...  None                               None                                               [CHEBI:52898, CHEBI:24995]               None

200981 rows × 5 columns
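
Many rows end up with chembl_id set to None, so it can be useful to quantify how much of the table received a ChEMBL ID before saving. The following is a sketch on a toy DataFrame rather than a result from this notebook: toy_df, toy_mapping, n_mapped, and coverage are hypothetical names, the third row's name is made up, and the ID pairs are taken from the output above.

```python
import pandas as pd

# Toy frame indexed by ontology_id, mirroring the layout of drug_df.
toy_df = pd.DataFrame(
    {"name": ["(+)-Atherospermoline", "Vismione D", "hypothetical compound"]},
    index=pd.Index(["CHEBI:10", "CHEBI:10000", "CHEBI:100000"], name="ontology_id"),
)

# Toy mapping; CHEBI:100000 has no ChEMBL counterpart.
toy_mapping = {"CHEBI:10": "CHEMBL500609", "CHEBI:10000": "CHEMBL487795"}

# Same pattern as above: dict.get yields None for unmapped ChEBI IDs.
toy_df["chembl_id"] = toy_df.index.map(toy_mapping.get)

# Count mapped rows and express them as a fraction of all rows.
n_mapped = int(toy_df["chembl_id"].notna().sum())
coverage = n_mapped / len(toy_df)

print(f"{n_mapped} of {len(toy_df)} rows mapped ({coverage:.1%})")
```

Running the same two lines on drug_df would report the coverage for the full ChEBI table.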

drug_df.to_parquet("df_all__chebi__2024-07-27__Drug.parquet")

Register in laminlabs/bionty-assets

from bionty.core._bionty import register_source_in_bionty_assets
source_record = bt.Source.filter(name="chebi", organism="all", version="2024-07-27", entity="Drug").one()
register_source_in_bionty_assets(filepath="df_all__chebi__2024-07-27__Drug.parquet", source=source_record)
... uploading df_all__chebi__2024-07-27__Drug.parquet: 100.0%
registered Source(uid='1atB', entity='Drug', organism='all', name='chebi', version='2024-07-27', in_db=False, currently_used=False, description='', url='http://purl.obolibrary.org/obo/chebi/236/chebi.owl', md5='', source_website='', created_by_id=3, dataframe_artifact_id=176, updated_at='2024-08-20 10:05:33 UTC') with dataframe Artifact(uid='FeIg71WrUn9HBeS1VbtA', is_latest=True, key='df_all__chebi__2024-07-27__Drug.parquet', suffix='.parquet', size=13901923, hash='0MdXAAAHwLqglrfW55lEhw', _hash_type='md5', visibility=1, _key_is_virtual=False, created_by_id=2, storage_id=1, transform_id=9, run_id=10, updated_at='2024-08-20 10:05:22 UTC')
Artifact(uid='FeIg71WrUn9HBeS1VbtA', is_latest=True, key='df_all__chebi__2024-07-27__Drug.parquet', suffix='.parquet', size=13901923, hash='0MdXAAAHwLqglrfW55lEhw', _hash_type='md5', visibility=1, _key_is_virtual=False, created_by_id=2, storage_id=1, transform_id=9, run_id=10, updated_at='2024-08-20 10:05:22 UTC')
ln.finish()
✅ cell execution numbers increase consecutively
💡 go to: https://lamin.ai/laminlabs/bionty-assets/transform/fQpBV2oEQUFi0000
💡 if you want to update your notebook without re-running it, use `lamin save notebook.ipynb`