BioSample: ncbi, 2023-09

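import pandas as pd

# Download the BioSample attribute definitions (pd.read_xml uses the lxml parser by default)
df = pd.read_xml("https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/?format=xml")
# Lowercase the column names, then rename two of them
df.columns = df.columns.str.lower()
df = df.rename(columns={"harmonizedname": "abbr", "synonym": "synonyms"})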
df
 | name | abbr | synonyms | description | format | package
0 | API gravity | api | api gravity | API gravity is a measure of how heavy or light... | {float} {unit} | MIUVIG.hydrocarbon-fluids_swabs.6.0
1 | EDTA inhibitor tested | edta_inhibitor_tested | edta inhibitor tested | Was carbapenemase activity tested in the prese... | ['', 'yes', 'no', 'missing', 'not applicable',... | Beta-lactamase.1.0
2 | FAO classification | fao_class | soil taxonomic/fao classification | soil classification from the FAO World Referen... | {term} | MIUVIG.soil.6.0
3 | FDA food industry class name | food_industry_class | food industry class | The US FDA Class is the second of five element... | {text} | OneHealthEnteric.1.0
4 | FDA food industry code name | food_industry_code | food industry code | The US FDA Industry Code is the first of five ... | {text} | OneHealthEnteric.1.0
... | ... | ... | ... | ... | ... | ...
933 | window signs of water/mold | window_water_mold | window water mold | Signs of the presence of mold or mildew on the... | {text} | MIUVIG.built.6.0
934 | window status | window_status | None | Defines whether the windows were open or close... | {text} | MIUVIG.built.6.0
935 | window type | window_type | None | The type of windows, e.g., single-hung sash wi... | {text} | MIUVIG.built.6.0
936 | window vertical position | window_vert_pos | window vert pos | The vertical position of the window on the wal... | {text} | MIUVIG.built.6.0
937 | xylene | xylene | None | Concentration of xylene in the sample | {float} {unit} | MIUVIG.hydrocarbon-fluids_swabs.6.0

938 rows × 6 columns

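# Save the table as Parquet (pandas needs pyarrow or fastparquet for this)
df.to_parquet("df_all__ncbi__2023-09__BioSample.parquet")
from bionty.dev._md5 import calculate_md5

# MD5 checksum of the saved file
calculate_md5("df_all__ncbi__2023-09__BioSample.parquet")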
'918db9bd1734b97c596c67d9654a4126'
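
The checksum above comes from `calculate_md5`, which lives in a private bionty module. For readers without bionty installed, here is a minimal sketch of how a file's MD5 could be computed with the standard library alone. The name `md5_of_file` and the `chunk_size` parameter are hypothetical, and this is not necessarily how bionty implements `calculate_md5`.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read in fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("df_all__ncbi__2023-09__BioSample.parquet")` on a freshly written file may not reproduce the digest recorded above, since the Parquet bytes can depend on the upstream XML at download time and on the writer library version.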