BioSample: ncbi, 2023-09
import pandas as pd
df = pd.read_xml("https://www.ncbi.nlm.nih.gov/biosample/docs/attributes/?format=xml")
df.columns = df.columns.str.lower()
df.rename(columns={"harmonizedname": "abbr", "synonym": "synonyms"}, inplace=True)
df
 | name | abbr | synonyms | description | format | package |
---|---|---|---|---|---|---|
0 | API gravity | api | api gravity | API gravity is a measure of how heavy or light... | {float} {unit} | MIUVIG.hydrocarbon-fluids_swabs.6.0 |
1 | EDTA inhibitor tested | edta_inhibitor_tested | edta inhibitor tested | Was carbapenemase activity tested in the prese... | ['', 'yes', 'no', 'missing', 'not applicable',... | Beta-lactamase.1.0 |
2 | FAO classification | fao_class | soil taxonomic/fao classification | soil classification from the FAO World Referen... | {term} | MIUVIG.soil.6.0 |
3 | FDA food industry class name | food_industry_class | food industry class | The US FDA Class is the second of five element... | {text} | OneHealthEnteric.1.0 |
4 | FDA food industry code name | food_industry_code | food industry code | The US FDA Industry Code is the first of five ... | {text} | OneHealthEnteric.1.0 |
... | ... | ... | ... | ... | ... | ... |
933 | window signs of water/mold | window_water_mold | window water mold | Signs of the presence of mold or mildew on the... | {text} | MIUVIG.built.6.0 |
934 | window status | window_status | None | Defines whether the windows were open or close... | {text} | MIUVIG.built.6.0 |
935 | window type | window_type | None | The type of windows, e.g., single-hung sash wi... | {text} | MIUVIG.built.6.0 |
936 | window vertical position | window_vert_pos | window vert pos | The vertical position of the window on the wal... | {text} | MIUVIG.built.6.0 |
937 | xylene | xylene | None | Concentration of xylene in the sample | {float} {unit} | MIUVIG.hydrocarbon-fluids_swabs.6.0 |
938 rows × 6 columns
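The column normalization applied above can be tried out on a small in-memory frame without fetching anything from NCBI. This is only a sketch: the capitalized column names (`Name`, `HarmonizedName`, `Synonym`) are an assumption about what the XML parse yields before lowercasing, `normalize_columns` is a hypothetical helper name, and the two rows are copied from the first and last rows of the table above.

```python
import pandas as pd

# Toy frame; the capitalized column names are an assumption about the raw parse.
toy = pd.DataFrame(
    {
        "Name": ["API gravity", "xylene"],
        "HarmonizedName": ["api", "xylene"],
        "Synonym": ["api gravity", None],
    }
)


def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Lowercase the column names, then apply the same renames as the notebook."""
    out = frame.copy()  # leave the caller's frame untouched
    out.columns = out.columns.str.lower()
    return out.rename(columns={"harmonizedname": "abbr", "synonym": "synonyms"})


normalized = normalize_columns(toy)
print(list(normalized.columns))
```

Returning a new frame instead of using `inplace=True` keeps the original around for comparison, which may be convenient when checking the renames.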
df.to_parquet("df_all__ncbi__2023-09__BioSample.parquet")
from bionty.dev._md5 import calculate_md5
calculate_md5("df_all__ncbi__2023-09__BioSample.parquet")
'918db9bd1734b97c596c67d9654a4126'