Inference
To use a fine-tuned model for inference, please first download the desired models from HuggingFace or ModelScope to local or provide a model trained by yourself.
- Here is the finetune models list
We use Plant DNAGPT model as example to predict active core promoter in plants.
First download a fine-tuned model and corresponding dataset from HuggingFace or ModelScope
# prepare a work directory
mkdir inference
# download fine-tuned model
git clone https://huggingface.co/zhangtaolab/plant-dnagpt-BPE-promoter models/plant-dnagpt-BPE-promoter
# download train dataset
git clone https://huggingface.co/datasets/zhangtaolab/plant-multi-species-core-promoters data/plant-multi-species-core-promoters
We provide a script named model_inference.py
for model inference.
Here is an example that use the script to predict core promoter:
# (method 1) Inference with local model, directly input a sequence
python model_inference.py -m models/plant-dnagpt-BPE-promoter -s 'TTACTAAATTTATAACGATTTTTTATCTAACTTTAGCTCATCAATCTTTACCGTGTCAAAATTTAGTGCCAAGAAGCAGACATGGCCCGATGATCTTTTACCCTGTTTTCATAGCTCGCGAGCCGCGACCTGTGTCCAACCTCAACGGTCACTGCAGTCCCAGCACCTCAGCAGCCTGCGCCTGCCATACCCCCTCCCCCACCCACCCACACACACCATCCGGGCCCACGGTGGGACCCAGATGTCATGCGCTGTACGGGCGAGCAACTAGCCCCCACCTCTTCCCAAGAGGCAAAACCT'
# (method 2) Inference with local model, provide a file contains multiple sequences to predict
python model_inference.py -m models/plant-dnagpt-BPE-promoter -f data/plant-multi-species-core-promoters/test.csv -o inference/promoter_predict_results.txt
# (method 3) Inference with an online model (Auto download the model trained by us from huggingface or modelscope)
python model_inference.py -m zhangtaolab/plant-dnagpt-BPE-promoter -ms huggingface -s 'GGGAAAAAGTGAACTCCATTGTTTTTTCACGCTAAGCAGACCACAATTGCTGCTTGGTACGAAAAGAAAACCGAACCCTTTCACCCACGCACAACTCCATCTCCATTAGCATGGACAGAACACCGTAGATTGAACGCGGGAGGCAACAGGCTAAATCGTCCGTTCAGCCAAAACGGAATCATGGGCTGTTTTTCCAGAAGGCTCCGTGTCGTGTGGTTGTGGTCCAAAAACGAAAAAGAAAGAAAAAAGAAAACCCTTCCCAAGACGTGAAGAAAAGCAATGCGATGCTGATGCACGTTA'
In this script:
1. -m
: Path to the fine-tuned model that is used for inference
2. -s
: Input DNA sequence, only nucleotide A, C, G, T, N are acceptable
3. -f
: Input file that contain multiple sequences, one line for each sequence. If you want to keep more information, file with ,
of \t
separator is acceptable, but a header contains sequence
column must be specified.
4. -ms
: Download the model from huggingface
or modelscope
if the model is not local. The format of model name is zhangtaolab/model-name
, users can copy model name here:
Output results contains the original sequence, input sequence length. If the task type is classification, predicted label and probability of each label will provide; If the task type is regression, a predicted score will provide.