Docker implementation for model inference
Deploying the environment for LLMs can be an arduous job. To simplify this process, we also provide a Docker version of our model inference code.
The Docker images are available here, and the usage of the Docker implementation is shown below.
Inference using GPU
For GPU inference (with an NVIDIA GPU), please pull the image with the gpu tag, and make sure the NVIDIA Container Toolkit is installed on your computer.
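A quick way to confirm that the toolkit works is the standard NVIDIA check below; the ubuntu image is only a convenient test image, and nvidia-smi is injected into the container by the toolkit at run time.
# verify that Docker can see the GPU (should print the nvidia-smi device table)
docker run --rm --runtime=nvidia --gpus=all ubuntu nvidia-smi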
First, download a fine-tuned model from Hugging Face or ModelScope; here we use the Plant DNAMamba model as an example to predict active core promoters.
# prepare a work directory
mkdir LLM_inference
cd LLM_inference
git clone https://huggingface.co/zhangtaolab/plant-dnamamba-BPE-promoter
Then download the corresponding dataset. Users with their own data can instead prepare a custom dataset based on the previously mentioned inference data format (see the sketch after the command below).
git clone https://huggingface.co/datasets/zhangtaolab/plant-multi-species-core-promoters
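If you prepare a custom input instead, a minimal sketch could look like the following; the exact column layout is an assumption here and should follow the inference data format described earlier (we mirror test.csv from the dataset above: a header line plus one DNA sequence per row).
# hypothetical custom input file: a header line followed by one sequence per row
cat > my_sequences.csv << 'EOF'
sequence
ATCGGATCTCGACAGTACGTGATCGGATCTCGACAGT
TTGACGTGTCAGTCAGCATGCTAGCTAGCTAGGATCC
EOF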
Once the model and dataset are ready, pull our model inference image from Docker Hub and test whether it works.
docker pull zhangtaolab/plant_llms_inference:gpu
docker run --runtime=nvidia --gpus=all -v ./:/home/llms zhangtaolab/plant_llms_inference:gpu -h
usage: inference.py [-h] [-v] -m MODEL [-f FILE] [-s SEQUENCE] [-t THRESHOLD]
[-l MAX_LENGTH] [-bs BATCH_SIZE] [-p SAMPLE] [-seed SEED]
[-d {cpu,gpu,mps,auto}] [-o OUTFILE] [-n]
Script for Plant DNA Large Language Models (LLMs) inference
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-m MODEL Model path (should contain both model and tokenizer)
-f FILE File contains sequences that need to be classified
-s SEQUENCE One sequence that need to be classified
-t THRESHOLD Threshold for defining as True class (Default: 0.5)
-l MAX_LENGTH Max length of tokenized sequence (Default: 512)
-bs BATCH_SIZE Batch size for classification (Default: 1)
-p SAMPLE Subsampling for testing (Default: 1e7)
-seed SEED Random seed for subsampling (Default: None)
-d {cpu,gpu,mps,auto}
Choose CPU or GPU to do inference (require specific
drivers) (Default: auto)
-o OUTFILE Prediction results (Default: stdout)
-n Whether or not save the runtime locally (Default:
False)
Example:
docker run --runtime=nvidia --gpus=all -v /local:/container zhangtaolab/plant_llms_inference:gpu -m model_path -f seqfile.csv -o output.txt
docker run --runtime=nvidia --gpus=all -v /local:/container zhangtaolab/plant_llms_inference:gpu -m model_path -s 'ATCGGATCTCGACAGT' -o output.txt
If the help message above is displayed, the image has been downloaded and the inference script runs normally. Below, inference is performed using the previously prepared model and dataset.
docker run --runtime=nvidia --gpus=all -v ./:/home/llms zhangtaolab/plant_llms_inference:gpu -m /home/llms/plant-dnamamba-BPE-promoter -f /home/llms/plant-multi-species-core-promoters/test.csv -o /home/llms/predict_results.txt
After the inference progress bar completes, the output file predict_results.txt appears in the current local directory; it stores the prediction results for each sequence in the input file.
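To spot-check the predictions, or to classify a single sequence without preparing a file, commands along the following lines should work; the -s flag is taken from the help text above, the example sequence is arbitrary, and with no -o option the result is printed to stdout.
# peek at the first few predictions (one row per input sequence)
head predict_results.txt
# classify a single arbitrary sequence directly from the command line
docker run --runtime=nvidia --gpus=all -v ./:/home/llms zhangtaolab/plant_llms_inference:gpu -m /home/llms/plant-dnamamba-BPE-promoter -s 'ATCGGATCTCGACAGT'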
Inference using CPU
For CPU inference, please pull the image with the cpu tag. This image supports computers without an NVIDIA GPU, such as CPU-only machines or Apple M-series silicon. (Note that inference with the DNAMamba model is not supported in CPU mode.)
First, download a fine-tuned model from Hugging Face or ModelScope; here we use the Plant DNAGPT model as an example to predict active core promoters.
# prepare a work directory
mkdir LLM_inference
cd LLM_inference
git clone https://huggingface.co/zhangtaolab/plant-dnagpt-BPE-promoter
Then download the corresponding dataset. Users with their own data can instead prepare a custom dataset based on the previously mentioned inference data format.
git clone https://huggingface.co/datasets/zhangtaolab/plant-multi-species-core-promoters
Once the model and dataset are ready, pull our model inference image from Docker Hub and test whether it works.
docker pull zhangtaolab/plant_llms_inference:cpu
docker run -v ./:/home/llms zhangtaolab/plant_llms_inference:cpu -h
usage: inference.py [-h] [-v] -m MODEL [-f FILE] [-s SEQUENCE] [-t THRESHOLD]
[-l MAX_LENGTH] [-bs BATCH_SIZE] [-p SAMPLE] [-seed SEED]
[-d {cpu,gpu,mps,auto}] [-o OUTFILE] [-n]
Script for Plant DNA Large Language Models (LLMs) inference
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-m MODEL Model path (should contain both model and tokenizer)
-f FILE File contains sequences that need to be classified
-s SEQUENCE One sequence that need to be classified
-t THRESHOLD Threshold for defining as True class (Default: 0.5)
-l MAX_LENGTH Max length of tokenized sequence (Default: 512)
-bs BATCH_SIZE Batch size for classification (Default: 1)
-p SAMPLE Subsampling for testing (Default: 1e7)
-seed SEED Random seed for subsampling (Default: None)
-d {cpu,gpu,mps,auto}
Choose CPU or GPU to do inference (require specific
drivers) (Default: auto)
-o OUTFILE Prediction results (Default: stdout)
-n Whether or not save the runtime locally (Default:
False)
Example:
docker run -v /local:/container zhangtaolab/plant_llms_inference:cpu -m model_path -f seqfile.csv -o output.txt
docker run -v /local:/container zhangtaolab/plant_llms_inference:cpu -m model_path -s 'ATCGGATCTCGACAGT' -o output.txt
If the help message above is displayed, the image has been downloaded and the inference script runs normally. Below, inference is performed using the previously prepared model and dataset.
docker run -v ./:/home/llms zhangtaolab/plant_llms_inference:cpu -m /home/llms/plant-dnagpt-BPE-promoter -f /home/llms/plant-multi-species-core-promoters/test.csv -o /home/llms/predict_results.txt
After the inference progress bar completes, the output file predict_results.txt appears in the current local directory; it stores the prediction results for each sequence in the input file.
- The detailed usage is the same as in the Inference section.
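Because CPU inference is slower, it may help to run a quick smoke test on a random subsample before scoring the full file; the sketch below only combines the -p, -seed, -d and -o flags listed in the help text above, and the output file name is arbitrary.
# score 1000 randomly sampled sequences only, forcing CPU inference
docker run -v ./:/home/llms zhangtaolab/plant_llms_inference:cpu -m /home/llms/plant-dnagpt-BPE-promoter -f /home/llms/plant-multi-species-core-promoters/test.csv -p 1000 -seed 42 -d cpu -o /home/llms/predict_subset.txt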
Inference with GUI
For convenience, we also allow users to predict locally with a GUI based on Gradio, a user-friendly web app framework for machine learning models.
For CPU inference, simply run the following command, then open the URL http://127.0.0.1:7860
in your browser; you will see a GUI with several options for task prediction.
(Plant DNAMamba models are not shown in the cpu image because they cannot be run on CPU.)
mkdir -p llms_gradio/cache
cd llms_gradio
docker run -p 7860:7860 -v ./cache:/root/.cache --name gradio_cpu zhangtaolab/plant_llms_gradio:cpu
Models will be downloaded into the llms_gradio/cache folder on your computer during inference.
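The command above keeps the terminal attached to the container; if you prefer to run the web app in the background, the standard Docker options below apply (gradio_cpu is the container name chosen above).
# run the web app detached instead of in the foreground
docker run -d -p 7860:7860 -v ./cache:/root/.cache --name gradio_cpu zhangtaolab/plant_llms_gradio:cpu
# stop and remove the container when finished
docker stop gradio_cpu && docker rm gradio_cpu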
GPU-based inference requires the NVIDIA Container Toolkit to be installed in advance.
Once the environment is ready, run the following command, then open the URL http://127.0.0.1:7860
in your browser.
mkdir -p llms_gradio/cache
cd llms_gradio
docker run --gpus=all -p 7860:7860 -v ./cache:/root/.cache --name gradio_gpu zhangtaolab/plant_llms_gradio:gpu
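If the page does not load, the container logs usually show whether the models are still being downloaded or the GPU was not detected (gradio_gpu is the container name chosen above).
# follow the container logs
docker logs -f gradio_gpu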
Online prediction platform
To make it easier for users to apply the models to DNA analysis tasks, we also provide online prediction platforms.
Please refer to the online prediction platform.