0 Background
Google open-sourced TensorFlow Serving in February 2016. This component takes models trained with TensorFlow, exports them, and deploys them as RESTful/RPC prediction services. With it, TensorFlow covers the full applied machine learning workflow, from training a model and tuning parameters to packaging the model and finally deploying it as a service, making it truly a framework with a complete pipeline from research to production. TensorFlow Serving is a high-performance machine learning serving system designed for production environments. It can run multiple large-scale deep learning models at the same time, supports model lifecycle management and algorithm experiments, and makes efficient use of GPU resources, so models trained with TensorFlow can be put into real production quickly and conveniently.
TF Serving can currently be installed in three ways: Docker, APT (binary install), and building from source. Considering real production deployment and simplicity, Docker is the recommended approach, and it is also the easiest way to get TensorFlow Serving working with GPUs, so the Docker installation is covered first.
1 Docker Installation
To use Docker with GPU support, you need Docker 19.03 or later. First uninstall any old version of Docker:
sudo apt-get remove docker docker-engine docker.io containerd runc
It is fine if apt reports that none of these packages are installed. Update the apt package index:
sudo apt-get update
Install the prerequisites:
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
Add the repository:
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Update the package index and install:
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Verify the installation:
sudo docker run hello-world
Check the version:
sudo docker -v
Output such as Docker version 19.03.1 indicates a successful installation.
Then configure GPU support:
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
At runtime, however, it failed with the error requirement error: unsatisfied condition: brand = tesla. This turned out to be a CUDA version problem: my server defaulted to CUDA 9.0 while CUDA 10.1 was required, so I installed CUDA 10.1 and switched to it (for the switching procedure, see the post 《Linux之cuda、cudnn版本切换》 on switching CUDA/cuDNN versions on Linux).
TF Serving clients talk to the server in one of two ways, gRPC or the RESTful API. The next two sections show how to use each.
2 RESTful API
Pull the image. A version tag can be specified; here we pick the GPU-enabled image:
docker pull tensorflow/serving:1.12.3-gpu
Then fetch the demo model (it is bundled in the TensorFlow Serving repository):
mkdir -p /tmp/tfserving
cd /tmp/tfserving
git clone https://github.com/tensorflow/serving
Run the TensorFlow Serving container, pointing it at this model and exposing the REST API port (8501):
docker run --runtime=nvidia -p 8501:8501 --mount type=bind,source=$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_two_gpu,target=/models/half_plus_two -e MODEL_NAME=half_plus_two -t tensorflow/serving:1.12.3-gpu &
This runs the Docker container with the nvidia runtime so that the GPU is used, starts the TensorFlow model server, binds the REST API to port 8501 (the gRPC port is 8500), and mounts the model from the host (source) into the location the container expects (target). We also pass the model's name as an environment variable, which is important when querying the model. Tip: before querying the model, be sure to wait until a message like the one below appears, indicating the server is ready to accept requests:
2018-07-27 00:07:20.773693: I tensorflow_serving/model_servers/main.cc:333]
Exporting HTTP/REST API at:localhost:8501 ...
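If you prefer not to eyeball the log, readiness can also be checked programmatically: the TF Serving REST API exposes a model status endpoint at /v1/models/${MODEL_NAME}. A minimal polling sketch in Python, assuming the requests package is installed (the endpoint path and JSON layout follow the TF Serving REST API):

# Poll the model status endpoint until the model reports AVAILABLE.
import time
import requests

url = "http://localhost:8501/v1/models/half_plus_two"
while True:
    try:
        state = requests.get(url).json()["model_version_status"][0]["state"]
        print("model state:", state)
        if state == "AVAILABLE":
            break
    except requests.exceptions.ConnectionError:
        print("server not up yet")
    time.sleep(1)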
Open a new terminal and act as a client, issuing a query:
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
-X POST http://localhost:8501/v1/models/half_plus_two:predict
It returns:
{ "predictions": [2.5, 3.0, 4.5] }
If you run this on a machine without a GPU, it fails with:
Cannot assign a device for operation 'a': Operation was explicitly assigned to /device:GPU:0
3 gRPC
Next we use the handwritten digit recognition (MNIST) model as an example to walk through the serving workflow.
First train a model. To train on the GPU, edit mnist_saved_model.py and add the following line near the imports at the top of the file:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # pin training to GPU 0 (os must be imported)
Create an mnist directory, then run the script so the exported model is saved into it:
cd serving
mkdir mnist
python tensorflow_serving/example/mnist_saved_model.py mnist
The output looks like this:
Training model...
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/t10k-labels-idx1-ubyte.gz
2019-09-18 11:24:11.010872: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-18 11:24:11.614926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: TITAN V major: 7 minor: 0 memoryClockRate(GHz): 1.455
pciBusID: 0000:06:00.0
totalMemory: 11.78GiB freeMemory: 11.36GiB
2019-09-18 11:24:11.614978: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-09-18 11:24:15.718308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-18 11:24:15.718387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-09-18 11:24:15.718412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-09-18 11:24:15.721216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10974 MB memory) -> physical GPU (device: 0, name: TITAN V, pci bus id: 0000:06:00.0, compute capability: 7.0)
training accuracy 0.9092
Done training!
Exporting trained model to b'mnist/1'
Done exporting!
After training, the model files appear under mnist/1:
mnist
└── 1
├── saved_model.pb
└── variables
├── variables.data-00000-of-00001
└── variables.index
Here, saved_model.pb is the serialized tensorflow::SavedModel; it contains one or more graph definitions of the model along with its metadata, such as signatures. variables contains the serialized variables of the graph.
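To double-check what was exported, the model can be loaded back and its signatures listed. A small inspection sketch using the TF 1.x loader API (the path matches the export above):

# Load the exported SavedModel and print its signature names
# together with their input/output tensor keys.
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], "mnist/1")
    for name, sig in meta_graph.signature_def.items():
        print(name, "inputs:", list(sig.inputs), "outputs:", list(sig.outputs))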
Run the service.
CPU version:
docker run -p 8500:8500 --mount type=bind,source=$(pwd)/mnist,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving
GPU version:
docker run --runtime=nvidia -p 8500:8500 --mount type=bind,source=$(pwd)/mnist,target=/models/mnist -e MODEL_NAME=mnist -t tensorflow/serving:1.12.3-gpu
Verify from the client:
python tensorflow_serving/example/mnist_client.py --num_tests=1000 --server=127.0.0.1:8500
The output looks like this:
Extracting /tmp/train-images-idx3-ubyte.gz
Extracting /tmp/train-labels-idx1-ubyte.gz
Extracting /tmp/t10k-images-idx3-ubyte.gz
Extracting /tmp/t10k-labels-idx1-ubyte.gz
W0918 12:49:20.635714 140358698280704 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
................................................................ (one dot printed per completed request)
Inference error rate: 10.4%
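Under the hood, mnist_client.py builds a PredictRequest and sends it to the server over gRPC. A stripped-down sketch of such a client, assuming pip install grpcio tensorflow-serving-api (the predict_images signature and the images/scores tensor names are the ones exported by mnist_saved_model.py):

# Send one dummy image to the mnist model served above and print the scores.
import grpc
import numpy
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel("127.0.0.1:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"
request.model_spec.signature_name = "predict_images"
image = numpy.zeros(784, dtype=numpy.float32)  # dummy flattened 28x28 image
request.inputs["images"].CopyFrom(tf.make_tensor_proto(image, shape=[1, 784]))

result = stub.Predict(request, 10.0)  # 10-second timeout
print(result.outputs["scores"])       # softmax scores for digits 0-9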
Source: https://blog.csdn.net/zong596568821xp/article/details/99715005