A brief look at using the TensorFlow semantic segmentation API (training DeepLab on Cityscapes)
Installation tutorial:
Cityscapes training:
Pitfalls encountered:
1. Environment:
- TensorFlow 1.8 + CUDA 9.0 + cuDNN 7.0 + Anaconda3 + Python 3.5
- Neither the latest TensorFlow 1.12 nor 1.10 worked; both fail with an error about not getting a convolution algorithm (convolution algorithm...)
2. Dataset conversion
    # Exit immediately if a command exits with a non-zero status.
    set -e

    CURRENT_DIR=$(pwd)
    WORK_DIR="."

    # Root path for Cityscapes dataset.
    CITYSCAPES_ROOT="${WORK_DIR}/cityscapes"

    # Create training labels.
    python "${CITYSCAPES_ROOT}/cityscapesscripts/preparation/createTrainIdLabelImgs.py"

    # Build TFRecords of the dataset.
    # First, create output directory for storing TFRecords.
    OUTPUT_DIR="${CITYSCAPES_ROOT}/tfrecord"
    mkdir -p "${OUTPUT_DIR}"

    BUILD_SCRIPT="${CURRENT_DIR}/build_cityscapes_data.py"

    echo "Converting Cityscapes dataset..."
    python "${BUILD_SCRIPT}" \
      --cityscapes_root="${CITYSCAPES_ROOT}" \
      --output_dir="${OUTPUT_DIR}"
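- First, install the cityscapesScripts module in the current conda environment; it must support Python 3.5.
- By default, cityscapesscripts/preparation/createTrainIdLabelImgs.py converts the JSON files in the test, train, and val folders under gtFine into TrainId label images (PNG). However, a dozen or so JSON files in the test folder are wrongly encoded, and each one raises an error; I removed them one by one as they failed.
- Then run build_cityscapes_data.py to convert the images and labels to TFRecord format.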
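Rather than hitting the bad annotation files one failure at a time, they could be listed in a single pass. The following is only a minimal sketch, assuming the failures are decode or parse errors that Python's json module also reports; the function name find_unreadable_json and the path cityscapes/gtFine/test are illustrative and not part of cityscapesScripts.

```python
import json
from pathlib import Path


def find_unreadable_json(root):
    """Return (path, error) pairs for JSON files under `root` that fail to load."""
    bad = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            # str(path) keeps this compatible with Python 3.5.
            with open(str(path), encoding="utf-8") as f:
                json.load(f)
        except (UnicodeDecodeError, json.JSONDecodeError) as err:
            bad.append((path, repr(err)))
    return bad


if __name__ == "__main__":
    # Assumed location of the test annotations; adjust to your layout.
    for path, err in find_unreadable_json("cityscapes/gtFine/test"):
        print(path, err)
```

The reported files can then be moved aside before running createTrainIdLabelImgs.py.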
3. Training code for Cityscapes
- Put the training command in a script file: train_deeplab_cityscapes.sh
    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco

    PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt'
    PATH_TO_TRAIN_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'

    # From tensorflow/models/research/
    python "${WORK_DIR}"/train.py \
      --logtostderr \
      --training_number_of_steps=40000 \
      --train_split="train" \
      --model_variant="xception_65" \
      --atrous_rates=6 \
      --atrous_rates=12 \
      --atrous_rates=18 \
      --output_stride=16 \
      --decoder_output_stride=4 \
      --train_crop_size=513 \
      --train_crop_size=513 \
      --train_batch_size=1 \
      --fine_tune_batch_norm=False \
      --dataset="cityscapes" \
      --tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
      --train_logdir=${PATH_TO_TRAIN_DIR} \
      --dataset_dir=${PATH_TO_DATASET}
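Parameter notes:
training_number_of_steps: the number of training iterations;
train_crop_size: the crop size of the training images; because my GPU has only 8 GB of memory, I set it to 513;
train_batch_size: the training batch size; also because of hardware limits, I kept it at 1;
fine_tune_batch_norm=False: whether to fine-tune the batch norm parameters. The official recommendation is that if the training batch size is below 12, this must be set to False. This setting matters: otherwise training fails with an error at around step 2000;
tf_initial_checkpoint: the pretrained initial checkpoint; here it is the previously downloaded ../research/deeplab/backbone/deeplabv3_cityscapes_train/model.ckpt.index (note that the script above passes the checkpoint prefix model.ckpt, without the .index suffix);
train_logdir: the directory where the training weights are saved; it was created when the project directories were first set up. Here it is set to "../research/deeplab/exp/train_on_train_set/train/"
dataset_dir: the location of the dataset, that is, the TFRecords directory created earlier. Here it is set to "../dataset/cityscapes/tfrecord"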
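In the training script, multi-valued flags such as atrous_rates and train_crop_size are passed by repeating the flag once per value. A minimal sketch of how such a command line could be assembled from Python is shown here; build_flags and TRAIN_OPTIONS are hypothetical names introduced for illustration, and only the flag names and values come from the script above.

```python
def build_flags(options):
    """Turn (name, value) pairs into a list of --name=value arguments.

    A list or tuple value yields one flag per element, in order,
    which mirrors how the script repeats --atrous_rates.
    """
    args = []
    for name, value in options:
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            args.append("--{}={}".format(name, v))
    return args


# Values copied from train_deeplab_cityscapes.sh (paths omitted).
TRAIN_OPTIONS = [
    ("training_number_of_steps", 40000),
    ("train_split", "train"),
    ("model_variant", "xception_65"),
    ("atrous_rates", [6, 12, 18]),
    ("output_stride", 16),
    ("decoder_output_stride", 4),
    ("train_crop_size", [513, 513]),
    ("train_batch_size", 1),
    ("fine_tune_batch_norm", False),
    ("dataset", "cityscapes"),
]

if __name__ == "__main__":
    print(" \\\n  ".join(build_flags(TRAIN_OPTIONS)))
```

The resulting list can be appended to a command such as ["python", "train.py", "--logtostderr"] and handed to subprocess, together with the checkpoint and directory flags.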
4. Evaluation
- Evaluation script:
    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco

    PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/'
    PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_EVAL_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/eval/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'

    # From tensorflow/models/research/
    python "${WORK_DIR}"/eval.py \
      --logtostderr \
      --eval_split="val" \
      --model_variant="xception_65" \
      --atrous_rates=6 \
      --atrous_rates=12 \
      --atrous_rates=18 \
      --output_stride=16 \
      --decoder_output_stride=4 \
      --eval_crop_size=1025 \
      --eval_crop_size=2049 \
      --dataset="cityscapes" \
      --checkpoint_dir=${PATH_TO_INITIAL_CHECKPOINT} \
      --eval_logdir=${PATH_TO_EVAL_DIR} \
      --dataset_dir=${PATH_TO_DATASET}
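- Result: model.ckpt-40000 is the model obtained by training the initial model for 40000 iterations; in the log below it reaches miou_1.0 = 0.478. Evaluating the initial model afterwards gives miou_1.0 = 0.496, which is still low; I don't know whether some parameter is set incorrectly.
- Note: the archive of the officially provided checkpoint (the initial model) does not contain a checkpoint file, so one has to be added by hand.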
    INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/model.ckpt-40000
    INFO:tensorflow:Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    INFO:tensorflow:Starting evaluation at 2018-12-18-07:13:08
    INFO:tensorflow:Evaluation [50/500]
    INFO:tensorflow:Evaluation [100/500]
    INFO:tensorflow:Evaluation [150/500]
    INFO:tensorflow:Evaluation [200/500]
    INFO:tensorflow:Evaluation [250/500]
    INFO:tensorflow:Evaluation [300/500]
    INFO:tensorflow:Evaluation [350/500]
    INFO:tensorflow:Evaluation [400/500]
    INFO:tensorflow:Evaluation [450/500]
    miou_1.0[0.478293568]
    INFO:tensorflow:Waiting for new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/
    INFO:tensorflow:Found new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
    INFO:tensorflow:Graph was finalized.
    2018-12-18 15:18:05.210957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
    2018-12-18 15:18:05.211047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
    2018-12-18 15:18:05.211077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]0
    2018-12-18 15:18:05.211100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:N
    2018-12-18 15:18:05.211645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9404 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
    INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
    INFO:tensorflow:Running local_init_op.
    INFO:tensorflow:Done running local_init_op.
    INFO:tensorflow:Starting evaluation at 2018-12-18-07:18:06
    INFO:tensorflow:Evaluation [50/500]
    INFO:tensorflow:Evaluation [100/500]
    INFO:tensorflow:Evaluation [150/500]
    INFO:tensorflow:Evaluation [200/500]
    INFO:tensorflow:Evaluation [250/500]
    INFO:tensorflow:Evaluation [300/500]
    INFO:tensorflow:Evaluation [350/500]
    INFO:tensorflow:Evaluation [400/500]
    INFO:tensorflow:Evaluation [450/500]
    miou_1.0[0.496331513]
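The hand-added checkpoint file mentioned in the note before this log is a small text file that names the checkpoint prefix to restore. A minimal sketch of writing one is given here, assuming the extracted files use the prefix model.ckpt (as the "Restoring parameters from .../model.ckpt" line in the log suggests); the helper name write_checkpoint_file is hypothetical.

```python
import os


def write_checkpoint_file(checkpoint_dir, prefix="model.ckpt"):
    """Write a minimal `checkpoint` state file that points at `prefix`."""
    path = os.path.join(checkpoint_dir, "checkpoint")
    with open(path, "w") as f:
        # Same two fields TensorFlow 1.x writes when it saves a checkpoint.
        f.write('model_checkpoint_path: "{}"\n'.format(prefix))
        f.write('all_model_checkpoint_paths: "{}"\n'.format(prefix))
    return path
```

Calling write_checkpoint_file on the directory that holds the extracted model.ckpt files should let eval.py find the checkpoint through --checkpoint_dir.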
5. Visualization
- Generate the segmentation result images in the vis directory
    #!/bin/bash
    # CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --backbone resnet --lr 0.01 --workers 4 --epochs 40 --batch-size 16 --gpu-ids 0,1,2,3 --checkname deeplab-resnet --eval-interval 1 --dataset coco

    PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
    PATH_TO_VIS_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/vis/'
    PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
    WORK_DIR='/home/rjw/tf-models/research/deeplab'

    # From tensorflow/models/research/
    python "${WORK_DIR}"/vis.py \
      --logtostderr \
      --vis_split="val" \
      --model_variant="xception_65" \
      --atrous_rates=6 \
      --atrous_rates=12 \
      --atrous_rates=18 \
      --output_stride=16 \
      --decoder_output_stride=4 \
      --vis_crop_size=1025 \
      --vis_crop_size=2049 \
      --dataset="cityscapes" \
      --colormap_type="cityscapes" \
      --checkpoint_dir=${PATH_TO_CHECKPOINT} \
      --vis_logdir=${PATH_TO_VIS_DIR} \
      --dataset_dir=${PATH_TO_DATASET}
The above is my personal experience; I hope it can serve as a reference.