Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
308 views
in Technique[技术] by (71.8m points)

python - Tensorflow GPU memory allocation

I'm trying to train a custom object detection model using my GPU instead of CPU. I've followed all the instructions given in the following tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/

I've tested my software and everything is installed and working correctly.

Currently using:

  • Windows 10
  • NVIDEA Quadro P1000
  • Tensorflow version 2.4.0
  • CUDA 11.0
  • CuDNN 8.0.4
  • Pre-trained model = ssd_resnet50_v1_fpn_640x640_coco17_tpu-8
  • Number of classes to detect = 8
  • Batch size: 1

The problem however is that after training for a few seconds it stops using the GPU giving the following warning messages.


2020-12-29 15:01:15.444931: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:18.923079: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-12-29 15:01:18.928526: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2020-12-29 15:01:19.830691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.5185GHz coreCount: 4 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 89.53GiB/s
2020-12-29 15:01:19.838069: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:19.849650: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:19.854098: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:19.861632: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-12-29 15:01:19.867525: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-12-29 15:01:19.879754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-12-29 15:01:19.886521: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-12-29 15:01:19.891603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:19.895368: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-12-29 15:01:19.900144: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-29 15:01:19.910485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: Quadro P1000 computeCapability: 6.1
coreClock: 1.5185GHz coreCount: 4 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 89.53GiB/s
2020-12-29 15:01:19.917796: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-12-29 15:01:19.922273: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:19.926687: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:19.930618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-12-29 15:01:19.934399: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-12-29 15:01:19.938808: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-12-29 15:01:19.943155: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-12-29 15:01:19.947005: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:19.950826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-12-29 15:01:20.491701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-29 15:01:20.496963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2020-12-29 15:01:20.500990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2020-12-29 15:01:20.504027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2991 MB memory) -> physical GPU (device: 0, name: Quadro P1000, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-12-29 15:01:20.512219: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1229 15:01:20.515150  5872 mirrored_strategy.py:350] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I1229 15:01:20.515150  5872 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I1229 15:01:20.515150  5872 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesobject_detectionmodel_lib_v2.py:523: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W1229 15:01:20.530780  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesobject_detectionmodel_lib_v2.py:523: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['annotations/train.record']
I1229 15:01:20.546404  5872 dataset_builder.py:148] Reading unweighted datasets: ['annotations/train.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/train.record']
I1229 15:01:20.546404  5872 dataset_builder.py:77] Reading record datasets for input file: ['annotations/train.record']
INFO:tensorflow:Number of filenames to read: 1
I1229 15:01:20.546404  5872 dataset_builder.py:78] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W1229 15:01:20.546404  5872 dataset_builder.py:86] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesobject_detectionuildersdataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W1229 15:01:20.546404  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesobject_detectionuildersdataset_builder.py:103: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesobject_detectionuildersdataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W1229 15:01:20.562029  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesobject_detectionuildersdataset_builder.py:222: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesensorflowpythonutildispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W1229 15:01:25.685788  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesensorflowpythonutildispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesensorflowpythonutildispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W1229 15:01:27.908942  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesensorflowpythonutildispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:UsersUSER-Anaconda3libsite-packagesobject_detectioninputs.py:281: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W1229 15:01:29.229117  5872 deprecation.py:339] From C:UsersUSER-Anaconda3libsite-packagesobject_detectioninputs.py:281: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-12-29 15:01:31.781125: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
C:UsersUSER-Anaconda3libsite-packagesensorflowpythonkerasackend.py:434: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
2020-12-29 15:01:48.972736: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-12-29 15:01:49.258182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-12-29 15:01:49.287771: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-12-29 15:01:49.822205: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I've seen this error message for four different reasons, with different solutions:

1. You're out of memory

Maybe your GPU memory is filled, when TensorFlow makes initialization and your computational graph ends up using all the memory of your physical device then this issue arises. The solution is to use allow growth = True in GPU option. If memory growth is enabled for a GPU, the runtime initialization will not allocate all memory on the device. Using the below code snippet after imports may solve your problem.

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

2. You have cache issues

I regularly work around this error by shutting down my python process, removing the ~/.nv directory (on linux, rm -rf ~/.nv), and restarting the Python process. I don't exactly know why this works. It's probably at least partly related to the second option:

3. While using Keras, Keras layers(classes) were directly imported from keras instead of tensorflow.keras

Keras is included in TensorFlow 2.0 above. So

remove import keras and replace from keras.module.module import class statement to --> from tensorflow.keras.module.module import class

For example Replace from keras.layers import Conv3D,ConvLSTM2D,Conv3DTranspose, Input with this: from tensorflow.keras.layers import Conv3D,ConvLSTM2D,Conv3DTranspose, Input

3. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.

If you've never had similar models working, you're not running out of VRAM, your imports are right as mentioned in step 3 and your cache is clean, I'd go back and set up CUDA + TensorFlow using the best available installation guide - I have had the most success with following the instructions at https://www.tensorflow.org/install/gpu rather than those on the NVIDIA / CUDA site. Lambda Stack: https://lambdalabs.com/lambda-stack-deep-learning-software is also a good way to go.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...