I've seen this error message for four different reasons, with different solutions:
1. You're out of memory
Maybe your GPU memory is filling up: by default, TensorFlow allocates essentially all of the device's memory at initialization, so your computational graph can end up claiming the entire physical device. The solution is to enable memory growth (the allow_growth = True GPU option). If memory growth is enabled for a GPU, the runtime initialization will not allocate all memory on the device. Using the code snippet below right after your imports may solve your problem.
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
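If you're on a TF1-style codebase that creates sessions explicitly, the equivalent knob is the allow_growth GPU option. A minimal sketch, assuming the tf.compat.v1 API is available in your TensorFlow build:

import tensorflow as tf

# TF1-style session config: request GPU memory on demand instead of
# reserving the whole device when the session is created
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)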
2. You have cache issues
I regularly work around this error by shutting down my Python process, removing the ~/.nv directory (on Linux: rm -rf ~/.nv), and restarting the Python process. I don't know exactly why this works; it's probably at least partly related to the version issues covered in option 4 below.
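If you'd rather script that cleanup than type it by hand, here's a minimal Python sketch (assuming a Linux home-directory layout); run it only after the TensorFlow process has been shut down:

import shutil
from pathlib import Path

# Remove the NVIDIA compute cache, the Python equivalent of rm -rf ~/.nv
nv_cache = Path.home() / '.nv'
if nv_cache.exists():
    shutil.rmtree(nv_cache)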
3. You imported Keras layers (or other classes) directly from keras instead of tensorflow.keras
Keras is bundled with TensorFlow 2.0 and above, so
remove import keras
and replace every
from keras.module.module import class
statement with
from tensorflow.keras.module.module import class
For example, replace
from keras.layers import Conv3D, ConvLSTM2D, Conv3DTranspose, Input
with this:
from tensorflow.keras.layers import Conv3D, ConvLSTM2D, Conv3DTranspose, Input
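To double-check that nothing still resolves to a standalone keras install, here's a quick sanity check; note that the exact module path printed varies across TensorFlow versions:

import tensorflow as tf
from tensorflow.keras.layers import Conv3D

print(tf.__version__)
# Should point into TensorFlow's bundled Keras, not a separate keras package
print(Conv3D.__module__)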
4. You have incompatible versions of CUDA, TensorFlow, NVIDIA drivers, etc.
If you've never had similar models working, you're not running out of VRAM, your imports are correct as described in option 3, and your cache is clean, I'd go back and set up CUDA + TensorFlow using the best available installation guide. I've had the most success following the instructions at https://www.tensorflow.org/install/gpu rather than those on the NVIDIA / CUDA site. Lambda Stack (https://lambdalabs.com/lambda-stack-deep-learning-software) is also a good way to go.
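Before reinstalling anything, it's worth confirming what your TensorFlow build actually expects. A minimal diagnostic sketch (tf.sysconfig.get_build_info() exists in TensorFlow 2.3 and later):

import tensorflow as tf

print('TF version:', tf.__version__)
print('Built with CUDA:', tf.test.is_built_with_cuda())
# Build info includes the CUDA and cuDNN versions this wheel was compiled against
print('Build info:', tf.sysconfig.get_build_info())
print('Visible GPUs:', tf.config.list_physical_devices('GPU'))

If the CUDA/cuDNN versions reported there don't match what nvidia-smi and your installed libraries provide, that mismatch is the likely culprit.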