I am getting the following error when trying to run the model in the "train" phase:
2017-05-30 05:39:23,518 root INFO max_gradient_norm: 5.000000
2017-05-30 05:39:23,518 root INFO clip_gradients: True
2017-05-30 05:39:23,518 root INFO valid_target_length inf
2017-05-30 05:39:23,518 root INFO target_vocab_size: 39
2017-05-30 05:39:23,518 root INFO target_embedding_size: 10.000000
2017-05-30 05:39:23,518 root INFO attn_num_hidden: 128
2017-05-30 05:39:23,518 root INFO attn_num_layers: 2
2017-05-30 05:39:23,519 root INFO visualize: True
2017-05-30 05:39:23,519 root INFO buckets
2017-05-30 05:39:23,519 root INFO [(16, 11), (27, 17), (35, 19), (64, 22), (80, 32)]
2017-05-30 05:41:51,137 root INFO Created model with fresh parameters.
Train: : 0%| | 0/156 [00:00<?, ?it/s]2017-05-30 05:46:19,134 root INFO Generating first batch)
E tensorflow/stream_executor/cuda/cuda_blas.cc:472] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
input_tensor dim: (?, 1, 32, ?)
CNN outdim before squeeze: (?, 1, ?, 512)
CNN outdim: (?, ?, 512)
Traceback (most recent call last):
File "src/launcher.py", line 148, in <module>
main(sys.argv[1:], exp_config.ExpConfig)
File "src/launcher.py", line 145, in main
model.launch()
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 300, in launch
summaries, step_loss, step_logits, _ = self.step(encoder_masks, img_data, zero_paddings, decoder_inputs, target_weights, bucket_id, self.forward_only)
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 411, in step
outputs = self.sess.run(output_feed, input_feed)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(64, 522), b.shape=(522, 128), m=64, n=128, k=522
[[Node: model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/concat, embedding_attention_decoder/attention_decoder/weights/read)]]
[[Node: conv_conv5/BatchNorm/AssignMovingAvg/_270 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_28061_conv_conv5/BatchNorm/AssignMovingAvg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul', defined at:
File "src/launcher.py", line 148, in <module>
main(sys.argv[1:], exp_config.ExpConfig)
File "src/launcher.py", line 144, in main
session = sess)
File "/home/sprabh6/Attention-OCR/src/model/model.py", line 151, in __init__
use_gru = use_gru)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 141, in __init__
softmax_loss_function=softmax_loss_function)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 993, in model_with_buckets
decoder_inputs[:int(bucket[1])], int(bucket[0]))
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 140, in <lambda>
self.target_weights, buckets, lambda x, y, z: seq2seq_f(x, y, z, False),
File "/home/sprabh6/Attention-OCR/src/model/seq2seq_model.py", line 122, in seq2seq_f
attn_num_hidden = attn_num_hidden)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 675, in embedding_attention_decoder
initial_state_attention=initial_state_attention, attn_num_hidden=attn_num_hidden)
File "/home/sprabh6/Attention-OCR/src/model/seq2seq.py", line 575, in attention_decoder
x = linear([inp] + attns, input_size, True)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 751, in _linear
res = math_ops.matmul(array_ops.concat(args, 1), weights)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1765, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1454, in _mat_mul
transpose_b=transpose_b, name=name)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/sprabh6/anaconda/envs/tf_1.0_keras_1/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(64, 522), b.shape=(522, 128), m=64, n=128, k=522
[[Node: model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](model_with_buckets/embedding_attention_decoder_1/attention_decoder/attention_decoder/concat, embedding_attention_decoder/attention_decoder/weights/read)]]
[[Node: conv_conv5/BatchNorm/AssignMovingAvg/_270 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_28061_conv_conv5/BatchNorm/AssignMovingAvg", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I figured that my LD_LIBRARY_PATH wasn't set properly, so I added an entry to make it point to libcublas. That still didn't work. I then figured it could be a memory problem and set GPU options in launcher.py as follows:
It still doesn't work. Can anyone please tell me if I'm missing anything?
TensorFlow version: 1.1.0
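To double-check the LD_LIBRARY_PATH step mentioned above, a small script along these lines can list which entries on the path actually contain a libcublas shared object. This is only a rough sketch: `find_cublas_dirs` is a hypothetical helper name, not part of Attention-OCR or TensorFlow, and it assumes the library files are named `libcublas.so*` as on a typical Linux CUDA install.

```python
import os


def find_cublas_dirs(ld_library_path):
    """Return the LD_LIBRARY_PATH entries that contain a libcublas shared object.

    Hypothetical helper for illustration only. It assumes the usual Linux
    naming scheme (libcublas.so, libcublas.so.8.0, ...).
    """
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        # Skip empty entries (stray separators) and directories that do not exist.
        if not entry or not os.path.isdir(entry):
            continue
        if any(name.startswith("libcublas.so") for name in os.listdir(entry)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    dirs = find_cublas_dirs(os.environ.get("LD_LIBRARY_PATH", ""))
    if dirs:
        print("libcublas found in:", dirs)
    else:
        print("no libcublas.so* found on LD_LIBRARY_PATH")
```

Note that this only confirms a libcublas file is visible on the path. It does not check that its CUDA version matches the one the installed TensorFlow build expects, and it says nothing about GPU memory.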