Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to run training step #7

Open
jrbasso opened this issue Nov 16, 2018 · 3 comments
Open

Failed to run training step #7

jrbasso opened this issue Nov 16, 2018 · 3 comments

Comments

@jrbasso
Copy link

jrbasso commented Nov 16, 2018

I downloaded the repo, went to all the steps of the setup notebook and while executing the example notebook I get an issue running the training: INFO:tensorflow:Error reported to Coordinator: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]

I use an AWS p2.xlarge and a p3.8xlarge to test and both gives the same error. I used the AWS Deep Learning AMI with Ubuntu.

Full output:

loading annotations into memory...
Done (t=15.25s)
creating index...
index created!
loading annotations into memory...
Done (t=7.31s)
creating index...
index created!
WARNING:tensorflow:From /home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:737: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
INFO:tensorflow:Restoring parameters from logs/sample/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path logs/sample/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Error reported to Coordinator: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]

Caused by op 'GatherNd_5', defined at:
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
    self.io_loop.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-05888e252ab7>", line 8, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
    output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
  File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
    res = model_base(model_output, inner_h, inner_w)
  File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 36, in model_base
    lo_y = gather_bilinear(lo_y, lo_p, (inner_h, inner_w)) + lo_y
  File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
    r = tf.gather_nd(params, idx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[{{node GatherNd_5}} = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
    yield
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py", line 495, in run
    self.run_loop()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py", line 1034, in run_loop
    self._sv.global_step])
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]

Caused by op 'GatherNd_5', defined at:
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
    self.io_loop.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-05888e252ab7>", line 8, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
    output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
  File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
    res = model_base(model_output, inner_h, inner_w)
  File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 36, in model_base
    lo_y = gather_bilinear(lo_y, lo_p, (inner_h, inner_w)) + lo_y
  File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
    r = tf.gather_nd(params, idx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1333     try:
-> 1334       return fn(*args)
   1335     except errors.OpError as e:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1318       return self._call_tf_sessionrun(
-> 1319           options, feed_dict, fetch_list, target_list, run_metadata)
   1320 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1406         self._session, options, feed_dict, fetch_list, target_list,
-> 1407         run_metadata)
   1408 

InvalidArgumentError: indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
	 [[{{node GatherNd_1}} = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
    993           start_standard_services=start_standard_services)
--> 994       yield sess
    995     except Exception as e:

~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train(train_op, logdir, train_step_fn, train_step_kwargs, log_every_n_steps, graph, master, is_chief, global_step, number_of_steps, init_op, init_feed_dict, local_init_op, init_fn, ready_op, summary_op, save_summaries_secs, summary_writer, startup_delay_steps, saver, save_interval_secs, sync_optimizer, session_config, session_wrapper, trace_every_n_steps, ignore_live_threads)
    769             total_loss, should_stop = train_step_fn(
--> 770                 sess, train_op, global_step, train_step_kwargs)
    771             if should_stop:

~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train_step(sess, train_op, global_step, train_step_kwargs)
    486                                         options=trace_run_options,
--> 487                                         run_metadata=run_metadata)
    488   time_elapsed = time.time() - start_time

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1151       results = self._do_run(handle, final_targets, final_fetches,
-> 1152                              feed_dict_tensor, options, run_metadata)
   1153     else:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1327       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328                            run_metadata)
   1329     else:

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1347       message = error_interpolation.interpolate(message, self._graph)
-> 1348       raise type(e)(node_def, op, message)
   1349 

InvalidArgumentError: indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
	 [[node GatherNd_1 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]

Caused by op 'GatherNd_1', defined at:
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
    self.io_loop.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-05888e252ab7>", line 8, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
    output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
  File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
    res = model_base(model_output, inner_h, inner_w)
  File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 30, in model_base
    mo_y = gather_bilinear(so_y, mo_p, (inner_h, inner_w)) + mo_y
  File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
    r = tf.gather_nd(params, idx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[1,4,50,50,36] = [4, 49, 51, 5] does not index into param shape [5,51,51,17]
	 [[node GatherNd_1 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]


During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-2-05888e252ab7> in <module>()
      6 log_dir = 'logs/sample/'
      7 
----> 8 train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)

~/personlab-tf/personlab/model.py in train(model_func, data_generator, checkpoint_path, log_dir)
     77                                    log_every_n_steps=100,
     78                                    save_summaries_secs=300,
---> 79                                    session_config=sess_config,
     80                                   )
     81 

~/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py in train(train_op, logdir, train_step_fn, train_step_kwargs, log_every_n_steps, graph, master, is_chief, global_step, number_of_steps, init_op, init_feed_dict, local_init_op, init_fn, ready_op, summary_op, save_summaries_secs, summary_writer, startup_delay_steps, saver, save_interval_secs, sync_optimizer, session_config, session_wrapper, trace_every_n_steps, ignore_live_threads)
    783               threads,
    784               close_summary_writer=True,
--> 785               ignore_live_threads=ignore_live_threads)
    786 
    787     except errors.AbortedError:

~/anaconda3/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     97                 value = type()
     98             try:
---> 99                 self.gen.throw(type, value, traceback)
    100             except StopIteration as exc:
    101                 # Suppress StopIteration *unless* it's the same exception that

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in managed_session(self, master, config, start_standard_services, close_summary_writer)
   1002         # threads which are not checking for `should_stop()`.  They
   1003         # will be stopped when we close the session further down.
-> 1004         self.stop(close_summary_writer=close_summary_writer)
   1005       finally:
   1006         # Close the session to finish up all pending calls.  We do not care

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in stop(self, threads, close_summary_writer, ignore_live_threads)
    830           threads,
    831           stop_grace_period_secs=self._stop_grace_secs,
--> 832           ignore_live_threads=ignore_live_threads)
    833     finally:
    834       # Close the writer last, in case one of the running threads was using it.

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in join(self, threads, stop_grace_period_secs, ignore_live_threads)
    387       self._registered_threads = set()
    388       if self._exc_info_to_raise:
--> 389         six.reraise(*self._exc_info_to_raise)
    390       elif stragglers:
    391         if ignore_live_threads:

~/anaconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in stop_on_exception(self)
    295     """
    296     try:
--> 297       yield
    298     except:  # pylint: disable=bare-except
    299       self.request_stop(ex=sys.exc_info())

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/coordinator.py in run(self)
    493         while not self._coord.wait_for_stop(next_timer_time - time.time()):
    494           next_timer_time += self._timer_interval_secs
--> 495           self.run_loop()
    496       self.stop_loop()
    497 

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/supervisor.py in run_loop(self)
   1032     if self._sv.global_step is not None:
   1033       summary_strs, global_step = self._sess.run([self._sv.summary_op,
-> 1034                                                   self._sv.global_step])
   1035     else:
   1036       summary_strs = self._sess.run(self._sv.summary_op)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    927     try:
    928       result = self._run(None, fetches, feed_dict, options_ptr,
--> 929                          run_metadata_ptr)
    930       if run_metadata:
    931         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1150     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1151       results = self._do_run(handle, final_targets, final_fetches,
-> 1152                              feed_dict_tensor, options, run_metadata)
   1153     else:
   1154       results = []

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1326     if handle is None:
   1327       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328                            run_metadata)
   1329     else:
   1330       return self._do_call(_prun_fn, handle, feeds, fetches)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1346           pass
   1347       message = error_interpolation.interpolate(message, self._graph)
-> 1348       raise type(e)(node_def, op, message)
   1349 
   1350   def _extend_graph(self):

InvalidArgumentError: indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]

Caused by op 'GatherNd_5', defined at:
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ubuntu/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 499, in start
    self.io_loop.start()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
    self._run_once()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/base_events.py", line 1432, in _run_once
    handle._run()
  File "/home/ubuntu/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 122, in _handle_events
    handler_func(fileobj, events)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2662, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2785, in _run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2907, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-05888e252ab7>", line 8, in <module>
    train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
  File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
    output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
  File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
    res = model_base(model_output, inner_h, inner_w)
  File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 36, in model_base
    lo_y = gather_bilinear(lo_y, lo_p, (inner_h, inner_w)) + lo_y
  File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
    r = tf.gather_nd(params, idx)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3240, in gather_nd
    "GatherNd", params=params, indices=indices, name=name)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[3,4,50,50,16] = [4, 51, 50, 16] does not index into param shape [5,51,51,17]
	 [[node GatherNd_5 (defined at /home/ubuntu/personlab-tf/personlab/util.py:33)  = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_7/BiasAdd, stack_48)]]
@alexryan
Copy link

I have this error as well.

`INFO:tensorflow:Error reported to Coordinator: flat indices[1728695, :] = [2, 25, 51, 6] does not index into param (shape: [5,51,51,17]).
[[Node: GatherNd_1 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]

Caused by op 'GatherNd_1', defined at:
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel_launcher.py", line 16, in
app.launch_new_instance()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/kernelapp.py", line 505, in start
self.io_loop.start()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 132, in start
self.asyncio_loop.run_forever()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/asyncio/base_events.py", line 422, in run_forever
self._run_once()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/asyncio/base_events.py", line 1434, in _run_once
handle._run()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/ioloop.py", line 758, in _run_callback
ret = callback()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/stack_context.py", line 300, in null_wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/gen.py", line 1233, in inner
self.run()
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/gen.py", line 1147, in run
yielded = self.gen.send(value)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 357, in process_one
yield gen.maybe_future(dispatch(*args))
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
yield gen.maybe_future(handler(stream, idents, msg))
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
user_expressions, allow_stdin,
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tornado/gen.py", line 326, in wrapper
yielded = next(result)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2819, in run_cell
raw_cell, store_history, silent, shell_futures)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2845, in _run_cell
return runner(coro)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
coro.send(None)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3020, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3191, in run_ast_nodes
if (yield from self.run_code(code, result)):
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 8, in
train(mobilenet_v2_model, gen.loader, pm_check_path, log_dir)
File "/home/ubuntu/personlab-tf/personlab/model.py", line 25, in train
output, init_func = model_func(tensors['image'], checkpoint_path=checkpoint_path, is_training=True)
File "/home/ubuntu/personlab-tf/personlab/models/mobilenet_v2.py", line 16, in mobilenet_v2_model
res = model_base(model_output, inner_h, inner_w)
File "/home/ubuntu/personlab-tf/personlab/models/model_base.py", line 30, in model_base
mo_y = gather_bilinear(so_y, mo_p, (inner_h, inner_w)) + mo_y
File "/home/ubuntu/personlab-tf/personlab/util.py", line 33, in gather_bilinear
r = tf.gather_nd(params, idx)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3052, in gather_nd
"GatherNd", params=params, indices=indices, name=name)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/ubuntu/miniconda3/envs/personlab-tf/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): flat indices[1728695, :] = [2, 25, 51, 6] does not index into param (shape: [5,51,51,17]).
[[Node: GatherNd_1 = GatherNd[Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Conv_3/BiasAdd, stack_8)]]`

@sydsim
Copy link
Owner

sydsim commented Dec 29, 2018

@jrbasso @alexryan
sorry to reply you too late. it seems there is a bug that making the offset vectors to be out of boundary.
it occurs only in CPU environment, and ignored in GPU enviroment. (tensorflow/tensorflow#15091)
I'll try to fix it in as soon as possible. if you find how to fix it, please send pull request.

@jrbasso
Copy link
Author

jrbasso commented Feb 14, 2019

@sydsim I actually ran into this issue using a GPU environment. Do you have any ideal on what can I try to mitigate that? Or any clue that I can research and try to fix it? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants