This repository has been archived by the owner on Oct 19, 2024. It is now read-only.
Please describe the bug
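Training with ShardParallel aborts: a MeshHostWorker process hits a fatal XLA check failure (Check failed: !info.content.empty() in gpu_executable.cc:459) and then receives SIGABRT.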
Please describe the expected behavior
Training should run without an unexpected system error.
System information and environment
To Reproduce
Steps to reproduce the behavior:
1. Run the example to start training.
2. See the error.
Screenshots
(MeshHostWorker pid=595449) 2023-07-08 01:00:54.519514: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_executable.cc:459] Check failed: !info.content.empty()
(MeshHostWorker pid=595449) *** SIGABRT received at time=1688778054 on cpu 149 ***
(MeshHostWorker pid=595449) PC: @ 0x7f41ad5cd03b (unknown) raise
(MeshHostWorker pid=595449) @ 0x7f41ad5cd0c0 4016 (unknown)
(MeshHostWorker pid=595449) @ 0x7f10a9fae28e 752 xla::gpu::GpuExecutable::ResolveConstantGlobals()
(MeshHostWorker pid=595449) @ 0x7f10ab561864 2784 xla::gpu::GpuExecutable::ExecuteAsyncOnStreamImpl()
(MeshHostWorker pid=595449) @ 0x7f10ab5631bf 128 xla::gpu::GpuExecutable::ExecuteAsyncOnStream()
(MeshHostWorker pid=595449) @ 0x7f10adf836e6 1376 xla::Executable::ExecuteAsyncOnStreamWrapper()
(MeshHostWorker pid=595449) @ 0x7f10ab9ff720 2432 xla::LocalExecutable::RunAsync()
(MeshHostWorker pid=595449) @ 0x7f10ab9ffe90 256 xla::LocalExecutable::RunAsync()
(MeshHostWorker pid=595449) @ 0x7f10ab5eb1fa 2720 xla::PjRtStreamExecutorExecutable::EnqueueExecution()
(MeshHostWorker pid=595449) @ 0x7f10ab5ec631 5360 xla::PjRtStreamExecutorExecutable::ExecuteHelper()
(MeshHostWorker pid=595449) @ 0x7f10ab5eea59 240 std::_Function_handler<>::_M_invoke()
(MeshHostWorker pid=595449) @ 0x7f10ab9d8378 208 xla::WorkerThread::WorkLoop()
(MeshHostWorker pid=595449) @ 0x7f10af0de3e5 80 tsl::(anonymous namespace)::PThread::ThreadFn()
(MeshHostWorker pid=595449) @ 0x7f41ad56f609 (unknown) start_thread
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: *** SIGABRT received at time=1688778054 on cpu 149 ***
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: PC: @ 0x7f41ad5cd03b (unknown) raise
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f41ad5cd0c0 4016 (unknown)
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10a9fae28e 752 xla::gpu::GpuExecutable::ResolveConstantGlobals()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab561864 2784 xla::gpu::GpuExecutable::ExecuteAsyncOnStreamImpl()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab5631bf 128 xla::gpu::GpuExecutable::ExecuteAsyncOnStream()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10adf836e6 1376 xla::Executable::ExecuteAsyncOnStreamWrapper()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab9ff720 2432 xla::LocalExecutable::RunAsync()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab9ffe90 256 xla::LocalExecutable::RunAsync()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab5eb1fa 2720 xla::PjRtStreamExecutorExecutable::EnqueueExecution()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab5ec631 5360 xla::PjRtStreamExecutorExecutable::ExecuteHelper()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab5eea59 240 std::_Function_handler<>::_M_invoke()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10ab9d8378 208 xla::WorkerThread::WorkLoop()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f10af0de3e5 80 tsl::(anonymous namespace)::PThread::ThreadFn()
(MeshHostWorker pid=595449) [2023-07-08 01:00:54,596 E 595449 596143] logging.cc:361: @ 0x7f41ad56f609 (unknown) start_thread
Code snippet to reproduce the problem
Additional information