Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GJKClosestPoint::CastShape gets stuck in an infinite loop #1383

Open
JLMPL opened this issue Dec 10, 2024 · 3 comments
Open

GJKClosestPoint::CastShape gets stuck in an infinite loop #1383

JLMPL opened this issue Dec 10, 2024 · 3 comments

Comments

@JLMPL
Copy link

JLMPL commented Dec 10, 2024

When the origin of a narrow phase shape cast is close to a kinematic body that it's supposed to hit, the GJKClosestPoint::CastShape gets stuck in an infinite loop. The cast doesn't need to hit the body for this problem to occur. It doesn't happen when the origin of the cast is away from the body even if it does hit near the end.

It happens reliably every time. Shape of the body and direction of the cast doesn't seem to make a difference.

The issue doesn't happen in a debug build so it's hard for me to see the exact state involved. but I've managed to gather that local variable v is {-nan, -nan, -nan, -nan}. According to comment in GJKClosestPoint::GetClosest the NaN is accounted for.

if (v_len_sq < inPrevVLenSq) // Note, comparison order important: If v_len_sq is NaN then this expression will be false so we will return false

Trying to step through the function in release is not super useful but it does confirm that the loop never exits.
I'm sorry if this is vague. I'm a little over my head, not sure what other information could be useful.

Jolt version: release 5.2.0
MSVC version: 193833133
Optimization is on /O2
Floating point model is /fp:strict with /fp:except enabled
JPH_FLOATING_POINT_EXCEPTIONS_ENABLED is defined as well

@jrouwe
Copy link
Owner

jrouwe commented Dec 10, 2024

Hello,

I've encountered this before, and that if is indeed constructed so that it will break out of the loop in case a NaN is detected. As far as I have tested, it will break out of the loop and return, but the problem is that those NaNs slow down the simulation so much that it usually appears that the simulation is hanging (but that may not be your case).

The problem is that NaNs get introduced in the system in the first place and in my experience the NaNs are never generated in GJKClosestPoint but in the calling code. So can you maybe look up a bit further in the callstack to see which of the parameters contain NaNs. Perhaps you're already supplying a NaN to the cast shape? Or perhaps your kinematic body contains some NaNs?

If it is possible, it would really help if you can create a repro case that I can run myself. The simplest way is usually to edit SimpleTest in the Samples app. This is what I tried to repro your issue:

#include <TestFramework.h>

#include <Tests/General/SimpleTest.h>
#include <Jolt/Physics/Collision/Shape/BoxShape.h>
#include <Jolt/Physics/Collision/Shape/SphereShape.h>
#include <Jolt/Physics/Body/BodyCreationSettings.h>
#include <Jolt/Physics/Body/BodyActivationListener.h>
#include <Jolt/Physics/Collision/CollisionCollectorImpl.h>
#include <Jolt/Physics/Collision/ShapeCast.h>
#include <Layers.h>

JPH_IMPLEMENT_RTTI_VIRTUAL(SimpleTest)
{
	JPH_ADD_BASE_CLASS(SimpleTest, Test)
}

SimpleTest::~SimpleTest()
{
}

void SimpleTest::Initialize()
{
	mBodyInterface->CreateAndAddBody(BodyCreationSettings(new BoxShape(Vec3::sReplicate(1.0f)), RVec3::sZero(), Quat::sIdentity(), EMotionType::Kinematic, Layers::MOVING), EActivation::Activate);

	Ref<Shape> sphere = new SphereShape(1.0f);
	for (float x_start = -2.1f; x_start < -1.9f; x_start += 1.0e-5f)
	{
		RShapeCast cast(sphere, Vec3::sReplicate(1.0f), RMat44::sTranslation(RVec3(x_start, 0, 0)), Vec3::sAxisX());
		AllHitCollisionCollector<CastShapeCollector> collector;
		mPhysicsSystem->GetNarrowPhaseQuery().CastShape(cast, { }, RVec3::sZero(), collector);
		if (!collector.HadHit())
			std::abort();
	}
}

But that works fine.

@jrouwe
Copy link
Owner

jrouwe commented Dec 10, 2024

I also tried setting the position of the body, the start of the cast or the direction of the cast to RVec3::sNaN() / Vec3::sNaN() and in all cases I've tested so far it breaks out of the loop and returns no hit as expected.

@JLMPL
Copy link
Author

JLMPL commented Dec 11, 2024

Hello. I am the stupidest but also I'm not.

In Mat44.h: Mat44() = default; ///< Intentionally not initialized for performance reasons
My code:

	JPH::RMat44 tr; //clearly uninitialized memory
	tr.SetTranslation(origin);

For some reason I just assumed default constructed primitives are initialized. So I was getting NaNs from uninitialized memory - go figure. When I fix it with:

JPH::RMat44 tr = JPH::Mat44::sIdentity();

The problem does go away. Now the behavior is correct.

BUT if I were to say: unfix it. There's still the question of why does the loop not exit. So I hacked in an explicit check for NaN in GJKClosestPoint::GetClosest:

		float v_len_sq = v.LengthSq();

		if (std::isnan(v_len_sq)) //explicit check for NaN (might not work with fast math)
		{
			return false; //guaranteed break out of the loop
		}

		if (v_len_sq < inPrevVLenSq) // Note, comparison order important: If v_len_sq is NaN then this expression will be false so we will return false
		{
			// Return closest point
			outV = v;
			outVLenSq = v_len_sq;
			outSet = set;
			return true;
		}

With that check the code behaves exactly as expected and the problem disappears as well.

Conclusion 1: I used the library wrong. Sorry.
Conclusion 2: Somehow my compiler settings violate the rules of NaN comparsion which I heard is a possibility. Though /fp:strict should follow the IEEE754 exactly so I'm not sure what could be causing this.

I hope I haven't completely wasted your time. If it weren't for your informative response I would still probably be banging my head against a wall.

Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants