Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reenable console completion help #5056

Merged
merged 1 commit into from
Nov 30, 2024
Merged

Conversation

NicksWorld
Copy link
Contributor

The code has been updated to properly account for potential items on the lua stack.

@myk002 myk002 merged commit de08146 into DFHack:develop Nov 30, 2024
14 checks passed
@dhthwy
Copy link
Contributor

dhthwy commented Nov 30, 2024

I would still suggest wrapping this in a

lua_istable() block

to prevent hangs and crashes (print an error message instead) as there is a very real possibility it won't be a table -- other stuff can be put on the stack in the future and top + 1 won't work anymore.

And I'm not sure if there's supposed to be things on the stack still, so that may get fixed and potentially cause a problem.

@dhthwy
Copy link
Contributor

dhthwy commented Nov 30, 2024

nevermind. I suppose top + 1 will handle those case. Though, still not a bad idea to wrap in lua_istable(). It's better to print an error than to crash or hang.

@@ -453,13 +453,13 @@ void get_commands(color_ostream &con, std::vector<std::string> &commands) {
con.printerr("Failed Lua call to helpdb.get_commands.\n");
}

Lua::GetVector(L, commands);
Lua::GetVector(L, commands, top + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works because the stack size increases by a net of 1 item (the return value from helpdb.get_commands()) between when top is initialized and when this line is reached. In between, there are several stack operations (helpdb is pushed and popped internally to PushModulePublic, and helpdb.get_commands is pushed by PushModulePublic and popped by SafeCall). In my experience, there is some risk that someone coming along later could change the intervening logic to push more things on the stack without popping them, which is legal since this function is using a StackUnwinder, but would break this index.

So I would suggest

Suggested change
Lua::GetVector(L, commands, top + 1);
Lua::GetVector(L, commands, -1);

which is equivalent to Lua::GetVector(L, commands, lua_gettop(L) - 1); and will always give you the top-most item on the stack, which is what you want here.

Copy link
Member

@lethosor lethosor Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned on Discord that -1 won't work because GetVector itself manipulates the stack before using the input index. The patch you posted would fix that.

@@ -449,11 +450,12 @@ void get_commands(color_ostream &con, std::vector<std::string> &commands) {
         return;
     }

-    if (!Lua::SafeCall(con, L, 0, 1)) {
+    if (!Lua::SafeCall(con, L, 0, 1) || !lua_istable(L, -1)) {
         con.printerr("Failed Lua call to helpdb.get_commands.\n");
+        return;
     }

-    Lua::GetVector(L, commands, top + 1);
+    Lua::GetVector(L, commands, -1);
 }

 static bool try_autocomplete(color_ostream &con, const std::string &first, std::string &completed)
diff --git a/library/LuaTools.cpp b/library/LuaTools.cpp
index 08b90bd61..fb4a593ff 100644
--- a/library/LuaTools.cpp
+++ b/library/LuaTools.cpp
@@ -122,6 +122,9 @@ void DFHack::Lua::Push(lua_State *state, const df::coord2d &pos)

 void DFHack::Lua::GetVector(lua_State *state, std::vector<std::string> &pvec, int idx)
 {
+    // Allow usage of relative indexes in arg
+    idx = lua_absindex(state, idx);
+
     luaL_checktype(state, idx, LUA_TTABLE);
     lua_pushnil(state);   // first key
     while (lua_next(state, idx) != 0)

lethosor added a commit to lethosor/dfhack that referenced this pull request Nov 30, 2024
Previously, calling a Lua C API function that throws an error (e.g. with any of
the `lua_error()` family of functions, which use `luaD_throw()` internally) in a
function that also uses our own `StackUnwinder` would deadlock, because:

- `luaD_throw()` calls `lua_lock()` but apparently expects something else to
  call `lua_unlock()`
- `~StackUnwinder()` calls `lua_settop()` to rewind the stack, which internally
  also calls `lua_lock()` and `lua_unlock()`. This is called before the mutex
  can be unlocked.

This was the root cause of the issue behind DFHack#5040, DFHack#5055, DFHack#5056 - `GetVector()`
was called with an invalid index (only when invoked from `gui/launcher`, because
this changed the data on the Lua stack) and threw an exception, which caused the
`~StackUnwinder()` destructor to run while the Lua state was still locked.

Other things to note:
- We control the locking/unlocking implementation - the default, defined in
  `llimits.h`, is a _no-op_.
- @ab9rf points out that the `CRITICAL_SECTION` API we're using on Windows
  already appears to be equivalent to a recursive mutex, so this issue likely
  would not have occurred there.

The issue can be reproduced relatively easily with a simple test command, e.g.:

```diff
@@ -716,6 +723,15 @@ void ls_helper(color_ostream &con, const std::vector<std::string> &params) {
 command_result Core::runCommand(color_ostream &con, const std::string &first_, std::vector<std::string> &parts)
 {
     std::string first = first_;
+    if (first == "luaerror") {
+        auto L = Lua::Core::State;
+        Lua::StackUnwinder top(L);
+
+        luaL_error(L, "test error");
+
+        return CR_OK;
+    }
+
     CommandDepthCounter counter;
     if (!counter.ok())
     {
```

In `gui/launcher`:
- before this change, invoking `luaerror` would hang DF.
- after this change, `luaerror` prints `nil` in red to the native console and
  does nothing. This comes from the last-resort error handler in
  `LuaTools.cpp:report_error()`, and is equivalent to how other unexpected
  errors are handled in code invoked from Lua. Not ideal, but better than a
  crash.

From the native console, invoking `luaerror` crashes DF in both cases (SIGABRT),
and logs `PANIC: unprotected error in call to Lua API (test error)` to
`stderr.log`. The difference in behavior is because `report_error()` above is
part of the error handling system present when code is called _from Lua_,
through variants of `SafeCall`. There is no Lua layer involved in the native
console. (Note: the same crash would have been observed in the original issue
in DFHack#5056 et. al. if the error had occurred when invoked through the console.)
myk002 pushed a commit that referenced this pull request Dec 1, 2024
Previously, calling a Lua C API function that throws an error (e.g. with any of
the `lua_error()` family of functions, which use `luaD_throw()` internally) in a
function that also uses our own `StackUnwinder` would deadlock, because:

- `luaD_throw()` calls `lua_lock()` but apparently expects something else to
  call `lua_unlock()`
- `~StackUnwinder()` calls `lua_settop()` to rewind the stack, which internally
  also calls `lua_lock()` and `lua_unlock()`. This is called before the mutex
  can be unlocked.

This was the root cause of the issue behind #5040, #5055, #5056 - `GetVector()`
was called with an invalid index (only when invoked from `gui/launcher`, because
this changed the data on the Lua stack) and threw an exception, which caused the
`~StackUnwinder()` destructor to run while the Lua state was still locked.

Other things to note:
- We control the locking/unlocking implementation - the default, defined in
  `llimits.h`, is a _no-op_.
- @ab9rf points out that the `CRITICAL_SECTION` API we're using on Windows
  already appears to be equivalent to a recursive mutex, so this issue likely
  would not have occurred there.

The issue can be reproduced relatively easily with a simple test command, e.g.:

```diff
@@ -716,6 +723,15 @@ void ls_helper(color_ostream &con, const std::vector<std::string> &params) {
 command_result Core::runCommand(color_ostream &con, const std::string &first_, std::vector<std::string> &parts)
 {
     std::string first = first_;
+    if (first == "luaerror") {
+        auto L = Lua::Core::State;
+        Lua::StackUnwinder top(L);
+
+        luaL_error(L, "test error");
+
+        return CR_OK;
+    }
+
     CommandDepthCounter counter;
     if (!counter.ok())
     {
```

In `gui/launcher`:
- before this change, invoking `luaerror` would hang DF.
- after this change, `luaerror` prints `nil` in red to the native console and
  does nothing. This comes from the last-resort error handler in
  `LuaTools.cpp:report_error()`, and is equivalent to how other unexpected
  errors are handled in code invoked from Lua. Not ideal, but better than a
  crash.

From the native console, invoking `luaerror` crashes DF in both cases (SIGABRT),
and logs `PANIC: unprotected error in call to Lua API (test error)` to
`stderr.log`. The difference in behavior is because `report_error()` above is
part of the error handling system present when code is called _from Lua_,
through variants of `SafeCall`. There is no Lua layer involved in the native
console. (Note: the same crash would have been observed in the original issue
in #5056 et. al. if the error had occurred when invoked through the console.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants