Figure out cause of the segfaults #5
Oh I know precisely why they happen. Basically, to make a very very long story short, glibc, and musl, and likely bionic, and every other libc is all:
I believe it's technically possible, and I'll likely return to it at some point, but I got tired of reading glibc source code, and other more interesting projects popped up.

If you feel like helping, the latest changes I pushed should build now (sorry about that). You can run the test binary after following the instructions; it should print in debug mode by default. It will segfault either somewhere deep in your libc (I recommend installing the debug-symbol version of your libc) or during a GNU ifunc resolution, because it calls something which isn't initialized properly yet.

Otherwise, the dynamic linker is actually more or less a completely working basic dynamic linker (it can even relocate and run itself, since it has no libc deps)! It even performs lazy dynamic linking, which musl won't do, as it resolves everything at startup time. In principle, if I created a fake libc with a test printf that didn't use a libc, but perhaps just a syscall, it would dynamically load and link it just fine.

So to reiterate, the problem isn't really dryad so much as it is a lack of my time and willingness to suffer, and the ubiquity of fundamentally broken software from 1970 :]
Here is a stunningly terrifying example of how badly engineered the spaghetti code of libc/ld.so is: as of glibc-2.24 on my system, the first ifunc resolution fails (it worked previously) due to a segfault on the second instruction:

(gdb) disass __exp_finite
Dump of assembler code for function __exp_finite:
0x000000000000cc90 <+0>: mov 0x2f6329(%rip),%rax # 0x302fc0
0x000000000000cc97 <+7>: mov 0xb0(%rax),%eax
0x000000000000cc9d <+13>: test $0x1,%ah
0x000000000000cca0 <+16>: jne 0xccc0 <__exp_finite+48>
0x000000000000cca2 <+18>: test $0x40,%al
0x000000000000cca4 <+20>: jne 0xccb0 <__exp_finite+32>
0x000000000000cca6 <+22>: lea -0x5cd(%rip),%rax # 0xc6e0
0x000000000000ccad <+29>: retq
0x000000000000ccae <+30>: xchg %ax,%ax
0x000000000000ccb0 <+32>: lea 0x53c39(%rip),%rax # 0x608f0
0x000000000000ccb7 <+39>: retq
0x000000000000ccb8 <+40>: nopl 0x0(%rax,%rax,1)
0x000000000000ccc0 <+48>: lea 0x42bd9(%rip),%rax # 0x4f8a0
0x000000000000ccc7 <+55>: retq

The value stored at address 0x302fc0 is null, so the second instruction reads at offset 0xb0 from a null pointer (hence the segfault). But why? Oh, what's at 0x302fc0:
A copy relocation of global data. Do you know what

Here is an example assembly GNU ifunc resolver (for, I presume, the exp function for 4 cores), which loads the global dynamic linker struct. That struct is only present if the exact GNU libc dynamic linker is present and has initialized it (it hasn't, because dryad isn't the glibc dynamic linker):
Do you see that really awesome assembly macro loading

So it loads some struct. Maybe I can emulate it by shimming my implementation whenever it's required by a binary? Here's the definition of the struct:

struct rtld_global_ro
{
#endif
/* If nonzero the appropriate debug information is printed. */
EXTERN int _dl_debug_mask;
#define DL_DEBUG_LIBS (1 << 0)
#define DL_DEBUG_IMPCALLS (1 << 1)
#define DL_DEBUG_BINDINGS (1 << 2)
#define DL_DEBUG_SYMBOLS (1 << 3)
#define DL_DEBUG_VERSIONS (1 << 4)
#define DL_DEBUG_RELOC (1 << 5)
#define DL_DEBUG_FILES (1 << 6)
#define DL_DEBUG_STATISTICS (1 << 7)
#define DL_DEBUG_UNUSED (1 << 8)
#define DL_DEBUG_SCOPES (1 << 9)
/* These two are used only internally. */
#define DL_DEBUG_HELP (1 << 10)
#define DL_DEBUG_PRELINK (1 << 11)
/* OS version. */
EXTERN unsigned int _dl_osversion;
/* Platform name. */
EXTERN const char *_dl_platform;
EXTERN size_t _dl_platformlen;
/* Cached value of `getpagesize ()'. */
EXTERN size_t _dl_pagesize;
/* Do we read from ld.so.cache? */
EXTERN int _dl_inhibit_cache;
/* Copy of the content of `_dl_main_searchlist' at startup time. */
EXTERN struct r_scope_elem _dl_initial_searchlist;
/* CLK_TCK as reported by the kernel. */
EXTERN int _dl_clktck;
/* If nonzero print warnings messages. */
EXTERN int _dl_verbose;
/* File descriptor to write debug messages to. */
EXTERN int _dl_debug_fd;
/* Do we do lazy relocations? */
EXTERN int _dl_lazy;
/* Nonzero if runtime lookups should not update the .got/.plt. */
EXTERN int _dl_bind_not;
/* Nonzero if references should be treated as weak during runtime
linking. */
EXTERN int _dl_dynamic_weak;
/* Default floating-point control word. */
EXTERN fpu_control_t _dl_fpu_control;
/* Expected cache ID. */
EXTERN int _dl_correct_cache_id;
/* Mask for hardware capabilities that are available. */
EXTERN uint64_t _dl_hwcap;
/* Mask for important hardware capabilities we honour. */
EXTERN uint64_t _dl_hwcap_mask;
#ifdef HAVE_AUX_VECTOR
/* Pointer to the auxv list supplied to the program at startup. */
EXTERN ElfW(auxv_t) *_dl_auxv;
#endif
/* Get architecture specific definitions. */
#define PROCINFO_DECL
#ifndef PROCINFO_CLASS
# define PROCINFO_CLASS EXTERN
#endif
#include <dl-procinfo.c>
/* Names of shared object for which the RPATH should be ignored. */
EXTERN const char *_dl_inhibit_rpath;
/* Location of the binary. */
EXTERN const char *_dl_origin_path;
/* -1 if the dynamic linker should honor library load bias,
0 if not, -2 use the default (honor biases for normal
binaries, don't honor for PIEs). */
EXTERN ElfW(Addr) _dl_use_load_bias;
/* Name of the shared object to be profiled (if any). */
EXTERN const char *_dl_profile;
/* Filename of the output file. */
EXTERN const char *_dl_profile_output;
/* Name of the object we want to trace the prelinking. */
EXTERN const char *_dl_trace_prelink;
/* Map of shared object to be prelink traced. */
EXTERN struct link_map *_dl_trace_prelink_map;
/* All search directories defined at startup. */
EXTERN struct r_search_path_elem *_dl_init_all_dirs;
#ifdef NEED_DL_SYSINFO
/* Syscall handling improvements. This is very specific to x86. */
EXTERN uintptr_t _dl_sysinfo;
#endif
#ifdef NEED_DL_SYSINFO_DSO
/* The vsyscall page is a virtual DSO pre-mapped by the kernel.
This points to its ELF header. */
EXTERN const ElfW(Ehdr) *_dl_sysinfo_dso;
/* At startup time we set up the normal DSO data structure for it,
and this points to it. */
EXTERN struct link_map *_dl_sysinfo_map;
#endif
/* Mask for more hardware capabilities that are available on some
platforms. */
EXTERN uint64_t _dl_hwcap2;
#ifdef SHARED
/* We add a function table to _rtld_global which is then used to
call the function instead of going through the PLT. The result
is that we can avoid exporting the functions and we do not jump
PLT relocations in libc.so. */
void (*_dl_debug_printf) (const char *, ...)
__attribute__ ((__format__ (__printf__, 1, 2)));
int (internal_function *_dl_catch_error) (const char **, const char **,
bool *, void (*) (void *), void *);
void (internal_function *_dl_signal_error) (int, const char *, const char *,
const char *);
void (*_dl_mcount) (ElfW(Addr) frompc, ElfW(Addr) selfpc);
lookup_t (internal_function *_dl_lookup_symbol_x) (const char *,
struct link_map *,
const ElfW(Sym) **,
struct r_scope_elem *[],
const struct r_found_version *,
int, int,
struct link_map *);
int (*_dl_check_caller) (const void *, enum allowmask);
void *(*_dl_open) (const char *file, int mode, const void *caller_dlopen,
Lmid_t nsid, int argc, char *argv[], char *env[]);
void (*_dl_close) (void *map);
void *(*_dl_tls_get_addr_soft) (struct link_map *);
#ifdef HAVE_DL_DISCOVER_OSVERSION
int (*_dl_discover_osversion) (void);
#endif
/* List of auditing interfaces. */
struct audit_ifaces *_dl_audit;
unsigned int _dl_naudit;
};

You're not hallucinating. That's real, professional C code.
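
For anyone following along, here is a rough C rendering of what the `__exp_finite` resolver in the disassembly above appears to do. It is only a sketch under assumptions: `fake_rtld_global_ro`, `resolve_exp`, and the three `exp_*` placeholders are hypothetical names, the struct models nothing but a 32-bit word at offset 0xb0, and the meaning of the two tested bits is not stated in this thread. The NULL check is added for illustration; the real resolver has none, which is why it faults when the pointer at 0x302fc0 was never filled in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linker-owned read-only globals.
   Only the word the resolver reads is modelled; the padding places it
   at offset 0xb0, matching `mov 0xb0(%rax),%eax` in the dump. */
struct fake_rtld_global_ro {
    unsigned char pad[0xb0];
    uint32_t feature_word;
};

typedef double (*exp_impl)(double);

/* Placeholder implementations (hypothetical; they stand for the three
   addresses the resolver can return). */
static double exp_generic(double x)   { return x; }
static double exp_variant_a(double x) { return x; }
static double exp_variant_b(double x) { return x; }

/* Mirrors the branch structure of the disassembled resolver. */
static exp_impl resolve_exp(const struct fake_rtld_global_ro *ro)
{
    if (ro == NULL)
        return NULL;            /* the real code dereferences here and segfaults */

    uint32_t w = ro->feature_word;
    if (w & 0x100)              /* test $0x1,%ah  -> bit 8 of %eax */
        return exp_variant_b;
    if (w & 0x40)               /* test $0x40,%al -> bit 6 of %eax */
        return exp_variant_a;
    return exp_generic;         /* fall-through case */
}
```

A shim along the lines suggested above would need to place a populated struct of this shape wherever the binary expects the glibc one, before any resolver runs.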
Hehe, anyway, I'm done being silly, don't take me too seriously, I'm just a curmudgeon 💃
Is musl less bad?
That's a good question. Here's a long answer:

So first let me clarify: I don't think any of them are "bad" in any ultimate sense, not to mention the incredible number of human hours that were put into making them robust to the point where they "work". It's really quite impressive in that regard.

Unfortunately, much of the code is constrained by legacy design decisions which were made without much thought for the future, and subsequent work has been built on top of them.

The second issue is the lack of any real specification (formal or informal) for how dynamic linkers should interact with a libc. In fact, most implementations seem to conflate the two (i.e., they view them as interconnected; musl even goes so far as to make the libc shared object the dynamic linker itself!).

A great example of a (good) informal specification is how the dynamic linker prepares the GOT (Global Offset Table), which allows the PLT (Procedure Linkage Table) to function properly for inter-library calls, without really specifying anything about which data structure to use, or how exactly runtime functions should be resolved (these are implementation details, as C programmers like to say). The contract between the binary and the dynamic linker is just:

I'll make some storage with two entries in my binary. I'll let you, the dynamic linker, know about it, and you set it up. You can then use it to let me lazily call functions after you figure out where they are. The only thing is: the function for resolving symbols (the thing in the second storage location) has to take a pointer to whatever it is you placed in the first location, and an integer (which is just the index, in the PLT relocation array, of the symbol that needs resolving).

It's really quite beautiful. This is illustrated in dryad here: https://github.com/m4b/dryad/blob/master/src/linker.rs#L315-L337

In some sense I break this informal "spec" because
Because I don't use a linked list. I use a struct which has extra information for me and an array, which I index into to quickly retrieve which shared object is requesting the symbol. (Fun fact: because I also prepare this struct with the debug boolean, it allows me to print debug information at runtime, all set via

I could really use whatever I want, but that's ok because:
Hence, this is an implementation detail of the dynamic linker, and changing the

Unfortunately, other areas have not been so lucky with respect to a better specification (which means

Because of this lack of specification, specifically, I think, in the area of threading (which is really just grafted onto libc in the first place), the developers of a libc and its associated dynamic linker are free to make any optimizations they see fit, for reasons of either ease or performance. What seems to end up occurring, probably because it's so easy, is that a global struct floats around which is guaranteed to exist (remember, because you're libc, you also control the dynamic linker, so you can just assume the struct you defined in the linker is there).

To make this more concrete, here is musl's global data struct, which packs in some threading information as well as locale data: https://github.com/m4b/dryad/blob/master/src/tls.rs#L33-L45

You'll notice one of the fields is a pointer to the auxv vector. You can once again blame glibc for this; it introduced a non-POSIX function

Anyway, I assume there's a similar such struct in

So the short answer is:
It is possible that 2. is unavoidable for any real libc implementation. I am dubious of this claim, if only because most things are solvable by:
"And that's all I have to say about that" --- Forrest Gump
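
For reference, the GOT contract described in the comment above can be sketched in plain C. This is a toy model under assumptions, not dryad's or any libc's actual code: `linker_state`, `got_reserved`, `toy_resolve`, `linker_prepare_got`, and `plt_first_call` are hypothetical names, and the "symbol lookup" is a single string comparison. On x86-64 the two entries are conventionally GOT[1] and GOT[2].

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* Hypothetical linker-private state; the contract leaves its shape open
   (a linked list, or a struct with an array, are both fine). */
struct linker_state {
    const char *const *names;   /* PLT relocation index -> symbol name */
    size_t count;
    int debug;
};

typedef generic_fn (*resolve_fn)(void *state, size_t reloc_index);

/* The two entries the binary sets aside for the dynamic linker. */
struct got_reserved {
    void *linker_private;       /* first location: whatever the linker wants */
    resolve_fn resolver;        /* second location: the resolving function */
};

/* A toy "library" function that the resolver can find. */
static int answer(void) { return 42; }

/* Resolver: receives the first location's pointer plus a relocation index. */
static generic_fn toy_resolve(void *state, size_t reloc_index)
{
    const struct linker_state *s = state;
    if (reloc_index >= s->count)
        return NULL;
    /* A real linker would search the loaded objects for the name. */
    if (strcmp(s->names[reloc_index], "answer") == 0)
        return (generic_fn)answer;
    return NULL;
}

/* The dynamic linker's half of the contract: fill in both entries. */
static void linker_prepare_got(struct got_reserved *got, struct linker_state *s)
{
    got->linker_private = s;
    got->resolver = toy_resolve;
}

/* What a PLT stub effectively does on the first call to a symbol. */
static generic_fn plt_first_call(const struct got_reserved *got, size_t reloc_index)
{
    return got->resolver(got->linker_private, reloc_index);
}
```

The binary never inspects `linker_private`; it only passes it back, which is why swapping the linked list for a struct plus array stays an implementation detail.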
Thanks for the great explanation!
It would be nice to figure out why the segfaults happen.