Figure out cause of the segfaults #5

Open
DemiMarie opened this issue Oct 30, 2016 · 5 comments

Comments

@DemiMarie

It would be nice to figure out why the segfaults happen.

@m4b
Owner

m4b commented Oct 31, 2016

Oh, I know precisely why they happen. To make a very, very long story short: glibc, musl, likely bionic, and every other libc:

  1. Is broken by design.
  2. Contains a massive amount of global state.
  3. That global state contains references to, and assumes, a dynamic linker of a very specific sort (e.g., the glibc dynamic linker, the musl dynamic linker (musl itself), or the bionic dynamic linker).
  4. That global state must be initialized properly by the _dynamic linker itself_. Otherwise a call to, say, printf, which calls some other function that checks whether malloc is initialized, segfaults, because the data it accesses is null: it was never initialized.
  5. Therefore I suspect that to dynamically link a binary which itself dynamically links against glibc, I have to emulate the binary ABI of the glibc dynamic linker, and likewise for every other libc.
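
A minimal sketch of point 4, assuming a made-up state struct: `loader_state`, `g_state`, `loader_init`, and `malloc_is_ready` are hypothetical names for illustration, not anything from glibc or musl. Real library code dereferences the global unconditionally; the sketch checks for NULL only so that it can report the problem instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the loader-owned global state of a libc. */
struct loader_state {
    size_t page_size;
    int    malloc_ready;
};

/* In a real libc the matching dynamic linker fills this in before any
   library code runs. Under a foreign loader it simply stays NULL. */
struct loader_state *g_state = NULL;

/* What a foreign loader would have to do, with an identical layout. */
void loader_init(struct loader_state *s)
{
    assert(s != NULL);
    s->page_size = 4096;
    s->malloc_ready = 1;
    g_state = s;
}

/* Real code would read g_state->malloc_ready directly and fault on NULL.
   This version returns -1 when the state was never set up. */
int malloc_is_ready(void)
{
    if (g_state == NULL)
        return -1;
    return g_state->malloc_ready;
}
```

Calling `malloc_is_ready()` before `loader_init()` yields -1, which corresponds to the situation dryad is in when it loads a glibc-linked binary.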

I believe it's technically possible, and I'll likely return to it at some point, but I got tired of reading glibc source code, and other more interesting projects popped up.

If you feel like helping, the latest changes I pushed should build now (sorry about that).

You can run the test binary after following the instructions. It should print in debug mode by default. It will segfault either somewhere deep in your libc (I recommend installing the debug-symbol version of your libc) or during a GNU ifunc resolution, because it calls something which isn't initialized properly yet.

Otherwise the dynamic linker is actually more or less a completely working basic dynamic linker (it can even relocate and run itself, since it has no libc deps)! (It even performs lazy dynamic linking, which musl won't even do, as it resolves everything at startup time)

In principle, if I created a fake libc with a test printf that didn't use a libc, but perhaps just a syscall, it would dynamically load and link it just fine.
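
As a rough illustration of what such a fake libc entry point could look like, here is a sketch that writes a string using only the write system call. The names `raw_write` and `fake_puts` are hypothetical. The inline-assembly path assumes x86-64 Linux; on other platforms the sketch falls back to the host libc's `write`, which defeats the purpose but keeps it compilable. The `assert` is only a convenience for the sketch; a truly freestanding build would drop it.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__x86_64__)
/* Raw write(2): syscall number 1 on x86-64 Linux, no libc involved. */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}
#else
#include <unistd.h>
/* Fallback for other platforms: this does go through the host libc. */
static long raw_write(int fd, const void *buf, size_t len)
{
    return (long)write(fd, buf, len);
}
#endif

/* Hypothetical "fake libc" function: prints s to stdout (fd 1) and
   returns the number of bytes written, or a negative value on error. */
long fake_puts(const char *s)
{
    size_t len = 0;
    assert(s != NULL);
    while (s[len] != '\0')   /* hand-rolled strlen, no libc needed */
        len++;
    return raw_write(1, s, len);
}
```

Nothing in `fake_puts` touches loader-initialized global state, which is why a library built this way should not hit the segfaults described above.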

So to reiterate, the problem isn't really dryad so much as it is a lack of my time and willingness to suffer, and the ubiquity of fundamentally broken software from 1970 :]

@m4b
Owner

m4b commented Nov 24, 2016

Here is a stunningly terrifying example of how badly engineered the spaghetti code of libc/ld.so is:

As of glibc-2.24 on my system, the first ifunc resolution fails (it worked previously) due to a segfault on the second instruction:

(gdb) disass __exp_finite
Dump of assembler code for function __exp_finite:
   0x000000000000cc90 <+0>:	mov    0x2f6329(%rip),%rax        # 0x302fc0
   0x000000000000cc97 <+7>:	mov    0xb0(%rax),%eax
   0x000000000000cc9d <+13>:	test   $0x1,%ah
   0x000000000000cca0 <+16>:	jne    0xccc0 <__exp_finite+48>
   0x000000000000cca2 <+18>:	test   $0x40,%al
   0x000000000000cca4 <+20>:	jne    0xccb0 <__exp_finite+32>
   0x000000000000cca6 <+22>:	lea    -0x5cd(%rip),%rax        # 0xc6e0
   0x000000000000ccad <+29>:	retq   
   0x000000000000ccae <+30>:	xchg   %ax,%ax
   0x000000000000ccb0 <+32>:	lea    0x53c39(%rip),%rax        # 0x608f0
   0x000000000000ccb7 <+39>:	retq   
   0x000000000000ccb8 <+40>:	nopl   0x0(%rax,%rax,1)
   0x000000000000ccc0 <+48>:	lea    0x42bd9(%rip),%rax        # 0x4f8a0
   0x000000000000ccc7 <+55>:	retq   

The GOT slot at 0x302fc0 holds null, so %rax is null after the first instruction, and the second instruction's load from 0xb0(%rax) reads address 0xb0 (hence the segfault). But why?

Oh, what's at 0x302fc0:

0000000000302fc0 R_X86_64_GLOB_DAT  _rtld_global_ro@GLIBC_PRIVATE

A GLOB_DAT relocation against global data: the dynamic linker is expected to write the address of _rtld_global_ro into that slot. Do you know what _rtld_global_ro is? Hopefully not, it will make you sad. It's that massive global, persistent state that I mentioned, which floats around in every single binary that links against libc (which is basically every single binary) every time they're run. That global read-only data pointer is copied into libm on load, and modern glibc appears to have become a CS101 professor's worst nightmare, i.e., global variables declared at top scope.
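
For readers who prefer C to AT&T syntax, the disassembly above can be transcribed roughly as follows. The offset 0xb0 and the two bit masks are read straight off the instructions; the enum labels and the function name `resolve_exp` are hypothetical, and what the bits mean inside glibc is not assumed here. The `assert` stands in for the check the real resolver never makes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset and masks taken from the disassembly of __exp_finite. */
#define FEATURE_WORD_OFFSET 0xb0
#define MASK_TEST_AH_1      0x100u  /* test $0x1,%ah  checks bit 8 of %eax */
#define MASK_TEST_AL_40     0x040u  /* test $0x40,%al checks bit 6 of %eax */

/* Hypothetical labels for the three addresses the resolver can return. */
enum exp_variant { EXP_AT_C6E0, EXP_AT_608F0, EXP_AT_4F8A0 };

/* rtld_global_ro is the pointer loaded from the GOT slot at 0x302fc0. */
enum exp_variant resolve_exp(const unsigned char *rtld_global_ro)
{
    uint32_t features;

    /* The real code has no such check; under dryad the pointer is NULL
       and the load below is the instruction that faults. */
    assert(rtld_global_ro != NULL);

    /* mov 0xb0(%rax),%eax */
    memcpy(&features, rtld_global_ro + FEATURE_WORD_OFFSET, sizeof features);

    if (features & MASK_TEST_AH_1)      /* jne 0xccc0 */
        return EXP_AT_4F8A0;
    if (features & MASK_TEST_AL_40)     /* jne 0xccb0 */
        return EXP_AT_608F0;
    return EXP_AT_C6E0;                 /* fall through */
}
```

The point of the transcription is that the choice of implementation depends entirely on one 32-bit word inside a struct that only the glibc dynamic linker knows how to populate.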

Here is an example assembly GNU ifunc resolver for, I presume, the exp function vectorized over 4 doubles (the vector-ABI name _ZGVdN4v_exp encodes AVX2, unmasked, 4 lanes). It loads the global dynamic linker struct, which is only present if the exact GNU libc dynamic linker is present and has initialized it (it hasn't, because dryad isn't the glibc dynamic linker):

	.text
ENTRY (_ZGVdN4v_exp)
        .type   _ZGVdN4v_exp, @gnu_indirect_function
	LOAD_RTLD_GLOBAL_RO_RDX
        leaq    _ZGVdN4v_exp_avx2(%rip), %rax
	HAS_ARCH_FEATURE (AVX2_Usable)
        jz      2f
        ret
2:      leaq    _ZGVdN4v_exp_sse_wrapper(%rip), %rax
        ret
END (_ZGVdN4v_exp)

Do you see that really awesome assembly macro loading rtld_global_ro?

So it loads some struct. Maybe I can emulate it by shimming my implementation whenever it's required by a binary?

Here's the definition of the rtld_global_ro struct, which I would have to duplicate and populate in order to call a resolver function that inexplicably now requires state from their dynamic linker to select an implementation at load time:

struct rtld_global_ro
{
#endif

  /* If nonzero the appropriate debug information is printed.  */
  EXTERN int _dl_debug_mask;
#define DL_DEBUG_LIBS	    (1 << 0)
#define DL_DEBUG_IMPCALLS   (1 << 1)
#define DL_DEBUG_BINDINGS   (1 << 2)
#define DL_DEBUG_SYMBOLS    (1 << 3)
#define DL_DEBUG_VERSIONS   (1 << 4)
#define DL_DEBUG_RELOC      (1 << 5)
#define DL_DEBUG_FILES      (1 << 6)
#define DL_DEBUG_STATISTICS (1 << 7)
#define DL_DEBUG_UNUSED	    (1 << 8)
#define DL_DEBUG_SCOPES	    (1 << 9)
/* These two are used only internally.  */
#define DL_DEBUG_HELP       (1 << 10)
#define DL_DEBUG_PRELINK    (1 << 11)

  /* OS version.  */
  EXTERN unsigned int _dl_osversion;
  /* Platform name.  */
  EXTERN const char *_dl_platform;
  EXTERN size_t _dl_platformlen;

  /* Cached value of `getpagesize ()'.  */
  EXTERN size_t _dl_pagesize;

  /* Do we read from ld.so.cache?  */
  EXTERN int _dl_inhibit_cache;

  /* Copy of the content of `_dl_main_searchlist' at startup time.  */
  EXTERN struct r_scope_elem _dl_initial_searchlist;

  /* CLK_TCK as reported by the kernel.  */
  EXTERN int _dl_clktck;

  /* If nonzero print warnings messages.  */
  EXTERN int _dl_verbose;

  /* File descriptor to write debug messages to.  */
  EXTERN int _dl_debug_fd;

  /* Do we do lazy relocations?  */
  EXTERN int _dl_lazy;

  /* Nonzero if runtime lookups should not update the .got/.plt.  */
  EXTERN int _dl_bind_not;

  /* Nonzero if references should be treated as weak during runtime
     linking.  */
  EXTERN int _dl_dynamic_weak;

  /* Default floating-point control word.  */
  EXTERN fpu_control_t _dl_fpu_control;

  /* Expected cache ID.  */
  EXTERN int _dl_correct_cache_id;

  /* Mask for hardware capabilities that are available.  */
  EXTERN uint64_t _dl_hwcap;

  /* Mask for important hardware capabilities we honour. */
  EXTERN uint64_t _dl_hwcap_mask;

#ifdef HAVE_AUX_VECTOR
  /* Pointer to the auxv list supplied to the program at startup.  */
  EXTERN ElfW(auxv_t) *_dl_auxv;
#endif

  /* Get architecture specific definitions.  */
#define PROCINFO_DECL
#ifndef PROCINFO_CLASS
# define PROCINFO_CLASS EXTERN
#endif
#include <dl-procinfo.c>

  /* Names of shared object for which the RPATH should be ignored.  */
  EXTERN const char *_dl_inhibit_rpath;

  /* Location of the binary.  */
  EXTERN const char *_dl_origin_path;

  /* -1 if the dynamic linker should honor library load bias,
     0 if not, -2 use the default (honor biases for normal
     binaries, don't honor for PIEs).  */
  EXTERN ElfW(Addr) _dl_use_load_bias;

  /* Name of the shared object to be profiled (if any).  */
  EXTERN const char *_dl_profile;
  /* Filename of the output file.  */
  EXTERN const char *_dl_profile_output;
  /* Name of the object we want to trace the prelinking.  */
  EXTERN const char *_dl_trace_prelink;
  /* Map of shared object to be prelink traced.  */
  EXTERN struct link_map *_dl_trace_prelink_map;

  /* All search directories defined at startup.  */
  EXTERN struct r_search_path_elem *_dl_init_all_dirs;

#ifdef NEED_DL_SYSINFO
  /* Syscall handling improvements.  This is very specific to x86.  */
  EXTERN uintptr_t _dl_sysinfo;
#endif

#ifdef NEED_DL_SYSINFO_DSO
  /* The vsyscall page is a virtual DSO pre-mapped by the kernel.
     This points to its ELF header.  */
  EXTERN const ElfW(Ehdr) *_dl_sysinfo_dso;

  /* At startup time we set up the normal DSO data structure for it,
     and this points to it.  */
  EXTERN struct link_map *_dl_sysinfo_map;
#endif

  /* Mask for more hardware capabilities that are available on some
     platforms.  */
  EXTERN uint64_t _dl_hwcap2;

#ifdef SHARED
  /* We add a function table to _rtld_global which is then used to
     call the function instead of going through the PLT.  The result
     is that we can avoid exporting the functions and we do not jump
     PLT relocations in libc.so.  */
  void (*_dl_debug_printf) (const char *, ...)
       __attribute__ ((__format__ (__printf__, 1, 2)));
  int (internal_function *_dl_catch_error) (const char **, const char **,
					    bool *, void (*) (void *), void *);
  void (internal_function *_dl_signal_error) (int, const char *, const char *,
					      const char *);
  void (*_dl_mcount) (ElfW(Addr) frompc, ElfW(Addr) selfpc);
  lookup_t (internal_function *_dl_lookup_symbol_x) (const char *,
						     struct link_map *,
						     const ElfW(Sym) **,
						     struct r_scope_elem *[],
						     const struct r_found_version *,
						     int, int,
						     struct link_map *);
  int (*_dl_check_caller) (const void *, enum allowmask);
  void *(*_dl_open) (const char *file, int mode, const void *caller_dlopen,
		     Lmid_t nsid, int argc, char *argv[], char *env[]);
  void (*_dl_close) (void *map);
  void *(*_dl_tls_get_addr_soft) (struct link_map *);
#ifdef HAVE_DL_DISCOVER_OSVERSION
  int (*_dl_discover_osversion) (void);
#endif

  /* List of auditing interfaces.  */
  struct audit_ifaces *_dl_audit;
  unsigned int _dl_naudit;
};

You're not hallucinating. That's real, professional C code.

  • Yes there are macros and #ifdefs inside of the struct def.
  • Yes it makes me want to die too.
  • Yes this is the state of modern C programming
  • Yes they clearly know what they're doing
  • Yes it's extremely readable, obviously maintainable, and won't have any security vulnerabilities
  • Yes it does have comments (they didn't fall asleep during that lecture)
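
To make the practical problem with the #ifdefs concrete, here is a small sketch with two hypothetical miniature structs (`ro_with_auxv` and `ro_without_auxv`, not glibc definitions). They model the same struct compiled with and without one optional member, and show that every field after the optional one moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build A: the optional member is compiled in. */
struct ro_with_auxv {
    int       debug_mask;
    uint64_t  hwcap;
    void     *auxv;      /* only present under an #ifdef in the original */
    uint64_t  hwcap2;
};

/* Hypothetical build B: same struct with the optional member left out. */
struct ro_without_auxv {
    int       debug_mask;
    uint64_t  hwcap;
    uint64_t  hwcap2;
};

/* Number of bytes by which the two builds disagree on where hwcap2 is. */
size_t hwcap2_offset_shift(void)
{
    size_t with    = offsetof(struct ro_with_auxv, hwcap2);
    size_t without = offsetof(struct ro_without_auxv, hwcap2);
    assert(with > without);   /* guards the unsigned subtraction */
    return with - without;
}
```

A shim that duplicates such a struct therefore has to match not just the glibc version but also the exact configuration it was built with, or code like the resolver above reads the wrong field.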

Hehe, anyway, I'm done being silly, don't take me too seriously, I'm just a curmudgeon 💃

@bjorn3

bjorn3 commented Nov 24, 2016

Is musl less bad?

@m4b
Owner

m4b commented Nov 27, 2016

That's a good question. Here's a long answer:

So first let me clarify: I don't think any of them are "bad" in any ultimate sense, not to mention the incredible number of human hours that went into making them robust to the point where they "work"; it's really quite impressive in that regard. Unfortunately, much of the code is constrained by legacy design decisions that were made without much thought for the future, and all subsequent work builds on them.

The second issue is the lack of any real specification (formal or informal) for how dynamic linkers should interact with a libc. In fact most implementations seem to conflate the two, i.e., they view them as interconnected. (musl even goes so far as to make the libc shared object the dynamic linker itself!)

A great example of a (good) informal specification is how the dynamic linker prepares the GOT (Global Offset Table), which allows the PLT (Procedure Linkage Table) to properly function for inter-library calls, without really specifying anything about which data structure to use, or how exactly runtime functions should be resolved (these are implementation details, as C programmers like to say).

The contract between the binary and the dynamic linker is just: I'll make some storage with two entries in my binary. I'll let you, the dynamic linker, know about it and set it up. You can then use it to let me lazily call functions after you figure out where they are. The only requirement is that the function for resolving symbols (the thing in the second storage location) takes a pointer to whatever it is you placed in the first location, and an integer (which is just the index, in the PLT relocation array, of the symbol that needs resolving).

It's really quite beautiful.

This is illustrated in dryad here: https://github.com/m4b/dryad/blob/master/src/linker.rs#L315-L337

In some sense I break this informal "spec", which says:

got[1] == "is the pointer to a data structure that the dynamic linker manages. This data structure is a linked list of nodes corresponding to the symbol tables for each shared library linked with the program. When a symbol is to be resolved by the linker, this list is traversed to find the appropriate symbol."

I break it because I don't use a linked list. I use a struct which has extra information for me, plus an array which I index into to quickly retrieve which shared object is requesting the symbol. (Fun fact: because I also prepare this struct with the debug boolean, it allows me to print debug information at runtime, all set via LD_DEBUG=all if I want, which is something libc or any of the other dynamic linkers don't do :)

I could really use whatever I want, but that's ok because:

  1. The got[1] only requires a pointer to some data structure.
  2. got[2] is the function pointer to the resolver stub that the binary calls, which passes the got[1] pointer (which is whatever you wanted it to be) and the index of the symbol to be resolved.
  3. This pointer is only used by the dynamic linker
  4. The dynamic linker sets that pointer

Hence, this is an implementation detail of the dynamic linker, and changing the PT_INTERP entry to a different interpreter will not (in principle) affect the binary's behavior.

Unfortunately, other areas have not been so lucky with respect to a better specification (which means PT_INTERP can't be changed without catastrophic consequences).
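
The contract can be simulated in plain C. This is only a toy model under stated assumptions: `toy_got`, `toy_linker_data`, `toy_resolve`, `linker_prepare_got`, and `call_through_plt` are hypothetical names, the "GOT" is an ordinary struct, and the step where a real lazy binder patches the function's own GOT slot after the first call is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*target_fn)(void);
/* Resolver signature from the contract: opaque linker data + PLT index. */
typedef target_fn (*resolver_fn)(void *linker_data, size_t reloc_index);

/* Hypothetical linker-private state. The binary never looks inside. */
struct toy_linker_data {
    int          debug;
    const char **reloc_names;   /* symbol name per PLT relocation index */
    size_t       reloc_count;
};

/* The three reserved slots, modelled as struct members. */
struct toy_got {
    void       *dynamic;        /* got[0]: unused in this sketch */
    void       *linker_data;    /* got[1]: set by the linker, opaque */
    resolver_fn resolve;        /* got[2]: set by the linker */
};

int answer(void) { return 42; }

/* Toy resolver: maps a relocation index to a function via private data. */
target_fn toy_resolve(void *linker_data, size_t reloc_index)
{
    struct toy_linker_data *ld = linker_data;
    assert(ld != NULL && reloc_index < ld->reloc_count);
    if (strcmp(ld->reloc_names[reloc_index], "answer") == 0)
        return answer;
    return NULL;
}

/* The linker's side of the contract: fill in got[1] and got[2]. */
void linker_prepare_got(struct toy_got *got, struct toy_linker_data *ld)
{
    got->linker_data = ld;
    got->resolve = toy_resolve;
}

/* The binary's side: hand got[1] and the index to got[2], then call. */
int call_through_plt(const struct toy_got *got, size_t reloc_index)
{
    target_fn f = got->resolve(got->linker_data, reloc_index);
    assert(f != NULL);
    return f();
}
```

Swapping `toy_linker_data` for a linked list, a hash map, or anything else changes nothing in `call_through_plt`, which is the property the informal spec buys.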

Because of this lack of specification, specifically I think in the area of threading (which is really just grafted onto libc in the first place), the developers of libc and the associated dynamic linker are really free to make any optimizations they see fit, for reasons of either ease or performance. What seems to end up occurring, probably because it's so easy, is that a global struct floats around which is guaranteed to exist (remember, because you're libc, you also control the dynamic linker, so you can just assume the struct you defined in the linker is there).

To make this more concrete, here is musl's global data struct, which packs in some threading information, as well as locale data:

https://github.com/m4b/dryad/blob/master/src/tls.rs#L33-L45

You'll notice one of the fields is a pointer to the auxv vector. You can once again blame glibc for this; it introduced a non-POSIX function, getauxval, which takes a type key and returns the matching auxiliary value that the kernel passed to the program at startup. Most implementations cache the location of the auxv array in the global struct so subsequent calls to getauxval are fast. (cause calling getauxval needs to be really fast ;))
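
A sketch of why that cached pointer matters, using hypothetical names throughout (`toy_auxv`, `toy_libc`, `toy_getauxval`); this is not musl's or glibc's actual code. The lookup walks an array of (type, value) pairs that ends with a type-0 entry, and it can only work if whoever loaded the program stored the array's address in the global first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of an auxv entry: a (type, value) pair. */
struct toy_auxv {
    unsigned long a_type;
    unsigned long a_val;
};

#define TOY_AT_NULL   0UL   /* terminator, same value as the real AT_NULL */
#define TOY_AT_PAGESZ 6UL   /* same value as the real AT_PAGESZ */

/* Hypothetical libc global that the dynamic linker is expected to fill. */
struct toy_libc_globals {
    const struct toy_auxv *auxv;
};
struct toy_libc_globals toy_libc = { NULL };

/* Lookup in the style of getauxval: returns the value for type, or 0
   if the vector has no such entry. */
unsigned long toy_getauxval(unsigned long type)
{
    const struct toy_auxv *p;

    /* With a foreign loader this pointer was never set. */
    assert(toy_libc.auxv != NULL);

    for (p = toy_libc.auxv; p->a_type != TOY_AT_NULL; p++)
        if (p->a_type == type)
            return p->a_val;
    return 0;
}
```

A loader that wants this libc to work has to know that `toy_libc.auxv` exists, where it lives, and that it must be written before any library code runs, none of which is covered by any specification.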

Anyway, I assume there's a similar such struct in bionic's linker, which has similar, but different global state. Hence while we decided that abstracting which libc you're linking against is good, the behavior of the dynamic linking portion (which is something userspace really never even knows about) seems to be completely implementation and dynamic linker dependent, which sucks IMHO.

So the short answer is:

  1. in my very opinionated opinion, the musl code is less bad (it's actually much more readable imho), but still suffers from similar problems in that
  2. it also tightly couples the dynamic linker implementation with the libc implementation

It is possible that 2. is unavoidable for any real libc implementation. I am dubious of this claim, if only because most things are solvable by:

  1. some extra indirection (GOT[1] and GOT[2] being a great example of this)
  2. a proper, well-thought-out generic specification/protocol for the desired interactive behavior (and not the implementation)

"And that's all I have to say about that" --- Forrest Gump

@bjorn3

bjorn3 commented Nov 27, 2016

Thanks for the great explanation!
