Rebase pyelftools and reapply S2E-specific commits as needed #3
base: master
* relocation: handle ARM binaries * relocation: handle R_ARM_ABS32 for ARM machines * testfiles: add reloc_arm_gcc.o.elf Generated on Ubuntu 14.04 using: arm-linux-gnueabi-gcc-4.7 -c -g -o reloc_armhf_gcc.o.elf hello.c * testfiles: add reloc_armhf_gcc.o.elf Generated on Ubuntu 14.04 using: arm-linux-gnueabihf-gcc-4.7 -c -g -o reloc_armhf_gcc.o.elf hello.c * readelf: print soft-float abi for ARM if EF_ARM_ABI_FLOAT_SOFT in flags * readelf: print hard-float abi for ARM if EF_ARM_ABI_FLOAT_HARD in flags * readelf: print BE8 info for armeb binaries * testfiles: add simple_armhf_gcc.o.elf Generated on Ubuntu 14.04 using: arm-linux-gnueabihf-gcc-4.7 -g -o simple_armhf_gcc.o.elf hello.c * elf: remove unwind from dicts and set ARM_EXIDX description * testfiles: add reloc_armsf_gcc.o.elf as soft float testcase taken from binutils 2.30 * testfiles: add reloc_armeb_gcc.o.elf as arm big endian testcase taken from binutils 2.30 testcase arm-be8 * readelf: print endian info LE8 if flag was set in header flags
* Add support for 'R_ARM_CALL' relocation type * Add test script and test files to verify support for 'R_ARM_CALL' Signed-off-by: Koltunov Dmitry <[email protected]>
* Provide enums for DT_FLAGS and DT_FLAGS_1

This change adds two enums with the name to value mappings for the two flags fields in the dynamic section. The values and corresponding names are taken from the elf/elf.h file in the most recent glibc version. The enums are also used to print the names instead of the raw hex values for DT_FLAGS and DT_FLAGS_1 in scripts/readelf.py. Fixes: eliben#189

* Add test file for DT_FLAGS/DT_FLAGS_1 parsing

The test file has the DF_BIND_NOW and DF_ORIGIN flags set in DT_FLAGS as well as the DF_1_NOW, DF_1_GLOBAL, DF_1_NOOPEN and DF_1_ORIGIN flags in DT_FLAGS_1. This is the source code for the dt_flags.elf file:

    #include <stdio.h>

    int function(const char *arg){
        printf("Hello, %s!", arg);
        return 0;
    }

and was compiled using the following command line:

    $ gcc -shared -m32 \
        -Wl,-rpath,'$ORIGIN/lib',-z,global,-z,origin,-z,nodlopen,-z,now \
        -o testfiles_for_readelf/dt_flags.elf dt_flags.c
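For readers unfamiliar with these fields, here is a minimal sketch of how a name-to-value mapping can turn a raw DT_FLAGS value into flag names. The bit values are the ones in glibc's elf/elf.h; `DT_FLAGS_BITS` and `decode_flags` are hypothetical names used for illustration only and are not the identifiers added by this commit.

```python
# DT_FLAGS bit values as defined in glibc's elf/elf.h.
# The dict name and the helper below are illustrative assumptions,
# not the enums that pyelftools actually defines.
DT_FLAGS_BITS = {
    'DF_ORIGIN': 0x1,
    'DF_SYMBOLIC': 0x2,
    'DF_TEXTREL': 0x4,
    'DF_BIND_NOW': 0x8,
    'DF_STATIC_TLS': 0x10,
}


def decode_flags(value, table):
    """Return the names of all bits from `table` set in `value`, lowest bit first."""
    by_bit = sorted(table.items(), key=lambda item: item[1])
    return [name for name, bit in by_bit if value & bit]


# The test file sets DF_ORIGIN and DF_BIND_NOW, i.e. 0x1 | 0x8.
print(decode_flags(0x9, DT_FLAGS_BITS))
```

The same helper would work for DT_FLAGS_1 given a second table of DF_1_* values.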
The __init__ function of ARMAttribute has two parameters structs and stream through which the caller can pass in the relevant objects (ARMAttributesSubsubsection does that after seeking to the right position in stream). The accesses for TAG_SECTION and TAG_SYMBOL, however, were referring to non-existing members instead of the parameters. Additionally, one assertion tries to access an undefined 'null_byte' variable which should be 'nul' instead.
The stream position in the .debug_info stream can't change when reading from the .debug_abbrev stream.
…eliben#206) * Implemented ELFFile.get_machine_arch for the remaining architectures. Added all architectures according to the ENUM_E_MACHINE. * Refactored if statement into dict.get.
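The "if statement into dict.get" refactoring mentioned above can be sketched roughly as follows. The table is a small, hypothetical subset and the returned strings are assumptions; the real method covers every entry of ENUM_E_MACHINE.

```python
# Hypothetical subset of an e_machine -> architecture-name table.
# The names on the right are illustrative, not necessarily the strings
# that ELFFile.get_machine_arch returns.
_ARCH_NAMES = {
    'EM_386': 'x86',
    'EM_X86_64': 'x64',
    'EM_ARM': 'ARM',
    'EM_AARCH64': 'AArch64',
    'EM_MIPS': 'MIPS',
}


def machine_arch(e_machine):
    """Look the machine up in one table instead of walking an if/elif chain."""
    return _ARCH_NAMES.get(e_machine, '<unknown>')


print(machine_arch('EM_ARM'))
print(machine_arch('EM_SOMETHING_ELSE'))
```

A single lookup table keeps the fallback value in one place and makes adding an architecture a one-line change.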
The code that is intended to coalesce null DIEs into the DIE that precedes them does not do that and is actually not needed as the 'unflattening' procedure takes care of any unexpected null DIEs. Also added a unit test for verifying the DIE size calculation.
…ns (eliben#208) * Added support for decoding .debug_pubtypes and .debug_pubnames sections * Added reference output to dwarf_pubnames_types.py example. * Added readelf support, fixed review comments and documentation updates * Avoid printing the entire die in pubnames example to work around Python 2 vs 3 incompatibilities
Create all the AbbrevDecl objects during parsing and later return references to them - this gives a small performance gain.
…#214) In DWARFv4 the location lists are referenced with the 'sec_offset' attribute form instead of 'data4' or 'data8'.
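As a rough illustration of the rule described in this commit, the check below decides whether an attribute form refers to a location list depending on the DWARF version. `is_loclist_form` is a hypothetical helper written for this note; it is not the function pyelftools uses.

```python
def is_loclist_form(form, dwarf_version):
    """Sketch: does this attribute form point into the location-list section?

    Assumption: before DWARFv4 a location-list offset is encoded as
    DW_FORM_data4 or DW_FORM_data8; from DWARFv4 on it is DW_FORM_sec_offset.
    """
    if dwarf_version >= 4:
        return form == 'DW_FORM_sec_offset'
    return form in ('DW_FORM_data4', 'DW_FORM_data8')


print(is_loclist_form('DW_FORM_sec_offset', 4))
print(is_loclist_form('DW_FORM_data4', 3))
```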
* tox: explicitly set locale Locale affects GNU binutils output translation which cause run_readelf_tests.py to fail if system language is not English. Signed-off-by: Efimov Vasily <[email protected]> * test: unittest reproducing error with empty ".debug_pubtypes" section Signed-off-by: Efimov Vasily <[email protected]> * NameLUT: use `construct.If` to declare "name" field This patch also fixes problem with empty first entry. Signed-off-by: Efimov Vasily <[email protected]> * NameLUT._get_entries: remove unused `bytes_read` Signed-off-by: Efimov Vasily <[email protected]>
StringTableSection.get_string() has returned a UTF-8 decoded string (or '' if fetching the string failed) since eliben#182, but the code in _DynamicStringTable was never updated to decode anything at all, so it just returns a bytes sequence in Python 3. Let's convert the string there as well to be able to use both string tables the same way without having to worry about decoding. Adapt the test cases accordingly.
On macOS I'm getting the following error when testing with tox on py27:

```
ERROR: invocation failed (exit code 1), logfile: /devel/pyelftools/.tox/py27/log/py27-33.log
ERROR: actionid: py27
msg: installpkg
cmdargs: ['/devel/pyelftools/.tox/py27/bin/pip', 'install', '-U', '--no-deps', '/devel/pyelftools/.tox/dist/pyelftools-0.25.zip']
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Processing ./.tox/dist/pyelftools-0.25.zip
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/var/folders/qz/XXX/T/pip-req-build-890d2p/setup.py", line 47, in <module>
    scripts=['scripts/readelf.py']
  File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 144, in setup
    _install_setup_requires(attrs)
  File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in _install_setup_requires
    dist.parse_config_files(ignore_option_errors=True)
  File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 704, in parse_config_files
    self._parse_config_files(filenames=filenames)
  File "/devel/pyelftools/.tox/py27/lib/python2.7/site-packages/setuptools/dist.py", line 600, in _parse_config_files
    reader = io.TextIOWrapper(fp, encoding=encoding)
LookupError: unknown encoding:
```

This is due to the specification of LC_ALL as simply `en_US` without an encoding. Python 3.x seems to be fine with this, but Python 2.7 barfs. As a fix, setting `LC_ALL` to `en_US.utf-8` (including an explicit encoding spec) works.
dynamic: parse DT_{GNU_}HASH for number of symbols In ultra-stripped binaries we can find the symbol table by parsing the dynamic segment and using the pointer in the DT_SYMTAB tag as the base address. However, we don't know anything about the number of symbols in the symbol table. Earlier, this code relied on finding the closest pointer value bigger than the base address of the symbol table. In PIE executables and shared libraries however this method could break as the pointer value for DT_SYMTAB is in the same range as things like DT_RELASZ or DT_STRSZ, leading to a too small number of symbols returned by iter_symbols(). The crashpad project has implemented a different strategy to find the number of symbols: parsing the symbol lookup hash tables (see [0]) as every symbol must have a corresponding entry in the hash table. This commit implements this behaviour for DynamicSegment, leaving the old code as a backup if neither DT_HASH or DT_GNU_HASH tags have been found. For DT_HASH type tables, it is quite easy as the header already contains the number of entries. For DT_GNU_HASH things are a bit more complicated as we need to work forward from the highest symbol referenced in the header (a good explanation of the format can be found at [1]). [0]: chromium/crashpad@1f1657d [1]: https://flapenguin.me/2017/05/10/elf-lookup-dt-gnu-hash/ * dynamic: provide more functions for symbol access So far, the DynamicSegment only provided a method to iterate over all symbols but for some use cases it might be useful to use the recovered symbol table more like a normal SymbolTableSection. To this end, provide get_symbol(index) to fetch a symbol by its index, num_symbols() to get the total number of symbols and get_symbol_by_name(name) to look for a list of symbols with a given name.
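The counting strategy described above can be sketched on raw table bytes as follows. This is a minimal illustration under stated assumptions, not the code added to DynamicSegment: the function names are hypothetical, DT_HASH entries are assumed to be 32-bit words, and the table bytes are assumed to have been read already.

```python
import struct


def count_symbols_sysv_hash(table, little_endian=True):
    """DT_HASH: the header is (nbucket, nchain); nchain equals the symbol count."""
    endian = '<' if little_endian else '>'
    _nbucket, nchain = struct.unpack_from(endian + '2I', table, 0)
    return nchain


def count_symbols_gnu_hash(table, elfclass=64, little_endian=True):
    """DT_GNU_HASH: find the highest symbol index referenced by a bucket,
    then walk its chain until the end-of-chain bit (bit 0) is set."""
    endian = '<' if little_endian else '>'
    nbuckets, symoffset, bloom_size, _bloom_shift = struct.unpack_from(
        endian + '4I', table, 0)
    bloom_word = 8 if elfclass == 64 else 4
    buckets_off = 16 + bloom_size * bloom_word
    buckets = struct.unpack_from(endian + '%dI' % nbuckets, table, buckets_off)
    max_idx = max(buckets) if buckets else 0
    if max_idx < symoffset:
        # No hashed symbols at all: only the unhashed prefix exists.
        return symoffset
    chains_off = buckets_off + 4 * nbuckets
    while True:
        (hash_word,) = struct.unpack_from(
            endian + 'I', table, chains_off + 4 * (max_idx - symoffset))
        if hash_word & 1:
            return max_idx + 1
        max_idx += 1


# Synthetic table: 2 buckets, symoffset 1, one 64-bit bloom word,
# chain entries for symbols 1..4 (odd value marks the end of a chain).
demo = (struct.pack('<4I', 2, 1, 1, 6) + struct.pack('<Q', 0) +
        struct.pack('<2I', 1, 3) + struct.pack('<4I', 0x10, 0x11, 0x20, 0x21))
print(count_symbols_gnu_hash(demo))
```

In the synthetic table the highest bucket points at symbol 3, whose chain ends at symbol 4, so the table describes five symbols (indices 0 to 4).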
$SITE_PYTHON/lib/python3.7/site-packages/elftools/construct/lib/container.py:5 Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working This change is compatible with Python 3.3 and up, when the ABCs were moved to collections.abc. Backward compatibility is retained through the try/except block.
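The try/except pattern referred to above looks roughly like this. `AttrStore` is a hypothetical class used only to show the import in action; it is not the Container class from elftools/construct.

```python
try:
    # Python 3.3 and later: the ABCs live in collections.abc.
    from collections.abc import MutableMapping
except ImportError:
    # Older interpreters (Python 2): fall back to the old location.
    from collections import MutableMapping


class AttrStore(MutableMapping):
    """Tiny dict-backed mapping, only here to exercise the imported ABC."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


store = AttrStore()
store['e_machine'] = 'EM_ARM'
print(dict(store))
```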
* dwarf: initial DWARFv5 support * dwarf/structs: use Embed to select header layout * dwarf/structs: DW_FORM_strx family Not sure how best to handle 24-bit values yet. * dwarf/structs: use IfThenElse `If` alone wraps the else in a `Value`. * dwarf/structs: DW_FORM_addrx family handling * dwarf_expr: support DW_OP_addrx Not complete, but gets readelf.py to the end of a single binary. * dwarf/constants: DW_UT_* constants * dwarf/structs: fix some DW_FORMs * elftools, test: plumbing for DWARFv5 sections * dwarf/constants: fix typo * dwarf/structs: re-add a comment that got squashed * dwarf/structs: DWARFv5 table header scaffolding * dwarf/constants: typo * test: add a basic DWARFv5 test
…r most architectures (eliben#354) * fixed parsing for structures containing uids or gids in core dumps for most architectures * added testcase for mips corefile uid/gid parsing * better description * better email
* [example] Handle lpe with end_sequence correctly * [example] exclude highpc in address comparison in decode_funcname Co-authored-by: Jangseop Shin <[email protected]>
* ELF notes: keep raw note descriptors as bytes * py3compat: add bytes2hex function * elf/descriptions: use bytes2hex where needed * ELF notes: convert to string only for known types
This is very similar to the filtering implemented for sections in commit d71faeb.
* DWARF 5 tags and attributes * DW_AT_virtual Co-authored-by: Seva Alekseyev <[email protected]>
* DWARF 5 tags and attributes * DW_AT_private Co-authored-by: Seva Alekseyev <[email protected]>
* Add support for .note.gnu.properties notes section References: - Doc: https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf - Linux: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=00e19ceec80b03a43f626f891fcc53e57919f1b3 - Glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/dl-prop.h;h=385548fad3e4ad71dbdcdbfada58585c2f24ea5e;hb=HEAD - Binutils: https://sourceware.org/git/?p=binutils-gdb.git&a=search&h=HEAD&st=commit&s=NT_GNU_PROPERTY_TYPE_0 * Add descriptions for .note.gnu.properties notes * descriptions: add missing PT_GNU_PROPERTY description * py3compat: add optional separator for bytes2hex * readelf: align notes column headers * elf/descriptions: conform to real readelf's output format * test: special case some known readelf output quirks * test: add test ELFs for .note.gnu.property notes
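A helper like bytes2hex with an optional separator could look roughly like the sketch below. This is an assumption about its behaviour based on the commit titles, not the actual py3compat implementation.

```python
from binascii import hexlify


def bytes2hex(data, sep=''):
    """Sketch: render a bytes object as lowercase hex, optionally separated.

    Assumed behaviour only; the real helper lives in elftools' py3compat.
    """
    if not sep:
        return hexlify(data).decode('latin-1')
    # bytearray() yields integers on both Python 2 and Python 3.
    return sep.join('%02x' % byte for byte in bytearray(data))


print(bytes2hex(b'\x01\xab'))
print(bytes2hex(b'\x01\xab', sep=' '))
```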
Changes to conform the output of readelf.py to binutils readelf v2.37:
- Use singular "entry" when needed instead of "entries".
- Output the last entry for the .debug_line output table when DW_LNE_end_sequence is encountered, as the DWARF standard dictates. It looks like this was a readelf bug which was fixed in commit ba8826a82a29a19b78c18ce4f44fe313de279af7 of the GNU binutils-gdb repo.
- Add additional "Stmt" field in the .debug_line output table, and ignore the new "View" field. The "Stmt" field has been implemented in readelf.py. The "View" field is not something that the DWARF standard defines; it's an internal register added to the line number information state machine by binutils to perform assembler checks (see commit ba8826a82a29a19b78c18ce4f44fe313de279af7 of GNU binutils-gdb repo for more info, in particular gas/doc/as.texinfo). "View" is unimplemented in pyelftools for now and a special case has been added in the readelf test suite to ignore it.
- Add support for printing section names when dumping .symtab entries of st_type STT_SECTION as readelf v2.37 does (see commit 23356397449a8aa65afead0a895a20be53b3c6b0 of GNU binutils-gdb repo).
- Add support for recognizing SOs specifically tagged as PIE (DT_FLAGS_1 dynamic tag with DF_1_PIE set). In such a case, describe the file as "Position-Independent Executable file" instead of "Shared object file", as readelf v2.37 does.
- Add leading "0x" for version section addresses when dumping version information (-V) as readelf does.
- Ignore "D (mbind)" in section headers flags legend (pyelftools does not output this flag).

Special cases ADDED for run_readelf_tests.py:
- Ignore "View" column for --debug-dump=decodedline in readelf's output.
- Ignore ellipsis ("[...]") for long names/symbols/paths in readelf's output.

Special cases REMOVED for run_readelf_tests.py:
- Detection of additional '@' after symbol names (flag_after_symtable) seems to no longer be needed as all tests pass without this exception.
- Special case for DW_AT_apple_xxx seems to no longer be needed, readelf now recognizes those.
- Special case for PT_GNU_PROPERTY no longer needed, readelf now recognizes it.

Other changes:
- Add missing import in elftools/dwarf/lineprogram.py.

References:
- GNU binutils-gdb repo: https://sourceware.org/git/?p=binutils-gdb.git
- Implement support for GNU property note type GNU_PROPERTY_X86_FEATURE_1_AND (which is a feature bitmask) and its relative flags. - Fix off-by-one in "Data size" column alignment for readelf.py note sections dump. References: - https://gitlab.com/x86-psABIs/x86-64-ABI
* Add PS3/CellOS OSABI identifier. * Remove "OS" from CELL OS ABI * Remove "OS" from CELL OS ABI * Add Missing comma for ELFOSABI_CELL_LV2.
Remove unused imports
…en#395) As more and more tools now support DT_RELR compressed relocations (most notably, the just released GNU binutils 2.38 [0]), let's add support for reading these relocations as well.

The original discussion about the advantages of packed RELATIVE relocations can be found at [1]. In a nutshell, the format exploits the fact that RELATIVE relocations are often placed next to each other and (for x86_64) stores up to 64 relocations in two 8-byte words. In a regular .rela.dyn table, these would take up 24 * 64 = 1536 bytes.

The compressed relocations work as follows: The first word in the section describes a base address and contains an offset for a relocation. This offset must always lie at an even address. This entry can be followed by one or more bitmaps, which have their least significant bit set to 1. All other bits describe (in increasing order of significance) whether the following contiguous offsets also contain a relocation. The addends for existing relocations are stored at the corresponding offsets in the file (that is, they work like REL relocations).

A good description of the history of this feature and its current adoption is the following blog post [2].

[0]: https://lists.gnu.org/archive/html/info-gnu/2022-02/msg00009.html
[1]: https://groups.google.com/g/generic-abi/c/bX460iggiKg?pli=1
[2]: https://maskray.me/blog/2021-10-31-relative-relocations-and-relr
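The decoding scheme described in this commit message can be sketched as follows. This is a minimal illustration on already-parsed integer words; `decode_relr` is a hypothetical name and the code is not the implementation added to pyelftools.

```python
def decode_relr(entries, word_size=8):
    """Expand DT_RELR words into the list of offsets that carry a relocation.

    An even word is an address: it is itself relocated and sets the base
    for the bitmaps that follow. An odd word is a bitmap: after dropping
    bit 0, each set bit marks one relocated word, starting at the base.
    """
    offsets = []
    base = 0
    bits_per_bitmap = 8 * word_size - 1   # 63 usable bits for 8-byte words
    for entry in entries:
        if entry & 1 == 0:
            offsets.append(entry)
            base = entry + word_size
        else:
            offset = base
            bits = entry >> 1
            while bits:
                if bits & 1:
                    offsets.append(offset)
                bits >>= 1
                offset += word_size
            base += bits_per_bitmap * word_size
    return offsets


# One address word followed by a bitmap marking the 1st and 3rd
# words after it.
print([hex(o) for o in decode_relr([0x1000, 0b1011])])
```

With 8-byte words, one address word plus one bitmap covers up to 1 + 63 = 64 relocations, which matches the figure quoted in the commit message.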
* Add support DW_FORM_implicit_const * Add support for DW_FORM_line_strp * Add new tests for DW_FORM_implicit_const and DW_FORM_linestrp.
elftools/* Reapply S2E-specific commits.
Still need to test before marking ready for review.
I propose we do the following:
That would help keep the history linear, avoid messy merge commits inside the PR, and make it clear what the S2E changes are.
This is a complementary PR to S2E/pyelftools#3
I agree that this commit history is a bit messy. I went down this path as it was most conducive to testing and understanding the S2E commits in my fork. I'm fine with not merging this; I mostly created this draft PR so you can see what I'm testing. If testing goes well, the most straightforward approach IMO would be to just fetch upstream and resolve the conflicts as I have done on this PR. I think it might be confusing for newcomers to see that S2E-env relies on a branch of a fork of pyelftools.
Circling back to this - having resolved the testing issue related to the ubuntu image, I was able to test tracing functions and did not see any issues related to pyelftools.
No description provided.