summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
Diffstat (limited to 'doc')
-rw-r--r--doc/future-designs/debug-fission.md194
1 files changed, 194 insertions, 0 deletions
diff --git a/doc/future-designs/debug-fission.md b/doc/future-designs/debug-fission.md
new file mode 100644
index 00000000..9f7791bf
--- /dev/null
+++ b/doc/future-designs/debug-fission.md
@@ -0,0 +1,194 @@
+Implementing debug fission in the C/C++ rules
+=============================================
+
+Motivation
+----------
+
+Building with debug symbols is needed for debugging purposes, however it
+results in artifacts that are many times larger than their release versions.
+In general, debug builds are also slower than release builds due to various
+reasons, the main ones being the disabling of certain code optimizations (in
+order to allow debuggers to properly work) and debug-only checks and
+diagnostics. Furthermore, sections of debug symbols from common dependencies can
+be replicated many times between different artifacts, but also inside single
+artifacts.
+
+In the cases where one does not produce separate release and debug versions,
+it is usual to just generate the debug artifacts, then strip the debug symbols
+from them to obtain a pseudo-release version. Even in this case, being able to
+reduce the time and space requirements for producing the debug artifacts in the
+first place is of value.
+
+Moreover, distributions usually provide debug information in different packages
+and for that purpose apply themselves debug fission or stripping techniques.
+Projects that provide separated debug information have thus an advantage in
+getting accepted by reducing the work involved in packaging them for distros.
+
+[Debug fission](https://www.tweag.io/blog/2023-11-23-debug-fission)
+-------------------------------------------------------------------
+
+This approach targets specifically Linux ELF files and `gdb(1)`. This is,
+however, more than enough to cover the most used UNIX-like platforms and most
+general purpose debugging tools, which rely on `gdb(1)`.
+
+### Concept
+
+Fission, aka splitting debug information into separate files, exists in modern
+tools for a [long time](https://gcc.gnu.org/wiki/DebugFission). This method
+can be applied when one already splits builds into their constituting compile
+and link steps, which is the case with most modern build tools, including
+*justbuild*. Debug fission proposes that the compilation step of a debug build
+produce, instead of one object file, two artifacts: a `.dwo` DWARF file
+containing all the debug symbols of the compilation unit, and the (now smaller)
+`.o` object file containing now only references to the debug symbols from the
+`.dwo` file. These object files can be linked as usual to produce the final
+build artifact, and the `.dwo` files can be either retained as-is, or packed
+into a corresponding `.dwp` file.
+
+### Benefits
+
+By splitting the debug symbols of each compilation unit into separate
+artifacts, these can be cached and reused as needed, removing any previous
+debug symbols duplication across build artifacts. This has also a beneficial
+impact on the build times. Moreover, the stripped artifacts are quasi-identical
+(e.g., executables differ only in their internal Build ID).
+
+Proposal
+--------
+
+Debug fission requires changes affecting the `OSS` and `rules-cc` rules, as well
+as the OSS toolchain configuration. In order to ensure that new and old
+toolchains and rules, respectively, are still able to work together, the new
+rules will only perform debug fission if the toolchain is configured to use this
+feature.
+
+The following sections describe the needed changes in detail.
+
+### Extend `["CC", "defaults"]` rule with new fields
+
+The `["CC", "defaults"]` rule should accept a new field `"DWP"` containing as a
+singleton list the path to the `dwp` tool to be used for packing DWARF files.
+This field should be handled the same as, e.g., the `"CC"` variable. The
+description of our toolchains should be extended with the `"DWP"` field,
+pointing to the location of the respective tool in the staged binaries folder.
+
+The rule should also accept a new field `"DEBUGFLAGS"` containing as a list
+compile flags to be used for debug builds. This field should be handled in a
+similar way to, e.g., the `"ARFLAGS"` variable, but be used only if debug mode
+is enabled. If missing, default to [`"-g"`]. In this way, most users need not
+set those flags manually anymore, while advanced users can still set their own
+debug-level flags, as needed.
+
+#### Change `"DEBUG"` configuration variable value type to map
+
+In order for the defaults to properly set the appropriate flags in debug mode,
+the configuration variable `"DEBUG"` is changed to be a mapping. As before, a
+`true` evaluation of its value (now, a not empty map) will signal debug mode.
+The following supported keys are proposed:
+
+ - a `"USE_DEBUG_FISSION"` flag, which, if evaluated to `true` enables debug
+ fission and otherwise signals regular debug mode, and
+
+ - a `"FISSION_CONFIG"` map, which can configure in more detail how debug
+ fission behaves. If missing, defaults to empty. If debug fission is not
+ enabled, this field is ignored.
+
+The `"FISSION_CONFIG"` map should accept the following keys:
+
+ - `"USE_DWARF_SPLIT"`: If evaluated to `true`, appends the `-gsplit-dwarf`
+ flag to the `"DEBUGFLAGS"`.
+
+ - `"DWARF_VERSION"`: Expects a number defining the DWARF format version. If
+ provided, appends the `-gdwarf-<version>` flag to the `"DEBUGFLAGS"`.
+
+ Each toolchain comes with a default in terms of which version of the DWARF
+ format is used. Basically all reasonably modern toolchains (GCC >=4.8.1,
+ Clang >=7.0.0 at least) and debugging tools (GDB >= 7.0) use DWARFv4 by
+ default, with the more recent versions having already switched to using the
+ newer, upward compatible [DWARFv5](https://dwarfstd.org/dwarf5std.html)
+ format. However, the degree of implementation and default support of the
+ various compilers and tools differs, so it is recommended to use version 4.
+
+ - `"USE_GDB_INDEX"`: If evaluated to `true`, adds the `-Wl,--gdb-index` link
+ flag. Defaults to `false`.
+
+ This option enables, in linkers that support it, an optimization which
+ bundles certain debug symbols sections together into a single `.gdb_index`
+ section, reducing the size of the final artifact (quite significantly for
+ large artifacts) and drastically improving the debugger start time, but at
+ the cost of a slower linking step.
+
+ - Known supported linkers: `lld` (LLVM >=7), `gold` (binutils >=2.24*), `mold` (>=2.3)
+
+ *`gold` linker additional info:
+ As per the [release notes](https://lwn.net/Articles/1007541/), `binutils`
+ 2.44 (2025-02-02) does **NOT** come with the `gold` linker anymore, as it
+ is considered deprecated and will be removed completely in the near future
+ unless new maintainers are found. Note also that Fedora, for concerns of
+ bit-rot, moved the `gold` linker from its `binutils` RPM to a separate
+ package already since version 31 (2019-10-29).
+
+ - Known unsupported linkers: `ld` (GNU)
+
+ - `"USE_DEBUG_TYPES_SECTION"`: If evaluated to `true`, appends the
+ `-fdebug-types-section` flag to the `"DEBUGFLAGS"`. Defaults to `false`.
+
+ This option enables, for toolchains supporting at least DWARFv4, an
+ optimization that produces separate debug symbols sections for certain large
+ data types, thus providing the linker the opportunity to deduplicate more
+ debug information, resulting in smaller artifacts.
+
+ More performant approaches to reduce the size of the debug information exist,
+ but are not as straight-forward to implement as enabling a flag. For example,
+ Fedora opted instead to use the `dwz` compression tool, another known
+ approach, and make it its default for handling debug RPMs already since
+ version 18 (2013-01-15).
+
+The `["CC", "defaults"]` rule interrogates these fields in order to set the
+appropriate debug flags to be provided to the library/binary rules. Note
+that it is the user's responsibility to configure the debug mode accordingly.
+It is always up to each toolchain how unsupported or unexpected combinations of
+flags are being handled.
+
+### Interface changes to the `"library"` and `"binary"` rules
+
+The `"USE_DEBUG_FISSION"` flag of `"DEBUG"` will inform these rules on whether
+the debug fission logic should be used or not. In this way, only the combination
+of an appropriate configuration and these updated rules will be able to perform
+debug fission, while all other combinations of toolchains and rules will perform
+as before.
+
+All consumers of the internal `"objects"` expression (i.e., static/dynamic
+libraries and binaries) should provide a new field `"debug-info"`, defaulting to
+the empty map. If debug fission is enabled, this field will contain the
+corresponding DWARF package file, constructed via a new expression
+`"dwarf artifact"`, which, based on given `"dwarf-objects"`, `"link-deps"` and a
+`stage`, uses the given DWARF objects, as well as the staged DWARF package files
+provided in the `debug-info` of the given link dependencies, as inputs to an
+`ACTION` generating a resulting DWARF package file, appropriately staged. Each
+such consumer will pass the appropriate inputs to the new `"dwarf artifact"`
+expression considering that `.dwo` files need to be gathered the same as `.o`
+files and `.dwp` files need to be gathered the same as libraries/binaries.
+
+The output of the compile `ACTION` in the `"objects"` expression should be
+extended, if debug fission is enabled, by an additional path corresponding to
+the expected `.dwo` DWARF file, staged next to the usual `.o` file. This path
+needs to be passed to any consumers of `"objects"`. For this purpose, the
+`"objects"` expression should be refactored to provide a map result instead of a
+single variable.
+
+### Extend the `"install-with-deps"` rule to stage DWARF files
+
+The `"install-with-deps"` rule should stage also any `"debug-info"` entries from
+providers when in debug mode, int he same locations as it does regular
+artifacts. As paths are handled by each library/binary accordingly, the DWARF
+package files should always end up next to their corresponding build artifact,
+i.e., where `gdb(1)` expects them.
+
+### Bootstrappable toolchain to expose the `dwp` binary
+
+The bootstrappable toolchain repository provides several toolchains built from
+source. In the case of the `gcc` and `clang` compilers, a `dwp` tool should be
+part of the produced staged binaries. This can be advertised to consumers of
+these toolchains (compilers, compilers+tools) in their `["CC", "defaults"]` via
+the newly introduced `"DWP"` field.