author Oliver Reiche <oliver.reiche@huawei.com> 2023-10-20 16:07:58 +0200
committer Oliver Reiche <oliver.reiche@huawei.com> 2023-10-20 16:07:58 +0200
commit 060a0cf338d6024eee37cc344c224fe3bcb78e81 (patch)
tree 3fea7c654b69ecf3490fe9c6cbc542aba0d5bd8f /doc
download bootstrappable-toolchain-060a0cf338d6024eee37cc344c224fe3bcb78e81.tar.gz
Initial commit
Diffstat (limited to 'doc')
-rw-r--r-- doc/BOOTSTRAP.md | 99
-rw-r--r-- doc/COMPILERS.md | 61
-rw-r--r-- doc/TOOLS.md | 45
3 files changed, 205 insertions, 0 deletions
diff --git a/doc/BOOTSTRAP.md b/doc/BOOTSTRAP.md
new file mode 100644
index 0000000..d84a9a2
--- /dev/null
+++ b/doc/BOOTSTRAP.md
@@ -0,0 +1,99 @@
+# Bootstrap Process
+
+The bootstrap process cannot rely on anything existing on the build systems,
+except:
+
+1. Coreutils
+2. POSIX-compliant shell located at `/bin/sh`
+3. C89 compiler (e.g., TinyCC, old GCC) with a working C library (*glibc* /
+ *musl libc*)
+
+Consequently, the process is designed to bootstrap a "minimal distribution" that
+contains everything required for building modern toolchains. The process is
+currently separated into two stages.
+
+## Stage 0
+
+### 1. Bootstrapped Busybox "essentials"
+
+Bootstrapping a minimal set of tools that are needed for this stage includes:
+
+- `sed`/`awk`/`diff` (for building `make`/`binutils`/`gcc-4.7.4`/`busybox`)
+- `patch` (for patching `gcc-4.7.4`)
+- `cmp`/`tar` (for running `gcc-4.7.4`'s install target)
+- `find`/`bzip2` (for building full `busybox`)
+
+All tools are bootstrapped via a custom shell script. Note that Busybox's `ar` is
+not included, because it lacks indexing support.
+
+### 2. Bootstrapped GNU Make
+
+Bootstrapping the `make` build system requires the Busybox "essentials" from the
+previous step. It is compiled via its bootstrap script `build.sh`. However,
+because `ar` is not yet available, final linking is done via custom compile commands.
+
+### 3. Bootstrapped Archiver (from Binutils)
+
+Bootstrapping the archiver `ar` requires the Busybox "essentials" and `make`
+from the previous steps. This archiver has proper indexing support and is
+compiled via its `Makefile`. However, because no `ar` exists yet at this point,
+final linking is done via a custom script.
+
+### 4. Binutils
+
+Building binutils requires the Busybox "essentials", `make`, and `ar` from the
+previous steps. This *full collection* of binutils includes `ar`, `as`, `ld`,
+`ranlib`, `strip`, and more.
+
+### 5. GCC 4.7.4
+
+Building GCC requires the Busybox "essentials", `make`, and binutils from the
+previous steps. GCC version 4.7.4 is [the last GCC version that can be built
+without a C++ compiler](https://lists.nongnu.org/archive/html/tinycc-devel/2017-05/msg00099.html).
+
+Patches needed for building on modern systems:
+
+- support [new type name `ucontext_t`](https://github.com/gcc-mirror/gcc/commit/883312dc79806f513275b72502231c751c14ff72)
+- support [building with newer C standard](https://gcc.gnu.org/legacy-ml/gcc-patches/2015-08/msg00375.html)
+
+Patches needed for building on systems with *musl libc*:
+[back-ports from GCC 4.8.0, 6.1.0, 9.1.0, and 11.1.0](../etc/patches/gcc-4.7.4/musl-support).
+
+To achieve reproducibility, we had to apply a few more custom patches that
+ensure [build directory independence](../etc/patches/gcc-4.7.4/reproducibility).
+
+Furthermore, to make this a *portable C/C++ toolchain* (which uses bundled
+binutils), we needed to create a shell launcher (e.g., `gcc` launching
+`gcc.real`) that sets `PATH` to bundled binutils, relative to its own location.
+
+### 6. Busybox
+
+Building Busybox requires the Busybox "essentials", `make`, and GCC 4.7.4 from
+the previous steps. This *full collection* of Busybox is guaranteed to be built
+with an optimizing compiler and contains many useful tools for the next stages.
+
+### 7. GNU Make
+
+Building `make` requires `make`, GCC 4.7.4, and Busybox from the previous steps.
+This version of `make` is guaranteed to be built with an optimizing compiler for
+use in the next stages.
+
+## Stage 1
+
+The result of the previous stage is a toolchain definition, containing Busybox,
+`make`, and GCC 4.7.4 bundled with binutils. Unfortunately, GCC 4.7.4 is not
+sufficient to build modern compilers, because most of them require full C++11
+support. Therefore, we introduced a second bootstrapping stage.
+
+### GCC 10.2.0
+
+GCC 10.2.0 is the first GCC version that fully supports the C++11 standard. GCC
+and binutils are built in separate actions, so we can make sure `ar` is
+configured with `--enable-deterministic-archives`. Both actions build for the
+host using the toolchain `stage-0/gcc`. To achieve reproducibility, we had to
+apply a few patches to GCC that ensure build directory independence.
+Furthermore, the use of `msgfmt` is disabled by setting `check_msgfmt=no`.
+Otherwise, the build process might call `msgfmt` with the `LD_LIBRARY_PATH` set
+to the current toolchain's lib dir, which might contain an insufficient
+`libstdc++` version.
+
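Note on step 5 of `doc/BOOTSTRAP.md`: the launcher idea (`gcc` launching `gcc.real` with `PATH` pointing at the bundled binutils) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script from the repository: it assumes the bundled binutils sit in the same `bin` directory as the launcher, and the stand-in `gcc.real` merely prints the first `PATH` entry so the effect is visible. All file and variable names are hypothetical.

```shell
#!/bin/sh
# Demo in a throwaway directory; every name here is illustrative.
root=$(mktemp -d)
mkdir -p "$root/bin"

# Stand-in for the real compiler driver: prints the first PATH entry it sees.
cat > "$root/bin/gcc.real" <<'EOF'
#!/bin/sh
echo "${PATH%%:*}"
EOF

# The launcher: resolves its own directory from $0, prepends it to PATH
# (assumed location of the bundled binutils), then hands over to gcc.real.
cat > "$root/bin/gcc" <<'EOF'
#!/bin/sh
self_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
PATH="${self_dir}:${PATH}"
export PATH
exec "${self_dir}/gcc.real" "$@"
EOF
chmod +x "$root/bin/gcc" "$root/bin/gcc.real"

# Running the launcher shows which directory is searched first for tools.
first_path_entry=$("$root/bin/gcc")
echo "first PATH entry: $first_path_entry"
```

Because the directory is derived from `$0` at run time rather than baked in at build time, the toolchain tree can be relocated without editing the launcher. Invocation through a symlink is not handled in this sketch.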
diff --git a/doc/COMPILERS.md b/doc/COMPILERS.md
new file mode 100644
index 0000000..d6d29b1
--- /dev/null
+++ b/doc/COMPILERS.md
@@ -0,0 +1,61 @@
+# Bootstrapped Compilers
+
+The initial compilers are built with the GCC resulting from the final bootstrap
+stage (`stage-1/gcc`). For more information on the bootstrap process, see
+[BOOTSTRAP.md](./BOOTSTRAP.md).
+
+## GCC Native
+
+GCC and binutils are built separately using the GCC toolchain from the final
+bootstrap stage (`stage-1/gcc`). While GCC generally supports reproducible
+builds, this is not necessarily the case if the build directory and toolchain
+root are located in variable paths. To achieve reproducibility, we had to apply
+a few patches that ensure build directory independence. Unfortunately, even
+though we install via the `install-strip` target, not all binaries will be
+stripped (e.g., `libgcc` ignores `install-strip`). Therefore, we need to
+manually strip all libraries and binaries after building.
+
+## GCC Musl
+
+GCC with Musl support is built using the GCC toolchain from the final bootstrap
+stage (`stage-1/gcc`). For building, we use the project
+[*musl-cross-make*](https://github.com/richfelker/musl-cross-make), which
+conveniently also supports building a cross-compiler. To avoid any fetches by
+*musl-cross-make*, we stage the unpacked file trees to their expected target
+destination (e.g., `gcc-13.orig`, `musl-latest.orig`). For some reason,
+*musl-cross-make* tries to modify files in `musl-latest.orig`, so we have to
+provide a writable copy. Finally, we applied a few patches to support newer
+binutils and GCC versions.
+
+Unfortunately, *musl-cross-make* does not call the `install-strip` target.
+Therefore, we apply manual stripping to achieve reproducibility.
+
+## GCC Musl Static
+
+Static building is achieved by using the GCC 13.2.0 Musl toolchain
+(`gcc-13.2.0-musl`) and a "fake" `cc`/`c++` executable that adds the flag
+`-static` to each compiler call.
+
+## Clang Native
+
+Building Clang requires an existing GCC installation with recent C++ standard
+library features (`gcc-13.2.0-native`) and build tools (`busybox`, `make`,
+`cmake`, `python`). GCC is used to build Clang in a first step, before this
+newly built Clang is used to build any of the remaining targets. To ensure that
+the Clang from the first step can be used during the entire build process, GCC's
+runtime libraries (`libgcc`, `libstdc++`) must be locatable by setting
+`LD_LIBRARY_PATH=${GCC_TOOLCHAIN}/lib{32,64}`. For building reproducibly, it is
+required to set `LIBCXXABI_ENABLE_ASSERTIONS` and `LIBUNWIND_ENABLE_ASSERTIONS`
+to `OFF`, as both are enabled by default and leak absolute paths to the build
+directory.
+
+Furthermore, this newly built Clang needs to link GCC's runtime objects
+(`crt*.o`) for compiling its runtime libraries (`libc++`, `libc++abi`, and
+`libunwind`). Therefore, we additionally need to set
+`LDFLAGS=-gcc-toolchain=${GCC_TOOLCHAIN}` *after* Clang was built (note that
+setting this option earlier will fail, because the GCC that is used to build
+Clang in the very first step does not recognize it).
+
+Finally, we also have to patch Clang's `libc++`, because it is using `strto*_l`
+functions that are [deliberately missing in musl
+libc](https://www.openwall.com/lists/musl/2020/10/01/3).
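Note on the "GCC Musl Static" section of `doc/COMPILERS.md`: the "fake" `cc` executable that adds `-static` to each compiler call can be sketched as a tiny wrapper script. The sketch below is an assumption about its shape, not the repository's implementation; `real-gcc` is a hypothetical stand-in that only prints the arguments it receives, so the forwarded command line can be inspected.

```shell
#!/bin/sh
# Demo in a throwaway directory; every name here is illustrative.
root=$(mktemp -d)

# Stand-in for the musl GCC driver: prints the arguments it was called with.
cat > "$root/real-gcc" <<'EOF'
#!/bin/sh
printf '%s\n' "$*"
EOF

# "Fake" cc: forwards all arguments unchanged and prepends -static.
cat > "$root/cc" <<'EOF'
#!/bin/sh
self_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
exec "${self_dir}/real-gcc" -static "$@"
EOF
chmod +x "$root/real-gcc" "$root/cc"

# The wrapped compiler sees -static in front of the caller's arguments.
seen_args=$("$root/cc" -O2 -c hello.c)
echo "real compiler received: $seen_args"
```

A `c++` wrapper would look the same with the C++ driver substituted. Putting the flag into the wrapper, rather than into `LDFLAGS`, means build systems that ignore `LDFLAGS` still link statically.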
diff --git a/doc/TOOLS.md b/doc/TOOLS.md
new file mode 100644
index 0000000..5f01e64
--- /dev/null
+++ b/doc/TOOLS.md
@@ -0,0 +1,45 @@
+# Bootstrapped Tools
+
+All tools are statically built with GCC 13.2.0 with musl support
+(`gcc-13.2.0-musl`). Note that dynamically linking with this compiler will not work on
+non-musl systems. Therefore, all *configure checks* and *build steps* must use
+static linking. For more information on the compilers, see
+[COMPILERS.md](./COMPILERS.md).
+
+## Busybox
+
+Busybox is compiled statically by setting `LDFLAGS='-static'`. It strictly
+requires GCC for building, which can be set via `HOSTCC` and `HOSTCXX`. It
+employs *internal checks* to verify that the compilers are working. However,
+those checks seem to ignore `LDFLAGS`, which causes them to fail on non-musl
+systems. Therefore, we had to forcefully set the compilers to `"${CC} -static"`
+and `"${CXX} -static"`.
+
+For reproducibility, we additionally had to set `SOURCE_DATE_EPOCH=0`.
+
+## Make
+
+Make is compiled statically by setting `LDFLAGS='-static'`. For some reason, the
+binary likes to record the absolute path to the C++ compiler on the build
+machine (despite not even using C++). To achieve reproducibility, we set
+`CXX='unused'`.
+
+## CMake
+
+CMake is compiled statically with bundled dependencies. The only dependency that
+is not bundled with CMake's sources is `libssl`. We use the C++ library
+`boringssl` to satisfy that dependency. However, CMake expects `libssl` to be a
+C library (not C++), which is solved via a small patch. For reproducibility, we
+had to redefine `__FILE__` to `__FILE_NAME__`, which is supported since GCC 12.
+
+## Python
+
+Python is compiled statically with default modules built in. See [Building
+Python Statically](https://wiki.python.org/moin/BuildStatically) for full
+details. Missing modules are:
+- `nis`: deprecated module that caused the static build to fail
+- `ssl`: unsupported, due to missing `libssl` C library
+
+Furthermore, the Python binary likes to record its build time and date, so we
+had to set `SOURCE_DATE_EPOCH=0` to achieve reproducibility.
+
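Note on the Busybox and Make sections of `doc/TOOLS.md`: the environment settings described there can be collected in small helper functions that run an arbitrary build command with the right variables. This is a hedged sketch; the function names are hypothetical, the default values of `CC` and `CXX` are placeholders, and it assumes the forced compiler commands are passed via `HOSTCC` and `HOSTCXX`, which the text names but does not spell out as the exact mechanism.

```shell
#!/bin/sh
# Placeholder compiler drivers; substitute the musl toolchain's binaries.
CC=${CC:-gcc}
CXX=${CXX:-g++}

# Busybox: its internal compiler checks ignore LDFLAGS, so -static is folded
# into the compiler commands themselves; SOURCE_DATE_EPOCH pins timestamps.
busybox_build_env() {
  env HOSTCC="${CC} -static" HOSTCXX="${CXX} -static" \
      LDFLAGS='-static' SOURCE_DATE_EPOCH=0 "$@"
}

# Make: CXX is set to a dummy value so that no absolute path to a C++
# compiler on the build machine ends up in the binary.
make_build_env() {
  env LDFLAGS='-static' CXX='unused' "$@"
}

# Inspect what a build command would see, without building anything.
busybox_build_env sh -c 'echo "HOSTCC=$HOSTCC"'
make_build_env sh -c 'echo "CXX=$CXX"'
```

In an actual build, the wrapped command would be the build invocation itself, for example `busybox_build_env make`.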