author | Klaus Aehlig <klaus.aehlig@huawei.com> | 2025-03-14 15:38:07 +0100 |
---|---|---|
committer | Klaus Aehlig <klaus.aehlig@huawei.com> | 2025-03-17 17:07:36 +0100 |
commit | f2813218ea4ae98b6391c8ea4ec40f80586ff31d (patch) | |
tree | 99321f8e7b425d219047ac7e7da32fa46e5c7dfb /doc/tutorial | |
parent | b102373fe59c39bdb4a07db1a597316151a5c419 (diff) | |
Extend tutorial to also include other uses of build delegation
Diffstat (limited to 'doc/tutorial')
-rw-r--r-- | doc/tutorial/build-delegation.md | 90 |
1 files changed, 90 insertions, 0 deletions
diff --git a/doc/tutorial/build-delegation.md b/doc/tutorial/build-delegation.md
new file mode 100644
index 00000000..5d597d9e
--- /dev/null
+++ b/doc/tutorial/build-delegation.md
@@ -0,0 +1,90 @@
+# More build delegation using a serve endpoint
+
+The original purpose of a `just serve` endpoint is to allow building
+against dependencies without having to download them. That is
+particularly important when [bootstrapping](https://bootstrappable.org/)
+the toolchain. However, the serve endpoint does not care what the
+target actually is. As long as it is a content-fixed `export` target,
+it has all the necessary roots. Therefore, it can also be used for
+other purposes.
+
+## Example: Analysing large data sets
+
+Besides sources of long bootstrap chains, all forms of measurement
+data are also files that one wants to avoid having to download,
+while still analysing them in various ways and by several persons.
+
+### Making data available to serve
+
+Depending on the nature of the data set to be analysed, several ways
+are appropriate to make it available to serve. Data for long-term
+archival, such as experimental measurements, can be committed to a
+repository and that repository added to the `"repositories"` field
+in the serve configuration as usual.
+
+There is, however, another possibility more suited for data
+to be rotated, like monitoring data or invocation-log data written
+by `just-mr`. Each entity generating such data (like a monitoring
+machine, a CI runner, etc.) uploads the data directory to the
+remote-execution endpoint, e.g., via `just-mr add-to-cas`, and only
+distributes the tree hash to the entities analysing the data.
+
+A user of the serve endpoint, just by knowing the tree hash,
+can construct an absent root from it.
+
+```
+{ "repository":
+  { "type": "git tree"
+  , "id": "..."
+  , "cmd": ["false", "Should be known to CAS"]
+  , "pragma": {"absent": true}
+  }
+}
+```
+
+Of course, the command `false` is not able to create the specified
+tree, but it should not be executed anyway, especially as we don't
+want to ever have that large tree locally. Building against this
+root still makes it available to serve without ever fetching it; the
+reason this works is that `just-mr` always prefers the network-wise
+closest path: if the root is not already known to the serve endpoint,
+but is known to the remote-execution CAS, it simply asks serve
+to fetch it from there. There is no need to get the root locally, as
+it is marked absent.
+
+Of course, the above root description is so systematic that we
+can easily generate it from the hash; this is useful if we have
+many data sets uploaded individually and hence need many of those
+repositories.
+
+### Analysing data via serve
+
+To analyse a data set, we need, besides the actual data, also a
+target description and, potentially, additional tools. Here we use
+that `just` allows separate layers for sources and targets. So we
+can add a separate repository with the targets file for analysing
+the data. As that one will typically be small, we can write it
+locally (allowing us to experiment with different kinds of statistics
+we might care about) and mark it as `"to_git"`. This not only
+makes it content-fixed, but also ensures that it will be uploaded
+to the serve endpoint. For computations delegated to serve, we can
+only access export targets; but while measurement data might have
+some random component, analysing that data typically is a pure
+function. So a simple target file could look as follows.
+
+```
+{ "": {"type": "export", "target": "stats"}
+, "stats":
+  { "type": "generic"
+  , "outs": ["stats.json"]
+  , "cmds": ["./statistics-tool"]
+  , "deps": ["data", ["@", "tools", "", "statistics-tool"]]
+  }
+, "data": {"type": "install", "dirs": [[["TREE", null, "."], "data"]]}
+}
+```
+
+If the data tree contains several data sets that can be analysed independently,
+instead of using a big action, several tasks can be defined using computed
+roots. If many different data trees are uploaded, an overall accumulation of
+the data of the individual repositories can be carried out.
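
Reviewer note, not part of the patch: the tutorial remarks that the absent-root description is systematic enough to be generated from the tree hash. A minimal sketch of such a generator follows, assuming Python is available on the analysing side. The helper names `absent_tree_repo` and `repos_for`, the repository names, and the placeholder hashes are all hypothetical; collecting the entries under a top-level `"repositories"` key is likewise an assumption about how the result would be merged into a `just-mr` configuration.

```python
import json


def absent_tree_repo(tree_id):
    """Return the absent-root description from the tutorial for one tree hash."""
    return {
        "repository": {
            "type": "git tree",
            "id": tree_id,
            # Never meant to run: the tree is expected to be in the CAS already.
            "cmd": ["false", "Should be known to CAS"],
            "pragma": {"absent": True},
        }
    }


def repos_for(trees):
    """Map (hypothetical) repository names to absent-root descriptions."""
    return {name: absent_tree_repo(tree_id) for name, tree_id in trees.items()}


if __name__ == "__main__":
    # Placeholder hashes for illustration only, not real tree identifiers.
    example = {"data-0": "0" * 40, "data-1": "1" * 40}
    # Assumption: entries are collected under a top-level "repositories" key.
    print(json.dumps({"repositories": repos_for(example)}, indent=2))
```

Each uploaded data directory then only needs its tree hash to be passed to `repos_for`; the generated entries differ in nothing but the `"id"` field.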