From 77190941b1b4dee61cbb65ead44df71f3f6c06dc Mon Sep 17 00:00:00 2001
From: Klaus Aehlig
Date: Fri, 1 Apr 2022 16:23:31 +0200
Subject: Add basic documentation on build rules

---
 doc/concepts/rules.org | 468 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 468 insertions(+)
 create mode 100644 doc/concepts/rules.org

(limited to 'doc/concepts')

diff --git a/doc/concepts/rules.org b/doc/concepts/rules.org
new file mode 100644
index 00000000..8c84cb2a
--- /dev/null
+++ b/doc/concepts/rules.org
@@ -0,0 +1,468 @@
+* Rules
+
+Targets are defined in terms of high-level concepts like "libraries",
+"binaries", etc. In order to translate these high-level definitions
+into actionable tasks, the user defines rules, explaining at a
+single point how all targets of a given type are built.
+
+** Rules files
+
+Rules are defined in rules files (by default named ~RULES~). Those
+contain a JSON object mapping rule names to their rule definition.
+For rules, the same naming scheme as for targets applies.
+
+** Basic components of a rule
+
+A rule is defined through a JSON object with various keys. The only
+mandatory key is ~"expression"~ containing the defining expression
+of the rule.
+
+*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~
+
+These keys specify the fields that a target defined by that rule can
+have. In particular, those have to be disjoint lists of strings.
+
+For ~"config_fields"~ and ~"string_fields"~ the respective field
+has to evaluate to a list of strings, whereas ~"target_fields"~
+have to evaluate to a list of target references. Those references
+are evaluated immediately, and in the name context of the target
+they occur in.
+
+The difference between ~"config_fields"~ and ~"string_fields"~ is
+that ~"config_fields"~ are evaluated before the target fields and
+hence can be used by the rule to specify config transitions for the
+target fields.
+~"string_fields"~ on the other hand are evaluated
+_after_ the target fields; hence the rule cannot use them to
+specify a configuration transition. However, the target definition
+in those fields may use the ~"outs"~ and ~"runfiles"~ functions to
+have access to the names of the artifacts or runfiles of a target
+specified in one of the target fields.
+
+*** ~"implicit"~
+
+This key specifies a map of implicit dependencies. The keys of the
+map are additional target fields, the values are the fixed list
+of targets for those fields. If a short-form name of a target is
+used (e.g., only a string instead of a module-target pair), it is
+interpreted relative to the repository and module the rule is defined
+in, not the one the rule is used in. Other than this, those fields
+are evaluated the same way as target fields settable on invocation
+of the rule.
+
+*** ~"config_vars"~
+
+This is a list of strings specifying which parts of the configuration
+the rule uses. The defining expression of the rule is evaluated in an
+environment that is the configuration restricted to those variables;
+if one of those variables is not specified in the configuration
+the value in the restriction is ~null~.
+
+*** ~"config_transitions"~
+
+This key specifies a map of (some of) the target fields (whether
+declared as ~"target_fields"~ or as ~"implicit"~) to a configuration
+expression. Here, a configuration expression is any expression
+in our language. It has access to the ~"config_vars"~ and the
+~"config_fields"~ and has to evaluate to a list of maps. Each map
+specifies a transition to the current configuration by amending
+it on the domain of that map to the given value.
+
+*** ~"imports"~
+
+This specifies a map of expressions that can later be used by
+~CALL_EXPRESSION~. In this way, duplication of (rule) code can be
+avoided. For each key, we have to have a name of an expression;
+expressions are named following the same naming scheme as targets
+and rules.
+The names are resolved in the context of the rule.
+Expressions themselves are defined in expression files, the default
+name being ~EXPRESSIONS~.
+
+Each expression is a JSON object. The only mandatory key is
+~"expression"~ which has to be an expression in our language. It
+optionally can have a key ~"vars"~ where the value has to be a list
+of strings (and the default is the empty list). Additionally, it
+can have another optional key ~"imports"~ following the same scheme
+as the ~"imports"~ key of a rule; in the ~"imports"~ key of an
+expression, names are resolved in the context of that expression.
+It is a requirement that the ~"imports"~ graph be cycle free.
+
+*** ~"expression"~
+
+This specifies the defining expression of the rule. The value has to
+be an expression of our expression language (basically, an abstract
+syntax tree serialized as JSON). It has access to the following
+extra functions and, when evaluated, has to return a result value.
+
+**** ~FIELD~
+
+The field function takes one argument, ~name~ which has to evaluate
+to the name of a field. For string fields, the given list of strings
+is returned; for target fields, the list of abstract names for the
+given target is returned. These abstract names are opaque within
+the rule language (but meaningful when reported in error messages)
+and should only be used to be passed on to other functions that
+expect names as inputs.
+
+**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~
+
+These functions give access to the artifacts, or runfiles, respectively,
+of one of the targets depended upon. Each takes two (evaluated)
+arguments, the mandatory ~"dep"~ and the optional ~"transition"~.
+
+The argument ~"dep"~ has to evaluate to an abstract name (as can be
+obtained from the ~FIELD~ function) of some target specified in one
+of the target fields. The ~"transition"~ argument has to evaluate
+to a configuration transition (i.e., a map) and the empty transition
+is taken as default.
+It is an error to request a target-transition
+pair for a target that was not requested in the given transition
+through one of the target fields.
+
+**** ~DEP_PROVIDES~
+
+This function gives access to a particular entry of the provides
+map of one of the targets depended upon. The arguments ~"dep"~
+and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there
+is the mandatory argument ~"provider"~ which has to evaluate to a
+string. The function returns the value of the provides map of the
+target at the given provider. If the key is not in the provides
+map (or the value at that key is ~null~), the optional argument
+~"default"~ is evaluated and returned. The default for ~"default"~
+is the empty list.
+
+**** ~BLOB~
+
+The ~BLOB~ function takes a single (evaluated) argument ~data~
+which is optional and defaults to the empty string. This argument
+has to evaluate to a string. The function returns an artifact that
+is a non-executable file with the given string as content.
+
+**** ~ACTION~
+
+Actions are a way to define new artifacts from (zero or more) already
+defined artifacts by running a command, typically a compiler, linker,
+archiver, etc. The action function takes the following arguments.
+- ~"inputs"~ A map of artifacts. These artifacts are present when
+  the command is executed; the keys of the map are the relative paths
+  from the working directory of the command. The command must not
+  make any assumption about the location of the working directory
+  in the file system (and instead should refer to files by path
+  relative to the working directory). Moreover, the command must
+  not modify the input files in any way. (In-place operations can
+  be simulated by staging, as is shown in the example later in
+  this document.)
+- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e.,
+  a non-empty list of strings. The 0'th element of that list will
+  also be the program to be executed.
+- ~"env"~ The environment in which the command should be executed,
+  given as a map of strings to strings.
+- ~"outs"~ and ~"out_dirs"~ Two lists of strings naming the files
+  and directories, respectively, the command is expected to create.
+  It is an error if the command fails to create the promised output
+  files. These two lists have to be disjoint, but an entry of
+  ~"outs"~ may well name a location inside one of the ~"out_dirs"~.
+
+This function returns a map with keys the strings mentioned in
+~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined
+to be the ones created by running the given command (in the given
+environment with the given inputs).
+
+**** ~RESULT~
+
+The ~RESULT~ function is the only way to obtain a result value.
+It takes three (evaluated) arguments, ~artifacts~, ~runfiles~, and
+~provides~, all of which are optional and default to the empty map.
+It defines the result of a target that has the given artifacts,
+runfiles, and provided data, respectively. In particular, ~artifacts~
+and ~runfiles~ have to be maps to artifacts, and ~provides~ has
+to be a map.
+
+Result values themselves are opaque in our expression language
+and cannot be deconstructed in any way. Their only purpose is to
+be the result of the evaluation of the defining expression of a target.
+
+**** ~CALL_EXPRESSION~
+
+This function takes one mandatory argument ~"name"~ which is
+unevaluated; it has to be a string literal. The expression imported
+by that name through the imports field is evaluated in the current
+environment restricted to the variables of that expression. The result
+of that evaluation is the result of the ~CALL_EXPRESSION~ statement.
+
+During the evaluation of an expression, rule fields can still be
+accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In
+particular, even an expression with no variables (that, hence, is
+always evaluated in the empty environment) can carry out non-trivial
+computations and be non-constant.
+The special functions ~BLOB~,
+~ACTION~, and ~RESULT~ are also available. If inside the evaluation
+of an expression the function ~CALL_EXPRESSION~ is used, the name
+argument refers to the ~"imports"~ map of that expression. So the
+call graph is deliberately recursion free.
+
+** Evaluation of a target
+
+A target defined by a user-defined rule is evaluated in the
+following way.
+
+- First, the config fields are evaluated.
+
+- Then, the target-fields are evaluated. This happens for each
+  field as follows.
+  - The configuration transition for this field is evaluated and
+    the transitioned configurations determined.
+  - The argument expression for this field is evaluated. The result
+    is interpreted as a list of target names. Each of those targets
+    is analyzed in all the specified configurations.
+
+- The string fields are evaluated. If the expression for a string
+  field queries a target (via ~outs~ or ~runfiles~), the value for
+  that target is returned in the first configuration. The rationale
+  here is that such generator expressions are intended to refer to
+  the corresponding target in its "main" configuration; they are
+  hardly used anyway for fields branching their targets over many
+  configurations.
+
+- The effective configuration for the target is determined. The parts
+  of the configuration the target has effectively used are the
+  variables used by the ~arguments_config~ in the rule invocation,
+  the ~config_vars~ the rule specified, and the parts of the
+  configuration used by a target depended upon. For a target depended
+  upon, all parts it used of its configuration are relevant except
+  for those fixed by the configuration transition.
+
+- The rule expression is evaluated and the result of that evaluation
+  is the result of the rule.
+
+** Example of developing a rule
+
+Let's consider step by step an example of writing a rule. Say we want
+to write a rule that programmatically patches some files.
+
+*** Framework: The minimal rule
+
+Every rule has to have a defining expression evaluating
+to a ~RESULT~. So the minimally correct rule is the ~"null"~
+rule in the following example rule file.
+
+#+BEGIN_SRC
+{ "null": {"expression": {"type": "RESULT"}}}
+#+END_SRC
+
+This rule accepts no parameters, and has the empty map as artifacts,
+runfiles, and provided data. So it is not very useful.
+
+*** String inputs
+
+Let's allow the target definition to have some fields. The simplest
+fields are ~string_fields~; they are given by a list of
+strings. In the defining expression we can access them directly via
+the ~FIELD~ function. Strings can be used when defining maps, but
+we can also create artifacts from them, using the ~BLOB~ function.
+To create a map, we can use the ~singleton_map~ function. We define
+values step by step, using the ~let*~ construct.
+
+#+BEGIN_SRC
+{ "script only":
+  { "string_fields": ["script"]
+  , "expression":
+    { "type": "let*"
+    , "bindings":
+      [ [ "script content"
+        , { "type": "join"
+          , "separator": "\n"
+          , "$1":
+            { "type": "++"
+            , "$1":
+              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+            }
+          }
+        ]
+      , [ "script"
+        , { "type": "singleton_map"
+          , "key": "script.ed"
+          , "value":
+            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+          }
+        ]
+      ]
+    , "body":
+      {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
+    }
+  }
+}
+#+END_SRC
+
+*** Target inputs and derived artifacts
+
+Now it is time to add the input files. Source files are targets like
+any other target (and happen to contain precisely one artifact). So
+we add a target field ~"srcs"~ for the file to be patched. Here we
+have to keep in mind that, on the one hand, target fields accept a
+list of targets and, on the other hand, the artifacts of a target
+are a whole map. We chose to patch all the artifacts of all given
+~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps
+with ~foreach_map~.
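+
+As a first sketch of that iteration (not yet patching anything),
+the following expression fragment would simply walk over all
+artifacts of all ~"srcs"~ targets and re-export each one under its
+original name. The variable names ~src~, ~file_name~, and ~file~
+are arbitrary choices made for this illustration. Assuming that
+~foreach~ and ~foreach_map~ both return the list of the values of
+their bodies, the result is a list (one entry per target) of lists
+of singleton maps.
+
+#+BEGIN_SRC
+{ "type": "foreach"
+, "var": "src"
+, "range": {"type": "FIELD", "name": "srcs"}
+, "body":
+  { "type": "foreach_map"
+  , "var_key": "file_name"
+  , "var_val": "file"
+  , "range": {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
+  , "body":
+    { "type": "singleton_map"
+    , "key": {"type": "var", "name": "file_name"}
+    , "value": {"type": "var", "name": "file"}
+    }
+  }
+}
+#+END_SRC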
+
+Next, we have to keep in mind that targets may place their artifacts
+at arbitrary logical locations. For us that means that first
+we have to make a decision at which logical locations we want
+to place the output artifacts. As one thinks of patching as an
+in-place operation, we chose to logically place the outputs where
+the inputs have been. Of course, we do not modify the input files
+in any way; after all, we have to define a mathematical function
+computing the output artifacts, not a collection of side effects.
+With that choice of logical artifact placement, we have to decide
+what to do if two (or more) input targets place their artifacts at
+logically the same location. We could simply take a "latest wins"
+semantics (keep in mind that target fields give a list of targets,
+not a set) as provided by the ~map_union~ function. We chose to
+consider it a user error if targets with conflicting artifacts are
+specified. This is provided by the ~disjoint_map_union~ that also
+allows specifying an error message to be provided to the user. Here,
+conflict means that values for the same map position are defined
+in a different way.
+
+The actual patching is done by an ~ACTION~. We have the script
+already; to make things easy, we stage the input to a fixed place
+and also expect a fixed output location. Then the actual command
+is a simple shell script. The only thing we have to keep in mind
+is that we want useful output precisely if the action fails. Also
+note that, while we define our actions sequentially, they can
+be executed in parallel, as none of them depends on the output of
+another one of them.
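+
+To make the difference between the two union functions concrete,
+here is a small sketch with hypothetical literal values (plain
+strings stand in for artifacts, purely for illustration). Following
+the "latest wins" semantics described above, the first expression
+would evaluate to a map sending ~"a"~ to ~"second"~.
+
+#+BEGIN_SRC
+{ "type": "map_union"
+, "$1":
+  [ {"type": "singleton_map", "key": "a", "value": "first"}
+  , {"type": "singleton_map", "key": "a", "value": "second"}
+  ]
+}
+#+END_SRC
+
+The same two maps passed to ~disjoint_map_union~ would instead be
+reported as an error carrying the given (here made-up) message,
+since the values at position ~"a"~ differ.
+
+#+BEGIN_SRC
+{ "type": "disjoint_map_union"
+, "msg": "entries must not conflict"
+, "$1":
+  [ {"type": "singleton_map", "key": "a", "value": "first"}
+  , {"type": "singleton_map", "key": "a", "value": "second"}
+  ]
+}
+#+END_SRC
+
+The complete rule, combining the iteration, the staging, and the
+disjoint union, reads as follows.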
+
+#+BEGIN_SRC
+{ "ed patch":
+  { "string_fields": ["script"]
+  , "target_fields": ["srcs"]
+  , "expression":
+    { "type": "let*"
+    , "bindings":
+      [ [ "script content"
+        , { "type": "join"
+          , "separator": "\n"
+          , "$1":
+            { "type": "++"
+            , "$1":
+              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+            }
+          }
+        ]
+      , [ "script"
+        , { "type": "singleton_map"
+          , "key": "script.ed"
+          , "value":
+            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+          }
+        ]
+      , [ "patched files per target"
+        , { "type": "foreach"
+          , "var": "src"
+          , "range": {"type": "FIELD", "name": "srcs"}
+          , "body":
+            { "type": "foreach_map"
+            , "var_key": "file_name"
+            , "var_val": "file"
+            , "range":
+              {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
+            , "body":
+              { "type": "let*"
+              , "bindings":
+                [ [ "action output"
+                  , { "type": "ACTION"
+                    , "inputs":
+                      { "type": "map_union"
+                      , "$1":
+                        [ {"type": "var", "name": "script"}
+                        , { "type": "singleton_map"
+                          , "key": "in"
+                          , "value": {"type": "var", "name": "file"}
+                          }
+                        ]
+                      }
+                    , "cmd":
+                      [ "/bin/sh"
+                      , "-c"
+                      , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
+                      ]
+                    , "outs": ["out"]
+                    }
+                  ]
+                ]
+              , "body":
+                { "type": "singleton_map"
+                , "key": {"type": "var", "name": "file_name"}
+                , "value":
+                  { "type": "lookup"
+                  , "map": {"type": "var", "name": "action output"}
+                  , "key": "out"
+                  }
+                }
+              }
+            }
+          }
+        ]
+      , [ "artifacts"
+        , { "type": "disjoint_map_union"
+          , "msg": "srcs artifacts must not overlap"
+          , "$1":
+            { "type": "++"
+            , "$1": {"type": "var", "name": "patched files per target"}
+            }
+          }
+        ]
+      ]
+    , "body":
+      {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
+    }
+  }
+}
+#+END_SRC
+
+A typical invocation of that rule would be a target file like the following.
+#+BEGIN_SRC
+{ "input.txt":
+  { "type": "ed patch"
+  , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
+  , "srcs": [["FILE", null, "input.txt"]]
+  }
+}
+#+END_SRC
+
+*** Implicit dependencies and config transitions
+
+Say, instead of patching a file, we want to generate source files
+from some high-level description using our actively developed code
+generator. Then we have to take some additional considerations
+into account.
+- First of all, every target defined by this rule not only depends
+  on the targets the user specifies. Additionally, our code
+  generator is also an implicit dependency. And as it is under
+  active development, we certainly do not want it to be taken from
+  the ambient build environment (as we did in the previous example
+  with ~ed~ which, however, is a pretty stable tool). So we use an
+  ~implicit~ target for this.
+- Next, we notice that our code generator is used during the
+  build. In particular, we want that tool (written in some compiled
+  language) to be built for the platform we run our actions on, not
+  the target platform we build our final binaries for. Therefore,
+  we have to use a configuration transition.
+- As our defining expression also needs the configuration transition
+  to access the artifacts of that implicit target, we better define
+  it as a reusable expression. Other rules in our rule collection
+  might also have the same task; so ~["transitions", "for host"]~
+  might be a good place to define it. In fact, it can look like
+  the expression with that name in our own code base.
+
+So, the overall organisation of our rule might be as follows.
+
+#+BEGIN_SRC
+{ "generated code":
+  { "target_fields": ["srcs"]
+  , "implicit": {"generator": [["generators", "foogen"]]}
+  , "config_vars": ["HOST_ARCH"]
+  , "imports": {"for host": ["transitions", "for host"]}
+  , "config_transitions":
+    {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
+  , "expression": ...
+  }
+}
+#+END_SRC
--
cgit v1.2.3