From 77190941b1b4dee61cbb65ead44df71f3f6c06dc Mon Sep 17 00:00:00 2001
From: Klaus Aehlig <klaus.aehlig@huawei.com>
Date: Fri, 1 Apr 2022 16:23:31 +0200
Subject: Add basic documentation on build rules

---
 doc/concepts/rules.org | 468 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 468 insertions(+)
 create mode 100644 doc/concepts/rules.org

(limited to 'doc/concepts')

diff --git a/doc/concepts/rules.org b/doc/concepts/rules.org
new file mode 100644
index 00000000..8c84cb2a
--- /dev/null
+++ b/doc/concepts/rules.org
@@ -0,0 +1,468 @@
+* Rules
+
+Targets are defined in terms of high-level concepts like "libraries",
+"binaries", etc. In order to translate these high-level definitions
+into actionable tasks, the user defines rules, explaining at a
+single point how all targets of a given type are built.
+
+** Rules files
+
+Rules are defined in rules files (by default named ~RULES~). Those
+contain a JSON object mapping rule names to their rule definition.
+For rules, the same naming scheme as for targets applies.
+
+** Basic components of a rule
+
+A rule is defined through a JSON object with various keys. The only
+mandatory key is ~"expression"~ containing the defining expression
+of the rule.
+
+*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~
+
+These keys specify the fields that a target defined by that rule can
+have. In particular, those have to be disjoint lists of strings.
+
+For ~"config_fields"~ and ~"string_fields"~ the respective field
+has to evaluate to a list of strings, whereas ~"target_fields"~
+have to evaluate to a list of target references. Those references
+are evaluated immediately, and in the name context of the target
+they occur in.
+
+The difference between ~"config_fields"~ and ~"string_fields"~ is
+that ~"config_fields"~ are evaluated before the target fields and
+hence can be used by the rule to specify config transitions for the
+target fields. ~"string_fields"~ on the other hand are evaluated
+_after_ the target fields; hence the rule cannot use them to
+specify a configuration transition, however the target definition
+in those fields may use the ~"outs"~ and ~"runfiles"~ functions to
+have access to the names of the artifacts or runfiles of a target
+specified in one of the target fields.
+
+*** ~"implicit"~
+
+This key specifies a map of implicit dependencies. The keys of the
+map are additional target fields, the values are the fixed list
+of targets for those fields. If a short-form name of a target is
+used (e.g., only a string instead of a module-target pair), it is
+interpreted relative to the repository and module the rule is defined
+in, not the one the rule is used in. Other than this, those fields
+are evaluated the same way as target fields settable on invocation
+of the rule.
+
+*** ~"config_vars"~
+
+This is a list of strings specifying which parts of the configuration
+the rule uses. The defining expression of the rule is evaluated in an
+environment that is the configuration restricted to those variables;
+if one of those variables is not specified in the configuration
+the value in the restriction is ~null~.
+
+*** ~"config_transitions"~
+
+This key specifies a map of (some of) the target fields (whether
+declared as ~"target_fields"~ or as ~"implicit"~) to a configuration
+expression. Here, a configuration expression is any expression
+in our language. It has access to the ~"config_vars"~ and the
+~"config_fields"~ and has to evaluate to a list of maps. Each map
+specifies a transition to the current configuration by amending
+it on the domain of that map to the given value.
+
+*** ~"imports"~
+
+This specifies a map of expressions that can later be used by
+~CALL_EXPRESSION~. In this way, duplication of (rule) code can be
+avoided. The value for each key has to be the name of an expression;
+expressions are named following the same naming scheme as targets
+and rules. The names are resolved in the context of the rule.
+Expressions themselves are defined in expression files, the default
+name being ~EXPRESSIONS~.
+
+Each expression is a JSON object. The only mandatory key is
+~"expression"~ which has to be an expression in our language. It
+optionally can have a key ~"vars"~ where the value has to be a list
+of strings (and the default is the empty list). Additionally, it
+can have another optional key ~"imports"~ following the same scheme
+as the ~"imports"~ key of a rule; in the ~"imports"~ key of an
+expression, names are resolved in the context of that expression.
+It is a requirement that the ~"imports"~ graph be cycle free.
+
+*** ~"expression"~
+
+This specifies the defining expression of the rule. The value has to
+be an expression of our expression language (basically, an abstract
+syntax tree serialized as JSON). It has access to the following
+extra functions and, when evaluated, has to return a result value.
+
+**** ~FIELD~
+
+The field function takes one argument, ~name~ which has to evaluate
+to the name of a field. For string fields, the given list of strings
+is returned; for target fields, the list of abstract names for the
+given target is returned. These abstract names are opaque within
+the rule language (but meaningful when reported in error messages)
+and should only be used to be passed on to other functions that
+expect names as inputs.
+
+**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~
+
+These functions give access to the artifacts, or runfiles, respectively,
+of one of the targets depended upon. Each takes two (evaluated)
+arguments, the mandatory ~"dep"~ and the optional ~"transition"~.
+
+The argument ~"dep"~ has to evaluate to an abstract name (as can be
+obtained from the ~FIELD~ function) of some target specified in one
+of the target fields. The ~"transition"~ argument has to evaluate
+to a configuration transition (i.e., a map) and the empty transition
+is taken as default. It is an error to request a target-transition
+pair for a target that was not requested in the given transition
+through one of the target fields.
+
+**** ~DEP_PROVIDES~
+
+This function gives access to a particular entry of the provides
+map of one of the targets depended upon. The arguments ~"dep"~
+and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there
+is the mandatory argument ~"provider"~ which has to evaluate to a
+string. The function returns the value of the provides map of the
+target at the given provider. If the key is not in the provides
+map (or the value at that key is ~null~), the optional argument
+~"default"~ is evaluated and returned. The default for ~"default"~
+is the empty list.
+
+**** ~BLOB~
+
+The ~BLOB~ function takes a single (evaluated) argument ~data~
+which is optional and defaults to the empty string. This argument
+has to evaluate to a string. The function returns an artifact that
+is a non-executable file with the given string as content.
+
+**** ~ACTION~
+
+Actions are a way to define new artifacts from (zero or more) already
+defined artifacts by running a command, typically a compiler, linker,
+archiver, etc. The action function takes the following arguments.
+- ~"inputs"~ A map of artifacts. These artifacts are present when
+  the command is executed; the keys of the map are the paths relative
+  to the working directory of the command. The command must not
+  make any assumption about the location of the working directory
+  in the file system (and instead should refer to files by path
+  relative to the working directory). Moreover, the command must
+  not modify the input files in any way. (In-place operations can
+  be simulated by staging, as is shown in the example later in
+  this document.)
+- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e.,
+  a non-empty list of strings. The 0'th element of that list will
+  also be the program to be executed.
+- ~"env"~ The environment in which the command should be executed,
+  given as a map of strings to strings.
+- ~"outs"~ and ~"out_dirs"~ Two lists of strings naming the files
+  and directories, respectively, the command is expected to create.
+  It is an error if the command fails to create the promised output
+  files. These two lists have to be disjoint, but an entry of
+  ~"outs"~ may well name a location inside one of the ~"out_dirs"~.
+
+This function returns a map with keys the strings mentioned in
+~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined
+to be the ones created by running the given command (in the given
+environment with the given inputs).
+
+**** ~RESULT~
+
+The ~RESULT~ function is the only way to obtain a result value.
+It takes three (evaluated) arguments, ~artifacts~, ~runfiles~, and
+~provides~, all of which are optional and default to the empty map.
+It defines the result of a target that has the given artifacts,
+runfiles, and provided data, respectively. In particular, ~artifacts~
+and ~runfiles~ have to be maps to artifacts, and ~provides~ has
+to be a map.
+
+Result values themselves are opaque in our expression language
+and cannot be deconstructed in any way. Their only purpose is to
+be the result of the evaluation of the defining expression of a target.
+
+**** ~CALL_EXPRESSION~
+
+This function takes one mandatory argument ~"name"~ which is
+unevaluated; it has to be a string literal. The expression imported
+by that name through the imports field is evaluated in the current
+environment restricted to the variables of that expression. The result
+of that evaluation is the result of the ~CALL_EXPRESSION~ statement.
+
+During the evaluation of an expression, rule fields can still be
+accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In
+particular, even an expression with no variables (that, hence, is
+always evaluated in the empty environment) can carry out non-trivial
+computations and be non-constant. The special functions ~BLOB~,
+~ACTION~, and ~RESULT~ are also available. If inside the evaluation
+of an expression the function ~CALL_EXPRESSION~ is used, the name
+argument refers to the ~"imports"~ map of that expression. So the
+call graph is deliberately recursion free.
+
+** Evaluation of a target
+
+A target defined by a user-defined rule is evaluated in the
+following way.
+
+- First, the config fields are evaluated.
+
+- Then, the target-fields are evaluated. This happens for each
+  field as follows.
+  - The configuration transition for this field is evaluated and
+    the transitioned configurations determined.
+  - The argument expression for this field is evaluated. The result
+    is interpreted as a list of target names. Each of those targets
+    is analyzed in all the specified configurations.
+
+- The string fields are evaluated. If the expression for a string
+  field queries a target (via ~outs~ or ~runfiles~), the value for
+  that target is returned in the first configuration. The rationale
+  here is that such generator expressions are intended to refer to
+  the corresponding target in its "main" configuration; they are
+  hardly used anyway for fields branching their targets over many
+  configurations.
+
+- The effective configuration for the target is determined. The parts
+  of the configuration the target effectively uses are the variables
+  used by the ~arguments_config~ in the rule invocation, the
+  ~config_vars~ the rule specified, and the parts of the configuration
+  used by a target depended upon. For a target depended upon, all parts
+  it used of its configuration are relevant except for those fixed
+  by the configuration transition.
+
+- The rule expression is evaluated and the result of that evaluation
+  is the result of the rule.
+
+** Example of developing a rule
+
+Let's consider step by step an example of writing a rule. Say we want
+to write a rule that programmatically patches some files.
+
+*** Framework: The minimal rule
+
+Every rule has to have a defining expression evaluating
+to a ~RESULT~. So the minimally correct rule is the ~"null"~
+rule in the following example rule file.
+
+#+BEGIN_SRC
+{ "null": {"expression": {"type": "RESULT"}}}
+#+END_SRC
+
+This rule accepts no parameters, and has the empty map as artifacts,
+runfiles, and provided data. So it is not very useful.
+
+*** String inputs
+
+Let's allow the target definition to have some fields. The simplest
+fields are ~string_fields~; they are given by a list of
+strings. In the defining expression we can access them directly via
+the ~FIELD~ function. Strings can be used when defining maps, but
+we can also create artifacts from them, using the ~BLOB~ function.
+To create a map, we can use the ~singleton_map~ function. We define
+values step by step, using the ~let*~ construct.
+
+#+BEGIN_SRC
+{ "script only":
+  { "string_fields": ["script"]
+  , "expression":
+    { "type": "let*"
+    , "bindings":
+      [ [ "script content"
+        , { "type": "join"
+          , "separator": "\n"
+          , "$1":
+            { "type": "++"
+            , "$1":
+              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+            }
+          }
+        ]
+      , [ "script"
+        , { "type": "singleton_map"
+          , "key": "script.ed"
+          , "value":
+            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+          }
+        ]
+      ]
+    , "body":
+      {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
+    }
+  }
+}
+#+END_SRC
+
+*** Target inputs and derived artifacts
+
+Now it is time to add the input files. Source files are targets like
+any other target (and happen to contain precisely one artifact). So
+we add a target field ~"srcs"~ for the file to be patched. Here we
+have to keep in mind that, on the one hand, target fields accept a
+list of targets and, on the other hand, the artifacts of a target
+are a whole map. We chose to patch all the artifacts of all given
+~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps
+with ~foreach_map~.
+
+Next, we have to keep in mind that targets may place their artifacts
+at arbitrary logical locations. For us that means that first
+we have to make a decision at which logical locations we want
+to place the output artifacts. As one thinks of patching as an
+in-place operation, we chose to logically place the outputs where
+the inputs have been. Of course, we do not modify the input files
+in any way; after all, we have to define a mathematical function
+computing the output artifacts, not a collection of side effects.
+With that choice of logical artifact placement, we have to decide
+what to do if two (or more) input targets place their artifacts at
+logically the same location. We could simply take a "latest wins"
+semantics (keep in mind that target fields give a list of targets,
+not a set) as provided by the ~map_union~ function. We chose to
+consider it a user error if targets with conflicting artifacts are
+specified. This is provided by ~disjoint_map_union~, which also
+allows specifying an error message to be shown to the user. Here,
+conflict means that values for the same map position are defined
+in a different way.
+
+The actual patching is done by an ~ACTION~. We have the script
+already; to make things easy, we stage the input to a fixed place
+and also expect a fixed output location. Then the actual command
+is a simple shell script. The only thing we have to keep in mind
+is that we want useful output precisely if the action fails. Also
+note that, while we define our actions sequentially, they will
+be executed in parallel, as none of them depends on the output of
+another one of them.
+
+#+BEGIN_SRC
+{ "ed patch":
+  { "string_fields": ["script"]
+  , "target_fields": ["srcs"]
+  , "expression":
+    { "type": "let*"
+    , "bindings":
+      [ [ "script content"
+        , { "type": "join"
+          , "separator": "\n"
+          , "$1":
+            { "type": "++"
+            , "$1":
+              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
+            }
+          }
+        ]
+      , [ "script"
+        , { "type": "singleton_map"
+          , "key": "script.ed"
+          , "value":
+            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
+          }
+        ]
+      , [ "patched files per target"
+        , { "type": "foreach"
+          , "var": "src"
+          , "range": {"type": "FIELD", "name": "srcs"}
+          , "body":
+            { "type": "foreach_map"
+            , "var_key": "file_name"
+            , "var_val": "file"
+            , "range":
+              {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
+            , "body":
+              { "type": "let*"
+              , "bindings":
+                [ [ "action output"
+                  , { "type": "ACTION"
+                    , "inputs":
+                      { "type": "map_union"
+                      , "$1":
+                        [ {"type": "var", "name": "script"}
+                        , { "type": "singleton_map"
+                          , "key": "in"
+                          , "value": {"type": "var", "name": "file"}
+                          }
+                        ]
+                      }
+                    , "cmd":
+                      [ "/bin/sh"
+                      , "-c"
+                      , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
+                      ]
+                    , "outs": ["out"]
+                    }
+                  ]
+                ]
+              , "body":
+                { "type": "singleton_map"
+                , "key": {"type": "var", "name": "file_name"}
+                , "value":
+                  { "type": "lookup"
+                  , "map": {"type": "var", "name": "action output"}
+                  , "key": "out"
+                  }
+                }
+              }
+            }
+          }
+        ]
+      , [ "artifacts"
+        , { "type": "disjoint_map_union"
+          , "msg": "srcs artifacts must not overlap"
+          , "$1":
+            { "type": "++"
+            , "$1": {"type": "var", "name": "patched files per target"}
+            }
+          }
+        ]
+      ]
+    , "body":
+      {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
+    }
+  }
+}
+#+END_SRC
+
+A typical invocation of that rule would be a target file like the following.
+#+BEGIN_SRC
+{ "input.txt":
+  { "type": "ed patch"
+  , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
+  , "srcs": [["FILE", null, "input.txt"]]
+  }
+}
+#+END_SRC
+
+*** Implicit dependencies and config transitions
+
+Say, instead of patching a file, we want to generate source files
+from some high-level description using our actively developed code
+generator. Then some additional points have to be considered.
+- First of all, every target defined by this rule not only depends
+  on the targets the user specifies. Additionally, our code
+  generator is also an implicit dependency. And as it is under
+  active development, we certainly do not want it to be taken from
+  the ambient build environment (as we did in the previous example
+  with ~ed~ which, however, is a pretty stable tool). So we use an
+  ~implicit~ target for this.
+- Next, we notice that our code generator is used during the
+  build. In particular, we want that tool (written in some compiled
+  language) to be built for the platform we run our actions on, not
+  the target platform we build our final binaries for. Therefore,
+  we have to use a configuration transition.
+- As our defining expression also needs the configuration transition
+  to access the artifacts of that implicit target, we better define
+  it as a reusable expression. Other rules in our rule collection
+  might also have the same task; so ~["transitions", "for host"]~
+  might be a good place to define it. In fact, it can look like
+  the expression with that name in our own code base.
+
+So, the overall organisation of our rule might be as follows.
+
+#+BEGIN_SRC
+{ "generated code":
+  { "target_fields": ["srcs"]
+  , "implicit": {"generator": [["generators", "foogen"]]}
+  , "config_vars": ["HOST_ARCH"]
+  , "imports": {"for host": ["transitions", "for host"]}
+  , "config_transitions":
+    {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
+  , "expression": ...
+  }
+}
+#+END_SRC
-- 
cgit v1.2.3