Diffstat (limited to 'doc/concepts/rules.org')
-rw-r--r-- doc/concepts/rules.org 551
1 files changed, 0 insertions, 551 deletions
diff --git a/doc/concepts/rules.org b/doc/concepts/rules.org
deleted file mode 100644
index d4c61b5e..00000000
--- a/doc/concepts/rules.org
+++ /dev/null
@@ -1,551 +0,0 @@
-* User-defined Rules
-
-Targets are defined in terms of high-level concepts like "libraries",
-"binaries", etc. In order to translate these high-level definitions
-into actionable tasks, the user defines rules, explaining at a
-single point how all targets of a given type are built.
-
-** Rules files
-
-Rules are defined in rules files (by default named ~RULES~). Those
-contain a JSON object mapping rule names to their rule definition.
-For rules, the same naming scheme as for targets applies. However,
-built-in rules (always named by a single string) take precedence
-in naming; to explicitly refer to a rule defined in the current
-module, the module has to be specified, possibly by a relative
-path, e.g., ~["./", ".", "install"]~.
-
-** Basic components of a rule
-
-A rule is defined through a JSON object with various keys. The only
-mandatory key is ~"expression"~ containing the defining expression
-of the rule.
-
-*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~
-
-These keys specify the fields that a target defined by that rule can
-have. In particular, those have to be disjoint lists of strings.
-
-For ~"config_fields"~ and ~"string_fields"~ the respective field
-has to evaluate to a list of strings, whereas ~"target_fields"~
-have to evaluate to a list of target references. Those references
-are evaluated immediately, and in the name context of the target
-they occur in.
-
-The difference between ~"config_fields"~ and ~"string_fields"~ is
-that ~"config_fields"~ are evaluated before the target fields and
-hence can be used by the rule to specify config transitions for the
-target fields. ~"string_fields"~ on the other hand are evaluated
-_after_ the target fields; hence the rule cannot use them to
-specify a configuration transition, however the target definition
-in those fields may use the ~"outs"~ and ~"runfiles"~ functions to
-have access to the names of the artifacts or runfiles of a target
-specified in one of the target fields.
-
-*** ~"implicit"~
-
-This key specifies a map of implicit dependencies. The keys of the
-map are additional target fields, the values are the fixed list
-of targets for those fields. If a short-form name of a target is
-used (e.g., only a string instead of a module-target pair), it is
-interpreted relative to the repository and module the rule is defined
-in, not the one the rule is used in. Other than this, those fields
-are evaluated the same way as target fields settable on invocation
-of the rule.
-
-*** ~"config_vars"~
-
-This is a list of strings specifying which parts of the configuration
-the rule uses. The defining expression of the rule is evaluated in an
-environment that is the configuration restricted to those variables;
-if one of those variables is not specified in the configuration
-the value in the restriction is ~null~.
-
-*** ~"config_transitions"~
-
-This key specifies a map of (some of) the target fields (whether
-declared as ~"target_fields"~ or as ~"implicit"~) to a configuration
-expression. Here, a configuration expression is any expression
-in our language. It has access to the ~"config_vars"~ and the
-~"config_fields"~ and has to evaluate to a list of maps. Each map
-specifies a transition to the current configuration by amending
-it on the domain of that map to the given value.
-
-*** ~"imports"~
-
-This specifies a map of expressions that can later be used by
-~CALL_EXPRESSION~. In this way, duplication of (rule) code can be
-avoided. The value for each key has to be the name of an expression;
-expressions are named following the same naming scheme as targets
-and rules. The names are resolved in the context of the rule.
-Expressions themselves are defined in expression files, the default
-name being ~EXPRESSIONS~.
-
-Each expression is a JSON object. The only mandatory key is
-~"expression"~ which has to be an expression in our language. It
-optionally can have a key ~"vars"~ where the value has to be a list
-of strings (and the default is the empty list). Additionally, it
-can have another optional key ~"imports"~ following the same scheme
-as the ~"imports"~ key of a rule; in the ~"imports"~ key of an
-expression, names are resolved in the context of that expression.
-It is a requirement that the ~"imports"~ graph be cycle free.
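-
-As a minimal sketch, an expressions file could look as follows. The
-expression name ~"for host"~ is the one used later in this document;
-its body and the variable names ~"HOST_ARCH"~ and ~"ARCH"~ are
-assumptions made for illustration only, not the definition found in
-any actual rule collection.
-
-#+BEGIN_SRC
-{ "for host":
-  { "vars": ["HOST_ARCH"]
-  , "expression":
-    { "type": "singleton_map"
-    , "key": "ARCH"
-    , "value": {"type": "var", "name": "HOST_ARCH"}
-    }
-  }
-}
-#+END_SRC
-
-A rule importing this expression would evaluate it in the environment
-restricted to ~"HOST_ARCH"~; as the expression has no ~"imports"~
-key, it cannot call other expressions itself.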
-
-*** ~"expression"~
-
-This specifies the defining expression of the rule. The value has to
-be an expression of our expression language (basically, an abstract
-syntax tree serialized as JSON). It has access to the following
-extra functions and, when evaluated, has to return a result value.
-
-**** ~FIELD~
-
-The field function takes one argument, ~name~ which has to evaluate
-to the name of a field. For string fields, the given list of strings
-is returned; for target fields, the list of abstract names for the
-given target is returned. These abstract names are opaque within
-the rule language (but meaningful when reported in error messages)
-and should only be used to be passed on to other functions that
-expect names as inputs.
-
-**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~
-
-These functions give access to the artifacts, or runfiles, respectively,
-of one of the targets depended upon. They take two (evaluated)
-arguments, the mandatory ~"dep"~ and the optional ~"transition"~.
-
-The argument ~"dep"~ has to evaluate to an abstract name (as can be
-obtained from the ~FIELD~ function) of some target specified in one
-of the target fields. The ~"transition"~ argument has to evaluate
-to a configuration transition (i.e., a map) and the empty transition
-is taken as default. It is an error to request a target-transition
-pair for a target that was not requested in the given transition
-through one of the target fields.
-
-**** ~DEP_PROVIDES~
-
-This function gives access to a particular entry of the provides
-map of one of the targets depended upon. The arguments ~"dep"~
-and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there
-is the mandatory argument ~"provider"~ which has to evaluate to a
-string. The function returns the value of the provides map of the
-target at the given provider. If the key is not in the provides
-map (or the value at that key is ~null~), the optional argument
-~"default"~ is evaluated and returned. The default for ~"default"~
-is the empty list.
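-
-As a minimal sketch, the following expression collects one provider
-from all targets of a target field and concatenates the results. The
-field name ~"deps"~ and the provider name ~"link-args"~ are chosen
-for illustration only, and the sketch assumes that the provided value
-is a list; the explicit ~"default"~ merely spells out the default.
-
-#+BEGIN_SRC
-{ "type": "++"
-, "$1":
-  { "type": "foreach"
-  , "var": "dep"
-  , "range": {"type": "FIELD", "name": "deps"}
-  , "body":
-    { "type": "DEP_PROVIDES"
-    , "dep": {"type": "var", "name": "dep"}
-    , "provider": "link-args"
-    , "default": []
-    }
-  }
-}
-#+END_SRC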
-
-**** ~BLOB~
-
-The ~BLOB~ function takes a single (evaluated) argument ~data~
-which is optional and defaults to the empty string. This argument
-has to evaluate to a string. The function returns an artifact that
-is a non-executable file with the given string as content.
-
-**** ~TREE~
-
-The ~TREE~ function takes a single (evaluated) argument ~$1~ which
-has to be a map of artifacts. The result is a single tree artifact
-formed from the input map. It is an error if the map cannot be
-transformed into a tree (e.g., due to staging conflicts).
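-
-For example, the following sketch bundles the artifacts of all
-targets of a field into a single tree. The field name ~"data"~ and
-the error message are hypothetical; ~disjoint_map_union~ is used
-here only so that conflicting artifacts are reported as an error
-before the tree is formed.
-
-#+BEGIN_SRC
-{ "type": "TREE"
-, "$1":
-  { "type": "disjoint_map_union"
-  , "msg": "data artifacts must not overlap"
-  , "$1":
-    { "type": "foreach"
-    , "var": "dep"
-    , "range": {"type": "FIELD", "name": "data"}
-    , "body": {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "dep"}}
-    }
-  }
-}
-#+END_SRC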
-
-**** ~ACTION~
-
-Actions are a way to define new artifacts from (zero or more) already
-defined artifacts by running a command, typically a compiler, linker,
-archiver, etc. The action function takes the following arguments.
-- ~"inputs"~ A map of artifacts. These artifacts are present when
- the command is executed; the keys of the map are the relative path
- from the working directory of the command. The command must not
- make any assumption about the location of the working directory
- in the file system (and instead should refer to files by path
- relative to the working directory). Moreover, the command must
- not modify the input files in any way. (In-place operations can
- be simulated by staging, as is shown in the example later in
- this document.)
-
- It is an additional requirement that no conflicts occur when
- interpreting the keys as paths. For example, ~"foo.txt"~ and
- ~"./foo.txt"~ are different as strings and hence legitimately
- can be assigned different values in a map. When interpreted as
- a path, however, they name the same path; so, if the ~"inputs"~
- map contains both those keys, the corresponding values have
- to be equal.
-- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e.,
- a non-empty list of strings. The 0'th element of that list will
- also be the program to be executed.
-- ~"env"~ The environment in which the command should be executed,
- given as a map of strings to strings.
-- ~"outs"~ and ~"out_dirs"~ Two lists of strings naming the files
- and directories, respectively, the command is expected to create.
- It is an error if the command fails to create the promised output
- files. These two lists have to be disjoint, but an entry of
- ~"outs"~ may well name a location inside one of the ~"out_dirs"~.
-
-This function returns a map with keys the strings mentioned in
-~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined
-to be the ones created by running the given command (in the given
-environment with the given inputs).
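-
-A minimal sketch of a single action might look as follows. The
-file names, the command, and the value of ~PATH~ are assumptions
-chosen for illustration; the input is created on the fly with ~BLOB~
-so that the sketch does not depend on any target field.
-
-#+BEGIN_SRC
-{ "type": "ACTION"
-, "inputs":
-  { "type": "singleton_map"
-  , "key": "in.txt"
-  , "value": {"type": "BLOB", "data": "hello\n"}
-  }
-, "cmd": ["/bin/sh", "-c", "tr a-z A-Z < in.txt > out.txt"]
-, "env": {"type": "singleton_map", "key": "PATH", "value": "/bin:/usr/bin"}
-, "outs": ["out.txt"]
-}
-#+END_SRC
-
-The value of this expression is a map with the single key
-~"out.txt"~; the artifact stored there can be staged under any
-other name when it is passed on.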
-
-**** ~RESULT~
-
-The ~RESULT~ function is the only way to obtain a result value.
-It takes three (evaluated) arguments, ~"artifacts"~, ~"runfiles"~, and
-~"provides"~, all of which are optional and default to the empty map.
-It defines the result of a target that has the given artifacts,
-runfiles, and provided data, respectively. In particular, ~"artifacts"~
-and ~"runfiles"~ have to be maps to artifacts, and ~"provides"~ has
-to be a map. Moreover, the keys in ~"runfiles"~ and ~"artifacts"~
-are treated as paths; it is an error if this interpretation leads
-to conflicts. The keys in the artifacts or runfiles maps as seen by
-other targets are the normalized paths of the keys given.
-
-
-Result values themselves are opaque in our expression language
-and cannot be deconstructed in any way. Their only purpose is to
-be the result of the evaluation of the defining expression of a target.
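-
-As a small sketch, a result with one artifact and one provided
-value could be written as follows. The file name ~"hello.txt"~ and
-the provider key ~"greeting"~ are hypothetical; ~"runfiles"~ is
-omitted and hence defaults to the empty map.
-
-#+BEGIN_SRC
-{ "type": "RESULT"
-, "artifacts":
-  { "type": "singleton_map"
-  , "key": "hello.txt"
-  , "value": {"type": "BLOB", "data": "Hello\n"}
-  }
-, "provides":
-  {"type": "singleton_map", "key": "greeting", "value": "Hello"}
-}
-#+END_SRC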
-
-**** ~CALL_EXPRESSION~
-
-This function takes one mandatory argument ~"name"~ which is
-unevaluated; it has to be a string literal. The expression imported
-by that name through the imports field is evaluated in the current
-environment restricted to the variables of that expression. The result
-of that evaluation is the result of the ~CALL_EXPRESSION~ statement.
-
-During the evaluation of an expression, rule fields can still be
-accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In
-particular, even an expression with no variables (that, hence, is
-always evaluated in the empty environment) can carry out non-trivial
-computations and be non-constant. The special functions ~BLOB~,
-~ACTION~, and ~RESULT~ are also available. If inside the evaluation
-of an expression the function ~CALL_EXPRESSION~ is used, the name
-argument refers to the ~"imports"~ map of that expression. So the
-call graph is deliberately recursion free.
-
-** Evaluation of a target
-
-A target defined by a user-defined rule is evaluated in the
-following way.
-
-- First, the config fields are evaluated.
-
-- Then, the target-fields are evaluated. This happens for each
- field as follows.
- - The configuration transition for this field is evaluated and
- the transitioned configurations determined.
- - The argument expression for this field is evaluated. The result
- is interpreted as a list of target names. Each of those targets
- is analyzed in all the specified configurations.
-
-- The string fields are evaluated. If the expression for a string
- field queries a target (via ~outs~ or ~runfiles~), the value for
- that target is returned in the first configuration. The rationale
- here is that such generator expressions are intended to refer to
- the corresponding target in its "main" configuration; they are
- hardly used anyway for fields branching their targets over many
- configurations.
-
-- The effective configuration for the target is determined. Of the
- configuration, the target has effectively used the variables used
- by the ~arguments_config~ in the rule invocation, the ~config_vars~
- the rule specified, and the parts of the configuration used by
- a target depended upon. For a target depended upon, all parts
- it used of its configuration are relevant except for those fixed
- by the configuration transition.
-
-- The rule expression is evaluated and the result of that evaluation
- is the result of the rule.
-
-** Example of developing a rule
-
-Let's consider step by step an example of writing a rule. Say we want
-to write a rule that programmatically patches some files.
-
-*** Framework: The minimal rule
-
-Every rule has to have a defining expression evaluating
-to a ~RESULT~. So the minimally correct rule is the ~"null"~
-rule in the following example rule file.
-
-#+BEGIN_SRC
-{ "null": {"expression": {"type": "RESULT"}}}
-#+END_SRC
-
-This rule accepts no parameters, and has the empty map as artifacts,
-runfiles, and provided data. So it is not very useful.
-
-*** String inputs
-
-Let's allow the target definition to have some fields. The simplest
-fields are ~string_fields~; they are given by a list of
-strings. In the defining expression we can access them directly via
-the ~FIELD~ function. Strings can be used when defining maps, but
-we can also create artifacts from them, using the ~BLOB~ function.
-To create a map, we can use the ~singleton_map~ function. We define
-values step by step, using the ~let*~ construct.
-
-#+BEGIN_SRC
-{ "script only":
-  { "string_fields": ["script"]
-  , "expression":
-    { "type": "let*"
-    , "bindings":
-      [ [ "script content"
-        , { "type": "join"
-          , "separator": "\n"
-          , "$1":
-            { "type": "++"
-            , "$1":
-              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
-            }
-          }
-        ]
-      , [ "script"
-        , { "type": "singleton_map"
-          , "key": "script.ed"
-          , "value":
-            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
-          }
-        ]
-      ]
-    , "body":
-      {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
-    }
-  }
-}
-#+END_SRC
-
-*** Target inputs and derived artifacts
-
-Now it is time to add the input files. Source files are targets like
-any other target (and happen to contain precisely one artifact). So
-we add a target field ~"srcs"~ for the file to be patched. Here we
-have to keep in mind that, on the one hand, target fields accept a
-list of targets and, on the other hand, the artifacts of a target
-are a whole map. We chose to patch all the artifacts of all given
-~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps
-with ~foreach_map~.
-
-Next, we have to keep in mind that targets may place their artifacts
-at arbitrary logical locations. For us that means that first
-we have to make a decision at which logical locations we want
-to place the output artifacts. As one thinks of patching as an
-in-place operation, we chose to logically place the outputs where
-the inputs have been. Of course, we do not modify the input files
-in any way; after all, we have to define a mathematical function
-computing the output artifacts, not a collection of side effects.
-With that choice of logical artifact placement, we have to decide
-what to do if two (or more) input targets place their artifacts at
-logically the same location. We could simply take a "latest wins"
-semantics (keep in mind that target fields give a list of targets,
-not a set) as provided by the ~map_union~ function. We chose to
-consider it a user error if targets with conflicting artifacts are
-specified. This is provided by ~disjoint_map_union~, which also
-allows specifying an error message to be shown to the user. Here,
-conflict means that values for the same map position are defined
-in a different way.
-
-The actual patching is done by an ~ACTION~. We have the script
-already; to make things easy, we stage the input to a fixed place
-and also expect a fixed output location. Then the actual command
-is a simple shell script. The only thing we have to keep in mind
-is that we want useful output precisely if the action fails. Also
-note that, while we define our actions sequentially, they will
-be executed in parallel, as none of them depends on the output of
-another one of them.
-
-#+BEGIN_SRC
-{ "ed patch":
-  { "string_fields": ["script"]
-  , "target_fields": ["srcs"]
-  , "expression":
-    { "type": "let*"
-    , "bindings":
-      [ [ "script content"
-        , { "type": "join"
-          , "separator": "\n"
-          , "$1":
-            { "type": "++"
-            , "$1":
-              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
-            }
-          }
-        ]
-      , [ "script"
-        , { "type": "singleton_map"
-          , "key": "script.ed"
-          , "value":
-            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
-          }
-        ]
-      , [ "patched files per target"
-        , { "type": "foreach"
-          , "var": "src"
-          , "range": {"type": "FIELD", "name": "srcs"}
-          , "body":
-            { "type": "foreach_map"
-            , "var_key": "file_name"
-            , "var_val": "file"
-            , "range":
-              {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
-            , "body":
-              { "type": "let*"
-              , "bindings":
-                [ [ "action output"
-                  , { "type": "ACTION"
-                    , "inputs":
-                      { "type": "map_union"
-                      , "$1":
-                        [ {"type": "var", "name": "script"}
-                        , { "type": "singleton_map"
-                          , "key": "in"
-                          , "value": {"type": "var", "name": "file"}
-                          }
-                        ]
-                      }
-                    , "cmd":
-                      [ "/bin/sh"
-                      , "-c"
-                      , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
-                      ]
-                    , "outs": ["out"]
-                    }
-                  ]
-                ]
-              , "body":
-                { "type": "singleton_map"
-                , "key": {"type": "var", "name": "file_name"}
-                , "value":
-                  { "type": "lookup"
-                  , "map": {"type": "var", "name": "action output"}
-                  , "key": "out"
-                  }
-                }
-              }
-            }
-          }
-        ]
-      , [ "artifacts"
-        , { "type": "disjoint_map_union"
-          , "msg": "srcs artifacts must not overlap"
-          , "$1":
-            { "type": "++"
-            , "$1": {"type": "var", "name": "patched files per target"}
-            }
-          }
-        ]
-      ]
-    , "body":
-      {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
-    }
-  }
-}
-#+END_SRC
-
-A typical invocation of that rule would be a target file like the following.
-#+BEGIN_SRC
-{ "input.txt":
-  { "type": "ed patch"
-  , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
-  , "srcs": [["FILE", null, "input.txt"]]
-  }
-}
-#+END_SRC
-As the input file has the same name as a target (in the same module),
-we use the explicit file reference in the specification of the sources.
-
-*** Implicit dependencies and config transitions
-
-Say, instead of patching a file, we want to generate source files
-from some high-level description using our actively developed code
-generator. Then we have to take some additional considerations into account.
-- First of all, every target defined by this rule not only depends
- on the targets the user specifies. Additionally, our code
- generator is also an implicit dependency. And as it is under
- active development, we certainly do not want it to be taken from
- the ambient build environment (as we did in the previous example
- with ~ed~ which, however, is a pretty stable tool). So we use an
- ~implicit~ target for this.
-- Next, we notice that our code generator is used during the
- build. In particular, we want that tool (written in some compiled
- language) to be built for the platform we run our actions on, not
- the target platform we build our final binaries for. Therefore,
- we have to use a configuration transition.
-- As our defining expression also needs the configuration transition
- to access the artifacts of that implicit target, we had better define
- it as a reusable expression. Other rules in our rule collection
- might also have the same task; so ~["transitions", "for host"]~
- might be a good place to define it. In fact, it can look like
- the expression with that name in our own code base.
-
-So, the overall organization of our rule might be as follows.
-
-#+BEGIN_SRC
-{ "generated code":
-  { "target_fields": ["srcs"]
-  , "implicit": {"generator": [["generators", "foogen"]]}
-  , "config_vars": ["HOST_ARCH"]
-  , "imports": {"for host": ["transitions", "for host"]}
-  , "config_transitions":
-    {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
-  , "expression": ...
-  }
-}
-#+END_SRC
-
-*** Providing information to consuming targets
-
-In the simple case of patching, the resulting file is indeed the
-only information the consumer of that target needs; in fact, the main
-point was that the resulting target could be a drop-in replacement
-of a source file. A typical rule, however, defines something like
-a library, and a library is much more than just the actual library
-file and the public headers: a library may depend on other libraries;
-therefore, in order to use it, we need
-- to have the header files of dependencies available that might be
- included by the public header files of that library,
-- to have the libraries transitively depended upon available during
- linking, and
-- to know the order in which to link the dependencies (as they
- might have dependencies among each other).
-In order to keep a maintainable build description, all this should
-be taken care of by simply depending on that library. We do _not_
-want the consumer of a target having to be aware of such transitive
-dependencies (e.g., when constructing the link command line), as
-used to be the case in early build tools like ~make~.
-
-It is a deliberate design choice that a target is given only by
-the result of its analysis, regardless of where it is coming from.
-Therefore, all this information needs to be part of the result of
-a target. This kind of information is precisely what the mentioned
-~"provides"~ map is for. As a map, it can contain an arbitrary
-amount of information and the interface function ~"DEP_PROVIDES"~
-is designed in such a way that adding more providers does not affect targets
-not aware of them (there is no function asking for all providers
-of a target). The keys and their meaning have to be agreed upon
-by a target and its consumers. As the latter, however, typically
-are a target of the same family (authored by the same group), this
-usually is not a problem.
-
-A typical example of computing a provided value is the ~"link-args"~
-in the rules used by ~just~ itself. They are defined by the following
-expression.
-#+BEGIN_SRC
-{ "type": "nub_right"
-, "$1":
-  { "type": "++"
-  , "$1":
-    [ {"type": "keys", "$1": {"type": "var", "name": "lib"}}
-    , {"type": "CALL_EXPRESSION", "name": "link-args-deps"}
-    , {"type": "var", "name": "link external", "default": []}
-    ]
-  }
-}
-#+END_SRC
-This expression
-- collects the respective provider of its dependencies,
-- adds itself in front, and
-- deduplicates the resulting list, keeping only the right-most
- occurrence of each entry.
-In this way, the invariant is kept that the ~"link-args"~ form a
-topological ordering of the dependencies (in the order that each
-entry is mentioned before its dependencies).