* User-defined Rules

Targets are defined in terms of high-level concepts like "libraries",
"binaries", etc. In order to translate these high-level definitions
into actionable tasks, the user defines rules, explaining at a
single point how all targets of a given type are built.

** Rules files

Rules are defined in rules files (by default named ~RULES~). Those
contain a JSON object mapping rule names to their rule definition.
For rules, the same naming scheme as for targets applies. However,
built-in rules (always named by a single string) take precedence
in naming; to explicitly refer to a rule defined in the current
module, the module has to be specified, possibly by a relative
path, e.g., ~["./", ".", "install"]~.
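
As a minimal sketch of this naming scheme, assume a module whose
~RULES~ file defines its own rule called ~"install"~ (the trivial
rule body and the target name ~"example"~ below are placeholders
chosen for illustration only).

#+BEGIN_SRC
{ "install": {"expression": {"type": "RESULT"}}}
#+END_SRC

A target file in the same module would then spell out the module
to reach this rule rather than the built-in rule of the same name.

#+BEGIN_SRC
{ "example": {"type": ["./", ".", "install"]}}
#+END_SRC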

** Basic components of a rule

A rule is defined through a JSON object with various keys. The only
mandatory key is ~"expression"~ containing the defining expression
of the rule.

*** ~"config_fields"~, ~"string_fields"~ and ~"target_fields"~

These keys specify the fields that a target defined by that rule can
have. In particular, those have to be disjoint lists of strings.

For ~"config_fields"~ and ~"string_fields"~ the respective field
has to evaluate to a list of strings, whereas ~"target_fields"~
have to evaluate to a list of target references. Those references
are evaluated immediately, and in the name context of the target
they occur in.

The difference between ~"config_fields"~ and ~"string_fields"~ is
that ~"config_fields"~ are evaluated before the target fields and
hence can be used by the rule to specify config transitions for the
target fields. ~"string_fields"~ on the other hand are evaluated
_after_ the target fields; hence the rule cannot use them to
specify a configuration transition, however the target definition
in those fields may use the ~"outs"~ and ~"runfiles"~ functions to
have access to the names of the artifacts or runfiles of a target
specified in one of the target fields.
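
To illustrate, a rule declaring one field of each kind might start
as in the following sketch. The rule name ~"example"~ and the field
names ~"modes"~, ~"args"~, and ~"deps"~ are hypothetical, and the
defining expression is left trivial.

#+BEGIN_SRC
{ "example":
  { "config_fields": ["modes"]
  , "string_fields": ["args"]
  , "target_fields": ["deps"]
  , "expression": {"type": "RESULT"}
  }
}
#+END_SRC

A target defined by that rule could then set these fields as follows,
again with made-up values and target names.

#+BEGIN_SRC
{ "demo":
  { "type": "example"
  , "modes": ["debug"]
  , "args": ["-v", "--color"]
  , "deps": ["helper", ["some/module", "library"]]
  }
}
#+END_SRC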

*** ~"implicit"~

This key specifies a map of implicit dependencies. The keys of the
map are additional target fields, the values are the fixed list
of targets for those fields. If a short-form name of a target is
used (e.g., only a string instead of a module-target pair), it is
interpreted relative to the repository and module the rule is defined
in, not the one the rule is used in. Other than this, those fields
are evaluated the same way as target fields settable on invocation
of the rule.

*** ~"config_vars"~

This is a list of strings specifying which parts of the configuration
the rule uses. The defining expression of the rule is evaluated in an
environment that is the configuration restricted to those variables;
if one of those variables is not specified in the configuration
the value in the restriction is ~null~.
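
As a sketch, the following hypothetical rule records the value of
a configuration variable in a file. The variable name ~ARCH~ and the
file name are assumptions; the example also assumes that the optional
~"default"~ argument of ~var~ supplies a fallback when the variable
is ~null~, and that the variable, if set, holds a string.

#+BEGIN_SRC
{ "show arch":
  { "config_vars": ["ARCH"]
  , "expression":
    { "type": "RESULT"
    , "artifacts":
      { "type": "singleton_map"
      , "key": "arch.txt"
      , "value":
        { "type": "BLOB"
        , "data": {"type": "var", "name": "ARCH", "default": "unknown"}
        }
      }
    }
  }
}
#+END_SRC

Any configuration variable not listed in ~"config_vars"~ would not
be visible to this expression.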

*** ~"config_transitions"~

This key specifies a map of (some of) the target fields (whether
declared as ~"target_fields"~ or as ~"implicit"~) to a configuration
expression. Here, a configuration expression is any expression
in our language. It has access to the ~"config_vars"~ and the
~"config_fields"~ and has to evaluate to a list of maps. Each map
specifies a transition from the current configuration by amending
it on the domain of that map to the given values.
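
A minimal sketch of such a map, assuming hypothetical fields ~"deps"~
and ~"tools"~ and hypothetical configuration variables ~ARCH~ and
~HOST_ARCH~: the targets in ~"tools"~ are requested in a single
configuration in which ~ARCH~ is set to the value of ~HOST_ARCH~.
The field ~"deps"~ has no entry in the map and so is presumably
analyzed in the unchanged configuration.

#+BEGIN_SRC
{ "with tools":
  { "target_fields": ["deps", "tools"]
  , "config_vars": ["HOST_ARCH"]
  , "config_transitions":
    { "tools":
      [ { "type": "singleton_map"
        , "key": "ARCH"
        , "value": {"type": "var", "name": "HOST_ARCH"}
        }
      ]
    }
  , "expression": {"type": "RESULT"}
  }
}
#+END_SRC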

*** ~"imports"~

This specifies a map of expressions that can later be used by
~CALL_EXPRESSION~. In this way, duplication of (rule) code can be
avoided. For each key, we have to have a name of an expression;
expressions are named following the same naming scheme as targets
and rules. The names are resolved in the context of the rule.
Expressions themselves are defined in expression files, the default
name being ~EXPRESSIONS~.

Each expression is a JSON object. The only mandatory key is
~"expression"~ which has to be an expression in our language. It
optionally can have a key ~"vars"~ where the value has to be a list
of strings (and the default is the empty list). Additionally, it
can have another optional key ~"imports"~ following the same scheme
as the ~"imports"~ key of a rule; in the ~"imports"~ key of an
expression, names are resolved in the context of that expression.
It is a requirement that the ~"imports"~ graph be cycle free.
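
For illustration, an expression file with two expressions, one
importing the other, might look like the following sketch. The
expression names, the variable ~HOST_ARCH~, and the key ~ARCH~ are
made up for this example; it also assumes that ~var~ accepts an
optional ~"default"~ argument.

#+BEGIN_SRC
{ "default arch":
  { "vars": ["HOST_ARCH"]
  , "expression": {"type": "var", "name": "HOST_ARCH", "default": "unknown"}
  }
, "arch setting":
  { "vars": ["HOST_ARCH"]
  , "imports": {"arch": "default arch"}
  , "expression":
    { "type": "singleton_map"
    , "key": "ARCH"
    , "value": {"type": "CALL_EXPRESSION", "name": "arch"}
    }
  }
}
#+END_SRC

Here ~"arch setting"~ imports ~"default arch"~ under the local name
~"arch"~; as ~"default arch"~ imports nothing, the import graph is
cycle free.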

*** ~"expression"~

This specifies the defining expression of the rule. The value has to
be an expression of our expression language (basically, an abstract
syntax tree serialized as JSON). It has access to the following
extra functions and, when evaluated, has to return a result value.

**** ~FIELD~

The field function takes one argument, ~name~ which has to evaluate
to the name of a field. For string fields, the given list of strings
is returned; for target fields, the list of abstract names for the
given target is returned. These abstract names are opaque within
the rule language (but meaningful when reported in error messages)
and should only be used to be passed on to other functions that
expect names as inputs.

**** ~DEP_ARTIFACTS~ and ~DEP_RUNFILES~

These functions give access to the artifacts, or runfiles, respectively,
of one of the targets depended upon. They take two (evaluated)
arguments, the mandatory ~"dep"~ and the optional ~"transition"~.

The argument ~"dep"~ has to evaluate to an abstract name (as can be
obtained from the ~FIELD~ function) of some target specified in one
of the target fields. The ~"transition"~ argument has to evaluate
to a configuration transition (i.e., a map) and the empty transition
is taken as default. It is an error to request a target-transition
pair for a target that was not requested in the given transition
through one of the target fields.
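
As a sketch, a hypothetical rule with a single target field ~"deps"~
could collect the artifacts of all its dependencies, each taken in
the default (empty) transition, as follows. The rule name and field
name are illustrative.

#+BEGIN_SRC
{ "collect":
  { "target_fields": ["deps"]
  , "expression":
    { "type": "RESULT"
    , "artifacts":
      { "type": "map_union"
      , "$1":
        { "type": "foreach"
        , "var": "dep"
        , "range": {"type": "FIELD", "name": "deps"}
        , "body":
          {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "dep"}}
        }
      }
    }
  }
}
#+END_SRC

Replacing ~DEP_ARTIFACTS~ by ~DEP_RUNFILES~ would collect the runfiles
instead.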

**** ~DEP_PROVIDES~

This function gives access to a particular entry of the provides
map of one of the targets depended upon. The arguments ~"dep"~
and ~"transition"~ are as for ~DEP_ARTIFACTS~; additionally, there
is the mandatory argument ~"provider"~ which has to evaluate to a
string. The function returns the value of the provides map of the
target at the given provider. If the key is not in the provides
map (or the value at that key is ~null~), the optional argument
~"default"~ is evaluated and returned. The default for ~"default"~
is the empty list.
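
A possible use, sketched under the assumption that dependencies
provide a list of strings under a made-up provider name ~"flags"~,
is to concatenate those lists and provide the result again.
Dependencies without that provider contribute the empty list, as
that is the default.

#+BEGIN_SRC
{ "collect flags":
  { "target_fields": ["deps"]
  , "expression":
    { "type": "RESULT"
    , "provides":
      { "type": "singleton_map"
      , "key": "flags"
      , "value":
        { "type": "++"
        , "$1":
          { "type": "foreach"
          , "var": "dep"
          , "range": {"type": "FIELD", "name": "deps"}
          , "body":
            { "type": "DEP_PROVIDES"
            , "dep": {"type": "var", "name": "dep"}
            , "provider": "flags"
            }
          }
        }
      }
    }
  }
}
#+END_SRC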

**** ~BLOB~

The ~BLOB~ function takes a single (evaluated) argument ~data~
which is optional and defaults to the empty string. This argument
has to evaluate to a string. The function returns an artifact that
is a non-executable file with the given string as content.

**** ~TREE~

The ~TREE~ function takes a single (evaluated) argument ~$1~ which
has to be a map of artifacts. The result is a single tree artifact
formed from the input map. It is an error if the map cannot be
transformed into a tree (e.g., due to staging conflicts).
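
The following sketch combines ~BLOB~ and ~TREE~: a hypothetical rule
without fields that produces a single tree artifact at the logical
location ~data~, containing two files. All file names and contents
are chosen for illustration.

#+BEGIN_SRC
{ "data tree":
  { "expression":
    { "type": "RESULT"
    , "artifacts":
      { "type": "singleton_map"
      , "key": "data"
      , "value":
        { "type": "TREE"
        , "$1":
          { "type": "map_union"
          , "$1":
            [ { "type": "singleton_map"
              , "key": "greeting.txt"
              , "value": {"type": "BLOB", "data": "Hello\n"}
              }
            , { "type": "singleton_map"
              , "key": "sub/empty.txt"
              , "value": {"type": "BLOB"}
              }
            ]
          }
        }
      }
    }
  }
}
#+END_SRC

The second ~BLOB~ omits ~data~ and hence is the empty file.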

**** ~ACTION~

Actions are a way to define new artifacts from (zero or more) already
defined artifacts by running a command, typically a compiler, linker,
archiver, etc. The action function takes the following arguments.
- ~"inputs"~ A map of artifacts. These artifacts are present when
  the command is executed; the keys of the map are the relative path
  from the working directory of the command. The command must not
  make any assumption about the location of the working directory
  in the file system (and instead should refer to files by path
  relative to the working directory). Moreover, the command must
  not modify the input files in any way. (In-place operations can
  be simulated by staging, as is shown in the example later in
  this document.)

  It is an additional requirement that no conflicts occur when
  interpreting the keys as paths. For example, ~"foo.txt"~ and
  ~"./foo.txt"~ are different as strings and hence legitimately
  can be assigned different values in a map. When interpreted as
  a path, however, they name the same path; so, if the ~"inputs"~
  map contains both those keys, the corresponding values have
  to be equal.
- ~"cmd"~ The command to execute, given as ~argv~ vector, i.e.,
  a non-empty list of strings. The 0'th element of that list will
  also be the program to be executed.
- ~"env"~ The environment in which the command should be executed,
  given as a map of strings to strings.
- ~"outs"~ and ~"out_dirs"~ Two lists of strings naming the files
  and directories, respectively, the command is expected to create.
  It is an error if the command fails to create the promised output
  files. These two lists have to be disjoint, but an entry of
  ~"outs"~ may well name a location inside one of the ~"out_dirs"~.

This function returns a map with keys the strings mentioned in
~"outs"~ and ~"out_dirs"~. As values this map has artifacts defined
to be the ones created by running the given command (in the given
environment with the given inputs).
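
As a minimal sketch, the following hypothetical rule runs a single
action on a fixed input file and returns the output map of that
action as its artifacts. The command, the file names, and the value
of ~PATH~ are assumptions made for this example.

#+BEGIN_SRC
{ "upper case":
  { "expression":
    { "type": "let*"
    , "bindings":
      [ [ "action output"
        , { "type": "ACTION"
          , "inputs":
            { "type": "singleton_map"
            , "key": "in.txt"
            , "value": {"type": "BLOB", "data": "hello world\n"}
            }
          , "cmd": ["/bin/sh", "-c", "tr a-z A-Z < in.txt > out.txt"]
          , "env":
            {"type": "singleton_map", "key": "PATH", "value": "/bin:/usr/bin"}
          , "outs": ["out.txt"]
          }
        ]
      ]
    , "body":
      {"type": "RESULT", "artifacts": {"type": "var", "name": "action output"}}
    }
  }
}
#+END_SRC

The command refers to its input and output by paths relative to the
working directory and leaves ~in.txt~ unmodified. The resulting map
has the single key ~out.txt~.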

**** ~RESULT~

The ~RESULT~ function is the only way to obtain a result value.
It takes three (evaluated) arguments, ~"artifacts"~, ~"runfiles"~, and
~"provides"~, all of which are optional and default to the empty map.
It defines the result of a target that has the given artifacts,
runfiles, and provided data, respectively. In particular, ~"artifacts"~
and ~"runfiles"~ have to be maps to artifacts, and ~"provides"~ has
to be a map. Moreover, the keys in ~"runfiles"~ and ~"artifacts"~
are treated as paths; it is an error if this interpretation leads
to conflicts. The keys in the artifacts or runfiles maps as seen by
other targets are the normalized paths of the keys given.


Result values themselves are opaque in our expression language
and cannot be deconstructed in any way. Their only purpose is to
be the result of the evaluation of the defining expression of a target.
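
For illustration, a call setting all three arguments might look like
the following sketch; the file name, the content, and the provider
entry are made up.

#+BEGIN_SRC
{ "type": "RESULT"
, "artifacts":
  { "type": "singleton_map"
  , "key": "./doc/README"
  , "value": {"type": "BLOB", "data": "See the manual.\n"}
  }
, "runfiles":
  { "type": "singleton_map"
  , "key": "doc/README"
  , "value": {"type": "BLOB", "data": "See the manual.\n"}
  }
, "provides":
  {"type": "singleton_map", "key": "kind", "value": "documentation"}
}
#+END_SRC

With the normalization described above, other targets would
presumably see the artifact under the key ~doc/README~ rather than
~./doc/README~.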

**** ~CALL_EXPRESSION~

This function takes one mandatory argument ~"name"~ which is
unevaluated; it has to be a string literal. The expression imported
by that name through the imports field is evaluated in the current
environment restricted to the variables of that expression. The result
of that evaluation is the result of the ~CALL_EXPRESSION~ statement.

During the evaluation of an expression, rule fields can still be
accessed through the functions ~FIELD~, ~DEP_ARTIFACTS~, etc. In
particular, even an expression with no variables (that, hence, is
always evaluated in the empty environment) can carry out non-trivial
computations and be non-constant. The special functions ~BLOB~,
~ACTION~, and ~RESULT~ are also available. If inside the evaluation
of an expression the function ~CALL_EXPRESSION~ is used, the name
argument refers to the ~"imports"~ map of that expression. As the
~"imports"~ graph has to be cycle free, the call graph is
deliberately recursion free.
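
As a sketch, assume an expression file in a hypothetical module
~expressions~ defining an expression with the single variable ~who~.

#+BEGIN_SRC
{ "greeting text":
  { "vars": ["who"]
  , "expression":
    { "type": "join"
    , "separator": ""
    , "$1": ["Hello, ", {"type": "var", "name": "who"}, "!\n"]
    }
  }
}
#+END_SRC

A rule (with made-up names throughout) could import that expression
under the local name ~"greet"~, bind ~who~ with ~let*~, and then
call it; only the binding for ~who~ is passed on to the called
expression.

#+BEGIN_SRC
{ "greeting":
  { "string_fields": ["name"]
  , "imports": {"greet": ["expressions", "greeting text"]}
  , "expression":
    { "type": "let*"
    , "bindings":
      [ [ "who"
        , { "type": "join"
          , "separator": " "
          , "$1": {"type": "FIELD", "name": "name"}
          }
        ]
      ]
    , "body":
      { "type": "RESULT"
      , "artifacts":
        { "type": "singleton_map"
        , "key": "greeting.txt"
        , "value":
          { "type": "BLOB"
          , "data": {"type": "CALL_EXPRESSION", "name": "greet"}
          }
        }
      }
    }
  }
}
#+END_SRC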

** Evaluation of a target

A target defined by a user-defined rule is evaluated in the
following way.

- First, the config fields are evaluated.

- Then, the target-fields are evaluated. This happens for each
  field as follows.
  - The configuration transition for this field is evaluated and
    the transitioned configurations determined.
  - The argument expression for this field is evaluated. The result
    is interpreted as a list of target names. Each of those targets
    is analyzed in all the specified configurations.

- The string fields are evaluated. If the expression for a string
  field queries a target (via ~outs~ or ~runfiles~), the value for
  that target is returned in the first configuration. The rationale
  here is that such generator expressions are intended to refer to
  the corresponding target in its "main" configuration; they are
  hardly used anyway for fields branching their targets over many
  configurations.

- The effective configuration for the target is determined. It
  consists of those parts of the configuration the target actually
  used: the variables used by the ~arguments_config~ in the rule
  invocation, the ~config_vars~ the rule specified, and the parts
  of the configuration used by a target depended upon. For a target
  depended upon, all parts it used of its configuration are relevant
  except for those fixed by the configuration transition.

- The rule expression is evaluated and the result of that evaluation
  is the result of the rule.
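
To make the role of ~arguments_config~ concrete, consider the
following sketch of a target. It assumes a hypothetical rule
~"example"~ with a string field ~"args"~ and a target field ~"deps"~,
a made-up configuration variable ~DEBUG~, and the availability of
an ~if~ function with arguments ~"cond"~, ~"then"~, and ~"else"~.

#+BEGIN_SRC
{ "demo":
  { "type": "example"
  , "arguments_config": ["DEBUG"]
  , "args":
    { "type": "if"
    , "cond": {"type": "var", "name": "DEBUG"}
    , "then": ["-g"]
    , "else": ["-O2"]
    }
  , "deps": ["helper"]
  }
}
#+END_SRC

Under these assumptions, the effective configuration of ~"demo"~
would contain ~DEBUG~, the ~config_vars~ of the rule, and whatever
parts of the configuration ~"helper"~ used.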

** Example of developing a rule

Let's consider step by step an example of writing a rule. Say we want
to write a rule that programmatically patches some files.

*** Framework: The minimal rule

Every rule has to have a defining expression evaluating
to a ~RESULT~. So the minimally correct rule is the ~"null"~
rule in the following example rule file.

#+BEGIN_SRC
{ "null": {"expression": {"type": "RESULT"}}}
#+END_SRC

This rule accepts no parameters, and has the empty map as artifacts,
runfiles, and provided data. So it is not very useful.

*** String inputs

Let's allow the target definition to have some fields. The simplest
fields are ~string_fields~; they are given by a list of
strings. In the defining expression we can access them directly via
the ~FIELD~ function. Strings can be used when defining maps, but
we can also create artifacts from them, using the ~BLOB~ function.
To create a map, we can use the ~singleton_map~ function. We define
values step by step, using the ~let*~ construct.

#+BEGIN_SRC
{ "script only":
  { "string_fields": ["script"]
  , "expression":
    { "type": "let*"
    , "bindings":
      [ [ "script content"
        , { "type": "join"
          , "separator": "\n"
          , "$1":
            { "type": "++"
            , "$1":
              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
            }
          }
        ]
      , [ "script"
        , { "type": "singleton_map"
          , "key": "script.ed"
          , "value":
            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
          }
        ]
      ]
    , "body":
      {"type": "RESULT", "artifacts": {"type": "var", "name": "script"}}
    }
  }
}
#+END_SRC
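
A target using this rule might look as follows; the target name is
chosen for illustration.

#+BEGIN_SRC
{ "my script":
  { "type": "script only"
  , "script": ["%g/world/s//user/g"]
  }
}
#+END_SRC

Its only artifact is ~script.ed~. Following the ~++~ and ~join~
steps above, the file consists of the line ~H~, the given script
line, and the lines ~w~ and ~q~, with a trailing newline contributed
by the final empty string.

#+BEGIN_SRC
H
%g/world/s//user/g
w
q
#+END_SRC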

*** Target inputs and derived artifacts

Now it is time to add the input files. Source files are targets like
any other target (and happen to contain precisely one artifact). So
we add a target field ~"srcs"~ for the file to be patched. Here we
have to keep in mind that, on the one hand, target fields accept a
list of targets and, on the other hand, the artifacts of a target
are a whole map. We chose to patch all the artifacts of all given
~"srcs"~ targets. We can iterate over lists with ~foreach~ and maps
with ~foreach_map~.

Next, we have to keep in mind that targets may place their artifacts
at arbitrary logical locations. For us that means that first
we have to make a decision at which logical locations we want
to place the output artifacts. As one thinks of patching as an
in-place operation, we chose to logically place the outputs where
the inputs have been. Of course, we do not modify the input files
in any way; after all, we have to define a mathematical function
computing the output artifacts, not a collection of side effects.
With that choice of logical artifact placement, we have to decide
what to do if two (or more) input targets place their artifacts at
logically the same location. We could simply take a "latest wins"
semantics (keep in mind that target fields give a list of targets,
not a set) as provided by the ~map_union~ function. We chose to
consider it a user error if targets with conflicting artifacts are
specified. This is provided by the ~disjoint_map_union~ function, which
also allows specifying an error message to be shown to the user. Here,
conflict means that values for the same map position are defined
in a different way.

The actual patching is done by an ~ACTION~. We have the script
already; to make things easy, we stage the input to a fixed place
and also expect a fixed output location. Then the actual command
is a simple shell script. The only thing we have to keep in mind
is that we want useful output precisely if the action fails. Also
note that, while we define our actions sequentially, they will
be executed in parallel, as none of them depends on the output of
another one of them.

#+BEGIN_SRC
{ "ed patch":
  { "string_fields": ["script"]
  , "target_fields": ["srcs"]
  , "expression":
    { "type": "let*"
    , "bindings":
      [ [ "script content"
        , { "type": "join"
          , "separator": "\n"
          , "$1":
            { "type": "++"
            , "$1":
              [["H"], {"type": "FIELD", "name": "script"}, ["w", "q", ""]]
            }
          }
        ]
      , [ "script"
        , { "type": "singleton_map"
          , "key": "script.ed"
          , "value":
            {"type": "BLOB", "data": {"type": "var", "name": "script content"}}
          }
        ]
      , [ "patched files per target"
        , { "type": "foreach"
          , "var": "src"
          , "range": {"type": "FIELD", "name": "srcs"}
          , "body":
            { "type": "foreach_map"
            , "var_key": "file_name"
            , "var_val": "file"
            , "range":
              {"type": "DEP_ARTIFACTS", "dep": {"type": "var", "name": "src"}}
            , "body":
              { "type": "let*"
              , "bindings":
                [ [ "action output"
                  , { "type": "ACTION"
                    , "inputs":
                      { "type": "map_union"
                      , "$1":
                        [ {"type": "var", "name": "script"}
                        , { "type": "singleton_map"
                          , "key": "in"
                          , "value": {"type": "var", "name": "file"}
                          }
                        ]
                      }
                    , "cmd":
                      [ "/bin/sh"
                      , "-c"
                      , "cp in out && chmod 644 out && /bin/ed out < script.ed > log 2>&1 || (cat log && exit 1)"
                      ]
                    , "outs": ["out"]
                    }
                  ]
                ]
              , "body":
                { "type": "singleton_map"
                , "key": {"type": "var", "name": "file_name"}
                , "value":
                  { "type": "lookup"
                  , "map": {"type": "var", "name": "action output"}
                  , "key": "out"
                  }
                }
              }
            }
          }
        ]
      , [ "artifacts"
        , { "type": "disjoint_map_union"
          , "msg": "srcs artifacts must not overlap"
          , "$1":
            { "type": "++"
            , "$1": {"type": "var", "name": "patched files per target"}
            }
          }
        ]
      ]
    , "body":
      {"type": "RESULT", "artifacts": {"type": "var", "name": "artifacts"}}
    }
  }
}
#+END_SRC

A typical invocation of that rule would be a target file like the following.
#+BEGIN_SRC
{ "input.txt":
  { "type": "ed patch"
  , "script": ["%g/world/s//user/g", "%g/World/s//USER/g"]
  , "srcs": [["FILE", null, "input.txt"]]
  }
}
#+END_SRC
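
If "latest wins" semantics were preferred over treating conflicts
as an error, the ~"artifacts"~ binding of the rule could be sketched
with ~map_union~ instead. The fragment below is only that one entry
of the ~"bindings"~ list, not a complete rule.

#+BEGIN_SRC
[ "artifacts"
, { "type": "map_union"
  , "$1":
    { "type": "++"
    , "$1": {"type": "var", "name": "patched files per target"}
    }
  }
]
#+END_SRC

With this variant, a file provided by several ~"srcs"~ targets would
silently be taken from the last one in the list.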

*** Implicit dependencies and config transitions

Say, instead of patching a file, we want to generate source files
from some high-level description using our actively developed code
generator. Then we have to take some additional considerations
into account.
- First of all, every target defined by this rule not only depends
  on the targets the user specifies. Additionally, our code
  generator is also an implicit dependency. And as it is under
  active development, we certainly do not want it to be taken from
  the ambient build environment (as we did in the previous example
  with ~ed~ which, however, is a pretty stable tool). So we use an
  ~implicit~ target for this.
- Next, we notice that our code generator is used during the
  build. In particular, we want that tool (written in some compiled
  language) to be built for the platform we run our actions on, not
  the target platform we build our final binaries for. Therefore,
  we have to use a configuration transition.
- As our defining expression also needs the configuration transition
  to access the artifacts of that implicit target, we better define
  it as a reusable expression. Other rules in our rule collection
  might also have the same task; so ~["transitions", "for host"]~
  might be a good place to define it. In fact, it can look like
  the expression with that name in our own code base.

So, the overall organisation of our rule might be as follows.

#+BEGIN_SRC
{ "generated code":
  { "target_fields": ["srcs"]
  , "implicit": {"generator": [["generators", "foogen"]]}
  , "config_vars": ["HOST_ARCH"]
  , "imports": {"for host": ["transitions", "for host"]}
  , "config_transitions":
    {"generator": [{"type": "CALL_EXPRESSION", "name": "for host"}]}
  , "expression": ...
  }
}
#+END_SRC
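
The defining expression is left open above. As a hedged sketch of
just one step it would likely contain, the following hypothetical
~let*~ binding (the binding name is made up) collects the artifacts
of the implicit ~"generator"~ target, requesting them in the same
transition that was used for the field.

#+BEGIN_SRC
[ "generator artifacts"
, { "type": "map_union"
  , "$1":
    { "type": "foreach"
    , "var": "gen"
    , "range": {"type": "FIELD", "name": "generator"}
    , "body":
      { "type": "DEP_ARTIFACTS"
      , "dep": {"type": "var", "name": "gen"}
      , "transition": {"type": "CALL_EXPRESSION", "name": "for host"}
      }
    }
  }
]
#+END_SRC

Omitting the ~"transition"~ argument here would request the
generator in the empty transition, which is not the one specified
in ~"config_transitions"~ and hence would be an error.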