# Support for upwards symbolic links

## Background

### Existing symlink support

Our buildsystem already supports non-upwards relative symbolic links
as first-class objects. The reason for choosing this restriction was
that such links can be placed anywhere inside an action directory
without that directory becoming dependent on the ambient system,
regardless of whether symbolic links are followed or inspected via
readlink(2).

Additionally, `just-mr` supports resolving symlinks that are
entirely within a logical root. This makes it possible to support
roots where symlinks are used to keep a file under version control
only once; as roots are internally represented in a content-addressable
way, the duplication implicit in resolving comes at no extra cost.

### Level of enforcement

The restriction to allow only non-upwards relative symlinks is
enforced
- in the output of actions, including output directories,
- in explicit tree references on `file` roots.

However, it is not enforced when using an explicit tree reference on
a `git` root as input. The reason is that we want to benefit from
the fact that trees are already hashed in git repositories, so that
we can get values (and harvest cache hits) without ever looking at
that subdirectory.

### Dangling relative upwards symbolic links

Although rare, dangling upwards relative symlinks do exist in some
projects, sometimes for legitimate reasons. Being dangling, they
cannot be resolved in the root definition. Where such projects occur
as (transitive) dependencies, some form of extended symlink handling
seems desirable.

## Proposed changes

### Extend definition language to support arbitrary relative symlinks

We extend the description language to allow handling relative
symlinks, including upward ones, in a safe way.

#### Extensional symlink level of a computed artifact

We associate a symlink level with blobs and trees that do not
transitively contain absolute symlinks in the following way.
- Files and executable files have symlink level 0.
- The symlink level of a relative symlink is the number of (necessarily
  leading) `../` of the syntactical canonical form, when read as
  a relative path.
- The symlink level of a tree is the maximum of 0 and the symlink levels
  of all (immediate) subobjects reduced by one.
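The three rules can be sketched in Python as follows. The `File`, `Symlink` and `Tree` classes are a hypothetical in-memory model of blobs and trees, not the buildsystem's actual data structures, and `posixpath.normpath` is assumed to serve as the syntactical canonicalization (it collapses `a/..` purely on the string, without consulting any file system).

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class File:
    """A regular or executable file; its content does not matter here."""
    executable: bool = False


@dataclass
class Symlink:
    """A symbolic link, identified by its target string."""
    target: str


@dataclass
class Tree:
    """A directory mapping entry names to sub-objects."""
    entries: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[File, Symlink, Tree]


def symlink_target_level(target: str) -> int:
    """Number of leading `..` in the syntactical canonical form of target."""
    if posixpath.isabs(target):
        raise ValueError(f"absolute symlink {target!r} has no symlink level")
    canonical = posixpath.normpath(target)  # e.g. "a/../../b" -> "../b"
    level = 0
    for part in canonical.split("/"):
        if part != "..":
            break
        level += 1
    return level


def symlink_level(node: Node) -> int:
    """Extensional symlink level of a file, symlink or tree."""
    if isinstance(node, File):
        return 0
    if isinstance(node, Symlink):
        return symlink_target_level(node.target)
    # Tree: maximum of 0 and the levels of the immediate children minus one
    return max([0] + [symlink_level(child) - 1 for child in node.entries.values()])
```

For instance, a tree whose top level contains a symlink to `../../x` gets level 1, so it has to be placed under at least one directory for the link to stay inside the surrounding directory.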

#### The declared symlink level

In the description language a declared symlink level is associated
with each artifact; the invariant is that
- the declared level is always at least as big as the extensional level
  of the defined object, and
- the action directory of each action has declared level 0, ensuring
  that action directories do not refer to external sources.

In order to do so, we extend our target names for explicit symlink
and tree references to optionally declare a symlink level (defaulting
to 0). This is done by allowing an additional dict as the last entry
of the target name, which may contain the key `"symlink level"`; the
value for that key is the declared level and has to be a non-negative
integer. For example,
- `["SYMLINK", null, "foo", {"symlink level": 3}]` is an explicit
  symlink reference to a symlink at `foo` in the current module with
  declared symlink level 3, and
- `["TREE", null, "bar", {"symlink level": 4}]` is an explicit tree
  reference to a tree located at `bar` in the current module with
  declared symlink level 4.

The reason for choosing a dict rather than a positional argument is
to be prepared for additional declared properties in the future,
should they become necessary.
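A minimal sketch of reading the declared level from such a target name, with the default of 0 and the non-negativity check. The helper name `declared_symlink_level` is hypothetical; it only inspects a trailing dict and ignores any other keys, leaving room for future properties.

```python
import json
from typing import Any, List


def declared_symlink_level(target_name: List[Any]) -> int:
    """Declared symlink level of an explicit SYMLINK or TREE reference."""
    has_props = bool(target_name) and isinstance(target_name[-1], dict)
    properties = target_name[-1] if has_props else {}
    level = properties.get("symlink level", 0)  # absent key means level 0
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(level, bool) or not isinstance(level, int) or level < 0:
        raise ValueError(f"symlink level must be a non-negative integer, got {level!r}")
    return level


example = json.loads('["SYMLINK", null, "foo", {"symlink level": 3}]')
print(declared_symlink_level(example))
```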

We extend the definition of the `"ACTION"` function available inside
rule definitions to declare a symlink level for output files and
output directories by allowing an optional map `"symlink level"`
that has the names of the declared outputs (files and directories)
as keys and non-negative integers as values. The declared symlink
level of an action artifact is the value for its output path in that
map; if the path is not found in that map, the declared value is 0.
In this way, the extension is backwards compatible. Moreover, we
require that in the input stage of an action, every artifact of
declared symlink level `n` is staged under at least `n` directories.
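Both rules can be sketched as plain functions over path-to-level maps. The function names are hypothetical, and counting "directories a path is staged under" as the number of path components minus one (so `a/b/f` lies under two directories) is an assumed reading.

```python
from typing import Dict, List


def declared_output_levels(outputs: List[str], symlink_level: Dict[str, int]) -> Dict[str, int]:
    """Declared symlink level of every output; outputs missing from the map get 0."""
    unknown = set(symlink_level) - set(outputs)
    if unknown:
        raise ValueError(f"'symlink level' names undeclared outputs: {sorted(unknown)}")
    for name, level in symlink_level.items():
        if isinstance(level, bool) or not isinstance(level, int) or level < 0:
            raise ValueError(f"symlink level of {name!r} must be a non-negative integer")
    return {out: symlink_level.get(out, 0) for out in outputs}


def staging_depth(path: str) -> int:
    """Number of directories a staged path lies under, e.g. "a/b/f" -> 2."""
    return len([part for part in path.split("/") if part]) - 1


def check_input_stage(stage: Dict[str, int]) -> None:
    """Reject a stage (path -> declared level) that stages an artifact too shallowly."""
    for path, level in stage.items():
        depth = staging_depth(path)
        if depth < level:
            raise ValueError(
                f"{path!r} has declared symlink level {level} "
                f"but is staged under only {depth} directories")
```

Because outputs missing from the map default to 0, an action description without the new field behaves exactly as before.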

We also extend the `"SYMLINK"` function available inside rule
definitions to have an optional field `"symlink level"` specifying
the declared symlink level. The value, if specified, has to be a
non-negative integer and the default is 0.

The declared symlink level of an artifact defined by the `"TREE"`
function is the maximum of 0 and, over all artifacts of the defining
stage, the symlink level of the artifact minus the number of
directories it is staged under.
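A one-function sketch of this rule. It assumes, as a reading of the text, that the number of directories an artifact is staged under is the number of directory components in its stage path (so `a/b/l` is staged under 2). The name `tree_declared_level` is hypothetical.

```python
from typing import Dict


def tree_declared_level(stage: Dict[str, int]) -> int:
    """Declared symlink level of a TREE built from stage (path -> symlink level)."""
    return max([0] + [level - (len(path.split("/")) - 1)
                      for path, level in stage.items()])
```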

Artifacts defined by the `"BLOB"` function available inside rule
definitions have symlink level 0.

### Enforce correctness of symlink level

All computed artifacts (i.e., those that are `KNOWN` artifacts once
computed) carry their actual symlink level as part of their data
structure; when serializing the artifact description, the symlink
level is reported if (and only if) it is not 0.
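The "only if not 0" rule can be sketched as below. The layout `{"type": "KNOWN", "data": {...}}` and the key names inside `data` (including `"symlink level"`) are assumptions for illustration, not a confirmed serialization format.

```python
from typing import Any, Dict


def known_artifact_description(object_id: str, size: int, file_type: str,
                               symlink_level: int) -> Dict[str, Any]:
    """Serialize a KNOWN artifact; the symlink level appears only when non-zero."""
    data: Dict[str, Any] = {"id": object_id, "size": size, "file_type": file_type}
    if symlink_level != 0:
        data["symlink level"] = symlink_level  # hypothetical key name
    return {"type": "KNOWN", "data": data}
```

Omitting the value 0 keeps the description of every artifact without upwards symlinks exactly as it would be without this extension.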

For inputs (explicit symlink and tree references), we enforce that
the actual symlink level is not larger than the declared one. To
enforce this check while keeping the benefits of explicit tree
references in `git` roots, we keep a cache (in the form of a simple
mapping on disk) from tree identifiers to their symlink level. This
map will be garbage collected together with the repository roots (as
described in the "Garbage collection for Repository Roots" design).
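A minimal sketch of such a cache, assuming a single JSON file as the on-disk mapping and a caller-supplied `compute` function that walks a tree once to determine its level. `SymlinkLevelCache` and `check_tree_reference` are hypothetical names; the file is rewritten non-atomically for simplicity.

```python
import json
import os
from typing import Callable, Dict


class SymlinkLevelCache:
    """Hypothetical on-disk map from git tree identifiers to their symlink level."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.levels: Dict[str, int] = {}
        if os.path.exists(path):
            with open(path) as f:
                self.levels = json.load(f)

    def level(self, tree_id: str, compute: Callable[[str], int]) -> int:
        if tree_id not in self.levels:
            # Only an unknown tree is ever inspected; the result is persisted.
            self.levels[tree_id] = compute(tree_id)
            with open(self.path, "w") as f:
                json.dump(self.levels, f)
        return self.levels[tree_id]


def check_tree_reference(cache: SymlinkLevelCache, tree_id: str, declared: int,
                         compute: Callable[[str], int]) -> None:
    """Reject an explicit tree reference whose actual level exceeds the declared one."""
    actual = cache.level(tree_id, compute)
    if actual > declared:
        raise ValueError(f"tree {tree_id} has symlink level {actual}, declared {declared}")
```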

We also enforce the correctness of the symlink level of action
outputs: for `"outs"`, if an output is a symlink, its level is checked
and the action is rejected if the actual level is larger than the
declared one. For `"out_dirs"`, the directory is rejected if its
actual symlink level is larger than the declared one. It then follows
that the input stage of every action has actual symlink level 0.
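The output checks can be sketched over precomputed actual levels. The function name and the input shapes (`None` marking a non-symlink file output) are assumptions made for this illustration.

```python
from typing import Dict, Optional


def check_action_outputs(outs: Dict[str, Optional[int]],
                         out_dirs: Dict[str, int],
                         declared: Dict[str, int]) -> None:
    """Reject an action whose outputs exceed their declared symlink level.

    outs:     output file path -> actual level if it is a symlink, else None
    out_dirs: output directory path -> actual symlink level of the tree
    declared: the action's optional "symlink level" map (missing entries mean 0)
    """
    for path, actual in outs.items():
        # Regular and executable files always have level 0; only symlinks are checked.
        if actual is not None and actual > declared.get(path, 0):
            raise ValueError(f"output symlink {path!r} has level {actual}, "
                             f"declared {declared.get(path, 0)}")
    for path, actual in out_dirs.items():
        if actual > declared.get(path, 0):
            raise ValueError(f"output directory {path!r} has level {actual}, "
                             f"declared {declared.get(path, 0)}")
```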

### Extensional projection reduces symlink level

In the evaluation of an export target, the intensional description
with the declared symlink level is replaced by the extensional
description using the extensional symlink level. By our construction,
the latter is less than or equal to the former. Hence, this projection
can only allow _more_ builds, never fewer, which is in line with the
properties of export targets.
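The projection can be sketched as a map over (declared, extensional) pairs; `project_to_extensional` is a hypothetical name. Since every projected level is at most the declared one, any stage that satisfies the "staged under at least `n` directories" requirement with the declared levels also satisfies it with the projected ones.

```python
from typing import Dict, Tuple


def project_to_extensional(levels: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Turn name -> (declared, extensional) levels into name -> extensional level."""
    projected = {}
    for name, (declared, extensional) in levels.items():
        # Guaranteed by the enforcement rules; checked here only for illustration.
        if extensional > declared:
            raise ValueError(f"{name!r}: extensional level exceeds declared level")
        projected[name] = extensional
    return projected
```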