path: root/src/buildtool/file_system
Age | Commit message | Author
2023-07-13 | just-mr: Fix handling of .gitignore files in git repositories | Paul Cristian Sarbu
The command 'git add .' does not include paths found in .gitignore files in the directory tree where the command is issued. This is not the desired behaviour, as we expect for a tree with a given commit id to contain all of the entries, irrespective of their meaning to Git. This commit addresses the issue as described. For the just-mr.py script we modified the staging command to 'git add -f .'. For the compiled just-mr, simply adding the force flag to 'git_index_add_all' did not work as intended for files found in ignored subdirectories. This is a known libgit2 issue which has been fixed in v1.6.3. Until we can upgrade our libgit2 version, a workaround was implemented: we recursively read the directory entries ourselves and add each of them iteratively using 'git_index_add_bypath', making sure to ignore the root '.git' subtree (which cannot be staged). At the moment the handling of Git submodules remains an open issue, as Git does not allow '.git' subtrees to be forcefully added to the index, and thus such directory entries will currently not be considered as part of a git tree. This however is consistent behavior between Git and libgit2. (cherry picked from f234434a6fa2118b10765cff2f75bbc3196fec39)
2023-07-13 | FileSystemManager: Add recursive directory entries reader... | Paul Cristian Sarbu
...allowing the skipping of certain subtrees if needed. This is useful, e.g., in simulating what a 'git add' call would do, which ignores all '.git' subdirectories. (cherry picked from 14715e3da452dd73363bc86f92cd9e5b9fdb3a7b)
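A minimal sketch of such a reader, written in Python for brevity rather than in the project's C++. The names `read_directory_recursive`, `skip_root_git` and `skip_all_git` are hypothetical and not taken from FileSystemManager; the sketch only illustrates recursing over directory entries while letting a caller-supplied predicate prune whole subtrees.

```python
import os
from typing import Callable, Iterator


def read_directory_recursive(
    root: str,
    skip_subtree: Callable[[str], bool] = lambda rel: False,
) -> Iterator[str]:
    """Yield root-relative paths of all non-directory entries below root.

    Directories for which skip_subtree(relative_path) is true are not entered.
    Symlinks are reported as entries and never followed.
    """

    def walk(rel_dir: str) -> Iterator[str]:
        abs_dir = os.path.join(root, rel_dir) if rel_dir else root
        # Sort by name so that the traversal order is deterministic.
        for entry in sorted(os.scandir(abs_dir), key=lambda e: e.name):
            rel = os.path.join(rel_dir, entry.name) if rel_dir else entry.name
            if entry.is_dir(follow_symlinks=False):
                if not skip_subtree(rel):
                    yield from walk(rel)
            else:
                yield rel

    yield from walk("")


def skip_root_git(rel: str) -> bool:
    """Hypothetical predicate: prune only the top-level '.git' directory."""
    return rel == ".git"


def skip_all_git(rel: str) -> bool:
    """Hypothetical predicate: prune every directory named '.git'."""
    return os.path.basename(rel) == ".git"
```

Passing `skip_all_git` approximates the set of paths a plain 'git add' would consider, while `skip_root_git` prunes only the repository's own metadata directory.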
2023-07-13 | git cas: only compute absolute path if not absolute already | Klaus Aehlig
... and in this way, continue to work correctly in the absence of a current working directory. (cherry picked from 06bb4f11a21aae5713d75b496145f6621302ae3a)
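The idea can be sketched as follows, in Python rather than the project's C++. The function name `to_absolute` is hypothetical. The point is that resolving a relative path needs the current working directory, so that step is skipped when the input is already absolute.

```python
import os


def to_absolute(path: str) -> str:
    """Return an absolute version of path, consulting the cwd only if needed."""
    if os.path.isabs(path):
        # Already absolute: return it untouched, so a missing or deleted
        # current working directory cannot cause a failure here.
        return path
    # Relative path: os.path.abspath() resolves it against os.getcwd().
    return os.path.abspath(path)
```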
2023-06-09 | file_system: Avoid malloc in 'fdless' copy/write | Oliver Reiche
... to remove the risk of deadlocks on certain combinations of C++ standard library and libc when performing the copy/write in a child process. For 'fdless' copy/write, a child process is used to prevent the parent from getting polluted with open writable file descriptors (which might get inherited by other children that keep them open and can cause EBUSY errors). (cherry picked from 5142b99f94dcbf47274a5f32a1780cf865621401)
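A rough sketch of the 'fdless' idea on a Unix system, in Python rather than the project's C++. The function name `write_file_fdless` is hypothetical, and Python cannot demonstrate the malloc avoidance this commit is about; the sketch only shows why a child process is used: the writable file descriptor exists solely in the short-lived child and never in the parent.

```python
import os


def write_file_fdless(path: str, data: bytes) -> bool:
    """Write data to path from a forked child; return True on success."""
    pid = os.fork()
    if pid == 0:
        # Child: the only process that ever holds the writable descriptor.
        status = 1
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            try:
                view = memoryview(data)
                while view:
                    written = os.write(fd, view)
                    view = view[written:]
            finally:
                os.close(fd)
            status = 0
        finally:
            # Leave immediately, without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: wait for the child and report whether it exited cleanly.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```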
2023-05-15 | memcheck: fix race in libgit2... | Paul Cristian Sarbu
...caused by incorrectly setting and resetting the library internal state and the misuse of pthreads in libgit2. Normally, git_libgit2_init and git_libgit2_shutdown should span the life of a worker thread in order to be safely used. However, due to an incorrect implementation of libgit2's threadstate with pthreads, on unix systems there is a race condition. Until the use of pthread_key_t is corrected in libgit2, we need to apply a workaround by always ensuring that the main thread is the first thread reaching the GitContext constructor.
2023-05-05 | GitTree: Check optional before accessing it | Oliver Reiche
... and drop unnecessary IsTree() check.
2023-04-26 | imports: Switch to Microsoft GSL implementation | Oliver Reiche
... with two minor code base changes compared to previous use of gsl-lite:
- dag.hpp: ActionNode::Ptr and ArtifactNode::Ptr are not wrapped in gsl::not_null<> anymore, due to lack of support for wrapping std::unique_ptr<>. More specifically, the move constructor is missing, rendering it impossible to use std::vector<>::emplace_back().
- utils/cpp/gsl.hpp: New header file added to implement the macros ExpectsAudit() and EnsureAudit(), asserts running only in debug builds, which were available in gsl-lite but are missing in MS GSL.
2023-03-31 | git tree: degrade log level for missing entry in a git tree to debug | Klaus Aehlig
... as this is only an internal functionality, and the caller will take care of a proper error message if the absence of that entry is not expected.
2023-03-30 | GitRepo: Guard fake repository odb wrapping | Paul Cristian Sarbu
In the current libgit2 implementation, a fake repository wrapped around an existing odb is being registered as owner the same way as a normal repository object. Therefore, one has to guard both the creation and destruction of the fake repository against all other git operations that might access the internal cache during this transfer of ownership.
2023-03-23 | GitRepo: Make tag creation operation more robust | Paul Cristian Sarbu
Use a similar logic as for repository initialisation: first check if tag has not already been created in another process, and only then try creation; make more tries with more wait in between; only retry if failure was due to internal locking.
2023-03-23 | GitRepo: Add proper error message for keep tag operation | Paul Cristian Sarbu
2023-03-23 | GitRepo: Make repository initialisation more robust | Paul Cristian Sarbu
As the initialisation of Git repositories is something that only takes place once, we should check early and cheaply whether the repository is already there before trying to initialize it. If we do need to initialize a repo, we can afford more attempts and longer wait times between tries when the failure to initialize is due to the internal Git locking mechanism.
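The check-then-retry logic described above can be sketched generically as follows, in Python rather than the project's C++. All names here (`LockedError`, `ensure_created`, the retry count and the wait times) are hypothetical and chosen only for illustration; they are not GitRepo's actual interface or tuning.

```python
import time
from typing import Callable


class LockedError(Exception):
    """Hypothetical error standing in for a failure due to internal locking."""


def ensure_created(
    exists: Callable[[], bool],
    create: Callable[[], None],
    max_tries: int = 10,
    initial_wait: float = 0.01,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Create something at most once, tolerating concurrent creators."""
    # Cheap early check: another process may already have done the work.
    if exists():
        return True
    wait = initial_wait
    for attempt in range(max_tries):
        try:
            create()
            return True
        except LockedError:
            # Only lock contention is retried; other errors propagate.
            if exists():
                return True
            if attempt + 1 < max_tries:
                sleep(wait)
                wait *= 2  # wait longer between later tries
    return False
```

Any exception other than `LockedError` is deliberately not caught, mirroring the rule that a retry only makes sense when the failure was caused by locking.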
2023-03-23 | GitRepo: Add proper error message for repository initialisation | Paul Cristian Sarbu
2023-03-23 | GitRepo: Make repository path usage explicit | Paul Cristian Sarbu
Opening a repo should not check parent directories, only try to open at given path.
2023-03-23 | targets: Fix deps structure | Paul Cristian Sarbu
2023-03-22 | just-mr: Shell out to system Git for fetches over SSH... | Paul Cristian Sarbu
...due to limited SSH support in libgit2. In order to allow the fetches to still be parallel, we execute:
    git fetch --no-auto-gc --no-write-fetch-head <repo> [<branch>]
This only fetches the packs without updating any refs, at the slight cost of sometimes fetching some redundant information, which for our purposes is practically a non-issue. (If really needed, a 'git gc' call can be done eventually to try to compact the fetched packs, although a saving in disk space is not actually guaranteed.)
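For illustration, the command line above could be assembled as an argument vector like this. The sketch is in Python and the helper name `git_fetch_command` is hypothetical; only the `git fetch` flags are taken from the commit message.

```python
from typing import List, Optional


def git_fetch_command(repo: str, branch: Optional[str] = None) -> List[str]:
    """Build the argv for a fetch that neither runs gc nor writes FETCH_HEAD."""
    cmd = ["git", "fetch", "--no-auto-gc", "--no-write-fetch-head", repo]
    if branch is not None:
        # The branch is optional, matching '[<branch>]' in the message above.
        cmd.append(branch)
    return cmd
```

The resulting list could then be handed to something like `subprocess.run(cmd, cwd=local_repo_dir, check=True)`, where `local_repo_dir` is again a placeholder for the directory of the receiving repository.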
2023-03-15 | add missing ldflags -pthread and use -pthread consistently | Alberto Sartori
2023-03-14 | GitRepo: Fix memory leak in keep tag operation | Paul Cristian Sarbu
2023-03-13 | Storage: Reworked storage and garbage collection | Oliver Reiche
The improved GC implementation uses refactored storage classes instead of directly accessing "unknown" file paths. The required storage class refactoring is quite substantial and outlined in the following paragraphs.

The module `buildtool/file_system` was extended by:
- `ObjectCAS`: a plain CAS implementation for reading/writing blobs and computing digests for a given `ObjectType`. Depending on that type, files written to the file system may have different properties (e.g., the x-bit set) or the digest may be computed differently (e.g., tree digests in non-compatible mode).

A new module `buildtool/storage` was introduced containing:
- `LocalCAS`: provides a common interface for the "logical CAS", which internally combines three `ObjectCAS`s, one for each `ObjectType` (file, executable, tree).
- `LocalAC`: implements the action cache, which needs the `LocalCAS` for storing cache values.
- `TargetCache`: implements the high-level target cache, which also needs the `LocalCAS` for storing cache values.
- `LocalStorage`: combines the storage classes `LocalCAS`, `LocalAC`, and `TargetCache`. Those are initialized with settings from `StorageConfig`, such as the build root base path or number of generations for the garbage collector. `LocalStorage` is templated with a Boolean parameter `kDoGlobalUplink`, which indicates that, on every read/write access, the garbage collector should be used for uplinking across all generations (global).
- `GarbageCollector`: responsible for garbage collection and the global uplinking across all generations. To do so, it employs instances of `LocalStorage` with `kDoGlobalUplink` set to false, in order to avoid endless recursion. The actual (local) uplinking within two single generations is performed by the corresponding storage class (e.g., `TargetCache` implements uplinking of target cache entries between two target cache generations etc.).

Thereby, the actual knowledge how data should be uplinked is implemented by the instance that is responsible for creating the data in the first place.
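A toy, in-memory sketch of generation-based storage with uplinking, in Python rather than the project's C++. It rests on assumptions not spelled out in the message above: that an entry found in an older generation is copied to the youngest one on access, and that garbage collection drops the oldest generation and opens a fresh youngest one. The class `GenerationalStore` and its methods are hypothetical and do not mirror `LocalStorage` or `GarbageCollector`.

```python
from typing import Dict, List, Optional


class GenerationalStore:
    """Key/value store split into generations; index 0 is the youngest."""

    def __init__(self, num_generations: int) -> None:
        if num_generations < 1:
            raise ValueError("need at least one generation")
        self.generations: List[Dict[str, bytes]] = [
            {} for _ in range(num_generations)
        ]

    def write(self, key: str, value: bytes) -> None:
        # New data always lands in the youngest generation.
        self.generations[0][key] = value

    def read(self, key: str) -> Optional[bytes]:
        for index, generation in enumerate(self.generations):
            if key in generation:
                if index > 0:
                    # Uplink: keep recently used data alive by promoting it.
                    self.generations[0][key] = generation[key]
                return generation[key]
        return None

    def collect_garbage(self) -> None:
        # Rotate: discard the oldest generation, start an empty youngest one.
        self.generations.pop()
        self.generations.insert(0, {})
```

With two generations, an entry survives a collection only if it was read or written since the previous one; anything untouched for two consecutive collections disappears.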
2023-03-08 | GitRepo: Add method to check existence of a Git tree | Paul Cristian Sarbu
2023-03-07 | Git: Move 'fake' repository log messages to more appropriate reporting level | Paul Cristian Sarbu
2023-03-06 | GitRepo: Add missing retval check for git oid libgit2 calls | Paul Cristian Sarbu
2023-03-03 | GitRepoRemote: Correctly honor SSL certification settings in fetch and commit update | Paul Cristian Sarbu
Uses the SSL certification utility method to correctly set the certification check options for the remote URL libgit2 calls. Due to the fact that remote operations are done via a temporary repository to allow concurrent work, the correct repository configuration needs to be interrogated. Thankfully, libgit2 provides a thread safe config snapshot object to be used in such scenarios. Also updates the existing GitRepoRemote tests accordingly.
2023-03-03 | GitRepoRemote: Add getter for config snapshot | Paul Cristian Sarbu
2023-02-17 | structure cleanup: move remote operations of GitRepo to other_tools... | Paul Cristian Sarbu
...in order to not include unwanted dependencies in just proper. The new class extends the GitRepo class used for just's Git tree operations and gets used in all of just-mr's async maps.
2023-01-31 | Improve error message if git tree walk failed | Oliver Reiche
2023-01-26 | Fix bootstrapping on less optimizing compilers | Klaus T. Aehlig
While compilers are allowed to drop unused functions in anonymous namespaces, and in this way also the open linker symbols referenced there, they are not obliged to do so. Not optimizing away such unused functions when compiled with -DBOOTSTRAP_BUILD_TOOL causes linking to fail in the initial phase of the bootstrap process where libgit2 is not yet available (nor really needed). Therefore, ensure that those dead functions are absent in the initial bootstrap phase using appropriate preprocessor directives.
Signed-off-by: Klaus T. Aehlig <aehlig@linta.de>
2023-01-24 | FileSystemManager: Do not follow symlinks | Oliver Reiche
... and ensure that cascades of checks are performed with only a single filesystem stat per method.
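A small sketch of the "single stat, no symlink following" pattern, in Python rather than the project's C++. The function name `entry_type` and the returned labels are hypothetical; the point is that one `lstat` result feeds the whole cascade of type checks, and that a symlink is classified as such instead of being resolved.

```python
import os
import stat
from typing import Optional


def entry_type(path: str) -> Optional[str]:
    """Classify path using exactly one lstat() call; None if unsupported."""
    try:
        st = os.lstat(path)  # does not follow symlinks
    except OSError:
        return None
    mode = st.st_mode
    # Every check below reuses the same stat result.
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "executable" if mode & stat.S_IXUSR else "file"
    return None
```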
2023-01-24 | FileSystemManager: Pass ReadDirectory error to caller | Oliver Reiche
2023-01-24 | Just-MR: Remove obsolete GET_BRANCH_REFNAME critical Git operation | Paul Cristian Sarbu
As now all remote Git operations in GitRepo require at most just the branch name, there is no more need to inquire the repository about branch refspecs.
2023-01-24 | GitRepo: Add SSL certificate verification callbacks for remote commit update | Paul Cristian Sarbu
The libgit2 library does not honor the http.sslVerify gitconfig field or the GIT_SSL_NO_VERIFY environment variable, so we have to perform these checks ourselves and supply the correct return value from the certificate_check git_fetch_options callback. The callbacks used for fixing the remote fetch SSL certificate verification are reused here.
2023-01-24 | GitRepo: Add SSL certificate verification callbacks for remote fetch | Paul Cristian Sarbu
The libgit2 library does not honor the http.sslVerify gitconfig field or the GIT_SSL_NO_VERIFY environment variable, so we have to perform these checks ourselves and supply the correct return value from the certificate_check git_fetch_options callback.
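The decision such a callback has to make can be sketched as below, in Python rather than the project's C++. The function name `should_verify_ssl` is hypothetical, and the precedence used here (a set GIT_SSL_NO_VERIFY disables verification regardless of http.sslVerify, and verification is on by default) is an assumption modelled on how the Git command line tool behaves, not something stated in the commit message.

```python
import os
from typing import Mapping, Optional


def should_verify_ssl(
    ssl_verify_config: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether the server certificate must be verified.

    ssl_verify_config is the parsed http.sslVerify value, or None if unset.
    """
    # Assumption: the mere presence of the variable disables verification.
    if "GIT_SSL_NO_VERIFY" in env:
        return False
    if ssl_verify_config is not None:
        return ssl_verify_config
    # Neither source says otherwise: verify by default.
    return True
```

A certificate check callback would then accept an otherwise invalid certificate only when this function returns false.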
2023-01-24 | GitRepo: Remove refspec argument in retrieving commit from remote... | Paul Cristian Sarbu
...and use instead the branch name. A valid direct refspec (as those retrieved by a remote_ls call) will always end in the branch name, so checking the last path component ('/'-delimited substring) of a retrieved refspec is enough.
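The check on the last path component can be written in a line or two. The sketch is in Python rather than the project's C++, and the function name `refspec_matches_branch` is hypothetical.

```python
def refspec_matches_branch(refspec: str, branch: str) -> bool:
    """True if the last '/'-delimited component of refspec is the branch name."""
    return refspec.rsplit("/", 1)[-1] == branch
```

For example, `refspec_matches_branch("refs/heads/master", "master")` holds, while the same refspec does not match the branch name `main`.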
2023-01-24 | GitRepo: Add libgit2 proxy options for remote connection for commit update | Paul Cristian Sarbu
2023-01-24 | GitRepo: Set libgit2 option for auto-detection of proxy settings in fetching from remote | Paul Cristian Sarbu
2023-01-24 | GitRepo: Change FetchFromRemote to fetch based on branch name | Paul Cristian Sarbu
This also removes the need to call the GET_BRANCH_REFNAME critical operation.
2023-01-20 | Add local garbage collection | Sascha Roloff
2023-01-20 | Move file_storage.hpp to file_system subdirectory | Sascha Roloff
2023-01-19 | Minor fixes for compiling with clang-14 | Oliver Reiche
2022-12-23 | Just-MR: Remove wrong pass-by-reference when wrapping loggers | Paul Cristian Sarbu
Passing the logger by reference would require the caller to be kept alive. Also, being a shared_ptr, the logger can be passed by value at almost no cost.
2022-12-21 | FS Manager: Add CopyDirectoryImpl method | Paul Cristian Sarbu
2022-12-21 | Git: Wrap libgit2 raw pointers | Paul Cristian Sarbu
2022-12-21 | Test: Add tests for critical git ops | Paul Cristian Sarbu
2022-12-21 | Just-MR: Add non-critical git ops logic to git repo class | Paul Cristian Sarbu
2022-12-21 | Just-MR: Add critical ops logic to git repo class | Paul Cristian Sarbu
2022-12-21 | Git CAS: Move Git tree ops to fake repo wrapper class | Paul Cristian Sarbu
2022-12-20 | Git CAS: Add fake repository wrapper for git odb | Paul Cristian Sarbu
2022-12-20 | Git CAS: Clean includes | Paul Cristian Sarbu
Removed unused file_system_manager dependency
2022-12-20 | Git CAS: Add a Git context class to maintain the libgit2 state | Paul Cristian Sarbu
2022-11-21 | Use the newly-added concept of private-deps | Klaus Aehlig
While there, also add all direct dependencies explicitly; directly using dependencies that are pulled in only indirectly causes problems from a maintainability point of view.