author Rich Felker <dalias@aerifal.cx> 2018-10-12 22:32:41 -0400
committer Rich Felker <dalias@aerifal.cx> 2018-10-12 23:31:27 -0400
commit d44b07fc904f6a0d31ba025f3e9f423c1e47547e
tree 42805b03f474a9fbfa3f4f3d2d4241d6435fdef8 /src/search
parent 37cd1676395e5ebdae3f372bf59d4fef54be9818
rewrite core of the glob implementation for correctness & optimization
this code has been long overdue for a rewrite, but the immediate cause that necessitated it was total failure to see past unreadable path components. for example, A/B/* would fail to match anything, even though it should succeed, when both A and A/B are searchable but only A/B is readable. this problem was both caught in conformance testing and impacted users.

the old glob implementation insisted on searching the listing of each path component for a match, even if the next component was a literal. it also used considerable stack space, up to the length of the pattern, per recursion level, and relied on an artificial bound of the pattern length by PATH_MAX, which was incorrect because a pattern can be much longer than PATH_MAX while having shorter matches (for example, with necessarily long bracket expressions, or with redundancy).

in the new implementation, each level of recursion starts by consuming the maximal literal (possibly escaped-literal) path prefix remaining in the pattern, and opens a directory to read only when there is a proper glob pattern in the next path component. it then recurses into each matching entry. the top-level glob function provides automatic storage (up to PATH_MAX) for construction of candidate/result strings, and allocates a duplicate of the pattern that can be modified in-place with temporary null-termination to pass to fnmatch. this allocation is not a big deal since glob already has to perform allocation, and has to link free to clean up if it experiences an allocation failure or other error after some results have already been allocated.

care is taken to use the d_type field from iterated dirents when possible; stat is called only when there are literal path components past the last proper-glob component, or when needed to disambiguate symlinks for the purpose of GLOB_MARK.

one peculiarity with the new implementation is the manner in which the error handling callback will be called. if attempting to match */B/C/D where a directory A exists that is inaccessible, the error reported will be a stat error for A/B/C/D rather than (previous and wrong implementation) an opendir error for A, or (likely on other implementations) a stat error for A/B. such behavior does not seem to be non-conforming, but if it turns out to be undesirable for any reason, backtracking could be done on error to report the first component producing it.

also, redundant slashes are no longer normalized, but preserved as they appear in the pattern; this is probably more correct, and falls out naturally from the algorithm used. since trailing slashes (which force all matches to be directories) are preserved as well, the behavior of GLOB_MARK has been adjusted not to append an additional slash to results that already end in slash.
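
as an illustration of the literal-prefix step described above, here is a minimal sketch, not the code from this commit. the helper name literal_prefix and its interface are hypothetical, and it simplifies by treating every unescaped '[' as the start of a bracket expression, even when no closing ']' follows. it copies whole leading path components that contain no unescaped metacharacter into an output buffer, unescaping as it goes, and stops before the first component that would need a directory scan.

```c
#include <stddef.h>

/* hypothetical helper, for illustration only (not musl's actual code).
 * copies the longest run of leading path components of pat that contain
 * no unescaped '*', '?' or '[' into out, removing backslash escapes, and
 * returns the number of bytes of pat consumed. out must have room for
 * strlen(pat)+1 bytes. */
static size_t literal_prefix(const char *pat, char *out)
{
    size_t i = 0, j = 0;   /* read index in pat, write index in out */
    size_t ci = 0, cj = 0; /* positions committed at the last boundary */

    for (;;) {
        char c = pat[i];
        if (c == '*' || c == '?' || c == '[')
            break;         /* current component needs a directory scan */
        if (c == '\0') {   /* rest of the pattern is entirely literal */
            ci = i;
            cj = j;
            break;
        }
        if (c == '\\' && pat[i + 1] != '\0')
            c = pat[++i];  /* escaped character is taken literally */
        out[j++] = c;
        i++;
        if (c == '/') {    /* component boundary: commit the prefix */
            ci = i;
            cj = j;
        }
    }
    out[cj] = '\0';        /* discard any partial, uncommitted component */
    return ci;
}
```

under these assumptions, the pattern A/B/* yields the prefix A/B/ with 4 bytes consumed, so only A/B would need to be opened and read, while */B/C/D yields an empty prefix. redundant slashes pass through unchanged: //a/b* yields //a/.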