|
commit ae388becb529428ac926da102f1d025b3c3968da accidentally
introduced #define SYSCALL_NO_TLS 1 in mmap.c, which was probably a
stale change left over from unrelated syscall timing measurements.
revert it.
|
|
the definitions of SO_TIMESTAMP* changed on 32-bit archs in commit
38143339646a4ccce8afe298c34467767c899f51 to the new versions that
carry 64-bit versions of the timeval/timespec structures in the control
message payload. socket options, being state attached to the socket
rather than function calls, are not trivial to implement as fallbacks
on ENOSYS, and support for them was initially omitted on the
assumption that the ioctl-based polling alternatives (SIOCGSTAMP*)
could be used instead by applications if setsockopt fails.
unfortunately, it turns out that SO_TIMESTAMP is sufficiently old and
widely supported that a number of applications assume it's available
and treat errors as fatal.
this patch introduces emulation of SO_TIMESTAMP[NS] on pre-time64
kernels by falling back to setting the "_OLD" (time32) versions of the
options if the time64 ones are not recognized, and performing
translation of the SCM_TIMESTAMP[NS] control messages in recvmsg.
since recvmsg does not know whether its caller is legacy time32 code
or time64, it performs translation for any SCM_TIMESTAMP[NS]_OLD
control messages it sees, leaving the original time32 timestamp as-is
(it can't be rewritten in-place anyway, and memmove would be mildly
expensive) and appending the converted time64 control message at the
end of the buffer. legacy time32 callers will see the converted one as
a spurious control message of unknown type; time64 callers running on
pre-time64 kernels will see the original one as a spurious control
message of unknown type. a time64 caller running on a kernel with
native time64 support will only see the time64 version of the control
message.
emulation of SO_TIMESTAMPING is not included at this time since (1)
applications which use it seem to be prepared for the possibility that
it's not present or working, and (2) it can also be used in sendmsg
control messages, in a manner that looks complex to emulate
completely, and costly even when running on a time64-supporting
kernel.
corresponding changes in recvmmsg are not made at this time; they will
be done separately.
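
the conversion step described above can be sketched roughly as follows. this is only an illustration, not the code from the patch: the struct names tv32/tv64 and the function convert_timestamp are hypothetical, and the layouts assume a 32-bit arch where the legacy payload is a pair of 32-bit fields. the fallback in setsockopt and the appending of the new control message to the buffer are not shown.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical layouts, assumed for illustration: the time32 payload a
 * pre-time64 kernel emits on a 32-bit arch, and the time64 payload a
 * time64 caller expects to read. */
struct tv32 { int32_t sec; int32_t usec; };
struct tv64 { int64_t sec; int64_t usec; };

/* convert one legacy timestamp payload into its time64 form. the
 * original payload is only read, never modified, matching the
 * "leave the original as-is and append a converted copy" approach.
 * memcpy is used because control message data is not guaranteed to be
 * suitably aligned for access through a struct pointer. */
static void convert_timestamp(const unsigned char *old_payload,
                              unsigned char *new_payload)
{
	struct tv32 in;
	struct tv64 out;
	memcpy(&in, old_payload, sizeof in);
	out.sec = in.sec;   /* widening conversion preserves the sign */
	out.usec = in.usec;
	memcpy(new_payload, &out, sizeof out);
}
```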
|
|
the LFS64 macro was not self-documenting and barely saved any
characters. simply use weak_alias directly so that it's clear what's
being done, and doesn't depend on a header to provide a strange macro.
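
for readers unfamiliar with the idiom, a minimal sketch of a direct weak_alias use follows. the macro body here is an assumption about how such a helper is typically written with GCC or Clang on ELF targets, and demo_map/demo_map64 are hypothetical names standing in for a function and its 64-suffixed alias.

```c
/* assumed shape of a weak_alias helper: declares `new` as a weak symbol
 * that resolves to the same definition as `old`. requires compiler and
 * object format support for the alias attribute (e.g. GCC/Clang on ELF). */
#define weak_alias(old, new) \
	extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

/* hypothetical function standing in for the real one */
int demo_map(int x)
{
	return x + 1;
}

/* the alias is stated directly at the point of definition, so it is
 * clear which second name is being exported and what it refers to */
weak_alias(demo_map, demo_map64);
```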
|
|
under some conditions, the mmap syscall wrongly fails with EPERM
instead of ENOMEM when memory is exhausted; this is probably the
result of the kernel trying to fit the allocation somewhere that
crosses into the kernel range or below mmap_min_addr. in any case it's
a conformance bug, so work around it. for now, only handle the case of
anonymous mappings with no requested address; in other cases EPERM may
be a legitimate error.
this indirectly fixes the possibility of malloc failing with the wrong
errno value.
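
a sketch of the workaround's condition, expressed as a helper applied to the result of an mmap call. the function name fix_mmap_result is hypothetical, and the extra MAP_FIXED check is an assumption added here so that a fixed mapping at address zero is left alone; the actual patch may differ.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* rewrite a spurious EPERM to ENOMEM, but only for anonymous mappings
 * with no requested address. in every other case EPERM may be a
 * legitimate error and is left untouched. */
static void *fix_mmap_result(void *ret, void *start, int flags)
{
	if (ret == MAP_FAILED && errno == EPERM && !start
	    && (flags & MAP_ANONYMOUS) && !(flags & MAP_FIXED))
		errno = ENOMEM;
	return ret;
}
```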
|
|
normally 32-bit archs use the mmap2 syscall and are limited to an
offset of 2^32 pages. however some 32-bit archs (mainly ILP32-on-64
ones like x32) have 64-bit syscall argument slots and thus can accept
the full range. don't artificially limit them.
|
|
this global lock allows certain unlock-type primitives to exclude
mmap/munmap operations which could change the identity of virtual
addresses while references to them still exist.
the original design mistakenly assumed mmap/munmap would conversely
need to exclude the same operations which exclude mmap/munmap, so the
vmlock was implemented as a sort of 'symmetric recursive rwlock'. this
turned out to be unnecessary.
commit 25d12fc0fc51f1fae0f85b4649a6463eb805aa8f already shortened the
interval during which mmap/munmap held their side of the lock, but
left the inappropriate lock design and some inefficiency.
the new design uses a separate function, __vm_wait, which does not
hold any lock itself and only waits for lock users which were already
present when it was called to release the lock. this is sufficient
because of the way operations that need to be excluded are sequenced:
the "unlock-type" operations using the vmlock need only block
mmap/munmap operations that are precipitated by (and thus sequenced
after) the atomic-unlock they perform while holding the vmlock.
this allows for a spectacular lack of synchronization in the __vm_wait
function itself.
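
the shape of this design can be sketched with C11 atomics as below. this is an assumption-laden illustration rather than the real implementation: the names vm_lock, vm_unlock and vm_wait are hypothetical, and the wait is a simple yield loop where a real implementation would presumably sleep on a futex.

```c
#include <stdatomic.h>
#include <sched.h>

/* number of "unlock-type" operations currently inside their critical
 * window. there is no writer side at all. */
static atomic_int vm_users;

void vm_lock(void)
{
	atomic_fetch_add(&vm_users, 1);
}

void vm_unlock(void)
{
	atomic_fetch_sub(&vm_users, 1);
}

/* called before mmap/munmap. it takes no lock; it only waits until the
 * count has been observed at zero, which implies every user present at
 * the time of the call has since released. users that arrive later do
 * not need to be excluded, per the sequencing argument above. */
void vm_wait(void)
{
	while (atomic_load(&vm_users) > 0)
		sched_yield();
}
```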
|
|
the whole point of this locking is to prevent munmap, or mmap with
MAP_FIXED, from deallocating virtual addresses, or changing the
backing a given virtual address refers to, during certain race windows
involving self-synchronized unmapping or destruction of pthread
synchronization objects. there is no need for exclusion in the other
direction, so it suffices to take the lock momentarily and release it
before making the syscall, rather than holding it across the syscall.
|
|
|
|
internally, other parts of the library assume sizes don't overflow
ssize_t and/or ptrdiff_t, and the way this assumption is made valid is
by preventing creation of such large objects. malloc already does so,
but the check was missing from mmap.
this is also a quality of implementation issue: even if the
implementation internally could handle such objects, applications
could inadvertently invoke undefined behavior by subtracting pointers
within an object. it is very difficult to guard against this in
applications, so a good implementation should simply ensure that it
does not happen.
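
a minimal sketch of such a check follows. the helper name check_map_len is hypothetical, and reporting the failure as ENOMEM is an assumption made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* reject any length that could not be represented as a pointer
 * difference, so that subtracting two pointers into the resulting
 * object can never overflow ptrdiff_t. returns 0 if the length is
 * acceptable, or -1 with errno set otherwise. */
static int check_map_len(size_t len)
{
	if (len >= PTRDIFF_MAX) {
		errno = ENOMEM; /* assumed error code */
		return -1;
	}
	return 0;
}
```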
|
|
the previous logic was assuming the kernel would give EINVAL when
passed an invalid address, but instead with MAP_FIXED it was giving
EPERM, as it considered this an attempt to map over kernel memory.
instead of trying to get the kernel to do the right thing, the new
code just handles the error in userspace.
I have also cleaned up the code to use a single mask to check for
invalid low bits and unsupported high bits, so it's simpler and more
clearly correct. the old code was actually wrong for sizeof(long)
smaller than sizeof(off_t) but not equal to 4; now it should be
correct for all possibilities.
for 64-bit systems, the low-bits test is new and extraneous (the
kernel should catch the error anyway when the mmap2 syscall is not
used), but it's cheap anyway. if this is an issue, the OFF_MASK
definition could be tweaked to omit the low bits when SYS_mmap2 is not
defined.
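
to make the single-mask idea concrete, here is a sketch that computes such a mask from the width of a syscall argument. the 4096-byte unit for mmap2 offsets and the names UNIT, off_mask and offset_ok are assumptions for illustration; the width is passed as a parameter only so that both the 32-bit and 64-bit cases can be examined on one machine.

```c
#include <stdint.h>

/* assumed unit in which mmap2 expresses file offsets */
#define UNIT 4096ULL

/* build one mask covering both kinds of invalid bits:
 * - low bits below UNIT (offset not a multiple of the unit)
 * - high bits that would not fit in one syscall argument after the
 *   offset is divided by the unit
 * with 4-byte arguments the high part covers bits 44 and up; with
 * 8-byte arguments the shift pushes every set bit out, leaving only
 * the low-bits test. */
static uint64_t off_mask(unsigned arg_bytes)
{
	return (-0x2000ULL << (8 * arg_bytes - 1)) | (UNIT - 1);
}

/* an offset is acceptable only if it has no bits in the mask */
static int offset_ok(uint64_t off, unsigned arg_bytes)
{
	return !(off & off_mask(arg_bytes));
}
```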
|
|
this implementation is rather heavy-weight, but it's the first
solution I've found that's actually correct. all waiters actually wait
twice at the barrier so that they can synchronize exit, and they hold
a "vm lock" that prevents changes to virtual memory mappings (and
blocks pthread_barrier_destroy) until all waiters are finished
inspecting the barrier.
thus, it is safe for any thread to destroy and/or unmap the barrier's
memory as soon as pthread_barrier_wait returns, without further
synchronization.
|
|
|
|
|
|
- hide all the legacy xxxxxx32 name cruft in syscall.h so the actual
source files can be clean and uniform across all archs.
- clean up llseek/lseek and mmap2/mmap handling for 32/64 bit systems
- provide an alternate implementation of nice for targets that lack the
  nice syscall
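
as an illustration of the last item, one plausible way to provide nice without a dedicated syscall is to build it from getpriority and setpriority. the sketch below is an assumption about the general approach, not the code from this commit; the name nice_fallback and the NICE_MIN/NICE_MAX limits (the usual Linux range) are chosen for the example.

```c
#include <errno.h>
#include <sys/resource.h>

/* assumed nice value range (typical on Linux) */
#define NICE_MIN (-20)
#define NICE_MAX 19

/* adjust the calling process's nice value by inc, using only the
 * priority interfaces. returns the new nice value, or -1 with errno
 * set on failure (note that -1 can also be a valid nice value). */
int nice_fallback(int inc)
{
	int prio = inc;
	/* only add the current value when inc is small enough that the
	 * sum cannot overflow; larger values are clamped below anyway */
	if (inc > 2 * NICE_MIN && inc < -2 * NICE_MIN)
		prio += getpriority(PRIO_PROCESS, 0);
	if (prio > NICE_MAX) prio = NICE_MAX;
	if (prio < NICE_MIN) prio = NICE_MIN;
	if (setpriority(PRIO_PROCESS, 0, prio)) {
		/* nice is specified to fail with EPERM, not EACCES */
		if (errno == EACCES) errno = EPERM;
		return -1;
	}
	return prio;
}
```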
|
|
|