Yu Watanabe [Mon, 17 Apr 2023 21:36:42 +0000 (06:36 +0900)]
gpt-auto: do not fail when no suitable partitions found
Follow-up for 598fd4da1cf9665834110583fd9133073cc12481.
Daan De Meyer [Mon, 17 Apr 2023 22:46:11 +0000 (00:46 +0200)]
getty-generator: Use device hotplug to instantiate virtualizer consoles
If getty-generator runs in the initrd, the corresponding tty might not
have been instantiated yet in /dev, which means a serial getty is not
spawned on it. Instead, let's instantiate the serial-getty when the
device appears so that it always gets instantiated.
Lennart Poettering [Thu, 16 Mar 2023 16:56:23 +0000 (17:56 +0100)]
lsm-util: move detection of support of LSMs into a new lsm-util.[ch] helper
This makes the bpf LSM check generic, so that we can use it elsewhere.
It also drops the caching inside it, given that bpf-lsm code in PID1
will cache it a second time a stack frame further up when it checks for
various other bpf functionality.
Dominique Martinet [Sun, 16 Apr 2023 07:14:49 +0000 (16:14 +0900)]
bpf-firewall: give a name to maps used
Running systemd with IP accounting enabled generates many bpf maps (two
per unit for accounting, another two if IPAddressAllow/Deny are used).
Systemd itself knows which maps belong to what unit and commands like
`systemctl status <unit>` can be used to query what service has which
map, but monitoring these values all the time costs 4 dbus requests
(calling the .IP{E,I}gress{Bytes,Packets} method for each unit) and
makes services like the prometheus systemd_exporter[1] somewhat slow
when doing that for every unit, while less precise information could
quickly be obtained by looking directly at the maps.
Unfortunately, bpf map names are rather limited:
- only 15 characters in length (16, but last byte must be 0)
- only allows isalnum(), _ and . characters
If it wasn't for the length limit we could use the normal unit escape
functions but I've opted to just make any forbidden character into
underscores for maximum brevity -- the map prefix is also rather short:
This isn't meant as a precise mapping, but as a hint for admins who want
to look at these.
(Note there is no problem if multiple maps have the same name)
Link: https://github.com/povilasv/systemd_exporter
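The naming constraints above can be sketched as follows. This is only an illustration under stated assumptions: make_map_name() and the "acct_" prefix used in the usage note are hypothetical and are not taken from the patch; only the 15 character limit and the allowed character set come from the message.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* 16 bytes including the trailing NUL, as described in the commit message. */
#define MAP_NAME_MAX 16

/* Hypothetical helper: writes "<prefix><unit>" into ret, turning every
 * character outside isalnum(), '_' and '.' into '_' and truncating the
 * result to 15 characters. */
static void make_map_name(const char *prefix, const char *unit, char ret[MAP_NAME_MAX]) {
        const char *parts[] = { prefix, unit };
        size_t n = 0;

        assert(prefix);
        assert(unit);
        assert(ret);

        for (size_t i = 0; i < 2; i++)
                for (const char *p = parts[i]; *p && n < MAP_NAME_MAX - 1; p++) {
                        unsigned char c = (unsigned char) *p;

                        ret[n++] = (isalnum(c) || c == '_' || c == '.') ? (char) c : '_';
                }

        ret[n] = '\0';
}
```

With the made-up prefix "acct_", the unit name "foo-bar.service" becomes "acct_foo_bar.se": the dash is replaced and the name is cut at 15 characters, which is why the result is only a hint and not a precise mapping.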
Lennart Poettering [Fri, 14 Apr 2023 15:47:43 +0000 (17:47 +0200)]
process-util: be more careful with pidfd_get_pid() special cases
Let's be more careful with generating error codes for (expected) error
causes.
This does not introduce new error conditions, it just changes what we
return under specific cases, to make things nicely recognizable in each
case. Most importantly this detects if fdinfo reports a pid of "-1" for
pidfds with processes that are already reaped (and thus have no PID
anymore).
None of our current users care about these error codes, but let's get
this right for the future.
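A minimal sketch of distinguishing these cases when parsing the fdinfo text of a pidfd. parse_pidfd_fdinfo() is a hypothetical helper, and the specific error codes are this sketch's own choice for illustration; the message above does not list the codes the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: takes the contents of /proc/self/fdinfo/<fd> and
 * extracts the "Pid:" field. Error codes are assumptions of this sketch. */
static int parse_pidfd_fdinfo(const char *text, pid_t *ret) {
        const char *p;
        char *end;
        long v;

        assert(text);
        assert(ret);

        /* Match "Pid:" only at the start of a line, so "NSpid:" is skipped. */
        p = strncmp(text, "Pid:", 4) == 0 ? text : strstr(text, "\nPid:");
        if (!p)
                return -ENOTTY;         /* no Pid: field, so not a pidfd */
        p += (*p == '\n') ? 5 : 4;

        errno = 0;
        v = strtol(p, &end, 10);
        if (errno != 0 || end == p)
                return -EINVAL;         /* field present but not a number */
        if (v == -1)
                return -ESRCH;          /* process already reaped */
        if (v == 0)
                return -EREMOTE;        /* process not visible in our PID namespace */
        if (v < 0)
                return -EINVAL;

        *ret = (pid_t) v;
        return 0;
}
```

Keeping one distinct return value per expected cause lets a caller react to "already reaped" differently from "not a pidfd at all" without parsing anything itself.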
Florian Klink [Mon, 17 Apr 2023 12:46:05 +0000 (14:46 +0200)]
fsck: use execv_p_ and execl_p_
Instead of invoking find_executable on our own, use the variants of exec
provided by glibc which do this for us.
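For illustration, a minimal sketch of letting execvp() do the $PATH lookup. run_via_path() is a hypothetical helper written for this example and is not code from fsck.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, executes argv[0] via a $PATH search and
 * returns the child's exit status, or a negative errno-style value. */
static int run_via_path(char *const argv[]) {
        pid_t pid;
        int status;

        assert(argv);
        assert(argv[0]);

        pid = fork();
        if (pid < 0)
                return -errno;
        if (pid == 0) {
                /* execvp() searches $PATH unless argv[0] contains a slash. */
                execvp(argv[0], argv);
                _exit(127); /* only reached if the exec failed */
        }

        while (waitpid(pid, &status, 0) < 0)
                if (errno != EINTR)
                        return -errno;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
}
```

The caller passes a bare program name such as "fsck" and never needs to resolve an absolute path first.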
Luca Boccassi [Sat, 15 Apr 2023 02:01:52 +0000 (03:01 +0100)]
creds: make available to all ExecStartPre= and ExecStart= processes
Fixes https://github.com/systemd/systemd/issues/27275
jcg [Mon, 17 Apr 2023 12:41:00 +0000 (20:41 +0800)]
user-util: remove duplicate includes
Benjamin Herrenschmidt [Thu, 13 Apr 2023 03:51:31 +0000 (13:51 +1000)]
virt: Further improve detection of EC2 metal instances
Commit f90eea7d18d9ebe88e6a66cd7a86b618def8945d ("virt: Improve detection of
EC2 metal instances") added support for detecting EC2 metal instances via the
product name in DMI by testing for the ".metal" suffix.
Unfortunately this doesn't cover all cases, as there are going to be
instance types where ".metal" is not a suffix (e.g. .metal-16xl,
.metal-32xl, ...).
This modifies the logic to also allow those new forms.
Signed-off-by: Benjamin Herrenschmidt <benh@amazon.com>
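One way to express the relaxed match, as a sketch only: product_name_is_metal() is a hypothetical name, and the actual patch may structure the check differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: accepts product names where ".metal" is either the
 * suffix or is directly followed by a "-<size>" part. Only the first
 * occurrence of ".metal" is examined. */
static bool product_name_is_metal(const char *name) {
        const char *p;

        assert(name);

        p = strstr(name, ".metal");
        if (!p)
                return false;

        p += strlen(".metal");
        return *p == '\0' || *p == '-';
}
```

Both "m5.metal" and "c7i.metal-24xl" match here, while "m5.large" and "x.metallic" do not (the instance names are examples made up for this note).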
Daan De Meyer [Mon, 17 Apr 2023 08:18:42 +0000 (10:18 +0200)]
mkosi: Use kernel-core for Fedora and CentOS images
Let's reduce image size by using a smaller kernel package.
Hans de Goede [Sun, 16 Apr 2023 13:57:55 +0000 (15:57 +0200)]
hwdb: add accelerometer mount matrix for Lenovo Yoga Tablet 2 851F/L
Add an accelerometer mount matrix for Lenovo Yoga Tablet 2 851F/L, to fix
screen rotation now that the kernel has support for the LSM303D IMU.
Luca Boccassi [Sun, 16 Apr 2023 22:32:33 +0000 (23:32 +0100)]
Merge pull request #27298 from mrc0mmand/test-async-tweaks
test: modernize test-async a bit
Yu Watanabe [Sun, 16 Apr 2023 17:09:38 +0000 (02:09 +0900)]
process-util: make safe_fork() unset $NOTIFY_SOCKET
Propagating $NOTIFY_SOCKET is typically dangerous. Let's unset it unless
explicitly requested to keep it.
Fixes #27288.
Replaces #27291.
Frantisek Sumsal [Sun, 16 Apr 2023 18:29:41 +0000 (20:29 +0200)]
docs: add a missing $ sign
Addresses https://github.com/systemd/systemd/pull/27283#pullrequestreview-1386816102.
Follow-up to 1a127aa02b.
Frantisek Sumsal [Sun, 16 Apr 2023 18:21:37 +0000 (20:21 +0200)]
test: modernize test-async a bit
Mainly to give it some debug output to, hopefully, see why it sometimes
gets stuck in CI when run with sanitizers.
Zbigniew Jędrzejewski-Szmek [Sun, 16 Apr 2023 10:34:49 +0000 (12:34 +0200)]
mkosi: default to Fedora 38
It'll be out this week. We can't update the man pages before it is released,
but we can use it for mkosi builds and do some very late testing.
Also, use filepath specification for /bin/pkg-config. We need it for meson, and
meson calls it directly by this path. pkgconfig is a virtual Provides on
pkgconf-pkg-config, and the indirection here just obfuscates things with no
benefit.
Add it explicitly for centos too. (I think it is pulled in by packages which
contain pkg-config modules anyway, but it's better to be explicit).
Yu Watanabe [Sun, 16 Apr 2023 06:31:10 +0000 (15:31 +0900)]
exec-util: make execute_strv() optionally take root directory
Preparation for rewriting kernel-install in C.
Yu Watanabe [Sun, 16 Apr 2023 10:39:58 +0000 (19:39 +0900)]
Merge pull request #27283 from mrc0mmand/assorted-test-tweaks
test: a bunch of assorted tweaks, Saturday edition
Yu Watanabe [Sun, 16 Apr 2023 07:28:26 +0000 (16:28 +0900)]
Merge pull request #27253 from yuwata/cmsg-find-and-copy-data
socket-util: introduce CMSG_FIND_AND_COPY_DATA()
Frantisek Sumsal [Sat, 15 Apr 2023 20:22:56 +0000 (22:22 +0200)]
test: add a couple of tests with invalid UTF-8 characters
Frantisek Sumsal [Sat, 15 Apr 2023 20:04:37 +0000 (22:04 +0200)]
test: add a simple test for getenv_path_list()
Frantisek Sumsal [Sat, 15 Apr 2023 19:33:02 +0000 (21:33 +0200)]
test: add a couple of basic sanity tests for the security verb
Frantisek Sumsal [Sat, 15 Apr 2023 17:51:44 +0000 (19:51 +0200)]
test: add a couple of basic sanity tests for timedatectl
Frantisek Sumsal [Sat, 15 Apr 2023 17:12:45 +0000 (19:12 +0200)]
test: add a simple test for secure-bits stuff
Frantisek Sumsal [Sat, 15 Apr 2023 16:24:13 +0000 (18:24 +0200)]
shared: add a missing include
Frantisek Sumsal [Sat, 15 Apr 2023 16:02:10 +0000 (18:02 +0200)]
test: add tests for uuid/uint64 specifiers
They're used in repart, but are not part of the "common" specifier
lists, so cover them explicitly.
Yu Watanabe [Thu, 13 Apr 2023 09:34:59 +0000 (18:34 +0900)]
tree-wide: also use CMSG_TYPED_DATA() on writing message header
Yu Watanabe [Thu, 13 Apr 2023 09:34:09 +0000 (18:34 +0900)]
sd-dhcp-server: use CMSG_FIND_DATA() at one more place
Yu Watanabe [Thu, 13 Apr 2023 09:02:48 +0000 (18:02 +0900)]
tree-wide: copy timestamp data from cmsg
On RISCV32, time_t is 64bit and size_t is 32bit, hence the timestamp
data in message header may not be aligned.
Fixes #27241.
Yu Watanabe [Thu, 13 Apr 2023 09:00:41 +0000 (18:00 +0900)]
socket-util: introduce CMSG_FIND_AND_COPY_DATA()
The cmsg(3) man page says about CMSG_DATA():
> The pointer returned cannot be assumed to be suitably aligned for
> accessing arbitrary payload data types. Applications should not cast
> it to a pointer type matching the payload, but should instead use
> memcpy(3) to copy data to or from a suitably declared object.
Hence, if we want to use unaligned data in cmsg, we need to copy it
before use. That's typically important for reading timestamps in
RISCV32, as the time_t is 64bit and size_t is 32bit on the system.
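The copy-before-use pattern can be sketched as a plain function. cmsg_find_and_copy() below is a hypothetical stand-in written for this note; it is not the CMSG_FIND_AND_COPY_DATA() macro from the patch, whose exact form is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: finds the first control message with the given
 * level/type whose payload is exactly `size` bytes and memcpy()s it into
 * buf, so the payload is never accessed through a misaligned pointer. */
static bool cmsg_find_and_copy(struct msghdr *mh, int level, int type, void *buf, size_t size) {
        assert(mh);
        assert(buf);

        for (struct cmsghdr *c = CMSG_FIRSTHDR(mh); c; c = CMSG_NXTHDR(mh, c))
                if (c->cmsg_level == level &&
                    c->cmsg_type == type &&
                    c->cmsg_len == CMSG_LEN(size)) {
                        memcpy(buf, CMSG_DATA(c), size);
                        return true;
                }

        return false;
}
```

A caller that wants a timestamp would pass the address of a local, properly declared timestamp structure as buf and read it only after the function returned true.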
Frantisek Sumsal [Sat, 15 Apr 2023 11:58:20 +0000 (13:58 +0200)]
test: add a test case for table_dup_cell()
Also, sneak in coverage for "less popular" cell types.
Daan De Meyer [Sat, 15 Apr 2023 16:51:28 +0000 (18:51 +0200)]
mkosi: Always disable sshd, dnsmasq and isc-dhcp-server
Frantisek Sumsal [Sat, 15 Apr 2023 11:12:43 +0000 (13:12 +0200)]
docs: a couple of typo fixes & formatting tweaks
Daan De Meyer [Sat, 15 Apr 2023 07:34:46 +0000 (09:34 +0200)]
mkosi: Update to latest
mkosi now installs an "ignore *" default preset on Debian. We also
switch Debian to dbus-broker now that preset doesn't disable it
anymore.
Florian Klink [Thu, 13 Apr 2023 20:54:54 +0000 (22:54 +0200)]
fsck: look for fsck binary not just in /sbin
This removes remaining hardcoded occurrences of `/sbin/fsck`, and instead
uses `find_executable` to find `fsck`.
We also use `fsck_exists_for_fstype` to check for the `fsck.*`
executable, which also checks in `$PATH`, so it's fair to assume fsck
itself is also available.
Luca Boccassi [Fri, 14 Apr 2023 20:31:55 +0000 (21:31 +0100)]
Merge pull request #27273 from mrc0mmand/test-generators
test: add a couple of tests for getty/run/system-update generators
Daan De Meyer [Thu, 13 Apr 2023 17:03:43 +0000 (19:03 +0200)]
preset: Add ignore directive
The ignore directive specifies to not do anything with the given
unit and leave existing configuration intact. This allows distributions
to gradually adopt preset files by shipping an ignore * preset file.
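A rough model of how a third directive fits into first-match-wins preset evaluation. All names here (PresetAction, PresetRule, preset_query()) are hypothetical and only illustrate the idea; they are not the types used in the patch.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef enum {
        PRESET_UNKNOWN,   /* no rule matched, the caller applies its default */
        PRESET_ENABLE,
        PRESET_DISABLE,
        PRESET_IGNORE,    /* leave the unit's existing configuration alone */
} PresetAction;

typedef struct {
        PresetAction action;
        const char *pattern;
} PresetRule;

/* Returns the action of the first rule whose glob pattern matches the unit. */
static PresetAction preset_query(const PresetRule *rules, size_t n, const char *unit) {
        assert(rules || n == 0);
        assert(unit);

        for (size_t i = 0; i < n; i++)
                if (fnmatch(rules[i].pattern, unit, FNM_NOESCAPE) == 0)
                        return rules[i].action;

        return PRESET_UNKNOWN;
}
```

With the rules "enable foo.service" followed by "ignore *", foo.service is enabled and every other unit is left untouched, which is the gradual adoption path described above.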
Frantisek Sumsal [Fri, 14 Apr 2023 19:10:18 +0000 (21:10 +0200)]
test: stop the test unit when it's not needed anymore
Otherwise it keeps printing stuff to the journal/console, adding
unnecessary noise.
Frantisek Sumsal [Fri, 14 Apr 2023 19:07:51 +0000 (21:07 +0200)]
test: check the colored --version output
Fran Diéguez [Fri, 14 Apr 2023 18:20:43 +0000 (20:20 +0200)]
po: Translated using Weblate (Galician)
Currently translated at 100.0% (193 of 193 strings)
Co-authored-by: Fran Diéguez <frandieguez@gnome.org>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/master/gl/
Translation: systemd/main
Zbigniew Jędrzejewski-Szmek [Wed, 5 Apr 2023 07:30:52 +0000 (09:30 +0200)]
man/systemd-cryptenroll: update list of PCRs, link to uapi docs
Entia non sunt multiplicanda praeter necessitatem. We had a list of PCRs in the
man page which was already half out-of-date. Instead, link to web page with the
"authoritative" list. Here, drop the descriptions of what shim and grub do. Instead,
just give some short descriptions and mention what systemd components do.
systemd-pcrmachine.service and systemd-pcrfs@.service are now mentioned too.
https://github.com/uapi-group/specifications/commit/d0e590b1e2648e76ece66157ceade3f45b165b14
extended the table in the specs repo.
https://github.com/uapi-group/specifications/pull/59 adds some more text there
too.
Also, rework the recommendation: hint that PCR 11 is useful, and recommend
binding to policy signatures instead of direct PCR values. This new text is
intentionally vague: doing this correctly is hard, but let's at least not imply
that just binding to PCR 7 is useful in any way.
Also, change "string alias" to "name" in discussion of PCR names.
Inspired by https://discussion.fedoraproject.org/t/future-of-encryption-in-fedora/80397/17
Luca Boccassi [Fri, 14 Apr 2023 15:23:51 +0000 (16:23 +0100)]
Merge pull request #27269 from poettering/statx-dont-sync
mountpoint-util: don't go to the network when doing statx() to detect mountpoints/mnt_id
Frantisek Sumsal [Fri, 14 Apr 2023 15:05:55 +0000 (17:05 +0200)]
test: add a couple of tests for run-generator
Lennart Poettering [Fri, 14 Apr 2023 10:48:14 +0000 (12:48 +0200)]
string-util: add strstrafter()
strstrafter() is like strstr() but returns a pointer to the first
character *after* the found substring, not on the substring itself.
Quite often this is what we actually want.
Inspired by #27267 I think it makes sense to add a helper for this,
to avoid the potentially fragile manual pointer increment afterwards.
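The described behaviour fits in a few lines. The sketch below uses the name strstrafter_sketch() to make clear it is an illustration written for this note and not the implementation added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Like strstr(), but returns a pointer to the first character after the
 * match, or NULL if needle does not occur in haystack. */
static const char *strstrafter_sketch(const char *haystack, const char *needle) {
        const char *p;

        assert(haystack);
        assert(needle);

        p = strstr(haystack, needle);
        return p ? p + strlen(needle) : NULL;
}
```

For a line such as "Pid:\t42", searching for "Pid:" yields a pointer to "\t42" directly, with no separate "p += strlen(...)" step that could drift out of sync with the search string.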
Frantisek Sumsal [Fri, 14 Apr 2023 14:09:32 +0000 (16:09 +0200)]
test: add a couple of tests for system-update-generator
Frantisek Sumsal [Fri, 14 Apr 2023 13:15:13 +0000 (15:15 +0200)]
test: properly distinguish between generator phases
Let's make sure the units generated by generators are generated at the
right stage.
Daan De Meyer [Fri, 14 Apr 2023 13:19:57 +0000 (15:19 +0200)]
Merge pull request #27252 from yuwata/chase-mkdir
chase: refuse CHASE_MKDIR_0755 without CHASE_NONEXISTENT or CHASE_PARENT
Luca Boccassi [Fri, 14 Apr 2023 13:15:35 +0000 (14:15 +0100)]
Merge pull request #27266 from dtardon/take-struct
Use TAKE_STRUCT() to copy and reset structs
Luca Boccassi [Fri, 14 Apr 2023 13:14:15 +0000 (14:14 +0100)]
Merge pull request #27265 from dtardon/memleak
Fix memory leak if GREEDY_REALLOC() fails
Frantisek Sumsal [Fri, 14 Apr 2023 10:58:51 +0000 (12:58 +0200)]
test: add a couple of tests for getty-generator
Lennart Poettering [Fri, 14 Apr 2023 11:08:03 +0000 (13:08 +0200)]
mountpoint-util: use memcmp_nn() where appropriate
Lennart Poettering [Fri, 14 Apr 2023 11:05:29 +0000 (13:05 +0200)]
mountpoint-util: fix hosed overflow check
The overflow check was hosed in two ways:
1. Overflows in C are undefined, hence gcc was free to just optimize the
whole thing away. We need to catch overflows before we run into them,
not after.
2. It checked for an overflow against size_t, but the field we need to
write this in is unsigned, i.e. typically 32bit rather than 64bit. Hence
check for the right maximum.
(The whole check is paranoia anyway, the kernel really shouldn't return
values that would induce an overflow, but you never know, the syscall
turned out to be problematic in so many other ways, hence let's stick to
this.)
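A sketch of checking before the addition and against the maximum of the destination type. add_to_unsigned() and its parameter names are hypothetical; the real code in mountpoint-util may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: computes header + payload for a field of type
 * unsigned. The range is verified before the addition, and against
 * UINT_MAX rather than SIZE_MAX, since that is what the field can hold. */
static int add_to_unsigned(size_t header, size_t payload, unsigned *ret) {
        assert(ret);

        if (header > UINT_MAX || payload > UINT_MAX - header)
                return -EOVERFLOW;

        *ret = (unsigned) (header + payload);
        return 0;
}
```

Because the comparison is rearranged into a subtraction that cannot wrap, the sum is only computed once it is known to fit.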
Lennart Poettering [Fri, 14 Apr 2023 10:47:47 +0000 (12:47 +0200)]
mountpoint-util: pass AT_STATX_DONT_SYNC to statx() when looking for mnt_id/mountpoints
The concept of a "mount" is a local one, hence there's no point in going
to the network to retrieve mnt_id or STATX_ATTR_MOUNT_ROOT. Hence set
AT_STATX_DONT_SYNC so that the call will not go to the network ever, and
risk deadlocking on that.
Just some extra safety.
Frantisek Sumsal [Fri, 14 Apr 2023 10:58:16 +0000 (12:58 +0200)]
test: allow overriding PID1's environment for test purposes
Frantisek Sumsal [Sat, 8 Apr 2023 18:49:45 +0000 (20:49 +0200)]
test: add a couple of tests for fstab-related kernel cmdline args
Frantisek Sumsal [Sat, 8 Apr 2023 18:49:14 +0000 (20:49 +0200)]
test: check if x-systemd.automount is ignored for rootfs
Frantisek Sumsal [Fri, 7 Apr 2023 09:37:39 +0000 (11:37 +0200)]
test: run the generators with debug log level
unless requested otherwise.
David Tardon [Fri, 14 Apr 2023 08:21:17 +0000 (10:21 +0200)]
install: use FOREACH_ARRAY
David Tardon [Fri, 14 Apr 2023 07:51:27 +0000 (09:51 +0200)]
tree-wide: rename cleanup function
... in accordance with the current coding style.
David Tardon [Fri, 14 Apr 2023 07:43:43 +0000 (09:43 +0200)]
install: fix memory leak if GREEDY_REALLOC() fails
David Tardon [Fri, 14 Apr 2023 08:08:31 +0000 (10:08 +0200)]
tree-wide: add some asserts
David Tardon [Fri, 14 Apr 2023 07:59:27 +0000 (09:59 +0200)]
tree-wide: use TAKE_STRUCT
Yu Watanabe [Fri, 14 Apr 2023 07:29:08 +0000 (16:29 +0900)]
chase: CHASE_MKDIR_0755 requires CHASE_NONEXISTENT and/or CHASE_PARENT
When CHASE_MKDIR_0755 is specified without CHASE_NONEXISTENT and
CHASE_PARENT, then chase() succeeds only when the file specified by
the path already exists, and in that case, chase() does not create
any parent directories, and CHASE_MKDIR_0755 is meaningless.
Let's mention that CHASE_MKDIR_0755 needs to be specified with
CHASE_NONEXISTENT or CHASE_PARENT, and add an assertion about that.
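The constraint can be written down as a small predicate. Everything in this sketch is hypothetical: the SKETCH_ flag values are invented, and FLAGS_SET_SKETCH() merely imitates the kind of "all of these bits are set" test that a FLAGS_SET() style macro performs.

```c
#include <assert.h>
#include <stdbool.h>

/* True if every bit in `flags` is set in `v`. */
#define FLAGS_SET_SKETCH(v, flags) (((v) & (flags)) == (flags))

/* Invented values, for illustration only. */
enum {
        SKETCH_CHASE_NONEXISTENT = 1 << 0,
        SKETCH_CHASE_PARENT      = 1 << 1,
        SKETCH_CHASE_MKDIR_0755  = 1 << 2,
};

/* MKDIR_0755 only makes sense together with NONEXISTENT and/or PARENT. */
static bool chase_flags_valid(unsigned flags) {
        if (!FLAGS_SET_SKETCH(flags, SKETCH_CHASE_MKDIR_0755))
                return true;

        return (flags & (SKETCH_CHASE_NONEXISTENT | SKETCH_CHASE_PARENT)) != 0;
}

/* A caller in the style of the commit asserts the combination up front. */
static void chase_sketch(unsigned flags) {
        assert(chase_flags_valid(flags));
        /* path resolution would follow here */
}
```

Passing SKETCH_CHASE_MKDIR_0755 alone trips the assertion, which surfaces the meaningless combination during development rather than silently ignoring the flag.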
Yu Watanabe [Fri, 14 Apr 2023 07:28:54 +0000 (16:28 +0900)]
chase: use FLAGS_SET() macro
Yu Watanabe [Fri, 14 Apr 2023 04:55:31 +0000 (13:55 +0900)]
tree-wide: replace __alignof__() with alignof()
Addresses https://github.com/systemd/systemd/pull/27254#discussion_r1165267046.
Yu Watanabe [Thu, 13 Apr 2023 06:20:49 +0000 (15:20 +0900)]
socket-util: add one missing paren
Follow-up for b6256af75e0609e451198ed90c293efd50827ab3.
Yu Watanabe [Thu, 13 Apr 2023 07:40:36 +0000 (16:40 +0900)]
timesync: drop unnecessary initialization
Yu Watanabe [Fri, 14 Apr 2023 04:49:04 +0000 (13:49 +0900)]
Merge pull request #27254 from poettering/cmsg-align-check
socket-util: tighten CMSG_TYPED_DATA() alignment checks
Luca Boccassi [Thu, 13 Apr 2023 23:25:06 +0000 (00:25 +0100)]
Merge pull request #27144 from enr0n/fix-scope-timer-on-coldplug
scope: do not disable timer event source when state is SCOPE_RUNNING
Luca Boccassi [Tue, 1 Nov 2022 23:34:15 +0000 (23:34 +0000)]
user units: implicitly enable PrivateUsers= when sandboxing options are set
Enabling these options when not running as root requires a user
namespace, so implicitly enable PrivateUsers=.
This has a side effect as it changes which users are visible to the unit.
However until now these options did not work at all for user units, and
in practice just a handful of user units in Fedora, Debian and Ubuntu
mistakenly used them (and they have been all fixed since).
This fixes the long-standing confusing issue that the user and system
units take the same options but the behaviour is wildly (and sometimes
silently) different depending on which is which, with user units
requiring manually specifying PrivateUsers= in order for sandboxing
options to actually work and not be silently ignored.
Luca Boccassi [Thu, 13 Apr 2023 20:33:06 +0000 (21:33 +0100)]
Merge pull request #27244 from bluca/uphold_retry
Uphold/StopWhenUnneeded/BindsTo: add retry timer on rate limit
ZjYwMj [Thu, 13 Apr 2023 20:30:42 +0000 (20:30 +0000)]
Synopsis and description of networkctl man page reflecting only part of its functionality (#27264)
* Fix inaccurate synopsis and description
Before the fix, they reflected only part of networkctl functionality.
Mike Yuan [Thu, 13 Apr 2023 15:04:49 +0000 (23:04 +0800)]
core/main: fix a typo for --log-target
Follow-up for d2ebd50d7f9740dcf30e84efc75610af173967d2
Fixes #27105
Nick Rosbrook [Thu, 13 Apr 2023 15:29:32 +0000 (11:29 -0400)]
test: add some tests for RuntimeMaxSec
Make sure the RuntimeMaxSec is applied correctly to service and scope
units when they are started, and also on coldplug.
Nick Rosbrook [Tue, 4 Apr 2023 22:39:26 +0000 (18:39 -0400)]
scope: do not disable timer event source when state is SCOPE_RUNNING
In scope_set_state(), the timer event source may be disabled depending
on the state. Currently, it will be disabled when the state is
SCOPE_RUNNING. This has the effect of new RuntimeMaxSec values being
ignored on coldplug.
Note that this issue is not currently present when scopes are started
because when scope_start() is called, scope_arm_timer() is called after
scope_set_state().
Luca Boccassi [Thu, 6 Apr 2023 11:19:22 +0000 (12:19 +0100)]
systemd-confext: mount confexts as noexec and nosuid
Confexts should not contain code, so mount confexts with noexec.
We cannot mount individual extensions as noexec, as the overlay ignores
it and bypasses it, we need to use the flag on the whole overlay for
it to be effective.
But given there are legacy scripts still shipped in /etc, allow to
override it with --noexec=false.
Daan De Meyer [Wed, 12 Apr 2023 15:27:06 +0000 (17:27 +0200)]
mkosi: Update to latest
The Bootable= option was removed and mkosi installs fewer packages
by default now, so let's adapt our configs to those changes.
Luca Boccassi [Wed, 12 Apr 2023 20:37:45 +0000 (21:37 +0100)]
Uphold/StopWhenUnneeded/BindsTo: requeue when job finishes
When a unit is upheld and fails, and there are no state changes in
the upholder, it will not be retried, which is against what the
documentation suggests.
Requeue when the job finishes. Same for the other two queues.
OMOJOLA JOSHUA DAMILOLA [Thu, 30 Mar 2023 07:55:41 +0000 (07:55 +0000)]
systemd-cryptenroll: add string aliases for tpm2 PCRs
Fixes #26697. RFE.
Yu Watanabe [Thu, 13 Apr 2023 05:29:51 +0000 (14:29 +0900)]
test: add several assertions
Follow-up for 7947dbe322a922604f3a5b29693e58b370161ad5.
Fixes CID#1508781 and CID#1508783.
Lennart Poettering [Thu, 13 Apr 2023 09:32:57 +0000 (11:32 +0200)]
Merge pull request #18789 from gportay/veritysetup-add-options-for-parity-with-cryptsetup-verity-utility
veritysetup: Add options for parity support with the cryptsetup's verity utility
Yu Watanabe [Wed, 12 Apr 2023 13:38:01 +0000 (22:38 +0900)]
image-policy: introduce parse_image_policy_argument() helper
Addresses
https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1060130312,
https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1067927293, and
https://github.com/systemd/systemd/pull/25608/commits/84be0c710d9d562f6d2cf986cc2a8ff4c98a138b#r1067926416.
Follow-up for 84be0c710d9d562f6d2cf986cc2a8ff4c98a138b.
Sjoerd Simons [Thu, 23 Feb 2023 09:00:16 +0000 (10:00 +0100)]
repart: Discard from/to first/last usable lba
Repart considers the start and end of the usable space to be the first multiple
of grainsz (at least 4096 bytes). However the first usable LBA of a GPT
partition is at sector 34 (512 bytes sectors) which is not a multiple of 4096.
The backup GPT label at the end also takes up 33 sectors, meaning the last
usable LBA is at 34 sectors from the end, unlikely to be a 4096 multiple as
well.
This meant that the very first and last sectors were never discarded. However
more problematically if an existing partition started before the first
usable grainsz multiple its start didn't get taken into account as a valid
starting point and got its data discarded.
Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
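The arithmetic behind the gap at the start of the disk, as a worked sketch. The helper names are hypothetical; the numbers (512 byte sectors, first usable LBA 34, 4096 byte grain) are the ones from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE UINT64_C(512)
#define GRAIN_SIZE  UINT64_C(4096)

/* Rounds v up to the next multiple of g. */
static uint64_t round_up(uint64_t v, uint64_t g) {
        assert(g > 0);
        return (v + g - 1) / g * g;
}

/* Bytes between the first usable LBA and the first grain-aligned offset,
 * i.e. the leading range that was skipped when only grain multiples were
 * considered. */
static uint64_t leading_gap(uint64_t first_usable_lba) {
        uint64_t start = first_usable_lba * SECTOR_SIZE;

        return round_up(start, GRAIN_SIZE) - start;
}
```

For LBA 34 the usable space starts at byte 17408, the next 4096 multiple is 20480, so 3072 bytes (six sectors) sit before the first grain boundary. A partition starting inside that range begins before the first grain-aligned offset.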
Lennart Poettering [Thu, 13 Apr 2023 08:49:15 +0000 (10:49 +0200)]
udev,sd-device: use CMSG_FIND_DATA() more
Lennart Poettering [Thu, 13 Apr 2023 08:29:34 +0000 (10:29 +0200)]
tree-wide: port more code over to CMSG_TYPED_DATA()
Lennart Poettering [Thu, 13 Apr 2023 08:21:31 +0000 (10:21 +0200)]
socket-util: tighten alignment check for CMSG_TYPED_DATA()
Apparently CMSG_DATA() alignment is very much undefined. Which is quite
an ABI fuck-up, but we need to deal with this. CMSG_TYPED_DATA() already
checks alignment of the specified pointer. Let's also check matching
alignment of the underlying structures, which we already can do at
compile-time.
See: #27241
(This does not fix #27241, but should catch such errors already at
compile-time instead of runtime)
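A sketch of pairing the runtime pointer check with a compile-time alignment check. CMSG_TYPED_DATA_SKETCH() and cmsg_data_checked() are hypothetical and written for this note; the real macro may be built differently. The compile-time part relies on a negative array size being a hard error.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Runtime part: verify that the payload pointer is aligned for the type. */
static inline void *cmsg_data_checked(struct cmsghdr *cmsg, size_t align) {
        void *p;

        assert(cmsg);

        p = CMSG_DATA(cmsg);
        assert((uintptr_t) p % align == 0);
        return p;
}

/* Compile-time part: refuse payload types that need stricter alignment
 * than struct cmsghdr itself, since the payload follows the header. */
#define CMSG_TYPED_DATA_SKETCH(cmsg, type)                                       \
        ((void) sizeof(char[alignof(type) <= alignof(struct cmsghdr) ? 1 : -1]), \
         (type *) cmsg_data_checked((cmsg), alignof(type)))
```

On a target where the payload type requires more alignment than struct cmsghdr provides, the array size evaluates to -1 and the build fails, so the mismatch is reported by the compiler instead of by an assertion at runtime.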
Lennart Poettering [Thu, 13 Apr 2023 07:10:17 +0000 (09:10 +0200)]
Merge pull request #27027 from dtardon/unit-file-list-cleanup
Use _cleanup_ for UnitFileList hash
Yu Watanabe [Wed, 12 Apr 2023 17:43:52 +0000 (02:43 +0900)]
repart: always take BSD lock when whole block device is opened
Fixes #27236.
Lennart Poettering [Thu, 13 Apr 2023 05:16:24 +0000 (07:16 +0200)]
Merge pull request #27135 from poettering/pin-fdstore
Allow the per-service fdstore to be "pinned", i.e. preserved as long as the unit info remains in memory
Lennart Poettering [Tue, 4 Apr 2023 09:41:55 +0000 (11:41 +0200)]
test: validate that fdstore pinning works
Lennart Poettering [Wed, 12 Apr 2023 19:07:29 +0000 (21:07 +0200)]
pid1: add some debug logging when stashing fds into the fdstore
Lennart Poettering [Tue, 4 Apr 2023 13:51:07 +0000 (15:51 +0200)]
service: rename service_close_socket_fd() → service_release_socket_fd()
Just to match service_release_stdio_fd() and service_release_fd_store()
in the name, since they do similar things.
This follows the concept that we "release" resources, and this is all
generically wrapped in "service_release_resources()".
Lennart Poettering [Tue, 4 Apr 2023 11:42:08 +0000 (13:42 +0200)]
core: move runtime directory removal into release_resource handler
We already clear the various fds we keep from the release_resources()
handler, let's also destroy the runtime dir from there if this
preservation mode is selected.
This makes a minor semantic change: previously we'd keep a runtime
directory around if RuntimeDirectoryPreserve=restart is selected and at
least one JOB_START job was around. With this logic we'll keep it around
a tiny bit longer: as long as any job for the unit is around.
Lennart Poettering [Tue, 4 Apr 2023 10:17:16 +0000 (12:17 +0200)]
service: close fdstore asynchronously
The file descriptors we keep in the fdstore might be basically anything,
let's clean it up with our asynchronous closing feature, to not
deadlock on close().
(Let's also do the same for stdin/stdout/stderr fds, since they might
point to network services these days.)
Lennart Poettering [Wed, 29 Mar 2023 20:10:01 +0000 (22:10 +0200)]
service: allow freeing the fdstore via cleaning
Now that we have a potentially pinned fdstore let's add a concept for
cleaning it explicitly on user request. Let's expose this via
"systemctl clean", i.e. the same way as user directories are cleaned.
Lennart Poettering [Wed, 29 Mar 2023 20:07:22 +0000 (22:07 +0200)]
service: add ability to pin fd store
Oftentimes it is useful to allow the per-service fd store to survive
longer than for a restart. This is useful in various scenarios:
1. An fd to some security relevant object needs to be stashed somewhere,
that should not be cleaned automatically, because the security
enforcement would be dropped then.
2. A user namespace fd should be allocated on first invocation and be
kept around until the user logs out (i.e. systemd --user ends), á la
#16328 (This does not implement what #16318 asks for, but should
solve the use-case discussed there.)
3. There's interest in allowing a concept of "userspace reboots" where the
kernel stays running, and userspace is swapped out (i.e. all services
exit, and the rootfs transitioned into a new version of it) while
keeping some select resources pinned, very similar to how we
implement a switch root. Thus it is useful to allow services to exit,
while leaving their fds around till the very end.
This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reuses
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.
Lennart Poettering [Wed, 29 Mar 2023 20:06:39 +0000 (22:06 +0200)]
service: rework how we release resources
Let's normalize how we release service resources, i.e. the three types
of fds we maintain for each service:
1. the fdstore
2. the socket fd for per-connection socket activated services
3. stdin/stdout/stderr
The generic service_release_resources() hook now calls into
service_release_fd_store() + service_close_socket_fd() +
service_release_stdio_fd() one after the other, releasing them all for
the generic "release_resources" infra of the unit lifecycle.
We do no longer close the socket fd from service_set_state(), moving
this exclusively into service_release_resources(), so that all fds are
closed the same way.
Lennart Poettering [Wed, 29 Mar 2023 19:52:41 +0000 (21:52 +0200)]
service: release resources from a separate queue, not unit_check_gc()
The per-unit-type release_resources() hook (most prominent use: to
release a service unit's fdstore once a unit is entirely dead and has no
more jobs) is currently invoked as part of unit_check_gc(), whose
primary purpose is to determine if a unit should be GC'ed. This was
always a bit ugly, as release_resources() changes state of the unit,
while unit_check_gc() is otherwise (and was before release_resources()
was added) a "passive" function that just checks for a couple of
conditions.
unit_check_gc() is called at various places, including when we wonder if
we should add a unit to the gc queue, and then again when we take it out
of the gc queue to determine whether to really gc it now. The fact that
these checks have side effects so far wasn't too problematic, as the
state changes (primarily: that services would empty their fdstores) were
relatively limited in scope.
A later patch in this series is supposed to extend the service state
engine with a separate state distinct from SERVICE_DEAD that is very
much like it but indicates that the service still has active resources
(specifically the fdstore). For cases like that the releasing of the
fdstore would result in state changes (as we'd then return to a classic
SERVICE_DEAD state). And this is where the fact that the
release_resources() is called as side-effect becomes problematic: it
would mean that unit state changes would instantly propagate to state
changes elsewhere, though we usually want this to be done through the
run queue for coalescing and avoidance of recursion.
Hence, let's clean this up: let's move the release_resources() logic
into a queue of its own, and then enqueue items into it from the general
state change notification handler in unit_notify().
Lennart Poettering [Wed, 12 Apr 2023 18:51:23 +0000 (20:51 +0200)]
core: fix property getter method for NFileDescriptorStore bus property
Since da6053d0a7c16795e7fac1f9ba6694863918a597 this is a size_t, not an
unsigned. The difference doesn't matter on LE archs, but it matters on
BE (i.e. s390x), since we'll return entirely nonsensical data.
Let's fix that.
Follow-up-for: da6053d0a7c16795e7fac1f9ba6694863918a597
An embarrassing bug introduced in 2018... That made me scratch my head
for way too long, as it made #27135 fail on s390x while it passed
everywhere else.
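Why the mismatch is invisible on little-endian and fatal on big-endian can be simulated without access to an s390x machine. read_first_half() is a hypothetical helper that models a 64bit value laid out in memory and a reader that only looks at the first four bytes, as a getter expecting a 32bit unsigned would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lays out `value` as 8 bytes in the requested byte order, then decodes
 * only the first 4 bytes as a 32bit integer in that same byte order. */
static uint32_t read_first_half(uint64_t value, bool big_endian) {
        uint8_t bytes[8];
        uint32_t r = 0;

        for (size_t i = 0; i < 8; i++) {
                unsigned shift = (unsigned) (big_endian ? 8 * (7 - i) : 8 * i);

                bytes[i] = (uint8_t) (value >> shift);
        }

        for (size_t i = 0; i < 4; i++) {
                unsigned shift = (unsigned) (big_endian ? 8 * (3 - i) : 8 * i);

                r |= (uint32_t) bytes[i] << shift;
        }

        return r;
}
```

For a small count such as 3, the little-endian layout keeps the interesting bytes first, so the short read still sees 3. The big-endian layout puts the high, all-zero half first, so the same read reports 0.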
Gaël PORTAY [Sun, 27 Dec 2020 13:55:09 +0000 (08:55 -0500)]
veritysetup: add support for fec options
The verity fec_* parameters allow using Forward Error Correction to
recover from corruption if hash verification fails.
This adds the options fec_device, fec_offset and fec_roots (sixth
argument) which are the equivalent of the options --fec-device,
--fec-offset and --fec-roots in the veritysetup world.
- fec-device=FILE
- fec-offset=BYTES
- fec-roots=UINT64
See `veritysetup(8)` for more details.