git-history.diyao.me Git - systemd/.git/log

fix typo in ProtectSystem= option

This was introduced by commit d9ae3222cfbd5d2a48e6dbade6617085cc76f1c1.

(cherry picked from commit 573229efeb2c5ade25794deee8cfe2f967414ef7)

Resolves: #1934500

logind: don't print warning when user@.service template is masked

A user instance of systemd is an optional feature, and if the
user@.service template is masked, the administrator most likely doesn't
want --user instances of systemd for logged-in users. We don't need to
be verbose about it.

(cherry picked from commit 03b6fa0c5b51b0d39334ff6ba183a3391443bcf6)
(cherry picked from commit 65e96327360ab41d44d5383dcecc82a19fad198c)

Resolves: #1894152

cgroup: freezer action must be NOP when cgroup v2 freezer is not available

Low-level cgroup freezer state manipulation is invoked directly from the
job engine when we are about to execute the job in order to make sure
the unit is not frozen and job execution is not blocked because of
that.

Currently with cgroup v1 we would needlessly do a bunch of work in the
function and even falsely update the freezer state. Don't do any of this
and skip the function silently when v2 freezer is not available.

This commit fixes the following bug:

$ systemd-run --unit foo.service /bin/sleep infinity
$ systemctl restart foo.service
$ systemctl show -p FreezerState foo.service

Before (cgroup v1, i.e. full "legacy" mode):
FreezerState=thawing

After:
FreezerState=running

(cherry picked from commit 9a1e90aee556b7a30d87553a891a4175ae77ed68)

Resolves: #1868831

core: make sure to restore the control command id, too

Fixes: #15356
(cherry picked from commit e9da62b18af647bfa73807e1c7fc3bfa4bb4b2ac)

Resolves: #1829867

man: document new "boot-complete.target" unit

(cherry picked from commit 82ea38258c0f4964c2f3ad3691c6e4554c4f0bb0)

Related: #1872243

units: add generic boot-complete.target

(cherry picked from commit 329d20db3cb02d789473b8f7e4a59526fcbf5728)

Resolves: #1872243

device: don't emit PropertiesChanged needlessly

Functions called from device_setup_unit() already make sure that the
unit is enqueued in case it is a new unit or properties exported on the
bus have changed.

This should prevent unnecessary DBus wakeups and associated DBus traffic
when device_setup_unit() is called while reparsing /proc/self/mountinfo
due to mountinfo notifications. Note that we parse /proc/self/mountinfo
quite often on busy systems (e.g. k8s container hosts), but most of the
time the existing mounts didn't change; only some mount got added. Thus
we don't need to generate PropertiesChanged for devices associated with
the mounts that didn't change.

Thanks to Renaud Métrich <rmetrich@redhat.com> for debugging the
problem and providing a draft version of the patch.

(cherry picked from commit 2e129d5d6bd6bd8be4b5359e81a880cbf72a44b8)

Resolves: #1793533

device: make sure we emit PropertiesChanged signal once we set sysfs

(cherry picked from commit 7c4d139485139eae95b17a1d54cb51ae958abd70)

Related: #1793533

tests: sleep a bit and give kernel time to perform the action after manual freeze/thaw

Fixes: #16050
(cherry picked from commit a0d79df8e59c6bb6dc0382d71e835dec869a7df4)

Related: #1848421

fix mis-merge

Resolves: #1848421

test: add test for cgroup v2 freezer support

(cherry picked from commit d446ae89c0168f17eed7135ac06df3b294b3fcc6)

Related: #1830861

core: fix the return value in order to make sure we don't dispatch method return too early

Actually, it is the same kind of problem as in d910f4c. Basically, we
need to return 1 on the success code path in slice_freezer_action().
Otherwise we dispatch the DBus return message too soon.

Fixes: #16050
(cherry picked from commit 2884836e3c26fa76718319cdc6d13136bbc1354d)

Related: #1830861

core/cgroup: fix return value of unit_cgroup_freezer_action()

We should return 0 only if the current freezer state, as reported by the
kernel, is already the desired state. Otherwise, we would dispatch the
return DBus message prematurely in bus_unit_method_freezer_generic().

Thanks to Frantisek Sumsal for reporting the issue.

(cherry picked from commit d910f4c2b2542544d7b187a09605da7a0f220837)

Related: #1830861

core: introduce support for cgroup freezer

With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. A cgroup can be frozen by writing "1"
to the file, and the kernel will send us a notification through
"cgroup.events" after the operation is finished and the processes in
the cgroup have entered a quiescent state, i.e. they are not scheduled
to run. Writing "0" to the attribute file does the inverse and process
execution is resumed.

This commit exposes the above low-level functionality through systemd's
DBus API. Each unit type must provide a specialized implementation for
these methods; otherwise, we return an error. So far only the service,
scope, and slice unit types provide the support. It is possible to check
whether a given unit has the support using the CanFreeze DBus property.

Note that the DBus API behaves synchronously: we dispatch the reply to
freeze/thaw requests only after the kernel has notified us that the
requested operation was completed.
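
For illustration only, the low-level interface described above can be
sketched in Python. This is not systemd's C implementation: the function
names set_frozen() and is_frozen() are hypothetical, and the caller is
assumed to pass the path of a cgroup v2 directory.

```python
import os


def set_frozen(cgroup_dir: str, frozen: bool) -> None:
    """Request a freeze ("1") or a thaw ("0") by writing to cgroup.freeze."""
    with open(os.path.join(cgroup_dir, "cgroup.freeze"), "w") as f:
        f.write("1" if frozen else "0")


def is_frozen(cgroup_dir: str) -> bool:
    """Return the kernel's view, taken from the "frozen" key of cgroup.events."""
    with open(os.path.join(cgroup_dir, "cgroup.events")) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            if key == "frozen":
                return value == "1"
    # Hypothetical error handling: the key is absent, e.g. on an older kernel.
    raise KeyError("frozen")
```

A synchronous caller would call set_frozen() and then wait for a change
notification on cgroup.events before trusting is_frozen().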

(cherry picked from commit d9e45bc3abb8adf5a1cb20816ba8f2d2aa65b17e)

Resolves: #1830861

shared: add NULL callback check in one more place

Follow-up for 9f65637308.

(cherry picked from commit d3d53e5cd143bf96d1eb0e254f16fa8d458d38ce)

Related: #1830861

shared: Don't try calling NULL callback in bus_wait_for_units_clear

BugLink: https://bugs.launchpad.net/bugs/1870930
(cherry picked from commit 9f656373082cb13542b877b4f5cb917ef5ff329c)

Related: #1830861

shared: fix assert call

Fixup for 3572d3df8f8. Coverity CID#1403013.

(cherry picked from commit 60b17d6fcd988c9995b7d1476d3aba1c4cbbfddd)

Related: #1830861

shared: add generic logic for waiting for a unit to enter some state

This is a generic implementation of a client-side logic of waiting until
a unit enters or leaves some state.

This is a more generic implementation of the WaitContext logic currently
in systemctl.c, and is supposed to replace it (a later commit does
this). It's similar to bus-wait-for-jobs.c and we probably should fold
that one into it later on.

This code is more powerful and cleaner than the WaitContext logic
however. In addition to waiting for a unit to exit this also allows us
to wait for a unit to leave the "maintenance" state.

This commit only implements the generic logic, and adds no users of it
yet.

(cherry picked from commit 3572d3df8f822d4cf1601428401a837f723771cf)

Related: #1830861

basic/cgroup-util: introduce cg_get_keyed_attribute_full()

Callers of cg_get_keyed_attribute_full() can now specify via a flag whether
missing keys in the cgroup attribute file are OK or not. Wrappers for both the
strict and the graceful version are provided as well.
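
A rough Python sketch of the idea, not the C function itself: the names
get_keyed_attributes(), get_keyed_attributes_strict() and
get_keyed_attributes_graceful() are hypothetical, and the attribute file
content is passed in as a string to keep the example self-contained.

```python
def get_keyed_attributes(text, keys, allow_missing=False):
    """Collect "key value" lines for the requested keys.

    With allow_missing=False a missing key is an error (strict mode);
    with allow_missing=True it is silently left out (graceful mode).
    """
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if sep and key in keys:
            found[key] = value
    if not allow_missing:
        for key in keys:
            if key not in found:
                raise KeyError(key)
    return found


def get_keyed_attributes_strict(text, keys):
    return get_keyed_attributes(text, keys, allow_missing=False)


def get_keyed_attributes_graceful(text, keys):
    return get_keyed_attributes(text, keys, allow_missing=True)
```

The graceful variant is convenient for keys such as "frozen" in
cgroup.events, which may not be present on every kernel.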

(cherry picked from commit 25a1f04c682260bb9b96e25bdf33665d6172db98)

Related: #1830861

selinux: do preprocessor check only in selinux-access.c

This has the advantage that mac_selinux_access_check() can be used as a
function in all contexts. For example, parameters passed to it won't be
reported as unused if the "function" call is replaced with 0 on SELinux
disabled builds.

(cherry picked from commit 08deac6e3e9119aeb966375f94695e4aa14ffb1c)

Related: #1830861

core: don't consider SERVICE_SKIP_CONDITION for abnormal or failure restarts

Fixes: #16115
(cherry picked from commit bb9244781c6fc7608f7cac910269f8987b8adc01)

Related: #1737283

meson: allow setting the version string during configuration

Very loosely based on upstream commits e1ca734edd17a90a325d5b566a4ea96e66c206e5
and 681bd2c524ed71ac04045c90884ba8d55eee7b66.

Resolves: #1804252

cgroup: Mark memory protections as explicitly set in transient units

A later version of the DefaultMemory{Low,Min} patch changed these to
require explicitly setting memory_foo_set, but we only set that in
load-fragment, not dbus-cgroup.

Without these, we may fall back to either DefaultMemoryFoo or
CGROUP_LIMIT_MIN when we really shouldn't.

(cherry picked from commit 184e989d7da4648bd36511ffa28a9f2b469589d1)

Related: #1763435

cgroup: Respect DefaultMemoryMin when setting memory.min

This is an oversight from https://github.com/systemd/systemd/pull/12332.

Sadly the tests didn't catch it since it requires a real cgroup
hierarchy to see, and it wasn't seen in prod since we're only currently
using DefaultMemoryLow, not DefaultMemoryMin. :-(

(cherry picked from commit 64fe532e90b3e99bf7821ded8a1107c239099e40)

Related: #1763435

cgroup: Check ancestor memory min for unified memory config

Otherwise we might not enable it when we should, i.e. DefaultMemoryMin is
set in a parent, but not MemoryMin in the current unit.

(cherry picked from commit 7c9d2b79935d413389a603918a711df75acd3f48)

Related: #1763435

cgroup: Test that it's possible to set memory protection to 0 again

The previous commit fixes this up, and this should prevent it
regressing.

(cherry picked from commit 465ace74d9820824968ab5e82c81e42c2f1894b0)

Related: #1763435

cgroup: Support 0-value for memory protection directives

These make sense to be explicitly set at 0 (which has a different effect
than the default, since it can affect processing of `DefaultMemoryXXX`).

Without this, it's not easily possible to relinquish memory protection
for a subtree, which is not great.

(cherry picked from commit 22bf131be278b95a4a204514d37a4344cf6365c6)

Related: #1763435

cgroup: Readd some plumbing for DefaultMemoryMin

Somehow these got lost in the previous PR, rendering DefaultMemoryMin
not very useful.

(cherry picked from commit 7e7223b3d57c950b399352a92e1d817f7c463602)

Related: #1763435

cgroup: Polish hierarchically aware protection docs a bit

I missed adding a section in `systemd.resource-control` about
DefaultMemoryMin in #12332.

Also, add a NEWS entry going over the general concept.

(cherry picked from commit acdb4b5236f38bbefbcc4a47fdbb9cd558b4b5c5)

Related: #1763435

unit: Add DefaultMemoryMin

(cherry picked from commit 7ad5439e0663e39e36619957fa37eefe8026bcab)

Related: #1763435

cgroup: Create UNIT_DEFINE_ANCESTOR_MEMORY_LOOKUP

This is in preparation for creating unit_get_ancestor_memory_min.

(cherry picked from commit 6264b85e92aeddb74b8d8808a08c9eae8390a6a5)

Related: #1763435

cgroup: Implement default propagation of MemoryLow with DefaultMemoryLow

In cgroup v2 we have protection tunables -- currently MemoryLow and
MemoryMin (there will be more in future for other resources, too). The
design of these protection tunables requires not only intermediate
cgroups to propagate protections, but also the units at the leaf of that
resource's operation to accept it (by setting MemoryLow or MemoryMin).

This makes sense from a low-level API design perspective, but it's a
good idea to also have a higher-level abstraction that can, by default,
propagate these resources to children recursively. In this patch, this
happens by having descendants set memory.low to N if their ancestor has
DefaultMemoryLow=N -- assuming they don't set a separate MemoryLow
value.

Any affected unit can opt out of this propagation by manually setting
`MemoryLow` to some value in its unit configuration. A unit can also
stop further propagation by setting `DefaultMemoryLow=` with no
argument. This removes further propagation in the subtree, but has no
effect on the unit itself (for that, use `MemoryLow=0`).
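
The lookup rules above can be sketched in Python as follows. This is an
illustrative model, not the systemd code: the Unit class and
effective_memory_low() are hypothetical, and the sketch assumes that the
nearest ancestor with a setting wins and that "no protection" is
represented by 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    name: str
    parent: Optional["Unit"] = None
    memory_low: Optional[int] = None          # MemoryLow=N, None if unset
    default_memory_low: Optional[int] = None  # DefaultMemoryLow=N, None if unset
    default_memory_low_reset: bool = False    # DefaultMemoryLow= with no argument


def effective_memory_low(unit: Unit) -> int:
    """Return the memory.low value a unit would end up with."""
    # An explicit MemoryLow= (including 0) opts the unit out of propagation.
    if unit.memory_low is not None:
        return unit.memory_low
    # Otherwise walk up the ancestors and take the nearest default.
    ancestor = unit.parent
    while ancestor is not None:
        if ancestor.default_memory_low_reset:
            return 0  # propagation was stopped for this subtree
        if ancestor.default_memory_low is not None:
            return ancestor.default_memory_low
        ancestor = ancestor.parent
    return 0
```

Note that a unit's own DefaultMemoryLow= is never consulted for the unit
itself, which matches the statement that resetting it only affects the
subtree below.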

Our use case in production is simplifying the configuration of machines
which heavily rely on memory protection tunables, but currently require
tweaking a huge number of unit files to make that a reality. This
directive makes that significantly less fragile, and decreases the risk
of misconfiguration.

After this patch is merged, I will implement DefaultMemoryMin= using the
same principles.

(cherry picked from commit c52db42b78f6fbeb7792cc4eca27e2767a48b6ca)

Related: #1763435

test: remove support for suffix in get_testdata_dir()

Instead, use path_join() in callers wherever needed.

(cherry picked from commit 55890a40c3ec0c061c04d1395a38c26313132d12)

Related: #1763435

core: introduce cgroup_add_device_allow()

(cherry picked from commit fd870bac25c2dd36affaed0251b5a7023f635306)

Related: #1763435

core: add MemoryMin

The kernel added support for a new cgroup memory controller knob memory.min in
bf8d5d52ffe8 ("memcg: introduce memory.min") which was merged during v4.18
merge window.

Add MemoryMin to support memory.min.

(cherry picked from commit 484226357789991de0b3363beb69258be06b4c92)

Resolves: #1763435

sd-bus: skip sending formatted UIDs via SASL

The dbus external authentication takes as optional argument the UID the
sender wants to authenticate as. This uid is purely optional. The
AF_UNIX socket already conveys the same information through the
auxiliary socket data, so we really don't have to provide that
information.

Unfortunately, there is no way to send empty arguments, since they are
interpreted as "missing argument", which has a different meaning. The
SASL negotiation thus changes from:

    AUTH EXTERNAL <uid>
    NEGOTIATE_UNIX_FD                   (optional)
    BEGIN

to:

    AUTH EXTERNAL
    DATA
    NEGOTIATE_UNIX_FD                   (optional)
    BEGIN

And thus the replies we expect as a client change from:

    OK <server-id>
    AGREE_UNIX_FD                       (optional)

to:

    DATA
    OK <server-id>
    AGREE_UNIX_FD                       (optional)
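
The client side of the two exchanges above can be sketched in Python.
This only builds the bytes to send and is not the sd-bus code; the
function name sasl_client_lines() is hypothetical. In the old form the
UID is sent as the hex encoding of its decimal ASCII representation, as
the D-Bus authentication protocol specifies.

```python
def sasl_client_lines(uid=None, negotiate_unix_fd=True):
    """Build the SASL lines a client sends, terminated by CRLF.

    uid=None selects the new form without an inlined argument,
    any integer selects the old "AUTH EXTERNAL <uid>" form.
    """
    lines = []
    if uid is None:
        # New form: no argument, followed by an empty DATA line.
        lines += ["AUTH EXTERNAL", "DATA"]
    else:
        # Old form: hex-encode the decimal UID string.
        lines.append("AUTH EXTERNAL " + str(uid).encode("ascii").hex())
    if negotiate_unix_fd:
        lines.append("NEGOTIATE_UNIX_FD")
    lines.append("BEGIN")
    return "".join(line + "\r\n" for line in lines).encode("ascii")
```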

Since the old sd-bus server implementation used the wrong reply for
"AUTH" requests that do not carry the arguments inlined, we decided to
make sd-bus clients accept this as well. Hence, sd-bus now allows
"OK <server-id>\r\n" replies instead of "DATA\r\n" replies.

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
(cherry picked from commit 1ed4723d38cd0d1423c8fe650f90fa86007ddf55)

Resolves: #1838081

sd-bus: fix SASL reply to empty AUTH

The correct way to reply to "AUTH <protocol>" without any payload is to
send "DATA" rather than "OK". The "DATA" reply triggers the client to
respond with the requested payload.

In fact, adding the data as hex-encoded argument like
"AUTH <protocol> <hex-data>" is an optimization that skips the "DATA"
roundtrip. The standard way to perform an authentication is to send the
"DATA" line.

This commit fixes sd-bus to properly send the "DATA" line. Surprisingly
no existing implementation depends on this, as they all pass the data
directly as argument to "AUTH". This will not work if we want to pass
an empty argument, though.
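
A deliberately simplified Python sketch of the server-side decision
described above. The function sasl_server_reply() is hypothetical, it
does not validate the credentials at all, and it answers anything it
does not understand with "ERROR".

```python
def sasl_server_reply(line: str, server_id: str) -> str:
    """Pick the reply to a single client line (without the trailing CRLF)."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "AUTH":
        if len(parts) == 2:
            # "AUTH <protocol>" without payload: ask the client for the data.
            return "DATA"
        # "AUTH <protocol> <hex-data>": the optimization that skips the
        # DATA roundtrip. A real server would check the payload here.
        return "OK " + server_id
    return "ERROR"
```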

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
(cherry picked from commit 2010873b4b49b223e0cc07d28205b09c693ef005)

Related: #1838081

sd-bus: avoid magic number in SASL length calculation

Let's avoid magic numbers and use a constant `strlen()` instead.

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
(cherry picked from commit 3cacdab925c40a5d9b7cf3f67719201bbaa17f67)

Related: #1838081

core: downgrade CPUQuotaPeriodSec= clamping logs to debug

After the first warning log, further messages are downgraded to LOG_DEBUG.

(cherry picked from commit 527ede0c638b47b62a87900438a8a09dea42889e)

Related: #1770379

core: add CPUQuotaPeriodSec=

This new setting allows configuration of CFS period on the CPU cgroup, instead
of using a hardcoded default of 100ms.

Tested:
- Legacy cgroup + Unified cgroup
- systemctl set-property
- systemctl show
- Confirmed that the cgroup settings (such as cpu.cfs_period_ns) were set
  appropriately, including updating the CPU quota (cpu.cfs_quota_ns) when
  CPUQuotaPeriodSec= is updated.
- Checked that clamping works properly when either period or (quota * period)
  are below the resolution of 1ms, or if period is above the max of 1s.
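
The clamping rules listed above can be sketched in Python. This is an
assumption-laden illustration rather than the systemd implementation:
adjust_period() is a hypothetical name, the quota is assumed to be
expressed as microseconds of CPU time per second, and all values are in
microseconds.

```python
USEC_PER_MSEC = 1_000
USEC_PER_SEC = 1_000_000


def adjust_period(period_usec: int, quota_per_sec_usec: int) -> int:
    """Clamp the CFS period so that the period and (quota * period) are
    at least 1ms, and the period is at most 1s."""
    if quota_per_sec_usec <= 0:
        raise ValueError("quota must be positive")
    # Smallest period for which quota * period still reaches 1ms.
    min_for_quota = USEC_PER_MSEC * USEC_PER_SEC // quota_per_sec_usec
    lower = max(USEC_PER_MSEC, min_for_quota)
    # The 1s upper bound is applied last, so it always holds.
    return min(max(period_usec, lower), USEC_PER_SEC)
```

For example, a 10% quota (100000 usec per second) with a requested
period of 1ms would be raised to 10ms under these assumptions.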

(cherry picked from commit 10f28641115733c61754342d5dcbe70b083bea4b)

Resolves: #1770379

cgroup: use structured initialization

(cherry picked from commit de8a711a5849f9239c93aefa5554a62986dfce42)

Related: #1770379

time-util: Introduce parse_sec_def_infinity

This works like parse_sec() but defaults to USEC_INFINITY when passed an
empty string or only whitespace.

Also introduce config_parse_sec_def_infinity, which can be used to parse
config options using this function.

This is useful for time options that use "infinity" for default and that
can be reset by unsetting them.
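
The behaviour can be sketched in Python as a thin wrapper. The
parse_sec() below is a hypothetical stand-in that only understands plain
seconds and "infinity"; the real parse_sec() accepts many more formats.

```python
import math


def parse_sec(text: str) -> float:
    """Hypothetical stand-in: plain seconds or the word "infinity"."""
    text = text.strip()
    if text == "infinity":
        return math.inf
    return float(text)  # raises ValueError on empty or malformed input


def parse_sec_def_infinity(text: str) -> float:
    """Like parse_sec(), but empty or whitespace-only input means infinity."""
    if text.strip() == "":
        return math.inf
    return parse_sec(text)
```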

Introduce a test case to ensure it works as expected.

(cherry picked from commit 7b61ce3c44ef5908e817009ce4f9d2a7a37722be)

Related: #1770379

core: add IODeviceLatencyTargetSec

This adds support for the following proposed latency based IO control
mechanism.

https://lkml.org/lkml/2018/6/5/428

(cherry picked from commit 6ae4283cb14c4e4a895f4bbba703804e4128c86c)

Resolves: #1831519

core: coldplug possible nop_job

When a unit is in the INACTIVE or DEACTIVATING state, a job of type
JOB_TRY_RESTART or JOB_TRY_RELOAD is collapsed to JOB_NOP, and
u->nop_job is used instead of u->job.

If a JOB_NOP job is in the waiting state, a parallel daemon-reload just
installs it during deserialization. Without a coldplug, the job will
not be in m->run_queue, which results in a hung try-restart or
try-reload process.

Reproducer:

Run systemctl try-restart test.service (inactive) repeatedly in a terminal.
Run systemctl daemon-reload repeatedly in other terminals.

Once the problem is reproduced, systemctl list-jobs will list the hung job.

Upstream:
systemd/systemd#13124

(cherry picked from commit b49e14d5f3081dfcd363d8199a14c0924ae9152f)

Resolves: #1829798

mount: don't add Requires for tmp.mount

This is a follow-up to #1619292.

rhel-only
Resolves: #1748840

resolvconf: fixes for the compatibility interface

Also use compat_main() when called as `resolvconf`, since the interface
is closer to that of `systemd-resolve`.

Use a heap allocated string to set arg_ifname, since a stack allocated
one would be lost after the function returns. (This last one broke the
case where an interface name was suffixed with a dot, such as in
`resolvconf -a tap0.dhcp`.)

Tested:
$ build/resolvconf -a nonexistent.abc </etc/resolv.conf
Unknown interface 'nonexistent': No such device

Fixes #9423.

(cherry picked from commit 5a01b3f35d7b6182c78b6973db8d99bdabd4f9c3)

Resolves: #1835594

sulogin-shell: Use force if SYSTEMD_SULOGIN_FORCE set

When the root account is locked sulogin will either inform you of
this and not allow you in or if --force is used it will hand
you passwordless root (if using a recent enough version of util-linux).

Not being allowed a shell is of course inconvenient, but at the same
time handing out passwordless root unconditionally is probably not
a good idea everywhere.

This patch thus lets you choose the behaviour you want by setting the
SYSTEMD_SULOGIN_FORCE environment variable to true or false, e.g. by
adding this via 'systemctl edit rescue.service' (or emergency.service):

[Service]
Environment=SYSTEMD_SULOGIN_FORCE=1

Distributions that use locked root accounts and want the passwordless
behaviour could thus simply drop in the override file in
/etc/systemd/system/rescue.service.d/override.conf
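
How the variable could influence the sulogin invocation is sketched
below in Python. This is not the sulogin-shell code: env_flag() and
sulogin_argv() are hypothetical names, the list of accepted "true"
spellings is an assumption, and anything else is treated as false.

```python
# Assumed spellings of "true"; everything else counts as false here.
TRUE_WORDS = {"1", "yes", "y", "true", "t", "on"}


def env_flag(environ, name: str) -> bool:
    """Interpret an environment variable as a boolean, defaulting to false."""
    return environ.get(name, "").strip().lower() in TRUE_WORDS


def sulogin_argv(environ):
    """Build the sulogin command line, adding --force only when requested."""
    argv = ["sulogin"]
    if env_flag(environ, "SYSTEMD_SULOGIN_FORCE"):
        argv.append("--force")
    return argv
```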

Fixes: #7115
Addresses: https://bugs.debian.org/802211
(cherry picked from commit 33eb44fe4a8d7971b5614bc4c2d90f8d91cce66c)

Resolves: #1625929

tmpfiles: fix crash with NULL in arg_root and other fixes and tests

The function that inserts replacement paths into the configuration file list was borked.
Apart from the crash with empty root prefix, it would incorrectly handle the
case where root *was* set, and the replacement file was supposed to override
an existing file.

prefix_root is used instead of path_join because prefix_root removes duplicate
slashes (when --root=dir/ is used).

A test is added.

Fixes #11124.

(cherry picked from commit 082bb1c59bd4300bcdc08488c94109680cfadf57)

Resolves: #1836024

seccomp: fix __NR__sysctl usage

Loosely based on
https://github.com/systemd/systemd/pull/14032 and
https://github.com/systemd/systemd/pull/14268.

Related: #1843871

fuzz-compress: add fuzzer for compression and decompression

(cherry picked from commit 029427043b2e0523a21f54374f872b23cf744350)
Resolves: #1843871

journal: adapt for new improved LZ4_decompress_safe_partial()

With lz4 1.8.3, this function can now decompress partial results into a smaller
buffer. The release news don't say anything interesting, but the test case that
was previously failing now works OK.

Fixes #10259.

A test is added. It shows that with *older* lz4, a partial decompression can
occur with the returned size smaller than the requested number of bytes _and_
smaller than the size of the compressed data:

(lz4-libs-1.8.2-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 4194304
Decompressed partial 1/1 → -2 (bad)
Decompressed partial 2/2 → -2 (bad)
Decompressed partial 3/3 → -2 (bad)
Decompressed partial 4/4 → -2 (bad)
Decompressed partial 5/5 → -2 (bad)
Decompressed partial 6/6 → 6 (good)
Decompressed partial 7/7 → 6 (good)
Decompressed partial 8/8 → 6 (good)
Decompressed partial 9/9 → 6 (good)
Decompressed partial 10/10 → 6 (good)
Decompressed partial 11/11 → 6 (good)
Decompressed partial 12/12 → 6 (good)
Decompressed partial 13/13 → 6 (good)
Decompressed partial 14/14 → 6 (good)
Decompressed partial 15/15 → 6 (good)
Decompressed partial 16/16 → 6 (good)
Decompressed partial 17/17 → 6 (good)
Decompressed partial 18/18 → -16459 (bad)

(lz4-libs-1.8.3-1.fc29.x86_64)
Compressed 4194304 → 16464
Decompressed → 4194304
Decompressed partial 12/4194304 → 12
Decompressed partial 1/1 → 1 (good)
Decompressed partial 2/2 → 2 (good)
Decompressed partial 3/3 → 3 (good)
Decompressed partial 4/4 → 4 (good)
...

If we got such a short "successful" decompression in decompress_startswith() as
implemented before this patch, we could be confused and return a false negative
result. But it turns out that this only occurs with small output buffer
sizes. We use greedy_realloc() to manage the buffer, so it is always at least
64 bytes. I couldn't hit a case where decompress_startswith() would actually
return a bogus result. But since the lack of proof is not conclusive, the code
for *older* lz4 is changed too, just to be safe. We cannot rule out that on a
different architecture or with some unlucky compressed string we could hit this
corner case.

The fallback code is guarded by a version check. The check uses a function, not
the compile-time define, because there was no soversion bump in lz4 or new
symbols, and we could be compiled against a newer lz4 and linked at runtime
with an older one. (This happens routinely e.g. when somebody upgrades a subset
of distro packages.)
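
The decision behind the runtime check can be sketched in Python. The
helper name partial_decompress_is_reliable() is hypothetical; it assumes
the caller obtained the number from lz4's LZ4_versionNumber() at
runtime, which encodes a release as MAJOR*10000 + MINOR*100 + RELEASE.

```python
# 1.8.3 encoded the way LZ4_versionNumber() reports it.
LZ4_VERSION_1_8_3 = 1 * 100 * 100 + 8 * 100 + 3


def partial_decompress_is_reliable(runtime_version_number: int) -> bool:
    """True if the library we are linked against at runtime is new enough
    to use LZ4_decompress_safe_partial() without the fallback path."""
    return runtime_version_number >= LZ4_VERSION_1_8_3
```

A compile-time constant would describe the headers we were built
against, not the library actually loaded, which is why the value is
taken as a runtime argument here.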

(cherry picked from commit e41ef6fd0027d3619dc1cf062100b2d224d0ee7e)
Resolves: #1843871

test-compress: add test for short decompress_startswith calls

I thought this might fail with lz4 < 1.8.3, but it seems that because of
greedy_realloc, we always use a buffer that is large enough, and it always
passes.

(cherry picked from commit ba17efce44e6a1e139c1671205e9a6ed3824af1b)
Resolves: #1843871

Drop support for lz4 < 1.3.0

lz4-r130 was released on May 29th, 2015. Let's drop the work-around for older
versions. In particular, we won't test any new code against those ancient
releases, so we shouldn't pretend they are supported.

(cherry picked from commit e0a1d4b049e6991919a0eacd5d96f7f39dc6ddd1)
Resolves: #1843871

core: ExecCondition= for services

Closes #10596

(cherry picked from commit 31cd5f63ce86a0784c4ef869c4d323a11ff14adc)

Resolves: #1737283

test-execute: provide custom failure message

test_exec_ambientcapabilities: exec-ambientcapabilities-nobody.service: exit status 0, expected 1

Sometimes we get just the last line, for example from the failure summary,
so make it as useful as possible.

(cherry picked from commit 6aed6a11577b108b9a39f26aeae5e45d98f20c90)

Related: #1737283

test-execute: allow filtering test cases by pattern

When debugging failure in one of the cases, it's annoying to have to wade
through the output from all the other cases. Let's allow picking select
cases.

(cherry picked from commit 9efb96315ae502dabeb94ab35816ea8955563b7a)

Related: #1737283

tests: always use the right vtable wrapper calls

Prompted by https://github.com/systemd/systemd/pull/10836#discussion_r234598868

(cherry picked from commit bd7989a3d90e5d97e09f1eef33d09b2469a79f4d)

Related: #1737283

core: log a recognizable message when a unit succeeds, too

We already are doing it on failure, let's do it on success, too.

Fixes: #10265
(cherry picked from commit 523ee2d41471bfb738f52d59de9b469301842644)

Related: #1737283

core: make log messages about units entering a 'failed' state recognizable

Let's make this recognizable, and carry result information in a
structured fashion.

(cherry picked from commit 7c047d7443347c109daf67023a01c118b5f361eb)

Related: #1737283

core: split out all logic that updates a Job on a unit's unit_notify() invocation

Just some refactoring, no change in behaviour.

(cherry picked from commit 16c74914d233ec93012d77e5f93cf90e42939669)

Related: #1737283

job: when a job was skipped due to a failed condition, log about it

Previously we'd neither show console status output nor log output. Let's
fix that, and still log something.

(cherry picked from commit 9a80f2f4533883d272e6a436512aa7e88cedc549)

Related: #1737283

core: move unit_status_emit_starting_stopping_reloading() and related calls to job.c

This call is only used by job.c and very specific to job handling.
Moreover the very similar logic of job_emit_status_message() is already
in job.c.

Hence, let's clean this up, and move both sets of functions to job.c,
and rename them a bit so that they express precisely what they do:

1. unit_status_emit_starting_stopping_reloading() →
job_emit_begin_status_message()
2. job_emit_status_message() → job_emit_done_status_message()

The first call is, after all, what we call when we begin with the
execution of a job, and the second call is what we call when we are done
with it.

Just some moving and renaming, no other changes, and hence no change in
behaviour.

(cherry picked from commit 33a3fdd9781329379f74e11a7a2707816aad8c61)

Related: #1737283

nspawn: chown() the legacy hierarchy when it's used in a container

This is a follow-up to 720f0a2f3c928cc9379501a52146be9fbb4d9be2.

Closes https://github.com/systemd/systemd/issues/10026
Closes https://github.com/systemd/systemd/issues/9563

(cherry picked from commit 89f180201cd8c0f3ce5cb6e8dd7e2b3cbcf71527)

Resolves: #1837094

nspawn: move payload to sub-cgroup first, then sync cgroup trees

If we sync the legacy and unified trees before moving to the right
subcgroup then ultimately the cgroup paths in the hierarchies will be
out-of-sync... Hence, let's move the payload first, and sync then.

Addresses: https://github.com/systemd/systemd/pull/9762#issuecomment-441187979
(cherry picked from commit 27da7ef0d09e00eae821f3ef26e1a666fe7aa087)

Resolves: #1837094

Add support for opening files for appending

Addresses part of #8983

(cherry picked from commit 566b7d23eb747e9c5a74e5647693077b52395fc5)

Resolves: #1809175

man: be clearer that .timer time expressions need to be reset to override them

Let's be clearer about the overriding concept for OnCalendar= settings.

Prompted by this thread:

https://lists.freedesktop.org/archives/systemd-devel/2019-March/042351.html
(cherry picked from commit 58031d99c6320855b86f4890baa9165597e3d841)

Resolves: #1816908

udev-rules: make tape-changers also appear in /dev/tape/by-path/

It is important to be able to access tape changers ("Medium Changers")
by persistent name.
While tape devices can be accessed via /dev/tape/by-id/ and
/dev/tape/by-path/, tape-changers could only be accessed by
/dev/tape/by-id/.
However, in some cases, especially when accessing Amazon Web Services
Storage Gateway VTLs (or accessing iSCSI VTLs in general?) this does not
work, as all tape devices and the tape changer have the same ENV{ID_SERIAL}.
The result is that only the last device is available in
/dev/tape/by-id/, as the former devices have been overwritten.

As this behavior is hard to change without breaking consistency,
this additional device in /dev/tape/by-path/ can be used to access the
medium changers.
The tape devices can also be accessed by this path.

The content of the directory will now look like:

  # SCSI tape device, rewind (unchanged)
  /dev/tape/by-path/$env{ID_PATH} -> ../../st*

  # SCSI tape device, no-rewind (unchanged)
  /dev/tape/by-path/$env{ID_PATH}-nst -> ../../nst*

  # SCSI tape changer device (newly added)
  /dev/tape/by-path/$env{ID_PATH}-changer -> ../../sg*

Tape devices and tape changers have different ID_PATHs.
SCSI tape changers get the suffix "-changer"
to make them easier to distinguish from tape devices.

(cherry picked from commit 7f8ddf96a25162f06bd94a684cf700c128d18142)

Resolves: #1820112

pid1: add new kernel cmdline arg systemd.cpu_affinity=

Let's allow configuration of the CPU affinity via the kernel cmdline,
overriding CPUAffinity= in /etc/systemd/system.conf

Prompted by:

https://lists.freedesktop.org/archives/systemd-devel/2019-November/043754.html

(cherry picked from commit 68d58f38693e586b5ce5785274f8e42a79625196)

Resolves: #1812894

test: store coredumps in journal

To make debugging much easier, especially for crashes in tests under
QEMU, let's store the entire coredump bundle in the systemd journal,
which is usually kept around by various CIs. Right now, we usually end
up with a journal, but without the coredump itself, which is pretty
useless.

(cherry picked from commit 215bffe1b8d7cb72fe9f72ed53682d52d5c2a9c5)

Related: #1823767

test: try to determine QEMU_SMP dynamically

If the QEMU_SMP value has not been explicitly set, try to determine it
from the number of online CPUs using the nproc utility. If this approach
fails, fall back to the default value QEMU_SMP=1.
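
The fallback order can be sketched in Python. The test suite itself uses
the nproc utility; os.cpu_count() is swapped in here to keep the example
self-contained, and it may report a different number than nproc when CPU
affinity is restricted. The function name determine_qemu_smp() is
hypothetical.

```python
import os


def determine_qemu_smp(environ=os.environ, cpu_count=os.cpu_count) -> str:
    """Pick QEMU_SMP: explicit setting first, then CPU count, then "1"."""
    explicit = environ.get("QEMU_SMP")
    if explicit:
        return explicit
    try:
        count = cpu_count()
    except OSError:
        count = None
    # cpu_count() may return None when the number cannot be determined.
    return str(count) if count else "1"
```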

This change should significantly help when running integration tests
under QEMU on multicore systems.

(cherry picked from commit 5bfb2a93a4a36bba0d24199553dcda6e560cbb75)

Related: #1823767

test: parallelize tasks in TEST-24-UNIT-TESTS

(cherry picked from commit 2f2a0454efd07644a4e0ccb3f00f1db2d7043391)

Related: #1823767

test: make test-catalog relocatable

Fixes #10045.

(cherry picked from commit d9b6baa69968132d33e4ad8627c7fe0bd527c859)

Resolves: #1823767

test: introduce test_is_running_from_builddir()

(cherry picked from commit 8cb10a4f4dabc508a04f76ea55f23ef517881b61)

Resolves: #1823767

test-execute: skip several tests when running in container

(cherry picked from commit 642d1a6d6e98204ade25816bcc429cb67df92a29)

Resolves: #1823767

test-execute: also check python3 is installed or not

(cherry picked from commit 738c74d7b163ea18e3c68115c3ed8ceed166cbf7)

Resolves: #1823767

test-process-util: skip several verifications when running in unprivileged container

(cherry picked from commit 767eab47501b06327a0e6030e5c54860a3fc427f)

Resolves: #1823767

test-fs-util: skip some tests when running in unprivileged container

(cherry picked from commit 9590065f37be040996f1c2b9a246b9952fdc0c0b)

Resolves: #1823767

test: make install_keymaps() optionally install more keymaps

(cherry picked from commit ad931fee506e1313e8a520ae0ecc1c8e275d9941)

Resolves: #1823767

test: add paths of keymaps in install_keymaps()

It seems that the paths of the directories storing keymaps have changed.

(cherry picked from commit 83a7051ee1edbfe8cd2278477d23083beb385409)

Resolves: #1823767

test: replace duplicated Makefile by symbolic link

(cherry picked from commit dd75c133d81f07c56c82ee4e7a80f391ffebd9ce)

Resolves: #1823767

test: introduce install_zoneinfo()

But it is not called by default.

(cherry picked from commit 7d10ec1cda8fed20c36b16d2387f529583645cda)

Resolves: #1823767

test: install libraries required by tests

(cherry picked from commit e3d3dada248c5f30e2978840ca1f0a03a4675b53)

Resolves: #1823767

test: do not use global variable to pass error

(cherry picked from commit 0013fac248a15be3acce84c17a65e3ae0377294b)

Resolves: #1823767

logind: check PolicyKit before allowing VT switch

Let's lock this down a bit. Effectively nothing much changes, since the
default PK policy will allow users on the VT to change VT. Only users
with no local VT session won't be able to switch VTs.

(cherry picked from commit 4acf0cfd2f92edb94ad48d04f1ce6c9ab4e19d55)

Resolves: #1797679

udev: downgrade message when we fail to set inotify watch up

My logs are full of:

systemd-udevd[6586]: seq 13515 queued, 'add' 'block'
systemd-udevd[6586]: seq 13516 queued, 'change' 'block'
systemd-udevd[6586]: seq 13517 queued, 'change' 'block'
systemd-udevd[6586]: seq 13518 queued, 'remove' 'bdi'
systemd-udevd[6586]: seq 13519 queued, 'remove' 'block'
systemd-udevd[9865]: seq 13514 processed
systemd-udevd[9865]: seq 13515 running
systemd-udevd[9865]: GROUP 6 /usr/lib/udev/rules.d/50-udev-default.rules:59
systemd-udevd[9865]: IMPORT builtin 'blkid' /usr/lib/udev/rules.d/60-persistent-storage.rules:95
systemd-udevd[9865]: IMPORT builtin 'blkid' fails: No such file or directory
systemd-udevd[9865]: loop4: Failed to add device '/dev/loop4' to watch: No such file or directory
(the last line is at error level).
If we are too slow to set up a watch and the device is already gone by the time
we try, this is not an error.

(cherry picked from commit 7fe0d0d5c0ad5aa3f069bb282868938d414d7ad1)

Resolves: #1808051
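
The downgrade described in the udev commit above can be pictured as picking the log level from the error code. This is a minimal Python sketch, not the udev code: `watch_failure_level` is a hypothetical helper, and it assumes the downgraded level is "debug" and that only ENOENT ("No such file or directory", as in the log excerpt) is treated as harmless.

```python
import errno
import logging


def watch_failure_level(err):
    """Pick a log level for a failed attempt to add an inotify watch.

    Assumption: a device that vanished before the watch was set up
    (ENOENT) is only worth a debug message; anything else stays an error.
    """
    if err == errno.ENOENT:
        return logging.DEBUG
    return logging.ERROR


def report_watch_failure(logger, devnode, err):
    # Emit the message at the level chosen above.
    logger.log(watch_failure_level(err),
               "Failed to add device '%s' to watch: %s",
               devnode, errno.errorcode.get(err, str(err)))
```

With this shape, a device that disappears early no longer produces an error-level line, while other failures (for example ENOSPC) are still reported as errors.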

sd-journal: remove the dead code and actually fix #14695

journal_file_fstat() returns an error if we call it on already unlinked
journal file and hence we never reach remove_file_real() which is the
entire point.

I must have made some mistake while testing the fix, which led me to think
the issue was gone while the opposite was true.

Fixes #14695

(cherry picked from commit 8581b9f9732d4c158bb5f773230a65ce77f2c292)

Resolves: #1796128

sd-journal: close journal files that were deleted by journald before we've set up inotify watch

Fixes #14695

(cherry picked from commit 28ca867abdb20d0e4ac1901e2ed669cdb41ea3f6)

Related: #1796128

core: transition to FINAL_SIGTERM state after ExecStopPost=

Fixes #14566

(cherry picked from commit c1566ef0d22ed786b9ecf4c476e53b8a91e67578)

Resolves: #1766479

basic: use comma as separator in cpuset cgroup cpu ranges

This is a workaround for
https://bugzilla.redhat.com/show_bug.cgi?id=1819152 and should be
reverted in RHEL-8.3.

RHEL-only

Related: #1818054
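
To illustrate what "comma as separator" means for the cpuset commit above, here is a minimal Python sketch that renders a set of CPU indices as ranges. It is not the systemd function: `cpus_to_range_string` and its `sep` parameter are hypothetical names, and the space separator used for comparison is an assumption.

```python
def cpus_to_range_string(cpus, sep=","):
    """Render CPU indices as a range list, e.g. [0, 1, 2, 3, 8] -> '0-3,8'."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            # Extend the current range by one CPU.
            ranges[-1][1] = cpu
        else:
            # Start a new range.
            ranges.append([cpu, cpu])
    return sep.join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)
```

Calling `cpus_to_range_string([0, 1, 2, 3, 8])` gives `0-3,8`, while passing `sep=" "` gives `0-3 8` for the same input.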

core: fix re-realization of cgroup siblings

This is a fix-up for eef85c4a3f8054d29383a176f6cebd1ef3a15b9a which
broke this.

Tracked down by @w-simon

Fixes: #14453
(cherry picked from commit 65f6b6bdcb500c576674b5838e4cc4c35e18bfde)

Related: #1818054

pid1: fix the names of AllowedCPUs= and AllowedMemoryNodes=

The original PR was submitted with CPUSetCpus and CPUSetMems, which was later
changed to AllowedCPUs and AllowedMemoryNodes everywhere (including the parser
used by systemd-run), but not in the parser for unit files.

Since we already released -rc1, let's keep support for the old names. I think
we can remove it in a release or two if anyone remembers to do that.

Fixes #14126. Follow-up for 047f5d63d7a1ab75073f8485e2f9b550d25b0772.

(cherry picked from commit 0b8d3075872a05e0449906d24421ce192f50c29f)

Related: #1818054
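
Keeping the old setting names working, as the pid1 commit above describes, amounts to mapping them onto the new ones while parsing. The following Python sketch shows the idea only; the alias table and the helpers `canonical_key` and `parse_unit_line` are hypothetical and are not how systemd's unit file parser is implemented.

```python
# Old names from the original PR, mapped to the names used everywhere else.
LEGACY_NAMES = {
    "CPUSetCpus": "AllowedCPUs",
    "CPUSetMems": "AllowedMemoryNodes",
}


def canonical_key(key):
    """Return the current name for a setting, accepting the old spelling too."""
    return LEGACY_NAMES.get(key, key)


def parse_unit_line(line):
    """Split a 'Key=Value' line and normalize the key."""
    key, _, value = line.partition("=")
    return canonical_key(key.strip()), value.strip()
```

Both `CPUSetCpus=0-3` and `AllowedCPUs=0-3` then yield `("AllowedCPUs", "0-3")`, so unit files written against -rc1 keep working.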

core: rework StopWhenUnneeded= logic

Previously, we'd act immediately on StopWhenUnneeded= when a unit state
changes. With this rework we'll maintain a queue instead: whenever
there's the chance that StopWhenUnneeded= might have an effect we enqueue
the unit, and process it later when we have nothing better to do.

This should make the implementation a bit more reliable, as the unit notify event
cannot immediately enqueue tons of side-effect jobs that might
contradict each other, but we do so only in a strictly ordered fashion,
from the main event loop.

This slightly changes the check when to consider a unit "unneeded".
Previously, we'd assume that a unit in "deactivating" state could also
be cleaned up. With this new logic we'll only consider units unneeded
that are fully up and have no job queued. This means that whenever
there's something pending for a unit we won't clean it up.

(cherry picked from commit a3c1168ac293f16d9343d248795bb4c246aaff4a)

Resolves: #1798046
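
The queue-based StopWhenUnneeded= logic described above can be sketched as follows. This is a simplified Python model, not the pid1 code: the `Unit` and `Manager` classes, the `needed_by` list, and the rule that a dependent unit counts as "needing" this one while it is active or has a job are assumptions made for illustration. The part taken from the commit message is that a unit is only considered unneeded when it is fully up and has no job queued, and that the check runs later from the main loop rather than immediately.

```python
from collections import deque


class Unit:
    def __init__(self, name, state="active", stop_when_unneeded=True):
        self.name = name
        self.state = state
        self.stop_when_unneeded = stop_when_unneeded
        self.job = None          # pending job type, or None
        self.needed_by = []      # units that pull this one in (assumption)
        self.in_queue = False


class Manager:
    def __init__(self):
        self.queue = deque()
        self.stopped = []

    def submit(self, unit):
        """Called on state changes: only enqueue, never act immediately."""
        if unit.stop_when_unneeded and not unit.in_queue:
            unit.in_queue = True
            self.queue.append(unit)

    def is_unneeded(self, unit):
        # Only units that are fully up and have no job queued qualify.
        if unit.state != "active" or unit.job is not None:
            return False
        return not any(u.state == "active" or u.job is not None
                       for u in unit.needed_by)

    def dispatch(self):
        """Called from the main loop when there is nothing better to do."""
        while self.queue:
            unit = self.queue.popleft()
            unit.in_queue = False
            if self.is_unneeded(unit):
                unit.job = "stop"
                self.stopped.append(unit.name)
```

Because `submit()` only records the unit, several state changes in a row lead to a single, ordered evaluation in `dispatch()` instead of a burst of possibly contradictory jobs.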

bus_open leaks sd_event_source when running udevadm trigger

On my host, when executing udevadm trigger, I only receive the change event, which causes a memory leak.

(cherry picked from commit b2774a3ae692113e1f47a336a6c09bac9cfb49ad)

Resolves: #1798504

resolved: Recover missing PrivateTmp=yes and ProtectSystem=strict

Since the commit b61e8046ebcb28225423fc0073183d68d4c577c4,
systemd-resolved.service often fails to start with the following message:

Failed at step NAMESPACE spawning /usr/bin/mount: Read-only file system

This is because dropping DynamicUser=yes also dropped the implicit PrivateTmp=yes
and the implicit After=systemd-tmpfiles-setup.service, so
systemd-resolved.service can start before systemd-remount-fs.service. As a
result, the mount operations associated with PrivateDevices= can be attempted
on filesystems that are still read-only.

To fix this issue, it's better to restore PrivateTmp=yes and
ProtectSystem=strict, just as the upstream commit
62fb7e80fcc45a1530ed58a84980be8cfafa9b3e (Revert "resolve: enable DynamicUser=
for systemd-resolved.service") did.

Resolves: #1810869

swap: finish the secondary swap units' jobs if deactivation of the primary swap unit fails

Currently, if deactivation of the primary swap unit fails:

    # LANG=C systemctl --no-pager stop dev-mapper-fedora\\x2dswap.swap
    Job for dev-mapper-fedora\x2dswap.swap failed.
    See "systemctl status "dev-mapper-fedora\\x2dswap.swap"" and "journalctl -xe" for details.

then there are still the running stop jobs for all the secondary swap units
that follow the primary one:

    # systemctl list-jobs
     JOB UNIT                                                                                                         TYPE STATE
     3233 dev-disk-by\x2duuid-2dc8b9b1\x2da0a5\x2d44d8\x2d89c4\x2d6cdd26cd5ce0.swap                                    stop running
     3232 dev-dm\x2d1.swap                                                                                             stop running
     3231 dev-disk-by\x2did-dm\x2duuid\x2dLVM\x2dyuXWpCCIurGzz2nkGCVnUFSi7GH6E3ZcQjkKLnF0Fil0RJmhoLN8fcOnDybWCMTj.swap stop running
     3230 dev-disk-by\x2did-dm\x2dname\x2dfedora\x2dswap.swap                                                          stop running
     3234 dev-fedora-swap.swap                                                                                         stop running

    5 jobs listed.

These jobs remain forever because their JobTimeoutUSec is infinity:

    # LANG=C systemctl show -p JobTimeoutUSec dev-fedora-swap.swap
    JobTimeoutUSec=infinity

If this issue happens during system shutdown, the shutdown appears to hang
and the system is forcibly shut down or rebooted 30 minutes later because of
the following configuration:

    # grep -E "^JobTimeout" /usr/lib/systemd/system/reboot.target
    JobTimeoutSec=30min
    JobTimeoutAction=reboot-force

The real-world scenario seems to be that there is some service unit with
KillMode=none: processes whose memory is swapped out are not killed when the
service unit is stopped, and then the swapoff command fails.

On the other hand, everything works when the swapoff command succeeds, because
the secondary jobs monitor the /proc/swaps file and can detect the removal of
the corresponding swap entry.

This commit fixes the issue by finishing the secondary swap units' jobs if
deactivation of the primary swap unit fails.

Fixes: #11577
(cherry picked from commit 9c1f969d40f84d5cc98d810bab8b24148b2d8928)

Resolves: #1749622
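
The fix in the swap commit above can be modeled in a few lines. This Python sketch is illustrative only: `SwapUnit`, `finish_stop`, and the choice to mark the secondary jobs as "failed" are assumptions, not the actual pid1 implementation. It shows the one behavior the commit message states: when deactivation of the primary swap unit fails, the stop jobs of the units following it are finished too instead of being left running.

```python
class SwapUnit:
    def __init__(self, name):
        self.name = name
        self.job = "stop"      # a stop job is pending for every unit
        self.result = None
        self.followers = []    # secondary units that follow this primary


def finish_stop(primary, success):
    """Finish the primary's stop job; on failure, finish the followers' too."""
    primary.job = None
    primary.result = "done" if success else "failed"
    if success:
        # On success the secondary jobs complete on their own once the
        # entry disappears from /proc/swaps, so nothing to do here.
        return
    for unit in primary.followers:
        if unit.job == "stop":
            unit.job = None
            unit.result = "failed"  # assumed result, for illustration
```

After a failed `swapoff`, no follower is left with a running stop job, so nothing waits on the infinite JobTimeoutUSec.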

cryptsetup: Treat key file errors as a failed password attempt

6f177c7dc092eb68762b4533d41b14244adb2a73 caused key file errors to fail
immediately, which would make it hard to correct an issue due to e.g. a
crypttab typo or a damaged key file.

Closes #11723.

(cherry picked from commit c20db3887569e0c0d9c0e2845c5286e7edf0133a)

Related: #1763155
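
Treating a key file error as a failed attempt, as in the cryptsetup commit above, means the unlock loop continues and falls back to asking for a password rather than aborting. The Python sketch below models that control flow under stated assumptions: `unlock` and its callback parameters are hypothetical, the default of three tries is an illustrative value, and after any key file attempt the sketch switches to interactive prompts.

```python
def unlock(read_key_file, ask_password, try_unlock, tries=3):
    """Try to unlock a volume, counting key file errors as failed attempts."""
    use_key_file = read_key_file is not None
    for _ in range(tries):
        if use_key_file:
            # Whatever happens, later attempts ask interactively.
            use_key_file = False
            try:
                secret = read_key_file()
            except OSError:
                # Unreadable or missing key file: one attempt used up,
                # but not a fatal error.
                continue
        else:
            secret = ask_password()
        if try_unlock(secret):
            return True
    return False
```

A mistyped key file path in crypttab therefore costs one attempt, and the user can still unlock the volume by typing the password.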

test: replace cursor file with a plain cursor

systemd in RHEL 8 doesn't support the --cursor-file option, so let's
fall back to a plain cursor string.

Related: #1808940
rhel-only

test: drop the missed || exit 1 expression

...as we've already done in the rest of the testsuite, see
cc469c3dfc398210f38f819d367e68646c71d8da

(cherry picked from commit 67c434b03f8a24f5350f017dfb4b2464406046db)

Related: #1808940

test: add a simple sanity check for systems without NUMA support

(cherry picked from commit 92f8e978923f962a57d744c5f358520ac06f7892)

Related: #1808940

test: give strace some time to initialize

The `coproc` implementation seems to be a little bit different in older
bash versions, so `strace` is sometimes started AFTER `systemctl
daemon-reload`, which causes unexpected failures. Let's help it a little by
sleeping for a bit.

(cherry picked from commit c7367d7cfdfdcec98f8659f0ed3f1d7b77123903)

Related: #1808940