2,209,147 events recorded by gharchive.org of which 2,209,147 were push events containing 3,350,540 commit messages that amount to 241,492,037 characters filtered with words.py@e23d022007... to these 30 messages:
mm: vmscan: fix extreme overreclaim and swap floods
During proactive reclaim, we sometimes observe severe overreclaim, with several thousand times more pages reclaimed than requested.
This trace was obtained from shrink_lruvec() during such an instance:
prio:0 anon_cost:1141521 file_cost:7767
nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
nr=[7161123 345 578 1111]
While the reclaimer requested 4M, vmscan reclaimed close to 16G, most of it by swapping. These requests take over a minute, during which the write() to memory.reclaim is unkillably stuck inside the kernel.
Digging into the source, this is caused by the proportional reclaim bailout logic. This code tries to resolve a fundamental conflict: to reclaim roughly what was requested, while also aging all LRUs fairly and in accordance to their size, swappiness, refault rates etc. The way it attempts fairness is that once the reclaim goal has been reached, it stops scanning the LRUs with the smaller remaining scan targets, and adjusts the remainder of the bigger LRUs according to how much of the smaller LRUs was scanned. It then finishes scanning that remainder regardless of the reclaim goal.
This works fine if priority levels are low and the LRU lists are comparable in size. However, in this instance, the cgroup that is targeted by proactive reclaim has almost no files left - they've already been squeezed out by proactive reclaim earlier - and the remaining anon pages are hot. Anon rotations cause the priority level to drop to 0, which results in reclaim targeting all of anon (a lot) and all of file (almost nothing). By the time reclaim decides to bail, it has scanned most or all of the file target, and therefore must also scan most or all of the enormous anon target. This target is thousands of times larger than the reclaim goal, thus causing the overreclaim.
The bailout code hasn't changed in years, why is this failing now? The most likely explanations are two other recent changes in anon reclaim:
- Before the series starting with commit 5df741963d52 ("mm: fix LRU balancing effect of new transparent huge pages"), the VM was overall relatively reluctant to swap at all, even if swap was configured. This means the LRU balancing code didn't come into play as often as it does now, and mostly in high pressure situations where pronounced swap activity wouldn't be as surprising.
- For historic reasons, shrink_lruvec() loops on the scan targets of all LRU lists except the active anon one, meaning it would bail if the only remaining pages to scan were active anon - even if there were a lot of them.
  Before the series starting with commit ccc5dc67340c ("mm/vmscan: make active/inactive ratio as 1:1 for anon lru"), most anon pages would live on the active LRU; the inactive one would contain only a handful of preselected reclaim candidates. After the series, anon gets aged similarly to file, and the inactive list is the default for new anon pages as well, making it often the much bigger list.
As a result, the VM is now more likely to actually finish large anon targets than before.
Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the larger LRU lists is made before bailing out on a met reclaim goal.
This fixes the extreme overreclaim problem.
Fairness is more subtle and harder to evaluate. No obvious misbehavior was observed on the test workload, in any case. Conceptually, fairness should primarily be a cumulative effect from regular, lower priority scans. Once the VM is in trouble and needs to escalate scan targets to make forward progress, fairness needs to take a backseat. This is also acknowledged by the myriad exceptions in get_scan_count(). This patch makes fairness decrease gradually, as it keeps fairness work static over increasing priority levels with growing scan targets. This should make more sense - although we may have to re-visit the exact values.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner [email protected] Reviewed-by: Rik van Riel [email protected] Acked-by: Mel Gorman [email protected] Cc: Hugh Dickins [email protected] Cc: Joonsoo Kim [email protected] Cc: [email protected] Signed-off-by: Andrew Morton [email protected]
Details that will be updated with more info as it occurs to me
Notable Abilities: Augmented strength, perception, and pain tolerance when under extreme stress. Capacity to shapeshift, wildly altering form in accordance with will. Intrinsic magical abilities of suspected Abyssal origin. Formidable opponent in weapon and hand-to-hand combat.
Personality: Distrustful, she is aloof with all but those who earn her fidelity with time and commitment. Superficially congenial, a great wound rests upon her heart, leading her to reach out desperately for love from any whom she feels can provide it. This wound also fuels the rage with which she strikes out at the world, often on the behalf of the abused and down-trodden that mirror the horrors of her own life.
Drive: Hers is a nomad's grace. Little ties her to any one place, and she acts in accordance with the law of her spirit. Her only overarching goal is to understand just where she comes from, and why the power of the Abyss resides within her.
Relax shader strictness in RELEASE mode (#13998)
Disables strictness and warnings as errors for custom pixel shaders in RELEASE. Windows terminal is not telling the user why the shader won't compile which makes it very frustrating for the shader hacker.
After trying the recent preview, none of my shaders loaded anymore in Windows Terminal Preview, which made me very sad. I had no idea what was wrong with them. After cloning the git repo, building it, and fighting an issue that prevented the DEBUG SDK from being used, I finally was able to identify some issues that were blocking my shaders.
error X3556: integer modulus may be much slower, try using uints if possible.
error X4000: use of potentially uninitialized variable (rayCylinder)
While the first one is a good warning, I don't think it should be an error, and the tools I use didn't flag it, so it was hard to know.
The second one I was staring at for a while and was unable to identify what exactly was causing the issue; I fumbled with the code a few times and just felt the fun drain away.
IMHO: I want it to be fun to develop shaders for Windows Terminal. Fighting invisible errors is not fun. I am not after building production shaders for Windows Terminal, I want some cool effects. So while I, as a .NET developer, always run with warnings as errors, I don't think it's the right option here. Especially since Windows Terminal doesn't tell you what the problem is.
However, I understand if the shaders you ship with Windows Terminal should be free of errors and silly mistakes, so I kept the stricter setting in DEBUG mode.
Loaded Windows Terminal in RELEASE and DEBUG mode and validated that RELEASE mode had reduced strictness but DEBUG retained the previous more restrictive mode.
Update codebase to use IntoIterator where possible. (#5269)
Remove unnecessary calls to iter()/iter_mut().
Mainly updates the use of queries in our code, docs, and examples.
// From
for _ in list.iter() {
for _ in list.iter_mut() {
// To
for _ in &list {
for _ in &mut list {
We already enable the pedantic lint clippy::explicit_iter_loop inside of Bevy. However, this only warns for a few known types from the standard library.
As you can see the additions and deletions are exactly equal. Maybe give it a quick skim to check I didn't sneak in a crypto miner, but you don't have to torture yourself by reading every line. I already experienced enough pain making this PR :)
Co-authored-by: devil-ira [email protected]
Fixes asset caching (#69852)
The asset was being loaded, seeing that fully_generated is false, so it attempts to rebuild. The rebuilding clears the existing file cache, and fucks us.
Life is pain.
MUAHAHHAHAHAHAA theme
i got a buncha themes man fuck you pee pee poo poo lookin ass bare ass no undie lookin ass chicken tendies lookin ass people fuckin lookin ass
Delete My Girlfriend Cheated on Me With a Senior, so I’m Cheating on Her With His Girlfriend.html
mankind knew they could not change society so instead of reflecting upon themselves they blamed the beasts heaven or hell ooh that's a nice jacket you got there wait which one of us is real again beats me duel 1 lets rock
[FSDP] Generalize prefetching; lower unshard/reshard to handle (#83665)
- `self.sharding_strategy`
  - If the world size is 1, I clamp the sharding strategy to `NO_SHARD`, regardless of the passed-in sharding strategy, since the behavior is fully equivalent. This absolves the need for `p._is_sharded or self.world_size == 1` checks in the core code. Once we fully shift the paradigm to using handles, this should result in a clear net positive. However, for now, we still have some places where we interface directly with the `FlatParameter`, in which case we have some temporary hacky code.
- `HandleConfig` (a rough illustrative sketch follows this list)
  - As a part of the new design abstraction, much logic is lowered to the `FlatParamHandle`. This requires the handle be aware of mixed precision, CPU offloading, sharding strategy, and the process group (for world size > 1). To be less error-prone, I re-defined the `dataclass`s and `enum`s for the handle. These can be removed and coalesced with the existing ones.
  - The drawback is that the `FlattenParamsWrapper` constructor now takes in the `HandleConfig` to forward it to the `FlatParamHandle` constructor. I tolerate this since we plan to retire the FPW. For now, the handle's process group attributes are set later when we call `handle.shard()`.
  - We will dive into this logic lowering later. For now, the idea is we need to pass some extra info to the handle, which must go through the FPW.
- `FullyShardedDataParallel._shard_parameters()` -> `FlatParamHandle.shard()`
- [Important] Generalizing attributes to remove the 1 `FullyShardedDataParallel` : 1 `FlatParameter` assumption
  - Before: `_fsdp_graph_order`, `_pre_backward_hook_full_params_prefetched`, `_forward_full_params_prefetched`, `reshard_after_forward` are with respect to 1 `FullyShardedDataParallel`
  - After: (1) We use `FlatParamHandle` in place of `FullyShardedDataParallel`. (2) The atomic unit for forward and pre-backward is a group of handles involved in the same module's forward/pre-backward. This is represented as `Tuple[FlatParamHandle, ...]`. For now, this is always a singleton tuple, but this shift enables a module having multiple FSDP parameters (which we have use cases for).
- `_reset_lazy_init()` attributes
  - The prefetched flags are merged into `self._handles_prefetched`, which is directly defined in the constructor. `reshard_after_forward` is retired since it can be fully determined by other attributes (`_is_root` and `sharding_strategy`).
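To make the handle-level config concrete, here is a minimal illustrative sketch of the kind of `dataclass`/`enum` pair described in the `HandleConfig` bullet above. The class and field names here are hypothetical, not the PR's actual definitions:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

import torch


class HandleShardingStrategy(Enum):
    # Hypothetical handle-level mirror of the FSDP sharding strategies
    # referenced above.
    NO_SHARD = auto()
    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()


@dataclass
class HandleConfig:
    # Illustrative only: per the description, the handle must know the
    # sharding strategy, whether parameters are CPU-offloaded, and which
    # low precision dtypes (if any) to use for parameters and reduction.
    sharding_strategy: HandleShardingStrategy
    offload_params: bool
    param_dtype: Optional[torch.dtype]
    reduce_dtype: Optional[torch.dtype]
```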
The first step is to read the existing `_rebuild_full_params()`. A few notable observations:
- It returns `Tuple[Tensor, bool]`. The first element is the padded unsharded flattened parameter, and the second element is whether we can free it upon exiting `summon_full_params()`. This return value is only used in `summon_full_params()`.
- If parameter mixed precision is enabled and the `FlatParameter` is already unsharded, then the low precision shard (`_mp_shard`) is still re-allocated on GPU. (It is freed at the end of the method.)
- If CPU offloading is enabled and the `FlatParameter` is already unsharded, then there is a no-op `p.data = p.data.to(self.compute_device, non_blocking=True)`.
- Inside `summon_full_params()`, `mixed_precision_cast_ran` is always `False`. Therefore, the return value for the `not p._is_sharded and mixed_precision_cast_ran` branch is unused.
- `summon_full_params()` can only be called (before forward or after backward) or (between forward and backward). Given this, I cannot think of a case where we call `summon_full_params()`, the `FlatParameter` is already unsharded, but `reshard_after_forward` is `True`. The `FlatParameter` should be sharded (before forward or after backward), and the `FlatParameter` may only be unsharded (between forward and backward) if `reshard_after_forward` is `False`.
- If parameter mixed precision is enabled and the sharding strategy is a sharded one, then inside `summon_full_params()`, the `FlatParameter` is unsharded in full precision. This involves allocating a new padded unsharded flattened parameter on GPU in full precision since `_full_param_padded` is in the low precision.
Some comments:
- Ideally, we reduce the complexity of the core code path: i.e. unshard for forward and pre-backward. If the return value is only used for `summon_full_params()`, we should consider if we can compartmentalize that logic.
- The branching is complex, and some return values are never used, where this fact is not immediately obvious. We should see if we can reduce the branch complexity.
Disclaimer: The difference in attribute semantics between `NO_SHARD` and the sharded strategies makes it challenging to unify the cases. This PR does not attempt to address that since it requires more design thought. However, it does attempt to reduce the complexity for the sharded strategies.
Let us trace through the new logical unshard.
`FullyShardedDataParallel._unshard(self, handles: List[FlatParamHandle], prepare_gradient: bool)`
- This iterates over the handles and calls `handle.pre_unshard()`, `handle.unshard()`, and `handle.post_unshard(prepare_gradient)` in the all-gather stream.
`FlatParamHandle.needs_unshard(self)`
- We take an aside to look at this key subroutine.
- For `NO_SHARD`, this returns `False`.
- For sharded strategies, this checks if the padded unsharded flattened parameter is allocated. The padded unsharded flattened parameter is the base tensor for the unpadded unsharded flattened parameter, which is a view into the padded one. Thus, the padded one's allocation fully determines if the `FlatParameter` is unsharded.
- For sharded strategies, to accommodate the parameter mixed precision + `summon_full_params()` case, we introduce `_full_prec_full_param_padded`, which is the padded unsharded flattened parameter in full precision. The helper `_get_padded_unsharded_flat_param()` takes care of this casing and returns the padded unsharded flattened parameter. Instead of allocating a new tensor each time, we manually manage `_full_prec_full_param_padded`'s storage just like for `_full_param_padded`.
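As a rough sketch (assuming hypothetical attribute and helper names; the real handle attributes may differ), the check described above boils down to asking whether the padded unsharded flattened parameter's storage is currently allocated:

```python
def needs_unshard(handle) -> bool:
    # Illustrative sketch only; `uses_sharded_strategy` and
    # `get_padded_unsharded_flat_param` are hypothetical names.
    if not handle.uses_sharded_strategy:
        # NO_SHARD never needs an unshard.
        return False
    # For sharded strategies, the padded unsharded flattened parameter is the
    # base tensor of the unpadded view, so "already unsharded" means its
    # storage is allocated; a freed storage (size 0) means we must all-gather.
    padded_unsharded = handle.get_padded_unsharded_flat_param()
    return padded_unsharded.untyped_storage().size() == 0
```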
`FlatParamHandle.pre_unshard(self)`
- For sharded strategies, the postcondition is that the handle's `FlatParameter` points to the tensor to all-gather. This should be on the communication device and in the desired precision. The allocation and usage of the low precision shard for parameter mixed precision and the CPU -> GPU copy for CPU offloading both classify naturally in the pre-unshard.
- For sharded strategies, if the `FlatParameter` does not need to be unsharded, `pre_unshard()` is a no-op. This avoids unnecessarily allocating and freeing the low precision shard.
- For `NO_SHARD`, we simply preserve the existing semantics.
`FlatParamHandle.unshard(self)`
- If the handle was resharded without freeing the padded unsharded flattened parameter (e.g. `summon_full_params()` between forward and backward when `reshard_after_forward=False`), then the `FlatParameter` points to the sharded flattened parameter. We need to switch to using the unsharded parameter. This is a design choice. Alternatively, we may not switch to using the sharded flattened parameter in `reshard()` if we do not free the padded unsharded flattened parameter. However, the postcondition that the `FlatParameter` points to the sharded flattened parameter after `reshard()` is helpful logically, so I prefer this approach.
- Otherwise, this allocates the padded unsharded flattened parameter, all-gathers, and switches to using the unpadded unsharded flattened parameter.
- In the future, we may add an option to `unshard()` that additionally all-gathers the gradient.
`FlatParamHandle.post_unshard(self, prepare_gradient: bool)`
- For sharded strategies, if using parameter mixed precision, this frees the low precision shard. More generally, this should free any sharded allocations made in `pre_unshard()` since the all-gather has been launched. If using CPU offloading, the GPU copy of the local shard goes out of scope after `unshard()` and is able to be garbage collected. We should understand if there is any performance difference between manually freeing versus deferring to garbage collection since our usage is inconsistent. For now, I preserve the existing semantics here.
- `prepare_gradient` is meant to be set to `True` for the pre-backward unshard and `False` for the forward unshard. This runs the equivalent logic of `_prep_grads_for_backward()`.
- This post-unshard logic (notably the gradient preparation) now runs in the all-gather stream, which is fine because we always have the current stream wait for the all-gather stream immediately after `FullyShardedDataParallel._unshard()`. IIUC, we do not need to call `_mp_shard.record_stream(current_stream)` (where `current_stream` is the default stream) because `_mp_shard` is allocated and freed in the same (all-gather) stream.
- A postcondition is that the `FlatParameter` is on the compute device. It should also have the unpadded unsharded size (though I do not have a check for this at the moment).
Now that we see how the logical unshard has been reorganized for the core code path, let us dive into `summon_full_params()`.
The two constraints are:
- If using parameter mixed precision, we should unshard in full precision.
- We must determine if we should free the padded unsharded flattened parameter upon exiting.
The first constraint is addressed as described before in the core unshard code path, so it remains to explore the second constraint.
I propose a simple rule: We free iff we actually unshard the `FlatParameter` in `summon_full_params()` (i.e. it was not already unsharded). We perform a case analysis:
Parameter mixed precision enabled:
- `NO_SHARD`: `flat_param.data` points to `flat_param._local_shard`, which is the full precision unsharded flattened parameter. This is not safe to free.
- `FULL_SHARD` / `SHARD_GRAD_OP`: We force full precision and all-gather to `_full_prec_full_param_padded`. We do not support nested `summon_full_params()`, so `_full_prec_full_param_padded` must be unallocated. We unshard, and it is safe to free.
Parameter mixed precision disabled:
- `NO_SHARD`: This is the same as with mixed precision enabled. This is not safe to free.
- `FULL_SHARD` / `SHARD_GRAD_OP`: We all-gather to `_full_param_padded`. It may already be unsharded.
  - Already unsharded: The unshard is a no-op. This is not safe to free.
    - For `FULL_SHARD`, this can happen for the root FSDP instance after `forward()` but before backward.
    - For `SHARD_GRAD_OP`, this can happen for all FSDP instances after `forward()` but before backward.
  - Needs unshard: We unshard. This is safe to free.
Therefore, we see that it is not safe to free when using `NO_SHARD` and when using a sharded strategy but the `FlatParameter` is already unsharded. This is precisely the proposed rule.
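In code, the proposed rule is tiny. A hedged sketch (evaluated on entry to `summon_full_params()`, before any unshard happens; `needs_unshard()` is the subroutine discussed earlier, the function name itself is hypothetical):

```python
def free_unsharded_flat_param_on_exit(handle) -> bool:
    # Illustrative sketch of the rule above: free on exit iff this
    # summon_full_params() call is the one that actually unshards.
    # NO_SHARD reports needs_unshard() == False, so it is never freed, and an
    # already-unsharded FlatParameter under a sharded strategy is left alone.
    return handle.needs_unshard()
```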
There were two notable edge cases that the existing code did not address.
- The existing code tests if the `FlatParameter` is already unsharded by checking the allocation status of `_full_param_padded`. When using parameter mixed precision, this is the incorrect tensor to check. If `_full_param_padded` is allocated (e.g. when `reshard_after_forward=False` and calling `summon_full_params()` between forward and backward), the already-unsharded check is a false positive, and `summon_full_params()` does not correctly force full precision. pytorch/pytorch#83068
  - This PR's `needs_unshard()` check correctly routes to the appropriate padded unsharded flattened parameter depending on the calling context (i.e. if it needs to force full precision or not).
- The existing code does not free the GPU copy of the padded unsharded flattened parameter when calling `summon_full_params(offload_to_cpu=True)`. It unshards the `FlatParameter`, moves the padded unsharded flattened parameter to CPU, and sets the `FlatParameter` data to be the appropriate unpadded view into the padded unsharded flattened parameter on CPU. However, `_full_param_padded` still points to the all-gathered padded unsharded flattened parameter on GPU, which is kept in memory. pytorch/pytorch#83076
  - This PR frees the GPU copy and reallocates it upon exiting `summon_full_params()`. This is essential for avoiding peak GPU memory usage from increasing as we recurse through the module tree. There may be some cases where we can avoid reallocation altogether, but that can be addressed in a follow-up PR.
  - This PR offloads the unpadded unsharded flattened parameter to CPU directly instead of the padded one. As far as I can tell, there is no need to include the padding since unflattening the original parameters does not require the padding.
  - The relevant code is in the context manager `FlatParamHandle.to_cpu()`.
This PR removes the mixed precision stream usage. As is, I do not think there is any extra overlap being achieved by the stream usage.
The low precision shard is allocated and copied to in the mixed precision stream (code), and the current stream (in this case the all-gather stream) waits for the mixed precision stream (code). However, we immediately schedule an all-gather that communicates that exact low precision shard (code) with no other meaningful computation between. If we remove the mixed precision stream, the low precision shard is allocated and copied to in the all-gather stream (including the non-blocking CPU -> GPU copy if using CPU offloading).
Under this PR's design, we may consider a "pre-unshard" stream for all logical pre-unshard data transfers if we want to overlap in the future. IIUC, the overlap opportunity exists if there are multiple `FlatParameter`s per module, and we only have the all-gather stream wait for the data transfer corresponding to the local shard it communicates, not the others.
If we agree on removing the mixed-precision stream for now, I will remember to delete it from `_init_streams()`.
Like with unshard, the first step is to look at the existing `_free_full_params()` and `_use_param_local_shard()`. A few notable observations:
- For only `NO_SHARD`, `_free_full_params()` includes a call to `_free_mp_shard()`.
- For `summon_full_params()`, there is a separate `_free_full_params_and_use_local_shard()` that duplicates the main logic of `_free_full_params()` and calls `_use_param_local_shard()`.
- In `forward()`, if `reshard_after_forward=True`, we call `_free_full_params()` and then `_free_mp_shard()`. Hence, for `NO_SHARD`, the `_free_mp_shard()` is a no-op.
- In the post-backward hook, we typically call `_free_full_params()` and `_free_mp_shard()`. The `_free_mp_shard()` is a no-op for `NO_SHARD` and if `reshard_after_forward=True`.
Some comments:
- The code certainly works, but some of the no-ops are subtle. When possible, we should make it clear when calls are no-ops or not. It is good that the existing code documents that `_free_mp_shard()` is a no-op in the post-backward hook when `reshard_after_forward=True`. However, there are still some non-obvious no-ops (around `NO_SHARD`).
- We should see if we can avoid the duplicate `_free_full_params_and_use_local_shard()`.
Let us trace through the logical reshard:
`FullyShardedDataParallel._reshard(self, handles: List[FlatParamHandle], free_unsharded_flat_params: List[bool])`
- The two args should have the same length since they are to be zipped.
- The goal of having `free_unsharded_flat_params` is that the caller should be explicit about whether the (padded) unsharded flattened parameter should be freed. The low precision shard is always meant to be freed (as early as possible), so there is no corresponding `List[bool]`.
`FlatParamHandle.reshard(self, free_unsharded_flat_param: bool)`
- This frees the (padded) unsharded flattened parameter if `free_unsharded_flat_param` and switches to using the sharded flattened parameter.
- Echoing back to forcing full precision in `summon_full_params()`, `_free_unsharded_flat_param()` frees the correct tensor by using `_get_padded_unsharded_flat_parameter()`.
`FlatParamHandle.post_reshard(self)`
- I am not fully content with the existence of this method, but this seems to be an unavoidable consequence of `NO_SHARD`. Perhaps, this may be useful in the future for other reasons though.
- Right now, this method is only meaningful for `NO_SHARD` + parameter mixed precision + outside `summon_full_params()`. `_mp_shard` is not freed in the post-unshard since it is also the low precision unsharded flattened parameter, so we must delay the free until the post-reshard.
Below the `FlatParamHandle.reshard()` and `post_reshard()` layer, there should not be any no-ops.
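A minimal sketch (again with hypothetical names that mirror the prose, not the actual source) of how the reshard pairs each handle with its free decision:

```python
from typing import List


def reshard_handles(
    handles: List["FlatParamHandle"],
    free_unsharded_flat_params: List[bool],
) -> None:
    # Sketch of FullyShardedDataParallel._reshard(): the two lists are zipped,
    # so they must have the same length. The low precision shard is always
    # freed as early as possible, so it needs no per-handle flag.
    assert len(handles) == len(free_unsharded_flat_params)
    for handle, free_unsharded in zip(handles, free_unsharded_flat_params):
        handle.reshard(free_unsharded)  # maybe free padded unsharded param, use local shard
        handle.post_reshard()           # e.g. NO_SHARD + mixed precision frees _mp_shard here
```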
One final comment I will mention is that I like the `pre_unshard()`, `unshard()`, `post_unshard()`, and `reshard()`, `post_reshard()` organization because it makes it clear what the boundaries are and their temporal relationship. Through that, we can set pre- and post-conditions. Furthermore, we can eventually convert logic to hooks that may be registered on the `FlatParamHandle` (for `pre_unshard()`, `post_unshard()`, and `post_reshard()`). This may improve the customizability of FSDP.
- This PR reorganizes `forward()` in preparation for non-recursive wrapping, which uses pre-forward and post-forward hooks that expect the signature `hook(module, input)`. For FSDP, the `module` and `input` arguments are not used.
- This PR creates a new method `_fsdp_root_pre_forward()` to handle the logic only the root FSDP should run.
Finally, we dive into the prefetching changes. Some highlights:
- This PR unifies the execution order validation and prefetching implementations.
- Both involve the execution order and can be unified to share some boilerplate.
- Execution order validation only runs when the distributed debug level is `INFO`.
  - We have yet to have one success case where we actually catch an unintended source of dynamism. The warning is also too verbose. Hence, we are gating it by the `INFO` level.
- This PR moves prefetching to be with respect to groups of handles (as mentioned in the constructor comment).
- This is essential for supporting prefetching with non-recursive wrapping.
- This PR does not include "bubbles", i.e. modules with no handles, in the recorded execution order(s). This deviates from the existing implementation.
- This makes prefetching possibly more aggressive (when there are such bubbles), but it should not have significant performance implications either way.
- This PR changes backward prefetching to reset the post-forward order each iteration (as intended).
- This PR changes forward prefetching to use the first iteration's pre-forward order instead of the first iteration's post-forward order. (We can discuss whether we want this in this PR or not. Otherwise, I can keep it as using the post-forward order to preserve the existing semantics.) This PR also removes the `all_gather_stream.wait_stream(current_stream)` before forward prefetching because it does not help with high GPU reserved memory. We can add that back if desired.
The existing PT-D FSDP pre-backward prefetching uses the reverse post-forward order.
Model Code
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=True),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(4, 4, kernel_size=3),
            nn.BatchNorm2d(4),
            nn.ReLU(inplace=False),
        )
        self.block3 = nn.Linear(12, 8)
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(output_size=(1, 1)),
            nn.Flatten(),
            nn.Linear(4, 10),
        )

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        return self.head(x)

model = Model().cuda()
fsdp_kwargs = {}
model.block1[1] = FSDP(model.block1[1], **fsdp_kwargs)  # BN2d
model.block2[1] = FSDP(model.block2[1], **fsdp_kwargs)  # BN2d
model.block1 = FSDP(model.block1, **fsdp_kwargs)
model.block2 = FSDP(model.block2, **fsdp_kwargs)
model.block3 = FSDP(model.block3, **fsdp_kwargs)
model = FSDP(model, **fsdp_kwargs)
Execution Orders
Pre-backward hook for ('head.2.weight', 'head.2.bias') 140339520587136 (model)
Pre-backward hook for ('weight', 'bias') 140339461194656 (block3)
Pre-backward hook for ('0.weight', '0.bias') 140339520589776 (block2)
Pre-backward hook for ('weight', 'bias') 140339520587664 (block2 BN)
Pre-backward hook for ('weight', 'bias') 140339520586656 (block1 BN)
Pre-backward hook for ('0.weight', '0.bias') 140339520588768 (block1)
Pre-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('weight', 'bias') 140339461194656 (block3)
Reverse post-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('weight', 'bias') 140339461194656 (block3)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)
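For intuition, a small sketch (hypothetical data structures, not the PR's code) of how pre-backward prefetching walks the reverse post-forward order recorded above:

```python
from typing import List, Optional, Tuple

# A "handles key" is the group of handles used by one module's forward/backward.
HandlesKey = Tuple["FlatParamHandle", ...]


def next_handles_to_prefetch(
    current: HandlesKey,
    post_forward_order: List[HandlesKey],
) -> Optional[HandlesKey]:
    # Pre-backward prefetching uses the reverse of the post-forward order, so
    # the group to prefetch while `current` runs its backward is the group
    # recorded just before it during forward.
    index = post_forward_order.index(current)
    if index == 0:
        return None  # `current` ran first in forward, so it is last in backward
    return post_forward_order[index - 1]
```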
Differential Revision: D39293429 Pull Request resolved: pytorch/pytorch#83665 Approved by: https://github.com/zhaojuanmao
Life is one big road with lots of signs. So when you riding through the ruts, don't complicate your mind. Flee from hate, mischief and jealousy. Don't bury your thoughts, put your vision to reality. Wake Up and Live!
Rollup merge of #100473 - compiler-errors:normalize-the-fn-def-sig-plz, r=lcnr
Attempt to normalize `FnDef` signature in `InferCtxt::cmp`
Stashes a normalization callback in `InferCtxt` so that the signature we get from `tcx.fn_sig(..).subst(..)` in `InferCtxt::cmp` can be properly normalized, since we cannot expect it to have normalized types, as it comes straight from astconv.
This is kind of a hack, but I will say that @jyn514
found the fact that we present unnormalized types to be very confusing in real life code, and I agree with that feeling. Though altogether I am still a bit unsure about whether this PR is worth the effort, so I'm open to alternatives and/or just closing it outright.
On the other hand, this isn't a ridiculously heavy implementation anyways -- it's less than a hundred lines of changes, and half of that is just miscellaneous cleanup.
This is stacked onto #100471 which is basically unrelated, and it can be rebased off of that when that lands or if needed.
The code:
trait Foo { type Bar; }
impl<T> Foo for T {
    type Bar = i32;
}
fn foo<T>(_: <T as Foo>::Bar) {}
fn needs_i32_ref_fn(f: fn(&'static i32)) {}
fn main() {
    needs_i32_ref_fn(foo::<()>);
}
Before:
= note: expected fn pointer `fn(&'static i32)`
found fn item `fn(<() as Foo>::Bar) {foo::<()>}`
After:
= note: expected fn pointer `fn(&'static i32)`
found fn item `fn(i32) {foo::<()>}`
Add CodingFantasy's CSS coding interactive games (#5490)
- Add "Knights of the Flexbox table" game
Welcome to the Knights of the Flexbox table. A game where you can help Sir Frederic Flexbox and his friends to uncover the treasures hidden in the Tailwind CSS dungeons. You can navigate the knight through the dungeon by changing his position within the dungeon using Flexbox and Tailwind CSS.
- Add "Flex Box Adventure" game
Once upon a time, there was a King Arthur. He ruled his kingdom fair and square. But Arthur had one problem. He was a very naive person. So one sunny day, three alchemist brothers offered Arthur to exchange all his Gold Coins for coins made of a more valuable new metal that they had invented - Bit Coins.
Arthur believed them and gave them all his gold. The brothers took the gold and promised to give the bit coins back to Arthur in seven days.
Seven days passed. The brothers have not turned up. Arthur realized he had been scammed. He is angry and intends to take revenge on them. Let's help him do it with our weapon – CSS Flex Box!
We made this game for You
-
You often stumble and try to figure out which combination of Flex Box properties makes the browser do what you want it to do.
-
You want to create complex web layouts without constantly looking at the web page after every Cmd/Ctrl+S press in the code editor.
-
You have tried to learn Flex Box with video tutorials and articles but still don't fully understand how some parts of it work.
4*. Or, if you are a master of CSS Flex Box, we have something interesting for you too (read further).
Have you found yourself there? Then you definitely want to learn or improve your Flex Box skills. So we have good news for you, really good news...
Learn Flex Box by Playing Game No more boring videos, tutorials and courses. Learn Flex Box in a completely new, fun, effective and revolutionary way. By playing Flex Box coding game!
- Add "Grid Attack" coding game
In an ancient Elvish prophecy, it was said that one day a man would be born with an incredible power that predicts the future – "Marketi Predictori." And another will come to take this power. But the years went by and nothing happened. Until one day, a little elf was born. He was named Luke.
From an early age, he surprised his parents and his sister Rey by guessing the price of apples at the farmer's market before they even reached it. Every year his power rose and his predictions became more and more accurate. But there was one thing Luke could not predict. The coming of the demon Valcorian. It was the one from the prophecy that was to come and take Luke's power. One day Valcorian and his army attacked the town where Luke had lived and kidnapped him to make a ritual of stealing his power.
Go on a dangerous quest with Luke's sister Rey and find her brother. Defeat Valcorian and all his demons using a secret weapon – CSS Grid.
We made this game for You?
-
You often stumble and try to figure out which combination of Grid properties makes the browser do what you want it to do.
-
You are scared by the number of properties a CSS Grid has, and you feel uncomfortable when you need to create a grid layout.
-
You want to create complex web layouts using Grid, but without constantly looking at the web page after every "Cmd/Ctrl+S" press in the code editor.
-
You have tried to learn CSS Grid with video tutorials and articles but still don't fully understand how some parts of it work.
-
You use a Flex Box where Grid is required because you don't feel confident in using it.
Have you found yourself there? Then you definitely want to learn or improve your Grid skills. So we have good news for you, really good news...
Learn Grid by Playing CSS Game No more boring videos, courses and articles. Learn Grid in a revolutionary new, fun, and effective way. By playing a Grid coding game!
docs: My menu lists for breakfast, lunch, dinner
I'm going to keep memoing and rewriting it.
Update some parameters
Dear Matt and Chloe
I have add the count data of
-
Single Female: refers to females that might mature into breeders but failed to produce offspring, and immature females that are not yet ready to breed.
-
The count data of males, including chicks, juveniles, 1 year males, 2 year males, and also the 3 year males.
However, I found the 3 year male count suffers from overcounting, because mature males tend to stick to certain areas more than the other age classes do, especially in the breeding season when they often patrol the same trails and are frequently detected by the camera traps, causing multiple counts within one day period. Thus, I decided to filter the 3 year male count, as was done for the other age classes, to get a more accurate number for the Bayesian and MCMC analysis.
So the 3 year male data will be available soon, once I have dealt with the overcounting as best as possible.
Best Regards Ghan Saridniru
New output format: iso-efi-initrd
This option was previously not available and required postprocessing of a `tar-kernel-initrd` output.
Comparison with `iso-efi`:
`iso-efi` only loads the kernel at boot, and the root filesystem is mounted from the actual boot media (eg, a CD-ROM - physical or emulated). This can often cause trouble (it has for us) for multiple reasons:
- the linuxkit kernel might not have the correct drivers built-in for the hardware (see #3154)
- especially with virtual or emulated CD-ROMs, performance can be abysmal: we saw the case where the server IPMI allowed using an ISO stored in AWS S3 over HTTP...you can imagine what happens when you start doing random I/O on the root fs in that case.
- The ISO image has the root device name baked in (ie, `/dev/sr0`) which fails if for some reason the CD-ROM we're running from doesn't end up using that device, so manual tweaking is required (see #2375)
`iso-efi-initrd`, on the other hand, packs the root filesystem as an initramfs (ie similar to what the raw output does, except that in this case we're preparing an ISO image), so both the kernel and the initramfs are loaded into memory by the boot loader and, once running, we don't need to worry about root devices or kernel drivers (and the speed is good, as everything runs in RAM).
Also, the generated ISO can be copied verbatim (eg with `dd`) onto USB media and it still works.
Finally, the image size is much smaller compared to `iso-efi`.
IMHO, `iso-efi-initrd` could be used almost anywhere `iso-efi` would be used, or might even supersede it. I can't think of a scenario where one might explicitly want to use `iso-efi`.
Points to consider:
- Not tested under aarch64 as I don't have access to that arch. If the automated CI tests also test that, then it should be fine.
- I'm not sure what to put inside `images.yaml` for the `iso-efi-initrd` image. As it is it works of course (my personal image on docker hub), but I guess it'll have to be some more "official" image. However, that cannot be until this PR is merged, so it's kind of a chicken and egg situation. Please advise.
- I can look into adding the corresponding `iso-bios-initrd` builder if there is interest.
Signed-off-by: Davide Brini [email protected]
People listen up don't stand so close, I got somethin that you all should know. Holy matrimony is not for me, I'd rather die alone in misery.
Fuck you gitignore
No. build.gradle is neither build nor .gradle. Gosh darn it.
[IMP] *: owlify password meter and convert change password to real wizard
The changes in `auth_password_policy` are largely the owlification of the password meter widget:
- modernize the password policy module and convert it to an odoo-module (note: now exports a pseudo-abstract class which is really a policy, for the sake of somewhat sensibly typing `recommendations`)
- replace the implementation of the Meter and PasswordField widgets by owl versions
The changes to web and base stem from taking a look at converting the ChangePassword wizard, and finding that it would be a pain in the ass but also... unnecessary? It seems to have been done as a wizard completely in javascript despite being backend-only for legacy reasons: apparently one of the very old web clients (v5 or v6 probably) implemented it as a "native action" which was directly part of the client's UI, and so it had to be implemented entirely in the client.
Over time it was moved back into the regular UI (and moved around quite a bit), hooked as a client action to maintain access to the existing UI / dialog.
But since it's been an action opened via a button for years it can just... be a normal wizard, with password fields, which auth_password_policy can then set the widget of.
So did that:
- removed the old unnecessary JS, and its dedicated endpoint (which is not used by portal, portal has its own endpoint)
- used check_identity for the "old password check"
- split out `change_password` with an internal bit so we can have a safer (and logged) "set user password" without needing to provide the old password, which is now used for the bulk password change wizard as well
- added a small wizard which just takes a new password (and confirmation); for safety a given change password wizard is only accessible to its creator (also the wizard is restricted to employees though technically it would probably be fine for portal users as well)
Rather than an extensive messy rewrite / monkeypatching (the original wizard was 57 LOC, though also 22 LOC of template, the auth_policy hooking / patching was 33, plus 8 lines of CSS), `auth_password_policy` just sets the widget of the `new_password` field in the new wizard, much as it did for the bulk wizard.
Also improve the "hide meter if field is empty" feature by leveraging `:placeholder-shown`. This requires setting a placeholder, and while an empty placeholder works fine in firefox, it doesn't work in chrome. So the placeholder needs to be a single space. Still, seems better than updating a fake attribute or manipulating a class for the sake of trivial styling.
Notes on unlink + transient vacuum
Although the wizard object is only created when actually calling `change_password`, and is deleted on success, it is possible for the user to get an error and fail to continue (it should be unlikely without overrides since the passwords are checked while creating / saving but...).
While in that case the `new_password` in the database is not the user's own, it could be their future password, or give evidence as to their password-creation scheme, or some other signal useful to attack that front of the user's life and behavior. As such, quickly removing leftovers from the database (by setting a very low transient lifetime) seems like a good idea.
This is compounded by `check_identity` having a grace period of 10 minutes. A transient lifetime of 0.1 (hours) is 6 minutes, but because the cron runs every 10 the user effectively has 6~10 minutes between the moment they create an incorrect / incomplete version of the wizard and the moment where it is destroyed if they just leave it.
closes odoo/odoo#99458
Signed-off-by: Xavier Morel (xmo) [email protected]
Merge bitcoin/bitcoin#24513: CChainState -> Chainstate
00eeb31c7660e2c28f189f77a6905dee946ef408 scripted-diff: rename CChainState -> Chainstate (James O'Beirne)
Pull request description:
Alright alright alright, I know: we hate refactors. We especially hate cosmetic refactors.
Nobody knows better than I that changing broad swaths of code out from under our already-abused collaborators, only to send a cascade of rebase bankruptcies, is annoying at best and sadistic at worst. And for a rename! The indignation!
But just for a second, imagine yourself. Programming bitcoin/bitcoin
, on a sandy beach beneath a lapis lazuli sky. You go to type the name of what is probably the most commonly used data structure in the codebase, and you only hit shift once.
What could you do in such a world? You could do anything. The only limit is yourself.
So maybe you like the idea of this patch but really don't want to deal with rebasing. You're in luck!
Here're the commands that will bail you out of rebase bankruptcy:
git rebase -i $(git merge-base HEAD master) \
-x 'sed -i "s/CChainState/Chainstate/g" $(git ls-files | grep -E ".*\.(py|cpp|h)$") && git commit --amend --no-edit'
# <commit changed?>
git add -u && git rebase --continue
Anyway I'm not sure how serious I am about this, but I figured it was worth proposing. I have decided I am very serious about this.
Maybe we can have nice things every once in a while?
ACKs for top commit: MarcoFalke: cr ACK 00eeb31c7660e2c28f189f77a6905dee946ef408 hebasto: ACK 00eeb31c7660e2c28f189f77a6905dee946ef408 glozow: ACK 00eeb31c7660e2c28f189f77a6905dee946ef408, thanks for being the one to propose this w0xlt: ACK https://github.com/bitcoin/bitcoin/pull/24513/commits/00eeb31c7660e2c28f189f77a6905dee946ef408
Tree-SHA512: b828a99780614a9b74f7a9c347ce0687de6f8d75232840f5ffc26e02bbb25a3b1f5f9deabbe44f82ada01459586ee8452a3ee2da05d1b3c48558c8df6f49e1b1
[MIRROR] Dimensional Anomaly [MDB IGNORE] (#15974)
- Dimensional Anomaly (#69512)
About The Pull Request
Everyone has been asking: "When will there be an anomaly like the bioscrambler, but for the space station? Please, we need more things which replace objects with different objects from the same typepath." Well I made it and it looked like ass because non-tiling floor and walls look terrible, so then I made this instead. Dimensional.mp4
The "dimensional anomaly" shifts matter into a parallel dimension where objects are made out of something else. Like the Bioscrambler anomaly, it does not expire on its own and only leaves when someone signals it or uses an anomaly remover. When it spawns it picks a "theme" and converts terrain around it until it covers a 7x7 square, then it teleports somewhere else and picks a new theme.
A lot of these themes are relatively benign like "meat", "fancy carpet", or "gold". Some of them are kind of annoying like "icebox" because it creates floor which slows you down, or "clown" because bananium is intentionally annoying. Some of them are actively dangerous, mostly "uranium" and "plasma". The main problem this will usually cause for crewmembers is decreasing area security. When it replaces doors it replaces them with ones which don't have any access control, and it will also replace RWalls with normal and much more vulnerable walls which will make breaking and entering significantly easier until someone has taken the time to fix the damage. But also sometimes it will irradiate them, you never know.
The fact that sometimes the changes are benign (or provide uncommon materials) and might be happening in places you don't care about access to might encourage people to push their luck and leave it alone until it starts turning the captain's office into a bamboo room or repainting medbay a fetching shade of flammable purple, which I would consider a success. Armour.mp4
If you successfully harvest the anomaly core you can place it into the reactive armour to get Reactive Barricade Armour, which shifts your dimension when you take damage and attempts to place some randomised (not terribly durable) objects between you and hopefully your attacker (it really just picks up to four random unoccupied tiles next to you). If you're EMPed then the changes it make to the environment will often be as unpleasant for you as they are for a pursuer, and significantly more likely to harm both of you rather than just provide obstacles.
Other changes: I split anomalies out into their own dmi file, seems to be all the rage lately. I moved the anomaly placing code into a datum instead of the event because I wanted to reuse it but if you have a better idea about where I could have put it let me know. This also fixes a bug where the material spreader component wasn't working when I applied plasma materials to something, the extra whitespace was parsing as another argument for some reason and meant it would runtime. Supermatter delamination was still pointing to Delimber anomalies instead of Bioscrambler.
-
Dimensional Anomaly
-
Fixes the upstream merge skew
Co-authored-by: Jacquerel [email protected] Co-authored-by: GoldenAlpharex [email protected] Co-authored-by: GoldenAlpharex [email protected]
[MIRROR] Resolves is_banned_from headaches and lag (Speeds up roundstart significantly) [MDB IGNORE] (#16001)
- Resolves is_banned_from headaches and lag (Speeds up roundstart significantly) (#69376)
About The Pull Request
Just to be clear, when I refer to time here, I am not talking about cpu time. I'm talking about real time. This doesn't significantly reduce the amount of work we do, it just removes a lot of the waiting around we need to do for db calls to finish.
Adds queuing support to sql bans, so if an ongoing ban retrieval query is active any successive ban retrieval attempts will wait for the active query to finish
This uses the number/blocking_query_timeout config option, I hope it's still valid
This system will allow us to precache ban info, in parallel (or in batches) With this, we can avoid needing to setup all uses of is_banned_from to support parallelization or eat the cost of in-series database requests
Clients who join after initialize will now build a ban cache automatically
Those who join before init is done will be gathered by a batch query sent by a new subsystem, SSban_cache.
This means that any post-initialize uses of is_banned_from are worst case by NATURE parallel (since the request is already sent, and we're just waiting for the response)
This saves a lot of headache for implementers (users) of the proc, and saves ~0.9 seconds of roundstart setup for each client (on /tg/station)
There's a lot of in series is_banned_from calls in there, and this nukes them. This should bring down roundstart join times significantly.
It's hard to say exactly how much, since some cases generate the ban cache at other times. At base tho, we save about 0.9 seconds of real time per client off doing this stuff in parallel. Why It's Good For The Game
When I use percentages I'm speaking about cost per player
I don't like how slow roundstart feels, this kills about 66% of that. the rest is a lot of misc things. About 11% (it's actually 16%) is general mob placing which is hard to optimize. 22% is manifest generation, most of which is GetFlatIcons which REALLY do not need to be holding up the main thread of execution.
An additional 1 second is constant cost from a db query we make to tell the server we exist, which can be made async to avoid holding the proc chain.
That's it. I'm bullying someone into working on the manifest issue, so that should just leave 16% of mob placing, which is really not that bad compared to what we have now. Changelog
cl code: The time between the round starting and the game like, actually starting has been reduced by 66% refactor: I've slightly changed how ban caches are generated, admins please let me know if anything goes fuckey server: I'm using the blocking_query_timeout config. Make sure it's up to date and all. /cl
- Resolves is_banned_from headaches and lag (Speeds up roundstart significantly)
Co-authored-by: LemonInTheDark [email protected]
Add Jest imports everywhere, remove global availability (#6911)
In ESM mode, the `jest` globals are technically no longer available. For some reason, we're still getting away with it, but this issue surfaces in the v29 upgrade PR so I'm going to address it separately.
https://jestjs.io/docs/ecmascript-modules#differences-between-esm-and-commonjs
Ref: apollographql/apollo-server#6850 (comment)
Rocking The Boat, er, Map Vote (#69561)
- Rocking The Boat, er, Map Vote
Hey there,
A while ago, I spooke (typo intentional) to some other people. One frustration I heard was the fact that people would sometimes sneak through map votes during the very start of a shift, during a high-paced portion, or just as a meme. People in OOC would then flood the vote, putting in any given station. However, if a vote happens 10 minutes in, and the round goes for 70 minutes and not many of the original players are around, then it's not particularly fair to those who have to play next shift on a map they bemoan.
So, we can rock the vote! If a player isn't particularly chuffed with the hand they are given, they can poll the players to see if they want to change the map as well. If rocking the vote goes through, huzzah, you get the ability to vote for the map again. If it doesn't go through: tough luck. You can rock the vote one time per shift by default, and server operators can change the amount of times you can call to rock the map vote at their discretion. Calling to rock the vote either successfully or non-successfully counts as a "call", and when that limit is exceeded: no more calls.
Does this mean that we will only rotate between two maps because pissants will keep rocking the vote until they get what they like? Maybe? I still see people bemoan getting Tram or shit the bed over IceBox, but I think enough people get sick of bread-on-butter to take the server where it needs to go. If operators don't really like seeing only two maps play, they can always adjust the config to ensure it doesn't happen.
- makes the grammar grammar
it would be "Rock the Vote vote" otherwise
core: Refactor Pixel features
-
Magic Eraser is wonky and hard to enable and all this mess isn't really worth the trouble so just stick to the older setup.
-
Default Pixel 5 spoof for Photos and only switch to Pixel XL when spoof is toggled.
-
We will try to bypass 2021 features and Raven props for non-Pixel 2021 devices as apps usage requires TPU.
-
Remove P21 experience system feature check
sparse: introduce conditional lock acquire function attribute
The kernel tends to try to avoid conditional locking semantics because it makes it harder to think about and statically check locking rules, but we do have a few fundamental locking primitives that take locks conditionally - most obviously the 'trylock' functions.
That has always been a problem for 'sparse' checking for locking imbalance, and we've had a special '__cond_lock()' macro that we've used to let sparse know how the locking works:
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
so that you can then use this to tell sparse that (for example) the spinlock trylock macro ends up acquiring the lock when it succeeds, but not when it fails:
#define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))
and then sparse can follow along the locking rules when you have code like
	if (!spin_trylock(&dentry->d_lock))
		return LRU_SKIP;
	.. sparse sees that the lock is held here..
	spin_unlock(&dentry->d_lock);
and sparse ends up happy about the lock contexts.
However, this '__cond_lock()' use does result in very ugly header files, and requires you to basically wrap the real function with that macro that uses '__cond_lock'. Which has made PeterZ NAK things that try to fix sparse warnings over the years [1].
To solve this, there is now a very experimental patch to sparse that basically does the exact same thing as '__cond_lock()' did, but using a function attribute instead. That seems to make PeterZ happy [2].
Note that this does not replace existing use of '__cond_lock()', but only exposes the new proposed attribute and uses it for the previously unannotated 'refcount_dec_and_lock()' family of functions.
For existing sparse installations, this will make no difference (a negative output context was ignored), but if you have the experimental sparse patch it will make sparse now understand code that uses those functions, the same way '__cond_lock()' makes sparse understand the very similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()' annotations.
Note that in some cases this will silence existing context imbalance warnings. But in other cases it may end up exposing new sparse warnings for code that sparse just didn't see the locking for at all before.
This is a trial, in other words. I'd expect that if it ends up being successful, and new sparse releases end up having this new attribute, we'll migrate the old-style '__cond_lock()' users to use the new-style '__cond_acquires' function attribute.
The actual experimental sparse patch was posted in [3].
Link: https://lore.kernel.org/all/[email protected]/ [1] Link: https://lore.kernel.org/all/[email protected]/ [2] Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3] Acked-by: Peter Zijlstra [email protected] Cc: Alexander Aring [email protected] Cc: Luc Van Oostenryck [email protected] Signed-off-by: Linus Torvalds [email protected]
[MIRROR] Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff [MDB IGNORE] (#16000)
- Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff (#69158)
About The Pull Request
Title! The CO2 thing is there because it makes my job much easier. Can probably find a way to make it move slowly if a maint insist on it. Prefer not to though.
Drafting because I want to make a second PR that have more sweeping changes (clean vars up, make a simpler formula for damage and heat production, delete underused behaviors, etc). Would honestly prefer if both this and that gets merged at the same time but I'm separating it out since it might be rejected. Or maybe ill combine it here we'll see. Ignore that, looks like i can keep this one absolutely atomic. Why It's Good For The Game
Had a lot of trouble trying to document the SM gas interactions on the wiki: the interactions are all scattered, and tracking down everything a gas does is extremely annoying. Hopefully this fixes that.
Changelog
cl
balance: CO2 powerloss inhibition now works immediately based on gas composition instead of slowly ramping up.
refactor: refactored how the SM fetches its gas info and data. No changes expected except for the co2 thing.
/cl
- Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff
Co-authored-by: vincentiusvin [email protected]
Cargonia, but with less weapons (#4370)
- Cargonia is no more
Fuck you
- PLEASE
- Zandario, I will suck your PP
Master of Uno Nerf
--Removed Polymorphine from Master of Transmutation material changing
--Removed Master of Transmutation from Fungicaves pool
++Added Master of Transmutation to Lukki Lair pool.
I believe my original idea here was to have him in a tight space to show off his powers better, but I forgot to consider power scaling: he was a bit too strong for the area relative to everything else around him. Additionally, the polymorph liquid transmutation was cool, but ultimately too dangerous unless I wanted to bump the creature up to heaven/hell, in my opinion. Heaven is where poly liquid spam is everywhere in the base game, and I want Uno to appear elsewhere.
Fuck you AutisticMOFO I'm staying with my cums and balls
random: implement getrandom() in vDSO
Big changes:
Thomas' previous objection was two-fold: 1) vgetrandom should really have the same function signature as getrandom, in addition to all of the same behavior, and 2) having vgetrandom_alloc be a vDSO function doesn't make sense, because it doesn't actually need anything from the VDSO data page and it doesn't correspond to an existing syscall.
After a discussion at Plumbers this last week, we devised the following ways to fix these: 1) we make the opaque state argument the last argument of vgetrandom, since the real syscall ignores it, and that way all the registers are the same and no behavior changes, and 2) we make vgetrandom_alloc a syscall, rather than a vDSO function, which also gives it added flexibility for the future, which is good.
Making those changes also reduced the size of this patch a bit.
Smaller changes:
- Properly add buffer offset position.
- Don't EXPORT_SYMBOL for vDSO code.
- Account for timens and vvar being in swapped pages.
Two statements:
1) Userspace wants faster cryptographically secure random numbers of arbitrary size, big or small.
2) Userspace is currently unable to safely roll its own RNG with the same security profile as getrandom().
Statement (1) has been debated for years, with arguments ranging from "we need faster cryptographically secure card shuffling!" to "the only things that actually need good randomness are keys, which are few and far between" to "actually, TLS CBC nonces are frequent" and so on. I don't intend to wade into that debate substantially, except to note that recently glibc added arc4random(), whose goal is to return a cryptographically secure uint32_t, and there are real user reports of it being too slow. So here we are.
Statement (2) is more interesting. The kernel is the nexus of all entropic inputs that influence the RNG. It is in the best position, and probably the only position, to decide anything at all about the current state of the RNG and of its entropy. One of the things it uniquely knows about is when reseeding is necessary.
For example, when a virtual machine is forked, restored, or duplicated, it's imperative that the RNG doesn't generate the same outputs. For this reason, there's a small protocol between hypervisors and the kernel that indicates this has happened, alongside some ID, which the RNG uses to immediately reseed, so as not to return the same numbers. Were userspace to expand a getrandom() seed from time T1 for the next hour, and at some point T2 < hour, the virtual machine forked, userspace would continue to provide the same numbers to two (or more) different virtual machines, resulting in potential cryptographic catastrophe. Something similar happens on resuming from hibernation (or even suspend), with various compromise scenarios there in mind.
There's a more general reason why userspace rolling its own RNG from a getrandom() seed is fraught. There's a lot of attention paid to this particular Linuxism we have of the RNG being initialized and thus non-blocking or uninitialized and thus blocking until it is initialized. These are our Two Big States that many hold to be the holy differentiating factor between safe and not safe, between cryptographically secure and garbage. The fact is, however, that the distinction between these two states is a hand-wavy wishy-washy inexact approximation. Outside of a few exceptional cases (e.g. a HW RNG is available), we actually don't really ever know with any rigor at all when the RNG is safe and ready (nor when it's compromised). We do the best we can to "estimate" it, but entropy estimation is fundamentally impossible in the general case. So really, we're just doing guess work, and hoping it's good and conservative enough. Let's then assume that there's always some potential error involved in this differentiator.
In fact, under the surface, the RNG is engineered around a different principle, and that is trying to use new entropic inputs regularly and at the right specific moments in time. For example, close to boot time, the RNG reseeds itself more often than later. At certain events, like VM fork, the RNG reseeds itself immediately. The various heuristics for when the RNG will use new entropy and how often are really a core aspect of what the RNG has some potential to do decently enough (and something that will probably continue to improve in the future from random.c's present set of algorithms). So in your mind, put away the mental attachment to the Two Big States, which represent an approximation with a potential margin of error. Instead keep in mind that the RNG's primary operating heuristic is how often and exactly when it's going to reseed.
So, if userspace takes a seed from getrandom() at point T1, and uses it for the next hour (or N megabytes or some other meaningless metric), during that time, potential errors in the Two Big States approximation are amplified. During that time potential reseeds are being lost, forgotten, not reflected in the output stream. That's not good.
The simplest statement you could make is that userspace RNGs that expand a getrandom() seed at some point T1 are nearly always worse, in some way, than just calling getrandom() every time a random number is desired.
For those reasons, after some discussion on libc-alpha, glibc's arc4random() now just calls getrandom() on each invocation. That's trivially safe, and gives us latitude to then make the safe thing faster without becoming unsafe, at our leisure. Card shuffling isn't particularly fast, however.
How do we rectify this? By putting a safe implementation of getrandom() in the vDSO, which has access to whatever information a particular iteration of random.c is using to make its decisions. I use that careful language of "particular iteration of random.c", because the set of things that a vDSO getrandom() implementation might need for making decisions as good as the kernel's will likely change over time. This isn't just a matter of exporting certain data to userspace. We're not going to commit to a "data API" where the various heuristics used are exposed, locking in how the kernel works for decades to come, and then leave it to various userspaces to roll something on top and shoot themselves in the foot and have all sorts of complexity disasters. Rather, vDSO getrandom() is supposed to be the same exact algorithm that runs in the kernel, except it's been hoisted into userspace as much as possible. And so vDSO getrandom() and kernel getrandom() will always mirror each other hermetically.
API-wise, the vDSO gains one function:
ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state);
The return value and the first 3 arguments are the same as ordinary getrandom(), while the last argument is a pointer to some state allocated with vgetrandom_alloc(), explained below. Were all four arguments passed to the getrandom syscall, nothing different would happen, and the functions would have the exact same behavior.
Then, we introduce a new syscall:
void *vgetrandom_alloc([inout] size_t *num, [out] size_t *size_per_each, unsigned int flags);
This takes the desired number of opaque states in num, and returns a pointer to an array of opaque states, the number actually allocated back in num, and the size in bytes of each one in size_per_each, enabling a libc to slice up the returned array into a state per each thread. (The flags argument is always zero for now.) We very intentionally do not leave state allocation up to the caller of vgetrandom, but provide vgetrandom_alloc for that allocation. There are too many weird things that can go wrong, and it's important that vDSO does not provide too generic of a mechanism. It's not going to store its state in just any old memory address. It'll do it only in ones it allocates.
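To make the intended calling convention concrete, here is a rough userspace sketch of how a libc might use the two interfaces together. The two extern declarations just restate the prototypes above; in reality the libc would resolve the vDSO symbol and issue the new syscall itself, which is elided here:
#include <stddef.h>
#include <sys/types.h>

/* Stand-ins for the interfaces described above; not existing libc symbols. */
extern void *vgetrandom_alloc(size_t *num, size_t *size_per_each,
			      unsigned int flags);
extern ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags,
			  void *opaque_state);

int example(unsigned int nthreads)
{
	size_t num = nthreads;		/* ask for one state per thread */
	size_t size_per_each;
	unsigned char buf[32];

	void *states = vgetrandom_alloc(&num, &size_per_each, 0);
	if (!states)
		return -1;
	/* 'num' now holds how many states were actually allocated. */

	/* Slice out the state for thread 0 and draw 32 random bytes. */
	void *state0 = (unsigned char *)states + 0 * size_per_each;
	if (vgetrandom(buf, sizeof(buf), 0, state0) != sizeof(buf))
		return -1;

	return 0;
}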
Right now this means it's a mlock'd page with WIPEONFORK set. In the future maybe there will be other interesting page flags or anti-heartbleed measures, or other platform-specific kernel-specific things that can be set from the syscall. Again, it's important that the kernel has a say in how this works rather than agreeing to operate on any old address; memory isn't neutral. The syscall currently accomplishes this with a call to vm_mmap() and then a call to do_madvise(). It'd be nice to do this all at once, but I'm not sure that a helper function exists for that now, and it seems a bit premature to add one, at least for now.
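On the kernel side, that allocation step might look roughly like the sketch below. The helper name is made up and the exact flags and error handling are assumptions; the point is just the vm_mmap()-then-do_madvise() sequence described above:
/* Hypothetical helper, kernel context assumed (linux/mm.h, linux/mman.h,
 * linux/err.h). */
static unsigned long alloc_wipeonfork_states(size_t alloc_size)
{
	unsigned long addr;
	int ret;

	/* Anonymous, locked (non-swappable) mapping to hold the states. */
	addr = vm_mmap(NULL, 0, alloc_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED, 0);
	if (IS_ERR_VALUE(addr))
		return addr;

	/* Make sure a forked child never sees the parent's key material. */
	ret = do_madvise(current->mm, addr, alloc_size, MADV_WIPEONFORK);
	if (ret) {
		vm_munmap(addr, alloc_size);
		return ret;
	}

	return addr;
}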
The interesting meat of the implementation is in lib/vdso/getrandom.c, as generic C code, and it aims to mainly follow random.c's buffered fast key erasure logic. Before the RNG is initialized, it falls back to the syscall. Right now it uses a simple generation counter and a timestamp to make its decisions on reseeding; this covers many cases, but not all, so this RFC still has a little bit of improvement work to do. But it should give you the general idea.
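As a sketch of the generation-counter half of that decision (with made-up names - 'rng_generation' standing in for whatever counter the kernel publishes in the vDSO data page, and the state layout purely illustrative, not the actual lib/vdso/getrandom.c code):
/* Illustrative only; kernel context assumed (linux/types.h for bool). */
struct vgetrandom_state_sketch {
	unsigned long	generation;	/* copied at last (re)seed */
	/* ... key material, batched output, position, ... */
};

static bool state_is_stale(const struct vgetrandom_state_sketch *state,
			   unsigned long rng_generation)
{
	/*
	 * If the kernel RNG has reseeded since this state last pulled a
	 * fresh key (new entropy, VM fork, resume, ...), the cached key
	 * material must not be expanded any further - reseed the state
	 * (or fall back to the syscall) before producing more output.
	 */
	return state->generation != rng_generation;
}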
The actual place that has the most work to do is in all of the other files. Most of the vDSO shared page infrastructure is centered around gettimeofday, and so the main structs are all in arrays for different timestamp types, and attached to time namespaces, and so forth. I've done the best I could to add onto this in an unintrusive way, but you'll notice almost immediately from glancing at the code that it still needs some untangling work. This also only works on x86 at the moment. I could certainly use a hand with this part.
So far in my test results, performance is pretty stellar (around 15x for uint32_t generation), and it seems to be working. There are a couple TODO bits with the actual random.c lifetime integration, but as an early patchset, this at least introduces the intended interface.
Cc: [email protected]
Cc: [email protected]
Cc: Thomas Gleixner [email protected]
Cc: Adhemerval Zanella Netto [email protected]
Cc: Carlos O'Donell [email protected]
Signed-off-by: Jason A. Donenfeld [email protected]