< 2022-09-15 >

2,209,147 events recorded by gharchive.org, of which 2,209,147 were push events containing 3,350,540 commit messages amounting to 241,492,037 characters, filtered with words.py@e23d022007... down to these 30 messages:

Thursday 2022-09-15 00:07:05 by Johannes Weiner

mm: vmscan: fix extreme overreclaim and swap floods

During proactive reclaim, we sometimes observe severe overreclaim, with several thousand times more pages reclaimed than requested.

This trace was obtained from shrink_lruvec() during such an instance:

prio:0 anon_cost:1141521 file_cost:7767
nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
nr=[7161123 345 578 1111]

While the reclaimer requested 4M, vmscan reclaimed close to 16G, most of it by swapping. These requests take over a minute, during which the write() to memory.reclaim is unkillably stuck inside the kernel.

Digging into the source, this is caused by the proportional reclaim bailout logic. This code tries to resolve a fundamental conflict: to reclaim roughly what was requested, while also aging all LRUs fairly and in accordance with their size, swappiness, refault rates etc. The way it attempts fairness is that once the reclaim goal has been reached, it stops scanning the LRUs with the smaller remaining scan targets, and adjusts the remainder of the bigger LRUs according to how much of the smaller LRUs was scanned. It then finishes scanning that remainder regardless of the reclaim goal.

This works fine if priority levels are low and the LRU lists are comparable in size. However, in this instance, the cgroup that is targeted by proactive reclaim has almost no files left - they've already been squeezed out by proactive reclaim earlier - and the remaining anon pages are hot. Anon rotations cause the priority level to drop to 0, which results in reclaim targeting all of anon (a lot) and all of file (almost nothing). By the time reclaim decides to bail, it has scanned most or all of the file target, and therefore must also scan most or all of the enormous anon target. This target is thousands of times larger than the reclaim goal, thus causing the overreclaim.

The bailout code hasn't changed in years, so why is this failing now? The most likely explanations are two other recent changes in anon reclaim:

  1. Before the series starting with commit 5df741963d52 ("mm: fix LRU balancing effect of new transparent huge pages"), the VM was overall relatively reluctant to swap at all, even if swap was configured. This means the LRU balancing code didn't come into play as often as it does now, and mostly in high pressure situations where pronounced swap activity wouldn't be as surprising.

  2. For historic reasons, shrink_lruvec() loops on the scan targets of all LRU lists except the active anon one, meaning it would bail if the only remaining pages to scan were active anon - even if there were a lot of them.

    Before the series starting with commit ccc5dc67340c ("mm/vmscan: make active/inactive ratio as 1:1 for anon lru"), most anon pages would live on the active LRU; the inactive one would contain only a handful of preselected reclaim candidates. After the series, anon gets aged similarly to file, and the inactive list is the default for new anon pages as well, making it often the much bigger list.

    As a result, the VM is now more likely to actually finish large anon targets than before.

Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the larger LRU lists is made before bailing out on a met reclaim goal.
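To make the change concrete, here is a toy Python rendering of the bailout idea; the real code is C in mm/vmscan.c's shrink_lruvec(), and the loop structure, dict-based targets, and scan_list callback here are illustrative assumptions only.

# Toy model of proportional scanning with the new bailout behavior.
# SWAP_CLUSTER_MAX mirrors the kernel constant; everything else is invented.
SWAP_CLUSTER_MAX = 32

def shrink_lruvec_toy(nr_targets, nr_to_reclaim, scan_list):
    """nr_targets: dict of lru name -> pages left to scan.
    scan_list(lru, nr) scans up to nr pages and returns pages reclaimed."""
    nr_reclaimed = 0
    nudged_after_goal = False
    while any(nr_targets.values()):
        for lru in nr_targets:
            step = min(nr_targets[lru], SWAP_CLUSTER_MAX)
            if step:
                nr_targets[lru] -= step
                nr_reclaimed += scan_list(lru, step)
        if nr_reclaimed < nr_to_reclaim:
            continue
        # Old behavior: stop the smaller lists, rescale the larger lists'
        # remainders proportionally, and finish scanning them regardless of
        # the goal -- which is where the overreclaim came from.
        # New behavior: once the goal is met, allow only one more
        # SWAP_CLUSTER_MAX-sized nudge toward the larger lists, then bail.
        if nudged_after_goal:
            break
        nudged_after_goal = True
    return nr_reclaimed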

This fixes the extreme overreclaim problem.

Fairness is more subtle and harder to evaluate. No obvious misbehavior was observed on the test workload, in any case. Conceptually, fairness should primarily be a cumulative effect from regular, lower priority scans. Once the VM is in trouble and needs to escalate scan targets to make forward progress, fairness needs to take a backseat. This is also acknowledged by the myriad exceptions in get_scan_count(). This patch makes fairness decrease gradually, as it keeps fairness work static over increasing priority levels with growing scan targets. This should make more sense - although we may have to re-visit the exact values.

Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner [email protected] Reviewed-by: Rik van Riel [email protected] Acked-by: Mel Gorman [email protected] Cc: Hugh Dickins [email protected] Cc: Joonsoo Kim [email protected] Cc: [email protected] Signed-off-by: Andrew Morton [email protected]


Thursday 2022-09-15 00:17:34 by ShivalaOfVudra

Details that will be updated with more info as it occurs to me

Notable Abilities: Augmented strength, perception, and pain tolerance when under extreme stress. Capacity to shapeshift, wildly altering form in accordance with will. Intrinsic magical abilities of suspected Abyssal origin. Formidable opponent in weapon and hand-to-hand combat.

Personality: Distrustful, she is aloof with all but those who earn her fidelity with time and commitment. Superficially congenial, a great wound rests upon her heart, leading her to reach out desperately for love from any whom she feels can provide it. This wound also fuels the rage with which she strikes out at the world, often on behalf of the abused and downtrodden who mirror the horrors of her own life.

Drive: Hers is a nomad's grace. Little ties her to any one place, and she acts in accordance with the law of her spirit. Her only overarching goal is to understand just where she comes from, and why the power of the Abyss resides within her.


Thursday 2022-09-15 00:42:05 by Mårten Rånge

Relax shader strictness in RELEASE mode (#13998)

Disables strictness and warnings-as-errors for custom pixel shaders in RELEASE. Windows Terminal does not tell the user why a shader won't compile, which makes it very frustrating for the shader hacker.

After trying the recent preview, none of my shaders loaded anymore in Windows Terminal Preview, which made me very sad. I had no idea what was wrong with them. After cloning the git repo, building it, and fighting an issue that prevented the DEBUG SDK from being used, I finally was able to identify some issues that were blocking my shaders.

error X3556: integer modulus may be much slower, try using uints if possible.
error X4000: use of potentially uninitialized variable (rayCylinder)

While the first one is a good warning, I don't think it should be an error, and the tools I use didn't flag it, so it was hard to know about.

As for the second one, I was staring at the code and unable to identify what exactly was causing the issue; I fumbled with the code a few times and just felt the fun drain away.

IMHO: I want it to be fun to develop shaders for Windows Terminal. Fighting invisible errors is not fun. I am not after building production shaders for Windows Terminal; I want some cool effects. So while I, as a .NET developer, always run with warnings as errors, I don't think it's the right option here, especially since Windows Terminal doesn't say what the problem is.

However, I understand if the shaders you ship with Windows Terminal should be free of errors and silly mistakes, so I kept the stricter setting in DEBUG mode.

Validation Steps Performed

Loaded Windows Terminal in RELEASE and DEBUG mode and validated that RELEASE mode had reduced strictness but DEBUG retained the previous more restrictive mode.


Thursday 2022-09-15 00:42:44 by ira

Update codebase to use IntoIterator where possible. (#5269)

Remove unnecessary calls to iter()/iter_mut(). Mainly updates the use of queries in our code, docs, and examples.

// From
for _ in list.iter() {
for _ in list.iter_mut() {

// To
for _ in &list {
for _ in &mut list {

We already enable the pedantic lint clippy::explicit_iter_loop inside of Bevy. However, this only warns for a few known types from the standard library.

Note for reviewers

As you can see the additions and deletions are exactly equal. Maybe give it a quick skim to check I didn't sneak in a crypto miner, but you don't have to torture yourself by reading every line. I already experienced enough pain making this PR :)

Co-authored-by: devil-ira [email protected]


Thursday 2022-09-15 01:14:33 by LemonInTheDark

Fixes asset caching (#69852)

The asset was being loaded, saw that fully_generated was false, and so attempted to rebuild. The rebuild clears the existing file cache, and fucks us.

Life is pain.


Thursday 2022-09-15 01:21:20 by infecsean

MUAHAHHAHAHAHAA theme

i got a buncha themes man fuck you pee pee poo poo lookin ass bare ass no undie lookin ass chicken tendies lookin ass people fuckin lookin ass


Thursday 2022-09-15 01:35:46 by TwilightAnimeTV

Delete My Girlfriend Cheated on Me With a Senior, so I’m Cheating on Her With His Girlfriend.html


Thursday 2022-09-15 01:54:20 by sniff sniff mmmm, game

mankind knew they could not change society so instead of reflecting upon themselves they blamed the beasts heaven or hell ooh that's a nice jacket you got there wait which one of us is real again beats me duel 1 lets rock


Thursday 2022-09-15 03:10:53 by Andrew Gu

[FSDP] Generalize prefetching; lower unshard/reshard to handle (#83665)

Additional Constructor Changes

  • self.sharding_strategy
    • If the world size is 1, I clamp the sharding strategy to NO_SHARD, regardless of the passed-in sharding strategy, since the behavior is fully equivalent (see the short sketch after this list). This obviates the need for p._is_sharded or self.world_size == 1 checks in the core code. Once we fully shift the paradigm to using handles, this should result in a clear net positive. However, for now, we still have some places where we interface directly with the FlatParameter, in which case we have some temporary hacky code.
  • HandleConfig
    • As a part of the new design abstraction, much logic is lowered to the FlatParamHandle. This requires that the handle be aware of mixed precision, CPU offloading, sharding strategy, and the process group (for world size > 1). To be less error-prone, I re-defined the dataclasses and enums for the handle. These can be removed and coalesced with the existing ones.
    • The drawback is that the FlattenParamsWrapper constructor now takes in the HandleConfig to forward it to the FlatParamHandle constructor. I tolerate this since we plan to retire the FPW. For now, the handle's process group attributes are set later when we call handle.shard().
    • We will dive into this logic lowering later. For now, the idea is we need to pass some extra info to the handle, which must go through the FPW.
  • FullyShardedDataParallel._shard_parameters() -> FlatParamHandle.shard()
  • [Important] Generalizing attributes to remove the 1 FullyShardedDataParallel : 1 FlatParameter assumption
    • Before: _fsdp_graph_order, _pre_backward_hook_full_params_prefetched, _forward_full_params_prefetched, reshard_after_forward are with respect to 1 FullyShardedDataParallel
    • After: (1) We use FlatParamHandle in place of FullyShardedDataParallel. (2) The atomic unit for forward and pre-backward is a group of handles involved in the same module's forward/pre-backward. This is represented as Tuple[FlatParamHandle, ...]. For now, this is always a singleton tuple, but this shift enables a module having multiple FSDP parameters (which we have use cases for).
  • _reset_lazy_init() attributes
    • The prefetched flags are merged into self._handles_prefetched, which is directly defined in the constructor. reshard_after_forward is retired since it can be fully determined by other attributes (_is_root and sharding_strategy).
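As a rough illustration of the world-size clamp from the first bullet above, a minimal Python sketch (ShardingStrategy is the real torch.distributed.fsdp enum; the helper and variable names are assumptions):

from torch.distributed.fsdp import ShardingStrategy


def clamp_sharding_strategy(requested: ShardingStrategy, world_size: int) -> ShardingStrategy:
    """With a single rank every strategy behaves like NO_SHARD, so clamp it up
    front instead of sprinkling world_size == 1 checks through the core code."""
    return ShardingStrategy.NO_SHARD if world_size == 1 else requested


# e.g. clamp_sharding_strategy(ShardingStrategy.FULL_SHARD, world_size=1)
# returns ShardingStrategy.NO_SHARD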

FSDP Runtime: Unshard

The first step is to read the existing _rebuild_full_params(). A few notable observations:

  • It returns Tuple[Tensor, bool]. The first element is the padded unsharded flattened parameter, and the second element is whether we can free it upon exiting summon_full_params(). This return value is only used in summon_full_params().
  • If parameter mixed precision is enabled and the FlatParameter is already unsharded, then the low precision shard (_mp_shard) is still re-allocated on GPU. (It is freed at the end of the method.)
  • If CPU offloading is enabled and the FlatParameter is already unsharded, then there is a no-op p.data = p.data.to(self.compute_device, non_blocking=True).
  • Inside summon_full_params(), mixed_precision_cast_ran is always False. Therefore, the return value for the not p._is_sharded and mixed_precision_cast_ran branch is unused.
  • summon_full_params() can only be called (before forward or after backward) or (between forward and backward). Given this, I cannot think of a case where we call summon_full_params(), the FlatParameter is already unsharded, but reshard_after_forward is True. The FlatParameter should be sharded (before forward or after backward), and the FlatParameter may only be unsharded (between forward and backward) if reshard_after_forward is False.
  • If parameter mixed precision is enabled and the sharding strategy is a sharded one, then inside summon_full_params(), the FlatParameter is unsharded in full precision. This involves allocating a new padded unsharded flattened parameter on GPU in full precision since _full_param_padded is in the low precision.

Some comments:

  • Ideally, we reduce the complexity of the core code path: i.e. unshard for forward and pre-backward. If the return value is only used for summon_full_params(), we should consider if we can compartmentalize that logic.
  • The branching is complex, and some return values are never used, where this fact is not immediately obvious. We should see if we can reduce the branch complexity.

Disclaimer: The difference in attribute semantics between NO_SHARD and the sharded strategies makes it challenging to unify the cases. This PR does not attempt to address that since it requires more design thought. However, it does attempt to reduce the complexity for the sharded strategies.

Unshard: Core Code Path

Let us trace through the new logical unshard.

  1. FullyShardedDataParallel._unshard(self, handles: List[FlatParamHandle], prepare_gradient: bool)
    • This iterates over the handles and calls handle.pre_unshard(), handle.unshard(), and handle.post_unshard(prepare_gradient) in the all-gather stream (a sketch of this orchestration follows the list).
  2. FlatParamHandle.needs_unshard(self)
    • We take an aside to look at this key subroutine.
    • For NO_SHARD, this returns False.
    • For sharded strategies, this checks if the padded unsharded flattened parameter is allocated. The padded unsharded flattened parameter is the base tensor for the unpadded unsharded flattened parameter, which is a view into the padded one. Thus, the padded one's allocation fully determines if the FlatParameter is unsharded.
    • For sharded strategies, to accommodate the parameter mixed precision + summon_full_params() case, we introduce _full_prec_full_param_padded, which is the padded unsharded flattened parameter in full precision. The helper _get_padded_unsharded_flat_param() takes care of this casing and returns the padded unsharded flattened parameter. Instead of allocating a new tensor each time, we manually manage _full_prec_full_param_padded's storage just like for _full_param_padded.
  3. FlatParamHandle.pre_unshard(self)
    • For sharded strategies, the postcondition is that the handle's FlatParameter points to the tensor to all-gather. This should be on the communication device and in the desired precision. The allocation and usage of the low precision shard for parameter mixed precision and the CPU -> GPU copy for CPU offloading both classify naturally in the pre-unshard.
    • For sharded strategies, if the FlatParameter does not need to be unsharded, pre_unshard() is a no-op. This avoids unnecessarily allocating and freeing the low precision shard.
    • For NO_SHARD, we simply preserve the existing semantics.
  4. FlatParamHandle.unshard(self)
    • If the handle was resharded without freeing the padded unsharded flattened parameter (e.g. summon_full_params() between forward and backward when reshard_after_forward=False), then the FlatParameter points to the sharded flattened parameter. We need to switch to using the unsharded parameter. This is a design choice. Alternatively, we may not switch to using the sharded flattened parameter in reshard() if we do not free the padded unsharded flattened parameter. However, the postcondition that the FlatParameter points to the sharded flattened parameter after reshard() is helpful logically, so I prefer this approach.
    • Otherwise, this allocates the padded unsharded flattened parameter, all-gathers, and switches to using the unpadded unsharded flattened parameter.
    • In the future, we may add an option to unshard() that additionally all-gathers the gradient.
  5. FlatParamHandle.post_unshard(self, prepare_gradient: bool)
    • For sharded strategies, if using parameter mixed precision, this frees the low precision shard. More generally, this should free any sharded allocations made in pre_unshard() since the all-gather has been launched. If using CPU offloading, the GPU copy of the local shard goes out of scope after unshard() and is able to be garbage collected. We should understand if there is any performance difference between manually freeing versus deferring to garbage collection since our usage is inconsistent. For now, I preserve the existing semantics here.
    • prepare_gradient is meant to be set to True for the pre-backward unshard and False for the forward unshard. This runs the equivalent logic of _prep_grads_for_backward().
    • This post-unshard logic (notably the gradient preparation) now runs in the all-gather stream, which is fine because we always have the current stream wait for the all-gather stream immediately after FullyShardedDataParallel._unshard(). IIUC, we do not need to call _mp_shard.record_stream(current_stream) (where current_stream is the default stream) because _mp_shard is allocated and freed in the same (all-gather) stream.
    • A postcondition is that the FlatParameter is on the compute device. It should also have the unpadded unsharded size (though I do not have a check for this at the moment).
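A rough sketch of the orchestration in step 1, to make the stream handling concrete (the method names follow the description above; the _streams dictionary attribute and the surrounding class are assumptions, not the actual source):

from typing import List

import torch


def _unshard(self, handles: List["FlatParamHandle"], prepare_gradient: bool) -> None:
    """Unshard a group of handles in the all-gather stream (sketch)."""
    if not handles:
        return
    with torch.cuda.stream(self._streams["all_gather"]):  # assumed attribute
        for handle in handles:
            handle.pre_unshard()                   # low-precision cast / CPU->GPU copy
            handle.unshard()                       # allocate padded unsharded param + all-gather
            handle.post_unshard(prepare_gradient)  # free low-precision shard, prep grads
    # As noted in step 5, callers immediately make the current stream wait for
    # the all-gather stream after this returns:
    # torch.cuda.current_stream().wait_stream(self._streams["all_gather"])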

Unshard: summon_full_params()

Now that we see how the logical unshard has been reorganized for the core code path, let us dive into summon_full_params().

The two constraints are:

  1. If using parameter mixed precision, we should unshard in full precision.
  2. We must determine if we should free the padded unsharded flattened parameter upon exiting.

The first constraint is addressed as described before in the core unshard code path, so it remains to explore the second constraint.

I propose a simple rule: We free iff we actually unshard the FlatParameter in summon_full_params() (i.e. it was not already unsharded). We perform a case analysis:

Parameter mixed precision enabled:

  • NO_SHARD: flat_param.data points to flat_param._local_shard, which is the full precision unsharded flattened parameter. This is not safe to free.
  • FULL_SHARD / SHARD_GRAD_OP: We force full precision and all-gather to _full_prec_full_param_padded. We do not support nested summon_full_params(), so _full_prec_full_param_padded must be unallocated. We unshard, and it is safe to free.

Parameter mixed precision disabled:

  • NO_SHARD: This is the same as with mixed precision enabled. This is not safe to free.
  • FULL_SHARD / SHARD_GRAD_OP: We all-gather to _full_param_padded. It may already be unsharded.
    • Already unsharded: The unshard is a no-op. This is not safe to free.
      • For FULL_SHARD, this can happen for the root FSDP instance after forward() but before backward.
      • For SHARD_GRAD_OP, this can happen for all FSDP instances after forward() but before backward.
    • Needs unshard: We unshard. This is safe to free.

Therefore, we see that it is not safe to free when using NO_SHARD and when using a sharded strategy but the FlatParameter is already unsharded. This is precisely the proposed rule.
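The proposed rule can be written down compactly. A minimal sketch, assuming a uses_sharded_strategy flag on the handle (needs_unshard() is the subroutine described earlier; the flag name is an assumption):

def should_free_unsharded_flat_param_on_exit(handle: "FlatParamHandle") -> bool:
    """Free iff summon_full_params() itself performs the unshard (sketch)."""
    # NO_SHARD: flat_param.data is already the (full-precision) unsharded
    # parameter, so it is never safe to free.
    if not handle.uses_sharded_strategy:  # assumed attribute
        return False
    # Sharded strategies: safe to free only if the padded unsharded flattened
    # parameter is not already allocated, i.e. we are the ones unsharding.
    return handle.needs_unshard()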

There were two notable edge cases that the existing code did not address.

  1. The existing code tests if the FlatParameter is already unsharded by checking the allocation status of _full_param_padded. When using parameter mixed precision, this is the incorrect tensor to check. If _full_param_padded is allocated (e.g. when reshard_after_forward=False and calling summon_full_params() between forward and backward), the already-unsharded check is a false positive, and summon_full_params() does not correctly force full precision. pytorch/pytorch#83068
    • This PR's needs_unshard() check correctly routes to the appropriate padded unsharded flattened parameter depending on the calling context (i.e. if it needs to force full precision or not).
  2. The existing code does not free the GPU copy of the padded unsharded flattened parameter when calling summon_full_params(offload_to_cpu=True). It unshards the FlatParameter, moves the padded unsharded flattened parameter to CPU, and sets the FlatParameter data to be the appropriate unpadded view into the padded unsharded flattened parameter on CPU. However, _full_param_padded still points to the all-gathered padded unsharded flattened parameter on GPU, which is kept in memory. pytorch/pytorch#83076
    • This PR frees the GPU copy and reallocates it upon exiting summon_full_params(). This is essential for avoiding peak GPU memory usage from increasing as we recurse through the module tree. There may be some cases where we can avoid reallocation altogether, but that can be addressed in a follow-up PR.
    • This PR offloads the unpadded unsharded flattened parameter to CPU directly instead of the padded one. As far as I can tell, there is no need to include the padding since unflattening the original parameters does not require the padding.
    • The relevant code is in the context manager FlatParamHandle.to_cpu().

Unshard: Mixed-Precision Stream

This PR removes the mixed precision stream usage. As is, I do not think there is any extra overlap being achieved by the stream usage.

The low precision shard is allocated and copied to in the mixed precision stream (code), and the current stream (in this case the all-gather stream) waits for the mixed precision stream (code). However, we immediately schedule an all-gather that communicates that exact low precision shard (code) with no other meaningful computation between. If we remove the mixed precision stream, the low precision shard is allocated and copied to in the all-gather stream (including the non-blocking CPU -> GPU copy if using CPU offloading).

Under this PR's design, we may consider a "pre-unshard" stream for all logical pre-unshard data transfers if we want to overlap in the future. IIUC, the overlap opportunity exists if there are multiple FlatParameters per module, and we only have the all-gather stream wait for the data transfer corresponding to the local shard it communicates, not the others.

If we agree on removing the mixed-precision stream for now, I will remember to delete it from _init_streams().

FSDP Runtime: Reshard

Like with unshard, the first step is to look at the existing _free_full_params() and _use_param_local_shard(). A few notable observations:

  • For only NO_SHARD, _free_full_params() includes a call to _free_mp_shard().
  • For summon_full_params(), there is a separate _free_full_params_and_use_local_shard() that duplicates the main logic of _free_full_params() and calls _use_param_local_shard().
  • In forward(), if reshard_after_forward=True, we call _free_full_params() and then _free_mp_shard(). Hence, for NO_SHARD, the _free_mp_shard() is a no-op.
  • In the post-backward hook, we typically call _free_full_params() and _free_mp_shard(). The _free_mp_shard() is a no-op for NO_SHARD and if reshard_after_forward=True.

Some comments:

  • The code certainly works, but some of the no-ops are subtle. When possible, we should make it clear when calls are no-ops or not. It is good that the existing code documents that _free_mp_shard() is a no-op in the post-backward hook when reshard_after_forward=True. However, there are still some non-obvious no-ops (around NO_SHARD).
  • We should see if we can avoid the duplicate _free_full_params_and_use_local_shard().

Let us trace through the logical reshard:

  1. FullyShardedDataParallel._reshard(self, handles: List[FlatParamHandle], free_unsharded_flat_params: List[bool])
    • The two args should have the same length since they are to be zipped.
    • The goal of having free_unsharded_flat_params is that the caller should be explicit about whether the (padded) unsharded flattened parameter should be freed. The low precision shard is always meant to be freed (as early as possible), so there is no corresponding List[bool].
  2. FlatParamHandle.reshard(self, free_unsharded_flat_param: bool)
    • This frees the (padded) unsharded flattened parameter if free_unsharded_flat_param and switches to using the sharded flattened parameter.
    • Echoing back to forcing full precision in summon_full_params(), _free_unsharded_flat_param() frees the correct tensor by using _get_padded_unsharded_flat_parameter().
  3. FlatParamHandle.post_reshard(self)
    • I am not fully content with the existence of this method, but this seems to be an unavoidable consequence of NO_SHARD. Perhaps, this may be useful in the future for other reasons though.
    • Right now, this method is only meaningful for NO_SHARD + parameter mixed precision + outside summon_full_params(). _mp_shard is not freed in the post-unshard since it is also the low precision unsharded flattened parameter, so we must delay the free until the post-reshard.

Below the FlatParamHandle.reshard() and post_reshard() layer, there should not be any no-ops.
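A minimal sketch of the reshard entry point traced above (signatures follow the description; the handle method bodies live elsewhere):

from typing import List


def _reshard(
    self,
    handles: List["FlatParamHandle"],
    free_unsharded_flat_params: List[bool],
) -> None:
    """Reshard a group of handles; the caller is explicit about freeing (sketch)."""
    assert len(handles) == len(free_unsharded_flat_params)
    for handle, free_unsharded in zip(handles, free_unsharded_flat_params):
        # Frees the (padded) unsharded flattened parameter if requested and
        # switches back to the sharded flattened parameter.
        handle.reshard(free_unsharded)
        # Today only meaningful for NO_SHARD + parameter mixed precision
        # outside summon_full_params(): frees _mp_shard.
        handle.post_reshard()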

One final comment I will mention is that I like the pre_unshard(), unshard(), post_unshard(), and reshard(), post_reshard() organization because it makes it clear what the boundaries are and their temporal relationship. Through that, we can set pre- and post-conditions. Furthermore, we can eventually convert logic to hooks that may be registered on the FlatParamHandle (for pre_unshard(), post_unshard(), and post_reshard()). This may improve the customizability of FSDP.

FSDP Runtime: forward()

  • This PR reorganizes forward() in preparation for non-recursive wrapping, which uses pre-forward and post-forward hooks that expect the signature hook(module, input). For FSDP, the module and input arguments are not used.
  • This PR creates a new method _fsdp_root_pre_forward() to handle the logic only the root FSDP should run.

FSDP Prefetching

Finally, we dive into the prefetching changes. Some highlights:

  1. This PR unifies the execution order validation and prefetching implementations.
    • Both involve the execution order and can be unified to share some boilerplate.
  2. Execution order validation only runs when the distributed debug level is INFO.
    • We have yet to have one success case where we actually catch an unintended source of dynamism. The warning is also too verbose. Hence, we are gating it by the INFO level.
  3. This PR moves prefetching to be with respect to groups of handles (as mentioned in the constructor comment).
    • This is essential for supporting prefetching with non-recursive wrapping.
  4. This PR does not include "bubbles", i.e. modules with no handles, in the recorded execution order(s). This deviates from the existing implementation.
    • This makes prefetching possibly more aggressive (when there are such bubbles), but it should not have significant performance implications either way.
  5. This PR changes backward prefetching to reset the post-forward order each iteration (as intended).
  6. This PR changes forward prefetching to use the first iteration's pre-forward order instead of the first iteration's post-forward order. (We can discuss whether we want this in this PR or not. Otherwise, I can keep it as using the post-forward order to preserve the existing semantics.) This PR also removes the all_gather_stream.wait_stream(current_stream) before forward prefetching because it does not help with high GPU reserved memory. We can add that back if desired.

Appendix

Reverse Post-Forward Order Is Not Always the Pre-Backward Order

The existing PT-D FSDP pre-backward prefetching uses the reverse post-forward order.

Model Code
class Model(nn.Module):
  def __init__(self):
      super().__init__()
      self.block1 = nn.Sequential(
          nn.Conv2d(3, 4, kernel_size=3),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
      )
      self.block2 = nn.Sequential(
          nn.Conv2d(4, 4, kernel_size=3),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=False),
      )
      self.block3 = nn.Linear(12, 8)
      self.head = nn.Sequential(
          nn.AdaptiveAvgPool2d(output_size=(1, 1)),
          nn.Flatten(),
          nn.Linear(4, 10),
      )

  def forward(self, x):
      x = self.block1(x)
      x = self.block2(x)
      x = self.block3(x)
      return self.head(x)

model = Model().cuda()
fsdp_kwargs = {}
model.block1[1] = FSDP(model.block1[1], **fsdp_kwargs)  # BN2d
model.block2[1] = FSDP(model.block2[1], **fsdp_kwargs)  # BN2d
model.block1 = FSDP(model.block1, **fsdp_kwargs)
model.block2 = FSDP(model.block2, **fsdp_kwargs)
model.block3 = FSDP(model.block3, **fsdp_kwargs)
model = FSDP(model, **fsdp_kwargs)
Execution Orders
Pre-backward hook for ('head.2.weight', 'head.2.bias') 140339520587136 (model)
Pre-backward hook for ('weight', 'bias') 140339461194656 (block3)
Pre-backward hook for ('0.weight', '0.bias') 140339520589776 (block2)
Pre-backward hook for ('weight', 'bias') 140339520587664 (block2 BN)
Pre-backward hook for ('weight', 'bias') 140339520586656 (block1 BN)
Pre-backward hook for ('0.weight', '0.bias') 140339520588768 (block1)

Pre-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('weight', 'bias') 140339461194656 (block3)

Reverse post-forward order:
('head.2.weight', 'head.2.bias') 140339520587136 (model)
('weight', 'bias') 140339461194656 (block3)
('0.weight', '0.bias') 140339520589776 (block2)
('weight', 'bias') 140339520587664 (block2 BN)
('0.weight', '0.bias') 140339520588768 (block1)
('weight', 'bias') 140339520586656 (block1 BN)

Differential Revision: D39293429 Pull Request resolved: pytorch/pytorch#83665 Approved by: https://github.com/zhaojuanmao


Thursday 2022-09-15 04:22:03 by treckstar

Life is one big road with lots of signs. So when you riding through the ruts, don't complicate your mind. Flee from hate, mischief and jealousy. Don't bury your thoughts, put your vision to reality. Wake Up and Live!


Thursday 2022-09-15 05:38:39 by Dylan DPC

Rollup merge of #100473 - compiler-errors:normalize-the-fn-def-sig-plz, r=lcnr

Attempt to normalize FnDef signature in InferCtxt::cmp

Stashes a normalization callback in InferCtxt so that the signature we get from tcx.fn_sig(..).subst(..) in InferCtxt::cmp can be properly normalized; we cannot expect it to have normalized types, since it comes straight from astconv.

This is kind of a hack, but I will say that @jyn514 found the fact that we present unnormalized types to be very confusing in real life code, and I agree with that feeling. Though altogether I am still a bit unsure about whether this PR is worth the effort, so I'm open to alternatives and/or just closing it outright.

On the other hand, this isn't a ridiculously heavy implementation anyways -- it's less than a hundred lines of changes, and half of that is just miscellaneous cleanup.

This is stacked onto #100471 which is basically unrelated, and it can be rebased off of that when that lands or if needed.


The code:

trait Foo { type Bar; }

impl<T> Foo for T {
    type Bar = i32;
}

fn foo<T>(_: <T as Foo>::Bar) {}

fn needs_i32_ref_fn(f: fn(&'static i32)) {}

fn main() {
    needs_i32_ref_fn(foo::<()>);
}

Before:

   = note: expected fn pointer `fn(&'static i32)`
                 found fn item `fn(<() as Foo>::Bar) {foo::<()>}`

After:

   = note: expected fn pointer `fn(&'static i32)`
                 found fn item `fn(i32) {foo::<()>}`

Thursday 2022-09-15 06:15:38 by David Ordás

Add CodingFantasy's CSS coding interactive games (#5490)

  • Add "Knights of the Flexbox table" game

Welcome to the Knights of the Flexbox table. A game where you can help Sir Frederic Flexbox and his friends to uncover the treasures hidden in the Tailwind CSS dungeons. You can navigate the knight through the dungeon by changing his position within the dungeon using Flexbox and Tailwind CSS.

  • Add "Flex Box Adventure" game

Once upon a time, there was a King Arthur. He ruled his kingdom fair and square. But Arthur had one problem. He was a very naive person. So one sunny day, three alchemist brothers offered Arthur to exchange all his Gold Coins for coins made of a more valuable new metal that they had invented - Bit Coins.

Arthur believed them and gave them all his gold. The brothers took the gold and promised to give the bit coins back to Arthur in seven days.

Seven days passed. The brothers have not turned up. Arthur realized he had been scammed. He is angry and intends to take revenge on them. Let's help him do it with our weapon – CSS Flex Box!

We made this game for You

  1. You often stumble and try to figure out which combination of Flex Box properties makes the browser do what you want it to do.

  2. You want to create complex web layouts without constantly looking at the web page after every Cmd/Ctrl+S press in the code editor.

  3. You have tried to learn Flex Box with video tutorials and articles but still don't fully understand how some parts of it work.

4*. Or, if you are a master of CSS Flex Box, we have something interesting for you too (read further).

Have you found yourself there? Then you definitely want to learn or improve your Flex Box skills. So we have good news for you, really good news...

Learn Flex Box by Playing Game No more boring videos, tutorials and courses. Learn Flex Box in a completely new, fun, effective and revolutionary way. By playing Flex Box coding game!

  • Add "Grid Attack" coding game

In an ancient Elvish prophecy, it was said that one day a man would be born with an incredible power that predicts the future – "Marketi Predictori." And another will come to take this power. But the years went by and nothing happened. Until one day, a little elf was born. He was named Luke.

From an early age, he surprised his parents and his sister Rey by guessing the price of apples at the farmer's market before they even reached it. Every year his power rose and his predictions became more and more accurate. But there was one thing Luke could not predict. The coming of the demon Valcorian. It was the one from the prophecy that was to come and take Luke's power. One day Valcorian and his army attacked the town where Luke had lived and kidnapped him to make a ritual of stealing his power.

Go on a dangerous quest with Luke's sister Rey and find her brother. Defeat Valcorian and all his demons using a secret weapon – CSS Grid.

We made this game for You?

  1. You often stumble and try to figure out which combination of Grid properties makes the browser do what you want it to do.

  2. You are scared by the number of properties a CSS Grid has, and you feel uncomfortable when you need to create a grid layout.

  3. You want to create complex web layouts using Grid, but without constantly looking at the web page after every "Cmd/Ctrl+S" press in the code editor.

  4. You have tried to learn CSS Grid with video tutorials and articles but still don't fully understand how some parts of it work.

  5. You use a Flex Box where Grid is required because you don't feel confident in using it.

Have you found yourself there? Then you definitely want to learn or improve your Grid skills. So we have good news for you, really good news...

Learn Grid by Playing CSS Game No more boring videos, courses and articles. Learn Grid in a revolutionary new, fun, and effective way. By playing a Grid coding game!


Thursday 2022-09-15 07:31:23 by ykim-yeji

docs: My menu lists for breakfast, lunch, dinner

I'm going to keep memoing and rewriting it.


Thursday 2022-09-15 09:25:29 by Ghan Saridnirun

Update some parameters

Dear Matt and Chloe

I have added the count data of:

  1. Single female: referring to females that might mature into breeders but failed to produce offspring, and to immature females that were not yet ready to breed.

  2. The count data of males, including chicks, juveniles, 1-year males, 2-year males, and also 3-year males.

However, I found the 3-year male counts to be inflated, because mature males seem to stick to particular areas more than other age classes, especially in the breeding season when they often patrol certain trails and are frequently detected by camera traps, causing multiple counts within a one-day period. Thus, I decided to filter the 3-year male counts as was done for the other age classes, to make their numbers more accurate in the Bayesian and MCMC analysis.

So the 3-year male data will be available soon, once I have dealt with the overcounting as best I can.

Best regards, Ghan Saridnirun


Thursday 2022-09-15 10:09:34 by Davide Brini

New output format: iso-efi-initrd

This option was previously not available and required postprocessing of a tar-kernel-initrd output.

Comparison with iso-efi:

iso-efi only loads the kernel at boot, and the root filesystem is mounted from the actual boot media (eg, a CD-ROM - physical or emulated). This can often cause trouble (it has for us) for multiple reasons:

  • the linuxkit kernel might not have the correct drivers built-in for the hardware (see #3154)
  • especially with virtual or emulated CD-ROMs, performance can be abysmal: we saw a case where the server IPMI allowed using an ISO stored in AWS S3 over HTTP... you can imagine what happens when you start doing random I/O on the root fs in that case.
  • The ISO image has the root device name baked in (ie, /dev/sr0) which fails if for some reason the CD-ROM we're running from doesn't end up using that device, so manual tweaking is required (see #2375)

iso-efi-initrd, on the other hand, packs the root filesystem as an initramfs (ie similar to what the raw output does, except that in this case we're preparing an ISO image), so both the kernel and the initramfs are loaded in memory by the boot loader and, once running, we don't need to worry about root devices or kernel drivers (and the speed is good, as everything runs in RAM).

Also, the generated ISO can be copied verbatim (eg with dd) onto a USB media and it still works.

Finally, the image size is much smaller compared to iso-efi.

IMHO, iso-efi-initrd could be used almost anywhere iso-efi would be used, or might even supersede it. I can't think of a scenario where one might explicitly want to use iso-efi.

Points to consider:

  • Not tested under aarch64 as I don't have access to that arch. If the automated CI tests also test that, then it should be fine.
  • I'm not sure what to put inside images.yaml for the iso-efi-initrd image. As it is it works of course (my personal image on docker hub), but I guess it'll have to be some more "official" image. However, that cannot be until this PR is merged, so it's kind of a chicken and egg situation. Please advise.
  • I can look into adding the corresponding iso-bios-initrd builder if there is interest.

cute seal

Signed-off-by: Davide Brini [email protected]


Thursday 2022-09-15 10:22:04 by treckstar

People listen up don't stand so close, I got somethin that you all should know. Holy matrimony is not for me, I'd rather die alone in misery.


Thursday 2022-09-15 10:38:05 by tyra314

Fuck you gitignore

No. build.gradle is neither build nor .gradle. Gosh darn it.


Thursday 2022-09-15 11:00:56 by Xavier Morel

[IMP] *: owlify password meter and convert change password to real wizard

The changes in auth_password_policy are largely the owlification of the password meter widget:

  • modernize the password policy module and convert it to an odoo-module (note: now exports a pseudo-abstract class which is really a policy, for the sake of somewhat sensibly typing recommendations)
  • replace the implementation of the Meter and PasswordField widgets by owl versions

The changes to web and base stem from taking a look at converting the ChangePassword wizard, and finding that it would be a pain in the ass but also... unnecessary? It seems to have been done as a wizard completely in javascript despite being backend-only for legacy reasons: apparently one of the very old web clients (v5 or v6 probably) implemented it as a "native action" which was directly part of the client's UI, and so it had to be implemented entirely in the client.

Over time it was moved back into the regular UI (and moved around quite a bit), hooked as a client action to maintain access to the existing UI / dialog.

But since it's been an action opened via a button for years it can just... be a normal wizard, with password fields, which auth_password_policy can then set the widget of.

So did that:

  • removed the old unnecessary JS, and its dedicated endpoint (which is not used by portal, portal has its own endpoint)
  • used check_identity for the "old password check"
  • split out change_password with an internal bit so we can have a safer (and logged) "set user password" without needing to provide the old password, which is now used for the bulk password change wizard as well
  • added a small wizard which just takes a new password (and confirmation); for safety, a given change password wizard is only accessible to its creator (also the wizard is restricted to employees, though technically it would probably be fine for portal users as well). A rough sketch of such a wizard follows this list.
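For illustration, a minimal sketch of what such a wizard could look like; the model, field, and helper names are assumptions rather than the actual code from odoo/odoo#99458, and the short _transient_max_hours ties into the vacuum notes further down.

from odoo import fields, models
from odoo.exceptions import ValidationError


class ChangePasswordOwn(models.TransientModel):
    _name = "change.password.own"            # hypothetical model name
    _description = "Change own password"
    _transient_max_hours = 0.1               # ~6 minutes: vacuum leftovers quickly

    new_password = fields.Char(string="New Password")
    confirm_password = fields.Char(string="New Password (Confirmation)")

    def change_password(self):
        self.ensure_one()
        if self.new_password != self.confirm_password:
            raise ValidationError("The new password and its confirmation must match.")
        # Delegate to an internal, logged "set password" helper on res.users
        # (hypothetical name); check_identity already covered the old password.
        self.env.user._set_new_password(self.new_password)
        self.unlink()  # drop the plaintext from the transient table on success

auth_password_policy then only needs to point the new_password field at its meter widget, as noted below.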

Rather than extensive messy rewrite / monkeypatching (the original wizard was 57 LOC, though also 22 LOC of template, the auth_policy hooking / patching was 33, plus 8 lines of CSS), auth_password_policy just sets the widget of the new_password field in the new wizard, much as it did the bulk wizard.

Also improve the "hide meter if field is empty" feature by leveraging :placeholder-shown. This requires setting a placeholder, and while an empty placeholder works fine in Firefox, it doesn't in Chrome, so the placeholder needs to be a single space. Still, it seems better than updating a fake attribute or manipulating a class for the sake of trivial styling.

Notes on unlink + transient vacuum

Although the wizard object is only created when actually calling change_password, and is deleted on success, it is possible for the user to get an error and fail to continue (it should be unlikely without overrides since the passwords are checked while creating / saving but...).

While in that case the new_password in the database is not the user's own, it could be their future password, or give evidence as to their password-creation scheme, or some other signal useful to attack that front of the user's life and behavior. As such, quickly removing leftovers from the database (by setting a very low transient lifetime) seems like a good idea.

This is compounded by check_identity having a grace period of 10 minutes. A lifetime of 0.1 hours is 6 minutes, but because the cron runs every 10 minutes, the user effectively has 6~10 minutes between the moment they create an incorrect / incomplete version of the wizard and the moment it is destroyed if they just leave it.

closes odoo/odoo#99458

Signed-off-by: Xavier Morel (xmo) [email protected]


Thursday 2022-09-15 13:42:54 by glozow

Merge bitcoin/bitcoin#24513: CChainState -> Chainstate

00eeb31c7660e2c28f189f77a6905dee946ef408 scripted-diff: rename CChainState -> Chainstate (James O'Beirne)

Pull request description:

Alright alright alright, I know: we hate refactors. We especially hate cosmetic refactors.

Nobody knows better than I that changing broad swaths of code out from under our already-abused collaborators, only to send a cascade of rebase bankruptcies, is annoying at best and sadistic at worst. And for a rename! The indignation!

But just for a second, imagine yourself. Programming bitcoin/bitcoin, on a sandy beach beneath a lapis lazuli sky. You go to type the name of what is probably the most commonly used data structure in the codebase, and you only hit shift once.

What could you do in such a world? You could do anything. The only limit is yourself.


So maybe you like the idea of this patch but really don't want to deal with rebasing. You're in luck!

Here're the commands that will bail you out of rebase bankruptcy:

git rebase -i $(git merge-base HEAD master) \
  -x 'sed -i "s/CChainState/Chainstate/g" $(git ls-files | grep -E ".*\.(py|cpp|h)$") && git commit --amend --no-edit'
# <commit changed?>
git add -u && git rebase --continue

Anyway I'm not sure how serious I am about this, but I figured it was worth proposing. I have decided I am very serious about this.

Maybe we can have nice things every once in a while?

ACKs for top commit:
  MarcoFalke: cr ACK 00eeb31c7660e2c28f189f77a6905dee946ef408
  hebasto: ACK 00eeb31c7660e2c28f189f77a6905dee946ef408
  glozow: ACK 00eeb31c7660e2c28f189f77a6905dee946ef408, thanks for being the one to propose this
  w0xlt: ACK https://github.com/bitcoin/bitcoin/pull/24513/commits/00eeb31c7660e2c28f189f77a6905dee946ef408

Tree-SHA512: b828a99780614a9b74f7a9c347ce0687de6f8d75232840f5ffc26e02bbb25a3b1f5f9deabbe44f82ada01459586ee8452a3ee2da05d1b3c48558c8df6f49e1b1


Thursday 2022-09-15 15:54:00 by SkyratBot

[MIRROR] Dimensional Anomaly [MDB IGNORE] (#15974)

  • Dimensional Anomaly (#69512)

About The Pull Request

Everyone has been asking: "When will there be an anomaly like the bioscrambler, but for the space station? Please, we need more things which replace objects with different objects from the same typepath." Well I made it and it looked like ass because non-tiling floor and walls look terrible, so then I made this instead. Dimensional.mp4

The "dimensional anomaly" shifts matter into a parallel dimension where objects are made out of something else. Like the Bioscrambler anomaly, it does not expire on its own and only leaves when someone signals it or uses an anomaly remover. When it spawns it picks a "theme" and converts terrain around it until it covers a 7x7 square, then it teleports somewhere else and picks a new theme.

A lot of these themes are relatively benign like "meat", "fancy carpet", or "gold". Some of them are kind of annoying like "icebox" because it creates floor which slows you down, or "clown" because bananium is intentionally annoying. Some of them are actively dangerous, mostly "uranium" and "plasma". The main problem this will usually cause for crewmembers is decreasing area security. When it replaces doors it replaces them with ones which don't have any access control, and it will also replace RWalls with normal and much more vulnerable walls which will make breaking and entering significantly easier until someone has taken the time to fix the damage. But also sometimes it will irradiate them, you never know.

The fact that sometimes the changes are benign (or provide uncommon materials) and might be happening in places you don't care about access to might encourage people to push their luck and leave it alone until it starts turning the captain's office into a bamboo room or repainting medbay a fetching shade of flammable purple, which I would consider a success. Armour.mp4

If you successfully harvest the anomaly core you can place it into the reactive armour to get Reactive Barricade Armour, which shifts your dimension when you take damage and attempts to place some randomised (not terribly durable) objects between you and hopefully your attacker (it really just picks up to four random unoccupied tiles next to you). If you're EMPed then the changes it makes to the environment will often be as unpleasant for you as they are for a pursuer, and significantly more likely to harm both of you rather than just provide obstacles.

Other changes: I split anomalies out into their own dmi file, which seems to be all the rage lately. I moved the anomaly placing code into a datum instead of the event because I wanted to reuse it, but if you have a better idea about where I could have put it, let me know. This also fixes a bug where the material spreader component wasn't working when I applied plasma materials to something: the extra whitespace was parsing as another argument for some reason and meant it would runtime. Supermatter delamination was still pointing to Delimber anomalies instead of Bioscrambler.

  • Dimensional Anomaly

  • Fixes the upstream merge skew

Co-authored-by: Jacquerel [email protected] Co-authored-by: GoldenAlpharex [email protected] Co-authored-by: GoldenAlpharex [email protected]


Thursday 2022-09-15 15:56:20 by SkyratBot

[MIRROR] Resolves is_banned_from headaches and lag (Speeds up roundstart significantly) [MDB IGNORE] (#16001)

  • Resolves is_banned_from headaches and lag (Speeds up roundstart significantly) (#69376)

About The Pull Request

Just to be clear, when I refer to time here, I am not talking about CPU time; I'm talking about real time. This doesn't significantly reduce the amount of work we do, it just removes a lot of the waiting around we need to do for db calls to finish.

Adds queuing support to SQL bans, so if an ongoing ban retrieval query is active, any successive ban retrieval attempts will wait for the active query to finish.

This uses the number/blocking_query_timeout config option; I hope it's still valid.

This system will allow us to precache ban info, in parallel (or in batches). With this, we can avoid needing to set up all uses of is_banned_from to support parallelization, or eat the cost of in-series database requests.

Clients who join after initialize will now build a ban cache automatically

Those who join before init is done will be gathered by a batch query sent by a new subsystem, SSban_cache.

This means that any post-initialize uses of is_banned_from are worst case by NATURE parallel (since the request is already sent, and we're just waiting for the response).
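The queuing and precache pattern is generic; here is a conceptual sketch in Python/asyncio rather than DM, just to show its shape (all names are invented; the real implementation is the ban cache and SSban_cache code described above):

import asyncio


class BanCache:
    """Coalesce concurrent ban lookups and allow firing them ahead of time."""

    def __init__(self, fetch_bans):
        self._fetch_bans = fetch_bans  # async callable: ckey -> ban data
        self._pending = {}             # ckey -> in-flight asyncio.Task
        self._cache = {}               # ckey -> resolved ban data

    def prefetch(self, ckeys):
        """Fire queries early (the SSban_cache-style batch for pre-init joins)."""
        for ckey in ckeys:
            if ckey not in self._cache and ckey not in self._pending:
                self._pending[ckey] = asyncio.create_task(self._fetch_bans(ckey))

    async def get(self, ckey):
        """Later is_banned_from-style checks just await the existing request."""
        if ckey in self._cache:
            return self._cache[ckey]
        task = self._pending.get(ckey)
        if task is None:  # nothing in flight yet: send the query now
            task = asyncio.create_task(self._fetch_bans(ckey))
            self._pending[ckey] = task
        result = await task
        self._cache[ckey] = result
        self._pending.pop(ckey, None)
        return result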

This saves a lot of headache for implementers (users) of the proc, and saves ~0.9 seconds from roundstart setup for each client (on /tg/station).

There's a lot of in series is_banned_from calls in there, and this nukes them. This should bring down roundstart join times significantly.

It's hard to say exactly how much, since some cases generate the ban cache at other times. At base though, we save about 0.9 seconds of real time per client by doing this stuff in parallel.

Why It's Good For The Game

When I use percentages I'm speaking about cost per player

I don't like how slow roundstart feels, and this kills about 66% of that. The rest is a lot of misc things. About 11% (it's actually 16%) is general mob placing, which is hard to optimize. 22% is manifest generation, most of which is GetFlatIcons, which REALLY does not need to be holding up the main thread of execution.

An additional 1 second is constant cost from a db query we make to tell the server we exist, which can be made async to avoid holding the proc chain.

That's it. I'm bullying someone into working on the manifest issue, so that should just leave the 16% of mob placing, which is really not that bad compared to what we have now.

Changelog

cl
code: The time between the round starting and the game like, actually starting has been reduced by 66%
refactor: I've slightly changed how ban caches are generated, admins please let me know if anything goes fuckey
server: I'm using the blocking_query_timeout config. Make sure it's up to date and all.
/cl

  • Resolves is_banned_from headaches and lag (Speeds up roundstart significantly)

Co-authored-by: LemonInTheDark [email protected]


Thursday 2022-09-15 16:12:57 by Trevor Scheer

Add Jest imports everywhere, remove global availability (#6911)

In ESM mode, the jest globals are technically no longer available. For some reason, we're still getting away with it, but this issue surfaces in the v29 upgrade PR so I'm going to address it separately.

https://jestjs.io/docs/ecmascript-modules#differences-between-esm-and-commonjs

Ref: apollographql/apollo-server#6850 (comment)


Thursday 2022-09-15 16:45:27 by san7890

Rocking The Boat, er, Map Vote (#69561)

  • Rocking The Boat, er, Map Vote

Hey there,

A while ago, I spooke (typo intentional) to some other people. One frustration I heard was the fact that people would sometimes sneak through map votes during the very start of a shift, during a high-paced portion, or just as a meme. People in OOC would then flood the vote, putting in any given station. However, if a vote happens 10 minutes in, and the round goes for 70 minutes and not many of the original players are around, then it's not particularly fair to those who have to play the next shift on a map they bemoan.

So, we can rock the vote! If a player isn't particularly chuffed with the hand they are given, they can poll the players to see if they want to change the map as well. If rocking the vote goes through, huzzah, you get the ability to vote for the map again. If it doesn't go through: tough luck. You can rock the vote one time per shift by default, and server operators can change the number of times you can call to rock the map vote at their discretion. Calling to rock the vote, whether successful or not, counts as a "call", and when that limit is exceeded: no more calls.

Does this mean that we will only rotate between two maps because pissants will keep rocking the vote until they get what they like? Maybe? I still see people bemoan getting Tram or shit the bed over IceBox, but I think enough people get sick of bread-on-butter to take the server where it needs to go. If operators don't really like seeing only two maps play, they can always adjust the config to ensure it doesn't happen.

  • makes the grammar grammar

it would be "Rock the Vote vote" otherwise


Thursday 2022-09-15 18:14:19 by Joey Huab

core: Refactor Pixel features

  • Magic Eraser is wonky and hard to enable, and all this mess isn't really worth the trouble, so just stick to the older setup.

  • Default Pixel 5 spoof for Photos and only switch to Pixel XL when spoof is toggled.

  • We will try to bypass 2021 features and Raven props for non-Pixel 2021 devices, as app usage requires the TPU.

  • Remove P21 experience system feature check


Thursday 2022-09-15 19:16:19 by Linus Torvalds

sparse: introduce conditional lock acquire function attribute

The kernel tends to try to avoid conditional locking semantics because it makes it harder to think about and statically check locking rules, but we do have a few fundamental locking primitives that take locks conditionally - most obviously the 'trylock' functions.

That has always been a problem for 'sparse' checking for locking imbalance, and we've had a special '__cond_lock()' macro that we've used to let sparse know how the locking works:

# define __cond_lock(x,c)        ((c) ? ({ __acquire(x); 1; }) : 0)

so that you can then use this to tell sparse that (for example) the spinlock trylock macro ends up acquiring the lock when it succeeds, but not when it fails:

#define raw_spin_trylock(lock)  __cond_lock(lock, _raw_spin_trylock(lock))

and then sparse can follow along the locking rules when you have code like

    if (!spin_trylock(&dentry->d_lock))
            return LRU_SKIP;
.. sparse sees that the lock is held here..
    spin_unlock(&dentry->d_lock);

and sparse ends up happy about the lock contexts.

However, this '__cond_lock()' use does result in very ugly header files, and requires you to basically wrap the real function with that macro that uses '__cond_lock'. Which has made PeterZ NAK things that try to fix sparse warnings over the years [1].

To solve this, there is now a very experimental patch to sparse that basically does the exact same thing as '__cond_lock()' did, but using a function attribute instead. That seems to make PeterZ happy [2].

Note that this does not replace existing use of '__cond_lock()', but only exposes the new proposed attribute and uses it for the previously unannotated 'refcount_dec_and_lock()' family of functions.

For existing sparse installations, this will make no difference (a negative output context was ignored), but if you have the experimental sparse patch it will make sparse now understand code that uses those functions, the same way '__cond_lock()' makes sparse understand the very similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()' annotations.

Note that in some cases this will silence existing context imbalance warnings. But in other cases it may end up exposing new sparse warnings for code that sparse just didn't see the locking for at all before.

This is a trial, in other words. I'd expect that if it ends up being successful, and new sparse releases end up having this new attribute, we'll migrate the old-style '__cond_lock()' users to use the new-style '__cond_acquires' function attribute.
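
To make the shape of this concrete, here is a minimal sketch of what the annotation side could look like, reconstructed only from the description above: the exact macro spelling in the kernel headers may differ, and the stand-in typedefs exist purely so the fragment compiles outside the kernel tree.

```c
#include <stdbool.h>

/* Stand-ins so this sketch compiles outside the kernel tree. */
typedef struct { int counter; } refcount_t;
typedef struct { int locked; } spinlock_t;

/*
 * Under sparse (__CHECKER__), a negative output context expresses
 * "acquires the lock only when returning true"; older sparse releases
 * ignore it, and normal compilers see an empty macro. The exact
 * definition in compiler_types.h may differ from this sketch.
 */
#ifdef __CHECKER__
# define __cond_acquires(x) __attribute__((context(x, 0, -1)))
#else
# define __cond_acquires(x)
#endif

/* An annotated declaration, in the style of the refcount_dec_and_lock() family: */
extern bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
	__cond_acquires(lock);
```

The payoff is that a caller written as "if (refcount_dec_and_lock(&r, &lock)) { ...; spin_unlock(&lock); }" balances its lock context in sparse's eyes without any '__cond_lock()' wrapper appearing in a header.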

The actual experimental sparse patch was posted in [3].

Link: https://lore.kernel.org/all/[email protected]/ [1]
Link: https://lore.kernel.org/all/[email protected]/ [2]
Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3]
Acked-by: Peter Zijlstra [email protected]
Cc: Alexander Aring [email protected]
Cc: Luc Van Oostenryck [email protected]
Signed-off-by: Linus Torvalds [email protected]


Thursday 2022-09-15 19:17:34 by SkyratBot

[MIRROR] Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff [MDB IGNORE] (#16000)

  • Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff (#69158)

About The Pull Request

Title! The CO2 thing is there because it makes my job much easier. Can probably find a way to make it ramp up slowly if a maintainer insists on it. Prefer not to, though.

Drafting because I want to make a second PR that has more sweeping changes (clean vars up, make a simpler formula for damage and heat production, delete underused behaviors, etc). Would honestly prefer if both this and that get merged at the same time, but I'm separating it out since it might be rejected. Or maybe I'll combine it here, we'll see. Ignore that; looks like I can keep this one absolutely atomic.

Why It's Good For The Game

Had a lot of trouble when trying to document the SM gas interactions on the wiki; the interactions are all scattered, and tracking down everything a gas does is extremely annoying. Hopefully this fixes that.

Changelog

cl
balance: CO2 powerloss inhibition now works immediately based on gas composition instead of slowly ramping up.
refactor: refactored how the SM fetches its gas info and data. No changes expected except for the CO2 thing.
/cl

  • Refactors SM gas behavior to be datum based instead of list based + powerloss co2 buff

Co-authored-by: vincentiusvin [email protected]


Thursday 2022-09-15 20:48:54 by LordPapalus

Cargonia, but with less weapons (#4370)

  • Cargonia is no more

Fuck you

  • PLEASE

  • Zandario, I will suck your PP


Thursday 2022-09-15 22:00:59 by Conga Lyne

Master of Uno Nerf

--Removed Polymorphine from Master of Transmutation material changing
--Removed Master of Transmutation from Fungicaves pool
++Added Master of Transmutation to Lukki Lair pool.

I believe my original idea here was to have him in a tight space to show off his powers better, but I forgot to consider power scaling: he was a bit too strong for the area relative to everything else around him. Additionally, the polymorph liquid transmutation was cool, but in my opinion ultimately too dangerous unless I want to move the creature up to heaven/hell. Heaven is where poly-liquid spam is everywhere in the base game, and I want Uno to appear elsewhere.


Thursday 2022-09-15 22:03:22 by M2rsh

Fuck you AutisticMOFO I'm staying with my cums and balls


Thursday 2022-09-15 22:46:45 by Jason A. Donenfeld

random: implement getrandom() in vDSO

Changes v2->v3:

Big changes:

Thomas' previous objection was two-fold: 1) vgetrandom should really have the same function signature as getrandom, in addition to all of the same behavior, and 2) having vgetrandom_alloc be a vDSO function doesn't make sense, because it doesn't actually need anything from the VDSO data page and it doesn't correspond to an existing syscall.

After a discussion at Plumbers this last week, we devised the following ways to fix these: 1) we make the opaque state argument the last argument of vgetrandom, since the real syscall ignores it, so that all the registers are the same and no behavior changes, and 2) we make vgetrandom_alloc a syscall, rather than a vDSO function, which also gives it added flexibility for the future, which is good.

Making those changes also reduced the size of this patch a bit.

Smaller changes:

  • Properly add buffer offset position.
  • Don't EXPORT_SYMBOL for vDSO code.
  • Account for timens and vvar being in swapped pages.

Two statements:

  1. Userspace wants faster cryptographically secure random numbers of arbitrary size, big or small.

  2. Userspace is currently unable to safely roll its own RNG with the same security profile as getrandom().

Statement (1) has been debated for years, with arguments ranging from "we need faster cryptographically secure card shuffling!" to "the only things that actually need good randomness are keys, which are few and far between" to "actually, TLS CBC nonces are frequent" and so on. I don't intend to wade into that debate substantially, except to note that recently glibc added arc4random(), whose goal is to return a cryptographically secure uint32_t, and there are real user reports of it being too slow. So here we are.

Statement (2) is more interesting. The kernel is the nexus of all entropic inputs that influence the RNG. It is in the best position, and probably the only position, to decide anything at all about the current state of the RNG and of its entropy. One of the things it uniquely knows about is when reseeding is necessary.

For example, when a virtual machine is forked, restored, or duplicated, it's imperative that the RNG doesn't generate the same outputs. For this reason, there's a small protocol between hypervisors and the kernel that indicates this has happened, alongside some ID, which the RNG uses to immediately reseed, so as not to return the same numbers. Were userspace to expand a getrandom() seed from time T1 for the next hour, and at some point T2 within that hour the virtual machine forked, userspace would continue to provide the same numbers to two (or more) different virtual machines, resulting in potential cryptographic catastrophe. Something similar happens on resuming from hibernation (or even suspend), with various compromise scenarios there in mind.

There's a more general reason why userspace rolling its own RNG from a getrandom() seed is fraught. There's a lot of attention paid to this particular Linuxism we have of the RNG being initialized and thus non-blocking or uninitialized and thus blocking until it is initialized. These are our Two Big States that many hold to be the holy differentiating factor between safe and not safe, between cryptographically secure and garbage. The fact is, however, that the distinction between these two states is a hand-wavy wishy-washy inexact approximation. Outside of a few exceptional cases (e.g. a HW RNG is available), we actually don't really ever know with any rigor at all when the RNG is safe and ready (nor when it's compromised). We do the best we can to "estimate" it, but entropy estimation is fundamentally impossible in the general case. So really, we're just doing guess work, and hoping it's good and conservative enough. Let's then assume that there's always some potential error involved in this differentiator.

In fact, under the surface, the RNG is engineered around a different principle, and that is trying to use new entropic inputs regularly and at the right specific moments in time. For example, close to boot time, the RNG reseeds itself more often than later. At certain events, like VM fork, the RNG reseeds itself immediately. The various heuristics for when the RNG will use new entropy and how often are really a core aspect of what the RNG has some potential to do decently enough (and something that will probably continue to improve in the future from random.c's present set of algorithms). So in your mind, put away the mental attachment to the Two Big States, which represent an approximation with a potential margin of error. Instead keep in mind that the RNG's primary operating heuristic is how often and exactly when it's going to reseed.

So, if userspace takes a seed from getrandom() at point T1, and uses it for the next hour (or N megabytes or some other meaningless metric), during that time, potential errors in the Two Big States approximation are amplified. During that time potential reseeds are being lost, forgotten, not reflected in the output stream. That's not good.

The simplest statement you could make is that userspace RNGs that expand a getrandom() seed at some point T1 are nearly always worse, in some way, than just calling getrandom() every time a random number is desired.

For those reasons, after some discussion on libc-alpha, glibc's arc4random() now just calls getrandom() on each invocation. That's trivially safe, and gives us latitude to then make the safe thing faster, at our leisure, without becoming unsafe. Card shuffling isn't particularly fast, however.

How do we rectify this? By putting a safe implementation of getrandom() in the vDSO, which has access to whatever information a particular iteration of random.c is using to make its decisions. I use that careful language of "particular iteration of random.c", because the set of things that a vDSO getrandom() implementation might need for making decisions as good as the kernel's will likely change over time. This isn't just a matter of exporting certain data to userspace. We're not going to commit to a "data API" where the various heuristics used are exposed, locking in how the kernel works for decades to come, and then leave it to various userspaces to roll something on top and shoot themselves in the foot and have all sorts of complexity disasters. Rather, vDSO getrandom() is supposed to be the same exact algorithm that runs in the kernel, except it's been hoisted into userspace as much as possible. And so vDSO getrandom() and kernel getrandom() will always mirror each other hermetically.

API-wise, the vDSO gains one function:

ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state);

The return value and the first 3 arguments are the same as ordinary getrandom(), while the last argument is a pointer to some state allocated with vgetrandom_alloc(), explained below. Were all four arguments passed to the getrandom syscall, nothing different would happen, and the functions would have the exact same behavior.
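
As a rough illustration of that calling convention (not code from the patch), a libc-side wrapper might look like the sketch below. The function pointer, the thread-local state, and the fallback are all assumptions of the sketch: resolving the vDSO symbol is omitted entirely, and when the fast path isn't available we just use the ordinary getrandom(2) syscall, which is also what the vDSO code itself does before the RNG is initialized.

```c
#include <stddef.h>
#include <sys/random.h>   /* getrandom(2), the fallback path */
#include <sys/types.h>

/*
 * Hypothetical sketch only: "vgetrandom" stands for the vDSO entry
 * point proposed here, resolved elsewhere (e.g. via the auxiliary
 * vector); it stays NULL in this fragment.
 */
static ssize_t (*vgetrandom)(void *buffer, size_t len, unsigned int flags,
                             void *opaque_state);

/* Per-thread opaque state, handed out by vgetrandom_alloc() (described next). */
static __thread void *rng_state;

static ssize_t fill_random(void *buf, size_t len)
{
	if (vgetrandom && rng_state)
		return vgetrandom(buf, len, 0, rng_state);   /* vDSO fast path */
	return getrandom(buf, len, 0);                       /* plain syscall */
}
```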

Then, we introduce a new syscall:

void *vgetrandom_alloc([inout] size_t *num, [out] size_t *size_per_each, unsigned int flags);

This takes the desired number of opaque states in num, and returns a pointer to an array of opaque states, the number actually allocated back in num, and the size in bytes of each one in size_per_each, enabling a libc to slice up the returned array into a state per each thread. (The flags argument is always zero for now.) We very intentionally do not leave state allocation up to the caller of vgetrandom, but provide vgetrandom_alloc for that allocation. There are too many weird things that can go wrong, and it's important that vDSO does not provide too generic of a mechanism. It's not going to store its state in just any old memory address. It'll do it only in ones it allocates.

Right now this means it's a mlock'd page with WIPEONFORK set. In the future maybe there will be other interesting page flags or anti-heartbleed measures, or other platform-specific kernel-specific things that can be set from the syscall. Again, it's important that the kernel has a say in how this works rather than agreeing to operate on any old address; memory isn't neutral. The syscall currently accomplishes this with a call to vm_mmap() and then a call to do_madvise(). It'd be nice to do this all at once, but I'm not sure that a helper function exists for that now, and it seems a bit premature to add one, at least for now.
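
Purely to make the slicing arithmetic concrete, here is a self-contained sketch of how a libc might carve the returned array into per-thread states; the malloc() stands in for the real allocation (no syscall number existed yet at this point), and the num/size_per_each values are placeholders, not anything the kernel actually reports.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hand thread i its private slice of the opaque-state array. */
static void *state_for_thread(void *base, size_t size_per_each, size_t i)
{
	return (uint8_t *)base + i * size_per_each;
}

int main(void)
{
	size_t num = 4;              /* in: how many states we asked for       */
	size_t size_per_each = 256;  /* out: per-state size; placeholder value */

	/*
	 * malloc() stands in for vgetrandom_alloc() so the sketch runs
	 * anywhere; the real allocation is an mlock'd, WIPEONFORK region
	 * handed back by the kernel, with num possibly reduced.
	 */
	void *states = malloc(num * size_per_each);
	if (!states)
		return 1;

	for (size_t i = 0; i < num; i++)
		printf("thread %zu -> state at %p\n",
		       i, state_for_thread(states, size_per_each, i));

	free(states);
	return 0;
}
```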

The interesting meat of the implementation is in lib/vdso/getrandom.c, as generic C code, and it aims to mainly follow random.c's buffered fast key erasure logic. Before the RNG is initialized, it falls back to the syscall. Right now it uses a simple generation counter and a timestamp to make its decisions on reseeding; this covers many cases, but not all, so this RFC still has a little bit of improvement work to do. But it should give you the general idea.
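
The reseed decision that paragraph describes can be pictured with a sketch like the one below. Everything in it is hypothetical and deliberately simplified: the struct, the field names, and the five-minute staleness cutoff are placeholders, not the actual lib/vdso/getrandom.c logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread bookkeeping; names and fields are placeholders. */
struct local_rng_state {
	uint64_t seen_generation;  /* kernel RNG generation we last expanded */
	uint64_t last_refill_ns;   /* when we last refilled our local buffer */
};

static bool need_reseed(const struct local_rng_state *s,
			uint64_t rng_generation, uint64_t now_ns)
{
	/*
	 * The kernel reseeded (early-boot catch-up, VM fork, resume, ...)
	 * since we last pulled fresh key material, so anything buffered
	 * locally is stale and must not be used.
	 */
	if (s->seen_generation != rng_generation)
		return true;

	/*
	 * Otherwise refill once the buffered key is simply old; the
	 * five-minute cutoff is an arbitrary placeholder, not the real
	 * heuristic.
	 */
	return now_ns - s->last_refill_ns > 300ULL * 1000000000ULL;
}
```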

The actual place that has the most work to do is in all of the other files. Most of the vDSO shared page infrastructure is centered around gettimeofday, and so the main structs are all in arrays for different timestamp types, and attached to time namespaces, and so forth. I've done the best I could to add onto this in an unintrusive way, but you'll notice almost immediately from glancing at the code that it still needs some untangling work. This also only works on x86 at the moment. I could certainly use a hand with this part.

So far in my test results, performance is pretty stellar (around 15x for uint32_t generation), and it seems to be working. There are a couple TODO bits with the actual random.c lifetime integration, but as an early patchset, this at least introduces the intended interface.

Cc: [email protected]
Cc: [email protected]
Cc: Thomas Gleixner [email protected]
Cc: Adhemerval Zanella Netto [email protected]
Cc: Carlos O'Donell [email protected]
Signed-off-by: Jason A. Donenfeld [email protected]

