Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(cli): OOM with deeply nested stacks #33064

Open
1 task
rehanvdm opened this issue Jan 22, 2025 · 3 comments
Open
1 task

(cli): OOM with deeply nested stacks #33064

rehanvdm opened this issue Jan 22, 2025 · 3 comments
Labels
bug This issue is a bug. p2 package/tools Related to AWS CDK Tools or CLI

Comments

@rehanvdm
Copy link

Describe the bug

When stacks depend on each other in a long chain or if there are many stacks then the CLI slows down. In certain conditions, it will throw an Out Out Memory (OOM) error.

This is better explained with an example:

  • Each Wave has Stacks in it.
  • Each Stack in a Wave will depend on all the stacks in the previous waves.
  • This is so that the stacks within a wave can be deployed in parallel (the --concurrency flag) and that the next Wave only starts after all the stacks in the previous Wave have been completed.

This is how it looks visually, 3 Waves, each with 2 stacks.

flowchart TB
    subgraph Wave1 [Wave 1]
        Wave1Stack1[Stack 1]
        Wave1Stack2[Stack 2]
    end
    subgraph Wave2 [Wave 2]
        Wave2Stack1[Stack 1]
        Wave2Stack2[Stack 2]
    end
    subgraph Wave3 [Wave 3]
        Wave3Stack1[Stack 1]
        Wave3Stack2[Stack 2]
    end

    Wave2Stack1 --> Wave1Stack1
    Wave2Stack1 --> Wave1Stack2
 
    Wave2Stack2 --> Wave1Stack1
    Wave2Stack2 --> Wave1Stack2

    Wave3Stack1 --> Wave2Stack1
    Wave3Stack1 --> Wave2Stack2

    Wave3Stack2 --> Wave2Stack1
    Wave3Stack2 --> Wave2Stack2
Loading

Regression Issue

  • Select this option if this issue appears to be a regression.

Last Known Working CDK Version

No response

Expected Behavior

The CDK should not be limited by the number of "nesting" of stack dependencies

Current Behavior

It slows to a crawl and throws OOM for anything over 10 Waves with 5 stacks in them for me on a Mac M2 Pro, 16 GB, without touching NODE_OPTIONS.

Reproduction Steps

I have prepared this repo: https://github.com/rehanvdm/cdk-issue-slow-stack-dependencies

But the code is as simple as:

const app = new App();
const numberOfWaves = 9;
const numberOfStacks = 10;

function createWaveStacks(app: App, waveNumber: number, numberOfStacks: number,
                          previousWaveStacks: Stack[] = []): Stack[] {
  const waveStacks: Stack[] = [];
  for (let i = 0; i < numberOfStacks; i++) {
    const stack = new Stack(app, `CdkWave${waveNumber}-${i}`);
    waveStacks.push(stack);

    for (const previousStack of previousWaveStacks) {
      stack.addDependency(previousStack);
      console.log(`${new Date().toLocaleTimeString()} - Adding dependency from stack ${previousStack.node.id} to stack ${stack.node.id}`);
    }
  }
  return waveStacks;
}

let previousWaveStacks: Stack[] = [];
for (let waveNumber = 1; waveNumber <= numberOfWaves; waveNumber++) {
  previousWaveStacks = createWaveStacks(app, waveNumber, numberOfStacks, previousWaveStacks);
}

app.synth();

The following command is ran to just do the synth, the same thing happens on all CDK commands:

cdk ls --debug --v --vv

Possible Solution

No response

Additional Information/Context

I will show reduced logs of two scenarios:

  1. Waves = 10, Stacks = 5. Where the console.log after the .addDependency is fast, but then after the last one there are multiple seconds that pass before the next debug output and the command completes.
  2. Waves = 9, Stacks = 10. Where the console.log after the .addDependency visibly slows down, where seconds pass between these calls. These also finish printing, then multiple seconds pass before an OOM error is thrown.

Scenario 1 Logs

Notice 18 seconds since the last console.log and next CDK CLI output.

rehanvandermerwe@Mac cdk-issue-slow-stack-dependencies % cdk ls --debug --v --vv
[16:05:44] CDK toolkit version: 2.176.0 (build 899965d)
[16:05:44] Command line arguments: {
  _: [ 'ls' ],
  debug: true,
...
[16:05:44] env: {
  CDK_DEFAULT_REGION: '1',
  CDK_DEBUG: 'true',
  CDK_OUTDIR: 'cdk.out',
  CDK_CLI_ASM_VERSION: '39.0.0',
  CDK_CLI_VERSION: '2.176.0'
}
4:05:47 PM - Adding dependency from stack CdkWave1-0 to stack CdkWave2-0
4:05:47 PM - Adding dependency from stack CdkWave1-1 to stack CdkWave2-0
4:05:47 PM - Adding dependency from stack CdkWave1-2 to stack CdkWave2-0
4:05:47 PM - Adding dependency from stack CdkWave1-3 to stack CdkWave2-0
4:05:47 PM - Adding dependency from stack CdkWave1-4 to stack CdkWave2-0
4:05:47 PM - Adding dependency from stack CdkWave1-0 to stack CdkWave2-1
4:05:47 PM - Adding dependency from stack CdkWave1-1 to stack CdkWave2-1
4:05:47 PM - Adding dependency from stack CdkWave1-2 to stack CdkWave2-1
4:05:47 PM - Adding dependency from stack CdkWave1-3 to stack CdkWave2-1
4:05:47 PM - Adding dependency from stack CdkWave1-4 to stack CdkWave2-1
4:05:47 PM - Adding dependency from stack CdkWave1-0 to stack CdkWave2-2
4:05:47 PM - Adding dependency from stack CdkWave1-1 to stack CdkWave2-2
4:05:47 PM - Adding dependency from stack CdkWave1-2 to stack CdkWave2-2
...
4:05:48 PM - Adding dependency from stack CdkWave9-2 to stack CdkWave10-3
4:05:48 PM - Adding dependency from stack CdkWave9-3 to stack CdkWave10-3
4:05:48 PM - Adding dependency from stack CdkWave9-4 to stack CdkWave10-3
4:05:48 PM - Adding dependency from stack CdkWave9-0 to stack CdkWave10-4
4:05:48 PM - Adding dependency from stack CdkWave9-1 to stack CdkWave10-4
4:05:48 PM - Adding dependency from stack CdkWave9-2 to stack CdkWave10-4
4:05:48 PM - Adding dependency from stack CdkWave9-3 to stack CdkWave10-4
4:05:48 PM - Adding dependency from stack CdkWave9-4 to stack CdkWave10-4
CdkWave1-0
CdkWave1-1
CdkWave1-2
CdkWave1-3
CdkWave1-4
CdkWave2-0
...
CdkWave9-4
CdkWave10-0
CdkWave10-1
CdkWave10-2
CdkWave10-3
CdkWave10-4
[16:06:06] Reading cached notices from /Users/rehanvandermerwe/.cdk/cache/notices.json

Scenario 2 Logs

rehanvandermerwe@Mac cdk-issue-slow-stack-dependencies % cdk ls --debug --v --vv
[16:07:34] CDK toolkit version: 2.176.0 (build 899965d)
[16:07:34] Command line arguments: {
  _: [ 'ls' ],
  debug: true,
...
[16:07:34] env: {
  CDK_DEFAULT_REGION: '1',
  CDK_DEBUG: 'true',
  CDK_OUTDIR: 'cdk.out',
  CDK_CLI_ASM_VERSION: '39.0.0',
  CDK_CLI_VERSION: '2.176.0'
}
4:07:38 PM - Adding dependency from stack CdkWave1-0 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-1 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-2 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-3 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-4 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-5 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-6 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-7 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-8 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-9 to stack CdkWave2-0
4:07:38 PM - Adding dependency from stack CdkWave1-0 to stack CdkWave2-1
4:07:38 PM - Adding dependency from stack CdkWave1-1 to stack CdkWave2-1
4:07:38 PM - Adding dependency from stack CdkWave1-2 to stack CdkWave2-1
...
4:07:38 PM - Adding dependency from stack CdkWave7-5 to stack CdkWave8-0
4:07:38 PM - Adding dependency from stack CdkWave7-6 to stack CdkWave8-0
4:07:38 PM - Adding dependency from stack CdkWave7-7 to stack CdkWave8-0
4:07:38 PM - Adding dependency from stack CdkWave7-8 to stack CdkWave8-0
4:07:38 PM - Adding dependency from stack CdkWave7-9 to stack CdkWave8-0
4:07:39 PM - Adding dependency from stack CdkWave7-0 to stack CdkWave8-1
4:07:39 PM - Adding dependency from stack CdkWave7-1 to stack CdkWave8-1
4:07:39 PM - Adding dependency from stack CdkWave7-2 to stack CdkWave8-1
4:07:39 PM - Adding dependency from stack CdkWave7-3 to stack CdkWave8-1
...
4:07:39 PM - Adding dependency from stack CdkWave7-6 to stack CdkWave8-4
4:07:39 PM - Adding dependency from stack CdkWave7-7 to stack CdkWave8-4
4:07:40 PM - Adding dependency from stack CdkWave7-8 to stack CdkWave8-4
...
4:07:40 PM - Adding dependency from stack CdkWave7-6 to stack CdkWave8-8
4:07:40 PM - Adding dependency from stack CdkWave7-7 to stack CdkWave8-8
4:07:40 PM - Adding dependency from stack CdkWave7-8 to stack CdkWave8-8
4:07:41 PM - Adding dependency from stack CdkWave7-9 to stack CdkWave8-8
4:07:41 PM - Adding dependency from stack CdkWave7-0 to stack CdkWave8-9
...
4:08:00 PM - Adding dependency from stack CdkWave8-6 to stack CdkWave9-7
4:08:00 PM - Adding dependency from stack CdkWave8-7 to stack CdkWave9-7
4:08:00 PM - Adding dependency from stack CdkWave8-8 to stack CdkWave9-7
4:08:01 PM - Adding dependency from stack CdkWave8-9 to stack CdkWave9-7
4:08:01 PM - Adding dependency from stack CdkWave8-0 to stack CdkWave9-8
4:08:01 PM - Adding dependency from stack CdkWave8-1 to stack CdkWave9-8
4:08:01 PM - Adding dependency from stack CdkWave8-2 to stack CdkWave9-8
4:08:02 PM - Adding dependency from stack CdkWave8-3 to stack CdkWave9-8
4:08:02 PM - Adding dependency from stack CdkWave8-4 to stack CdkWave9-8
4:08:02 PM - Adding dependency from stack CdkWave8-5 to stack CdkWave9-8
4:08:02 PM - Adding dependency from stack CdkWave8-6 to stack CdkWave9-8
4:08:03 PM - Adding dependency from stack CdkWave8-7 to stack CdkWave9-8
4:08:03 PM - Adding dependency from stack CdkWave8-8 to stack CdkWave9-8
4:08:03 PM - Adding dependency from stack CdkWave8-9 to stack CdkWave9-8
4:08:03 PM - Adding dependency from stack CdkWave8-0 to stack CdkWave9-9
4:08:04 PM - Adding dependency from stack CdkWave8-1 to stack CdkWave9-9
4:08:04 PM - Adding dependency from stack CdkWave8-2 to stack CdkWave9-9
4:08:04 PM - Adding dependency from stack CdkWave8-3 to stack CdkWave9-9
4:08:04 PM - Adding dependency from stack CdkWave8-4 to stack CdkWave9-9
4:08:04 PM - Adding dependency from stack CdkWave8-5 to stack CdkWave9-9
4:08:05 PM - Adding dependency from stack CdkWave8-6 to stack CdkWave9-9
4:08:05 PM - Adding dependency from stack CdkWave8-7 to stack CdkWave9-9
4:08:05 PM - Adding dependency from stack CdkWave8-8 to stack CdkWave9-9
4:08:05 PM - Adding dependency from stack CdkWave8-9 to stack CdkWave9-9

<--- Last few GCs --->

[15014:0x120008000]   187748 ms: Mark-Compact 4057.8 (4136.6) -> 4049.9 (4136.8) MB, 3360.92 / 0.00 ms  (average mu = 0.130, current mu = 0.003) allocation failure; scavenge might not succeed
[15014:0x120008000]   190519 ms: Mark-Compact 4058.0 (4136.8) -> 4050.1 (4137.1) MB, 2761.79 / 0.00 ms  (average mu = 0.075, current mu = 0.003) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x101080c3c node::OOMErrorHandler(char const*, v8::OOMDetails const&) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 2: 0x101208774 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 3: 0x1013dce48 v8::internal::Heap::GarbageCollectionReasonToString(v8::internal::GarbageCollectionReason) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 4: 0x1013e0cfc v8::internal::Heap::CollectGarbageShared(v8::internal::LocalHeap*, v8::internal::GarbageCollectionReason) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 5: 0x1013dd760 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 6: 0x1013db4e8 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 7: 0x1013d213c v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 8: 0x1013d299c v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
 9: 0x1013b7a0c v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
10: 0x10179f71c v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
11: 0x101afcc44 Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
12: 0x101a768a8 Builtins_GrowFastSmiOrObjectElements [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
13: 0x10705dfbc 
14: 0x10705ffc0 
15: 0x10705ffc0 
16: 0x10705ffc0 
17: 0x10705ffc0 
18: 0x10705ffc0 
19: 0x107064944 
20: 0x101a743e4 Builtins_InterpreterEntryTrampoline [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
21: 0x101aab210 Builtins_AsyncFunctionAwaitResolveClosure [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
22: 0x101b58fb8 Builtins_PromiseFulfillReactionJob [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
23: 0x101a9ab94 Builtins_RunMicrotasks [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
24: 0x101a723f4 Builtins_JSRunMicrotasksEntry [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
25: 0x10134a2c8 v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
26: 0x10134a7b4 v8::internal::(anonymous namespace)::InvokeWithTryCatch(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
27: 0x10134a990 v8::internal::Execution::TryRunMicrotasks(v8::internal::Isolate*, v8::internal::MicrotaskQueue*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
28: 0x101371b5c v8::internal::MicrotaskQueue::RunMicrotasks(v8::internal::Isolate*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
29: 0x1013722f8 v8::internal::MicrotaskQueue::PerformCheckpoint(v8::Isolate*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
30: 0x100facc4c node::InternalCallbackScope::Close() [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
31: 0x100fac7bc node::InternalCallbackScope::~InternalCallbackScope() [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
32: 0x10107a16c node::fs::FSReqPromise<node::AliasedBufferBase<double, v8::Float64Array>>::Resolve(v8::Local<v8::Value>) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
33: 0x101086b28 node::fs::AfterNoArgs(uv_fs_s*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
34: 0x10107ac24 node::MakeLibuvRequestCallback<uv_fs_s, void (*)(uv_fs_s*)>::Wrapper(uv_fs_s*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
35: 0x101a50228 uv__work_done [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
36: 0x101a53c78 uv__async_io [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
37: 0x101a65d50 uv__io_poll [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
38: 0x101a5423c uv_run [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
39: 0x100fad6f0 node::SpinEventLoopInternal(node::Environment*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
40: 0x1010c08c8 node::NodeMainInstance::Run(node::ExitCode*, node::Environment*) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
41: 0x1010c05dc node::NodeMainInstance::Run() [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
42: 0x1010487f4 node::Start(int, char**) [/Users/rehanvandermerwe/.volta/tools/image/node/20.15.0/bin/node]
43: 0x19afdc274 start [/usr/lib/dyld]
rehanvandermerwe@Mac cdk-issue-slow-stack-dependencies % 

CDK CLI Version

2.176.0

Framework Version

2.176.0

Node.js Version

v20.15.0

OS

Mac M2

Language

TypeScript

Language Version

No response

Other information

No response

@rehanvdm rehanvdm added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 22, 2025
@github-actions github-actions bot added the package/tools Related to AWS CDK Tools or CLI label Jan 22, 2025
@rehanvdm
Copy link
Author

I have collected all other possibly related memory issues:

#24183
#3325
#28606
#32624
aws/jsii#745
hashicorp/terraform-cdk#1264
aws/jsii#3091

@ashishdhingra ashishdhingra added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Jan 22, 2025
@ashishdhingra
Copy link
Contributor

@rehanvdm Good afternoon. Could you please check if guidance provided in https://stackoverflow.com/questions/48387040/how-do-i-determine-the-correct-max-old-space-size-for-node-js works for you? I'm unsure if we have any guidance or baseline defined for your scenario.

@ashishdhingra ashishdhingra added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Jan 22, 2025
@rehanvdm
Copy link
Author

That does not seem to make a difference, if even at all. I tried both these methods:

  1. First try:
NODE_OPTIONS=--max_old_space_size=4096 cdk ls --debug --v --vv
  1. Second try, by changing app in cdk.json:
"app": "NODE_OPTIONS=--max_old_space_size=4096 npx ts-node --prefer-ts-exts bin/cdk-issue-slow-stack-dependencies.ts",

Then running just:

cdk ls --debug --v --vv

This is why I mentioned in the ticket that I did not mess with any NODE_OPTIONS. Because it is an uphill battle at this point. Somewhere in the CDK CLI code something is recursing and can not handle deeply nested dependencies.

Even if it worked, it's just treating the symptoms and not the problem. Adding 1 more wave or stack can send it flying off and error on OOM again.

It's also worth noting that the CPU for the node process is at 100% the whole time running the tests.

I have found that the stackDependencyReasons function here is called recursively and is one bottleneck. It checks for cyclic dependencies here.

Uncommenting it, as a proof for debugging, I noticed that the .addDependency function completed as expected all within milliseconds of each other, not seconds like the logs showed at the end. But then after that it still OOM errors,

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p2 package/tools Related to AWS CDK Tools or CLI
Projects
None yet
Development

No branches or pull requests

2 participants