Ensure tests call helm uninstall to remove all resources on teardown #122

ryanohnemus · 2024-01-12T16:11:03Z

This has multiple fixes:

Tests that were not calling helm uninstall <helm-release> were leaving resources in the cluster such as ClusterRole and ClusterRoleBindings that would conflict with follow up tests that also had rbac.create: true, causing test failures
chunk-rollover test:
a) MacOS Run Fix - Updated the test to remove in-place seds (sed -i) as the sed -i option works differently on a mac. One in-place sed was replaced by using an environment variable in the config, the other just passes it's output directly to kubectl apply -f -
b) this test specifically had created it's own rbac files that were providing the same permissions as the ones that can be created via the helm with rbac.create: true. It was also not cleaning these up afterwards and leaving them around in cluster scope with the same name as the ones that would be created via the helm chart (if you do not use a fullnameOverride). Since this is the first test to run, any test after this that had rbac.create: true would fail to run. The individualized logic to add these resources to the cluster has been removed, along with these files, and rbac.create:true has been set in the test's fluentbit-basic.yaml file. The helm clean-up as described in 1) cleans these up so there are not conflicts with tests afterwards.
c) this test should probably be reviewed in the future. There's a forced 3 minute sleep that checks every 10s for pods that aren't running (via --field-selector=status.phase!=Running). Pods could restart and be back in phase running within 10 seconds (if i'm not mistaken), so not sure if the intent was to check for crashes/restarts but I don't think this would work. All it seems like it's doing right now is making full CI runs take an 3 extra minutes.
kubernetes-plugins test:
a) did some re-organization for future tests by adding a common deploy_fluent_bit function. This will set a FLUENTBIT_POD_NAME that can be used by tests.
b) I was running into a weird label search (k8s pod list) return timing issue when using run kubectl get pods -l "app.kubernetes.io/name=fluent-bit" -o jsonpath='{.items[0].spec.nodeName}' immediately after checking that a pod was running, in which this could return a failure that the jsonpath would not match (because the k8s pod list being returned hadnot been updated yet?). If we don't attempt to search by the label and just use the exact fluent-bit pod's name we don't run into the timing issue.
skip opensearch hosted test when the $HOSTED_OPENSEARCH_HOST is localhost, this is to fix the test when running the full suite of tests from a local kind cluster and not through CI.

ryanohnemus · 2024-01-13T15:49:41Z

@patrick-stephens The current CI is broken after we merged in the kubernetes-plugins tests yesterday. The fixes here should address it.

========================
Starting tests.
========================

Fluentbit repository: ghcr.io/fluent/fluent-bit - tag: latest


1..8
ok 1 chunk rollover test in 196000ms
ok 2 test fluent-bit forwards logs to elasticsearch default index in 86000ms
ok 3 test fluent-bit forwards logs to elasticsearch default index using http compression in 81000ms
ok 4 test fluent-bit adds k8s labels to records in 63000ms
ok 5 verify config in 3000ms
ok 6 test fluent-bit forwards logs to opensearch default index in 57000ms
ok 7 test fluent-bit forwards logs to AWS OpenSearch hosted service default index in 1000ms # skip Skipping Hosted OpenSearch When 'HOSTED_OPENSEARCH_HOST=localhost'
ok 8 Verify K8S cluster accessible in 0ms


========================
All tests passed!
========================

Signed-off-by: ryanohnemus <[email protected]>

patrick-stephens

Sincerely appreciate both you doing this and monitoring the CI after your changes were merged.

We need to look at a better way of testing it but it's complicated by PRs on a public repo - we don't want to hit any supply chain security problems either.

ryanohnemus mentioned this pull request Jan 12, 2024

Add kubernetes-plugins ci initial test #121

Merged

ryanohnemus force-pushed the k8s_namespace_labels_ci branch from 35fee32 to 01dae0b Compare January 12, 2024 20:10

ryanohnemus changed the title ~~wip for namespace_labels~~ Ensure tests call helm uninstall to remove all resources on teardown Jan 12, 2024

ryanohnemus force-pushed the k8s_namespace_labels_ci branch 2 times, most recently from 597d44c to 2d354d3 Compare January 12, 2024 20:30

ryanohnemus marked this pull request as ready for review January 12, 2024 20:31

ryanohnemus force-pushed the k8s_namespace_labels_ci branch 5 times, most recently from 6b62cca to e63352c Compare January 13, 2024 15:37

ensure tests run helm uninstall in teardown

434dcb6

Signed-off-by: ryanohnemus <[email protected]>

ryanohnemus force-pushed the k8s_namespace_labels_ci branch from e63352c to 434dcb6 Compare January 13, 2024 15:50

patrick-stephens approved these changes Jan 15, 2024

View reviewed changes

patrick-stephens merged commit 7bea545 into fluent:main Jan 15, 2024
7 of 8 checks passed

ryanohnemus deleted the k8s_namespace_labels_ci branch January 18, 2024 13:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure tests call helm uninstall to remove all resources on teardown #122

Ensure tests call helm uninstall to remove all resources on teardown #122

ryanohnemus commented Jan 12, 2024 •

edited

Loading

ryanohnemus commented Jan 13, 2024 •

edited

Loading

patrick-stephens left a comment

Ensure tests call helm uninstall to remove all resources on teardown #122

Ensure tests call helm uninstall to remove all resources on teardown #122

Conversation

ryanohnemus commented Jan 12, 2024 • edited Loading

ryanohnemus commented Jan 13, 2024 • edited Loading

patrick-stephens left a comment

Choose a reason for hiding this comment

ryanohnemus commented Jan 12, 2024 •

edited

Loading

ryanohnemus commented Jan 13, 2024 •

edited

Loading