Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure tests call helm uninstall to remove all resources on teardown #122

Merged
merged 1 commit into from
Jan 15, 2024

Conversation

ryanohnemus
Copy link
Contributor

@ryanohnemus ryanohnemus commented Jan 12, 2024

This has multiple fixes:

  1. Tests that were not calling helm uninstall <helm-release> were leaving resources in the cluster such as ClusterRole and ClusterRoleBindings that would conflict with follow up tests that also had rbac.create: true, causing test failures

  2. chunk-rollover test:
    a) MacOS Run Fix - Updated the test to remove in-place seds (sed -i) as the sed -i option works differently on a mac. One in-place sed was replaced by using an environment variable in the config, the other just passes it's output directly to kubectl apply -f -
    b) this test specifically had created it's own rbac files that were providing the same permissions as the ones that can be created via the helm with rbac.create: true. It was also not cleaning these up afterwards and leaving them around in cluster scope with the same name as the ones that would be created via the helm chart (if you do not use a fullnameOverride). Since this is the first test to run, any test after this that had rbac.create: true would fail to run. The individualized logic to add these resources to the cluster has been removed, along with these files, and rbac.create:true has been set in the test's fluentbit-basic.yaml file. The helm clean-up as described in 1) cleans these up so there are not conflicts with tests afterwards.
    c) this test should probably be reviewed in the future. There's a forced 3 minute sleep that checks every 10s for pods that aren't running (via --field-selector=status.phase!=Running). Pods could restart and be back in phase running within 10 seconds (if i'm not mistaken), so not sure if the intent was to check for crashes/restarts but I don't think this would work. All it seems like it's doing right now is making full CI runs take an 3 extra minutes.

  3. kubernetes-plugins test:
    a) did some re-organization for future tests by adding a common deploy_fluent_bit function. This will set a FLUENTBIT_POD_NAME that can be used by tests.
    b) I was running into a weird label search (k8s pod list) return timing issue when using run kubectl get pods -l "app.kubernetes.io/name=fluent-bit" -o jsonpath='{.items[0].spec.nodeName}' immediately after checking that a pod was running, in which this could return a failure that the jsonpath would not match (because the k8s pod list being returned hadnot been updated yet?). If we don't attempt to search by the label and just use the exact fluent-bit pod's name we don't run into the timing issue.

  4. skip opensearch hosted test when the $HOSTED_OPENSEARCH_HOST is localhost, this is to fix the test when running the full suite of tests from a local kind cluster and not through CI.

@ryanohnemus ryanohnemus force-pushed the k8s_namespace_labels_ci branch from 35fee32 to 01dae0b Compare January 12, 2024 20:10
@ryanohnemus ryanohnemus changed the title wip for namespace_labels Ensure tests call helm uninstall to remove all resources on teardown Jan 12, 2024
@ryanohnemus ryanohnemus force-pushed the k8s_namespace_labels_ci branch 2 times, most recently from 597d44c to 2d354d3 Compare January 12, 2024 20:30
@ryanohnemus ryanohnemus marked this pull request as ready for review January 12, 2024 20:31
@ryanohnemus ryanohnemus force-pushed the k8s_namespace_labels_ci branch 5 times, most recently from 6b62cca to e63352c Compare January 13, 2024 15:37
@ryanohnemus
Copy link
Contributor Author

ryanohnemus commented Jan 13, 2024

@patrick-stephens The current CI is broken after we merged in the kubernetes-plugins tests yesterday. The fixes here should address it.

========================
Starting tests.
========================

Fluentbit repository: ghcr.io/fluent/fluent-bit - tag: latest


1..8
ok 1 chunk rollover test in 196000ms
ok 2 test fluent-bit forwards logs to elasticsearch default index in 86000ms
ok 3 test fluent-bit forwards logs to elasticsearch default index using http compression in 81000ms
ok 4 test fluent-bit adds k8s labels to records in 63000ms
ok 5 verify config in 3000ms
ok 6 test fluent-bit forwards logs to opensearch default index in 57000ms
ok 7 test fluent-bit forwards logs to AWS OpenSearch hosted service default index in 1000ms # skip Skipping Hosted OpenSearch When 'HOSTED_OPENSEARCH_HOST=localhost'
ok 8 Verify K8S cluster accessible in 0ms


========================
All tests passed!
========================

@ryanohnemus ryanohnemus force-pushed the k8s_namespace_labels_ci branch from e63352c to 434dcb6 Compare January 13, 2024 15:50
Copy link
Collaborator

@patrick-stephens patrick-stephens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sincerely appreciate both you doing this and monitoring the CI after your changes were merged.

We need to look at a better way of testing it but it's complicated by PRs on a public repo - we don't want to hit any supply chain security problems either.

@patrick-stephens patrick-stephens merged commit 7bea545 into fluent:main Jan 15, 2024
7 of 8 checks passed
@ryanohnemus ryanohnemus deleted the k8s_namespace_labels_ci branch January 18, 2024 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants