File scorecard_BasicSpecCheck.json is empty if operator-sdk produces timeout #355

Closed
tkrishtop opened this issue Dec 9, 2021 · 3 comments · Fixed by #356
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tkrishtop
Contributor

Bug Description

Sometimes operator-sdk scorecard times out, apparently because of a several-minute outage of internal resources. This is not preflight's fault: direct operator-sdk scorecard tests run immediately afterwards time out as well.
The problem is that in this case preflight does not log the underlying error, and the file scorecard_BasicSpecCheck.json is empty.

Version and Command Invocation

preflight 1.0.5

Steps to Reproduce:

Operator to test:

name: "testpmd-operator"
version: "v0.2.9"
image: "quay.io/rh-nfv-int/testpmd-operator-bundle@sha256:5e28f883faacefa847104ebba1a1a22ee897b7576f0af6b8253c68b5c8f42815"
index_image: "quay.io/tkrishtop/index-testpmd-operator-bundle:v0.2.9"

Preflight command:

podman run \
-it \
--rm \
--pull=always \
--privileged \
-e PFLT_JUNIT=true \
-e PFLT_ARTIFACTS=/artifacts \
-e PFLT_LOGFILE=/artifacts/preflight.log \
-e PFLT_LOGLEVEL=trace \
-e KUBECONFIG=/kubeconfig \
-e PFLT_INDEXIMAGE={{ OO_INDEX }} \
-e PFLT_NAMESPACE={{ preflight_namespace }} \
-e PFLT_SERVICEACCOUNT=default \
-e PFLT_SCORECARD_IMAGE={{ config_images[0] }} \
-v {{ preflight_kubeconfig }}/kubeconfig:/kubeconfig \
-v {{ preflight_artifacts }}:/artifacts \
{{ preflight_version }} \
check operator \
{{ operator.image }}

Here are all log files.

Expected Result

I would expect the operator-sdk error "error running tests context deadline exceeded" to appear in preflight.log or results.json. I'd like to stop running direct operator-sdk tests and rely entirely on preflight.

Actual Result

results.json

"errors": [
  {
    "name": "ScorecardBasicSpecCheck",
    "elapsed_time": 240096,
    "description": "Check to make sure that all CRs have a spec block.",
    "help": "Check ScorecardBasicSpecCheck encountered an error. Please review the /artifacts/operator_bundle_scorecard_BasicSpecCheck.json file for more information."
  },

operator_bundle_scorecard_BasicSpecCheck.json is empty.

preflight.log

time="2021-12-07T07:49:15Z" level=trace msg="running scorecard with the following invocation[operator-sdk scorecard --output json --selector=test=basic-check-spec-test --kubeconfig /kubeconfig --wait-time 240s --namespace preflight-testing --service-account default --config /tmp/scorecard-test-config-2578384683.yaml --verbose /tmp/preflight-606340614/fs]"
time="2021-12-07T07:53:15Z" level=info msg="check completed: ScorecardBasicSpecCheck" ERROR="failed to run operator-sdk scorecard: unexpected end of JSON input" result="failed to run operator-sdk scorecard: unexpected end of JSON input"
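
The message "unexpected end of JSON input" is what Go's encoding/json returns when asked to decode empty input, which is consistent with scorecard writing nothing to stdout on a timeout. A minimal sketch reproducing that message follows; the function name parseResult is hypothetical and is not preflight's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseResult decodes scorecard-style JSON stdout into a generic map.
// Hypothetical helper for illustration only; not taken from preflight.
func parseResult(stdout []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	if err := json.Unmarshal(stdout, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Simulate a run where the command wrote nothing to stdout.
	_, err := parseResult([]byte(""))
	fmt.Println(err) // unexpected end of JSON input
}
```

The decode error hides the real cause, which is only present on stderr.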

Output of operator-sdk scorecard --selector=test=basic-check-spec-test, run immediately afterwards

Stdout ============================

Stderr ============================
time="2021-12-07T02:12:24-06:00" level=debug msg="Debug logging is set"
Error: error running tests context deadline exceeded
Usage:
  operator-sdk scorecard [flags]

Flags:
  -c, --config string            path to scorecard config file
  -h, --help                     help for scorecard
      --kubeconfig string        kubeconfig path
  -L, --list                     Option to enable listing which tests are run
  -n, --namespace string         namespace to run the test images in
  -o, --output string            Output format for results. Valid values: text, json, xunit (default "text")
  -l, --selector string          label selector to determine which tests are run
  -s, --service-account string   Service account to use for tests (default "default")
  -x, --skip-cleanup             Disable resource cleanup after tests are run
  -b, --storage-image string     Storage image to use (default "docker.io/library/busybox@sha256:c71cb4f7e8ececaffb34037c2637dc86820e4185100e18b4d02d613a9bd772af")
  -t, --test-output string       Test output directory. (default "test-output")
  -u, --untar-image string       Untar image to use (default "registry.access.redhat.com/ubi8@sha256:910f6bc0b5ae9b555eb91b88d28d568099b060088616eba2867b07ab6ea457c7")
  -w, --wait-time duration       seconds to wait for tests to complete. Example: 35s (default 30s)

Global Flags:
      --plugins strings   plugin keys to be used for this subcommand execution
      --verbose           Enable verbose logging

time="2021-12-07T02:17:24-06:00" level=fatal msg="error running tests context deadline exceeded"

Additional Context

We'd like to deactivate direct operator-sdk tests and rely entirely on preflight logs.

@tkrishtop added the kind/bug label Dec 9, 2021
@acornett21
Contributor

This is happening because operator-sdk does not handle context deadline errors the way it handles other application errors. Link to code: L235 is where the error is propagated.
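
One way a caller could cope with this, sketched under assumptions: check for empty stdout before decoding, and surface the "Error:" line from stderr instead of the JSON decode error. The names decodeScorecard and firstErrorLine are hypothetical, and this is not necessarily the approach taken in the fix for this issue.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// firstErrorLine returns the first stderr line starting with "Error:",
// which is how the scorecard output captured in this issue reports the timeout.
func firstErrorLine(stderr []byte) string {
	for _, line := range bytes.Split(stderr, []byte("\n")) {
		line = bytes.TrimSpace(line)
		if bytes.HasPrefix(line, []byte("Error:")) {
			return string(line)
		}
	}
	return "no error line found on stderr"
}

// decodeScorecard (hypothetical) decodes stdout as JSON, but reports the
// stderr error when stdout is empty instead of a bare JSON decode error.
func decodeScorecard(stdout, stderr []byte) (map[string]interface{}, error) {
	if len(bytes.TrimSpace(stdout)) == 0 {
		return nil, fmt.Errorf("scorecard produced no output: %s", firstErrorLine(stderr))
	}
	var out map[string]interface{}
	if err := json.Unmarshal(stdout, &out); err != nil {
		return nil, fmt.Errorf("decoding scorecard output: %w", err)
	}
	return out, nil
}

func main() {
	// stderr excerpt copied from the direct operator-sdk run in this issue.
	stderr := []byte("Error: error running tests context deadline exceeded\nUsage:\n  operator-sdk scorecard [flags]\n")
	_, err := decodeScorecard(nil, stderr)
	fmt.Println(err)
}
```

With a guard like this, the timeout message would reach preflight.log and results.json rather than being replaced by the decode failure.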

@acornett21
Contributor

/assign

@tkrishtop
Contributor Author

I opened a related issue in operator-sdk to add more information to the timeout logs: operator-framework/operator-sdk#5452
