[bitnami/kafka] Kafka controllers freezing periodically #31100

igloo12 · 2024-12-19T05:12:47Z

Name and Version

bitnami/kafka:30.1.0

What architecture are you using?

None

What steps will reproduce the bug?

Install the Helm chart with the values below.

Are you using any custom parameters or values?

listeners:
  client:
    protocol: SASL_PLAINTEXT
  controller:
    protocol: SASL_PLAINTEXT
  interbroker:
    protocol: SASL_PLAINTEXT
  external:
    containerPort: 9095
    protocol: SASL_SSL
    name: EXTERNAL
    sslClientAuth: required
externalAccess:
  enabled: true
  controller:
    service:
      type: ClusterIP
      domain: "somedomain.com"
      ports:
        external: 9095
      annotations:
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        oci.oraclecloud.com/load-balancer-type: "lb"
        service.beta.kubernetes.io/oci-load-balancer-subnet1: "xxx"
controller:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 10
    timeoutSeconds: 10
    failureThreshold: 3
    periodSeconds: 10
    successThreshold: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    failureThreshold: 6
    timeoutSeconds: 10
    periodSeconds: 10
    successThreshold: 1
tls:
  existingSecret: kafka-jks-0
  password: xxx
  keystorePassword: xxx
  truststorePassword: xxx
  endpointIdentificationAlgorithm:
extraConfig: |
  allow.everyone.if.no.acl.found=false
  authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
  super.users=User:admin;User:controller_user;User:inter_broker_user
image:
  debug: true
provisioning:
  enabled: true
  resources:
    requests:
      cpu: 2
      memory: 512Mi
    limits:
      cpu: 3
      memory: 1024Mi
  extraProvisioningCommands:
    - echo "Allow user to consume from any topic"
    - "/opt/bitnami/kafka/bin/kafka-acls.sh --bootstrap-server $KAFKA_SERVICE --command-config $CLIENT_CONF --add --allow-principal User:auser --consumer --topic fusion_ --resource-pattern-type prefixed"
    - "/opt/bitnami/kafka/bin/kafka-acls.sh
            --bootstrap-server $KAFKA_SERVICE
            --command-config $CLIENT_CONF
            --list"
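The `extraConfig` and provisioning commands above combine `allow.everyone.if.no.acl.found=false`, a semicolon-separated `super.users` list, and a `prefixed` ACL on `fusion_`. The following is a deliberately simplified model of how those settings interact for a topic read. It is not Kafka's `StandardAuthorizer`: it ignores DENY ACLs, hosts, wildcards and consumer-group ACLs. The class, helper names and the topic names in the usage note (`fusion_orders`, `billing`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowAcl:
    """Hypothetical, simplified ALLOW ACL on a topic resource."""
    principal: str      # e.g. "User:auser"
    operation: str      # e.g. "Read"
    resource_name: str  # topic name, or a prefix when pattern_type is "prefixed"
    pattern_type: str   # "literal" or "prefixed"

    def matches(self, topic: str) -> bool:
        # A prefixed pattern matches every topic whose name starts with the prefix.
        if self.pattern_type == "prefixed":
            return topic.startswith(self.resource_name)
        return topic == self.resource_name


def parse_super_users(value: str) -> set[str]:
    """super.users is a semicolon-separated list of principals."""
    return {p.strip() for p in value.split(";") if p.strip()}


def may_access(principal: str, operation: str, topic: str,
               acls: list[AllowAcl], super_users: set[str]) -> bool:
    """Simplified decision: super users always pass; everyone else needs a matching ACL."""
    if principal in super_users:
        return True
    # With allow.everyone.if.no.acl.found=false, no matching ALLOW ACL means deny.
    return any(
        acl.principal == principal and acl.operation == operation and acl.matches(topic)
        for acl in acls
    )


SUPER_USERS = parse_super_users("User:admin;User:controller_user;User:inter_broker_user")
ACLS = [AllowAcl("User:auser", "Read", "fusion_", "prefixed")]
```

For example, `may_access("User:auser", "Read", "fusion_orders", ACLS, SUPER_USERS)` is allowed by the prefix, while the same call for a topic named `billing` is denied because no ACL matches it.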

What is the expected behavior?

The controllers run without restarting.

What do you see instead?

After a while, a controller crashes and freezes Kafka until it is forcibly restarted. I can't exec into the pod, and it uses a lot of CPU.

$ kubectl top pods -n kafka
NAME                 CPU(cores)   MEMORY(bytes)   
kafka-controller-0   30m          632Mi           
kafka-controller-1   24m          555Mi           
kafka-controller-2   750m         765Mi
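In the output above, `kafka-controller-2` uses roughly 25 to 30 times the CPU of its peers. A minimal sketch for spotting such an outlier from `kubectl top pods` text output follows; the "more than 5x the median" threshold is an arbitrary heuristic chosen for illustration, and the function names are hypothetical.

```python
import statistics


def parse_cpu(value: str) -> int:
    """Convert a kubectl CPU quantity ("750m" or "2") to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def parse_top(output: str) -> dict[str, int]:
    """Map pod name -> CPU millicores from `kubectl top pods` output."""
    usage = {}
    for line in output.strip().splitlines()[1:]:  # skip the NAME/CPU/MEMORY header
        fields = line.split()
        if len(fields) >= 2:
            usage[fields[0]] = parse_cpu(fields[1])
    return usage


def outliers(usage: dict[str, int], factor: float = 5.0) -> list[str]:
    """Pods whose CPU exceeds `factor` times the median across all pods (heuristic)."""
    median = statistics.median(usage.values())
    return [pod for pod, cpu in usage.items() if cpu > factor * median]


# Copied from the `kubectl top pods -n kafka` output above.
SAMPLE = """\
NAME                 CPU(cores)   MEMORY(bytes)
kafka-controller-0   30m          632Mi
kafka-controller-1   24m          555Mi
kafka-controller-2   750m         765Mi
"""

print(outliers(parse_top(SAMPLE)))
```

With the sample above, the median is 30m, so only `kafka-controller-2` (750m) crosses the 150m threshold.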

These are the last log messages:

[2024-12-19 01:56:55,916] INFO [GroupCoordinator 2]: Dynamic member with unknown member id joins group myapp in PreparingRebalance state. Created a new member id consumer-myapp-3-5046ef33-84e8-43af-b159-3c03bb143ecb and request the member to rejoin with this id. (kafka.coordinator.group.GroupCoordinator)
[2024-12-19 01:56:57,694] INFO [GroupCoordinator 2]: Stabilized group myapp generation 106 (__consumer_offsets-24) with 21 members (kafka.coordinator.group.GroupCoordinator)
[2024-12-19 01:56:57,697] INFO [GroupCoordinator 2]: Assignment received from leader consumer-myapp-2-8abd47e6-efc2-4c61-883a-e7f3f6681e40 for group myapp for generation 106. The group has 21 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator)
[2024-12-19 02:36:14,987] INFO [BrokerLifecycleManager id=2] Unable to send a heartbeat because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
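The excerpt shows about 39 minutes of silence between the last group-coordinator message (01:56:57) and the heartbeat timeout (02:36:14), which fits the node freezing rather than failing loudly. A minimal sketch for measuring the longest gap in a log, assuming every relevant line starts with the `[YYYY-MM-DD HH:MM:SS,mmm]` prefix seen above; the helper names are hypothetical.

```python
import re
from datetime import datetime

# Matches the "[YYYY-MM-DD HH:MM:SS,mmm]" prefix of the log lines above.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\]")


def parse_timestamps(lines):
    """Return the datetime of every line that starts with a timestamp."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stamps.append(base.replace(microsecond=int(match.group(2)) * 1000))
    return stamps


def largest_gap(lines):
    """Return (start, end, seconds) for the longest silence between log lines."""
    stamps = parse_timestamps(lines)
    if len(stamps) < 2:
        return None
    start, end = max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])
    return start, end, (end - start).total_seconds()


# Timestamps copied from the excerpt above; message text shortened.
LOG = [
    "[2024-12-19 01:56:55,916] INFO [GroupCoordinator 2]: Dynamic member ...",
    "[2024-12-19 01:56:57,694] INFO [GroupCoordinator 2]: Stabilized group myapp ...",
    "[2024-12-19 01:56:57,697] INFO [GroupCoordinator 2]: Assignment received ...",
    "[2024-12-19 02:36:14,987] INFO [BrokerLifecycleManager id=2] Unable to send a heartbeat ...",
]

start, end, seconds = largest_gap(LOG)
print(f"{start} -> {end}: {seconds:.0f} s")
```

For this excerpt the gap is about 2357 seconds, between the assignment message and the heartbeat timeout.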


igloo12 · 2024-12-19T16:06:06Z

It happened again, and I noticed that the frozen pod had high disk reads while the healthy nodes didn't.
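Because the frozen pod can't be exec'd into, one way to quantify "high disk reads" is to sample the Linux `/proc/<pid>/io` file for the Kafka JVM from the node itself. This sketch assumes node access, permission to read that file, and that you already know the JVM's host PID; the helper names are hypothetical. `read_bytes` counts bytes the process caused to be fetched from storage, unlike `rchar`, which also includes reads served from the page cache.

```python
import time


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the "key: value" lines of /proc/<pid>/io into integers."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def read_rate(before: str, after: str, interval_s: float) -> float:
    """Bytes per second fetched from storage (read_bytes) between two samples."""
    delta = parse_proc_io(after)["read_bytes"] - parse_proc_io(before)["read_bytes"]
    return delta / interval_s


def sample_read_rate(pid: int, interval_s: float = 5.0) -> float:
    """Sample /proc/<pid>/io twice, interval_s apart (Linux only)."""
    path = f"/proc/{pid}/io"
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return read_rate(before, after, interval_s)
```

Running `sample_read_rate(<pid>)` against the frozen controller and a healthy one would put numbers on the difference described above.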

igloo12 · 2024-12-20T16:27:50Z

I changed the liveness probe to match the readiness probe. The containers still die, but they recover faster:

  customLivenessProbe:
    failureThreshold: 6
    initialDelaySeconds: 5
    periodSeconds: 10
    successThreshold: 1
    tcpSocket:
      port: controller
    timeoutSeconds: 10
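A rough way to compare the two liveness settings is to estimate how long Kubernetes needs before it decides to restart a hung container: the last of `failureThreshold` consecutive failures starts `failureThreshold - 1` periods after the first and is reported failed after at most `timeoutSeconds`. This is a simplified model that ignores kubelet scheduling jitter and the container's termination grace period; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    period_s: int
    timeout_s: int
    failure_threshold: int

    def restart_after_first_failure(self) -> int:
        """Seconds from the start of the first failing probe to the restart decision."""
        return (self.failure_threshold - 1) * self.period_s + self.timeout_s

    def worst_case_after_freeze(self) -> int:
        """Add up to one period if the freeze begins right after a passing probe."""
        return self.period_s + self.restart_after_first_failure()


original = Probe(period_s=10, timeout_s=10, failure_threshold=3)  # controller.livenessProbe
custom = Probe(period_s=10, timeout_s=10, failure_threshold=6)    # customLivenessProbe

for name, probe in [("original", original), ("custom", custom)]:
    print(name, probe.restart_after_first_failure(), probe.worst_case_after_freeze())
```

By this estimate the custom probe should take about 60 to 70 seconds to trigger a restart versus about 30 to 40 seconds for the original, i.e. longer, not shorter. The faster recovery therefore plausibly comes from what the probe checks (a `tcpSocket` check on the `controller` port) rather than from its timing.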

carrodher · 2024-12-23T18:48:57Z

Hi, the issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a particular scenario that is not easy to reproduce on our side.

If you think that's not the case and want to contribute a solution, we'd like to invite you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Please feel free to contact us if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend referring to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

github-actions · 2025-01-08T01:28:11Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

igloo12 added the tech-issues The user has a technical issue about an application label Dec 19, 2024

igloo12 changed the title ~~Kafka controller freezing~~ Kafka controller freezing periodically Dec 19, 2024

igloo12 changed the title ~~Kafka controller freezing periodically~~ Kafka controllers freezing periodically Dec 19, 2024

github-actions bot added the triage Triage is needed label Dec 19, 2024

github-actions bot assigned carrodher Dec 19, 2024

igloo12 changed the title ~~Kafka controllers freezing periodically~~ [bitnami/kafka] Kafka controllers freezing periodically Dec 19, 2024

igloo12 mentioned this issue Dec 19, 2024

[bitnami/kafka] Liveness probe not effectively finding failed pods #31119 (Open)

carrodher added the kafka label Dec 23, 2024

github-actions bot added the stale 15 days without activity label Jan 8, 2025

