Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable prometheus merge as a configuration in the Kyma Istio CR #1232

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

jeffreylimnardy
Copy link

Description

Changes proposed in this pull request:

  • Add another config field enablePrometheusMerge in the Kyma Istio CR which by default is set to false. When set to true it will behave as described in this ADR.
  • Restarts the sidecars whenever this configuration is changes to change the annotations on the pods

Pre-Merge Checklist

  • As a PR reviewer, verify code coverage and evaluate if it is acceptable.

Related issues

Resolves #1185

@CLAassistant
Copy link

CLAassistant commented Jan 7, 2025

CLA assistant check
All committers have signed the CLA.

@kyma-bot kyma-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cla: yes Indicates the PR's author has signed the CLA. labels Jan 7, 2025
@jeffreylimnardy jeffreylimnardy marked this pull request as ready for review January 13, 2025 08:03
@jeffreylimnardy jeffreylimnardy requested review from a team as code owners January 13, 2025 08:03
@kasiakepka
Copy link
Contributor

kasiakepka commented Jan 16, 2025

Hi @jeffreylimnardy can you please look at the failing tests and let us know when they are green :)

@jeffreylimnardy
Copy link
Author

jeffreylimnardy commented Jan 27, 2025

Hi @jeffreylimnardy can you please look at the failing tests and let us know when they are green :)

Sorry for the delay! Working on it, I was out sick for the last 2 weeks 🤒 , tests are green now! :)

pbochynski
pbochynski previously approved these changes Jan 30, 2025
Copy link
Contributor

@pbochynski pbochynski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kyma-bot kyma-bot added the lgtm Looks good to me! label Jan 30, 2025
@kyma-bot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@mluk-sap
Copy link

mluk-sap commented Feb 11, 2025

I tested it with the following script:

#/bin/bash
set -e
set -o pipefail
set -x

export NAMESPACE=workload

k3d cluster delete kyma
k3d cluster create kyma --k3s-arg '--tls-san=host.docker.internal@server:*' --agents 2 --k3s-arg '--disable=traefik@server:0' -p 80:80@loadbalancer -p 443:443@loadbalancer 

export IMG=istio:$(date +%s)
make build
make docker-build 
k3d image import -c kyma "$IMG"

make deploy

echo "deploying istio CR"
cat <<EOF | kubectl -n kyma-system apply -f -
apiVersion: operator.kyma-project.io/v1alpha2
kind: Istio
metadata:
  name: default
  namespace: kyma-system
  labels:
    app.kubernetes.io/name: default
spec:
  config:
    telemetry:
      metrics:
        prometheusMerge: true
EOF
echo "istio CR deployed"

echo "waiting for istio CR"
kubectl wait istio/default -n kyma-system --for=condition=Ready 
kubectl wait pods -n istio-system -l app=istiod  --for=condition=ready
echo "istio CR ready"

echo "istio configmap"
kubectl get -n istio-system configmap istio -o yaml

echo "deploying app"
kubectl create ns $NAMESPACE || true
kubectl label namespace $NAMESPACE istio-injection=enabled --overwrite

cat <<EOF | kubectl -n $NAMESPACE apply -f -
apiVersion: v1
kind: Service
metadata:
  name: application-service
  annotations:
    prometheus.io/port: "8080"
    prometheus.io/scrape: "true"
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: 8080
  - name: merged-metrics
    port: 15020
    targetPort: 15020
  selector:
    app: application
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
      - name: application-container
        image: quay.io/brancz/prometheus-example-app:v0.3.0
        ports:
        - name: web
          containerPort: 8080
EOF
echo "app deployed"

echo "waiting for pods"
kubectl wait pods -n "$NAMESPACE" -l app=application  --for=condition=ready
echo "pods ready"

echo "starting curl"
kubectl delete ns test || true
kubectl create ns test
kubectl label namespace test istio-injection=enabled --overwrite
kubectl run --namespace test --image=curlimages/curl check-internal-curl -- /bin/sleep 3600
kubectl wait --for=condition=ready pod -n test check-internal-curl

echo "visiting application"
kubectl exec --namespace test -ti check-internal-curl -- curl --verbose --fail --location http://application-service.$NAMESPACE.svc.cluster.local:8080/

echo "getting merged metrics"
kubectl exec --namespace test -ti check-internal-curl -- curl --verbose --fail --location http://application-service.$NAMESPACE.svc.cluster.local:15020/stats/prometheus | grep -e "^http_request" -e "^version"

It is able to get the application metrics exposed via istio-proxy on port 15020:

getting merged metrics
++ kubectl exec --namespace test -ti check-internal-curl -- curl --verbose --fail --location http://application-service.workload.svc.cluster.local:15020/stats/prometheus
++ grep -e '^http_request' -e '^version'
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.005"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.01"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.025"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.05"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.1"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.25"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="0.5"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="1"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="2.5"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="5"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="10"} 1
http_request_duration_seconds_bucket{code="200",handler="found",method="get",le="+Inf"} 1
http_request_duration_seconds_sum{code="200",handler="found",method="get"} 0.000971
http_request_duration_seconds_count{code="200",handler="found",method="get"} 1
http_requests_total{code="200",method="get"} 1
version{version="v0.3.0"} 1

Copy link

@mluk-sap mluk-sap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PrometheusMergeRestartPredicate needs to compare current pod with the desired state from the Istio CR

}

func (p PrometheusMergeRestartPredicate) Matches(_ v1.Pod) bool {
return p.oldPrometheusMerge != p.newPrometheusMerge

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only checks the 'last applied configuration' and the current one of the Istio CR and ignores the current workload state.
So it works improperly:

  • it restarts workloads endlessly if there is a configuration change in the Istio CR
  • it won't restart workloads if there are subsequent Istio CR changes, so the configuration change in the Istio CR is not visible

I observed the first problem in my local environment - I changed the prometheusMerge flag in the Istio CR and my workload is restarted every minute with every Istio reconcilliation.

It should rather compare the given pod (passed to this function as a parameter) with the desired state of the Istio CR (istioCR.Spec.Config.Telemetry.Metrics.PrometheusMerge). It may just look if prometheus on pod annotations are properly set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Indicates the PR's author has signed the CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for Istio prometheusMerge
6 participants