
Mlflow Helm Chart Deployment fails: Error with volumePermissions.enabled=true #30833

Closed
Ganesh-hub-5 opened this issue Dec 9, 2024 · 3 comments
Assignees
Labels
mlflow solved stale 15 days without activity tech-issues The user has a technical issue about an application triage Triage is needed

Comments


Ganesh-hub-5 commented Dec 9, 2024

Name and Version

bitnami/mlflow Chart Version: mlflow-2.2.1 App Version: 2.18.0

What architecture are you using?

None

What steps will reproduce the bug?

Kubernetes version:

Client Version: v1.30.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.11+rke2r1

Environment:
OS: Red Hat Enterprise Linux
Kernel: Linux
RKE2 Version: v1.28.11

What we have done so far

  1. Created the PersistentVolumes:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data-mlflow-a3-postgresql
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/name: postgresql
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /opt/mlflow/postgresql

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-mlflow-a3-minio
  labels:
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: minio
    app.kubernetes.io/version: 2024.11.7
    helm.sh/chart: minio-14.8.5
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /opt/mlflow/minio

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-mlflow-a3-tracking
  labels:
    app.kubernetes.io/component: tracking
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: mlflow
    app.kubernetes.io/part-of: mlflow
    app.kubernetes.io/version: 2.18.0
    helm.sh/chart: mlflow-2.2.1
spec:
  capacity:
    storage: 8Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /opt/mlflow/tracking
  2. Created the PersistentVolumeClaims in namespace abca3ns:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/name: postgresql
  name: data-mlflow-a3-postgresql-0
  namespace: abca3ns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    meta.helm.sh/release-name: mlflow-a3
    meta.helm.sh/release-namespace: abca3ns
  labels:
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: minio
    app.kubernetes.io/version: 2024.11.7
    helm.sh/chart: minio-14.8.5
  name: mlflow-a3-minio
  namespace: abca3ns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    meta.helm.sh/release-name: mlflow-a3
    meta.helm.sh/release-namespace: abca3ns
  labels:
    app.kubernetes.io/component: tracking
    app.kubernetes.io/instance: mlflow-a3
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: mlflow
    app.kubernetes.io/part-of: mlflow
    app.kubernetes.io/version: 2.18.0
    helm.sh/chart: mlflow-2.2.1
  name: mlflow-a3-tracking
  namespace: abca3ns
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
$ kubectl get pv
NAME                           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                 STORAGECLASS   REASON   AGE
pv-data-mlflow-a3-postgresql   10Gi       RWO            Retain           Bound    abca3ns/data-mlflow-a3-postgresql-0                           2d
pv-mlflow-a3-minio             8Gi        RWO            Retain           Bound    abca3ns/mlflow-a3-minio                                       44m
pv-mlflow-a3-tracking          8Gi        RWO            Retain           Bound    abca3ns/mlflow-a3-tracking                                    44m

$ kubectl get pvc -n abca3ns
NAME                          STATUS   VOLUME                         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-mlflow-a3-postgresql-0   Bound    pv-data-mlflow-a3-postgresql   10Gi       RWO                           2d
mlflow-a3-minio               Bound    pv-mlflow-a3-minio             8Gi        RWO                           46m
mlflow-a3-tracking            Bound    pv-mlflow-a3-tracking          8Gi        RWO                           46m
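Because these are hostPath volumes, Kubernetes will not adjust ownership on them, and the Bitnami images run as the non-root user 1001. A sketch of the node-side preparation that would be expected here (assumptions: the /opt/mlflow base path from the PVs above and uid/gid 1001; BASE defaults to a throwaway temp directory so the sketch is safe to dry-run, and on the real RKE2 node it would be run as root with BASE=/opt/mlflow):

```shell
# Node-side preparation sketch (assumption: hostPath base /opt/mlflow as in
# the PVs above; Bitnami containers run as non-root uid 1001).
# BASE defaults to a temp dir so this is safe to dry-run; on the node,
# run as root with BASE=/opt/mlflow.
BASE="${BASE:-$(mktemp -d)}"
for d in postgresql minio tracking; do
  mkdir -p "$BASE/$d"
  chown 1001:1001 "$BASE/$d" 2>/dev/null || true   # needs root on the node
  chmod 775 "$BASE/$d"                             # group-writable directories
done
ls -ld "$BASE"/postgresql "$BASE"/minio "$BASE"/tracking
```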

Are you using any custom parameters or values?

The values.yaml we are using

global:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: abca3/node-type
            operator: In
            values:
            - platform

volumePermissions:
  enabled: true

postgresql:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: abca3/node-type
            operator: In
            values:
            - platform
  persistence:
    enabled: true
    existingClaim: data-mlflow-a3-postgresql-0

minio:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: abca3/node-type
            operator: In
            values:
            - platform
  persistence:
    enabled: true
    existingClaim: mlflow-a3-minio

mlflow:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: abca3/node-type
            operator: In
            values:
            - platform
  persistence:
    enabled: true
    existingClaim: mlflow-a3-tracking
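The top-level volumePermissions flag covers the mlflow containers, while the bundled postgresql and minio subcharts have their own switch. A hedged values sketch of enabling the init container per subchart (parameter names taken from the Bitnami postgresql and minio charts; verify them against the chart versions actually deployed):

```yaml
# Sketch only: per-subchart volume-permissions init containers.
# Confirm these parameters exist in your chart versions before use.
postgresql:
  volumePermissions:
    enabled: true
minio:
  volumePermissions:
    enabled: true
```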

What is the expected behavior?

$ kubectl get all -n abca3ns

NAME                                      READY   STATUS    RESTARTS         AGE
pod/mlflow-a3-minio-595bc459b7-w8m2t      1/1     Running   14 (4m55s ago)   51m
pod/mlflow-a3-postgresql-0                1/1     Running   14 (4m34s ago)   51m
pod/mlflow-a3-run-56b57c5dfb-vxhx2        1/1     Running   10 (3m32s ago)   51m
pod/mlflow-a3-tracking-85554865bc-qr6gr   1/1     Running   10 (3m31s ago)   51m

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mlflow-a3-minio      1/1     1            1           51m
deployment.apps/mlflow-a3-run        1/1     1            1           51m
deployment.apps/mlflow-a3-tracking   1/1     1            1           51m

What do you see instead?

$ kubectl get all -n abca3ns
NAME                                      READY   STATUS                  RESTARTS         AGE
pod/mlflow-a3-minio-595bc459b7-w8m2t      0/1     CrashLoopBackOff        14 (4m55s ago)   51m
pod/mlflow-a3-postgresql-0                0/1     CrashLoopBackOff        14 (4m34s ago)   51m
pod/mlflow-a3-run-56b57c5dfb-vxhx2        0/1     Init:CrashLoopBackOff   10 (3m32s ago)   51m
pod/mlflow-a3-tracking-85554865bc-qr6gr   0/1     Init:CrashLoopBackOff   10 (3m31s ago)   51m

NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mlflow-a3-minio      0/1     1            0           51m
deployment.apps/mlflow-a3-run        0/1     1            0           51m
deployment.apps/mlflow-a3-tracking   0/1     1            0           51m

$ kubectl logs mlflow-a3-minio-595bc459b7-w8m2t -n abca3ns
 06:19:30.84 INFO  ==>
 06:19:30.84 INFO  ==> Welcome to the Bitnami minio container
 06:19:30.85 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
 06:19:30.85 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
 06:19:30.85 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
 06:19:30.86 INFO  ==>
 06:19:30.86 INFO  ==> ** Starting MinIO setup **
/opt/bitnami/scripts/libminio.sh: line 374: /bitnami/minio/data/.root_user: Permission denied

$ kubectl logs mlflow-a3-postgresql-0 -n abca3ns
postgresql 06:19:50.55 INFO  ==>
postgresql 06:19:50.55 INFO  ==> Welcome to the Bitnami postgresql container
postgresql 06:19:50.64 INFO  ==> Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql 06:19:50.64 INFO  ==> Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql 06:19:50.65 INFO  ==> Upgrade to Tanzu Application Catalog for production environments to access custom-configured and pre-packaged software components. Gain enhanced features, including Software Bill of Materials (SBOM), CVE scan result reports, and VEX documents. To learn more, visit https://bitnami.com/enterprise
postgresql 06:19:50.65 INFO  ==>
postgresql 06:19:50.85 INFO  ==> ** Starting PostgreSQL setup **
postgresql 06:19:51.04 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql 06:19:51.05 INFO  ==> Loading custom pre-init scripts...
postgresql 06:19:51.14 INFO  ==> Initializing PostgreSQL database...
mkdir: cannot create directory ‘/bitnami/postgresql/data’: Permission denied

Additional information

The helm command we are using for mlflow install is

helm install mlflow-a3 oci://registry-1.docker.io/bitnamicharts/mlflow --set service.type=ClusterIP -f value.yaml  -n abca3ns

Following the guidance in the documentation, we tried to resolve this issue by passing --set volumePermissions.enabled=true, but then we encountered the new error below. Any help with this error would be appreciated.

$ helm upgrade mlflow-a3 oci://registry-1.docker.io/bitnamicharts/mlflow --set service.type=ClusterIP -f value.yaml --set volumePermissions.enabled=true  -n abca3ns
Pulled: registry-1.docker.io/bitnamicharts/mlflow:2.2.1
Digest: sha256:f7a440020cb59232ade5ff644624586095485b465d138e1b76efc33f7e0f1eba
Error: UPGRADE FAILED: template: mlflow/templates/tracking/deployment.yaml:105:12: executing "mlflow/templates/tracking/deployment.yaml" at <include "mlflow.v0.volumePermissionsInitContainer" .>: error calling include: template: mlflow/templates/_helpers.tpl:521:12: executing "mlflow.v0.volumePermissionsInitContainer" at <include>: wrong number of args for include: want 2 got 1
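Since the failing include sits inside the chart's own _helpers.tpl, this template error cannot be fixed from values.yaml; checking whether a newer chart release resolves it seems the most likely path. A sketch of that check (the commands are printed rather than executed here, since they assume helm and OCI registry access; `<newer>` is a placeholder, not a known-good version):

```shell
# The include error originates in the chart templates, not in the supplied
# values, so a newer chart release is the likely fix. Commands are echoed,
# not run: they assume helm and registry access on the operator's machine.
CHART="oci://registry-1.docker.io/bitnamicharts/mlflow"
echo "helm show chart $CHART                      # inspect the latest chart version"
echo "helm upgrade mlflow-a3 $CHART --version <newer> -f value.yaml -n abca3ns"
```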
@Ganesh-hub-5 Ganesh-hub-5 added the tech-issues The user has a technical issue about an application label Dec 9, 2024
@github-actions github-actions bot added the triage Triage is needed label Dec 9, 2024
@carrodher
Member

Bitnami containers are designed to operate as non-root by default. Consequently, any files or directories used by the application should be owned by the root group, as the random user (1001 by default) is a member of this root group. To ensure proper permissions, you'll need to adjust the ownership of your local directory accordingly.

For more comprehensive information about non-root containers and their significance for security, you can explore the following resources:

These references provide valuable insights into the best practices and considerations when working with non-root containers in Bitnami applications.


This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Dec 27, 2024

github-actions bot commented Jan 1, 2025

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@github-actions github-actions bot added the solved label Jan 1, 2025
@bitnami-bot bitnami-bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 1, 2025