BUG: Download grafana agent archive to local folder in case of different arch #166

Open
davordbetter opened this issue Mar 22, 2024 · 9 comments

Comments

@davordbetter

I have two hosts in my inventory: one machine is amd64 and the other is arm64.

When I run ansible-playbook on my PC, it works fine:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] ****************************************************************************************************************************************
ok: [mon-vm -> localhost]

TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] *****************************************************************************************************************************
changed: [mon-vm -> localhost]
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] **************************************************************************************************************************************************
.fcst....?? grafana-agent-linux-arm64
changed: [mon-vm -> localhost]
.fcst....?? grafana-agent-linux-amd64
changed: [dev-be1 -> localhost]

TASK [grafana.grafana.grafana_agent : Set local path] *************************************************************************************************************************************************************
ok: [mon-vm]
ok: [dev-be1]

TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] ************************************************************************************************************************************************
ok: [mon-vm]
diff skipped: destination file appears to be binary
diff skipped: source file size is greater than 104448
changed: [dev-be1]

The same playbook in the GitLab CI/CD pipeline does not repeat the archive download, so only the arm64 binary (for the first host, ssxmon-vm) is downloaded and the amd64 host fails:

TASK [grafana.grafana.grafana_agent : Create Grafana Agent temp directory] *****
--- before
+++ after
@@ -1,5 +1,5 @@
 {
-    "mode": "0755",
+    "mode": "0751",
     "path": "/tmp/grafana-agent",
-    "state": "absent"
+    "state": "directory"
 }
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Download Grafana Agent archive to local folder] ***
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Extract grafana-agent.zip] ***************
>f++++++.?? grafana-agent-linux-arm64
changed: [ssxmon-vm -> localhost]
TASK [grafana.grafana.grafana_agent : Set local path] **************************
ok: [ssxmon-vm]
ok: [ssxdev-be1]
TASK [grafana.grafana.grafana_agent : Propagate downloaded binary] *************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
fatal: [ssxdev-be1]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/grafana-agent/grafana-agent-linux-amd64' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
ok: [ssxmon-vm]

Looking at the role task:

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

It has the option "run_once: true".
Now I'm confused about why the download was repeated in my local environment, while the pipeline honored the run_once parameter.

Anyway, I think run_once should not be here, or this should be solved in a different way.
On the other hand, run_once is handy when I run the playbook against a large number of VMs.
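A minimal sketch of one possible fix, not the role's actual code: drop "run_once" and add the standard "throttle: 1" task keyword, so every host runs the download on the controller, one host at a time. It assumes, as in the task above, that "_grafana_agent_download_url" and "_grafana_agent_cpu_arch" are rendered per host. Since ansible.builtin.get_url leaves an existing "dest" alone when "force" is false (the default), hosts that share an architecture should find the archive already present, so each architecture is still downloaded only once.

```yaml
# Sketch only: a possible replacement for the role's download task.
# Assumes _grafana_agent_download_url and _grafana_agent_cpu_arch are
# templated per host, as in the role task quoted above.
- name: Download Grafana Agent archive to local folder
  become: false
  ansible.builtin.get_url:
    url: "{{ _grafana_agent_download_url }}"
    # The file name already contains the architecture, so the amd64 and
    # arm64 archives do not overwrite each other.
    dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
    mode: "0664"
  register: _download_archive
  until: _download_archive is succeeded
  retries: 5
  delay: 2
  delegate_to: localhost
  check_mode: false
  # Instead of run_once: every host runs the task, one at a time. Hosts
  # sharing an architecture find the archive already present, and get_url
  # (force defaults to false) skips the download.
  throttle: 1
```

The trade-off is a short, serial check per host on the controller instead of a single task execution, which may matter for very large fleets.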

@devmittal02

Did you find any workaround for this? I'm getting the same issue when running it on a bunch of hosts with both arm64 and amd64 architectures.

@ishanjainn
Member

ishanjainn commented Mar 29, 2024

Hey @devmittal02, I haven't checked it yet, as we are building a new Grafana Agent role for Flow mode (the recommended way now), so we can probably test this there.

If you want to double-check, we have a PR open, so I can get any changes you want into it right now.

@davordbetter
Author

My "workaround" is to put the arm and amd VMs in different groups and run two pipelines with an inventory limit (-l).
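When hosts are split by architecture by hand, a pre-flight check can catch a host that ends up in the wrong group before the role runs. A sketch, assuming facts are gathered; "_play_archs" is a hypothetical helper variable, and the values compared are the raw "ansible_facts['architecture']" strings (typically "x86_64" or "aarch64" on Linux), not the role's own amd64/arm64 names.

```yaml
---
# Sketch only: abort before the role runs if the (limited) set of hosts
# mixes CPU architectures.
- name: Grafana agent with an architecture guard
  hosts: all
  gather_facts: true
  pre_tasks:
    - name: Ensure all hosts in this play share one CPU architecture
      ansible.builtin.assert:
        that:
          - _play_archs | length == 1
        fail_msg: "Mixed architectures in this play: {{ _play_archs | join(', ') }}"
      vars:
        # Hypothetical helper: the distinct architectures of the play's hosts.
        _play_archs: "{{ ansible_play_hosts | map('extract', hostvars, ['ansible_facts', 'architecture']) | unique | list }}"
  roles:
    - role: grafana.grafana.grafana_agent
```

Run it with "-l" as before. With a correct limit the assertion passes; otherwise every host fails at the pre-task instead of at "Propagate downloaded binary".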

@ishanjainn
Member

This seems like a very weird issue. @davordbetter, any thoughts on why this is failing specifically on GitLab?

@devmittal02 What platform are you running the playbook on?

@devmittal02

devmittal02 commented Mar 29, 2024

Hey, I think the issue is caused by this run_once. I am running it from AWX against the entire fleet of EC2 machines; AWX spins up an on-demand container and triggers the playbook across the machines using SSM.

What's happening: say the first machine is AMD, so the role downloads only the AMD binary and stores it locally. When an ARM machine comes next, the download step is skipped because of "run_once", and only the previously downloaded AMD binary is available. Hence the "file doesn't exist" error: the binary for the other architecture was never fetched.

- name: Download Grafana Agent binary to controller (localhost)
  block:
    - name: Create Grafana Agent temp directory
      become: false
      ansible.builtin.file:
        path: "{{ grafana_agent_local_tmp_dir }}"
        state: directory
        mode: 0751
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Download Grafana Agent archive to local folder
      become: false
      ansible.builtin.get_url:
        url: "{{ _grafana_agent_download_url }}"
        dest: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        mode: 0664
      register: _download_archive
      until: _download_archive is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false
      run_once: true

    - name: Extract grafana-agent.zip
      become: false
      ansible.builtin.unarchive:
        src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
        dest: "{{ grafana_agent_local_tmp_dir }}"
        remote_src: false
      delegate_to: localhost
      run_once: true
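If "run_once" is removed from the download step, the extract step needs the same change, otherwise only one architecture's archive gets unpacked. A sketch under the same assumption (a per-host "_grafana_agent_cpu_arch"); it relies on ansible.builtin.unarchive comparing the archive with what is already extracted, so a second host with the same architecture should report ok rather than unpack again. "throttle: 1" keeps two hosts from extracting the same archive at the same time.

```yaml
# Sketch only: per-host extraction on the controller, replacing run_once.
- name: Extract grafana-agent.zip
  become: false
  ansible.builtin.unarchive:
    # Each host unpacks the archive for its own architecture.
    src: "{{ grafana_agent_local_tmp_dir }}/grafana-agent_{{ _grafana_agent_cpu_arch }}_{{ grafana_agent_version }}.zip"
    dest: "{{ grafana_agent_local_tmp_dir }}"
    remote_src: false
  delegate_to: localhost
  # Serialize instead of run_once, so identical archives are not
  # extracted concurrently into the same directory.
  throttle: 1
```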

@davordbetter
Author

@ishanjainn I can't figure out why the same Docker image with the roles downloads both binaries on my PC, but only one in the GitLab pipeline (which is the correct behavior according to the role's run_once).

The only difference is that my PC is an M2 MacBook (running an emulated amd64 Docker image), while the GitLab runner runs on an amd64 Ubuntu Linux VM.

@gardar
Collaborator

gardar commented Apr 9, 2024

The issue is indeed that the task has "run_once".
It downloads the zip according to the facts of the first host; if that host has a different CPU architecture than the others, that causes the issue described.
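This is easy to confirm without the role. A minimal sketch, assuming an inventory that mixes amd64 and arm64 hosts: a "run_once: true" task is executed for a single host (the first one in the current batch), and its templates are rendered with that host's facts, so only one architecture ever reaches the task.

```yaml
---
# Hypothetical reproduction playbook; run it against an inventory that
# mixes amd64 (x86_64) and arm64 (aarch64) hosts.
- name: Show what a run_once task sees
  hosts: all
  gather_facts: true
  tasks:
    - name: Report the host and architecture used to render this task
      ansible.builtin.debug:
        msg: >-
          Rendered once, for {{ inventory_hostname }}
          ({{ ansible_facts['architecture'] }})
      run_once: true
```

The output should list a single host regardless of inventory size; in the role, that host's architecture is the only one whose archive is downloaded.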

Until this gets fixed, the simplest workaround is to separate the hosts by CPU architecture in the playbook that executes the role.

Something like this:

inventory/hosts

[amd64_hosts]
example.host.tld

[arm64_hosts]
arm.host.tld

playbook.grafana_agent.yml

---
- name: Grafana agent on amd64 hosts
  hosts: amd64_hosts
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on arm64 hosts
  hosts: arm64_hosts
  roles:
    - role: grafana.grafana.grafana_agent
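A variant of the same workaround that avoids maintaining the groups by hand is to build them at runtime with ansible.builtin.group_by. A sketch, assuming Linux hosts that report "x86_64" or "aarch64" in "ansible_facts['architecture']"; the "arch_*" group names are made up for this example.

```yaml
---
# Sketch only: build per-architecture groups at runtime instead of
# maintaining amd64_hosts / arm64_hosts in the inventory.
- name: Group hosts by CPU architecture
  hosts: all
  gather_facts: true
  tasks:
    - name: Add each host to a group named after its architecture
      ansible.builtin.group_by:
        key: "arch_{{ ansible_facts['architecture'] }}"

- name: Grafana agent on amd64 hosts
  hosts: arch_x86_64
  roles:
    - role: grafana.grafana.grafana_agent

- name: Grafana agent on arm64 hosts
  hosts: arch_aarch64
  roles:
    - role: grafana.grafana.grafana_agent
```

If one of the groups is empty, Ansible should simply skip that play because no hosts match.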

@voidquark
Collaborator

Based on the message in the Grafana Agent documentation:

Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.

I believe this can be closed, and migration to Alloy is required. @ishanjainn, what are your thoughts?

@davordbetter
Author

I need to reopen this again: it would be really nice to have this solved, and I don't think it should be a big issue to fix.
Migration to Alloy will take some time, and meanwhile we need to support existing environments.

Labels
None yet
Projects
None yet
5 participants