
Webhook for deployments #562

Open · jayantbh opened this issue Oct 1, 2024 · 9 comments

Labels: advanced, enhancement

jayantbh (Contributor) commented Oct 1, 2024

Why do we need this?

Currently, PR merges are assumed to be deployments for a repo, which is fair for any repo that runs some kind of CI.

But for the many repos that don't, we should at least support a webhook-based mechanism that allows me to feed my deployment/workflow run data into Middleware.

That will give me a better picture of my DORA metrics, with more accurate Lead Time and Deployment Frequency.

@jayantbh added the enhancement, hacktoberfest, and advanced labels on Oct 1, 2024
Kamlesh72 (Contributor) commented:

@jayantbh working on this.

jayantbh (Contributor, Author) commented Oct 2, 2024

Sure. Do share your approach before you begin implementation.

jayantbh (Contributor, Author) commented Oct 3, 2024

Important

This issue is tagged advanced. By taking it up, you acknowledge that this will be a non-trivial change and may require thorough testing and review.
Of course, this also means we offer swag to anyone who goes out of their way to tackle issues tagged advanced. 🚀
It also means we'll follow up on this regularly, and in case of inactivity the issue will be unassigned.

Kamlesh72 (Contributor) commented:

@jayantbh Currently we take PR merges or workflows (like GitHub Actions) as deployments, correct?

I am thinking of creating a route that collects workflow/deployment webhook data.
The captured data will be mapped and pushed into RepoWorkflowRuns.
There would be a separate adapter for each provider: Bitbucket, CircleCI, GitLab, etc.

This is the basic idea, although more brainstorming is needed.

jayantbh (Contributor, Author) commented Oct 4, 2024

This should ideally happen on the Python backend (the apiserver dir). But yes, you have the right idea. I'll let @adnanhashmi09 explain further.

adnanhashmi09 (Contributor) commented:

Keep the following in mind while implementing the webhook:

  1. Use authorization headers or custom headers for authenticating the workflow user. We should create a mechanism for users to create and update API keys. This would also include UI development efforts.

  2. The webhook should never cause the workflow to fail or take an excessively long time. It should return a status of 200 in all cases. In case of an error, the response body should contain the error message and possible ways to fix it.

  3. We need a mechanism to map these workflows to repositories linked with Middleware. Therefore, the webhook should also receive repository data for each workflow run.

  4. The processing of data should be asynchronous and not block the API response. The API request should resolve almost immediately after the request has been sent.

  5. The data should be processed in chunks, and the end user should send data in chunks, i.e., no more than 500 workflow runs' worth of data in a single call. The webhook should be able to sync large amounts of data as well as a single workflow run. Users can call this webhook at the start and end of their workflow, and we can infer the duration of the workflow run from that. Another case is a user sending a batch of their older workflow runs for us to process.

  6. A simple validation of the received data should be performed on upload. If required fields are missing, we should return an error body with a status code of 200. We don't keep erroneous data.

  7. We would also need an API to prune the data synced if someone uploaded incorrect data and wanted to delete it.

  8. An API to revoke/generate API tokens is necessary.

  9. A frontend page to manage API tokens should be developed.

  10. Implement alerting/notification in case of erroneous data.

  11. A data dump of the request type, request body, response, and error should be saved in case of an error. The data received from the end user can be saved here and later picked up for processing, so this table can serve multiple purposes.

  12. We need an event-based system to process workflow runs asynchronously without blocking the main thread. Whenever someone sends a request to our webhook, we register an "event" which is picked up by a listener. When that event is invoked, the listener queries the database for the latest data to process and starts processing. (A sketch of this flow follows at the end of this comment.)

  13. The request body can be as follows:

{
    "workflow_runs":[
        {          
            "workflow_name":"custom_workflow",
            "repo_names":["middleware"],
            "event_actor":"adnanhashmi09",
            "head_branch":"master",
            "workflow_run_unique_id":"unique_item",
            "status":"SUCCESS",
            "duration":"200", // can be provided, or we can infer this
            "workflow_run_conducted_at":"2024-09-28T20:35:45.123456+00:00"
        }
    ]
}

Read through the workflow sync once to check all the fields required for creating a RepoWorkflowRun.

  14. A RepoWorkflow shall be created based on workflow_name and repo_names if not already present. This shall also be part of validation, i.e., if a RepoWorkflow cannot be created because the repo_names are wrong or not linked to Middleware, that shall result in an error.

So there are a lot of moving parts in this implementation, and it requires a thorough understanding of our system. Please read through the sync and document your approach here before starting to implement. This is a rather comprehensive task and will take longer to implement.
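
To make points 2, 4, and 12 concrete, here is a minimal sketch of what a handler could look like, assuming a Flask-style endpoint on the apiserver; the route path and the validate_api_key / save_webhook_event / enqueue_event helpers are hypothetical stand-ins for the real persistence and queue layers, not existing code.

import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

def validate_api_key(key):  # stand-in: look the key up in the API keys table
    return bool(key)

def save_webhook_event(request_type, request_data):  # stand-in: insert into the data-dump table
    return str(uuid.uuid4())

def enqueue_event(event_id):  # stand-in: register the event for the background listener
    pass

@app.route("/public/webhook/workflows", methods=["POST"])
def receive_workflow_runs():
    # Point 1: authenticate via a custom header.
    if not validate_api_key(request.headers.get("X-Secret-Key")):
        # Point 2: always return HTTP 200, with the error in the body.
        return jsonify({"success": False, "error": "invalid or missing API key"}), 200

    payload = request.get_json(silent=True) or {}
    runs = payload.get("workflow_runs", [])
    if not runs or len(runs) > 500:
        # Point 5: no more than 500 workflow runs per call.
        return jsonify({"success": False, "error": "send 1-500 workflow_runs per request"}), 200

    # Points 4 and 12: persist the raw request, register an event for the
    # background listener, and return immediately without processing.
    event_id = save_webhook_event("WORKFLOW", payload)
    enqueue_event(event_id)
    return jsonify({"success": True, "event_id": event_id}), 200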

Kamlesh72 (Contributor) commented:

@adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will differ for each source, so will we need an adapter to process the data, or will the user be sending a structured response?

We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

Can you please elaborate on point 11?

adnanhashmi09 (Contributor) commented Oct 10, 2024

> @adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will differ for each source, so will we need an adapter to process the data, or will the user be sending a structured response?

This webhook implementation is platform agnostic. We don't care about the workflow provider, because the provider is not responsible for sending data; the user who integrates our webhook into their workflow is responsible for sending the correct data. We will define the set of fields we require in the request body for registering a RepoWorkflow and RepoWorkflowRuns, and it is up to the end user to make sure correct values are sent.

> We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

Well, we can check for a few errors besides API key errors: for instance, the maximum amount of data allowed in one request, or whether the repo_names sent are linked with Middleware. These checks are fairly quick to compute.

> We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

I don't think anybody would get GitHub Actions data from both the integration and the webhook, but yes, in practice we keep both. We don't give the option to prune GitHub Actions data, since users can always unlink that integration.

> Can you please elaborate on point 11?

We can save the entire request, including the data we receive for processing, in a database table. This way we can check for errors and show alerts to the user by reading from that table. It can also serve as a data dump for checking what data our system has received for processing.

Kamlesh72 (Contributor) commented Oct 23, 2024

API KEYS

  • Users will be able to create, read, and delete API keys.
// APIKeys table schema in Postgres
API_KEYS {
    name: string,
    secret_key: string,
    expiry_at: time,
    is_deleted: boolean, // used for expired or deleted keys
    scope: string[], // [ WORKFLOW, INCIDENT ]
    org_id: string
}
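
As a sketch of how issuance and verification could work against this schema (storing a SHA-256 digest of the token in secret_key, rather than the raw token, is an assumption here, not a settled design):

import hashlib
import secrets

def generate_api_key():
    token = secrets.token_urlsafe(32)  # shown to the user exactly once
    digest = hashlib.sha256(token.encode()).hexdigest()  # what goes into secret_key
    return token, digest

def verify_api_key(presented_token, stored_digest):
    candidate = hashlib.sha256(presented_token.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)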

Receiving Webhook Data

// POST /public/webhook/deployments
// Headers: "X-Secret-Key": "secret_key"
{
    workflow_runs: [{
        workflow_name: string,
        repo_name: string,
        event_actor: string,
        head_branch: string,
        workflow_run_id: string,
        status: string, // Success, Failure, Pending, Canceled
        duration: number,
        workflow_run_conducted_at: string (ISO 8601 format),
        html_url: string
    }]
}
// POST /public/webhook/incidents
// Headers: "X-Secret-Key": "secret_key"
{
    incidents: [{
        incident_key: string,
        provider: string, // zenduty, pagerduty etc
        status: string, // Triggered, Acknowledged, Resolved
        title: string,
        creation_date: string,
        acknowledged_date: string,
        resolved_date: string,
        assignees: string[],
        url: string,
        incident_type: string, // Incident or Alert
        provider_service_name: string,
        source_type: string, // ??
    }]
}
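
For reference, a call to the deployments webhook from a CI step might look like this sketch (the host and secret are placeholders; the path, header, and fields come from the schema above):

import requests

payload = {
    "workflow_runs": [{
        "workflow_name": "deploy-prod",
        "repo_name": "middleware",
        "event_actor": "kamlesh72",
        "head_branch": "master",
        "workflow_run_id": "run-1234",
        "status": "Success",
        "duration": 200,
        "workflow_run_conducted_at": "2024-09-28T20:35:45.123456+00:00",
        "html_url": "https://example.com/runs/1234",
    }]
}
resp = requests.post(
    "https://middleware.example.com/public/webhook/deployments",
    json=payload,
    headers={"X-Secret-Key": "<secret_key>"},
    timeout=10,
)
print(resp.json())  # always HTTP 200; errors are reported in the body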

Pre-Processing Validation

  • Verify API key
  • Verify size of data
  • Verify required fields
  • Verify that the repo exists in Middleware
    If there is an error, send 200 with the error message and notify the user about the erroneous data via email/Slack. (The notification module can be developed separately and integrated later.)
    If there is no error, store the data in WebhookEvents, enqueue it, and send a 200 response. (A sketch of these checks follows below.)
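
A sketch of the field/repo checks, assuming the deployments payload above; it returns an error string (or None) so the caller can wrap it in a 200 response:

REQUIRED_RUN_FIELDS = {
    "workflow_name", "repo_name", "event_actor", "head_branch",
    "workflow_run_id", "status", "workflow_run_conducted_at",
}

def validate_workflow_runs(payload, linked_repos):
    runs = payload.get("workflow_runs")
    if not runs:
        return "workflow_runs is missing or empty"
    if len(runs) > 500:
        return "at most 500 workflow runs per request"
    for i, run in enumerate(runs):
        missing = REQUIRED_RUN_FIELDS - run.keys()
        if missing:
            return f"run {i}: missing fields {sorted(missing)}"
        if run["repo_name"] not in linked_repos:
            return f"run {i}: repo '{run['repo_name']}' is not linked to Middleware"
    return None  # valid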

Store the data for processing

  1. Store the data in the Postgres table WebhookEvents (which acts as the data-dump table).
WebhookEvents {
    request_type: string, // DEPLOYMENT or INCIDENT
    request_data: any, // the raw dump data
    status: string, // Waiting, Running, Skipped, Success, Failure
    error: string,
    created_in_db_at: time,
    processed_at: time,
    retries: number
}
  2. Call Celery/RQ to process the data asynchronously; the broker will be Redis. (See the task sketch after this list.)
  3. If there is any error, WebhookEvents will be updated accordingly with the error.
  4. If there is no error, update WebhookEvents and store the data in RepoWorkflow and RepoWorkflowRuns.
  • Note: If the server goes down, some jobs may not be executed, so their status in the database will remain Waiting or Running. On server restart, we can mark these as Skipped.
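
A sketch of step 2 with Celery and a Redis broker; the three helpers are hypothetical stand-ins for the WebhookEvents reads/writes and the RepoWorkflow/RepoWorkflowRuns upsert:

from celery import Celery

celery_app = Celery("webhook_events", broker="redis://localhost:6379/0")

def load_event(event_id):  # stand-in: SELECT request_data FROM WebhookEvents
    return {"request_data": {"workflow_runs": []}}

def mark_event(event_id, status, error=None):  # stand-in: UPDATE WebhookEvents
    pass

def upsert_repo_workflow_run(run):  # stand-in: write RepoWorkflow / RepoWorkflowRuns
    pass

@celery_app.task(bind=True, max_retries=3)
def process_webhook_event(self, event_id):
    mark_event(event_id, status="Running")
    try:
        event = load_event(event_id)
        for run in event["request_data"]["workflow_runs"]:
            upsert_repo_workflow_run(run)
        mark_event(event_id, status="Success")
    except Exception as exc:
        # Record the failure on the event row, then let Celery retry.
        mark_event(event_id, status="Failure", error=str(exc))
        raise self.retry(exc=exc, countdown=30)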

Prune the synced data

  • For received webhook data, the user can request deletion of the synced data.
  • WebhookEvents.request_data can be used to prune the synced data via the queue.
  • If we go this way, I think we will also need to store the ids of the synced rows.

UI

There will be two pages: Webhooks and API Keys.
The Webhooks page will show logs for received deployments and incidents, and the corresponding queue status can be seen there.

[Mockup: Webhook and API Keys page UI]
