
Webhook for deployments #562

Open · jayantbh opened this issue Oct 1, 2024 · 9 comments

Labels: advanced, enhancement

jayantbh (Contributor) commented Oct 1, 2024

Why do we need this?

Currently, PR merges are assumed to be deployments for a repo, which is fair for any repo that runs some kind of CI.

But for the many repos that don't, we should at least support a webhook-based mechanism that allows me to feed my deployment/workflow run data into Middleware.

That will give me a better picture of my DORA metrics, with more accurate Lead Time and Deployment Frequency.

@jayantbh added the enhancement, hacktoberfest, and advanced labels on Oct 1, 2024
Kamlesh72 (Contributor) commented:

@jayantbh working on this.

jayantbh (Contributor, Author) commented Oct 2, 2024

Sure. Do share your approach before you begin implementation.

jayantbh (Contributor, Author) commented Oct 3, 2024

Important

This issue is tagged advanced. By taking it up, you acknowledge that this will be a non-trivial change and may require thorough testing and review.
Of course, this also means we offer swag to anyone who goes out of their way to tackle issues tagged advanced. 🚀
It also means we'll follow up on this regularly, and in case of inactivity the issue will be unassigned.

Kamlesh72 (Contributor) commented:

@jayantbh Currently we take PR merges or workflows (like GitHub Actions) as deployments, correct?

I am thinking of creating a route that collects workflow/deployment webhook data.
The captured data will be mapped and pushed into RepoWorkflowRuns.
There would be a separate adapter for each provider: Bitbucket, CircleCI, GitLab, etc.

This is the basic idea, although more brainstorming is needed.

jayantbh (Contributor, Author) commented Oct 4, 2024

This should ideally happen on the Python backend (the apiserver dir). But yes, you have the right idea. I'll let @adnanhashmi09 explain further.

adnanhashmi09 (Contributor) commented:

Keep the following in mind while implementing the webhook:

  1. Use authorization headers or custom headers for authenticating the workflow user. We should create a mechanism for users to create and update API keys. This would also include UI development efforts.

  2. The webhook should never cause the workflow to fail or take an excessively long time. It should return a status of 200 in all cases. In case of an error, the response body should contain the error message and possible ways to fix it.

  3. We need a mechanism to map these workflows to repositories linked with Middleware. Therefore, the webhook should also receive repository data for each workflow run.

  4. The processing of data should be asynchronous and not block the API response. The API request should resolve almost immediately after the request has been sent.

  5. The data should be processed in chunks, and the end user should send data in chunks, i.e., no more than 500 workflow runs' worth of data in a single call. The webhook should be able to sync large amounts of data as well as a single workflow run. Users can call this webhook at the start and end of their workflow, and we can infer the duration of the workflow run from that. Another case is a user sending a batch of their older workflow runs for us to process.

  6. A simple validation of the received data should be performed on upload. If required fields are missing, we should return an error body with a status code of 200. We don't keep erroneous data.

  7. We would also need an API to prune the data synced if someone uploaded incorrect data and wanted to delete it.

  8. An API to revoke/generate API tokens is necessary.

  9. A frontend page to manage API tokens should be developed.

  10. Implement alerting/notification in case of erroneous data.

  11. A data dump of the request type, request body, response, and error should be saved in case of an error. The data received from the end user can be saved here and later picked up for processing, so this table can serve multiple purposes.

  12. We need an event-based system to process workflow runs asynchronously without blocking the main thread. Whenever someone sends a request to our webhook, we register an "event" which is picked up by a listener. When that event is invoked, the listener queries the database for the latest data to process and starts processing. (A sketch of this flow follows at the end of this comment.)

  13. The request body can be as follows:

{
    "workflow_runs":[
        {          
            "workflow_name":"custom_workflow",
            "repo_names":["middleware"],
            "event_actor":"adnanhashmi09",
            "head_branch":"master",
            "workflow_run_unique_id":"unique_item",
            "status":"SUCCESS",
            "duration":"200", // can be provided, or we can infer this
            "workflow_run_conducted_at":"2024-09-28T20:35:45.123456+00:00"
        }
    ]
}

Read through the workflow sync once to check all the fields required for creating a RepoWorkflowRun.

  14. A RepoWorkflow shall be created based on workflow_name and repo_names if not already present. This shall also be part of validation, i.e., if a RepoWorkflow cannot be created because the repo_names are wrong or not linked to Middleware, that shall result in an error.

So there are a lot of moving parts in this implementation, and it requires a thorough understanding of our system. Please read through the sync and document your approach here before starting to implement. This is a rather comprehensive task and will take longer to implement.
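
To make points 2, 4, and 12 concrete, here is a minimal sketch of what a handler could look like, assuming a Flask-style endpoint on the apiserver; the route path and the validate_api_key / save_webhook_event / enqueue_event helpers are hypothetical stand-ins for the real persistence and queue layers, not existing code.

import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

def validate_api_key(key):  # stand-in: look the key up in the API keys table
    return bool(key)

def save_webhook_event(request_type, request_data):  # stand-in: insert into the data-dump table
    return str(uuid.uuid4())

def enqueue_event(event_id):  # stand-in: register the event for the background listener
    pass

@app.route("/public/webhook/workflows", methods=["POST"])
def receive_workflow_runs():
    # Point 1: authenticate via a custom header.
    if not validate_api_key(request.headers.get("X-Secret-Key")):
        # Point 2: always return HTTP 200, with the error in the body.
        return jsonify({"success": False, "error": "invalid or missing API key"}), 200

    payload = request.get_json(silent=True) or {}
    runs = payload.get("workflow_runs", [])
    if not runs or len(runs) > 500:
        # Point 5: no more than 500 workflow runs per call.
        return jsonify({"success": False, "error": "send 1-500 workflow_runs per request"}), 200

    # Points 4 and 12: persist the raw request, register an event for the
    # background listener, and return immediately without processing.
    event_id = save_webhook_event("WORKFLOW", payload)
    enqueue_event(event_id)
    return jsonify({"success": True, "event_id": event_id}), 200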

Kamlesh72 (Contributor) commented:

@adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will differ for each source, so will we need an adapter to process the data, or will the user be sending a structured response?

We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

Can you please elaborate on point 11?

adnanhashmi09 (Contributor) commented Oct 10, 2024

> @adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will differ for each source, so will we need an adapter to process the data, or will the user be sending a structured response?

This webhook implementation is platform agnostic. We don't care about the workflow provider, because the provider is not responsible for sending data; the user who integrates our webhook into their workflow is responsible for sending the correct data. We will define the set of fields we require in the request body for registering a RepoWorkflow and RepoWorkflowRuns, and it is up to the end user to make sure correct values are sent.

> We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing data synchronously?

Well, we can check for a few errors besides API key errors: for instance, the maximum amount of data allowed in one request, or whether the repo_names sent are linked with Middleware. These checks are fairly quick to compute.

> We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

I don't think anybody would get GitHub Actions data from both the integration and the webhook, but yes, in practice we keep both. We don't give the option to prune GitHub Actions data, since users can always unlink that integration.

> Can you please elaborate on point 11?

We can save the entire request, including the data we receive for processing, in a database table. This way we can check for errors and show alerts to the user by reading from that table. It can also serve as a data dump for checking what data our system has received for processing.

Kamlesh72 (Contributor) commented Oct 23, 2024

API KEYS

  • Users will be able to create, read, and delete API keys.
// APIKeys table schema in Postgres
API_KEYS {
    name: string,
    secret_key: string,
    expiry_at: time,
    is_deleted: boolean, // used for expired or deleted keys
    scope: string[], // [ WORKFLOW, INCIDENT ]
    org_id: string
}
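
As a sketch of how issuance and verification could work against this schema (storing a SHA-256 digest of the token in secret_key, rather than the raw token, is an assumption here, not a settled design):

import hashlib
import secrets

def generate_api_key():
    token = secrets.token_urlsafe(32)  # shown to the user exactly once
    digest = hashlib.sha256(token.encode()).hexdigest()  # what goes into secret_key
    return token, digest

def verify_api_key(presented_token, stored_digest):
    candidate = hashlib.sha256(presented_token.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)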

Receiving Webhook Data

// POST /public/webhook/deployments
// Headers: "X-Secret-Key": "secret_key"
{
    workflow_runs: [{
        workflow_name: string,
        repo_name: string,
        event_actor: string,
        head_branch: string,
        workflow_run_id: string,
        status: string, // Success, Failure, Pending, Canceled
        duration: number,
        workflow_run_conducted_at: string (ISO 8601 format),
        html_url: string
    }]
}
// POST /public/webhook/incidents
// Headers: "X-Secret-Key": "secret_key"
{
    incidents: [{
        incident_key: string,
        provider: string, // zenduty, pagerduty etc
        status: string, // Triggered, Acknowledged, Resolved
        title: string,
        creation_date: string,
        acknowledged_date: string,
        resolved_date: string,
        assignees: string[],
        url: string,
        incident_type: string, // Incident or Alert
        provider_service_name: string,
        source_type: string, // ??
    }]
}
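
For reference, a call to the deployments webhook from a CI step might look like this sketch (the host and secret are placeholders; the path, header, and fields come from the schema above):

import requests

payload = {
    "workflow_runs": [{
        "workflow_name": "deploy-prod",
        "repo_name": "middleware",
        "event_actor": "kamlesh72",
        "head_branch": "master",
        "workflow_run_id": "run-1234",
        "status": "Success",
        "duration": 200,
        "workflow_run_conducted_at": "2024-09-28T20:35:45.123456+00:00",
        "html_url": "https://example.com/runs/1234",
    }]
}
resp = requests.post(
    "https://middleware.example.com/public/webhook/deployments",
    json=payload,
    headers={"X-Secret-Key": "<secret_key>"},
    timeout=10,
)
print(resp.json())  # always HTTP 200; errors are reported in the body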

Pre-Processing Validation

  • Verify API key
  • Verify size of data
  • Verify required fields
  • Verify that the repo exists in Middleware
    If there is an error, send 200 with the error message and notify the user about the erroneous data via email/Slack. (The notification module can be developed separately and integrated later.)
    If there is no error, store the data in WebhookEvents, enqueue it, and send a 200 response. (A sketch of these checks follows below.)
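
A sketch of the field/repo checks, assuming the deployments payload above; it returns an error string (or None) so the caller can wrap it in a 200 response:

REQUIRED_RUN_FIELDS = {
    "workflow_name", "repo_name", "event_actor", "head_branch",
    "workflow_run_id", "status", "workflow_run_conducted_at",
}

def validate_workflow_runs(payload, linked_repos):
    runs = payload.get("workflow_runs")
    if not runs:
        return "workflow_runs is missing or empty"
    if len(runs) > 500:
        return "at most 500 workflow runs per request"
    for i, run in enumerate(runs):
        missing = REQUIRED_RUN_FIELDS - run.keys()
        if missing:
            return f"run {i}: missing fields {sorted(missing)}"
        if run["repo_name"] not in linked_repos:
            return f"run {i}: repo '{run['repo_name']}' is not linked to Middleware"
    return None  # valid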

Store the data for processing

  1. Store the data in the Postgres table WebhookEvents (which acts as the data-dump table).
WebhookEvents {
    request_type: string, // DEPLOYMENT or INCIDENT
    request_data: any, // the raw dump data
    status: string, // Waiting, Running, Skipped, Success, Failure
    error: string,
    created_in_db_at: time,
    processed_at: time,
    retries: number
}
  2. Call Celery/RQ to process the data asynchronously; the broker will be Redis. (See the task sketch after this list.)
  3. If there is any error, WebhookEvents will be updated accordingly with the error.
  4. If there is no error, update WebhookEvents and store the data in RepoWorkflow and RepoWorkflowRuns.
  • Note: If the server goes down, some jobs may not be executed, so their status in the database will remain Waiting or Running. On server restart, we can mark these as Skipped.
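
A sketch of step 2 with Celery and a Redis broker; the three helpers are hypothetical stand-ins for the WebhookEvents reads/writes and the RepoWorkflow/RepoWorkflowRuns upsert:

from celery import Celery

celery_app = Celery("webhook_events", broker="redis://localhost:6379/0")

def load_event(event_id):  # stand-in: SELECT request_data FROM WebhookEvents
    return {"request_data": {"workflow_runs": []}}

def mark_event(event_id, status, error=None):  # stand-in: UPDATE WebhookEvents
    pass

def upsert_repo_workflow_run(run):  # stand-in: write RepoWorkflow / RepoWorkflowRuns
    pass

@celery_app.task(bind=True, max_retries=3)
def process_webhook_event(self, event_id):
    mark_event(event_id, status="Running")
    try:
        event = load_event(event_id)
        for run in event["request_data"]["workflow_runs"]:
            upsert_repo_workflow_run(run)
        mark_event(event_id, status="Success")
    except Exception as exc:
        # Record the failure on the event row, then let Celery retry.
        mark_event(event_id, status="Failure", error=str(exc))
        raise self.retry(exc=exc, countdown=30)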

Prune the synced data

  • For received webhook data, the user can request deletion of the synced data.
  • WebhookEvents.request_data can be used to prune the synced data via the queue.
  • If we go this way, I think we will also need to store the ids of the synced rows.

UI

There will be two pages: Webhooks and API Keys.
The Webhooks page will show logs for received deployments and incidents, and the corresponding queue status can be seen there.

[Mockup: Webhook and API Keys page UI]
