
replaying deadletter with replay udf and includePubsubMessage=true doesn't work #38

Open
mhite opened this issue Mar 3, 2023 · 2 comments
Labels: bug, documentation, good first issue



mhite commented Mar 3, 2023

The default configuration of the log export pipeline forces includePubsubMessage to true [1].

When this is combined with the sample Dataflow message-replay UDF JavaScript [2], messages replayed from the deadletter subscription are not handled correctly. As they re-enter the original/main log subscription, they are wrapped in the data key a second time, and you end up with strange logs like this in Splunk:

{
  "data": {
    "time": 1677812444044,
    "event": "{\"data\":{\"insertId\":\"sv1hmye6bz6p\",\"logName\":\"projects/redacted-project/logs/[cloudaudit.googleapis.com](http://cloudaudit.googleapis.com/)%2Fdata_access\",\"protoPayload\":{\"@type\":\"[type.googleapis.com/google.cloud.audit.AuditLog\](http://type.googleapis.com/google.cloud.audit.AuditLog%5C)",\"authenticationInfo\":{\"principalEmail\":\"[redacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com](mailto:redacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com)\",\"principalSubject\":\"[serviceAccount:redacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com](mailto:serviceAccount%3Aredacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com)\",\"serviceAccountKeyName\":\"//[iam.googleapis.com/projects/redacted-project/serviceAccounts/redacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com/keys/a49e3382174e3775dec6fd4e5b593dbfed8c1ecc\](http://iam.googleapis.com/projects/redacted-project/serviceAccounts/redacteduser-sfx-scraper@redacted-project.iam.gserviceaccount.com/keys/a49e3382174e3775dec6fd4e5b593dbfed8c1ecc%5C)"},\"authorizationInfo\":[{\"granted\":true,\"permission\":\"monitoring.timeSeries.list\",\"resource\":\"50701599922\",\"resourceAttributes\":{}}],\"methodName\":\"google.monitoring.v3.MetricService.ListTimeSeries\",\"request\":{\"@type\":\"[type.googleapis.com/google.monitoring.v3.ListTimeSeriesRequest\](http://type.googleapis.com/google.monitoring.v3.ListTimeSeriesRequest%5C)",\"filter\":\"metric.type = \\\"[kubernetes.io/node/cpu/allocatable_utilization\\\](http://kubernetes.io/node/cpu/allocatable_utilization%5C%5C%5C)"\",\"name\":\"projects/redacted-project\",\"pageSize\":10000},\"requestMetadata\":{\"callerIp\":\"44.230.82.104\",\"callerSuppliedUserAgent\":\"grpc-java-netty/1.51.1,gzip(gfe)\",\"destinationAttributes\":{},\"requestAttributes\":{\"auth\":{},\"time\":\"2023-03-03T03:00:44.050942344Z\"}},\"resourceName\":\"projects/redacted-project\",\"serviceName\":\"[monitoring.googleapis.com](http://monitoring.googleapis.com/)\"},\"receiveTimestamp\":\"2023-03-03T03:00:44.174642381Z\",\"resource\":{\"labels\":{\"method\":\"google.monitoring.v3.MetricService.ListTimeSeries\",\"project_id\":\"redacted-project\",\"service\":\"[monitoring.googleapis.com](http://monitoring.googleapis.com/)\"},\"type\":\"audited_resource\"},\"severity\":\"INFO\",\"timestamp\":\"2023-03-03T03:00:44.044614574Z\"},\"attributes\":{\"[logging.googleapis.com/timestamp\](http://logging.googleapis.com/timestamp%5C)":\"2023-03-03T03:00:44.044614574Z\"},\"messageId\":\"7058887762594632\",\"delivery_attempt\":1}"
  },
  "attributes": {
    "errorMessage": "Splunk write status code: 403",
    "timestamp": "2023-03-03 03:00:34.000000"
  },
  "messageId": "7058953036869866",
  "delivery_attempt": 1
}

[1] - https://github.com/GoogleCloudPlatform/terraform-splunk-log-export/blob/main/main.tf#L85
[2] - https://storage.googleapis.com/splk-public/js/dataflow_udf_messages_replay.js
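
To make the nesting concrete, here's a rough, illustrative reconstruction of how the shape above arises (values lifted from the sample; hypothetical code, not the pipeline's actual implementation):

```javascript
// Illustrative only (values taken from the sample above), not pipeline code.

// Pass 1, includePubsubMessage=true: the pipeline nests the log entry
// under "data" before writing it to Splunk as an HEC event.
var firstPass = {
  data: { insertId: "sv1hmye6bz6p" /* ...rest of the log entry... */ },
  attributes: { "logging.googleapis.com/timestamp": "2023-03-03T03:00:44.044614574Z" },
  messageId: "7058887762594632",
  delivery_attempt: 1
};

// Delivery fails (HTTP 403): the dead letter holds the failed HEC event,
// whose "event" field is the stringified first-pass payload.
var deadLetter = { time: 1677812444044, event: JSON.stringify(firstPass) };

// On replay, this body re-enters the main subscription, so the pipeline
// wraps it under "data" a second time -- the corrupt shape seen in Splunk.
var secondPass = {
  data: deadLetter,
  attributes: { errorMessage: "Splunk write status code: 403", timestamp: "2023-03-03 03:00:34.000000" },
  messageId: "7058953036869866",
  delivery_attempt: 1
};
console.log(JSON.stringify(secondPass, null, 2));
```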

@rarsan added the bug, documentation, and good first issue labels Mar 3, 2023

rarsan commented Mar 21, 2023

Thanks for filing this issue! Here's an updated sample replay UDF that supports both settings of includePubsubMessage (true and false). The UDF handles:

  • parsing the payload in both the original message (a bare log entry) and the failed message (a log entry nested in a Splunk HEC event), and
  • inferring whether the Pub/Sub message wrapper is included.

You'll need to copy the UDF into your own GCS bucket and set the dataflow_job_udf_gcs_path parameter accordingly.
Please give this a try. Once verified, we'll update the guide and the published UDF you linked above.
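
For reference, here's a minimal sketch of the unwrapping logic such a UDF might use, assuming the standard Dataflow JavaScript UDF process(inJson) signature; the actual updated UDF may differ:

```javascript
/**
 * Minimal sketch (an assumption, not the actual updated UDF): turn a
 * dead-lettered message back into the bare log entry, whether or not
 * the pipeline ran with includePubsubMessage=true.
 */
function process(inJson) {
  var obj = JSON.parse(inJson);

  // Failed messages are Splunk HEC events: the payload that failed
  // delivery sits in the "event" field, usually as a JSON string.
  if (obj.event) {
    obj = (typeof obj.event === 'string') ? JSON.parse(obj.event) : obj.event;
  }

  // Infer whether the Pub/Sub message wrapper is present
  // (includePubsubMessage=true): if so, unwrap it so the main pipeline
  // doesn't nest the entry under "data" a second time on replay.
  if (obj.data && typeof obj.data === 'object') {
    obj = obj.data;
  }

  return JSON.stringify(obj);
}
```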


mhite commented Mar 23, 2023

Here's what I tested:

  • Replaced the UDF on my existing Dataflow pipeline, which does not use the "data" wrapper. Noticed no interruption in logs.
  • Introduced an outage, i.e., disabled the token. Events were pushed to the deadletter queue.
  • Fixed the outage, i.e., re-enabled the token.
  • Deployed the replay Dataflow pipeline.
  • Logs with delivery_attempt=2 showed up in Splunk; backfill works.
  • Tore down the replay Dataflow pipeline.
  • Switched includePubsubMessage to true and redeployed the Dataflow pipeline.
  • Verified new messages with the "data" wrapper now show up; delivery_attempt appears as a field under the data wrapper.
  • Introduced an outage, i.e., disabled the token. Events were pushed to the deadletter queue.
  • Fixed the outage, i.e., re-enabled the token.
  • Deployed the replay Dataflow pipeline.
  • Logs with data.delivery_attempt=2 showed up in Splunk; backfill works.
  • Tore down the replay Dataflow pipeline.
  • Reverted includePubsubMessage back to false.

So just from that set of smoke tests, it looks like you have addressed the data corruption issue.
