Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dotnet] Avoid exceptions in cdp logging on file upload #14972

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

RenderMichael
Copy link
Contributor

@RenderMichael RenderMichael commented Dec 28, 2024

User description

Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

This workaround avoids throwing an exception when cdp is listening to a file upload.

Motivation and Context

(Partial?) solution for #14903

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Bug fix


Description

  • Implements a fix to prevent sensitive data exposure in CDP logging during file upload operations
  • When handling CDP events containing post data (typically file uploads), the raw post data is now replaced with a placeholder message
  • Adds conditional logic to detect and handle request nodes containing post data
  • Maintains existing logging functionality for non-file upload events

Changes walkthrough 📝

Relevant files
Error handling
DevToolsSession.cs
Sanitize CDP logging for file upload post data                     

dotnet/src/webdriver/DevTools/DevToolsSession.cs

  • Added handling for sensitive post data in CDP logging during file
    uploads
  • Implemented data sanitization by replacing raw post data with a
    placeholder message
  • Added conditional logic to handle request nodes with post data
    differently
  • +13/-1   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Typo

    The word "Recieved" is misspelled in the log message. Should be "Received"

    LogTrace("Recieved Event {0}: {1}", method, loggableEventData.ToString());
    Error Handling

    The code assumes postData and request properties exist and can be accessed without null checks, which could lead to NullReferenceException

    var loggableRequest = loggableEventData["request"]!;
    loggableRequest["postData"] = "*RAW POST DATA REMOVED FROM LOGS*";
    loggableRequest["postDataEntries"] = new JsonArray();

    Copy link
    Contributor

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add defensive programming by validating JSON data before access to prevent runtime exceptions

    Add null check for eventData before accessing it to prevent potential
    NullReferenceException

    dotnet/src/webdriver/DevTools/DevToolsSession.cs [603]

    -if (eventData.AsObject().TryGetPropertyValue("request", out var requestNode)
    +if (eventData != null && eventData.AsObject().TryGetPropertyValue("request", out var requestNode)
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: This is an important defensive programming suggestion that could prevent runtime crashes. The null check is crucial when dealing with JSON data that might be malformed or incomplete.

    8
    General
    Correct spelling error in log message to maintain code quality and professionalism

    Fix the typo in the log message where "Received" is misspelled as "Recieved"

    dotnet/src/webdriver/DevTools/DevToolsSession.cs [610]

    -LogTrace("Recieved Event {0}: {1}", method, loggableEventData.ToString());
    +LogTrace("Received Event {0}: {1}", method, loggableEventData.ToString());
    • Apply this suggestion
    Suggestion importance[1-10]: 3

    Why: While this is a valid correction of a spelling mistake ("Recieved" to "Received"), it's a minor issue that only affects log messages and doesn't impact functionality.

    3

    else
    {
    LogTrace("Recieved Event {0}: {1}", method, eventData.ToString());
    }
    Copy link
    Contributor Author

    @RenderMichael RenderMichael Dec 28, 2024

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Fundamentally, this error happens because the string resembles JSON, but is in fact invalid JSON
    image

    @@ -600,7 +600,19 @@ private void ProcessMessage(string message)
    var methodParts = method.Split(new char[] { '.' }, 2);
    var eventData = messageObject["params"];

    LogTrace("Recieved Event {0}: {1}", method, eventData.ToString());
    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    This is not a complete solution, the JSON exception is simply thrown later:
    image

    The above is in generated cdp code (\cdp\v131\Fetch\FetchAdapter.cs in the above screenshot).

    Related relevant code:
    image

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    @nvborisenko You mentioned trying out JavaScriptEncoder.UnsafeRelaxedJsonEscaping. I cannot edit the generated code directly but I added a spoof here, and it did not fix the exception
    image

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I modified generator to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping, it didn't help. The worst scenario is that it works with Newtonsoft.Json (Selenium v4.23?).

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I do not understand how Newtonsoft accepts this nonsense input. It’s not real JSON, just some raw binary data.

    Maybe we should intersept the post data and sanitize it ourselves?

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    One more observation: in json response I see binary data as is, probably not even encoded! This is why VS Code shows json improperly. CDP specification doesn't say it should be encoded (base64). I think this is why they deprecated it.

    PS: But why Newtonsoft.Json could parse it successfully? I think it is issue in this library. I didn't check it yet, but I guess:

    1 using old selenium version (v4.23?) when we upload binary file
    2 and we download it manually back
    3 then the content of "original" and "downloaded" files is different

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I tried switching from JsonNode to JsonElement just to see what would happen.

    Good news, no exceptions on ToString() like the logging has, but the de-serialization still fails
    image

    Maybe we should parse and encode the postData ourselves?

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I am stupid for this activity. It is easier to:

    • really Newtonsoft.Json parses it correctly?

    Copy link
    Contributor Author

    @RenderMichael RenderMichael Dec 30, 2024

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I narrowed down the issue. It looks like this character U+DB40 appears here for some reason. It is the first part of a UTF-16 surrogate pair, but the second half is missing.
    image

    I opened the raw Firefox Installer.exe file and found this segment here:
    image

    Note how in this case, it is just a regular I.

    What is happening here? The CDP is giving us back broken JSON, this may be a bug on their end.

    I will investigate whether it is possible to work around this somehow.

    Copy link
    Contributor Author

    @RenderMichael RenderMichael Dec 30, 2024

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I found a workaround, with a custom converter.

    We can add to the postData property:

    image

    As well as to PostDataEntry.Bytes:
    image

    The following converter:

    internal sealed class InvalidUtf16Converter : JsonConverter<string>
    {
        public override bool HandleNull => true;
    
        public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        {
            try
            {
                return reader.GetString();
            }
            catch (InvalidOperationException)
            {
                var bytes = reader.ValueSpan;
                var sb = new StringBuilder(bytes.Length);
                foreach (var b in bytes)
                    sb.Append(Convert.ToChar(b));
                return sb.ToString();
            }
        }
    
        public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options) =>
            writer.WriteStringValue(value);
    }

    Since we have a good idea of which models may have arbitrary data,

    However... it only works with JsonElement.Deserialize<T>. When using a JsonNode.Deserialize<T>, the exception happens way before even the converter is reached. Even JsonNode.ToString() throws an exception!

    For this reason, we may need to change various JsonNodes to JsonElements, such as DevToolsEventReceivedEventArgs.EventData (that property is used everywhere in the cdp-generated code, but luckily the line e.EventData.Deserialize(eventData.EventArgsType) works exactly the same (it just hits a different overload). Also in DevToolsCommandData.Result

    I am not familiar enough with the cdp code generation process to make generator changes. Is there any docs for this?

    Copy link
    Member

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Most likely you are interested in https://github.com/SeleniumHQ/selenium/tree/trunk/third_party/dotnet/devtools/src/generator/Templates

    There is no docs, generally saying build process invokes generation of CDP classes based on *.cdl (I don't remember exactly, it is a definition of contract) files applying *.hbs templates (https://handlebarsjs.com/).

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants