Fix cdc_* patcher.py (#986, #882, #980, #981, #983, #1008 & more) #1010

Open: wants to merge 5 commits into master


@lukect commented Jan 27, 2023

This PR fixes every recent detection issue. (#986)

To use my fork right now (maintainer seems to temporarily be inactive):

requirements.txt

git+https://github.com/lukect/undetected-chromedriver.git
selenium~=4.8.0

pip install -r requirements.txt

pip

pip install git+https://github.com/lukect/undetected-chromedriver.git selenium~=4.8.0

@fabifont

fabifont commented Jan 27, 2023

@lukect nice update. I didn't check the ChromeDriver source, but are you sure that the cdc string will remain the same and won't change over time? It seems too hardcoded, and according to this reply something is missing.

@fabifont

fabifont commented Jan 27, 2023

Anyway I don't understand why you don't remove all cdc_ occurrences with:

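import random, string

replacement = "".join(random.choices(string.ascii_lowercase, k=3)).encode() + b"_"  # same length as b"cdc_"
with open(driver_path, "r+b") as f:
    data = f.read()
    f.seek(0)
    f.write(data.replace(b"cdc_", replacement))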

@lukect
Author

lukect commented Jan 28, 2023

@lukect nice update. I didn't check the ChromeDriver source, but are you sure that the cdc string will remain the same and won't change over time? It seems too hardcoded, and according to this reply something is missing.

My comment on my replacement map shows that window.cdc_adoQpoasnfa76pfcZLmcfl_* props are hard-coded into ChromeDriver.
The random window./^[a-z]{3}_[a-z]{22}_.*/i props were only a feature of undetected-chromedriver, meant to keep the replacement patch as simple as possible. However, the antibots/fingerprinters read the source code here and wrote code to detect these randomly generated window./^[a-z]{3}_[a-z]{22}_.*/i props, so undetected-chromedriver had to add a JavaScript patcher to remove them. This actually made the patching done in patcher.py of undetected-chromedriver<=3.2.1 completely redundant.
This behavior of adding random props matching the pattern is what allowed fingerprint.com to precisely fingerprint undetected-chromedriver here: #977
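One detail worth making concrete: the stock ChromeDriver name contains digits (`adoQpoasnfa76pfcZLmcfl`), so the letter-only regex quoted above matches the randomized letter-only names but not the stock one. A minimal Python sketch of that difference, using `re.IGNORECASE` in place of the JavaScript `/i` flag and dropping the trailing `.*` (since `match` only needs the prefix); the randomized name and the `matching` helper are made up for illustration:

```python
import re

# Letter-only pattern from above; re.IGNORECASE stands in for the JavaScript /i flag
LETTERS_ONLY = re.compile(r"^[a-z]{3}_[a-z]{22}_", re.IGNORECASE)
# Alphanumeric variant, like the one in the temporary fix script later in this thread
ALPHANUMERIC = re.compile(r"^[a-z]{3}_[a-z0-9]{22}_", re.IGNORECASE)

STOCK_PROP = "cdc_adoQpoasnfa76pfcZLmcfl_Array"   # hard-coded in ChromeDriver (contains digits)
RANDOM_PROP = "xqz_hgfdsapoiuytrewqlkjmnb_Array"  # hypothetical letter-only random name


def matching(pattern, names):
    """Return the names the pattern flags, in their original order."""
    return [name for name in names if pattern.match(name)]


names = [STOCK_PROP, RANDOM_PROP, "onload", "document"]
print(matching(LETTERS_ONLY, names))  # only the letter-only name
print(matching(ALPHANUMERIC, names))  # both cdc-style names
```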

Anyway I don't understand why you don't remove all cdc_ occurrences with:

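import random, string

replacement = "".join(random.choices(string.ascii_lowercase, k=3)).encode() + b"_"  # same length as b"cdc_"
with open(driver_path, "r+b") as f:
    data = f.read()
    f.seek(0)
    f.write(data.replace(b"cdc_", replacement))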

We have to completely prevent any props from being added, because of what we discussed: #993 (comment)

We have to abandon the JavaScript cdc_prop removal, because there is no clean way to apply it for new tabs/windows/pop-ups opened by a page. An antibot script could simply open a pop-up, run detection, then close it immediately to detect undetected-chromedriver while not really affecting user experience except for a short flash on screen.

What does this mean?

This change only introduces pros and absolutely no cons:

  1. The new patcher.py is still fast. As you can see, I log the time it takes, and it always completes within 0.1 seconds.
  2. patcher.py only runs once at startup, whereas the JavaScript prop-removal approach had to run every time a new Page/document was received. Removing it is a major performance improvement.

@lukect
Author

lukect commented Jan 28, 2023

Also, the hard-coded window.cdc_adoQpoasnfa76pfcZLmcfl_* props in ChromeDriver are unlikely to change, but if they do, we can create a more advanced patcher.py.

I don't believe the props were added to make ChromeDriver detectable. I believe they were added so the DevTools debugger can still function without hindrance, even if a script overrides window.Array, window.Promise or window.Symbol.

@lukect
Author

lukect commented Jan 28, 2023

@lukect The #1010 update worked on some sites, eg: https://nowsecure.nl/#relax, https://pixelscan.net/, and https://fingerprint.com/products/bot-detection, but not on others, such as Google Login.
(Your previous solution #993 worked on all 4 of the above, so #1010 appears to be a step in the wrong direction, so far.)

Originally posted by @mdmintz in #993 (comment)

It is not possible that #1010 introduces a new issue. The behavior that the antibots can see is unchanged (unless they open a new tab/window/pop-up, then this PR is a major improvement upon my last).

You can see the proof here by running an adjusted version of my temporary fix script @ #986 (comment) to verify it no longer has any effect with the new patcher.py:

cdc_props = driver.execute_script('const j=[];for(const p in window){'
                                  'if(/^[a-z]{3}_[a-zA-Z0-9]{22}_.*/i.test(p)){'
                                  'j.push(p);delete window[p];}}return j;')
if len(cdc_props) > 0:
    cdc_props_js_array = '[' + ','.join('"' + p + '"' for p in cdc_props) + ']'
    driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument',
                          {'source': cdc_props_js_array + '.forEach(k=>delete window[k]);'})
    print(f"{len(cdc_props)} cdc_props removed")
else:
    print("No cdc_props removal required!")

You will see that it always prints "No cdc_props removal required!"
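The script above assembles the JavaScript array by wrapping each name in double quotes, which would produce broken JavaScript if a property name ever contained a quote. A minimal sketch of the same step using `json.dumps` instead; `build_removal_source` is a hypothetical helper, and the resulting string is what would go into the `source` argument of `Page.addScriptToEvaluateOnNewDocument`:

```python
import json


def build_removal_source(props):
    """Return JavaScript that deletes each named property from window.

    json.dumps turns a list of strings into a valid JavaScript array literal,
    escaping quotes and backslashes that manual concatenation would not.
    """
    return json.dumps(list(props)) + ".forEach(k => delete window[k]);"


source = build_removal_source(["cdc_adoQpoasnfa76pfcZLmcfl_Array", 'odd"name'])
print(source)
# With a live driver (not run here):
# driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": source})
```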

It is far more likely a fingerprinting issue. Perhaps the antibots can somehow detect that the window is never modified and hence the fingerprint is too plain? I noticed Imperva/Incapsula doesn't allow me through on the plain default Chrome profile anymore, but when I add uBlock Origin, it allows access. This suggests to me that they weren't actually detecting my fork, but that they are giving a negative score for browser profiles deemed too generic.

@fabifont

fabifont commented Jan 28, 2023

Also, the hard-coded window.cdc_adoQpoasnfa76pfcZLmcfl_* props in ChromeDriver are unlikely to change, but if they do, we can create a more advanced patcher.py.

I don't believe the props were added to make ChromeDriver detectable. I believe they were added so the DevTools debugger can still function without hindrance, even if a script overrides window.Array, window.Promise or window.Symbol.

Well so what about:

def gen_spaces(match):
    return b" " * (len(match.group()) - 3)

re.sub(b"function \(\) {window.cdc_(.*)\(\);", gen_spaces, data)
re.sub(b"cdc_(.*)||", gen_spaces, data)

@lukect
Author

lukect commented Jan 28, 2023

Also, the hard-coded window.cdc_adoQpoasnfa76pfcZLmcfl_* props in ChromeDriver are unlikely to change, but if they do, we can create a more advanced patcher.py.
I don't believe the props were added to make ChromeDriver detectable. I believe they were added so the DevTools debugger can still function without hindrance, even if a script overrides window.Array, window.Promise or window.Symbol.

Well so what about:

def gen_spaces(match):
    return b" " * (len(match.group()) - 3)

re.sub(b"(function \(\) {window.cdc_(.*)\(\);", gen_spaces, data)
re.sub(b"(cdc_(.*)||", gen_spaces, data)

Your regex doesn't compile, but thanks for the idea. I fixed it and tried it:

    def patch_exe(self):
        """
        Patches the ChromeDriver binary

        :return: True on success
        """
        logger.info("patching driver executable %s" % self.executable_path)
        start = time.time()

        def gen_js_whitespaces(match):
            return b" " * len(match.group())

        with io.open(self.executable_path, "r+b") as fh:
            file_bin = fh.read()
            file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) = window\.(Array|Promise|Symbol);",
                              gen_js_whitespaces, file_bin)
            file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) \|\|", gen_js_whitespaces, file_bin)
            fh.seek(0)
            fh.write(file_bin)

        time_taken_s = time.time() - start
        logger.info(f"finished patching driver executable in {time_taken_s:.3f}s")
        return True

This was marginally faster (1-5%) so I will update the PR to use the regex instead.
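Whichever call turns out faster, the property an in-place `r+b` rewrite depends on is that the patched bytes have exactly the same length as the original. A minimal sketch on a synthetic buffer that blanks the same statements once with `re.sub` and once with a plain `bytes.replace` loop, and checks that both give identical, same-length output; the sample bytes are invented for illustration and no timing claim is made:

```python
import re

# The two patterns from patch_exe above, as raw bytes literals
ASSIGN = re.compile(
    rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) = window\.(Array|Promise|Symbol);"
)
FALLBACK = re.compile(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) \|\|")
STOCK = b"cdc_adoQpoasnfa76pfcZLmcfl_"


def blank(match: re.Match) -> bytes:
    # Same number of spaces as matched bytes, so no offsets in the binary move
    return b" " * len(match.group())


def patch_with_regex(data: bytes) -> bytes:
    return FALLBACK.sub(blank, ASSIGN.sub(blank, data))


def patch_with_replace(data: bytes) -> bytes:
    # Exact-string variant: only handles the stock name, not arbitrary cdc_ names
    for name in (b"Array", b"Promise", b"Symbol"):
        for target in (
            b"window." + STOCK + name + b" = window." + name + b";",
            b"window." + STOCK + name + b" ||",
        ):
            data = data.replace(target, b" " * len(target))
    return data


# Synthetic buffer loosely shaped like the embedded JavaScript (not real ChromeDriver bytes)
sample = (
    b"\x00function () {window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array;}\x00"
    b"var P = window.cdc_adoQpoasnfa76pfcZLmcfl_Promise || window.Promise;\xff"
)
patched = patch_with_regex(sample)
print(len(patched) == len(sample), b"cdc_" in patched)
```

The regex version also covers names other than the stock one, which is the main reason to prefer it over exact strings.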

Thanks

@fabifont

Also, the hard-coded window.cdc_adoQpoasnfa76pfcZLmcfl_* props in ChromeDriver are unlikely to change, but if they do, we can create a more advanced patcher.py.
I don't believe the props were added to make ChromeDriver detectable. I believe they were added so the DevTools debugger can still function without hindrance, even if a script overrides window.Array, window.Promise or window.Symbol.

Well so what about:

def gen_spaces(match):
    return b" " * (len(match.group()) - 3)

re.sub(b"(function \(\) {window.cdc_(.*)\(\);", gen_spaces, data)
re.sub(b"(cdc_(.*)||", gen_spaces, data)

Your regex doesn't compile, but thanks for the idea. I fixed it and tried it:

    def patch_exe(self):
        """
        Patches the ChromeDriver binary

        :return: True on success
        """
        logger.info("patching driver executable %s" % self.executable_path)
        start = time.time()

        def gen_js_whitespaces(match):
            return b" " * len(match.group())

        with io.open(self.executable_path, "r+b") as fh:
            file_bin = fh.read()
            file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) = window\.(Array|Promise|Symbol);",
                              gen_js_whitespaces, file_bin)
            file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) \|\|", gen_js_whitespaces, file_bin)
            fh.seek(0)
            fh.write(file_bin)

        time_taken_s = time.time() - start
        logger.info(f"finished patching driver executable in {time_taken_s:.3f}s")
        return True

This was marginally faster (1-5%) so I will update the PR to use the regex instead. I think it is faster because bytearray.replace creates copies while re.sub doesn't.

Thanks

Yes, sorry, the initial parenthesis was a typo; I wrote it on the fly on my phone. Also, the pattern wasn't accurate, but you got the point.

@fabifont

Another thing: before the last update you were removing the entire initialization function. Now you are only removing the lines inside it, leaving the function body with only spaces. I don't know if this could be a problem, but I wanted to warn you.

@lukect
Author

lukect commented Jan 28, 2023

Another thing: before the last update you were removing the entire initialization function. Now you are only removing the lines inside it, leaving the function body with only spaces. I don't know if this could be a problem, but I wanted to warn you.

Whitespace or empty functions shouldn't make any difference. The Chrome V8 engine should optimize both of them away.

@fabifont

fabifont commented Jan 30, 2023

Another thing: before the last update you were removing the entire initialization function. Now you are only removing the lines inside it, leaving the function body with only spaces. I don't know if this could be a problem, but I wanted to warn you.

Whitespace or empty functions shouldn't make any difference. The Chrome V8 engine should optimize both of them away.

OK. What about changing the check for CDC presence? It's too hardcoded. Just use if b"cdc_" in fh.read()

@fabifont

fabifont commented Jan 31, 2023

@lukect
I also found another cdc occurrence that won't be removed:

// |key| is a long random string, unlikely to conflict with anything else.
var key = '$cdc_asdjflasutopfhvcZLmcfl_';
if (w3c) {
    if (!(key in doc))
        doc[key] = new CacheWithUUID();
    return doc[key];
} else {
    if (!(key in doc))
        doc[key] = new Cache();
    return doc[key];
}
}  // closes the enclosing function (not shown)

Should it be removed or should cdc be replaced with something else?

Anyway, the link you posted above doesn't work. Can you update it?

@fabifont

Apparently fingerprint.js is still detecting Selenium. I found other keywords that they check: https://github.com/fingerprintjs/BotD/blob/d1586a293bcb299a54662ce422e73f5e6fa49f89/src/detectors/document_properties.ts

@lukect
Author

lukect commented Jan 31, 2023

Another thing: before the last update you were removing the entire initialization function. Now you are only removing the lines inside it, leaving the function body with only spaces. I don't know if this could be a problem, but I wanted to warn you.

Whitespace or empty functions shouldn't make any difference. The Chrome V8 engine should optimize both of them away.

OK. What about changing the check for CDC presence? It's too hardcoded. Just use if b"cdc_" in fh.read()

We won't know how much prop code to overwrite.
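The point of disagreement here fits in a few lines: a bare `b"cdc_" in data` check says whether anything is left to patch, but not how many bytes around each occurrence belong to the statement that has to be overwritten. A minimal sketch with a made-up buffer; `still_contains_cdc` and `cdc_name_spans` are hypothetical helpers, not functions from patcher.py:

```python
import re

# "cdc_" followed by 22 alphanumerics and "_" (the shape of the stock names in this thread)
CDC_NAME = re.compile(rb"cdc_[a-zA-Z0-9]{22}_")


def still_contains_cdc(data: bytes) -> bool:
    # The simpler check proposed above: answers "is anything left?" but not "where?"
    return b"cdc_" in data


def cdc_name_spans(data: bytes) -> list:
    # Byte ranges of each name only; the surrounding statement that must be blanked
    # (e.g. "window.<name>Array = window.Array;") still needs its own pattern
    return [m.span() for m in CDC_NAME.finditer(data)]


data = b"xx window.cdc_adoQpoasnfa76pfcZLmcfl_Array = window.Array; yy"
print(still_contains_cdc(data), cdc_name_spans(data))
```

A presence check could still be useful as a cheap "already patched?" guard before running the specific patterns.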

@lukect I also found another cdc occurrence that won't be removed:

// |key| is a long random string, unlikely to conflict with anything else.
var key = '$cdc_asdjflasutopfhvcZLmcfl_';
if (w3c) {
    if (!(key in doc))
        doc[key] = new CacheWithUUID();
    return doc[key];
} else {
    if (!(key in doc))
        doc[key] = new Cache();
    return doc[key];
}
}  // closes the enclosing function (not shown)

Should it be removed or should cdc be replaced with something else?

Anyway, the link you posted above doesn't work. Can you update it?

I can't find this snippet of .js anywhere in the Chromium codebase.

Apparently fingerprint.js is still detecting Selenium. I found other keywords that they check: https://github.com/fingerprintjs/BotD/blob/d1586a293bcb299a54662ce422e73f5e6fa49f89/src/detectors/document_properties.ts

It doesn't detect for me: https://fingerprint.com/products/bot-detection/

I think those checks are outdated.

@fabifont

OK, I can confirm that everything is working fine. @ultrafunkamsterdam what do you think about it?

@fabifont

fabifont commented Jan 31, 2023

@lukect I found a weird behavior with the old remove-cdc-props method (I know that it is fixed now, but I want to understand better what was happening).

I mixed your latest branch with the old remove props:

        cdc_props: list[str] = driver.execute_script(  # type: ignore
            """
            let objectToInspect = window,
                result = [];
            while(objectToInspect !== null)
            { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
              objectToInspect = Object.getPrototypeOf(objectToInspect); }
            return result.filter(i => i.match(/^[a-z]{3}_[a-z]{22}_.*/i))
            """
        )
        if len(cdc_props) < 1:
            return
        cdc_props_js_array = "[" + ", ".join('"' + p + '"' for p in cdc_props) + "]"
        driver.execute_cdp_cmd(
            cmd="Page.addScriptToEvaluateOnNewDocument",
            cmd_args={"source": f"{cdc_props_js_array}.forEach(p => delete window[p] && console.log('removed', p));"},
        )

Then I went to https://fingerprint.com/products/bot-detection/ with a get request to trigger the cdc removal, and even though there were NO cdc occurrences (so len(cdc_props) < 1), I was detected.

Removing that part allowed me to bypass the detection.

So the question is: why could they detect me even though I didn't remove any property? I only checked them with a JS script; I didn't change them. I hope you can answer me.

@lukect
Author

lukect commented Jan 31, 2023

@lukect I found a weird behavior with the old remove-cdc-props method (I know that it is fixed now, but I want to understand better what was happening).

I mixed your latest branch with the old remove props:

        cdc_props: list[str] = driver.execute_script(  # type: ignore
            """
            let objectToInspect = window,
                result = [];
            while(objectToInspect !== null)
            { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
              objectToInspect = Object.getPrototypeOf(objectToInspect); }
            return result.filter(i => i.match(/^[a-z]{3}_[a-z]{22}_.*/i))
            """
        )
        if len(cdc_props) < 1:
            return
        cdc_props_js_array = "[" + ", ".join('"' + p + '"' for p in cdc_props) + "]"
        driver.execute_cdp_cmd(
            cmd="Page.addScriptToEvaluateOnNewDocument",
            cmd_args={"source": f"{cdc_props_js_array}.forEach(p => delete window[p] && console.log('removed', p));"},
        )

Then I went to https://fingerprint.com/products/bot-detection/ with a get request to trigger the cdc removal, and even though there were NO cdc occurrences (so len(cdc_props) < 1), I was detected.

Removing that part allowed me to bypass the detection.

So the question is: why could they detect me even though I didn't remove any property? I only checked them with a JS script; I didn't change them. I hope you can answer me.

I can't tell from the information you have given me here, but you want to make sure the first script only runs on the Chrome home screen / new tab so it only picks up the real props. After that, the removal of the real props when Page.addScriptToEvaluateOnNewDocument is safe to run at anytime, guaranteed.

@fabifont

@lukect I found a weird behavior with the old remove-cdc-props method (I know that it is fixed now, but I want to understand better what was happening).
I mixed your latest branch with the old remove props:

        cdc_props: list[str] = driver.execute_script(  # type: ignore
            """
            let objectToInspect = window,
                result = [];
            while(objectToInspect !== null)
            { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
              objectToInspect = Object.getPrototypeOf(objectToInspect); }
            return result.filter(i => i.match(/^[a-z]{3}_[a-z]{22}_.*/i))
            """
        )
        if len(cdc_props) < 1:
            return
        cdc_props_js_array = "[" + ", ".join('"' + p + '"' for p in cdc_props) + "]"
        driver.execute_cdp_cmd(
            cmd="Page.addScriptToEvaluateOnNewDocument",
            cmd_args={"source": f"{cdc_props_js_array}.forEach(p => delete window[p] && console.log('removed', p));"},
        )

Then I went to https://fingerprint.com/products/bot-detection/ with a get request to trigger the cdc removal, and even though there were NO cdc occurrences (so len(cdc_props) < 1), I was detected.
Removing that part allowed me to bypass the detection.
So the question is: why could they detect me even though I didn't remove any property? I only checked them with a JS script; I didn't change them. I hope you can answer me.

I can't tell from the information you have given me here, but you want to make sure the first script only runs on the Chrome home screen / new tab so it only picks up the real props. After that, the removal of the real props when Page.addScriptToEvaluateOnNewDocument is safe to run at anytime, guaranteed.

I found a big detection method. Every execute_script call is detectable by fingerprint.js. You can try running driver.execute_script("let abc = 'abc'") before getting their page and you will be detected. To avoid this, you must run the script directly in the runtime using the CDP protocol. I think I will open a new issue for that.

@lukect
Author

lukect commented Jan 31, 2023 via email

@fabifont

No, all the variables are gone after you `Chrome.get` the page. Even if you execute a script on the page, I'm pretty sure the variables are inaccessible (in an outside scope the page scripts can't access) if you use `const` or `let`. Not sure about `var`.

Sorry, you are right. You will be detected only if you run execute_script after the get request. With a CDP request, detection doesn't happen even then.

@fabifont

#1017 (comment)

@fabifont

fabifont commented Feb 1, 2023

No, all the variables are gone after you Chrome.get the page. Even if you execute a script on the page, I'm pretty sure the variables are inaccessible (in an outside scope the page scripts can't access) if you use const or let. Not sure about var.

Sorry, you are right. You will be detected only if you run execute_script after the get request. With a CDP request, detection doesn't happen even then.

And it doesn't affect only variable declarations, but every JS snippet, such as console.log("abc").
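The CDP route described above can be sketched as a small helper that sends `Runtime.evaluate` through Selenium's `execute_cdp_cmd`, assuming a Chromium-based driver (undetected-chromedriver's `Chrome` is one). `evaluate_via_cdp` and `FakeDriver` are hypothetical names, the fake only exists so the sketch runs without a browser, and whether this avoids any particular detector is the observation reported in this thread, not something the code verifies:

```python
def evaluate_via_cdp(driver, expression: str):
    """Run JavaScript with CDP Runtime.evaluate instead of driver.execute_script.

    Assumes `driver` exposes execute_cdp_cmd(cmd, cmd_args), which returns the
    CDP result object as a dict.
    """
    response = driver.execute_cdp_cmd(
        "Runtime.evaluate",
        {"expression": expression, "returnByValue": True},
    )
    if "exceptionDetails" in response:
        # CDP reports script errors in the response rather than raising
        raise RuntimeError(response["exceptionDetails"].get("text", "evaluation failed"))
    return response["result"].get("value")


class FakeDriver:
    """Stand-in for a real driver so the sketch runs without a browser."""

    def execute_cdp_cmd(self, cmd, cmd_args):
        assert cmd == "Runtime.evaluate"
        return {"result": {"type": "number", "value": 42}}


print(evaluate_via_cdp(FakeDriver(), "6 * 7"))
```

With a real driver the call is the same, for example evaluate_via_cdp(driver, "document.title").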

@melleryb

melleryb commented Feb 2, 2023


Hi. I tried your solution last night; it was really cool and worked. But this morning I got an error from Imperva. I tried from a different IP, but it's detected again. The fingerprint is clean, but I have no idea how Imperva caught me.

@lukect
Author

lukect commented Feb 3, 2023

@lukect does the last commit fix #1017? Can you explain what was happening?

Yes, it fixes it, mostly.

@lukect I also found another cdc occurrence that won't be removed:

// |key| is a long random string, unlikely to conflict with anything else.
var key = '$cdc_asdjflasutopfhvcZLmcfl_';
if (w3c) {
    if (!(key in doc))
        doc[key] = new CacheWithUUID();
    return doc[key];
} else {
    if (!(key in doc))
        doc[key] = new Cache();
    return doc[key];
}
}  // closes the enclosing function (not shown)

Should it be removed or should cdc be replaced with something else?

Anyway, the link you posted above doesn't work. Can you update it?

Basically you were correct about this after all. I'm not sure why the GitHub search didn't show this for me, but it is there: https://source.chromium.org/chromium/chromium/src/+/main:chrome/test/chromedriver/js/call_function.js

This commit should fix all the current detections. I will make this cache even less detectable soon to protect it in the long-term (though this may require a full .js file replacement instead of a regex)!

There is basically a cache in the JavaScript environment that stores references to elements. This cache is stored on the document and hence is detectable. I made it so the name of the cache property is completely random, which will make detection far harder, but not impossible.
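A minimal sketch of that randomization, assuming the key appears in the binary exactly as in the `call_function.js` excerpt quoted above; `randomize_cache_key` is a hypothetical helper, not the PR's actual code, and the sample bytes are invented. The new name keeps the original 28-byte length so the file can be rewritten in place:

```python
import random
import re
import string

# The cache-key literal from call_function.js, e.g. $cdc_asdjflasutopfhvcZLmcfl_
CACHE_KEY = re.compile(rb"\$cdc_[a-zA-Z0-9]{22}_")
ALPHABET = (string.ascii_letters + string.digits).encode("ascii")


def randomize_cache_key(data: bytes, rng: random.Random) -> bytes:
    # One random key of the same length (28 bytes) reused for every match, on the
    # assumption that all copies of the literal must keep agreeing on one name
    new_key = bytes(rng.choices(ALPHABET, k=28))
    return CACHE_KEY.sub(lambda m: new_key, data)


sample = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_'; /* ... */ '$cdc_asdjflasutopfhvcZLmcfl_'"
out = randomize_cache_key(sample, random.Random(0))
print(out)
```

The seeded `random.Random(0)` only makes the example repeatable; a real patcher would use an unseeded generator so each install gets a different name.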

If the antibots catch up on this fix we can still:

  1. Store the cache inside the window object instead?
  2. Scramble the cache's object prototype function names as well?
  3. Add fake functions to the cache's object prototype?
  4. Make the cache object look like an object always added by a popular extension?
  5. Most extreme, but stealthy option if all else causes detection: Create a Chrome extension to maintain the cache in a context/scope hidden from the webpage, then communicate with this extension from Selenium/undetected-chromedriver?

@fabifont

fabifont commented Feb 3, 2023

@lukect does the last commit fix #1017? Can you explain what was happening?

Yes, it fixes it, mostly.

@lukect I also found another cdc occurrence that won't be removed:

// |key| is a long random string, unlikely to conflict with anything else.
var key = '$cdc_asdjflasutopfhvcZLmcfl_';
if (w3c) {
    if (!(key in doc))
        doc[key] = new CacheWithUUID();
    return doc[key];
} else {
    if (!(key in doc))
        doc[key] = new Cache();
    return doc[key];
}
}  // closes the enclosing function (not shown)

Should it be removed or should cdc be replaced with something else?
Anyway, the link you posted above doesn't work. Can you update it?

Basically you were correct about this after all. I'm not sure why the GitHub search didn't show this for me, but it is there: https://source.chromium.org/chromium/chromium/src/+/main:chrome/test/chromedriver/js/call_function.js

This commit should fix all the current detections. I will make this cache even less detectable soon to protect it in the long-term (though this may require a full .js file replacement instead of a regex)!

There is basically a cache in the JavaScript environment that stores references to elements. This cache is stored on the document and hence is detectable. I made it so the name of the cache property is completely random, which will make detection far harder, but not impossible.

If the antibots catch up on this fix we can still:

1. Store the cache inside the `window` object instead?

2. Scramble the cache's object prototype function names as well?

3. Add fake functions to the cache's object prototype?

4. Make the cache object look like the object always added by a popular extension?

5. Most extreme, but stealthy option if all else causes detection: Create a Chrome extension to maintain the cache in a context/scope hidden from the webpage, then communicate with this extension from Selenium/undetected-chromedriver?

Perfect. As you said this solution is hard to detect but not impossible. I think that the most suitable options are 1, 2 or 3. Creating or faking an extension will just add more detectable stuff to the project.

This PR is good for me. I would only change the hardcoded if I told you about in a previous comment, because it can change quickly over time. Just leave the `if b"cdc_" in` check.

Let's see what @ultrafunkamsterdam thinks about that.

@fabifont

fabifont commented Feb 3, 2023

@lukect I can confirm that I am not getting detected anymore. I'm using that replacement:

file_bin = re.sub(rb"\$cdc_[a-zA-Z0-9]{22}_", lambda m: bytes(random.choices((string.ascii_letters + string.digits).encode("ascii"), k=len(m.group()))), file_bin)

@phuvipro618

Thank you. It works fine.

@mdonova33

mdonova33 commented Feb 4, 2023

Any chance we can get these updates ported to https://github.com/seleniumbase/SeleniumBase uc_mode? This works with a proxy on Windows, while the current UC mode in SeleniumBase fails with a proxy. @mdmintz?

@lukect
Author

lukect commented Feb 4, 2023

@lukect I can confirm that I am not getting detected anymore. I'm using that replacement:

file_bin = re.sub(rb"\$cdc_[a-zA-Z0-9]{22}_", lambda m: bytes(random.choices((string.ascii_letters + string.digits).encode("ascii"), k=len(m.group()))), file_bin)

The lambda function is bad imo. The random prop name's length should also be random for better undetectability.
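One way to follow the random-length suggestion without changing the size of the binary is to shorten the whole quoted literal and pad with spaces after the closing quote, which JavaScript ignores before `;`. A minimal sketch, assuming the key only ever appears as a single-quoted literal like `var key = '$cdc_asdjflasutopfhvcZLmcfl_';` quoted earlier; `random_length_key` is a hypothetical helper and the sample bytes are invented:

```python
import random
import re
import string

# The quoted literal, quotes included: '$cdc_<22 chars>_' is 30 bytes
QUOTED_KEY = re.compile(rb"'\$cdc_[a-zA-Z0-9]{22}_'")
LETTERS = string.ascii_letters.encode("ascii")


def random_length_key(data: bytes, rng: random.Random, min_len: int = 8) -> bytes:
    # Pick one name with a random length between min_len and 28 characters
    n = rng.randint(min_len, 28)
    literal = b"'" + bytes(rng.choices(LETTERS, k=n)) + b"'"

    def repl(match: re.Match) -> bytes:
        # Pad with spaces *after* the closing quote so the key itself has no spaces
        # and the total byte count stays the same
        return literal + b" " * (len(match.group()) - len(literal))

    return QUOTED_KEY.sub(repl, data)


sample = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';\nif (!(key in doc))"
out = random_length_key(sample, random.Random(1))
print(out)
```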

@mdmintz

mdmintz commented Feb 4, 2023

@mdonova33 those updates were already ported over and worked for me. See seleniumbase/SeleniumBase#1725 (comment) for usage.

@lukect lukect changed the title Fix window.cdc_* patcher.py (#986, #882, #980, #981, #983, #1008 & more) Fix cdc_* patcher.py (#986, #882, #980, #981, #983, #1008 & more) Feb 4, 2023
@lukect
Author

lukect commented Feb 4, 2023

@ultrafunkamsterdam 3.4.0 doesn't fix all of the recent detections and actually adds new vulnerabilities that the antibots will be using to their advantage by next week.

@alexreg

alexreg commented Feb 6, 2023

@lukect Indeed, both the 3.4 release and unfortunately your fork too are now failing again on familysearch.org. (An Incapsula update in the last hour, it seems.)

@L4BORG

L4BORG commented Feb 6, 2023

If it is detectable, you can be sure they will use it. You can also be sure they're reading this... 😱

@fabifont

fabifont commented Feb 7, 2023

@lukect Indeed, both the 3.4 release and unfortunately your fork too are now failing again on familysearch.org. (An Incapsula update in the last hour, it seems.)

Can you provide a snippet? I want to reproduce that.

@alexreg

alexreg commented Feb 7, 2023

@fabifont Sorry, probably a false alarm... I was persisting the user data dir, and it looks like something bad happened to it that made it detectable. When I reset that dir, it went back to working fine.

I suspect the issue might be switching between "headful" and headless modes whilst using the same user dir, as per this issue. Interestingly, I can operate fine in headful mode, but when I switch to headless mode, it causes the loss of the session, even when I then return to headful mode. Perhaps Incapsula detected me switching between headful and headless mode at one point?
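If switching modes on one profile really is the trigger, one low-effort mitigation is to keep a separate user data dir per mode. Here is a sketch with a hypothetical helper, profile_dir_for. The directory layout is an assumption, and the commented usage assumes uc.Chrome's user_data_dir and headless keyword arguments.

```python
from pathlib import Path


def profile_dir_for(base: str, headless: bool) -> Path:
    """Return a per-mode profile path so headful and headless runs never share one."""
    # e.g. profiles/headful and profiles/headless (layout is an assumption)
    return Path(base) / ("headless" if headless else "headful")


# Usage (assumes undetected_chromedriver is installed):
# import undetected_chromedriver as uc
# driver = uc.Chrome(user_data_dir=str(profile_dir_for("profiles", headless=True)), headless=True)
```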

@nikolsky

nikolsky commented Feb 7, 2023

Started getting detected by Cloudflare several hours ago :(

@fabifont

fabifont commented Feb 7, 2023

Started getting detected by Cloudflare several hours ago :(

Please always provide a snippet to reproduce the problem. Otherwise I won't consider it reliable.

@fabifont

@fabifont Sorry, here is the site that worked for me this morning with the fix: git+https://github.com/lukect/undetected-chromedriver.git

Now "Under Attack Mode" kicks in, and neither git+https://github.com/lukect/undetected-chromedriver.git nor undetected-chromedriver==3.4.2 gets through:

import undetected_chromedriver as uc
import time

browser = uc.Chrome(version_main=109)

browser.get('https://www.vfsvisaservicesrussia.com/Global-Appointment/Account/RegisteredLogin?q=shSA0YnE4pLF9Xzwon/x/BGxVUxGuaZP3eMAtGHiEL0kQAXm+Lc2PfVNUJtzf7vWRu19bwvTWMZ48njgDU5r4g==')

time.sleep(100)

Chrome version is 109, selenium==4.8.0

[Screenshot: 2023-02-07 at 22:35:48]

Update: Oh my god! I did nothing and it started working again. Is it possible that the site's developers can tune the Cloudflare security level?

Yes, it is possible that they are mixing "Under Attack Mode" + "managed Turnstile" + "Super Bot Fight Mode". If you want to learn more about that, see the Cloudflare documentation.

@lukect
Author

lukect commented Feb 10, 2023

@fabifont Is Cloudflare actually detecting my fork in any of these modes?

@fabifont

fabifont commented Feb 10, 2023

@fabifont Is Cloudflare actually detecting my fork in any of these modes?

Two days ago I tried visiting the website sent by @nikolsky, and Cloudflare showed the Turnstile CAPTCHA. So I guess either something is still detectable or the website shows the challenge by default.

@luckymancvp

I went to https://www.vfsvisaservicesrussia.com/Global-Appointment/Account/RegisteredLogin?q=shSA0YnE4pLF9Xzwon/x/BGxVUxGuaZP3eMAtGHiEL0kQAXm+Lc2PfVNUJtzf7vWRu19bwvTWMZ48njgDU5r4g== in my own browser, and it still showed a Cloudflare CAPTCHA (no Selenium or any library, just manual browsing).

@lukect
Author

lukect commented Feb 10, 2023

I went to https://www.vfsvisaservicesrussia.com/Global-Appointment/Account/RegisteredLogin?q=shSA0YnE4pLF9Xzwon/x/BGxVUxGuaZP3eMAtGHiEL0kQAXm+Lc2PfVNUJtzf7vWRu19bwvTWMZ48njgDU5r4g== in my own browser, and it still showed a Cloudflare CAPTCHA (no Selenium or any library, just manual browsing).

Yeah, I'm pretty sure Cloudflare isn't detecting anything and the website just shows a CAPTCHA for every browser when in a certain mode.

It's not like these checks are intensive or anything: I'm sure they will run every check in every mode.

@fabifont

I went to https://www.vfsvisaservicesrussia.com/Global-Appointment/Account/RegisteredLogin?q=shSA0YnE4pLF9Xzwon/x/BGxVUxGuaZP3eMAtGHiEL0kQAXm+Lc2PfVNUJtzf7vWRu19bwvTWMZ48njgDU5r4g== in my own browser, and it still showed a Cloudflare CAPTCHA (no Selenium or any library, just manual browsing).

Yeah, I'm pretty sure Cloudflare isn't detecting anything and the website just shows a CAPTCHA for every browser when in a certain mode.

It's not like these checks are intensive or anything: I'm sure they will run every check in every mode.

Yes, I agree
