Fix cdc_* patcher.py (#986, #882, #980, #981, #983, #1008 & more) #1010
base: master
@lukect nice update. I didn't check the ChromeDriver source, but are you sure the cdc string will remain the same and won't change over time? It seems too hardcoded, and according to this reply something is missing.
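Anyway, I don't understand why you don't just remove all of them:

```python
import random
import string

with open(driver_path, "r+b") as f:
    data = f.read()
    # <random cdc or spaces>: any 4 bytes, kept the same length as b"cdc_"
    replacement = "".join(random.choices(string.ascii_lowercase, k=3)).encode() + b"_"
    f.seek(0)
    f.write(data.replace(b"cdc_", replacement))
```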
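A minimal sketch of the blanket-replacement idea, run on an in-memory buffer instead of the real ChromeDriver file so it can be tried safely. `replace_cdc_prefix` is a hypothetical helper; the one hard constraint it enforces is that the replacement is exactly as long as `cdc_`, so nothing else in the binary shifts. The sample bytes are made up to look like the injected props discussed in this thread.

```python
import random
import string


def replace_cdc_prefix(data: bytes) -> bytes:
    """Replace every b"cdc_" with a random 3-letter prefix plus b"_".

    Hypothetical helper: the replacement is 4 bytes, the same as b"cdc_",
    so the patched binary keeps its original size and offsets.
    """
    while True:
        prefix = "".join(random.choices(string.ascii_lowercase, k=3)).encode("ascii")
        if prefix != b"cdc":  # the new prefix must actually differ from the old one
            break
    return data.replace(b"cdc_", prefix + b"_")


# Made-up sample shaped like "window.cdc_<22 chars>_Array = window.Array;"
sample = b"window.cdc_" + b"A" * 22 + b"_Array = window.Array;"
patched = replace_cdc_prefix(sample)
print(len(patched) == len(sample), b"cdc_" in patched)
```

Keeping the length fixed is the important design choice: writing a shorter buffer back after `seek(0)` would leave stale trailing bytes, and a longer one would move everything after it.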
My comment on my replacement map shows that:
We have to completely prevent any props from being added, because of what we discussed: #993 (comment)
We have to abandon the JavaScript cdc_prop removal, because there is no clean way to apply it for new tabs/windows/pop-ups opened by a page. An antibot script could simply open a pop-up, run detection, then close it immediately to detect
What does this mean? This change only introduces pros and absolutely no cons:

Also, the hard-coded
I don't believe the props were added to make
Originally posted by @mdmintz in #993 (comment)

It is not possible that #1010 introduces a new issue. The behavior that the antibots can see is unchanged (unless they open a new tab/window/pop-up, in which case this PR is a major improvement over my last one). You can see the proof by running an adjusted version of my temporary fix script in #986 (comment) to verify it no longer has any effect with the new
You will see it will always
It is far more likely a fingerprinting issue. Perhaps the antibots can somehow detect that the
Well, so what about:

```python
def gen_spaces(match):
    return b" " * (len(match.group()) - 3)

re.sub(b"function \(\) {window.cdc_(.*)\(\);", gen_spaces, data)
re.sub(b"cdc_(.*)||", gen_spaces, data)
```
Your regex doesn't compile, but thanks for the idea. I fixed it and tried it:

```python
import io
import logging
import re
import time

logger = logging.getLogger(__name__)


# Method of the patcher class (excerpt)
def patch_exe(self):
    """
    Patches the ChromeDriver binary
    :return: True on success
    """
    logger.info("patching driver executable %s" % self.executable_path)
    start = time.time()

    def gen_js_whitespaces(match):
        # Same-length run of spaces, so the binary's size and offsets stay unchanged
        return b" " * len(match.group())

    with io.open(self.executable_path, "r+b") as fh:
        file_bin = fh.read()
        file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) = window\.(Array|Promise|Symbol);",
                          gen_js_whitespaces, file_bin)
        file_bin = re.sub(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) \|\|", gen_js_whitespaces, file_bin)
        fh.seek(0)
        fh.write(file_bin)
    time_taken_s = time.time() - start
    logger.info(f"finished patching driver executable in {time_taken_s:.3f}s")
    return True
```

This was marginally faster (1-5%), so I will update the PR to use the regex instead. Thanks!
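The same two patterns can be exercised without touching a real binary. This is a sketch only: `blank_cdc_props` and the sample buffer are hypothetical, and the patterns are copied from `patch_exe()` above, written as raw bytes literals (`rb"..."`) so the backslashes reach the regex engine unchanged instead of relying on Python's invalid-escape fallback.

```python
import re

# Patterns taken from patch_exe(); precompiled here for reuse.
CDC_ASSIGN = re.compile(
    rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) = window\.(Array|Promise|Symbol);"
)
CDC_OR = re.compile(rb"window\.cdc_[a-zA-Z0-9]{22}_(Array|Promise|Symbol) \|\|")


def blank_with_spaces(match: re.Match) -> bytes:
    # Same number of bytes as the match, so the buffer keeps its layout.
    return b" " * len(match.group())


def blank_cdc_props(data: bytes) -> bytes:
    """Hypothetical helper: blank both kinds of cdc_ prop references."""
    data = CDC_ASSIGN.sub(blank_with_spaces, data)
    return CDC_OR.sub(blank_with_spaces, data)


key = b"cdc_" + b"x" * 22  # made-up 22-character suffix
sample = (
    b"function () {window." + key + b"_Array = window.Array;}"
    b" window." + key + b"_Promise || foo"
)
result = blank_cdc_props(sample)
print(result)
```

The function wrapper `function () {...}` survives with a body of spaces, which is the "empty function" case discussed in the next comments.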
Yes, sorry, the initial parenthesis was a typo; I wrote it on the fly on my phone. Also, the pattern wasn't accurate, but you got the point.
Another thing: before the last update you were removing the entire initialization function. Now you are just removing the lines inside it, leaving the function body with only spaces. I don't know if this could be a problem, but I wanted to warn you.
Whitespace or empty functions shouldn't make any difference. Chrome's V8 engine should optimize both of them away.
OK. What about changing the check for the cdc presence? It's too hardcoded. Just use
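One way to make the presence check less hardcoded, sketched under the assumption that the markers keep the shape seen in this thread (`cdc_`, optionally preceded by `$`, then 22 alphanumerics and `_`): search for the pattern instead of one fixed string. `has_cdc_markers` is a hypothetical helper, not part of the patcher.

```python
import re

# Matches both "cdc_<22>_" props and the "$cdc_<22>_" cache key.
CDC_MARKER = re.compile(rb"\$?cdc_[a-zA-Z0-9]{22}_")


def has_cdc_markers(data: bytes) -> bool:
    """Return True if anything shaped like a ChromeDriver cdc marker is present."""
    return CDC_MARKER.search(data) is not None


unpatched = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
patched = b"var key = '$xyz_asdjflasutopfhvcZLmcfl_';"
print(has_cdc_markers(unpatched), has_cdc_markers(patched))  # True False
```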
@lukect

```js
// |key| is a long random string, unlikely to conflict with anything else.
var key = '$cdc_asdjflasutopfhvcZLmcfl_';
if (w3c) {
  if (!(key in doc))
    doc[key] = new CacheWithUUID();
  return doc[key];
} else {
  if (!(key in doc))
    doc[key] = new Cache();
  return doc[key];
}
}  // closes the enclosing function (not shown)
```

Should it be removed or should

Anyway, the link you posted above doesn't work. Can you update it?
Apparently fingerprint.js is still detecting Selenium. I found other keywords that they check: https://github.com/fingerprintjs/BotD/blob/d1586a293bcb299a54662ce422e73f5e6fa49f89/src/detectors/document_properties.ts
We won't know how much prop code to overwrite.
I can't find this snippet of .js anywhere in the Chromium codebase.
It doesn't detect me: https://fingerprint.com/products/bot-detection/. I think those checks are outdated.
OK, I can confirm that everything is working fine. @ultrafunkamsterdam what do you think about it?
@lukect I found some weird behavior with the old remove-cdc-props method (I know it is fixed now, but I want to better understand what was happening). I mixed your latest branch with the old prop removal:

```python
# excerpt from inside a helper function
cdc_props: list[str] = driver.execute_script(  # type: ignore
    """
    let objectToInspect = window,
        result = [];
    while(objectToInspect !== null)
    { result = result.concat(Object.getOwnPropertyNames(objectToInspect));
      objectToInspect = Object.getPrototypeOf(objectToInspect); }
    return result.filter(i => i.match(/^[a-z]{3}_[a-z]{22}_.*/i))
    """
)
if len(cdc_props) < 1:
    return
cdc_props_js_array = "[" + ", ".join('"' + p + '"' for p in cdc_props) + "]"
driver.execute_cdp_cmd(
    cmd="Page.addScriptToEvaluateOnNewDocument",
    cmd_args={"source": f"{cdc_props_js_array}.forEach(p => delete window[p] && console.log('removed', p));"},
)
```

Then I went to https://fingerprint.com/products/bot-detection/ with a `get` request to trigger the cdc removal, and even though there were NO cdc occurrences (so `len(cdc_props) < 1`), I was detected. Removing that part allowed me to bypass the detection. So the question is: why could they detect me even though I didn't remove any property? I only checked them with a JS script; I didn't change them. Hope you can answer.

I can't tell from the information you have given me here, but you want to make sure the first script only runs on the Chrome home screen / new tab so it only picks up the real props. After that, removing the real props with `Page.addScriptToEvaluateOnNewDocument` is safe to run at any time, guaranteed.

I found a big detection method. Every `execute_script` is detectable by fingerprint.js. You can try running `driver.execute_script("let abc = 'abc'")` before getting their page and you will be detected. To avoid this, you must run the script directly in the runtime using the CDP protocol. I think I will open a new issue for that.

No, all the variables are gone after you `Chrome.get` the page. Even if you execute a script on the page, I'm pretty sure the variables are inaccessible (in an outer scope the page scripts can't access) if you use `const` or `let`. Not sure about `var`.
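In the snippet above, the JS array literal for `Page.addScriptToEvaluateOnNewDocument` is assembled by hand with `'"' + p + '"'`. A slightly sturdier way to build the same source, assuming the prop names come back as plain strings, is to let `json.dumps` do the quoting, since a JSON array of strings is also a valid JS array literal. `build_prop_removal_source` is a hypothetical name, and the driver call is shown only as a comment because it needs a live browser.

```python
import json


def build_prop_removal_source(props: list[str]) -> str:
    """Return JS that deletes the given window props on every new document.

    json.dumps escapes quotes and backslashes in the names, which the
    hand-built '"' + p + '"' join does not.
    """
    return f"{json.dumps(props)}.forEach(p => delete window[p]);"


source = build_prop_removal_source(["cdc_abc_Array", "cdc_abc_Promise"])
print(source)
# With a live driver (not run here):
# driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": source})
```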
Sorry, you are right. You will only be detected if you run `execute_script` after the `get` request. Sending the script through the CDP protocol instead avoids this as well.
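A minimal sketch of the CDP route, assuming a Selenium 4 Chromium-based driver that exposes `execute_cdp_cmd(cmd, cmd_args)` and using the DevTools `Runtime.evaluate` command. `run_js_via_cdp` and `FakeDriver` are hypothetical; the fake driver only exists so the sketch runs without Chrome. That this path is not flagged is what this thread reports, not something the code itself guarantees.

```python
from typing import Any


def run_js_via_cdp(driver: Any, expression: str) -> Any:
    """Evaluate `expression` with CDP Runtime.evaluate instead of execute_script."""
    response = driver.execute_cdp_cmd(
        "Runtime.evaluate",
        {"expression": expression, "returnByValue": True},
    )
    # Runtime.evaluate returns {"result": RemoteObject} plus optional "exceptionDetails".
    if "exceptionDetails" in response:
        raise RuntimeError(response["exceptionDetails"].get("text", "JS exception"))
    return response["result"].get("value")


class FakeDriver:
    """Stand-in for a real driver: records calls and returns a canned result."""

    def __init__(self) -> None:
        self.calls = []

    def execute_cdp_cmd(self, cmd: str, cmd_args: dict) -> dict:
        self.calls.append((cmd, cmd_args))
        return {"result": {"type": "number", "value": 2, "description": "2"}}


fake = FakeDriver()
value = run_js_via_cdp(fake, "1 + 1")
print(value)
```

With a real driver, the same function would be called as `run_js_via_cdp(driver, "...")` after `driver.get(...)`.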
And it doesn't only affect variable declarations, but every JS snippet, like
Hi. I tried your solution last night and it worked really well. But this morning I got an error from Imperva. I tried from a different IP, but I was detected again. Fingerprint is clean, but I have no idea how Imperva caught me.
Yes, it mostly fixes it.
Basically, you were correct about this after all. I'm not sure why the GitHub search didn't show this for me, but it is there: https://source.chromium.org/chromium/chromium/src/+/main:chrome/test/chromedriver/js/call_function.js This commit should fix all the current detections. I will make this cache even less detectable soon to protect it in the long term (though this may require a full .js file replacement instead of a regex)! There is basically a cache for the JavaScript environment to store references to elements. This cache is stored on the `document` object (as `doc[key]`). If the antibots catch up on this fix, we can still:
Perfect. As you said, this solution is hard to detect but not impossible. I think the most suitable options are 1, 2 or 3. Creating or faking an extension would just add more detectable stuff to the project. This PR is good for me; I would only change the hardcoded `if` I told you about in a previous comment, because it could change over time. Just leave
Let's see what @ultrafunkamsterdam thinks about that.
@lukect I can confirm that I am not getting detected anymore. I'm using this replacement:

```python
import random
import re
import string

file_bin = re.sub(rb"\$cdc_[a-zA-Z0-9]{22}_", lambda m: bytes(random.choices((string.ascii_letters + string.digits).encode("ascii"), k=len(m.group()))), file_bin)
```
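The same replacement, unpacked into named pieces so each step is visible. This is a sketch on a made-up buffer; `randomize_cache_key` and `random_same_length` are hypothetical names. `random.choices` over a `bytes` object yields a list of ints, which `bytes(...)` turns back into a byte string, and `k=len(...)` keeps the key the same size so the in-place patch does not shift anything.

```python
import random
import re
import string

ALPHANUMERIC = (string.ascii_letters + string.digits).encode("ascii")
CACHE_KEY = re.compile(rb"\$cdc_[a-zA-Z0-9]{22}_")


def random_same_length(match: re.Match) -> bytes:
    # Random alphanumerics, exactly as many bytes as the matched key.
    return bytes(random.choices(ALPHANUMERIC, k=len(match.group())))


def randomize_cache_key(data: bytes) -> bytes:
    """Hypothetical helper: replace every $cdc_..._ cache key with random bytes."""
    return CACHE_KEY.sub(random_same_length, data)


sample = b"var key = '$cdc_asdjflasutopfhvcZLmcfl_';"
patched = randomize_cache_key(sample)
print(patched)
```

Note that the leading `$` is replaced too, so the new key is purely alphanumeric and can never match the pattern again.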
Thank you. It works fine.
@mdmintz Any chance we can get these updates ported to https://github.com/seleniumbase/SeleniumBase uc_mode? This works with a proxy on Windows, while the current UC mode in SeleniumBase fails with a proxy.
The lambda function is bad, IMO. The random prop name's length should also be random for better undetectability.
@mdonova33 Those updates were already ported over and worked for me. See seleniumbase/SeleniumBase#1725 (comment) for usage.
@ultrafunkamsterdam 3.4.0 doesn't fix all of the recent detections and actually adds new vulnerabilities that the antibots will be using to their advantage by next week.
@lukect Indeed, both the 3.4 release and, unfortunately, your fork too are now failing again on familysearch.org. (An Incapsula update in the last hour, it seems.)
If it is detectable, you can be sure they will use it; you can also be sure they're reading this... 😱
Can you provide a snippet? I want to reproduce that.
@fabifont Sorry, probably a false alarm... I was persisting the user data dir, and it looks like something bad happened to it that made it detectable. When I reset that dir, it went back to working fine. I suspect the issue might be switching between "headful" and headless modes whilst using the same user dir, as per this issue. Interestingly, I can operate fine in headful mode, but when I switch to headless mode, it causes the loss of the session, even when I then return to headful mode. Perhaps Incapsula detected me switching between headful and headless mode at one point?
Started getting detected by Cloudflare several hours ago :(
Please always provide a snippet to reproduce the problem. Otherwise I won't consider it reliable.
Yes, it is possible that they are mixing "under attack mode" + "managed turnstile" + "super bot fight mode". If you want to learn more about that, see the Cloudflare documentation.
@fabifont Is Cloudflare actually detecting my fork in any of these modes?
In my own browser, it still showed a CAPTCHA from Cloudflare (no Selenium or any library, just manual browsing).
Yeah, I'm pretty sure Cloudflare isn't detecting anything and the website just shows a CAPTCHA for every browser when in a certain mode. It's not like these checks are intensive or anything: I'm sure they will run every check in every mode.
Yes, I agree.
This PR fixes every recent detection issue (#986).

To use my fork right now (the maintainer seems to be temporarily inactive):

requirements.txt

```
pip install -r requirements.txt
```

pip

```
pip install git+https://github.com/lukect/undetected-chromedriver.git selenium~=4.8.0
```