Adding history entry fails if path contains unicode surrogate code points #2947

Open
jkhsjdhjs opened this issue Oct 15, 2024 · 3 comments
@jkhsjdhjs

SABnzbd version

4.3.3

Operating system

Arch Linux

Using Docker image

None

Description

If the path returned by one_file_or_folder() contains Unicode surrogate code points, adding the history entry fails on download completion with "SQL Command Failed". It looks like the string query parameters are encoded as UTF-8, and that encoding fails on the surrogate character:

workdir_complete = one_file_or_folder(workdir_complete)

(workdir_complete is later used as path in the SQL query)

2024-10-15 14:40:20,591::INFO::[notifier:157] Sending notification: Error - SQL Command Failed, see log (type=error, job_cat=None)
2024-10-15 14:40:20,591::ERROR::[database:154] SQL Command Failed, see log
2024-10-15 14:40:20,591::INFO::[database:155] SQL: INSERT INTO history (completed, name, nzb_name, category, pp, script, report,
            url, status, nzo_id, storage, path, script_log, script_line, download_time, postproc_time, stage_log,
            downloaded, fail_message, url_info, bytes, duplicate_key, md5sum, password)
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
2024-10-15 14:40:20,592::INFO::[database:156] Arguments: (1729003220, 'test', 'test.nzb', '*', 'D', 'Default', '', None, 'Completed', 'SABnzbd_nzo_mfg7581n', '/mnt/test.1/weird.characters.\udc96.txt', '/var/lib/sabnzbd/Downloads/incomplete/test.1', '', '', 0, 0, 'Servers:::server=288 B\r\nDownload:::Downloaded in 0 seconds at an average of 918 B/s<br/>Age: 2m\r\nSource:::test.nzb\r\nRepair:::[test] No par2 sets;[test] Trying RAR-based verification;[test] RAR files verified successfully\r\nUnpack:::[test] Direct Unpack - Unpacked 1 files/folders in 0 seconds', 230, '', '', 230, None, '4da44f19d4f2410a09356163191507b3', None)
2024-10-15 14:40:20,592::INFO::[database:157] Traceback:
Traceback (most recent call last):
  File "/usr/lib/sabnzbd/sabnzbd/database.py", line 128, in execute
    self.cursor.execute(command, args)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc96' in position 50: surrogates not allowed
2024-10-15 14:40:20,592::INFO::[database:308] Added job test to history

(line numbers are off due to me adding debug statements everywhere)
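
The failure can be reproduced outside SABnzbd with nothing but the standard sqlite3 module. This is a minimal sketch, not SABnzbd's code: the single-column history table and the helper name try_insert are assumptions made for illustration, and only the path value is taken from the log above.

```python
import sqlite3


def try_insert(path):
    """Insert `path` into a throwaway in-memory table.

    Returns None on success, or the UnicodeEncodeError raised while
    sqlite3 converts the string parameter to UTF-8.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical one-column stand-in for the real history table.
        conn.execute("CREATE TABLE history (path TEXT)")
        try:
            conn.execute("INSERT INTO history (path) VALUES (?)", (path,))
        except UnicodeEncodeError as exc:
            return exc
        return None
    finally:
        conn.close()


# An ordinary path is stored without problems.
print(try_insert("/mnt/test.1/plain.txt"))

# A path containing a lone surrogate is rejected before reaching SQLite.
print(try_insert("/mnt/test.1/weird.characters.\udc96.txt"))
```

The second call returns the same kind of UnicodeEncodeError as in the traceback, which suggests the problem lies in the parameter's text encoding rather than in the SQL statement itself.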

Here, the filename contains the special character \udc96 (as reported by os.listdir()). Interestingly, the filename reported by the unpacker contains \ufffe\ue096 instead:

2024-10-15 14:40:20,524::INFO::[postproc:482] Unpacked files ['/mnt/_UNPACK_test.1/weird.characters.\ufffe\ue096.txt']
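
For context, \udc96 is what Python's surrogateescape error handler produces for the undecodable byte 0x96 (octal 226). The sketch below decodes the raw filename bytes explicitly instead of relying on the filesystem encoding of a particular machine; the helper name can_encode_utf8 is made up for illustration.

```python
# Raw filename bytes: 0x96 on its own is not valid UTF-8.
raw_name = b"weird.characters.\x96.txt"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
decoded = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(decoded))  # 'weird.characters.\udc96.txt'


def can_encode_utf8(text):
    """Return True if `text` survives a strict UTF-8 encode."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Strict UTF-8 refuses lone surrogates, as in the traceback above.
print(can_encode_utf8(decoded))  # False

# Encoding with the same error handler restores the original bytes.
print(decoded.encode("utf-8", errors="surrogateescape") == raw_name)  # True
```

The string is therefore a lossless stand-in for the on-disk bytes, but it is not valid Unicode text, so any strict encoder will reject it.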

A possible solution would be to extend the functions responsible for sanitizing filenames/foldernames to strip such characters.
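
A rough sketch of what such stripping could look like. The function name strip_surrogates is hypothetical and is not an existing SABnzbd helper; it only cleans the string, so the file on disk would still have to be renamed to match.

```python
def strip_surrogates(name: str) -> str:
    """Drop every code point in the surrogate range U+D800..U+DFFF."""
    return "".join(ch for ch in name if not 0xD800 <= ord(ch) <= 0xDFFF)


cleaned = strip_surrogates("weird.characters.\udc96.txt")
print(cleaned)  # weird.characters..txt

# The cleaned name now passes a strict UTF-8 encode.
cleaned.encode("utf-8")
```

Replacing the surrogate with a placeholder such as "_" instead of dropping it would be an equally plausible choice; which one fits better depends on how the existing sanitizing functions treat other illegal characters.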

Reproduction Steps:

  1. Create a file with $'\226' in its name
  2. Create a RAR archive containing this file
  3. Post it to Usenet
  4. Download it via SABnzbd

touch weird.characters.$'\226'.txt
rar a test weird.characters.$'\226'.txt
nyuu [...] -o test.nzb test.rar
@jkhsjdhjs jkhsjdhjs added the Bug label Oct 15, 2024
@Safihre
Member

Safihre commented Oct 15, 2024

It depends on the OS and filesystem.
See also #1633.

@thezoggy
Contributor

thezoggy commented Oct 15, 2024

Additionally, what's your locale set to where you run sab (output of locale)?
And the disk you're writing this to: what are its mount options and filesystem?

Btw, you can change your logging level to debug from the sab homepage by clicking the wrench and setting it there on the bottom right.

@jkhsjdhjs
Author

The locale is set to en_US.UTF-8:

$ sudo -u sabnzbd locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The disk this is written to is a ZFS subvolume:

rpool/subvol-100-disk-1 on /mnt type zfs (rw,noatime,xattr,posixacl,casesensitive)
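
Independently of what the locale command reports, the Python interpreter can be asked directly which encoding and error handler it uses for filenames. A small sketch using only the sys module; the wrapper name filesystem_encoding_info is made up for illustration.

```python
import sys


def filesystem_encoding_info():
    """Return (encoding, error handler) Python uses to decode filenames."""
    return sys.getfilesystemencoding(), sys.getfilesystemencodeerrors()


print(filesystem_encoding_info())
```

On Linux with a UTF-8 locale this typically reports utf-8 together with surrogateescape, which would be consistent with os.listdir() returning \udc96 for a filename byte that is not valid UTF-8.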
