Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use upstream uchardet #22

Open
dotlambda opened this issue Feb 15, 2023 · 8 comments
Open

Use upstream uchardet #22

dotlambda opened this issue Feb 15, 2023 · 8 comments

Comments

@dotlambda
Copy link

Cchardet currently uses the fork https://github.com/PyYoshi/uchardet of https://gitlab.freedesktop.org/uchardet/uchardet. If I try to compile it with the latter I get

  src/cchardet/_cchardet.cpp: In function ‘PyObject* __pyx_pf_8cchardet_9_cchardet_detect_with_confidence(PyObject*, PyObject*)’:
  src/cchardet/_cchardet.cpp:1495:33: error: ‘uchardet_get_confidence’ was not declared in this scope; did you mean ‘uchardet_get_charset’?
   1495 |   __pyx_v_detected_confidence = uchardet_get_confidence(__pyx_v_ud);
        |                                 ^~~~~~~~~~~~~~~~~~~~~~~
        |                                 uchardet_get_charset
  src/cchardet/_cchardet.cpp: In function ‘PyObject* __pyx_pf_8cchardet_9_cchardet_17UniversalDetector_4feed(__pyx_obj_8cchardet_9_cchardet_UniversalDetector*, PyObject*)’:
  src/cchardet/_cchardet.cpp:1986:42: error: ‘uchardet_get_confidence’ was not declared in this scope; did you mean ‘uchardet_get_charset’?
   1986 |     __pyx_v_self->_detected_confidence = uchardet_get_confidence(__pyx_v_self->_ud);
        |                                          ^~~~~~~~~~~~~~~~~~~~~~~
        |                                          uchardet_get_charset
  src/cchardet/_cchardet.cpp: In function ‘PyObject* __pyx_pf_8cchardet_9_cchardet_17UniversalDetector_6close(__pyx_obj_8cchardet_9_cchardet_UniversalDetector*)’:
  src/cchardet/_cchardet.cpp:2090:42: error: ‘uchardet_get_confidence’ was not declared in this scope; did you mean ‘uchardet_get_charset’?
   2090 |     __pyx_v_self->_detected_confidence = uchardet_get_confidence(__pyx_v_self->_ud);
        |                                          ^~~~~~~~~~~~~~~~~~~~~~~
        |                                          uchardet_get_charset

because the function signatures differ. It would be nice to have cchardet use the upstream project instead because that's shipped by many Linux distributions.

@wbarnha
Copy link
Member

wbarnha commented Feb 15, 2023

I'm definitely in favor of using the upstream uchardet library. I think the original maintainer forked and made slight changes for specific reasons I cannot recall. I've experimented with it in the past and there will be some tinkering required.

@bdraco
Copy link

bdraco commented Feb 19, 2023

It looks like the current wheel build fails because becaues the include dir is missing after a8c5431

     gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fno-semantic-interposition -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/local/include/python3.10 -c src/cchardet/_cchardet.cpp -o build/temp.linux-armv6l-cpython-310/src/cchardet/_cchardet.o
      src/cchardet/_cchardet.cpp:775:10: fatal error: uchardet.h: No such file or directory
        775 | #include "uchardet.h"
            |          ^~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1

@bdraco
Copy link

bdraco commented Feb 19, 2023

It looks like wheel builds will only work with ciwheelbuild

a8c5431#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R16

@bdraco
Copy link

bdraco commented Feb 19, 2023

PEP 517 builds fail

python -m build --wheel --no-isolation

cChardet % python3 -m build --wheel --no-isolation
* Getting dependencies for wheel...
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/pep517/in_process/_in_process.py", line 351, in <module>
    main()
  File "/opt/homebrew/lib/python3.10/site-packages/pep517/in_process/_in_process.py", line 333, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/opt/homebrew/lib/python3.10/site-packages/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
    return hook(config_settings)
  File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=['wheel'])
  File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
    self.run_setup()
  File "/opt/homebrew/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
    exec(code, locals())
  File "<string>", line 9, in <module>
ModuleNotFoundError: No module named 'pkgconfig'

ERROR Backend subprocess exited when trying to invoke get_requires_for_build_wheel

bdraco added a commit to bdraco/vulcan-api that referenced this issue Feb 19, 2023
This lib is an optimization only

Removing it per
kapi2289#126 (comment)

Also PEP 517 wheel builds are currently
failing for the lib
faust-streaming/cChardet#22 (comment)
bdraco added a commit to bdraco/vulcan-api that referenced this issue Feb 19, 2023
This lib is an optimization only

Removing it per
kapi2289#126 (comment)

Also PEP 517 wheel builds are currently
failing for the lib
faust-streaming/cChardet#22 (comment)
@bdraco
Copy link

bdraco commented Feb 20, 2023

I'm not sure its possible to use the upstream lib without causing some regressions since it appears to be missing some of the manually applied fixes.

@wbarnha
Copy link
Member

wbarnha commented Feb 20, 2023

My experience with the upstream libraries right now is that they are currently incompatible with what the fork has. I would have to add a confidence function into the upstream libraries in order to guarantee compatibility with what we have in this library.

@bdraco
Copy link

bdraco commented Feb 20, 2023

Related commits

PyYoshi/uchardet@f1e11d6
PyYoshi/uchardet@887584b

@mcepl
Copy link

mcepl commented Feb 26, 2023

I'm not sure its possible to use the upstream lib without causing some regressions since it appears to be missing some of the manually applied fixes.

Well, I don’t see any merge requests (or even issue tickets) to add there those fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants