DownloadResults with ZERO WIDTH NO-BREAK SPACE #52

Open
stravag opened this issue Dec 1, 2016 · 7 comments

@stravag

stravag commented Dec 1, 2016

Hi

We've been using the Android sample code to communicate with ABBYY OCR. However, we noticed that the plain-text result of the downloadResults request always begins with a ZERO WIDTH NO-BREAK SPACE (U+FEFF) character, which we had to remove.

Why is that character in the response?

Our current "solution" is to remove the character in ResultsActivity:

contents.toString().replaceFirst("\uFEFF", "")
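
Note that replaceFirst removes the first U+FEFF wherever it occurs in the text, not only at the very start. A minimal sketch of a stricter variant follows; the class and method names are hypothetical and not part of the sample code. It drops the character only when it is the first one:

```java
// Hypothetical helper, not part of the ABBYY sample code.
final class BomStripper {
    // U+FEFF acts as a byte order mark only when it is the first character.
    private static final char BOM = '\uFEFF';

    private BomStripper() {}

    /** Returns the text without a single leading U+FEFF; later occurrences are kept. */
    static String stripLeadingBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }
}
```

In ResultsActivity this could be applied as `BomStripper.stripLeadingBom(contents.toString())`, assuming `contents` holds the decoded text as in the line above.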

@Dmitry-Me
Contributor

Dmitry-Me commented Dec 1, 2016

I assume that you're describing the following scenario:

  • your application invokes getTaskStatus and obtains an XML describing a completed task
  • that XML contains a URL of a result file
  • you download said result file from the URL and it's a Unicode text file starting with 0xFEFF

In this case the FEFF is a byte order mark (https://en.wikipedia.org/wiki/Byte_order_mark) indicating that it's UTF-16 big endian.
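
Which encoding a BOM signals depends on the raw bytes at the start of the file: EF BB BF for UTF-8, FE FF for UTF-16 big endian, FF FE for UTF-16 little endian. The following is a minimal sketch, with hypothetical names, of checking those bytes before decoding. It assumes the whole result file is already in memory as a byte array and does not try to recognise UTF-32:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ABBYY sample code.
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset signalled by a leading BOM, or null if none is recognised. */
    static Charset charsetFromBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // Limitation: a UTF-32LE BOM (FF FE 00 00) would also match this check.
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    /** Decodes the bytes, skipping a recognised BOM; uses the fallback charset otherwise. */
    static String decode(byte[] data, Charset fallback) {
        Charset detected = charsetFromBom(data);
        if (detected == null) {
            return new String(data, fallback);
        }
        int bomLength = detected.equals(StandardCharsets.UTF_8) ? 3 : 2;
        return new String(data, bomLength, data.length - bomLength, detected);
    }
}
```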

@stravag
Author

stravag commented Dec 23, 2016

Yes, thanks for the reply, that's exactly it! After further research I found an old known bug in Java: it can't deal with BOMs automatically, so they have to be removed manually. http://bugs.java.com/view_bug.do?bug_id=4508058

It would be good if you could change the sample code to incorporate that. https://github.com/abbyysdk/ocrsdk.com/blob/master/Android/src/abbyy/ocrsdk/android/ResultsActivity.java

Apache Commons IO, for instance, provides a BOMInputStream.
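
If adding a library is not an option, the same idea can be sketched with the standard library alone. The example below is not BOMInputStream itself but a hypothetical stand-in that assumes the text result is UTF-8: it peeks at the first three bytes of the stream and pushes them back unless they form a UTF-8 BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ABBYY sample code or of Commons IO.
final class Utf8BomSkippingReader {
    private Utf8BomSkippingReader() {}

    /** Wraps the stream in a UTF-8 reader, consuming a leading EF BB BF if present. */
    static Reader open(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: give the bytes back
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    /** Convenience wrapper for in-memory data; rethrows I/O errors unchecked. */
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = open(new ByteArrayInputStream(bytes))) {
            char[] buf = new char[1024];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The stream-level approach keeps the BOM handling in the code that reads text, so binary results such as PDF are never touched.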

@Dmitry-Me
Contributor

Which exportFormat do you use when you invoke /processImage?

@stravag
Author

stravag commented Dec 28, 2016

TXT Format: language=English&exportFormat=txt

@Dmitry-Me
Contributor

Where's the downloadResults method in the sample code?

@Dmitry-Me
Contributor

Dmitry-Me commented Dec 29, 2016

It's downloadResult (no s); that's why I asked. This method handles any type of result, including PDF, RTF and others. Also, even if the result is text, there's no guarantee that it will then be read and processed with Java code. So I'm not sure it's a good idea to do anything with the output file in the generic Android sample.
