Wrong encoding detection while reading a text from the MS Word document #2713

Open
2 tasks
xm74 opened this issue Dec 4, 2024 · 0 comments


xm74 commented Dec 4, 2024

Describe the bug and add attachments

I encountered an issue with some MsDoc documents: parsing the document structure returns text in the wrong encoding.
Here is one such document and the code used for parsing. In this case, instead of the original text "Test credit card 4316020000630490 Thank you", it returns "敔瑳挠敲楤⁴慣摲㐠ㄳ〶〲〰㘰〳㤴ര桔湡".
Unfortunately, GitHub does not allow attaching this file to the issue, so here is a link to download it.
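
The garbled output looks like the document's 8-bit text being read as 16-bit (UTF-16LE) code units: each reported character packs two of the original bytes. The sketch below only checks that observation by converting the reported string back to its raw bytes. It assumes the mbstring extension is available and that the string is copied exactly from this report. It says nothing about where in the reader the mix-up happens.

```php
<?php
// Garbled string exactly as reported above.
$garbled = '敔瑳挠敲楤⁴慣摲㐠ㄳ〶〲〰㘰〳㤴ര桔湡';

// Re-encode the UTF-8 string as UTF-16LE to get back the raw bytes
// that appear to have been misread as 16-bit code units.
$bytes = mb_convert_encoding($garbled, 'UTF-16LE', 'UTF-8');

// Word stores paragraph marks as \r; show them as spaces for display.
echo str_replace("\r", ' ', $bytes), PHP_EOL;
```

This prints the beginning of the expected text ("Test credit card 4316020000630490 Than"). The output stops early, which would be consistent with the reader consuming two bytes per character, though that part is a guess.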

Expected behavior

The text is returned in the correct encoding.

Steps to reproduce

Here is the sample (simplified) code used for parsing.

$text = '';

// Load the legacy .doc file with the MsDoc reader.
$parser = \PhpOffice\PhpWord\IOFactory::load('/path/to/base_c1.doc', 'MsDoc');
foreach ($parser->getSections() as $section) {
	parse_document($section->getElements(), $text);
}

// Recursively append the text of each element to $text, one element per line.
function parse_document($elements, &$text) {
	foreach ($elements as $element) {
		if (method_exists($element, 'getText')) {
			$text .= $element->getText() . PHP_EOL;
		} else {
			// Only text runs are descended into; other containers are skipped.
			if ($element instanceof \PhpOffice\PhpWord\Element\TextRun) {
				parse_document($element->getElements(), $text);
			}
			$text .= PHP_EOL;
		}
	}
}
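
As a possible stopgap until the reader is fixed, the extracted text could be post-processed. The helper below is a hypothetical sketch, not part of PHPWord: `undo_utf16_misread` is a name made up for this example. It assumes the mbstring extension and only recovers text whose original bytes were plain ASCII. It is a heuristic and could misfire on genuine CJK text that happens to map onto ASCII byte pairs.

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): if $text, re-encoded as
 * UTF-16LE, consists solely of printable ASCII and whitespace bytes,
 * assume it was mis-decoded and return those bytes. Otherwise return
 * $text unchanged.
 */
function undo_utf16_misread(string $text): string
{
    $bytes = mb_convert_encoding($text, 'UTF-16LE', 'UTF-8');

    // Correctly decoded ASCII text produces NUL high bytes here,
    // so it fails this check and is returned untouched.
    if ($bytes !== '' && preg_match('/^[\x09\x0A\x0D\x20-\x7E]+$/', $bytes) === 1) {
        return $bytes;
    }

    return $text;
}
```

In the sample code above it would be applied to each value returned by `getText()` before appending it to `$text`.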

PHPWord version(s) where the bug happened

1.3.0

PHP version(s) where the bug happened

8.1

Priority

  • I want to crowdfund the bug fix (with @algora-io) and fund a community developer.
  • I want to pay for the bug fix and fund a maintainer for that. (Contact @Progi1984)
xm74 added the Bug Report label Dec 4, 2024