Support for xml files #18

Open
cscotti opened this issue Jun 21, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@cscotti

cscotti commented Jun 21, 2023

It would be nice if you could implement XML support so that versions of XML files can be compared.
Currently, we get an "Unsupported language: xml" message.

@cscotti cscotti added the enhancement New feature or request label Jun 21, 2023
@mmueller2012
Contributor

Can you tell me a bit more about your use case? What kind of XML files do you use and which changes do you want SemanticDiff to hide?

The issue with XML is that the spec only defines when a file is well-formed. It doesn't specify which changes you can make without changing the meaning. While most applications will work fine if you reorder children inside a parent or add whitespace for readability, this technically modifies the content. The only exception is attributes, which the spec allows you to reorder. It is therefore difficult to implement a generic XML mode.
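
To make the distinction above concrete, here is a minimal Python sketch of a strict comparison that treats attribute order as irrelevant but counts child order and whitespace as content. The function name `same_element` is made up for illustration, and this is not how SemanticDiff works internally.

```python
import xml.etree.ElementTree as ET


def same_element(a: ET.Element, b: ET.Element) -> bool:
    """Strict comparison: attribute order is ignored, everything else must match."""
    if a.tag != b.tag:
        return False
    # Element.attrib is a dict, so comparing it ignores attribute order.
    if a.attrib != b.attrib:
        return False
    # Text and tail are compared exactly: whitespace counts as content here.
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    # Children are compared pairwise, so their order matters.
    if len(a) != len(b):
        return False
    return all(same_element(x, y) for x, y in zip(a, b))


# Only the attribute order differs.
reordered_attrs = same_element(
    ET.fromstring('<item id="1" name="a"/>'),
    ET.fromstring('<item name="a" id="1"/>'),
)
# Only the child order differs.
reordered_children = same_element(
    ET.fromstring("<list><a/><b/></list>"),
    ET.fromstring("<list><b/><a/></list>"),
)
print(reordered_attrs, reordered_children)
```

Whether reordered children or extra whitespace should also count as "the same" depends on the application that reads the file, which is why a generic mode is hard to define.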

@ebw44

ebw44 commented Sep 25, 2023

XML support would be useful for Twincat files. Twincat is automation software by Beckhoff that saves configuration as well as code in XML files. It tends to modify these XML files for no apparent reason, which makes source control of a project painful. Here are a few examples of changes it makes that show up as differences in a regular text-based diff:

  • moves items around in a list
  • changes a hash attribute because the order differs
  • changes the ID attribute of elements
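
One way to treat the changes listed above as noise is to compare elements as unordered collections while skipping the volatile attributes. The sketch below is only an illustration: the attribute names `Id` and `Hash` and the sample documents are hypothetical, not taken from real Twincat files, and ignoring child order is only safe for formats where order carries no meaning.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical attribute names; which attributes are volatile depends on the format.
VOLATILE_ATTRS = {"Id", "Hash"}


def fingerprint(elem: ET.Element) -> tuple:
    """Hashable summary of an element that ignores child order and volatile attributes."""
    attrs = tuple(
        sorted((k, v) for k, v in elem.attrib.items() if k not in VOLATILE_ATTRS)
    )
    # Count identical children so duplicates are still distinguished from singles.
    children = Counter(fingerprint(child) for child in elem)
    # Sort by repr to get a canonical, hashable form of the child multiset.
    child_key = tuple(sorted(children.items(), key=repr))
    return (elem.tag, attrs, (elem.text or "").strip(), child_key)


old = ET.fromstring('<List Hash="abc"><Item Id="1">A</Item><Item Id="2">B</Item></List>')
new = ET.fromstring('<List Hash="xyz"><Item Id="7">B</Item><Item Id="8">A</Item></List>')
edited = ET.fromstring('<List Hash="xyz"><Item Id="7">B</Item><Item Id="8">C</Item></List>')

print(fingerprint(old) == fingerprint(new))
print(fingerprint(old) == fingerprint(edited))
```

The first comparison differs only in order, hash, and IDs, so the fingerprints match; the second contains a real text change, so they do not.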

@GeirGrusom

I find that XML files are something most diff tools handle poorly. I often run into issues when merging lists that contain the same element multiple times:

<foo>
  <bar>
  </bar>
  <bar>
  </bar>
</foo>

When there is a merge conflict involving a <bar>, it's easy to end up with two </bar> tags right after each other, since diff tools don't understand or care that the order matters.

I most often encounter this in C# project files and Visual Studio solution files.
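
A duplicated closing tag like the one described above makes the file ill-formed, so it can at least be caught after a merge by running the result through any XML parser. A minimal Python sketch, with `is_well_formed` as a made-up helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


clean_merge = "<foo><bar></bar><bar></bar></foo>"
# A merge that dropped one <bar> but kept both closing tags.
broken_merge = "<foo><bar></bar></bar></foo>"

print(is_well_formed(clean_merge), is_well_formed(broken_merge))
```

This only detects structural breakage. It cannot tell whether a well-formed merge result still means what both sides intended.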

@MakotoE

MakotoE commented Jan 25, 2024

I often change the whitespace in XML files, whether tabs or newlines. If SemanticDiff supported XML files, it would make it much easier to ignore whitespace-only changes.
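
As a rough illustration of what ignoring such changes can mean, the Python sketch below drops whitespace-only text between tags before comparing two documents. The helper names are made up, and the approach assumes indentation is never significant, which does not hold for mixed content or for elements marked with `xml:space="preserve"`.

```python
import xml.etree.ElementTree as ET


def strip_layout_whitespace(elem: ET.Element) -> ET.Element:
    """Remove whitespace-only text between tags, in place, and return the element."""
    if elem.text is not None and not elem.text.strip():
        elem.text = None
    if elem.tail is not None and not elem.tail.strip():
        elem.tail = None
    for child in elem:
        strip_layout_whitespace(child)
    return elem


def canonical(xml_text: str) -> bytes:
    """Serialize the document after dropping indentation and line breaks."""
    return ET.tostring(strip_layout_whitespace(ET.fromstring(xml_text)))


compact = "<Var><Name>Dummy1</Name><Type>BYTE</Type></Var>"
indented = "<Var>\n\t<Name>Dummy1</Name>\n\t<Type>BYTE</Type>\n</Var>"
retyped = "<Var>\n\t<Name>Dummy1</Name>\n\t<Type>BOOL</Type>\n</Var>"

print(canonical(compact) == canonical(indented))
print(canonical(compact) == canonical(retyped))
```

Text that contains anything other than whitespace is left untouched, so a real change such as BYTE to BOOL still shows up.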

@mmueller2012
Contributor

We just released SemanticDiff 0.10.0 which adds support for XML. Please give it a try and report back if it works for you.

@ebw44

ebw44 commented Jan 10, 2025

I tried SemanticDiff on a few XML files from a Twincat project and it does a decent job of finding randomly moved XML elements, even in big files (1.8 MB). However, it sometimes doesn't match the moved regions as well as the VSCode diff. Below is an example:

<?xml version="1.0"?>
<TcSmProject xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.beckhoff.com/schemas/2012/07/TcSmProject" TcSmVersion="1.0" TcVersion="3.1.4024.35">
	<Var>
		<Name>Dummy1</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Moved1</Name>
		<Type>BOOL</Type>
	</Var>
	<Var>
		<Name>Dummy2</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy3</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy4</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Moved2</Name>
		<Type>BOOL</Type>
	</Var>
	<Var>
		<Name>Dummy5</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy6</Name>
		<Type>BYTE</Type>
	</Var>
</TcSmProject>

<?xml version="1.0"?>
<TcSmProject xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.beckhoff.com/schemas/2012/07/TcSmProject" TcSmVersion="1.0" TcVersion="3.1.4024.35">
	<Var>
    	<Name>Added</Name>
    	<Type GUID="{8C90EA60-DC50-8918-7405-4F6098F75AE7}">ARRAY [0..2] OF BYTE</Type>
    </Var>
	<Var>
		<Name>Dummy1</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy2</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy3</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Moved1</Name>
		<Type>BOOL</Type>
	</Var>
	<Var>
		<Name>Dummy4</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy5</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Dummy6</Name>
		<Type>BYTE</Type>
	</Var>
	<Var>
		<Name>Moved2</Name>
		<Type>BOOL</Type>
	</Var>
</TcSmProject>

VSCode finds the minimal moved regions:
[screenshot of the VSCode diff]
SemanticDiff, however, detects a single moved region plus some changes and misses the second moved region:
[screenshot of the SemanticDiff diff]

PS: The online tool finds both moved regions, so this may already have been improved.
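
For reference, the expected result for the example above can be reproduced with a small longest-common-subsequence sketch in Python (3.9 or later). This is a textbook approach chosen for illustration, not the algorithm used by SemanticDiff or VSCode; the `<Name>` values stand in for the whole `<Var>` elements, and `lcs` and `classify` are made-up names.

```python
def lcs(a: list[str], b: list[str]) -> list[str]:
    """Longest common subsequence via the classic dynamic-programming table."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover one subsequence.
    result, i, j = [], 0, 0
    while i < rows and j < cols:
        if a[i] == b[j]:
            result.append(a[i])
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


def classify(old: list[str], new: list[str]) -> dict[str, list[str]]:
    """Items outside the LCS that exist on both sides are treated as moved."""
    kept = set(lcs(old, new))
    old_set, new_set = set(old), set(new)
    return {
        "moved": [n for n in old if n in new_set and n not in kept],
        "added": [n for n in new if n not in old_set],
        "removed": [n for n in old if n not in new_set],
    }


old_names = ["Dummy1", "Moved1", "Dummy2", "Dummy3", "Dummy4", "Moved2", "Dummy5", "Dummy6"]
new_names = ["Added", "Dummy1", "Dummy2", "Dummy3", "Moved1", "Dummy4", "Dummy5", "Dummy6", "Moved2"]

result = classify(old_names, new_names)
print(result)
```

For these two files the six Dummy entries form the common subsequence, which leaves Moved1 and Moved2 as moved and Added as new. The sketch relies on the names being unique, as they are in this example.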

@mmueller2012
Contributor

@ebw44 Thanks for pointing this out. We have debugged the issue and it turns out to be a difference between the Windows and Linux versions of SemanticDiff. One of the libraries we use behaves slightly differently depending on the operating system (stable vs. unstable sorting). The diff is valid in both cases (no changes get lost), but one might be slightly easier to understand than the other. We will fix this in the next release.
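
To illustrate the stable vs. unstable sorting effect in general terms, here is a small Python sketch. The comment above does not say which library is involved or what is being sorted, so the candidate list and scores below are purely hypothetical. The point is only that entries with equal sort keys may come out in a different order unless the sort is stable or the ties are broken explicitly.

```python
# Hypothetical candidate matches: (element name, similarity score).
candidates = [("Moved1", 0.9), ("Moved2", 0.9), ("Dummy1", 0.5)]
# The same candidates arriving in a different input order.
shuffled = [candidates[1], candidates[0], candidates[2]]

# Sorting by score alone leaves the two 0.9 entries tied. Python's sorted() is
# stable, so ties keep their input order; an unstable sort may swap them.
by_score = sorted(candidates, key=lambda c: -c[1])
by_score_shuffled = sorted(shuffled, key=lambda c: -c[1])

# Adding the name as a secondary key removes the tie, so the result no longer
# depends on input order or on whether the sort implementation is stable.
deterministic = sorted(candidates, key=lambda c: (-c[1], c[0]))
deterministic_shuffled = sorted(shuffled, key=lambda c: (-c[1], c[0]))

print(by_score == by_score_shuffled)
print(deterministic == deterministic_shuffled)
```

With the score-only key the two results differ in the order of the tied entries, while the key with a tie-breaker gives the same order both times.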


5 participants