Add arabic normalizer for search #21724

ivanPyrohivskyi · 2025-01-14T16:13:45Z

To issue #11709

vshcherb · 2025-01-14T16:25:55Z

OsmAnd-java/src/main/java/net/osmand/CollatorStringMatcher.java

 	public static boolean cmatches(Collator collator, String fullName, String part, StringMatcherMode mode){
+		String withoutDiacritic = ArabicNormalizer.normalize(fullName);


Very inefficient: too many calls per comparison, plus the overhead of 10 replaceAll calls.

Yes, exactly. Some hints:

Simplify the Kashida replacement by using 'replace' instead of 'replaceAll'; it is more efficient when no regex is needed.

Use a single map-based lookup to reduce the redundancy of multiple 'replace' calls.

Use a Stream to initialize the character replacements; this makes the code easier to maintain.
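To illustrate the first hint, here is a minimal sketch comparing the two calls on the Kashida (tatweel, U+0640) character. The class and method names are hypothetical and not part of the PR; the point is only that both produce the same output, while `replace` treats its argument as a literal and `replaceAll` treats it as a regular expression.

```java
// Hypothetical helper, for illustration only (not code from the PR).
final class KashidaStripDemo {
    private KashidaStripDemo() {}

    // Regex-based: the first argument is interpreted as a regular expression.
    static String stripWithReplaceAll(String s) {
        return s.replaceAll("\u0640", "");
    }

    // Literal: the first argument is matched as plain text, no regex semantics.
    static String stripWithReplace(String s) {
        return s.replace("\u0640", "");
    }
}
```

Since the Kashida is a single fixed character, nothing here needs regex semantics, so the literal variant is the simpler choice.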

How's this:

import java.text.Normalizer;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ArabicNormalizer {
	private static final Pattern DIACRITICS_PATTERN = Pattern.compile("\\p{Mn}");
	private static final Map<String, String> CHAR_REPLACEMENTS = Stream.of(new String[][] {
			{"إ", "ا"},
			{"أ", "ا"},
			{"ئ", "ي"},
			{"ؤ", "و"},
			{"آ", "ا"},
			{"ى", "ي"},
			{"ة", "ه"}
	}).collect(Collectors.toMap(data -> data[0], data -> data[1]));

	public static String normalize(String text) {
		if (text == null) {
			return null; // Handle null input
		}
		String normalized = Normalizer.normalize(text, Normalizer.Form.NFD);
		// Remove diacritics efficiently
		normalized = DIACRITICS_PATTERN.matcher(normalized).replaceAll("");
		// Hamza and other normalizations
		for (Map.Entry<String, String> entry : CHAR_REPLACEMENTS.entrySet()) {
			normalized = normalized.replace(entry.getKey(), entry.getValue());
		}
		// Kashida
		normalized = normalized.trim().replace("\u0640", "");
		return normalized;
	}
}
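The version above still walks the string once per map entry. As a possible alternative addressing the "too many calls" concern, here is a rough single-pass sketch. It is an assumption about how this could be done, not the code that was merged: the class name is hypothetical, it skips NFD decomposition and instead drops any BMP character whose type is a non-spacing mark (so precomposed Latin accents are left alone, unlike the `\p{Mn}` approach after NFD), and it allocates a buffer only when the input actually needs changing.

```java
// Hypothetical single-pass variant, for illustration only (not code from the PR).
final class SinglePassArabicNormalizer {
    private static final char KASHIDA = '\u0640';

    private SinglePassArabicNormalizer() {}

    static String normalize(String text) {
        if (text == null) {
            return null;
        }
        StringBuilder out = null; // allocated lazily, only once a change is needed
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Drop the Kashida and combining marks (harakat are non-spacing marks).
            boolean drop = c == KASHIDA || Character.getType(c) == Character.NON_SPACING_MARK;
            char folded = drop ? c : fold(c);
            if (out == null && (drop || folded != c)) {
                // First character that changes: copy the untouched prefix.
                out = new StringBuilder(text.length()).append(text, 0, i);
            }
            if (out != null && !drop) {
                out.append(folded);
            }
        }
        return (out == null ? text : out.toString()).trim();
    }

    // Same letter mappings as the CHAR_REPLACEMENTS table in the suggestion above.
    private static char fold(char c) {
        switch (c) {
            case '\u0622': // alef with madda
            case '\u0623': // alef with hamza above
            case '\u0625': // alef with hamza below
                return '\u0627'; // plain alef
            case '\u0624': // waw with hamza
                return '\u0648'; // waw
            case '\u0626': // yeh with hamza
            case '\u0649': // alef maksura
                return '\u064A'; // yeh
            case '\u0629': // teh marbuta
                return '\u0647'; // heh
            default:
                return c;
        }
    }
}
```

With this shape, a name that contains no Arabic letters or combining marks goes through one scan and no buffer allocation, which matters if `cmatches` calls the normalizer on every comparison.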

Add arabic normalizer for search

40e4d56

vshcherb reviewed Jan 14, 2025


Refactoring

f3fde36

ivanPyrohivskyi marked this pull request as draft January 15, 2025 17:05

ivanPyrohivskyi marked this pull request as ready for review January 15, 2025 17:51

sonora Jan 14, 2025
