From b7d6d068659c9a7c6a3f5459cdfb93f6834c9208 Mon Sep 17 00:00:00 2001
From: ripley
Date: Mon, 2 Oct 2023 05:26:41 +0000
Subject: [PATCH] more tweaks for macOS 14

git-svn-id: https://svn.r-project.org/R/trunk@85246 00db46b3-68df-0310-9c12-caf00c1e9a41
---
 src/library/base/man/iconv.Rd | 36 +++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/src/library/base/man/iconv.Rd b/src/library/base/man/iconv.Rd
index bb18643abb4..4b21f05fcb1 100644
--- a/src/library/base/man/iconv.Rd
+++ b/src/library/base/man/iconv.Rd
@@ -78,21 +78,23 @@ iconvlist()
 \section{Implementation Details}{
   There are three main implementations of \code{iconv} in use. Linux's
   most common C runtime, \samp{glibc}, contains one. Several platforms
-  supply versions of GNU \samp{libiconv}, including macOS and FreeBSD, in
-  some cases with additional encodings. On Windows we use a version of
-  Yukihiro Nakadaira's \samp{win_iconv}, which is based on Windows'
-  codepages. (We have added many encoding names for compatibility with
-  other systems.) All three have \code{iconvlist}, ignore case in
-  encoding names and support \samp{//TRANSLIT} (but with different
-  results, and for \samp{win_iconv} currently a \sQuote{best fit}
-  strategy is used except for \code{to = "ASCII"}).
+  supply versions or emulations of GNU \samp{libiconv}, including macOS
+  and FreeBSD, in some cases with additional encodings. On Windows we
+  use a version of Yukihiro Nakadaira's \samp{win_iconv}, which is based
+  on Windows' codepages. (We have added many encoding names for
+  compatibility with other systems.) All three have \code{iconvlist},
+  ignore case in encoding names and support \samp{//TRANSLIT} (but with
+  different results, and for \samp{win_iconv} currently a \sQuote{best
+  fit} strategy is used except for \code{to = "ASCII"}).
 
 %% https://github.com/apple-oss-distributions/libiconv/blob/libiconv-80.1.1/citrus/iconv.h
-  The macOS implementation is attributed to the \sQuote{Citrus Project}:
-  the Apple sources declare it as \sQuote{compatible} with GNU \samp{libiconv}
-  1.11 from 2006. However, it is has diverged notably since and in macOS
-  14 uses a \sQuote{best fit} for conversions at least for \code{to =
-  "ASCII"} and \code{to = "latin1"}.
+  The macOS 14 implementation is attributed to the \sQuote{Citrus
+  Project}: the Apple headers declare it as \sQuote{compatible} with GNU
+  \samp{libiconv} 1.11 from 2006. However, it differs in
+  significant ways including using a \sQuote{best fit} for conversions,
+  at least for \code{to = "ASCII"} and \code{to = "latin1"}. (Earlier
+  versions of macOS used GNU \samp{libiconv} 1.11. It seems this
+  implementation is also used in recent versions of FreeBSD.)
 
   Most commercial Unixes contain an implementation of \code{iconv} but
   none we have encountered have supported the encoding names we need:
@@ -108,8 +110,7 @@ iconvlist()
 
   There are other implementations, e.g.\sspace{} NetBSD has used one from
   the Citrus project (which does not support \samp{//TRANSLIT}) and there is
-  an older FreeBSD port (\samp{libiconv} is usually used there): it has
-  not been reported whether or not these work with \R.
+  an older FreeBSD port.
 
   Note that you cannot rely on invalid inputs being detected,
   especially for \code{to = "ASCII"} where some implementations allow 8-bit
@@ -117,7 +118,7 @@ iconvlist()
   substitution.
 
   Some of the implementations have interesting extra encodings: for
-  example GNU \samp{libiconv} allows \code{to = "C99"} to use
+  example GNU \samp{libiconv} and macOS 14 allow \code{to = "C99"} to use
   \samp{\\uxxxx} escapes (or if needed \samp{\Uuxxxxxxxx}) for non-ASCII
   characters.
 }
@@ -215,6 +216,9 @@ Encoding(x) <- "latin1"
 x
 try(iconv(x, "latin1", "ASCII//TRANSLIT")) # platform-dependent
 iconv(x, "latin1", "ASCII", sub = "byte")
+## glibc gives "Ekstroem" "Joreskog" "bisschen Zurcher"
+## macOS 14 gives "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
+
 ## and for Windows' 'Unicode'
 str(xx <- iconv(x, "latin1", "UTF-16LE", toRaw = TRUE))
 iconv(xx, "UTF-16LE", "UTF-8")