more tweaks for macOS 14
git-svn-id: https://svn.r-project.org/R/trunk@85246 00db46b3-68df-0310-9c12-caf00c1e9a41
ripley committed Oct 2, 2023
1 parent a7de8d1 commit b7d6d06
Showing 1 changed file with 20 additions and 16 deletions.
36 changes: 20 additions & 16 deletions src/library/base/man/iconv.Rd
@@ -78,21 +78,23 @@ iconvlist()
 \section{Implementation Details}{
 There are three main implementations of \code{iconv} in use. Linux's
 most common C runtime, \samp{glibc}, contains one. Several platforms
-supply versions of GNU \samp{libiconv}, including macOS and FreeBSD, in
-some cases with additional encodings. On Windows we use a version of
-Yukihiro Nakadaira's \samp{win_iconv}, which is based on Windows'
-codepages. (We have added many encoding names for compatibility with
-other systems.) All three have \code{iconvlist}, ignore case in
-encoding names and support \samp{//TRANSLIT} (but with different
-results, and for \samp{win_iconv} currently a \sQuote{best fit}
-strategy is used except for \code{to = "ASCII"}).
+supply versions or emulations of GNU \samp{libiconv}, including macOS
+and FreeBSD, in some cases with additional encodings. On Windows we
+use a version of Yukihiro Nakadaira's \samp{win_iconv}, which is based
+on Windows' codepages. (We have added many encoding names for
+compatibility with other systems.) All three have \code{iconvlist},
+ignore case in encoding names and support \samp{//TRANSLIT} (but with
+different results, and for \samp{win_iconv} currently a \sQuote{best
+fit} strategy is used except for \code{to = "ASCII"}).

 %% https://github.com/apple-oss-distributions/libiconv/blob/libiconv-80.1.1/citrus/iconv.h
-The macOS implementation is attributed to the \sQuote{Citrus Project}:
-the Apple sources declare it as \sQuote{compatible} with GNU \samp{libiconv}
-1.11 from 2006. However, it is has diverged notably since and in macOS
-14 uses a \sQuote{best fit} for conversions at least for \code{to =
-"ASCII"} and \code{to = "latin1"}.
+The macOS 14 implementation is attributed to the \sQuote{Citrus
+Project}: the Apple headers declare it as \sQuote{compatible} with GNU
+\samp{libiconv} 1.11 from 2006. However, it differs in
+significant ways including using a \sQuote{best fit} for conversions,
+at least for \code{to = "ASCII"} and \code{to = "latin1"}. (Earlier
+versions of macOS used GNU \samp{libiconv} 1.11. It seems this
+implementation is also used in recent versions of FreeBSD.)

 Most commercial Unixes contain an implementation of \code{iconv} but
 none we have encountered have supported the encoding names we need:
@@ -108,16 +110,15 @@ iconvlist()

 There are other implementations, e.g.\sspace{} NetBSD has used one from the
 Citrus project (which does not support \samp{//TRANSLIT}) and there is
-an older FreeBSD port (\samp{libiconv} is usually used there): it has
-not been reported whether or not these work with \R.
+an older FreeBSD port.

 Note that you cannot rely on invalid inputs being detected, especially
 for \code{to = "ASCII"} where some implementations allow 8-bit
 characters and pass them through unchanged or with transliteration or
 substitution.

 Some of the implementations have interesting extra encodings: for
-example GNU \samp{libiconv} allows \code{to = "C99"} to use
+example GNU \samp{libiconv} and macOS 14 allow \code{to = "C99"} to use
 \samp{\\uxxxx} escapes (or if needed \samp{\Uuxxxxxxxx}) for
 non-ASCII characters.
 }
@@ -215,6 +216,9 @@ Encoding(x) <- "latin1"
 x
 try(iconv(x, "latin1", "ASCII//TRANSLIT")) # platform-dependent
 iconv(x, "latin1", "ASCII", sub = "byte")
+## glibc gives "Ekstroem" "Joreskog" "bisschen Zurcher"
+## macOS 14 gives "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
 ## and for Windows' 'Unicode'
 str(xx <- iconv(x, "latin1", "UTF-16LE", toRaw = TRUE))
 iconv(xx, "UTF-16LE", "UTF-8")
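
A minimal sketch, not part of this commit, of how one might probe the local iconv implementation for the platform differences the documentation text describes. The helper name probe_iconv is hypothetical, and the latin1 test strings are an assumption chosen to match the outputs quoted in the example comments. No expected output is shown because the results depend on the implementation (glibc, GNU libiconv, macOS 14, win_iconv).

```r
## Hypothetical helper (not in R itself): run the same conversions under
## three strategies so the results can be compared across platforms.
probe_iconv <- function(x, from = "latin1") {
  list(
    ## //TRANSLIT is supported by the main implementations, but with
    ## different results; guard against an unsupported conversion.
    translit = tryCatch(iconv(x, from, "ASCII//TRANSLIT"),
                        error = function(e) rep(NA_character_, length(x))),
    ## Plain conversion: some implementations return NA for unconvertible
    ## input, others apply a 'best fit' substitution.
    plain = iconv(x, from, "ASCII"),
    ## sub = "byte" replaces unconvertible bytes with <xx> hex escapes.
    bytes = iconv(x, from, "ASCII", sub = "byte")
  )
}

## Assumed test input: latin1-encoded names with non-ASCII characters.
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"

res <- probe_iconv(x)
str(res)
```

Comparing res$plain with res$translit on a given machine gives a quick indication of whether the implementation rejects unconvertible input or silently substitutes a 'best fit'.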