[ast-developers] Renaming the C.UTF-8 builtin locale?

Discussion:

Lionel Cons

2013-09-25 18:47:24 UTC

Here's a request from one of my teams:
Can the AST built-in locale C.UTF-8 be renamed to a different name,
for example builtin_C.UTF-8 to avoid collisions with a system's native
C.UTF-8 locale (apparently they have the problem at least on AIX)
which may have similar but different properties?

Lionel

Glenn Fowler

2013-09-25 19:48:08 UTC

Post by Lionel Cons
Can the AST built-in locale C.UTF-8 be renamed to a different name,
for example builtin_C.UTF-8 to avoid collisions with a system's native
C.UTF-8 locale (apparently they have the problem at least on AIX)
which may have similar but different properties?

I wasn't aware of any system with a C.UTF-8 locale
is there a url describing what it means on AIX?

Lionel Cons

2013-09-25 20:03:33 UTC

Post by Glenn Fowler

I wasn't aware of any system with a C.UTF-8 locale
is there a url describing what it means on AIX?

Unfortunately I'm just the messenger, but I'll forward the question. What
I understand is that C.UTF-8 is a C locale with UTF-8 multibyte
characters, but the differences start with the character classes and
other details.

Lionel

Glenn Fowler

2013-09-25 20:46:46 UTC

Post by Lionel Cons

Post by Glenn Fowler

I wasn't aware of any system with a C.UTF-8 locale
is there a url describing what it means on AIX?

a question for the list:

for the C.UTF-8 locale ast overrides the [:alpha:] class and the wcwidth() print width function
this is to provide a test locale that is consistent across platforms

it was dgk's and my impression that [:alpha:] and wcwidth() are language neutral
and thus one set of lookup tables for unicode [:alpha:] and wcwidth() should work

is there wiggle room in the Unicode spec for an implementation to change what
[:alpha:] and wcwidth() mean?

if the answer is no then we can ask why ast and AIX differ on [:alpha:] and wcwidth()
(and any other differences too) and fix ast if it's missing something

if the answer is yes then we'll have to use a name different from C.UTF-8 as Lionel suggests
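
To make the "one set of lookup tables" idea concrete, here is a minimal sketch in Python of a locale-independent [:alpha:] test and print width derived only from Unicode character data. It is not ast's implementation and not AIX's: the function names is_alpha and print_width are hypothetical, the letter-category test only approximates the Unicode Alphabetic property, and the width rules (controls, non-spacing marks, East Asian Wide/Fullwidth) are assumptions about what a wcwidth()-like table might contain.

```python
import unicodedata


def is_alpha(ch: str) -> bool:
    """Approximate [:alpha:]: true for any Unicode letter category (Lu, Ll, Lt, Lm, Lo).

    Assumption: real implementations may also count some marks and
    letter-like numbers, so results can differ at the edges.
    """
    return unicodedata.category(ch).startswith("L")


def print_width(ch: str) -> int:
    """Approximate a wcwidth()-style print width from Unicode data alone.

    Returns -1 for control characters, 0 for NUL and for non-spacing,
    enclosing, and format characters, 2 for East Asian Wide or Fullwidth
    characters, and 1 otherwise.
    """
    if ch == "\0":
        return 0
    cat = unicodedata.category(ch)
    if cat == "Cc":
        return -1
    if cat in ("Mn", "Me", "Cf"):
        return 0
    # East Asian "Ambiguous" (A) is treated as narrow here; that choice is
    # an assumption, and implementations are free to choose differently.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    # Print both properties for a few sample code points.
    for ch in ("A", "1", "\u00e9", "\u4e2d", "\u0301"):
        print(f"U+{ord(ch):04X} alpha={is_alpha(ch)} width={print_width(ch)}")
```

Running the same code points through a platform's iswalpha() and wcwidth() under its native C.UTF-8 locale and comparing against a table like this would show where two implementations disagree.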