Lauri Tirkkonen via illumos-developer
2014-10-15 08:41:11 UTC
Issues: https://www.illumos.org/issues/4006
https://www.illumos.org/issues/5227
Webrev http://www.niksula.hut.fi/~ltirkkon/webrev/4006/
I will note that this diff is *huge*, because it consists of importing
locale data that is correctly formatted for localedef. I'm not 100%
comfortable with this; it would be possible to do this conversion at
build time to greatly reduce the size of the diff, but since I
implemented the conversion utility in Python 3 [0], that would either add
a build-time dependency on Python 3 or require further work. However, since there is
a precedent for this kind of solution in localedef (commit
2da1cd3a39e2d3da7f9d15071ea9462919c011ac) I thought I'd ask what the
list thinks.
This changeset adds a script 'mkclasses.py' to convert data from the
Unicode Character Database (UCD) into the character classification data
format localedef expects in LC_CTYPE, and also imports that data into
the gate so that localedef can use it for all UTF-8 locales. In addition,
I had to update the UTF-8.cm charmap file from CLDR because the latest
UCD data references characters that weren't present in the charmap
currently in the gate.
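For readers unfamiliar with the target format, here is a minimal sketch of the kind of conversion described above. It is not mkclasses.py. The General_Category-to-class mapping, the `<U%04X>` symbolic-name format, and the helper names (`parse_unicode_data`, `build_classes`, `format_class`) are all assumptions for illustration. It reads UnicodeData.txt records, expands the "<..., First>"/"<..., Last>" pairs that file uses for large blocks, and emits one class line per keyword using the POSIX `;...;` range syntax. Whether localedef expects that range syntax or wrapped continuation lines should be checked against its documentation.

```python
from collections import defaultdict

# Illustrative assumption: which General_Category values feed which
# LC_CTYPE classes. This is NOT the mapping used by mkclasses.py.
CATEGORY_CLASSES = {
    "Lu": ("upper", "alpha"),
    "Ll": ("lower", "alpha"),
    "Lt": ("alpha",),
    "Lm": ("alpha",),
    "Lo": ("alpha",),
    "Pc": ("punct",), "Pd": ("punct",), "Ps": ("punct",), "Pe": ("punct",),
    "Pi": ("punct",), "Pf": ("punct",), "Po": ("punct",),
}


def parse_unicode_data(lines):
    """Yield (code_point, general_category) from UnicodeData.txt lines.

    Range entries ("<..., First>" followed by "<..., Last>") are expanded
    into every code point they cover.
    """
    range_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        cp = int(fields[0], 16)
        name, category = fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp
            continue
        if name.endswith(", Last>") and range_start is not None:
            for c in range(range_start, cp + 1):
                yield c, category
            range_start = None
            continue
        yield cp, category


def build_classes(lines):
    """Group code points by class name according to CATEGORY_CLASSES."""
    classes = defaultdict(set)
    for cp, category in parse_unicode_data(lines):
        for cls in CATEGORY_CLASSES.get(category, ()):
            classes[cls].add(cp)
    return classes


def to_ranges(code_points):
    """Collapse code points into sorted, inclusive (start, end) runs."""
    runs = []
    for cp in sorted(code_points):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp
        else:
            runs.append([cp, cp])
    return [(start, end) for start, end in runs]


def sym(cp):
    # Assumed charmap naming: "<U" + at least four uppercase hex digits + ">".
    return "<U%04X>" % cp


def format_class(name, code_points):
    """Render one LC_CTYPE class line, e.g. 'upper <U0041>;...;<U005A>'."""
    parts = []
    for start, end in to_ranges(code_points):
        if start == end:
            parts.append(sym(start))
        else:
            parts.append("%s;...;%s" % (sym(start), sym(end)))
    return "%s %s" % (name, ";".join(parts))


# Tiny excerpt in UnicodeData.txt format, including a First/Last range pair.
SAMPLE = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;",
    "4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;",
]

if __name__ == "__main__":
    classes = build_classes(SAMPLE)
    for cls in ("upper", "lower", "alpha"):
        print(format_class(cls, classes[cls]))
```

Collapsing consecutive code points into ranges is what keeps the generated class lists readable. Without it, the expanded CJK block alone would contribute thousands of individual symbols.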
This changeset does not touch case mapping data; that still comes from
the CLDR data files. While out of scope for this issue, that data might
also need some love.
Lastly, I moved some code from mkwidths.py to utf8_util.py to facilitate
reuse, and regenerated widths.txt with the new UTF-8.cm (to verify that
that script still works after my changes).
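As an illustration of the kind of charmap handling two such scripts could share, here is a hypothetical helper in the spirit of a shared utility module. It also shows the check that reveals why UTF-8.cm needed updating: code points referenced by UCD-derived data but missing from the charmap. The function names are invented for this sketch and are not taken from utf8_util.py. The sketch assumes one `<Uxxxx> bytes comment` entry per line between `CHARMAP` and `END CHARMAP`, and it deliberately ignores POSIX `...` range entries.

```python
import re

# Hypothetical shared helper; not the actual contents of utf8_util.py.
# Matches a symbolic name like "<U0041>" followed by whitespace. Range
# entries such as "<U3400>...<U4DB5>" do not match and are skipped.
_SYMBOL = re.compile(r"^<U([0-9A-Fa-f]{4,8})>\s")


def charmap_code_points(lines):
    """Return the set of code points named in a charmap's CHARMAP section."""
    points = set()
    in_charmap = False
    for line in lines:
        stripped = line.strip()
        if stripped == "CHARMAP":
            in_charmap = True
            continue
        if stripped == "END CHARMAP":
            break
        if in_charmap:
            m = _SYMBOL.match(stripped)
            if m:
                points.add(int(m.group(1), 16))
    return points


def missing_from_charmap(ucd_points, charmap_points):
    """Sorted code points used by UCD-derived data but absent from the charmap."""
    return sorted(set(ucd_points) - set(charmap_points))


if __name__ == "__main__":
    charmap = [
        '<code_set_name> "UTF-8"',
        "CHARMAP",
        "<U0041>     /x41         LATIN CAPITAL LETTER A",
        "<U00E9>     /xc3/xa9     LATIN SMALL LETTER E WITH ACUTE",
        "END CHARMAP",
    ]
    known = charmap_code_points(charmap)
    print([hex(cp) for cp in missing_from_charmap([0x41, 0xE9, 0x1F600], known)])
```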
[0]: I would have used Python 2, like mkwidths.py does, but the copies
readily available to me (those in OI and OmniOS) were what Python calls
"narrow builds", which limits the valid argument range of unichr to 0..0xFFFF [1].
Python 3 does not have this problem.
[1]: https://docs.python.org/2/library/functions.html#unichr
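To make footnote [0] concrete: a narrow Python 2 build stores strings as UTF-16 code units, so `sys.maxunicode` is 0xFFFF there. A code point above U+FFFF needs two units (a surrogate pair), and `unichr()` raises ValueError instead of returning it as one character. The sketch below runs on Python 3. `utf16_surrogates` is a hypothetical helper written for this example and is not part of either script. It shows the split and confirms that Python 3's `chr()` covers the full range.

```python
import sys


def utf16_surrogates(cp):
    """Split a supplementary-plane code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %#x" % cp)
    v = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


if __name__ == "__main__":
    # On Python 3.3+ this is always 0x10FFFF; a narrow 2.x build reports 0xFFFF.
    print(hex(sys.maxunicode))
    s = chr(0x1F600)  # one character in Python 3, two code units on a narrow build
    print(len(s), [hex(u) for u in utf16_surrogates(0x1F600)])
```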
--
Lauri Tirkkonen | +358 50 5341376 | lotheac @ IRCnet
-------------------------------------------
illumos-developer
Archives: https://www.listbox.com/member/archive/182179/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182179/21175072-86d49504
Modify Your Subscription: https://www.listbox.com/member/?member_id=21175072&id_secret=21175072-abdf7b7e
Powered by Listbox: http://www.listbox.com