Re: How would we use BCP 47 strings in a simple way?
Lassi Kortela 02 Aug 2020 08:34 UTC
> en-US: "Get your new radials at the tire center!"
> en-GB: "Get your new radials at the tyre centre!"
> en-CA: "Get your new radials at the tire centre!"
lol. It did get it right!
> Then a lookup request for "en" will immediately fail, and a request for
> "en-IN" will fail, truncate the search string to "en", and fail again.
> In either case the anglophone user will probably get the original
> Finnish. (If you use the filtering algorithm instead, you will get the
> three English versions with no indication of which one to use.)
>
> So to avoid this happening, we make sure that there is always a string
> tagged "en" as well as "en-*" strings. Since Americans are the
> most numerous (and also the most ignorant about other people's
> spellings), it probably makes sense to add this additional localization:
>
> en: "Get your new radials at the tire center!"
>
> Then a request for "en" will succeed, and a request for "en-IN" will
> fail, truncate the search string to "en", and succeed.
>
> All this stuff is very carefully spelled out in the second half of BCP
> 47. The detailed algorithm descriptions amount to implementations, so
> there is no point in rediscovering the wheel.
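To check my reading of the lookup behaviour described above, here is a minimal sketch in Python. The names `lookup` and `available` are made up for illustration and come from neither the spec nor any library. It assumes that tags are compared case-insensitively and that truncation drops one subtag at a time from the right, plus any single-character subtag left dangling at the end, which is how I understand the lookup section of BCP 47.

```python
def lookup(available, requested, default=None):
    """Return the tag in `available` that best matches `requested`.

    Hypothetical helper: tries the full tag first, then progressively
    shorter prefixes, and returns `default` if nothing matches.
    """
    # Compare case-insensitively, but return the tag as originally spelled.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        # No match: truncate the search string by one subtag.
        subtags.pop()
        # Assumption from my reading of the lookup rules: a single-character
        # subtag left at the end is removed together with what followed it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


without_en = ["fi", "en-US", "en-GB", "en-CA"]
with_en = without_en + ["en"]

print(lookup(without_en, "en-IN", default="fi"))  # no "en", so the fallback
print(lookup(with_en, "en-IN", default="fi"))     # truncates to "en"
```

With the untagged-region "en" string missing, both "en" and "en-IN" fall through to the default; once it is added, both resolve to "en".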
Specs like this are heavy reading for people who are not localization
experts: every paragraph reads like there must be subtleties and implied
contextual knowledge that we are bound to miss. By extension we also
have no confidence in our ability to correctly implement the spec or to
extract a sensible sub-spec out of it. That's why I try to spell things
out as if for a five-year-old and rely on you to sanity-check our work.
BCP 47 section 3.3.1. Basic Filtering
<https://tools.ietf.org/html/bcp47#section-3.3.1> says: "Basic filtering
is identical to the type of matching described in [RFC3066], Section 2.5
(Language-range)."
RFC 3066 section 2.5 Language-range
<https://tools.ietf.org/html/rfc3066#section-2.5> says: "A
language-range matches a language-tag if it exactly equals the tag, or
if it exactly equals a prefix of the tag such that the first character
following the prefix is "-"."
Is this what you have in mind?
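For concreteness, the RFC 3066 rule quoted above could be sketched in Python like this. The function names `range_matches` and `basic_filter` are hypothetical. The case-insensitive comparison and the "*" wildcard are my understanding of the RFCs rather than something stated in the sentence I quoted.

```python
def range_matches(language_range, tag):
    """True if `language_range` matches `tag` under the quoted rule.

    A range matches if it equals the tag, or equals a prefix of the tag
    such that the next character after the prefix is "-".
    """
    r = language_range.lower()
    t = tag.lower()
    # Assumption: "*" is the wildcard range that matches every tag.
    return r == "*" or t == r or t.startswith(r + "-")


def basic_filter(available, language_range):
    """Return every tag in `available` that the range matches, in order."""
    return [tag for tag in available if range_matches(language_range, tag)]


tags = ["fi", "en-US", "en-GB", "en-CA"]
print(basic_filter(tags, "en"))     # all three English variants
print(basic_filter(tags, "en-IN"))  # nothing matches
```

Note that the prefix has to end at a subtag boundary, so the range "en" does not match a tag such as "eng", and a longer range such as "en-US" does not match the shorter tag "en".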
As for prioritizing variants of the same language, would this be handled
by the procedure that returns the localizations? I.e. its internal list
would be sorted in some sensible order, and the search would simply
return the first match.
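A rough Python sketch of that idea, assuming the localizations are kept as a list of (tag, text) pairs already sorted by priority. The name `first_match` and the ordering of the list are hypothetical; the strings are the ones from the example earlier in this thread.

```python
# Hypothetical internal list, most preferred variant first.
LOCALIZATIONS = [
    ("en-US", "Get your new radials at the tire center!"),
    ("en-GB", "Get your new radials at the tyre centre!"),
    ("en-CA", "Get your new radials at the tire centre!"),
]


def first_match(localizations, language_range):
    """Return the first (tag, text) pair the range matches, or None.

    Uses the same prefix rule as basic filtering, but stops at the first
    hit, so the order of `localizations` decides which variant wins.
    """
    r = language_range.lower()
    for tag, text in localizations:
        t = tag.lower()
        if t == r or t.startswith(r + "-"):
            return tag, text
    return None


print(first_match(LOCALIZATIONS, "en"))
print(first_match(LOCALIZATIONS, "en-GB"))
```

Here a request for "en" returns the "en-US" entry purely because it comes first in the list, which is the only place the priority is expressed.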
At this point we should write some code to prove that things are not too
hard.