I've been involved in one way or another in internationalizing software for over 25 years. Those 25 years have certainly seen some changes -- like the rise of Unicode -- but I think we still fall short.
Recently, a change was made on the Android platform that relates peripherally to today's topic. This is a revised version of that post, because the issue is much broader than Android or Java: it is about how we go about externalizing strings for translation.
The current usual practice for externalizing strings is to externalize a template, which gets filled in, for example, "%d cats and %d dogs". Typically, this would be used like this:
format("%d cats and %d dogs", catCount, dogCount);
That doesn't look too bad, but when you don't have the format string and the parameters in the same place, it becomes really hard to understand what is happening, and errors are common.
But that's just the start of the problem. When you translate, often word order needs to change. For example "red shoes" becomes "zapatos rojos" in Spanish. This extends to entire sentence structure. The usual solution is to allow the format string to refer to arguments explicitly by position, so for:
format("%s %s", color, item);
we might substitute "%2$s %1$s" for Spanish.
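As a minimal sketch of that substitution in Java (the class and method names here are made up for illustration, and the patterns would normally come from a per-language resource rather than being hard-coded), the calling code stays the same and only the pattern changes:

```java
import java.util.Locale;

class PositionalDemo {
    // Hypothetical helper: in a real app the pattern is looked up per language,
    // while the argument order in the code never changes.
    static String describe(String pattern, String color, String item) {
        return String.format(Locale.ROOT, pattern, color, item);
    }

    public static void main(String[] args) {
        // English pattern: arguments are consumed in the order given.
        System.out.println(describe("%s %s", "red", "shoes"));          // red shoes
        // Spanish pattern: %2$s picks the second argument, %1$s the first.
        System.out.println(describe("%2$s %1$s", "rojos", "zapatos")); // zapatos rojos
    }
}
```

Note that the translator, not the programmer, is the one who needs the positional form here.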
But that's not good enough.
In my experience (perhaps not comprehensive), the workflow goes like this: A developer writes the message, in his language of choice, usually English, but perhaps, say, Japanese, and plops in %s and friends where appropriate, and arranges the arguments to the formatting code in the same order as they occur in his format string.
Then someone comes along and whacks him over the head for not externalizing it for translation, and it is moved to strings.xml or other localization resource.
Finally, maybe one time in 100, the application is successful enough to be localized to other languages. One might wish for this to be more often, but actually I think I'm being generous here. But it's a pain to do if the previous step hasn't been done, as a matter of practice. So we'd like to make that as easy as possible. Really, I don't think we're even close to making it as easy as we should make it! Especially since the programmer, and likely his manager as well, will correctly perceive that the effort stands a high probability of being wasted.
Then finally, if we're lucky, we get to translate the app. If we're really, really lucky, we'll have a QA department that will try to reproduce as many of the messages as possible, in the original language.
Now, off goes strings.xml and friends to the localization service, translator, team member, or maybe the developer's wife. Much of the time, maybe 80% or more, the strings with multiple %s's will not be using them in a way that is particularly sensitive to language. They may be constructing an identifier name, or a list, or may occur in different sentences, which would not in most cases require reordering.
Your mileage may vary on the frequency, of course. But I'd say the most common use would be something like "%d cats and %d dogs" -- and ordering is not the problem you face here, but rather, handling of plurals.
And Android gets big kudos for addressing it, with a <plurals/> item listing alternatives based on count. But "%d gatos y %d perros" or "%d猫と%d犬" are perfectly fine, from an ordering standpoint.
BUT! Your Spanish translator is barfing right now, and it has nothing to do with positionals. The programmer didn't think about plurals. (Yeah, yeah, he should have. See above for the unfortunate reality.) Your translator has to get the programmer to change the code. Bletch. That likely requires a whole new release, more expensive QA, etc. Yikes! Maybe we live with it for now, instead.
Now (next release) it goes to the Japanese translator. Who goes, WTF is this? Well, OK, maybe more like 「なにこれ？」
What used to be easily translated as "%d猫と%d犬" is now just "%d %s and %d %s", or maybe by now "%1$d %2$s and %3$d %4$s". What's the context? Should this be "%1$d%2$sと%3$d%4$s", "%1$d%2$sで、%3$d%4$s", or a dozen other possibilities? Yes, the first guess is still really the only likely answer given the presence of %d, but still, you've made the translation job harder.
Given that the original was in English, which has plurals, the programmer might have supplied that -- but consider if the original were in Japanese, which almost entirely lacks the concept. Just like English lacks the concept of a different set of number words for long skinny things like pencils, vs flat sheets like paper. In that case, the first time that plurals will enter the picture will be when the app gets to the English translator, who will get to barf. Note that even if the Japanese programmer writes it in English originally, you're still probably going to have the same situation. In fact, you're likely to have it even if English is your programmer's native tongue.
In addition to translation, often the text and wording are manipulated separately from the code, for marketing and/or usability and/or branding purposes. So the original text may have been "Pigs: %d, Cattle: %d", and now it's being rebranded for a pet store by someone who doesn't even have access to the source, and they want to make the text style more friendly -- and THIS is the first context in which plurals need to be handled.
Finally, notice that Spanish has plural forms for the adjectives! Your English-speaking programmer may not have even been aware of this fact. Again, putting this burden on the programmer leads to failure.
Bottom line on that point is that plurals are a linguistic concern, and should not be institutionalized in Java code. Plural handling should be fully externalized -- as part of the format syntax. Say, perhaps, %(3:MyPlural)$s, which rather than directly referencing an argument, references a <plurals/> item, and selects the case based on the numeric value in parameter 3.
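For comparison, Java's existing java.text.MessageFormat can already keep a simple plural choice inside the pattern itself, via its "choice" sub-format. The sketch below is only an illustration of that idea, not the syntax proposed above. The class name is invented, and the pattern is hard-coded here although it would normally live in a string resource. Choice formats select by numeric range, which suits English-style singular/plural but is not a general answer for languages with more plural categories.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ChoicePluralDemo {
    // The plural logic lives entirely in the pattern, so a translator could
    // change it without touching the calling code.
    static final String PATTERN =
        "{0,choice,0#no cats|1#one cat|1<{0,number,integer} cats} and "
      + "{1,choice,0#no dogs|1#one dog|1<{1,number,integer} dogs}";

    static String describe(int cats, int dogs) {
        return new MessageFormat(PATTERN, Locale.US).format(new Object[] {cats, dogs});
    }

    public static void main(String[] args) {
        System.out.println(describe(1, 3)); // one cat and 3 dogs
        System.out.println(describe(0, 1)); // no cats and one dog
    }
}
```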
The bottom bottom line is, there really isn't much value in asking programmers to write in the positional arguments. It's more economical to do so at the point of reordering, which is only a small subset of the cases - if you get to that point at all. If you want to force programmers to do something useful -- force them to define and document what each parameter MEANS. For example, if you forced the programmer to give each parameter a NAME, instead of a position in a getString(...) call, the extra effort would not be duplicative, and you get some improved robustness for your efforts.
For a really radical suggestion: in strings.xml:
<string id='cats_and_dogs'>%(CATS)d %(CATS:cat_plural)s and %(DOGS)d %(DOGS:dog_plural)s</string>
getFormatted(R.string.cats_and_dogs).p("DOGS", dogCount).p("CATS", catCount).asText();
This fully removes order from the picture, provides some self-documenting characteristics, separates the concerns properly -- and of course, is completely incompatible with Java's formatting. Though it is detectable which is which.
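To make the idea concrete, here is a rough sketch of what such a formatter might look like in plain Java. Everything in it is an assumption for illustration: the NamedFormat class, the in-memory map standing in for <plurals/> resources, and the reading of the syntax as %(NAME)d for a number and %(NAME:plural_id)s for a plural word chosen by that number. It handles only integer parameters and skips locale-aware number formatting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedFormat {
    // Matches %(NAME)d or %(NAME:plural_id)s; the trailing type letter is not validated.
    private static final Pattern FIELD =
        Pattern.compile("%\\((\\w+)(?::(\\w+))?\\)([ds])");

    private final String template;
    private final Map<String, IntFunction<String>> plurals; // stand-in for <plurals/> resources
    private final Map<String, Integer> params = new HashMap<>();

    NamedFormat(String template, Map<String, IntFunction<String>> plurals) {
        this.template = template;
        this.plurals = plurals;
    }

    // Bind a named parameter; the order of calls does not matter.
    NamedFormat p(String name, int value) {
        params.put(name, value);
        return this;
    }

    String asText() {
        StringBuilder out = new StringBuilder();
        Matcher m = FIELD.matcher(template);
        int last = 0;
        while (m.find()) {
            out.append(template, last, m.start());
            Integer value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalStateException("missing parameter: " + m.group(1));
            }
            String pluralId = m.group(2);
            if (pluralId == null) {
                out.append(value.intValue());          // plain number
            } else {
                IntFunction<String> rule = plurals.get(pluralId);
                if (rule == null) {
                    throw new IllegalStateException("unknown plural: " + pluralId);
                }
                out.append(rule.apply(value));         // word chosen by the count
            }
            last = m.end();
        }
        out.append(template, last, template.length());
        return out.toString();
    }

    // Small demo with English-only plural rules.
    static String demo(int catCount, int dogCount) {
        Map<String, IntFunction<String>> plurals = new HashMap<>();
        plurals.put("cat_plural", n -> n == 1 ? "cat" : "cats");
        plurals.put("dog_plural", n -> n == 1 ? "dog" : "dogs");
        String template = "%(CATS)d %(CATS:cat_plural)s and %(DOGS)d %(DOGS:dog_plural)s";
        return new NamedFormat(template, plurals)
            .p("DOGS", dogCount).p("CATS", catCount).asText();
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 1)); // 3 cats and 1 dog
    }
}
```

Because parameters are looked up by name, a translated template can reorder or repeat them freely, and a missing name fails loudly instead of silently shifting arguments.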
It'd be rather simple to create such a formatting library. It can be built on the existing language formatting facilities, so it could be done as a third-party library. However, as a matter of encouraging good practice throughout the industry, I'd like to see language developers/vendors providing such a facility natively. It adds more value if it's done as a standard practice. Finally, we need tool support to make it easier for programmers to work with externalized strings at the point of use in the code, and to go in the reverse direction -- from string resource to usage in the code.