[selectors] :lang for documents without content language and for elements of unknown language; consider :lang("") over :not(:lang("*")) #6915

Closed
myfonj opened this issue Dec 28, 2021 · 6 comments
Labels
Closed Accepted by CSSWG Resolution i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. selectors-4 Current Work

Comments

@myfonj

myfonj commented Dec 28, 2021

Brief questions to answer

  1. Is :lang("*") really a valid selector? (Safari supports it and Chrome accepted Issue 1281157 to implement it.)
  2. How to address a document that fails to define its content language?
  3. Should HTML <el lang=""></el>, or the equivalent <el lang></el>, be matched by CSS :lang("") or even :lang()? (Not yet proposed nor implemented.)
  4. Is an erroneous "undefined" document language equivalent to <html lang="">, or is it something different?

Trivia

  1. It seems that there has never been a way for the :lang() functional pseudo-class to precisely target a document without a defined content language, or an element subtree marked as such with the lang="" attribute.
  2. The Selectors-4 draft introduces wildcard support (md) in the string argument for matching "any" language of a given script or region group (like "*-Latn" or "*-ch"). This opens the possibility (loophole?) of matching any specified language value using :lang("*") and, in conjunction with the negation pseudo-class, of using :not(:lang("*")) to target elements that belong to "no specified language" from the previous point. While a plain "*" value is not explicitly mentioned in the draft, this reportedly already works in current Safari.
  3. An undetermined language occurs in any HTML document that does not have an explicit lang attribute and does not come with a content-language HTTP header or its "deprecated nasty" <meta http-equiv> counterpart.
  4. For marking non-linguistic or unknown-language content, it is advised to use the lang="" attribute. While it is possible to target it with an attribute selector ([lang=""]), it does not seem like the right tool and introduces the nesting / inheritance problems that the :lang() selector was created to solve.
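
A minimal sketch of the wildcard ranges from point 2, assuming an engine that implements the Selectors-4 extended matching. The declarations are placeholders chosen only for illustration, and browser support for any of these selectors is not asserted here.

```css
/* Any language tagged as written in Latin script, e.g. sr-Latn or az-Latn. */
:lang("*-Latn") { font-family: serif; }

/* Any language tagged with the CH region, e.g. de-CH or fr-CH. */
:lang("*-CH") { quotes: "«" "»" "‹" "›"; }

/* The bare wildcard this issue asks about (question 1);
   the draft did not mention it explicitly when this was filed. */
:lang("*") { outline: 1px dotted; }
```

Ranges containing an asterisk have to be written as strings (or escaped), since `*-Latn` is not a valid CSS identifier.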

Pseudo-code samples / introductory use cases

Sample 1: Document without content language

HTTP/2 200 OK
[no `content-language: xy,zz` HTTP header here]

<html [no `lang="xy"` here]>
	<head>
		[no `<meta http-equiv="content-language" content="xy,zz">` here]
	</head>
	<body>
		[I want to target this document.]
	</body>
</html>

Sample 2: element with unknown language content

<html lang="xy">
<body>
 <p>xyx:
  <samp lang="">
   000<em>111<var lang="xy">x</var></em>000
  </samp>
 </p>
 [I want to target elements with digits and omit elements with letters.]
</body>
</html>
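
To make the nesting problem with the attribute selector concrete, here is a rough sketch against the Sample 2 markup. The `outline` declarations are arbitrary placeholders, picked because `outline` is not inherited and so shows exactly which elements each selector hits.

```css
/* Matches only the element carrying the attribute: <samp lang="">.
   The nested <em> holding "111" is not matched. */
[lang=""] { outline: 1px solid; }

/* Extending the rule to all descendants reaches <em>, but it also
   catches <var lang="xy">, which was explicitly re-tagged. */
[lang=""] * { outline: 1px solid; }

/* Excluding re-tagged elements helps one level down only: any untagged
   element nested inside <var lang="xy"> would still be matched. */
[lang=""] :not([lang]) { outline: 1px solid; }
```

None of the three selects exactly "elements whose language is unknown", which is the gap a language-aware pseudo-class would need to fill.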

Problems

The inability to "legally" target specifically a document or element of undetermined language seems to be quite a minor issue, since most CSS approaches tend to start with "common" defaults and progress to language-specific rules using selectors of higher specificity, like this:

Sample 3: using specificity or order for refining defaults

/* pseudo example of common approach for setting language related styles */
/* hand-waving default */ :root { quotes: '"' '"' "'" "'"; } 
/* specific known language */ :lang(x-whatever) { quotes: "→" "←" "☛" "☚"; }

However, this approach falls short if the CSS author needs to specifically address a poorly marked-up document lacking any content language hint:

Sample 4: using body:not(:lang("*")) for styling "bad" document

/* Pseudo example "let's put a country flag representing the language of the document at its beginning" */
/* known languages */
body:lang(en-gb)::before { content: url(./flags-gb.svg) / "British English document: "; }
body:lang(en-us)::before { content: url(./flags-us.svg) / "American English document: "; }
/* [etc] */

/* default for "unknown" */
body::before { content: url(./flags-specified-but-unknown.svg) / "Content language of this document is specified but not included in style sheets - please contact style maintainer. "; font-size: small; color: GrayText; }

/* "error message" for "unspecified". topic of this issue */
body:not(:lang("*"))::before { content: url(./flags-unspecified-error.svg) / "This document appears to have no content language specified. If you are its maintainer, fix it ASAP, please. "; }
body:not(:lang("*")) { background-color: var(--bg-error, Canvas); color: var(--fg-error, GrayText); font-family: cursive; }

(Think Sample 3 above. Also notice that the samples use CSS features not yet widely implemented - pseudo-element alternative texts and system colors - but those are not related to the issue in question.)

Links, resources and notes

@fantasai fantasai added selectors-4 Current Work i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. labels Dec 31, 2021
@xfq xfq changed the title [css-selectors] :lang for documents without content language and for elements of unknown language; consider :lang("") over :not(:lang("*")) [selectors] :lang for documents without content language and for elements of unknown language; consider :lang("") over :not(:lang("*")) Jan 4, 2022
@astearns astearns added this to 10:30-11:30 i18n in TPAC Friday 2022 Sep 13, 2022
@frivoal
Collaborator

frivoal commented Sep 14, 2022

This was discussed at TPAC by the i18n-WG, and the conclusion was that we should have:

  • :lang("") that matches lang="" (and descendants…)
  • :lang("*") that matches everything but lang="" and descendants
  • a note about lang="und" and lang="" being treated distinctly, despite having similar semantics
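
As a sketch of what the three points above would mean for authors, assuming they were adopted exactly as listed (this illustrates the i18n-WG proposal at this stage of the thread, not shipped behavior; the declarations are placeholders):

```css
/* Proposed: elements with lang="" and their descendants that do not
   declare a language of their own. */
:lang("") { font-style: normal; }

/* Proposed: everything else, i.e. any element whose language tag
   is non-empty. */
:lang("*") { hyphens: auto; }

/* lang="und" is an ordinary tag under this proposal, so it is matched
   by its own range and by the wildcard, but not by :lang(""). */
:lang(und) { hyphens: none; }
```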

@css-meeting-bot
Member

The CSS Working Group just discussed :lang() without document language.

The full IRC log of that discussion
<TabAtkins> Topic: :lang() without document language
<TabAtkins> fantasai: When a lang is unknown, there's two ways to make it unknown
<TabAtkins> fantasai: first is blank `lang` value
<TabAtkins> fantasai: second is `lang=und`
<TabAtkins> fantasai: So how do you match these elements?
<TabAtkins> fantasai: Proposal is `:lang("")` matches untagged (blank tag), and add a note that `"und"` and `""` are matched differently despite similar semantics
<TabAtkins> fantasai: And then `:lang("*")` will match any tagged element, including `lang=und` (but won't match untagged/empty string)
<TabAtkins> TabAtkins: Sorry, hadn't read the discussion, but what's the justification for treating empty string differently from und?
<fantasai> Rossen: I'm not sure about treating them differently
<PaulG> which spec does "und" value come from?
<fantasai> Probably BCP47
<TabAtkins> Rossen_: Is it in selectors, or text...?
<TabAtkins> fantasai: und is in the lang tag spec
<TabAtkins> PaulG: rfc 5646 doesn't have it, it's not in the IDN reg
<TabAtkins> Rossen_: Perhaps we should delay a week for florian to be able to weigh in
<dbaron> und looks like it's for undetermined

@dbaron
Member

dbaron commented Oct 12, 2022

und is discussed in RFC 5646

@css-meeting-bot
Member

The CSS Working Group just discussed :lang() again, and agreed to the following:

  • RESOLVED: `lang=und` matches `:lang("*")`, `lang=""` doesn't match any `:lang()`
The full IRC log of that discussion
<TabAtkins> Topic: :lang() again
<TabAtkins> github: https://github.com//issues/6915#issuecomment-1247024928
<TabAtkins> Rossen_: So question was inconsistency between empty string and und both being "undefined", but they match differently
<fantasai> dbaron dropped a link to the "und" definition in https://www.rfc-editor.org/rfc/rfc5646.html#page-56
<TabAtkins> florian: Empty string is defined to mean "undefined" by HTML. "und" is defined as "undefined" by ISO [something]
<TabAtkins> florian: Possibly HTML shouldn't have introduced another value to mean the same thing
<TabAtkins> florian: but it did
<TabAtkins> florian: i18n group was somewhat struggling with whether to unify it or not
<TabAtkins> florian: Could be pushed back, but logic was that HTML didn't unify them, and when we do the string matching of lang selectors we do string matching without worrying about the semantics
<TabAtkins> florian: So our :lang() just does standard lang string parsing + matching
<TabAtkins> florian: So logic was probably "just keep it simple"
<TabAtkins> florian: In practice HTML semantics somewhat combine both "explicitly undetermined" and "author couldn't be bothered to specify".
<TabAtkins> florian: but generally, this is a space where we don't control the semantics
<TabAtkins> florian: But I don't think i18n was firm on the conclusion, so if we want to push back it could be heard
<fantasai> TabAtkins: That does answer the question
<fantasai> TabAtkins: I think I'd be happier if we push back
<fantasai> TabAtkins: but having this distinction from HTML be reflected in our Selectors should be avoided if possible
<TabAtkins> florian: I think what happened in practice is the observation is that "und" wasn't really used on the web, so empty string is how it was actually done in HTML
<TabAtkins> florian: So effectively we can ignore the "und" value and have :lang("*") match everything *but* the empty string.
<TabAtkins> florian: And so while technically "und" matches the lang, in practice undefined langs don't match it
<TabAtkins> fantasai: Yeah, "*" matches "und", but we've had a request for "can I match things without a language", and we'd be able to do that if we make a distinction in this manner
<TabAtkins> jfkthame: I think this is a distinction we should maintain, seems there is a semantic difference between lang being undefined/untagged and explicitly tagged as undetermined
<PaulG> q+
<TabAtkins> My position is very weak, I defer to whoever has expertise
<PaulG> q-
<TabAtkins> jfkthame: I agree "und" is rarely used but it does seem semantically meaningful
<Rossen_> ack fantasai
<Zakim> fantasai, you wanted to point out * semantics
<PaulG> q+
<fantasai> TabAtkins: If we accept to keep distinct, we don't need to push back. Since several ppl think it's good to keep separate, I'm ok with that
<TabAtkins> florian: The group that came up with the original rec was just weakly leaning - I think they got it right, but still
<Rossen_> ack PaulG
<TabAtkins> PaulG: Agree with keeping them separate. I suspect coalescing would encourage more "und", where default lang choice is used when undefined, but I think it shouldn't when it's explicitly "undetermined"
<TabAtkins> Sounds good to me
<TabAtkins> (I appreciate the "undefined" vs "undetermined" distinction. Was just objecting to two separate "undefined" notions.)
<TabAtkins> Rossen_: Objections?
<TabAtkins> RESOLVED: `lang=und` matches `:lang("*")`, `lang=""` doesn't match any `:lang()`
<fantasai> scribenick: fantasai
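
Read literally, the resolution recorded above would play out roughly as in the sketch below. The outline colors are placeholders, and the current Selectors draft remains the normative source for the exact wording.

```css
/* <p lang="und"> carries a (non-empty) tag, so the wildcard matches it,
   along with every other language-tagged element. */
:lang("*") { outline: 1px solid green; }

/* <samp lang=""> matches no :lang() selector at all, so the negation
   from the original post is what selects it, together with elements
   in a document that declares no content language. */
:not(:lang("*")) { outline: 1px solid red; }
```

This keeps "explicitly undetermined" (`und`) separate from "untagged" (`""`), which is the distinction jfkthame and PaulG argued for in the log.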

@fantasai
Collaborator

fantasai commented Nov 8, 2022

Fixed in 9b51686

@fantasai fantasai closed this as completed Nov 8, 2022
@aphillips
Contributor

@fantasai @frivoal Thanks for this fix. Alas, I just now posted some comments related to the change. I can file a new issue if needed.

@frivoal I'm closing your action for I18N...

jakearchibald pushed a commit to jakearchibald/csswg-drafts that referenced this issue Jan 16, 2023