status-bydesign (Manually changing this from bug to status-bydesign given my discoveries documented below.)
I searched around, but I couldn't find any previous posts referencing regular expressions.
Regular expressions are stated to be currently supported, but regex is not in the list of languages supported by highlight.js (it was supported by Prettify).
There are some weird effects when highlighting complex expressions, e.g., from this answer:
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
It sometimes italicizes the characters between asterisks, *, and other times fails to highlight character lists inside square brackets, [].
If it's not supported by highlight.js, where is this highlighting scheme even coming from? (See the update below.) Are regular expressions included in the FAQ list by mistake? [1] I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.
Update
So I've done a little digging, and it appears what's really going on here is that the regular expression in this post is getting auto-recognized as Markdown, even when specified as regex.
Setting the identifier of the same snippet to lang-markdown has the same effect as regex:
(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
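To illustrate why a Markdown grammar would italicize parts of this pattern, here is a rough sketch in Python. The `EMPHASIS` rule is my own simplified stand-in for Markdown emphasis (text between two asterisks on one line), not the actual rule highlight.js uses, and the fragment is just a piece of the domain part of the regex above.

```python
import re

# Simplified stand-in for Markdown emphasis: any run of text between two
# asterisks on the same line. This is an assumption for illustration only,
# not the grammar highlight.js really applies.
EMPHASIS = re.compile(r"\*([^*\n]+)\*")

def emphasized_spans(text: str) -> list:
    """Return the pieces of `text` that the simplified rule would italicize."""
    return EMPHASIS.findall(text)

# A fragment of the e-mail regex above. Its two `*` quantifiers look like
# an opening and closing emphasis marker to a Markdown-style rule.
fragment = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

spans = emphasized_spans(fragment)
for span in spans:
    print("would be italicized:", span)
```

Under that simplified rule, everything between the two `*` quantifiers is treated as one emphasized span, which matches the kind of italics seen in the rendered snippet.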
This leads into the discovery I made, which largely revolves around the last sentence of my original post:
I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.
As described in this post by @T.J.Crowder, and backed up by the help center, there is a difference between identifying a code block as lang-X vs. just X.
As per the help center (emphasis mine):
You can use either one of the supported language codes, like lang-cpp or lang-sql, or you can specify a tag, and the syntax highlighting language associated with this tag will be used.
This was news to me! I had been under the impression, as I'm sure many others are, that the identifier X was simply a shortcut for lang-X. This is incorrect.
Therefore, identifying a snippet as regex is really saying "use the highlighting language defined for the regex tag". That language happens to be lang-default, which is really a shortcut to tell the highlighter to "guess" what the correct highlighting should be, and in this specific case the guess is Markdown.
So it's going regex ==> lang-default ==> lang-markdown.
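The chain can be modeled with a small Python sketch. This is only my mental model of the behavior, not Stack Overflow's actual code: `TAG_TO_LANG`, `resolve_hint`, and `effective_language` are hypothetical names, the fallback for unknown tags is an assumption, and the auto-detection step is passed in as a stub instead of calling a real highlighter.

```python
from typing import Callable

# Hypothetical tag -> language-code table. The "regex" entry reflects what
# the post observed; a real table would contain many more tags.
TAG_TO_LANG = {
    "regex": "lang-default",
}

def resolve_hint(hint: str) -> str:
    """Map a code-block identifier to a language code.

    A `lang-X` identifier is taken literally. A bare identifier is treated
    as a tag, and the language associated with that tag is used.
    """
    if hint.startswith("lang-"):
        return hint
    # Assumption: unknown tags also fall back to lang-default.
    return TAG_TO_LANG.get(hint, "lang-default")

def effective_language(hint: str, guess: Callable[[], str]) -> str:
    """Return the language that ends up being applied.

    `guess` stands in for the highlighter's auto-detection, which only
    runs when the resolved code is lang-default.
    """
    code = resolve_hint(hint)
    if code == "lang-default":
        return guess()
    return code

# Stub for auto-detection: for the e-mail regex, the guess was Markdown.
guess_markdown = lambda: "lang-markdown"

print(effective_language("regex", guess_markdown))       # bare tag
print(effective_language("lang-regex", guess_markdown))  # explicit code
```

The bare `regex` identifier ends up at whatever the guess returns, while `lang-regex` is passed through untouched, which is the difference between X and lang-X described above.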
Popping open the console to take a look at the first snippet here will still show class="lang-regex s-code-block hljs", even though it's getting highlighted as Markdown. I believe this is due to how highlight.js works. It appears it never actually changes the identifier class name itself, but rather injects the child syntax classes underneath it regardless.
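As a rough picture of what I think is happening in the DOM, here is a Python sketch that builds the resulting markup as a string. The `render` function and the token list are hypothetical, and the `hljs-emphasis` child class is my assumption about what the Markdown grammar emits. The point is only that the outer class list is copied through unchanged while the detected grammar shows up solely in the child spans.

```python
from html import escape
from typing import List, Optional, Tuple

def render(block_classes: str, tokens: List[Tuple[str, Optional[str]]]) -> str:
    """Build the markup for a highlighted block.

    `block_classes` is the class list already on the element and is kept
    as-is. Each token is (text, scope); tokens with a scope are wrapped in
    a child span, tokens without one are emitted as plain text.
    """
    parts = []
    for text, scope in tokens:
        if scope is None:
            parts.append(escape(text))
        else:
            parts.append('<span class="hljs-%s">%s</span>' % (scope, escape(text)))
    return '<code class="%s">%s</code>' % (block_classes, "".join(parts))

# Hypothetical tokenization of a small piece of the regex as Markdown.
html = render(
    "lang-regex s-code-block hljs",
    [("[a-z0-9-]", None), ("*[a-z0-9]*", "emphasis")],
)
print(html)
```

Nothing in the output says `lang-markdown`, which would explain why the console still reports `lang-regex` on the block.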
1 - It looks like it was added back into the list in the FAQ post on Sept. 28 (Rev. 100), and given my discoveries in the update above, the answer is yes, it is a mistake.