
Answer by zcoop98 for Goodbye, Prettify. Hello highlight.js! Swapping out our Syntax Highlighter

(Manually changing this from bug to status-bydesign given my discoveries documented below.)

I searched around, but I couldn't find any previous posts about regular expressions.
The FAQ lists regular expressions as currently supported, but they are not in the list of languages supported by highlight.js (they were supported by Prettify).

There are some weird effects when highlighting complex expressions, e.g., from this answer:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

It sometimes italicizes the characters between asterisks (*), and at other times fails to highlight character lists inside square brackets ([]).

If it's not supported by highlight.js, where is this highlighting scheme even coming from? (See update.) Are regular expressions included in the FAQ list by mistake?¹ I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.


Update

So I've done a little digging, and it appears that what's really going on is that the regular expression in this post is being auto-detected as Markdown, even when the block is marked as regex.

Marking the same snippet as lang-markdown has exactly the same effect as marking it regex:

(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
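To illustrate why Markdown rules would mangle this pattern, here is a minimal Python sketch. It uses two deliberately simplified, hypothetical approximations of Markdown inline syntax (`*...*` for emphasis and backtick-delimited inline code); these are not highlight.js's actual Markdown grammar. Run against the opening portion of the regex above, both rules claim spans of it, which is consistent with the stray italics.

```python
import re

# Simplified, hypothetical approximations of Markdown inline syntax
# (NOT highlight.js's real Markdown grammar):
# emphasis: a '*' not followed by whitespace or another '*', up to the next '*'
EMPHASIS = re.compile(r"\*(?![\s*])(.+?)\*")
# inline code: text between a pair of backticks
CODE_SPAN = re.compile(r"`([^`]+)`")

# The opening portion of the email regex from this post (before the '|').
LOCAL_PART = r"(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*"


def markdown_spans(text: str) -> list[tuple[str, str]]:
    """List the spans these naive Markdown rules would claim in `text`."""
    spans = [("emphasis", m.group(0)) for m in EMPHASIS.finditer(text)]
    spans += [("code", m.group(0)) for m in CODE_SPAN.finditer(text)]
    return spans


for kind, span in markdown_spans(LOCAL_PART):
    print(f"{kind:8} {span}")
```

The asterisks and backticks inside the character classes are ordinary literals to a regex engine, but to a Markdown tokenizer they look like paired delimiters.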

This leads into the discovery I made, which largely revolves around the last sentence of my original post:

I notice that the default highlighter for the regex tag on SO is lang-default rather than lang-regex.

As described in this post by @T.J.Crowder, and backed up by the help center, there is a difference between identifying a code block as lang-X and identifying it as just X.

As per the help center (emphasis mine):

You can use either one of the supported language codes, like lang-cpp or lang-sql, or you can specify a tag, and the syntax highlighting language associated with this tag will be used.

This was news to me! I had been under the impression, as I'm sure many others are, that the identifier X was simply a shortcut for lang-X. It isn't.

Therefore, marking a snippet as regex really says "use the highlighter defined for the regex tag". That happens to be lang-default, which tells the highlighter to "guess" the correct language; in this specific case, the guess is Markdown.

So it's going regex ==> lang-default ==> lang-markdown.
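The chain above can be modeled roughly as follows. This is a hypothetical Python sketch, not Stack Overflow's code: the `TAG_HIGHLIGHTERS` table, the `resolve_language` function, and the fallback for unmapped tags are illustrative assumptions. Only the regex-to-lang-default mapping comes from this post; the `guess` callback stands in for auto-detection (highlight.js exposes this as `highlightAuto`).

```python
from typing import Callable

# Hypothetical tag -> highlighter table. Only the "regex" entry reflects
# what this post observed; "python" is an illustrative placeholder.
TAG_HIGHLIGHTERS = {
    "regex": "lang-default",
    "python": "lang-python",
}


def resolve_language(hint: str, guess: Callable[[], str]) -> str:
    """Return the language a code block would be highlighted as.

    - An explicit "lang-X" hint names language X directly.
    - Any other hint is treated as a tag and looked up in TAG_HIGHLIGHTERS
      (assumption: unmapped tags fall back to "lang-default").
    - "lang-default" defers to auto-detection via the `guess` callback.
    """
    if hint.startswith("lang-"):
        target = hint
    else:
        target = TAG_HIGHLIGHTERS.get(hint, "lang-default")
    if target == "lang-default":
        return guess()
    return target[len("lang-"):]


# Stand-in for auto-detection; for the email regex it picked Markdown.
def detect_markdown() -> str:
    return "markdown"


print(resolve_language("regex", detect_markdown))          # regex -> lang-default -> guess
print(resolve_language("lang-markdown", detect_markdown))  # explicit lang-X is used directly
```

Both calls end up at Markdown, but by different routes: the tag goes through lang-default and auto-detection, while lang-markdown is taken literally.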

Inspecting the first snippet here in the browser's developer tools still shows class="lang-regex s-code-block hljs", even though it's highlighted as Markdown. I believe this is due to how highlight.js works: it appears never to change the identifier class name itself, and instead injects the child syntax classes underneath it regardless.


1 - It looks like regular expressions were added back into the list in the FAQ post on Sept. 28 (Rev. 100), and given my discoveries above, the answer is yes, it is a mistake.

