Extension md_in_html does not recognize tags with hyphens #1246

igordsm · 2022-04-28T01:57:34Z

Web components are custom HTML elements that are required to have a hyphen (-) in their names. This breaks the current HTML handling, since these elements are not recognized as block-level. IMHO they should be treated the same as <div> ("block" elements, if I'm not mistaken).

The following was tested in current main with the extension md_in_html active.

Input:

<a-b>

asdf

</a-b>

Output:

<p><a-b></p>
<p>asdf</p>
<p></a-b></p>

Expected:

<a-b>
<p>asdf</p>
</a-b>

I went through the code and might know how to add this, but I would like the maintainers' input before proceeding.


waylan · 2022-04-29T18:59:08Z

> Web components are custom HTML components that are required to have - in their names.

Can you point us to a spec for this?

igordsm · 2022-04-29T21:17:04Z

The resource I use the most is MDN: https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_custom_elements

The actual specification of valid names is at https://html.spec.whatwg.org/#valid-custom-element-name
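As a rough illustration of the naming rule linked above, here is a simplified, ASCII-only check in Python. It is an approximation, not a full implementation of the spec: the real grammar also permits many non-ASCII characters, and `is_custom_element_name` is a hypothetical helper name, not part of any library.

```python
import re

# Simplified, ASCII-only approximation of the WHATWG "valid custom element
# name" rule (the real grammar also permits many non-ASCII characters).
# The name must start with a lowercase ASCII letter and contain a hyphen.
_ASCII_CUSTOM_NAME = re.compile(r'[a-z][a-z0-9._-]*-[a-z0-9._-]*')

# Hyphenated names that the spec reserves (they already exist in SVG/MathML).
_RESERVED = frozenset({
    'annotation-xml', 'color-profile', 'font-face', 'font-face-src',
    'font-face-uri', 'font-face-format', 'font-face-name', 'missing-glyph',
})


def is_custom_element_name(name: str) -> bool:
    """Return True if `name` looks like a valid (ASCII-only) custom element name."""
    return bool(_ASCII_CUSTOM_NAME.fullmatch(name)) and name not in _RESERVED


print(is_custom_element_name('a-b'))        # True
print(is_custom_element_name('div'))        # False: no hyphen
print(is_custom_element_name('font-face'))  # False: reserved name
```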

waylan · 2022-05-02T14:21:56Z

Thank you for the links. There are two things I need to mention here.

First of all, the way Python-Markdown handles raw HTML is to define a list of known block-level tags. Any content within those block-level tags gets special treatment. Anything outside those known block-level tags is just treated as regular Markdown content, including inline raw HTML elements, which explains the behavior of the sample provided above.
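To make the first point concrete, here is a deliberately tiny toy model of the "known block-level tags" idea. It is not Python-Markdown's actual parser; the `classify` function, the regex, and the short tag set are hypothetical simplifications chosen only to show why an unknown tag such as `<a-b>` falls through to ordinary Markdown handling.

```python
import re

# Toy subset of block-level tag names (the real list is much longer).
BLOCK_LEVEL = frozenset({'div', 'p', 'table', 'pre', 'blockquote'})

# Matches an opening tag at the start of a line; hyphens are allowed,
# so custom element names such as "a-b" are captured whole.
OPEN_TAG = re.compile(r'<([a-zA-Z][a-zA-Z0-9-]*)[\s/>]')


def classify(line, block_level=BLOCK_LEVEL):
    """Return 'raw-html-block' if the line opens a known block-level tag,
    otherwise 'markdown' (the line is treated as ordinary Markdown text)."""
    m = OPEN_TAG.match(line)
    if m and m.group(1).lower() in block_level:
        return 'raw-html-block'
    return 'markdown'


print(classify('<div>'))                         # raw-html-block
print(classify('<a-b>'))                         # markdown: unknown tag
print(classify('<a-b>', BLOCK_LEVEL | {'a-b'}))  # raw-html-block once registered
```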

Second, I will note that to use custom elements, the HTML spec requires you to register them with the browser first. Without registration, the browser has no knowledge of how to handle them. In fact, a custom element could be either an inline element or a block-level element.

Given the above, I think that the logical way to support custom elements in Python-Markdown is to require the user to "register" the elements. That is, if you have a custom element which should be treated as a block-level element, you need to inform the Markdown class about it. This would probably make a good candidate for a third-party extension (an extension to register custom elements), although you can do this without an extension, as demonstrated below.

>>> import markdown
>>> src = '''
... <a-b>
...
... asdf
...
... </a-b>
... '''
>>> md = markdown.Markdown()
>>> md.block_level_elements.append('a-b')
>>> md.convert(src)
'<a-b>\n\nasdf\n\n</a-b>'
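The same registration could be packaged as the third-party extension suggested above. This is a minimal sketch assuming the standard `markdown.extensions.Extension` API; the class name `CustomElementsExtension` and its `elements` option are hypothetical, not an existing extension. Per the md_in_html caveat in this comment, it would not by itself make md_in_html parse Markdown inside the element.

```python
import markdown
from markdown.extensions import Extension


class CustomElementsExtension(Extension):
    """Hypothetical extension: treat the configured custom elements as block-level."""

    def __init__(self, **kwargs):
        # 'elements' is an illustrative option name, not an existing setting.
        self.config = {
            'elements': [[], 'List of custom element names to treat as block-level'],
        }
        super().__init__(**kwargs)

    def extendMarkdown(self, md):
        # Same effect as the manual md.block_level_elements.append('a-b') above.
        for name in self.getConfig('elements'):
            if name not in md.block_level_elements:
                md.block_level_elements.append(name)


src = '<a-b>\n\nasdf\n\n</a-b>\n'
md = markdown.Markdown(extensions=[CustomElementsExtension(elements=['a-b'])])
print(md.convert(src))
```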

That said, this does not currently work correctly with the md_in_html extension. Specifically, the extension fails to allow Markdown parsing within the element.

>>> md = markdown.Markdown(extensions=['md_in_html'])
>>> md.block_level_elements.append('a-b')
>>> md.convert(src)
'<a-b>\n\nasdf\n\n</a-b>'

This would appear to be because the extension compiles its lists of element types when the class instance is created, and therefore does not see changes made to the Markdown class later (see relevant code here). Ideally, the extension would build its list of element types after all extensions are loaded. I'm open to a PR which makes this change only. However, I do not see any need to add explicit support for custom elements specifically.
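The ordering problem described above can be reproduced without Python-Markdown at all. In this hypothetical sketch (none of these classes exist in the library), `EagerExtension` copies the tag list when it is constructed, while `LazyExtension` reads the list each time it is asked, so only the latter sees a tag registered afterwards.

```python
class FakeMarkdown:
    """Stand-in for the Markdown class: holds a mutable list of block-level tags."""

    def __init__(self):
        self.block_level_elements = ['div', 'p', 'table']


class EagerExtension:
    """Snapshots the tag list at construction time (the problematic pattern)."""

    def __init__(self, md):
        self.block_tags = set(md.block_level_elements)

    def is_block(self, tag):
        return tag in self.block_tags


class LazyExtension:
    """Consults the Markdown instance at use time, so later additions are seen."""

    def __init__(self, md):
        self.md = md

    def is_block(self, tag):
        return tag in self.md.block_level_elements


md = FakeMarkdown()
eager, lazy = EagerExtension(md), LazyExtension(md)
md.block_level_elements.append('a-b')  # registered after both objects were created
print(eager.is_block('a-b'))  # False: stale snapshot
print(lazy.is_block('a-b'))   # True
```

Reading the list on every lookup is the simplest variant. Building the set once, on first use during conversion, would also satisfy the "after all extensions are loaded" requirement while avoiding repeated work.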

igordsm · 2022-05-09T02:12:57Z

Thanks for the detailed feedback, @waylan. I'll try to make a PR with the changes you outlined above this week.

waylan added the more-info-needed and extension labels on Apr 29, 2022.

waylan added the feature, someday-maybe, and confirmed labels and removed the more-info-needed label on May 2, 2022.
