Skip to content

Latest commit

 

History

History
545 lines (400 loc) · 14.4 KB

Syntax.md

File metadata and controls

545 lines (400 loc) · 14.4 KB

pmarkdown supported syntax

This page describes the Markdown syntax supported by pmarkdown in its default mode. That syntax can be changed by selecting another mode, as documented here.

In addition, many of the behaviors of the program can be individually tweaked through a large set of options. Some of the options are mentioned here but you should refer to the options documentation for the complete list.

Overview

The overall syntax is based on the CommonMark spec and is not fully defined here. Refer to the spec for all the details.

The syntax is split into top-level blocks (which themselves can be containers or not) and inline content inside most of these blocks. In general, the blocks are always parsed first, and their inline content is parsed afterward and cannot impact the rendering of the block.

Escaping

Any symbol character can be prefixed by a single \ in a Markdown source to prevent any Markdown interpretation of that character and force it to be rendered as-is in the output.

Block structures

These are the leaves blocks, that is type of blocks that can’t contain other blocks.

Paragraphs

A paragraph is the simplest structure in Markdown. It’s just a set of line of text that does not look like any other block. A single paragraph can span multiple line of text but is interrupted at the first empty line, when the structure of another block is recognised, or when the structure of a parent block ends.

Note that the documentation of some block types says that they cannot interrupt a paragraph. This means that such a block cannot immediately follow a paragraph without being separated by an empty line.

Paragraphs are rendered in HTML within a single <p> tag. For example:

foo
bar

baz

Is rendered as:

<p>foo
bar</p>
<p>baz</p>

Headings

By default, only so-called ATX headings are supported. These headings are a single line of text, starting with a number of # characters, followed by at least one space or tab, then the heading text property and optionally terminated by another set of # characters which does not need to be of the same length as the opening ones.

The heading level is determined by the number of opening # characters. For example:

# Foo

## Bar ###

Is rendered as:

<h1>Foo</h1>
<h2>Bar</h2>

Because there is no strong consensus on their syntax, and it can be confused with list label, setext headings are not supported by default. They can be activated with the use_setext_headings option.

File metadata

The Markdown text can start with a YAML table, at the very beginning of the content, to specify metadata for the file. This table must start with a line containing only --- and ends with a line containing only --- or .... The YAML content must not be empty and must not contain any blank line.

The YAML content is not part of the rendered HTML, but can be extracted by other processing systems.

---
foo: bar
baz: bin
...
baz

Is rendered as:

<p>baz</p>

Thematic breaks

A thematic break (usually rendered as a horizontal line through the page) can be inserting three or more -, _, * characters alone on a line, optionally separated by spaces or tabs. For example:

***

- - -

Is rendered as:

<hr />
<hr />

Code blocks

We support the two types of common code blocks: indented and fenced code blocks.

Indented code blocks are made of any paragraph starting with at least 4 spaces or a tab.

Fenced code blocks starts with a line containing at least three backticks (`) or three tilde (~) characters in a row, optionally followed by some text, called the info string. If the code blocks starts with backticks then the info string cannot contain a backtick characters. The block extends until a line containing only backticks or tilde (matching the opening character) and using at least as many of these characters as the opening line.

For example:

    foo
    bar

Is rendered as:

<pre><code>foo
bar
</code></pre>

and

``` c++
foo bar
```

Is rendered as:

<pre><code class="language-c++">foo bar
</code></pre>

As can be seen in this example, when present the first word of the info string (and only that one) is used to generate a language class for the <code> tag, which is often recognised by syntax coloration scripts. This behavior can be tweaked with the code_blocks_info option.

Link reference definitions

The general syntax for links is described below but some links can use a reference definition to avoid having to mention the URL at the point where the link is used.

The reference definition can appear anywhere in the document as a paragraph of its own (it can appear before or after where it is used, and can be inside container blocks). It is recommended to put it soon after the point where it is used, at the root of the document.

The general syntax is as follow: the link label between brackets, followed by a colon, followed by optional spaces and up to one new line, followed by the URL itself, optionally between angled brackets, optionally followed by spaces and up to one new line and the link title, between single or double quotation mark. Example:

[label]: https://example.com "Some title"

The link reference definition does not appear in the generated HTML, unless it is used by an inline link. Note that the optional title cannot contain quotation marks of the same type (simple or double) than the one that are used to quote it. Also, if several link reference definitions are present with the same label, a warning is emitted and only the first one is kept.

HTML Blocks

A document can contain embedded HTML. At top level, the content of the HTML block will start with an opening tag and continue until a matching closing tag or a blank line, depending on the type of tags.

The complete rules are too complex to describe here. See the CommonMark spec for all the details.

The content of the HTML block is included as-is in the output and is not parsed as Markdown.

Note that you can use the disallowed_html_tags option to prevent some specific HTML tags from being present in the HTML output. When this option is set, the tags that it specifies will appear in the output as literal text, rather than HTML markup.

Tables

A table can be represented through a set of lines where the cells are separated by pipes. The first line is the header line, the next line must be a separator line and it must have exactly the same number of cells. Then a table can include as many other lines as needed and their number of cells does not need to match exactly the header line.

The separator line, consist of cells separated by pipes (like any other lines) and the cells are made only of set of dashes (-) optionally with colons (:) at the beginning or end of the cell, to signify the alignment of the column. Example:

| foo | bar | baz |
| :-- | --- | :-: |
| 1 | 2 | 3 |

Is rendered as:

<table>
  <thead>
    <tr>
      <th align="left">foo</th>
      <th>bar</th>
      <th align="center">baz</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left">1</td>
      <td>2</td>
      <td align="center">3</td>
    </tr>
  </tbody>
</table>

By default, tables cannot interrupt a paragraph (there must be a blank line before them or they must appear just after a title or a similar construct) and separating pipes (|) must be used at the beginning and end of each line of the table. This behavior can be tweaked using the table_blocks_can_interrupt_paragraph and table_blocks_pipes_requirements options.

Note that if you put more cells in a line than there was in the header line, then these cells will be ignored and a warning will be generated.

The content of all cells is interpreted as inline Markdown text but cannot contain other Markdown blocks.

Container blocks

The blocks are top-level structure (like the leaves blocks above) except that they can contain any other Markdown blocks in addition to inline content.

Lists

TODO, but note that pmarkdown is using the default CommonMark syntax.

Task lists

TODO, but note that pmarkdown is using the default GitHub.

Block Quotes

TODO, but note that pmarkdown is using the default CommonMark syntax.

Inlines

Inline content is the content that can appear within other blocks or, as paragraphs, directly at the root of the document. In general, each paragraph is rendered within a <p> tag, but this may vary depending on the container block.

Note that, in the examples below, the <p> tag surrounding paragraphs is omitted when it’s not relevant for the example.

Text content

By default text that is not interpreted as anything else is rendered as-is in the output:

foo bar

Is rendered as:

<p>foo bar</p>

HTML entities

Any character can appear escaped in the input markdown and it will generally be decoded in the output HTML. The supported escaping are the following:

  • decimal character references: &#123; where 123 is any decimal unicode code point.
  • hexadecimal character references: &#X1bc; where 1cb is any hexadecimal unicode code point.
  • HTML entity reference: &#abc; where abc is a valid HTML5 entity name.
  • Backslash escaping: \X where X is an ASCII control character (e.g. #, !, or any other graphical, non-alphanumeric ASCII character).

For example:

&#65;
&#X4B;
&mu;
\#

Is rendered as:

A
K
μ
#

When it appears as an escaped entity, a character will never be part of a Markdown construct (whether to indicate a block structure or an inline span) so escaping can be used to prevent some text from being recognised as a Markdown construct:

\# this is not a header

Is rendered as:

<p># this is not a header</p>

Finally, note that some characters with special meanings in HTML5 are always escaped in the output, whether they appeared literally or escaped in the input:

some "word"

Is rendered as:

some &quot;word&quot;

Soft and hard line breaks

New lines in a paragraph are rendered as-is, with a new line in the HTML output, also called a soft line break:

foo
bar

Is rendered as:

<p>foo
bar</p>

It is possible to render hard line breaks (<br />) by putting a backslash just before the end of the line:

foo\
bar

Is rendered as:

<p>foo<br />
bar</p>

Other Markdown engines allow to end the line with two spaces for a hard line break. But, as this is mostly invisible in the Markdown source, this is not supported in our default syntax. This can be activated with the two_spaces_hard_line_breaks option.

Emphasis and other markups

Emphasis and other spans of text can be delimited using some specific characters. Our default syntax uses *, _, and ~. The exact parsing rules are too complex to be included here (they are documented in the CommonMark spec) but, in general, a span will be delimited by one or two of the same character among those listed above, when they are flanking the left side of a word, until the same character in the same quantity is found flanking the right side of a word:

*emphasis (italic)*
_also emphasis_
**strong (bold)**
__also strong__
~strikethrough~
~~deleted~~

Is rendered as:

<em>emphasis (italic)</em>
<em>also emphasis</em>
<strong>strong (bold)</strong>
<strong>also strong</strong>
<s>strikethrough</s>
<del>deleted</del>

Multiple different spans can be nested, but they can’t partially overlap:

*this is em **and strong***
*this __is only em* and not strong__

Is rendered as:

<em>this is em <strong>and strong</strong></em>
<em>this __is only em</em> and not strong__

The rendering of these delimited spans can be tuned with the inline_delimiters option, which allows to specify arbitrary tags to generate for a given delimiter (and can also be used to generate <span> with arbitrary classes).

Links

TODO, but note that pmarkdown is using the default CommonMark syntax.

Images

TODO, but note that pmarkdown is using the default CommonMark syntax.

Autolinks

For URLs that look like links, they can appear directly in the Markdown text, surrounded by angle brackets:

<http://example.com/page?param=foo>

Is rendered as:

<a href="http://example.com/page?param=foo">http://example.com/page?param=foo</a>

That syntax requires that the protocol (for example: https:) be present in the link for it to be recognised.

An even lighter syntax exist, where the link can appear directly in the text if there is no ambiguity. It must start by a protocol or by the string www. and must come at the beginning of a line, after a whitespace, or after some other existing delimiters:

A link: www.example.com/page?param=foo.

Is rendered as:

A link: <a href="https://www.example.com/page?param=foo">www.example.com/page?param=foo</a>.

As seen above, that syntax will try to exclude punctuation like . or ) following a link when it does not seem to be part of the link itself. Also, note that the default https: protocol is added if it was missing. This can be controlled using the default_extended_autolinks_scheme option.

Raw HTML

Anything that looks like an HTML tag will appear as-is in the output. For example:

text that is <ins>inserted</ins>.

Is rendered as:

text that is <ins>inserted</ins>.

As opposed to block-level HTML element, this applies only to the tag itself and any other text will still be parsed as Markdown (including between two HTML tags).

Note that you can use the disallowed_html_tags option to prevent some specific HTML tags from being present in the HTML output. When this option is set, the tags that it specifies will appear in the output as literal text, rather than HTML markup.