html-to-markdown Johannes Kaufmann
winget install --id=JohannesKaufmann.html2markdown -e
Convert HTML to Markdown. Even works with entire websites and can be extended through rules.
html-to-markdown: Converting HTML to Markdown with Precision
html-to-markdown is a robust tool designed to convert HTML content, including entire websites, into clean, readable Markdown. This versatile software supports a wide range of Markdown elements such as bold text, italicized phrases, ordered and unordered lists, blockquotes, and both inline and block code snippets. It ensures proper formatting for links and images while employing smart escaping to handle special characters effectively.
The tool offers options for removing or retaining specific HTML tags, giving users control over the output. It can also be extended through plugins and rules to fit different conversion needs.
It suits developers, technical writers, content creators, and anyone else who needs to convert HTML to Markdown while keeping the formatting intact.
Installation: html-to-markdown can be installed via winget.
README
html-to-markdown
A robust html-to-markdown converter that transforms HTML (even entire websites) into clean, readable Markdown. It supports complex formatting, customizable options, and plugins for full control over the conversion process.
Use the fully extendable Golang library or a quick CLI command. Alternatively, try the Online Demo or REST API to see it in action!
Here are some cool features:
- Bold & Italic: Supports bold and italic, even within single words.
- List: Handles ordered and unordered lists with full nesting support.
- Blockquote: Blockquotes can include other elements, with seamless support for nested quotes.
- Inline Code & Code Block: Correctly handles backticks and multi-line code blocks, preserving code structure.
- Link & Image: Properly formats multi-line links, adding escapes for blank lines where needed.
- Smart Escaping: Escapes special characters only when necessary, to avoid accidental Markdown rendering. See ESCAPING.md.
- Remove/Keep HTML: Choose to strip or retain specific HTML tags for control over the output.
- Plugins: Easily extend with plugins, or create custom ones to enhance functionality.
- Table Plugin: Converts tables with support for alignment, rowspan and colspan.
Usage
Golang library | CLI | Hosted Demo | Hosted REST API
> [!TIP]
> Looking for an all-in-one cloud solution? We're sponsored by Firecrawl, where you can scrape any website and turn it into AI-friendly markdown with one API call.
Golang Library
Installation
go get -u github.com/JohannesKaufmann/html-to-markdown/v2
Or, if you want a specific commit, add the suffix /v2@commithash
> [!NOTE]
> This is the documentation for the v2 library. For the old version switch to the "v1" branch.
Usage
package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
)

func main() {
	input := `<strong>Bold Text</strong>`

	markdown, err := htmltomarkdown.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: **Bold Text**
}

- Example code, basics
Use WithDomain to convert relative links to absolute links:
package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
)

func main() {
	input := `<img src="/assets/image.png" />`

	markdown, err := htmltomarkdown.ConvertString(
		input,
		// Relative links and image sources are resolved against this domain.
		converter.WithDomain("https://example.com"),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: ![](https://example.com/assets/image.png)
}
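To make the idea of resolving a relative link against a domain concrete, here is a minimal standard-library sketch. It is not how html-to-markdown implements WithDomain; the helper name absoluteURL is hypothetical and only illustrates the general technique using net/url.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// absoluteURL resolves ref against domain.
// Hypothetical helper for illustration only; it is not part of html-to-markdown.
func absoluteURL(domain, ref string) (string, error) {
	base, err := url.Parse(domain)
	if err != nil {
		return "", err
	}
	rel, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	// ResolveReference follows RFC 3986: a reference that is already
	// absolute is returned unchanged, a relative one is resolved
	// against the base URL.
	return base.ResolveReference(rel).String(), nil
}

func main() {
	refs := []string{"/assets/image.png", "https://cdn.example.org/logo.svg"}
	for _, ref := range refs {
		abs, err := absoluteURL("https://example.com", ref)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(abs)
	}
	// Prints:
	// https://example.com/assets/image.png
	// https://cdn.example.org/logo.svg
}
```

Links that are already absolute pass through untouched, which is why applying a domain to a whole document is safe for mixed content.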
The function htmltomarkdown.ConvertString() is a small wrapper around converter.NewConverter() and the base and commonmark plugins. If you want more control, use the following:
package main

import (
	"fmt"
	"log"

	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base"
	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/commonmark"
)

func main() {
	input := `<strong>Bold Text</strong>`

	conv := converter.NewConverter(
		converter.WithPlugins(
			base.NewBasePlugin(),
			commonmark.NewCommonmarkPlugin(
				commonmark.WithStrongDelimiter("__"),
				// ...additional configurations for the plugin
			),
			// ...additional plugins (e.g. table)
		),
	)

	markdown, err := conv.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: __Bold Text__
}

- Example code, options
> [!NOTE]
> If you use NewConverter directly, make sure to also register the commonmark and base plugins.
Collapse & Tag Type
You can specify how different HTML tags should be handled during conversion.
- Tag Types: When collapsing whitespace it is useful to know if a node is block or inline.
  - So if you have Web Components/Custom Elements, remember to register the type using TagType or RendererFor.
  - Additionally, you can remove tags completely from the output.
- Pre-built Renderers: There are several pre-built renderers available. For example:
  - RenderAsHTML will render the node (including children) as HTML.
  - RenderAsHTMLWrapper will render the node as HTML and render the children as markdown.
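As a rough illustration of why the block/inline distinction matters when collapsing whitespace, consider the toy sketch below. It is not the library's algorithm: the node type and the render function are hypothetical names invented for this example, and real HTML handling is considerably more involved.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, heavily simplified stand-in for an HTML element.
type node struct {
	text  string
	block bool // true for block-level tags (e.g. <p>), false for inline tags (e.g. <span>)
}

// render collapses runs of whitespace inside each node, joins neighbouring
// inline nodes with a single space, and separates block nodes from their
// neighbours with a blank line.
func render(nodes []node) string {
	var sb strings.Builder
	prevBlock := false
	for _, n := range nodes {
		// strings.Fields splits on any whitespace, so joining with a
		// single space collapses spaces, tabs and newlines.
		text := strings.Join(strings.Fields(n.text), " ")
		if text == "" {
			continue
		}
		switch {
		case sb.Len() == 0:
			// First piece of output: no separator needed.
		case n.block || prevBlock:
			sb.WriteString("\n\n")
		default:
			sb.WriteString(" ")
		}
		sb.WriteString(text)
		prevBlock = n.block
	}
	return sb.String()
}

func main() {
	out := render([]node{
		{text: "Hello  ", block: false},
		{text: "  world", block: false},
		{text: "A \n paragraph", block: true},
		{text: "tail", block: false},
	})
	fmt.Printf("%q\n", out)
	// Prints: "Hello world\n\nA paragraph\n\ntail"
}
```

If the third node were treated as inline instead of block, its text would be glued onto the same line as its neighbours, which is the kind of surprise an unregistered custom element can cause.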
> [!NOTE]
> By default, some tags are automatically removed (e.g. `