
html-to-markdown Johannes Kaufmann

Use this command to install html-to-markdown:
winget install --id=JohannesKaufmann.html2markdown -e

Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

html-to-markdown: Converting HTML to Markdown with Precision

html-to-markdown is a tool that converts HTML content, including entire websites, into clean, readable Markdown. It supports bold and italic text, ordered and unordered lists, blockquotes, and both inline code and code blocks. It formats links and images correctly and escapes special characters only where they would otherwise be interpreted as Markdown.

Options let you choose whether specific HTML tags are removed or kept in the output, and the converter can be extended through plugins and rules to fit different conversion needs.

It is aimed at developers, technical writers, content creators, and anyone else who needs to convert HTML to Markdown while keeping the formatting intact, and the conversion process can be tailored to specific requirements.

Installation: html-to-markdown can be installed via winget with the command above.

README

html-to-markdown

A robust html-to-markdown converter that transforms HTML (even entire websites) into clean, readable Markdown. It supports complex formatting, customizable options, and plugins for full control over the conversion process.

Use the fully extendable Golang library or a quick CLI command. Alternatively, try the Online Demo or REST API to see it in action!

Here are some cool features:

  • Bold & Italic: Supports bold and italic, even within single words.

  • List: Handles ordered and unordered lists with full nesting support.

  • Blockquote: Blockquotes can include other elements, with seamless support for nested quotes.

  • Inline Code & Code Block: Correctly handles backticks and multi-line code blocks, preserving code structure.

  • Link & Image: Properly formats multi-line links, adding escapes for blank lines where needed.

  • Smart Escaping: Escapes special characters only when necessary, to avoid accidental Markdown rendering. 🗒️ ESCAPING.md

  • Remove/Keep HTML: Choose to strip or retain specific HTML tags for ultimate control over output.

  • Plugins: Easily extend the converter with plugins, or create custom ones to add functionality.

  • Table Plugin: Converts tables with support for alignment, rowspan and colspan.
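Two of the features above, smart escaping and backtick-safe inline code, follow rules from the CommonMark specification. The sketch below illustrates those rules in plain Go; it is not the library's implementation, and the helper names `escapeLineStart` and `codeSpan` are hypothetical. It only handles a few block-level triggers at the start of a line (blockquote, ATX heading, bullet and ordered list markers) and ignores inline characters such as `*` and `_`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns that would start a block construct at the beginning of a line.
var (
	atxHeading    = regexp.MustCompile(`^#{1,6}(?: |$)`)   // "# Title", "## Title", "#"
	bulletMarker  = regexp.MustCompile(`^[-+*] `)          // "- item", "+ item", "* item"
	orderedMarker = regexp.MustCompile(`^(\d{1,9})([.)]) `) // "1. item", "1) item"
)

// escapeLineStart (hypothetical) adds a backslash only when the start of a
// line of plain text would otherwise be read as Markdown block syntax.
func escapeLineStart(line string) string {
	trimmed := strings.TrimLeft(line, " ")
	indent := line[:len(line)-len(trimmed)]

	switch {
	case strings.HasPrefix(trimmed, ">"), // ">" starts a blockquote even without a space
		atxHeading.MatchString(trimmed),
		bulletMarker.MatchString(trimmed):
		return indent + `\` + trimmed
	case orderedMarker.MatchString(trimmed):
		// Escape the "." or ")" so "1. " is no longer a list marker.
		return indent + orderedMarker.ReplaceAllString(trimmed, `${1}\${2} `)
	}
	return line // nothing to escape
}

// codeSpan (hypothetical) wraps code in a backtick fence that is longer than
// any backtick run inside it, so inner backticks cannot close the span early.
// It pads with spaces when the code itself starts or ends with a backtick.
func codeSpan(code string) string {
	longest, run := 0, 0
	for _, r := range code {
		if r == '`' {
			run++
			if run > longest {
				longest = run
			}
		} else {
			run = 0
		}
	}
	fence := strings.Repeat("`", longest+1)
	if strings.HasPrefix(code, "`") || strings.HasSuffix(code, "`") {
		return fence + " " + code + " " + fence
	}
	return fence + code + fence
}

func main() {
	fmt.Println(escapeLineStart("1. Not a list")) // 1\. Not a list
	fmt.Println(codeSpan("a`b"))                  // ``a`b``
}
```

Escaping only these narrow cases, rather than every punctuation character, is what keeps ordinary text such as `#hashtag` or `Version 2.3.1` readable in the output.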


Usage

💻 Golang library | 📦 CLI | ▶️ Hosted Demo | 🌐 Hosted REST API

> [!TIP]
> Looking for an all-in-one cloud solution? We're sponsored by 🔥 Firecrawl, where you can scrape any website and turn it into AI-friendly Markdown with one API call.


Golang Library

Installation

go get -u github.com/JohannesKaufmann/html-to-markdown/v2

To pin a specific commit instead, add the suffix /v2@commithash.

> [!NOTE]
> This is the documentation for the v2 library. For the old version switch to the "v1" branch.

Usage


package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
)

func main() {
	input := `<strong>Bold Text</strong>`

	markdown, err := htmltomarkdown.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: **Bold Text**
}

Use WithDomain to convert relative links to absolute links:

package main

import (
	"fmt"
	"log"

	htmltomarkdown "github.com/JohannesKaufmann/html-to-markdown/v2"
	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
)

func main() {
	input := `<img src="/assets/image.png" />`

	markdown, err := htmltomarkdown.ConvertString(
		input,
		converter.WithDomain("https://example.com"),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: ![](https://example.com/assets/image.png)
}
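Conceptually, turning a relative link into an absolute one means resolving it against a base URL as described in RFC 3986. The following is a stdlib-only sketch of that idea using `net/url`; it is an illustration, not how `converter.WithDomain` is implemented, and the helper name `absolutize` is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// absolutize (hypothetical) resolves href against domain following RFC 3986
// reference resolution: relative paths are joined onto the domain, while
// URLs that already have a scheme are returned unchanged.
func absolutize(domain, href string) (string, error) {
	base, err := url.Parse(domain)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	hrefs := []string{"/assets/image.png", "https://cdn.example.org/logo.svg"}
	for _, href := range hrefs {
		abs, err := absolutize("https://example.com", href)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

The first link resolves to the same URL as in the example above; the second is already absolute and is left as it is.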

The function htmltomarkdown.ConvertString() is a small wrapper around converter.NewConverter() and the base and commonmark plugins. If you want more control, use the following:

package main

import (
	"fmt"
	"log"

	"github.com/JohannesKaufmann/html-to-markdown/v2/converter"
	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/base"
	"github.com/JohannesKaufmann/html-to-markdown/v2/plugin/commonmark"
)

func main() {
	input := `<strong>Bold Text</strong>`

	conv := converter.NewConverter(
		converter.WithPlugins(
			base.NewBasePlugin(),
			commonmark.NewCommonmarkPlugin(
				commonmark.WithStrongDelimiter("__"),
				// ...additional configurations for the plugin
			),

			// ...additional plugins (e.g. table)
		),
	)

	markdown, err := conv.ConvertString(input)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// Output: __Bold Text__
}

> [!NOTE]
> If you use NewConverter directly, make sure to also register the base and commonmark plugins.
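Calls such as `commonmark.WithStrongDelimiter("__")` follow Go's functional-options pattern: a constructor starts from defaults and applies each option in turn. The self-contained sketch below shows that pattern with hypothetical types (`strongPlugin`, `withStrongDelimiter`); it mimics the shape of the README's API but is not the library's code. The default of `**` matches the output of the first example.

```go
package main

import "fmt"

// strongPlugin is a hypothetical stand-in for a configurable plugin.
type strongPlugin struct {
	delimiter string
}

// option mutates the plugin's configuration (functional-options pattern).
type option func(*strongPlugin)

// withStrongDelimiter is hypothetical and only modelled on
// commonmark.WithStrongDelimiter from the README.
func withStrongDelimiter(d string) option {
	return func(p *strongPlugin) { p.delimiter = d }
}

// newStrongPlugin starts from the default "**" and applies each option.
func newStrongPlugin(opts ...option) *strongPlugin {
	p := &strongPlugin{delimiter: "**"}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

// renderStrong wraps text in the configured delimiter.
func (p *strongPlugin) renderStrong(text string) string {
	return p.delimiter + text + p.delimiter
}

func main() {
	fmt.Println(newStrongPlugin().renderStrong("Bold Text"))                           // **Bold Text**
	fmt.Println(newStrongPlugin(withStrongDelimiter("__")).renderStrong("Bold Text")) // __Bold Text__
}
```

Because options are plain functions, new settings can be added later without changing the constructor's signature.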


Collapse & Tag Type

You can specify how different HTML tags should be handled during conversion.

  • Tag Types: When collapsing whitespace, it is useful to know whether a node is block or inline.
    • So if you use Web Components/Custom Elements, remember to register their type using TagType or RendererFor.
    • Additionally, you can remove tags completely from the output.
  • Pre-built Renderers: There are several pre-built renderers available. For example:
    • RenderAsHTML will render the node (including children) as HTML.
    • RenderAsHTMLWrapper will render the node as HTML and render the children as markdown.
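To see why registering a tag type matters, consider how neighbouring elements are joined. The stdlib-only sketch below uses a hypothetical registry (`tagTypes`) in which unregistered tags, such as a custom `<my-card>` element, are treated as inline; this default is an assumption of the sketch, not documented library behaviour. It collapses whitespace inside each element, then separates neighbours with a blank line if either is a block, and with a single space otherwise.

```go
package main

import (
	"fmt"
	"strings"
)

type tagType int

const (
	inlineTag tagType = iota
	blockTag
)

// tagTypes is a hypothetical registry; unregistered tags count as inline here.
var tagTypes = map[string]tagType{
	"p":      blockTag,
	"div":    blockTag,
	"span":   inlineTag,
	"strong": inlineTag,
}

func typeOf(tag string) tagType {
	if t, ok := tagTypes[tag]; ok {
		return t
	}
	return inlineTag
}

// element is a flattened node: a tag name and its text content.
type element struct {
	tag  string
	text string
}

// join collapses whitespace inside each element and separates neighbours
// with a blank line if either one is a block, otherwise with a single space.
func join(elems []element) string {
	var b strings.Builder
	for i, e := range elems {
		if i > 0 {
			if typeOf(e.tag) == blockTag || typeOf(elems[i-1].tag) == blockTag {
				b.WriteString("\n\n")
			} else {
				b.WriteString(" ")
			}
		}
		b.WriteString(strings.Join(strings.Fields(e.text), " "))
	}
	return b.String()
}

func main() {
	cards := []element{{"my-card", "First\n   card"}, {"my-card", "Second card"}}
	fmt.Printf("%q\n", join(cards)) // "First card Second card"

	tagTypes["my-card"] = blockTag  // register the custom element as a block
	fmt.Printf("%q\n", join(cards)) // "First card\n\nSecond card"
}
```

Without the registration, the two cards run together on one line; once `my-card` is known to be a block, each card becomes its own paragraph.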

> [!NOTE]
> By default, some tags are automatically removed (e.g. `

Versions: 2.3.1, 2.3.0, 2.2.2, 2.1.0