
Client-Side Markdown to PDF: From html2pdf.js to Native Browser Printing

MD-TO Team

Converting Markdown to PDF in the browser sounds simple, but it’s full of pitfalls. This article documents the technical choices, architectural evolution, and implementation details behind md-to.com’s PDF export — from our initial attempt with html2pdf.js to the final migration to native browser printing, including every lesson learned along the way.


Three Paths to Client-Side PDF Generation

There are three main approaches to generating PDFs from HTML. Only the last two satisfy a pure frontend (no server) constraint:

Approach | Mechanism | Libraries
Server-side rendering | Launch a headless browser in Node.js to render HTML and export PDF | Puppeteer, Playwright, wkhtmltopdf
Client-side JS generation | Render HTML to Canvas in the browser, then convert to PDF | html2pdf.js, jsPDF + html2canvas
Native browser printing | Call window.print(), user selects “Save as PDF” in the print dialog | No third-party library needed

md-to.com is a fully static site where all conversions happen locally in the browser with no backend dependency. This rules out server-side approaches entirely, leaving us to choose between html2pdf.js and native browser printing.


Version 1: The html2pdf.js Experiment and Why We Dropped It

Initial Approach

We initially adopted html2pdf.js, which works as follows:

  1. Pass HTML elements to html2canvas to render as a Canvas bitmap
  2. Use jsPDF to slice the Canvas into multi-page PDF
  3. Trigger a browser download

The core configuration:

const opt = {
  margin: [0.5, 0.5],
  filename: pdfFilename,
  image: { type: 'jpeg', quality: 0.98 },
  html2canvas: { scale: 2, useCORS: true, logging: false },
  jsPDF: { unit: 'in', format: 'letter', orientation: 'portrait' },
  pagebreak: {
    mode: ['css', 'legacy'],
    avoid: ['pre', 'code', 'blockquote', 'img', '.katex-display',
            'h1', 'h2', 'h3', 'h4', 'h5', 'h6'],
  },
};
await html2pdf().set(opt).from(wrapper).save();

The pagebreak.avoid configuration tells html2pdf.js to avoid splitting pages inside these elements — in theory, this should solve the truncation problem.

Problems Encountered

Real-world usage exposed several critical issues:

1. Bitmap output — text is not selectable

html2pdf.js essentially does “HTML → Canvas screenshot → stitch into PDF”. Every page is a JPEG image. Text in the PDF cannot be selected or searched, and all hyperlinks are lost. For technical documentation, this is a dealbreaker.

2. Page-break truncation is hard to fix

While pagebreak.avoid handles simple cases, when a code block exceeds one page in height or nested lists span across pages, html2pdf.js’s pagination algorithm cuts right through the middle of elements — because it calculates based on pixel height, unlike a browser layout engine that understands content structure.
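
To make the truncation problem concrete, here is a simplified model of fixed-height slicing. It is not html2pdf.js's actual algorithm; the names BlockBox and findSplitBlocks are hypothetical, and the only assumption is that the rendered canvas is cut at every multiple of the page height.

```typescript
// Simplified model of fixed-height page slicing (illustrative only;
// this is not html2pdf.js's real pagination code).
interface BlockBox {
  label: string;    // e.g. "pre", "ul"
  topPx: number;    // distance from the top of the rendered canvas
  heightPx: number; // rendered height of the block
}

// Returns the labels of blocks that a cut at every multiple of
// pageHeightPx would slice through.
function findSplitBlocks(blocks: BlockBox[], pageHeightPx: number): string[] {
  return blocks
    .filter((b) => {
      const firstPage = Math.floor(b.topPx / pageHeightPx);
      // Subtract 1 so a block ending exactly on a cut is not counted as split.
      const lastPage = Math.floor((b.topPx + b.heightPx - 1) / pageHeightPx);
      return lastPage > firstPage;
    })
    .map((b) => b.label);
}

const sampleBlocks: BlockBox[] = [
  { label: "p", topPx: 0, heightPx: 200 },
  { label: "pre (short)", topPx: 900, heightPx: 300 },  // straddles the cut at 1000
  { label: "pre (long)", topPx: 1200, heightPx: 1500 }, // taller than one page
  { label: "ul", topPx: 2700, heightPx: 300 },          // ends exactly at 3000
];

const splitLabels = findSplitBlocks(sampleBlocks, 1000);
```

An avoid rule can rescue the short block by pushing it to the next page. The long block cannot fit on any single page, so a slicer that only knows pixel offsets has to cut through it somewhere.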

3. Performance and memory issues

html2canvas must re-render the entire DOM tree into a Canvas. For long documents (10+ pages), this consumes massive amounts of memory and can even crash the page on mobile devices.
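
A back-of-the-envelope estimate shows why long documents hurt. The sketch below assumes 96 CSS pixels per inch, 4 bytes per pixel (RGBA), a single canvas covering every Letter-sized page, and no margins; the function name estimateCanvasMegabytes is hypothetical, and real memory use depends on the browser.

```typescript
// Rough estimate of the backing-store size of a canvas that holds a whole
// document. Assumptions: 96 CSS pixels per inch, 4 bytes per pixel (RGBA),
// and one canvas covering every page.
function estimateCanvasMegabytes(
  pageCount: number,
  scale: number,
  pageWidthIn: number = 8.5,
  pageHeightIn: number = 11,
): number {
  const cssPxPerInch = 96;
  const widthPx = pageWidthIn * cssPxPerInch * scale;
  const heightPx = pageHeightIn * cssPxPerInch * scale * pageCount;
  const bytes = widthPx * heightPx * 4;
  return bytes / (1024 * 1024);
}

// scale: 2 matches the html2canvas option in the configuration shown earlier.
const tenPagesAtScale2 = estimateCanvasMegabytes(10, 2);
const tenPagesAtScale1 = estimateCanvasMegabytes(10, 1);
```

Under these assumptions a 10-page document at scale: 2 needs roughly 130 MB for pixel data alone, four times what scale: 1 would need, before the JPEG encoding and PDF assembly steps allocate anything.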


Version 2: Native Browser Printing

Why Native Printing

The browser’s print function runs the document through the browser’s own layout engine and produces vector PDFs: text is selectable and searchable, hyperlinks remain clickable, and files are typically smaller than their bitmap equivalents.

Dimension | html2pdf.js | Native browser printing
Output format | Bitmap (Canvas screenshot) | Vector (text selectable/searchable)
Pagination | JS pixel calculation, prone to truncation | CSS page-break-*, handled by browser engine
Hyperlinks | Lost | Preserved
Performance | Heavy memory usage rendering Canvas | Uses the browser's built-in print pipeline, no Canvas rendering
Maintainability | Depends on third-party libraries | Native browser capability, no dependency to maintain

The only “downside” is that users need to manually select “Save as PDF” in the print dialog — this requires good UI guidance.


Core Architecture: Overlay + IFrame

The overall architecture has three layers:

┌──────────────────────────────────┐
│           Toolbar                │  ← Print button, Cancel button
├──────────────────────────────────┤
│                                  │
│     ┌────────────────────┐       │
│     │                    │       │
│     │   IFrame (Preview) │       │  ← Injected with styled HTML
│     │   width: 8.5in     │       │
│     │                    │       │
│     └────────────────────┘       │
│                                  │
│       Overlay (Full-screen)      │  ← z-index: 70
└──────────────────────────────────┘

Workflow

  1. User clicks the “Download PDF” button
  2. showPdfOverlay() is called, creating a full-screen overlay
  3. Dynamically import the template system and get the active template
  4. Call generatePrintStyles(template) to generate complete print styles
  5. Create an iframe and write styles + HTML content into it
  6. User previews the result in the overlay, then clicks “Print / Save as PDF”
  7. iframe.contentWindow.print() triggers the system print dialog

Why an IFrame

Calling window.print() directly on the current page would print the entire page (including the editor, sidebar, and other UI). Using an iframe allows us to:

  • Isolate print content to only the rendered Markdown output
  • Inject independent print styles without affecting the main page
  • Provide WYSIWYG preview — what you see in the iframe is what gets printed

Key implementation:

const styles = generatePrintStyles(template);
const doc = iframe.contentDocument || iframe.contentWindow?.document;
if (doc) {
  doc.open();
  doc.write(`
    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="UTF-8">
      <title>Document</title>
      ${styles}
    </head>
    <body>
      ${currentHtml}
    </body>
    </html>
  `);
  doc.close();
}

The iframe width is set to 8.5in (Letter paper width), with height dynamically calculated based on content:

setTimeout(() => {
  const bodyHeight = doc.body.scrollHeight;
  iframe.style.height = Math.max(bodyHeight + 40, 800) + 'px';
}, 100);

CSS Pagination Strategies in Detail

The core of the native printing approach lies in CSS. @media print rules instruct the browser layout engine to follow our intent when paginating.

Preventing Content Truncation

@media print {
  tr, pre, blockquote, img, .katex-display {
    page-break-inside: avoid;
    break-inside: avoid;
  }
}

Using both page-break-inside (legacy syntax) and break-inside (modern syntax) ensures maximum compatibility. The covered elements include:

  • pre: Code blocks
  • blockquote: Blockquotes
  • img: Images
  • .katex-display: Math formula blocks
  • tr: Table rows
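
Writing the legacy and modern declarations by hand makes it easy for the two to drift apart. One option is a small helper that always emits both; buildAvoidBreakRule is a hypothetical name, not part of md-to.com's code.

```typescript
// Hypothetical helper: emits an @media print rule that sets both the legacy
// and the modern property so the two can never drift apart.
type BreakPosition = "inside" | "after" | "before";

function buildAvoidBreakRule(selectors: string[], position: BreakPosition): string {
  if (selectors.length === 0) return "";
  return [
    "@media print {",
    `  ${selectors.join(", ")} {`,
    `    page-break-${position}: avoid;`, // legacy property
    `    break-${position}: avoid;`,      // modern property
    "  }",
    "}",
  ].join("\n");
}

const keepTogetherCss = buildAvoidBreakRule(
  ["tr", "pre", "blockquote", "img", ".katex-display"],
  "inside",
);
```

Called with the selectors listed above, the helper produces the same rule as the hand-written CSS in this section.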

Tables Span Pages but Headers Repeat

@media print {
  table {
    page-break-inside: auto;
  }
  thead {
    display: table-header-group;
  }
}

This is a clever combination: table is allowed to span pages (since tables can be very long), but thead set to table-header-group makes the browser repeat the table header at the top of each page. This is extremely useful for long tables — no need to flip back to the first page to check column names.

Headings Don’t Get Orphaned at Page Bottom

@media print {
  h1, h2, h3, h4, h5, h6 {
    page-break-after: avoid;
    break-after: avoid;
  }
}

If a heading appears at the bottom of a page with its body text starting on the next page, the reading experience suffers. page-break-after: avoid tells the browser: don’t page-break after a heading — keep the heading and its following content on the same page.

Preserving Background Colors

@media print {
  body {
    -webkit-print-color-adjust: exact;
    print-color-adjust: exact;
  }
}

Browsers don’t print background colors by default (to save ink). These two declarations ask the browser to keep them, which is critical for code block backgrounds, table zebra stripes, and blockquote backgrounds. The print dialog can still override this request, so users should also check “Background graphics” there.


Template System and Dynamic Style Generation

md-to.com offers 20+ document templates, each defining a complete set of visual parameters:

interface DocumentTemplate {
  fonts: { body: string; heading: string; code: string };
  fontSizes: { body: number; h1: number; h2: number; /* ... */ code: number };
  colors: {
    text: string; heading: string; link: string;
    codeBackground: string; codeText: string; codeBorder: string;
    quoteBorder: string; quoteBackground: string; quoteText: string;
    tableBorder: string; tableHeaderBg: string; tableHeaderText: string;
    tableRowOdd: string; tableRowEven: string;
  };
  spacing: {
    lineHeight: number;
    headingBefore: number; headingAfter: number;
    paragraphBefore: number; paragraphAfter: number;
  };
}

The generatePrintStyles(template) function injects template parameters into a CSS template string, generating a complete <style> tag. This means the same Markdown content produces a completely different PDF when you switch templates — fonts, colors, and spacing all follow the template.

Code highlighting styles (highlight.js) are also inlined into the generated CSS, ensuring code blocks in the PDF retain syntax coloring.
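
The source of generatePrintStyles(template) is not shown here, so the following is only a minimal sketch of the idea. MiniTemplate is a reduced, hypothetical subset of DocumentTemplate, sketchPrintStyles is a hypothetical name, and treating font sizes as points is an assumption.

```typescript
// Reduced, hypothetical version of the template type from this article;
// the real DocumentTemplate has many more fields.
interface MiniTemplate {
  fonts: { body: string; code: string };
  fontSizes: { body: number; code: number }; // assumed to be in points
  colors: { text: string; codeBackground: string; codeText: string };
  spacing: { lineHeight: number };
}

// Sketch of the idea behind generatePrintStyles(): interpolate template
// values into a CSS template string and wrap the result in a <style> tag.
function sketchPrintStyles(t: MiniTemplate): string {
  return `<style>
  body {
    font-family: ${t.fonts.body};
    font-size: ${t.fontSizes.body}pt;
    line-height: ${t.spacing.lineHeight};
    color: ${t.colors.text};
  }
  pre, code {
    font-family: ${t.fonts.code};
    font-size: ${t.fontSizes.code}pt;
    color: ${t.colors.codeText};
    background: ${t.colors.codeBackground};
  }
</style>`;
}

// Example values are made up for illustration.
const serifTemplate: MiniTemplate = {
  fonts: { body: "Georgia, serif", code: "Menlo, monospace" },
  fontSizes: { body: 11, code: 9.5 },
  colors: { text: "#222222", codeBackground: "#f6f8fa", codeText: "#24292e" },
  spacing: { lineHeight: 1.6 },
};

const serifStyles = sketchPrintStyles(serifTemplate);
```

Values are interpolated without escaping, which is reasonable only while templates are defined by the site itself rather than supplied by users.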


Smart Filename Generation

Downloaded PDF filenames aren’t a generic download.pdf. The getDownloadFileName() function extracts the first heading from the Markdown source:

export function getDownloadFileName(
  mdText: string,
  extension: string,
  fallbackPrefix: string = 'markdown-export',
): string {
  const lines = mdText.split(/\r?\n/);
  const headingLine = lines.find((line) => /^\s{0,3}#{1,6}\s+/.test(line));
  if (headingLine) {
    const raw = headingLine
      .replace(/^\s{0,3}#{1,6}\s+/, '')
      .replace(/\s+#*\s*$/, '');
    const cleaned = sanitizeFilename(raw);
    if (cleaned) return `${cleaned}.${extension}`;
  }
  return `${fallbackPrefix}-${getDateStamp()}.${extension}`;
}

The logic:

  1. Split the Markdown text by lines and find the first line matching # Heading format
  2. Strip the # prefix and any trailing # decorations
  3. Clean illegal characters (\ / : * ? " < > |) via sanitizeFilename(), normalize whitespace, and truncate to 80 characters
  4. If no heading is found, fall back to the prefix plus a date stamp, for example markdown-export-20260311.pdf with the default fallbackPrefix

For example, if a document starts with # Project Technical Proposal, the downloaded file is named Project Technical Proposal.pdf.
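
The two helpers that getDownloadFileName() relies on are described above but not shown. A minimal sketch of what they might look like follows; the Sketch suffixes mark them as hypothetical, and removing illegal characters (rather than replacing them) as well as using local time for the date stamp are assumptions.

```typescript
// Hypothetical implementations of the two helpers used by getDownloadFileName();
// the article describes their behavior but does not show their source.
function sanitizeFilenameSketch(raw: string, maxLength: number = 80): string {
  return raw
    .replace(/[\\\/:*?"<>|]/g, "") // drop characters that are illegal in filenames
    .replace(/\s+/g, " ")          // collapse runs of whitespace
    .trim()
    .slice(0, maxLength)
    .trim();                       // truncation may leave a trailing space
}

// Formats a date as YYYYMMDD in local time, e.g. 20260311.
function getDateStampSketch(date: Date = new Date()): string {
  const yyyy = String(date.getFullYear());
  const mm = ("0" + (date.getMonth() + 1)).slice(-2);
  const dd = ("0" + date.getDate()).slice(-2);
  return `${yyyy}${mm}${dd}`;
}

const cleanedTitle = sanitizeFilenameSketch('Q&A: What is "PDF"?  v1/v2');
const stampForMarch11 = getDateStampSketch(new Date(2026, 2, 11));
```

A heading made up entirely of illegal characters sanitizes to an empty string, which is why getDownloadFileName() checks the cleaned value before using it and otherwise falls through to the date-stamped name.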


Lessons Learned

1. html2pdf.js Button State Bug

After html2pdf.js completed a download, the button’s disabled state wasn’t reset to false. After the first successful download, the button stayed greyed out. The success path only restored the button text but not the disabled attribute:

// Success path — missing btnDownloadPdf.disabled = false
await html2pdf().set(opt).from(wrapper).save();
showToast(texts.downloadStarted);
btnDownloadPdf.innerText = texts.downloadPdf;

While the failure path handled it correctly:

// Failure path — correctly restored
btnDownloadPdf.disabled = false;
btnDownloadPdf.innerText = texts.downloadPdf;
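
One way to rule out this class of bug is to restore the button in a finally block, so the success and failure paths share the same cleanup. The sketch below is not the code md-to.com shipped; runWithBusyButton and ButtonLike are hypothetical names.

```typescript
// Minimal stand-in for the parts of HTMLButtonElement used here, so the
// sketch also runs outside a browser.
interface ButtonLike {
  disabled: boolean;
  innerText: string;
}

// Disables the button while `work` runs and restores it in `finally`,
// so the success and failure paths cannot diverge.
async function runWithBusyButton(
  button: ButtonLike,
  idleLabel: string,
  busyLabel: string,
  work: () => Promise<void>,
): Promise<void> {
  button.disabled = true;
  button.innerText = busyLabel;
  try {
    await work();
  } finally {
    button.disabled = false;
    button.innerText = idleLabel;
  }
}
```

In the download handler, the html2pdf().set(opt).from(wrapper).save() call would go inside the work callback. Errors still propagate to the caller, so a failure toast can be shown in a surrounding catch.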

2. innerHTML Causing XSS Risk

An earlier version used tipText.innerHTML = texts.tip to inject tooltip text. Although texts.tip comes from i18n configuration rather than user input, innerHTML itself is a dangerous API. After code review, it was changed to textContent:

// Before: tipText.innerHTML = texts.tip;
tipText.textContent = texts.tip;

3. z-index Abuse

The overlay’s z-index was initially set to 99999, with the toolbar at 100000. This brute-force approach easily conflicts with other components in complex pages. After optimization, we switched to semantic layering — the overlay uses z-index: 70, and the toolbar uses position: relative to naturally stack above overlay content without needing its own z-index.

4. Gradual Migration Strategy

After migrating to native printing, we didn’t delete the html2pdf.js code. Instead, we hid the button via comments:

// toolbar.appendChild(btnDownloadPdf); Download has bugs, button hidden for now
toolbar.appendChild(btnPrint);

The html2pdf.js download logic, configuration, and event handlers are all preserved — just the entry point is hidden. The benefit: if we ever need to provide “one-click download” (without requiring user interaction in the print dialog), we can quickly restore it.


Guiding Users Through Print Settings

The user experience of native browser printing depends on the print dialog settings. Clear UI guidance is essential:

  1. Destination: Select “Save as PDF”
  2. Background graphics: Must be checked — otherwise code block backgrounds, table zebra stripes, and blockquote backgrounds are all lost
  3. Headers and footers: Uncheck — remove the unnecessary URL and date at the top of the PDF
  4. Paper size: A4 (common internationally) or Letter (common in North America)
  5. Margins: Choose “Default” — CSS already defines padding: 20mm

Conclusion

Looking back at the entire technical evolution:

  • html2pdf.js solved the “client-side PDF generation” problem, but output quality (bitmap, truncation, unsearchable text) fell short
  • Native browser printing uses the browser’s own layout engine to produce vector PDFs with CSS-controlled pagination, dramatically improving output quality
  • The Overlay + IFrame architecture isolates print content from page UI, providing a WYSIWYG preview experience
  • CSS pagination rules are the core of the solution — page-break-inside: avoid, table-header-group, and page-break-after: avoid solve the three major pain points: code truncation, lost table headers, and orphaned headings

If you’re building a similar frontend PDF export feature, consider native browser printing first. It requires no third-party dependencies, produces vector output with selectable text and working links, and is unlikely to become outdated, because printing is a fundamental browser capability.

