마크다운 렌더링하기

frontmatter

/posts에 mdx 형식으로 글을 작성한 후 URL이 들어오면 파일을 읽어 내용을 전달해 주고 있습니다. 지금은 글을 미리 읽어두지 않고 필요하면 처음 한 번만 읽어서 다시 재활용하도록 구현해 두었는데, 글 목록 페이지를 만드려면 글을 전부 미리 읽어놔야 해서 나중에 이 부분은 살짝 변경하려고 합니다. 어쩃든 mdx 파일의 내용을 전달받으면 메타데이터를 읽고 본문을 HTML 형태로 변환해야 합니다. 사실 이 부분은 전부 라이브러리가 있어 한두줄이면 처리할 수 있는데요. 테스트를 거친 결과 이 시리즈의 첫 글에서도 언급한 대로 성능이 굉장히 나빴기 때문에 직접 구현하려고 합니다. 따라서 본문을 렌더링하는데 사용한 라이브러리는 코드 하이라이팅, 수식 렌더링 밖에 없습니다.

mdx가 어떻게 생겼는지 부터 알아봅시다. mdx의 맨 위에는 frontmatter라고 부르는 글의 간략한 정보를 담는 내용이 옵니다. 지금 쓰고 있는 글은 이렇게 생겼습니다.

---
title: 마크다운 렌더링하기
description: mdx로 쓰인 글을 Next.js에서 렌더링합니다
cover:
tags: blog, next.js, mdx
date: 2024-10-20
author: yeahx4
series: make-blog
seriesIndex: 2
---

내용

---로 구분되어 :를 기준으로 있습니다. tags는 배열인 것도 주의해서 파싱해야 합니다.

export interface PostMeta {
  title: string;
  description: string;
  author: string;
  date: string;
  tags: string[];
  cover: string;
  series: string;
  seriesIndex: number;
}

export const parsePost = (
  content: string
): { meta: PostMeta; body: string[] } => {
  const lines = content.split("\n");
  const meta: PostMeta = {
    title: "",
    description: "",
    author: "",
    cover: "",
    date: "",
    tags: [],
    series: "",
    seriesIndex: -1,
  };

  let body: string[] = [];
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i];

    if (line === "---") {
      body = lines.slice(i + 1);
      break;
    }

    // TODO : "title: abc: def" 형태의 데이터도 처리할 수 있도록 수정
    const [key, value] = line.split(":").map((x) => x.trim());

    if (key === "tags") {
      meta.tags = value.split(",").map((x) => x.trim());
    } else if (key === "seriesIndex") {
      meta.seriesIndex = parseInt(value);
    } else if (key === "title") {
      meta.title = value;
    } else if (key === "description") {
      meta.description = value;
    } else if (key === "author") {
      meta.author = value;
    } else if (key === "cover") {
      meta.cover = value;
    } else if (key === "date") {
      meta.date = value;
    } else if (key === "series") {
      meta.series = value;
    } else {
      console.warn(`Unknown key: ${key}`);
    }
  }

  return { meta, body };
};

한 줄씩 읽어가며 파싱을 진행했습니다. 한 줄에 :이 여러개면 문제가 생길 수 있는 코드긴 한데, :가 들어갈 일이 많지는 않을 것 같아서 일단 이렇게 구현하고 나중에 수정하도록 하겠습니다. meta와 body로 본문을 나눴으니 body를 HTML로 변환하기만 하면 사실상 완성입니다.

2024-10-23 수정

value에 콜론이 들어가도 문제가 없도록 수정했습니다. [key, value] 부분을

const colon = line.indexOf(":");

if (colon === -1) {
  console.warn(`Invalid meta line: ${line}`);
  continue;
}

const key = line.slice(0, colon).trim();
const value = line.slice(colon + 1).trim();

이렇게 수정했습니다.

본문 렌더링

마크다운은 크게 2종류로 분류할 수 있습니다.

줄 전체가 마크다운인 경우 ex) # h1
평범한 텍스트 안에 마크다운이 들어가는 경우 ex) abc **efg** abc

그리고 이 둘이 혼합되는 경우도 있습니다. 예시로 바로 위에 있는 리스트도 한 줄이 전부 마크다운인데 코드가 들어가 있죠. 그래서 renderLine(line: string): ReactNode 함수를 하나 만들어 줄 단위로 렌더링을 진행하고 toHtml(content: string[]): ReactNode[]에서 전체를 렌더링하되 중간중간에 renderLine을 호출해서 내부를 채워주려 합니다. 꽤 처리해야할 케이스가 많으니 하나씩 해 보겠습니다.

스타일링은 tailwind가 적용되지 않기 때문에 본문에 .prose라는 클레스를 갖는 div를 감싸고 전용 css를 하나 만들어 .prose h1 이런 식으로 선택해서 스타일을 직접 적용하려 합니다. 스타일 관련 코드는 굳이 언급하지 않으니 Github 코드를 참고해 주세요.

toHtml

헤딩 (h1 ~ h6)

어떤 줄이 # 으로 시작하면 헤딩이 됩니다. 그냥 노가다입니다.

if (content[i].startsWith("######"))
  html.push(
    <h6 id={content[i].replace("######", "").trim()} key={i}>
      {content[i].replace("######", "").trim()}
    </h6>
  );
else if (content[i].startsWith("#####"))
  html.push(
    <h5 id={content[i].replace("#####", "").trim()} key={i}>
      {content[i].replace("#####", "").trim()}
    </h5>
  );
else if (content[i].startsWith("####"))
  html.push(
    <h4 id={content[i].replace("####", "").trim()} key={i}>
      {content[i].replace("####", "").trim()}
    </h4>
  );
else if (content[i].startsWith("###"))
  html.push(
    <h3 id={content[i].replace("###", "").trim()} key={i}>
      {content[i].replace("###", "").trim()}
    </h3>
  );
else if (content[i].startsWith("##"))
  html.push(
    <h2 id={content[i].replace("##", "").trim()} key={i}>
      {content[i].replace("##", "").trim()}
    </h2>
  );
else if (content[i].startsWith("#"))
  html.push(
    <h1 id={content[i].replace("#", "").trim()} key={i}>
      {content[i].replace("#", "").trim()}
    </h1>
  );

가로줄

frontmatter에 사용된 ---를 제외하고 ---가 오면 글을 나누는 가로줄을 넣습니다. frontmatter에 사용된 가로줄은 애초에 body에 담겨오지 않으니 생각하지 않아도 됩니다.

else if (content[i].startsWith("---")) html.push(<hr key={i} />);

quoteblock

이게 quoteblock 입니다.

quoteblock은 앞에 >를 붙여 여러 줄에 걸쳐서도 작성할 수 있습니다. 따라서 한 줄로만 끝나지 않고 >로 시작하는 연속된 줄 모두를 선택해야 합니다.

else if (content[i].startsWith(">")) {
  let quote = content[i].replace(">", "").trim();

  while (i + 1 < content.length && content[i + 1].startsWith(">")) {
    quote += " " + content[i + 1].replace(">", "").trim();
    i++;
  }

  html.push(<blockquote key={i}>{renderLine(quote)}</blockquote>);
}

unordered list

리스트는 꽤 난이도가 있습니다. 리스트는 중첩이 되기 때문인데요.

리스트 1
리스트 2
- 리스트 2.1
- 리스트 2.2
리스트 3

이런 식으로 중첩이 되기 때문에 재귀를 통해서 처리해야 합니다. 그리고 한 줄에 너무 긴 텍스트를 쓰고 싶을 경우 들여쓰기를 유지하면서 다음 줄에 쓰면 같은 줄로 인식되어야 합니다. 재귀로 구현해야 하기 때문에 함수를 하나 만들어 주었습니다.

else if (content[i].startsWith("-")) {
  const parsedList = parseUnorderedList(content, i);
  html.push(
    <ul key={i} className="list-container">
      {parsedList.list}
    </ul>
  );
  i = parsedList.endIndex;
}

parseUnorderedList는 { list, endIndex }를 반환합니다.

function parseUnorderedList(content: string[], i: number, level = 0) {
  const list = [];
  let j = i;

  while (
    content[j] &&
    content[j].trimStart().startsWith("-") &&
    content[j].indexOf("-") / 2 === level
  ) {
    let itemText = content[j].replace("-", "").trim();

    while (
      j + 1 < content.length &&
      content[j + 1].startsWith(" ") &&
      !content[j + 1].trimStart().startsWith("-")
    ) {
      itemText += " " + content[j + 1].trim();
      j++;
    }

    if (
      j + 1 < content.length &&
      content[j + 1] &&
      content[j + 1].trimStart().startsWith("-") &&
      content[j + 1].indexOf("-") / 2 === level + 1
    ) {
      const nestedList = parseUnorderedList(content, j + 1, level + 1);

      list.push(
        <li key={j}>
          {renderLine(itemText)}
          <ul>{nestedList.list}</ul>
        </li>
      );

      j = nestedList.endIndex;
    } else {
      list.push(<li key={j}>{renderLine(itemText)}</li>);
    }

    j++;
  }
  return { list, endIndex: j - 1 };
}

ordered list

ordered list는 unordered list랑 거의 비슷하지만 ul은 항상 -로 시작하는 반면 서로 다른 숫자로 시작하기 때문에 처리해야 할 부분이 조금 더 많습니다.

else if (content[i].startsWith("1.")) {
  const parsedList = parseOrderedList(content, i);
  html.push(
    <ol key={i} className="list-container">
      {parsedList.list}
    </ol>
  );
  i = parsedList.endIndex;
}

function parseOrderedList(content: string[], i: number, level = 0) {
  const list = [];
  let j = i;

  while (
    content[j] &&
    /^\d+\./.test(content[j].trimStart()) &&
    content[j].indexOf(content[j].trimStart().match(/^\d+\./)![0]) / 3 === level
  ) {
    let itemText = content[j].replace(/^\s*\d+\./, "").trim();

    while (
      j + 1 < content.length &&
      content[j + 1].startsWith(" ") &&
      !/^\d+\./.test(content[j + 1].trimStart())
    ) {
      itemText += " " + content[j + 1].trim();
      j++;
    }

    if (
      j + 1 < content.length &&
      content[j + 1].trimStart() &&
      (content[j + 1].trimStart().startsWith("-") ||
        /^\d+\./.test(content[j + 1].trimStart())) &&
      content[j + 1].indexOf(
        content[j + 1].trimStart().match(/^\d+\./)
          ? content[j + 1].trimStart().match(/^\d+\./)![0]
          : "-"
      ) /
        3 ===
        level + 1
    ) {
      const nestedList = /^\d+\./.test(content[j + 1].trimStart())
        ? parseOrderedList(content, j + 1, level + 1)
        : parseUnorderedList(content, j + 1, level + 1);

      list.push(
        <li key={j}>
          {renderLine(itemText)}
          <ol>{nestedList.list}</ol>
        </li>
      );

      j = nestedList.endIndex;
    } else {
      list.push(<li key={j}>{renderLine(itemText)}</li>);
    }

    j++;
  }
  return { list, endIndex: j - 1 };
}

table

표는

| 제목1 | 제목2 | 제목3 |
| ----- | ----- | ----- |
| 내용1 | 내용2 | 내용3 |
| 내용4 | 내용5 | 내용6 |

구조를 가지고 있습니다. | --- | 위를 th로 넣고 아래를 tb로 넣어줘야 합니다.

const TABLE_ROW_REGEX = /^\|(.+)\|$/;

else if (TABLE_ROW_REGEX.test(content[i])) {
  const tableRows = [];
  while (i < content.length && TABLE_ROW_REGEX.test(content[i])) {
    tableRows.push(content[i]);
    i++;
  }
  i--;

  const table = renderTable(tableRows.join("\n"));
  html.push(<div key={i}>{table}</div>);
}

코드가 너무 길어져서 실제로 table로 바꾸는 코드는 함수로 뺐습니다. 테이블을 쭉 긁어모아서 string으로 전달합니다.

const renderTable = (markdown: string): ReactNode => {
  const lines = markdown.split("\n");
  const table: ReactNode[] = [];
  let header: ReactNode[] = [];
  const rows: ReactNode[][] = [];
  let isHeader = true;

  lines.forEach((line, index) => {
    if (line.trim().startsWith("|")) {
      const cells = line
        .split("|")
        .slice(1, -1)
        .map((cell) => cell.trim());
      if (isHeader) {
        header = cells.map((cell, i) => (
          <th key={`header-${i}`}>{renderLine(cell)}</th>
        ));
        isHeader = false;
      } else {
        rows.push(
          cells.map((cell, i) => (
            <td key={`cell-${index}-${i}`}>{renderLine(cell)}</td>
          ))
        );
      }
    }
  });

  table.push(
    <table key="table">
      <thead>
        <tr>{header}</tr>
      </thead>
      <tbody>
        {rows.slice(1).map((row, i) => (
          <tr key={`row-${i}`}>{row}</tr>
        ))}
      </tbody>
    </table>
  );

  return table;
};

이미지

이미지는 정규표현식을 사용해 alt, href를 분리해 주었습니다.

const IMAGE_REGEX = /!\[([^\]]+)\]\(([^)]+)\)/;

else if (IMAGE_REGEX.test(content[i])) {
  const match = content[i].match(IMAGE_REGEX);
  const alt = match![1];
  const href = match![2];

  html.push(<img key={i} src={href} alt={alt} />);
}

renderLine

이제 줄마다 있는 요소들을 렌더링해 주어야 합니다. 지원할 문법은 이텔릭체, 볼드체, 코드, 링크, 취소줄, 밑줄 그리고 수식이 있는데 수식은 별도로 처리해야 하니 아래에서 따로 얘기하겠습니다. 링크를 제외하고 (패턴)내용(패턴) 형식을 가지고 있습니다. 정규표현식으로 패턴에서 내용을 추출해 처리할 수 있습니다.

export const renderLine = (line: string): ReactNode => {
  const italicRegex = /_([^_]+)_/g;
  const boldRegex = /\*\*([^\*]+)\*\*/g;
  const codeRegex = /`(.*?)`/g;
  const linkRegex = /\[([^\]]+)\]\(([^)]+)\)/g;
  const strikethroughRegex = /~([^~]+)~/g;
  const underlineRegex = /__([^_]+)__/g;
  const mathRegex = /\$([^\$]+)\$/g;

  const processMarkdown = (text: string): ReactNode => {
    let lastIndex = 0;
    const elements: ReactNode[] = [];

    const patterns = [
      {
        regex: boldRegex,
        render: (key: string, match: string) => (
          <strong key={key}>{match}</strong>
        ),
      },
      {
        regex: italicRegex,
        render: (key: string, match: string) => <em key={key}>{match}</em>,
      },
      {
        regex: codeRegex,
        render: (key: string, match: string) => (
          <code className="inline-code" key={key}>
            {match}
          </code>
        ),
      },
      {
        regex: linkRegex,
        render: (key: string, match: string, href?: string) => (
          <a key={key} href={href}>
            {match}
          </a>
        ),
      },
      {
        regex: strikethroughRegex,
        render: (key: string, match: string) => <del key={key}>{match}</del>,
      },
      {
        regex: underlineRegex,
        render: (key: string, match: string) => <u key={key}>{match}</u>,
      },
      {
        regex: mathRegex,
        render: (key: string, match: string) => (
          <Equation key={key} eq={match} inline />
        ),
      },
    ];

    while (lastIndex < text.length) {
      let closestMatch = null;
      let closestPattern = null;
      let closestIndex = text.length;

      for (const { regex, render } of patterns) {
        regex.lastIndex = lastIndex;
        const match = regex.exec(text);
        if (match && match.index < closestIndex) {
          closestMatch = match;
          closestPattern = { regex, render };
          closestIndex = match.index;
        }
      }

      if (closestMatch && closestPattern) {
        if (closestIndex > lastIndex) {
          elements.push(text.slice(lastIndex, closestIndex));
        }

        const key = `inline-${closestIndex}`;
        if (closestPattern.regex === linkRegex) {
          elements.push(
            closestPattern.render(key, closestMatch[1], closestMatch[2])
          );
        } else {
          elements.push(closestPattern.render(key, closestMatch[1]));
        }

        lastIndex = closestPattern.regex.lastIndex;
      } else {
        elements.push(text.slice(lastIndex));
        break;
      }
    }

    return elements;
  };

  return <>{processMarkdown(line)}</>;
};

코드블록

일단 위에서 하던 것과 같은 방식으로 코드블록을 파싱합니다.

else if (content[i].startsWith("```")) {
  const lang = content[i].replace("```", "").trim();

  const code = [];
  let j = i + 1;
  while (j < content.length && !content[j].startsWith("```")) {
    code.push(content[j]);
    j++;
  }

  if (lang === "math") {
    html.push(<Equation key={i} eq={code.join("\n")} />);
  } else {
    html.push(<CodeBlock lang={lang} content={code.join("\n")} key={i} />);
  }

  i = j;
}

코드블록은 진짜 코드가 들어가는 경우와 언어를 math로 주고 LaTeX 코드가 들어가는 경우가 있습니다. 코드인 경우 위에서처럼 하이라이팅을 적용해야 하고 수식은 이미지로 변환해서 보여줘야 합니다.

코드 하이라이팅

코드 하이라이팅은 react-syntax-highlighter를 통해 지원하겠습니다.

$ npm install react-syntax-highlighter

@types/react-syntax-highlighter도 같이 받아주세요.

코드블록은 제 통제를 벗어나는 영역이기 때문에 다크모드에 민감하게 반응하기 어렵습니다. 그래서 클라이언트 컴포넌트로 만들어 다크모드인지 체크하고 다크모드라면 테마를 바꾸는 방향으로 구현했습니다. MutationObserver를 사용하면 DOM에 생기는 변화를 이벤트로 받아올 수 있습니다.

"use client";

import { Prism as SyntaxHighlighter } from "react-syntax-highlighter";
import { prism as lightTheme } from "react-syntax-highlighter/dist/esm/styles/prism";
import { atomDark as darkTheme } from "react-syntax-highlighter/dist/cjs/styles/prism";
import { useEffect, useState } from "react";

export default function CodeBlock({
  content,
  lang,
}: {
  content: string;
  lang: string;
}) {
  const [isDarkMode, setIsDarkMode] = useState(false);

  useEffect(() => {
    const checkDarkMode = () => {
      setIsDarkMode(document.documentElement.classList.contains("dark"));
    };

    const observer = new MutationObserver(() => checkDarkMode());
    observer.observe(document.documentElement, {
      attributes: true,
      attributeFilter: ["class"],
    });

    checkDarkMode();

    return () => observer.disconnect();
  }, []);

  return (
    <SyntaxHighlighter
      language={lang}
      style={isDarkMode ? darkTheme : lightTheme}
    >
      {content}
    </SyntaxHighlighter>
  );
}

수식 렌더링

수식은 MathJax를 통해 렌더링 하고자 합니다. 위에서 간단하게 언급했지만 수식은 코드블록에 math를 넣고 만드는 큰 수식과 글 중간중간에 사용되는 inline 수식이 있습니다. 예를 들면 \( E = mc^2 \) 같은게 인라인 수식입니다. MathJax는 클라이언트 컴포넌트에서만 사용할 수 있습니다. 그래서 초반엔 LaTeX가 그대로 보이다가 수식으로 변경되는 모습을 볼 수 있습니다.

global.css 등에서 모든 요소에 font-family 옵션이 있다면 MathJax는 제외해야 합니다. Math Renderer를 SVG로 바꾸면 문제가 없지만, 기본값은 CHTML이고 기본값을 직접 수정할 수 없습니다. 폰트가 지정되어 있으면 수식이 깨져 보이거나 일반 글씨처럼 보입니다.

"use client";

import { cn } from "@/lib/utils/cn";
import { MathJax, MathJaxContext } from "better-react-mathjax";

const mathJaxConfig = {
  loader: { load: ["input/tex", "output/svg"] },
  tex: {
    inlineMath: [
      ["$", "$"],
      ["\\(", "\\)"],
    ],
    displayMath: [
      ["$$", "$$"],
      ["\\[", "\\]"],
    ],
    processEscapes: true,
  },
};

export default function Equation({
  eq,
  inline = false,
}: {
  eq: string;
  inline?: boolean;
}) {
  return (
    <MathJaxContext config={mathJaxConfig}>
      {inline ? (
        <span
          className={cn(
            "inline-flex items-center mx-1 w-fit h-fit",
            "inline-math align-middle"
          )}
        >
          <MathJax>{`\\( ${eq} \\)`}</MathJax>
        </span>
      ) : (
        <div className="flex justify-center">
          <div
            className={cn(
              "math-container overflow-x-auto w-fit",
              "px-4 h-fit overflow-y-hidden"
            )}
          >
            <MathJax>{`\\[ ${eq} \\]`}</MathJax>
          </div>
        </div>
      )}
    </MathJaxContext>
  );
}

코드 블록에 math 언어를 주고 아래와 같은 LaTeX 코드를 작성하면 이런 결과를 얻을 수 있습니다. 참고로, 이 글에선 생략했지만 수식이 가로로 너무 길어질 경우 (특히 모바일 환경에서) 화면을 뚫고 나가버립니다. 수식을 담는 container에 overflow 옵션을 적절히 주어 화면을 나가지 않도록 처리해야 합니다.

\oint_C L\,dx + M\,dy =
\iint_D \left(
  \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}
\right) \, dx\,dy

\[ \oint_C L\,dx + M\,dy = \iint_D \left( \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y} \right) \, dx\,dy \]

참고로 LaTeX를 작성하기 어렵다면 (제가 만든 ㅎㅎ) Online LaTeX Editor를 통해 도움을 받을 수 있습니다. 수식을 png, svg로도 다운받을 수 있으니 많이 사랑해주세요.

마치며

이렇게 이틀에 걸쳐 작성된 코드의 기록을 마쳤습니다. 현재 parser.ts 코드만 450줄 정도 되네요. 전체 소스 코드는 마찬가지로 Github에서 보실 수 있습니다. 꽤 구현 난이도가 높았는데, 많은 공부가 된 것 같습니다. 평소 자주 써보지 못했던 정규 표현식을 질릴 정도로 써볼 수 있었고 중첩된 리스트 처리를 위해 재귀까지 썼습니다. BOJ 다이아까지 달리며 확실히 구현 실력이 좀 늘었다는게 느껴져서 보람차네요. 굳이 저처럼 마크다운 렌더링을 직접 구현할 필요는 없습니다. 꽤 많은 버그와 놓친 부분 때문에 코드를 계속 수정하고 있는데 편하게 라이브러리로 한두줄이면 끝낼 수 있습니다. 저는 커스텀 컴포넌트를 넣는다거나, 나중에 원하는 기능을 추가하기 위해, 그리고 성능을 위해 직접 구현했습니다. 본인에게 맞는 방법을 선택하면 될 것 같네요. 긴 글 봐주셔서 감사합니다. 질문, 피드백 등 댓글은 언제나 환영입니다.