<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://personalcompute.net/feed.xml" rel="self" type="application/atom+xml" /><link href="https://personalcompute.net/" rel="alternate" type="text/html" /><updated>2025-12-31T18:46:21+00:00</updated><id>https://personalcompute.net/feed.xml</id><title type="html">PersonalCompute.Net</title><entry><title type="html">Markdown tools - part 1 - Producing high-quality PDF files</title><link href="https://personalcompute.net/2025/12/22/pandoc-template.html" rel="alternate" type="text/html" title="Markdown tools - part 1 - Producing high-quality PDF files" /><published>2025-12-22T00:00:00+00:00</published><updated>2025-12-22T00:00:00+00:00</updated><id>https://personalcompute.net/2025/12/22/pandoc-template</id><content type="html" xml:base="https://personalcompute.net/2025/12/22/pandoc-template.html"><![CDATA[<p>This article shows how to convert Markdown documents into PDF files by using
Pandoc and its extensions. The Markdown document is first transformed into
LaTeX code, and that code is then rendered as a PDF. The plain Pandoc template is
customized to get a reusable set of settings.</p>

<h2 id="markdown">Markdown</h2>

<p>You might have encountered <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a> formatting
while writing a comment on a forum, or editing a link in a chat application. It’s a
formatting language specifying just the basic elements: where a paragraph should end,
which words should be in bold, which section of text represents a link, and so on. This
format was started in 2004 as an informal collection of formatting rules, and in
2016 two formal RFC documents were published (<a href="https://www.rfc-editor.org/rfc/rfc7763">RFC7763</a> and <a href="https://www.rfc-editor.org/rfc/rfc7764">RFC7764</a>).</p>

<p>Since Markdown files are essentially text files, all the tooling associated with
text files is available from the start (think of <code class="language-plaintext highlighter-rouge">grep</code>, <code class="language-plaintext highlighter-rouge">diff</code> or <code class="language-plaintext highlighter-rouge">git</code>). Entire
workflows dedicated to collaborating on text files become available as well,
without any vendor lock-in.</p>

<p>On top of that, there’s a large ecosystem of tools for processing Markdown files
(which are “pure content”) into other formats: PDF documents, presentations, websites,
and so on.</p>

<p>Because the format is so simple, it encourages even novice programmers to create,
share and adapt scripts and automations for whatever task they find repetitive,
in whatever language they feel comfortable using, without having to learn any
complex framework.</p>

<h2 id="pandoc">Pandoc</h2>

<p><a href="https://pandoc.org/">Pandoc</a> is a converter between various document formats. It
transforms documents written in a markup language (such as Markdown or MediaWiki) to 
other formats (such as LaTeX, HTML or docx). Using Pandoc we’ve already expanded the
range of things we can do with our Markdown files, and we can push it even further by
making formatting choices and customisations.</p>

<h2 id="latex-and-tex">LaTeX and TeX</h2>

<p><a href="https://en.wikipedia.org/wiki/TeX">LaTeX</a> is the gold standard when it comes to
academic publishing. It’s built on top of TeX, a typesetting system started in the late ’70s, which over
time grew in functionality and popularity in academic circles.</p>

<p>It also relies on plaintext documents, but its philosophy is the polar opposite
of Markdown’s: while Markdown tries to allow as few formatting options as possible in
order to focus on representing the content, TeX allows you to write content, make
formatting choices, and even embed code that controls how the document
should be formatted. To achieve consistent formatting, you are not limited to a few
formatting options, but are invited to write short replacement rules or fragments of code,
and reuse those as often as possible.</p>

<p>In fact, TeX is a Turing-complete system, meaning that any valid computer program
can be written as TeX code (and the program will run when the TeX document is converted
to PDF). Over time, users of the LaTeX system began curating a library of macros at <a href="https://ctan.org/">CTAN</a>.</p>

<p>The LaTeX syntax is not as convenient as Markdown, but it’s a tradeoff that’s worth making
in academia: think of how often academic papers contain specialized notation, such as
advanced math, electrical diagrams, chemistry or music notation - once you become
proficient using the macros needed in your field, it’s simpler than using a
general-purpose document editor.</p>

<h2 id="pandoc-ecosystem">Pandoc ecosystem</h2>

<p>Since Pandoc is a tool focused just on converting documents from one format to another,
producing good-looking documents is considered out of scope. Instead, it relies on
community members to build and share templates, which are “reasonable sets of formatting
choices”. One such template geared towards the LaTeX format is <a href="https://github.com/Wandmalfarbe/pandoc-latex-template">Eisvogel</a>, which in turn can be customized even further (see the <a href="https://github.com/Wandmalfarbe/pandoc-latex-template/blob/master/README.md"><code class="language-plaintext highlighter-rouge">README.md</code> file</a> and the <a href="https://github.com/Wandmalfarbe/pandoc-latex-template/tree/master/examples"><code class="language-plaintext highlighter-rouge">examples</code> directory</a>).</p>

<p>Pandoc also supports “filters”: small extensions that change the way the input is processed
in order to introduce new formatting elements that the Pandoc core does not handle. Using just
Markdown and a good LaTeX template covers about 90% of the use-cases, but sometimes
that’s not enough - sometimes we want to add a glossary page at the end of the document,
or write text-boxes with tips that are formatted differently from the rest of the
paragraphs. These things would be easy while working in pure LaTeX code, but Markdown is
too simple. We can cover these use-cases by adding the following filters:</p>

<ul>
  <li><a href="https://github.com/chdemko/pandoc-latex-environment"><code class="language-plaintext highlighter-rouge">pandoc-latex-environment</code></a> maps the <code class="language-plaintext highlighter-rouge">:::</code> syntax element of the Markdown format to a pair of <code class="language-plaintext highlighter-rouge">\begin{...} \end{...}</code> tags inside the LaTeX output.</li>
  <li><a href="https://github.com/tomncooper/pandoc-gls"><code class="language-plaintext highlighter-rouge">pandoc-gls</code></a>, adds a custom syntax element of <code class="language-plaintext highlighter-rouge">(+x)</code> to the Markdown format, which is mapped to <code class="language-plaintext highlighter-rouge">\gls{x}</code> inside the LaTeX output.</li>
</ul>
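
<p>To make the second mapping concrete, here is a minimal Python sketch of the text substitution it describes. This is only an illustration of the idea, not the code of <code class="language-plaintext highlighter-rouge">pandoc-gls</code> itself: the function name, the regular expression and the set of characters allowed in a term are assumptions made for this example, and a real filter works on Pandoc's parsed document rather than on raw text.</p>

```python
import re

# Hypothetical pattern: "(+term)" where the term is made of letters,
# digits, underscores or hyphens. The real filter may accept other forms.
GLS_PATTERN = re.compile(r"\(\+([A-Za-z0-9_-]+)\)")


def expand_glossary_refs(text):
    """Rewrite every (+term) marker as a LaTeX glossary command."""
    # A function is used as the replacement so that the backslash in the
    # output is not interpreted as a regex escape sequence.
    return GLS_PATTERN.sub(lambda m: "\\gls{" + m.group(1) + "}", text)


if __name__ == "__main__":
    print(expand_glossary_refs("The (+cdn) serves the files."))
```

<p>Text without a marker passes through unchanged, which is why the rest of a Markdown document is unaffected by such a mapping.</p>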

<h2 id="pdf-documents">PDF documents</h2>

<p><a href="https://en.wikipedia.org/wiki/PDF">The PDF format</a> is a standard file format (ISO 32000)
for documents that has achieved a large degree of adoption. It’s a complex binary format, which
encodes documents with high fidelity, “as printed”.</p>

<p>One restricted variant of this format, <a href="https://en.wikipedia.org/wiki/PDF/A">PDF/A</a>, is
considered suitable for archival and digital preservation.</p>

<h2 id="putting-it-all-together">Putting it all together</h2>

<p>Dealing with all this complexity and tweaking might sound daunting, but it’s just
a one-time setup activity. Once you get a template and a set of extensions that work
for your use-case, starting work on a new document is just a matter of copying the
customized template to a new folder. Consistent formatting and the use of
brand identity elements happen automatically, and you can focus on the content.</p>

<p>The template used by this website is present at <strong><a href="https://github.com/PersonalCompute-net/doc-template/">https://github.com/PersonalCompute-net/doc-template/</a></strong>
in two versions: with support for glossary (the “glossary-template” directory) and
without (the “basic-template” directory). Check out the <code class="language-plaintext highlighter-rouge">main.pdf</code> document inside the
chosen directory for the steps involved in setting it up. The <code class="language-plaintext highlighter-rouge">main.pdf</code> also serves as
a preview of how the final documents will look. Feel free to tweak it to match your
use-case.</p>
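
<p>As a rough illustration of what a build step could look like, the Python sketch below only assembles a Pandoc command line and prints it. The file names, the choice of <code class="language-plaintext highlighter-rouge">xelatex</code> as the PDF engine, and the idea that the Eisvogel template is installed under the name <code class="language-plaintext highlighter-rouge">eisvogel</code> are all assumptions made for this example; the <code class="language-plaintext highlighter-rouge">main.pdf</code> in the template repository remains the reference for the actual steps.</p>

```python
import shlex


def build_pandoc_command(source, output, template=None, filters=(), engine="xelatex"):
    """Return the argument list for one Markdown-to-PDF Pandoc run."""
    command = ["pandoc", source, "--output=" + output, "--pdf-engine=" + engine]
    if template is not None:
        # Assumes the template is already installed where Pandoc can find it.
        command.append("--template=" + template)
    for name in filters:
        # Each filter has to be installed separately by the user.
        command.append("--filter=" + name)
    return command


if __name__ == "__main__":
    command = build_pandoc_command(
        "main.md",
        "main.pdf",
        template="eisvogel",
        filters=["pandoc-latex-environment"],
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(command))
```

<p>Keeping the command in a small function like this is one way to reuse the same settings across documents; a shell script or a Makefile would serve equally well.</p>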

<p>This is a good example of open-source collaboration, where a good solution can be built
using standardized interfaces and formats - combining the ease of Markdown editing with
the quality of the TeX layout engine. There was no central plan or architecture, no single
set of assumptions on how the final documents should look, and no vendor lock-in to a
specific set of tools. The setup experience is not 100% streamlined, but this is because
you have the choice of picking the components, customizing them, or replacing them with
better alternatives, if those appear.</p>

<h2 id="a-note-on-computer-security">A note on computer security</h2>

<p>Let’s also have a look at how various formats and ecosystems deal with computer security, since “allowing any valid program” implies that running malware is possible.</p>

<ul>
  <li><strong>Markdown</strong> has a clear separation between the content (which can be shared without worries) and external tools (which have to be installed by the user).</li>
  <li><strong>Pandoc</strong> by itself does not distribute third-party extensions (templates and filters).
Those have to be installed separately, and it’s the user’s responsibility to decide
which tools are trustworthy enough to run.</li>
  <li><strong>LaTeX</strong> allows third-party code to be distributed and executed in two places:
in the library of macros contributed by the community (the <code class="language-plaintext highlighter-rouge">texlive-full</code> package,
where macros have to pass a basic review by the package maintainers) and in the documents
themselves. For documents received from collaborators, security rests on three things:
the collaborators themselves have to reach a minimal level of trust, the macros
are expected to be readable code (obfuscated macros should raise concerns), and the scope
of what a macro can do is mostly limited to reading local files and controlling the contents of the generated PDF.</li>
  <li><strong>Microsoft Office</strong> had a problem with <a href="https://en.wikipedia.org/wiki/Macro_virus">embedded macros</a> because it took none of these precautions. It did not require explicit user action to install a script (as is
the case with the Markdown or Pandoc ecosystems), rely on a central library of trusted
and reviewed extensions, or allow the code to be inspected before running it (as is
the case with LaTeX). Instead, macros were included directly in Word documents and could
be launched before the user had a chance to review them. To make matters worse, the
macros didn’t run inside a tight sandbox, so they could take control of email
accounts and network connections.</li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[This article shows how to convert Markdown documents into PDF files by using Pandoc and its extensions. The first step is to transform the Markdown document into LaTeX code, and then rendering that code in PDF. The plain Pandoc template is customized to get a reusable set of settings.]]></summary></entry><entry><title type="html">Blueprint for a high-scalability, low-cost blog</title><link href="https://personalcompute.net/2025/11/21/blueprint.html" rel="alternate" type="text/html" title="Blueprint for a high-scalability, low-cost blog" /><published>2025-11-21T00:00:00+00:00</published><updated>2025-11-21T00:00:00+00:00</updated><id>https://personalcompute.net/2025/11/21/blueprint</id><content type="html" xml:base="https://personalcompute.net/2025/11/21/blueprint.html"><![CDATA[<p>One problem self-hosted blogs face is that the size of their content should be kept under control, in order to remain within the limits of the virtual server. This means video content or large attachments (&gt;10GB per post) are out of reach for self-hosted blogs. For publishing video content, the mainstream option is a specialized, ad-supported platform, even if the quality of its service is constantly degrading.</p>

<p>This post tries to show an alternative path, a blueprint for delivering blog content, using low-cost infrastructure and interoperable tools.</p>

<h2 id="the-protocol-aspect-initial-file-release">The protocol aspect (Initial file release)</h2>

<p>By releasing files via BitTorrent (possibly with a pre-populated cluster of low-cost seedboxes), this project is creating a makeshift CDN. The useful thing is that if the readers themselves contribute bandwidth and storage, sharing large files becomes practical. The protocol itself has been in use for more than two decades, so it’s well understood.</p>

<p>The main drawback is that, due to the P2P nature of the BitTorrent protocol, it’s not privacy-protecting (every participant in the BitTorrent swarm can see the IP address of every other participant).</p>

<p>To keep things organized, this website has its own BitTorrent tracker: <a href="https://tracker.personalcompute.net/">https://tracker.personalcompute.net/</a>.</p>

<h2 id="the-legal-aspect-licensing--copyright">The legal aspect (Licensing &amp; copyright)</h2>

<p>Since the files are supposed to be propagated across the web, their licensing must be compatible with redistribution. This means that the content must fall into the open-source software, Creative Commons or public-domain categories of license.</p>

<p>As a general guideline, new content will be covered under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">CC BY-NC-SA</a> (🅭 🅯🄏🄎) license, but other permissive licenses such as <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA</a> (🅭 🅯🄎) or <a href="https://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a> (🅭 🅯⊜) can appear.</p>

<h2 id="the-cryptographic-aspect-copy-authentication">The cryptographic aspect (Copy authentication)</h2>

<p>For everyone who objects to using BitTorrent due to privacy issues, there is a path forward, included right from the design stage. By hashing the files inside the bundle and cryptographically signing the result, it’s possible to check whether the bundle originated from the blog. The OpenBSD community already has a tool for this purpose: <a href="https://www.openbsd.org/papers/bsdcan-signify.html"><code class="language-plaintext highlighter-rouge">signify</code></a> (which they’ve been using for a decade).</p>

<p>As of 2025, the cryptographic algorithms are SHA256 for hashing and Ed25519 for signatures (as the field of cryptography evolves, vulnerabilities are discovered and cracking hardware becomes more accessible, so new algorithms can be added later). Example of a signed-bundle file: <a href="https://cdn.openbsd.org/pub/OpenBSD/7.8/SHA256.sig">https://cdn.openbsd.org/pub/OpenBSD/7.8/SHA256.sig</a>.</p>

<p>The key management reflects the best-practices used by the OpenBSD maintainers:</p>

<ul>
  <li>The signing key is kept on an offline machine (no internet connection, only powered on when necessary).</li>
  <li>Periodic key rotation: every year a new signing key is generated, and the public key for the following year is published in advance (e.g. during 2025, the public keys for both 2025 and 2026 are published).</li>
  <li>The public keys are made available in an easy-to-check way (both as downloadable files and DNS TXT records) - <a href="https://personalcompute.net/keys">https://personalcompute.net/keys</a>.</li>
</ul>

<p>There are drawbacks to using the “signify” tool on mirrored file-bundles, but these are user-interface problems rather than algorithm problems, and could be fixed by replacing a few snippets of code.</p>

<ul>
  <li>If the mirror adds new files to the bundle, the “signify” tool will not raise a warning that the downloaded content contains unaccounted files (but has enough information to make this determination).</li>
  <li>If the mirror changes the content of a single file, the validation of the entire file-bundle fails (but the “signify” tool can be tweaked to indicate which files can or cannot be authenticated).</li>
</ul>
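
<p>The two checks described above can be sketched in a few lines of Python. This is not how <code class="language-plaintext highlighter-rouge">signify</code> works internally; it is a hypothetical helper that assumes the checksum list has already been authenticated against the blog's public key (that step is not shown), that the list uses BSD-style <code class="language-plaintext highlighter-rouge">SHA256 (name) = digest</code> lines, and that the bundle is a flat directory. The function names and the report layout are made up for this example.</p>

```python
import hashlib
import re
from pathlib import Path

# BSD-style checksum line: "SHA256 (file name) = 64 hex digits".
MANIFEST_LINE = re.compile(r"^SHA256 \((.+)\) = ([0-9a-f]{64})$")


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Map each file name listed in the manifest to its expected digest."""
    expected = {}
    for line in text.splitlines():
        match = MANIFEST_LINE.match(line.strip())
        if match:
            expected[match.group(1)] = match.group(2)
    return expected


def check_bundle(directory, manifest_text):
    """Classify every file as ok, altered, missing or unaccounted."""
    expected = parse_manifest(manifest_text)
    report = {"ok": [], "altered": [], "missing": [], "unaccounted": []}
    for name, digest in sorted(expected.items()):
        path = Path(directory) / name
        if not path.is_file():
            report["missing"].append(name)
        elif sha256_of(path) == digest:
            report["ok"].append(name)
        else:
            report["altered"].append(name)
    # Files present on disk but absent from the manifest cannot be authenticated.
    on_disk = sorted(p.name for p in Path(directory).iterdir() if p.is_file())
    report["unaccounted"] = [name for name in on_disk if name not in expected]
    return report
```

<p>Reporting per file means that one altered file does not hide the fact that the rest of the bundle is intact, and the “unaccounted” list makes files added by a mirror visible.</p>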

<h2 id="some-final-points">Some final points</h2>

<p>By combining the three aspects, we obtain the following additional benefits:</p>

<ul>
  <li>
    <p><strong>Privacy &amp; Interoperability:</strong> Since the files can be downloaded and redistributed without restrictions, one could mirror the files using any other tools (with better privacy characteristics): private trackers, trusted DC++ hubs, messenger file-transfers or university servers - anything works. And since all file-releases are cryptographically signed, the end-user can (and should) check if any of the files were altered.</p>
  </li>
  <li>
    <p><strong>Availability:</strong> even if the main server is offline, readers don’t depend on it to access the content - they just need a reference to the BitTorrent files and the public key for signature-validation. One problem with older blogs (from 10-15 years ago) is that once the server is shut down, their content is practically lost media, since the author retains the exclusive right to host that content - with a bit of luck you might find some posts on the Wayback Machine, but that’s not a certainty. With the content usable offline (and shared on a distributed infrastructure), it can still be accessed after the server is gone.</p>
  </li>
</ul>]]></content><author><name></name></author><summary type="html"><![CDATA[One problem self-hosted blogs face is that the size of their content should be kept under control, in order to remain within the limits of the virtual server. This means video content or large attachments (&gt;10GB per post) are out of reach for self-hosted blogs. For publishing video content, the mainstream option is a specialized, ad-supported platform, even if the quality of its service is constantly degrading.]]></summary></entry></feed>