Generic and entry-related utility definitions for the Org-Roam-Blog Engine Package.
§ Roam-related
Getter for a node :PROPERTY:
(defsubst org-roam-blog--get-node-property (node property) "Get PROPERTY from Roam NODE; 'assocdr'-style shortcut." (declare (type org-roam-node node) (type string property)) (--> node (org-roam-node-properties it) (assoc property it) (cdr it)))
§ FIXME An expelled org-roam-node-find-noselect
finder
It was removed from the org-roam core module. I temporarily put it back.
Understand the reason:
Refactor into nicer usage.
(defun org-roam-node-find-noselect (node &optional force) "Navigate to the point for NODE, and return the buffer. If NODE is already visited, this won't automatically move the point to the beginning of the NODE, unless FORCE is non-nil." (unless (org-roam-node-file node) (user-error "Node does not have corresponding file")) (let ((buf (find-file-noselect (org-roam-node-file node)))) (with-current-buffer buf (when (or force (not (equal (org-roam-node-id node) (org-roam-id-at-point)))) (goto-char (org-roam-node-point node)))) buf))
§ Filename utils
(defun org-roam-blog--current-dir () "Get the absolute path of the current directory." (expand-file-name (if load-file-name (file-name-directory load-file-name) default-directory)))
(defsubst org-roam-blog--substring-from-right (s n) "Retusrns substring of length N from the end of string S." (let* ((end (length s)) (beg (- end n))) (substring s beg end)))
(defsubst fname-with-tslash (fname) "Make sure that FNAME comes with trailing slash." (let ((fname (string-trim fname))) (if (string-equal "/" (org-roam-blog--substring-from-right fname 1)) fname (format "%s/" fname))))
(defsubst fname-no-tslash (fname) "Make sure that FNAME comes with trailing slash." (let ((fname (string-trim fname))) (if (string-equal "/" (org-roam-blog--substring-from-right fname 1)) (substring fname 0 -1) fname)))
§ Calendar date list construction
Handy utility defs used by ORB (and general useful for one's custom library).
(defsubst org-roam-blog-to-calendar-date (s) "Constructs a valid date list for Calendar from Org's datestring S." (let ((dt (org-parse-time-string s))) (list (nth 4 dt) ; month (nth 3 dt) ; day (nth 5 dt)))) ; year
§ Slug Builder
A slug-builder made using unidecode
package.
Set as default for ORB's index slug-fn
.
Looks like the easiest way to have e.g. Cyrillic -> ASCII string translations saved.
(defsubst org-roam-blog-slugify (string) "Default slug builder. Sanitizes the STRING using `unidecode' package, removes extra hyphens, coerces result to lowercase." (downcase (seq-reduce (lambda (accum item) (replace-regexp-in-string (car item) (cdr item) accum)) '(("--+" . "-") ("^-" . "") ("-$" . "")) (unidecode-sanitize string))))
§ Entry content to HTML
This is a very raw section.
At the moment, I intend to use external org-to-html translator like orgize
,
since it should be faster (?) and more manageable (??) than the built-in org
methods.
A thin balance between tweaking such engine and/or modifying the entry buffer before translation is yet to be found.
§ Interface to htmlizer - external dynamic module
This is a test environment. The top package file loads an external Rust pre-built module like that I am to shipping with the engine.
Then I define the following alias for a htmlizer function extracted from the module:
(defsubst org-roam-blog--org-to-html (s) (funcall #'my-org-dynmod/org-to-html s))
§ HTMLizer function node property
But that one above the node itself can swap to another htmlizer function
to which it can refer thrugh org-roam-blog-html-fn-property
(defaults to "ORB_HTML_FN"). The following subst returns this function
symbol if set:
(defsubst org-roam-blog--get-node-html-fn (node) "When Roam NODE's html-fn-property is set, return its symbol interned." (declare (type org-roam-node node)) (--> node (org-roam-blog--get-node-property it org-roam-blog-html-fn-property) (when it (intern it))))
§ Backlinks passing for node's context
Shortcut to get the lead-index
for a node in question,
from the current site's state that is lexically bound within -orb--entry-registry
,
during the process of a site staging (within org-roam-blog-stage-site
).
I admit that it's a hack. Originating from lack of planning beforehand. Seems legit, but at the same time a bit clumsy. I might later think about how to communicate the state of a website instance more gracefully (without massive OOP that I'd better avoid). At the moment it's not that important, since it's unlikely that I process more than one site instance at a time, and possible asynchronous processing of indexes within that instance shouldn't be blocked.
But I know it's quite dumb. And yep, hail lexical binding for making this workaround available...
(defsubst org-roam-node--lead-index-for (node-id) (ht-get (org-roam-blog-g-entries-get) node-id))
In the end of the backlinks context processor I remove-duplicates
for
resulting list of hash tables. This is to watch for when anchors are added:
listing same material/section more than once is not good while referencing
different sections on the same page is Okay.
(defsubst org-roam-blog--backlinks-to-context (node) "Extract backlinks for a context of the NODE." (let* ((backlinks (org-roam-backlinks-get node)) (backnodes (mapcar #'org-roam-backlink-source-node backlinks))) (-> (cl-loop for bn in backnodes for lead-index = (org-roam-node--lead-index-for (org-roam-node-id bn)) when lead-index ; process those back-linked entries that are on this site as well collect (ht ("title" (org-roam-node-title bn)) ("link" (org-roam-blog--relative-entry-url bn lead-index)))) (remove-duplicates :test #'ht-equal?))))
§ Contents for node's context
Anchors on page of selected level are collected as Table of Contents.
(defun org-roam-blog--toc-to-context (node req-level) (let* ((lead-index (org-roam-node--lead-index-for (org-roam-node-id node))) (urlprefix (org-roam-blog--relative-entry-url node lead-index)) (headlines (with-temp-buffer (insert (org-roam-blog--prepr-filter-noexport (org-roam-blog--get-node-content node))) (org-element-map (org-element-parse-buffer) 'headline (lambda (headline) (let ((title (org-element-property :raw-value headline)) (level (org-element-property :level headline)) (begin (org-element-property :begin headline))) (when (and (string-match org-roam-blog-sure-headline-regex (progn (goto-char begin) (thing-at-point 'line t))) (not (string-match org-roam-blog-anchor-regex title))) (cons title level)))))))) (->> (cl-loop for hl in headlines for i from 1 collect (cons hl i)) (remove-if-not (lambda (hl) (>= req-level (cdar hl)))) (mapcar (lambda (hl) (ht ("title" (caar hl)) ("link" (format "%s#%s-%s" urlprefix (org-roam-blog-slugify (caar hl)) (cdr hl)))))))))
§ Node content text preprocessing
Preprocessing of a node's content text in order to:
filter
:noexport:
subtreesget links to other registered nodes
collect media files and update their urls
inject headline anchor links
etc
§ Filter :noexport:
The idiom of iteratively modifying text buffer with node content used in following processors proved to be reliable and rather fast.
However, benchmarks show this version of altering Org buffer through
#'re-search-forward
and #'org-cut-subtree
to work faster than my
idioms with #'org-element-map
and #'buffer-substring
placing. And
still it is the slowest middleware, though also quite important one.
That actually refers to a discussion on quick emacs buffer slicing. At the moment I do not know if the solution is found.
(defun org-roam-blog--prepr-filter-noexport (text) (cl-labels ((-filter-noexport () (let ((p (point))) (setf p (re-search-forward "^*" nil t)) (when p (goto-char p) (when (member "noexport" (org-get-tags)) (org-cut-subtree))) p))) (with-temp-buffer (insert text) (goto-char (point-min)) (let ((scan (point))) (while (not (null scan)) (setf scan (-filter-noexport)))) (buffer-string))))
§ Get links to registered nodes
The weird emacs point
arithmetics somehow makes it swallow a
trailing space after id:
links. I really dunno why. The workaround
is the keep-trailing-space
labeled lambda that returns the space-in
string back, or an empty string otherwise:
(defun org-roam-blog--prepr-replace-node-links (text) (cl-labels ((keep-trailing-space (point) (if (char-equal 32 (char-before point)) " " "")) (link-replace () (if-let ((id-link (org-element-map (org-element-parse-buffer) 'link (lambda (link) (when (string= (org-element-property :type link) "id") (list (org-element-property :path link) (org-element-property :begin link) (org-element-property :end link) (org-element-property :contents-begin link) (org-element-property :contents-end link)))) nil t))) (if-let ((index (org-roam-node--lead-index-for (first id-link)))) (setf (buffer-substring (second id-link) (third id-link)) (format "[[%s][%s]]%s" (org-roam-blog--relative-entry-url (org-roam-node-from-id (first id-link)) index) (buffer-substring (fourth id-link) (fifth id-link)) (keep-trailing-space (third id-link)))) (setf (buffer-substring (second id-link) (third id-link)) (format "[[%s][%s]]" (first id-link) (buffer-substring (fourth id-link) (fifth id-link)))))))) (with-temp-buffer (insert text) (let ((links-to-go t)) (while links-to-go (setf links-to-go (link-replace)))) (buffer-string))))
§ Process media links
I may further look at #'org-export-read-attribute
to read and process
image captions and other optional attrs that are used for images in Org.
(defun org-roam-blog--prepr-media-links (text node) (let* ((index (org-roam-node--lead-index-for (org-roam-node-id node))) (destdir (org-roam-blog-site-scratch-dir (org-roam-blog-g-site-get))) (destdir (f-expand (org-roam-blog-index-slug index) destdir)) (destdir (f-expand (org-roam-blog-index-media-dir index) destdir)) (urlprefix (concat "/" (org-roam-blog-index-slug index) "/" (org-roam-blog-index-media-dir index)))) (unless (f-exists-p destdir) ;; FIXME: some setups have outdated f.el: ;; (f-mkdir-full-path destdir) (shell-command (format "mkdir -p %s" destdir))) (cl-labels ((link-replace () (if-let ((file-link (org-element-map (org-element-parse-buffer) 'link (lambda (link) (when (string= (org-element-property :type link) "file") (list (org-element-property :path link) (org-element-property :begin link) (org-element-property :end link)))) nil t))) (cl-destructuring-bind (path begin end) file-link (let ((path-orig (f-expand path (f-dirname (org-roam-node-file node)))) (path-dest (f-expand (f-filename path) destdir)) (urlpath (concat urlprefix "/" (f-filename path)))) (when (f-exists? path-orig) (unless (f-exists? path-dest) (f-copy path-orig path-dest)) (if (member (downcase (f-ext path)) org-roam-blog-media-image-extensions) (setf (buffer-substring begin end) (format org-roam-blog-media-image-inline-format-string urlpath)) (setf (buffer-substring begin end) (format "[[%s][%s]]" urlpath (f-filename path)))))))))) (with-temp-buffer (insert text) (let ((links-to-go t)) (while links-to-go (setf links-to-go (link-replace)))) (buffer-string)))))
§ Prepend anchor links for headlines
Also quite tediously-composed preprocessor, since the intermediate Org-buffer is being modified in place.
(defun org-roam-blog--prepr-prepend-anchor-links (text node) (let ((idx 0)) (cl-labels ((to-id (title idx) (format org-roam-blog-anchor-id-format (org-roam-blog-slugify title) idx)) (to-anchor (title idx) (format org-roam-blog-anchor-format (org-roam-blog-slugify title) idx)) (prepend-anchor-link () (if-let ((headline (org-element-map (org-element-parse-buffer) 'headline (lambda (headline) (let ((title (org-element-property :raw-value headline)) (level (org-element-property :level headline)) (begin (org-element-property :begin headline))) (when (and (string-match org-roam-blog-sure-headline-regex (progn (goto-char begin) (thing-at-point 'line t))) (not (string-match org-roam-blog-anchor-regex title))) (list title level begin)))) nil t))) (cl-destructuring-bind (title level begin) headline (cl-incf idx) (let* ((anchor (to-anchor title idx)) (id (to-id title idx))) (setf (buffer-substring (+ begin level) (+ begin level)) anchor) (setf (buffer-substring begin begin) (format "#+begin_export html\n<a id=\"%s\"></a>\n#+end_export\n" id))))))) (with-temp-buffer (insert text) (let ((headlines-to-go t)) (while headlines-to-go (setf headlines-to-go (prepend-anchor-link)))) (buffer-string)))))
§ Compose the above
FIXME: move
node
to global context object.TODO: add possibility to attach additional "middleware".
(defun org-roam-blog--preprocess-node-content (text node) (-> text (org-roam-blog--prepr-filter-noexport) (org-roam-blog--prepr-media-links node) (org-roam-blog--prepr-replace-node-links) (org-roam-blog--prepr-prepend-anchor-links node)))
§ Entry content to HTML
First define a function that finds the start of a node's content
in both cases (for headline nodes and file nodes), based on a
customizable content-start-regexp
. The definition is similar to
Org's outline-end-of-heading
.
(defun org-roam-blog--content-start () (if (re-search-forward org-roam-blog-outline-content-start-regexp nil 'move) (forward-char -1)))
With that, in those two cases I can obtain content of a node:
(defsubst org-roam-blog--get-node-content (node) "Get textual content of a NODE, for either headline or a file NODE." (with-current-buffer (org-roam-node-find-noselect node t) (let* ((beg (condition-case nil (progn (outline-back-to-heading) (point)) (error 1))) (end (if (= beg 1) (point-max) (progn (outline-end-of-subtree) (point)))) (beg (progn (goto-char beg) (org-roam-blog--content-start) (point)))) (buffer-substring-no-properties beg end))))
And the following definition passes thus obtained content to HTMLizer.
(defsubst org-roam-blog--htmlize-node-content (node) "Get a node content and HTMLize it with preferred engine." (let ((htmlizer (or (org-roam-blog--get-node-html-fn node) #'org-roam-blog--org-to-html))) (if-let ((raw-content-filename (org-roam-blog--get-node-property node org-roam-blog-html-src-property))) (with-temp-buffer (insert-file-contents (f-expand raw-content-filename (f-dirname (org-roam-node-file node)))) (buffer-string)) (funcall htmlizer (org-roam-blog--preprocess-node-content (org-roam-blog--get-node-content node) node)))))
§ Aux and temporary utilities for entries
Ad-hocs that should be reworked.
Drop the main
tag, coming out from orgize
:
(defsubst org-roam-blog--drop-main-tag (s) "Remove <main> tag from orgize-produced html." (seq-subseq s 6 (- (length s) 7)))
(org-roam-blog--drop-main-tag (org-roam-blog--org-to-html "** Section Test"))
: <h2>Section Test</h2>