werc/apps/wman botches some man page links

WARNING

I’m posting this to the 9front list because only spammers are subscribed to the werc list.

BACKGROUND

werc/apps/wman/app.rc prints man pages as HTML:

fn wman_page_gen {
    #troff -manhtml $1| troff2html -t 'Plan 9 from User Space'
    troff -N -m$wman_tmac $1 | wman_out_filter
}

The function wman_default_out_filter then transforms the resulting plain text into markdown. It escapes HTML, turns references such as fmt(8) into links, squeezes runs of blank lines, and expands tabs with col -x. The standard werc handlers then process that markdown. The result is minimal HTML, unlike the output of the commented-out original troff2html pipeline, which is hard-to-read HTML containing tables and other relatively complex structures.

Recently, someone pointed out that wman botches HREF links when troff -N wraps a line at the hyphen inside a page name such as venti-fmt(8):

; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
          were formatted with fmtarenas or fmtisect (see venti-
          <a href="../8/fmt">fmt(8)</a>). In particular, only the configuration needs to be
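
The cause can be seen in isolation. The link-making sed(1) command works one line at a time, so venti- and fmt(8) never reach its pattern together, and only fmt(8) gets linked. The POSIX sh sketch below reproduces this on UNIX. It makes three assumptions. link_refs is a hypothetical name. The regexp is a simplified copy of the one in wman_default_out_filter, with [0-9] standing in for the $wman_cat_list alternation. The -E flag is there because GNU and BSD sed default to basic regexps, while Plan 9 sed accepts this egrep-style syntax directly.

```shell
# Simplified, hypothetical copy of the link-making sed command from
# wman_default_out_filter: [0-9] replaces the $wman_cat_list alternation,
# and -E selects extended regexps (not needed on Plan 9).
link_refs() {
    sed -E 's!([-.a-zA-Z0-9]+)\(([0-9])\)!<a href="../\2/\1">&</a>!g'
}

# Feed it the two lines as troff -N wrapped them.
printf '%s\n' \
    '          were formatted with fmtarenas or fmtisect (see venti-' \
    '          fmt(8)). In particular, only the configuration needs to be' |
link_refs
```

The second line comes out as <a href="../8/fmt">fmt(8)</a>, the same wrong link seen in the hget output. By the time sed reads that line, it has already printed the one holding venti-.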

FIX

For now, I have applied a medium- to low-quality fix to the running system by inserting an ssam(1) line into the wman_default_out_filter function:

fn wman_default_out_filter {
    # col -x syntax is the same for UNIX and Plan 9.
    # ssam rejoins man page references that troff -N wrapped after a hyphen.
    escape_html \
    | ssam 'x/[a-z]+-\n[ ]+[a-z]+\([0-9]\)/s/\n[ ]+//g' \
    | sed 's!([\.\-a-zA-Z0-9]+)\(('^`{echo $wman_cat_list|tr ' ' '|'}^')\)!<a href="../\2/\1">&</a>!g' \
    | awk '/^$/ {if(n != 1) print; n=1; next} /./ {n=0; print}' \
    | col -x
}

Now we get:

; hget http://man.9front.org/8/venti | grep fmt | sed -n 2,3p
          were formatted with fmtarenas or fmtisect (see <a href="../8/venti-fmt">venti-fmt(8)</a>). In particular, only the configuration needs to be
                       fmtarenas.

WHINING

This sucks for a couple of reasons:

- ssam(1) creates a temporary file on disk.

- page formatting is now dicked up: every link we fix removes a newline, so the rejoined line runs past the right margin.

Plan 9 sed(1) and awk(1) do not recognize the \n shorthand for newline that sam(1) accepts.

It should be possible to address this with awk(1), but I’m out of time for today.
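
One possible awk(1) approach is sketched below. It is a rough sketch, untested on Plan 9. Instead of deleting the newline, it keeps one line of lookahead and moves the dangling prefix (venti-) down to the start of the next line, after that line's indentation. The line count stays the same, each affected line changes width only by the length of the moved prefix, and no temporary file is needed. join_hyphenated_refs is a hypothetical name. The wrapper is POSIX sh so it can be tried on UNIX. The awk program contains no single quotes, so it should paste into rc unchanged, assuming Plan 9 awk handles match(), RSTART, RLENGTH, and sub() as POSIX awk does. Like the ssam pattern, it only handles lowercase names.

```shell
# Hypothetical replacement for the ssam line in wman_default_out_filter.
# Holds one line back; if the held line ends in "word-" and the current
# line is indentation followed by "word(N)", the dangling "word-" moves
# down onto the current line, so no newline is removed.
join_hyphenated_refs() {
    awk '
    {
        if (NR > 1 && held ~ /[a-z]+-$/ && $0 ~ /^ +[a-z]+\([0-9]\)/) {
            match(held, /[a-z]+-$/)
            prefix = substr(held, RSTART)           # e.g. "venti-"
            held = substr(held, 1, RSTART - 1)
            sub(/ +$/, "", held)                    # drop the space left behind
            match($0, /^ */)                        # RLENGTH = indentation width
            $0 = substr($0, 1, RLENGTH) prefix substr($0, RLENGTH + 1)
        }
        if (NR > 1) print held                      # emit the previous line
        held = $0
    }
    END { if (NR > 0) print held }                  # flush the last line
    '
}

printf '%s\n' \
    '          were formatted with fmtarenas or fmtisect (see venti-' \
    '          fmt(8)). In particular, only the configuration needs to be' |
join_hyphenated_refs
```

In wman_default_out_filter, the quoted awk program would take the place of the ssam line, ahead of the sed command that builds the links. On the venti example, the first line ends in (see and the second begins with venti-fmt(8)), so the existing sed links the full name.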

Suggestions welcome.

sl