Strip HTML/Markdown before generating post slugs?

Slugs get a bit weird readability-wise because the system doesn’t strip out the HTML or Markdown first. Slugs should be generated from the plaintext.

ETA: For example, a URL like

/andy-baio-reminds-us-twitter-com-waxpancake-status-1165089610935848961-s-20

is… kind of no.

ETA: FWIW, I might not even care anymore about customizing slugs if the slug creation process stripped the HTML/Markdown first. (Although my OCD would probably then also want it to ignore all punctuation including periods, so URLs are always a consistent length instead of sometimes being four words because a post starts with a four-word sentence.)

Thanks for the input, @bix. Could you share the original post / raw text that generated that slug? That’ll allow us to test it out and potentially try out a fix.

I’ll dig it up, but the specifics are sort of beside the point, I think?

It’s mostly a question of platform preference: either generate the slug from the raw post, including characters from HTML or hyperlinks, or strip HTML, Markdown, and punctuation and generate the slug from the punctuation-free plaintext.

The original post is here, on the site I’m moving away from:

https://write.house/bix/andy-baio-reminds-us-that-today-is-bloggers-twentieth-birthday

Here’s the text of the first paragraph as input in the editor.

Andy Baio [reminds us](https://twitter.com/waxpancake/status/1165089610935848961?s=20) that today is [Blogger's](https://blogger.com/) twentieth birthday, per the [announcement by Ev](https://web.archive.org/web/20040625005017/https://www.evhead.com/1999/08/we-just-launched-cool-new-tool-at-pyra.asp) (via the Wayback Machine because Ev's security certificate expired last year) on August 23, 1999, at 3:45pm.

The slug generated by WriteFreely was, as posted above:

/andy-baio-reminds-us-twitter-com-waxpancake-status-1165089610935848961-s-20

including the URL from the hyperlink, etc.
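
To make the "strip first" option concrete, here is a rough Python sketch. It is not WriteFreely's code (WriteFreely is written in Go); the function names `strip_markup` and `slugify` and the `max_words` cutoff are made up for illustration, and the regexes only cover simple inline links and tags, not the full Markdown or HTML grammar.

```python
import re

def strip_markup(text: str) -> str:
    """Reduce simple Markdown links/images and HTML tags to their visible text."""
    # [label](url) or ![alt](url) -> label / alt
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # <tag ...> and </tag> -> nothing
    text = re.sub(r"<[^>]+>", "", text)
    return text

def slugify(text: str, max_words: int = 8) -> str:
    """Build a slug from the first max_words words of the plaintext.

    max_words is an assumed cutoff, not a WriteFreely setting.
    """
    plain = strip_markup(text).lower()
    # Drop apostrophes so "blogger's" becomes "bloggers", not "blogger-s"
    plain = re.sub(r"['\u2019]", "", plain)
    # Everything that is not a letter or digit acts as a word separator,
    # so periods and other punctuation never end up in the slug
    words = re.findall(r"[a-z0-9]+", plain)
    return "-".join(words[:max_words])

paragraph = (
    "Andy Baio [reminds us](https://twitter.com/waxpancake/status/"
    "1165089610935848961?s=20) that today is [Blogger's](https://blogger.com/) "
    "twentieth birthday, per the announcement by Ev."
)
print(slugify(paragraph, max_words=10))
# andy-baio-reminds-us-that-today-is-bloggers-twentieth-birthday
```

With the hyperlinks reduced to their link text, the URL fragments (`twitter-com-waxpancake-status-...`) never reach the slug.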

Thanks, that helps us debug faster and ensure we’re fixing the right thing :+1:

It’s not always evident to me which things I bring up would be seen the same way by others here, and which things are just my OCD. For example, this situation is definitely plaguing my brain’s need for consistency: sometimes long URLs, sometimes short URLs cut at sentence-ending periods, sometimes URLs with the HTML text still in them.

Other people might not care, so it’s not clear-cut on “is this an issue” or “is this just my brain hurting”.

Extra note: I did just spot that because the current approach stops at the first period it finds, I had one post whose slug was simply “j”, because the post started with someone’s name and their name began with two initials.
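
A small Python sketch of why that happens, assuming the slug really is cut at the first period as described above. Both functions are hypothetical models rather than WriteFreely's implementation, and the name in the sample post is invented.

```python
import re

def slug_cut_at_first_period(text: str) -> str:
    """Model of the described behavior: only text before the first period is used."""
    head = text.split(".", 1)[0]
    return "-".join(re.findall(r"[a-z0-9]+", head.lower()))

def slug_from_first_words(text: str, max_words: int = 8) -> str:
    """Alternative: ignore punctuation entirely and take a fixed number of words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])

post = "J. Q. Example wrote about slugs today."
print(slug_cut_at_first_period(post))  # j
print(slug_from_first_words(post))     # j-q-example-wrote-about-slugs-today
```

Counting words instead of stopping at a period would also keep slug lengths consistent regardless of how short the first sentence is.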

FWIW, until this gets sussed one way or the other, I’ve been manually generating my post slugs using Browserling’s doohickey. That’s not ideal, because it doesn’t behave the same way Write.as does when it comes to certain punctuation. For example, I think Write.as turns em dashes into word-separating hyphens, which is a behavior that makes sense, and Browserling doesn’t do that.
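
The em dash difference can be illustrated with two hypothetical Python slug functions. Neither is taken from Write.as or Browserling; they only show how deleting a dash differs from treating it as a word break.

```python
import re

def slug_dropping_dashes(text: str) -> str:
    """Deletes every character that is not a letter, digit, or whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return "-".join(cleaned.split())

def slug_splitting_on_dashes(text: str) -> str:
    """Turns en and em dashes into spaces first, so they separate words."""
    spaced = re.sub(r"[\u2013\u2014]", " ", text.lower())
    cleaned = re.sub(r"[^a-z0-9\s]", "", spaced)
    return "-".join(cleaned.split())

title = "Slugs\u2014they matter"  # \u2014 is an em dash
print(slug_dropping_dashes(title))      # slugsthey-matter
print(slug_splitting_on_dashes(title))  # slugs-they-matter
```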
