PHP shorten text function
While not nearly as many lines as the Content Negotiation class I posted yesterday, my shorten_text() function took me just about as long to code.
Or rather code again and again. This one had me stumped for a few days, I admit.
My shorten_text() function began as a copy+paste adventure from something off the web. Here is the code for the original shorten text function. (Note, I don’t think this is the site where I grabbed the original code, but it is line-for-line the same thing). For something so simple this function works really well when used to shorten a block of text. But what about a block of HTML?
This turns out to be a whole new ball game. The problem: closing tags are treated the same as any other word and are quickly truncated from the string, possibly causing formatting to “leak” into the rest of the page. It is also a quick way to make an XHTML page invalid. I searched the Internet for a function that intelligently cropped text while preserving closing tags, but to no avail. I stumbled upon a few projects aimed at creating a whole class to perform this operation, but they had yet to release any code.
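To make the problem concrete, here is a minimal sketch of a word-based shortener. It is not the original function from the web; the name naive_shorten_text() and its $maxWords parameter are made up for illustration, and it simply assumes the text is cut after a fixed number of whitespace-separated words.

```php
<?php
// Hypothetical stand-in for a plain word-based shortener (not the original code).
function naive_shorten_text(string $text, int $maxWords): string
{
    // Split on runs of whitespace; tags are just "words" to this function.
    $words = preg_split('/\s+/', trim($text));
    if (count($words) <= $maxWords) {
        return $text;
    }
    return implode(' ', array_slice($words, 0, $maxWords)) . '...';
}

$html = '<p>This is <strong>very important text</strong> indeed.</p>';
echo naive_shorten_text($html, 4), "\n";
// prints: <p>This is <strong>very important...
```

Both `</strong>` and `</p>` are gone from the result, so everything after the snippet on the page would end up bold and inside an unclosed paragraph.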
So I was on my own on this one. My first “working” version used the PHP function explode() to break a block of HTML on the less-than sign (<), on the assumption that this symbol marks the beginning of an HTML tag. With that accomplished, I scanned my array backwards, each time separating the tag from the text that followed it. If the text was dropped completely and the tag was a closing tag, I saved it in a stack. Whenever I reached the corresponding opening tag, I dropped the closing tag from the stack (because all the content in between had been deleted).
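The splitting step of that first version might look roughly like the sketch below. This is a reconstruction under assumptions, not the code from back then: split_on_less_than() is a hypothetical name, and it only covers breaking the HTML on "<" and separating each tag from the text after it, not the backwards scan or the stack.

```php
<?php
// Hypothetical sketch of the explode()-based split; names are illustrative only.
function split_on_less_than(string $html): array
{
    $result = [];
    foreach (explode('<', $html) as $i => $piece) {
        if ($i === 0) {
            // Anything before the first "<" is plain text.
            $result[] = ['tag' => '', 'text' => $piece];
            continue;
        }
        $end = strpos($piece, '>');
        if ($end === false) {
            // A "<" with no ">" after it was not a tag at all,
            // which is exactly where this crude approach falls down.
            $result[] = ['tag' => '', 'text' => '<' . $piece];
            continue;
        }
        $result[] = [
            'tag'  => '<' . substr($piece, 0, $end + 1),
            'text' => substr($piece, $end + 1),
        ];
    }
    return $result;
}

$pieces = split_on_less_than('<p>Hi <b>there</b></p>');
// $pieces[1] is ['tag' => '<p>', 'text' => 'Hi ']
// $pieces[3] is ['tag' => '</b>', 'text' => '']
```

From an array like this, the backwards scan can drop text from the end and decide for each closing tag whether it still needs to be kept.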
For being so crude, this method worked surprisingly well. It seemed pretty clear that the “better” solution would likely follow a similar formula. Regular expressions seemed to be the way to go because they could offer me a better, more reliable way of finding tags (rather than assuming every less-than symbol belongs to a tag).
The secret was preg_split(), which is the regex version of explode(). With this function I was able to achieve my goal. Using the PREG_SPLIT_DELIM_CAPTURE flag to preserve delimiters (since we don’t want to throw away the HTML tags!) and a pattern that wraps the whole tag in a single capture group, preg_split() creates a consistent array where content is in even indexes and tags are in odd.
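Here is a minimal sketch of how that even/odd array can drive a tag-aware shortener. It is not the actual shorten_text() function: the name shorten_html_sketch(), the $maxWords parameter, and the simple pattern `<[^>]+>` are all assumptions for illustration. It also walks the array forwards and closes whatever is still open at the cut, which is a simpler variation on the backwards scan described above.

```php
<?php
// Hypothetical sketch; assumes well-nested markup and a simplified tag pattern.
function shorten_html_sketch(string $html, int $maxWords): string
{
    // Even indexes hold text, odd indexes hold the captured tags.
    $parts = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    $open = [];               // stack of tag names that are currently open
    $remaining = $maxWords;

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $out .= $part;
            if (preg_match('/^<\/\w+/', $part)) {
                array_pop($open);                        // closing tag
            } elseif (preg_match('/^<(\w+)[^>]*(?<!\/)>$/', $part, $m)) {
                $open[] = $m[1];                         // opening tag, not self-closing
            }
            continue;
        }
        $words = preg_split('/\s+/', $part, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) <= $remaining) {
            $out .= $part;
            $remaining -= count($words);
            continue;
        }
        // Word limit reached inside this text chunk: cut it here.
        $lead = ($remaining > 0 && preg_match('/^\s/', $part)) ? ' ' : '';
        $out .= $lead . implode(' ', array_slice($words, 0, $remaining)) . '...';
        // Close every tag that is still open, innermost first.
        while ($open) {
            $out .= '</' . array_pop($open) . '>';
        }
        return $out;
    }
    return $out;
}

$html = '<p>This is <strong>very important text</strong> indeed.</p>';
echo shorten_html_sketch($html, 4), "\n";
// prints: <p>This is <strong>very important...</strong></p>
```

The sketch has known gaps: void elements written without a trailing slash (such as `<br>`) would be pushed onto the stack and wrongly closed, and the simple pattern would still mistake text like `a < b > c` for a tag. A more careful regular expression for matching tags addresses the second point.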
And there you have it, my new shorten text function.
BTW, you might notice the magic regular expression in shorten_text() is awfully similar to the big regular expression in my content negotiation class. While I modified the expression slightly, I found this here.