Type: Feature Request
Affects Version/s: None
Fix Version/s: None
Currently we are very flexible about which characters we allow in friendly URLs. We even allow characters such as the ellipsis character, and then URL-encode everything.
This flexibility is nice, but I feel it kind of defeats the purpose of a "friendly" URL.
I propose that we limit which characters we allow in the friendly URL, for the sake of keeping things friendly (and compatible). One could argue that we should leave it to the web admin to choose reasonable friendly URLs, but for blog titles it is the user who determines the friendly URL: we take almost everything and then URL-encode it. At that point, "friendly" URLs become quite unfriendly.
For example, here is a friendly blog URL:
Which, if we stripped out the crazy stuff, could look something like this instead:
It is sometimes the case that a blogger types everything in a word processor first and pastes the text into Liferay's text boxes. This introduces a whole slew of "interesting" characters, which are then encoded in the "friendly" URLs. We also found a bug with a combination of Apache and Tomcat: friendly URLs are decoded twice, which causes them to fail the friendly URL validation check.
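To illustrate why a second decode is a problem once URLs rely on percent-encoding, here is a minimal sketch using only the JDK's URLEncoder and URLDecoder. The class name, helper names, and sample title are hypothetical, and this is not a reproduction of the actual Apache/Tomcat behaviour, just one way a double decode can change a value:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class DoubleDecodeSketch {

    // Percent-encode a value as UTF-8 (hypothetical helper).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Percent-decode a value as UTF-8 (hypothetical helper).
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A title that happens to contain text that looks like an escape.
        String title = "50%25-off";

        String encoded = encode(title);          // what ends up in the URL
        String decodedOnce = decode(encoded);    // round-trips to the title
        String decodedTwice = decode(decodedOnce); // no longer the title

        System.out.println(encoded);
        System.out.println(decodedOnce);
        System.out.println(decodedTwice);
    }
}
```

If friendly URLs were limited to characters that never need encoding, a second decode would leave them unchanged.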
This logic is currently in FriendlyURLNormalizer.normalize(String friendlyURL, char replaceChars). I also think that we should move that class over to portal-kernel so that plugins can have access to this method (http://issues.liferay.com/browse/LPS-10799).
We can do this by simply adding a few lines to the normalize method:
Matcher matcher = _invalidCharsPattern.matcher(friendlyURL);
friendlyURL = matcher.replaceAll(StringPool.DASH);
private static Pattern _invalidCharsPattern = Pattern.compile(
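As a rough, self-contained sketch of the idea, outside of Liferay: the character class below is an assumption, since the pattern above is incomplete. It keeps only lowercase ASCII letters, digits, "/", "_" and "-". A plain "-" stands in for StringPool.DASH, and the class name FriendlyUrlSketch, its extra patterns, and the lowercasing and dash clean-up steps are hypothetical additions for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class FriendlyUrlSketch {

    // Assumed whitelist: anything outside a-z, 0-9, '/', '_' and '-' is invalid.
    private static final Pattern INVALID_CHARS =
        Pattern.compile("[^a-z0-9/_-]+");

    // Collapse runs of dashes left behind by the replacement.
    private static final Pattern REPEATED_DASHES = Pattern.compile("-{2,}");

    // Drop dashes at the very end of the URL.
    private static final Pattern TRAILING_DASHES = Pattern.compile("-+$");

    static String normalize(String friendlyURL) {
        String result = friendlyURL.toLowerCase(Locale.ROOT);

        // Same approach as the proposal: replace invalid characters with a dash.
        result = INVALID_CHARS.matcher(result).replaceAll("-");

        result = REPEATED_DASHES.matcher(result).replaceAll("-");
        result = TRAILING_DASHES.matcher(result).replaceAll("");

        return result;
    }

    public static void main(String[] args) {
        // \u2026 is the ellipsis character mentioned in the description.
        System.out.println(normalize("/Hello\u2026 World!")); // prints /hello-world
    }
}
```

With a whitelist like this, nothing in the resulting URL needs percent-encoding. The trade-off is that titles written entirely in non-ASCII scripts would collapse to almost nothing, so the real character class would need more thought.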