This month I found myself considering how to split a string of English text into substrings that contain only whole words, with each substring as close as possible to a given number of characters in length without exceeding it.

Formalising this a bit, I wanted an algorithm that takes a string s of characters (which may include spaces, but never two in a row) and a number n, and returns the substring of s running from the first character (using one-based numbering) up to the mth character, where m ≤ n, character m+1 is a space, and none of the characters after the (m+1)th and up to and including the (n+1)th is a space. In other words, m+1 is the position of the last space at or before position n+1.

That has probably made things sound more confusing, so imagine that n is 7 and s is “1234 6789 ABCD”. Then m is 4 and m+1 is 5: character 5 is a space, and characters 6 to 8 (“678”) contain no spaces, so the result is “1234”.

The main focus of this post, though, is showing the different ways this can be done in Groovy, and how beautiful those ways are, especially if you make it a one-liner.
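As a rough illustration of the specification, here is a minimal Groovy sketch. It is not necessarily one of the approaches the full post settles on. The name `firstChunk` is hypothetical, and the handling of the two cases the description leaves open (a string that already fits within n characters, and a first word longer than n) is an assumption. The sketch relies on `String.lastIndexOf(String, int)`, which searches backwards from a zero-based index; zero-based index n is the (n+1)th character in one-based terms, so the index it returns is exactly m.

```groovy
// Hypothetical helper: returns the longest prefix of s that is at most
// n characters long and is immediately followed by a space.
String firstChunk(String s, int n) {
    // Assumption: a string that already fits within n characters is returned whole.
    if (s.length() <= n) return s
    // Search backwards from zero-based index n, i.e. the (n+1)th character
    // in one-based numbering. The zero-based index of that space equals m.
    int cut = s.lastIndexOf(' ', n)
    // Assumption: if the first word is longer than n, return an empty string.
    return cut < 0 ? '' : s.substring(0, cut)
}

// The worked example: n = 7 gives m = 4.
assert firstChunk('1234 6789 ABCD', 7) == '1234'
// A space at position n+1 lets the chunk use all n characters.
assert firstChunk('1234 6789 ABCD', 9) == '1234 6789'
```

Searching from index n rather than n-1 matters: it lets a chunk be exactly n characters long when the character straight after it is a space.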