On splitting strings

Splitting strings is cool, but most languages differ subtly in how they do it. The three contenders are JavaScript, Python and Ruby. As an example, suppose you’re getting a string in the form “type_role_name” and you want to split it into type, role and name. The little twist here is that ‘name’ can also contain underscores. Let’s go through them in reverse order; it’s an easy job in Ruby:

irb> type,role,name = "user_admin_john_doe".split('_', 3)
=> ["user", "admin", "john_doe"]

Ruby’s split method wants to know how many pieces you want. Onwards with Python:

>>> type, role, name = 'user_admin_john_doe'.split('_', 2)
>>> type, role, name
('user', 'admin', 'john_doe')

I prefer this style slightly, as you have to express how many splits you want. The mindset here is that when you’re saying ‘split twice’, you’re expecting that the last element may contain the remainder of the string. By contrast, when saying ‘give me three pieces’, it is unclear whether you want the remainder or not. Besides that, both are equally cool. Last but not least, how to do it in JavaScript? Well, turns out it’s rather messy:

"user_admin_john_doe".split('_', 3)
["user", "admin", "john"]

Although this looks somewhat similar to Ruby (i.e. ‘gimme three pieces’), it behaves as if the whole string were split and everything after the third piece discarded. Furthermore, JavaScript had no multiple assignment (could also be called iterable unpacking) before ES2015 introduced destructuring, so in older engines you have to store the result in an array first and assign each variable individually. But more importantly, how to get “john_doe”? Turns out that you have to fiddle with splitting off substrings:

// Peel off one piece at a time; whatever is left at the end is the name.
var str = "user_admin_john_doe";
var type = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
var role = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
var name = str;

Not exactly what I would call elegant, but it does the trick. I’d say it’s a draw between Python and Ruby, whereas JavaScript failed big time 🙂
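
For what it’s worth, the substring fiddling can be wrapped up once and reused. The following is only a minimal sketch, assuming an ES2015-capable engine and a non-empty string separator; splitKeepRest is a made-up helper name, not something JavaScript ships with. It takes the number of pieces (the Ruby mindset) and keeps the remainder in the last one:

```javascript
// Hypothetical helper (not a built-in): split `str` on `sep` into at most
// `pieces` parts, keeping the unsplit remainder in the last part.
// Assumes `sep` is a non-empty string.
function splitKeepRest(str, sep, pieces) {
  const parts = [];
  let start = 0;
  while (parts.length < pieces - 1) {
    const idx = str.indexOf(sep, start);
    if (idx === -1) break; // no separator left, stop early
    parts.push(str.slice(start, idx));
    start = idx + sep.length;
  }
  parts.push(str.slice(start)); // remainder, may still contain `sep`
  return parts;
}

// Array destructuring (ES2015) stands in for multiple assignment.
const [type, role, name] = splitKeepRest("user_admin_john_doe", "_", 3);
console.log(type, role, name); // user admin john_doe

// Alternative for a plain string separator: split everything,
// then glue the tail back together.
const [t, r, ...rest] = "user_admin_john_doe".split("_");
const n = rest.join("_"); // "john_doe"
```

The first variant stops searching once it has enough pieces; the second is shorter but splits the whole string and only works when the separator is a fixed string that can be used to re-join the tail.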

2 thoughts on “On splitting strings”

  1. In Perl it can be written like a plain old-school function call:

    @a = split(/_/, "user_admin_john_doe", 3);
    [ 'user', 'admin', 'john_doe' ]

    I think of the 3 as ‘the return value shall be an array of 3 elements’.
    But you are right, JavaScript’s interpretation of this is surprising.
    It is probably also a waste of CPU cycles, as it performs more splits than needed.
