On splitting strings

Splitting strings is cool, but most languages differ subtly in how they do it. The three contenders are JavaScript, Python and Ruby. As an example, suppose you’re getting a string of the form “type_role_name” and you want to split it into type, role and name. The little twist here is that ‘name’ can itself contain underscores. Let’s go through them in reverse order; it’s an easy job in Ruby:

irb> type,role,name = "user_admin_john_doe".split('_', 3)
=> ["user", "admin", "john_doe"]

Ruby’s split method wants to know how many pieces you want. Onwards with Python:

>>> type, role, name = 'user_admin_john_doe'.split('_', 2)
>>> type, role, name
('user', 'admin', 'john_doe')

I prefer this style slightly, as you have to express how many splits you want. The mindset here is that when you’re saying ‘split twice’, you’re expecting that the last element may contain the remainder of the string. By contrast, when saying ‘give me three pieces’, it is unclear whether you want the remainder or not. Besides that, both are equally cool. Last but not least, how do you do it in JavaScript? Well, turns out it’s rather messy:

"user_admin_john_doe".split('_', 3)
["user", "admin", "john"]

Although this looks somewhat similar to Ruby (i.e. ‘gimme three pieces’), it keeps splitting and discards everything past the third piece. Furthermore, JavaScript before ES2015 doesn’t support multiple assignment (also known as destructuring or iterable unpacking), so there you have to store the result in an array first and assign each variable individually. But more importantly, how to get “john_doe”? Turns out that you have to fiddle with splitting off substrings:

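var str = "user_admin_john_doe";
// Cut off the part before the first underscore, then drop it from str.
var type = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
// Same again for the second field.
var role = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
// Whatever is left is the name, underscores included.
var name = str;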
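
A shorter alternative, as a sketch that assumes an ES2015 or newer runtime (which has destructuring with rest elements): split on every underscore and glue the surplus pieces back together. The variable names simply mirror the example string used in this post.

```javascript
const input = "user_admin_john_doe";

// Split on every underscore, collect everything after the second piece
// into `rest`, then re-join it to restore the underscores in the name.
const [type, role, ...rest] = input.split('_');
const name = rest.join('_');

console.log(type, role, name); // user admin john_doe
```

This does more splitting than strictly needed, but for short strings that hardly matters.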

Not exactly what I would call elegant, but it does the trick. I’d say it’s a draw between Python and Ruby, whereas JavaScript failed big time 🙂
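
For what it’s worth, the Ruby-style ‘give me N pieces and keep the remainder’ behaviour can be wrapped in a small function. The following is only a sketch: splitKeepRest is a made-up name, not a built-in, and it assumes a non-empty separator and a limit of at least 1.

```javascript
// Hypothetical helper (not part of JavaScript itself): split `str` on `sep`
// into at most `limit` pieces, leaving the unsplit remainder in the last one.
// Assumes `sep` is a non-empty string and `limit` >= 1.
function splitKeepRest(str, sep, limit) {
  const pieces = [];
  let start = 0;
  // Cut off at most limit - 1 pieces; stop early if no separator is left.
  while (pieces.length < limit - 1) {
    const idx = str.indexOf(sep, start);
    if (idx === -1) break;
    pieces.push(str.slice(start, idx));
    start = idx + sep.length;
  }
  pieces.push(str.slice(start)); // whatever remains, separators included
  return pieces;
}

const parts = splitKeepRest("user_admin_john_doe", "_", 3);
// parts is ["user", "admin", "john_doe"]
```

Unlike the built-in split with a limit, this stops scanning once enough pieces have been cut off.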


2 thoughts on “On splitting strings”

  1. In Perl it can be written like a plain old-school function call:

    @a = split(/_/, "user_admin_john_doe", 3);
    [ 'user', 'admin', 'john_doe' ]

    I think of the 3 as ‘the return value shall be an array of 3 elements’.
    But you are right, JavaScript’s interpretation of this is surprising.
    It is probably also a waste of CPU cycles, as it performs more splits than needed.
