On splitting strings

Splitting strings is cool, but most languages differ subtly in how it is done. The three contenders are JavaScript, Python and Ruby. As an example, suppose you’re getting a string in the form “type_role_name” and you want to split it into type, role and name. The little twist here is that ‘name’ can also contain underscores. Let’s go through them in reverse order; it’s an easy job in Ruby:

irb> type,role,name = "user_admin_john_doe".split('_', 3)
=> ["user", "admin", "john_doe"]

Ruby’s split method wants to know how many pieces you want. Onwards with Python:

>>> type, role, name = 'user_admin_john_doe'.split('_', 2)
>>> type, role, name
('user', 'admin', 'john_doe')

I prefer this style slightly, as you have to express how many splits you want. The mindset here is that when you’re saying ‘split twice’, you’re expecting that the last element may contain the remainder of the string. By contrast, when saying, ‘give me three pieces’, it is unclear whether you want the remainder or not. Besides that, both are equally cool. Last but not least, how to do it in JavaScript? Well, turns out it’s rather messy:

"user_admin_john_doe".split('_', 3)
["user", "admin", "john"]

Although this is somewhat similar to Ruby (i.e. ‘gimme three pieces’), it continues to split and discards the remaining stuff. Furthermore, JavaScript doesn’t support multiple assignment (could also be called iterable unpacking), so you have to store the result in an array first and assign the elements individually. But more importantly, how to get “john_doe”? Turns out that you have to fiddle with splitting off substrings:

var str = "user_admin_john_doe";
var type = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
var role = str.slice(0, str.indexOf('_'));
str = str.slice(str.indexOf('_') + 1);
var name = str;

Not exactly what I would call elegant, but it does the trick. I’d say it’s a draw between Python and Ruby, whereas JavaScript failed big time 🙂
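To put the three conventions side by side, here is a minimal Python sketch that mimics each of them. The helper names are made up for illustration, and the Ruby and JavaScript variants only model the behaviour shown in the examples above (a positive limit); they are not a full emulation of either language.

```python
def split_python(text, sep, maxsplit):
    """Python convention: perform at most `maxsplit` splits."""
    return text.split(sep, maxsplit)


def split_ruby(text, sep, limit):
    """Ruby convention (positive limit only): at most `limit` pieces,
    with the remainder kept in the last piece."""
    return text.split(sep, limit - 1)


def split_javascript(text, sep, limit):
    """JavaScript convention: split everything, keep the first `limit` pieces."""
    return text.split(sep)[:limit]


s = "user_admin_john_doe"
print(split_python(s, "_", 2))      # ['user', 'admin', 'john_doe']
print(split_ruby(s, "_", 3))        # ['user', 'admin', 'john_doe']
print(split_javascript(s, "_", 3))  # ['user', 'admin', 'john']
```

Seen this way, Ruby and Python differ only by an off-by-one in how the argument is counted, while JavaScript throws the remainder away.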

Braindead Python packaging

As you all know, distributing and building packages with the openSUSE Build Service is easy and fun. The only party pooper is that you have to write a spec file to get your RPMs out there. Thanks to darix, we have a decent solution at least for Ruby packages: gem2rpm, a script which auto-generates RPM spec files for Ruby gems. Ever wondered why we don’t have something similar for Python? Well, so did I. Thus, after half a week of hackery, I’d like to introduce py2pack, my take on braindead Python packaging. Here’s how it goes:

Let’s suppose you want to package zope.interface and you don’t know exactly how it is named or where to download it from. First of all, you can search for it and, once you have found the correct module, download the source tarball automatically:

$ py2pack search zope.interface
searching for module zope.interface...
found zope.interface-3.6.1
$ py2pack fetch zope.interface
downloading package zope.interface-3.6.1...
from http://pypi.python.org/packages/source/z/zope.interface/zope.interface-3.6.1.tar.gz
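The download URL in the output above follows a recognizable layout: the first letter of the module name, then the module name, then name and version joined into the tarball name. As a rough sketch, and assuming this layout holds in general (it is inferred from this single URL; py2pack itself may well obtain the URL differently, and not every module ships a .tar.gz), it could be reconstructed like this. The function name is made up for illustration.

```python
def guess_source_url(name, version,
                     base="http://pypi.python.org/packages/source"):
    """Guess the source tarball URL from module name and version.

    Assumption: the layout <base>/<first letter>/<name>/<name>-<version>.tar.gz,
    as seen in the fetch output for zope.interface.
    """
    return "{0}/{1}/{2}/{2}-{3}.tar.gz".format(base, name[0], name, version)


print(guess_source_url("zope.interface", "3.6.1"))
# http://pypi.python.org/packages/source/z/zope.interface/zope.interface-3.6.1.tar.gz
```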

As a next step you may want to generate a package recipe for your distribution. For RPM-based distributions (let’s use openSUSE as an example), you want to generate a spec file named python-zopeinterface.spec:

$ py2pack generate zope.interface -t opensuse.spec -f python-zopeinterface.spec

The above command has the parameter “-t opensuse.spec”, which is also the default (and thus optional). Well, this seems to imply that you can generate other files (and rumor has it that Debian support is on the way, too) 🙂

Back to the topic: the source tarball and the package recipe are all you need to generate the RPM file. This final step may depend on which distribution you use. Again, for openSUSE (and by using the openSUSE Build Service), the complete recipe becomes:

$ osc mkpac python-zopeinterface
$ cd python-zopeinterface
$ py2pack fetch zope.interface
$ py2pack generate zope.interface -f python-zopeinterface.spec
$ osc build
$ osc vc
$ osc commit

The first line uses osc, the Build Service command line tool, to generate a new package (preferably in your Build Service home project). The py2pack steps are known already. Finally, the package is tested (built locally), a changes file (the package changelog) is generated (with ‘osc vc’) and the result is sent back to the Build Service for public consumption. However, depending on the Python module, you may have to adapt the generated spec file slightly. Py2pack is quite clever and tries to auto-generate as much as it can, but it depends on the metadata that the module provides. Thus, bad metadata implies mediocre results. To get further help about py2pack usage, issue the following command:

$ py2pack help
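To make the point about metadata a bit more concrete, here is a rough sketch of how a spec file could be filled in from a module's metadata. This is not py2pack's actual code or template: the template text, the field names and the FIXME placeholder are all assumptions for illustration. It merely shows why missing metadata leaves gaps you have to fix by hand.

```python
from string import Template

# Hypothetical, heavily trimmed spec template; not the one py2pack ships.
SPEC_TEMPLATE = Template("""\
Name:           python-$name
Version:        $version
Summary:        $summary
License:        $license
Url:            $home_page
Source:         $name-$version.tar.gz
""")


def render_spec(metadata):
    """Fill the template, marking missing or empty metadata fields with FIXME."""
    fields = ("name", "version", "summary", "license", "home_page")
    values = {key: metadata.get(key) or "FIXME" for key in fields}
    return SPEC_TEMPLATE.substitute(values)


# A made-up module whose metadata lacks license and home page.
print(render_spec({"name": "example", "version": "1.0",
                   "summary": "An example module"}))
```

Every FIXME in the result corresponds to a field the module author left out, which is exactly where manual adaption of the spec file comes in.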

To demonstrate its capabilities, I re-packaged some Python packages that are in devel:languages:python. As you can see (when logged in to the Build Service), those packages now build for all RPM-based distros we currently offer in the Build Service. Currently, you can download py2pack from the Python Package Index or install the package python-py2pack from the devel:languages:python repository.

Happy Python packaging!

Thoughts about Oracle vs. Google

As the above heading currently works perfectly for SEO, I should add my two cents on the topic. There have been a lot of interesting posts in the blogosphere recently about the Oracle suit against Google. The most insightful is probably the comment by James Gosling (the creator of Java). Personally, I’m not too concerned about the Java platform, as I’ve admittedly never been a big fan of the language or the platform, so I am mostly watching this as a bystander. As a big supporter of Open Source and Free Software, to me this patent claim clearly shows that all (even popular) programming languages that are owned and maintained by a commercially interested body carry some dangers. Most of these were already explained by people way more capable than me, but it is clear that developers have a rich choice of alternatives: C, C++, Perl, Python, PHP, Ruby, Go and some others. I’m not sure about the D language, but everything based on the .NET platform is better avoided. I may be biased, but this case will only accelerate the demise of the Java platform: the server world is getting more and more diverse, and I’ve yet to see a Java killer app for the desktop. Actually, Dalvik is the most interesting Java-related technology in a while, and I fear that it will stay around a little longer in the mobile sector. The question is whether the net outcome will be positive for Oracle and for the language, as this is likely to cause some FUD. Even if the patent claim is justified (as James Gosling suggests), I’m unsure whether the extra billions are worth the long-term losses in image and credibility. On the other hand, it’s software patents again…

Another interesting aspect is that Google is back on the bright side of life after all the tremors caused by the Verizon net neutrality pact and the release of Google Street View in Germany. Now Oracle is the bad guy (again, after the OpenSolaris “press” release)…