Next I thought I would tackle parsing CSV data in all three languages. What could be more exciting, right? Once again, this was born out of actual need - I was recently crunching some CSV data at work. But I like it as an example (despite both the boring subject matter and the “just look in the standard library” nature of the question) exactly because it’s very real world. I envy the developer who has never been called on to write ETL code, but I bet a lot of you have. It is the kind of annoying task that comes up again and again, at least in my world!
So admittedly in Ruby, this is as easy as reaching into the standard library. Way, way back in the day there were gems that offered more features and faster parsing for CSVs than the code in the stdlib, but the Ruby maintainers smartly just integrated that code directly into std.
The documentation is straightforward, and you can see the functionality is quite versatile: it covers both reading and writing, and it works with files, file-like IO objects, and strings.
Perhaps most importantly, it correctly handles the first and most troublesome issue you always run into with CSV data - some field contains a comma in the data, rather than as markup, and your parsing trips on it. For example:
Note how the string "magenta, purple" remains a single string and doesn’t get parsed into a row with 4 fields. Also note we threw it Windows-style line endings and it correctly dealt with that without us having to change the line termination field.
Python is very similar: you can just reach into the stdlib to parse CSV data. At first glance the Python library is a bit more feature-rich than the Ruby one, offering things like sniffing out the format of the CSV file and reading directly into a dictionary instead of just arrays.
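To make those two extras concrete, here is a minimal sketch using csv.Sniffer and csv.DictReader. It assumes Python 3 rather than the 2.7.9 I was working with, and the sample data, the semicolon delimiter, and the sniff_and_read helper are all made up for illustration.

```python
import csv
import io

# Hypothetical sample data; the semicolon delimiter is an assumption,
# chosen only to give the sniffer something to detect.
SAMPLE = "name;color;count\nshirt;red;3\nhat;blue;5\n"


def sniff_and_read(text):
    """Guess the dialect of `text`, then return its rows as dictionaries."""
    # Restrict the candidate delimiters so the guess is predictable.
    dialect = csv.Sniffer().sniff(text, delimiters=";,")
    # DictReader uses the first row as the keys for every following row.
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect=dialect)
    return [dict(row) for row in reader]


rows = sniff_and_read(SAMPLE)
for row in rows:
    print(row["name"], row["color"], row["count"])
```

Every value comes back as a string, so any numeric conversion is still up to you. The sniffer is a heuristic, so on messy real-world files it may well guess wrong.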
Where I got a little stumped, though, is that the 2.7.9 version of the library doesn’t support operating directly on strings. The documentation gives an example of how to achieve this by wrapping the string in a one-item list, but this doesn’t seem to work with line endings embedded in the string. So, unlike in Ruby, you have to split the data into lines first, then parse each line you find:
Once you get through that, though, you once again get the correct data; that is, "magenta, purple" comes out right. Of course, you wouldn’t need such gymnastics if you really were reading from a file. Like Ruby, the library also supports parsing one line at a time instead of having to read all the data into memory first.
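The split-then-parse approach can be sketched roughly as follows. This is not necessarily the exact code I ran: it assumes Python 3, and the DATA string and both helper names are hypothetical. The second helper shows the alternative of wrapping the string in a file-like object, which streams rows the same way a real file would.

```python
import csv
import io

# Hypothetical data: a quoted field containing a comma,
# with Windows-style (CRLF) line endings.
DATA = 'id,name,colors\r\n1,shirt,"magenta, purple"\r\n2,hat,blue\r\n'


def parse_by_lines(text):
    """Split on line endings first, then hand the lines to csv.reader."""
    # csv.reader accepts any iterable of strings, so a list of lines works.
    return list(csv.reader(text.splitlines()))


def parse_as_file(text):
    """Wrap the string in a file-like object and read rows from it."""
    # newline="" leaves the line endings untouched so the csv module
    # can interpret them itself.
    return list(csv.reader(io.StringIO(text, newline="")))


for row in parse_by_lines(DATA):
    print(row)
```

Splitting first is only safe when no quoted field contains a line break of its own, which is one reason to prefer the file-like wrapper when it is available.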
Trying this in Dart is an interesting look at the maturity of the community surrounding Dart. Dart doesn’t have a CSV parser in its standard library. That is not unexpected, as I keep going back to, given its client-side focus. So, we turn to pub.dartlang.org, which is Dart’s packaging and publishing system.
There are a few options for CSV parsing, so this part of my trial and research really became a “do they work?” review. Note that with pub.dartlang.org, you don’t have the tools you do in Ruby or Python to gauge the maturity of a library, such as the number of downloads or a tool like ruby-toolbox.
Several of the libraries I tried did indeed work, but you have to watch out for the output of
Here is an example using csv:
This will output:
Here is a complete example using
This will output:
Note, though, that this library appears to have no way to discover the length of a row, so you would have to already know that information in your code. That seems like a shortcoming.
All three languages have options to help you parse CSV data - if they didn’t in this day and age, I guess we would be a little worried. Ruby and Python obviously have some maturity in this area that Dart lacks, but that doesn’t mean you don’t have options in Dart that work well. We can also safely conclude that parsing CSV data is a terrible use of your time and skills, and here is hoping you don’t have to do it often!