Processing Results for the Critérium du Dauphiné

Author: Paul King
Published: 2023-06-13 05:00PM


The 2023 Critérium du Dauphiné has just finished. Let’s examine the results using Groovy and DuckDB. For the purposes of this post, we are interested in the overall top ten riders in the general classification.

Results file

Our results are stored in a CSV file:

lines from CSV file

In a previous article, we looked at reading and writing CSV files using a number of CSV libraries. Today we will use a nice feature of DuckDB which can read in CSV files on the fly.

Our goal is very simple, just print out the information but grouped by each rider’s country.

Sql.withInstance('jdbc:duckdb:', 'org.duckdb.DuckDBDriver') { sql ->
    println "Country  Riders                                  Places    Count"
    sql.eachRow("""SELECT Country,
        rpad(string_agg(Rider, ', '), 40, ' ') as Riders,
        rpad(string_agg(Place, ', '), 10, ' ') as Places,
        bar(count(Country), 0, 4, 30) as Count
        FROM 'topten.csv' GROUP BY Country""") { row ->
        row.with {
            println "$Country       $Riders$Places$Count"
        }
    }
}

Here we are making use of several built-in functions from DuckDB including rpad to right pad the output, string_agg to aggregate the riders (and their places) from the same country, and bar to produce a pretty barchart.

Running this code produces the following output:

Country  Riders                                  Places    Count
🇩🇰       JONAS VINGEGAARD                        1         ███████▌
🇬🇧       ADAM YATES                              2         ███████▌
🇦🇺       BEN O’CONNOR, JAI HINDLEY, JACK HAIG    3, 4, 5   ██████████████████████▌
🇫🇷       GUILLAUME MARTIN, JULIAN ALAPHILIPPE    6, 10     ███████████████
🇿🇦       LOUIS DU BOUISSON MEINTJES              7         ███████▌
🇳🇴       TORSTEIN TRÆEN                          8         ███████▌
🇪🇸       CARLOS RODRIGUEZ CANO                   9         ███████▌

We can achieve a similar result using GQuery (AKA GINQ) using the following code:

var f = 'topten.csv' as File
var lines = f.readLines()*.split(',')
var cols = lines[0].size()
var rows = lines[1..-1].collect{row ->
    (0..<cols).collectEntries{ col -> [lines[0][col], row[col]] }}
var commaDelimited = Collectors.joining(', ')
var aggRiders = { it.stream().map(rec -> rec.r.Rider).collect(commaDelimited) }
var aggPlaces = { it.stream().map(rec -> rec.r.Place).collect(commaDelimited) }
println GQ {
    from r in rows
    groupby r.Country
    select r.Country,
        agg(aggRiders(_g)) as Riders,
        agg(aggPlaces(_g)) as Places,
        '██' * count(r.Country) as Count
}

We could use a CSV library to read in the data, but for this simple example we’ll just use Groovy’s line/text processing capabilities. GQuery doesn’t currently have built in equivalents to bar or string_agg so we roll our own crude bar character function and aggregators, aggRiders and aggPlaces.

Running this code gives the following output:

+---------+--------------------------------------+---------+--------+
| Country | Riders                               | Places  | Count  |
+---------+--------------------------------------+---------+--------+
| 🇦🇺      | BEN O’CONNOR, JAI HINDLEY, JACK HAIG | 3, 4, 5 | ██████ |
| 🇩🇰      | JONAS VINGEGAARD                     | 1       | ██     |
| 🇳🇴      | TORSTEIN TRÆEN                       | 8       | ██     |
| 🇿🇦      | LOUIS DU BOUISSON MEINTJES           | 7       | ██     |
| 🇬🇧      | ADAM YATES                           | 2       | ██     |
| 🇪🇸      | CARLOS RODRIGUEZ CANO                | 9       | ██     |
| 🇫🇷      | GUILLAUME MARTIN, JULIAN ALAPHILIPPE | 6, 10   | ████   |
+---------+--------------------------------------+---------+--------+

For Groovy 5, we are currently exploring the possibility of adding more functions in GQuery, like bar and stringAgg. If you are interested in this functionality, please get in touch via the mailing lists or Slack.