Fruity Eclipse Collections

Author: Paul King
Published: 2022-10-13 11:05AM


This blog post continues on the theme of using fruit emoji from the previous post, but instead of deep learning, we’ll look at clustering using k-means after first exploring some top methods of Eclipse Collections with fruit emoji examples.

Eclipse Collections Fruit Salad

First, we’ll define a Fruit enum (it adds one additional fruit compared to the related Eclipse Collections kata):

enum Fruit {
    APPLE('🍎', Color.RED),
    PEACH('🍑', Color.ORANGE),
    BANANA('🍌', Color.YELLOW),
    CHERRY('🍒', Color.RED),
    ORANGE('🍊', Color.ORANGE),
    GRAPE('🍇', Color.MAGENTA)

    public static ImmutableList<Fruit> ALL = Lists.immutable.with(values())
    public static ImmutableList<String> ALL_EMOJI = Lists.immutable.with(*values()*.emoji)
    final String emoji
    final Color color

    Fruit(String emoji, Color color) {
        this.emoji = emoji
        this.color = color
    }

    static Fruit of(String emoji) {
        values().find{it.emoji == emoji }
    }
}

We can use this enum in the following examples which show off numerous common Eclipse Collections methods:

assert Lists.mutable.with('🍎', '🍎', '🍌', '🍌').distinct() ==
        Lists.mutable.with('🍎', '🍌')

var onlyBanana = Sets.immutable.with('🍌')

assert Fruit.ALL_EMOJI.select(onlyBanana::contains) == List.of('🍌')

assert Fruit.ALL_EMOJI.reject(onlyBanana::contains) ==
        List.of('🍎', '🍑', '🍒', '🍊', '🍇')

assert Fruit.ALL.groupBy(Fruit::getColor) ==
        Multimaps.mutable.list.empty()
                .withKeyMultiValues(RED, Fruit.of('🍎'), Fruit.of('🍒'))
                .withKeyMultiValues(YELLOW, Fruit.of('🍌'))
                .withKeyMultiValues(ORANGE, Fruit.of('🍑'), Fruit.of('🍊'))
                .withKeyMultiValues(MAGENTA, Fruit.of('🍇'))

assert Fruit.ALL.countBy(Fruit::getColor) ==
        Bags.immutable.withOccurrences(RED, 2, YELLOW, 1, ORANGE, 2, MAGENTA, 1)

Fruit.ALL_EMOJI.chunk(4).with {
    assert first == Lists.mutable.with('🍎', '🍑', '🍌', '🍒')
    assert last == Lists.mutable.with('🍊', '🍇')
}

// For normal threads, replace 'withExistingPool' line of code with:
//GParsExecutorsPool.withPool { pool ->
GParsExecutorsPool.withExistingPool(Executors.newVirtualThreadPerTaskExecutor()) { pool ->
    var parallelFruit = Fruit.ALL.asParallel(pool, 1)
    var redFruit = parallelFruit.select(fruit -> fruit.color == RED).toList()
    assert redFruit == Lists.mutable.with(Fruit.of('🍎'), Fruit.of('🍒'))
}

The last example calculates red fruit in parallel threads. As coded, it uses virtual threads when run on JDK19 with preview features enabled. You can follow the suggestion in the comment to run on other JDK versions or with normal threads. In addition to Eclipse Collections, we have the GPars library on our classpath. Here we are only using one method which is managing pool lifecycle for us.

Exploring emoji colors

For some fun, let’s look at whether the nominated color of each fruit matches the color of the related emoji. As in the previous blog, we’ll use the slightly nicer Noto Color Emoji fonts for our fruit as shown here:

Noto Color Emoji

We’ll use an Eclipse Collection BiMap to switch back and forth between the color names and java.awt colors:

@Field public static COLOR_OF = BiMaps.immutable.ofAll([
    WHITE: WHITE, RED: RED, GREEN: GREEN, BLUE: BLUE,
    ORANGE: ORANGE, YELLOW: YELLOW, MAGENTA: MAGENTA
])
@Field public static NAME_OF = COLOR_OF.inverse()

We are also going to use some helper functions to switch between RGB and HSB color values:

static hsb(int r, int g, int b) {
    float[] hsb = new float[3]
    RGBtoHSB(r, g, b, hsb)
    hsb
}

static rgb(BufferedImage image, int x, int y) {
    int rgb = image.getRGB(x, y)
    int r = (rgb >> 16) & 0xFF
    int g = (rgb >> 8) & 0xFF
    int b = rgb & 0xFF
    [r, g, b]
}

The HSB color space represents colors in a spectrum from 0 to 360 degrees:

Image credit: https://nycdoe-cs4all.github.io/units/1/lessons/lesson_3.2

Color Circle

We have two helper methods to assist with colors. The first picks out "mostly black" and "mostly white" colors while the second uses a switch expression to carve out some regions of the color space for our colors of interest:

static range(float[] hsb) {
    if (hsb[1] < 0.1 && hsb[2] > 0.9) return [0, WHITE]
    if (hsb[2] < 0.1) return [0, BLACK]
    int deg = (hsb[0] * 360).round()
    return [deg, range(deg)]
}

static range(int deg) {
    switch (deg) {
        case 0..<16 -> RED
        case 16..<35 -> ORANGE
        case 35..<75 -> YELLOW
        case 75..<160 -> GREEN
        case 160..<250 -> BLUE
        case 250..<330 -> MAGENTA
        default -> RED
    }
}

Note that the JDK doesn’t have a standard color of PURPLE, so we combine purple with magenta by choosing an appropriate broad spectrum for MAGENTA.

We used a Plotly 3D interactive scatterplot (as supported by the Tablesaw Java dataframe and visualization library to visualize our emoji colors (as degrees on the color spectrum) vs the XY coordinates: Color vs xy plot

We are going to try out 3 approaches for determining the predominant color of each emoji:

  • Most common color: We find the color spectrum value for each point and count up the number of points of each color. The color with the most points will be selected. This is simple and works in many scenarios but if an apple or cherry has 100 shades of red but only one shade of green for the stalk or a leaf, green may be selected.

  • Most common range: We group each point into a color range. The range with the most points will be selected.

  • Centroid of biggest cluster: We divide our emoji image into a grid of sub-images. We will perform k-means clustering of the RGB values for each point in the sub-image. This will cluster similar colored points together in a cluster. The cluster with the most points will be selected and its centroid will be chosen as the selected pre-dominant color. This approach has the affect of pixelating our sub-image by color. This approach is inspired by this python article.

Most Common Color

Ignoring the background white color, the most common color for our PEACH emoji is a shade of orange. The graph below shows the count of each color: Color histogram for PEACH

Most Common Range

If instead of counting each color, we group colors into their range and count the numbers in each range, we get the following graph for PEACH: Range histogram for PEACH

K-Means

K-Means is an algorithm for finding cluster centroids. For k=3, we would start by picking 3 random points as our starting centroids.

kmeans step 1

We allocate all points to their closest centroid:

kmeans step 2

Given this allocation, we re-calculate each centroid from all of its points:

kmeans step 3

We repeat this process until either a stable centroid selection is found, or we have reached a certain number of iterations. We used the K-Means algorithm from Apache Commons Math.

Here is the kind of result we would expect if run on the complete set of points for the PEACH emoji. The black dots are the centroids. It has found one green, one orange and one red centroid. The centroid with the most points allocated to it should be the most predominant color. (This is another interactive 3D scatterplot.)

RgbPeach3d We can plot the number of points allocated to each cluster as a bar chart. (We used a Scala plotting library to show Groovy integration with Scala.) Peach colour centroid sizes

The code for drawing the above chart looks like this:

var trace = new Bar(intSeq([1, 2, 3]), intSeq(sizes))
        .withMarker(new Marker().withColor(oneOrSeq(colors)))

var traces = asScala([trace]).toSeq()

var layout = new Layout()
        .withTitle("Centroid sizes for $fruit")
        .withShowlegend(false)
        .withHeight(600)
        .withWidth(800)

Plotly.plot(path, traces, layout, defaultConfig, false, false, true)

K-Means with subimages

The approach we will take for our third option enhances K-Means. Instead of finding centroids for the whole image as the graphs just shown do, we divide the image into subimages and perform the K-Means on each subimage. Our overall pre-dominant color is determined to be the most common color predicated across all of our subimages.

Putting it all together

Here is the final code covering all three approaches (including printing some pretty images highlighting the third approach and the Plotly 3D scatter plots):

var results = Fruit.ALL.collect { fruit ->
    var file = getClass().classLoader.getResource("${fruit.name()}.png").file as File
    var image = ImageIO.read(file)

    var colors = [:].withDefault { 0 }
    var ranges = [:].withDefault { 0 }
    for (x in 0..<image.width) {
        for (y in 0..<image.height) {
            def (int r, int g, int b) = rgb(image, x, y)
            float[] hsb = hsb(r, g, b)
            def (deg, range) = range(hsb)
            if (range != WHITE) { // ignore white background
                ranges[range]++
                colors[deg]++
            }
        }
    }
    var maxRange = ranges.max { e -> e.value }.key
    var maxColor = range(colors.max { e -> e.value }.key)

    int cols = 8, rows = 8
    int grid = 5 // thickness of black "grid" between subimages
    int stepX = image.width / cols
    int stepY = image.height / rows
    var splitImage = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
    var g2a = splitImage.createGraphics()
    var pixelated = new BufferedImage(image.width + (cols - 1) * grid, image.height + (rows - 1) * grid, image.type)
    var g2b = pixelated.createGraphics()

    ranges = [:].withDefault { 0 }
    for (i in 0..<rows) {
        for (j in 0..<cols) {
            def clusterer = new KMeansPlusPlusClusterer(5, 100)
            List<DoublePoint> data = []
            for (x in 0..<stepX) {
                for (y in 0..<stepY) {
                    def (int r, int g, int b) = rgb(image, stepX * j + x, stepY * i + y)
                    var dp = new DoublePoint([r, g, b] as int[])
                    var hsb = hsb(r, g, b)
                    def (deg, col) = range(hsb)
                    data << dp
                }
            }
            var centroids = clusterer.cluster(data)
            var biggestCluster = centroids.max { ctrd -> ctrd.points.size() }
            var ctr = biggestCluster.center.point*.intValue()
            var hsb = hsb(*ctr)
            def (_, range) = range(hsb)
            if (range != WHITE) ranges[range]++
            g2a.drawImage(image, (stepX + grid) * j, (stepY + grid) * i, stepX * (j + 1) + grid * j, stepY * (i + 1) + grid * i,
                    stepX * j, stepY * i, stepX * (j + 1), stepY * (i + 1), null)
            g2b.color = new Color(*ctr)
            g2b.fillRect((stepX + grid) * j, (stepY + grid) * i, stepX, stepY)
        }
    }
    g2a.dispose()
    g2b.dispose()

    var swing = new SwingBuilder()
    var maxCentroid = ranges.max { e -> e.value }.key
    swing.edt {
        frame(title: 'Original vs Subimages vs K-Means',
                defaultCloseOperation: DISPOSE_ON_CLOSE, pack: true, show: true) {
            flowLayout()
            label(icon: imageIcon(image))
            label(icon: imageIcon(splitImage))
            label(icon: imageIcon(pixelated))
        }
    }

    [fruit, maxRange, maxColor, maxCentroid]
}

println "Fruit  Expected      By max color  By max range  By k-means"
results.each { fruit, maxRange, maxColor, maxCentroid ->
    def colors = [fruit.color, maxColor, maxRange, maxCentroid].collect {
        NAME_OF[it].padRight(14)
    }.join().trim()
    println "${fruit.emoji.padRight(6)} $colors"
}

Here are the resulting images:

peach images banana images cherry images orange images grape images apple images

And, here are the final results:

results

In our case, all three approaches yielded the same results. Results for other emojis may vary.