How our labeling system works

Garen Torikian

Co-Founder, CTO

Engineering

Labels: you either love them or hate them, but you can't escape them. Whether you're sorting through emails, tagging music, or finding your next task to pick up, an application's labeling system can help categorize and organize information.

And everyone's been doing it wrong.

One of the problems with most tagging systems is that they're a flat hierarchy: labels don't nest, they don't group, and they don't categorize. That leads to situations where they can't accurately describe the thing they're supposed to be naming. If you're looking for a specific email, you might have to sift through a dozen labels to find it. Or, if you're trying to find a specific song, you might have to scroll through a list of genres to find the right one.

As an example, let's look at a very generic label that we all know communicates so much on its own: bug. Now, I know that this might seem like enough of a description—it's a bug!—but I contend that that's only because this is the only labeling system we've ever really been given online.

Let's say this bug label belongs to an issue tracker in an application which has two features: switches and plugs. Obviously, you'd also have switches and plugs labels, and apply them alongside bug for any issues that come up in the app. The company building this app also believes that every part of the app needs to be documented, so a documentation label is made. They also want everything tracked by urgency, so the usual suite of p0, p1, p2, and p3 labels is created to help with prioritization.

The issues are labeled, and the queue is dwindling: priorities change. The company discovers that people love their product, but need more documentation. How do they find which issues require documentation for which products? Did everyone remember to tag either switches or plugs with documentation? Or was that something that should've been discussed before dozens of issues were tagged?

The issues are labeled, and the queue is increasing: bugfixes are prioritized. An email from a customer arrives, discussing both a bug in switches and a lack of documentation for plugs. How should this be labeled? With bug, switches, plugs, and documentation all at once? Which category applies to which app feature?

@birdcar has written previously on why Yetto's labeling system is unique. By migrating to a hierarchical structure, our labels can be represented as a graph, which is both visually easier to understand and conceptually more flexible than existing flat label systems. Today, I'm going to talk to you about how it was built.

Patching Postgres

At Yetto, we're huge fans of Postgres, The One True Database [1]. We've always found that, no matter what the requirements of our data storage format may be, Postgres can handle it. For our hierarchy, we quickly discovered that Postgres has a data type called ltree, which could be used for our labels. The gist of it is that you could have a table called tree that looks like this:

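-- ltree ships as an extension and must be enabled once per database
create extension if not exists ltree;

create table tree(
  id serial primary key,
  letter char,   -- single-character name of the node
  path ltree     -- fully qualified path from the root to this node
);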

path represents the fully qualified path of any node in this graph. If you have two labels, called bug and p0, and if p0 is a child of bug, then its path in the table would be bug.p0. Similarly, if the child of p0 were switches, its path would be bug.p0.switches.
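To make the path idea concrete, here is a minimal Python sketch of what a dotted path encodes. The helper names ancestors and is_descendant are hypothetical and exist only for illustration; in the database, ltree answers the second question with its <@ operator.

```python
def ancestors(path: str) -> list[str]:
    """Return every ancestor path of `path`, from the root down to its parent."""
    labels = path.split(".")
    return [".".join(labels[:i]) for i in range(1, len(labels))]


def is_descendant(path: str, ancestor: str) -> bool:
    """True when `path` sits at or below `ancestor` (label-wise prefix match)."""
    labels, prefix = path.split("."), ancestor.split(".")
    return labels[: len(prefix)] == prefix


print(ancestors("bug.p0.switches"))             # ['bug', 'bug.p0']
print(is_descendant("bug.p0.switches", "bug"))  # True
print(is_descendant("bugfix.p0", "bug"))        # False
```

Comparing whole labels rather than raw string prefixes is what keeps bugfix.p0 from being mistaken for a child of bug.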

The problem with Postgres' implementation is twofold. First, the documentation says:

Labels must be less than 256 characters long

As the child of immigrants, I can tell you one thing: English isn't the only language. Just as with accessibility, localization is an oft-ignored part of software development, but it's extremely important to me. I felt that 256 characters was too limiting for non-English languages, let alone non-Latin-based ones, which often require more bytes per character.

But aside from that, there was another problem:

in C locale the characters A-Za-z0-9_ are allowed

Depending on the locale of your database, your available character set changes. If your database is configured in the C locale (as is the Postgres default, and as ours is), then you could only use the letters A through Z (upper and lowercase), digits, and the underscore (_) character to describe a label path.
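As a rough illustration of how restrictive that rule is, the following Python sketch checks single labels against the two constraints quoted above. It is an approximation written for this post, not Postgres' own validation code, and the names C_LOCALE_LABEL, MAX_LABEL_CHARS, and is_valid_c_locale_label are made up.

```python
import re

# Assumption: the documented rule for a C-locale database, as quoted above.
C_LOCALE_LABEL = re.compile(r"[A-Za-z0-9_]+")
MAX_LABEL_CHARS = 255  # "less than 256 characters"


def is_valid_c_locale_label(label: str) -> bool:
    """Check one label (a single segment of a path), not a whole dotted path."""
    return (
        len(label) <= MAX_LABEL_CHARS
        and C_LOCALE_LABEL.fullmatch(label) is not None
    )


for label in ["bug", "p0", "needs_docs", "կարմիր", "バグ"]:
    print(label, is_valid_c_locale_label(label))
```

The three ASCII labels pass; the Armenian and Japanese ones are rejected.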

That's a very limited character set! If our Yetto users in Japan, Armenia, or Vietnam want to use our app, they would need to either make sure all their labels were in English, or we would need to forgo support for non-English labels.

Or, option three: change how Postgres works.

The default ltree character set has been bemoaned on forums such as StackOverflow, and in 2019, there was a patch to expand the available character set, but it was never merged. Taking up the cause, in 2022, I submitted a patch for Postgres which made two changes:

  • The length of an ltree could be 512 bytes long
  • The hyphen (-) was now permitted as a character

That second change helped resolve our localization story. While we could have dumped our Postgres database, converted the locale, and reimported all of our records, I decided that, in order to not restrict ourselves to any database locale, we could convert non-ASCII characters into punycode, and store those results in the ltree column. For example, the path գույն.կարմիր could be represented as xn--09a0bdg4c.xn--y9aukv3dc. Granted, this balloons the string length (from 12 characters to 27), but we felt the trade-off was worth it.
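The conversion can be sketched in a few lines of Python using the standard library's punycode codec. This is an illustration under stated assumptions, not Yetto's implementation: the function names are hypothetical, and the "xn--" prefix is added by hand to mirror the example above.

```python
def encode_label(label: str) -> str:
    """ASCII labels pass through; anything else becomes 'xn--' + punycode."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def decode_label(label: str) -> str:
    """Reverse encode_label for labels carrying the 'xn--' prefix."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def encode_path(path: str) -> str:
    return ".".join(encode_label(part) for part in path.split("."))


def decode_path(path: str) -> str:
    return ".".join(decode_label(part) for part in path.split("."))


encoded = encode_path("գույն.կարմիր")
print(encoded)
print(decode_path(encoded))
```

Every encoded label contains hyphens, which is why the second change in the patch matters. One limitation of this sketch: a plain ASCII label that happens to start with xn-- would be decoded by mistake, so a real system would need to reserve that prefix.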

These changes were released in Postgres 16!

Grafting trees

There are very few technical tutorials available on how to properly implement ltrees in Postgres. One of the most oft-cited articles is this one by @bustawin (aka Xavier Bustamante). In the article, Xavier provides a sample graph which looks like this:

                    Animal
                      / \
                     /   \
                    /     \
                   /       \
                  /         \
                 /           \
                /             \
               /               \
              /                 \
            Pet               Livestock
           /   \                 /    \
          /     \               /      \
         /       \             /        \
        /         \           /          \
       /           \         /            \
      /             \       /              \
    Cat             Pig   Sheep            Cow
                     / \
                    /   \
                   /     \
                  /       \
                 /         \
           Teacup       Iberian

As an example, the path to the Iberian node looks like Animal.Pet.Pig.Iberian. The challenge is this: how do you reparent Pig such that it's both a Pet and a Livestock?

The syntax of ltrees—clumps of strings separated by .s—is simple enough to grasp. In an earlier iteration of our labeling system, I was managing these paths manually. That is, if Animal.Livestock.Pig needed to be created, I would manually walk up and down the tree, creating and updating paths, by manually splitting and joining the node names—in application code, no less!
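For a sense of what that manual bookkeeping involves, here is a small Python sketch of grafting the Pig subtree under Livestock by splitting and joining path strings. It is a reconstruction for illustration only, not the original application code, and graft is a hypothetical name.

```python
def graft(paths: set[str], node_path: str, new_parent: str) -> set[str]:
    """Return the extra paths needed for the subtree rooted at `node_path`
    to also appear under `new_parent`."""
    node_labels = node_path.split(".")
    keep_from = len(node_labels) - 1  # index of the node's own label
    added = set()
    for path in paths:
        labels = path.split(".")
        # Match the node itself and every descendant below it.
        if labels[: len(node_labels)] == node_labels:
            added.add(".".join(new_parent.split(".") + labels[keep_from:]))
    return added


paths = {
    "Animal",
    "Animal.Pet",
    "Animal.Pet.Cat",
    "Animal.Pet.Pig",
    "Animal.Pet.Pig.Teacup",
    "Animal.Pet.Pig.Iberian",
    "Animal.Livestock",
    "Animal.Livestock.Sheep",
    "Animal.Livestock.Cow",
}

print(sorted(graft(paths, "Animal.Pet.Pig", "Animal.Livestock")))
# ['Animal.Livestock.Pig', 'Animal.Livestock.Pig.Iberian', 'Animal.Livestock.Pig.Teacup']
```

Note that nothing here stops a caller from grafting a node underneath one of its own descendants, which is exactly the kind of mistake this approach invites.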

While this code "worked," in practice it was error-prone. Postgres' ltree data type is flexible, perhaps too flexible. You can create nonsensical, self-referential paths like Animal.Pet.Animal. As far as ltree is concerned, any node can point to any other. To add some guidance here, we can set up our node tree to operate as a directed acyclic graph, or DAG. A DAG structure helps define which relationships in the tree should be impossible.

You can create a backend system that works according to spec, but it doesn't mean anything if it's unusable. When I began building the UI for labels, I quickly realized that users could get themselves into a world of pain by nesting labels within themselves, with grandchildren and grandparents sharing identities (like that one Futurama episode). Working around such restrictions in the application code turned out to be harder than expected. Animal.Pet.Animal is easy enough to check. But what about Animal.Livestock.Pet.Pig.Livestock.Pig? The following insertion statements are all valid by default:

insert into tree (letter, path) values ('A', 'A');
insert into tree (letter, path) values ('B', 'A.B');
insert into tree (letter, path) values ('C', 'A.B.C');
insert into tree (letter, path) values ('D', 'A.B.C.D');
-- what? ltree accepts this too, even though B and D each appear twice
insert into tree (letter, path) values ('E', 'A.B.C.D.B.D');
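The simplest form of the missing check is to reject any path in which a label appears more than once. The Python sketch below shows only that one rule, with a hypothetical function name; it does not cover everything a real DAG constraint needs, such as keeping several paths for the same node consistent with each other.

```python
def repeated_labels(path: str) -> list[str]:
    """Return the labels that occur more than once in `path`, in first-repeat order."""
    seen, repeats = set(), []
    for label in path.split("."):
        if label in seen and label not in repeats:
            repeats.append(label)
        seen.add(label)
    return repeats


print(repeated_labels("A.B.C.D"))      # []
print(repeated_labels("A.B.C.D.B.D"))  # ['B', 'D']
print(repeated_labels("Animal.Livestock.Pet.Pig.Livestock.Pig"))  # ['Livestock', 'Pig']
```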

Tracking node reshuffles in the application turned out to be fraught with errors. Instead, we modified the Postgres functions which Xavier and his team shared. By moving the tree manipulation to the database, we can guarantee both faster operations and data integrity. Open source wins again!

Making sense of it all

Now, even though the backend prevents invalid relationships from being stored, we still need a way to keep users from attempting them in the first place. It would be a pretty terrible experience if the system prevented you from connecting labels without explaining why.

Yetto's label overview looks like this:

picture of the label management view in Yetto's inbox settings

In order to create a child or parent relationship, you click on Add to present the list of available options:

picture of the label dropdown

In the image above, Documentation cannot make itself a parent, and because it's already a parent of Guide and Reference, it can't be their parent again, either. Setting up this initial view is not complicated, but it can take a long time to set up all the proper relationships and disable the invalid ones. We actually render each dropdown with a list of all the labels, then asynchronously query the database to get a list of valid relationships for each label. Users see no delay when loading and navigating the page, but in the background, HTML disabled attributes are being applied wherever appropriate.

The real joy comes from label selection. Watch what happens to the dropdown as we try to make Status a child of Documentation:

picture of the label dropdown on management page

As mentioned, grandchildren shouldn't be their own grandparents. By selecting Status, all of its children are unable to be selected. Again, even though the database would've prevented this faulty data from being inserted in the first place, it's still important to make sure that people are kept from getting themselves into trouble!
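One way to decide which dropdown options to disable is sketched below in Python. It is a guess at the shape of the logic, not Yetto's code: the function names, the pending parameter, and the Open and Closed labels are all invented for the example. When adding a child to a label, it blocks the label itself, its ancestors, its existing children, and anything already implied by a selection that has not been saved yet.

```python
def descendants(children_of: dict[str, set[str]], label: str) -> set[str]:
    """Collect every label reachable below `label`."""
    found, stack = set(), [label]
    while stack:
        for child in children_of.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def disabled_child_options(children_of, label, pending=()):
    """Labels that should not be offered as a new child of `label`."""
    all_labels = set(children_of) | {c for kids in children_of.values() for c in kids}
    ancestors = {o for o in all_labels if label in descendants(children_of, o)}
    blocked = {label} | ancestors | children_of.get(label, set())
    for choice in pending:  # selections made in the UI but not yet saved
        blocked |= {choice} | descendants(children_of, choice)
    return blocked


# Hypothetical hierarchy: Open and Closed are invented children of Status.
children_of = {
    "Documentation": {"Guide", "Reference"},
    "Status": {"Open", "Closed"},
}

print(sorted(disabled_child_options(children_of, "Documentation")))
# ['Documentation', 'Guide', 'Reference']
print(sorted(disabled_child_options(children_of, "Documentation", pending=["Status"])))
# ['Closed', 'Documentation', 'Guide', 'Open', 'Reference', 'Status']
```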

Gotta tag 'em all

Once all your tags are set up the way you'd like, looking for them and applying them to conversations works exactly as you'd expect:

picture of the label dropdown on conversation page

Here, we're showing how you can explicitly describe which feature in the app has a bug.p0. You can apply as many of these nested labels as you like, so you can imagine that there are also documentation.plugs and documentation.switches labels to further scope what a conversation is all about.
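With nested labels in place, the earlier question of which features still need documentation becomes a simple grouping. The Python sketch below uses made-up conversation IDs and hypothetical helper names; in Postgres, the same filter would be expressed with ltree's own operators.

```python
from collections import defaultdict

# Hypothetical conversations and the nested labels applied to each.
conversations = {
    101: ["bug.p0.switches"],
    102: ["documentation.plugs"],
    103: ["bug.p1.switches", "documentation.plugs"],
    104: ["documentation.switches"],
}


def under(label_path: str, prefix: str) -> bool:
    """True when `label_path` sits at or below `prefix`."""
    labels, wanted = label_path.split("."), prefix.split(".")
    return labels[: len(wanted)] == wanted


def group_by_child(conversations: dict, prefix: str) -> dict:
    """Map each direct child of `prefix` to the conversations labeled under it."""
    depth = len(prefix.split("."))
    groups = defaultdict(list)
    for conv_id, paths in conversations.items():
        for path in paths:
            labels = path.split(".")
            if under(path, prefix) and len(labels) > depth:
                groups[labels[depth]].append(conv_id)
    return dict(groups)


print(group_by_child(conversations, "documentation"))
# {'plugs': [102, 103], 'switches': [104]}
```

Conversation 103 is the email from earlier: it carries a switches bug and a plugs documentation request at once, and each label says which feature it refers to.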

Creating your own labels

Organizing data in hierarchies is superior to a flat tagging system, because it provides a more complete representation of the data you're organizing. By grouping labels with each other, you can find and identify the tickets that matter to you most. Whether it's your personal inbox or your team's support conversations, Yetto's labels can do what other tagging systems do, and so much more.

If you'd like to give our support desk a try, sign up for access now!


1: Okay, SQLite is pretty cool too.