The real 10x engineer

If you came here already internally screaming at the mere mention of 10x engineers, then I'm very sorry. This isn't about that. Unfortunately, they don't really exist. That said, I understand why the myth is such a popular one: the idea of having someone on the team who knows everything and never makes mistakes is incredibly appealing. Who wouldn't want someone like that on the team?

This post is about how we might be able to make some of that dream a reality. Not by recruiting, but by building good internal tooling instead.

One of our core principles at Plain is that we want to keep our team intentionally small. This belief impacts different areas in our company: hiring, culture, communication, technology, processes, etc. Both of our founders — Simon and Matt — regularly remind us all what this means, why it’s important for us and why we should keep it that way.

A key part of this is making sure we can spend our time building rather than discussing how to build. The general rule is: if we all agree on a convention and it’s possible to enforce it via code, then we'll do it, so we don’t have to talk or think about it ever again.

Using linters to enforce conventions

Linters have always been a great tool to transform conventions into actual rules that can be automatically enforced (or at least flagged when they’re not followed). At the end of the day, no one really likes to have discussions about the right indentation size, how imports should be sorted, etc. It’s usually one engineer imposing their personal opinion on how to do X, while the rest of the team will just give in and let the (usually loudest) engineer be right.

Styling conventions are just the surface of what a good linter can offer you. Static code analysis can go much further:

  • Specific code conventions. For example: ‘all exported functions in this package should take ctx as the first argument’
  • Code structure. For example: ‘object from domain X cannot be directly altered by object from domain Y’
  • Security and compliance. For example: ‘you should never log a user name’. If you define infrastructure as code, you can also write rules which explicitly disallow access from one resource to another. For instance: ‘none of the lambdas from domain A should have access to databases on domain B’

Consistently enforced conventions are important for us because Plain is an API-first company. We want to give you an amazing customer service tool for your business, as well as enabling you to build your own support tools on the same GraphQL APIs we use, so you can build tailored support tooling without having to build everything from scratch. Our API is at the core of our product, both internally and externally.

This brings some challenges: all of our opinions, design decisions, and flaws will be exposed to the public. That’s why we take API changes very seriously, usually involving most of the product team (remember, we’re a very small team!). This way we get feedback from people working in different parts of our stack, who will have different views on how a shiny new API query or mutation is going to be used.

There are a few things that are not up for debate. Not because the person driving the API change has more authority than the rest of the team, but rather, because we decided to add a very opinionated engineer to the mix: a GraphQL schema linter. Our own 10x engineer 😏.

During a thorough review of our API schemas, we realised that we were only ‘loosely’ applying some conventions, with nothing in place to actually enforce them. Instead, we relied on whoever was reviewing API schema changes to detect problems and inconsistencies. It was time to automate this.

Enter... GraphQL schema linter

A quick Google search gave us a solution for linting GraphQL schemas: graphql-schema-linter. This tool works as a thin wrapper around graphql/validation, which implements the GraphQL validation spec and is used by graphql-js for schema and query/mutation validation.

With graphql-schema-linter, we can easily define our own rules in code so that we can offload all the opinions and conventions we have agreed on during the past year to a machine.

We could have written our own little tool on top of graphql/validation’s visitor API. However, as a small team, we decided that graphql-schema-linter was exactly what we needed: a way to apply custom rules to our schema and get the errors back in a format that felt familiar to our developers.

A few examples of rules we are applying are:

  • All boolean fields start with is or has and they’re not nullable
  • All mutations take an input object whose type has the name <MutationName>Input and return a type that has the name <MutationName>Output
  • Queries should not return arrays, but use pagination
  • Pagination must follow the Relay pagination spec (https://relay.dev/graphql/connections.htm)
  • All mutation outputs must include an error field with a specific type (MutationError)
  • All of the types defined in the schema are actually used
  • Simple id fields in inputs are disallowed. Instead, we require them to be more explicit about the entity they link to (for example: customerId)
  • Rules around union types. For instance, if a union type is called Entry, all of the types in this union must end with the word Entry
  • All common datetime fields must be non-nullable and have a specific type. For example, createdAt, updatedAt.
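
To make the mutation naming rule from the list above a bit more concrete, here is a minimal sketch of the name check on its own, written as plain TypeScript with no GraphQL library involved. The helper names (upperFirst, expectedInputTypeName, expectedOutputTypeName, checkMutationNaming) and the MutationSignature shape are hypothetical, and the sketch assumes that <MutationName> means the mutation's field name with its first letter upper-cased (createCustomer becomes CreateCustomerInput), a detail this post does not spell out.

```typescript
// Hypothetical helpers for illustration only, not taken from our codebase.
// Assumption: <MutationName> is the mutation field name with an upper-cased first letter.
function upperFirst(value: string): string {
  return value.charAt(0).toUpperCase() + value.slice(1);
}

function expectedInputTypeName(mutationName: string): string {
  return `${upperFirst(mutationName)}Input`;
}

function expectedOutputTypeName(mutationName: string): string {
  return `${upperFirst(mutationName)}Output`;
}

// A flattened view of one mutation, with wrappers such as `!` already stripped from the type names.
interface MutationSignature {
  name: string; // e.g. "createCustomer"
  inputTypeName: string; // named type of the `input` argument
  outputTypeName: string; // named type the mutation returns
}

// Returns one message per violated naming convention (empty when the mutation complies).
function checkMutationNaming(mutation: MutationSignature): string[] {
  const problems: string[] = [];
  const expectedInput = expectedInputTypeName(mutation.name);
  const expectedOutput = expectedOutputTypeName(mutation.name);

  if (mutation.inputTypeName !== expectedInput) {
    problems.push(`Input type of ${mutation.name} should be named ${expectedInput}`);
  }
  if (mutation.outputTypeName !== expectedOutput) {
    problems.push(`Output type of ${mutation.name} should be named ${expectedOutput}`);
  }
  return problems;
}
```

In a real rule, the same comparison would run inside a visitor and each message would be passed to context.reportError instead of being collected in an array.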

Since the rules can be specified in code, it felt natural to unit test them. So that’s exactly what we did: every rule we define has a set of tests to ensure the rule covers our requirements. The tests and the schema linter run as part of our CI pipeline, so we won’t allow any code merges with API schemas that don’t follow our conventions.
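
As an illustration of what such a test can look like, here is a framework-free sketch. It stubs the one part of the validation context that a rule touches (reportError) and asserts on the messages that come back. The node and context shapes are simplified stand-ins defined inline rather than the graphql-js types, and the names noPlainIdRule, collectErrors, inputField and expectEqual are all hypothetical. Real tests would more likely parse a schema string and use a proper test runner.

```typescript
// Simplified stand-ins for the context and AST node a rule receives (assumptions, not graphql-js types).
interface RuleContext {
  reportError(error: Error): void;
}

interface InputFieldNode {
  kind: 'InputValueDefinition';
  name: { value: string };
}

type Rule = (context: RuleContext) => {
  InputValueDefinition?: (node: InputFieldNode) => void;
};

// Hypothetical rule: a bare `id` input field must name the entity it points to.
const noPlainIdRule: Rule = (context) => ({
  InputValueDefinition: (node) => {
    if (node.name.value === 'id') {
      context.reportError(
        new Error(`Input field 'id' should name its entity (e.g. "customerId")`),
      );
    }
  },
});

// Test helper: run a rule over some nodes with a stub context and return the reported messages.
function collectErrors(rule: Rule, nodes: InputFieldNode[]): string[] {
  const messages: string[] = [];
  const visitor = rule({
    reportError: (error) => {
      messages.push(error.message);
    },
  });
  for (const node of nodes) {
    visitor.InputValueDefinition?.(node);
  }
  return messages;
}

function inputField(name: string): InputFieldNode {
  return { kind: 'InputValueDefinition', name: { value: name } };
}

// Minimal assertion so the sketch does not depend on a test framework.
function expectEqual(actual: unknown, expected: unknown): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`Expected ${JSON.stringify(expected)}, got ${JSON.stringify(actual)}`);
  }
}

// One passing case and one failing case per rule is a reasonable minimum.
expectEqual(collectErrors(noPlainIdRule, [inputField('customerId')]), []);
expectEqual(collectErrors(noPlainIdRule, [inputField('id')]), [
  `Input field 'id' should name its entity (e.g. "customerId")`,
]);
```

Because a rule only talks to the outside world through the context it is given, a stub like this is enough to exercise it in isolation.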

It’s impressive we’ve come this far without showing any code. Below you can see one of our simplest rules. The code shown here is the part that makes sure the input and output types of a mutation are non-nullable.

import { ValidationRule } from 'graphql';
import { GraphQLError } from 'graphql/error';

export const mutationInputAndOutputConvention: ValidationRule = (context) => {
  return {
    ObjectTypeDefinition: (node, _key, _parent, _path, _ancestors) => {
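      // Only the root Mutation type is of interest here.
      if (node.name.value === 'Mutation') {
        const mutations = node.fields;

        if (!mutations) {
          return;
        }

        // forEach rather than map: we only want the side effect of reporting errors.
        mutations.forEach((mutation) => {
          const input = mutation.arguments?.find((i) => i.name.value === 'input');
          if (!input) {
            return;
          }
          // A `!` in the schema shows up as a NonNullType wrapper around the inner type.
          const inputKind = input.type.kind;
          const outputKind = mutation.type.kind;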

          if (inputKind !== 'NonNullType') {
            context.reportError(
              new GraphQLError(
                `Input type on ${mutation.name.value} Mutation should be non-nullable`,
                [node],
              ),
            );
          }

          if (outputKind !== 'NonNullType') {
            context.reportError(
              new GraphQLError(
                `Output type on ${mutation.name.value} Mutation should be non-nullable`,
                [node],
              ),
            );
          }
        });
      }
    },
  };
};

The idea behind how this works is simple. Every rule is a GraphQL node visitor. As the validator walks through the AST, it calls the visitor function you defined for each kind of node it encounters. In the example above, our validation rule is invoked for every ObjectTypeDefinition node in the AST.
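
A toy model of that dispatch may help. The walker below is a deliberately simplified stand-in, not graphql-js's actual traversal, and the ToyNode, ToyVisitor and walk names are made up for this sketch. It walks a tiny hand-built tree and calls whichever handler is named after the kind of the node it is visiting.

```typescript
// A toy AST node: real GraphQL AST nodes carry far more detail than this.
interface ToyNode {
  kind: string;
  name: string;
  children?: ToyNode[];
}

// A visitor is an object whose keys are node kinds and whose values are handlers.
type ToyVisitor = Partial<Record<string, (node: ToyNode) => void>>;

// Depth-first walk: call the handler matching node.kind (if any), then descend into children.
function walk(node: ToyNode, visitor: ToyVisitor): void {
  const handler = visitor[node.kind];
  if (handler) {
    handler(node);
  }
  for (const child of node.children ?? []) {
    walk(child, visitor);
  }
}

const toySchema: ToyNode = {
  kind: 'Document',
  name: 'schema',
  children: [
    {
      kind: 'ObjectTypeDefinition',
      name: 'Mutation',
      children: [{ kind: 'FieldDefinition', name: 'createCustomer' }],
    },
    {
      kind: 'ObjectTypeDefinition',
      name: 'Customer',
      children: [{ kind: 'FieldDefinition', name: 'isActive' }],
    },
  ],
};

const visited: string[] = [];
walk(toySchema, {
  ObjectTypeDefinition: (node) => {
    visited.push(node.name);
  },
});
// visited is now ['Mutation', 'Customer']: the FieldDefinition nodes had no handler, so they were skipped.
```

A rule that cares about fields rather than types would define a FieldDefinition handler instead and leave everything else to the walker.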

Below you can see another example of a rule. In this case, it makes sure that all our boolean fields are named correctly (starting with is or has) and that they are non-nullable. Note how in this case, we’re interested in FieldDefinition nodes.

import { ValidationRule } from 'graphql';
import { GraphQLError } from 'graphql/error';
import { FieldDefinitionNode, TypeNode } from 'graphql/language/ast';
import _ from 'lodash';

export const booleanNamingConvention: ValidationRule = (context) => {
  return {
    FieldDefinition: (node) => {
      if (isBooleanType(node.type)) {
        if (!isBooleanFieldNamedCorrectly(node.name.value)) {
          context.reportError(
            new GraphQLError(
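              // upperFirst keeps the rest of the name intact; capitalize would lower-case it ("userActive" -> "Useractive").
              `Boolean field '${
                node.name.value
              }' should be prefixed with 'is' or 'has' (e.g. "is${_.upperFirst(node.name.value)}")`,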
              [node],
            ),
          );
        }
        if (isNullableField(node)) {
          context.reportError(
            new GraphQLError(
              `Boolean field '${node.name.value}' should not be nullable (e.g. "${node.name.value}: Boolean!")`,
              [node],
            ),
          );
        }
      }
    },
  };
};

function isBooleanType(node: TypeNode): boolean {
  if (node.kind === 'NamedType') {
    return node.name.value === 'Boolean';
  }
  return isBooleanType(node.type);
}

function isBooleanFieldNamedCorrectly(fieldName: string) {
  return fieldName.startsWith('is') || fieldName.startsWith('has');
}

function isNullableField(node: FieldDefinitionNode): boolean {
  // Anything not wrapped in NonNullType is nullable, including list types such as [Boolean].
  return node.type.kind !== 'NonNullType';
}

If it works for us, it will work for you

In our case, our desire to stay small is what makes an automated way to check our GraphQL API schemas worthwhile. But there is more to it.

For bigger organisations with a GraphQL API, consistent and clean APIs are just as important: you’ll have to deal with a higher number of opinions and comments during code reviews that don’t add a lot of value. Enforcing API conventions with a linter will help you get rid of all the noise and make sure the whole organisation adheres to the same rules, rather than, for example, ending up with ten different ways of exposing paginated results across different teams.

Developer experience improves dramatically when you have some guardrails that help you apply conventions without being aware of all of them. This has a real positive impact both inside and outside of your team. Your engineers will be able to focus on the actual interface design and implementation details while your API clients will find it much easier to consume your API.

If you, like us, are part of a small product team, make sure that everything you can automate is actually automated. There’s a whole category of problems that goes away with proper static code checking: problems which some people would probably try to avoid by hiring one of those… 10x engineers. Investing in your developer toolchain can give you a massive return on investment, and you will learn a lot along the way.
