GraphQL subscriptions

☢️ Warning: this is a technical one!

We want our support app to feel super fast and, importantly, to make collaboration between team members really easy. As with any productivity app, a big part of this for us is making sure all UI updates happen in real time: you should know instantly if an advisor starts replying to a customer, or if issues are opened or closed by a back-end system.

This week we completed the first big step in this direction by implementing GraphQL subscriptions for a customer’s timeline. It's hard to screenshot liveness since… it's the same UI, but… live. So instead we wanted to share a little about how this works behind the scenes, and specifically some of the technical challenges of building it on our architecture.

GraphQL subscriptions are typically delivered over WebSockets. Handling many WebSocket connections at scale is not trivial to architect and implement yourself on a traditional architecture, because the connections are persistent and stateful, unlike stateless HTTP requests (hence why there are so many real-time SaaS providers out there!).

Since our current architecture primarily uses AWS serverless services such as AWS API Gateway, AWS Lambda, DynamoDB and EventBridge, we decided to try building a scalable solution for GraphQL subscriptions over WebSockets on these same serverless technologies. The main hurdle we hit was that the open-source Node.js GraphQL libraries all assume a long-running, stateful server that holds on to each WebSocket connection. That server doesn't exist with AWS API Gateway and Lambda, so we had to significantly re-engineer how these libraries handle WebSockets.
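To make the constraint concrete, here is a minimal TypeScript sketch of a WebSocket handler with no long-running server behind it. It assumes the API Gateway WebSocket event fields `requestContext.routeKey` (`$connect`, `$disconnect`, `$default`), `requestContext.connectionId` and `body`, and a client message loosely modelled on the graphql-ws `subscribe` message (the connection handshake is left out). `SubscriptionStore`, `InMemoryStore` and `handleWebSocketEvent` are hypothetical names, and the in-memory store only stands in for DynamoDB; it illustrates the pattern under those assumptions, not the production code.

```typescript
// Minimal subset of the API Gateway WebSocket event used here (assumed shape).
interface WebSocketEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string;
}

// One stored subscription; in the real system this would be a DynamoDB item.
interface SubscriptionRecord {
  connectionId: string;
  operationId: string;
  query: string;
  customerId: string;
}

// Hypothetical persistence interface standing in for DynamoDB.
interface SubscriptionStore {
  put(record: SubscriptionRecord): Promise<void>;
  deleteByConnection(connectionId: string): Promise<void>;
  findByCustomer(customerId: string): Promise<SubscriptionRecord[]>;
}

// In-memory stand-in so the sketch runs without AWS.
class InMemoryStore implements SubscriptionStore {
  private rows: SubscriptionRecord[] = [];

  async put(record: SubscriptionRecord): Promise<void> {
    this.rows.push(record);
  }

  async deleteByConnection(connectionId: string): Promise<void> {
    this.rows = this.rows.filter((r) => r.connectionId !== connectionId);
  }

  async findByCustomer(customerId: string): Promise<SubscriptionRecord[]> {
    return this.rows.filter((r) => r.customerId === customerId);
  }
}

// Every message is a separate, independent invocation: there is no socket
// object, only a connection ID, so all state goes to the external store.
async function handleWebSocketEvent(
  event: WebSocketEvent,
  store: SubscriptionStore,
): Promise<{ statusCode: number }> {
  const { routeKey, connectionId } = event.requestContext;
  switch (routeKey) {
    case "$connect":
      return { statusCode: 200 };
    case "$disconnect":
      // The connection is gone: drop every subscription it held.
      await store.deleteByConnection(connectionId);
      return { statusCode: 200 };
    default: {
      // Assumed graphql-ws style message: { id, type: "subscribe", payload }.
      const message = JSON.parse(event.body ?? "{}");
      if (message.type === "subscribe") {
        await store.put({
          connectionId,
          operationId: message.id,
          query: message.payload.query,
          customerId: message.payload.variables.customerId,
        });
      }
      return { statusCode: 200 };
    }
  }
}
```

Because each invocation only sees a connection ID, cleaning up on `$disconnect` is just another write to the store; nothing in the function outlives the request.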

The rough outline of our solution is as follows (it deserves a longer, more detailed blog post of its own). When a client subscribes to a customer’s timeline, it establishes a WebSocket connection to AWS API Gateway, which invokes our Lambda function with a connection ID and the message payload. The connection ID and the GraphQL subscription are then stored in a DynamoDB table. When a new entry is added to a customer’s timeline, an event is fired, and an event-handler Lambda queries the DynamoDB table for WebSocket connections that are subscribed to that customer’s timeline. If it finds any, it executes the GraphQL subscription with the event as an argument to get a result. Finally, we send the GraphQL result to each connection and… we're done! Phew.
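The fan-out step can be sketched in the same spirit. This is a minimal TypeScript sketch under stated assumptions: the `TimelineEvent` shape is hypothetical, `findSubscriptions` stands in for the DynamoDB query, `execute` stands in for running the stored GraphQL subscription document with the event as input, and `send` stands in for the API Gateway Management API's `PostToConnection` call, resolving `false` where that call would fail with a `GoneException` for a closed connection. Messages use the graphql-ws `next` message shape.

```typescript
// Hypothetical shape of the event fired when a customer's timeline changes.
interface TimelineEvent {
  customerId: string;
  changeType: string; // e.g. a TimelineEntryChangeType value
  entry: { id: string; text: string };
  cursor: string;
}

// One stored subscription (a DynamoDB item in the real system).
interface SubscriptionRecord {
  connectionId: string;
  operationId: string;
  customerId: string;
}

// Stand-in for querying the subscriptions table by customerId.
type FindSubscriptions = (customerId: string) => Promise<SubscriptionRecord[]>;
// Stand-in for executing the stored subscription document against the event.
type ExecuteSubscription = (sub: SubscriptionRecord, event: TimelineEvent) => Promise<unknown>;
// Stand-in for PostToConnection; resolves false if the connection is gone.
type SendToConnection = (connectionId: string, data: string) => Promise<boolean>;

// Fan one event out to every connection subscribed to that customer's
// timeline. Returns the connection IDs that could not be reached.
async function publishTimelineEvent(
  event: TimelineEvent,
  findSubscriptions: FindSubscriptions,
  execute: ExecuteSubscription,
  send: SendToConnection,
): Promise<string[]> {
  const subscriptions = await findSubscriptions(event.customerId);
  const outcomes = await Promise.all(
    subscriptions.map(async (sub) => {
      const result = await execute(sub, event);
      // graphql-ws style "next" message, tagged with the client's operation id.
      const message = JSON.stringify({ id: sub.operationId, type: "next", payload: result });
      const delivered = await send(sub.connectionId, message);
      return delivered ? null : sub.connectionId;
    }),
  );
  return outcomes.filter((id): id is string => id !== null);
}
```

Returning the unreachable connection IDs lets the caller delete their stale rows from the table, which is one of the connection edge cases a real implementation has to handle.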

That's the happy path, but with a host of connection and performance edge cases to handle, it was far from trivial to implement. Also, for a front-end to really feel live, it had to be told not only about new entries (the easy ones) but also when an entry was updated or removed, and it had to handle out-of-order or duplicate entries.

This is the final schema we ended up with:

  enum TimelineEntryChangeType {
    …
  }

  type TimelineEntryChange {
    changeType: TimelineEntryChangeType!
    timelineEntry: TimelineEntry!
    cursor: String!
  }

  type Subscription {
    timelineChanges(customerId: ID!): TimelineEntryChange!
  }
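On the client, the `changeType`, `timelineEntry` and `cursor` fields are enough to make applying changes idempotent and independent of arrival order. The TypeScript reducer below is a hypothetical sketch: the post does not list the `TimelineEntryChangeType` values, so `ADDED`, `UPDATED` and `REMOVED` are assumed names, `TimelineEntry` is reduced to an `id` and `text`, and cursors are assumed to sort lexicographically in timeline order. Entries are keyed by id so duplicates are no-ops, removed ids are kept as tombstones so a late duplicate cannot bring an entry back, and display order comes from the cursor rather than arrival order.

```typescript
// Assumed enum values: the post does not list TimelineEntryChangeType's members.
type TimelineEntryChangeType = "ADDED" | "UPDATED" | "REMOVED";

// Simplified, hypothetical TimelineEntry.
interface TimelineEntry {
  id: string;
  text: string;
}

// Mirrors the TimelineEntryChange type in the schema.
interface TimelineEntryChange {
  changeType: TimelineEntryChangeType;
  timelineEntry: TimelineEntry;
  cursor: string;
}

interface TimelineState {
  entries: Map<string, { entry: TimelineEntry; cursor: string }>;
  removed: Set<string>; // tombstones for entries that have been removed
}

function emptyTimeline(): TimelineState {
  return { entries: new Map(), removed: new Set() };
}

// Pure reducer: returns a new state and never mutates the old one.
function applyChange(state: TimelineState, change: TimelineEntryChange): TimelineState {
  const id = change.timelineEntry.id;
  if (change.changeType === "REMOVED") {
    const entries = new Map(state.entries);
    entries.delete(id);
    return { entries, removed: new Set(state.removed).add(id) };
  }
  // Late or duplicate changes for an already-removed entry are ignored.
  if (state.removed.has(id)) return state;
  // A duplicate ADDED is a no-op; UPDATED replaces the stored entry.
  if (change.changeType === "ADDED" && state.entries.has(id)) return state;
  const entries = new Map(state.entries);
  entries.set(id, { entry: change.timelineEntry, cursor: change.cursor });
  return { entries, removed: state.removed };
}

// Display order comes from the cursor, not from the order changes arrived in.
function orderedEntries(state: TimelineState): TimelineEntry[] {
  return Array.from(state.entries.values())
    .sort((a, b) => (a.cursor < b.cursor ? -1 : a.cursor > b.cursor ? 1 : 0))
    .map((e) => e.entry);
}
```

One limitation of this sketch: with no per-change version number, two `UPDATED` changes for the same entry that arrive out of order leave the older one applied.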

It's been really challenging, but we're really happy with this first implementation. We've built a lot of reusable infrastructure, which means that over the next few weeks and months we can roll out subscriptions for many other resources, such as customer details, agent statuses, and anything and everything else that needs to be real-time!

Fancy working on problems like this? We're hiring!

© Plain. CS without the BS since 2020.