Tuesday, December 1, 2020

Running a scalable & reliable GraphQL endpoint with Serverless

 


To make the most of this tutorial, sign up for Serverless Framework’s dashboard account for free:
https://app.serverless.com

New to AppSync? Check out this Ultimate Guide to AWS AppSync

Introduction

Over the last four years, I've been exploring the world of big data, building real-time and batch systems at scale. For the last couple of months, I've been developing products with serverless architectures here at Glassdoor.

Given the intersection of serverless and big data, there have been a few questions on everyone's mind:

1) How can we build low-latency APIs to serve complex, high-dimensional, big datasets?

2) Using a single query, can we construct a nested response from multiple data sources?

3) Can we build an endpoint that can securely aggregate and paginate through data with high performance?

4) And is there a way we can do all of that at scale, paying only for each query execution and not for idle CPU time?

The answer for us ended up largely being GraphQL.

This post aims to show you how you too can streamline your existing workflow and handle complexity with ease. While I won't be digging deep into specific things Glassdoor was working on, I will be showing you a pretty related example that utilizes a mini Twitter clone I made.

Ready to talk about creating Serverless GraphQL endpoints using DynamoDB, RDS and the Twitter REST API? Ready to see a sweet performance comparison? Ready to hear some solid techniques on how you can convince the backend team that using GraphQL is a great idea?

Awesome. Let's go.

Note: For the GraphQL and Serverless primer, keep reading, or click here to go straight to the code walkthrough.

What is GraphQL?

I’m going to start this off by stating a fact: The way we currently build APIs, as a collection of micro-services that are all split up and maintained separately, isn’t optimal. If you're a fellow back-end or front-end engineer, you're probably familiar with this struggle.

Luckily for us, the tech horizon is ever-expanding. We have options. And we should use them.

GraphQL lets you shrink your multitude of APIs down into a single HTTP endpoint, which you can use to fetch data from multiple data sources.

In short, it lets you:

  1. Reduce network costs and get better query efficiency.

  2. Know exactly what your response will look like and ensure you're never sending more or less than the client needs.

  3. Describe your API with types that map your schema to existing backends.

Thousands of companies are now using GraphQL in production with the help of open source frameworks built by Facebook, Apollo, and Graphcool. Starbucks uses it to power their store locator. When I read that, it made my morning coffee taste even better. 😉

Very reasonably, you are probably thinking, “Yeah, okay, Facebook is one thing; they have a giant engineering team. But for me, having only one API endpoint is too risky. What if it goes down? How do I handle that much load? What about security?”

You are absolutely correct: with one HTTP endpoint, you need to be entirely sure that endpoint never goes down and that it scales on demand.

That’s where serverless comes in.

What is Serverless?

Serverless has gained popularity over the last few years, primarily because it gives developers flexibility.

With Serverless comes the following:

  1. No server management (no need to manage any form of machine)

  2. Pay-per-execution (never pay for idle)

  3. Auto-scale (scale based on demand)

  4. Function as a unit of application logic

What makes Serverless and GraphQL such a great fit?

When moving to GraphQL, you suddenly rely on one HTTP endpoint to connect your clients to your backend services. Once you do decide to do that, you want this one HTTP endpoint to be: reliable, fast, auto-scaling and have a small attack vector regarding security.

All these properties are fulfilled by a single AWS Lambda function in combination with an API Gateway. It’s just a great fit!

In sum, powering your GraphQL endpoint with a serverless backend solves scaling and availability concerns outright, and it gives you a big leg up on security. It’s not even that much code or configuration.

It takes only a few minutes to get to a production-ready setup, which we're about to dive into, right now.

Serverless-GraphQL repository

With the shiny new Serverless and GraphQL Repository, it’s incredibly straightforward to get your HTTP endpoint up and running.


The repository comes in two flavors: API Gateway + Lambda backend, or AppSync backend. (More backend integrations, including Graphcool Prisma, Druid, MongoDB, and AWS Neptune, forthcoming.)

Note: I’m going to focus on AWS Lambda below, but know that you can use any serverless provider (Microsoft Azure, Google Cloud Functions, etc.) with GraphQL.

Let's create a Serverless GraphQL Endpoint

To create this endpoint, I'm going to be using the Apollo-Server-Lambda package from npm. (You can also use the Express, Koa, or Hapi frameworks, but I prefer less complexity and more simplicity.) Also, to make your endpoint production-ready, you might want to integrate the Lambda function with CloudWatch metrics, AWS X-Ray, or Apollo Engine for monitoring and debugging.

Some of the main components of building your endpoint are (with links to serverless-graphql repo):

  1. handler.js: lambda function handler to route HTTP requests and return the response.

  2. serverless.yml: creates AWS resources and sets up the GraphQL endpoint.

  3. schema.js: defines our GraphQL schema we're using to build this mini Twitter app.

  4. resolver.js: defines query handler functions to fetch data from our other services (RDS, REST, DynamoDB, etc.).

Step 1: Configure the Serverless template

We'll be using the Serverless Framework to build and deploy your API resources quickly. If you don't have the Framework installed, get it with npm install serverless -g.

To start, specify in your serverless.yml that you are setting up a GraphQL HTTP endpoint:

functions:
  graphql:
    handler: handler.graphqlHandler
    events:
    - http:
        path: graphql
        method: post
        cors: true

Now, any HTTP POST event on the path /graphql will trigger the graphql Lambda function, and will be handled by graphqlHandler.
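For instance, once deployed, any HTTP client can exercise the endpoint by POSTing a JSON body whose `query` field contains the GraphQL document. The URL and query below are illustrative, not taken from the repository:

```javascript
// The JSON payload a GraphQL client POSTs to the /graphql endpoint.
// The query shown is illustrative; it assumes the mini Twitter schema
// introduced later in this post.
const payload = JSON.stringify({
  query: 'query { getUserInfo(handle: "jack") { name followers_count } }',
});

// Sent with any HTTP client, e.g.:
// fetch('https://<api-id>.execute-api.us-east-1.amazonaws.com/dev/graphql', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: payload,
// }).then(res => res.json());
```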

Step 2: Configure the Lambda function (Apollo-Server-Lambda)

Set up the callback to Lambda in your handler.js file:

import { graphqlLambda, graphiqlLambda } from 'apollo-server-lambda';
import { makeExecutableSchema } from 'graphql-tools';
import { schema } from './schema';
import { resolvers } from './resolvers';

const myGraphQLSchema = makeExecutableSchema({
  typeDefs: schema,
  resolvers,
});

exports.graphqlHandler = function graphqlHandler(event, context, callback) {
  function callbackWithHeaders(error, output) {
    // eslint-disable-next-line no-param-reassign
    output.headers['Access-Control-Allow-Origin'] = '*';
    callback(error, output);
  }

  const handler = graphqlLambda({ schema: myGraphQLSchema });
  return handler(event, context, callbackWithHeaders);
};

Your Lambda function imports the GraphQL schema and resolvers (which I'll explain in a minute).

Once API Gateway triggers an event, the graphqlLambda function will handle it. The response is sent back to the client.

Step 3: Create a GraphQL schema

For this post, I am going to focus on a subset of the schema to keep things simple—I'll handle mutations and subscriptions in a future post:

type Query {
    getUserInfo(handle: String!): User!
}

type Tweet {
    tweet_id: String!
    tweet: String!
    handle: String!
    created_at: String!
}

type TweetConnection {
    items: [Tweet!]!
    nextToken: String
}

type User {
    name: String!
    description: String!
    followers_count: Int!
    following: [String!]!
    topTweet: Tweet
    tweets(limit: Int!, nextToken: String): TweetConnection
}
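To illustrate, a single request against this schema can fetch a user's profile together with a first page of tweets; the handle and limit values below are illustrative:

```graphql
query {
  getUserInfo(handle: "jack") {
    name
    followers_count
    tweets(limit: 2) {
      items {
        tweet
        created_at
      }
      nextToken
    }
  }
}
```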
Step 4: Create your GraphQL resolvers

Still with me? Great. Let's dive deep into how Lambda retrieves data from DynamoDB, RDS, and the REST backend.

We'll use the getUserInfo field as an example. This field takes a Twitter handle as input and returns that user's personal and tweet info.

Setting up the DynamoDB backend

First, we'll create two tables (Users and Tweets) to store user and tweet info respectively. We'll also use a Global Secondary Index (tweet-index) on the Tweets table to sort each user's tweets by timestamp.

These resources will be created using the serverless.yml:

Table: Users
  HashKey: handle
  Attributes: name, description, followers_count

Table: Tweets
  HashKey: tweet_id
  Attributes: tweet, handle, created_at
  Index: tweet-index (hashKey: handle, sortKey: created_at)

At this point, you'll need to generate some mock data using Faker.

You'll also need to make sure your IAM Roles are set properly in the serverless.yml, so that Lambda can access DynamoDB. These are defined in the serverless.yml file in the repository.

If you're interested in knowing more about IAM permissions, here's an excellent primer.

Creating the GraphQL resolver

Let's set it up for getUserInfo to retrieve data from DynamoDB. I'll be breaking down the code for you.

First of all, we need to define how the getUserInfo and tweets fields will fetch the data:

export const resolvers = {
  Query: {
    getUserInfo: (root, args) => getUserInfo(args),
  },
  User: {
    tweets: (obj, args) => getPaginatedTweets(obj.handle, args),
  },
};

Then we'll query the DynamoDB table index, tweet-index, to retrieve paginated tweets for a given user handle. If the client passes a nextToken parameter, we use it as the ExclusiveStartKey to continue paginating through the result set.

If the result contains a LastEvaluatedKey (as shown here), we return it as nextToken:

  getPaginatedTweets(handle, args) {
    return promisify(callback => {
      const params = {
        TableName: 'Tweets',
        KeyConditionExpression: 'handle = :v1',
        ExpressionAttributeValues: {
          ':v1': handle,
        },
        IndexName: 'tweet-index',
        Limit: args.limit,
        ScanIndexForward: false,
      };

      if (args.nextToken) {
        params.ExclusiveStartKey = {
          tweet_id: args.nextToken.tweet_id,
          created_at: args.nextToken.created_at,
          handle: handle,
        };
      }

      docClient.query(params, callback);
    })
    // then parse the result
  },
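The "then parse the result" step can be sketched as follows. This is an illustrative helper, not code from the repository: it shapes the raw DocumentClient output into the TweetConnection type, and, since the schema declares nextToken as a String, one option is to serialize LastEvaluatedKey into a string token:

```javascript
// Shape a raw DynamoDB DocumentClient query result into the
// TweetConnection type from the schema above (illustrative sketch).
function toTweetConnection(result) {
  return {
    items: result.Items,
    // If DynamoDB returned a LastEvaluatedKey, expose it as a string
    // nextToken so the client can pass it back to keep paginating.
    nextToken: result.LastEvaluatedKey
      ? JSON.stringify(result.LastEvaluatedKey)
      : null,
  };
}
```

The client then sends the token back untouched, and the resolver parses it into the ExclusiveStartKey for the next page.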

For the getUserInfo field, you can similarly retrieve the results as shown below:

  getUserInfo(args) {
    return promisify(callback =>
      docClient.query(
        {
          TableName: 'Users',
          KeyConditionExpression: 'handle = :v1',
          ExpressionAttributeValues: {
            ':v1': args.handle,
          },
        },
        callback
      )
    )
    // then parse the result
  },

The end result? You've got a GraphQL endpoint that reliably scales! 💥

Let's test it out locally and then deploy it to production.

Clone the Git repo and install dependencies
git clone https://github.com/serverless/serverless-graphql.git

cd app-backend/dynamodb
yarn install

To test the GraphQL endpoint locally on my machine, I'm using these three plugins for the Serverless Framework: Serverless Offline, Serverless Webpack, and Serverless DynamoDB Local.

These plugins make it super easy to run the entire solution E2E locally without any infrastructure. They also help us debug issues faster.

If you've followed me this far, DynamoDB will now be available and running on your local machine at http://localhost:8000/shell:


To deploy your endpoint to production, run:

cd app-backend/dynamodb
yarn deploy-prod

Note: We also have a previous post on making a serverless GraphQL API, which covers the process in more detail.

Setting up the RDS backend

DynamoDB is great for fetching data by a set of keys, but a relational database like RDS gives us the flexibility to model complex relationships and run aggregations at query time.

Let's look at the process of connecting your Lambda to RDS.

We have explained the requirements to set up RDS in production in the readme, but you can test your GraphQL endpoint locally using SQLite3 (without any AWS infrastructure). Boom!

Data modeling and table creation

We will create two tables (Users and Tweets) to store user and tweet info respectively, as described here.

Table: Users
  Primary Key: user_id
  Attributes: name, description, followers_count

Table: Tweets
  Primary Key: tweet_id
  Attributes: tweet, handle, created_at, user_id

Then, you'll need to use Faker again to generate mock data.

Put your Lambda in the same VPC as RDS for connectivity, and configure the knexfile with the database configuration for the development and production environments.

(The serverless-graphql repo supports connecting to SQLite, MySQL, Aurora, or Postgres using Knex configurations—a powerful query builder for SQL databases and Node.js.)

const pg = require('pg');
const mysql = require('mysql');

module.exports = {
  development: {
    client: 'sqlite3', // in development mode you can use SQLite
    connection: {
      filename: './dev.db',
    },
  },
  production: {
    client: process.env.DATABASE_TYPE === 'pg' ? 'pg' : 'mysql', // in production mode you can use PostgreSQL, MySQL or Aurora
    connection: process.env.DATABASE_URL,
  },
};

Let's go ahead and write our resolver functions.

The Knex query layer queries the Users table to resolve getUserInfo and returns a list of user attributes. Then, we join the Tweets and Users tables on user_id to resolve tweets. Finally, topTweet is returned using where, orderBy, and limit clauses.

And it just works!

Here's the getUserInfo resolver:

export const resolvers = {
  Query: {
    getUserInfo: (root, args) =>
      knex('Users')
        .where('handle', args.handle)
        .then(users => {
          const user = users[0];
          if (!user) {
            throw new Error('User not found');
          }
          return user;
        }),
  },
};

Here's the tweets resolver:

  User: {
    tweets: obj =>
      knex
        .select('*')
        .from('Tweets')
        .leftJoin('Users', 'Tweets.user_id', 'Users.user_id')
        .where('handle', obj.handle)
        .then(posts => {
          if (!posts) {
            throw new Error('User not found');
          }

          const tweets = { items: posts };

          return tweets;
        }),
  },

And here's the topTweet resolver:

  User: {
    topTweet: obj =>
      knex('Tweets')
        .where('handle', obj.handle)
        .orderBy('retweet_count', 'desc')
        .limit(1)
        // topTweet is nullable in the schema, so return null when the
        // user has no tweets (an empty array is truthy in JavaScript)
        .then(tweets => (tweets.length ? tweets[0] : null)),
  },

Run it locally on your machine (RDS instance not required).

Kickstart on local using SQLite
cd app-backend/rds
yarn install
yarn start

And deploy to production:

cd app-backend/rds
yarn deploy-prod

Note: When running in production, please make sure your database endpoint is configured correctly in config/security.env.prod.

REST wrapper

Last but not least—it's time for the REST API backend!

This use case is the most common when you have pre-existing microservices, and you want to wrap them around GraphQL. Don't worry; it's easier than you think.

We'll fetch data from Twitter's REST API, but it could very well be your own REST API. You'll need to create OAuth tokens here, or use these test account tokens for faster setup.

In this case, we don't need to create tables or mock data because we will be querying real data. Let's look at how to resolve the following field to find the list of users being followed.

The consumerKey, consumerSecret, and handle are passed as inputs to the friends/list API:

import { OAuth2 } from 'oauth';
const Twitter = require('twitter');

async function getFollowing(handle, consumerKey, consumerSecret) {
  const url = 'friends/list';

  const oauth2 = new OAuth2(
    consumerKey,
    consumerSecret,
    'https://api.twitter.com/',
    null,
    'oauth2/token',
    null
  );

  return new Promise(resolve => {
    oauth2.getOAuthAccessToken(
      '',
      {
        grant_type: 'client_credentials',
      },
      (error, accessToken) => {
        resolve(accessToken);
      }
    );
  }).then(accessToken => {
    const client = new Twitter({
      consumer_key: consumerKey,
      consumer_secret: consumerSecret,
      bearer_token: accessToken,
    });

    const params = { screen_name: handle };

    return client.get(url, params);
    // then parse the result
  });
}
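The "parse the result" step can be sketched as a small mapping helper. This is illustrative, not code from the repository; it assumes the friends/list response shape documented by Twitter, where users is an array of user objects carrying screen_name:

```javascript
// Map Twitter's friends/list response onto the GraphQL `following`
// field, which the schema above declares as a list of screen names.
function toFollowing(response) {
  return (response.users || []).map(user => user.screen_name);
}
```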

Note: A complete example is given here. You can also check out Saeri's walkthrough on building a Serverless GraphQL Gateway on top of a 3rd Party REST API.

Go ahead and run it locally on your machine:

cd app-backend/rest-api
yarn install
yarn start

And deploy to production:

cd app-backend/rest-api
yarn deploy-prod

Client Integrations (Apollo ReactJS, Netlify, and S3)

The serverless-graphql repository comes with two client implementations:

  1. Apollo Client 2.0
  2. AppSync Client

If you are new to the ReactJS + Apollo integration, I would recommend going through these tutorials.

The code for apollo-client in the serverless-graphql repo is here.

To start the client on local, first start any backend service on local. For example:

cd app-backend/rest-api
yarn install
yarn start

Now, make sure http://localhost:4000/graphiql is working.

If you kickstart the Apollo client (as shown below), you will have a React dev server running on your local machine. The setup was created using Create React App:

cd app-client/apollo-client
yarn install
yarn start


In production, you can also deploy the client on Netlify or AWS S3. Please follow the instructions here.

Performance Analysis

Which brings us to the best part. Let's dive into the performance of our Serverless GraphQL endpoint.

We can measure the E2E latency of the API call by adding the network delay, AWS API Gateway response time, and AWS Lambda execution time, which includes execution time of the backend query. For this analysis, my setup consists of:

Baseline Dataset: 500 Users, 5000 Tweets (10 tweets per user) where each user record is less than 1 KB in size.

Region: All the resources were created in aws us-east-1, and API calls were made from 2 EC2 nodes in the same region.

Lambda Memory size = 1024 MB

Lambda execution time with DynamoDB backend

I simulated 500 users making the API call with a ramp-up period of 30 seconds, hitting two separate GraphQL endpoints (one with DynamoDB and the other with PostgreSQL). All 500 users posted the same payload; no caching was involved in this analysis.

AWS X-Ray generated a service map of these calls.

For 99% of the simulated calls, DynamoDB took less than 15ms, but 1% of the calls had high response times, which resulted in an overall average latency of 25ms. The Lambda execution time was 60ms; the time spent in the Lambda service itself was 90ms on average (we can optimize the Lambda execution time, but not the service time itself).

Cold Starts

Approximately 2% of the total calls were cold starts. I noticed an additional latency of 700ms-800ms in Lambda execution time for the first API call, which came from initialization of the Lambda container itself.

This additional latency was observed with both endpoints (DynamoDB and PostgreSQL). There are ways to optimize this overhead, and I strongly recommend reading up on them here.

Increase in Lambda memory size limit by 2x and 3x

Increasing the Lambda memory size by 2x (2048 MB) improved the overall latency of the Lambda service by 18%; increasing it by 3x (3008 MB) improved latency by 38%.

The latency of the DynamoDB backend remained constant, and the Lambda execution time itself improved by around 20% with 3x memory.

Lambda execution time with PostgreSQL backend

With RDS, the Lambda execution time increased along with the size of the data.

When I increased the Tweets dataset by a factor of 100 (to 1000 tweets per user), I found the response time increased by 5x-10x. This possibly happens because we are joining the Tweets and Users tables on the fly, which results in more query execution time.

Query performance can be further improved by using indexing and other database optimizations. Conversely, DynamoDB latency remains constant with increasing dataset size (which is expected by design).

API Gateway and Network Latency

On average, the E2E response time of the GraphQL endpoint ranges from 100-200 ms (including the Lambda execution time). Hence, API Gateway and network overhead account for roughly 40-100 ms, which can be further reduced by caching.

You might ask, "Why do we need API Gateway? Can't we just use Lambda to fetch the GraphQL response?"

Well. This analysis truly merits a separate blog of its own, where we can do an in-depth study of all the latencies and query optimizations. Or you can also read this forum discussion about it.

Selling GraphQL in your organization

When adopting new tech, there's always a discussion of "do we want this, or not?"

Ready to switch everything over, but not sure about how to convince the backend team? Well, here’s how I’ve seen this play out several times, with success.

First, the frontend team would wrap their existing REST APIs in a serverless GraphQL endpoint. It added some latency, but they were able to experiment with product changes way faster and could fetch only what was needed.

Then, they would use this superior workflow to gain even more buy-in. They would back up this buy-in by showing the backend team that nothing had broken so far.

Now I’m not saying you should do that, but also, if you wanted to, there it is for your consideration. My lips are sealed.

Special thanks!

First of all, I would like to thank Nik Graf, Philipp Müns, and Austen Collins for kickstarting open source initiatives to help people build GraphQL endpoints easily on Serverless platforms. I have personally learned a lot during my work with you guys!

I would also like to give a shout-out to our open source committers - Jon, Léo Pradel, Tim, Justin, Dan Kreiger, and others.

Thanks Andrea and Drake Costa for reviewing the final draft of this post and Rich for helping me out with questions.

Last but not the least, I would like to thank Steven for introducing me to GraphQL.

I hope you guys liked my first blog post! Feel free to reach out and let me know what you think.

Adopting GraphQL and Apollo in a Legacy Application

 

Trello is currently undergoing a big technological shift on the frontend, and an important part of this shift has been our approach to incrementally adopting GraphQL. We’ve used a client-side GraphQL schema (in a creative way) to quickly unlock the benefits of GraphQL. If you’re considering moving to GraphQL in your product, then this would be a great place to start before investing time and energy into a server-side schema.

For the past 10 years, the development teams at Trello have been writing features in Backbone and CoffeeScript backed by a solid REST API and WebSockets. This was bleeding edge when we first started working in this architecture, but as the codebase has grown, we have begun to reach the limits of its capabilities. For example, it became very difficult at times to understand how a piece of data ended up in our client-side cache, and whether or not it was stale. Not to mention, our client-side cache was entirely proprietary (built on top of Backbone models). Over the past 2 years, we’ve been working on modernizing our architecture by moving to React and TypeScript, and as part of that we began to explore the idea of adopting GraphQL (and Apollo).

We spent a lot of time considering how we’d implement GraphQL before making any code changes, and we began to have a very good understanding of what our end-state architecture would look like, but in practice, we didn’t know how we’d move from where we were today into this new world. With tens of millions of users and 250,000 lines of code, it was critical that we could head towards our architecture in an incremental way, whilst still delivering new features and keeping regression risk low. We realized that one of the areas where we’d struggle would be GraphQL. We now had many developers with extensive React experience, but almost no-one who’d written a production-level GraphQL schema. We needed to begin to understand what impact our schema decisions would have, and how we could consume the schema from React, without locking ourselves in to a publicly supported GraphQL API.

Why did we choose to adopt GraphQL?

There are some very sizable benefits to adopting a GraphQL schema in your frontend, even if the implementation itself is backed by REST. The first is knowing the shape of the data being requested ahead of time. In REST, you can have an API endpoint that has existed for many years (let’s use /1/board/{boardId} as an example) that has grown organically over time to return more and more data. It becomes increasingly difficult to say, with confidence, the subset of that data that is actually required by the frontend feature making that request. In Trello, as we started to convert some of our most used REST requests to their equivalent GraphQL queries, it became abundantly clear that we were over-fetching a lot of data but it was very difficult to tell which data wasn’t actually required by the UI.

The second is type safety. GraphQL schemas are strictly typed, so it becomes trivial to generate static types for the response of any given query (in Trello we are using graphql-code-generator). When combined with an editor like VSCode, these static types give both an excellent developer experience and a much higher level of confidence when requesting remote data. It can also eliminate an entire class of “contract based” testing that ensures your API is delivering the data that is promised by its specification, as the type generation for queries will fail (at build time instead of runtime) if the schema has changed in a way that prevents it from being able to satisfy a query. This type safety greatly simplifies a huge range of complex issues when integrating between a frontend and an API, as your app won’t even compile if any of its queries can’t be satisfied by your GraphQL schema.

Why was Apollo the obvious choice for Trello?

Choosing a frontend framework to interact with GraphQL could be an article on its own. Our family at Atlassian had a lot of experience with moving to GraphQL incrementally, with Apollo as the framework of choice, so it made sense for us to leverage this knowledge and make the same choice. Obviously, if there were a compelling downside to Apollo we would have considered other alternatives (relay, amplify, urql), but it had already been used heavily in production with no significant drawbacks.

Apollo can be thought of as the “glue” that holds our components and our GraphQL queries together. It comes with some pretty big benefits out of the box that make it an excellent choice for managing your remote data.

The biggest benefit by far is declarative data fetching. In your typical REST application (backed by something like Redux or MobX), components are responsible for imperatively requesting data when they are mounted, or when some interaction occurs in the UI. This commonly leads to the following situation:


In this scenario we have 2 separate components backed by the same data. The question that immediately follows is: which component is responsible for triggering the fetch from your REST API? You can end up with many of your components containing complicated logic in their lifecycle methods like this:

❌ This is bad

class CardDescription extends React.Component {
  componentDidMount() {
    const { card, cardId, dispatch } = this.props;

    // If we don't have a card loaded in our cache, fetch the full card
    if (!card) {
      dispatch(loadFullCard(cardId));
      return;
    }

    // If we do have a card in our cache, but no description, we want to load it
    if (!card.description) {
      dispatch(loadCardDescription(cardId));
      return;
    }
  }

  render() {
    const { card, isLoading } = this.props;
    if (isLoading) {
      return null;
    }
    return <span>{card.description}</span>;
  }
}

This is particularly troublesome in Trello, as much of the app is still written in Backbone and CoffeeScript, but we want to safely be able to write new, isolated, “leaf” components with confidence that they will (if necessary) fetch the data they require for rendering. Luckily, Apollo does just this. Each component specifies its query, containing only the data it needs for rendering and Apollo manages making the requests and caching the data for you.

✅ This is much better

const CARD_DESCRIPTION_QUERY = gql`
  query CardDescription($cardId: ID!) {
    card(id: $cardId) {
      description
    }
  }
`;

const CardDescription = ({ cardId }) => {
  const { data, loading } = useQuery(CARD_DESCRIPTION_QUERY, { variables: { cardId } });
  if (loading) {
    return null;
  }

  return <span>{data.card.description}</span>
}

In this example, all the potentially complicated logic of checking whether you have the data required to render your component (and whether a fetch might be required) is managed entirely by Apollo. The end result is that we can write a new component that specifies its data requirements, and mount it anywhere in the app with confidence that it will render correctly, even if the data wasn’t present when the component was mounted. Not to mention, we also don’t need to maintain all the code required for managing a cache anymore.

We now had a good picture of what we wanted our components to look like, and a vision of how this could be solved with Apollo and GraphQL, but how could we achieve this without investing in a complete re-write of our REST API?

Using a client-side schema to incrementally adopt GraphQL and Apollo

Wrapping an existing REST API server to support GraphQL can be a pretty significant undertaking. What if there are costly refactors that need to be made (around authentication for example)? What if the application you work on is split across multiple teams? What if there are multiple upstream services required by the frontend that need to be consolidated behind a single GraphQL API? The good news is that you don’t have to wait!

There are many advantages to adopting a server-side GraphQL solution, including defense against over-fetching and a more explorable API for third party consumers. However, many of the advantages in the frontend can still be enjoyed using an existing REST API (or any remote data source). How? By wrapping the API with a client-side GraphQL-based library (like apollo-client). In fact, this approach can actually be beneficial when compared to diving straight into a server-side solution for a few reasons:

✅ Starting with a client-side solution results in a faster developer loop, where schema changes are applied, consumed, and contract-tested all in the same PR.

✅ The GraphQL schema can more easily be built incrementally based on the requirements of the frontend, rather than trying to convert your entire REST API into a GraphQL-based solution in one go.

✅ A client-side schema gives a great “starting point” should you eventually decide to take the plunge and invest in a real GraphQL server.

There are a few ways to go about implementing a client-side GraphQL approach, the 2 most popular being apollo-link-rest and local resolvers.

apollo-link-rest allows you to construct a client-side schema and use the @rest directive to provide information about how the query should be executed via REST, e.g.:

query MyBoards {
  boards @rest(type: "Board", path: "/1/board/") {
    name
  }
}
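To run a query like that, the client needs a RestLink in its link chain. A minimal configuration sketch, not taken from the original setup (the package imports and the base URI here are assumptions):

```javascript
// Sketch: configuring an Apollo client with apollo-link-rest.
// Package import paths and the base URI are assumptions.
import { ApolloClient, InMemoryCache } from '@apollo/client';
import { RestLink } from 'apollo-link-rest';

// @rest directives in queries resolve their `path` relative to this endpoint
const restLink = new RestLink({ uri: 'https://trello.com/1' });

const client = new ApolloClient({
  link: restLink,
  cache: new InMemoryCache(),
});
```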

This is the “lowest touch” solution to getting up and running with Apollo. It’s a great way to start reaping the benefits of Apollo’s caching and declarative data-fetching. However, there are a few drawbacks to this approach.

The first is that the structure of your REST API will end up directly impacting the structure of your GraphQL schema. This might be okay in some cases, but often this can result in making compromises to your schema that wouldn’t have manifested in a server-side schema. We ideally want our schema to be directly portable to a server-side solution at some point, so allowing these compromises in the short-term can end up coming back to bite us in the future.

Secondly, apollo-link-rest is a “leaky” abstraction, in the sense that your queries are aware that there is no GraphQL server involved, e.g. boards @rest(type: "Board", path: "/1/board/").

Again, this makes it more difficult to untangle should you plan on making the jump to a server-side solution in the future. In an ideal world, our components (and queries) wouldn’t have to change at all if we were to adopt a GraphQL server.

Lastly, with “nested resources”, some queries can get very expensive. Take the following query:

query MyBoards {
  boards {
    name
    lists {
      name
      cards {
        name
        dueDate
      }
    }
    members {
      username
      email
    }
  }
}

Using apollo-link-rest for a query like this would end up “fanning out” and potentially making many individual REST API requests, which can obviously be quite detrimental to performance (remember that this is client to server, not server to server). Thankfully there exists a solution that gives us an escape hatch for these issues.

Achieving greater flexibility with local resolvers

Apollo Client now provides tools for managing client-side state out of the box. Typically, this solution is used for storing/accessing local UI state using Apollo, but it also works extremely well for querying a REST API. The same mechanisms that exist in a GraphQL server (a schema paired with resolvers) are used for managing this data, so the end result is something that much more closely resembles a server-side GraphQL solution.

Effectively, this approach boils down to:

  1. Writing a GraphQL schema (the same way you would if it existed on the server)
  2. Writing local resolvers that fetch the requested data from your REST API

Now your components are entirely shielded from the fact that you are wrapping a REST API on the client, with queries looking like this:

query {
  member(id: "me") @client {
    id
    fullName
    boards(filter: open) {
      name
      lists {
        name
      }
    }
    organizations {
      name
      displayName
    }
  }
}

Which, apart from the @client directive, are exactly what they would look like with a GraphQL server. The query is then satisfied using your local resolvers, which might look something like this:

const resolvers = {
  Query: {
    // Resolvers that use `await` must be declared `async`
    member: async (_, args) => {
      const results = await fetch(`/1/member/${args.id}`);
      const member = await results.json();
      return member;
    },
  },
  Member: {
    boards: async (member, args) => {
      const results = await fetch(`/1/member/${member.id}/boards`);
      const boards = await results.json();
      return boards;
    },
    organizations: async (member, args) => {
      const results = await fetch(`/1/member/${member.id}/organizations`);
      const organizations = await results.json();
      return organizations;
    },
  },
  Board: {
    lists: async (board, args) => {
      const results = await fetch(`/1/boards/${board.id}/lists`);
      const lists = await results.json();
      return lists;
    },
  },
};
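For completeness, here is one plausible way to register such a resolver map with Apollo Client's local-state support so that @client queries are satisfied by it. This is a configuration sketch; the import paths and module layout are assumptions:

```javascript
// Sketch: wiring local resolvers into Apollo Client so that fields
// marked with @client are resolved on the client instead of a server.
// Import paths are assumptions.
import { ApolloClient, InMemoryCache } from '@apollo/client';
import { resolvers } from './resolvers'; // the resolver map shown above

const client = new ApolloClient({
  cache: new InMemoryCache(),
  resolvers,
});
```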

This approach is far more flexible than apollo-link-rest, and allows you to encapsulate all knowledge of the REST API to your resolvers. Additionally, if your server implementation is based on JavaScript, this can become a great starting point for a Node-based GraphQL server. But what about the performance issues we mentioned earlier, related to the “fanning out” of requests when querying nested resources?

Trello’s Solution

Luckily for us, Trello’s REST API actually supports many of the features that make a GraphQL endpoint so appealing, namely:

  1. Field narrowing (e.g. https://trello.com/1/members/me?fields=username,email)
  2. Nested resource expansion (e.g. https://trello.com/1/members/me?boards=open&board_fields=name)

A request to a REST API that supports field narrowing and nested resource expansion has a lot in common with a GraphQL query. With this in mind, we were able to write a “GraphQL query → REST API URL” translation layer that turns a query like this:

query {
  member(id: "me") @client {
    id
    fullName
    boards(filter: open) {
      name
      lists {
        name
      }
    }
    organizations {
      name
      displayName
    }
  }
}

Into a single REST API request like this:

https://trello.com/1/member/me?fields=id,fullName&boards=open&board_fields=name&board_lists=all&board_list_fields=name&organizations=all&organization_fields=name,displayName
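Conceptually, producing that URL amounts to merging the query parameters collected at each level of the query into one flat map and serializing it onto the resource path. A simplified sketch (function and parameter names are illustrative, not Trello's actual implementation):

```javascript
// Sketch: collapse per-level query params (as produced during the
// GraphQL query traversal) into a single REST request URL.
function buildRestUrl(baseUrl, resourcePath, paramsList) {
  // Merge the params gathered at each level of the traversal
  const merged = Object.assign({}, ...paramsList);
  const query = new URLSearchParams(merged).toString();
  return `${baseUrl}/${resourcePath}?${query}`;
}

// One URL for the whole nested query, instead of one request per resource
const url = buildRestUrl('https://trello.com/1', 'member/me', [
  { fields: 'id,fullName' },
  { boards: 'open', board_fields: 'name' },
  { board_lists: 'all', board_list_fields: 'name' },
]);
```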

Translating a GraphQL query into a REST API request

We achieve this by using our own generic local resolver to traverse the GraphQL query and generate a single REST URL with all of the required query parameters. A single data structure specifies the query parameters required to “expand” the nested resources, and to narrow the requested fields down to only what is required by the GraphQL query. Here’s a small snippet of the data structure that drives this traversal (including the configuration that makes the above query possible).

/**
 * This data structure represents valid 'chains' of nested resources according
 * to the Trello API.
 * See https://developers.trello.com/reference#understanding-nested-resources
 *
 * @name
 * Represents the node's name as it would appear in a graphql query. For
 * example:
 *
 * board {
 *   cards {
 *     checklists
 *   }
 * }
 *
 * Would be expected to match a 'path' down this tree according to the `name`
 * property.
 *
 * @nodeToQueryParams
 * A function which is called when parsing a graphql query into query params
 * for a REST API request. It's given a FieldNode and expected to return all
 * the necessary query params to satisfy the data for that given node
 *
 * @nestedResources
 * Recursive property used to define the 'tree' of nested resources according to
 * the above.
 *
 */
const VALID_NESTED_RESOURCES: NestedResource[] = [
  {
    name: 'member',
    nodeToQueryParams: (node) => ({
      fields: getChildFieldNames(node),
    }),
    nestedResources: [
      {
        name: 'organizations',
        nodeToQueryParams: (node) => ({
          organizations: getArgument(node, 'filter') || 'all',
          organization_fields: getChildFieldNames(node),
        }),
      },
      {
        name: 'boards',
        nodeToQueryParams: (node) => ({
          boards: getArgument(node, 'filter') || 'all',
          board_fields: getChildFieldNames(node),
        }),
        nestedResources: [
          {
            name: 'lists',
            nodeToQueryParams: (node) => ({
              board_lists: getArgument(node, 'filter') || 'all',
              board_list_fields: getChildFieldNames(node),
            }),
          },
        ],
      },
    ],
  },
];

The important part here is nodeToQueryParams. This is a function that is given a node of the query (this represents a portion of the GraphQL query), and returns all of the query parameters required to expand that nested resource. We have a few utilities (getArgument and getChildFieldNames) that make extracting the information from the GraphQL query node trivial.
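Those utilities are essentially small AST walkers. Hypothetical implementations (the real ones are not shown in this post) that operate on a parsed graphql-js FieldNode might look like:

```javascript
// Hypothetical sketches of the getChildFieldNames/getArgument utilities
// referenced above; they read a graphql-js FieldNode from the parsed query.

// Return the names of the node's immediate child fields,
// e.g. for `boards { name lists { ... } }` -> ['name', 'lists']
function getChildFieldNames(node) {
  if (!node.selectionSet) return [];
  return node.selectionSet.selections
    .filter((selection) => selection.kind === 'Field')
    .map((selection) => selection.name.value);
}

// Return the value of a named argument on the node, if present.
// Sufficient for enum/string arguments such as `filter: open`.
function getArgument(node, argName) {
  const arg = (node.arguments || []).find((a) => a.name.value === argName);
  return arg ? arg.value.value : undefined;
}
```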

This diagram shows, at a high level, how a single GraphQL query flows from the component to the server:

[Diagram: client-side query flow, from the component through the local resolvers to the server]

This solution has allowed us to move forward with most of the benefits of a “real” GraphQL server, without needing to invest the time or resources to build it out.

✅ Declarative data fetching in our UI

✅ Type safety and “automatic” contract testing of the schema and its queries

✅ Protection from over-fetching data (using field-narrowing in our REST API)

✅ Clear transition path for migrating our client-side schema to the server

✅ An environment that makes it easy to learn GraphQL without the need to support third-party consumers

This also comes with some drawbacks, but they are minor in comparison (and disappear once you move to a server-side GraphQL schema):

❌ Cannot be consumed by other clients (e.g. Mobile, PowerUp developers)

❌ Performance implications of client to server network latency

❌ The page weight implications of additional client libraries and schema information

In the 6 months since we started down the path of GraphQL adoption, we have reached roughly 80% coverage of our REST API with our schema, and shipped multiple new features to production. Overall the feedback from Trello developers has been overwhelmingly positive about both the developer experience and the simplicity of the resulting React components. Our vision for the next 18 months is that the client-side schema will have stabilized and we’ll actively move the schema to the server for the benefit of other consumers.

We’d love to hear your feedback on our approach, and whether you’ve tried something similar, or had success with an alternative approach!