Common issues in the data fetching layer
Imagine a resolver on the Product.qty field that fetches the current stock quantity of a product from the remote API at https://inventory.example.com/stock/PRODUCT_SKU.
Let's see what happens when we run a query like this on our GraphQL endpoint:
```graphql
{
  category(slug: "pants") {
    name
    products(limit: 10) {
      sku
      name
      qty
    }
  }
}
```
The problem
This query would lead to 12 HTTP requests from the server to the remote data source:
- 1 request to fetch the category information
- 1 request to fetch the products in the category, with their sku and name (in the best case)
- 10 additional requests to fetch each product's qty field

Furthermore, the qty requests can only start after the products response has been received, leading to a network waterfall that delays the GraphQL response.
This problem is also known as the N+1 problem, and DataLoaders are a way to solve it using batching and caching.
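To make the request count concrete, here is a minimal Python simulation of the naive per-field resolver; fetch_stock, resolve_qty, and the returned quantity of 5 are hypothetical stand-ins for a real HTTP client and inventory data:

```python
# fetch_stock stands in for one HTTP round trip to
# https://inventory.example.com/stock/PRODUCT_SKU
call_log = []

def fetch_stock(sku):
    call_log.append(sku)   # record one remote request
    return 5               # pretend stock quantity

def resolve_qty(product):
    # Naive Product.qty resolver: one remote call per product.
    return fetch_stock(product["sku"])

products = [{"sku": f"PANT-{i:02d}"} for i in range(1, 11)]
quantities = [resolve_qty(p) for p in products]
# Together with the category and products requests, that's 12 round trips.
```

Resolving qty for 10 products triggers 10 separate calls, and none of them can start before the products list itself has been fetched.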
Batching
Batching, in this context, is the process of grouping all the data a query requires so that it can be retrieved in fewer, more efficient round trips.
Let's suppose that our category contained 10 products: PANT-01, PANT-02, …, PANT-10. Then, instead of making one HTTP request per product's stock, we could leverage batching to fetch all product inventories with a single remote API call (e.g. https://inventory.example.com/stocks?skus=PANT-01,PANT-02,…,PANT-10).
These kinds of batch endpoints are not always available on remote services. But if they exist, they can avoid many remote API calls and lead to better performance.
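The batching idea can be sketched in a few lines of Python. This is a simplified, synchronous take on the DataLoader pattern (the real DataLoader library is asynchronous and has a different API); StockLoader and fetch_stocks are hypothetical names, and the batch endpoint is the one assumed above:

```python
batch_calls = []

def fetch_stocks(skus):
    # One HTTP call for the whole batch, e.g.
    # GET https://inventory.example.com/stocks?skus=PANT-01,PANT-02,...
    batch_calls.append(list(skus))
    return {sku: 5 for sku in skus}   # pretend quantities

class StockLoader:
    """Collects SKUs during one resolution pass, then fetches them together."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []
        self.results = {}

    def load(self, sku):
        # Resolvers call load() instead of fetching immediately;
        # the returned thunk reads the value after dispatch().
        self.pending.append(sku)
        return lambda: self.results[sku]

    def dispatch(self):
        # One remote call for every SKU queued so far.
        self.results = self.batch_fn(self.pending)
        self.pending = []

loader = StockLoader(fetch_stocks)
handles = [loader.load(f"PANT-{i:02d}") for i in range(1, 11)]
loader.dispatch()
quantities = [h() for h in handles]
```

The key design point is the deferred read: each qty resolver registers its SKU and gets a handle back, and the single remote call happens only once all resolvers in the pass have registered.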
Caching
Caching mechanisms are useful for GraphQL resolvers in two ways:
- to prevent re-fetching the same data across different queries (or for different users)
- to prevent re-fetching the same data twice during the same query resolution
The first use case may already be familiar from other systems. In the previous example, caching would let the remote API calls happen when the first user visits the page, and serve the same information from the cache for later visitors. The following GraphQL responses are then faster, and the remote system's load decreases dramatically.
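A minimal sketch of this cross-request cache, assuming an in-process dict and a 60-second TTL (a production setup would more likely use a shared store such as Redis, and the TTL depends on how stale stock counts may be):

```python
import time

cache = {}          # sku -> (expires_at, qty); hypothetical in-process cache
TTL_SECONDS = 60    # assumption: acceptable staleness for stock counts
remote_calls = []

def fetch_stock(sku):
    remote_calls.append(sku)   # one remote HTTP round trip
    return 5

def cached_stock(sku):
    entry = cache.get(sku)
    if entry and entry[0] > time.monotonic():
        return entry[1]                                 # cache hit
    qty = fetch_stock(sku)                              # miss or expired
    cache[sku] = (time.monotonic() + TTL_SECONDS, qty)
    return qty

first = cached_stock("PANT-01")    # first visitor: remote call
second = cached_stock("PANT-01")   # later visitors: served from cache
```

Only the first lookup reaches the remote API; every request within the TTL window is answered from memory.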
The second one is more specific to GraphQL. To understand its benefit, consider the following query:
```graphql
{
  category(slug: "pants") {
    name
    products(limit: 10) {
      sku
      name
      qty
      upsells(limit: 2) {
        sku
        name
        qty
      }
    }
  }
}
```
If the product PANT-01 were an upsell of all the other products in the pants category, the inventory API would be queried again when resolving the upsells.qty field. Query-level caching prevents those extra calls by reusing the response already fetched for products.qty.
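Query-level caching is essentially memoization scoped to a single request. A minimal sketch, with hypothetical names and a simplified call sequence (PANT-01 appearing once more as an upsell after the 10 products.qty resolutions):

```python
remote_calls = []

def fetch_stock(sku):
    remote_calls.append(sku)   # one remote HTTP round trip
    return 5

def make_qty_resolver():
    # One cache per GraphQL request: created when the query starts and
    # discarded when it finishes, so stale data never leaks across requests.
    seen = {}
    def resolve_qty(sku):
        if sku not in seen:
            seen[sku] = fetch_stock(sku)   # first time this SKU appears
        return seen[sku]                   # every later occurrence is free
    return resolve_qty

resolve_qty = make_qty_resolver()
# products.qty for PANT-01..PANT-10, then PANT-01 again via upsells.qty:
skus = [f"PANT-{i:02d}" for i in range(1, 11)] + ["PANT-01"]
quantities = [resolve_qty(sku) for sku in skus]
```

Eleven field resolutions result in only ten remote calls: the repeated PANT-01 lookup is served from the per-request cache. This is the same per-request cache semantics the DataLoader pattern provides, combined there with the batching shown earlier.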