Common issues in the data fetching layer
Imagine a resolver on the Product.qty field that fetches the current stock quantity of a product from the remote API at https://inventory.example.com/stock/PRODUCT_SKU.
Let's see what happens when we run a query like this on our GraphQL endpoint:
```graphql
{
  category(slug: "pants") {
    name
    products(limit: 10) {
      sku
      name
      qty
    }
  }
}
```
The problem
This query would lead to 12 HTTP requests from the server to the remote data source:
- 1 request to fetch the category information
- 1 request to fetch the products in the category, with their sku and name (in the best case)
- 10 additional requests to fetch each product's qty field

Furthermore, the qty requests can only start after the products response has been received, leading to a network waterfall that delays the GraphQL response.
This problem is also known as the N+1 problem, and DataLoaders are a way to solve it using batching and caching.
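To make the request count concrete, here is a minimal Python simulation of the naive per-field resolver; fetch_stock, resolve_qty, and the returned quantity of 5 are hypothetical stand-ins for a real HTTP client and inventory data:

```python
# fetch_stock stands in for one HTTP round trip to
# https://inventory.example.com/stock/PRODUCT_SKU
call_log = []

def fetch_stock(sku):
    call_log.append(sku)   # record one remote request
    return 5               # pretend stock quantity

def resolve_qty(product):
    # Naive Product.qty resolver: one remote call per product.
    return fetch_stock(product["sku"])

products = [{"sku": f"PANT-{i:02d}"} for i in range(1, 11)]
quantities = [resolve_qty(p) for p in products]
# Together with the category and products requests, that's 12 round trips.
```

Resolving qty for 10 products triggers 10 separate calls, and none of them can start before the products list itself has been fetched.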
Batching
Batching, in this context, is the process of grouping all the data a query requires so that it can be retrieved in fewer, more efficient round trips.
Let's suppose that our category contained 10 products: PANT-01, PANT-02, …, PANT-10. Then, instead of making one HTTP request per product's stock, we could leverage batching to fetch all product inventories with a single remote API call (e.g. https://inventory.example.com/stocks?skus=PANT-01,PANT-02,…,PANT-10).
These kinds of batch endpoints are not always available on remote services. But if they exist, they can avoid many remote API calls and lead to better performance.
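The batching idea can be sketched in a few lines of Python. This is a simplified, synchronous take on the DataLoader pattern (the real DataLoader library is asynchronous and has a different API); StockLoader and fetch_stocks are hypothetical names, and the batch endpoint is the one assumed above:

```python
batch_calls = []

def fetch_stocks(skus):
    # One HTTP call for the whole batch, e.g.
    # GET https://inventory.example.com/stocks?skus=PANT-01,PANT-02,...
    batch_calls.append(list(skus))
    return {sku: 5 for sku in skus}   # pretend quantities

class StockLoader:
    """Collects SKUs during one resolution pass, then fetches them together."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self.pending = []
        self.results = {}

    def load(self, sku):
        # Resolvers call load() instead of fetching immediately;
        # the returned thunk reads the value after dispatch().
        self.pending.append(sku)
        return lambda: self.results[sku]

    def dispatch(self):
        # One remote call for every SKU queued so far.
        self.results = self.batch_fn(self.pending)
        self.pending = []

loader = StockLoader(fetch_stocks)
handles = [loader.load(f"PANT-{i:02d}") for i in range(1, 11)]
loader.dispatch()
quantities = [h() for h in handles]
```

The key design point is the deferred read: each qty resolver registers its SKU and gets a handle back, and the single remote call happens only once all resolvers in the pass have registered.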
Caching
Caching mechanisms are useful for GraphQL resolvers in two ways:
- to prevent re-fetching the same data across different queries (or for different users)
- to prevent re-fetching the same data twice during the same query resolution
The first use case may already be familiar from other systems. In the previous example, caching would let the remote API calls happen when the first user visits the page, and serve the same information from the cache for later visitors. The following GraphQL responses are then faster, and the remote system's load decreases dramatically.
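A minimal sketch of this cross-request cache, assuming an in-process dict and a 60-second TTL (a production setup would more likely use a shared store such as Redis, and the TTL depends on how stale stock counts may be):

```python
import time

cache = {}          # sku -> (expires_at, qty); hypothetical in-process cache
TTL_SECONDS = 60    # assumption: acceptable staleness for stock counts
remote_calls = []

def fetch_stock(sku):
    remote_calls.append(sku)   # one remote HTTP round trip
    return 5

def cached_stock(sku):
    entry = cache.get(sku)
    if entry and entry[0] > time.monotonic():
        return entry[1]                                 # cache hit
    qty = fetch_stock(sku)                              # miss or expired
    cache[sku] = (time.monotonic() + TTL_SECONDS, qty)
    return qty

first = cached_stock("PANT-01")    # first visitor: remote call
second = cached_stock("PANT-01")   # later visitors: served from cache
```

Only the first lookup reaches the remote API; every request within the TTL window is answered from memory.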
The second one is more specific to GraphQL. To understand its benefit, consider the following query:
```graphql
{
  category(slug: "pants") {
    name
    products(limit: 10) {
      sku
      name
      qty
      upsells(limit: 2) {
        sku
        name
        qty
      }
    }
  }
}
```
If the product PANT-01 were an upsell of all the other products in the pants category, the inventory API would be queried again when resolving the upsells.qty field. Query-level caching prevents those extra calls by reusing the response already fetched for products.qty.
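Query-level caching is essentially memoization scoped to a single request. A minimal sketch, with hypothetical names and a simplified call sequence (PANT-01 appearing once more as an upsell after the 10 products.qty resolutions):

```python
remote_calls = []

def fetch_stock(sku):
    remote_calls.append(sku)   # one remote HTTP round trip
    return 5

def make_qty_resolver():
    # One cache per GraphQL request: created when the query starts and
    # discarded when it finishes, so stale data never leaks across requests.
    seen = {}
    def resolve_qty(sku):
        if sku not in seen:
            seen[sku] = fetch_stock(sku)   # first time this SKU appears
        return seen[sku]                   # every later occurrence is free
    return resolve_qty

resolve_qty = make_qty_resolver()
# products.qty for PANT-01..PANT-10, then PANT-01 again via upsells.qty:
skus = [f"PANT-{i:02d}" for i in range(1, 11)] + ["PANT-01"]
quantities = [resolve_qty(sku) for sku in skus]
```

Eleven field resolutions result in only ten remote calls: the repeated PANT-01 lookup is served from the per-request cache. This is the same per-request cache semantics the DataLoader pattern provides, combined there with the batching shown earlier.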