Skip to main content

Normalized Stores

It's extremely common in a web application for a change made in one place to need to be reflected in other places in order to maintain consistency for the viewer. For example, imagine you're viewing the posts on the profile page for your friend Rebecca:

Rebecca

Gotta get down on Friday!

Rebecca

Heading to the studio now, just gotta grab a quick Starbucks first...

Friend1: Get their Pumpkin Spiced Latte!

Friend2: Sing your heart out!

Show more…

Rebecca

The new album is coming along nicely!

After you load the page, but before you click "Show more", Rebecca decides to change her name to "Becky". You click "Show more" and new comments render, including one from Becky. Seemingly magically, the entire page updates to render "Becky" everywhere:

Becky

Gotta get down on Friday!

Becky

Heading to the studio now, just gotta grab a quick Starbucks first...

Friend1: Get their Pumpkin Spiced Latte!

Friend2: Sing your heart out!

Becky: Pumpkin Spiced Latte FTW!

Friend2: I prefer the Cinnamon one.

Show more…

Becky

The new album is coming along nicely!

Is this because all the data for the entire page was refetched? No! It's due to the combined magics of object identification and normalized stores.

Object identification

One of the common patterns you'll see in GraphQL schemas is the GraphQL Global Object Identification Specification. This specification gives a standard way of identifying your entities (users, posts, comments, etc) via an id: ID! field. Each entity that supports this may declare so by its type implementing the Node interface. With this, you can uniquely reference any entity of any type using a single, stable, unique identifier. This pattern is used by the Relay GraphQL client among others.

This isn't the only way of identifying entities though; for example Apollo Client lets you identify an entity via a combination of attributes using the dataIdFromObject() callback (there are other options in newer versions of Apollo Client also). This defaults to combining the __typename and id field of the object into a unique identifier, which works assuming your entities have an id field, and that you always fetch both the id and the __typename. Apollo Client will automatically ask for __typename even if you forget, but it's up to you to ask for id.

note

Some other GraphQL clients, such as urql, also support normalized caching. Please see your GraphQL client's documentation for how to enable such a feature if it exists.

In this example, when you fetch the profile for Rebecca you find that the entity representing her is the User with ID U9DB7. Later, when you fetch the comments, you see the commenter 'Becky' has this same ID, and so you can determine that Rebecca must have changed her name to Becky. But going through and manually updating this everywhere that you've already fetched would be a lot of work; what you need is a normalized store.

Normalized storage

A normalized store stores the entire information fetched across all of your GraphQL requests (queries, mutations, subscriptions) in a format that means that when an entity is updated in one place, it's automatically updated everywhere.

As different normalized stores work in different ways, this article covers an extremely rough approximation; in order to keep things simple, the following example assumes the store is configured (and the schema is written) such that id is always a globally unique identifier.

As you know the globally unique identifier for each entity, and you know that the value of a node in the graph is independent of the path through which it was fetched, you can break a response up into its constituent entities, store these entities and the references between them into a "normalized store", and then later reconstitute the query response from the store when needed.

Example 1: Profile Fetch

For example, imagine the initial profile fetch is:

query ProfileFetch($id: ID!) {
user(id: $id) {
id
name
tagline
posts(first: 2) {
id
author {
id
name
}
body
likeCount
commentCount
comments(first: 2) {
id
author {
id
name
}
body
}
}
}
}

The response to this query is something like this:

{
"user": {
"id": "U9DB7",
"name": "Rebecca",
"tagline": "Gotta get down on Friday",
"posts": [
{
"id": "P3Q41",
"author": {
"id": "U9DB7",
"name": "Rebecca"
},
"body": "Heading to the studio now, just gotta grab a quick Starbucks first...",
"likeCount": 30,
"commentCount": 12,
"comments": [
{
"id": "C2PL1",
"author": {
"id": "U6EA1",
"name": "Friend1"
},
"body": "Get their Pumpkin Spiced Latte!"
},
{
"id": "C3YZ3",
"author": {
"id": "U7BZ3",
"name": "Friend2"
},
"body": "Sing your heart out!"
}
]
},
{
"id": "P3Q43",
"author": {
"id": "U9DB7",
"name": "Rebecca"
},
"body": "The new album is coming along nicely!",
"likeCount": 42,
"commentCount": 0,
"comments": []
}
]
}
}
note

I've highlighted redundant data about the same entity (Rebecca) referenced in multiple positions.

The goal is to write this to a normalized store, wherein there's a single entry for each entity in the result. You do this by reading through the results layer by layer, and when you see an entity you extract it, store it, and replace the original with a reference to the entity's id.

info

This is similar to "normalization" in databases: rather than storing repeated author data in a posts table, you instead store the author_id into posts and have a separate table for authors where you can look up the data for each author by their unique id.

The very first entity is the root Query object itself; it can be identified as simply Query:

{
"Query": {
/* ... */
}
}

There's only one field in this, but critically that field accepts arguments which would likely change the result of the field, so you must factor the arguments into the identity of the field; thus you can replace user with user(id:'U9DB7'):

{
"Query": {
"user(id:'U9DB7')": {
/* ... */
}
}
}

You can see from the data inside of this user field that it is an entity (it has an id, the entity identifier), so you can replace it with a reference. There are many ways of building a normalized store, but this one will leverage the fact that a response key in GraphQL can never contain a $ symbol and thus use an object with a $ref key to indicate a reference.

{
"Query": {
"user(id:'U9DB7')": {
"$ref": "U9DB7"
}
},
"U9DB7": {
/* ... */
}
}
Don't use this as a basis for your own normalized store

If you are actually looking to build your own normalized store, this isn't safe since custom scalars such as JSON are allowed to have $ref keys. If you are writing your normalized store in JS then you could use a Symbol or a class instance for references as examples of things that cannot be constructed from JSON.

Another thing: since id is an arbitrary string, the schema designer could set the id of a User record to Query which would cause issues with the design here. This is easy to solve by either prepending all identifiers (except the root operation type identifiers you choose yourself) with a specific symbol e.g. @ or $.

The value of the user with id U9DB7 can now be written into the normalized store, and the next entity can be considered:

{
"Query": {
"user(id:'U9DB7')": {
"$ref": "U9DB7"
}
},
"U9DB7": {
"id": "U9DB7",
"name": "Rebecca",
"tagline": "Gotta get down on Friday",
"posts(first:2)": [{ "$ref": "P3Q41" }, { "$ref": "P3Q43" }]
},
"P3Q41": {
/* ... */
},
"P3Q43": {
/* ... */
}
}

Continuing like this, the completed normalized store will end up something like:

{
"Query": {
"user(id:'U9DB7')": {
"$ref": "U9DB7"
}
},
"U9DB7": {
"id": "U9DB7",
"name": "Rebecca",
"tagline": "Gotta get down on Friday",
"posts(first:2)": [
{ "$ref": "P3Q41" },
{ "$ref": "P3Q43" },
]
},
"P3Q41": {
"id": "P3Q41",
"author": { "$ref": "U9DB7" }
"body": "Heading to the studio now, just gotta grab a quick Starbucks first...",
"likeCount": 30,
"commentCount": 12,
"comments(first:2)": [
{ "$ref": "C2PL1" },
{ "$ref": "C3YZ3" }
]
},
"C2PL1": {
"id": "C2PL1",
"author": { "$ref": "U6EA1" },
"body": "Get their Pumpkin Spiced Latte!"
},
"C3YZ3": {
"id": "C3YZ3",
"author": { "$ref": "U7BZ3" },
"body": "Sing your heart out!"
},
"U6EA1": {
"id": "U6EA1",
"name": "Friend1"
},
"U7BZ3": {
"id": "U7BZ3",
"name": "Friend2"
},
"P3Q43": {
"id": "P3Q43",
"author": { "$ref": "U9DB7" }
"body": "The new album is coming along nicely!",
"likeCount": 42,
"commentCount": 0,
"comments(first:2)": []
}
}

When you come to render this data in your application, you can regenerate the query response from the normalized cache by walking through the GraphQL document and extracting the specified entity for each field/arguments combo referenced.

Example 2: Fetching more comments

To fetch more comments, you can issue a request such as:

query LoadMoreComments($postId: ID!, $offset: Int!) {
post(id: $postId) {
id
comments(first: 2, offset: $offset) {
id
author {
id
name
}
body
}
}
}

using variables {"postId": "P3Q41", "offset": 2} and receive a response like:

{
"post": {
"id": "P3Q41",
"comments": [
{
"id": "C4NR1",
"author": {
"id": "U9DB7",
"name": "Becky"
},
"body": "Pumpkin Spiced Latte FTW!"
},
{
"id": "C4NR6",
"author": {
"id": "U7BZ3",
"name": "Friend2"
},
"body": "I prefer the Cinnamon one."
}
]
}
}

Note that the name has changed for the entity U9DB7 (previously: Rebecca).

The normalized result for this query is something like:

{
"Query": {
"post(id:'P3Q41')": {
"$ref": "P3Q41"
}
},
"P3Q41": {
"id": "P3Q41",
"comments(first:2,offset:2)": [{ "$ref": "C4NR1" }, { "$ref": "C4NR6" }]
},
"C4NR1": {
"id": "C4NR1",
"author": { "$ref": "U9DB7" },
"body": "Pumpkin Spiced Latte FTW!"
},
"C4NR6": {
"id": "C4NR6",
"author": { "$ref": "U7BZ3" },
"body": "I prefer the Cinnamon one."
},
"U9DB7": {
"id": "U9DB7",
"name": "Becky"
},
"U7BZ3": {
"id": "U7BZ3",
"name": "Friend2"
}
}

Merging normalized stores

As you merge the above store into the main normalized store, you will overwrite stale keys, resulting in:

 {
"Query": {
"user(id:'U9DB7')": {
"$ref": "U9DB7"
+ },
+ "post(id:'P3Q41')": {
+ "$ref": "P3Q41"
}
},
"U9DB7": {
"id": "U9DB7",
- "name": "Rebecca",
+ "name": "Becky",
"tagline": "Gotta get down on Friday",
"posts(first:2)": [
{ "$ref": "P3Q41" },
{ "$ref": "P3Q43" },
]
},
"P3Q41": {
"id": "P3Q41",
"author": { "$ref": "U9DB7" }
"body": "Heading to the studio now, just gotta grab a quick Starbucks first...",
"likeCount": 30,
"commentCount": 12,
+ "comments(first:2,offset:2)": [ { "$ref": "C4NR1" }, { "$ref": "C4NR6" } ],
"comments(first:2)": [
{ "$ref": "C2PL1" },
{ "$ref": "C3YZ3" }
]
},
"C2PL1": {
"id": "C2PL1",
"author": { "$ref": "U6EA1" },
"body": "Get their Pumpkin Spiced Latte!"
},
"C3YZ3": {
"id": "C3YZ3",
"author": { "$ref": "U7BZ3" },
"body": "Sing your heart out!"
},
+ "C4NR1": {
+ "id": "C4NR1",
+ "author": { "$ref": "U9DB7" },
+ "body": "Pumpkin Spiced Latte FTW!"
+ },
+ "C4NR6": {
+ "id": "C4NR6",
+ "author": { "$ref": "U7BZ3" },
+ "body": "I prefer the Cinnamon one."
+ },
"U6EA1": {
"id": "U6EA1",
"name": "Friend1"
},
"U7BZ3": {
"id": "U7BZ3",
"name": "Friend2"
},
"P3Q43": {
"id": "P3Q43",
"author": { "$ref": "U9DB7" }
"body": "The new album is coming along nicely!",
"likeCount": 42,
"commentCount": 0,
"comments(first:2)": []
}
}

Now when the first query re-renders, it will re-read the normalized data from the store and pick up Becky's new name, and render it everywhere, as we saw above in the introduction.

Mutations

Normalized caches really shine when used with mutations; it's generally the case that when you perform a mutation you're modifying data that is related to what you have rendered (updating a record, adding an entry to or removing an entry from a list, etc), and so having the results of that mutation display consistently throughout your application makes for a consistent experience for the user - by fetching the resulting data on the mutation payload and updating it in the normalized store, this can happen automatically for you.

This experience can be enhanced further with the usage of the "optimistic updates" technique: the application can guess what the server is going to say, write the guess to the normalized store, and instantly re-render showing the user the new data everywhere in a consistent fashion. Assuming everything went right, this is a very pleasant low-latency experience for the user. If something goes wrong, you need to roll back this change and inform the user of the error... so it's generally best to use it only in situations where the happy path is the likely path!

Subscriptions

Subscriptions also benefit significantly from normalized caches; in fact you don't even need to "render" your subscription results anywhere! If your subscription queries data that you have already fetched and rendered via a regular query, then when the subscription yields data, it can automatically be written to the normalized store, which will trigger the application to re-render with the new values everywhere. GraphQL's real-time features can be fantastic when backed with a normalized store!