Posting Lists and Tablets

Posting lists and tablets are internal storage mechanisms and are generally hidden from users or developers, but logs, core product code, blog posts and discussions about Dgraph may use the terms “posting list” and “tablet.”

Posting lists are a form of inverted index. Posting lists correspond closely to the RDF concept of a graph, where the entire graph is a collection of triples, <subject> <predicate> <object>. In this view, a posting list is a list of all triples that share a <subject>+<predicate> pair.

(Note that in Dgraph docs, we typically use the term “relationship” rather than predicate, but here we will refer to predicates explicitly.)

The posting lists are grouped by predicate into tablets. A tablet therefore has all data for a predicate, for all subject UIDs.

Tablets are the basis for data shards in Dgraph. In the near future, Dgraph may split a single tablet into two shards, but currently every data shard is a single predicate. Every server then hosts and stores a set of tablets. Dgraph will move or allocate different tablets to different servers to achieve balance across a sharded cluster.
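To make the idea of balancing concrete, here is a minimal sketch of spreading whole tablets across servers. It is not Dgraph's actual balancing logic: the greedy "largest tablet to the least-loaded server" rule, the `assign_tablets` function, the tablet sizes, the server names, and the extra `age` and `city` predicates are all assumptions made for illustration. The only property taken from the text above is that each tablet (predicate) lives on exactly one server.

```python
import heapq


def assign_tablets(tablet_sizes, servers):
    """Greedy placement: largest tablet first, onto the least-loaded server.

    Illustrative heuristic only; not Dgraph's real algorithm.
    """
    # Min-heap of (current load, server name).
    heap = [(0, server) for server in servers]
    heapq.heapify(heap)

    placement = {}
    by_size = sorted(tablet_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for predicate, size in by_size:
        load, server = heapq.heappop(heap)
        placement[predicate] = server  # a tablet is never split here
        heapq.heappush(heap, (load + size, server))
    return placement


# Hypothetical tablet sizes (arbitrary units) and server names.
tablet_sizes = {"friend": 80, "name": 30, "age": 20, "city": 40}
placement = assign_tablets(tablet_sizes, ["server-a", "server-b"])

for predicate, server in sorted(placement.items()):
    print(f"{predicate} -> {server}")
```

Because a tablet is the smallest unit being moved in this sketch, one very large predicate can still dominate a single server, which is the motivation for splitting tablets mentioned above.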

Example

If we’re storing friendship relationships among four people, we may have four posting lists represented by the four tables below:

Node     Attribute  Value
person1  friend     person2
person1  friend     person4

Node     Attribute  Value
person2  friend     person1

Node     Attribute  Value
person3  friend     person2
person3  friend     person4

Node     Attribute  Value
person4  friend     person2
person4  friend     person1
person4  friend     person3

The corresponding posting lists would be something like:

person1UID+friend->[person2UID, person4UID]
person2UID+friend->[person1UID]
person3UID+friend->[person2UID, person4UID]
person4UID+friend->[person1UID, person2UID, person3UID]
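The grouping step can be sketched in a few lines of Python. This is an in-memory illustration of the idea, not Dgraph's storage code: the `build_posting_lists` function and the tuple-based triple format are assumptions, and the values are sorted only so that the output matches the listing above.

```python
from collections import defaultdict

# Triples from the four tables above: (subject, predicate, object).
triples = [
    ("person1", "friend", "person2"),
    ("person1", "friend", "person4"),
    ("person2", "friend", "person1"),
    ("person3", "friend", "person2"),
    ("person3", "friend", "person4"),
    ("person4", "friend", "person2"),
    ("person4", "friend", "person1"),
    ("person4", "friend", "person3"),
]


def build_posting_lists(triples):
    """Collect the objects of all triples sharing a (subject, predicate) pair."""
    lists = defaultdict(list)
    for subject, predicate, obj in triples:
        lists[(subject, predicate)].append(obj)
    # Sorting is an illustrative choice, made to match the listing above.
    return {key: sorted(values) for key, values in lists.items()}


posting_lists = build_posting_lists(triples)
for (subject, predicate), objects in posting_lists.items():
    print(f"{subject}+{predicate} -> {objects}")
```

Eight triples collapse into four posting lists, one per subject+predicate pair.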

 

Similarly, posting lists also hold the literal-valued properties of each node. For example, consider the names of people in these three tables:

Node     Attribute  Value
person1  name       “James”
person1  name       “Jimmy”
person1  name       “Jim”

Node     Attribute  Value
person2  name       “Rajiv”

Node     Attribute  Value
person3  name       “Rachel”

The posting lists would look like:

person1UID+name->["James", "Jimmy", "Jim"]
person2UID+name->["Rajiv"]
person3UID+name->["Rachel"]

Note that person4 has no name attribute specified, so that posting list would not exist.

In these examples, two predicates (relations) are defined, and therefore two tablets will exist.

The tablet for the friend predicate will hold all posting lists for all “friend” relationships in the entire graph. The tablet for the name property will hold all posting lists for name in the graph.

If other types such as Pets or Cities also have a name property, their data will be in the same tablet as the Person names.
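As a rough sketch of how posting lists roll up into tablets, the snippet below regroups the example posting lists by predicate. The dictionary layout and the `group_into_tablets` function are assumptions chosen for readability; they say nothing about how Dgraph lays tablets out on disk.

```python
from collections import defaultdict

# Posting lists from the examples above, keyed by (subject UID, predicate).
posting_lists = {
    ("person1UID", "friend"): ["person2UID", "person4UID"],
    ("person2UID", "friend"): ["person1UID"],
    ("person3UID", "friend"): ["person2UID", "person4UID"],
    ("person4UID", "friend"): ["person1UID", "person2UID", "person3UID"],
    ("person1UID", "name"): ["James", "Jimmy", "Jim"],
    ("person2UID", "name"): ["Rajiv"],
    ("person3UID", "name"): ["Rachel"],
}


def group_into_tablets(posting_lists):
    """Group posting lists by predicate: one tablet per predicate."""
    tablets = defaultdict(dict)
    for (subject, predicate), values in posting_lists.items():
        tablets[predicate][subject] = values
    return dict(tablets)


tablets = group_into_tablets(posting_lists)
print(sorted(tablets))                   # ['friend', 'name']
print(len(tablets["friend"]))            # 4 posting lists
print("person4UID" in tablets["name"])   # False: person4 has no name
```

A Pet or City node with a `name` value would simply add another subject key inside the same `name` tablet.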

Performance implications

A key advantage of grouping data into predicate-based shards is that all the data needed for one join lives in one tablet on one server/shard. A single RPC to the machine serving that tablet is therefore sufficient, as documented in How Dgraph Minimizes Network Calls.
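The effect on network calls can be illustrated with a toy simulation. The `TabletServer` class and its `lookup` method are hypothetical stand-ins, not Dgraph APIs; each `lookup` call represents one RPC to the server hosting a tablet. The query below asks for the names of person4's friends.

```python
class TabletServer:
    """Hypothetical stand-in for a server hosting a single tablet."""

    def __init__(self, tablet):
        self.tablet = tablet   # {subject UID: posting list}
        self.rpc_count = 0

    def lookup(self, subjects):
        """Simulate one RPC that returns posting lists for many subjects."""
        self.rpc_count += 1
        return {s: self.tablet.get(s, []) for s in subjects}


friend_server = TabletServer({
    "person1UID": ["person2UID", "person4UID"],
    "person2UID": ["person1UID"],
    "person3UID": ["person2UID", "person4UID"],
    "person4UID": ["person1UID", "person2UID", "person3UID"],
})
name_server = TabletServer({
    "person1UID": ["James", "Jimmy", "Jim"],
    "person2UID": ["Rajiv"],
    "person3UID": ["Rachel"],
})

# Step 1: one call to the friend tablet resolves person4's friends.
friends = friend_server.lookup(["person4UID"])["person4UID"]

# Step 2: one call to the name tablet resolves names for all of them at once.
names = name_server.lookup(friends)

print(names)
print(friend_server.rpc_count, name_server.rpc_count)  # 1 1
```

In this sketch the number of calls grows with the number of predicates traversed, not with the number of friends returned.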

Posting lists are the unit of data access and caching in Dgraph. The underlying key-value store stores and retrieves each posting list as a unit. Queries that access larger posting lists will use more cache and may incur more disk access for uncached posting lists.
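The cost of large posting lists can be seen in a small caching sketch. Everything here is an assumption for illustration: the `PostingListCache` class, its least-recently-used eviction policy, and a capacity measured in total cached values are not taken from Dgraph. The only behavior borrowed from the text is that a posting list is read and cached whole.

```python
from collections import OrderedDict


class PostingListCache:
    """Toy LRU cache that stores whole posting lists (illustrative only)."""

    def __init__(self, store, capacity):
        self.store = store         # stand-in for the key-value store
        self.capacity = capacity   # max total values held across cached lists
        self.cache = OrderedDict()
        self.disk_reads = 0

    def _size(self):
        return sum(len(values) for values in self.cache.values())

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # mark as most recently used
            return self.cache[key]
        self.disk_reads += 1              # cache miss: read the whole list
        values = self.store.get(key, [])
        self.cache[key] = values
        # Evict least recently used lists until the budget is respected.
        while self._size() > self.capacity and len(self.cache) > 1:
            self.cache.popitem(last=False)
        return values


store = {
    ("person1UID", "friend"): ["person2UID", "person4UID"],
    ("person4UID", "friend"): ["person1UID", "person2UID", "person3UID"],
}
cache = PostingListCache(store, capacity=4)

cache.get(("person4UID", "friend"))  # miss: 3 values cached
cache.get(("person1UID", "friend"))  # miss: 5 values > 4, person4's list evicted
cache.get(("person4UID", "friend"))  # miss again: it was evicted
cache.get(("person4UID", "friend"))  # hit
print(cache.disk_reads)
```

With a budget of four values, the two lists cannot stay cached together, so alternating between them keeps going back to the store.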