Walkthrough of Azure Cosmos DB Graph (Gremlin)


In this post I walk through the basic tasks for getting started with Azure Cosmos DB Gremlin (graph), and then dive a little into the practical use of graph queries.

This post should also help beginners learn how to use a graph database in general, since Azure Cosmos DB is one of the databases compliant with the popular graph framework "Apache TinkerPop".

Create your database and collection

First, prepare your database.

With Azure Cosmos DB graph, you must provision an account, a database, and a collection, just as you do for an Azure Cosmos DB NoSQL database.
You can create these objects with the API (REST or SDK), but here we use the Azure Portal UI.

When you create an Azure Cosmos DB account in the Azure Portal, select "Gremlin (graph)" as the supported API.

After you've created your account, create your database and collection with Data Explorer.
In the creation form, "Database id" is the identifier for the database, and "Graph id" is the identifier for the collection.

Calling the API

Before calling the APIs, copy your account key from the Azure Portal.

Also copy the Gremlin URI from the Azure Portal.

Now you can develop your application with APIs.

As I mentioned earlier, Azure Cosmos DB is one of the databases compliant with the open source TinkerPop framework, so you can use many existing TinkerPop-compliant tools. (See "Apache TinkerPop" for language drivers and other tools.)
For example, if you program in PHP, you can download and install gremlin-php with the following commands. (You can also use App Service Editor in Azure App Services, so you don't need to prepare a local machine for a trial.)

curl -sS https://getcomposer.org/installer | php
php composer.phar require brightzone/gremlin-php "2.*"

The following simple PHP program retrieves all vertices (nodes) in the graph database. (I explain the Gremlin language later.)

Note that:

  • host is the host name of the Gremlin URI you copied earlier.
  • username is /dbs/{your database name}/colls/{your collection name}.
  • password is the account key you copied earlier.
<?php
require_once('vendor/autoload.php');
use \Brightzone\GremlinDriver\Connection;

$db = new Connection([
  'host' => 'graph01.graphs.azure.com',
  'port' => '443',
  'graph' => 'graph',
  'username' => '/dbs/db01/colls/test01',
  'password' => 'In12qhzXYz...',
  'ssl' => TRUE
]);
$db->open();
$res = $db->send('g.V()');
$db->close();

// output all vertices in the database
var_dump($res);
?>
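
The three connection values listed before the program can all be derived from what you copied out of the portal. Here is a minimal Python sketch of that mapping. The helper name gremlin_settings and the exact URI form "https://graph01.graphs.azure.com:443/" are my assumptions for illustration; check the URI shown in your own portal.

```python
from urllib.parse import urlparse


def gremlin_settings(gremlin_uri, database, collection, account_key):
    """Build driver settings from the values copied from the Azure Portal.

    Hypothetical helper: the key names mirror the PHP example above.
    """
    parsed = urlparse(gremlin_uri)
    return {
        "host": parsed.hostname,      # host name only, without scheme or port
        "port": parsed.port or 443,   # fall back to 443 if the URI omits it
        "username": f"/dbs/{database}/colls/{collection}",
        "password": account_key,
        "ssl": True,
    }


# The URI form below is assumed; the key is the same elided placeholder as above.
settings = gremlin_settings(
    "https://graph01.graphs.azure.com:443/", "db01", "test01", "In12qhzXYz...")
print(settings["host"], settings["username"])
```

Keeping this in one function makes it harder to pass the full URI as the host by mistake, since the driver in the example above takes a bare host name.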

The result is returned in the so-called GraphSON format, shown below. In this PHP example, the result is deserialized into a PHP array with the same structure.

{
  "id": "u001",
  "label": "person",
  "type": "vertex",
  "properties": {
    "firstName": [
      {
        "value": "John"
      }
    ],
    "age": [
      {
        "value": 45
      }
    ]
  }
}
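
Because every property value is wrapped in a list of objects, application code often flattens the vertex before using it. The following is a small Python sketch of one way to do that, using the GraphSON shown above as input. The function name flatten_vertex is hypothetical and not part of any SDK.

```python
import json

# The GraphSON vertex shown above, parsed from its JSON text.
graphson = json.loads("""
{
  "id": "u001",
  "label": "person",
  "type": "vertex",
  "properties": {
    "firstName": [{"value": "John"}],
    "age": [{"value": 45}]
  }
}
""")


def flatten_vertex(vertex):
    """Collapse each property's list of {"value": ...} entries to plain values."""
    flat = {"id": vertex["id"], "label": vertex["label"]}
    for name, entries in vertex.get("properties", {}).items():
        values = [entry["value"] for entry in entries]
        # Keep a list only when the property really holds several values.
        flat[name] = values[0] if len(values) == 1 else values
    return flat


person = flatten_vertex(graphson)
print(person)
```

The list wrapper exists because a vertex property can hold more than one value, so the sketch keeps the list when it sees several entries.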

You can also use the SDK for Azure Cosmos DB graph (.NET, Java, Node.js), which is specific to Azure Cosmos DB.
In particular, if you need Azure Cosmos DB-specific operations (e.g., creating or managing databases and collections), it's better to use this SDK.

For example, the following C# code uses the Azure Cosmos DB Graph SDK. (The Gremlin language is used here too; I explain the details later.)
Note that the endpoint differs from the Gremlin URI used earlier.

using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using Microsoft.Azure.Graphs;
using Newtonsoft.Json;

class Program
{
  static void Main(string[] args)
  {
    using (DocumentClient client = new DocumentClient(
      new Uri("https://graph01.documents.azure.com:443/"),
      "In12qhzXYz..."))
    {
      // get the collection (graph), creating it if it does not exist
      DocumentCollection graph = client.CreateDocumentCollectionIfNotExistsAsync(
        UriFactory.CreateDatabaseUri("db01"),
        new DocumentCollection { Id = "test01" },
        new RequestOptions { OfferThroughput = 1000 }).Result;
      // drop all vertices
      IDocumentQuery<dynamic> query1 =
        client.CreateGremlinQuery<dynamic>(graph, "g.V().drop()");
      dynamic result1 = query1.ExecuteNextAsync().Result;
      Console.WriteLine($"{JsonConvert.SerializeObject(result1)}");
      // add a vertex
      IDocumentQuery<dynamic> query2 =
        client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'u001').property('firstName', 'John')");
      dynamic result2 = query2.ExecuteNextAsync().Result;
      Console.WriteLine($"{JsonConvert.SerializeObject(result2)}");
    }

    Console.WriteLine("Done !");
    Console.ReadLine();
  }
}

Before building your application with the Azure Cosmos DB SDK, you must install the Microsoft.Azure.Graphs package with NuGet. (Dependent libraries such as Microsoft.Azure.DocumentDB are installed in your project along with it.)

Interactive Console and Visualization

As I described above, the TinkerPop framework has various open source utilities contributed by the community.

For example, if you want to run the Gremlin language (queries, etc.) in an interactive console, you can use Gremlin Console.
Please see the official document "Azure Cosmos DB: Create, query, and traverse a graph in the Gremlin console" for details about using Gremlin Console with Azure Cosmos DB.

There are also several libraries and tools in the TinkerPop ecosystem for visualizing Gremlin-compatible graphs.

If you're using Visual Studio and Azure Cosmos DB, the following GitHub sample (written as an ASP.NET web project) is very easy to use for visualizing an Azure Cosmos DB graph.

[GitHub] Azure-Samples / azure-cosmos-db-dotnet-graphexplorer
https://github.com/Azure-Samples/azure-cosmos-db-dotnet-graphexplorer

Gremlin Language

As you've seen in the programming examples above, understanding the Gremlin language (queries, etc.) is essential for practical use.
Let's dive into the Gremlin language, not deeply, but to a practical level of understanding.

First, we simply create vertices (nodes).
The following creates two vertices, "John" and "Mary".

g.addV('employee').property('id', 'u001').property('firstName', 'John').property('age', 44)
g.addV('employee').property('id', 'u002').property('firstName', 'Mary').property('age', 37)

The following creates an edge between the two vertices, John and Mary. (In this sample, the edge means that John is Mary's manager.)
As you can see, you identify the target vertex by the "id" property set earlier.

g.V('u002').addE('manager').to(g.V('u001'))

In this post, we use the following simple structure (vertices and edges) for our subsequent examples.

g.addV('employee').property('id', 'u001').property('firstName', 'John').property('age', 44)
g.addV('employee').property('id', 'u002').property('firstName', 'Mary').property('age', 37)
g.addV('employee').property('id', 'u003').property('firstName', 'Christie').property('age', 30)
g.addV('employee').property('id', 'u004').property('firstName', 'Bob').property('age', 35)
g.addV('employee').property('id', 'u005').property('firstName', 'Susan').property('age', 31)
g.addV('employee').property('id', 'u006').property('firstName', 'Emily').property('age', 29)
g.V('u002').addE('manager').to(g.V('u001'))
g.V('u005').addE('manager').to(g.V('u001'))
g.V('u004').addE('manager').to(g.V('u002'))
g.V('u005').addE('friend').to(g.V('u006'))
g.V('u005').addE('friend').to(g.V('u003'))
g.V('u006').addE('friend').to(g.V('u003'))
g.V('u006').addE('manager').to(g.V('u004'))
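
If you prefer to keep the sample data in one place and generate these statements, a few lines of Python will do it. This is only a sketch: the function names add_vertex and add_edge are mine, and plain string formatting is acceptable here only because the data is fixed sample data, not user input.

```python
# Sample employees as (id, firstName, age).
employees = [
    ("u001", "John", 44), ("u002", "Mary", 37), ("u003", "Christie", 30),
    ("u004", "Bob", 35), ("u005", "Susan", 31), ("u006", "Emily", 29),
]
# Sample relations as (source id, edge label, target id).
relations = [
    ("u002", "manager", "u001"), ("u005", "manager", "u001"),
    ("u004", "manager", "u002"), ("u005", "friend", "u006"),
    ("u005", "friend", "u003"), ("u006", "friend", "u003"),
    ("u006", "manager", "u004"),
]


def add_vertex(vid, first_name, age):
    """Return the Gremlin statement that creates one employee vertex."""
    return (f"g.addV('employee').property('id', '{vid}')"
            f".property('firstName', '{first_name}').property('age', {age})")


def add_edge(src, label, dst):
    """Return the Gremlin statement that creates one edge from src to dst."""
    return f"g.V('{src}').addE('{label}').to(g.V('{dst}'))"


statements = [add_vertex(*e) for e in employees] + [add_edge(*r) for r in relations]
print("\n".join(statements))
```

Each generated string can then be sent one at a time through whichever driver you use, in the same way as g.V() in the PHP example.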

The following example retrieves vertices that match a query condition, here the employees whose age is greater than 40. (To query edges, use g.E() instead of g.V().)

g.V().hasLabel('employee').has('age', gt(40))

As I described above, the result is returned in the GraphSON format, as follows.

{
  "id": "u001",
  "label": "employee",
  "type": "vertex",
  "properties": {
    "firstName": [
      {
        "id": "9a5c0e2a-1249-4e2c-ada2-c9a7f33e26d5",
        "value": "John"
      }
    ],
    "age": [
      {
        "id": "67d681b1-9a24-4090-bac5-be77337ec903",
        "value": 44
      }
    ]
  }
}

You can also use logical operators (and(), or()) in a graph query.
For example, the following returns only "Mary".

g.V().hasLabel('employee').and(has('age', gt(35)), has('age', lt(40)))
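
To double-check which employees these predicates select, here is the same filtering written as plain Python over the sample ages. It is only an in-memory sketch of the semantics, not a call to Cosmos DB.

```python
# Ages of the sample employees defined earlier in this post.
ages = {"John": 44, "Mary": 37, "Christie": 30,
        "Bob": 35, "Susan": 31, "Emily": 29}

# has('age', gt(40)): strictly greater than 40.
older_than_40 = [name for name, age in ages.items() if age > 40]

# and(has('age', gt(35)), has('age', lt(40))): both conditions must hold.
between_35_and_40 = [name for name, age in ages.items() if age > 35 and age < 40]

print(older_than_40)
print(between_35_and_40)
```

Note that gt() and lt() are strict comparisons, which is why "Bob" (age 35) is not returned by the second query.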

Next, we look at traversals, which follow the edges between vertices.
The following simple traversal retrieves Mary's manager. (The result is "John".)

g.V('u002').out('manager').hasLabel('employee')

Note that the following returns the same result. outE() returns the outgoing edge elements, and inV() then gets the vertex that each edge points into. (This traverses the elements explicitly: vertex -> edge -> vertex.)

g.V('u002').outE('manager').inV().hasLabel('employee')

The following retrieves Mary's manager (i.e., "John") and then retrieves all employees who report directly to him.
The result is "Mary" and "Susan".

g.V('u002').out('manager').hasLabel('employee').in('manager').hasLabel('employee')

If you want to exclude paths that contain repeated elements, you can use simplePath() as follows. This returns only "Susan", because the path that ends at "Mary" visits her vertex twice.

g.V('u002').out('manager').hasLabel('employee').in('manager').hasLabel('employee').simplePath()
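
To see why simplePath() drops "Mary", it helps to picture each traverser carrying the path it has walked so far. The following Python sketch imitates that over an in-memory copy of the sample edges. The helper names out and in_ are mine and only mimic the Gremlin steps of the same name.

```python
# Sample edges as (source id, label, target id), copied from the setup above.
EDGES = [
    ("u002", "manager", "u001"), ("u005", "manager", "u001"),
    ("u004", "manager", "u002"), ("u005", "friend", "u006"),
    ("u005", "friend", "u003"), ("u006", "friend", "u003"),
    ("u006", "manager", "u004"),
]
NAMES = {"u001": "John", "u002": "Mary", "u003": "Christie",
         "u004": "Bob", "u005": "Susan", "u006": "Emily"}


def out(vertex, label):
    """Follow outgoing edges, like out(label)."""
    return [dst for src, lab, dst in EDGES if src == vertex and lab == label]


def in_(vertex, label):
    """Follow incoming edges, like in(label)."""
    return [src for src, lab, dst in EDGES if dst == vertex and lab == label]


# Every traverser remembers the path it has walked so far.
paths = [["u002"]]                                               # g.V('u002')
paths = [p + [v] for p in paths for v in out(p[-1], "manager")]  # .out('manager')
paths = [p + [v] for p in paths for v in in_(p[-1], "manager")]  # .in('manager')

reports = [NAMES[p[-1]] for p in paths]
# simplePath() keeps only the paths that never revisit a vertex.
simple_reports = [NAMES[p[-1]] for p in paths if len(set(p)) == len(p)]
print(reports, simple_reports)
```

The path Mary -> John -> Mary contains Mary twice, so it is filtered out, while Mary -> John -> Susan survives.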

Now let's consider traversing the "friend" relation. (See the sample structure defined above.)
As you know, "manager" is a directional relation, but "friend" is a non-directional (undirected) one. That is, if A is a friend of B, then B is also a friend of A.
In such a case, you can use the both() (or bothE()) operation as follows. The following retrieves Emily's friends, and the result is both "Susan" and "Christie".

g.V('u006').both('friend').hasLabel('employee')
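
In the sample data, Emily is the source of one "friend" edge and the target of another, which is exactly the situation both() handles. Here is a minimal in-memory Python sketch of that behavior. The function name both is mine and only mimics the Gremlin step.

```python
# Sample edges as (source id, label, target id), copied from the setup above.
EDGES = [
    ("u002", "manager", "u001"), ("u005", "manager", "u001"),
    ("u004", "manager", "u002"), ("u005", "friend", "u006"),
    ("u005", "friend", "u003"), ("u006", "friend", "u003"),
    ("u006", "manager", "u004"),
]
NAMES = {"u001": "John", "u002": "Mary", "u003": "Christie",
         "u004": "Bob", "u005": "Susan", "u006": "Emily"}


def both(vertex, label):
    """Follow edges in either direction, like both(label)."""
    neighbours = []
    for src, lab, dst in EDGES:
        if lab != label:
            continue
        if src == vertex:
            neighbours.append(dst)   # edge leaves the vertex
        elif dst == vertex:
            neighbours.append(src)   # edge arrives at the vertex
    return neighbours


emily_friends = [NAMES[v] for v in both("u006", "friend")]
print(emily_friends)
```

With out('friend') alone, only "Christie" would be found, because the edge to "Susan" was created in the other direction.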

If you want to traverse until some condition matches, you can use repeat().until().
The following retrieves the reporting path (the relation of direct reports) from "John" to "Emily".

g.V('u001').repeat(in('manager')).until(has('id', 'u006')).path()

The result is the path "John" - "Mary" - "Bob" - "Emily", returned as the following GraphSON.

{
  "labels": [
    ...
  ],
  "objects": [
    {
      "id": "u001",
      "label": "employee",
      "type": "vertex",
      "properties": {
        "firstName": [
          {
            "id": "9a5c0e2a-1249-4e2c-ada2-c9a7f33e26d5",
            "value": "John"
          }
        ],
        "age": [
          {
            "id": "67d681b1-9a24-4090-bac5-be77337ec903",
            "value": 44
          }
        ]
      }
    },
    {
      "id": "u002",
      "label": "employee",
      "type": "vertex",
      "properties": {
        "firstName": [
          {
            "id": "8d3b7a38-5b8e-4614-b2c4-a28306d3a534",
            "value": "Mary"
          }
        ],
        "age": [
          {
            "id": "2b0804e5-58cc-4061-a03d-5a296e7405d9",
            "value": 37
          }
        ]
      }
    },
    {
      "id": "u004",
      "label": "employee",
      "type": "vertex",
      "properties": {
        "firstName": [
          {
            "id": "3b804f2e-0428-402c-aad1-795f692f740b",
            "value": "Bob"
          }
        ],
        "age": [
          {
            "id": "040a1234-8646-4412-9488-47a5af75a7d7",
            "value": 35
          }
        ]
      }
    },
    {
      "id": "u006",
      "label": "employee",
      "type": "vertex",
      "properties": {
        "firstName": [
          {
            "id": "dfb2b624-e145-4a78-b357-5e147c1de7f6",
            "value": "Emily"
          }
        ],
        "age": [
          {
            "id": "f756c2e9-a16d-4959-b9a3-633cf08bcfd7",
            "value": 29
          }
        ]
      }
    }
  ]
}
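
The repeat().until() pattern can be read as "take one step, check the condition, and keep going with whatever has not matched yet". The following Python sketch imitates that loop on an in-memory copy of the sample edges. The function name reporting_paths is hypothetical, and the loop terminates only because the sample management hierarchy has no cycles.

```python
# Sample edges as (source id, label, target id), copied from the setup above.
EDGES = [
    ("u002", "manager", "u001"), ("u005", "manager", "u001"),
    ("u004", "manager", "u002"), ("u005", "friend", "u006"),
    ("u005", "friend", "u003"), ("u006", "friend", "u003"),
    ("u006", "manager", "u004"),
]
NAMES = {"u001": "John", "u002": "Mary", "u003": "Christie",
         "u004": "Bob", "u005": "Susan", "u006": "Emily"}


def in_(vertex, label):
    """Follow incoming edges, like in(label)."""
    return [src for src, lab, dst in EDGES if dst == vertex and lab == label]


def reporting_paths(start, goal):
    """Imitate repeat(in('manager')).until(has('id', goal)).path()."""
    finished, frontier = [], [[start]]
    while frontier:
        next_frontier = []
        for path in frontier:
            for vertex in in_(path[-1], "manager"):      # one repeat() step
                new_path = path + [vertex]
                if vertex == goal:                        # until() condition
                    finished.append(new_path)
                else:
                    next_frontier.append(new_path)
        frontier = next_frontier
    return finished


result = [[NAMES[v] for v in path] for path in reporting_paths("u001", "u006")]
print(result)
```

The branch through "Susan" simply dies out, because nobody reports to her, so only one path is left.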

Finally, let's consider the shortest path from "Emily" to "John". We assume that you can traverse either "manager" (directional) or "friend" (non-directional) edges.

The following returns all possible paths from "Emily" to "John" connected by either kind of edge.

g.V('u006').repeat(union(both('friend').simplePath(), out('manager').simplePath())).until(has('id', 'u001')).path()

The result is the following 3 paths:
Emily - Susan - John
Emily - Christie - Susan - John
Emily - Bob - Mary - John

When you want to count the number of elements in each path, use the count(local) operation.

g.V('u006').repeat(union(both('friend').simplePath(), out('manager').simplePath())).until(has('id', 'u001')).path().count(local)

The result is:
3
4
4

The following groups the paths by their length, so it returns both the counts and the paths.

g.V('u006').repeat(union(both('friend').simplePath(), out('manager').simplePath())).until(has('id', 'u001')).path().group().by(count(local))

The result is as follows.

{
  "3": [
    {
      "labels": [...],
      "objects": [
        {
          "id": "u006",
          ...
        },
        {
          "id": "u005",
          ...
        },
        {
          "id": "u001",
          ...
        }
      ]
    }
  ],
  "4": [
    {
      "labels": [...],
      "objects": [
        {
          "id": "u006",
          ...
        },
        {
          "id": "u003",
          ...
        },
        {
          "id": "u005",
          ...
        },
        {
          "id": "u001",
          ...
        }
      ]
    },
    {
      "labels": [...],
      "objects": [
        {
          "id": "u006",
          ...
        },
        {
          "id": "u004",
          ...
        },
        {
          "id": "u002",
          ...
        },
        {
          "id": "u001",
          ...
        }
      ]
    }
  ]
}
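
To pick out the shortest path from this grouped result, you take the group with the smallest key. The following Python sketch reproduces the whole traversal in memory and then does exactly that. The function names step and all_paths are mine, and the code is an illustration of the semantics rather than a way to query Cosmos DB.

```python
# Sample edges as (source id, label, target id), copied from the setup above.
EDGES = [
    ("u002", "manager", "u001"), ("u005", "manager", "u001"),
    ("u004", "manager", "u002"), ("u005", "friend", "u006"),
    ("u005", "friend", "u003"), ("u006", "friend", "u003"),
    ("u006", "manager", "u004"),
]
NAMES = {"u001": "John", "u002": "Mary", "u003": "Christie",
         "u004": "Bob", "u005": "Susan", "u006": "Emily"}


def step(vertex):
    """Imitate union(both('friend'), out('manager')) from one vertex."""
    friends = [dst if src == vertex else src
               for src, lab, dst in EDGES
               if lab == "friend" and vertex in (src, dst)]
    managers = [dst for src, lab, dst in EDGES
                if lab == "manager" and src == vertex]
    return friends + managers


def all_paths(start, goal):
    """Collect every simple path from start to goal."""
    finished, frontier = [], [[start]]
    while frontier:
        next_frontier = []
        for path in frontier:
            for vertex in step(path[-1]):
                if vertex in path:          # simplePath(): never revisit
                    continue
                new_path = path + [vertex]
                (finished if vertex == goal else next_frontier).append(new_path)
        frontier = next_frontier
    return finished


# group().by(count(local)): key each path by its number of elements.
by_length = {}
for path in all_paths("u006", "u001"):
    by_length.setdefault(len(path), []).append([NAMES[v] for v in path])

shortest = by_length[min(by_length)]
print(by_length)
print(shortest)
```

Here the shortest route from "Emily" to "John" goes through her friend "Susan", not through her own manager chain.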

 

[Reference]

TinkerPop Documentation (including language reference)
http://tinkerpop.apache.org/docs/current/reference/

 
