Data store best practices

The API BaaS data store is backed by Cassandra, an open source distributed DBMS. Cassandra is specifically designed to support applications that need flexibility and high scalability, particularly web and mobile applications. To get the most out of your API BaaS application, you should optimize your data access with this kind of database in mind.

Put your data in as you'll want to get it out

The best practices described here are all related to the theme of putting your data in the way you’ll want to get it out. You’ll model your data with your likely read requests in mind rather than by modeling around the structure of the data itself (as you might with a relational database). In many cases, you can avoid using queries that are inefficient for this kind of database.

You can use the following techniques to optimize data store access:

  • Denormalize and duplicate.

    By creating a data model that strategically denormalizes and duplicates data, you can avoid costly queries. In other words, you model the data so that all of the data for a given request is all in one place rather than scattered in a way that a query must account for.

  • Create direct paths to the data you’ll want.

    You can optimize your app’s performance by connecting entities your code retrieves most often. With connections, you can avoid some queries that might add complexity and reduce performance.

Best practice: Denormalize for pre-computed query results

If you’re familiar with designing relational databases, you’re probably used to normalizing the structure of your data as much as possible. When normalizing an RDBMS data model, you minimize redundancy by ensuring that a column from one table is duplicated only once in any other table, and only when necessary to create primary/secondary key relationships. You then retrieve data that spans tables by joining them with an SQL query.

In contrast, with the API BaaS data store you’ll get better performance by strategically denormalizing and duplicating data. When you denormalize, you combine data from what might (in a relational model) have been separate tables. When duplicating, you intentionally maintain copies of certain entity properties across multiple entities. By denormalizing and duplicating, you can collect the results of requests into a single entity rather than relying on a query.

Part of designing your data model should include identifying the queries your client will make. Then in your data model design, you capture the results of those queries in advance by writing that data into entities of one collection you can read from later.

Getting review data the relational way

For example, take a closer look at the relational case through a product database. Imagine you want to present a list of all reviews for a product. When a user chooses a review from the list, you’ll show them a view that includes the review’s title, body, and rating; the product it describes; and who wrote the review. That way, your user will be able to click through to more information about the product or reviewing user.

In a relational database, you’d likely have separate tables for the review-specific information and a rating; for the product; and for the user. Your reviews table would include secondary keys with which to join reviews to the users and products tables. Maybe something like this:

To get the data you need, your query might look like the SQL query below. By getting UUIDs for the user and product into your client code, you’re able to give the user a way to get user and product information from the review.

SELECT review.title, review.body, review.rating, review.uuid,, user.uuid,, product.uuid 
    FROM reviews 
    INNER JOIN users 
    ON users.uuid = reviews.authorId 
    INNER JOIN products 
    ON products.uuid = reviews.productId 
    WHERE reviews.uuid = <uuid_for_selected_review> 

But due to Cassandra’s distributed, high-scale design, a join such as this wouldn’t work. So in the API BaaS, you’d get the data by first storing it all in one place.

Using a denormalized model to store (then retrieve) data

In API BaaS, a more efficient way to get the same result would start by including related user and product data with the review data. This would give you a single place to get all the data you'll show.

The following shows how that model might look as entities in the API BaaS. (The users and products entities are included here to remind you that they exist, but they aren’t actually used in this denormalization example.)

This use case assumes that your user and product data are already in the data store. In the following API request code, you’re just adding a new review written by a user about a particular product. The JSON body is your new review entity (the JSON here is non-entitized to make it readable).

POST https://<baas_host_name>/my_org/my_app/reviews -d { 
    "title" : "Tempted to climb in myself.", 
    "body" : "I mean, who doesn't love a bouncy castle? The kids love it!",
    "rating" : 3, 
    "authorName" : "Jennie", 
    "authorUUID" : <author_uuid>, 
    "authorImage" : "http://<path_to_image>.png", 
    "productUUID" : <product_uuid>,
    "productImage" : "http://<path_to_image>.jpg", 
    "productName" : "Magic Castle Bounce House Inflatable Bouncer" 

Notice that you’re adding user and product data (which your client code would have at hand when the user posts the review) when you’re adding the review-specific data.

Retrieving all the data from one place

You’d get the review, including the subset of product and user data you need right away, by making one API call such as the following (note there’s no SQL-like query string needed):

GET http://<baas_host_name>/my_org/my_app/reviews/<review_uuid> 

Your result might look like this.

    "action" : "get", 
    "application" : "<app_uuid>", 
    "params" : { }, 
    "path" : "/reviews", 
    "uri" : "https://<baas_host_name>/my_org/my_app/reviews", 
    "entities" : [ {
        "uuid" : "<review_uuid>", 
        "type" : "review", 
        "created" : 1395410364673, 
        "modified" : 1395410364673, 
        "authorName" : "Jennie", 
        "authorImage" : "<path_to_image>.png", 
        "authorUUID" : "<author_uuid>", 
        "body" : "I mean, who doesn't love a bouncy castle? The kids love it!", 
        "metadata" : { 
            "path" : "/reviews/<review_uuid>" 
        "productImage" : "http://<path_to_image>.jpg", 
        "productName" : "Magic Castle Bounce House Inflatable Bouncer", 
        "productUUID" : "<product_uuid>", 
        "rating" : 3, 
        "title" : "Tempted to climb in myself." 
    } ], 
    "timestamp" : 1395764951934, 
    "duration" : 16, 
    "organization" : "my_org", 
    "applicationName" : "my_app" 

This gives you, in one request, all the review information you’d planned on presenting to your app’s users for a single view. Your client code could retrieve the review entity in the result, along with the product and user data, then present it all as a review.

Best practice: Connect entities to simplify scoped requests

Another way to streamline your requests is to connect entities by using the built-in connections feature. A connection can replace a simple query WHERE clause.

Getting back to the review example, imagine you want to retrieve all of the reviews written by a particular user. In the relational example, you might have an SQL query such as this:

SELECT * FROM reviews WHERE authorId = <user_uuid>; 

Even in the API BaaS, you could use a similar syntax in a query string appended to an API path. Working from the review entity model in the preceding example, that might look like this (though yours would likely entitize the spaces):

GET http://<baas_host_name>/my_org/my_app/reviews?ql=select * where

An alternative with connections

If the preceding query is an API call you’re going to be making often, there’s a better way. Instead, create a connection between the review and the user when your code creates the review. You can connect entities with a verb that describes their relationship to one another. With connections in place, you can later retrieve data quickly and with a simple syntax.

For more detail about connections, see Entity connections.

For example, when creating a connected entity (such as the review entity here), you can at the same time create other connections to connect the product to the new review, then connect the product to its reviewer.

Use the verb wrote to connect the review being created with the user jennie.

POST /users/jennie/wrote/reviews {<review_entity_json>} 
POST /products/<reviewed_product_uuid>/reviewedIn/reviews/<new_review_uuid> 
POST /products/<reviewed_product_uuid>/reviewedBy/users/jennie 

Having created these connections for each review you post, in addition to getting the review the user wrote, you could later also:

  • Get the reviews for a product:
    GET /products/<reviewed_product_uuid>/reviewedIn/reviews 
  • Get the users who reviewed the product:
    GET /products/<reviewed_product_uuid>/reviewedBy/users