Joola is a distributed data processing, analytics and visualization framework. It is designed as an end-to-end solution for data analytics that lets you connect your databases using a JSON-based mapping of dimensions and metrics. It exposes a RESTful API for querying, displaying and manipulating the data. The Client SDK communicates with the engine to display, visualize and provide insight into the data. Developers can extend the framework in many ways, for example by adding data connectors, authentication plugins and visualizations.

Architecture

Joola's architecture is designed for scale and resilience. A cluster can range from a single all-in-one node running on your workstation to a full-fledged distributed cluster, either on-premise or hosted in the cloud. This scalability is achieved through Joola's agnostic approach to its configuration, discovery and data store providers.

Central configuration

Joola uses a node-based architecture, meaning all nodes must share the same configuration and respond together to configuration changes. When run as a standalone node, Joola can use the local file system to manage configuration. When multiple nodes run on distributed machines, however, they all need to share a single source of configuration.

For this purpose, Joola uses a central configuration store, which at the moment can be either Redis or Zookeeper. We're working on adding Consul and etcd; feel free to give us a hand.

When Joola starts, it tries to locate a configuration file and loads it into the central store if the store hasn't been initialized yet. Additional nodes joining the cluster need at least one setting in order to operate correctly: the central configuration store to use. They connect to that store, load the most current values and respond to any future changes.
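The startup behaviour described above can be sketched roughly as follows. This is a minimal illustration and not Joola's actual implementation: `InMemoryConfigStore` is a hypothetical stand-in for a central store such as Redis or Zookeeper, and the `Node` class and the `port` setting are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Config = Dict[str, object]


class InMemoryConfigStore:
    """Hypothetical stand-in for a central store such as Redis or Zookeeper."""

    def __init__(self) -> None:
        self._config: Optional[Config] = None
        self._watchers: List[Callable[[Config], None]] = []

    def is_initialized(self) -> bool:
        return self._config is not None

    def read(self) -> Config:
        if self._config is None:
            raise RuntimeError("central store has not been initialized")
        return dict(self._config)

    def write(self, config: Config) -> None:
        self._config = dict(config)
        # Notify every subscribed node about the new values.
        for notify in self._watchers:
            notify(dict(self._config))

    def watch(self, callback: Callable[[Config], None]) -> None:
        self._watchers.append(callback)


class Node:
    """Hypothetical cluster node that keeps its config in sync with the store."""

    def __init__(self, name: str, store: InMemoryConfigStore,
                 local_config: Optional[Config] = None) -> None:
        self.name = name
        # Seed the central store only if nobody has initialized it yet.
        if local_config is not None and not store.is_initialized():
            store.write(local_config)
        # Load the most current values, then react to future changes.
        self.config = store.read()
        store.watch(self._on_change)

    def _on_change(self, config: Config) -> None:
        self.config = config


store = InMemoryConfigStore()
first = Node("node-1", store, local_config={"port": 8080})
second = Node("node-2", store)   # only knows which store to use
store.write({"port": 9090})      # a later configuration change
print(first.config, second.config)
```

Both nodes end up holding the same values after the change, which is the property a shared source of configuration is meant to provide.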

Read more about configuration

Node discovery process

As more nodes join the cluster they connect to the central configuration store and learn about other nodes in the cluster.

The implementation of this discovery process differs between providers. However, they all provide the same assurance: nodes register and de-register in an orderly manner, and requests passed back and forth between nodes arrive safely at their destination.

As a side note, Joola does not elect a leader and all nodes participate as equals.
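As a rough illustration of leaderless discovery, the sketch below keeps a registry of nodes that every member can query for its peers. The `NodeRegistry` class, the node ids and the addresses are hypothetical; a real provider would keep this state in the central configuration store and also handle failures, which are left out here.

```python
from typing import Dict, List


class NodeRegistry:
    """Hypothetical registry of cluster members, kept in the central store."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}  # node id -> address

    def register(self, node_id: str, address: str) -> None:
        self._nodes[node_id] = address

    def deregister(self, node_id: str) -> None:
        # Removing an unknown node is a no-op, so shutdown stays orderly.
        self._nodes.pop(node_id, None)

    def peers_of(self, node_id: str) -> List[str]:
        # Every node sees every other node; no entry is marked as a leader.
        return sorted(n for n in self._nodes if n != node_id)


registry = NodeRegistry()
registry.register("node-1", "10.0.0.1:8080")
registry.register("node-2", "10.0.0.2:8080")
registry.register("node-3", "10.0.0.3:8080")

peers_before = registry.peers_of("node-1")
registry.deregister("node-2")
peers_after = registry.peers_of("node-1")
print(peers_before, peers_after)
```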

Data stores

While Joola comes packaged with a simple in-memory database (backed by the file system), that is hardly enough for big data, where a proper data store is required.

Joola is designed to be agnostic to the data store provider and currently supports ElasticSearch and MongoDB. We're working on adding more; feel free to give us a hand.

Message Delivery (MQ)

Joola uses a Message Queue to ensure the framework's accuracy and resilience.

Each cluster node subscribes to a central MQ via the STOMP protocol and pushes/reads events as they arrive at the cluster. Having an MQ helps Joola ensure that all events pushed into the cluster end up being analyzed and processed according to a specific logic.

Another benefit of using MQ is the ability to scale up and reduce the cluster's load by adding more nodes that subscribe to handle incoming events/requests.
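The scaling idea can be illustrated with a small competing-consumers sketch. It uses Python's in-process `queue.Queue` and threads purely as a stand-in for a real broker; it does not speak STOMP, and the node names and event ids are made up for the example.

```python
import queue
import threading
from typing import Dict, List

events: "queue.Queue[object]" = queue.Queue()
processed: Dict[str, List[int]] = {"node-1": [], "node-2": []}
STOP = object()  # sentinel telling a worker to shut down


def worker(node_id: str) -> None:
    """Hypothetical node loop: take the next event and process it."""
    while True:
        event = events.get()
        if event is STOP:
            break
        processed[node_id].append(event)  # "processing" is just recording


threads = [threading.Thread(target=worker, args=(node_id,))
           for node_id in processed]
for t in threads:
    t.start()

for event_id in range(10):   # push ten events into the shared queue
    events.put(event_id)
for _ in threads:            # one stop signal per worker
    events.put(STOP)
for t in threads:
    t.join()

handled = sorted(processed["node-1"] + processed["node-2"])
print(handled)
```

How the events are split between the two workers varies from run to run, but each event is handled exactly once. Adding a third worker would spread the same load over more nodes without changing the producer.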

Joola at scale diagram

Here's a diagram showing how the different players work together in a large-scale deployment.

Terminology

Before diving deep into the system, it's best to become familiar with some basic concepts used throughout the documentation and by the framework.

Joola is a distributed data framework focused on managing analytical processes and data visualization. This means that Joola can be used for a multitude of use cases, ranging from pure visual analytics systems to advanced ad serving.

Workspaces

A workspace is the outermost logical container available. Collections, users, roles and all other metadata reside within workspaces. We use workspaces to keep metadata separate and to allow secure storage and delivery of data based on the workspace configuration.

A common use case for workspaces is to support different environments. Using workspaces we can create, for example, separate containers for development, QA, staging, demo and production. They all share the same Joola framework, but each contains its own configuration. A list of workspaces is represented in JSON, for example:

[
  {
    "key": "_test",
    "description": "Workspace for internal joola tests",
    "name": "joola Framework Tests"
  },
  {
    "key": "_stats",
    "description": "Stores internal statistics of joola",
    "name": "Internal Stats"
  },
  {
    "key": "demo",
    "description": "A starter/playground workspace",
    "name": "Demo Workspace"
  }
]

API documentation on managing Workspaces

Collections

A collection is a data store where documents (the actual bits of data) are persisted into the Joola cache. Collections also contain metadata that describes the stored data, for example which dimensions and metrics a document contains. Collection metadata is represented in JSON and includes all required details on the stored data and its attributes. By referencing this metadata, Joola interprets queries into meaningful insights.

Data is introduced into the collection by pushing events and is later queried and visualized.

Learn more about collections

Permissions

Each API endpoint has a permission assigned to it. Only users who are authorized with the required permission can access the endpoint.

The list of permissions is system owned and cannot be changed by the user/operator.

Roles

A role is a logical entity which holds several permissions and may have a filter associated with it.
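The relationship between endpoints, permissions, roles and users can be sketched as below. All names here (`Role`, `User`, `can_access`, the endpoint paths and the permission strings) are hypothetical and chosen only for illustration; they are not taken from Joola's API, and role filters are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Role:
    name: str
    permissions: FrozenSet[str]


@dataclass
class User:
    username: str
    roles: List[str] = field(default_factory=list)


# Hypothetical, system-owned mapping of endpoint -> required permission.
ENDPOINT_PERMISSIONS: Dict[str, str] = {
    "/query/fetch": "query:fetch",
    "/workspaces/add": "workspaces:add",
}

ROLES: Dict[str, Role] = {
    "reader": Role("reader", frozenset({"query:fetch"})),
    "admin": Role("admin", frozenset({"query:fetch", "workspaces:add"})),
}


def can_access(user: User, endpoint: str) -> bool:
    """Return True if any of the user's roles holds the endpoint's permission."""
    required = ENDPOINT_PERMISSIONS.get(endpoint)
    if required is None:
        return False  # unknown endpoints are denied in this sketch
    return any(required in ROLES[r].permissions
               for r in user.roles if r in ROLES)


analyst = User("analyst", roles=["reader"])
operator = User("operator", roles=["admin"])
print(can_access(analyst, "/query/fetch"),
      can_access(analyst, "/workspaces/add"),
      can_access(operator, "/workspaces/add"))
```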

Learn more about roles

Users

A user is a simple metadata collection describing a user that will access and use the system.

Learn more about users

Dimensions: describe data

A dimension is a descriptive attribute or characteristic of an object that can be given different values. For example, a geographic location could have dimensions called Latitude, Longitude, or City Name. Values for the City Name dimension could be San Francisco, Berlin, or Singapore. Browser, Device and Date are all examples of dimensions that may appear as part of Joola.

Dimensions are system-wide (based on the system/user permissions) and may appear in all of your reports, though you might see different ones depending on the specific report. Use them to help organize, segment, and analyze your data. In some reports, you can add and remove dimensions to see different aspects of your data.

Relationship between dimensions and metrics

Although dimensions and metrics can stand alone, they are usually used in conjunction with one another. The values of dimensions and metrics, and the relationships between those values, are what create meaning in your data. For the greatest insight, dimensions are often associated with one or more metrics.

In this example, the City dimension can be associated with the metrics Population and Area. A ratio metric, like Population Density, could also be created with this data, giving even more insight about each of these cities:

DIMENSION       METRIC                 METRIC
City            Area (in sq. miles)    Population
San Francisco   231                    800,000
Berlin          334                    3.5 million
Singapore       224                    5.2 million
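As a worked example, a Population Density ratio metric can be derived from the two metrics in the table by dividing Population by Area. The dictionary layout below is only an illustration and not a Joola data format.

```python
# Values copied from the table above; area is in square miles.
cities = {
    "San Francisco": {"area": 231, "population": 800_000},
    "Berlin": {"area": 334, "population": 3_500_000},
    "Singapore": {"area": 224, "population": 5_200_000},
}

# Ratio metric: people per square mile, rounded to the nearest whole number.
density = {
    city: round(values["population"] / values["area"])
    for city, values in cities.items()
}

for city, people_per_sq_mile in density.items():
    print(f"{city}: {people_per_sq_mile} people per sq. mile")
```

With these figures the densities come out at roughly 3,463 for San Francisco, 10,479 for Berlin and 23,214 for Singapore.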

Metrics: measure data

Metrics are individual elements of a dimension that can be measured as a sum or a ratio. For example, the dimension City can be associated with a metric like Population, which would have a sum value of all the residents of the specific city. Visits, Pages per Visit, and Average Visit Duration are examples of metrics that may be part of Joola.
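To make the distinction between sum and ratio metrics concrete, here is a minimal sketch that groups a handful of made-up events by a Browser dimension, sums the Visits and Pageviews metrics, and derives Pages per Visit as a ratio. The event fields and values are assumptions for the example and do not reflect how Joola stores or queries documents.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical events, each carrying one dimension and two metrics.
events: List[Dict[str, object]] = [
    {"browser": "Chrome", "visits": 3, "pageviews": 12},
    {"browser": "Firefox", "visits": 4, "pageviews": 10},
    {"browser": "Chrome", "visits": 2, "pageviews": 8},
]

# Sum metrics per dimension value.
totals: Dict[str, Dict[str, int]] = defaultdict(
    lambda: {"visits": 0, "pageviews": 0}
)
for event in events:
    bucket = totals[str(event["browser"])]
    bucket["visits"] += int(event["visits"])
    bucket["pageviews"] += int(event["pageviews"])

# Ratio metric derived from the two sums.
pages_per_visit = {
    browser: sums["pageviews"] / sums["visits"]
    for browser, sums in totals.items()
}
print(dict(totals), pages_per_visit)
```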