Carol is a data platform powered by machine learning. Carol provides a wide range of possibilities through:
- Data platform: the capability to get data from anywhere and ensure data quality.
- Intelligent apps: the capability to develop any application and deploy it on top of Carol Platform.
- Carol Assistant: the capability to add the Carol Assistant to your application.
Carol has all the concepts and features defined by an MDM (Master Data Management) tool. Keep in mind that it is normal for Carol to handle more than one data source and centralize all the data, ensuring that the data has sufficient quality.
Carol has a set of concepts that are important to understand clearly in order to get the full power of the platform.
The most recent version of the platform added new concepts, as follows:
Organization: This is a way to group multiple environments (previously called tenants). In an Organization, the Organization Administrator is able to manage all environments and resources related to the whole organization. The organization's name is the subdomain in the URL. For example, in totvs.carol.ai, the organization is totvs.
Organization Admin: This is the role in an Organization responsible for maintaining Environments and all resources related to the Organization.
Organization User: This user belongs to the organization, meaning that the user has access to one or more environments inside the Organization.
Environment: Previously called a Tenant. The environment stores the data models, connectors, data, named queries and all other resources provided by the Carol Platform. You will see more details in the following documentation. The environment is the path segment right after the base URL. For example, in totvs.carol.ai/rh, the environment is rh.
Environment Admin: This is the user responsible for managing the whole Environment, including data and resources.
Data Access Level: This is the way to restrict access to data inside an organization. A Data Access Level can define dynamic rules and manage several users. Restrictions apply both to operations (deleting, creating and filtering data) and to data (golden records).
The most important concepts introduced by Carol are described below.
Data Model: A data model follows the same idea as a table in a relational database. The data model supports columns with specific data types (Integer, String, Boolean, Date, etc.) and stores all data related to the Data Model. In addition, the data model supports survivorship rules, rejection rules, flag rules and other features that are described below.
Data Model Type: Each Data Model should be categorized into one of five data model types:
- Organization: data model related to companies. For instance: company, carriers, transportation companies, etc.
- Person: data model related to a person. For instance: student, employee, etc.
- Product: data model that describes products. For instance: products, services, etc.
- Transaction: data model related to transactional data, normally related to a specific date/time. For instance: receipts, tickets, events, orders, etc.
- Location: data model related to location.
Golden Record: The Golden Record is the final result of combining data from a set of connectors. Each connector contributes a set of Staging Records, and the staging records are mapped to a specific Data Model. The records stored in the data model (after all Data Model validation) are called Golden Records.
Merge Rules: Merge rules play the same role as a primary key in a relational database: they are used to identify the same record. If a record matches an existing record on the fields specified in the merge rules, the records are merged following the Survivorship Rules. This concept also allows the user to create a rule to find records that are potentially the same (potential merges).
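As an illustration only (this is not Carol's API), the merge-rule idea can be sketched in Python; the `taxid` field below is a hypothetical merge-rule field:

```python
# Illustrative model of a merge rule: records whose values match on every
# merge-rule field are treated as the same entity and grouped for merging.
from collections import defaultdict

def apply_merge_rule(records, merge_fields):
    """Group records whose values match on every merge-rule field."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[f] for f in merge_fields)
        groups[key].append(record)
    return list(groups.values())

records = [
    {"taxid": "111", "name": "Ana"},
    {"taxid": "222", "name": "Bruno"},
    {"taxid": "111", "name": "Ana Silva"},  # potential merge with the first record
]
groups = apply_merge_rule(records, ["taxid"])
# Two groups: taxid "111" has two candidate records to merge, "222" has one.
```

In Carol the actual merge is then resolved by the Survivorship Rules; this sketch only shows how a merge rule identifies candidates.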
Rejection Rules: To help ensure data quality, the data model supports a set of rejection rules. Rejection rules specify the conditions under which records are rejected. The user can explore all rejected records to fix the data that does not meet the quality rules. A rejected record does not become a Golden Record on the platform; it must first be fixed so that it passes the Rejection Rules.
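A minimal sketch of this idea, assuming rejection rules can be modeled as predicates over a record (this is not Carol's actual rule engine):

```python
# Illustrative model of rejection rules: records failing any rule are kept
# aside so the user can inspect and fix them before they become Golden Records.
def apply_rejection_rules(records, rules):
    accepted, rejected = [], []
    for record in records:
        if all(rule(record) for rule in rules):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected

rules = [lambda r: bool(r.get("email"))]  # hypothetical rule: reject records without an email
accepted, rejected = apply_rejection_rules(
    [{"email": "a@x.com"}, {"email": ""}], rules
)
# accepted -> [{"email": "a@x.com"}], rejected -> [{"email": ""}]
```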
Flag Rules: Flag rules are used to identify records that have some quality problem but are still good enough to be considered Golden Records. This means that Carol will process the record while making it clear that something in the record could be improved.
Skip Rules: Skip rules define whether a value on the record is good enough to be set on the Golden Record. In some cases a record is not rejected by the Rejection Rules, but you still want a rule that discards a specific value when it matches certain conditions.
Survivorship Rules: Survivorship rules tell Carol how to create the Golden Record when two golden records match the same merge rule. Some of the available rules are:
- Recency: keeps the newest value.
- Oldest: keeps the oldest value.
- Frequency: keeps the most frequent value.
- Set the survivor value from the data coming from a specific connector; normally useful when you trust a connector for a specific kind of data (such as addresses).
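The first three strategies can be sketched in Python (an illustrative model, not Carol's implementation; each field value is represented here as a (timestamp, value) pair):

```python
# Illustrative survivorship strategies for choosing the single surviving
# value of one field when several golden-record candidates are merged.
from collections import Counter

def survive_recency(values):
    # keeps the newest value (highest timestamp)
    return max(values, key=lambda tv: tv[0])[1]

def survive_oldest(values):
    # keeps the oldest value (lowest timestamp)
    return min(values, key=lambda tv: tv[0])[1]

def survive_frequency(values):
    # keeps the most frequent value
    counts = Counter(v for _, v in values)
    return counts.most_common(1)[0][0]

values = [(1, "RJ"), (2, "RJ"), (3, "SP")]
# recency -> "SP", oldest -> "RJ", frequency -> "RJ"
```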
Relationship: This is the way to connect two different Data Models and create a relation between them. An example is Customer and Order.
Geocoding: This is another layer of data quality for address fields. If geocoding is enabled for your tenant and fields, Carol will normalize the address by calling external services and, as a consequence, will store the geocoding result, allowing your application to work with latitude and longitude values.
Connectors are the bridge between the external world and Carol. Carol provides a set of connectors that automate the process of capturing data.
Some sample connectors: TOTVS products, Carol Connect, Dropbox, Facebook, Twitter, File, Salesforce and RSS; in addition, Carol accepts any kind of data through its REST API service.
Some concepts related to Connectors:
Staging Table: A table/container with fields, used to store data. Each field has a data type and follows a schema. Staging tables are created inside a connector and store the data before it is transformed into Golden Records. If the connector sends data through REST APIs, the staging table is created through REST APIs as well.
Staging Table Schema: The staging table definition, which is basically all the columns and the data type of each column. It is used to annotate and validate the data being sent.
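A minimal sketch of schema validation, assuming a schema can be modeled as a map from column name to expected type (this is not Carol's actual validation engine):

```python
# Illustrative schema check: each incoming staging record must have every
# schema column, and each value must match the column's declared type.
schema = {"name": str, "age": int}  # hypothetical staging table schema

def validate(record, schema):
    errors = []
    for column, expected_type in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            errors.append(f"wrong type for {column}")
    return errors

ok = validate({"name": "Ana", "age": 30}, schema)     # -> []
bad = validate({"name": "Ana", "age": "30"}, schema)  # -> ["wrong type for age"]
```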
Staging Record: Staging Record is the record inside the staging table.
Identifier: The identifier is the primary key for the data in the staging table. If the connector receives a record with an existing identifier, it replaces the information in the staging record.
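This replace-on-existing-identifier behavior is essentially an upsert. A sketch, with the staging table modeled as a dictionary keyed by the identifier (not Carol's actual storage; the `id` field is a hypothetical identifier):

```python
# Illustrative upsert: a record with an existing identifier replaces the
# previous staging record instead of creating a duplicate.
staging_table = {}

def upsert(record, identifier_field="id"):
    staging_table[record[identifier_field]] = record

upsert({"id": 1, "name": "Ana"})
upsert({"id": 1, "name": "Ana Silva"})  # same identifier: replaces the record
# staging_table now holds a single record for id 1, with name "Ana Silva".
```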
Staging Mapping: The mapping between a Staging Table and a Data Model. There is no limit to the number of mappings a data model can have. The cleansing rules are defined together with the mapping.
Consumers: Connectors can work as contributors (those that provide data) or as consumers (those that need to have the data on their side). Normally, a connector with consumption capability needs to always have the most up-to-date version of the data, so the connector consumes the Golden Record. Every time the Golden Record has a new version, Carol enqueues the new version of the Golden Record for consumption.
Reverse Consumption Rules: The consumption process lets a connector consume golden records following the golden record schema (defined in the data model). If the connector wants to consume the data in the same format in which it contributed it (through staging tables), that is possible using reverse consumption rules. Basically, this is a set of rules that converts the data back to its original format (defined on the staging table as the staging schema).
ETL (Extract, Transform, Load) is a set of tools that allows the user to modify the data structure. Carol has a set of functions covering the most common cases.
Split: This function splits a staging table into two or more staging tables. It is useful when different concepts live inside the same staging table. For instance, if you have customers and providers in the same entity in your application, you can use the Split function to separate them into two staging tables.
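A sketch of the Split idea using the customers/providers example (illustrative only, not Carol's ETL engine; the `kind` field is hypothetical):

```python
# Illustrative split: route each record to a new table based on a field value.
def split(records, field, values):
    """Return one list of records per value of `field`."""
    return {v: [r for r in records if r[field] == v] for v in values}

partners = [
    {"name": "Ana", "kind": "customer"},
    {"name": "Acme", "kind": "provider"},
]
tables = split(partners, "kind", ["customer", "provider"])
# tables["customer"] and tables["provider"] act as the two resulting staging tables.
```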
Duplicate: This function duplicates a staging table. If you need to work with the same data in different ways, the ETL Duplicate function does that.
Join: This function joins two staging tables into a third, new staging table. With it, you can join the staging table "employee" with "city" to replace the foreign key defined in "employee" with the right city from the "city" table.
Lookup table: When working with the Join ETL function, the user can configure one or both staging tables as lookup tables. This means that Carol will not move the data out of that staging table, keeping it available for the next incoming records. If a staging table is not set as a lookup table, its staging records are joined and moved to the third (result) staging table.
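The Join and lookup-table behavior can be sketched as follows (illustrative only, not Carol's ETL engine; the field names are hypothetical):

```python
# Illustrative join: resolve the city foreign key on each employee record.
# "cities" plays the role of a lookup table: the join reads it but does not
# consume it, so it stays available for future incoming records.
employees = [{"name": "Ana", "city_id": 10}]
cities = {10: {"id": 10, "city": "São Paulo"}}  # lookup table, kept in place

def join_with_lookup(employees, cities):
    return [
        {**e, "city": cities[e["city_id"]]["city"]}
        for e in employees
        if e["city_id"] in cities
    ]

joined = join_with_lookup(employees, cities)
# joined records carry the resolved city name instead of only the foreign key.
```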
Explore is the module that allows the user to explore the data, apply specific filters and see a 360° view of the Golden Record.
Master of Masters: Carol can share Data Models and Golden Records across environments. Currently, Carol maintains a large table of all companies in Brazil; this data can be accessed from any environment using the Master of Masters concept.
Potential Merges: When potential merges are defined in the Merge Rules, Carol shows the number of potential merges to the user. The user can then merge and unmerge golden records.
Carol allows users and applications to consume data through REST API services. Some important concepts for understanding the services:
Filter: A query using the Carol Query Language (similar to the Elasticsearch Query DSL) to extract, aggregate and compute data.
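For flavor only, here is an Elasticsearch-style query body built in Python. Carol's actual query syntax may differ, and the field names (`state`, `city`) are hypothetical:

```python
# Illustrative Elasticsearch-style query body: filter golden records by a
# field value and aggregate the matches by another field.
import json

query = {
    "query": {
        "bool": {
            "must": [{"term": {"state": "SP"}}],
        }
    },
    "aggs": {"by_city": {"terms": {"field": "city"}}},
}
body = json.dumps(query)  # the JSON payload sent to the query endpoint
```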
Named Query: A query written in the Carol Query Language, but saved in Carol. It has the same features and power as Filters, while keeping all the complexity of the filter inside Carol.