Carol Connect (2C)

Carol Connect is the tool to get data from any database.

Currently, these are the supported databases:

  • SQL Server (including SQL Server on Azure)
  • Oracle
  • PostgreSQL
  • MySQL
  • Progress OpenEdge
  • MongoDB

🚧

Database user grants

Please guarantee the following grants for the database user to be used in 2c:

  • CREATE SEQUENCE
  • CREATE TABLE
  • CREATE TRIGGER
  • CREATE INDEX

Requirements (server)

To install and run Carol Connect perfectly, the following requirements should be attended:

  • Environment (Windows or Linux) with at least 100 GB of hard disk, 4 GB of ram memory, and internet access without proxy. If there is a table with big records data like images, is strongly recommend to have 8GB of ram memory.
  • An exclusive environment to avoid concurrence with other systems.
  • The URL to *.carol.ai should be allowed on the firewall server (if applicable).
  • A user with administrative rights on the database (permission to install triggers and create table). The administrative rights can be removed after the installation and initialization of the environment.

Network Requirements

Carol connect performs communication with Carol (cloud). In order to allow communication, some requirements should be observed:

  • In case the network has a firewall or proxy, Carol's URL should be added as an exception for performance insurance.
  • Carol connect communicates with Carol using the domain .carol.ai.

Installing Carol Connect

If you are starting a project with Carol and you need to have Carol Connect to integrate your data, you can download the most recent version in our Github repository: https://github.com/totvslabs/2c/releases.

Since March/2021 we deliver a brand new 2C version.

Carol Connect has no installation, you just need to unzip the ZIP file and run the Carol connect through the file "2c.bat" or "2c.sh".

📘

Docker Image

If you prefer, you can start Carol Connect using docker image, by running the following commands:

docker pull totvslabs/2c
docker run totvslabs/2c

Set up the 2C memory

The new 2C has a new parameter at the instalation files that could help you to set up the max memory value that 2C could use for your tasks, please check this steps:

For Windows open this File

  • 2c.bat

For Linux open this File

  • 2c.sh

And check this parameter:

-Xmx4g

This values indicates that 2C can use at the max size memory a default value about 4GB. To increase this value, you can just update this file, and save, before of execute the 2c.bat/2c.sh file or create a 2C service, as showed bellow.

Configuring Carol Connector as a Service

Windows

Files needed:

  • carol2cservice.exe
  • serviceinstall.bat
  • serviceremove.bat

Instructions for registering 2c as windows service:

  1. Run the serviceinstall.bat.
  2. Start the service normally.

Note: You must be running the windows prompt as an administrator.

Linux

Files needed:

  • carolconnector.service

Instructions for registering 2c as linux service:

  1. Edit carolconnector.service and edit the following information:
    -- User
    -- WorkingDirectory
    -- ExecStart
  2. Run the following command (assuming /opt/2c is the install dir):
    -- sudo systemctl enable /opt/2c/carolconnector.service
  3. Start the service by running the following command:
    -- sudo systemctl start carolconnector

Login on Carol Connect

Access the URL http://localhost:8880 and type the Organization, and after click on Continue:

13661366

And inform the Username and Password:

13661366

Some details:

  • Organization: the domain when you are accessing Carol's environment. For example, if your Carol's URL is "totvs.carol.ai", then, your domain is "totvs."
  • Username: the same you are using to authenticate in Carol.
  • Password: the password associated with the previous username.

Notes: In the new 2C, the tenant name no longer needs to be entered. And for Security reasons the timeout has been implement and the user and password is needed each time that 2C is opened on a new browser.

Creating a connection to your Database

The button "+ Add new database" allow you to create a new database connection. You should choose one database to create the connection:

13661366

More details:

13661366

Some details:

  • Connection Name: the name for the connection, simple alias.
  • Carol Tenant: Now if you have more than one tenant at the Carol side you can choose each one at the connection steps. Example: Organization = qa, Tenant = qahom, and qaprod.
  • Carol Connector: Inform the name of a connector that already exists on Carol or you can create a new one chosen the option 'Create a new connector'.
13521352
  • Project: the name for the project, could be a project that already exists, or you can inform a new project name.
  • Connector Label: The name for the connector on Carol.
  • Carol Authentication: At the first time the connector token will be generated automatically.

After, you should type the database parameters. These parameters will be saved locally to automatically integrate the data to Carol.

This is a sample of parameters for SQL Server connection:

13541354

Important parameters:

  • "Instance name / path"; this parameter should be informed only if the database have more than one instance.
  • "TOTVS Protheus database?": this parameter indicates that the database is Protheus. It means that soft-deletion happening on the database will be replicated as a real deletion in Carol.
  • "TOTVS Protheus use company code from table name?": This parameter indicates to Carol that the company group in Protheus belongs to the table name (as the prefix). Carol Connect will remove the company group from the table name and add it as a column for all records in the table.
  • "TOTVS Protheus company codes": This parameter indicates to Carol Connect to enable the conversion of the company group from prefix to column only for the specified company groups.
13661366

Clicking on "Select Entities", you will be able to add tables to be integrated to Carol.

Carol Connect shows all entities available through the database connection configured:

13661366

Clicking on the table allows you to see details related to the table:

13661366

If the entities has all the configuration needed, the Quick enable is showed, otherwise only the option Advanced configuration will be showed. Lets see the Advanced configuration:

13661366

Some details:

  • Primary key information: it shows the fields that belong to the primary key. If this entity does not have the primary key defined in the database, you should select the field(s) that identify one record inside the entity (it is mandatory to integrate data to Carol).
  • Condition for initialization: you can define a condition to filter out some data. For example, you can initialize this entity only sending "records that have the field creation_date > '2017-12-10', for example". In this sample, "creation_date is a field that belongs to the entity selected. This condition will be applied only when initializing the entity. All records inserted/updated will be integrated to Carol without validation with this condition.

Synchronization strategy

Batch Strategies:

  • Sync using a value field: At each time set by the cron expression in the configuration file, 2C will check for any records included or changed after the last timestamp returned and send those records to Carol. When selecting this option, the field containing the inclusion or modification timestamp information must be chosen. This operation supports string fields (with a timestamp or natural sort structure to determine newer records) or numeric fields (logical time/auto-increment values).

  • Sync using Full Load: At each time set by the cron expression in the configuration file, the 2C will get all data in the database, export to the local disk preparing the batches to send it to Carol.

Online Srategy:

After clicking on "Enable", Carol Connect will create the trigger to get all new records and all updated records. After creating the trigger, Carol Connect will export all data to the payload folder preparing the data in batches to send it to Carol. The queue (carol_3c_queue) is used for operations in the database (new records, updated records, and deleted records).

13661366

After enabling the integration of the entity, you can enable other entities. The next section describes the integration monitor through "Database Manager" menu.

Some additional operations are available, allowing the user to disable the synchronization (it drops the trigger), pause the synchronization, resend all data, resync, and generate a report to evaluate the records sent and records in the database.

13621362

After creating the connection, you will see this interface. This interface shows some important information:

  • Last batch sent: last time Carol Connect sent data to Carol.
  • Transmission: quantity of records integrated to Carol since last Carol Connect restart.
  • Last minute rate: quantity of records per second.
  • Carol connector: connector in Carol that is receiving this data.
  • Mean connection wait time: milliseconds that Carol Connect is waiting to get a connection and send data.
  • Mean connection usage time: milliseconds that Carol is using the connection to send data to Carol.

If you need to change the database connection parameters, you can do that clicking on the dots on the top right side:

Creating a connection to your Directory

Create a database directory connection.

13661366

Fill up the information like the following example:

13661366

Next step

13661366

After saved the above configuration a directory connection will be created.

13661366

Then the directory needs to be enabled for syncing the files to Carol.

13661366

It is possible to create a SQL condition for filtering the files by data restriction.

To create the SQL condition the directory connection needs to know what is the database connection and which is the restriction query.

13661366

The syntax is database_connection_name + # + query_with_files_names_result.

Example: Database#select file_name from my_files where extension = 'jpg'

The result of the query needs to be the names of the files that will be synced to Carol.

Data Anonymization

2C is able to anonymize data before send it to Carol. In order to use this feature, just select the type of anonymization before enable the entity synchronization.
The types of anonymization are:

  • None. The field are not anonymizated.
  • Base Round. The field data is rounded using a base number. eg: 16800 using a 1000 base round becomes 17000. Define the base number in the option field.
  • Date. Change date values to 1 (if the field is day, month ou year) or 0 (if the field is hour, minutes or seconds). Put the letter refering to the date data in the options field. eg: 12/15/2015 11:10:35 using mdhMs option becomes 01/01/2015 00:00:00.
    • m Month
    • d Day of month
    • y Year
    • h hour
    • M minute
    • s second
  • Email Mask. Apply to email address data. eg: [email protected] becomes ***@somedomain.com.
  • Hash. A hash function is applied to field data. If the field is a number, CRC32C is selected, if it is a string, SHA3 is selected.
  • Mask. A mask is applied to field data. The character # means that char position is not masked. eg: card number: 123-456-789-012 using mask XXX-XXX-XXX-### becomes XXX-XXX-XXX-012. Define the mask in the option field.
  • Supression. This field is not sent to Carol. It's supressed from data payload.

Carol Connect Monitor

To verify the status related to the data integration, you can access the menu "Database Manager" to verify a few information as described previously:

13661366

2C Configuration

2C configuration is locate in app.config.yml file.

  • syncThreads: Number of job executors running in parallel.
  • poolSize: Maximum number of connections to the database.
  • ignoreTriggers: Set true if 2C is not allowed to create triggers on the database.
  • enableReSync: Set true to enable the ReSync strategy.
  • enableLogin: Set true if 2C needs to ask login each time its API is called.
  • syncByTimestampResendLastDate: Set true if, during a Sync by Timestamp, 2C have to resend records from the last timestamp. Useful when timestamp doesn't have time, only date.
  • resendAllRecordsOnResync: Set true if 2C needs to resend all records each time ReSync is executed.
  • imageAxisPixelsLimit: If bigger than zero, 2C will validate and resize images with axis bigger than the informed limit.
  • ignoreQueueTable: This parameter will disable the feature that allows 2C to create the table "carol_3c_queue" automatically.

2C Cron configuration

2C have four cron expressions:

  • processing: Define each time 2C checks queue table to send data to Carol.
  • resync: Define each time 2C will start one step to ReSync a table. A full ReSync have five steps: Check local data, check staging data, check golden records, check rejected records, compare local with remote data.
  • syncbyresync: Define each time 2C will start a ReSync to tables that are sinchronized by ReSync.
  • syncbytimestamp: Define each time 2C will start a Sync by Timestamp to tables that are synchronized this way.
  • initialload: Define each time 2C checks and starts initial loads for enabled tables.

SQL Server Requirements

These are the requirements that the database should attend to work perfectly with Carol Connect:

Progress OpenEdge Requirements

These are the requirements that the database should attend to work perfectly with Carol Connect:

  • OpenEdge 10. 3 or higher.
  • SQL Broker started and accepting connections.
  • Grant Carol Connect's user to have table definition privileges (DDL privileges).
  • In case this database belongs to Datasul products, the table "carol_3c_queue" already exists on the data dictionary. Not needed to create it (change the parameter setting to avoid the creation).
  • In case this database is not related to Datasul, Carol Connect will create the queue table on the database (based on Carol Connect configuration) when starting it. OpenEdge only allows this operation (DDL) when no other user is connected to the database.

MongoDB

When using Carol Connect integrated with MongoDB, the following items must be followed:

  • Do not use the database administrator to set up the connection.
  • Specify the right restriction to the user that will be able to synchronize the data.
  • No operation will be performed by the integration - there is no queue table for MongoDB.
  • The synchronization happens by "Sync using a value field" strategy.
13661366

Next step

13661366