Carol Connect is the tool to get data from any database.
Currently, these are the supported databases:
- SQL Server (including SQL Server on Azure)
- Progress OpenEdge
To install and run Carol Connect perfectly, the following requirements should be attended:
- Environment (Windows or Linux) with at least 100 GB of hard disk, 4 GB of ram memory and internet access without proxy. If there is a table with big records data like images, is strongly recommend to have 8GB of ram memory.
- An exclusive environment to avoid concurrence with other systems.
- The URL to *.carol.ai should be allowed on the firewall server (if applicable).
- A user with administrative rights on the database (permission to install triggers and create table). The administrative rights can be removed after the installation and initialization of the environment.
Ps.: In case you have a preference to run Carol Connect through a Docker Image, please, get in contact with us at [email protected]
Carol connect performs communication with Carol (cloud). In order to allow communication, some requirements should be observed:
- In case the network has a firewall or proxy, Carol's URL should be added as an exception for performance insurance.
- Carol connect communicates with Carol using the domain .carol.ai.
If you are starting a project with Carol and you need to have Carol Connect to integrate your data, you can download the most recent version in our Github repository: https://github.com/totvslabs/2c/releases.
Carol Connect has no installation, you just need to unzip the ZIP file and run the Carol connect through the file "run-2c.bat" or "run-2c.sh".
- Run the serviceinstall.bat.
- Start the service normally.
Note: You must be running the windows prompt as an administrator.
- Edit carolconnector.service and edit the following information:
- Run the following command (assuming /opt/2c is the install dir):
sudo systemctl enable /opt/2c/carolconnector.service
- Start the service by running the following command:
sudo systemctl start carolconnector
Access the URL http://localhost:8880 and type Carol's credentials:
- Organization: the domain when you are accessing Carol's environment. For example, if your Carol's URL is "totvs.carol.ai", then, your tenant/domain is "totvs."
- Environment: this was called previously as a tenant. This is the environment you choose inside your organization to work in a project. The organization name is on Carol's header bar and on the URL: totvs.carol.ai/production. In this example, the environment is "production."
- Username: the same you are using to authenticate in Carol. This username will be used to integrate data into Carol. You are able to change later on.
- Password: the password associated with the previous username.
The button "+ Add new database" allow you to create a new database connection. You should choose one database to create the connection:
After, you should type the database parameters. These parameters will be saved locally to automatically integrate the data to Carol.
This is a sample of parameters for SQL Server connection:
- "TOTVS Protheus database?": this parameter indicates that the database is Protheus. It means that soft-deletion happening on the database will be replicated as a real deletion in Carol.
- "TOTVS Protheus use company code from table name?": This parameter indicates to Carol that the company group in Protheus belongs to the table name (as the prefix). Carol Connect will remove the company group from the table name and add it as a column for all records in the table.
- "TOTVS Protheus company codes": This parameter indicates to Carol Connect to enable the conversion of the company group from prefix to column only for the specified company groups.
After creating the connection, you will see this interface. This interface shows some important information:
- Last batch sent: last time Carol Connect sent data to Carol.
- Transmission: quantity of records integrated to Carol since last Carol Connect restart.
- Last minute rate: quantity of records per second.
- Carol connector: connector in Carol that is receiving this data.
- Mean connection wait time: milliseconds that Carol Connect is waiting to get a connection and send data.
- Mean connection usage time: milliseconds that Carol is using the connection to send data to Carol.
If you need to change the database connection parameters, you can do that clicking on the dots on the top right side:
Clicking on "Configure Entities", you will be able to add tables to be integrated to Carol.
Carol Connect shows all entities available through the database connection configured:
Clicking on the table allows you to see details related to the table:
- Integration: it shows the integration status.
- Data processing: it shows if the data processing is running or not.
- Primary key information: it shows the fields that belong to the primary key. If this entity does not have the primary key defined in the database, you should select the field(s) that identify one record inside the entity (it is mandatory to integrate data to Carol).
- Condition for initialization: you can define a condition to filter out some data. For example, you can initialize this entity only sending "records that have the field creation_date > '2017-12-10', for example". In this sample, "creation_date is a field that belongs to the entity selected. This condition will be applied only when initializing the entity. All records inserted/updated will be integrated to Carol without validation with this condition.
- Sync using a value field: At each time set by the cron expression in the configuration file, 2C will check for any records included or changed after the last timestamp returned and send those records to Carol. When selecting this option, the field containing the inclusion or modification timestamp information must be chosen. This operation supports string fields (with a timestamp or natural sort structure to determine newer records) or numeric fields (logical time/auto-increment values).
- Sync using Resync: At each time set by the cron expression in the configuration file, the 2C will check the local primary key set and compare them to the set of crosswalks stored in Carol and will send the missing records as well as will send deletes to the non-records existing ones.
- Sync using Full Load: At each time set by the cron expression in the configuration file, the 2C will get all data in the database, export to the local disk preparing the batches to send it to Carol.
After clicking on "Enable", Carol Connect will create the trigger to get all new records and all updated records. After creating the trigger, Carol Connect will export all data to the payload folder preparing the data in batches to send it to Carol. The queue (carol_3c_queue) is used for operations in the database (new records, updated records, and deleted records).
After enabling the integration of the entity, you can enable other entities. The next section describes the integration monitor through "Database Manager" menu.
Some additional operations are available, allowing the user to disable the synchronization (it drops the trigger), pause the synchronization, resend all data, resync, and generate a report to evaluate the records sent and records in the database.
Create a database directory connection.
Fill up the information like the following example:
After saved the above configuration a directory connection will be created.
Then the directory needs to be enabled for syncing the files to Carol.
It is possible to create a SQL condition for filtering the files by data restriction.
To create the SQL condition the directory connection needs to know what is the database connection and which is the restriction query.
The syntax is
Database#select file_name from my_files where extension = 'jpg'
The result of the query needs to be the names of the files that will be synced to Carol.
2C is able to anonymize data before send it to Carol. In order to use this feature, just select the type of anonymization before enable the entity synchronization.
The types of anonymization are:
- None. The field are not anonymizated.
- Base Round. The field data is rounded using a base number. eg:
16800using a 1000 base round becomes
17000. Define the base number in the option field.
- Date. Change date values to 1 (if the field is day, month ou year) or 0 (if the field is hour, minutes or seconds). Put the letter refering to the date data in the options field. eg:
- m Month
- d Day of month
- y Year
- h hour
- M minute
- s second
- Email Mask. Apply to email address data. eg:
- Hash. A hash function is applied to field data. If the field is a number, CRC32C is selected, if it is a string, SHA3 is selected.
- Mask. A mask is applied to field data. The character
#means that char position is not masked. eg: card number:
XXX-XXX-XXX-012. Define the mask in the option field.
- Supression. This field is not sent to Carol. It's supressed from data payload.
To verify the status related to the data integration, you can access the menu "Database Manager" to verify a few information as described previously:
2C configuration is locate in
- syncThreads: Number of job executors running in parallel.
- poolSize: Maximum number of connections to the database.
- ignoreTriggers: Set true if 2C is not allowed to create triggers on the database.
- enableReSync: Set true to enable the ReSync strategy.
- enableLogin: Set true if 2C needs to ask login each time its API is called.
- syncByTimestampResendLastDate: Set true if, during a Sync by Timestamp, 2C have to resend records from the last timestamp. Useful when timestamp doesn't have time, only date.
- resendAllRecordsOnResync: Set true if 2C needs to resend all records each time ReSync is executed.
- imageAxisPixelsLimit: If bigger than zero, 2C will validate and resize images with axis bigger than the informed limit.
- ignoreQueueTable: This parameter will disable the feature that allows 2C to create the table "carol_3c_queue" automatically.
2C have four cron expressions:
- processing: Define each time 2C checks queue table to send data to Carol.
- resync: Define each time 2C will start one step to ReSync a table. A full ReSync have five steps: Check local data, check staging data, check golden records, check rejected records, compare local with remote data.
- syncbyresync: Define each time 2C will start a ReSync to tables that are sinchronized by ReSync.
- syncbytimestamp: Define each time 2C will start a Sync by Timestamp to tables that are synchronized this way.
- initialload: Define each time 2C checks and starts initial loads for enabled tables.
These are the requirements that the database should attend to work perfectly with Carol Connect:
- SQL Server 2008 or higher.
- Configure SQL Server to listen to a TCP port, normally 1433. Please, follow this instruction: https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/configure-a-server-to-listen-on-a-specific-tcp-port
These are the requirements that the database should attend to work perfectly with Carol Connect:
- OpenEdge 10. 3 or higher.
- SQL Broker started and accepting connections.
- Grant Carol Connect's user to have table definition privileges (DDL privileges).
- In case this database belongs to Datasul products, the table "carol_3c_queue" already exists on the data dictionary. Not needed to create it (change the parameter setting to avoid the creation).
- In case this database is not related to Datasul, Carol Connect will create the queue table on the database (based on Carol Connect configuration) when starting it. OpenEdge only allows this operation (DDL) when no other user is connected to the database.
When using Carol Connect integrated with MongoDB, the following items must be followed:
- Do not use the database administrator to set up the connection.
- Specify the right restriction to the user that will be able to synchronize the data.
- No operation will be performed by the integration - there is no queue table for MongoDB.
- The synchronization happens by "Sync using a value field" strategy.
Updated about a month ago