Collector Guide


The Machbase collector is a process that extracts, converts, and inserts log data into a database in real time. The collector is installed on a machine that is physically separate from the server. It collects the various types of log data generated on that machine in real time and sends them to the server over the network.

The Machbase collector runs as a separate process. Multiple collectors can run at the same time, but each collector is designed to collect from only a single type of data source. Different types of logs can therefore be processed concurrently, within the limits of the available system resources.


The diagram above shows log data being collected on Node-2 and Node-3 and sent to Node-1, where the Machbase server is installed. As shown on Node-2 and Node-3, each MachCollector process runs independently, reading and sending specific log files. Each process refers to the details of its log data source as defined in the tpl file. This arrangement may seem unusual. However, the collector manager installed on each server controls the collector processes and monitors their status, so the whole set of processes can be managed through a consistent interface.


Below is the list of main features of the collector.

Consistent Interface

Machbase does not require additional processes or command-line options to run the collector. Instead, it provides a consistent SQL-based interface.
Installation is simple: install a collector and the collector manager on the server, and the collector can start collecting and storing data. This reduces the administrative cost of the machines that read log data and allows integrated monitoring.
To create and run a specific collector, only the SQL statements below are required in machsql.




High-Performance Data Collection

The collector architecture runs a separate process for each log data format.
Because the processes do not interfere with or affect one another, each one can read its log files at high speed and the system as a whole remains stable. In addition, each collector can run code and executables optimized for its log type, and it inserts data through a dedicated protocol, so it aims for the best performance while using minimal system resources.

Collection Methods

The collector provides several methods for collecting log data. You can change the method by setting the appropriate values for it in the tpl file.
The currently supported methods are as follows.

Table 1. Supported methods

Method name   Description
FILE          Collect files from local hosts.
SFTP          Collect files from remote hosts.
SOCKET        Collect data that were inserted via local sockets.
ODBC          Collect data from a specified database.

Log Types

The Machbase collector provides regular expression files for different types of log data. Users can therefore start reading logs quickly by reusing the existing regular expression files with simple modifications. The currently supported log types are as follows.

Table 2. Supported log types

Template file       Supported type                      Default location (can be modified)
machbase.rgx        Trace file of Machbase              $MACHBASE_HOME/trc/machbase.trc
apache_access.rgx   Access file of Apache web server    /var/log/apache2/access.log
apache_error.rgx    Error file of Apache web server     /var/log/apache2/error.log
syslog.rgx          System log (syslog) file            /var/log/syslog
custom.rgx          User-defined type                   User-defined file

Easy customization for user-defined logs

The template files of the Machbase collector are highly extensible. Even if a log format is not predefined, Machbase can store the data as long as the user can describe the log file with regular expressions. Users can test a log file against regular expressions with machregex, which can normalize the columns and generate a unique template file from the result. More details are described in the next chapter.
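
As a rough illustration of what "expressing a log file with regular expressions" means, the sketch below uses Python's standard re module to split one Apache-style access log line into named columns. The pattern, the column names, and the sample line are assumptions made for this example. They are not the contents of apache_access.rgx and do not show the template file syntax.

```python
import re

# Hypothetical pattern for an Apache-style access log line
# (an illustration only, not the shipped .rgx file).
ACCESS_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of column values, or None if the line does not match."""
    match = ACCESS_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    # Normalize the numeric columns so they can be stored as integers.
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


sample = '192.168.0.10 - - [10/Oct/2015:13:55:36 +0900] "GET /index.html HTTP/1.1" 200 2326'
row = parse_line(sample)
```

Lines that do not match return None, which is the case a user would want to catch while testing a new pattern against a real log file.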

Ensure continuity of data after failures

The Machbase collector can resend data starting from the point at which the server stopped receiving it. After a failure, the collector remembers the last position in the log file that the Machbase database received and resumes collecting from there. This guarantees that all data is eventually sent to the server, without any additional operation, even when the server suffers a software or hardware failure. After a restart, the list of collectors is also remembered, so data transfer resumes safely simply by starting the server.
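
The idea of resuming from the last received position can be sketched as follows. This is a minimal illustration, not the collector's actual mechanism: the checkpoint file, the function names, and the send callback are all hypothetical. The point is only that the saved offset advances after a record is delivered, so a failure never skips data.

```python
import os


def read_offset(checkpoint_path):
    """Return the last acknowledged byte offset, or 0 if none was saved."""
    try:
        with open(checkpoint_path, "r") as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_offset(checkpoint_path, offset):
    """Write to a temporary file and rename it, so a crash leaves no partial file."""
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(offset))
    os.replace(tmp_path, checkpoint_path)


def collect(log_path, checkpoint_path, send):
    """Send every complete line after the saved offset; return how many were sent."""
    sent = 0
    with open(log_path, "rb") as log:
        log.seek(read_offset(checkpoint_path))
        while True:
            line = log.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            send(line.decode("utf-8").rstrip("\n"))   # raises if the server is down
            save_offset(checkpoint_path, log.tell())  # advance only after success
            sent += 1
    return sent
```

If send raises because the server is unreachable, the checkpoint still points at the undelivered line, and the next call to collect starts there.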

Guarantee high availability of server

To guarantee high availability of the service, Machbase allows multiple collectors to run concurrently. Multiple collectors can read the same data source and send the data to different Machbase servers. This way, the data is kept safely on the other Machbase servers when one of them fails. When a collector restarts after a failure, it continues restoring the data from where it left off. Machbase thus provides high availability through data replication.
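
The replication scheme described here can be pictured with a small in-memory sketch. Everything in it is hypothetical (the Collector class, the send callbacks, and the lists standing in for servers). It only shows why each collector keeps its own position: one server being down does not hold back delivery to the other, and the lagging collector catches up later.

```python
class Collector:
    """Hypothetical collector: reads one shared source, sends to one server."""

    def __init__(self, name, source, send):
        self.name = name
        self.source = source   # shared, read-only list of log records
        self.send = send       # callable that delivers one record to a server
        self.position = 0      # index of the next record this collector must send

    def run_once(self):
        """Send pending records; stop at the first failure and keep the position."""
        while self.position < len(self.source):
            try:
                self.send(self.source[self.position])
            except ConnectionError:
                return False   # server unavailable; resume from here next time
            self.position += 1
        return True


source = ["rec-1", "rec-2", "rec-3"]
server_a, server_b = [], []
state = {"b_up": False}        # server B starts out unavailable


def send_to_a(record):
    server_a.append(record)


def send_to_b(record):
    if not state["b_up"]:
        raise ConnectionError("server B is down")
    server_b.append(record)


collectors = [
    Collector("to-a", source, send_to_a),
    Collector("to-b", source, send_to_b),
]
for collector in collectors:
    collector.run_once()       # A receives every record; B receives none yet
```

Once server B is reachable again, calling run_once on the second collector delivers the records it missed.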

Integrated monitoring through MWA

The Machbase collector supports integrated monitoring through MWA (Machbase Web Analytics). This is possible because the collector manager keeps its information in sync with the database. With MWA, the user can check which collectors are currently running and monitor the status of each collector on each server in real time.

Preprocessing via Python script

The collector can call specified Python scripts immediately before the data leaves the collector. The scripts can define a series of actions that modify records or exclude certain records. Although the collector is mainly focused on data collection, this gives it the flexibility to exclude specified records or convert data before storing it.
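
To give a feel for the kind of logic such a script might contain, here is a minimal sketch. The function name, the dictionary-shaped record, the field names, and the convention of returning None to drop a record are all assumptions made for this example. The actual interface between the collector and the script is not shown in this section.

```python
def preprocess(record):
    """Return a converted copy of the record, or None to exclude it."""
    # Exclude records that are not worth storing.
    if record.get("level") == "DEBUG":
        return None
    converted = dict(record)  # copy so the caller's record is left untouched
    # Convert a textual duration in seconds to integer milliseconds.
    converted["elapsed_ms"] = int(round(float(record["elapsed_sec"]) * 1000))
    del converted["elapsed_sec"]
    # Trim stray whitespace from the message text.
    converted["message"] = record["message"].strip()
    return converted


records = [
    {"level": "INFO", "elapsed_sec": "0.25", "message": " started \n"},
    {"level": "DEBUG", "elapsed_sec": "0.5", "message": "internal state"},
    {"level": "ERROR", "elapsed_sec": "1.5", "message": "connection lost"},
]
kept = [r for r in map(preprocess, records) if r is not None]
```

Keeping exclusion and conversion in one small function mirrors the two uses named above: dropping specified records and converting data before it is stored.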
