Table of Contents
- Write Once, Read Many
- Support MVCC (Multi-Version Concurrency Control)
- Ultra High-Speed Data Storage
- Real-Time Indexing
- Real-Time Data Compression
- Unmatched Analytical Performance
- Support SQL Syntax for Time Series Data
- Support Full-Text Search
- Support Selective Deletion
- Automated Data Collection
Write Once, Read Many
Once log data are inserted into the database, they are by nature seldom changed or deleted. To preserve data integrity, Machbase is designed not to update data, so log data cannot be changed or deleted by malicious third parties.
Support MVCC (Multi-Version Concurrency Control)
When processing log data, it is essential that INSERT, UPDATE, and DELETE operations run without colliding with SELECT operations. To avoid such collisions, Machbase is designed so that SELECT operations acquire no locks and never conflict with other operations.
As a result, Machbase can retrieve millions of records at very high speed even while hundreds of thousands of records are being inserted per second and some of them are being deleted in real time.
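The lock-free read behavior described above can be pictured with a toy versioned store. This is a minimal single-threaded sketch and not Machbase's implementation: the class VersionedLog and its methods are hypothetical names, and a real engine would also have to coordinate concurrent threads. It only shows how a reader that works from a snapshot version is unaffected by later inserts and deletes.

```python
class VersionedLog:
    """Toy append-only store (hypothetical). Each row records the version at
    which it was inserted and, if deleted, the version at which it was deleted."""

    def __init__(self):
        self._rows = []      # entries: [value, created_version, deleted_version or None]
        self._version = 0

    def insert(self, value):
        self._version += 1
        self._rows.append([value, self._version, None])

    def delete_oldest(self, count):
        # Mark rows as deleted instead of removing them, so older snapshots still see them.
        self._version += 1
        remaining = count
        for row in self._rows:
            if remaining == 0:
                break
            if row[2] is None:
                row[2] = self._version
                remaining -= 1

    def snapshot(self):
        # A reader remembers the current version; no lock is taken.
        return self._version

    def select(self, snapshot):
        # A row is visible if it was created at or before the snapshot
        # and not deleted until after the snapshot.
        return [
            value
            for value, created, deleted in self._rows
            if created <= snapshot and (deleted is None or deleted > snapshot)
        ]


log = VersionedLog()
for message in ["a", "b", "c"]:
    log.insert(message)

reader_snapshot = log.snapshot()   # the reader starts here
log.insert("d")                    # a writer keeps inserting
log.delete_oldest(1)               # and deleting

old_view = log.select(reader_snapshot)
new_view = log.select(log.snapshot())
```

Here old_view still reflects the data as of the moment the reader started, while new_view reflects the later insert and delete.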
Ultra High-Speed Data Storage
Machbase is designed for time series data, so it stores data dozens of times faster than currently available database management systems. It can ingest from a minimum of 300,000 to a maximum of 2,000,000 records per second, even when the target table has multiple indexes.
Real-Time Indexing
In conventional database structures, the more indexes a table has, the slower data insertion becomes. Machbase improves on these structures to index data in near real time while hundreds of thousands of records are inserted per second. This is a crucial feature for time series data analysis because it lays the groundwork for searching large volumes of data quickly.
Real-Time Data Compression
One characteristic of machine data is that they are generated constantly. Sooner or later the database's storage space runs short, and the database can no longer retain enough data to process. To compress and store big data without sacrificing performance, Machbase applies two compression methods: logical and physical. First, logical real-time compression: when many values are identical, Machbase encodes the repeated values, which frees storage space. Second, physical compression: Machbase compresses data into fixed-size partitions and writes them to disk. In this two-tiered system, physical compression is applied to data that have already been logically compressed. This both reduces I/O costs and improves storage efficiency, making the stored data up to hundreds of times smaller than the original source data while data are still pouring into the database.
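The two tiers described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Machbase's actual storage format: dictionary encoding stands in for the logical step, zlib stands in for the physical step, and the function names, the JSON serialization, and the partition size of 256 are all hypothetical choices made for the example.

```python
import json
import zlib


def dictionary_encode(values):
    """Logical step (assumed technique): replace repeated values with integer codes."""
    dictionary = {}
    codes = []
    for value in values:
        # A new value gets the next free code; a repeated value reuses its code.
        code = dictionary.setdefault(value, len(dictionary))
        codes.append(code)
    # Dict keys keep insertion order, so list position equals the code.
    return list(dictionary), codes


def compress_partitions(codes, partition_size):
    """Physical step (assumed technique): split the codes into fixed-size
    partitions and compress each partition independently with zlib."""
    partitions = []
    for start in range(0, len(codes), partition_size):
        chunk = codes[start:start + partition_size]
        partitions.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
    return partitions


def decompress(dictionary, partitions):
    """Reverse both steps to recover the original values."""
    values = []
    for blob in partitions:
        for code in json.loads(zlib.decompress(blob).decode("utf-8")):
            values.append(dictionary[code])
    return values


messages = ["link up", "link down", "link up", "link up"] * 250
dictionary, codes = dictionary_encode(messages)
partitions = compress_partitions(codes, partition_size=256)

raw_size = sum(len(message) for message in messages)
stored_size = sum(len(blob) for blob in partitions)
```

Compressing each partition independently means a single partition can be read back without touching the others, which is one plausible reason to prefer fixed-size partitions over compressing the whole table at once.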
Unmatched Analytical Performance
With its analytical technology, Machbase can search and statistically analyze millions of stored records at high speed while data continue to be inserted at a high rate. Thanks to its indexing technology, Machbase performs well in both insertion and analysis, which makes it well suited to supporting real-time business decisions. Unlike a traditional database, Machbase can use two or more indexes in a single query, so even faster performance can be expected when the data are processed in parallel.
The following query shows a case that can use two indexes, one on c1 and one on c2:
SELECT * FROM table1 WHERE c1 = 1 AND c2 = 2;
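One way to picture a query that uses two indexes at once is to look up each condition in its own index and intersect the results. The sketch below is a conceptual illustration only; the in-memory rows, the build_index helper, and the set-based index layout are assumptions for the example and say nothing about how Machbase stores or combines its indexes.

```python
from collections import defaultdict

# Hypothetical sample table.
rows = [
    {"id": 0, "c1": 1, "c2": 2},
    {"id": 1, "c1": 1, "c2": 3},
    {"id": 2, "c1": 4, "c2": 2},
    {"id": 3, "c1": 1, "c2": 2},
]


def build_index(rows, column):
    """Map each distinct column value to the set of row positions holding it."""
    index = defaultdict(set)
    for position, row in enumerate(rows):
        index[row[column]].add(position)
    return index


index_c1 = build_index(rows, "c1")
index_c2 = build_index(rows, "c2")


def select_where_c1_and_c2(c1_value, c2_value):
    """Answer 'c1 = ? AND c2 = ?' by intersecting the two index lookups."""
    matches = index_c1.get(c1_value, set()) & index_c2.get(c2_value, set())
    return [rows[position] for position in sorted(matches)]


result = select_where_c1_and_c2(1, 2)
```

Because the two lookups are independent of each other, they could in principle be performed in parallel before the intersection.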
Support SQL Syntax for Time Series Data
By the nature of log data, the latest data are much more valuable than older data, and recently generated data are accessed several times more often than older data. Machbase therefore offers the following additional benefits for time series data. First, Machbase records a timestamp with nanosecond precision in the "_arrival_time" column at the moment each record is inserted, so all data can be searched by time or filtered with time-based conditions. Second, a SELECT operation outputs the most recent data first. The result is the same as sorting by the "_arrival_time" column in descending order. Third, Machbase provides a DURATION keyword, because analyzing machine data typically involves designating a particular time span. With this keyword, users can analyze data without writing complicated time expressions in the WHERE clause.
# Gather statistics of data from 10 minutes ago to the present time.
SELECT SUM(traffic) FROM t1 DURATION 10 minute;
# Gather statistics of data for the 30 minutes starting from an hour before the present time.
SELECT SUM(traffic) FROM t1 DURATION 30 minute BEFORE 1 hour;
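To make the first DURATION query concrete, the sketch below reproduces its described meaning ("from 10 minutes ago to the present time") together with the most-recent-first output order, using plain Python. It is an illustration, not Machbase code: the sample rows, the fixed value of now, the select_duration helper, and the choice to treat both ends of the window as inclusive are assumptions. The BEFORE form is left out on purpose.

```python
from datetime import datetime, timedelta

# Fixed "present time" so the example is repeatable (assumption).
now = datetime(2024, 1, 1, 12, 0, 0)

# Hypothetical rows; "_arrival_time" mirrors the column named in the text.
rows = [
    {"_arrival_time": now - timedelta(minutes=25), "traffic": 40},
    {"_arrival_time": now - timedelta(minutes=9), "traffic": 10},
    {"_arrival_time": now - timedelta(minutes=2), "traffic": 5},
]


def select_duration(rows, now, duration):
    """Keep rows that arrived within `duration` before `now`,
    returned most recent first."""
    start = now - duration
    recent = [row for row in rows if start <= row["_arrival_time"] <= now]
    return sorted(recent, key=lambda row: row["_arrival_time"], reverse=True)


# Rough equivalent of: SELECT SUM(traffic) FROM t1 DURATION 10 minute;
recent_rows = select_duration(rows, now, timedelta(minutes=10))
total_traffic = sum(row["traffic"] for row in recent_rows)
```

Without such a keyword, the same window has to be spelled out as explicit comparisons against the current time, which is the complexity the DURATION clause is meant to hide.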
Support Full-Text Search
One of the most important practical tasks in processing time series log data is determining whether a "particular event" occurred at a "particular point in time." Users can pin down the "particular point in time" by processing the time series data. To determine whether the "particular event" occurred, however, they need to search for a specific word in a text field stored in a specific column. With a conventional database management system, users generally have to match the first few characters of a word with a LIKE clause, or look up an exact match through a B+ tree, and in most cases the response is slow. For these reasons, conventional databases are poorly suited to searching for particular words. Unlike a conventional DBMS, Machbase provides "SEARCH" and "LIKE" features to search text in real time.
# Output records that include 'Error' or '102' in the msg field.
SELECT id, ipv4 FROM device WHERE msg SEARCH 'Error' OR msg SEARCH '102';
# Output records that include 'Error' and '102' in the msg field.
SELECT id, ipv4 FROM device WHERE msg SEARCH 'Error 102';
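Word-level search of this kind is commonly built on an inverted index, which maps each word to the records that contain it. The sketch below shows that general technique and how the OR and AND cases in the two queries differ; it is not a description of Machbase internals. The sample records, the tokenizer, the case-insensitive matching, and the function names are assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical sample records.
records = [
    {"id": 1, "msg": "Error 102 disk failure"},
    {"id": 2, "msg": "Error 500 timeout"},
    {"id": 3, "msg": "status 102 ok"},
    {"id": 4, "msg": "link up"},
]


def tokenize(text):
    """Split text into lowercase word tokens (a simplification for this sketch)."""
    return set(re.findall(r"\w+", text.lower()))


# Inverted index: word -> set of record ids containing that word.
inverted_index = defaultdict(set)
for record in records:
    for token in tokenize(record["msg"]):
        inverted_index[token].add(record["id"])


def search_all(query):
    """Ids of records containing every word in the query (AND)."""
    postings = [inverted_index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def search_any(*queries):
    """Ids of records matching at least one of the queries (OR)."""
    return set().union(*(search_all(query) for query in queries))


either = search_any("Error", "102")   # like: msg SEARCH 'Error' OR msg SEARCH '102'
both = search_all("Error 102")        # like: msg SEARCH 'Error 102'
```

Each lookup touches only the entries for the words in the query, which is why this approach avoids scanning every row the way a LIKE pattern over an unindexed text column would.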
Support Selective Deletion
For log data, the DELETE operation is not allowed after data insertion. However, when the DBMS is embedded in an appliance, storage space is limited and end users tend not to monitor it. In that case, companies are forced to bear the consequences of "disk full" and other errors. To solve this issue, Machbase provides a way to delete records under specific conditions. Companies that embed Machbase in their appliances can therefore keep the data size at a certain level by running such deletions regularly with cron or a similar scheduler.
# Delete the oldest 100 rows.
DELETE FROM devices OLDEST 100 ROWS;
# Delete everything except the most recent 1000 rows.
DELETE FROM devices EXCEPT 1000 ROWS;
# Delete everything except the data for 1 day from the present.
DELETE FROM devices EXCEPT 1 DAY;
# Delete all the data generated before June 1, 2014.
DELETE FROM devices BEFORE TO_DATE('2014-06-01', 'YYYY-MM-DD');
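The four deletion forms above all trim data from the old end of the table. The sketch below models that retention logic on an in-memory list ordered from oldest to newest. It is an illustration only: the function names, the (arrival time, payload) tuple layout, the fixed value of now, and the inclusive treatment of the cutoff are assumptions, and each function returns a new list rather than modifying stored data.

```python
from datetime import datetime, timedelta

# Fixed "present time" and one hypothetical row per hour for the last 48 hours,
# ordered oldest first. Each row is (arrival_time, payload).
now = datetime(2024, 1, 3, 0, 0, 0)
rows = [(now - timedelta(hours=h), f"event-{h}") for h in range(47, -1, -1)]


def delete_oldest(rows, count):
    """Like OLDEST <count> ROWS: drop the first `count` rows."""
    return rows[count:]


def delete_except_rows(rows, keep):
    """Like EXCEPT <keep> ROWS: keep only the most recent `keep` rows."""
    return rows[-keep:] if keep > 0 else []


def delete_except_duration(rows, now, keep):
    """Like EXCEPT 1 DAY: keep only rows that arrived within `keep` of `now`."""
    cutoff = now - keep
    return [row for row in rows if row[0] >= cutoff]


def delete_before(rows, cutoff):
    """Like BEFORE <date>: drop rows that arrived before `cutoff`."""
    return [row for row in rows if row[0] >= cutoff]


after_oldest = delete_oldest(rows, 10)
after_except_rows = delete_except_rows(rows, 5)
after_except_day = delete_except_duration(rows, now, timedelta(days=1))
after_before = delete_before(rows, now - timedelta(hours=12))
```

A scheduled job that applies one of these rules at a regular interval keeps the stored volume bounded, which is the use the text describes for appliances with limited disk space.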
Automated Data Collection
Machbase provides "COLLECTOR", which automatically collects and transmits log data. It can collect not only structured data such as syslogs and web server logs, but also logs in user-defined formats.