Diagram of Technical Features
Machbase is a high-speed time series database that combines the three technical elements shown in the picture below.
Columnar Database Technology
A columnar database is optimized for OLAP (Online Analytical Processing): the values of each column are physically stored together, as shown in the picture below.
Because storage is organized by column, the values of a column occupy contiguous disk or memory space. A query can therefore scan one column across many records without reading the other columns, which keeps the load on the system low.
In this structure, data analysis can be ten or more times faster than in a row-based structure, and the data is also easy to compress.
For these reasons the columnar structure is often used to analyze large amounts of data, and Machbase also stores records by column. A conventional columnar DBMS, however, is designed solely for analytical performance. Real-time input and loading of data are its weak points, so it is not appropriate for analyzing data in real time.
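The difference between the two layouts can be illustrated with a minimal Python sketch. It is not Machbase code: the sample records, field names, and helper functions are all hypothetical, and run-length encoding stands in for whatever compression a real columnar engine applies.

```python
# Hypothetical sample records; the field names are illustrative only.
rows = [
    {"ts": 1, "sensor": "a", "value": 10.0},
    {"ts": 2, "sensor": "a", "value": 12.5},
    {"ts": 3, "sensor": "b", "value": 11.5},
]

# Row layout: the fields of each record are stored together.
row_store = [(r["ts"], r["sensor"], r["value"]) for r in rows]

# Column layout: the values of each column are stored together.
column_store = {
    "ts": [r["ts"] for r in rows],
    "sensor": [r["sensor"] for r in rows],
    "value": [r["value"] for r in rows],
}


def average_from_rows(store):
    """Walk every full record just to read one field."""
    return sum(record[2] for record in store) / len(store)


def average_from_columns(store):
    """Read only the contiguous 'value' column."""
    values = store["value"]
    return sum(values) / len(values)


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


encoded_sensor = run_length_encode(column_store["sensor"])
```

Both averaging functions return the same result, but the columnar version touches only the one column it needs. Storing a column's values side by side also places similar values next to each other, which is what makes simple schemes such as run-length encoding effective.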
In-Memory Database Technology
An in-memory database is a high-performance database optimized for OLTP (Online Transaction Processing), which inserts and searches data at very high speed. In-memory databases stand out in industries such as finance, telecommunications, and manufacturing, where data is handled in real time, and the in-memory database market is growing rapidly worldwide.
However, an in-memory database is not suitable for processing log data that machines generate without end, because all of the data must reside in memory. A row-based database also has significant constraints on data compression and management.
Machbase achieves high-performance data processing by keeping the latest log data in memory and moving older data to disk once a specified time has passed. The result is an architecture in which users can manage data flexibly according to its importance. Machbase refers to this memory architecture internally as the "memory window" and lets users set the size of the memory window when they create a log table.
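The memory window idea can be sketched roughly as follows. This is an illustration under stated assumptions, not Machbase's implementation: the class name `MemoryWindowTable` and its methods are hypothetical, the window is bounded by record count rather than by elapsed time, and a plain list stands in for disk storage.

```python
from collections import deque


class MemoryWindowTable:
    """Toy log table: newest records stay in memory, older ones spill to 'disk'."""

    def __init__(self, window_size):
        # Assumption: the window is sized in records; a real system may use time or bytes.
        self.window_size = window_size
        self.memory = deque()  # most recent records
        self.disk = []         # stand-in for on-disk storage

    def append(self, record):
        """Add a record and move the oldest ones out once the window is full."""
        self.memory.append(record)
        while len(self.memory) > self.window_size:
            self.disk.append(self.memory.popleft())

    def scan(self):
        """Yield all records, oldest first, regardless of where they live."""
        yield from self.disk
        yield from self.memory


table = MemoryWindowTable(window_size=2)
for record_id in range(5):
    table.append(record_id)
```

After the loop, the two newest records are held in `table.memory` and the three older ones in `table.disk`, while `scan()` still returns all five in insertion order.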
Real-Time Indexing Technology
Machbase provides real-time indexing, which makes it an innovative time series database that also works in harmony with traditional database technology.
The B+ tree, which is generally used in traditional databases, is not suitable for indexing large amounts of data in real time, for the following reasons.
First, index update performance. To insert data into a B+ tree, each new key must be compared against the existing, sorted keys to find its position, and this must be done in every index on the table. This procedure is costly, so the B+ tree has a fundamental limitation in meeting the requirement to insert hundreds of thousands of records in real time.
Second, the size of the index data. To increase performance, a B+ tree keeps the key values inside the index. Therefore, as the amount of raw data increases, the size of the index increases as well, and as the number of indexes grows, the total volume of data grows with it. Because of these limitations, the index structure of a traditional database has difficulty processing time series data in real time. Machbase, on the other hand, satisfies the real-time requirements through the following index technologies.
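The two costs described for key-ordered indexes can be seen in a small Python sketch. It deliberately uses a flat sorted list instead of a real B+ tree, so it only illustrates the general properties (every insert must find a sorted position, and the keys themselves are stored in the index). The class name `SortedKeyIndex` is hypothetical.

```python
import bisect


class SortedKeyIndex:
    """Stand-in for a key-ordered index: a flat sorted list, not a real B+ tree."""

    def __init__(self):
        # Sorted (key, row_id) pairs. The key values live inside the index,
        # so the index grows with the raw data.
        self.entries = []

    def insert(self, key, row_id):
        # Every insert first searches for the key's sorted position.
        bisect.insort(self.entries, (key, row_id))

    def lookup(self, key):
        """Return the row ids stored under key, in ascending order."""
        start = bisect.bisect_left(self.entries, (key,))
        rows = []
        for entry_key, row_id in self.entries[start:]:
            if entry_key != key:
                break
            rows.append(row_id)
        return rows


index = SortedKeyIndex()
for row_id, key in enumerate(["b", "a", "b"]):
    index.insert(key, row_id)
```

A table with several such indexes repeats the search-and-place step once per index for every inserted record, which is the overhead the text refers to.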
Real-Time Bitmap Index
A bitmap index is a data management technique that stores column or record values as bit strings composed of 0s and 1s, instead of organizing the records or column values in a tree data structure. As shown in the picture below, each value under “Data Values” is represented by a six-bit string (b0~b5) on the right side. Identical data values have the same bit string.
The bitmap index of Machbase has the following advantages:
First, data input is very fast. Machbase uses an algorithm that updates only the tip of the bitmap indexes that are reconfigured when data is input, which greatly reduces unnecessary operations.
Second, the bitmap index itself is designed not to hold key values. This keeps the space the bitmap index occupies small and also greatly improves the compression ratio.
Third, indexes cooperate closely. A traditional database typically restricts the query processor to selecting only one index, so even if several effective indexes are available, they cannot be used together. Machbase, on the other hand, can use a separate index on each column and can also combine multiple indexes with AND or OR operations. Because several indexes can be used in one query, parallel operations are easy to perform, and Machbase can deliver high performance in large-scale statistical analysis.
Fourth, space efficiency is very good. Because bitmap index data has a structure to which various compression algorithms can be applied, Machbase can manage data quickly and efficiently.
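The append-only update and the AND/OR combination of indexes can be sketched in Python as follows. This is a generic one-bitmap-per-value index, not Machbase's actual encoding or algorithm: the class `BitmapIndex`, the helper `rows_of`, and the sample log data are hypothetical, and Python integers stand in for bit strings.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bit string (a Python int) per distinct value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)
        self.row_count = 0

    def append(self, value):
        # A new row only sets one bit at the tail of one bitmap;
        # existing bits are never rearranged.
        self.bitmaps[value] |= 1 << self.row_count
        self.row_count += 1

    def lookup(self, value):
        """Return the bitmap for value (0 if the value was never seen)."""
        return self.bitmaps.get(value, 0)


def rows_of(bitmap):
    """Convert a bitmap into the list of row positions whose bit is set."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Hypothetical log rows: (level, host). One index per column.
log_rows = [("error", "web1"), ("info", "web2"), ("error", "web2"), ("warn", "web1")]
level_index = BitmapIndex()
host_index = BitmapIndex()
for level, host in log_rows:
    level_index.append(level)
    host_index.append(host)

# AND across two column indexes: level = 'error' AND host = 'web2'.
error_on_web2 = rows_of(level_index.lookup("error") & host_index.lookup("web2"))

# OR within one index: level = 'error' OR level = 'warn'.
error_or_warn = rows_of(level_index.lookup("error") | level_index.lookup("warn"))
```

Each condition is answered by its own index, and the results are merged with a single bitwise operation, so no single index has to be chosen for the whole query.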
Real-Time Text Index
Machbase provides a real-time search function via “keyword index”.
The keyword index is an inverted index of the kind used in search engines, so it is suitable for finding specific patterns in text data stored in the database. A search function is necessary for text-based log data, because finding specific error messages or message patterns is the main task when working with such data. Because Machbase provides outstanding performance in searching for specific patterns in UTF-8 text, it combines a powerful search function with the convenient data management features of a database.
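A minimal sketch of an inverted index shows why keyword lookups of this kind are fast. It is a generic illustration, not Machbase's keyword index: the class `KeywordIndex`, the `tokenize` function, and the sample messages are hypothetical, and tokenization is a simple split on word characters.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (Unicode-aware for str input)."""
    return re.findall(r"\w+", text.lower())


class KeywordIndex:
    """Toy inverted index: maps each token to the set of rows containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.messages = []

    def add(self, text):
        """Store a message and index its tokens as soon as it arrives."""
        row_id = len(self.messages)
        self.messages.append(text)
        for token in tokenize(text):
            self.postings[token].add(row_id)
        return row_id

    def search(self, query):
        """Return the sorted row ids that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return []
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)


keyword_index = KeywordIndex()
for message in ["Disk error on node1", "Login ok", "Network error: timeout"]:
    keyword_index.add(message)

error_rows = keyword_index.search("error")
timeout_error_rows = keyword_index.search("error timeout")
```

A search reads only the posting sets of the query tokens instead of scanning every stored message, and each new message becomes searchable as soon as `add` returns.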