Online Analytical Processing, or OLAP, is a computer processing technology that allows rapid execution of complex analytical queries. It is an important part of business intelligence, providing powerful capabilities for data mining and trend analysis. OLAP represents the data in a multidimensional structure, allowing the users to consolidate, detail and rotate it in order to see and analyze it from various perspectives. This multidimensional database structure allows processing ad hoc queries rapidly, which makes OLAP tools a popular choice for business reporting.
The term OLAP, as opposed to OLTP, was coined by Edgar Codd in 1993 in his article in Computerworld magazine where he outlined the 12 rules for OLAP software. Although the article was later retracted by the magazine due to a potential conflict of interests, its influence on shaping up the OLAP technology is undeniable. In 1995, Nigel Pendse defied the FASMI (Fast Analysis of Shared Multidimensional Information) requirements for OLAP tools. The history of the technology itself dates back to 1970 when Information Resources released Express, the first OLAP server. It was bought by Oracle in 1995 and subsequently became the basis of Oracle OLAP, a multidimensional calculation engine that Oracle provides within its database. In 1992, another prominent OLAP product, Essbase, was released by Arbor Software (acquired by Oracle in 2007). Microsoft released its OLAP server, MS Analysis Services, in 1998. It contributed to the popularity of the technology and prompted the development of other OLAP products.
OLAP cube is the cornerstone of an OLAP system. The data in the cube is organized using either a star or a snowflake schema, at the center of which there is a fact table that contains aggregations (measures) and is connected to a number of dimension tables containing information about the measures. Dimensions describe how the measures can be analyzed. If the cube contains more than three dimensions is it often referred to as a hypercube.
As mentioned above, at the center of the cube structure there is a fact table containing aggregations. Using aggregations is the main reason why queries are processed a lot faster in OLAP tools as compared to OLTP. Aggregations are summaries of data that have been pre-calculated during the data processing. All members stored in the dimension tables define the queries that the cube can receive, and aggregations answer these queries before they are generated. In the cube, aggregations are stored in the cells whose coordinates are specified by particular dimensions. The number of aggregations the cube might contain depends on all possible combinations of dimension members.
One of the main characteristics of OLAP is the multidimensional data structure. The cube can have multiple dimensions. The combination of dimension members defined in the query point to a specific cell that contains an aggregated value, allowing the system to process queries almost instantaneously. The multidimensional structure also allows viewing the data from multiple angles. All this makes OLAP databases especially popular for data analysis and BI applications.
One of the core components in OLAP technology is OLAP server. In the traditional three-tier data warehouse architecture, it occupies the middle tire, between the database server and the client-side OLAP tool. Besides aggregating and pre-computing the data from the relational database, it provides advanced calculation and write-back options, additional functions that extend the basic possibilities of the query languages, and other features. The leading OLAP software vendors to date are Microsoft, Oracle, IBM, SAP, and others.
There are several types of online analytical processing systems that differ in the way the data is stored, the way access to the data is organized, etc.
Relational OLAP, or ROLAP, stores all data, including aggregations, in relational databases rather than cubes, and doesn’t use pre-computation. ROLAP works with SQL tools that send queries directly to the relational database, but do it in a way that creates an appearance of OLAP operations such as slicing, dicing, and drill down. This approach is considered best suited for enterprises with large data warehouses as if offers better scalability and short load times and is good at processing text information. However, the relational database needs to be specifically designed (an OLTP database won’t do), SQL doesn’t support a lot of complex calculations, and it is often required to create aggregate tables to improve query execution. Overall, ROLAP tools usually have lower performance than those based on MOLAP.
Multidimensional online analytical processing (MOLAP) is the traditional OLAP model. All data, including aggregations, is stored in multidimensional data cubes. The data in the cubes is pre-computed, which ensures fast query performance. The multidimensional model makes it easier to locate cells in a cube than in a table, which, along with powerful OLAP techniques and calculation algorithms, allows retrieving and analyzing data significantly faster than when working with relational databases. At the same time, since the data in the cubes is pre-calculated and thus harder to update, such systems are not very scalable. Besides, transferring data from a relational database to a MOLAP engine can lead to data redundancy.
Hybrid OLAP (HOLAP) combines features of the two previous approaches in order to provide fast query processing in combination with high scalability. In HOLAP, the data is divided between a relational database and multidimensional cubes. There are two approaches as to how to divide the data:
- Horizontal partitioning implies that some data slices that require faster query processing are stored in MOLAP, and the others are stored in ROLAP
- In the vertical partitioning, aggregations are stored in MOLAP and detailed data – in ROLAP.
Major vendors now support this approach in one form or another.
DOLAP DOLAP, or Desktop OLAP, implies that users can download a part of the cube on their computer and work with the data locally. The main advantage of such OLAP applications is that they are easy and cheap to deploy and maintain. The biggest drawback, however, is their very limited functionality. WOLAP WOLAP includes web-based OLAP tools that do not provide options for data download or local access to data. Such solutions have low deployment cost as all the client needs is Internet connection and a web browser, but, just as desktop OLAP, they have limited functionality. Besides, all major providers now offer web-only access to data as a feature of their much broader solutions. RTOLAP The main feature of Real-Time OLAP (ROLAP) is that aggregations are not stored in relational tables or cubes, but are calculated in-memory during the query execution.
Online analytical processing was created due to the inability of OLTP systems to process complex queries fast enough (OLTP stands for online transactional processing). Since then, the two technologies have been contrasted to each other constantly. Here is a short overview of the main differences between OLAP and OLTP tools.
- Queries. OLAP works with a relatively small amount of very complicated read-only queries, while OLTP processes a large volume of simple queries of different kinds, such as insert, update, read, and delete queries.
- Users. OLAP tools are mainly used by data analysts, managers and decision-makers; OLTP systems are used by database administrators and IT professionals.
- Data. OLAP uses consolidated historical data organized according to a star schema or a snowflake schema. OLPT uses operational data stored in many tables. OLAP data is rarely updated and the updating process takes a lot of time, while OLTP data is updated regularly and fast.
- Data representation. Multidimensional vs. flat two-dimensional.
- Database size. As OLAP databases have to store consolidated data, they are significantly larger than OLTP ones, containing millions of record and occupying up to 1 TB as opposed to thousands of records and the size of up to 1 GB of the OLTP databases.
- Purpose. The main purpose of OLAP is data analysis and business intelligence operations scaffolding planning, problem-solving and decision-making processes in the company, while OLTP is used for running and controlling ongoing business tasks.
The data in cubes is organized in dimensions that consist of hierarchies of multiple levels. Such data structure lays down the foundation for operations of OLAP tools that facilitate data analysis. The main OLAP operations include: roll up, drill down, slice, dice, and pivot. Roll Up Roll Up is the aggregation of data along a hierarchy of a cube dimension. When Roll Up is performed, the data is aggregated by ascending a hierarchy. Drill Down The Drill Down operation allows users to step down the hierarchy, from the most aggregated values to the most detailed ones, for example, “year” to “month” to “day” in the Date dimension, “country” to “province” to “city” in the Geography dimension. Slice The Slice operation extracts a subcube from a cube by selecting a value for its dimension, for example, we can retrieve a “slice” from a cube showing the number of internet sales in all countries for all years by choosing “Austria” as the value of the country dimension. Dice The Dice operation is an “extension” of the slice operation as it allows users to extract a subcube by selecting values for several dimensions. Using the example above, we can perform the dicing operation by selecting “Austria” for the country dimension and “2017” for the Date dimension. Pivot Pivot operation figuratively rotates the cube in order to provide a view of the OLAP data from an alternative perspective. This operation allows analyzing the data from different perspectives, making it one of the most important OLAP features for creating analytical reports. You can see the graphic representation of the main operations in the figure below. These are just the main OLAP operations. Other operations include drill through, scoping, screening, etc.
The first standard API, OLE DB for OLAP (ODBO), was released by Microsoft in 1997. Four years later, XMLA was announced as the standard for data access in OLAP tools. To date, they remain the industry standards that are accepted by most vendors. ODBO introduced MDX as the query language for multidimensional cubes. It was written specifically for business analysts and bears a slight resemblance to SQL. MDX is the only query language that is considered standard for XMLA and is used by most OLAP tools. Some systems, like Oracle, for example, implement the functionality of OLAP in SQL language extensions but also support MDX.
Unlike OLTP, OLAP supports just read-only queries, which means that there are no OLAP functions for data editing, but only for its selection – select. MDX syntax is complicated, but there are some basic concepts. MDX functions for cube members include .Children, .Parent, .Level, and .Hierarchy. They return the member’s child elements, parent element, hierarchy level and the hierarchy itself, respectively. .Members function can be applied to axes, hierarchies and hierarchy levels and returns all members. MDX allows you to select Sets (members from the same axis), Tuples (members from different axes), and Sets of tuples (using the Crossjoin function). Filtering in MDX is introduced using where. All cube elements (dimensions, hierarchies, levels, members, etc.) are written in square brackets, while sets and tuples are written in parentheses. MDX also provides functions for calculating custom members (both measures and attributes) that are not currently available in the cube.
Nowadays, the market offers a large number of OLAP data analysis tools that cater for users’ versatile business needs. There is no shortage in vendors, and customers have a lot of freedom choosing whether they want to work with MDX or SQL and what type of OLAP will work best with their system architecture (ROLAP, MOLAP, etc.). We can single out two main types of OLAP tools available on the market. The first type encompasses the server-side tools like OLAP servers. They occupy the spot between the data warehouse and the client application and aggregate the data from the database into cubes. They also provide advanced calculation options, data access security, and other server-side functionality. The second type includes client-side tools that provide easy and comprehensible interfaces for business users, allowing them to work with data and create OLAP reports without any specific IT or business analysis skills. These tools usually provide such functionality as ad hoc report generation, drag-n-drop interface, data visualization, working with KPIs, etc. Two of such tools are Ranet OLAP and Ranet Analytics.
Ranet tools provide end users with intuitive easy-to-use interfaces to work with their business data. Ranet OLAP tool is a library of powerful OLAP components for data analysis, reporting, KPI analysis, trend analysis, and data visualization. Ranet allows end users to connect instantly to their OLAP data source and create interactive reports by simply dragging and dropping elements from the cube to the report designer area. A built-in cell style designer allows creating custom formatting styles for pivot tables, making it easier to spot underlying data trends. Special extensions allow visualizing your OLAP report data on different types of pivot charts. For those users that know MDX syntax, Ranet OLAP also provides an option to write custom MDX queries and create custom calculations, which helps users to extract and analyze data slices that are not readily available in the cubes.
Like any other technology, OLAP has its advantages and disadvantages that need to be thoroughly considered before database and data warehouse design and implementation.
The most important advantage of OLAP tools is their speed. The main reason for developing it was the inability of other systems to process queries fast enough. Due to aggregation and pre-computation of data, OLAP can execute complex queries in just a tiny fraction of time in which the same query will be processed in OLTP. Another considerable advantage is advanced calculating and data analysis options. OLAP is heavily used for data analysis and BI purposes, which incentivizes vendors to improve the computational power of their tools constantly. MDX is a powerful instrument that offers numerous options for data extraction and analytics. Data representation is another important advantage of using OLAP for analytics. Not only is multidimensional data structure convenient for the human brain to visualize and comprehend, but also it offers an opportunity to rotate the data and analyze it from different perspectives. The cube structure provides an option to not only analyze actual data but also create what-if scenarios for business modeling using write-back partitions of the cube. Besides, it has a relatively flat learning curve. Created for business analysts rather than engineers, it allows end users to operate easily comprehensible business terms, and the help of database and IT specialists is rarely required for designing and creating OLAP reports.
When designed and used appropriately, OLAP has hardly any disadvantages. However, this technology has a number of limitations that prevent it from being used in certain situations. One of the biggest factors that can become a disadvantage is that it requires a lot of storage space. In most OLAP systems, the data is aggregated and pre-computed before storing in the cubes, which leads to steep increase in the amount of data and can potentially result in data redundancy. In order to avoid it, a lot of OLAP tools use powerful data compression algorithms. Besides, some systems don’t aggregate the data at all or aggregate just parts of it. However, depending on the database size, data warehouse architecture, the purpose of the system and other factors, these steps are not always effective. Another disadvantage is that OLAP can’t process a large number of queries. It was designed to process a small number of queries that are more complex in nature and will perform poorly when required to execute multiple queries. One more disadvantage is that the data in the cubes cannot be updated often. Again, as the data in the cubes is aggregated and pre-computed, the update process will take a considerable amount of time. OLAP was designed to work with a large amount of historical data that is not supposed to be frequently updated, and if you need to process a lot of current data it will not perform well.
As we have already mentioned above, the main asset of online analytical processing is its powerful data analysis capabilities. It determines extensive use of OLAP in business intelligence and reporting applications. Standard examples of OLAP applications include marketing analysis and planning, sales analytics, business modeling and forecasting, financial reporting and other spheres. OLAP tools are widely used in data mining, providing insights into the company’s data trends and delivering information to the management used in the company's’ problem solving and decision-making processes.