SAP HANA Database Architecture Overview

(C) Brandeis Consulting

Agenda - SAP HANA Database Architecture Overview

Performance (EN)

  • In Memory
  • Column Based
  • Compression
  • Insert-Only

Blog of Werner Daehn

(C) Brandeis Consulting

In Memory

Typical access times

  • Hard Disc (HDD): 10 ms
  • Flash Memory (SSD): 250 µs
  • RAM: 70 ns
  • CPU Cache: 1 - 15 ns

(Source: Wikipedia)

(C) Brandeis Consulting

CPU Cache

Typically 1-32MiB Cache per CPU, not per Core

A HANA Database has often multiple TB of data!

(C) Brandeis Consulting

Column Orientation - Example data

ID Firstname Lastname Room Department
1 Peter Müller A12 DEVELOPMENT
2 Paul Maier A12 DEVELOPMENT
3 Sibylle Müller A3 CONSULTING
4 Sigmar Schmidt A3 SUPPORT
5 Siglinde Fischer A12 DEVELOPMENT
6 Petra Becker C1 SALES
7 Ernst Weber A3 SUPPORT
(C) Brandeis Consulting

Colum Orientation - Splitted in Columns

Row Value
1 1
2 2
3 3
4 4
5 5
6 6
7 7
Row Value
1 Peter
2 Paul
3 Sibylle
4 Sigmar
5 Siglinde
6 Petra
7 Ernst
Row Value
1 Müller
2 Maier
3 Müller
4 Schmidt
5 Fischer
6 Becker
7 Weber
Row Value
1 A12
2 A12
3 A3
4 A3
5 A12
6 C1
7 A3
Row Value
1 DEVELOPMENT
2 DEVELOPMENT
3 CONSULTING
4 SUPPORT
5 DEVELOPMENT
6 SALES
7 SUPPORT
(C) Brandeis Consulting

Single Column Perspective

Column

Row Value
1 DEVELOPMENT
2 DEVELOPMENT
3 CONSULTING
4 SUPPORT
5 DEVELOPMENT
6 SALES
7 SUPPORT

Dictionary for the column

Key Value
1 CONSULTING
2 DEVELOPMENT
3 SALES
4 SUPPORT
  • Key & Value are sorted
  • Similar to the SID-Concept of BW

Compressed column

Row Key
1 2
2 2
3 1
4 4
5 2
6 3
7 4
(C) Brandeis Consulting

Simple filter in a compressed column

SELECT ... WHERE department = 'DEVELOPMENT'
  1. Get the key for the value from the dictionary. This is fast, because
    • the dictionary is sorted
  2. Find the key in the compressed column. This is fast, because
    • a big chunk of compressed columns fit in the processor cache (~4mio)
    • only integer comparisons were required
    • the task could be parallized
Key Value
1 CONSULTING
2 DEVELOPMENT
3 SALES
4 SUPPORT
Row Key
1 2
2 2
3 1
4 4
5 2
6 3
7 4
(C) Brandeis Consulting

Complex filter in a compressed column

  1. Which factors influence the runtime

    • the length of the dictionary - This is dominant!
    • the length of the column
  2. How could intervals be selected?

    • The keys and values were sorted equaly. So an interval of values correspond to an intervall of keys.
  3. What about complex criterias?

SELECT ... WHERE SUBSTR(department, 2, 4) = 'EVEL'
  • Everything depends on the lenght of the dictionary
Key Value
1 CONSULTING
2 DEVELOPMENT
3 SALES
4 SUPPORT
Row Key
1 2
2 2
3 1
4 4
5 2
6 3
7 4
(C) Brandeis Consulting

Dictionary compression in JOINs

  • Both tables have different dictionaries, even if they have similar columns, e.g. MATERIAL
  • A translation between the dictionaries is required. This is fast, because
    • Both dictionaries were sorted
(C) Brandeis Consulting

Conclusions of the concepts

  • Many database operations can be executed on compressed data
  • These are fast, because
    • the data volume is small and can be processed in the cache in big chunks
    • only integer comparisons are required
  • The length of the dictionary has an important impact on filter and join operations
(C) Brandeis Consulting

Compression Types

Other types of compression are also used in SAP HANA. Dictionary compression is the standard compression.

Other compression types

  • Prefix encoding
  • Run-length encoding
  • Cluster encoding
  • Indirect encoding
  • Sparse encoding

Details are described in SAP Note 2112604
https://launchpad.support.sap.com/#/notes/2112604

(C) Brandeis Consulting

Information about Sizes and Compression

The system view M_CS_COLUMNS contains detailed information about the compression, e.g.

  • MEMORY_SIZE_IN_TOTAL
  • MEMORY_SIZE_IN_MAIN
  • MEMORY_SIZE_IN_DELTA
  • UNCOMPRESSED_SIZE
  • COMPRESSION_RATION_IN_PERCENTAGE
  • COMPRESSION_TYPE
  • COUNT
  • DISTINCT_COUNT
(C) Brandeis Consulting

Insert Only and Delta

Changes to the Data base

Updates and Inserts will not be stored in the optimized, compressed data (aka. MAIN) but in a separate data structure: The DELTA Storage

All changes are just an append to this table.

Reading the Delta Storage

After each reading operation to the MAIN, the DELTA has to be checked for changes. Because it is not optimized, a read could be expensive when the DELTA is large.

==> DELTA should be merged to MAIN

(C) Brandeis Consulting

Delta Merge (DM)

The Delta Merge(DM) is a column operation that creates a new optimized version of the MAIN storage.

  • Each colum can be merged seperately from the other columns
  • The DM is either started manualy via SQL or automatically
  • The DM runs in the background

Example

  1. We start with MAIN(V1) and DELTA(V1)
  2. When DM starts a new DELTA(V2) is created. All further changes will be stored there. Read operations have to consider MAIN(V1), DELTA(V1) and DELTA(V2).
  3. The data of MAIN(V1) and DELTA(V1) will be merged into MAIN(V2). When this is finished:
  4. MAIN(V2) and DELTA(V2) will be read in future and MAIN(V1) and DELTA(V1) will be deleted.
(C) Brandeis Consulting