Vertica optimisation part 2: best practices

This is the second post suggesting some Vertica performance improvements. After looking at the system side of things, we will now look at how to use Vertica itself.

INSERT vs. COPY statements

Vertica is a columnar database. I will not get into the details of what this means, but as data is stored per column, you can imagine that inserting one row impacts the whole table, as every column file needs to be updated.

Our first test with a standard ETL tool (Pentaho) used the usual INSERT statements into a big-ish table. We reached the great feat of inserting fewer than 100 rows per second. Then we switched to the Vertica way, i.e. a COPY statement, and reached many thousands of rows per second.
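As an illustration, a minimal COPY sketch; the table, file path and options are hypothetical and only show the shape of the statement:

-- load a delimited file in one bulk operation; DIRECT writes straight
-- to disk storage, which suits large batches
COPY events FROM '/data/events.csv'
DELIMITER ','
REJECTED DATA '/tmp/events_rejected.txt'
DIRECT;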

Lesson 1: Use the COPY command, not INSERT. I wrote another blog post specifically about the COPY statement.

Window and analytical functions

Vertica supports window functions. They are related to grouping as they return aggregated values, but they do so without collapsing the rows. An example adapted from the Vertica documentation shows this. You can see that each line is still present (you do not get only one line per department), but each employee row still carries a (running) count within its department, contrary to the aggregate version.

  • Aggregate, 1 row per department:
SELECT dept_no, COUNT(*) AS emp_count 
FROM employees
GROUP BY dept_no ORDER BY dept_no;
dept_no | emp_count
---------+-----------
     10 | 2
     20 | 6
     30 | 3
(3 rows)
  • Analytical, all the rows are present
SELECT 
   emp_no
 , dept_no
 , COUNT(*) OVER(PARTITION BY dept_no ORDER BY emp_no) AS emp_count
FROM employees;
 emp_no | dept_no | emp_count
--------+---------+-----------
      1 |      10 |         1
      4 |      10 |         2
------------------------------
      6 |      20 |         1
      7 |      20 |         2
      8 |      20 |         3
      9 |      20 |         4
     10 |      20 |         5
     11 |      20 |         6
------------------------------
      2 |      30 |         1
      3 |      30 |         2
      5 |      30 |         3
(11 rows)

There is a lot more to tell, and this is not the place for it. You can do a lot with these analytical functions, which are a very strong point of Vertica, for instance:

  • Rank the longest-standing customers in a particular state
  • Calculate the moving average of retail volume over a specified time
  • Find the highest score among all students in the same grade
  • Compare the current sales bonus each salesperson received against his or her previous bonus

Why do I mention them here? Because Vertica optimises those queries very aggressively. It knows how to process the partitions in parallel, making for dramatic speed improvements when you can rewrite joins as window functions. With a bit of experience, you even end up with SQL that is more compact and more readable, as in the sketch below.
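For instance, a classic join elimination, reusing the employees table from above with an assumed salary column: instead of joining against a GROUP BY subquery to find each department's best-paid employee, you can rank the rows in place.

SELECT emp_no, dept_no, salary
FROM (
   SELECT
      emp_no
    , dept_no
    , salary
    , ROW_NUMBER() OVER(PARTITION BY dept_no ORDER BY salary DESC) AS rank_in_dept
   FROM employees
) ranked
WHERE rank_in_dept = 1;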

Lesson 2: Use window functions

Projections

This is a big strength of Vertica. When a table is created, Vertica creates a logical schema (CREATE TABLE), but a default physical schema as well (CREATE PROJECTION). A projection is the way data is physically stored on disk, and it is completely transparent to the end user. As an administrator, you can still tailor your projections, though.

If, for instance, you know that for a specific table a specific column will always be in the WHERE condition, you can put it first in the projection's sort order. As Vertica stores data sorted on disk, this column will be very quick to access, and Vertica will know where to find the relevant dataset without even needing to decompress data. Had this column been last in the sort order, its values would already be scattered by the columns sorted before it, increasing access time.
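A minimal sketch, assuming a hypothetical events table where event_type is the column that is always filtered on:

CREATE PROJECTION events_by_type (
   event_type
 , event_ts
 , payload
) AS
SELECT event_type, event_ts, payload
FROM events
ORDER BY event_type, event_ts
SEGMENTED BY HASH(event_ts) ALL NODES;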

If you think “index” you might be right, but this is a taboo word in the Vertica world.

What does that mean for you?

By default, Vertica (via the database designer or at first insert) is pretty good at creating projections for you. In some cases, though, you might want to go beyond that. You could for instance create 2 projections for a table. This would double the load time, of course, as data would have to be written twice to disk, but if the projections are properly tailored, they can make specific queries a lot faster (query-specific projections). If you are using a MERGE statement, having the 2 sides of the merge identically segmented (segmentation is part of defining a projection) will make the merge a lot faster, as it avoids network traffic; see the sketch below.
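A sketch of the MERGE case, with hypothetical staging and target tables segmented the same way, so each node can merge its own slice locally:

CREATE TABLE target  (id INT, val VARCHAR(50)) SEGMENTED BY HASH(id) ALL NODES;
CREATE TABLE staging (id INT, val VARCHAR(50)) SEGMENTED BY HASH(id) ALL NODES;

MERGE INTO target t USING staging s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);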

Furthermore, you can create pre-join projections, where you pay the cost of the join at load time, not at query time.
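A sketch of a pre-join projection, assuming hypothetical sales and customers tables that already carry the primary key and foreign key constraints Vertica requires for this:

CREATE PROJECTION sales_prejoin AS
SELECT s.sale_id, s.amount, c.customer_name
FROM sales s JOIN customers c ON s.customer_id = c.customer_id
ORDER BY s.sale_id;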

A word of warning here. Projections are extremely powerful, but you really can shoot yourself in the foot with them. You could, for instance, create the same projection twice on a table. This would double the load time without any benefit at query time, as Vertica chooses only one projection per query.

There is a lot more to say about projections; you can read all about them in the online documentation.

Lesson 3: Set up good projections

Timestamp considered harmful

Just imagine. You have a big table containing events. You receive thousands of events per second, each carrying a timestamp with second precision. Vertica will of course happily load those events, compressing them and storing them on disk.

You then want to look at your events from yesterday, so you SELECT based on the timestamp column. This column has a cardinality of 86.4k for yesterday alone, which means about 2.6M for a month. Of course, this will work. But the data stored on disk for this column is huge due to the cardinality, and Vertica will have a lot of extra processing to do.

Ask yourself this question: do you really need data at the level of the second? What is the smallest timeslice you use when querying this table? If your heaviest queries never look at anything more precise than one hour, create an extra column with only an hourly timestamp. Just like that, you have divided the cardinality of the column your queries look at first by 3.6k, saving on IO and processing time. Of course, this does not mean that you need to get rid of the full timestamp column; it will still be there for ad hoc queries when needed. But your usual big queries will run a lot faster, as in the sketch below.
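A minimal sketch with a hypothetical events table; I am assuming here that your Vertica version lets a column DEFAULT expression derive from another column, which keeps the hourly column filled automatically at load time:

CREATE TABLE events (
   event_ts   TIMESTAMP NOT NULL                              -- second precision, kept for ad hoc queries
 , event_hour TIMESTAMP DEFAULT date_trunc('hour', event_ts)  -- 3600 times lower cardinality
 , payload    VARCHAR(500)
);

-- heavy recurring queries filter on the cheap hourly column
SELECT COUNT(*)
FROM events
WHERE event_hour = '2015-03-02 14:00:00';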

Lesson 4: Do not use timestamp in heavy queries.

Vertica optimisation part 1: system

Vertica out of the box is an amazing system, not needing a lot of configuration to perform well. That said, a few easy tweaks can improve its performance even further. This first post explains what can be done at the system level; a second post will suggest a few best practices.

Set readahead

This is the way Linux prefetches data. When you fetch a small amount of data from disk, the main cost incurred is head positioning and disk rotation, so the cost of fetching a larger amount of data at the same position on disk is dwarfed by that initial cost. The following command sets the readahead to 8192 sectors of 512 bytes, i.e. 4MB per read. Issue it as root or with sudo for a runtime change; copy the line into /etc/rc.local to make the change permanent.

# where sdxx is the disk on which your data sits
/sbin/blockdev --setra 8192 /dev/sdxx

Set swappiness

The swappiness is what tells a Linux system how to balance data between memory and swap. By default it is set to 60. Vertica does require some swap, even if you have enough memory: not having enough swap can result in the out-of-memory killer (oom-killer) deciding to kill Vertica. Note that Vertica recommends having at least 2GB of swap in any case. Run the following as root or with sudo at runtime, or write it in /etc/rc.local for a permanent effect.

echo 0 > /proc/sys/vm/swappiness

Run powertop

Powertop is a little tool showing what consumes the most energy on a system. Although mostly useful on laptops, it can tell you if some application or daemon you would not think of is consuming energy, and thus probably resources.

Set up the IO scheduler

The default Linux IO scheduler is cfq (completely fair queuing), which tries to balance the needs of all applications. Although ideal for desktop usage, this is not what we want for Vertica. We do not want to be fair; we just want Vertica to run as fast as it can. For this purpose the deadline scheduler, whose goal is to reduce latency, is a much better choice. Run the following as root or with sudo, or again write it in /etc/rc.local to make the change permanent.

# where sdxx is the disk on which your data sits
echo 'deadline' > /sys/block/sdxx/queue/scheduler

Externalise ext4 journal

Ext4, the only filesystem supported by Vertica (along with its parent, ext3), is a journaling filesystem. This means that it keeps track of changes in a journal before committing them to disk, speeding up recovery in case of a crash. This journal is kept on disk, so it adds extra writes. If you have another disk available, you can put the journal on that disk, thus reducing the amount of IO on the data disk.

I will assume that the mount point on which Vertica writes its data is /mnt/vertica, and the partition is /dev/sda1. The journal partition will be /dev/sdb1.

# stop all relevant services
# lsof shows which processes are still using which files
# unmount the current ext4 fs
umount /mnt/vertica

# remove the journal from the current FS
tune2fs -O ^has_journal /dev/sda1

# format the journal partition as an external journal device
mke2fs -O journal_dev /dev/sdb1

# attach the external journal
tune2fs -o journal_data_ordered -j -J device=/dev/sdb1 /dev/sda1

# remount
mount /mnt/vertica

Data partition mount option

By default, on ext4 each read and write of a file updates the file's metadata to record the access time. This means that even a read results in a disk write. This can be avoided, for a performance gain, by adding an option to the line mounting the Vertica partition in /etc/fstab. You have 2 options, and an example fstab line follows the list:

  • noatime: prevents the update on both read and write, for the highest performance gain
  • relatime: updates the access time only when it is older than the modification time, a more conservative compromise
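For instance, assuming the /dev/sda1 data partition used earlier in this post:

# /etc/fstab: mount the Vertica data partition without access-time updates
/dev/sda1  /mnt/vertica  ext4  defaults,noatime  0  0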

Create a catalog partition

The catalog can be put on its own partition to relieve the data partition from extra writes. The easiest way is then to create a symlink from the initial catalog directory to a directory on the catalog partition, as sketched below.
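A sketch with hypothetical paths (stop the database first); the exact catalog location depends on your installation:

# move the catalog to the dedicated partition mounted on /mnt/catalog
mv /home/dbadmin/mydb/v_mydb_node0001_catalog /mnt/catalog/
# symlink it back to where Vertica expects it
ln -s /mnt/catalog/v_mydb_node0001_catalog /home/dbadmin/mydb/v_mydb_node0001_catalog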

Conclusion

I set all these enhancements up at a time when performance was very bad due to bad configuration. They had a huge impact (even the simplest queries ran about 2 orders of magnitude faster), but I sadly do not have numbers to pinpoint exactly which improvements were the most effective. I can just tell you that all together they are fantastic.