Vertica ODBC error messages and solutions

Those are error messages and solutions found after lots of trials and errors. I am mostly using python with Vertica, some some solutions might thus be python specific, but most should be generic enough.

[HY000] [unixODBC][Vertica][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function. (11560) (SQLDriverConnect)

The ODBC connection does not find a properly defined DSN. Reasons include:

  • Path not existing in one of the odbc.ini or odbcinst.ini files (check mostly ODBCInstLib, Driver, Driver64).

[22001] [Vertica][ODBC] (10170) String data right truncation on data from data source: String data is too big for the driver’s data buffer. (10170) (SQLPutData)

This is a unicode issue. Reasons might be:

  • Old pyodbc which does not handle UTF-8  properly (try to use version 3+)
  • Vertica’s VARCHAR length is given in bytes, not character. So if you have UTF8 characters in a string, you might go above the limit without noticing. Eg. a VARCHAR(1) can hold ‘0’ but not ‘€’.
  • Pyodbc does not handle unicode properly. If you are using python, encode in UTF-8.

[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)

The DSN used does not exist. Reasons include:

  • Typo in the DSN in your code (you are asking for a DSN not defined in odbc.ini).
  • odbc.ini file syntax invalid (for instance closing square bracket forgotten).
  • DSN not defined in the used odbc.ini file.
  • Wrong odbc.ini file used, hence DSN not found. This can happen if a $HOME/.odbc.ini file, often created by default, exists.
  • The odbc.ini is not in the expected path (/etc/odbc.ini). Pointing the ODBCINI environment variable to the right path might work.
  • The odbc.ini file references a Driver in the the relevant DSN section which is not defined in /etc/odbcinst.ini.

[HY000] [unixODBC][DSI] The error message NoSQLGetPrivateProfileString could not be found in the en-US locale. Check that /en-US/ODBCMessages.xml exists. (-1) (SQLDriverConnect)

Vertica needs some extra specifications in either /etc/vertica.ini (default), or in the file pointed to by the VERTICAINI environment variable:

[Driver]
ErrorMessagesPath = /opt/vertica/lib64/
ODBCInstLib = /usr/lib/x86_64-linux-gnu/libodbcinst.so
DriverManagerEncoding=UTF-16

Usually I just add this to odbc.ini and points VERTICAINI to it.

pyodbc.Error: (‘H’, ‘[H] [unixODBC][ (4294967295) (SQLDriverConnectW)’)

You are using an old version of pyodbc. Upgrade system wide or create a virtualenv and pip install pyodbc.

Advertisements

Python + Vertica = pyvertica

At $work we use mostly python for all the glue between systems and other developments. Naturally, when we started using Vertica, we wanted to connect to it via python. Of course, vertica is accessible via ODBC and supports standard SQL, but to fully harness its specificities, we needed a bit more. INSERT statements are very slow, for instance, and should be replaced by COPY, as already described in the best practices.

We quickly decided to create an open source library which can handle that for us. This hides the dirty details of the COPY statement, and allows us to insert data in a pythonic way.

The best way to show it in action is to display a quick example, taken from the documentation:


from pyvertica.batch import VerticaBatch

batch = VerticaBatch(
  odbc_kwargs={'dsn': 'VerticaDWH'},
  table_name='schema.my_table',
  truncate=True,
  column_list=['column_1', 'column_2'],
  copy_options={
    'DELIMITER': ',',
  }
)

row_list = [
  ['row_1_val_1', 'row_1_val_2'],
  ['row_2_val_1', 'row_2_val_2'],
  ...
]

for column_data_list in row_list:
  batch.insert_list(column_data_list)

error_bool, error_file_obj = batch.get_errors()

if error_bool:
  print error_file_obj.read()

batch.commit()

As you can see, inserting data is just as easy as calling insert_list on a well defined object. There are as well other helpers to insert raw data, if for instance you just read a line from a csv file, there is no need to parse it in python to define columns, you can just throw all data to pyvertica. All COPY options are useable when you define the object, they will just be passed as is to Vertica.

Technically, this was an interesting little project. The COPY statement reads data from a file (actual file, STDIN or fifo). As we did not want to duplicate data on disk, pyvertica actually creates a fifo, spawns a thread running the COPY statement and send data to the fifo. Whatever you send to pyvertica is converted into a csv-like structure, which is what COPY understands.

We use it in production since months without a single issue. Depending on used hardware and extra processing, it is easily possible to import 100.000 records/second.

The installation is very easy as pyvertica can be found in the python package index. Just type

pip install pyvertica

and you are good to go.

To actually connect to vertica, you have the choice of

  • pass a valid odbc connection to pyvertica
  • setup a proper odbc.ini and pass the DSN to pyvertica
  • craft your ow DSN-less line and pass it to pyvertica.

All documentation can be found on readthedocs page of pyvertica, and the source can be found and forked on the pyvertica github repository.