You have probably noticed that Apache Cassandra, through Thrift, exposes its column names, column values, and super column names as binary. See the Thrift declarations:
struct Column {
  1: required binary name,
  2: required binary value,
  3: required Clock clock,
  4: optional i32 ttl,
}

struct SuperColumn {
  1: required binary name,
  2: required list<Column> columns,
}
On the other hand, the column family name, the row key, and the keyspace name are all of type string. See the ColumnPath Thrift declaration:
struct ColumnPath {
  3: required string column_family,
  4: optional binary super_column,
  5: optional binary column,
}
And the ‘get’ method Thrift declaration:
ColumnOrSuperColumn get(
  1: required string keyspace,
  2: required string key,
  3: required ColumnPath column_path,
  4: required ConsistencyLevel consistency_level=ONE
)
Yet, when declaring a Column Family in the config file you do give a data type:
<ColumnFamily Name="Regular1" CompareWith="LongType" />
This means that the column name is of type long.
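To make the asymmetry concrete, here is a minimal Python sketch of a get call against the Regular1 column family above, assuming the Thrift-generated Cassandra bindings (cassandra.ttypes) are available and a connected Cassandra.Client named client exists; the keyspace name and row key below are placeholders. The keyspace, row key, and column family travel as plain strings, while the client itself must pack the long column name into bytes.

import struct
# Assumed: the Thrift-generated Cassandra bindings are on the path.
from cassandra.ttypes import ColumnPath, ConsistencyLevel

# The keyspace, row key and column family name are plain strings...
keyspace = 'Keyspace1'     # placeholder keyspace name
row_key = 'some-row'       # placeholder row key

# ...but the column name is only ever bytes on the wire: with a LongType
# comparator the client packs the long 42 into its 8-byte big-endian form.
column_name = struct.pack('>q', 42)   # b'\x00\x00\x00\x00\x00\x00\x00*'

path = ColumnPath(column_family='Regular1', column=column_name)

# Mirrors the Thrift signature quoted above:
# get(keyspace, key, column_path, consistency_level)
# result = client.get(keyspace, row_key, path, ConsistencyLevel.ONE)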
This may seem odd to those who work with relational databases. We aren't assigning data types to the column value; rather, the value is always binary. We also have only one data type for the primary key, string, which is unlike relational databases, which often use a long. (Note: Cassandra 0.7 should have binary primary keys instead of strings.) Indeed, in a typical relational database the only fixed data type is the column name, and it is a string. So, in effect, Cassandra (0.6 and before) is the complete inverse of a typical relational database.
To quickly summarize:
set keyspace.family[rowkey][columnname]=value

In ver 0.6 (current): set string.string[string][bytes]=bytes
In ver 0.7: set string.string[bytes][bytes]=bytes
So now, back to the why. Why do column names get a data type in the config file? The reason has to do with how the column names collate. Internally, Cassandra collates the columns in order by name, based on the data type. If we are working with integer column names, for example, then each column name is 4 bytes long. Let's work with the column names 727, 1944, and 42.
The bytes associated with these three numbers:
727 = 000002D7
1944 = 00000798
42 = 0000002A
No matter what order the column names are set into Cassandra, the resulting order in storage will be based on a byte-level comparison of these column names:

string.string[string][0000002A]=bytes
string.string[string][000002D7]=bytes
string.string[string][00000798]=bytes
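A few lines of Python reproduce this: pack each name into its 4-byte big-endian form, and a plain byte-wise sort yields exactly the storage order shown above.

import struct

names = [727, 1944, 42]

# Each column name is encoded as a 4-byte big-endian integer,
# which is exactly the byte sequence the comparator sees.
packed = [struct.pack('>i', n) for n in names]

# Cassandra's collation is just a byte-level comparison of those names.
for name in sorted(packed):
    print(name.hex().upper())   # prints 0000002A, 000002D7, 00000798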
Enforcing a collating order is important both for collation consistency and for selecting “slices” of columns (see SliceRange in the API). Collation consistency can be violated when data types mix. Suppose, for example, we tried to store a column whose name is the ASCII string “92”. What does “92” look like in bytes?
“92” = 3932
So *if* Cassandra allowed that operation, we'd now have:
string.string[string][0000002A]=bytes
string.string[string][000002D7]=bytes
string.string[string][00000798]=bytes
string.string[string][3932]=bytes
The leading zero bytes of the integer encoding are not present, and this throws off the collation: we likely wanted “92” to be stored after 42 and before 727, but instead it is stored after 1944.
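Extending the Python sketch from above, mixing the two-byte ASCII name in with the packed integers shows where a byte-wise sort actually puts it:

import struct

packed = [struct.pack('>i', n) for n in (727, 1944, 42)]

# The ASCII string "92" is just the two bytes 0x39 0x32.
packed.append(b'92')

# In a byte-wise comparison the missing leading zero bytes make "92"
# greater than every 4-byte integer, so it sorts last, after 1944.
for name in sorted(packed):
    print(name.hex().upper())   # 0000002A, 000002D7, 00000798, 3932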
However, Cassandra would not allow that operation if the column family were configured with an integer data type. Likewise with TimeUUID, where the first 8 bytes are time sensitive, thus ensuring a collation in chronological order.
The collation order is deemed important by Cassandra and, in order to provide maximum flexibility, column names (and soon row keys) are collated as arbitrary byte sequences. However, values inserted into Cassandra may be limited to byte sequences matching an accepted pattern (Long, TimeUUID, UTF-8, etc.).
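As a rough illustration of that kind of check, the sketch below is a stand-in for the validation a typed comparator performs; it is not Cassandra's actual code, and the function name and rules are made up for the example.

import struct
import uuid

def check_column_name(name, comparator):
    # Illustrative stand-in: reject byte sequences that cannot belong
    # to the configured type.
    if comparator == 'LongType' and len(name) != 8:
        raise ValueError('LongType column names must be exactly 8 bytes')
    if comparator == 'TimeUUIDType':
        if len(name) != 16 or uuid.UUID(bytes=name).version != 1:
            raise ValueError('TimeUUIDType column names must be 16-byte version-1 UUIDs')
    if comparator == 'UTF8Type':
        name.decode('utf-8')   # raises UnicodeDecodeError for invalid UTF-8

check_column_name(struct.pack('>q', 42), 'LongType')   # accepted
check_column_name(b'92', 'LongType')                   # raises ValueError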
With the intended collation scheme of Cassandra in mind, in the next post I’ll discuss how JSON’s data types (string, number, object, array, true, false, null) can fit into the Cassandra data model.