Sunday, 8 September 2013

Cassandra data modelling - dynamic columns with datetimes and filtering on UTF8Type

I'm looking to create a table to store details of the URL parameters of
visitors to a number of websites. For each site I receive a dictionary of
parameters and values per user_id-session_id pair, each with a timestamp.
In summary: each site has users (who have sessions) that have parameters
that have values at times.
I want to query this data like so:
For each website, given a date range, get the customer_id of the customers
with a certain parameter value.
P.S. I don't readily have a list of user IDs/session IDs, so I can't use
those as row keys without doing a 'get_range()' query, which may be
expensive.
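To make the query concrete, here is a minimal sketch in plain Python (no Cassandra involved) of the data shape and the result I'm after. The sample events, the `utm_source` parameter and the function name `users_with_param` are all made up for illustration, and I'm assuming the date range is half-open.

```python
from datetime import datetime

# Hypothetical sample events: (site, user_id, session_id, timestamp, params)
EVENTS = [
    ("site-a", "u1", "s1", datetime(2013, 9, 1, 10, 0), {"utm_source": "google"}),
    ("site-a", "u2", "s7", datetime(2013, 9, 3, 12, 30), {"utm_source": "bing"}),
    ("site-a", "u3", "s2", datetime(2013, 9, 5, 9, 15), {"utm_source": "google"}),
    ("site-b", "u4", "s9", datetime(2013, 9, 2, 8, 0), {"utm_source": "google"}),
]


def users_with_param(events, site, start, end, param, value):
    """Return the set of user ids on `site` whose `param` equalled `value`
    at some time in the half-open interval [start, end)."""
    return {
        user_id
        for ev_site, user_id, _session, ts, params in events
        if ev_site == site and start <= ts < end and params.get(param) == value
    }
```

This is a full scan, so it only pins down the semantics; the question is which row key and column layout lets Cassandra answer the same thing without reading everything.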
Attempted solutions/thoughts...
I figured that dynamic columns would be a good idea here, as we don't
really know all the different sets of parameters and I'd rather not
attempt to hard-code them. That gives me the parameter name as the column
name and the parameter value as the cell value.
That's all well and good, but it leaves me with composite row keys of
(site, date, customer_id, session_id).
I'm all for big composite row keys, but I don't think I could query the
data for ranges of dates this way (I would love to be corrected). If it is
possible, how can I do that?
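My understanding of why the date range is awkward: with a hashing partitioner, rows are placed by a hash of the row key, so keys for consecutive dates do not sit next to each other. The sketch below illustrates this in plain Python, using MD5 from `hashlib` as a stand-in for the partitioner's token function (an assumption for illustration, not Cassandra's actual code). The `site:YYYY-MM-DD` key format and the helper names are hypothetical.

```python
import hashlib
from datetime import date, timedelta


def token(row_key):
    """Stand-in token function: an MD5 digest read as an integer."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)


def day_keys(site, start, end):
    """Hypothetical day-bucketed row keys 'site:YYYY-MM-DD' for [start, end]."""
    days = (end - start).days
    return ["%s:%s" % (site, (start + timedelta(days=i)).isoformat())
            for i in range(days + 1)]


keys = day_keys("site-a", date(2013, 9, 1), date(2013, 9, 30))
by_token = sorted(keys, key=token)

# Compare date order with token order: if they differ, a contiguous token
# range does not correspond to a contiguous date range.
print(by_token == keys)
```

If the row key held only the site and a day bucket, the keys for a date range could at least be enumerated up front like this and fetched with a multiget instead of 'get_range()'. That does not work with customer_id and session_id in the key, since I can't enumerate those.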
I also thought about having (site, datetime) composite column names, as
that allows me to filter on those properties nicely. Then I could have
(param, param value) composite row keys, so that each row builds up a set
of user_ids as its column values. Actually, that sounds like my best bet
at the moment, but it still seems fairly messy.
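Here is a rough in-memory sketch of that second layout, just to check the idea works: one row per (param, value), with columns kept sorted by (site, timestamp) so a date range becomes a column slice. It is plain Python standing in for a column family, not Pycassa code, and the class and method names are hypothetical. One deviation from what I described: I've appended user_id to the column name (an assumption on my part) so that two users with the same site and timestamp don't overwrite each other.

```python
import bisect
from collections import defaultdict
from datetime import datetime


class ParamIndex(object):
    """In-memory stand-in for a column family with sorted columns per row."""

    def __init__(self):
        # row key (param, value) -> sorted list of (site, timestamp, user_id)
        self._rows = defaultdict(list)

    def insert(self, site, timestamp, user_id, param, value):
        bisect.insort(self._rows[(param, value)], (site, timestamp, user_id))

    def users(self, site, start, end, param, value):
        """User ids on `site` with param == value and start <= timestamp < end."""
        columns = self._rows.get((param, value), [])
        # (site, start) sorts before every (site, start, user_id) column,
        # so the slice includes `start` and excludes `end`.
        lo = bisect.bisect_left(columns, (site, start))
        hi = bisect.bisect_left(columns, (site, end))
        return {user_id for _site, _ts, user_id in columns[lo:hi]}


index = ParamIndex()
index.insert("site-a", datetime(2013, 9, 1, 10, 0), "u1", "utm_source", "google")
index.insert("site-a", datetime(2013, 9, 5, 9, 15), "u3", "utm_source", "google")
index.insert("site-b", datetime(2013, 9, 2, 8, 0), "u4", "utm_source", "google")
index.insert("site-a", datetime(2013, 9, 3, 12, 30), "u2", "utm_source", "bing")

found = index.users("site-a", datetime(2013, 9, 1), datetime(2013, 9, 4),
                    "utm_source", "google")
```

The row key is known at query time (I always ask for a specific parameter value), so no 'get_range()' is needed. The downside is that a very common parameter value would produce a very wide row.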
If anybody has got any ideas, I'd love to hear them!
Thanks,
Matt
P.P.S. I'm using Pycassa, CQL3 and Cassandra 1.2, so I have all the tools
that those platforms offer available to me.
