I'll be up front: this isn't a great use case for Cassandra. What you're asking for is essentially a sorted set, which is something Redis excels at, since it keeps two structures under the hood, one for set lookups and one for the sorted order. I've had this exact problem in the past and I used Redis sorted sets for it.
That said, if you feel like jumping into dark territory, you could try it like this...
CREATE TABLE inbox (
    user int,
    last_contact timestamp,
    contact_id int,
    PRIMARY KEY (user, last_contact)
) WITH CLUSTERING ORDER BY (last_contact DESC)
  AND compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': '4'}
  AND compaction = {'class': 'LeveledCompactionStrategy'};
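With that clustering order, reading a user's inbox newest-first is just a slice of the partition. Something like this (the user id is a made-up placeholder):

-- most recently contacted people come back first thanks to the DESC clustering order
SELECT contact_id, last_contact FROM inbox WHERE user = 123 LIMIT 20;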
When a user gets sent a message, you'll have to delete the old last_contact record and insert a new one. I'm not wild about this because high churn on messages will generate a lot of tombstones, but since you're dealing with people you might only see a few hundred of these per week.
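A rough sketch of that write path, assuming you know (or have just read) the previous last_contact value for that contact; the ids and timestamps here are made up:

BEGIN BATCH
  -- both statements hit the same partition, so the batch stays cheap
  DELETE FROM inbox WHERE user = 123 AND last_contact = '2015-03-01 10:00:00';
  INSERT INTO inbox (user, last_contact, contact_id) VALUES (123, '2015-03-02 08:30:00', 456);
APPLY BATCH;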
If you do hit a high tombstone count, my advice is to use LCS and run daily subrange repairs on this table using Reaper (http://cassandra-reaper.io/), which we (The Last Pickle) maintain and which is open source. Once you've got your repairs running regularly you can drop gc_grace_seconds down to a number close to your repair schedule, and let the tombstones drop out faster than they would by default.
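For example, with a daily repair cadence you might do something like this (the exact value is up to you, just keep it comfortably longer than your repair interval):

-- 3 days instead of the default 10; only safe once repairs reliably run every day
ALTER TABLE inbox WITH gc_grace_seconds = 259200;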
I think you'll also probably need a per-user lookup table to identify all the messages from a user:
CREATE TABLE inbox_by_user (
    user int,
    contact int,
    message_id timeuuid,
    -- other necessary message details here
    PRIMARY KEY ((user, contact), message_id)
);
Whenever you want to look up all the messages in the inbox table from a specific user, you can consult inbox_by_user. It also gives you a per-user history, which might be helpful.
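Since (user, contact) is the full partition key, that lookup is a single-partition read, something like (ids made up):

SELECT message_id FROM inbox_by_user WHERE user = 123 AND contact = 456;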
Duuuuuude, I've been pretty hyped about Titan ever since I read about it; I didn't expect them to acquire the tech.
From what I've researched, Neo4j isn't so great at scaling out unless you pay for their Enterprise edition, but it's pretty mature tech compared to Titan.
edit:
I spoke too soon:
https://groups.google.com/forum/#!topic/aureliusgraphs/WTNYYpUyrvw/discussion
Titan development is going to slow down, as the two main contributors will be focusing on building DataStax Enterprise's graph database. I do understand that DataStax is a company and needs to make money. Hopefully Titan's future is bright.
My colleagues use RazorSQL; I prefer cqlsh myself though.
The thing I hate most about DevCenter is its appetite for memory; it just keeps eating more and more. Plus the stupid limit on how many rows you can retrieve. What the hell? We raised this concern with DataStax back when we had DevCenter 1.4, but as of 1.6 it's still the same.
Their solution is to stripe multiple disks, and you can also move some of your data to the temporary SSD. Just make sure to spread over upgrade domains and you should be fine.
Please note that they announced new instance types with a lot of SSD, and while I have not profiled those yet, they look promising.
Here is the announcement: http://azure.microsoft.com/blog/2014/09/22/new-d-series-virtual-machine-sizes/