Cassandra and Hadoop and Hive, Oh my! (part 2)

This is part two of my previous post about adding Cassandra, Hadoop and Hive to the infrastructure at work. We are building a new web-based application and wanted to add features that the current infrastructure would not support.

We needed to get our web and database development teams trained on Cassandra, so we used DataStax Training and had them on site, not only to deliver training but also to answer questions and give guidance on how to set up our column families.

Training

We had an on-site two-day class delivered by Tupshin Harper from DataStax. He did an excellent job of getting us up to speed and answering all of our questions about NoSQL basics, how to attack some of the problems we are looking to solve, and how to use Hive/Pig.

Infrastructure

Our initial setup was single-machine installations of Cassandra for the development and QA environments. This allowed us to start developing and get used to working with Cassandra, CQL and Fluent Cassandra (we are a .NET shop, expanding into using more open source tools) without having to worry about maintaining a multiple-instance setup.

We are now building the staging and production environments, both of which have multiple nodes in their clusters. Another wrinkle we had to overcome was that the staging and production environments are locked down, so I did not have internet access on those instances to perform a typical install of DataStax Enterprise. To solve this problem we created an internal YUM repository and mirrored the Red Hat and DataStax repos. Now we can use YUM to install and resolve all of the dependency issues. Doing the install this way, we did have to manually edit a few files, generate node tokens by hand, and bring the cluster up first, then use OpsCenter to install the opscenter-agent on each node.
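
For anyone who has not generated node tokens by hand before, here is a minimal sketch of the arithmetic in Python. It assumes the RandomPartitioner (token range 0 to 2**127 - 1) and a single data center, neither of which is spelled out above. The function name initial_tokens and the node count of 4 are purely illustrative, not what we actually ran.

```python
# Assumption: RandomPartitioner, whose token space is 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return evenly spaced tokens, one per node, for a single data center."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic only, so the large values stay exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Hypothetical 4-node cluster.
    for index, token in enumerate(initial_tokens(4)):
        print("node %d: initial_token: %d" % (index, token))
```

Each value would then go into the initial_token setting in that node's cassandra.yaml before the node is started for the first time.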

Cassandra 1.2 for DataStax Enterprise

Cassandra 1.2 for DataStax Enterprise, once it is released, brings virtual nodes and CQL3, which comes with a binary protocol and collection columns (set, list and map). We are prepared to take full advantage of these features once they are released to DataStax Enterprise.

3 thoughts on “Cassandra and Hadoop and Hive, Oh my! (part 2)”

Annie Rogers says

August 8, 2013 at 12:56 am

Surprised to see that you haven't included Nutch?

- rmcfrazier says
  
  August 8, 2013 at 7:18 am
  
  Hi Annie,
  Solr is packaged with DSE, and we will be using it to index and query the data in CFs, but we are not crawling or indexing web pages, so we currently have no need for Nutch.
  
  - Annie Rogers says
    
    August 9, 2013 at 12:35 pm
    
    Ok. That actually makes sense. Thanks

blog.robert.mcfrazier.com