Hive plays well with Elasticsearch

Using the Amazon Elasticsearch Service with Hive

Amazon launched the Amazon Elasticsearch Service less than a month ago so that its customers can spin up scalable Elasticsearch clusters directly from the AWS Management Console without having to manage those clusters themselves. While you can have a cluster up and running in a few minutes, this ease of use comes with a small disadvantage. Unlike a classic Elasticsearch setup, the service exposes only a publicly accessible client gateway. As a result, Hadoop applications cannot use the usual discovery mechanisms to connect to the nodes behind that gateway.

Hive and Elasticsearch

To connect to the Elasticsearch service from any of the popular Hadoop applications (Hive, Pig, Spark, etc.), you need the Elasticsearch Hadoop connector. In a Java or Scala application you can pull it in with a build tool such as Maven or sbt, respectively. To use the connector in Hive, though, you need to download the standalone JAR available on the Elasticsearch website.
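The following is a minimal sketch, not the approach from the full article. It assumes ES-Hadoop 2.2 or later, whose `es.nodes.wan.only` setting makes the connector talk only to the declared endpoint instead of trying to discover the nodes behind it. The small Java program below just assembles the HiveQL you would run after adding the standalone JAR to your session with `ADD JAR`. The table name, columns, index (`logs/entry`) and endpoint are hypothetical placeholders. `org.elasticsearch.hadoop.hive.EsStorageHandler` is the connector's Hive storage handler.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class EsHiveTableDdl {

    // Builds a Hive DDL statement for an external table backed by Elasticsearch
    // through the ES-Hadoop storage handler. All names passed in are placeholders.
    static String createTableDdl(String table, String columns, Map<String, String> props) {
        // Render each connector setting as a TBLPROPERTIES entry.
        String tblProps = props.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
        return "CREATE EXTERNAL TABLE " + table + " (" + columns + ")\n"
                + "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
                + "TBLPROPERTIES (\n" + tblProps + "\n);";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Hypothetical index/type that the table reads from and writes to.
        props.put("es.resource", "logs/entry");
        // Hypothetical Amazon Elasticsearch Service endpoint (the public gateway).
        props.put("es.nodes", "search-mydomain-abc123.us-east-1.es.amazonaws.com");
        props.put("es.port", "80");
        // Use only the declared gateway; do not try to discover the nodes behind it.
        props.put("es.nodes.wan.only", "true");

        System.out.println(createTableDdl("es_logs", "ts TIMESTAMP, message STRING", props));
    }
}
```

The key choice here is `es.nodes.wan.only`. Without it, the connector tries to reach individual cluster nodes directly, which is exactly what the managed service does not allow.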

Using the AWS Flow Java framework with IntelliJ IDEA and Maven

Enabling AspectJ support in Java is a bit of a “love” story in itself. But making aspect weaving work for Amazon Simple Workflow in a Maven context (where, I might add, any developer working in a production environment lives and breathes) is a challenge of its own. I'm sharing this article as the result of several days of research, sweat and hair pulling. In short, I am going to explain how to enable compile-time weaving for the AWS SWF Flow Java framework in combination with IntelliJ IDEA and Maven. So here goes…