In this article, we will see when to use the Elasticsearch percolator query and how to implement it in Ruby. The examples were written on Ubuntu, but they should work on other Linux distributions too.
How does it work?
Most Elasticsearch developers think conventionally: they design documents according to the structure of their data and store them in an index, then define queries through the search API to retrieve those documents. The percolator works in the opposite direction. First you store queries in an index, and then, through the Percolate API, you supply documents in order to retrieve the queries that match them. When a document is percolated:
- All queries are loaded in memory
- Each document is indexed in memory
- All queries get executed against it
- Execution time is linear in the number of queries
- Memory index gets cleaned up
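The steps above can be illustrated with a toy, pure-Ruby sketch. It is not how Elasticsearch is implemented; the ToyPercolator class and its methods are hypothetical names, and each "query" is reduced to a single term, purely to show the reversed flow of storing queries first and matching documents later.

```ruby
# Toy, in-memory illustration of the percolation flow.
# ToyPercolator is a hypothetical name used only for this sketch.
class ToyPercolator
  def initialize
    @queries = {} # query id => the term that query looks for
  end

  # Store a query (here just a single term) under an id.
  def register(id, term)
    @queries[id] = term.downcase
  end

  # Tokenize the document in memory, run every stored query against it,
  # and return the ids of the queries that match.
  def percolate(message)
    tokens = message.downcase.split(/\W+/) # temporary in-memory "index"
    @queries.select { |_id, term| tokens.include?(term) }.keys
  end
end

percolator = ToyPercolator.new
percolator.register('foo', 'foo')
percolator.register('bar', 'bar')
percolator.register('baz', 'baz')

p percolator.percolate('message foo bar') # prints ["foo", "bar"]
```

Every stored query is checked against each document, which is why the cost grows with the number of queries.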
When do we need to use percolator?
The Percolate API in Elasticsearch is commonly used for document monitoring and alerting.
For example, consider a platform that stores users’ interests so that it can send the right content (a notification alert) to the right users every time new content comes in.
For instance, a user subscribes to a specific topic, and as soon as a new article for that topic comes in, a notification will be sent to the interested users.
How is this done?
You express each user’s interests as an Elasticsearch query, using the query DSL, and register it in Elasticsearch as though it were a document. Every time a new article is published, you can percolate it, without needing to index it, to find out which users are interested in it.
At this point in time you know who needs to receive a notification containing the article link (sending the notification is not done by elasticsearch though). An additional step would also be to index the content itself but that is not required.
This concept has many uses, such as weather forecast alerts, price monitoring, news alerts, stock alerts, logos monitoring and many more.
Pre-requisites & Setup:
Java:
Elasticsearch is developed in Java, so make sure Java is installed by running the command below:
java -version
Installing Elasticsearch:
Next, install Elasticsearch with the below command:
sudo apt-get install elasticsearch
In order to make sure that Elasticsearch is installed correctly, use the following command:
curl -XGET 'localhost:9200'
The result should be something like the following:
{
  "name" : "lNOxiFt",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "r8yOSyCjRtmHFYmdbijjpg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}
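The percolator field type used later in this article was introduced in Elasticsearch 5.0, so it can be worth checking the version number in this response. The following is a small sketch using only Ruby's standard library; the helper name elasticsearch_major_version is hypothetical, and the body is a trimmed copy of the response above rather than a live call.

```ruby
require 'json'

# Return the major version from the JSON body returned by GET localhost:9200.
# elasticsearch_major_version is a hypothetical helper name.
def elasticsearch_major_version(body)
  JSON.parse(body).fetch('version').fetch('number').split('.').first.to_i
end

# Trimmed copy of the response shown above.
body = '{"name":"lNOxiFt","version":{"number":"5.1.2","lucene_version":"6.3.0"}}'

major = elasticsearch_major_version(body)
puts major      # 5
puts major >= 5 # true
```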
Using Percolator:
The following steps explain how your queries get stored in an index and how you define documents in order to retrieve these queries through the Percolate API.
- Requirement and service set up
- Making a connection
- Create an index
- Index a query
- Percolate a document
Requirement & Service setup
In order to implement the Elasticsearch percolator, we need the elasticsearch gem.
gem 'elasticsearch'
I created a service object to index the queries.
# cfg is the configuration hash expected by the constructor shown in the next section
index_service = Services::Percolation.new(cfg)
index_service.re_index
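The constructor shown in the next section reads the host and port from cfg['elastic']['url'] and cfg['elastic']['port']. How that configuration is loaded is not shown in this article, so the following is only a sketch that assumes it lives in YAML; the file content below is hypothetical.

```ruby
require 'yaml'

# Hypothetical configuration content; only the elastic.url and
# elastic.port keys are read by the service constructor.
raw = <<~YAML
  elastic:
    url: localhost
    port: 9200
YAML

cfg = YAML.safe_load(raw)

puts cfg['elastic']['url']  # localhost
puts cfg['elastic']['port'] # 9200

# The hash could then be handed to the service:
# index_service = Services::Percolation.new(cfg)
```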
Making a connection
In order to make a connection, we need elasticsearch-transport, which provides a low-level Ruby client for connecting to an Elasticsearch cluster.
def initialize(cfg)
  @cfg = cfg

  # Log responses and use the Typhoeus HTTP adapter
  transport_configuration = lambda do |f|
    f.response :logger
    f.adapter :typhoeus
  end

  transport = Elasticsearch::Transport::Transport::HTTP::Faraday.new(
    hosts: [{ host: @cfg['elastic']['url'], port: @cfg['elastic']['port'] }],
    &transport_configuration
  )

  @server = Elasticsearch::Client.new log: true, transport: transport
end

# Recreate the index and register one query per entry in ds
def re_index
  index_name = "percolator-index"
  delete_index(index_name)
  create_index(index_name)
  ds = ['foo', 'bar']
  ds.each do |i|
    index(i, index_name)
  end
end
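The re_index method calls delete_index, which is not shown in this article. A minimal sketch of what it might look like follows, using the indices.exists? and indices.delete calls of the elasticsearch gem. The IndexCleaner wrapper is a hypothetical name used only to keep the example self-contained; in the service itself this would simply be an instance method using @server.

```ruby
# Hypothetical wrapper; in the article's service, delete_index would be
# an instance method that uses @server directly.
class IndexCleaner
  def initialize(server)
    @server = server
  end

  # Delete the index only if it exists, so that the very first run
  # (when there is nothing to delete yet) does not fail.
  def delete_index(index_name)
    return false unless @server.indices.exists?(index: index_name)

    @server.indices.delete(index: index_name)
    true
  end
end
```

Checking for existence first costs an extra request, but it avoids having to rescue a not-found error on the first run.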
Create an index
Create an index with two mappings:
def create_index(index_name)
  @server.indices.create index: index_name, body: {
    mappings: {
      doctype: {
        properties: {
          message: { type: "text" }
        }
      },
      queries: {
        properties: {
          query: { type: "percolator" }
        }
      }
    }
  }
end
The doctype mapping is used to pre-process the document defined in the percolate query before it is indexed into a temporary index.
The queries mapping is used for indexing the query documents. A JSON object is stored in the query field, and this JSON object is itself an Elasticsearch query.
The query field is configured with the percolator field type, because this is the field type that understands the query DSL.
It also stores the query in such a way that documents specified in a percolate query can be matched against it at any later point.
Index a query
Register a query in the percolator:
def index(ds, index_name)
  # The stored query matches documents whose message field contains ds
  query = { query: { match: { message: ds.to_s } } }
  begin
    r = @server.index index: index_name, type: 'queries', id: ds, body: query
    puts 'Indexing result:'
    puts r.inspect
  rescue Faraday::Error::ResourceNotFound,
         Faraday::Error::ClientError,
         Faraday::Error::ConnectionFailed => e
    puts "Connection failed: #{e}"
    false
  end
end
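To connect this with the notification use case described earlier, each stored query could represent one user's subscription to a topic. The sketch below only builds the id and body for such a query. The subscription_query helper and the "user-<id>-<topic>" id scheme are assumptions made for illustration, not part of the sample project.

```ruby
require 'json'

# Build the id and body of one stored query for a user's topic subscription.
# The helper name and the id scheme are hypothetical.
def subscription_query(user_id, topic)
  {
    id: "user-#{user_id}-#{topic}",
    body: { query: { match: { message: topic } } }
  }
end

sub = subscription_query(42, 'ruby')
puts sub[:id]           # user-42-ruby
puts sub[:body].to_json # {"query":{"match":{"message":"ruby"}}}
```

The result could then be passed to the same call used in the index method, with id: sub[:id] and body: sub[:body].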
Percolate a document
Match a document to the registered percolator queries:
def list_document(index_name = 'percolator-index')
  # Give Elasticsearch time to refresh so the newly indexed queries are searchable
  sleep 2
  doc = {
    query: {
      percolate: {
        field: "query",
        document_type: "doctype",
        document: { message: 'message foo bar' }
      }
    }
  }
  data = @server.search index: index_name, type: 'queries', body: doc
  puts "final result"
  puts data
end
The above request returns the following response:
{
  "took"=>8,
  "timed_out"=>false,
  "_shards"=>{"total"=>5, "successful"=>5, "failed"=>0},
  "hits"=>{
    "total"=>2,
    "max_score"=>0.25316024,
    "hits"=>[
      {"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"foo", "_score"=>0.25316024,
       "_source"=>{"query"=>{"match"=>{"message"=>"foo"}}}},
      {"_index"=>"percolator-index", "_type"=>"queries", "_id"=>"bar", "_score"=>0.25316024,
       "_source"=>{"query"=>{"match"=>{"message"=>"bar"}}}}
    ]
  }
}
This response can then be used in whatever way is needed to produce the desired output.
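As one possible way of using the response, the sketch below collects the ids of the matched queries and maps them to the users who should be notified. The matched_query_ids helper and the subscribers lookup table are hypothetical; the response hash is a trimmed copy of the one shown above.

```ruby
# Collect the ids of the stored queries that matched the percolated document.
# matched_query_ids is a hypothetical helper name.
def matched_query_ids(response)
  response.fetch('hits', {}).fetch('hits', []).map { |hit| hit['_id'] }
end

# Trimmed copy of the response shown above.
response = {
  'hits' => {
    'total' => 2,
    'hits' => [
      { '_index' => 'percolator-index', '_id' => 'foo' },
      { '_index' => 'percolator-index', '_id' => 'bar' }
    ]
  }
}

# Hypothetical lookup from query id to the users subscribed to it.
subscribers = { 'foo' => ['alice'], 'bar' => ['alice', 'bob'] }

users = matched_query_ids(response).flat_map { |id| subscribers.fetch(id, []) }.uniq
p users # prints ["alice", "bob"]
```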
This is a sample implementation of the Elasticsearch percolator query using Ruby and, as mentioned above, the percolator has many uses. To learn more, see the Elasticsearch Percolator documentation.
To check out this particular example, see the agiratech GitHub repo.