How your code should interact with Elasticsearch

I have written a few articles about Elasticsearch. It’s a great tool for search: it evolves quickly and has a very rich API. It changed the way I think about search. I now lean towards a single search box that finds what you need across all your models, by any field you want. I think I have pretty good experience with ES, and I keep extending my knowledge.

In this article I would like to share my view on how to organize the code responsible for search in Elasticsearch.

Let’s cook

I have worked out a pattern that I use with ES, and for me it works great. It’s a custom implementation, so I do not claim that it is a universal solution. Also, if you only need to search one model by name, you can probably find a simpler solution.

A lot of frameworks and libraries try to make ES documents a reflection of relational DB tables, or add the functionality found in their AR models: User::find_by, User::find_one, built-in validation for document fields. For me this approach doesn’t work in most cases, and I’ll explain why. Let’s use the example data from my previous article, “Why use Elasticsearch”.

Data structure

We had the following data in MySQL

And in ES I got the following JSON data:

{
    "_source": {
        "name": "BranchOut",
        "description": "Your professional profile on BranchOut is where you can capture and share professional moments, build a living narritive of your work life a",
        "company_category_ids": ["12", "13"],
        "tags": "facebook apps recruiting sales jobs",
        "office_locations": ["9q8yyyk", "9q8yyye"]
    }
}

It’s quite a simple example; in real life an ES document body can include many more fields. And my searches are not simple.

I will use two terms: the MySQL model, which is the AR class \AR\Company that works with MySQL, and the ES model, which is the \ES\Company class that works with ES.

What do we need from the ES model?

Now let us ask ourselves – what do we need from ES models?

Obviously, to search for something in ES documents we need to build these documents first, so we need to know what to write to ES. From the example above you can see that we write to the ES document only the data and fields that will be used for search. So we need functionality that knows what to write to ES.
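A minimal sketch of such a builder, written in Python purely for illustration (the classes in this article are PHP). The function name `build_company_document`, the shape of the MySQL rows, and the extra `founded` column are assumptions; only the output fields come from the JSON example above.

```python
from typing import Any, Dict, List


def build_company_document(
    company: Dict[str, Any],
    category_ids: List[int],
    tags: List[str],
    office_geohashes: List[str],
) -> Dict[str, Any]:
    """Flatten a company row and its related rows into one ES document.

    Only the fields that search will use are copied; the rest stays in MySQL.
    """
    return {
        "name": company["name"],
        "description": company["description"],
        # The example document stores category ids as strings.
        "company_category_ids": [str(cid) for cid in category_ids],
        # Tags are joined into one space-separated string, as in the example.
        "tags": " ".join(tags),
        "office_locations": list(office_geohashes),
    }


# Hypothetical rows, as they might come back from the AR models.
company_row = {"id": 1, "name": "BranchOut", "description": "example description", "founded": 2010}
doc = build_company_document(company_row, [12, 13], ["facebook", "apps"], ["9q8yyyk", "9q8yyye"])
```

Columns that search never touches (here the made-up `founded`) simply never reach the document.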

Do we need to validate the data that we write to ES? No. Surprised? Our ES document is a copy of data that we keep in MySQL. We keep what we need and we put related models into one flat document, but it is still a copy. So if we validated the data when writing it to MySQL, why would we need to validate it again?

Since we have a copy of the MySQL data, we need to keep the ES data up to date, which means triggering an ES update whenever we change data in MySQL. We will also need a command that can sync data between MySQL and Elasticsearch.
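One way a sync command could push changes is through the ES bulk API, which takes newline-delimited JSON. The sketch below (Python, for illustration only) just builds the request body; the index name `companies`, the function name `bulk_body` and the "None means deleted" convention are assumptions, and actually sending the body to the `_bulk` endpoint is left out.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

INDEX_NAME = "companies"  # hypothetical index name


def bulk_body(changes: Iterable[Tuple[int, Optional[Dict[str, Any]]]]) -> str:
    """Build an NDJSON body for the ES _bulk endpoint.

    Each change is (company_id, document). A document of None means the
    company was deleted in MySQL and must be removed from the index too.
    """
    lines = []
    for company_id, document in changes:
        meta = {"_index": INDEX_NAME, "_id": str(company_id)}
        if document is None:
            lines.append(json.dumps({"delete": meta}))
        else:
            # An index action is followed by the document source on the next line.
            lines.append(json.dumps({"index": meta}))
            lines.append(json.dumps(document))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = bulk_body([(1, {"name": "BranchOut"}), (2, None)])
```

The same function can serve both cases: a hook on the AR model passes one changed record, while the full sync command passes every record in batches.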

Now let’s talk about search cases. Do we need “accurate search”, i.e. search by id or by full name? Do we need find-by-attribute(s) functionality such as \ES\Company::find(['name' => 'BranchOut', 'company_category_id' => [1,2]])? No, because we can use \AR\Company::find(['name' => 'BranchOut']) for accurate search queries. It’s easier, and we can use all the power of AR classes: $company->category->name and so on. It’s more comfortable to work with AR in such cases.

Do we need \ES\Company::find($id)? Not directly. We need it to update data in the ES document, but not for search. Again, it’s easier to use the AR model.

In real life, a search request that is generated from user input plus some business rules (additional user access rules, some default conditions) will look like this:

\ES\Company::find(['user_id' => $currentUserId, 'q' => 'Branch', 'location' => [$lat, $lng], 'company_category_id' => 12])

What do we need to note here? The search is not accurate. The user can enter whatever he or she wants: Branch Out, facebook BranchOut, BranchOut New York, etc. Next, we cannot just pass this request to ES. We need to build additional functionality that translates these params into an ES query. For example, we have to convert the q param, which is the raw text the user typed into the search field, into a full-text part of the ES query.

For the location filter we will need to convert lat,lng to a geohash first, and so on. So that’s the answer to our question: we need to add code that converts a query from user input into an ES query.
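A rough sketch of that translation step, in Python for illustration only. The geohash encoder follows the standard public algorithm; everything else is an assumption: the function names, the choice of `multi_match` for the q param, the 10km radius, a recent ES version with bool/filter queries, and an office_locations field mapped as geo_point. The user_id rule is left out because the example document has no such field.

```python
from typing import Any, Dict, List

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lng: float, precision: int = 7) -> str:
    """Encode a lat/lng pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    chars: List[str] = []
    bits, bit_count, use_lng = 0, 0, True  # bits alternate, longitude first
    while len(chars) < precision:
        value, rng = (lng, lng_range) if use_lng else (lat, lat_range)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def build_company_query(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate search params from user input into an ES bool query."""
    must: List[Dict[str, Any]] = []
    filters: List[Dict[str, Any]] = []

    # Raw user text becomes a full-text clause over the searchable fields.
    text = str(params.get("q", "")).strip()
    if text:
        must.append({"multi_match": {"query": text, "fields": ["name", "description", "tags"]}})

    # Exact conditions go into filters; ids are stored as strings in the document.
    if "company_category_id" in params:
        filters.append({"term": {"company_category_ids": str(params["company_category_id"])}})

    # The location is converted to a geohash first; the radius is an assumed default.
    if "location" in params:
        lat, lng = params["location"]
        filters.append(
            {"geo_distance": {"distance": "10km", "office_locations": geohash_encode(lat, lng)}}
        )

    if not must:
        must.append({"match_all": {}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


query = build_company_query({"q": " Branch ", "location": (57.64911, 10.40744), "company_category_id": 12})
```

Keeping the translation in one place means the rest of the application only ever deals with plain params and never with ES query syntax.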

Finally, we will need to set the mapping for our indexes. ES sets a mapping automatically, but if you work with nested fields, date fields, geo fields, etc., you will need to set the mapping yourself.
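For the example document, an explicit mapping might look like the sketch below, written as a Python dict for illustration. It assumes ES 7 or later (mappings without a type name, `text` and `keyword` field types); the field type choices are my guesses for this data, not something taken from a real index. The geo_point type accepts geohash strings as values, which is why office_locations can stay as it is in the JSON above.

```python
import json

# Hypothetical body for creating the company index with an explicit mapping.
COMPANY_INDEX_BODY = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "description": {"type": "text"},
            # Ids are matched exactly, never analyzed.
            "company_category_ids": {"type": "keyword"},
            "tags": {"type": "text"},
            # Without this, ES would guess a string type and geo queries would not work.
            "office_locations": {"type": "geo_point"},
        }
    }
}

index_body_json = json.dumps(COMPANY_INDEX_BODY, indent=2)
```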

Let’s summarize.

  • We need a builder that builds an ES document from multiple MySQL models.
  • We need the ability to set the mapping for indexes.
  • We need a query component that converts user input into an ES query.

To be continued…
