Why use elasticsearch

It has been a long time since my last post; I have had a lot of routine work and housework. But one topic I have wanted to share for a long time is elasticsearch.


Today I’ll tell you why elasticsearch is so good in my opinion.

Backstory

A little backstory about my first elasticsearch experience. I first heard about elasticsearch from my client (who was a great dev, BTW) when we were working on wideopenspaces.com. He proposed using elasticsearch in our new child project. When I reviewed it I was confused. So I need to set up an elasticsearch server, then build an index (which seemed to hold the same data as MySQL), and then sync it every time my DB data changes. So we get a headache, but where is the profit if I can do the same with raw SQL (especially if I don't have any big text data)?

The next time I remembered it was when I tried to optimize search queries at openmed.com. We needed to provide a good and quick search to find providers, and we had to consider the following cases: search by name, specialties, practice office location, my insurance, provided insurances, provider's rate, etc.

I was responsible for the backend and used PostgreSQL as our main DB, and we were satisfied with it. But when it came to search, I felt that I needed something more flexible and easier to work with. A lot of the tables in our search cases had many-to-many relations. Different search cases used different tables, and the resulting SQL queries could be very different.

So you're trying to optimize some search query, but the built-in PostgreSQL optimizer is also smart. If you get better performance in one case, you lose it in another. Frankly speaking, all those optimizations didn't provide a great performance improvement. Moreover, adding a new search criterion could introduce new issues in the other cases.

It wasn't the end of the world, but it was really inconvenient whenever we needed to update something in search.

Why elasticsearch is better than a DB for search

Create a data structure prepared for search

I will use a data example to describe the differences between the search approaches. Say we have companies. A company has a name, description, categories, tags and offices.


An elasticsearch index has no SQL structure, and this approach is usually much more effective when it comes to search. You can mix your DB data and put everything related to search in one index. Yes, it's flat and sometimes denormalized, but that's normal. As a result you can get rid of JOINs, which often become a bottleneck in queries. Let's see what we have in the elasticsearch index:

{
    "_source": {
        "name": "BranchOut",
        "description": "Your professional profile on BranchOut is where you can capture and share professional moments, build a living narritive of your work life a",
        "company_category_ids": ["12", "13"],
        "tags": "facebook apps recruiting sales jobs",
        "office_locations": ["9q8yyyk", "9q8yyye"]
    }
}

Looks much more compact, right 🙂? But let's see what we have here:

  • name and description: nothing really special here. We just take them from the company table.
  • company_category_ids: right, these are category IDs. We don't need category names or any other category info for search. Normally our frontend filters will provide us with category IDs, so we just add this info to the elastic document.
  • tags: here we have denormalized data, i.e. we store the tags themselves instead of tag IDs. I'll explain later why we do so.
  • office_locations: here we have an array of geohashes (i.e. encoded lat/lng coordinates). Elasticsearch can work with both the lat/lng representation and geohashes.
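
To make the mapping concrete, here is a minimal Python sketch of how normalized rows could be flattened into a document like the one above. The function names (geohash_encode, build_company_document), the input shapes and the sample coordinates are assumptions made for illustration, not code from the project; the encoder follows the standard public geohash algorithm.

```python
# Sketch only: flatten normalized DB rows into one search document.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=7):
    """Encode a lat/lng pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def build_company_document(company, category_ids, tag_names, offices):
    """Build the flat document; all arguments are hypothetical row data."""
    return {
        "name": company["name"],
        "description": company["description"],
        "company_category_ids": [str(cid) for cid in category_ids],
        "tags": " ".join(tag_names),  # tag text, not tag IDs
        "office_locations": [geohash_encode(o["lat"], o["lng"]) for o in offices],
    }


doc = build_company_document(
    {"name": "BranchOut", "description": "Your professional profile"},
    category_ids=[12, 13],
    tag_names=["facebook", "apps", "recruiting", "sales", "jobs"],
    offices=[{"lat": 57.64911, "lng": 10.40744}],  # made-up office location
)
print(doc)
```

All the JOIN work happens once, at indexing time, so the search query itself never has to touch the category, tag or office tables.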

Great Search API

I think the most common and useful search case is a global company search by text.

In MySQL:

    select *
    from company
    where match (name, description) AGAINST ('find me')

In elasticsearch:

    {
      "query": {
        "multi_match": {
          "query": "find me",
          "fields": ["name", "description"]
        }
      }
    }

At this point both requests look pretty similar. But what if we would also like to include results that match on company tags? In the elasticsearch index we already have the data prepared for search. In MySQL we don't, but we can do the same if needed by adding a denormalized column: where match (name, description, tags_string) AGAINST ('find me'). And again, at this point we don't see a big difference.

But what should we do if we want to add weights, i.e. name is the most valuable match, then description, then tags? Elasticsearch has a great search API (check out the docs: https://www.elastic.co/guide/en/elasticsearch/guide/current/search-in-depth.html ). In the current example we change the query by adding the boost param:

"bool": {
  "should": [
    { "match": { "name":        { "query": "find me", "boost": 3 }}},
    { "match": { "description": { "query": "find me", "boost": 2 }}},
    { "match": { "tags":        { "query": "find me" }}}
  ]
}
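
The weighting described above can also be assembled programmatically. Below is a small Python sketch that builds a bool/should query from a field-to-boost mapping; the helper name boosted_should_query is hypothetical, and it only produces the request body without sending it anywhere. Note that in the query DSL the boost sits inside the per-field object, next to query.

```python
import json


def boosted_should_query(text, field_boosts):
    """Return a bool/should query body; field_boosts maps field -> boost."""
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


# name matters most, then description, then tags (1 is the default boost)
body = boosted_should_query("find me", {"name": 3, "description": 2, "tags": 1})
print(json.dumps(body, indent=2))
```

Changing the priorities is then a one-line change to the mapping, with no schema or index changes on the database side.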

The elasticsearch API is really awesome, and I cannot describe everything in the scope of this article. But believe me, very often it's smarter than you :).

All in one request

Now let’s imagine the common search page.


Besides search results and filters we have pagination and aggregation filters that let you quickly get total counts for similar/sibling searches. Now let's think about how many queries we need in order to build this page:

  • main query for data
  • count query to build pagination
  • group query for left filter

In elasticsearch you can do that with only one query. Elasticsearch automatically returns the total for each query. You can also add aggregation params to the same request to get the aggregation results.

GET myindex/company/_search
{
  "query": {
    "multi_match": {
      "query": "apple",
      "fields": ["name", "short_description"]
    }
  },
  "aggregations": {
    "company_category_ids" : {
      "terms": {"field": "company_category_ids"}
    }
  }
}

  "hits": {
      "total": 4,    // get total
      "max_score": 0, 
      "hits": [.... ]    // get data
   },
   "aggregations": {    // get aggregations
      "company_category_ids": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "122",
               "doc_count": 1
            },
            {
               "key": "172",
               "doc_count": 1
            },
            {
               "key": "4",
               "doc_count": 1
            },
            {
               "key": "94",
               "doc_count": 1
            }
         ]
      }
   }
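
As a rough illustration of the "one request, three answers" idea, the Python sketch below builds a paginated request body and then pulls the results, the page count and the filter counts out of a single response. The helper names and the sample response are assumptions made for this example (the response is hand-written, shaped like the one above); from and size are the standard pagination parameters.

```python
def build_search_body(text, page=1, page_size=10):
    """One request body carrying the query, the pagination and the aggregation."""
    return {
        "from": (page - 1) * page_size,
        "size": page_size,
        "query": {
            "multi_match": {"query": text, "fields": ["name", "short_description"]}
        },
        "aggregations": {
            "company_category_ids": {"terms": {"field": "company_category_ids"}}
        },
    }


def summarize_response(response, page_size=10):
    """Extract everything the search page needs from a single response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # newer versions wrap the count in an object
        total = total["value"]
    buckets = response["aggregations"]["company_category_ids"]["buckets"]
    return {
        "results": [hit["_source"] for hit in hits["hits"]],
        "total": total,
        "pages": -(-total // page_size),  # ceiling division
        "category_counts": {b["key"]: b["doc_count"] for b in buckets},
    }


# Hand-written sample response, not real output.
sample_response = {
    "hits": {
        "total": 4,
        "hits": [{"_source": {"name": "Company %d" % i}} for i in range(1, 4)],
    },
    "aggregations": {
        "company_category_ids": {
            "buckets": [
                {"key": "122", "doc_count": 1},
                {"key": "172", "doc_count": 1},
                {"key": "4", "doc_count": 1},
                {"key": "94", "doc_count": 1},
            ]
        }
    },
}

body = build_search_body("apple", page=2, page_size=3)
summary = summarize_response(sample_response, page_size=3)
print(body["from"], summary["total"], summary["pages"], summary["category_counts"])
```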

Cool geo search

The last thing I'd like to highlight is the awesome geo search API. Remember, I mentioned at the beginning of this article that we used geo search at openmed. For geo search we used the PostGIS extension, and in order to get nearby practices we had queries like the following:

SELECT ...,
    MIN(ST_Distance(
        ST_Transform(ST_GeomFromText(:myLatLng,4326),2163),
        ST_Transform(
           ST_GeomFromText('POINT('  ||  practice.longitude || ' ' || practice.latitude ||  ')', 4326),
           2163
        )
    )) as distance

In elastic you can do the same with the following query:

 "sort": [
    {
      "_geo_distance": {
        "location": { 
          "lat":  40.715,
          "lon": -73.998
        },
        "order":         "asc",
        "unit":          "km", 
        "distance_type": "plane" 
      }
    }
  ]
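
To get a feeling for why a flat "plane" calculation is good enough for nearby results, here is a small Python sketch comparing a great-circle (haversine) distance with a simple flat approximation. This is not elasticsearch's internal implementation, just the general idea; the function names and the three practice coordinates are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def plane_distance_km(lat1, lon1, lat2, lon2):
    """Flat approximation: cheaper, fine for short distances away from the poles."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def sort_by_distance(origin, places, distance=plane_distance_km):
    """Return places ordered from nearest to farthest from origin."""
    return sorted(
        places,
        key=lambda p: distance(origin["lat"], origin["lon"], p["lat"], p["lon"]),
    )


origin = {"lat": 40.715, "lon": -73.998}
practices = [  # hypothetical practice locations
    {"name": "C", "lat": 40.80, "lon": -74.10},
    {"name": "A", "lat": 40.73, "lon": -73.99},
    {"name": "B", "lat": 40.65, "lon": -73.95},
]
for p in sort_by_distance(origin, practices):
    print(p["name"], round(plane_distance_km(origin["lat"], origin["lon"], p["lat"], p["lon"]), 2))
```

At city scale the two formulas give practically the same ordering, which is all a "nearest first" sort needs.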

Looks much more human-friendly. Also, in most cases you don't need exact results; as a rule you just need the nearest locations. In this case you can use the geohashes mentioned earlier (see https://www.elastic.co/guide/en/elasticsearch/reference/1.3/search-aggregations-bucket-geohashgrid-aggregation.html ): find a suitable precision and geo search should work even faster.
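
The precision trade-off is easy to see locally. A geohash prefix identifies a larger cell that contains every longer hash starting with it, so grouping by a shorter prefix gives coarser buckets. The Python sketch below only imitates that grouping idea on a plain list; it is not the geohash grid aggregation itself, the helper name is hypothetical, and the last two hashes are made-up sample values.

```python
from collections import Counter


def geohash_buckets(geohashes, precision):
    """Count locations per geohash cell at the given precision (prefix length)."""
    return Counter(h[:precision] for h in geohashes if len(h) >= precision)


locations = ["9q8yyyk", "9q8yyye", "9q8yvs1", "dr5regw"]

print(geohash_buckets(locations, 7))  # fine cells: every office in its own bucket
print(geohash_buckets(locations, 5))  # coarser cells: nearby offices share a bucket
print(geohash_buckets(locations, 3))  # very coarse cells
```

Lower precision means fewer, bigger cells and less work; higher precision means more accurate but more numerous buckets.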

Conclusion

Elasticsearch was built for searching. It has a very rich search API that in most cases will work better than other search engines. It was also designed for scale (see https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html ). Yes, you still need to take care of synchronization between your main data source and the elasticsearch index, but it's worth it. So if you are still wondering whether it makes sense to try elastic or not: don't think, just use it.

2 thoughts on “Why use elasticsearch”

  1. Thinh

    Hey Sergei, thanks for the explanation.
    I heard about ElasticSearch a long time ago, but i have always been like “Why would I use it ? What do i really GAIN from using it” and couldn’t get a clear answer on the web, until your article.

    Very clear and straightforward examples that make me realize how powerful and useful ElasticSearch really is. Now it just seems the obvious and logical solution for whenever you have to implement a decent search functionality in your website.

    Thank you.
