How can I identify missing documents in Elasticsearch?


Background:

In an Elasticsearch index, I have 2 types of documents that can be identified as 'bvi_ship' and 'bvi_notify'. Each document identified as 'bvi_ship' should have a corresponding document identified as 'bvi_notify'.

Question:

What is the appropriate way of identifying the 'bvi_ship' documents that don't have a 'bvi_notify' document?

Using a facet:

I've been able to identify the necessary documents using the following faceted query:

    {
      "size": 0,
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "default_operator": "OR",
              "default_field": "_all",
              "query": "@fields.action:\"bvi_ship\" OR @fields.action:\"bvi_notify\""
            }
          }
        }
      },
      "facets": {
        "terms": {
          "terms": {
            "field": [
              "@fields.object"
            ],
            "size": 1000
          }
        }
      }
    }

Which returns results like this:

    {
      "took" : 147,
      ...
      },
      "hits" : {
        ...
      },
      "facets" : {
        "terms" : {
          ...
          "terms" : [ {
            "term" : "xml",
            "count" : 1443
          }, {
            "term" : "content_ff47d2d096ea4510ac0895941666e507",
            "count" : 2
          }, {
            "term" : "content_fa525becb2724b7682df278c02fed308",
            "count" : 2
          },
          ... thousands of records with a count of 2
          }, {
            "term" : "content_f1ff2f7440534a08bad4c62b92165949",
            "count" : 1
          } ]
        }
      }
    }

This could work well, but I don't want to return thousands of records that have a count of 2 when I'm only interested in the records that have a count of 1.

Is there a way to limit the faceted search so that it only returns records with a count of 1?

Using a filter:

I'm guessing I should be able to be more specific in my query and select the appropriate records using a combination of queries and filters, though my Elasticsearch kung-fu is being handicapped by my relational database karate.

I think the best way is to index the records with 'bvi_notify' objects as children of the records with 'bvi_ship' objects. Then you will be able to use the has_child filter in the must_not clause of a bool filter to find the 'bvi_ship' documents that don't have corresponding 'bvi_notify' objects.
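As a rough illustration of the has_child / must_not combination described above, the request body could be assembled along these lines. This is only a sketch under assumptions: the child type name `notify` and the function name are made up for the example, the `@fields.action` field is taken from the query in the question, and the child type's mapping is assumed to already declare a `_parent` pointing at the type that holds the 'bvi_ship' documents.

```python
import json


def build_missing_notify_query(child_type="notify"):
    """Build a search body for 'bvi_ship' documents that have no child.

    child_type is a hypothetical type name; it has to match the child type
    whose mapping declares a _parent.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        # Keep only the parent documents. Assumes the indexed
                        # token for this field is exactly "bvi_ship".
                        "must": [
                            {"term": {"@fields.action": "bvi_ship"}}
                        ],
                        # Drop every parent that has at least one child
                        # document of the given type.
                        "must_not": [
                            {
                                "has_child": {
                                    "type": child_type,
                                    "query": {"match_all": {}},
                                }
                            }
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the index's _search endpoint.
    print(json.dumps(build_missing_notify_query(), indent=2))
```

The hits returned for such a body would be the 'bvi_ship' documents themselves, so there is no need to page through facet counts.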

To answer your original question, there is no way to limit the terms facet to terms with a count of 1, but you can sort the facets using the reverse_count order, which will bring the terms with a count of 1 to the top of the list. However, I should mention that if you have more than 1 shard, the counts in the facets might be incorrect. That is the reason why I would recommend going with the parent/child solution instead of facets.
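For completeness, the reverse_count ordering mentioned above might look like the sketch below. The helper names are invented for the example, the field and size come from the query in the question, and the sample response is hand-written in the shape of the output shown there. Because the facet is sorted with the least frequent terms first, the terms with a count of 1 form a prefix of the list and can be cut off client-side.

```python
from itertools import takewhile


def build_reverse_count_facet(field="@fields.object", size=1000):
    """Return a 'facets' section whose terms are sorted least frequent first."""
    return {
        "terms": {
            "terms": {
                "field": field,
                "size": size,
                "order": "reverse_count",
            }
        }
    }


def unpaired_terms(response):
    """Return the leading facet terms whose count is exactly 1.

    Assumes the facet was sorted with reverse_count, so the entries with a
    count of 1 come before all others.
    """
    entries = response["facets"]["terms"]["terms"]
    return [e["term"] for e in takewhile(lambda e: e["count"] == 1, entries)]


if __name__ == "__main__":
    # Hand-written sample, not real output from a cluster.
    sample = {"facets": {"terms": {"terms": [
        {"term": "content_f1ff2f7440534a08bad4c62b92165949", "count": 1},
        {"term": "content_fa525becb2724b7682df278c02fed308", "count": 2},
    ]}}}
    print(unpaired_terms(sample))
```

The dictionary returned by `build_reverse_count_facet` would take the place of the "facets" section in the original request. The shard caveat still applies: with more than 1 shard, a term reported with a count of 1 may not really be unpaired.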

