Elasticsearch PHP bulk index performance vs index


I ran a benchmark on Elasticsearch using elasticsearch-php. I compared the time taken to index 10,000 documents one by one vs 10,000 documents in bulk requests of 1,000 documents.

On my VPN server (3 cores, 2 GB of memory), the performance is about the same with or without bulk indexing.

My PHP code (inspired by a post):

<?php
set_time_limit(0);  // no timeout
require 'vendor/autoload.php';

$es = new Elasticsearch\Client([
    'hosts' => ['127.0.0.1:9200']
]);
$max = 10000;

// Elasticsearch bulk index
$temps_debut = microtime(true);
for ($i = 0; $i <= $max; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_index' => 'articles',
            '_type' => 'article',
            '_id' => 'cle' . $i
        )
    );
    $params['body'][] = array(
        'my_field' => 'my_value' . $i
    );
    if ($i % 1000) {   // every 1000 documents, stop and send the bulk request
        $responses = $es->bulk($params);
        $params = array();  // erase the old bulk request
        unset($responses);  // unset to save memory
    }
}
$temps_fin = microtime(true);
echo 'elasticsearch bulk: ' . round($i / round($temps_fin - $temps_debut, 4)) . ' per sec <br>';

// Elasticsearch without bulk index
$temps_debut = microtime(true);
for ($i = 1; $i <= $max; $i++) {
    $params = array();
    $params['index'] = 'my_index';
    $params['type']  = 'my_type';
    $params['id']    = "key".$i;
    $params['body']  = array('testfield' => 'valeur'.$i);
    $ret = $es->index($params);
}
$temps_fin = microtime(true);
echo 'elasticsearch 1 one : ' . round($i / round($temps_fin - $temps_debut, 4)) . 'per sec <br>';
?>

elasticsearch bulk: 1209 per sec
elasticsearch 1 one : 1197per sec

Is there something wrong with my bulk indexing that prevents me from getting better performance?

Thanks

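Replace:

if ($i % 1000) {   // every 1000 documents, stop and send the bulk request

with:

if (($i + 1) % 1000 === 0) {   // every 1000 documents, stop and send the bulk request

Otherwise you send a request for every non-zero remainder (that is, 999 out of every 1000 iterations), so almost every document goes out in its own bulk request. Obviously, the corrected condition only sends everything if $max is a multiple of 1000.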
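To make the difference between the two conditions concrete, here is a small standalone sketch that only counts how many bulk requests each condition would trigger. It does not talk to Elasticsearch; the counter names $buggy and $fixed are made up for the illustration.

```php
<?php
// Count how often each condition fires for $i = 0 .. $max - 1.
$max = 10000;
$buggy = 0;  // hypothetical counter for the original condition
$fixed = 0;  // hypothetical counter for the corrected condition

for ($i = 0; $i < $max; $i++) {
    if ($i % 1000) {              // truthy whenever the remainder is non-zero
        $buggy++;
    }
    if (($i + 1) % 1000 === 0) {  // true only on every 1000th document
        $fixed++;
    }
}

echo "original condition:  $buggy requests\n";
echo "corrected condition: $fixed requests\n";
```

With the original condition nearly every iteration sends a request, which would explain why the bulk timing is so close to the one-by-one timing.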

Also, correct this bug:

for ($i = 0; $i <= $max; $i++) {

which will iterate over $max + 1 items. Replace it with:

for ($i = 0; $i < $max; $i++) {

There might also be a problem with how you initialize $params. Shouldn't you set it up outside of the loop and only clean up $params['body'] after each ->bulk()? When you reset it with $params = array(); you lose all of it.

Also, remember that ES may be distributed on a cluster. Bulk operations can then be distributed to share the workload, so some of the performance scaling is not visible on a single physical node.
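Putting these suggestions together, the batching loop could look roughly like the sketch below. It is only an illustration under a few assumptions: bulkIndexInBatches is a hypothetical helper, and the $send callable stands in for a call such as $es->bulk($params), so the example runs without a server. The final flush for a leftover partial batch is an addition for the case where $max is not a multiple of the batch size.

```php
<?php
/**
 * Hypothetical helper: builds bulk bodies in batches and hands each one
 * to $send, which stands in for something like $es->bulk($params).
 * Returns the number of requests sent.
 */
function bulkIndexInBatches(int $max, int $batchSize, callable $send): int
{
    $params = ['body' => []];  // initialized once, outside the loop
    $requests = 0;

    for ($i = 0; $i < $max; $i++) {
        // Action line, followed by the document itself.
        $params['body'][] = [
            'index' => [
                '_index' => 'articles',
                '_type'  => 'article',
                '_id'    => 'cle' . $i,
            ],
        ];
        $params['body'][] = ['my_field' => 'my_value' . $i];

        if (($i + 1) % $batchSize === 0) {
            $send($params);
            $params['body'] = [];  // clear only the body, keep the array
            $requests++;
        }
    }

    // Send whatever is left when $max is not a multiple of $batchSize.
    if (!empty($params['body'])) {
        $send($params);
        $requests++;
    }

    return $requests;
}

// Stand-in sender that only records how many documents each batch holds.
$sizes = [];
$requests = bulkIndexInBatches(2500, 1000, function (array $params) use (&$sizes) {
    $sizes[] = intdiv(count($params['body']), 2);  // two body entries per document
});

echo $requests . " requests, batch sizes: " . implode(', ', $sizes) . "\n";
```

In real code the closure would be replaced by something like function ($params) use ($es) { $es->bulk($params); }, keeping the same index and type names as in the question.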

