Elasticsearch PHP bulk index performance vs index
I ran a benchmark on Elasticsearch using elasticsearch-php. I compare the time taken to index 10,000 documents one by one versus the same 10,000 documents in bulk requests of 1,000 documents.
On my VPN server (3 cores, 2 GB of memory), performance is about the same with or without bulk indexing.
My PHP code (inspired by a post):
<?php
set_time_limit(0); // no timeout
require 'vendor/autoload.php';

$es = new Elasticsearch\Client([
    'hosts' => ['127.0.0.1:9200']
]);
$max = 10000;

// elasticsearch bulk index
$temps_debut = microtime(true);
for ($i = 0; $i <= $max; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_index' => 'articles',
            '_type' => 'article',
            '_id' => 'cle' . $i
        )
    );
    $params['body'][] = array(
        'my_field' => 'my_value' . $i
    );
    if ($i % 1000) { // every 1000 documents, stop and send the bulk request
        $responses = $es->bulk($params);
        $params = array(); // erase the old bulk request
        unset($responses); // unset to save memory
    }
}
$temps_fin = microtime(true);
echo 'elasticsearch bulk: ' . round($i / round($temps_fin - $temps_debut, 4)) . ' per sec <br>';

// elasticsearch without bulk index
$temps_debut = microtime(true);
for ($i = 1; $i <= $max; $i++) {
    $params = array();
    $params['index'] = 'my_index';
    $params['type'] = 'my_type';
    $params['id'] = "key".$i;
    $params['body'] = array('testfield' => 'valeur'.$i);
    $ret = $es->index($params);
}
$temps_fin = microtime(true);
echo 'elasticsearch 1 one : ' . round($i / round($temps_fin - $temps_debut, 4)) . 'per sec <br>';
?>

Output:

elasticsearch bulk: 1209 per sec
elasticsearch 1 one : 1197per sec
Is there something wrong with my bulk indexing code? How can I obtain better performance?
Thanks.
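Replace:

    if ($i % 1000) { // every 1000 documents, stop and send the bulk request

with:

    if (($i + 1) % 1000 === 0) { // every 1000 documents, stop and send the bulk request

Otherwise you send a bulk request for every value of $i that is not a multiple of 1000 (that is, 999 iterations out of 1000), so each "bulk" request carries almost no documents. Obviously, the corrected condition only sends every document if $max is a multiple of 1000.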
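To make the effect of the two conditions concrete, here is a minimal sketch that only counts how often each one would trigger a bulk request. It does not talk to Elasticsearch. The helper name countFlushes is hypothetical and exists only for this illustration.

```php
<?php
// Count how many iterations out of $max satisfy a given flush condition.
// countFlushes is a hypothetical helper, not part of elasticsearch-php.
function countFlushes(int $max, callable $shouldFlush): int
{
    $flushes = 0;
    for ($i = 0; $i < $max; $i++) {
        if ($shouldFlush($i)) {
            $flushes++;
        }
    }
    return $flushes;
}

$max = 10000;

// Original condition: truthy for every $i that is NOT a multiple of 1000.
$buggy = countFlushes($max, function (int $i): bool {
    return (bool) ($i % 1000);
});

// Corrected condition: true only after the 1000th, 2000th, ... document.
$fixed = countFlushes($max, function (int $i): bool {
    return ($i + 1) % 1000 === 0;
});

echo "original condition:  $buggy bulk requests\n"; // 9990
echo "corrected condition: $fixed bulk requests\n"; // 10
```

With the original condition the benchmark sends nearly as many requests as the one-by-one loop, which would explain why the two timings are so close.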
Also, correct this bug:

    for ($i = 0; $i <= $max; $i++) {

This iterates over $max + 1 items. Replace it with:

    for ($i = 0; $i < $max; $i++) {

There might also be a problem with how you initialize $params. Shouldn't you set it up outside of the loop and only clean up $params['body'] after each ->bulk() call? When you reset it with $params = array(); you lose all of it.
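Putting these points together, a batching loop could look like the sketch below. It is an illustration under assumptions, not a tested replacement for the benchmark: the function indexInBatches is hypothetical, and the $send callable stands in for a real call such as $es->bulk($params) so that the example runs without a cluster. It also flushes any leftover documents, so $max no longer has to be a multiple of the batch size.

```php
<?php
/**
 * Index $total documents in batches of $batchSize.
 * $send is a hypothetical stand-in for a call such as $es->bulk($params).
 * Returns the number of bulk requests that were sent.
 */
function indexInBatches(int $total, int $batchSize, callable $send): int
{
    $params = ['body' => []]; // initialized once, outside the loop
    $requests = 0;

    for ($i = 0; $i < $total; $i++) {
        // Action line followed by the document source, as in the question.
        $params['body'][] = ['index' => [
            '_index' => 'articles',
            '_type'  => 'article',
            '_id'    => 'cle' . $i,
        ]];
        $params['body'][] = ['my_field' => 'my_value' . $i];

        if (($i + 1) % $batchSize === 0) {
            $send($params);
            $params['body'] = []; // clear only the body, keep the rest
            $requests++;
        }
    }

    // Send whatever is left when $total is not a multiple of $batchSize.
    if (!empty($params['body'])) {
        $send($params);
        $requests++;
    }

    return $requests;
}

// Usage with a fake sender that records how many documents each request holds.
$sizes = [];
$requests = indexInBatches(2500, 1000, function (array $params) use (&$sizes): void {
    $sizes[] = intdiv(count($params['body']), 2); // two body entries per document
});

echo $requests . " requests, documents per request: " . implode(', ', $sizes) . "\n";
// prints "3 requests, documents per request: 1000, 1000, 500"
```

In the real benchmark, the closure would be replaced by something like function (array $params) use ($es) { $es->bulk($params); }.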
Also, remember that ES may be distributed over a cluster. Bulk operations can then be distributed across nodes to share the workload, so some of the performance scaling is not visible on a single physical node.