Elasticsearch PHP: bulk index performance vs. index
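I ran a benchmark on Elasticsearch using elasticsearch-php. It compares the time taken to index 10,000 documents one by one against the time taken to index 10,000 documents in bulk requests of 1,000 documents each.
On my VPN server (3 cores, 2 GB of memory), the performance is about the same with or without bulk indexing.
My PHP code (inspired by a post):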
<?php
set_time_limit(0); // no timeout
require 'vendor/autoload.php';

$es = new Elasticsearch\Client([
    'hosts' => ['127.0.0.1:9200']
]);
$max = 10000;

// Elasticsearch bulk index
$temps_debut = microtime(true);
for ($i = 0; $i <= $max; $i++) {
    $params['body'][] = array(
        'index' => array(
            '_index' => 'articles',
            '_type' => 'article',
            '_id' => 'cle' . $i
        )
    );
    $params['body'][] = array(
        'my_field' => 'my_value' . $i
    );
    if ($i % 1000) { // every 1000 documents, stop and send the bulk request
        $responses = $es->bulk($params);
        $params = array(); // erase the old bulk request
        unset($responses); // unset to save memory
    }
}
$temps_fin = microtime(true);
echo 'elasticsearch bulk: ' . round($i / round($temps_fin - $temps_debut, 4)) . ' per sec <br>';

// Elasticsearch without bulk index
$temps_debut = microtime(true);
for ($i = 1; $i <= $max; $i++) {
    $params = array();
    $params['index'] = 'my_index';
    $params['type'] = 'my_type';
    $params['id'] = "key".$i;
    $params['body'] = array('testfield' => 'valeur'.$i);
    $ret = $es->index($params);
}
$temps_fin = microtime(true);
echo 'elasticsearch 1 one : ' . round($i / round($temps_fin - $temps_debut, 4)) . 'per sec <br>';
?>
elasticsearch bulk: 1209 per sec
elasticsearch 1 one : 1197per sec
Is there something wrong with my bulk indexing that keeps it from performing better?
Thanks.
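Replace:
if ($i % 1000) { // every 1000 documents, stop and send the bulk request
with:
if (($i + 1) % 1000 === 0) { // every 1000 documents, stop and send the bulk request
Otherwise the bulk request is sent for every non-zero value of $i % 1000 (that is, 999 iterations out of 1000), so almost every document goes out in its own request. Obviously, the corrected condition only works if $max is a multiple of 1000.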
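To see the size of the difference between the two conditions above, here is a minimal sketch that only counts how many bulk requests each one would trigger. It does not talk to Elasticsearch; the helper name countFlushes is hypothetical and exists only for this illustration, and the arrow functions assume PHP 7.4 or later.

```php
<?php
// Hypothetical helper: counts how many iterations of a loop over $max
// documents would trigger a bulk request for a given flush condition.
function countFlushes(int $max, callable $shouldFlush): int
{
    $flushes = 0;
    for ($i = 0; $i < $max; $i++) {
        if ($shouldFlush($i)) {
            $flushes++;
        }
    }
    return $flushes;
}

$max = 10000;

// Condition from the question: true whenever $i is NOT a multiple of 1000.
$original = countFlushes($max, fn(int $i): bool => (bool) ($i % 1000));

// Corrected condition: true only once per 1000 documents.
$fixed = countFlushes($max, fn(int $i): bool => ($i + 1) % 1000 === 0);

echo "original condition: $original requests\n";
echo "fixed condition: $fixed requests\n";
```

With the original condition nearly every iteration sends a request, which would explain why the "bulk" timing is so close to the one-by-one timing.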
Also, fix this bug:
for ($i = 0; $i <= $max; $i++) {
will iterate over $max + 1 items. Replace it with:
for ($i = 0; $i < $max; $i++) {
There might also be a problem with how you initialize $params. Shouldn't you set it up outside of the loop and only clean up $params['body'] after each ->bulk() call? When you reset it with $params = array(); you lose everything it contained.
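Putting the suggestions above together, the batching loop could look roughly like the sketch below. This is an illustration under stated assumptions, not a drop-in replacement: the function name bulkIndex and the $send callable are hypothetical, with $send standing in for the $es->bulk($params) call so the loop can be run without a cluster. The final flush for leftover documents is an addition that covers the case where $max is not a multiple of the batch size.

```php
<?php
// Hypothetical helper: indexes $max documents in batches of $batchSize.
// $send stands in for a call such as $es->bulk($params).
// Returns the number of bulk requests that were sent.
function bulkIndex(int $max, int $batchSize, callable $send): int
{
    $requests = 0;
    $params = ['body' => []]; // initialized once, outside the loop

    for ($i = 0; $i < $max; $i++) {
        // Action line followed by the document source, as in the question.
        $params['body'][] = [
            'index' => [
                '_index' => 'articles',
                '_type'  => 'article',
                '_id'    => 'cle' . $i,
            ],
        ];
        $params['body'][] = ['my_field' => 'my_value' . $i];

        // Send once per full batch, then clear only the body.
        if (($i + 1) % $batchSize === 0) {
            $send($params);
            $requests++;
            $params['body'] = [];
        }
    }

    // Send whatever is left when $max is not a multiple of $batchSize.
    if (!empty($params['body'])) {
        $send($params);
        $requests++;
    }

    return $requests;
}

// Stand-in sender that only records how many documents each batch held.
// With a real client this would be: fn(array $p) => $es->bulk($p)
$sizes = [];
$requests = bulkIndex(2500, 1000, function (array $params) use (&$sizes): void {
    $sizes[] = intdiv(count($params['body']), 2); // two body entries per document
});

echo $requests . " requests, batch sizes: " . implode(', ', $sizes) . "\n";
```

Passing the sender as a callable keeps the batching logic testable on its own; swapping in the real client only changes the last argument.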
Also, remember that ES may be distributed across a cluster, in which case the workload of a bulk operation can be spread over several nodes. That part of the performance scaling is not visible on a single physical node.