javascript - JSON.parse() on a large array of objects is using way more memory than it should
I generate a ~200'000-element array of objects (using object literal notation inside map rather than new Constructor()), and I'm saving a JSON.stringify'd version of it to disk. It takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).
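For concreteness, the generation step looks roughly like this (the id and text fields below are placeholders, not my real data):

var fs = require('fs');

// Object literals inside map, rather than new Constructor().
var arr = Array.apply(null, Array(200000)).map(function(_, i) {
  return {id: i, text: 'entry ' + i};
});

// One space per indentation level: each element then ends in '\n }'.
fs.writeFileSync('jmdict-all.json', JSON.stringify(arr, null, 1));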
Then, in a new Node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:

var fs = require('fs');
var arr1 = JSON.parse(fs.readFileSync('jmdict-all.json', {encoding: 'utf8'}));
Node's memory usage is 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.
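A minimal sketch for measuring this from inside Node, using the built-in process.memoryUsage():

var fs = require('fs');
var arr1 = JSON.parse(fs.readFileSync('jmdict-all.json', {encoding: 'utf8'}));

// rss is the whole process's resident memory (roughly what Activity
// Monitor reports); heapUsed/heapTotal are V8's own heap. All in bytes.
console.log(process.memoryUsage());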
But if, in a new Node process, I load the file's contents into a string, chop it at element boundaries, and JSON.parse each element individually, ostensibly getting the same array of objects:

var fs = require('fs');
var arr2 = fs.readFileSync('jmdict-all.json', {encoding: 'utf8'})
             .trim()
             .slice(1, -3)
             .split('\n },')
             .map(function(s) { return JSON.parse(s + '}'); });
then Node uses only ~200 MB of memory, with no noticeable system lag. This pattern persists across many restarts of Node: JSON.parse-ing the whole array takes a gig of memory, while parsing it element-wise is far more memory-efficient.
Why is there such a huge disparity in memory usage? Is the problem that JSON.parse prevents efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parser?
For ease of experimentation, I've put the JSON file in question in a gist; please feel free to clone it.
A few points to note:
- You've found that, for whatever reason, it's much more efficient to make individual JSON.parse() calls on each element of the array instead of one big JSON.parse().
- The data format you're generating is under your control. Unless I misunderstood, the data file as a whole does not have to be valid JSON, as long as you can parse it.
- It sounds like the only issue with your second, more efficient method is the fragility of splitting the originally generated JSON.
This suggests a simple solution: instead of generating one giant JSON array, generate an individual JSON string for each element of the array, with no newlines in the JSON string (i.e. use JSON.stringify(item) with no space argument). Then join those JSON strings with a newline character (or any character you know will never appear in your data) and write that data file.
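The write side might look something like this (a sketch, assuming arr is your original array of objects):

var fs = require('fs');

// One compact JSON string per line. JSON.stringify with no space
// argument never emits newlines, so '\n' is a safe separator.
fs.writeFileSync(
  'jmdict-all.json',
  arr.map(function(item) { return JSON.stringify(item); }).join('\n')
);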
When you read the data back in, split the incoming data on the newline character, then run JSON.parse() on each of those lines individually. In other words, this step is just like your second solution, but with a straightforward string split instead of having to fiddle with character counts and curly braces.
Your code might look like this (really just a simplified version of what you posted):

var fs = require('fs');
var arr2 = fs.readFileSync('jmdict-all.json', {encoding: 'utf8'})
             .trim()
             .split('\n')
             .map(function(line) { return JSON.parse(line); });
As noted in an edit, you could simplify this code to:

var fs = require('fs');
var arr2 = fs.readFileSync('jmdict-all.json', {encoding: 'utf8'})
             .trim()
             .split('\n')
             .map(JSON.parse);
But be careful with this. It will work in this particular case, but there is a potential danger in the more general case.
The JSON.parse function takes two arguments: the JSON text and an optional "reviver" function.
The [].map() function passes three arguments to the function it calls: the item value, the array index, and the entire array.
So if you pass JSON.parse in directly, it is being called with the JSON text as its first argument (as expected), but it is also being passed a number as the "reviver" function. JSON.parse() ignores that second argument because it is not a function reference, so you're OK here. But you can imagine other cases where you could get into trouble, so it's a good idea to triple-check this whenever you pass an arbitrary function you didn't write into [].map().
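The classic demonstration of this pitfall is passing parseInt, whose second parameter is a radix, straight into [].map():

// map supplies (value, index, array) to its callback; parseInt takes
// (string, radix), so the array index silently becomes the radix:
['10', '10', '10'].map(parseInt);  // => [10, NaN, 2]

// index 0: radix 0 is treated as base 10; index 1: radix 1 is invalid
// (NaN); index 2: '10' parsed in base 2 is 2. JSON.parse only dodges
// this because it ignores a reviver that isn't a function.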