javascript - JSON.parse() on a large array of objects is using way more memory than it should


I generate a ~200,000-element array of objects (using object literal notation inside map rather than new Constructor()), and I'm saving a JSON.stringify'd version of it to disk, where it takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).

Then, in a new node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:

var fs = require('fs');
var arr1 = JSON.parse(fs.readFileSync('jmdict-all.json', {encoding : 'utf8'}));

Node's memory usage is about 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.

But if, in a new node process, I load the file's contents into a string, chop it up at element boundaries, and JSON.parse each element individually, ostensibly getting the same object array:

var fs = require('fs');
var arr2 = fs.readFileSync('jmdict-all.json', {encoding : 'utf8'}).trim().slice(1,-3).split('\n },').map(function(s) {return JSON.parse(s+'}');});

Node is using ~200 MB of memory, and there is no noticeable system lag. This pattern persists across many restarts of node: JSON.parse-ing the whole array takes a gig of memory, while parsing it element-wise is much more memory-efficient.

Why is there such a huge disparity in memory usage? Is this a problem with JSON.parse preventing efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parse 😭?

For ease of experimentation, I've put the JSON file in question in a Gist, please feel free to clone it.

A few points to note:

  1. You've found that, for whatever reason, it's much more efficient to do individual JSON.parse() calls on each element of your array instead of one big JSON.parse().
  2. The data format you're generating is under your control. Unless I misunderstood, the data file as a whole does not have to be valid JSON, as long as you can parse it.
  3. It sounds like the only issue with your second, more efficient method is the fragility of splitting the original generated JSON.

This suggests a simple solution: instead of generating one giant JSON array, generate an individual JSON string for each element of your array, with no newlines in the JSON string, i.e. just use JSON.stringify(item) with no space argument. Then join those JSON strings with a newline (or any character that you know will never appear in your data) and write that to the data file.
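The writing side described above could be sketched roughly as follows. The sample records and the temporary output path are placeholders made up for illustration; only JSON.stringify without a space argument and the newline join come from the answer itself.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical stand-in for the real ~200,000-element array.
var arr = [
    { id: 1, word: '日本', gloss: ['Japan'] },
    { id: 2, word: '猫', gloss: ['cat'], note: 'line one\nline two' }
];

// One compact JSON document per element. Without a space argument,
// JSON.stringify emits no raw newlines: a newline inside a string
// value is escaped as the two characters \ and n, so '\n' is safe
// to use as the record separator.
var text = arr.map(function (item) {
    return JSON.stringify(item);
}).join('\n') + '\n';

// Illustrative path only (an assumption, not the file from the question).
var file = path.join(os.tmpdir(), 'jmdict-lines-example.json');
fs.writeFileSync(file, text, { encoding: 'utf8' });
```

The trailing newline is optional; it is harmless here because the reading code trims the string before splitting.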

When you read this data, split the incoming data on the newline, then do the JSON.parse() on each of those lines individually. In other words, this step is just like your second solution, but with a straightforward string split instead of having to fiddle with character counts and curly braces.

Your code might look something like this (really just a simplified version of what you posted):

var fs = require('fs');
var arr2 = fs.readFileSync(
    'jmdict-all.json',
    { encoding: 'utf8' }
).trim().split('\n').map( function( line ) {
    return JSON.parse( line );
});

As you noted in an edit, you could simplify this code to:

var fs = require('fs');
var arr2 = fs.readFileSync(
    'jmdict-all.json',
    { encoding: 'utf8' }
).trim().split('\n').map( JSON.parse );

But I would be careful about this. It does work in this particular case, but there is a potential danger in the more general case.

The JSON.parse function takes two arguments: the JSON text and an optional "reviver" function.

The [].map() function passes three arguments to the function it calls: the item value, the array index, and the entire array.

So if you pass JSON.parse directly, it is being called with the JSON text as the first argument (as expected), but it is also being passed a number for the "reviver" function. JSON.parse() ignores that second argument because it is not a function reference, so you're OK here. But you can probably imagine other cases where you could get into trouble, so it's always a good idea to triple-check this when you pass an arbitrary function that you didn't write into [].map().
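A well-known case of this kind of trouble is parseInt, whose second parameter is a radix. The sketch below is only an illustration (the sample strings are made up): it shows the surprise with parseInt and a wrapper that forwards only the argument you intend.

```javascript
// parseInt(string, radix) receives the array index as its radix:
// parseInt('10', 0) -> 10, parseInt('10', 1) -> NaN, parseInt('10', 2) -> 2
var surprising = ['10', '10', '10'].map(parseInt);

// JSON.parse ignores a non-function second argument, so this works...
var lines = ['{"a":1}', '{"a":2}'];
var direct = lines.map(JSON.parse);

// ...but an explicit wrapper passes only the item value, which stays
// correct no matter what extra parameters the callee accepts.
var wrapped = lines.map(function (line) {
    return JSON.parse(line);
});
```

Here direct and wrapped hold the same parsed objects; the wrapper simply removes the dependence on how the callee treats the extra arguments.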

