python - Theano function: Unused input
I am trying to implement mini-batch k-means, and the part that seems to be giving me a hard time is specifying the minibatches as inputs to Theano. I have a kmeansminibatch class with an __init__(self, batch_size, data=None) method, where data is in this case the minibatch and batch_size is the size of the batch. I also have a fit_once method that takes no arguments and instead uses the data passed to __init__.
My main script consists of the following:
x = T.matrix('x', dtype='float64')
mini_batch = T.matrix('x', dtype='float64')
kmeans = kmeansminibatch(
    batch_size=10000,
    data=x
)
func = theano.function(
    inputs=[mini_batch],
    outputs=kmeans.fit_once(),
    givens={
        kmeans.x: mini_batch,
    }
)
data = load_data()
for i in xrange(30):
    func(get_batch(data))
image = Image.fromarray(
    tile_raster_images(X=np.transpose(kmeans.D.eval()),
                       img_shape=(12, 12), tile_shape=(10, 30),
                       tile_spacing=(1, 1)))
What I intended was to initialize the kmeansminibatch object with the symbolic variable x, which then gets replaced by mini_batch at each iteration. Each of the minibatches is generated by the function get_batch, which takes as input the entire dataset and, using numpy.random.choice, returns a subset of the dataset as a numpy array. Unfortunately, I cannot seem to accomplish what I set out to achieve; the above code results in the following error message:
theano.compile.function_module.UnusedInputError: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: x. To make this error into a warning, you can pass the parameter on_unused_input='warn' to theano.function. To disable it completely, use on_unused_input='ignore'.
I am not sure why I get this error, since I replace the symbolic variable x with the function input mini_batch. Furthermore, if I set on_unused_input='ignore', I end up with the following error message when evaluating kmeans.D.eval():
theano.gof.fg.MissingInputError: ("An input of the graph, used to compute Shape(x), was not provided and not given a value. Use the Theano flag exception_verbosity='high' for more information on this error.", x)
Any help is appreciated!
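For reference, a minimal numpy sketch of the get_batch described above, sampling with numpy.random.choice. The exact signature and the orientation of the dataset are not shown in the question, so this assumes samples are stored as rows; adapt the axis if samples are columns:

```python
import numpy as np

# Hypothetical sketch of get_batch: pick a random subset of the
# dataset's rows using numpy.random.choice, as described above.
def get_batch(data, batch_size=10000):
    indices = np.random.choice(data.shape[0], size=batch_size, replace=False)
    return data[indices]
```

Using replace=False ensures a batch never contains the same sample twice; it requires batch_size <= data.shape[0].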
Edit:
So I got it working! The fit_once function used to update the matrix D, an attribute of the class kmeansminibatch, but it didn't return it, which apparently caused Theano to complain, since the input was indeed not used in the output. What I did was modify fit_once to return D, and that solved the issue. Here is the modified main:
x = T.matrix('x', dtype='float64')
mini_batch = T.matrix('mini_batch', dtype='float64')
kmeans = kmeansminibatch(
    batch_size=1000,
    data=x
)
func = theano.function(
    inputs=[mini_batch],
    outputs=kmeans.fit_once(),
    givens={
        x: mini_batch
    },
)
data = load_data()
D = None
for i in xrange(30):
    D = func(get_batch(data))
image = Image.fromarray(
    tile_raster_images(X=np.transpose(D),
                       img_shape=(12, 12), tile_shape=(10, 30),
                       tile_spacing=(1, 1)))
image.save('repflds7.png')
Apparently, Theano functions do not play well with void Python functions.
Edit 2:
Just to shed more light on the problem I am trying to solve: the version of k-means I am implementing is known as vector quantization, whereby a dictionary D provides a compressed representation of the dataset x via a code matrix s. Initially, the part of fit_once concerning D was as follows:
self.D = T.dot(self.x, T.transpose(s))
self.D = self.D / T.sqrt(T.sum(T.sqr(self.D), axis=0))
So basically, at every iteration the dictionary D is updated, and therefore it makes no sense to return D; but I had to, in order to stop Theano from complaining:
self.D = T.dot(self.x, T.transpose(s))
self.D = self.D / T.sqrt(T.sum(T.sqr(self.D), axis=0))
return self.D
D is initialized as follows in __init__:
self.D = self.srng.normal(size=(self.dimensions, self.k))
self.D = self.D / T.sqrt(T.sum(T.sqr(self.D), axis=0))
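Numerically, that initialization just draws a random matrix and rescales each column to unit L2 norm. A plain-numpy sketch of the same step, with dimensions and k standing in for the class attributes:

```python
import numpy as np

def init_dictionary(dimensions, k, rng=np.random):
    # Draw a random (dimensions x k) dictionary and normalize each
    # column to unit L2 norm, mirroring the symbolic init above.
    D = rng.normal(size=(dimensions, k))
    return D / np.sqrt(np.sum(np.square(D), axis=0))
```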
What I would like to achieve is:
1. Not have to return D; instead, update and evaluate D in place, so that I can retrieve it through kmeans.D.
2. Is D a bad choice for a symbolic variable? Perhaps a shared variable would be a better choice?
3. Most importantly, at each of the 30 iterations, I substitute the data x of the kmeansminibatch model with a minibatch, hence the use of the givens parameter. Is there a better way to achieve that?
As far as a user of Theano is concerned, symbolic variables do not support the concept of a "current value" or of "updating". For that you need a shared variable.
You need to be clearer on how you use the kmeansminibatch class. At the moment it does not encapsulate the D-updating behaviour, since the Theano function is compiled and executed outside of kmeansminibatch. You might prefer a usage like this:
kmeans = kmeansminibatch()
data = load_data()
for i in xrange(30):
    kmeans.update(get_batch(data, batch_size=10000))
image = Image.fromarray(
    tile_raster_images(X=np.transpose(kmeans.get_d()),
                       img_shape=(12, 12), tile_shape=(10, 30),
                       tile_spacing=(1, 1)))
Note that there is no Theano functionality visible here; that's all encapsulated by the kmeansminibatch class. You don't need to tell kmeansminibatch the batch size, because it doesn't change the symbolic expression; instead, you tell get_batch how large a batch to get.
Inside kmeansminibatch you have two possible approaches.
1. Make D a shared variable and use updates=... in the Theano function to change its contents on each update():

class kmeansminibatch:
    def __init__(self, dimensions, k):
        # ... init srng (a numpy RandomState) ...
        D = srng.normal(size=(dimensions, k))
        D = D / numpy.sqrt(numpy.sum(numpy.square(D), axis=0))
        self.D = theano.shared(D, 'D')
        mini_batch = T.matrix('mini_batch', dtype='float64')
        self.func = theano.function(inputs=[mini_batch],
                                    updates=self.fit_once(mini_batch))

    def update(self, batch):
        self.func(batch)

    def fit_once(self, mini_batch):
        # ... work to create s symbolically ...
        d_update = T.dot(mini_batch, T.transpose(s))
        d_update = d_update / T.sqrt(T.sum(T.sqr(d_update), axis=0))
        return [(self.D, d_update)]

    def get_d(self):
        return self.D.get_value()
Note that the initialization of D has changed from a Theano operation to a numpy operation.
2. Make D a regular numpy array and set its value to the output of the Theano function on each update():

class kmeansminibatch:
    def __init__(self, dimensions, k):
        # ... init srng (a numpy RandomState) ...
        self.D = srng.normal(size=(dimensions, k))
        self.D = self.D / numpy.sqrt(numpy.sum(numpy.square(self.D), axis=0))
        mini_batch = T.matrix('mini_batch', dtype='float64')
        self.func = theano.function(inputs=[mini_batch],
                                    outputs=self.fit_once(mini_batch))

    def update(self, batch):
        self.D = self.func(batch)

    def fit_once(self, mini_batch):
        # ... work to create s symbolically ...
        d_update = T.dot(mini_batch, T.transpose(s))
        d_update = d_update / T.sqrt(T.sum(T.sqr(d_update), axis=0))
        return d_update

    def get_d(self):
        return self.D
As far as I can see, there's no need to use givens=... at all.
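For intuition, the dictionary update that fit_once performs can be sketched in plain numpy. The construction of s is elided in the answer's code; here a hard one-hot assignment of each sample to its most similar dictionary column is assumed purely for illustration:

```python
import numpy as np

def fit_once_numpy(D, mini_batch):
    # D: (dimensions, k); mini_batch: (dimensions, batch_size),
    # i.e. samples stored as columns.
    similarities = D.T.dot(mini_batch)                 # (k, batch_size)
    # Hypothetical code matrix s: one-hot assignment per sample.
    s = np.zeros_like(similarities)
    s[similarities.argmax(axis=0), np.arange(similarities.shape[1])] = 1.0
    # dot(mini_batch, s.T), then renormalize columns -- the same two
    # steps as the symbolic d_update above.
    D_update = mini_batch.dot(s.T)                     # (dimensions, k)
    norms = np.sqrt(np.sum(np.square(D_update), axis=0))
    norms[norms == 0] = 1.0                            # guard empty clusters
    return D_update / norms
```

Each column of the returned dictionary is the (renormalized) sum of the samples assigned to it, which is the effect the symbolic version computes in one pass over the minibatch.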