Wednesday, 13 March 2019

k-hot encoded feature in a TensorFlow Estimator

Let's say I have the following pandas dataframe:


  | col1 | col2 | col3      |
--+------+------+-----------+
0 |    5 |    4 | [0,2,4]   |
1 |    3 |    8 | [7,3]     |
2 |    2 |    1 | [7,3,6,9] |

In 'col3' I have lists of different lengths, which I also want to feed into my TensorFlow Estimator. These lists must also be k-hot encoded (I am not sure if that is the usual name for it):


[1,4,6] ---> [0, 1, 0, 0, 1, 0, 1]
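
To make the encoding concrete, here is a minimal pure-Python sketch of what I mean. The helper name `k_hot` and its `size` parameter are made up for this illustration and are not part of any TensorFlow API:

```python
def k_hot(indices, size):
    """Return a list of length `size` with a 1 at every position in `indices`."""
    vec = [0] * size
    for i in indices:
        vec[i] = 1  # mark each listed index as "hot"
    return vec


# The example from above: indices 1, 4 and 6 in a vector of length 7.
print(k_hot([1, 4, 6], 7))  # [0, 1, 0, 0, 1, 0, 1]
```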

The problem is that the maximum number in col3 is 600_000, so each k-hot encoded vector will have a size of 600_000. I therefore cannot encode my entire dataframe (it raises a MemoryError) and pass col3 to TensorFlow as


tf.feature_column.numeric_column('col3', 600_000)
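
To put the memory problem in numbers, here is a rough back-of-the-envelope sketch comparing the dense k-hot form with simply keeping the index lists. It assumes 8 bytes per stored value, and the 100,000-row figure at the end is a hypothetical dataframe size chosen only for illustration; the helper names are made up for this example:

```python
# The three example lists from col3 in the dataframe above.
col3 = [[0, 2, 4], [7, 3], [7, 3, 6, 9]]

VECTOR_SIZE = 600_000  # k-hot vector length stated above
BYTES_PER_VALUE = 8    # assumption: 64-bit values


def dense_bytes(n_rows, vector_size=VECTOR_SIZE, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed if every row stores a full k-hot vector."""
    return n_rows * vector_size * bytes_per_value


def sparse_bytes(lists, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed if every row stores only its list of indices."""
    return sum(len(ids) for ids in lists) * bytes_per_value


print(dense_bytes(len(col3)))      # 14400000 bytes for just three rows
print(sparse_bytes(col3))          # 72 bytes for the same three rows
print(dense_bytes(100_000) / 1e9)  # 480.0 GB for a hypothetical 100,000 rows
```

So almost all of the dense representation is zeros, which is why I am looking for a way to pass the lists themselves instead of the expanded vectors.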

Do you have any ideas how I can feed this column into my DNNRegressor? To share some code, this is how I normally do it for 'standard' columns:


import tensorflow as tf

# reading the columns from the pandas_input_fn
col1 = tf.feature_column.numeric_column('col1', default_value=0.0)
col2 = tf.feature_column.numeric_column('col2', default_value=0.0)

# converting them to categorical
col1_b = tf.feature_column.bucketized_column(col1, [0, 5, 10, 20])
col2_b = tf.feature_column.bucketized_column(col2, [0, 4, 8, 16])

# make crosses
cross = tf.feature_column.crossed_column([col1_b, col2_b], 4 * 4)

# define what will go in my estimator
features = [
    tf.feature_column.embedding_column(cross, 10),
    tf.feature_column.indicator_column(col1_b),
    tf.feature_column.indicator_column(col2_b),
]

# and finally (model_dir is defined elsewhere)
estimator = tf.estimator.DNNRegressor(
    model_dir=model_dir,
    feature_columns=features,
    hidden_units=[512, 256, 256, 128, 32])



