SageMaker/BlazingText error ‘cudaMalloc((void **) &(gpuPointers_.cuInput), inputSize) == cudaSuccess’

Author: solarflare

Today I ran into a seemingly random error while trying hyperparameter tuning for BlazingText on SageMaker.

blazingtext2: src/gpu.cu:400: void blazingtext::BlazingText::initGPU(): Assertion 'cudaMalloc((void **) &(gpuPointers_.cuInput), inputSize) == cudaSuccess' failed.

Thanks to this post, I learned that the seemingly random error was a resource problem: 5 GB of disk space on the training volume was not enough.
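When a tuning run spawns several training jobs, it can help to flag this particular failure in the job logs automatically. Here is a minimal sketch in plain Python. The function name `is_cuda_alloc_failure` and the regular expression are my own illustration, written against the assertion text quoted above; they are not part of SageMaker or BlazingText.

```python
import re

# Matches the BlazingText assertion quoted above. The source file and line
# number (src/gpu.cu:400) are deliberately not matched, on the assumption
# that they may differ between container versions.
_CUDA_ALLOC_ASSERT = re.compile(
    r"blazingtext::BlazingText::initGPU\(\): "
    r"Assertion .cudaMalloc\(.*\) == cudaSuccess. failed"
)


def is_cuda_alloc_failure(log_line: str) -> bool:
    """Return True if the log line contains the initGPU cudaMalloc assertion."""
    return _CUDA_ALLOC_ASSERT.search(log_line) is not None
```

If a job's log matches, raising `volume_size` on the estimator is the first thing I would try, since that is what fixed it for me.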

Part of my code:

import sagemaker

bt_model = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type='ml.p2.xlarge',  # $1.125
    volume_size=20,  # GB; 5 was too small
    max_run=3600,
    input_mode='File',
    output_path=s3_output_location,
    sagemaker_session=sess,
    use_spot_instances=True,
    max_wait=3600,
)

bt_model.set_hyperparameters(
    mode="supervised",
    epochs=500,
    min_count=4,
    learning_rate=0.01,
    vector_dim=32,
    early_stopping=True,
    patience=20,
    min_epochs=5,
    word_ngrams=3,
)

from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner,
)

hyperparameter_ranges = {
    'buckets': IntegerParameter(1000000, 10000000),
    'epochs': IntegerParameter(5, 15),
    # List 'learning_rate' only once: a repeated dict key silently keeps the last value.
    'learning_rate': ContinuousParameter(0.0001, 0.001, scaling_type="Logarithmic"),
    'min_count': IntegerParameter(0, 100),
    'negative_samples': IntegerParameter(5, 25),
    'vector_dim': IntegerParameter(32, 300),
    'window_size': IntegerParameter(1, 10),
}
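Several names in `hyperparameter_ranges` (`epochs`, `learning_rate`, `min_count`, `vector_dim`) are also given fixed values in `set_hyperparameters`. My understanding, which is an assumption worth checking against the SageMaker SDK documentation, is that the tuner's range is what each training job actually uses for those names. A small plain-Python sketch to list such overlaps (the helper name is hypothetical):

```python
def overlapping_hyperparameters(static_hyperparameters, hyperparameter_ranges):
    """Return the names that are set both as fixed values and as tuning ranges."""
    return sorted(set(static_hyperparameters) & set(hyperparameter_ranges))


# Names copied from the estimator and tuner configuration in this post.
static = {
    "mode": "supervised", "epochs": 500, "min_count": 4, "learning_rate": 0.01,
    "vector_dim": 32, "early_stopping": True, "patience": 20, "min_epochs": 5,
    "word_ngrams": 3,
}
tuned_names = ["buckets", "epochs", "learning_rate", "min_count",
               "negative_samples", "vector_dim", "window_size"]

print(overlapping_hyperparameters(static, tuned_names))
# ['epochs', 'learning_rate', 'min_count', 'vector_dim']
```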

objective_metric_name = 'validation:accuracy'

tuner = HyperparameterTuner(
    bt_model,
    objective_metric_name,
    hyperparameter_ranges,
    max_jobs=1,  # testing
    max_parallel_jobs=1,  # testing
    objective_type='Maximize',
)
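The snippet above stops before the tuning job is started. A minimal sketch of the launch step follows. The helper name `launch_tuning` and the S3 URIs are placeholders of mine; `train` and `validation` are the channel names BlazingText reads, and the `validation:accuracy` objective needs the `validation` channel.

```python
def launch_tuning(tuner, train_s3_uri, validation_s3_uri):
    """Start a tuning job with BlazingText's 'train' and 'validation' channels.

    `tuner` is expected to be a sagemaker.tuner.HyperparameterTuner; only its
    fit(inputs=...) method is used here.
    """
    channels = {"train": train_s3_uri, "validation": validation_s3_uri}
    tuner.fit(inputs=channels)
    return channels


# Hypothetical usage with placeholder bucket paths:
# launch_tuning(tuner, "s3://my-bucket/train", "s3://my-bucket/validation")
```

If the container complains about the content type, wrapping each URI in `sagemaker.inputs.TrainingInput(uri, content_type='text/plain')` may be needed; I have not verified which form this setup requires.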