Image by Pascal Meier

Hosting a Deep Learning model on Heroku (with ChatGPT DEV support)

a journey through ChatGPT, AWS S3, Lambda, GCP Cloud Storage, BigQuery, TensorFlow (Keras), tensorflow-cpu, TFLite, tinynumpy, ChatGPT (again) and pillow!

This week we took a look at deploying a deep learning model on Heroku.

Heroku used to have a free tier, but now charges a small monthly fee ($5) for hosting apps. While the cost may be manageable for prototyping, there are two major issues with app development on Heroku, both stemming from the imposed “slug” size limit of 500 MB: most AI applications have a whole load of dependencies, and the underlying models are big (especially for deep learning).

It’s something of a nightmare weighing up how to keep the dependencies lean and what to do about the model. Our raw model (.h5) file was 170 MB, so hosting it directly on Heroku wasn’t really an option – added together with the dependencies, it would exceed the Heroku slug size (the TensorFlow Python library alone is over 500 MB).
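To see where the megabytes go before pushing, it can help to total up file sizes on disk. The snippet below is a minimal sketch rather than anything Heroku provides: `dir_size_mb` and `fits_in_slug` are hypothetical helper names, and since Heroku measures the slug after compression, on-disk totals are only a rough guide.

```python
import os

SLUG_LIMIT_MB = 500  # Heroku slug limit quoted in this post


def dir_size_mb(root):
    """Return the total size of all regular files under root, in MB (10**6 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks to avoid double counting
                total += os.path.getsize(path)
    return total / 1_000_000


def fits_in_slug(sizes_mb, limit_mb=SLUG_LIMIT_MB):
    """Return True if the summed sizes (in MB) stay under the limit."""
    return sum(sizes_mb) < limit_mb


# Example with the figures from this post: a 170 MB model plus a 500 MB library
print(fits_in_slug([170, 500]))  # False
```

Pointing `dir_size_mb` at a virtual environment's site-packages folder gives a quick feel for which dependencies are worth trimming.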

So where to start? Well, ChatGPT of course – the answer to everything! We asked a few pertinent questions via Bing without much success (and a crash at the end), but it did generate a few ideas to put us on the right path.

To get things moving, we tried hosting the model in an AWS S3 bucket. Amazon Simple Storage Service (S3) is widely used for storing large files in the cloud, and setting up a bucket and pushing our model to it is straightforward enough (see steps here - you will likely need to make your bucket publicly accessible for app use).

Connecting to S3, never mind copying a model from it in order to serve the app, is more problematic. Once you have arranged your AWS credentials (in the AWS Management Console, go to IAM > Users > Security Credentials tab > Access Keys and download your rootkey.csv), what should be a rather trivial replacement of a local path with a URL doesn’t produce the expected results. This stems from how S3 stores files as objects. In the end, we used the Python code below to access the model. The latency in performing inference on an S3-hosted model in this way is prohibitive for a model of our size, although it is probably fine for a smallish model.

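import os

import boto3
from tensorflow.keras.models import load_model

import AWS_credentials  # local module holding the keys; attribute names below are assumed

# Create an S3 client. The credential arguments were cut off in the original
# listing, so the two keyword values here are placeholders for your own keys.
s3 = boto3.client(
    's3',
    aws_access_key_id=AWS_credentials.ACCESS_KEY_ID,          # assumed name
    aws_secret_access_key=AWS_credentials.SECRET_ACCESS_KEY,  # assumed name
)

# s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
os.makedirs('./tmp', exist_ok=True)  # download_file needs the target folder to exist
s3.download_file('ce-aws-heroku', 'myModel.h5', './tmp/myModel.h5')
model = load_model('./tmp/myModel.h5')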
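If part of the delay comes from fetching the file again on every request, one option is to download only when the local copy is missing. This is a sketch under that assumption, not what we deployed: `fetch_once` is a hypothetical helper, and the downloader is passed in as a callable so the same pattern works for any storage client. Heroku's filesystem is ephemeral, so the file would still be fetched again after each dyno restart.

```python
import os


def fetch_once(local_path, download):
    """Call download(local_path) only if local_path is missing; return the path."""
    if not os.path.exists(local_path):
        parent = os.path.dirname(local_path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # make sure the target folder exists
        download(local_path)
    return local_path


# Hypothetical usage with the S3 client from the listing above:
# path = fetch_once('./tmp/myModel.h5',
#                   lambda p: s3.download_file('ce-aws-heroku', 'myModel.h5', p))
# model = load_model(path)
```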

Using an AWS Lambda function instead to do the heavy lifting and reduce the inference latency appears to be the preferred route, but it comes with a considerable amount of engineering that we wanted to avoid in this case.

Google is taking a beating at the moment on Bard, but when it comes to cloud integration, Google Cloud Platform is a far more intuitive experience than AWS. Setting up a service account and connecting to a model hosted in a GCP Cloud Storage bucket is certainly easier (see code below), but the latency for large models is again pretty horrible. Like Lambda for AWS, there may be a better option here with GCP via BigQuery - we hope to take a look at serverless options in a future post.

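from keras.models import load_model
import h5py
import gcsfs

PROJECT_NAME = 'cetech-apps'
CREDENTIALS = './cred.json'  # service account key file
MODEL_PATH = 'gs://ce-tech-dl/myModel.h5'

# Connect to Cloud Storage using the service account credentials
FS = gcsfs.GCSFileSystem(project=PROJECT_NAME, token=CREDENTIALS)

# Open the remote .h5 file as a file-like object and load the Keras model from it
with FS.open(MODEL_PATH, 'rb') as model_file:
    model_gcs = h5py.File(model_file, 'r')
    myModel = load_model(model_gcs)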

Both the AWS and GCP storage connectors were explored from our local repo – if it’s not going to work from a local deployment, then it’s not going to work in the cloud.
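To put numbers on "slow" when comparing a local model with a bucket-hosted one, a small timing wrapper is enough. This is a generic sketch; `timed` is a hypothetical helper name and not part of any of the libraries used in this post.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage, assuming load_model is imported as in the listings above:
# model, seconds = timed(load_model, './tmp/myModel.h5')
# print(f"model loaded in {seconds:.1f} s")
```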

So where now, given that direct cloud hosting of a model of this size kills the inference process?

We tried a different angle – perhaps partially leaning on insights from ChatGPT, we looked at whether bringing the code base down to a basic set of requirements might allow us to push the 170 MB model to Heroku.

We found tensorflow-cpu helps – essentially a TensorFlow build without GPU support, intended for running on standard CPUs. For a demo app / prototype, this is sufficient. Changing our dependencies and pushing our repo to Heroku resulted in a slug size of around 650 MB – much better than the 800 MB or so with the main library.

The model was still a problem though – if we are not going to store it in the cloud, can we reduce its size? Yes! TensorFlow also comes with TFLite - a “light” version for deploying to edge devices. It’s supposedly much faster and, importantly, smaller than core TensorFlow. Running the code below shrank our 170 MB model to below 60 MB:

import tensorflow as tf
from pathlib import Path
from tensorflow.keras.models import load_model

# Load the original Keras model
myModel = load_model('myModel.h5')

# Create a TFLiteConverter object from the Keras model
converter = tf.lite.TFLiteConverter.from_keras_model(myModel)

# Convert the model; the result is the serialized TFLite model as bytes
myModel_tflite = converter.convert()

# Save the converted model to disk
tflite_model_file = Path('tfliteConv-model.tflite')
tflite_model_file.write_bytes(myModel_tflite)

So we are done, right? Wrong! Deploying our compacted model and scaled-down library dependencies still resulted in a slug size of around 525 MB :(

We noticed that our dependencies still listed the matplotlib library – we weren't using it anywhere, so we removed it, and looked into the possibility of using tinynumpy instead of numpy. Unfortunately that didn’t work, as our code relied on three relatively simple numpy functions to reshape images to tensors and read off predictions: expand_dims(), argmax() and max().

So back to ChatGPT again – we asked a few questions to see if we could perform the same three numpy operations using custom functions (and thereby remove numpy from our dependencies entirely). ChatGPT was pretty helpful here and put us on the right path to refactor our code. With these changes we were able to push our app to Heroku with a large but manageable slug size of around 485 MB.
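For illustration, plain-Python stand-ins for those three functions could look something like the sketch below. This is not the code from our app: the function names are our own for this example, `expand_dims` only handles adding a leading batch dimension, and the other two assume a flat, non-empty list of scores.

```python
def expand_dims(values, axis=0):
    """Add a leading batch dimension to a nested list (axis=0 only in this sketch)."""
    if axis != 0:
        raise NotImplementedError("this sketch only handles axis=0")
    return [values]


def argmax(values):
    """Index of the first largest item in a flat sequence (ties go to the first)."""
    if len(values) == 0:
        raise ValueError("argmax of an empty sequence")
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:
            best = i
    return best


def max_value(values):
    """Largest item in a flat sequence."""
    return values[argmax(values)]


scores = [0.1, 0.7, 0.2]
print(argmax(scores))                  # 1
print(max_value(scores))               # 0.7
print(expand_dims([[1, 2], [3, 4]]))   # [[[1, 2], [3, 4]]]
```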

There was one more spanner in the works – while our app ran on Heroku, the inference process failed. Although not required locally, we needed to add the “pillow” library back to our dependencies in order for the image-to-tensor conversion to work. Thankfully this library was not large enough to break the slug limit again.
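As a rough idea of what an image-to-tensor step without numpy can look like, the sketch below reshapes a flat, row-major list of pixel tuples into nested height x width x channels lists. It is an illustration only: `pixels_to_tensor` is a hypothetical name, and scaling values to the 0-1 range is an assumption that has to match however the model was trained.

```python
def pixels_to_tensor(flat_pixels, width, height, scale=255.0):
    """Turn a flat row-major list of pixel tuples into nested [height][width][channels] lists."""
    if len(flat_pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    rows = []
    for y in range(height):
        row = flat_pixels[y * width:(y + 1) * width]
        # Scale every channel value (assumed 0-255) down to the 0-1 range
        rows.append([[channel / scale for channel in pixel] for pixel in row])
    return rows


# Hypothetical usage with Pillow, where img.size is (width, height):
# from PIL import Image
# img = Image.open('demo.jpg').convert('RGB')
# tensor = pixels_to_tensor(list(img.getdata()), *img.size)
```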

And here is the hosted model. For now, it’s not mobile-supported, and please allow up to 20 seconds or so for it to load the first time! You can try it with the demo images at this link.

Hope that is useful – please get in touch if you’re looking for videos of the two ChatGPT (Bing) searches used at the start and end of the process, scripted end-to-end steps for the entire process, or help deploying AI apps this way or on another cloud service.

