I’m using ml5.js to train a model to predict the number of items that will be sold at a bakery, given these fields:
- day_of_week
- next_holiday
- days_until_next_holiday
- avg_cloud_cover
- avg_precip
- avg_temp
- highest_cloud_cover
- highest_precip
- highest_temp
- lowest_cloud_cover
- lowest_precip
- lowest_temp
I’ve trained the model, creating the three usual files:
- model_meta.json
- model.json
- model.weights.bin
However, when I load the model and try to make a prediction with a test input like this:
```javascript
const input = {
  day_of_week: "Thursday",
  next_holiday: "Canada Day",
  days_until_next_holiday: 10,
  avg_cloud_cover: 0,
  avg_precip: 0,
  avg_temp: 0,
  highest_cloud_cover: 0,
  highest_precip: 0,
  highest_temp: 0,
  lowest_cloud_cover: 0,
  lowest_precip: 0,
  lowest_temp: 0
};
```
I get this error:
Error: Error when checking : expected dense_Dense1_input to have shape [null,2134] but got array with shape [1,37].
I think I’m getting this error because I’m not formatting the input correctly. I normalized the data when making the model, so it created many new fields, all of which have a value of 0 or 1.
For example, for day_of_week, the start of model_meta.json looks like this:
```json
{
  "inputUnits": [2134],
  "outputUnits": 1,
  "inputs": {
    "day_of_week": {
      "dtype": "string",
      "min": 0,
      "max": 1,
      "uniqueValues": ["Thursday","Wednesday","Tuesday","Monday","Sunday","Saturday","Friday"],
      "legend": {
        "Thursday":  [1,0,0,0,0,0,0],
        "Wednesday": [0,1,0,0,0,0,0],
        "Tuesday":   [0,0,1,0,0,0,0],
        "Monday":    [0,0,0,1,0,0,0],
        "Sunday":    [0,0,0,0,1,0,0],
        "Saturday":  [0,0,0,0,0,1,0],
        "Friday":    [0,0,0,0,0,0,1]
      }
    },
    ...
```
I’m wondering if I need to format the input to not use strings for fields like day_of_week. But if so, I’m not sure what to replace it with.
Any assistance?
Let’s break down the error and how to fix your input formatting.
**Understanding the Error**
The error “expected dense_Dense1_input to have shape [null,2134] but got array with shape [1,37]” indicates a mismatch between what your model expects as input and what you’re providing.
- **Expected shape [null, 2134]:** the model wants inputs with 2134 features (columns). The `null` means it can handle any number of samples (rows), but each sample must have 2134 features.
- **Got array with shape [1, 37]:** you’re providing 1 sample (row) with only 37 features (columns).
This mismatch arises because your model was trained on one-hot encoded categorical variables, which expands the original feature count. (With 10 numeric fields plus 7 one-hot columns for day_of_week, the remaining 2117 columns presumably come from the many unique values of next_holiday.) Your input needs to reflect the same transformation.
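As a quick sanity check, you can recompute the expected input width from the metadata yourself. A minimal sketch, assuming you’ve parsed model_meta.json into a plain object called `meta` (a name chosen here for illustration):

```javascript
// Recompute the expected input width from model_meta.json:
// categorical fields contribute one column per unique value,
// numeric fields contribute one column each.
const expectedWidth = Object.values(meta.inputs).reduce(
  (sum, spec) => sum + (spec.legend ? spec.uniqueValues.length : 1),
  0
);
console.log(expectedWidth); // should print 2134, matching inputUnits
```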
**Solution: Formatting the Input**
You need to preprocess your input data in the same way you did when training the model. Essentially, you need to one-hot encode your categorical variables to match the model’s expectations.
Here’s a breakdown of how to do it:
1. **Identify categorical variables.** From your model_meta.json, identify all the variables that were treated as categorical during training (like day_of_week and next_holiday). These are the ones that have a `legend` section in the metadata.
2. **One-hot encode them.** For each categorical variable in your input, replace the string value with its corresponding one-hot encoded representation from the legend. For example, if day_of_week is "Thursday", replace it with [1, 0, 0, 0, 0, 0, 0] based on your metadata.
3. **Construct the input array.** Create a single flat array (or a 2D array if you have multiple samples) containing all the feature values in the correct order. The order should match the order of features the model was trained on, which you can infer from model_meta.json.
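Here’s a minimal sketch of those three steps, assuming `meta` is the parsed model_meta.json and that the key order of `meta.inputs` matches the training order (worth verifying against your own file). The scaling branch assumes you normalized numeric fields to [0, 1] during training; drop it if you fed raw values to the model.

```javascript
// Build a flat numeric array from an ml5.js-style model_meta.json.
// Categorical fields (those with a legend) become one-hot vectors;
// numeric fields are min/max scaled using the recorded bounds.
function encodeInput(meta, input) {
  const encoded = [];
  for (const [field, spec] of Object.entries(meta.inputs)) {
    const value = input[field];
    if (spec.legend) {
      const oneHot = spec.legend[value];
      if (!oneHot) {
        throw new Error(`Unknown category "${value}" for "${field}"`);
      }
      encoded.push(...oneHot);
    } else {
      // Assumes training-time normalization to [0, 1]; skip the
      // scaling if your numeric inputs were used as-is.
      const { min, max } = spec;
      encoded.push(max > min ? (value - min) / (max - min) : value);
    }
  }
  return encoded;
}
```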
**Example (Illustrative)**
Let’s say your model_meta.json indicates that the first 7 features are the one-hot encoded day_of_week, the next block is the one-hot encoded next_holiday, and the remaining features are your numeric fields. If your input is:
```javascript
const input = {
  day_of_week: "Thursday",
  next_holiday: "Canada Day",
  days_until_next_holiday: 10,
  // ... other features
};
```
You’d transform it to something like:
```javascript
const processedInput = [
  1, 0, 0, 0, 0, 0, 0, // one-hot encoded day_of_week
  // ... one-hot encoded next_holiday
  10,                  // days_until_next_holiday
  // ... other numeric features
];
```
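Putting it together, a sketch assuming `model` is the ml5 neural network you loaded and `encodeInput` is the helper above (ml5’s `predict()` accepts a flat array of inputs and uses an error-first callback):

```javascript
// Run inside an async function.
const meta = await fetch("model_meta.json").then((res) => res.json());
const processedInput = encodeInput(meta, input);

model.predict(processedInput, (error, results) => {
  if (error) {
    console.error(error);
    return;
  }
  console.log(results); // predicted item count
});
```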
**Important Considerations**
- **Metadata is key.** The model_meta.json is crucial for understanding how your input needs to be structured.
- **Consistent preprocessing.** Ensure the preprocessing you apply at prediction time is exactly the same as what you did during training, including any min/max scaling of numeric fields.
- **Handling new categories.** If your prediction input contains a category that wasn’t present during training (e.g., a new holiday), you’ll need to handle it gracefully. One approach is an ‘other’ category in your encoding; a sketch of a simpler fallback follows this list.
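For the unseen-category case, one hypothetical fallback (a variant of the ‘other’-category idea, not something ml5 provides) is an all-zeros vector, which keeps the input width valid while signaling “none of the known categories”:

```javascript
// Hypothetical fallback: unseen categories map to an all-zeros
// vector of the right length. The model never saw this pattern in
// training, so treat such predictions with extra caution.
function oneHotOrZeros(spec, value) {
  return spec.legend[value] ?? new Array(spec.uniqueValues.length).fill(0);
}
```

You could drop this into `encodeInput` in place of the `throw`.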
**Next Steps**
1. Carefully examine your model_meta.json to identify all categorical variables and their encodings.
2. Write code to preprocess your input data based on the metadata.
3. Construct the final input array and feed it to your model for prediction.
If you provide more details about your model_meta.json or the specific preprocessing steps you used during training, I can offer more tailored guidance.
Feel free to ask if you have further questions or need more specific help!