I’m trying to build a binary image classifier in TensorFlow. I found a research paper on a similar problem and, as a first attempt, applied its architecture to my problem to see where it would take me. The architecture is a very simple sequential CNN, so I just typed it out in a Jupyter notebook as a functional model, with binary cross-entropy as the loss function and ROC AUC as the metric.

I prepared a few experiments and decided to run them in parallel on my M1 MacBook Pro and on Google Colab to speed things up. I then noticed that, even when I run the exact same notebook on the two platforms, they behave differently. Google Colab gives the unsurprising (if not quite satisfactory) result of the training metric peaking after 2-3 epochs with severe overfitting. My Mac, however, can’t seem to get out of the gate: the training metric oscillates around 0.5 the whole time, while the validation metric ALWAYS comes out at exactly 0.5. To me, this indicates that the model is not learning at all on the Mac.

This isn’t the first time I’ve seen my Mac behave differently from Colab, but this time it really bugs me, because the code is exactly the same, so it could point to a problem in my Mac’s setup. It might be relevant how I installed TensorFlow with GPU support on my Mac: I tried to follow the steps in https://yashguptatech.medium.com/tensorflow-setup-on-apple-silicon-mac-m1-m1-pro-m1-max-661d4a6fbb77 but ran into some odd hangups, e.g. whenever I tried to compile a model. So in the end I just installed these three packages with plain pip (i.e. without any of the preliminary steps): tensorflow, tensorflow-macos and tensorflow-metal. Since then I have been able to use TensorFlow normally, although, as I said, a difference in behavior between the two platforms would occasionally show up.

I’m not saying I expected NO differences at all, but if the same code overfits in three epochs on one platform while on the other it can’t even beat random guessing, there must be something fundamentally wrong here. Has anyone come across such behavior? I know an obvious answer could be that I’m not setting random seeds for dataset splitting etc., but I am doing that (see the sketch below). Besides, a different split should not result in a model not learning at all. I should also add that I have been able to train different model architectures on my Mac on the same data set (same split and all), and they have worked fine.
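For reference, this is roughly what I run at the top of the notebook on both platforms to pin the seeds and to check which devices TensorFlow sees (the seed value 42 is just an example):

```python
import tensorflow as tf

# Print the version and the visible devices -- on the Mac the GPU should
# show up via the Metal plugin, on Colab via CUDA.
print("TensorFlow:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices("GPU"))

# Seed Python's `random`, NumPy and TensorFlow in one call (TF 2.7+) so
# that the dataset split and weight initialization are reproducible.
tf.keras.utils.set_random_seed(42)
```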
What I tried: Running the same Jupyter notebook, containing a simple image classifier with four convolutional layers written as a TensorFlow functional model, on two different platforms: an Apple MacBook Pro M1 (with GPU) and Google Colab.
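To make this concrete, here is a stripped-down sketch of the kind of model I mean; the input shape, filter counts and dense width are placeholders rather than the exact values from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder input shape -- my real images have different dimensions.
inputs = tf.keras.Input(shape=(128, 128, 3))
x = inputs
for filters in (16, 32, 64, 128):  # four convolutional blocks
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # binary output

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(curve="ROC", name="roc_auc")],
)
```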
Actual result: On the MacBook, the classifier doesn’t train at all: in every epoch the training metric (ROC AUC) oscillates around 0.5, while the validation metric is always exactly 0.5. On Colab, the training metric quickly (within 2-3 epochs) climbs to ~1.0, while the validation metric fluctuates between ~0.6 and ~0.7. Once again: the very same code is run on the very same data set on both platforms, with random seeds set to ensure reproducibility.
Expected result: The notebook behaves similarly on both platforms: whether the model learns or can’t even get out of the gate shouldn’t depend on the platform.