I am storing about 2,500 files (each ~5 MB) per hour in a folder, and a new folder is created every hour. I am running 1 NameNode and 1 DataNode as Docker containers and using Python with pyarrow to create the Parquet files. I started hitting the error shown below after crossing about a million records:

I am new to HDFS and cannot figure out the error. Can someone please explain why this error occurs? Can storing this many files cause problems in the HDFS cluster later on, and is it an anti-pattern because of the small-files problem in HDFS? I am planning to store the data for 2 years; is this setup enough, or do we need more DataNodes? It is an on-prem deployment on a system with 100 GB RAM, 100 TB storage, and 120 cores.
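
For reference, here is a rough back-of-the-envelope estimate of what this ingestion rate adds up to over 2 years. It is only a sketch: it assumes every file is exactly 5 MB, a 365-day year, decimal units (1 TB = 1,000,000 MB), and a replication factor of 1 since there is a single DataNode. The function and variable names are just for illustration.

```python
# Rough capacity estimate for the ingestion rate described above.
# Assumptions (not measured): exactly 5 MB per file, 365-day years,
# decimal units, replication factor 1 (single DataNode).

FILES_PER_HOUR = 2500
FILE_SIZE_MB = 5
RETENTION_DAYS = 2 * 365
REPLICATION = 1


def estimate(files_per_hour, file_size_mb, days, replication=1):
    """Return (total file count, raw storage in TB) for the retention period."""
    total_files = files_per_hour * 24 * days
    total_tb = total_files * file_size_mb * replication / 1_000_000
    return total_files, total_tb


total_files, total_tb = estimate(
    FILES_PER_HOUR, FILE_SIZE_MB, RETENTION_DAYS, REPLICATION
)
print(f"files after 2 years: {total_files:,}")
print(f"raw storage needed:  {total_tb:.1f} TB")
```

Under these assumptions it comes to about 43.8 million files and roughly 219 TB, which is already more than the 100 TB available on this machine.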

ERROR:

2024-03-21 05:42:58,324 INFO datanode.DataNode: Unsuccessfully sent block report 0xb1366a5bae38ffca,  containing 1 storage report(s), of which we sent 0. The reports had 6023020 total blocks and used 0 RPC(s). This took 456 msec to generate and 99 msecs for RPC and NN processing. Got back no commands.
2024-03-21 05:42:58,324 WARN datanode.DataNode: IOException in offerService
java.io.EOFException: End of File Exception between local host is: "9a5f5a8974e6/172.26.0.2"; destination host is: "namenode":9000; : java.io.EOFException; For more details see:  http://wiki.apache.org/hadoop/EOFException
        at sun.reflect.GeneratedConstructorAccessor16.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
        at org.apache.hadoop.ipc.Client.call(Client.java:1491)
        at org.apache.hadoop.ipc.Client.call(Client.java:1388)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy16.blockReport(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:218)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:404)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:701)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1850)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1183)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1079)