Tag Archive for hadoop

How best to merge/sort/page through tons of JSON arrays?

Here’s the scenario: say you have millions of JSON documents stored as text files. Each JSON document is an array of “activity” objects, each of which contains a “created_datetime” attribute. What is the best way to merge, sort, filter, and page through these activities via a web UI? For example, say we want to take a few thousand of the documents, merge them into one giant array, sort that array by the “created_datetime” attribute descending, and then page through it 10 activities at a time.

Got exception: java.lang.NullPointerException: java.lang.NullPointerException when transferring data from MySQL to Hadoop using a Sqoop command

2024-06-19 17:08:00,110 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1718757162368_0013
2024-06-19 17:08:00,116 INFO mapreduce.JobSubmitter: Executing with tokens: []
2024-06-19 17:08:01,043 INFO conf.Configuration: found resource resource-types.xml at file:/home/rao-hanan/hadoop-3.2.3/etc/hadoop/resource-types.xml
2024-06-19 17:08:01,272 INFO impl.YarnClientImpl: Submitted application application_1718757162368_0013
2024-06-19 17:08:01,397 INFO mapreduce.Job: The url to track the job: http://rao-hanan-VMware-Virtual-Platform:8088/proxy/application_1718757162368_0013/
2024-06-19 17:08:01,400 INFO mapreduce.Job: Running job: job_1718757162368_0013
2024-06-19 17:08:03,506 INFO mapreduce.Job: Job job_1718757162368_0013 running in uber mode : false
2024-06-19 17:08:03,514 INFO mapreduce.Job: map 0% reduce 0%
2024-06-19 17:08:03,594 INFO mapreduce.Job: Job job_1718757162368_0013 failed with state FAILED due to: Application application_1718757162368_0013 failed 2 times due to Error launching appattempt_1718757162368_0013_000002. Got exception: java.lang.NullPointerException: java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager.retrievePassword(NMContainerTokenSecretManager.java:160)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:1230)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:959)
at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:106)
at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:245)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)

Cannot write file into Hadoop 3.3.6

I’m trying to use Docker Compose to build a Hadoop cluster on my MacBook (M1, Apple silicon). I have exposed port 9000 on the Docker container; here is my Java code below: