I came across this (Java) code yesterday and can’t figure out why someone would have done this in any language.
In what scenario, language, or paranoid delusion would “removing” values from an ArrayList like this improve performance? Try to avoid getting dragged into the general design of the worker. I know it’s bad and should be different, but I’m trying to understand if there is a reason for writing a ‘remove’ method this way.
Some points of clarification:
- tradeIdList contains no duplicate values
- It is populated in the constructor and never read until the worker has finished.
- The class reading this result simply prints the contents to a log file.
public class TradeIdWorker implements Runnable {

    private List tradeIdList = new ArrayList();

    ....

    public void run() {
        // Copy the tradeIdList to an array (No I do not know why)
        for (int tradeId : tradeIdArray) {
            ....
            // Once processed, remove it
            removeProcessedTradeId(tradeId);
        }
    }

    /**
     * Set Trade id to zero (for performance)
     */
    private void removeProcessedTradeId(int tradeId) {
        for (int index = 0; index < tradeIdList.size(); index++) {
            if (tradeIdList.get(index).intValue() == tradeId) {
                tradeIdList.set(index, 0);
            }
        }
    }
}
To try and understand why, I’ve added a benchmark based on a real-world use case from the system. The ArrayList of tradeIds never contains duplicates.
public static void main(String[] args) {
    testNormalRemove();
    testPerformantRemove();
}

private static void testNormalRemove() {
    ArrayList<Integer> aList = new ArrayList<Integer>();
    for (int i = 0; i < 1000000; i++) {
        aList.add(i);
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < aList.size(); i += 1000) {
        aList.remove((Object) i);
    }
    long end = System.currentTimeMillis();
    System.out.println("Normal Took: " + (end - start)); // 1700ms
}

private static void testPerformantRemove() {
    ArrayList<Integer> aList = new ArrayList<Integer>();
    for (int i = 0; i < 1000000; i++) {
        aList.add(i);
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < aList.size(); i += 1000) {
        for (int index = 0; index < aList.size(); index++) {
            if (aList.get(index).intValue() == i) {
                aList.set(index, 0);
            }
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("Performant Took: " + (end - start)); // 4800ms
}
Removing an element from an ArrayList requires shifting every remaining element with a higher index down by one. Overwriting an element with a zero does not.
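To make that concrete, here is a simplified sketch of what each call costs on the backing array (illustrative only; the real implementation lives in java.util.ArrayList):

// remove(index): every later element is copied down one slot -- O(n) writes
static int removeAt(int[] elements, int size, int index) {
    System.arraycopy(elements, index + 1, elements, index, size - index - 1);
    return size - 1; // the logical size shrinks by one
}

// set(index, value): a single write -- O(1)
static void setAt(int[] elements, int index, int value) {
    elements[index] = value;
}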
So if N is the number of elements in tradeIdList and M is the number of elements in tradeIdArray, this approach is O(M × N), whereas actually removing the elements on each iteration of M is O(M × N²) in the worst case.
Typically there would be a final pass to remove all the zeroed values, which would be O(N) and so does not increase the overall complexity.
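That final pass can be a single bulk call rather than element-by-element removal; for example, something like the following (assuming, as the original code does, that 0 is never a valid trade id, and using java.util.Collections):

// Drop every zeroed slot in one sweep: ArrayList.removeAll compacts
// the backing array in a single O(N) pass instead of shifting the
// tail once per removed element.
tradeIdList.removeAll(Collections.singleton(Integer.valueOf(0)));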
This applies to the general case: the code iterates over the whole array list, zeroing every occurrence of the id. If you can assume that each id occurs only once, both approaches touch O(N) elements per id and the times will be similar.
The following tests compare the approaches on inputs with repeated values.
import java.util.*;

public class Programmers312172 {

    public static void main(String... args) {
        Worker[] workers = {new TradeIdWorker(), new RemoveWorker(), new ZeroOnceWorker(), new RemoveOnceWorker()};

        for (int run = 0; run < 10; ++run) {
            for (Worker worker : workers)
                worker.prepare();

            for (Worker worker : workers) {
                long start = System.nanoTime();
                worker.run();
                long end = System.nanoTime();
                System.out.printf("%-24s %12dns%n", worker, end - start);
            }
        }
    }

    static int[] prepareInputs(List<Integer> tradeIdList) {
        Random r = new Random(0);
        int[] tradeIdArray = new int[1000];

        // 100,000 values drawn from [0, 2000), so each id occurs ~50 times
        for (int i = 0; i < 100000; ++i)
            tradeIdList.add(r.nextInt(2000));

        // 1,000 distinct ids to process: 0..999
        for (int i = 0; i < 1000; ++i)
            tradeIdArray[i] = i;

        return tradeIdArray;
    }

    public abstract static class Worker {
        protected List<Integer> tradeIdList = new ArrayList<Integer>();
        private int[] tradeIdArray;

        public void prepare() {
            tradeIdList.clear(); // start each run from the same input
            tradeIdArray = prepareInputs(tradeIdList);
        }

        public void run() {
            for (int tradeId : tradeIdArray)
                removeProcessedTradeId(tradeId);
        }

        protected abstract void removeProcessedTradeId(int tradeId);
    }

    // the approach from the question: zero every occurrence
    public static class TradeIdWorker extends Worker {
        @Override
        protected void removeProcessedTradeId(int tradeId) {
            for (int index = 0; index < tradeIdList.size(); index++) {
                if (tradeIdList.get(index).intValue() == tradeId) {
                    tradeIdList.set(index, 0);
                }
            }
        }

        @Override
        public String toString() { return "zero all"; }
    }

    // zero only the first occurrence, then stop
    public static class ZeroOnceWorker extends Worker {
        @Override
        protected void removeProcessedTradeId(int tradeId) {
            for (int index = 0; index < tradeIdList.size(); index++) {
                if (tradeIdList.get(index).intValue() == tradeId) {
                    tradeIdList.set(index, 0);
                    break;
                }
            }
        }

        @Override
        public String toString() { return "zero once"; }
    }

    // actually remove every occurrence
    public static class RemoveWorker extends Worker {
        @Override
        protected void removeProcessedTradeId(int tradeId) {
            Integer toRemove = tradeId;
            while (tradeIdList.remove(toRemove));
        }

        @Override
        public String toString() { return "remove all"; }
    }

    // actually remove only the first occurrence
    public static class RemoveOnceWorker extends Worker {
        @Override
        protected void removeProcessedTradeId(int tradeId) {
            tradeIdList.remove((Integer) tradeId);
        }

        @Override
        public String toString() { return "remove once"; }
    }
}
This gives values such as
zero all                   1136905192ns
remove all                13162226491ns
zero once                     6587605ns
remove once                  70135769ns
Other data will change the relationships, but typically there will be similar speed increases. With these inputs each id occurs about 50 times among the 100,000 elements, so its first occurrence is expected within the first ~2,000 elements: zero once can stop after iterating over 2% of the list, but remove once still has to shift the remaining ~98% of the elements down, so it touches the whole list. You could probably also come up with inputs that show no improvement (e.g. a single occurrence of each id, removed in reverse order from the end), but in general zeroing will be better.
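For completeness: if you can change the approach rather than just the remove method, collecting the processed ids in a hash set and doing one bulk removeAll is O(N + M) regardless of duplicates, and beats both strategies above. A minimal sketch (my naming, not from the original system; uses the same java.util imports as the harness):

// One-pass alternative: O(N + M) whatever the number of repeats.
static void removeProcessedIds(List<Integer> tradeIdList, int[] tradeIdArray) {
    Set<Integer> processed = new HashSet<Integer>();
    for (int tradeId : tradeIdArray)
        processed.add(tradeId);          // O(M) to build the set
    tradeIdList.removeAll(processed);    // single O(N) compaction sweep
}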