How do I improve my code so it can gather a list of directories faster?

We have a lot of directories under myfolder, and all I need is every folder up to the third level below myfolder, excluding paths that contain any of the names in the avoidFolders string array. I have been told that the run cannot take longer than 2 hours.

So for instance, if myfolder is “C:\Loading\Claim”

Examples of what would be collected would be:
- C:\Loading\Claim\folder1\folder2\folder3
- C:\Loading\Claim\folder1\folder2
- C:\Loading\Claim\folder1

But any folders deeper than that would not be accepted, like “C:\Loading\Claim\folder1\folder2\folder3\folder4”.

Paths that contain an “Arc” folder (e.g., “C:\Loading\Claim\folder1\folder2\Arc”) would not be accepted either, because Arc is part of the avoidFolders string array. Even though folder2 contains an Arc subfolder, we can still capture “C:\Loading\Claim\folder1\folder2” and any other subfolders that are not part of the avoidFolders string array.
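To make the rule concrete, here is how I would state the acceptance test as pure string logic (a rough sketch; PathFilter, ShouldCollect, and the sample names are illustrative only, and note it matches whole segment names rather than substrings):

```csharp
using System;
using System.IO;
using System.Linq;

static class PathFilter
{
    // True when `path` is one to three levels below `root` and no path
    // segment equals one of the avoided names (case-insensitive).
    public static bool ShouldCollect(string path, string root, string[] avoidFolders)
    {
        string relative = Path.GetRelativePath(root, path);
        if (relative == "." || relative.StartsWith(".."))
            return false; // the root itself, or a path outside the root

        string[] segments = relative.Split(Path.DirectorySeparatorChar);
        return segments.Length <= 3
            && !segments.Any(s => avoidFolders.Contains(s, StringComparer.OrdinalIgnoreCase));
    }
}
```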

I thought that enumerating the directories lazily, processing them in parallel, and saving the results in a HashSet to avoid duplicates could help. I know that there are limits to how fast a script can run on my machine, but I was hoping there was something in my code that was slowing down the script so that this could be easily resolved.

I have tried two different approaches, which I have called Test1 and Test2. Below is what I have done, with myfolder being the parent folder that contains the subfolders, and avoidFolder being the path to a text document containing a short list (for now, it is short) of names that identify the paths we do not want to capture.

string[] avoidFolders = File.ReadAllLines(avoidFolder);

Here is where they differ. Both tests are wrapped in a try-catch block, but I have not seen any exceptions raised.

Below is Test1

HashSet<string> directories = Directory.EnumerateDirectories(myfolder, "*", SearchOption.AllDirectories)
    .AsParallel()
    .Where(subfolder =>
    {
        string[] parts = subfolder.Split(Path.DirectorySeparatorChar);
        int index = Array.IndexOf(parts, "Claims");
        return index >= 0 && parts.Length - index <= 4
            && !avoidFolders.Any(af => subfolder.IndexOf(af, StringComparison.OrdinalIgnoreCase) >= 0);
    })
    .ToHashSet();
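If I understand correctly, SearchOption.AllDirectories forces a walk of the entire tree, including everything below the third level and inside avoided folders, before the Where filter ever runs. A depth-limited walk that prunes avoided names as it goes would skip those subtrees entirely; a rough sketch of what I mean (PrunedWalker and Walk are illustrative names, not my actual code):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class PrunedWalker
{
    // Yield subdirectories of `root` up to `maxDepth` levels down, never
    // descending into a folder whose name is in `avoidFolders`.
    public static IEnumerable<string> Walk(string root, int maxDepth, string[] avoidFolders)
    {
        if (maxDepth == 0)
            yield break;

        foreach (string dir in Directory.EnumerateDirectories(root))
        {
            string name = Path.GetFileName(dir);
            if (avoidFolders.Contains(name, StringComparer.OrdinalIgnoreCase))
                continue; // prune: the avoided subtree is never even enumerated

            yield return dir;
            foreach (string sub in Walk(dir, maxDepth - 1, avoidFolders))
                yield return sub;
        }
    }
}
```

With something like this, Test1 would reduce to `PrunedWalker.Walk(myfolder, 3, avoidFolders).ToHashSet()`, and the avoided and too-deep subtrees would cost no I/O at all.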

Below is Test2

/* Compile the regex patterns so they are not re-parsed on every match */
Regex directoryRegex = new Regex($@"{Regex.Escape(myfolder)}[^\\]+\\?[^\\]*\\?[^\\]*", RegexOptions.Compiled);
Regex avoidFoldersRegex = new Regex(string.Join("|", avoidFolders
     .Select(af => Regex.Escape(af))),
     RegexOptions.IgnoreCase | RegexOptions.Compiled);
HashSet<string> directories = Directory.EnumerateDirectories(myfolder, "*", SearchOption.AllDirectories)
                .AsParallel()
                .Select(subfolder => directoryRegex.Match(subfolder))
                .Where(match => match.Success)
                .Select(match => match.Value)
                .Where(match => !avoidFoldersRegex.IsMatch(match))
                .ToHashSet();
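One detail I noticed while testing: `HashSet<string>` compares ordinally and case-sensitively by default, so the same Windows path in two casings would be kept twice; passing a comparer avoids that (a small sketch with made-up paths, not my actual data):

```csharp
using System;
using System.Linq;

class ComparerDemo
{
    static void Main()
    {
        var paths = new[] { @"C:\Loading\Claim\folder1", @"c:\loading\claim\FOLDER1" };

        // Default comparer treats the two casings as distinct entries.
        var caseSensitive = paths.ToHashSet();

        // OrdinalIgnoreCase collapses them, matching Windows path semantics.
        var caseInsensitive = paths.ToHashSet(StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(caseSensitive.Count);   // 2
        Console.WriteLine(caseInsensitive.Count); // 1
    }
}
```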


Once the list has been collected, it is written to a text document, where txtfile is the file path I am creating and writing to. I have tried using parallel processing for the write too, but it messes up how the paths are displayed: it stops writing one path in the middle to write another.

using (StreamWriter writer = new StreamWriter(txtfile))
{
    zt.Log.Output(DateTime.Now + " : Writing list of paths to txt file");
    foreach (string directory in directories)
    {
        writer.WriteLine(directory);
    }
}
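As I understand it, the interleaving I saw comes from several threads sharing one StreamWriter, which is not thread-safe. Collecting in parallel but writing sequentially keeps each path on its own line, and File.WriteAllLines does the whole write in one call (a sketch with stand-in values for txtfile and directories):

```csharp
using System.Collections.Generic;
using System.IO;

class WriteDemo
{
    static void Main()
    {
        // Stand-ins for the collected set and output path in my real code.
        var directories = new HashSet<string>
        {
            @"C:\Loading\Claim\folder1",
            @"C:\Loading\Claim\folder1\folder2"
        };
        string txtfile = Path.Combine(Path.GetTempPath(), "paths.txt");

        // The parallel query has already finished by this point; the write
        // itself is single-threaded, so lines can never interleave.
        File.WriteAllLines(txtfile, directories);
    }
}
```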
