Tuesday, 3 March 2015

How to make full use of the hardware (multiple threads and/or GPU) while indexing a very large set of binary files into a database


Problem



  1. How can I make the most of the hardware when reading a large file (i.e. how do I avoid being blocked by IO or running out of memory)?


Preferably, I would like to give the user a "high priority" option (locks the system apart from some progress UI) and a "run in background" option (allows the user to start using data that has already been indexed/loaded).


Assumptions



  1. Presumably I will have no problems having multiple threads reading the same set of files?

  2. The way I see it, the biggest bottleneck will be writing to the database. Presumably I will have to lock/unlock it and have each thread queue up data to write in batches?

  3. As each thread will need its own data queues, I am unsure how to make sure the machine does not run out of memory.

  4. When processing the data blocks I basically want to compute things like averages, minimums and maximums. I have assumed I can't use the GPU to process the data blocks like this? It feels like there is too much shared data to utilize the GPU here.
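To make assumptions 2 to 4 concrete, here is a minimal sketch of one possible arrangement, not a tested design: worker threads compute the per-block minimum, maximum and average, push the results into a single bounded queue, and one writer thread drains it in batches. `BlockStats`, `IndexPipeline`, `writeBatch`, the capacities and the `double[]` sample type are all hypothetical names and values; `writeBatch` stands in for whatever batched database insert is actually used.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical per-block summary; field names are assumptions, not the real format.
public sealed class BlockStats
{
    public long Offset;
    public double Min, Max, Average;
}

public static class IndexPipeline
{
    // Computes min/max/average of one block in a single pass.
    public static BlockStats Summarize(long offset, double[] samples)
    {
        if (samples.Length == 0) throw new ArgumentException("Empty block.", nameof(samples));
        double min = samples[0], max = samples[0], sum = 0;
        foreach (double s in samples)
        {
            if (s < min) min = s;
            if (s > max) max = s;
            sum += s;
        }
        return new BlockStats { Offset = offset, Min = min, Max = max, Average = sum / samples.Length };
    }

    // Workers summarize blocks in parallel; one consumer drains the bounded queue
    // and passes batches to writeBatch (a stand-in for the real database insert).
    // Returns the number of summaries handed to writeBatch.
    public static int Run(IEnumerable<KeyValuePair<long, double[]>> blocks,
                          Action<IReadOnlyList<BlockStats>> writeBatch,
                          int queueCapacity = 1024, int batchSize = 256)
    {
        int written = 0;
        using (var queue = new BlockingCollection<BlockStats>(queueCapacity))
        {
            // Single writer thread, so workers never contend for the database.
            Task writer = Task.Run(() =>
            {
                var batch = new List<BlockStats>(batchSize);
                foreach (BlockStats stats in queue.GetConsumingEnumerable())
                {
                    batch.Add(stats);
                    if (batch.Count == batchSize)
                    {
                        writeBatch(batch.ToArray());
                        written += batch.Count;
                        batch.Clear();
                    }
                }
                if (batch.Count > 0)
                {
                    writeBatch(batch.ToArray());
                    written += batch.Count;
                }
            });

            var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
            try
            {
                // Add blocks while the queue is full, which caps the pending results.
                Parallel.ForEach(blocks, options, b => queue.Add(Summarize(b.Key, b.Value)));
            }
            finally
            {
                queue.CompleteAdding(); // lets the writer finish its last batch
            }
            writer.Wait();
        }
        return written;
    }
}
```

Because `blocks` is enumerated lazily and `queue.Add` blocks once `queueCapacity` results are pending, memory use stays roughly bounded by the queue size plus the blocks currently being processed. Error handling is deliberately left out: if `writeBatch` throws, workers could stall on a full queue, so a real version would need cancellation.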


Detail:


I am working with very large data sets, split across multiple files. Each "full" data file is 1.5 GB and there are often several of these files (I am looking at one set now that has 10).


The data effectively contains a series of buffers that I want to access one or more at a time.


|HeaderInfo-datablock|HeaderInfo-datablock|HeaderInfo-datablock| ... (thousands of these)


I want to go through the data file, filling a database with index information (so I can access specific blocks quickly based on either information in the header or the datablock itself). If I can, I would also like to blit information to a graph image as I go.
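The indexing pass described above could look roughly like the sketch below. It rests on a loud assumption: that each header is nothing more than a 4-byte little-endian payload length. The real HeaderInfo layout is not shown here, so `BlockIndexer` and `BuildIndex` are purely illustrative names, and the header-parsing line would need to be replaced with the actual format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BlockIndexer
{
    // ASSUMPTION: a header is only a 4-byte little-endian payload length.
    // Returns (header offset, payload length) pairs, one per block.
    public static List<KeyValuePair<long, int>> BuildIndex(Stream stream)
    {
        var index = new List<KeyValuePair<long, int>>();
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            while (stream.Position + sizeof(int) <= stream.Length)
            {
                long headerOffset = stream.Position;
                int payloadLength = reader.ReadInt32();
                if (payloadLength < 0 || stream.Position + payloadLength > stream.Length)
                    throw new InvalidDataException("Corrupt block header at offset " + headerOffset);

                index.Add(new KeyValuePair<long, int>(headerOffset, payloadLength));
                stream.Seek(payloadLength, SeekOrigin.Current); // skip the data block unread
            }
        }
        return index;
    }

    // FileShare.Read allows other readers to open the same file concurrently.
    public static List<KeyValuePair<long, int>> BuildIndex(string path)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
            return BuildIndex(file);
    }
}
```

Indexing on header fields alone only needs the seek shown here; indexing on the datablock contents means reading each payload instead of skipping it, which is where the IO cost would come from.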


The machine it will be run on is a proper workstation PC, so there is plenty of RAM and processing power to utilize :)


Technology: I am using C#, so any tips on the best ways to judge the hardware capabilities in C# (at run time) and make full use of them when accessing a set of large files would be a great bonus.
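As a starting point for the run-time side, `Environment.ProcessorCount` reports the number of logical processors, and that figure could drive the two modes mentioned earlier. The sketch below is only an illustration: `IndexMode`, `HardwarePlan` and the "half the cores in background mode" rule are my own assumptions, not an established heuristic.

```csharp
using System;
using System.Threading.Tasks;

public enum IndexMode { HighPriority, Background }

public static class HardwarePlan
{
    // ASSUMPTION: high priority uses every logical core, background uses half
    // (at least one) so the rest of the application stays responsive.
    public static int WorkerCount(IndexMode mode, int logicalCores)
    {
        if (logicalCores < 1) throw new ArgumentOutOfRangeException(nameof(logicalCores));
        return mode == IndexMode.HighPriority ? logicalCores : Math.Max(1, logicalCores / 2);
    }

    // Builds options for Parallel.ForEach / Parallel.For based on this machine.
    public static ParallelOptions OptionsFor(IndexMode mode)
    {
        return new ParallelOptions
        {
            MaxDegreeOfParallelism = WorkerCount(mode, Environment.ProcessorCount)
        };
    }
}
```

For example, `Parallel.ForEach(files, HardwarePlan.OptionsFor(IndexMode.Background), ProcessFile)` would cap the indexing at half the logical cores, where `files` and `ProcessFile` are placeholders for the real inputs.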




