I understand that there are a variety of databases one can use, each tuned to different user needs. I'm less familiar with file systems, but I understand that there is also a variety of those, again tuned to different needs.
In particular, when companies like Google or Amazon describe their cloud infrastructure, while they may have many database offerings, they have one file system. The file systems are designed based on how each company would use them (or rather the most profitable way they use them, Google for search, Amazon for selling things). However, I could imagine that large companies like these might have multiple objectives. Different parts of their business might have better performance using different file systems. How do they (or would they) go about managing this?
For instance, suppose Google and Amazon merged into Google-Amazon. Google has GFS optimized for their needs and Amazon has Dynamo optimized for its needs. The most obvious approach would be to keep the status quo. I can think of a few reasons why they might not want to though.
Economies of Scale - If Dynamo servers are more heavily utilized than GFS servers, then there is an imbalance. They could either add more Dynamo servers or cannibalize GFS servers.
Efficiency - Google might find that some things it uses GFS for would be done more efficiently with Dynamo as a file system, and vice versa for Amazon. For instance, part of the reason for adopting Colossus was that they could ditch the original GFS's 64 MB block size in order to better accommodate Gmail and others. It might be the case that Dynamo could handle Gmail as well as Colossus does, in which case future design decisions for GFS could focus on search applications.
So I'm curious what the options are for Google-Amazon to manage multiple file systems. For instance, I would imagine that they could build some layer above their current distributed file systems, a sort of doubly distributed file system, to manage communication with and between the two (or N) distributed file systems across all the server farms. Then all the relevant tools could be made to work on the doubly distributed file system (easier said than done, I imagine).
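To make the "layer above" idea concrete, here is a minimal sketch in Python of what I have in mind. Everything in it is hypothetical: `Backend`, `InMemoryBackend`, and `FederatedFS` are names I made up, the backends are plain dictionaries standing in for the real systems, and routing by path prefix is just one assumption about how such a layer might decide where data lives. It is not how either company actually does it.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical minimal interface each underlying store would expose."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        ...


class InMemoryBackend(Backend):
    """Stand-in for a real distributed store; keeps blobs in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs = {}

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data


class FederatedFS:
    """Routes each path to the backend mounted at its longest matching prefix."""

    def __init__(self) -> None:
        self._mounts = {}

    def mount(self, prefix: str, backend: Backend) -> None:
        # Normalize so "/search" and "/search/" mount the same subtree.
        self._mounts[prefix.rstrip("/") + "/"] = backend

    def _resolve(self, path: str) -> Backend:
        matches = [p for p in self._mounts if path.startswith(p)]
        if not matches:
            raise FileNotFoundError(f"no backend mounted for {path}")
        return self._mounts[max(matches, key=len)]

    def read(self, path: str) -> bytes:
        return self._resolve(path).read(path)

    def write(self, path: str, data: bytes) -> None:
        self._resolve(path).write(path, data)

    def copy(self, src: str, dst: str) -> None:
        # Source and destination may live on different backends.
        self.write(dst, self.read(src))


fs = FederatedFS()
gfs_like = InMemoryBackend("gfs-like")
dynamo_like = InMemoryBackend("dynamo-like")
fs.mount("/search", gfs_like)
fs.mount("/retail", dynamo_like)

fs.write("/search/index/part-0", b"crawl data")
fs.copy("/search/index/part-0", "/retail/index/part-0")
```

Tools would talk only to `FederatedFS`, so moving a workload from one system to the other becomes a matter of changing a mount or copying across prefixes. The sketch ignores the hard parts (consistency, replication, failure handling, and differing semantics between the underlying systems), which is presumably where "easier said than done" comes in.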