Friday, 20 March 2015

Storm topology to handle "dating website"-like workloads


Suppose I'm writing a dating website, similar to OkCupid. There are profiles, and I need to compute the O(N^2) "match" table: for every pair of profiles, what's the match score between them?
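To make the problem concrete, here is a minimal sketch of the full match table in plain Python (no Storm involved). The profile layout and the `match_score` rule (share of common interests) are made up for illustration; the real scoring function would be whatever the site uses.

```python
from itertools import combinations


def match_score(a, b):
    """Hypothetical scoring rule: shared interests / all interests (Jaccard)."""
    union = a["interests"] | b["interests"]
    if not union:
        return 0.0
    return len(a["interests"] & b["interests"]) / len(union)


def build_match_table(profiles):
    """Score every unordered pair of profiles: N * (N - 1) / 2 comparisons."""
    table = {}
    # Sorting by profile id gives each pair one canonical key (id_a < id_b).
    for (id_a, a), (id_b, b) in combinations(sorted(profiles.items()), 2):
        table[(id_a, id_b)] = match_score(a, b)
    return table


# Made-up sample data, only to show the shape of the table.
profiles = {
    "alice": {"interests": {"hiking", "jazz"}},
    "bob": {"interests": {"jazz", "chess"}},
    "carol": {"interests": {"chess"}},
}
table = build_match_table(profiles)
```

With 3 profiles the table has 3 entries; with N profiles it has N * (N - 1) / 2, which is why the size of the database dominates everything else.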


I was thinking this could be done by creating a spout that listens on a "new/updated profiles" queue (say Kafka, it doesn't matter which), but then how do I break down the matching to achieve any degree of parallelism?


If I have a single bolt that compares the profile against the entire DB, that won't scale.
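A rough sketch of what that single bolt would be doing per update, again in plain Python rather than Storm code. The names `rescore_profile` and `toy_score` and the `age` field are hypothetical; the point is only that one incoming profile triggers one sequential pass over all N - 1 other profiles.

```python
def toy_score(a, b):
    """Hypothetical score: closer ages give a higher value (0 to 1)."""
    return 1.0 / (1.0 + abs(a["age"] - b["age"]))


def rescore_profile(updated_id, profiles, score):
    """Per-update work of the single bolt: compare against every other profile."""
    updated = profiles[updated_id]
    return {
        (updated_id, other_id): score(updated, other)
        for other_id, other in profiles.items()
        if other_id != updated_id
    }


# Made-up database of 1000 profiles.
profiles = {f"user{i}": {"age": 20 + i % 30} for i in range(1000)}
rows = rescore_profile("user0", profiles, toy_score)
comparisons = len(rows)  # N - 1 comparisons for a single updated profile
```

Every update costs N - 1 comparisons on one worker, so the work grows linearly with the database no matter how many machines the topology has.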


If I create another spout for "all profiles", it will run in a continuous loop and never stop (?)


Obviously, the assumption is that the "churn rate" (the rate of new/updated profiles) is less of an issue than the sheer size of the database.


Any suggestions on how to design the topology would be very welcome.




