Suppose I'm writing a dating website, similar to OkCupid. There are profiles, and I need to compute the O(N^2) "match" table: for every pair of profiles, what is the match score between them?

I was thinking this could be done by creating a spout that listens on a "new/updated profiles" queue (say Kafka, it doesn't matter), but then how do I break the matching down to achieve any degree of parallelism?

If I have a single bolt that compares an incoming profile against the entire DB, that won't scale.

If I create another spout for "all profiles", it will run in a continuous loop and never stop (?).

Obviously the assumption is that the churn rate (the rate of new/updated profiles) is less of an issue than the sheer size of the database.

Any suggestions on how to design the topology would be very welcome.
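One way to get parallelism here (a sketch of the general idea, not actual Storm code) is to partition the profile database into K shards, run one comparison worker per shard, and broadcast each new/updated profile to all of them; each worker then compares the update only against its own slice. In Storm terms this would be a spout feeding a bolt with K parallel instances via an all-grouping, where each bolt instance holds or queries one shard. The shard count, the `match_score` function, and the in-memory "DB" below are all illustrative assumptions:

```python
# Sketch: broadcast an updated profile to K sharded comparers,
# each of which scores it only against its own slice of the DB.
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumption: degree of parallelism


def match_score(a, b):
    """Placeholder scoring function: fraction of shared interests."""
    shared = a["interests"] & b["interests"]
    total = a["interests"] | b["interests"]
    return len(shared) / len(total) if total else 0.0


def shard_of(profile_id):
    """Assign a profile to a shard by hashing its id."""
    return hash(profile_id) % NUM_SHARDS


def compare_against_shard(updated, shard_profiles):
    """What one worker (bolt instance) does: score vs. its shard only."""
    return {p["id"]: match_score(updated, p)
            for p in shard_profiles if p["id"] != updated["id"]}


def on_profile_update(updated, all_profiles):
    """Broadcast the update to every shard; merge the partial results."""
    shards = [[] for _ in range(NUM_SHARDS)]
    for p in all_profiles:
        shards[shard_of(p["id"])].append(p)
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = pool.map(lambda s: compare_against_shard(updated, s),
                            shards)
    merged = {}
    for part in partials:
        merged.update(part)
    return merged
```

The point of the design is that each worker touches only ~N/K profiles per update, so you scale the per-update latency down by raising K, while the total work per update stays O(N), matching the assumption that churn is low relative to DB size.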