System design interview: how to design comments and replies, the like button, and total views on YouTube
Methodology: READ MF!
[Originally from the Post: System design interview: how to design a chat system (e.g., Facebook Messenger, WeChat or WhatsApp)]
Let's remind ourselves of the "READ MF!" methodology.
This is a follow-up to the previous post: System design interview: how to design a video platform (e.g., YouTube, Netflix)
Requirements
First, let's quickly go over the requirements. Likes and views are relatively straightforward: users can tolerate a bit of delay and inaccuracy. In most cases, it is fine as long as the count goes up by one when the user clicks the button. That is not always what happens for videos with many likes: if a YouTube video already shows 10K likes and you add one, it still shows 10K, and the UI only toggles the button to indicate your like.
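The rounding effect described above can be sketched in a few lines of Python. This is only an illustration: `format_count` is a hypothetical helper, and the truncation rule is an assumption, not YouTube's actual display logic.

```python
def format_count(n: int) -> str:
    """Abbreviate a count the way a compact UI label might (assumed rule)."""
    if n < 1_000:
        return str(n)
    if n < 1_000_000:
        # Truncate to thousands, so 10_000 and 10_001 both render as "10K".
        return f"{n // 1_000}K"
    return f"{n // 1_000_000}M"
```

With this rule, one extra like on a video at 10,000 likes does not change the label, so the count shown to the user only needs to be approximately right.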
For comment replies, there are a few different styles. Major platforms such as YouTube, Instagram and TikTok use the following one: top-level comments (direct replies to the video) are ordered by likes and timestamp (descending), and replies to a comment sit in a single flat level beneath it. A reply addresses another user with an @-mention, e.g., one user posts "How are you?" and the other answers "I am fine, thank you, and you?". No further indentation is needed.
Reddit uses the "block building" reply-to style (known colloquially in Chinese as "building floors"), where each reply shows which reply it responds to, so the UI needs to indent replies by nesting level.
Estimation
For viral videos, say a typical one has around 10M views
For likes, assuming 20% of viewers like the video, 10M * 20% = 2M likes
For comments, assuming 1% of viewers leave a comment (we are lazy), 10M * 1% = 100K comments
For the majority of normal videos, there would probably be 1,000 views, 100 likes at most and maybe 10 comments; a relational DB can handle that pretty well.
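The viral-video numbers above can be checked with a quick back-of-the-envelope script. The rates are the assumptions stated in the estimate, expressed as integer percentages to keep the arithmetic exact.

```python
VIEWS = 10_000_000        # views on a viral video (assumed in the estimate)
LIKE_PERCENT = 20         # share of viewers who like the video
COMMENT_PERCENT = 1       # share of viewers who leave a comment

likes = VIEWS * LIKE_PERCENT // 100
comments = VIEWS * COMMENT_PERCENT // 100

print(f"likes={likes:,} comments={comments:,}")
```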
Key designs and terms
Comments design
If you are just starting to build your product, bootstrap it with a relational DB.
Introduce a comments table sharded by video UUID. Add a reply_to_uuid column that records which comment a reply targets, and leave it null for root comments.
Build an index on reply_to_uuid.
To fetch the root comments of a video:
SELECT * FROM comments WHERE video_uuid = :video_uuid AND reply_to_uuid IS NULL ORDER BY likes DESC, timestamp DESC;
If you need to see the replies to one of those comments:
SELECT * FROM comments WHERE reply_to_uuid = :target_comment_uuid ORDER BY likes DESC, timestamp DESC;
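To make the table and the two queries concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column names (`uuid`, `video_uuid`, `body`, `likes`, `timestamp`), the index name, and the sample rows are assumptions for illustration; a production setup would use a sharded relational DB rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE comments (
        uuid          TEXT PRIMARY KEY,
        video_uuid    TEXT NOT NULL,
        reply_to_uuid TEXT,               -- NULL for a root comment
        body          TEXT NOT NULL,
        likes         INTEGER NOT NULL DEFAULT 0,
        timestamp     INTEGER NOT NULL    -- Unix time of posting
    )
""")
# The index covers both the root-comment query and the replies query.
conn.execute("CREATE INDEX idx_reply_to ON comments (reply_to_uuid, video_uuid)")

conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("c1", "v1", None, "First!", 5, 100),
        ("c2", "v1", None, "Great video", 9, 200),
        ("c3", "v1", "c2", "Agreed", 1, 300),
        ("c4", "v1", "c2", "Same here", 1, 400),
    ],
)

def root_comments(video_uuid):
    """UUIDs of top-level comments, most liked first, then newest first."""
    cur = conn.execute(
        "SELECT uuid FROM comments "
        "WHERE video_uuid = ? AND reply_to_uuid IS NULL "
        "ORDER BY likes DESC, timestamp DESC",
        (video_uuid,),
    )
    return [row[0] for row in cur]

def replies(comment_uuid):
    """UUIDs of direct replies to one comment, in the same order."""
    cur = conn.execute(
        "SELECT uuid FROM comments WHERE reply_to_uuid = ? "
        "ORDER BY likes DESC, timestamp DESC",
        (comment_uuid,),
    )
    return [row[0] for row in cur]
```

With the sample rows, `root_comments("v1")` returns `["c2", "c1"]` and `replies("c2")` returns `["c4", "c3"]` (equal likes, so the newer reply comes first).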
Even if your product reaches YouTube scale, a viral video would have around 100K comments, and the above solution would still work fine. Adding capacity, sharding the comments with consistent hashing, and caching the comments would do the trick.
If you need to build the Reddit tree structure, fetch the comments for the video and sort them into a tree in memory. If the problem fits into memory, it becomes much easier.
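Building the tree in memory could look roughly like the sketch below. It assumes each comment is a dict with `uuid`, `reply_to_uuid`, `likes` and `timestamp` keys (names chosen for illustration), and `build_thread` is a hypothetical helper. The recursion is fine for typical nesting depths; a very deep thread would need an iterative version.

```python
from collections import defaultdict

def build_thread(comments):
    """Turn flat comment rows into nested reply trees.

    Siblings are ordered by likes, then timestamp, both descending.
    """
    children = defaultdict(list)
    for c in comments:
        children[c["reply_to_uuid"]].append(c)   # None groups the root comments

    def attach(parent_uuid, depth):
        nodes = sorted(
            children[parent_uuid],
            key=lambda c: (c["likes"], c["timestamp"]),
            reverse=True,
        )
        return [
            {"uuid": c["uuid"], "depth": depth,
             "replies": attach(c["uuid"], depth + 1)}
            for c in nodes
        ]

    return attach(None, 0)
```

The `depth` field is what the UI would use to decide how far to indent each reply.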
The extreme case is when your comments section turns into a live chat. Then we can use something like an append-only in-memory DB or a Redis cache, appending each new message to a queue and backing it up to the DB asynchronously.
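The append-only idea with asynchronous backup can be sketched in-process as follows. This is a single-machine stand-in, not a Redis client: `LiveCommentLog` and the `persist` callback (whatever writes one comment to the DB) are hypothetical names.

```python
import queue
import threading

class LiveCommentLog:
    """Append-only in-memory log with a background thread that backs it up."""

    def __init__(self, persist):
        self._log = []                    # served to readers, never rewritten
        self._pending = queue.Queue()     # comments not yet backed up
        self._persist = persist           # assumed callback: writes one comment
        self._lock = threading.Lock()
        threading.Thread(target=self._flush_loop, daemon=True).start()

    def append(self, comment):
        # The writer returns as soon as the comment is in memory.
        with self._lock:
            self._log.append(comment)
            self._pending.put(comment)

    def tail(self, n):
        """Return the latest n comments."""
        with self._lock:
            return self._log[-n:]

    def _flush_loop(self):
        while True:
            comment = self._pending.get()
            self._persist(comment)
            self._pending.task_done()

    def wait_for_backup(self):
        """Block until every appended comment has been persisted."""
        self._pending.join()
```

The trade-off is the usual one for asynchronous backup: writes are fast, but comments still in the pending queue are lost if the process crashes before they are persisted.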
Views and Likes count design
Similarly, when you bootstrap the project, keeping a counter in the DB or in an in-memory cache solves the problem while traffic is low. Within a single machine you don't even need locks: use compare-and-swap (CAS) or other atomic operations for thread-safe counting.
If your product starts to become popular, add more capacity using consistent hashing. Add an in-memory cache such as Redis to hold the counts (memory access takes roughly 100 ns versus roughly 10 ms for a disk seek, about a 100,000x improvement). This could be further optimized with a distributed counter: spread the increments across several shards and aggregate the results at read time.
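The distributed counter idea can be illustrated with an in-process sketch: writes go to one of several shards, and a read sums all of them. In a real deployment each shard would be a separate key or node (for example, several Redis keys); here the shards are list slots guarded by per-shard locks, and `ShardedCounter` is a hypothetical name.

```python
import random
import threading

class ShardedCounter:
    """Counter split into shards to reduce write contention on a hot key."""

    def __init__(self, num_shards=8):
        self._counts = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, amount=1):
        # Pick a random shard so concurrent writers rarely collide.
        i = random.randrange(len(self._counts))
        with self._locks[i]:
            self._counts[i] += amount

    def value(self):
        # Aggregate on read; the total may lag slightly behind in-flight writes.
        return sum(self._counts)
```

Reads become more expensive (one lookup per shard) in exchange for cheaper writes, which suits view and like counts where writes far outnumber exact reads.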
If your product reaches YouTube scale, use offline counting. Build a pipeline that promotes videos from cold to hot/viral once the view count hits a certain threshold (say 1M). Use async messaging such as Kafka to ingest the view and like events from logs and pump them into a data warehouse, then query it and update the displayed values on a cron schedule. On the UI side, of course, you still need to toggle the like button and add 1 locally if needed. (Sometimes on a video with 100K likes, the count does not increase even after you click like.)
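The periodic batch step of such a pipeline might look like the sketch below. It deliberately uses plain Python data instead of a Kafka or warehouse client: events are assumed to arrive as `(video_id, event_type)` tuples, `totals` stands in for the stored counts, and both function names are hypothetical. The 1M threshold is the example value from the text.

```python
from collections import Counter

def aggregate_views(events):
    """Count 'view' events per video from (video_id, event_type) tuples."""
    return Counter(video_id for video_id, kind in events if kind == "view")

def apply_batch(totals, batch_counts, hot_threshold=1_000_000):
    """Merge one batch into the stored totals.

    Returns the videos whose total crossed the hot threshold in this batch,
    i.e. the ones to promote from cold to hot.
    """
    promoted = []
    for video_id, n in batch_counts.items():
        before = totals.get(video_id, 0)
        totals[video_id] = before + n
        if before < hot_threshold <= totals[video_id]:
            promoted.append(video_id)
    return promoted
```

A cron job would run `aggregate_views` over the events collected since the last run and then call `apply_batch`, so the displayed count is only as fresh as the schedule.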
References (Credits to original authors)
Original post: https://www.cnblogs.com/baozitraining/p/12178850.html