System Design of Airbnb (Part 3): Messaging, Notifications, Trust & Safety, and Customer Support
Completing the end-to-end platform with robust communications, secure safeguards, and seamless user support.
In Part 1 of this series, we explored User Profiles, Listing Management, Search & Discovery, and Availability Service.
In Part 2, we tackled Booking & Reservations, Payment, and Reviews & Ratings—the revenue-critical and trust-building components of the platform.
Now in Part 3, we’ll complete our end-to-end design by focusing on:
Messaging System – direct communication between guests and hosts.
Notifications – real-time alerts (booking confirmations, status updates, etc.).
Trust & Safety – fraud detection, user verification, and dispute management.
Customer Support – handling queries, issue resolution, and escalation.
By the end of this article, you’ll have a holistic view of how all services fit together in an Airbnb-like microservices ecosystem.
1. Understand the Question as a User
We aim to design:
Messaging System that allows guests and hosts to communicate securely and efficiently (before, during, and after a stay).
Notifications framework to send relevant alerts and updates (e.g., booking confirmations, payment receipts, reminders).
Trust & Safety mechanisms to protect users from fraud, abuse, or policy violations, including identity verifications and risk scoring.
Customer Support workflows to handle queries, disputes, refunds, and more, integrated with the rest of the platform for a 360° view of user interactions.
As with previous parts, we want a scalable, fault-tolerant, secure, and globally accessible design.
2. Requirement Gathering
2.1 Functional Requirements (FR)
Messaging System
Guests and hosts can exchange messages once the guest shows interest or makes a booking request.
Support real-time or near real-time updates (e.g., WebSockets for instant chat, with long-polling as a fallback).
Store conversation history (text, minimal media like images or attachments).
Optionally allow group chats for multi-guest bookings or host co-managers.
Notifications
Send booking confirmations, payment receipts, or listing inquiries via email/push/SMS.
Support different channels based on user preferences.
Provide in-app notification center for quick reference.
Manage notification templates, localization, and user opt-in preferences.
Trust & Safety
User Verification: Basic ID checks, email/phone verification, possibly advanced KYC for hosts.
Fraud Detection: Identify suspicious activities (e.g., multiple cancellations, stolen credit cards, spam listings).
Risk Scoring: Assign risk levels to new users or transactions.
Policy Enforcement: Implement content moderation for messages and listings, block or remove users violating policies.
Customer Support
Tiered support system (self-service FAQs, chatbots, human agent escalation).
Ticketing: Each issue or complaint generates a ticket.
Integration: View user’s entire history—bookings, messages, payments—within the support tool.
Issue Resolution: Tools to process refunds, re-bookings, or dispute mediation.
2.2 Non-Functional Requirements (NFR)
Scalability & Low Latency: Messages and notifications should be delivered with minimal delay even at peak loads.
Reliability: Ensure messages and notifications are not lost (at-least-once delivery).
Security & Privacy: Conversations are private; encrypt data in transit and, for sensitive data, at rest. Sensitive user data must be protected in line with GDPR and other data protection laws.
Compliance: Payment or identity checks may require adherence to local regulations, KYC/AML in certain countries.
Availability: Minimal downtime, especially for critical notifications and urgent support issues.
2.3 Out of Scope
Advanced AI-based content moderation (we’ll keep it basic or mention a third-party integration).
Detailed security frameworks for deep fraud detection or risk analytics (outline only).
Full-blown CRM or enterprise-level customer support beyond a basic ticketing system.
3. BOE Calculations / Capacity Estimations
3.1 Messaging System
Daily Active Conversations: Suppose 2 million active daily users leading to ~1 million daily conversations.
Message Volume: Each conversation averages 5–10 messages/day → ~5–10 million messages/day.
Average QPS: ~5–10 million messages / 86,400 sec ≈ 58–116 messages/sec. Peak traffic can spike well above this average.
Storage: Each message ~500 bytes (ID, sender, recipient, text, timestamps, metadata) → ~2.5–5 GB/day of new data.
3.2 Notifications
Notification Triggers: Bookings, payment confirmations, host replies, etc.
Potentially 2–3 notifications per booking (confirmation, reminder, etc.) → with 500,000 daily bookings (from Part 2), we get ~1–1.5 million notifications/day just for booking events.
Peak QPS might be higher if multiple events cluster in time.
Storage: Minimal metadata for logs. Could be ~300 bytes/notification → ~300–450 MB/day for logs.
3.3 Trust & Safety
User Base: 10 million users (from Part 1).
Verification Data: If collecting ID scans or partial data, storage could expand quickly.
Fraud Check Rate: Potentially run a risk check for each booking or new listing → ~500,000 checks/day.
3.4 Customer Support
Tickets: A fraction of bookings lead to support tickets. Assume 1% of total bookings → ~5,000 tickets/day.
Ticket Size: ~1 KB for main fields + attached messages or docs. Could be ~5 MB–10 MB/day of new data, plus attachments.
Overall, these numbers are manageable with horizontally scalable services and databases (NoSQL for messages, for instance, or specialized search indexes for logs).
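The arithmetic behind these estimates is easy to re-run when an assumption changes. Here is a minimal sketch that reproduces the figures above; the helper names (per_second, daily_bytes) are ours, and the inputs are the same assumed volumes used in this section.

```python
SECONDS_PER_DAY = 86_400


def per_second(events_per_day: float) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def daily_bytes(events_per_day: int, bytes_per_event: int) -> int:
    """Raw bytes of new data per day, before replication or indexes."""
    return events_per_day * bytes_per_event


# Assumed inputs, taken from the estimates in this section.
conversations_per_day = 1_000_000
messages_low = conversations_per_day * 5
messages_high = conversations_per_day * 10
bookings_per_day = 500_000
notifications_low = bookings_per_day * 2
notifications_high = bookings_per_day * 3
tickets_per_day = bookings_per_day // 100  # 1% of bookings

print(f"messages/sec (avg): {per_second(messages_low):.0f}-{per_second(messages_high):.0f}")
print(f"message storage GB/day: {daily_bytes(messages_low, 500) / 1e9}-{daily_bytes(messages_high, 500) / 1e9}")
print(f"notification logs MB/day: {daily_bytes(notifications_low, 300) / 1e6}-{daily_bytes(notifications_high, 300) / 1e6}")
print(f"tickets/day: {tickets_per_day}")
```

These are averages over a full day. Real traffic clusters around certain hours, so provisioning should leave headroom above these numbers.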
4. Approach “Think-Out-Loud” (High-Level Design)
4.1 Architecture Overview
We maintain the microservices approach introduced in Parts 1 & 2:
Messaging Service
Notification Service
Trust & Safety Service
Customer Support Service
They integrate with existing core services:
User Service (for user identity & profiles)
Booking Service (for reservations & statuses)
Payment Service (for transaction data)
Reviews Service (for rating and feedback context)
All microservices communicate through an API Gateway or direct Service-to-Service calls. Message Queues (Kafka, RabbitMQ) handle event-driven updates and asynchronous workflows.
4.2 Data Consistency vs. Real-Time Updates
Messaging: Requires near real-time updates (web sockets, push notifications). At-least-once or exactly-once message storage (depending on design).
Notifications: Usually asynchronous. A small delay is acceptable, but notifications must not be lost.
Trust & Safety: Real-time checks can be synchronous (e.g., booking flow) or async background checks for user changes.
Customer Support: Strong consistency on ticket states (OPEN → ASSIGNED → RESOLVED → CLOSED). Some parts can be async (e.g., generating user’s full history).
4.3 Security and Privacy Considerations
Messaging: End-to-end encryption is ideal but can complicate moderation. Alternatively, server-side encryption at rest.
Trust & Safety: Respect privacy rules (GDPR). Only store essential info. Possibly integrate with third-party ID verification services.
Notifications: Handle PII carefully in email/SMS. Let users opt out or configure preferences.
5. Databases & Rationale
5.1. Messaging Service
Primary Use Case: Storing large volumes of messages and conversation data between guests and hosts, potentially with high write throughput and flexible messaging structure (text, optional attachments).
Chosen Database: NoSQL (e.g., Cassandra, MongoDB, or Amazon DynamoDB).
Why:
Scalability: Must handle spikes in chat volume.
Flexible Schema: Message payloads (text, images, metadata) can vary.
High Write Throughput: NoSQL solutions excel at horizontal scaling for frequent inserts.
Why Not a Traditional RDBMS:
Handling billions of messages with highly variable structure can be less efficient in relational schemas.
Chat data is append-heavy and read as a timeline per conversation, so JOINs and multi-row transactions are rarely needed and their overhead buys little.
Schemas:
conversations(_id, participants, metadata)
messages(_id, conversation_id, text, attachments)
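To make the two records concrete, here is a rough sketch of them as Python dataclasses. The id field maps to _id in the schema. The sender_id and sent_at fields are not in the schema line above; we add them as assumptions because the Create Message API carries senderId and returns sentAt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


def new_id() -> str:
    """Random identifier; a real store may generate its own IDs."""
    return uuid4().hex


@dataclass
class Conversation:
    participants: list[str]
    metadata: dict = field(default_factory=dict)
    id: str = field(default_factory=new_id)  # stored as _id


@dataclass
class Message:
    conversation_id: str
    sender_id: str  # assumption: taken from the API payload
    text: str
    attachments: list[str] = field(default_factory=list)
    sent_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    id: str = field(default_factory=new_id)  # stored as _id


conversation = Conversation(participants=["guest-1", "host-1"])
message = Message(conversation_id=conversation.id, sender_id="guest-1", text="Is the flat free in May?")
```

Keeping conversation_id on every message is what lets a NoSQL store partition messages by conversation and read one thread with a single-partition query.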
5.2. Notification Service
Primary Use Case:
Storing user notification preferences (channels, opt-ins/opt-outs).
Logging sent notifications for audits and debugging.
Chosen Database: Relational DB (e.g., PostgreSQL or MySQL) or a lightweight NoSQL store for logs.
Why:
User Preferences typically require strong consistency (you don’t want to lose a user’s opt-out).
The data model is relatively structured (e.g., userId, channels, timestamps).
Notification Logs can be large, but often are accessed for audits or recent lookups. A relational store with partitioning or a cheaper cold storage for old logs can work well.
Why Not a Pure NoSQL:
Preferences are keyed by user ID and typically require ACID guarantees.
The volume of user preferences is smaller compared to messaging. A standard RDB works well.
Schemas:
user_preferences(user_id, channels, updated_at)
notification_logs(id, user_id, type, status, created_at)
5.3. Trust & Safety Service
Primary Use Case:
Storing fraud checks, risk assessments, flagged cases, and moderation logs.
Possibly referencing user or booking details to link suspicious behavior.
Chosen Database: Relational DB (e.g., PostgreSQL or MySQL).
Why:
Transactional & Auditable: We want to store a verifiable record of risk checks and moderation decisions.
Queries & Joins: Cases often reference user info, booking info, and we may need multi-dimensional queries (especially if an agent is investigating).
Why Not a Pure NoSQL:
We need reliable auditing and potentially complex queries for investigators. A relational model fits well.
Many trust/safety workflows rely on single authoritative records for legal or compliance reasons.
Schemas:
risk_checks(id, user_id, risk_score, flagged, reason, created_at)
cases(case_id, user_id, booking_id, status, summary, updated_at)
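As a rough illustration of why a relational model suits investigators, the sketch below builds the two tables in an in-memory SQLite database and joins them. SQLite stands in for PostgreSQL/MySQL here, and the column types, sample rows, and the open_cases_with_highest_score helper are our assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE risk_checks (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        risk_score REAL NOT NULL,
        flagged INTEGER NOT NULL,
        reason TEXT,
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE cases (
        case_id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        booking_id TEXT,
        status TEXT NOT NULL,
        summary TEXT,
        updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    """
)

# Illustrative sample data only.
conn.executemany(
    "INSERT INTO risk_checks (user_id, risk_score, flagged, reason) VALUES (?, ?, ?, ?)",
    [
        ("u1", 0.55, 1, "abnormal login location"),
        ("u1", 0.92, 1, "card reused across accounts"),
        ("u2", 0.10, 0, None),
    ],
)
conn.execute(
    "INSERT INTO cases (user_id, booking_id, status, summary) VALUES (?, ?, ?, ?)",
    ("u1", "b42", "OPEN", "Possible stolen card"),
)


def open_cases_with_highest_score(db: sqlite3.Connection) -> list[tuple]:
    """Open cases joined with the highest flagged risk score for that user."""
    return db.execute(
        """
        SELECT c.case_id, c.user_id, c.status, MAX(r.risk_score)
        FROM cases c
        JOIN risk_checks r ON r.user_id = c.user_id
        WHERE c.status = 'OPEN' AND r.flagged = 1
        GROUP BY c.case_id, c.user_id, c.status
        """
    ).fetchall()


print(open_cases_with_highest_score(conn))
```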
6. APIs
6.1. Messaging Service
Create Message
Endpoint:
POST /messages
Payload:
{ conversationId, senderId, text, attachments[] }
Response:
201 Created { messageId, sentAt }
Get Messages
Endpoint:
GET /conversations/{id}/messages
Payload: Query Params (e.g., limit, offset)
Response:
200 OK [ { messageId, senderId, text, ... }, ... ]
Mark Messages Read
Endpoint:
PATCH /messages/mark-read
Payload:
{ userId, messageIds[] }
Response:
200 OK
6.2. Notification Service
Update Notification Preferences
Endpoint:
PUT /notifications/preferences/{userId}
Payload:
{ channels[], locale }
Response:
200 OK
List Notification Logs
Endpoint:
GET /notifications/logs
Payload: Query Params (userId, type, limit, offset)
Response:
200 OK [ { id, userId, type, channel, status }, ... ]
6.3. Trust & Safety Service
Evaluate Risk
Endpoint:
POST /trust/evaluate
Payload:
{ userId, bookingId, context? }
Response:
200 OK { riskScore, flagged }
Open / Update Case
Endpoint:
POST /trust/cases
Payload:
{ userId, bookingId?, status, summary }
Response:
201 Created { caseId, status }
6.4. Customer Support Service
Create Support Ticket
Endpoint:
POST /support/tickets
Payload:
{ userId, subject, description }
Response:
201 Created { ticketId, status }
Add Comment to Ticket
Endpoint:
POST /support/tickets/{ticketId}/comments
Payload:
{ authorId, commentText }
Response:
201 Created { commentId, createdAt }
Update Ticket Status
Endpoint:
PATCH /support/tickets/{ticketId}
Payload:
{ status }
Response:
200 OK
7. Deep Dive into Core Services
A. Messaging Service
Responsibilities
Conversation Management
Create conversation threads between guest and host (or multiple participants if needed).
Mark read/unread states, handle message statuses (sent, delivered, read).
Real-Time Delivery
Support web sockets or server-sent events (SSE) for live chat.
Fallback to long-polling if real-time is not feasible.
Message Storage & Retrieval
Persist messages in a scalable data store (NoSQL or specialized chat DB).
Indexed by conversation ID, timestamp, user ID for efficient lookups.
Core Components
API Layer:
POST /messages to send a message in a conversation.
GET /conversations/{id}/messages to fetch conversation history.
Access control to ensure only participants read the messages.
WebSocket Gateway (optional):
Dedicated gateway to maintain persistent connections for real-time chat.
Subscribes to message events from the backend and pushes them to clients.
Message Database:
A NoSQL DB (e.g., Cassandra, DynamoDB, MongoDB) to handle high write volumes and flexible queries.
Possibly store older messages in cold storage or archives to reduce cost.
Event Bus:
Each new message triggers a “message.created” event for analytics, notification triggers, or content moderation.
Handling Corner Cases
Concurrency & Ordering:
Use a timestamp or sequence approach to maintain message order.
Inappropriate Content:
Basic text checks or ML-based moderation (e.g., profanity filters).
Potentially block or warn for policy violations.
Offline / Missed Messages:
Store messages in the DB, notify the user (email/push) if they’re offline.
Data Retention & Deletion:
Allow users to delete messages or entire conversations (subject to platform policy).
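For the ordering corner case, one possible "sequence approach" is to assign a per-conversation sequence number on the server at write time and sort by it, rather than trusting client clocks. The sketch below is an in-memory stand-in; SequenceAllocator and in_display_order are hypothetical names, and a real deployment would need an atomic counter in the datastore or a single writer per conversation.

```python
import itertools
import threading
from collections import defaultdict


class SequenceAllocator:
    """Hands out increasing sequence numbers per conversation (in-memory only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_seq(self, conversation_id: str) -> int:
        # The lock keeps concurrent senders from receiving the same number.
        with self._lock:
            return next(self._counters[conversation_id])


def in_display_order(messages: list[dict]) -> list[dict]:
    """Sort by server-assigned sequence instead of client timestamps."""
    return sorted(messages, key=lambda m: m["seq"])


allocator = SequenceAllocator()
first = {"text": "Hi!", "seq": allocator.next_seq("c1")}
second = {"text": "Is it free in May?", "seq": allocator.next_seq("c1")}
print(in_display_order([second, first]))
```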
B. Notification Service
Responsibilities
Notification Orchestration
Subscribes to events (booking.confirmed, payment.success, message.received) from multiple services.
Determines the right channel (push, email, SMS) and template.
User Preference Management
Users can opt in/out of specific channels.
Store preferences to respect privacy and comply with laws (CAN-SPAM, GDPR, etc.).
Template & Localization
Manage notification templates in various languages.
Insert dynamic data (user name, booking details).
Delivery & Retry
Integrate with email providers (SendGrid, SES), push notification services (FCM, APNs), SMS gateways (Twilio), etc.
Implement retries for transient failures.
Log success/failure for analytics and audits.
Core Components
Events Consumer:
Listens on Kafka or RabbitMQ for relevant events.
Processes them asynchronously to generate notifications.
Notification Dispatcher:
A rules engine that checks user preference & channel availability.
Queues or triggers the final send action to external providers.
Notification Logs DB:
Stores each sent notification (event ID, user ID, channel, status).
Useful for customer support or debugging.
Handling Corner Cases
High Volume Spikes:
Bulk notifications (e.g., hosts with many listings). Must throttle or batch.
Failed Sends:
Automatic retries. Possibly escalate to a fallback channel (push → SMS) if urgent.
Misconfigured Preferences:
Fallback to default or prompt user to update preferences if all channels are off.
Localization Issues:
Default to a global language if no template is available for user’s locale.
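Several of these corner cases can be seen together in a small dispatcher sketch: it renders a template with a locale fallback, then tries the user's preferred channels in order and falls through to the next one when a send fails. The templates, the Preferences shape, and the senders mapping are all hypothetical; real providers (SendGrid, FCM, Twilio) would sit behind the send callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

DEFAULT_LOCALE = "en"

# Hypothetical templates keyed by (event type, locale).
TEMPLATES = {
    ("booking.confirmed", "en"): "Hi {name}, your booking is confirmed.",
    ("booking.confirmed", "fr"): "Bonjour {name}, votre réservation est confirmée.",
}


@dataclass
class Preferences:
    channels: list[str] = field(default_factory=lambda: ["push", "email"])
    locale: str = DEFAULT_LOCALE


def render(event_type: str, locale: str, **data: str) -> str:
    """Use the user's locale, falling back to the default language."""
    template = TEMPLATES.get((event_type, locale)) or TEMPLATES[(event_type, DEFAULT_LOCALE)]
    return template.format(**data)


def dispatch(
    event_type: str,
    prefs: Preferences,
    senders: dict[str, Callable[[str], None]],
    **data: str,
) -> Optional[str]:
    """Try each preferred channel in order; return the one that worked, else None."""
    body = render(event_type, prefs.locale, **data)
    for channel in prefs.channels:
        send = senders.get(channel)
        if send is None:
            continue  # channel not configured on our side
        try:
            send(body)
            return channel
        except ConnectionError:
            continue  # provider failed; fall through to the next channel
    return None
```

A None result covers both "all channels are off" and "every channel failed"; the caller decides whether to queue a retry or prompt the user to update preferences.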
C. Trust & Safety Service
Responsibilities
User Verification
Email, phone number verification.
Optional ID checks for hosts or high-risk countries.
Fraud Detection & Risk Scoring
Monitor suspicious patterns (multiple failed payments, abnormal login locations).
Score new listings or bookings for risk—may require manual review or additional steps.
Policy Enforcement
Content moderation for listings, messages, or reviews flagged as inappropriate.
Ban or suspend users for severe violations.
Dispute & Resolution
If a user reports fraud or policy breach, open a case.
Integrate with Customer Support for final resolution.
Core Components
Verification Module:
Interfaces with third-party ID check services.
Manages “verified” flags in the user profile.
Fraud Detection Engine:
Rules-based or ML-based scoring engine (e.g., suspicious IP addresses, multiple accounts using same card).
Real-time checks on new bookings. Possibly place them on hold if the score is too high.
Moderation Tools:
Basic text scanning for messages, reviews, or listings.
Admin consoles for manual inspection of flagged items.
Case Management:
Tracks disputes or claims.
Possibly integrated with the Customer Support ticketing system if escalated.
Handling Corner Cases
False Positives:
Provide appeals or secondary verification steps.
Privacy Laws:
Must store minimal personal data, especially for ID verifications.
Global Variation:
Different countries have different KYC/AML requirements.
Real-Time vs. Batch:
Some checks can be real-time (new booking) vs. daily batch scans of logs.
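A rules-based engine can start very small. The sketch below scores a booking from a handful of signals and returns the same shape as the Evaluate Risk response (riskScore, flagged). The signal names, weights, and threshold are illustrative assumptions, not tuned values.

```python
from dataclasses import dataclass


@dataclass
class BookingSignals:
    failed_payments_24h: int
    accounts_sharing_card: int
    login_country_changed: bool
    account_age_days: int


# Hypothetical rules as (name, predicate, weight); weights are illustrative.
RULES = [
    ("repeated_failed_payments", lambda s: s.failed_payments_24h >= 3, 0.4),
    ("card_shared_across_accounts", lambda s: s.accounts_sharing_card > 1, 0.3),
    ("abnormal_login_location", lambda s: s.login_country_changed, 0.2),
    ("new_account", lambda s: s.account_age_days < 7, 0.1),
]
REVIEW_THRESHOLD = 0.5  # assumption: at or above this, hold for manual review


def evaluate(signals: BookingSignals) -> dict:
    """Sum the weights of all matching rules, capped at 1.0."""
    matched = [(name, weight) for name, predicate, weight in RULES if predicate(signals)]
    score = round(min(sum(weight for _, weight in matched), 1.0), 2)
    return {
        "riskScore": score,
        "flagged": score >= REVIEW_THRESHOLD,
        "reasons": [name for name, _ in matched],
    }


print(evaluate(BookingSignals(3, 2, False, 400)))
```

Returning the matched rule names alongside the score gives reviewers something to act on when handling appeals for false positives.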
D. Customer Support Service
Responsibilities
Ticketing & Issue Tracking
Central place to log user inquiries and problems.
Status transitions: OPEN → ASSIGNED → RESOLVED → CLOSED.
Integrated User History
Agents see booking history, payments, messages, reviews.
Quick resolution by referencing all details in one place.
Refunds & Adjustments
Interface with Payment Service to process partial or full refunds.
Must respect cancellation policies or escalate for special cases.
Escalations & SLA
Tiered support (L1, L2, specialized teams).
Automatic escalation if not resolved within SLA time.
Core Components
Support Portal / UI:
Used by agents to search for tickets, user details, and system logs.
Ticket DB:
Stores all tickets with references to user, booking, messages, etc.
Could be relational or a robust NoSQL with search capabilities.
Workflow Engine:
Automates ticket routing to the right team.
Triggers notifications to the user or agent at certain steps.
Analytics & Reporting:
Track common issues, agent performance, resolution times.
Helps identify product or platform improvements.
Handling Corner Cases
High Volume Incidents:
During system outages or major events, scale support channels.
Possibly enable chatbots or AI-based FAQ to handle simpler queries.
Refund Disputes & Exceptions:
Follow a structured workflow to ensure consistent handling.
Might require manager overrides or advanced checks in the Payment system.
Data Privacy:
Agents should only see relevant data. Enforce role-based access.
Sensitive info (like payment details) must be masked or hidden.
Audit Trails:
Keep a history of all agent actions (who changed a ticket, why).
Important for compliance and internal reviews.
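The status transitions and the audit trail fit naturally into one small state machine. This is a minimal sketch that allows only the linear path OPEN → ASSIGNED → RESOLVED → CLOSED and records who made each change and why; the Ticket class and its field names are hypothetical, and reopening a ticket is deliberately left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Each status has exactly one legal successor in this simplified model.
TRANSITIONS = {"OPEN": "ASSIGNED", "ASSIGNED": "RESOLVED", "RESOLVED": "CLOSED"}


@dataclass
class Ticket:
    ticket_id: str
    status: str = "OPEN"
    audit_log: list[dict] = field(default_factory=list)

    def transition(self, new_status: str, agent_id: str, reason: str) -> None:
        """Move to new_status if legal, recording who did it and why."""
        if TRANSITIONS.get(self.status) != new_status:
            raise ValueError(f"illegal transition {self.status} -> {new_status}")
        self.audit_log.append(
            {
                "from": self.status,
                "to": new_status,
                "agent": agent_id,
                "reason": reason,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
        self.status = new_status


ticket = Ticket("t-1001")
ticket.transition("ASSIGNED", "agent-7", "picked up from queue")
print(ticket.status, ticket.audit_log)
```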
9. Bonus Read: Guest-Host Messaging Flow (Detailed)
Guest Initiates Contact
The guest is logged into the platform (having a valid session or token).
The guest navigates to a listing page and clicks a “Contact Host” or “Message Host” button.
The client application (web or mobile) displays a message input UI, often part of a conversation thread.
Client Sends ‘Create Message’ Request
When the guest presses “Send,” the client device packages the message text (and optional attachments) with relevant conversation or listing context.
An HTTP POST (e.g., POST /messages) is sent to the API Gateway with a JSON payload containing conversationId (if it already exists), senderId (the guest), and the message content.
API Gateway Routing
The API Gateway inspects the request path (/messages) and routes it to the Messaging Service.
Basic authentication/authorization checks happen here (e.g., verifying the guest’s token).
Check or Create Conversation (if needed)
The Messaging Service receives the message creation request.
If conversationId is provided, the service verifies that it exists and the guest is indeed a participant.
If no conversation exists yet (e.g., first time the guest is contacting the host), the service may:
Generate a new conversationId.
Insert a new record in the conversations collection/table indicating participants [guestId, hostId].
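The check-or-create step can be sketched as a single function. The in-memory ConversationStore stands in for the conversations collection/table, and all names here (resolve_conversation, NotAParticipant) are hypothetical. The sketch does not look up an existing guest–host thread on first contact, which a real service would likely do to avoid duplicate conversations.

```python
from typing import Optional
from uuid import uuid4


class NotAParticipant(Exception):
    """Raised when the sender does not belong to the conversation."""


class ConversationStore:
    """In-memory stand-in for the conversations collection/table."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def get(self, conversation_id: str) -> Optional[dict]:
        return self._by_id.get(conversation_id)

    def create(self, participants: list[str]) -> dict:
        conversation = {"_id": uuid4().hex, "participants": list(participants)}
        self._by_id[conversation["_id"]] = conversation
        return conversation


def resolve_conversation(
    store: ConversationStore,
    sender_id: str,
    host_id: str,
    conversation_id: Optional[str] = None,
) -> dict:
    """Return the conversation to write into, creating it on first contact."""
    if conversation_id is None:
        return store.create([sender_id, host_id])
    conversation = store.get(conversation_id)
    if conversation is None:
        raise KeyError(f"unknown conversation {conversation_id}")
    if sender_id not in conversation["participants"]:
        raise NotAParticipant(sender_id)
    return conversation
```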
Store the Message
The Messaging Service inserts the new message into the NoSQL ‘messages’ data store, including:
conversationId
senderId
text (or sanitized message content)
Timestamp (sentAt)
The database returns an ACK (confirmation) with the generated messageId.
Update Conversation Metadata
Optionally, the Messaging Service updates the conversation’s lastMessageAt timestamp or lastMessagePreview.
This allows quick display of recent activity in conversation lists.
Response to Guest
The Messaging Service returns 201 Created (or 200 OK) to the API Gateway, including the messageId and sentAt timestamp.
The API Gateway relays this success response back to the guest’s device, confirming the message was successfully stored.
Asynchronous Notification Event
In parallel (or immediately after storing the message), the Messaging Service may publish a “message.created” event to an internal Message Queue (MQ) (e.g., Kafka, RabbitMQ).
This event typically contains conversationId, senderId, and a short text snippet or metadata.
Trigger Real-Time Update (If Using WebSockets or SSE)
If the system supports real-time chat, the Messaging Service (or a specialized WebSocket server) pushes the new message event to the host’s active session (if the host is currently online).
The host’s chat interface will immediately display the incoming message, with no page refresh needed.
Notification Service Listens
The Notification Service subscribes to the “message.created” event from the MQ.
Upon receiving the event, it checks whether the host is online or has some notification preferences for offline messages.
Notification Preference Lookup
The Notification Service may query a local store or the User Service for the host’s preferences (email_enabled, sms_enabled, etc.).
Based on these preferences, the service decides how and when to notify the host (e.g., push notification, email, SMS).
Send Notification (Offline Host)
If the host is offline (not in the chat session), the Notification Service sends an email or push message to the host device, saying “New message from Guest X.”
External providers (e.g., SendGrid for email, FCM/APNs for push) return an ACK to the Notification Service.
Host Receives Alert
On the host’s device, a push notification or email arrives, prompting them to open the app or website.
Upon opening, the host device calls the API Gateway to fetch the latest conversation messages from the Messaging Service (e.g., GET /conversations/{conversationId}/messages).
Host Reads and Responds
The host sees the guest’s message in the conversation thread.
The host composes a reply; similarly, a POST /messages request is sent from the host’s device to the Messaging Service.
The cycle repeats:
The new message is stored,
Possibly triggers real-time or offline notifications to the guest,
The conversation continues as needed.
Read Receipts (Optional)
If the system supports read receipts, whenever the host or guest opens the conversation, the client can call something like PATCH /messages/mark-read with the messageIds that were viewed.
The Messaging Service updates those messages’ readBy array or sets a read timestamp.
Another optional event (“message.read”) can be broadcast so the sender sees “Host read your message.”
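A read-receipt handler mostly needs to be idempotent, since clients may report the same messages more than once. Here is a rough sketch: it stores readBy as a mapping from reader to timestamp (a small variation on the array mentioned above, chosen so both ideas fit in one field) and returns the “message.read” events to publish. The function name and event fields are assumptions.

```python
from datetime import datetime, timezone


def mark_read(messages: dict[str, dict], user_id: str, message_ids: list[str]) -> list[dict]:
    """Record reads for user_id and return events for newly read messages only."""
    events = []
    now = datetime.now(timezone.utc).isoformat()
    for message_id in message_ids:
        message = messages.get(message_id)
        if message is None or message["senderId"] == user_id:
            continue  # unknown message, or the sender viewing their own message
        read_by = message.setdefault("readBy", {})
        if user_id in read_by:
            continue  # already recorded; repeating the call changes nothing
        read_by[user_id] = now
        events.append(
            {"type": "message.read", "messageId": message_id, "readerId": user_id, "readAt": now}
        )
    return events


inbox = {"m1": {"senderId": "guest-1", "text": "Is it free in May?"}}
print(mark_read(inbox, "host-1", ["m1"]))
```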
Scalability & Concurrency
If multiple messages are sent in rapid succession, the NoSQL data store scales horizontally to handle the high write throughput.
The system is designed so each microservice (Messaging, Notification, WebSockets) can scale independently behind load balancers or container orchestration.
Edge Cases & Retry
If the Messaging DB write fails or times out, the Messaging Service returns an error to the gateway, prompting the client to retry.
If the Notification Service cannot reach an external email/push provider, it queues the notification for retry and logs a failure event if it ultimately cannot be delivered.
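The retry path for transient provider failures is commonly implemented as exponential backoff with jitter. The sketch below is one such approach, not a prescribed one: send_with_retry is a hypothetical helper, ConnectionError stands in for whatever transient error a provider client raises, and the attempt count and delays are placeholder values.

```python
import random
import time
from typing import Callable


def send_with_retry(
    send: Callable[[str], None],
    payload: str,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send(payload), backing off exponentially between failed attempts.

    Returns True on success and False once all attempts are exhausted, at
    which point the caller can log the failure or try another channel.
    """
    for attempt in range(max_attempts):
        try:
            send(payload)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # no sleep after the final attempt
            delay = base_delay * (2 ** attempt)
            # Jitter spreads retries out so many workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    return False
```

The sleep parameter is injectable so tests can skip real waiting, for example send_with_retry(send, "hello", sleep=lambda seconds: None).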
Analytics & Logging (Optional)
The platform may log message traffic for analytics or store aggregated metrics (e.g., message count per listing).
The Messaging Service can push logs to a central analytics pipeline or data warehouse.
Security & Moderation
(Optional) The Trust & Safety Service could process text from “message.created” events to run basic profanity checks or detect suspicious content. If flagged, it may hide or quarantine messages or open a new case.
The conversation data remains in the Messaging DB, subject to data retention policies and user privacy regulations.
Conclusion of Conversation
The conversation remains persisted in the conversations and messages store indefinitely or until archived.
Participants can delete or archive it from their UI, but actual data may remain on the backend, subject to the platform’s policies.
10. Addressing Non-Functional Requirements (NFRs) for Part 3
A. Scalability & High Availability
Messaging
Shard conversation data across multiple NoSQL clusters.
Use load balancers or microservice orchestration (Kubernetes) to handle spikes.
Notifications
Asynchronous processing with queue-based backpressure.
Horizontal scaling of worker pods to handle sending large volumes of messages/emails.
Trust & Safety
Fraud checks distributed across multiple nodes or regions.
Possibly incorporate an ML platform that scales with the volume of scoring.
Customer Support
Horizontal scaling of the ticketing system’s DB.
Distribute agent load and use region-based support centers if global.
B. Performance & Low Latency
Messaging: Real-time chat typically requires latencies under a few seconds. WebSocket connections or SSE reduce overhead vs. repeated polling.
Notifications: Usually asynchronous. Aim to deliver within seconds, but short delays (1–5 sec) are often acceptable.
Trust & Safety: Fraud checks for bookings must not significantly slow down the booking flow. Use efficient rule evaluation or precomputed risk scores.
Support: Quick load times for ticket pages. Agents must see user data instantly. Caching or partial pre-fetch helps.
C. Security & Privacy
Encryption: TLS for all in-transit data (messages, support queries). Optionally encrypt sensitive data at rest.
Access Control: RBAC for support staff, moderators, and system admins.
Personal Data Minimization: Don’t over-collect or store unneeded data (especially for ID checks).
Compliance: GDPR “Right to be Forgotten” → must handle user data deletion gracefully.
D. Reliability & Fault Tolerance
Multi-AZ or Multi-Region: Deploy messaging and notifications across multiple availability zones.
Message Queues with Retries: Ensure no message or notification is lost if a node fails.
Trust & Safety: Duplicate checks or fallback modules if the main risk engine is offline.
Customer Support: If the main DB is down, have a read-replica or failover strategy. Offline or emergency support mode might still collect tickets for later sync.
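Queues with retries give at-least-once delivery, which means a consumer can see the same event twice after a node failure. A common way to handle this is to make consumers idempotent. The sketch below deduplicates on an event ID; the eventId field and the IdempotentConsumer class are assumptions, and the in-memory set stands in for a durable store with an expiry policy.

```python
from typing import Callable


class IdempotentConsumer:
    """Wraps a handler so redelivered events are processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        self._seen: set[str] = set()  # in-memory stand-in for a durable store

    def consume(self, event: dict) -> bool:
        """Return True if the event was handled, False if it was a duplicate."""
        event_id = event["eventId"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Mark as seen only after the handler succeeds, so a crash mid-handling
        # leads to a retry instead of a silently dropped event.
        self._seen.add(event_id)
        return True


handled = []
consumer = IdempotentConsumer(handled.append)
event = {"eventId": "e-1", "type": "message.created"}
consumer.consume(event)
consumer.consume(event)  # redelivery is ignored
print(len(handled))
```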
E. Observability & Monitoring
Logging & Metrics: Each microservice logs to a centralized system (ELK, Splunk).
Alerting: If error rates or latencies spike (e.g., messaging timeouts, notification failures), send alerts to ops.
Tracing: Distributed traces from user-initiated actions (like sending a message) through the entire system.
Dashboards: Real-time dashboards for conversation traffic, notification throughput, fraud detection rates, support ticket volumes.
11. Bringing It All Together
With Part 3 covering Messaging, Notifications, Trust & Safety, and Customer Support, we now have a complete picture of an Airbnb-like platform:
Messaging Service enables secure, real-time communication.
Notification Service ensures users are promptly informed about critical events.
Trust & Safety upholds the platform’s integrity, managing verifications and protecting against fraud or abuse.
Customer Support closes the loop for user queries, disputes, and specialized needs.
All these services work in concert with the User, Listing, Booking, Payment, and Reviews services from Parts 1 and 2. By deploying these microservices with robust scalability, security, and fault-tolerance practices, we can deliver a global, reliable, user-friendly experience to both guests and hosts.
Congratulations! You’ve reached the end of the Airbnb System Design Series—equipped with insights spanning user profiles, listings, bookings, payments, reviews, messaging, trust, and much more. This holistic approach ensures that as your platform grows, you can maintain performance, reliability, and user satisfaction at every step of the journey.
Thank you for following along in this three-part deep dive into designing an Airbnb-like system.