How Mixbook Used Generative AI to Deliver Personalized Photo Book Experiences

This article is co-written with Vlad Lebedev and DJ Charles of Mixbook.

Mixbook is an award-winning creation platform that gives users unparalleled creative freedom to design and share unique stories, transforming the lives of over six million people. Today, Mixbook is the highest-rated photo book service in the United States with 26,000 five-star reviews.

Mixbook empowers users to share their stories with creativity and confidence. Its mission is to help users celebrate the beautiful moments in their lives. Mixbook aims to foster deep connections between users and their loved ones by sharing their stories across physical and digital mediums.

A few years ago, Mixbook moved its operational workloads to Amazon Web Services (AWS), a strategic decision that has continued to deliver significant benefits. The move has been instrumental in achieving its mission, giving its systems reliability, strong performance, and operational efficiency.

In this post, we show you how Mixbook used AWS’s generative artificial intelligence (AI) capabilities to personalize its photo book experiences, a step toward its mission.

Business challenge

In today’s digital world, we take and share a lot of photos with friends and family. Imagine having hundreds of photos from a recent family vacation and wanting to turn them into a memorable photo album. Choosing the best photos from the lot and describing them with captions can take a lot of time and effort. A picture is worth a thousand words, which is why summing up a moment in a caption of just six to ten words can be so difficult. Mixbook set out to solve this problem.

Solution

Mixbook Smart Captions is the magic solution to the caption problem. It not only interprets users’ photos; it also adds a touch of creativity, making stories stand out.

Most importantly, Smart Captions doesn’t fully automate the creative process. Instead, it acts as a creative partner that enhances the user’s storytelling and lets them imbue a book with personal flourishes. Whether it’s a selfie or a panoramic shot, the goal is to make sure users’ photos speak for themselves, effortlessly.

Architecture overview

The implementation of the system involves three main elements:

Data collection
Information inference
Creative synthesis

Caption generation depends heavily on the inference process, because the quality and relevance of the inference results directly influence how specific and personalized the captions are. The following diagram shows the data flow of the caption generation process.

Data collection

A user uploads photos to Mixbook. The raw photos are stored in Amazon Simple Storage Service (Amazon S3).

The data collection process involves three main components: Amazon Aurora MySQL-Compatible Edition, Amazon S3, and AWS Fargate for Amazon ECS. Aurora MySQL serves as the primary relational data store for tracking and recording media file upload sessions and their accompanying metadata. It offers flexible capacity options, ranging from serverless to provisioned instances reserved for predictable long-term usage. Amazon S3, in turn, provides efficient, scalable, and secure storage for the media file objects themselves. Its storage classes keep recent uploads in a warm state for low-latency access, while older objects can be moved to Amazon S3 Glacier tiers, minimizing storage overhead over time. Amazon Elastic Container Service (Amazon ECS), used with the low-maintenance compute environment of AWS Fargate, forms a convenient orchestrator for containerized workloads and brings all the components together.
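The move of older objects to Amazon S3 Glacier tiers described above is typically expressed as an S3 lifecycle rule. The following is a minimal sketch of such a rule, not Mixbook’s actual configuration: the `uploads/` prefix, the 90-day threshold, and the bucket name in the comment are assumptions made for illustration.

```python
from typing import Any, Dict


def build_lifecycle_config(prefix: str, days_until_archive: int) -> Dict[str, Any]:
    """Return an S3 lifecycle configuration that archives objects under `prefix`."""
    if days_until_archive < 1:
        raise ValueError("days_until_archive must be at least 1")
    return {
        "Rules": [
            {
                # Rule ID is derived from the prefix purely for readability.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Hypothetical values: prefix and threshold are not taken from the article.
config = build_lifecycle_config("uploads/", 90)

# With boto3 installed and credentials configured, the rule could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-media-bucket", LifecycleConfiguration=config)
```

Keeping the rule as plain data makes it easy to review and test before it is applied to a bucket.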

Inference

The inference phase extracts essential contextual and semantic elements from the input, including image descriptions, temporal and spatial data, facial recognition, emotional sentiment, and labels. Of these, image descriptions generated by a computer vision model provide the most fundamental understanding of captured moments. Amazon Rekognition provides accurate detection of facial bounding boxes and emotional expressions. The facial bounding boxes are primarily used for optimal automatic photo placement and cropping, while the detected emotions help select a better tone for the story, for example making it funnier or more nostalgic. Additionally, Amazon Rekognition improves safety by identifying potentially objectionable content.
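To make the use of face data concrete, here is a minimal sketch of how the `FaceDetails` returned by the Amazon Rekognition `DetectFaces` operation could drive cropping and tone selection. Rekognition reports bounding boxes as ratios of the image dimensions and emotions as a list of types with confidence scores. The emotion-to-tone mapping, the function names, and the sample data are assumptions for illustration, not Mixbook’s implementation.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical mapping from a Rekognition emotion type to a caption tone.
EMOTION_TO_TONE = {
    "HAPPY": "playful",
    "SURPRISED": "funny",
    "CALM": "warm",
    "SAD": "nostalgic",
}


def dominant_emotion(face: Dict[str, Any]) -> Optional[str]:
    """Return the emotion type with the highest confidence for one face."""
    emotions = face.get("Emotions", [])
    if not emotions:
        return None
    return max(emotions, key=lambda e: e["Confidence"])["Type"]


def pick_tone(faces: List[Dict[str, Any]], default: str = "neutral") -> str:
    """Choose a tone from the most common dominant emotion across faces."""
    counts = Counter(e for e in map(dominant_emotion, faces) if e is not None)
    if not counts:
        return default
    top_emotion, _ = counts.most_common(1)[0]
    return EMOTION_TO_TONE.get(top_emotion, default)


def crop_box(
    faces: List[Dict[str, Any]], image_width: int, image_height: int
) -> Optional[Tuple[int, int, int, int]]:
    """Return the smallest pixel box (left, top, right, bottom) containing every face."""
    if not faces:
        return None

    def clamp(value: float) -> float:
        # Boxes can extend past the frame, so keep ratios within [0, 1].
        return min(1.0, max(0.0, value))

    boxes = [face["BoundingBox"] for face in faces]
    left = clamp(min(b["Left"] for b in boxes))
    top = clamp(min(b["Top"] for b in boxes))
    right = clamp(max(b["Left"] + b["Width"] for b in boxes))
    bottom = clamp(max(b["Top"] + b["Height"] for b in boxes))
    return (
        int(left * image_width),
        int(top * image_height),
        int(right * image_width),
        int(bottom * image_height),
    )


# Sample data shaped like response["FaceDetails"] from
# rekognition.detect_faces(Image=..., Attributes=["ALL"]).
sample_faces = [
    {
        "BoundingBox": {"Left": 0.25, "Top": 0.25, "Width": 0.25, "Height": 0.5},
        "Emotions": [
            {"Type": "HAPPY", "Confidence": 92.0},
            {"Type": "CALM", "Confidence": 5.0},
        ],
    },
    {
        "BoundingBox": {"Left": 0.5, "Top": 0.125, "Width": 0.25, "Height": 0.25},
        "Emotions": [
            {"Type": "HAPPY", "Confidence": 60.0},
            {"Type": "SURPRISED", "Confidence": 30.0},
        ],
    },
]

tone = pick_tone(sample_faces)
box = crop_box(sample_faces, image_width=800, image_height=400)
```

A layout engine would likely add padding around the resulting box before cropping; that step is omitted here.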

The inference pipeline is powered by a multi-stage architecture based on AWS Lambda, which maximizes cost-effectiveness and elasticity by running independent image analysis steps in parallel. AWS Step Functions enables synchronization and ordering of interdependent steps.
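One way to express this combination of parallel and ordered steps is a Step Functions `Parallel` state whose branches each invoke a Lambda function, followed by a task that merges the results. The sketch below builds such a definition in Amazon States Language. The step names and function ARNs are hypothetical placeholders, and the article does not describe the actual state machine.

```python
import json
from typing import Any, Dict


def build_analysis_state_machine(
    step_arns: Dict[str, str], merge_arn: str
) -> Dict[str, Any]:
    """Build a state machine that runs analysis steps in parallel, then merges them."""
    branches = [
        {
            "StartAt": name,
            "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
        }
        for name, arn in step_arns.items()
    ]
    return {
        "Comment": "Illustrative image analysis workflow",
        "StartAt": "AnalyzeImage",
        "States": {
            # Each branch runs independently; the state finishes when all branches do.
            "AnalyzeImage": {
                "Type": "Parallel",
                "Branches": branches,
                "Next": "MergeResults",
            },
            # Receives a list with one output per branch, in branch order.
            "MergeResults": {"Type": "Task", "Resource": merge_arn, "End": True},
        },
    }


# Placeholder ARNs for illustration only.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_analysis_state_machine(
    {
        "DescribeImage": ARN_PREFIX + "describe-image",
        "DetectFaces": ARN_PREFIX + "detect-faces",
        "ModerateContent": ARN_PREFIX + "moderate-content",
    },
    merge_arn=ARN_PREFIX + "merge-results",
)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the form a state machine definition takes when it is created or updated.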

Image captions are generated by an Amazon SageMaker inference endpoint, supported by a buffer built on Amazon ElastiCache for Redis. The buffer was added after benchmarking the captioning model: the model performed best when processing batches of images and underperformed when analyzing images one at a time.
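The batching idea can be sketched as a buffer that collects image keys and calls the model only once a full batch is available. In this illustration an in-memory deque stands in for the Redis-backed buffer, and `fake_caption_batch` stands in for the SageMaker endpoint call. The class name, batch size, and keys are assumptions rather than details from the article.

```python
from collections import deque
from typing import Callable, Deque, List


class CaptionBatchBuffer:
    """Collect image keys and caption them in batches instead of one at a time."""

    def __init__(
        self, batch_size: int, caption_batch: Callable[[List[str]], List[str]]
    ) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._batch_size = batch_size
        self._caption_batch = caption_batch
        # In-memory stand-in for a shared buffer such as a Redis list.
        self._pending: Deque[str] = deque()

    def add(self, image_key: str) -> List[str]:
        """Queue one image; return captions only when a full batch was processed."""
        self._pending.append(image_key)
        if len(self._pending) >= self._batch_size:
            return self.flush()
        return []

    def flush(self) -> List[str]:
        """Process whatever is queued, for example when a timeout expires."""
        if not self._pending:
            return []
        batch = list(self._pending)
        self._pending.clear()
        return self._caption_batch(batch)


def fake_caption_batch(image_keys: List[str]) -> List[str]:
    # Placeholder for a batched call to the captioning model.
    return [f"caption for {key}" for key in image_keys]


buffer = CaptionBatchBuffer(batch_size=3, caption_batch=fake_caption_batch)
results: List[str] = []
for key in ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]:
    results.extend(buffer.add(key))
results.extend(buffer.flush())  # drain the partial batch holding d.jpg
```

A production buffer would also need a time-based flush so that a partially filled batch does not wait indefinitely for more images.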

Generation

The caption generation mechanism behind the writing assistant feature is what turns Mixbook Studio into a natural language storytelling tool. Powered by a Llama language model, the assistant initially used carefully crafted prompts created by AI experts. However, the Mixbook Storyarts team wanted more precise control over the style and tone of the captions, so it brought together a diverse team, including an Emmy-nominated screenwriter, to review, adjust, and add unique handcrafted examples. This led to a process of fine-tuning the model, moderating edited responses, and deploying approved models for experimental and public releases. After inference, three captions are created and stored in Amazon Relational Database Service (Amazon RDS).
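As a rough illustration of how inference results could feed caption generation, the sketch below assembles a prompt from extracted signals and parses a numbered response into three captions. The prompt wording, the signal names, and the sample model output are all assumptions. The article does not disclose Mixbook’s prompts or output format.

```python
import re
from typing import Dict, List


def build_caption_prompt(signals: Dict[str, str], tone: str, count: int = 3) -> str:
    """Assemble a prompt from inference signals such as description and location."""
    context = "\n".join(f"- {name}: {value}" for name, value in sorted(signals.items()))
    return (
        f"Write {count} photo book captions of six to ten words each.\n"
        f"Tone: {tone}\n"
        f"Photo context:\n{context}\n"
        "Return one caption per line, numbered."
    )


def parse_captions(model_output: str, count: int = 3) -> List[str]:
    """Strip list numbering such as '1.' or '2)' and keep the first `count` lines."""
    captions = []
    for line in model_output.splitlines():
        text = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if text:
            captions.append(text)
    return captions[:count]


# Hypothetical signals and model output, for illustration only.
prompt = build_caption_prompt(
    {"description": "family on a beach at sunset", "location": "Maui"},
    tone="nostalgic",
)
sample_output = (
    "1. Golden hour with the people who matter most\n"
    "2) Sandy toes and sunset skies in Maui\n"
    "\n"
    "3. One last evening we never wanted to end\n"
)
captions = parse_captions(sample_output)
```

Parsing defensively matters here because a language model may vary its numbering style or add blank lines between captions.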

The following image shows the Mixbook Smart Captions feature in Mixbook Studio.

Benefits

Mixbook implemented this solution to provide new features to its customers. It has improved both the user experience and operational efficiency.

User experience

Improved storytelling: Captures users’ emotions and experiences, now beautifully expressed through heartfelt captions.
User delight: Adds an element of surprise with captions that are not only accurate, but also enjoyable and imaginative. One delighted user, Hanie U., says, “I hope more subtitle experiments are released in the future.” Another user, Megan P., says, “This worked great!” Users can also edit the generated captions.
Time efficiency: No one has time to struggle with captions. This feature saves valuable time while showcasing users’ stories.
Safety and accuracy: Captions are generated responsibly, with safeguards that ensure moderation and content relevance.

System

Elasticity and Scalability of Lambda
Understandable workflow orchestration with Step Functions
Variety of SageMaker base models and tuning capabilities for maximum control

As a result of improved user satisfaction, Mixbook was named an official Webby Award winner in 2024 in the Apps & Software category for Best Use of AI & Machine Learning.

“AWS enables us to scale the innovations our customers value most. And now, with AWS’s new Generative AI capabilities, we’re able to surprise our customers with creative power they never thought possible. Innovations like this are why we’ve partnered with AWS since beta in 2006.”

– Andrew Laffoon, CEO of Mixbook

Conclusion

In early 2023, Mixbook began experimenting with AWS’s generative AI solutions to complement its existing application. The team started with a rapid proof of concept to show the art of the possible. Continuous development, testing, and integration using the breadth of AWS services across compute, storage, analytics, and machine learning allowed the team to iterate quickly. After releasing Smart Captions in beta, Mixbook was able to quickly adapt to real-world usage patterns and protect the value of the product.

Try Mixbook Studio to experience storytelling. To learn more about AWS Generative AI solutions, start with Transform Your Business with Generative AI. To hear more from Mixbook leaders, listen to the AWS re:Think Podcast available on Art19, Apple Podcasts and Spotify.

About the authors

Vlad Lebedev is a Senior Technical Lead at Mixbook. He leads a product engineering team tasked with transforming Mixbook into a place for heartfelt storytelling. He draws on over a decade of hands-on experience in web development, systems design, and data engineering to deliver elegant solutions to complex problems. Vlad enjoys learning about contemporary and ancient cultures, their history, and their languages.

DJ Charles is the CTO of Mixbook. He has spent 30 years architecting interactive and e-commerce designs for major brands. Innovating in broadband technology for the cable industry in the 90s, revolutionizing supply chain processes in the 2000s, and advancing environmental technology at Perillon led to the creation of real-time global auction platforms for brands like Sotheby’s and eBay. Beyond technology, DJ enjoys learning new musical instruments and the art of songwriting, and he is deeply involved in music production and engineering in his spare time.

Malini Chatterjee is a Senior Solutions Architect at AWS. She provides consulting services to AWS customers on their workloads across a variety of AWS technologies. She brings extensive expertise in data analytics and machine learning. Prior to joining AWS, she designed data solutions in the financial industry. She is passionate about semi-classical dance and performing at community events. She enjoys traveling and spending time with her family.

Jessica Oliveira is an Account Manager at AWS providing guidance and support to commercial sales in Northern California. She is passionate about building strategic collaborations to ensure customer success. Outside of work, she enjoys traveling, learning about different languages and cultures, and spending time with her family.