Beam College 2024: Is it worth your time?


I recently attended Beam College for the second time and am happy to share my thoughts. It should be noted that this review is coming from a huge Apache Beam fan. I may be a little biased, but I am happy to share my experience! 😄

🎓 Beam College

Beam College is a free, hands-on training program designed to improve data processing skills. It offers flexible workshops led by industry experts, focused on Apache Beam. Participants learn everything from basic concepts to advanced use cases and best practices, gaining hands-on experience building effective data pipelines. The program aims to bridge the gap between theoretical knowledge and real-world applications, providing valuable insights for both beginners and experienced data professionals looking to master Apache Beam.

Beam College Online Platform 2024

You will find all the sessions on the Apache Beam YouTube channel.

Beam College 2024 offered 3 days of sessions (the schedule is linked here):

  1. Introducing Apache Beam (July 23)
  2. Apache Beam for AI (July 24)
  3. Moving from batch to streaming (July 25)

🐝 Day 1: Introducing Apache Beam

The first day of Beam College provided a comprehensive introduction to Apache Beam. Sessions began with the fundamentals of Apache Beam, its unique features, and how it differs from other tools in the data processing ecosystem. Attendees learned how to identify scenarios where Beam would be a good fit for their projects or organization. The day concluded with hands-on instruction on getting started with Apache Beam, guiding attendees through creating their first pipeline. This blend of theory and practice laid a solid foundation for the remainder of the program.

Approximately 120 people attended the first day of Beam College. It was a pleasure to see familiar faces from last year’s event alongside many new attendees, which fostered a sense of community and collaboration. The day was filled with interesting presentations, and I’d like to highlight two that stood out:

  • Marc Howard’s presentation, “Project Shield: Defend Against DDoS Attacks with Beam”, covered a critical application of Apache Beam and Google Cloud Dataflow: protecting global access to election information. As the founding engineer of Project Shield, a free service that protects vulnerable online content from DDoS attacks, Marc explained how their system processed over 3 TB of log data daily, handling over 10,000 requests per second and scaling to 400 million during major attacks. This infrastructure not only powered real-time user analytics but also strengthened Project Shield’s defenses, playing a crucial role in maintaining internet freedom, especially during democratic processes. The presentation made a compelling case for Beam by demonstrating its power in protecting freedom of expression and information in the digital age.
  • Jeff Kinard’s session, “Beam YAML Bootcamp: Effortless Pipeline Design for Data Processing”, generated considerable interest and prompted many questions from the audience. Jeff introduced Beam YAML as a simpler way to express pipelines, emphasizing its straightforward syntax and ease of use.

🤖 Day 2: Apache Beam for AI

On the second day, participants learned how to use Apache Beam to implement AI pipelines. In the first set of lessons, they implemented a machine learning pipeline from conceptualization to coding and running it on a laptop. An additional session focused on using Beam to interact with Google Gemini via Google AI Studio.

  • “How to implement an ML pipeline using Beam. Part 1: Concepts and definition of our pipeline and Part 2: Coding” by Danny McCormick and Kerry Donny-Clark. This two-part session covered implementing machine learning pipelines with Apache Beam. The first part introduced the key concepts and the pipeline definition, focusing on the RunInference transform and the ModelHandler class, which facilitate inference and adapt models from various ML frameworks. The second part put these concepts into practice, presenting Beam code for a complex pipeline that processed speech inputs, classified the text, applied a different model depending on the classification, and converted the text back to speech. The presenter showed how to map the classification outputs to specific language models and run the full pipeline. Together, the two parts gave attendees both the theory and hands-on experience of integrating ML capabilities into Beam pipelines, using a practical speech-to-text-to-speech example.
  • Israel Herraiz led a session titled “Implementing a Complex ML Pipeline: Demonstration with Google AI Studio”, in which he demonstrated how to interact with an AI model (Gemini, in this case) from Beam using RunInference in a Google Colab notebook.
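
To make the handler idea from the two-part session a bit more concrete, here is a minimal plain-Python sketch of the pattern. It deliberately does not import Beam: in Beam itself the base class is `ModelHandler` in `apache_beam.ml.inference.base`, and the `RunInference` transform drives it for you. The `KeywordClassifierHandler` class, the `ROUTES` table, and the `route` helper are hypothetical names made up for illustration, and the keyword lookup is only a stand-in for a real classifier. It is not the code shown in the session.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


class KeywordClassifierHandler:
    """Toy stand-in that mirrors the two core ModelHandler method names."""

    def load_model(self) -> Dict[str, str]:
        # A real handler would load model weights here, once per worker.
        return {"weather": "forecast", "rain": "forecast", "score": "sports"}

    def run_inference(
        self, batch: Sequence[str], model: Dict[str, str]
    ) -> Iterable[Tuple[str, str]]:
        # Classify each text in the batch; fall back to "general".
        for text in batch:
            words = [w.strip(".,!?") for w in text.lower().split()]
            label = next((model[w] for w in words if w in model), "general")
            yield text, label


# Hypothetical routing table: classification label -> downstream "model".
ROUTES: Dict[str, Callable[[str], str]] = {
    "forecast": lambda t: f"[weather model] {t}",
    "sports": lambda t: f"[sports model] {t}",
    "general": lambda t: f"[general model] {t}",
}


def route(batch: Sequence[str]) -> List[str]:
    """Classify a batch, then send each text to the model for its label."""
    handler = KeywordClassifierHandler()
    model = handler.load_model()
    return [
        ROUTES[label](text)
        for text, label in handler.run_inference(batch, model)
    ]


if __name__ == "__main__":
    for line in route(["Will it rain tomorrow?", "What was the final score?"]):
        print(line)
```

The point of the split is that loading happens once while inference runs per batch, which is roughly the separation the real `ModelHandler` gives you across frameworks.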

📡 Day 3: Moving from batch to streaming

The final day of Beam College focused on Apache Beam’s unified approach to batch and streaming pipelines. Attendees explored key concepts for implementing streaming pipelines in Beam, followed by a hands-on demo. The sessions concluded with an introduction to Beam Quest, an advanced learning resource for mastering complex concepts in Apache Beam.

  • Yi Hu’s session, “Moving from Batch to Streaming: Concepts and Code”, explored the transition from batch to streaming pipelines using Apache Beam. The talk covered the fundamental concepts that differentiate batch from streaming, introduced Beam primitives for streaming applications, and concluded with practical examples using Pub/Sub.
  • Surjit Singh’s session, “CI/CD with data flow models”, provided an overview of Dataflow’s capabilities for implementing continuous integration and continuous delivery (CI/CD) pipelines.
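
As a rough illustration of one concept that separates streaming from batch, the plain-Python sketch below groups timestamped events into fixed, non-overlapping windows and counts them per window. This is only the underlying idea, not Beam code: in Beam you would typically express it with `beam.WindowInto(window.FixedWindows(size))` and let the runner manage timestamps. The `window_start` and `count_per_window` helpers and the sample events are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def window_start(timestamp: int, size: int) -> int:
    """Return the start of the fixed window [start, start + size) holding timestamp."""
    return timestamp - (timestamp % size)


def count_per_window(
    events: Iterable[Tuple[int, str]], size: int
) -> Dict[int, int]:
    """Count events per fixed window, keyed by window start (in seconds)."""
    counts: Counter = Counter()
    for timestamp, _payload in events:
        counts[window_start(timestamp, size)] += 1
    return dict(counts)


if __name__ == "__main__":
    # (event time in seconds, payload); made-up sample data.
    events = [(5, "a"), (59, "b"), (60, "c"), (125, "d")]
    print(count_per_window(events, size=60))
```

In a batch job all the events exist before grouping starts. In a streaming job they keep arriving, so windows are what give an aggregation like this a boundary.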

I guess all those sessions helped me! I finished at the top of the ranking. 😃🥇

Beam College 2024 Online Platform

Beam College is a must-attend event for Apache Beam and Google Cloud enthusiasts, offering invaluable learning and networking opportunities that you shouldn’t miss. I highly recommend making it a priority in your schedule for next year. This event has significantly transformed my perspective on Beam. When I first started learning Beam/Dataflow, I noticed a lack of resources and hands-on projects. Recognizing this gap, I took it upon myself to create some Apache Beam projects. If you’re interested in exploring these projects, you can find them here:

Batch processing:

  • ☁️GCP Data Engineering Project: Building and Orchestrating an ETL Pipeline for the Online Food Delivery Industry with Apache Beam and Apache Airflow🍕🚚
  • ☁️GCP Data Engineering Project: Connect Four Game with Python and Apache Beam 🔴⚫️

Stream processing:

  • ☁️GCP Data Engineering Project: Streaming Data Pipeline with Pub/Sub and Apache Beam/Dataflow📡

Feel free to contact me on LinkedIn if you want to exchange ideas about Apache Beam and Google Cloud! 💬😊


Beam College 2024: Is it worth your time? was originally published in Google Cloud – Community on Medium, where people are continuing the conversation by highlighting and responding to this story.