Agenda
- 11:00 AM – 11:20 AM: Arrival & Networking
- 11:20 AM – 12:00 PM: Data in Motion with Apache ActiveMQ® and Apache Beam | JB Onofré, Principal Software Engineer, Dremio + Director, Apache Software Foundation
- 12:00 PM – 1:00 PM: Lunch & Networking (on the rooftop, weather permitting)
- 1:00 PM – 1:40 PM: GraphFlow & Beam: Pythonic, Scalable GNN Pipelines | Yogesh Tewari, Senior Cloud Data Engineer, Google
- 1:40 PM – 2:00 PM: Final Networking
Abstracts
Data in Motion with Apache ActiveMQ® and Apache Beam
JB Onofré, Principal Software Engineer, Dremio + Director, Apache Software Foundation
Modern data architectures demand more than batch processing — they require reliable, scalable, and flexible pipelines that can handle data as it moves. This session explores the powerful combination of Apache ActiveMQ, a battle-tested message broker for enterprise messaging, and Apache Beam, a unified programming model for both batch and streaming data processing.
We’ll walk through the fundamentals of integrating ActiveMQ as a durable message source and sink within Beam pipelines, enabling real-time, event-driven workflows across distributed systems. Attendees will learn how to build end-to-end pipelines that consume messages from ActiveMQ queues and topics; apply transformations, enrichments, and windowing strategies using Beam’s expressive API; and route results to downstream systems, all with portability across runners such as Apache Flink, Apache Spark, and Google Cloud Dataflow.
Key topics include:
- ActiveMQ connectivity patterns in Beam (JMS I/O)
- Message acknowledgment and exactly-once semantics
- Schema handling and payload deserialization
- Scaling strategies for high-throughput messaging workloads
- Real-world use cases: event sourcing, CDC, and operational data pipelines
Whether you’re modernizing a legacy messaging infrastructure or designing a new streaming architecture from scratch, this talk will give you practical patterns and insights to put data in motion — reliably and at scale.
GraphFlow & Beam: Pythonic, Scalable GNN Pipelines
Yogesh Tewari, Senior Cloud Data Engineer, Google
Learn how GraphFlow, a modular Python toolkit, utilizes Apache Beam to create efficient and scalable data pipelines for Graph Neural Networks (GNNs). We’ll demonstrate how GraphFlow on Beam tackles large-scale graph data challenges, including distributed ingestion from cloud databases, scalable feature normalization, graph sampling, and online model inference.