About the conference
DataSphere is a conference devoted to data-centric systems and the technologies making them tick.
Whether it is data engineering or AI application challenges, they all fit right in.
From technical details to concrete business use cases, with no fluff.
Call For Presentations
- call for presentations closes on January 15th, 2018
- all presenters notified by January 31st, 2018
- abstracts: up to 250 words; mention the target audience (beginner, intermediate, advanced)
- speaker bios: subject matter expertise and presentation experience are taken into account during evaluation; impress us!
- additional details, links: add these separately to make the evaluator’s job easier and increase your chances of being selected :)
- all presentations should be in English, if that is not yet self-evident :)
- presentation duration: anything from 15 to 60 minutes will fit just fine in the program
- if you are interested in a commercial twist, please contact us about the available sponsorship packages; attendees prefer presentations that focus on the technology or a business use case
- perks: our speakers enjoy unlimited access to conference events and presentations (including the sister Sphere events), plus an exclusive presenters’ dinner. However, if you need more support with your travel expenses etc., please let us know and we can discuss the available options (e.g. sponsorship packages)
An IBM Fellow in the IBM Analytics Group, Berni is based at the IBM Spark Technology Center in San Francisco and works closely with development teams and clients across the Analytics portfolio. A particular area of focus is the performance and scalability of Big Data technologies including Spark, Big SQL, Db2, Db2 Warehouse, and Db2 pureScale acceleration in on-premise, public and private cloud environments. His passion is bringing advanced technology to market with an emphasis on exploiting processor, memory, networking, storage, and other hardware and software acceleration technologies. Since joining IBM Canada in 1985, he has worked closely with many customers, ISVs and business partners around the world. Berni has a B.Sc. in Computer Science from the University of Saskatchewan and received the Alumni of Influence Award in 2016.
Vladimir is an Artificial Intelligence enthusiast: a perfectionist at heart with a pragmatic mindset. He is a trainer at DataWorkshop.eu, where he explains how to use machine learning in real life, and hosts a podcast about Artificial Intelligence, BiznesMysli.pl (in Polish). He is an architect at General Electric, participates in Kaggle competitions, and loves data and its challenges.
Do you know that the 4th industrial revolution is coming? Do you think that talk of robots taking over jobs concerns only other professions, and that programmers will still be needed? That is partly true, but big changes are coming over the next 5-10 years. If you want to prepare for them, I recommend finding out who programmer 2.0 will be.
It turns out that learning machine learning can be an interesting adventure (and doing a PhD is optional).
I will show you examples of how, with relatively little effort, you can do interesting and valuable things. The goal of the presentation is to inspire and to break down the perceived complexity of machine learning. If you can program, you can do ML too!
William Benton leads a team of data scientists and engineers at Red Hat, where he has applied analytic techniques to problems ranging from forecasting cloud infrastructure costs to designing better cycling workouts. His current focus is investigating the best ways to build and deploy intelligent applications in cloud-native environments, but he has also conducted research and development in the areas of static program analysis, managed language runtimes, logic databases, cluster configuration management, and music technology.
Implementing Machine Learning Algorithms for Scale-Out Parallelism
Frameworks for elastic scale-out computation, like Apache Spark and Apache Flink, are important tools for putting machine intelligence into production applications. However, these frameworks do not always offer the same breadth or depth of algorithm coverage as specialized machine learning libraries that run on a single node, and the gulf between being a competent framework user and a seasoned library developer who can extend a framework can be quite daunting.
In this talk, we’ll walk through the process of developing a parallel implementation of a machine learning algorithm. We’ll start with the basics, by considering what makes algorithms difficult to parallelize and showing how we’d design a parallel implementation of an unsupervised learning technique. We’ll then introduce a simple parallel implementation of our technique on Apache Spark, and iteratively improve it to make it more efficient and more user-friendly. While some of the techniques we’ll introduce will be specific to the Spark implementation of our example, most of the material in this talk is broadly applicable to other distributed computing frameworks. We’ll conclude by briefly examining some techniques to complement scale-out performance by scaling our code up, taking advantage of specialized hardware to accelerate single-worker performance. You’ll leave this talk with everything you need to implement a new machine learning technique that takes advantage of parallelism and resources in the public cloud.
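The map-then-aggregate pattern this abstract alludes to can be sketched without Spark at all. Below is a minimal, hypothetical illustration of one iteration of k-means (an unsupervised learning technique) written in that style, in plain Python: each "partition" produces local partial sums, and a single reduce combines them into new centroids. The function names and toy data are invented for illustration; a Spark version would express the same steps with `mapPartitions` and `treeAggregate`.

```python
# Sketch: one k-means iteration in the map/aggregate style that
# scale-out frameworks like Spark encourage. Each partition computes
# per-centroid (sum_vector, count) pairs locally; a reduce merges them.

def closest(point, centroids):
    # Index of the nearest centroid (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def partial_sums(partition, centroids):
    # Map step: local per-centroid running sums and counts.
    sums = {}
    for point in partition:
        i = closest(point, centroids)
        vec, n = sums.get(i, ([0.0] * len(point), 0))
        sums[i] = ([v + p for v, p in zip(vec, point)], n + 1)
    return sums

def merge(a, b):
    # Reduce step: combine two partial-sum dictionaries.
    out = dict(a)
    for i, (vec, n) in b.items():
        if i in out:
            ovec, om = out[i]
            out[i] = ([x + y for x, y in zip(ovec, vec)], om + n)
        else:
            out[i] = (vec, n)
    return out

def kmeans_step(partitions, centroids):
    total = {}
    for part in partitions:
        total = merge(total, partial_sums(part, centroids))
    return [tuple(v / n for v in vec) for _, (vec, n) in sorted(total.items())]

partitions = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
centroids = [(0.0, 0.0), (10.0, 10.0)]
print(kmeans_step(partitions, centroids))  # new centroids: [(0.0, 0.5), (10.0, 10.5)]
```

The key property is that `merge` is associative and commutative, which is exactly what lets a framework combine partial results from workers in any order.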
Umit is a Data Scientist at IBM, focusing extensively on IBM Data Science Experience and IBM Watson Machine Learning to solve complex business problems. His research spans many areas, from statistical modeling of financial asset prices to using evolutionary algorithms to improve the performance of machine learning models. Before joining IBM, he worked in various domains such as high-frequency trading, supply chain management and consulting. He likes to learn from others and to share his insights at universities, conferences and local meet-ups.
Recent advancements in NLP and deep learning: a quant’s perspective
There is a gold rush among hedge funds for text mining algorithms to quantify textual data and generate trading signals. Harnessing the power of alternative data sources has become crucial to finding novel ways of enhancing trading strategies. With the proliferation of new data sources, natural language data has become one of the most important, as it captures public sentiment and opinion about market events, which can then be used to predict financial markets.
The talk is split into five parts:
- Who is a quant and how do they use NLP?
- How has deep learning changed NLP?
- Let’s get dirty with word embeddings
- Performant deep learning layer for NLP: The Recurrent Layer
- Using all that to make money
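To give a flavor of the word-embeddings part of the outline: embeddings map words to vectors so that semantic similarity becomes geometric proximity, typically measured by cosine similarity. The toy three-dimensional vectors below are invented for illustration; real embeddings (e.g. word2vec or GloVe) are learned from large corpora and have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors; values are made up for illustration only.
vectors = {
    "stock":  [0.9, 0.1, 0.3],
    "share":  [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related financial terms sit closer together than unrelated words.
assert cosine(vectors["stock"], vectors["share"]) > cosine(vectors["stock"], vectors["banana"])
```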
Lukasz Cmielowski joined IBM in 2008 and in 2009 successfully defended his PhD dissertation in bioinformatics. Since then he has worked as a QA architect focused on bringing AI into old-school software quality activities; his domain was the automation of software failure prediction. In 2015 he joined a new team working on analytics solutions as an automation architect and data scientist. He is also a big fan of Norman Davies’ and Terry Pratchett’s book series.
From Spark MLlib model to learning system with Watson Machine Learning
A biomedical company that produces heart drugs has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications.
Based on the treatment records, they would like to predict the best drug for each patient. They also need to ensure that their prediction model stays up to date, providing the highest possible quality of predictions.
During this session I will demonstrate how a continuous learning system (part of Watson Machine Learning) can be used to achieve those goals.
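The prediction task itself can be sketched in miniature. The session uses a Spark MLlib pipeline retrained by Watson Machine Learning; the plain-Python nearest-centroid classifier below is only a stand-in, and the feature names and patient data are invented for illustration.

```python
import math

# Hypothetical training records: (age, systolic_bp) -> drug that
# worked for that patient. Invented data, for illustration only.
records = [
    ((25, 110), "drugA"), ((30, 115), "drugA"),
    ((60, 150), "drugB"), ((65, 155), "drugB"),
]

def centroids(data):
    # "Training": average the feature vectors per drug label.
    by_label = {}
    for features, label in data:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def predict(features, model):
    # Predict the drug whose patient centroid is closest.
    return min(model, key=lambda label: math.dist(features, model[label]))

model = centroids(records)
print(predict((28, 112), model))  # -> "drugA"
print(predict((62, 152), model))  # -> "drugB"
```

The "continuous learning" idea then amounts to periodically rebuilding `model` as new treatment records arrive, so predictions never go stale.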
Maciej has built large-scale data analytics and AI products in both research and industry, previously at DERI/INSIGHT Galway, currently as Chief Data Scientist at Altocloud. He is the founder of the Galway Data Meetup with over 250 members and received a number of awards for his work. He was shortlisted as one of the four finalists of the DatSci 2016 competition in the Data Scientist of the Year category.
Artificial Intelligence, creating value with data-driven products, decision support systems, recommender systems
Software Engineer. He has spent the last 10 years solving data problems at companies like Google, Sun and Base. Today he is a happy coder serving as Director of Data Science at AirHelp.
Michał Kaczmarczyk (Tech Lead, Software Architect, Project Manager, Ph.D.) leads a development team implementing a Spark-based, fully automated predictive modeling system in cooperation with NEC Laboratories America. Michał received his PhD from Warsaw University and has been exploring the field of distributed systems since 2005. He has worked for companies such as NEC Labs (Princeton, NJ), Microsoft (Redmond, WA) and 9LivesData (Warsaw, currently). During this time he worked on core system components and published research papers at conferences such as FAST and SYSTOR. Since 2015 he has been devoted to Spark and charmed by Scala.
Marcin Kulka is a Senior Software Engineer at 9LivesData. In cooperation with machine learning researchers at NEC Labs America, he works on a Spark-based, fully automated predictive modelling system. He holds master’s degrees in both Computer Science and Mathematics from Warsaw University. His main areas of interest are big data, machine learning, distributed systems and algorithms. Marcin has almost 10 years of professional experience in software engineering, most of it spent working on HYDRAstor, a cutting-edge, distributed and highly scalable backup system. Privately he is a happy husband and father of two daughters.
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark
Building accurate machine learning models has long been an art practiced by data scientists: algorithm selection, hyperparameter tuning, feature selection and so on. Recently, efforts to break through this difficult art have begun. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. An evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones in minutes on an Ultra High Density Server, which packs 272 CPU cores, 2 TB of memory and 17 TB of SSD into a 3U chassis. We will also share open challenges in learning such a massive number of models on Spark, particularly from the reliability and stability standpoints.
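The core loop of such an automation system, stripped of everything Spark-specific, is: enumerate (algorithm, hyperparameter) candidates, score each on held-out data, and keep the best. The sketch below illustrates that loop in plain Python with a toy k-NN regressor as the only candidate family; all names and data are invented, and the real system would score candidates in parallel across a cluster.

```python
# Hypothetical sketch of automated model selection: exhaustively
# score candidate models on a validation set and return the winner.

def make_knn(k):
    # Stand-in "algorithm": a 1-feature k-nearest-neighbours regressor.
    def fit(xs, ys):
        def model(x):
            nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
            return sum(ys[i] for i in nearest) / k
        return model
    return fit

def mse(model, xs, ys):
    # Mean squared error on a held-out set.
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_select(candidates, train, valid):
    # The automation loop; a Spark version would parallelize this search.
    best = None
    for name, fit in candidates:
        model = fit(*train)
        score = mse(model, *valid)
        if best is None or score < best[1]:
            best = (name, score)
    return best[0]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
candidates = [(f"knn(k={k})", make_knn(k)) for k in (1, 2, 3)]
print(auto_select(candidates, (xs, ys), ([2.5, 3.5], [5.0, 7.0])))
```

The hard parts the talk addresses, reliability and stability when hundreds of such candidates run concurrently on one cluster, are exactly what this sequential sketch leaves out.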
Jacek Laskowski is an independent consultant, developer and trainer focusing exclusively on Apache Spark, Apache Kafka and Kafka Streams (with Scala and sbt on Apache Mesos, Hadoop YARN and DC/OS). He offers courses, workshops, mentoring and software development services.
Main fields of interest: statistics, international academic cooperation, management.
Professional experience (selection):
1989 – 1997: University of California, Santa Barbara, USA.
1997 – 2012: Polish-American Higher School of Business – National-Louis University, Nowy Sącz, Poland.
2012 – present: Cracow University of Technology.
Visiting professor in the USA, Mexico, France, Brazil, Ukraine, Kyrgyzstan and Sweden.
1989 – 1997: Director of the Statistical Consulting Lab, University of California, USA.
1993 – 1998: Owner, StatLab International Consulting, Wrocław, Poland.
2007 – 2010: Vice-rector for Research, WSB-NLU, Poland.
2007 – 2013: Member of the Committee for monitoring MRPO.
Director of research projects financed by NATO (three times) and the National Science Centre, Poland.
– A three-time laureate of NATO grants for the analysis and security of telecommunication signals (1995, 2000, 2008).
– Laureate of the award for best publication in statistical signal analysis, European Signal Processing Society, 2007.
Big Data and Data Analytics
He is an entrepreneur, software engineer and machine learning practitioner. He currently holds the position of CTO at Craftinity, a machine learning startup based in Krakow. He has been working on machine learning projects since graduating from university, first at Siemens and now at Craftinity. When he isn’t training neural networks, he reads about history and economics and plays football.
CTO & co-founder at 2040.io. He started his professional journey with technology in 1998, when he became a system administrator and promoted secure Unix environments and the open-source movement. He then switched his focus to software development, working as a programmer and IT project manager. In 2005 he became CTO (and co-owner) of Empathy Internet Software House, and in 2012 a board member and e-commerce director of Grupa Unity, one of the biggest Polish internet software houses. He is an early adopter of promising technologies: mobile, blockchain and deep learning.
Armand Ruiz Gabernet
Lead Product Manager at IBM. Armand is Product Manager of Advanced Analytics solutions and a technology enthusiast: a motivated self-starter who creates innovative new products, with strong organizational skills and the ability to navigate across different teams and personalities. He is motivated by great design, product simplicity and high-quality user experience.
Software Engineer in the Data Services section of the Beams Controls group at CERN. Since 2015 he has played a major role in the design and development of the next-generation Accelerator Logging Service (NXCALS), which is based on state-of-the-art Big Data technologies including Apache Spark and Kafka.
Nikolay is driven by the goal of making the life of the Data Scientists easier, providing native structured access to logged data and integrated tools for data analysis. He strives to deliver pragmatic solutions and is passionate about dealing with huge amounts of data and building distributed systems.
Next CERN Accelerator Logging Service Architecture
The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service.
The main purpose of the system is to store technical accelerator data needed by machine operators and data scientists at CERN. Gathered from thousands of devices across the whole accelerator complex, the data is used to operate the machines, improve their performance, and conduct studies for new beam types or future experiments.
This presentation is a dive into the Hadoop/Spark based NXCALS architecture. Nikolay will speak about the service requirements, the design choices and present the Ingestion API as one of the main components of the system. He will also reveal the core abstraction behind the Meta-data provider and the Spark-based Extraction API where simple changes to the result schema improved the overall usability and performance of the system.
This talk may be of interest to any company or institute confronting similar Big Data problems, as the system itself is not CERN-specific.
React.sphere.it is a conference focused on Reactive Programming and Reactive System Design. Now in its 2nd Edition, it’s a perfect opportunity to meet and share knowledge with experts in this field.
Scala.sphere.it is a unique event devoted to a topic important to every Scala software developer: dev tools.
Main venue: The Opera of Kraków
Code of Conduct
The following Code of Conduct is inspired by that from other prominent conferences such as ScalaDays or Scala eXchange.
DataSphere is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, nationality, age or religion. We do not tolerate harassment of participants in any form. Please show respect for those around you. This applies to both in-person and online behavior.
All communication should be appropriate for a technical audience, including people of many different backgrounds. Sexual language, innuendo, and imagery are not appropriate for any conference venue, including talks.
If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of staff immediately. If an individual engages in harassing behavior, the DataSphere staff may take any action they deem appropriate, including warning the offender or expelling them from the event.
We expect all attendees to follow the above rules throughout the DataSphere conference.