Data.sphere.it

A conference devoted to making the most of a world that’s

DATA DRIVEN

April 15-17, 2018 | Kraków, Poland Get tickets

About the conference

DataSphere is a conference devoted to data-centric systems and the technologies making them tick. Whether it is data engineering or AI application challenges, they all fit right in.

From technical details to concrete business use cases, no fluff.





Agenda

Day 1 - Sunday

Day 2 - Monday

Day 3 - Tuesday

Machine Learning & Deep Learning Workshop

During the workshop we will prepare machine learning and deep learning models and deploy them behind a web application. The workshop consists of two parts:

Part 1: My first Machine Learning model working on production
Part 2: Developing Deep Learning (DL) Applications More details here

To participate in the workshop, Python skills are required; Node.js skills are nice to have.

Hosts: Umit Mert Cakmak & Lukasz Cmielowski. Free entrance. Register here

10:00 to 14:00

Sales forecasting with Keras and Tensorflow

This beginner-level workshop will guide you through implementing a sales forecasting neural network model based on an openly available dataset. We will go through some basic neural network theory and apply a systematic model improvement approach to get from a simple model to a more complex one, trying out various tricks along the way.

Requirements:
- Python: beginner level
- a laptop with some software configured (see setup instructions)
- pandas, numpy, sklearn: nice to have

Host: Grzegorz Gawron. Free entrance. Register here
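The workshop's "start simple, then improve" approach can be previewed even before any neural network enters the picture. The sketch below (toy numbers, not the workshop's dataset or code) compares two baseline forecasts that any Keras model would later have to beat:

```python
# Baseline sales forecasting: before training a neural network, establish
# simple reference models. Illustrative sketch with made-up weekly figures.

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical weekly sales.
sales = [120, 135, 128, 150, 149, 160, 158, 172]

# Naive forecast: next week's sales = this week's sales.
naive = sales[:-1]
actual = sales[1:]

# Moving-average forecast over the previous 3 weeks.
window = 3
moving_avg = [sum(sales[i - window:i]) / window for i in range(window, len(sales))]
actual_ma = sales[window:]

print("naive MAE:", mae(actual, naive))
print("3-week moving average MAE:", mae(actual_ma, moving_avg))
```

Whatever neural architecture the workshop builds, its error should be judged against baselines like these.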

10:00 to 14:00

Registration


8:00 to 9:00

Conference Opening


9:00 to 9:05

KEYNOTE: Present/future vision for data+AI

Presented by Anthony Stevens
Anthony Stevens IBM
9:05 to 9:35

KEYNOTE: Reactive is a Product

Abstract: The Reactive Manifesto was released almost 4 years ago, and while many teams and companies have adopted its principles, many are still unsure how to go about the work to implement resilience and elasticity. Who on a team should advocate for the work, define the requirements, establish success criteria and more? And how should those individuals interact with other team members to set priorities and drive the implementations? In this talk, I will go through practices I have successfully established and enabled to address these issues, and provide a road map for you to do the same.
Jamie Allen
9:40 to 10:10

KEYNOTE: Tools of the Mind

Tools are a force multiplier, without them we would still be toggling code into machines with switches. Tooling is powerful and we use them every day to empower us. Rather than focusing on software tooling we will explore mental tooling that will help to improve your productivity, happiness and career prospects. More details here

Rory Graves Ensime
10:15 to 10:45

Implementing Machine Learning Algorithms for Scale-Out Parallelism

Frameworks for elastic scale-out computation, like Apache Spark and Apache Flink, are important tools for putting machine intelligence into production applications. However, these frameworks do not always offer the same breadth or depth of algorithm coverage as specialized machine learning libraries that run on a single node, and the gulf between being a competent framework user and a seasoned library developer who can extend a framework can be quite daunting. In this talk, we’ll walk through the process of developing a parallel implementation of a machine learning algorithm. We’ll start with the basics, by considering what makes algorithms difficult to parallelize and showing how we’d design a parallel implementation of an unsupervised learning technique. We’ll then introduce a simple parallel implementation of our technique on Apache Spark, and iteratively improve it to make it more efficient and more user-friendly. While some of the techniques we’ll introduce will be specific to the Spark implementation of our example, most of the material in this talk is broadly applicable to other distributed computing frameworks. We’ll conclude by briefly examining some techniques to complement scale-out performance by scaling our code up, taking advantage of specialized hardware to accelerate single-worker performance. You’ll leave this talk with everything you need to implement a new machine learning technique that takes advantage of parallelism and resources in the public cloud.
William Benton Red Hat
11:00 to 11:40

What we’ve learned from creating Edward.ai

This is the story of building an AI-powered sales assistant: how it all started, the challenges we have faced during the last 18 months, how we applied AI in our software, what our customers say about the usability of such a tool, and our plans for the near future as artificial intelligence advances.
Tomasz Wesołowski 2040.io
11:50 to 12:20

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark

Building accurate machine learning models (algorithm selection, hyperparameter tuning, feature selection and so on) has long been an art of data scientists. Recently, efforts to automate this difficult art have begun. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. Evaluation on real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones within minutes on an Ultra High Density Server, which packs 272 CPU cores, 2 TB of memory and 17 TB of SSD into a 3U chassis. We will also share open challenges in training such a massive number of models on Spark, particularly from the reliability and stability standpoints.
Marcin Kulka 9LivesData
12:30 to 13:00

Lunch Break


13:00 to 14:00

How we built a Shiny app for 700 users

Shiny has proved itself a great tool for communicating data science teams’ results. However, developing a Shiny app for a large-scope project that will be used commercially by dozens of users is not easy. The first challenge is the User Interface (UI): the expectation is that the app should not differ from modern web pages. Secondly, performance directly impacts user experience (UX), and it’s difficult to maintain efficiency with growing requirements and a growing user base. In this talk, we will share our experience from a real-life case study of building an app used daily by 700 users, where our data science team tackled all these problems. This was, to our knowledge, one of the biggest production deployments of a Shiny app. We will show an innovative approach to building a beautiful and flexible Shiny UI using the **shiny.semantic** package (an alternative to standard Bootstrap). Furthermore, we will talk about the non-standard optimization tricks we implemented to gain performance. Then we will discuss challenges regarding complex reactivity and offer solutions. We will go through the implementation and deployment process of the app using a load balancer. Finally, we will present the application and give details on how this benefited our client.
Olga Mierzwa-Sulima Appsilon Data Science
14:00 to 14:40

Recent advancements in NLP and deep learning: a quant’s perspective

There is a gold rush among hedge funds for text mining algorithms to quantify textual data and generate trading signals. Harnessing the power of alternative data sources has become crucial to finding novel ways of enhancing trading strategies.

With the proliferation of new data sources, natural language data has become one of the most important sources for gauging public sentiment and opinion about market events, which can then be used to predict financial markets.

The talk is split into five parts:

- Who is a quant and how do they use NLP?
- How deep learning has changed NLP
- Let’s get dirty with word embeddings
- A performant deep learning layer for NLP: the recurrent layer
- Using all of that to make money

Umit Mert Cakmak IBM
14:50 to 15:30

Building Successful Machine Learning Products

With recent advancements in the AI ecosystem, the entry barriers for utilisation of Machine Learning techniques are lower than ever. The growing availability of tools and platforms, together with the decreasing cost of computation, allows smaller teams to build ML products faster and add value in a number of industries, from self-driving cars to personal assistants. This not only creates new opportunities, but also poses a number of challenges related to the design of products that we interact with on a daily basis. In this talk I will share a number of experiences and examples of products using Machine Learning, focusing on the common gaps as well as the key steps for designing a successful ML product. I will describe how techniques such as human-centered design or design thinking play an important role in choosing the right problem to solve with Machine Learning and in shaping the user experience when the algorithms fail to deliver. The second part of the talk will focus on the engineering challenges, including data collection, model training and deployment at scale.
Maciej Dąbrowski Genesys
15:45 to 16:25

Descriptive statistics – the mighty dwarf of data science

Nowadays a fair part of the community (often under pressure from the business) tends to apply complex and computationally expensive algorithms to applications that in the past would have been easily handled by much simpler (hence faster) and much more interpretable (hence of greater business value) techniques. This presentation aims to remind peers of the power and beauty of descriptive statistics as an approach for quantitatively describing the nature of features and creating solid foundations for any subsequent data investigation.
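In that spirit, a stdlib-only sketch (made-up latency numbers, purely illustrative) of how a couple of descriptive statistics can expose what a single headline number hides:

```python
# The "mighty dwarf" in practice: a handful of descriptive statistics often
# answers a question before any model is trained. Hypothetical response times.
import statistics

latencies_ms = [12, 14, 13, 15, 14, 13, 250, 14, 12, 13]  # one outlier

mean = statistics.mean(latencies_ms)      # 37.0 -- dragged up by the outlier
median = statistics.median(latencies_ms)  # 13.5 -- the truer typical value
stdev = statistics.stdev(latencies_ms)    # large spread flags the anomaly

print(f"mean={mean}, median={median}, stdev={stdev:.1f}")
```

The gap between mean and median is itself a finding: interpretable, cheap to compute, and actionable before any complex model is considered.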
Paweł Rzeszuciński Codewise
16:35 to 16:50

Next CERN Accelerator Logging Service Architecture

The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service. The main purpose of the system is to store technical accelerator data needed by machine operators and data scientists at CERN. Gathered from thousands of devices across the whole accelerator complex, the data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments. This presentation is a dive into the Hadoop/Spark-based NXCALS architecture. Nikolay will speak about the service requirements and the design choices, and present the Ingestion API as one of the main components of the system. He will also reveal the core abstraction behind the Meta-data provider and the Spark-based Extraction API, where simple changes to the result schema improved the overall usability and performance of the system. This talk can be of interest to any companies or institutes confronted with similar Big Data problems, as the system itself is not CERN-specific.
Nikolay Tsvetkov CERN
16:55 to 17:25

Make Your Data FABulous

The CAP theorem is widely known for distributed systems, but it’s not the only tradeoff you should be aware of. For datastores there is also the FAB theory, and just like with the CAP theorem you can only pick two:
Fast: Results are fast enough that people can have a seamless interaction.
Accurate: Answers are accurate and don’t have a margin of error.
Big: Dozens or hundreds of systems are involved in calculating the result.

Most SQL databases are in the FA space, whereas Hadoop and related systems are generally AB systems. A system optimized for FB is, for example, Elasticsearch.

While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of accuracy tradeoffs Elasticsearch has made and how to optimize them for your use case.
Philipp Krenn Elastic
17:40 to 18:20

After Party


18:30 to **

Big Data and Data Analytics

The main goal of my presentation is to introduce the participants to the most novel concepts in computational statistics and their implications for a broader range of decisions based on Big Data. These days a data scientist has to work simultaneously on at least three fronts. First, the data-gathering plan, called the design. Invariably, designs become extremely useful as we are flooded with data: they help us focus on the aim of the study and on the reduction of complexity. Second, the software environment we choose to analyze our data. The competition here is quite strong, but the future will belong to open source solutions. The author of this presentation belongs to the club of R-philes, a worldwide community of data scientists willing to share their tools. Finally, the third front of the data scientist’s battle is the selection of an appropriate statistical algorithm. Here the Big Data revolution has dramatically changed the perspective: we now move much more audaciously to extremely high-dimensional data with new statistical tools. The concepts of the talk will be illustrated with examples from the author’s experience in signal processing, medical and financial data.
Jacek Leśkow Cracow University of Technology
9:00 to 9:40

Life after the model


A convolutional neural network is ready, the F1 score calculated and the ROC curve drawn. So you have a model. This tale is about what happens next: when and how to cleanse the model, how to push it into production and what defines quality for a model. I will show this using one of the projects we deployed here at AirHelp, along with a few tips & tricks that help me manage Machine Learning projects.
Michał Jakóbczyk AirHelp
9:55 to 10:35

Big and smart data in the development of autonomous vehicles

Abstract: When can we expect autonomous cars on the roads? All the technical solutions are there, ready for vehicle manufacturers to equip their models. So what stops us? Part of the answer is connected with ensuring safety, and the need for testing. The Simusafe project, funded by the EU and coordinated by Aptiv, aims at finding behavioural models to describe multi-actor traffic, making simulations reflect real life.
Grzegorz Wyszyński Aptiv
10:45 to 11:25

Big O in a Retailer’s Big Data (or where computer science meets data science)


In situations where a 1% optimisation improvement might be worth millions, algorithm tuning, parallelisation, cloud benchmarking and the like might turn out to be just the things that count. Grzegorz Gawron and Tomasz Lichoń will walk you through some cases where software engineering (computer science?) goes hand in hand with data science. All based on VirtusLab's real-world projects.
Grzegorz Gawron & Tomasz Lichoń VirtusLab
11:40 to 12:20

Explaining neural networks predictions

Recently, Deep Neural Networks have become superior in many machine learning tasks. However, they are more difficult to interpret than simpler models such as Support Vector Machines or Decision Trees. One may say that neural nets are black boxes that produce predictions we cannot explain. Such a situation is not acceptable in industries like healthcare or law. In this talk, I will present known ways of understanding neural network predictions.
Matthew Opala Craftinity
12:30 to 13:00

Lunch Break


13:00 to 14:00

Luna – presentation

The talk is a presentation of Luna, the visual-textual programming language and environment for data processing. It showcases a novel paradigm for data processing and explains how strongly-typed, purely functional programming can be combined with visual representation to help people create pipelines that are more intuitive, easier to comprehend and less error-prone. We demonstrate interactive examples and discuss the potential of such a paradigm to change the way data is processed across industries.
Piotr Moczurad Luna
14:00 to 14:40

Collaborative Filtering Microservices on Spark

The Alternating Least Squares (ALS) algorithm is still deemed the industry standard in collaborative filtering. In this talk we will focus on Apache Spark’s ALS implementation and discuss the steps we took to build a distributed recommendation engine, focusing on continuous model training and model management. We show that, by splitting the recommendation engine into microservices, we were able to reduce the system’s complexity and produce a robust collaborative filtering platform with support for continuous model training. At the end of this talk, you should be equipped with enough tools and ideas to implement your own collaborative algorithm and avoid some common pitfalls.
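The alternating step that Spark distributes can be shown on a single machine. The following is a hedged NumPy toy (hypothetical ratings matrix, zeros naively treated as observed values for simplicity) rather than the speakers' engine; `pyspark.ml.recommendation.ALS` is the distributed, production-grade counterpart discussed in the talk:

```python
# Toy alternating least squares: fix item factors and solve a ridge
# regression for user factors, then swap. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0]])   # users x items (zeros treated as observed here)
k, lam = 2, 0.1                   # latent dimension, ridge regularisation
U = rng.normal(size=(3, k))       # user factors
V = rng.normal(size=(3, k))       # item factors

for _ in range(20):
    # Solve for all user factors at once, item factors held fixed ...
    U = np.linalg.solve(V.T @ V + lam * np.eye(k), V.T @ R.T).T
    # ... then for all item factors, user factors held fixed.
    V = np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R).T

approx = U @ V.T  # low-rank reconstruction of the ratings matrix
```

Each half-step is an exact least-squares solve, which is why ALS converges quickly and parallelises well: every user's (and item's) solve is independent once the other side is fixed.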
Rui Vieira & Sophie Watson Red Hat
14:50 to 15:30

From Spark MLlib model to learning system with Watson Machine Learning

A biomedical company that produces heart drugs has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Based on the treatment records, they would like to predict the best drug for each patient. They also need to ensure that their prediction model is always up to date, providing the highest possible quality of predictions. During this session I will demonstrate how a continuous learning system (part of Watson Machine Learning) can be used to achieve those goals.
Łukasz Ćmielowski IBM
16:35 to 17:15

Programmer 2.0

Do you know that the 4th industrial revolution is coming? Do you think that talk of robots taking over work concerns only other professions, and that programmers will still be needed? That is partially true, but on the other hand really big changes are coming over the next 5-10 years. So if you want to prepare for them, I recommend finding out who programmer 2.0 will be. It turns out that learning machine learning can be an interesting adventure (and doing a PhD is optional). I will show you examples of how, with relatively little effort, you can do interesting and valuable things. The goal of the presentation is to inspire and to break down the perceived complexity of machine learning. If you can program, you can do ML too!
Vladimir Alekseichenko GE
17:30 to 18:10

Discussion Panel: Experiences and views on AI adoption

Panelists working at a diverse range of companies (open source/proprietary software, products/service focus, small/big) will share their unique perspectives on AI and ML adoption across industry.

Panelists:
- William Benton, Red Hat
- Lukasz Cmielowski, IBM
- Konrad Pabianczyk, Appsilon Data Science
- Grzegorz Gawron, VirtusLab
- TBA

18:15 to 19:00

Conference Closing


19:00 to 19:05

CloudSphere - a small companion event to Sphere.IT

More details here

19:15 to 21:45

FrontendSphere - a small companion event to Sphere.IT

More details here

19:15 to 21:45

Keynote speakers

Anthony Stevens
Anthony Stevens IBM

Anthony Stevens

Anthony Stevens directs offering strategy for Watson Deep Learning, where he collaborates across IBM Watson Research and Development to identify and then integrate AI into IBM’s cloud offering. This allows him to work with IBM’s numerous partners and customers to transform theory into practice for deep learning solutions based on IBM Watson technologies. Prior to joining the IBM Watson product team, Anthony was a senior solution architect within the Watson Ecosystem, where he gained insights into the challenges confronting AI implementation through working with over 50 customers, including partners, engineering, and research teams.
 
 

Anthony is an expert in transforming complex problems and datasets into leading AI, mobile, web, and cloud solutions. He has done this for life science, health, consumer, and multimedia companies by bringing deep expertise in both software development and product management. He has led the process to design, architect, build, and launch products for startups and Fortune 500 corporations.

Topic:
Present/future vision for data+AI

Anthony Stevens IBM

The speakers

Vladimir Alekseichenko
Vladimir Alekseichenko GE
William Benton
William Benton Red Hat
Umit Mert Cakmak
Umit Mert Cakmak IBM
Łukasz Ćmielowski
Łukasz Ćmielowski IBM
Maciej Dąbrowski
Maciej Dąbrowski Genesys
Grzegorz Gawron
Grzegorz Gawron VirtusLab
Michał Jakóbczyk
Michał Jakóbczyk AirHelp
Philipp Krenn
Philipp Krenn Elastic
Marcin Kulka
Marcin Kulka 9LivesData
Tomasz Lichoń
Tomasz Lichoń VirtusLab
Olga Mierzwa-Sulima
Olga Mierzwa-Sulima Appsilon Data Science
Piotr Moczurad
Piotr Moczurad Luna
Jacek Leśkow
Jacek Leśkow Cracow University of Technology
Matthew Opala
Matthew Opala Craftinity
Maciek Próchniak
Maciek Próchniak TouK
Paweł Rzeszuciński
Paweł Rzeszuciński Codewise
Grzegorz Wyszyński
Grzegorz Wyszyński Aptiv
Nikolay Tsvetkov
Nikolay Tsvetkov CERN
Rui Vieira
Rui Vieira Red Hat
Sophie Watson
Sophie Watson Red Hat
Tomasz Wesołowski
Tomasz Wesołowski 2040.io

Vladimir Alekseichenko

Bio:
Vladimir is an Artificial Intelligence enthusiast: a perfectionist at heart with a pragmatic mindset. He is a trainer at DataWorkshop.eu, where he explains how to use machine learning in real life, and hosts a podcast about Artificial Intelligence, BiznesMysli.pl (in Polish). He is an architect at General Electric, participates in Kaggle competitions, and loves data and its challenges.

Topic:
Programmer 2.0

Abstract:
Do you know that the 4th industrial revolution is coming? Do you think that talk of robots taking over work concerns only other professions, and that programmers will still be needed? That is partially true, but on the other hand really big changes are coming over the next 5-10 years. So if you want to prepare for them, I recommend finding out who programmer 2.0 will be.

It turns out that learning machine learning can be an interesting adventure (and doing a PhD is optional).

I will show you examples of how, with relatively little effort, you can do interesting and valuable things. The goal of the presentation is to inspire and to break down the perceived complexity of machine learning. If you can program, you can do ML too!

Vladimir Alekseichenko GE

William Benton

Bio:
William Benton leads a team of data scientists and engineers at Red Hat, where he has applied analytic techniques to problems ranging from forecasting cloud infrastructure costs to designing better cycling workouts. His current focus is investigating the best ways to build and deploy intelligent applications in cloud-native environments, but he has also conducted research and development in the areas of static program analysis, managed language runtimes, logic databases, cluster configuration management, and music technology.

Topic:
Implementing Machine Learning Algorithms for Scale-Out Parallelism

Abstract:
Frameworks for elastic scale-out computation, like Apache Spark and Apache Flink, are important tools for putting machine intelligence into production applications. However, these frameworks do not always offer the same breadth or depth of algorithm coverage as specialized machine learning libraries that run on a single node, and the gulf between being a competent framework user and a seasoned library developer who can extend a framework can be quite daunting.
In this talk, we’ll walk through the process of developing a parallel implementation of a machine learning algorithm. We’ll start with the basics, by considering what makes algorithms difficult to parallelize and showing how we’d design a parallel implementation of an unsupervised learning technique. We’ll then introduce a simple parallel implementation of our technique on Apache Spark, and iteratively improve it to make it more efficient and more user-friendly. While some of the techniques we’ll introduce will be specific to the Spark implementation of our example, most of the material in this talk is broadly applicable to other distributed computing frameworks. We’ll conclude by briefly examining some techniques to complement scale-out performance by scaling our code up, taking advantage of specialized hardware to accelerate single-worker performance. You’ll leave this talk with everything you need to implement a new machine learning technique that takes advantage of parallelism and resources in the public cloud.
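As a concrete miniature of that design process, here is a hedged, framework-free sketch (toy 1-D data, not from the talk) of one k-means update expressed as map and reduce steps; the point is that the per-partition map is embarrassingly parallel, while the reduce is an associative merge a framework like Spark can tree-aggregate:

```python
# Map/reduce sketch of one k-means update step, the pattern a Spark
# implementation distributes. Partitions are plain lists here; on Spark
# each map call would run on a different worker.
from functools import reduce

def assign_and_sum(partition, centroids):
    """Map step: per-cluster (sum, count) statistics for one partition."""
    stats = {i: [0.0, 0] for i in range(len(centroids))}
    for x in partition:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        stats[nearest][0] += x
        stats[nearest][1] += 1
    return stats

def merge(a, b):
    """Reduce step: combine partial statistics (associative, order-free)."""
    return {i: [a[i][0] + b[i][0], a[i][1] + b[i][1]] for i in a}

partitions = [[1.0, 1.2, 0.9], [8.0, 8.2], [1.1, 7.9]]
centroids = [0.0, 10.0]

partials = [assign_and_sum(p, centroids) for p in partitions]  # parallelisable
totals = reduce(merge, partials)
new_centroids = [totals[i][0] / totals[i][1] for i in range(len(centroids))]
print(new_centroids)
```

What makes the algorithm parallelise cleanly is exactly the property the talk highlights: the global update decomposes into independent partial statistics plus an associative merge.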

William Benton Red Hat

Umit Mert Cakmak

Bio:
Umit is a Data Scientist at IBM, focusing extensively on IBM Data Science Experience and IBM Watson Machine Learning to solve complex business problems. His research spans many areas, from statistical modeling of financial asset prices to using evolutionary algorithms to improve the performance of machine learning models. Before joining IBM, he worked in various domains such as high-frequency trading, supply chain management and consulting. He likes to learn from others and to share his insights at universities, conferences and local meetups.

Topic:
Recent advancements in NLP and deep learning: a quant’s perspective

Abstract:
There is a gold rush among hedge funds for text mining algorithms to quantify textual data and generate trading signals. Harnessing the power of alternative data sources has become crucial to finding novel ways of enhancing trading strategies.

With the proliferation of new data sources, natural language data has become one of the most important sources for gauging public sentiment and opinion about market events, which can then be used to predict financial markets.

The talk is split into five parts:

  • Who is a quant and how do they use NLP?
  • How deep learning has changed NLP
  • Let’s get dirty with word embeddings
  • A performant deep learning layer for NLP: the recurrent layer
  • Using all of that to make money
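The word-embeddings part of the outline can be previewed in miniature: hand-made 3-dimensional vectors (purely illustrative; real embeddings such as word2vec or GloVe are learned from large corpora) compared with cosine similarity, the standard notion of relatedness in embedding spaces.

```python
# Word embeddings in miniature: words as dense vectors, cosine similarity
# as relatedness. Vector values below are invented for illustration.
import math

embeddings = {
    "stock":  [0.90, 0.80, 0.10],
    "share":  [0.85, 0.75, 0.20],
    "banana": [0.10, 0.05, 0.90],
}

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_finance = cosine(embeddings["stock"], embeddings["share"])
sim_fruit = cosine(embeddings["stock"], embeddings["banana"])
print(sim_finance, sim_fruit)  # "stock" sits much nearer "share" than "banana"
```

For a quant, this geometry is the hook: once news text is mapped into such a space, proximity to concept vectors can be turned into a numeric signal.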
Umit Mert Cakmak IBM

Łukasz Ćmielowski

Bio:
Lukasz Cmielowski joined IBM in 2008, and in 2009 he successfully defended his PhD dissertation in bioinformatics. Since then he has worked as a QA architect focused on bringing AI into old-school software quality activities; his domain was the automation of software failure prediction. In 2015 he joined a new team working on analytics solutions as an automation architect and data scientist. He is also a big fan of Norman Davies’ and Terry Pratchett’s book series.

Topic:
From Spark MLlib model to learning system with Watson Machine Learning

Abstract:
A biomedical company that produces heart drugs has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications.
Based on the treatment records, they would like to predict the best drug for each patient. They also need to ensure that their prediction model is always up to date, providing the highest possible quality of predictions.
During this session I will demonstrate how a continuous learning system (part of Watson Machine Learning) can be used to achieve those goals.

Łukasz Ćmielowski IBM

Maciej Dąbrowski

Bio:
Maciej has built large-scale data analytics and AI products in both research and industry, previously at DERI/INSIGHT Galway, currently as Chief Data Scientist at Genesys. He is the founder of the Galway Data Meetup with over 250 members and received a number of awards for his work. He was shortlisted as one of the four finalists of the DatSci 2016 competition in the Data Scientist of the Year category.

Interests:
Artificial Intelligence, creating value with data-driven products, decision support systems, recommender systems

Topic:
Building Successful Machine Learning Products

Abstract:
With recent advancements in the AI ecosystem, the entry barriers for utilisation of Machine Learning techniques are lower than ever. The growing availability of tools and platforms together with decreasing cost of computation, allows smaller teams to build ML products faster and add value in a number of industries, from self-driving cars to personal assistants. This not only creates new opportunities, but also poses a number of challenges related to the design of products that we interact with on a daily basis.

In this talk I will share a number of experiences and examples of products using Machine Learning, focusing on the common gaps as well as the key steps for designing a successful ML product. I will describe how techniques such as human-centered design or design thinking play an important role in choosing the right problem to solve with Machine Learning and in shaping the user experience when the algorithms fail to deliver. The second part of the talk will focus on the engineering challenges, including data collection, model training and deployment at scale.

Maciej Dąbrowski Genesys

Grzegorz Gawron

Bio:
Grzegorz Gawron is a lead software engineer/manager with interests in advanced analytics and a taste for theory (that makes him a computer scientist surely!).

He is the Head of Data Science at VirtusLab. Previously he did data engineering at Base CRM, and before that built trading systems for banking (PRM). It all started with the joy of having his first Commodore 64 program stored securely on a magnetic tape.

He holds an MSc in computer science and an MSc in economics from the University of Warsaw.

Interests:
data, algorithms, software engineering, distributed systems, machine learning

Topic:
Big O in a Retailer’s Big Data (or where computer science meets data science)

Abstract:
In situations where a 1% optimisation improvement might be worth millions, algorithm tuning, parallelisation, cloud benchmarking and the like might turn out to be just the things that count. Grzegorz Gawron and Tomasz Lichoń will walk you through some cases where software engineering (computer science?) goes hand in hand with data science. All based on VirtusLab’s real-world projects.

Grzegorz Gawron VirtusLab

Michał Jakóbczyk

Bio:
Software Engineer. He spent the last 10 years solving data problems at companies like Google, Sun and Base. Today he is a happy coder serving as Director of Data Science at AirHelp.

Topic:
Life after the model

Abstract:
A convolutional neural network is ready, the F1 score calculated and the ROC curve drawn. So you have a model.
This tale is about what happens next: when and how to cleanse the model, how to push it into production and what defines quality for a model.
I will show this using one of the projects we deployed here at AirHelp.
I will also share a few tips & tricks that help me manage Machine Learning projects.

Michał Jakóbczyk AirHelp

Philipp Krenn

Bio:
Philipp is part of the infrastructure team and a developer advocate at Elastic. He frequently talks about full-text search, databases, operations, and security. Additionally, he organizes multiple meetups in Vienna.

Topic:
Make Your Data FABulous

Abstract:
The CAP theorem is widely known for distributed systems, but it’s not the only tradeoff you should be aware of. For datastores there is also the FAB theory and just like with the CAP theorem you can only pick two:
Fast: Results are fast enough so that people can have a seamless interaction.
Accurate: Answers are accurate and don’t have a margin of error.
Big: Dozens or hundreds of systems are involved in calculating the result.

Most SQL databases are in the FA space whereas Hadoop and related systems are generally AB systems. A system optimized for FB is Elasticsearch for example.

While Fast and Big are relatively easy to understand, Accurate is a bit harder to picture. This talk shows some concrete examples of accuracy tradeoffs Elasticsearch has made and how to optimize them for your use case.
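One of those tradeoffs can be made concrete: Elasticsearch's `cardinality` aggregation trades exactness for speed and scale (it is based on HyperLogLog++), tunable through `precision_threshold`. Below is a sketch of the request body as a Python dict; the index and field names are hypothetical, and actually running the query would require a live cluster.

```python
# Approximate distinct count in Elasticsearch: results below
# precision_threshold distinct values are effectively exact; above it,
# the count becomes approximate but stays fast and memory-bounded --
# a deliberate choice of F and B over perfect A.
query = {
    "size": 0,  # we only want the aggregation, not the hits
    "aggs": {
        "unique_users": {
            "cardinality": {
                "field": "user_id",           # hypothetical field name
                "precision_threshold": 3000,  # the default; raising it buys
            }                                 # accuracy at the cost of memory
        }
    }
}
print(query)
```

A client would POST this body to `<index>/_search`; the point of the sketch is simply that the accuracy knob is an explicit, per-request parameter.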

Philipp Krenn Elastic

Marcin Kulka

Bio:
Marcin Kulka is a Senior Software Engineer at 9LivesData. In cooperation with machine learning researchers at NEC Labs America, he works on a Spark-based, fully automated predictive modelling system. He holds a master’s degree in both Computer Science and Mathematics from Warsaw University. His biggest areas of interest are big data, machine learning, distributed systems and algorithms. Marcin has almost 10 years of professional experience in software engineering, most of it spent working on HYDRAstor, a cutting-edge, distributed and highly scalable backup system. Privately he is a happy husband and father of two daughters.

Topic:
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark

Abstract:
Building accurate machine learning models has long been an art of data scientists: algorithm selection, hyperparameter tuning, feature selection and so on. Recently, attempts to break through this difficult art have begun. In cooperation with our partner, NEC Laboratories America, we have developed a Spark-based automatic predictive modeling system. The system automatically searches for the best algorithm, parameters and features without any manual work. In this talk, we will share how the automation system is designed to exploit the attractive advantages of Spark. An evaluation with real open data demonstrates that our system can explore hundreds of predictive models and discover the most accurate ones in minutes on an Ultra High Density Server, which packs 272 CPU cores, 2TB of memory and 17TB of SSD into a 3U chassis. We will also share open challenges in learning such a massive number of models on Spark, particularly from reliability and stability standpoints.
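The core loop of such automated model selection can be sketched in a few lines. The sketch below is purely illustrative (the candidate models and data are made up, and the real system described above runs this kind of search distributed on Spark over real algorithms and feature sets): fit every candidate on training data and keep whichever has the lowest validation error.

```python
train = [(0, 1), (1, 3), (2, 5), (3, 7)]   # y = 2x + 1
val   = [(4, 9), (5, 11)]

def fit_constant(data):
    # Baseline candidate: always predict the mean of the training targets.
    c = sum(y for _, y in data) / len(data)
    return lambda x: c

def fit_linear(data):
    # Closed-form simple linear regression (least squares).
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cov = sum((x - mx) * (y - my) for x, y in data)
    var = sum((x - mx) ** 2 for x, _ in data)
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

candidates = {"constant": fit_constant, "linear": fit_linear}
# The "automatic" part: fit every candidate, keep the lowest validation error.
best_name, best_model = min(
    ((name, fit(train)) for name, fit in candidates.items()),
    key=lambda pair: mse(pair[1], val),
)
```

On this toy data the search selects the linear model. The engineering challenge the talk addresses is doing this over hundreds of algorithm/parameter/feature combinations in parallel, reliably, on a cluster.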

Marcin Kulka 9LivesData

Tomasz Lichoń

Bio:
Tomasz Lichoń is a big data engineer and team leader at VirtusLab, where he's involved in building data lakes and data processing pipelines using Hadoop-related technologies. Prior to that, he focused on open-source scientific projects, being one of the main contributors to the onedata initiative that aims to revolutionize access to scientific data, and helping to improve software for simulating LHC beam collisions at CERN. He obtained his M.Sc. and started Ph.D. studies at AGH University of Science and Technology. His personal interests include football, AI, psychology and good beer.

Topic:
Big O in a Retailer’s Big Data (or where computer science meets data science)

Abstract:
In situations where a 1% optimisation improvement might be worth millions, algorithm tuning, parallelisation, cloud benchmarking and the like might turn out to be just the things that count. Grzegorz Gawron and Tomasz Lichoń will walk you through some cases where software engineering (computer science?) goes hand in hand with data science. All based on VirtusLab's real-world projects.

Tomasz Lichoń VirtusLab

Olga Mierzwa-Sulima

Bio:
Olga is a senior data scientist at Appsilon Data Science and a co-founder of datahero.tech. She leads a team of data scientists, builds predictive and explanatory data science solutions and deploys them in production, usually wrapped in a Shiny app UI. She develops Appsilon's open-source R packages. Olga holds an MSc degree in Econometrics from the University of Rotterdam. She co-organizes the largest meetup of R users in Poland and is a co-founder of the R-Ladies Warsaw chapter.

Topic:
How we built a Shiny App for 700 users?

Abstract:
Shiny has proved itself a great tool for communicating data science teams' results. However, developing a Shiny app for a large-scope project that will be used commercially by dozens of users is not easy. The first challenge is the User Interface (UI): the app is expected to look no different from modern web pages. Secondly, performance directly impacts user experience (UX), and it's difficult to maintain efficiency with growing requirements and a growing user base.

In this talk, we will share our experience from a real-life case study of building an app used daily by 700 users where our data science team tackled all these problems. This, to our knowledge, was one of the biggest production deployments of a Shiny App.

We will show an innovative approach to building a beautiful and flexible Shiny UI using **shiny.semantic** package (an alternative to standard Bootstrap). Furthermore, we will talk about the non-standard optimization tricks we implemented to gain performance. Then we will discuss challenges regarding complex reactivity and offer solutions. We will go through implementation and deployment process of the app using a load balancer. Finally, we will present the application and give details on how this benefited our client.


Olga Mierzwa-Sulima Appsilon Data Science

Piotr Moczurad

Bio:
A Haskell developer on the Luna language team, changing the way people think about software development and data processing. A functional programming enthusiast, especially of Haskell and Scala. Doctoral candidate at the Faculty of Computer Science of AGH University of Science and Technology, working on seamlessly integrating serverless architecture with visual and functional programming.

Topic:
Luna – presentation

Abstract:
The talk is a presentation of Luna, a visual-textual programming language and environment for data processing. It showcases a novel paradigm for data processing and explains how strongly typed, purely functional programming can be combined with a visual representation to help people create pipelines that are more intuitive, easier to comprehend and less error-prone. We demonstrate interactive examples and discuss the potential of such a paradigm to change the way data is processed across industries.


Piotr Moczurad Luna

Jacek Leśkow

Bio:
Main fields of interest: statistics, international academic cooperation, management. A professor at the University of California (USA), the Polish-American Higher School of Business – National-Louis University (Poland) and Cracow University of Technology. A visiting professor in the USA, Mexico, France, Brazil, Ukraine, Kyrgyzstan and Sweden.
Management experience: Director of the Statistical Consulting Lab, University of California (USA), owner of StatLab International Consulting (Poland), vice-rector for Research at WSB-NLU (Poland), a member of the Committee for monitoring MRPO and a director of research projects financed by NATO (three times) and the National Science Centre (Poland).
Professional distinctions: a triple laureate of NATO grants for the analysis and security of telecommunication signals (1995, 2000, 2008) and a laureate of the award for the best publication in statistical signal analysis, European Signal Processing Society, 2007.

Topic:
Big Data and Data Analytics

Abstract:
The main goal of my presentation is to introduce the participants to the most novel concepts in computational statistics and their implications for a broader range of decisions based on Big Data. These days a data scientist has to work simultaneously on at least three different fronts. First, the data gathering plan, called the design. Invariably, designs become extremely useful as we are flooded with data. Designs help us focus on the aim of the study and on reducing complexity. Secondly, the software environment we choose to analyze our data. The competition here is quite strong, but the future will belong to open-source solutions. The author of this presentation belongs to the club of R-ofiles, that is, a world-wide community of data scientists willing to share their tools. Finally, the third front of the data scientist's battle is the selection of an appropriate statistical algorithm. Here the Big Data revolution has dramatically changed the perspective: we now move much more audaciously to extremely high-dimensional data with new statistical tools. The concepts of the talk will be illustrated with examples from the author's experience, that is, from signal processing, medical and financial data.

Jacek Leśkow Cracow University of Technology

Matthew Opala

Bio:
He is an entrepreneur, software engineer and machine learning practitioner, currently CTO at Craftinity, a machine learning startup based in Krakow. He's been working on machine learning projects since graduating from university, first at Siemens, then at Craftinity. When he isn't training neural networks, he reads a lot on history and economics, and plays football.

Topic:
Explaining neural networks predictions

Abstract:
Recently, Deep Neural Networks have become superior in many machine learning tasks. However, they are more difficult to interpret than simpler models like Support Vector Machines or Decision Trees. One may say that neural nets are black boxes that produce predictions, but we can't explain why a given prediction is made. Such a situation is not acceptable in industries like healthcare or law. In this talk, I will show known ways of understanding neural network predictions.
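One well-known family of such methods is gradient-based saliency: score each input feature by how strongly the prediction changes when that feature changes. As a hedged toy sketch (not from the talk), for a one-layer "network" (logistic regression) the input gradient can be computed by hand:

```python
import math

def predict(w, b, x):
    # sigmoid(w . x + b): a one-neuron "network".
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x):
    # d sigmoid(w.x + b) / d x_i = w_i * s * (1 - s).
    # The magnitude of each component is a simple saliency score.
    s = predict(w, b, x)
    return [wi * s * (1.0 - s) for wi in w]

w, b = [2.0, -0.5, 0.0], 0.1
x = [1.0, 1.0, 1.0]
grads = input_gradient(w, b, x)
# Feature 0 has the largest |gradient|, so it drives the prediction;
# feature 2 (weight 0) has zero saliency.
most_influential = max(range(len(grads)), key=lambda i: abs(grads[i]))
```

For deep networks the same idea is applied via backpropagation to the inputs; the talk covers this and other established explanation techniques.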

Matthew Opala Craftinity

Maciek Próchniak

Bio:
An algebraic topologist in the past, a JVM developer (Java, Scala, …) for more than 10 years, currently usually an architect/(lead) developer, though my roles vary from analysis to devops.
My main fields of interest are integration (Camel, OSGi), functional programming and stream processing systems (Akka, Kafka, Flink).
I also like to give talks at conferences – Confitura, JEEConf, VoxxedDays, just to name a few.
Currently I'm the leader of TouK Nussknacker – a project which enables analysts to create streaming jobs with a friendly UI.

Topic:
Stream processing in telco – case study based on Apache Flink & TouK Nussknacker

Abstract:
Stream processing has been one of the big hypes of the last two years. Apache Flink, Spark Streaming and the like are conquering the world. We hear about quite a few interesting use cases, but most come from startups and technology companies – Netflix, Uber or Alibaba are good examples. I'd like to talk about a case which is a bit different.
Two years ago we helped to introduce Apache Flink at one of the largest mobile operators in Poland – at first to help with real-time marketing. The data used included information from billing and signalling systems.
We wanted to enable analysts and semi-technical people to create and monitor processes, and that's how Nussknacker – our open-source GUI for Flink – was born. Today, many streaming jobs are created by analysts, without the need for developers' assistance. I'll tell the story of this journey:
what features of stream processing are important for the telco business, what barriers we see to Flink adoption in the enterprise and what we consider to be its main selling points.
We have learnt that a common data model can be reused for different purposes – the most important one being real-time fraud detection.
Today we're processing billions of events daily from more than a dozen sources, with over 40 processes running in production.
I'll also talk about our current architecture, where it seems applicable and what our plans for the future are.
The target audience of this talk are developers, analysts and architects who are considering introducing stream processing in their organizations.

Maciek Próchniak TouK

Paweł Rzeszuciński

Bio:
Pawel Rzeszucinski received an MSc in Computer Science from Cranfield University and an MSc in Electronics from Wroclaw University of Technology. He subsequently moved to The University of Manchester, where he obtained a PhD on a QinetiQ-sponsored project on data analytics for helicopter gearbox diagnostics. Upon returning to Poland he worked as a Senior Scientist at ABB's Corporate Research Center and a Senior Risk Modeler in Strategic Analytics at HSBC. Currently he is a Data Scientist at Codewise.

Topic:
Descriptive statistics – the mighty dwarf of data science.

Abstract:
Nowadays a fair part of the community (often influenced by pressure from the business) seems to show a tendency to apply somewhat complex and rather computationally expensive algorithms to applications that would in the past have been easily accommodated by much simpler (hence faster) and much more interpretable (hence of greater business value) techniques. This presentation aims to remind peers of the power and beauty of descriptive statistics as an approach to quantitatively describing the nature of features and creating solid foundations for any subsequent data investigations.
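As a tiny illustration of the point (my example, not one from the talk), Python's standard statistics module already covers much of this ground, and even the mean/median pair tells an interpretable story the moment the data contains an outlier:

```python
import statistics

# Hypothetical response times in ms; the 100 is a single outlier.
response_times = [10, 11, 12, 11, 10, 100]

mean = statistics.mean(response_times)      # pulled up by the outlier
median = statistics.median(response_times)  # robust to it
spread = statistics.stdev(response_times)   # large spread flags the outlier
```

Here the mean (~25.7) is more than double the median (11), and the large standard deviation flags that something unusual is in the data. No model training required, and every number is directly explainable to the business.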

Paweł Rzeszuciński Codewise

Grzegorz Wyszyński

Bio:
Grzegorz works in the Advanced Engineering group at Aptiv (formerly Delphi), developing Advanced Driver Assistance and Automated Driving systems for the cars of the future. His team's work focuses not only on sensors and hardware, but also on data analysis algorithms and modelling.

Grzegorz completed a degree in Nuclear Physics at Jagiellonian University and obtained his Ph.D. in 2015. Since then his professional life has revolved around automotive and bringing new products to market, first as a Validation Engineer, later as an R&D Engineer.

Topic:
Big and smart data in the development of autonomous vehicles

Abstract:
When can we expect autonomous cars on the roads? All the technical solutions are there, ready for vehicle manufacturers to equip their models. So what stops us? Part of the answer relates to ensuring safety, and the need for testing. The Simusafe project, funded by the EU and coordinated by Aptiv, aims at finding behavioural models to describe multi-actor traffic, making simulations reflect real life.

Grzegorz Wyszyński Aptiv

Nikolay Tsvetkov

Bio:
A Software Engineer in the Data Services section of the Beams Controls group at CERN. Since 2015 he has played a major role in the design and development of the next-generation Accelerator Logging Service (NXCALS), which is based on state-of-the-art Big Data technologies including Apache Spark and Kafka.
Nikolay is driven by the goal of making the life of the Data Scientists easier, providing native structured access to logged data and integrated tools for data analysis. He strives to deliver pragmatic solutions and is passionate about dealing with huge amounts of data and building distributed systems.

Topic:
Next CERN Accelerator Logging Service Architecture

Abstract:
The Next Accelerator Logging Service (NXCALS) is a new Big Data project at CERN aiming to replace the existing Oracle-based service.

The main purpose of the system is to store technical accelerator data needed by machine operators and data scientists at CERN. Gathered from thousands of devices across the whole accelerator complex, the data is used to operate the machines, improve their performance and conduct studies for new beam types or future experiments.

This presentation is a dive into the Hadoop/Spark based NXCALS architecture. Nikolay will speak about the service requirements, the design choices and present the Ingestion API as one of the main components of the system. He will also reveal the core abstraction behind the Meta-data provider and the Spark-based Extraction API where simple changes to the result schema improved the overall usability and performance of the system.

This talk can be of interest to any companies or institutes confronted with similar Big Data problems as the system itself is not CERN specific.

Nikolay Tsvetkov CERN

Rui Vieira

Bio:
Rui is a Software Engineer at Red Hat working on Data Science, Apache Spark and Spark Streaming applications.

Topic:
Collaborative Filtering Microservices on Spark
(Joint talk with Sophie Watson)

Abstract:
The Alternating Least Squares (ALS) algorithm is still deemed the industry standard in collaborative filtering. In this talk we will focus on Apache Spark’s ALS implementation and discuss the steps we took to build a distributed recommendation engine, focusing on continuous model training and model management. We show that, by splitting the recommendation engine into microservices, we were able to reduce the system’s complexity and produce a robust collaborative filtering platform with support for continuous model training. At the end of this talk, you should be equipped with enough tools and ideas to implement your own collaborative algorithm and avoid some common pitfalls.
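The core of ALS can be sketched in a few lines of plain Python. The toy below is rank-1 and single-machine (my own illustration; Spark's ALS generalizes this to higher rank, adds regularization and distributes the computation): holding the item factors fixed makes solving for each user factor a least-squares problem, and vice versa, so the two solves alternate.

```python
# Tiny user x item rating matrix, chosen to be exactly rank 1
# (the second row is 0.8 times the first).
R = [[5.0, 3.0],
     [4.0, 2.4]]

u = [1.0, 1.0]   # user factors
v = [1.0, 1.0]   # item factors

def error():
    # Squared reconstruction error of the rank-1 model R_ij ~ u_i * v_j.
    return sum((R[i][j] - u[i] * v[j]) ** 2
               for i in range(2) for j in range(2))

e0 = error()
for _ in range(20):
    # Fix v, solve the 1-D least squares for each u_i...
    u = [sum(R[i][j] * v[j] for j in range(2)) / sum(vj * vj for vj in v)
         for i in range(2)]
    # ...then fix u and solve for each v_j.
    v = [sum(R[i][j] * u[i] for i in range(2)) / sum(ui * ui for ui in u)
         for j in range(2)]
```

Each half-step can only decrease the error, which is why ALS converges; recommendations then come from ranking a user's predicted scores u_i * v_j over unseen items.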

Rui Vieira Red Hat

Sophie Watson

Bio:
Sophie is a Software Engineer at Red Hat, and has recently finished a PhD in Bayesian Statistics.

Topic:
Collaborative Filtering Microservices on Spark
(Joint talk with Rui Vieira)

Abstract:
The Alternating Least Squares (ALS) algorithm is still deemed the industry standard in collaborative filtering. In this talk we will focus on Apache Spark’s ALS implementation and discuss the steps we took to build a distributed recommendation engine, focusing on continuous model training and model management. We show that, by splitting the recommendation engine into microservices, we were able to reduce the system’s complexity and produce a robust collaborative filtering platform with support for continuous model training. At the end of this talk, you should be equipped with enough tools and ideas to implement your own collaborative algorithm and avoid some common pitfalls.

Sophie Watson Red Hat

Tomasz Wesołowski

Bio:
He spent the last 15 years in the internet industry as an entrepreneur, advisor, and board member of several companies. Founder and managing director of one of the biggest Polish software houses (grown from 2 to 200 employees). After an M&A process he exited by selling his shares, and now he is involved in creating the intelligent assistant Edward, as a co-founder and CEO of 2040.io.
Co-founder of the Krakow Artificial Intelligence Meetup Group, interested in modern user interfaces and the social aspects of artificial intelligence.
He was also a co-founder and board member of PROFEO – a community for professionals (a Polish LinkedIn competitor), founder of Techcamp – the biggest Polish technological barcamp meetings – and co-founder of the Ecommerce directors' club.

Topic:
What we’ve learned from creating Edward.ai

Abstract:
This will be a story about the creation of an AI-powered sales assistant. How did it all start, and what challenges have we faced during the last 18 months? How did we apply AI in our software, and what do our customers say about the usability of such a tool? And what are our plans for the near future, given the pace of artificial intelligence advancement?

Tomasz Wesołowski 2040.io

Other sphere.it events

React.sphere.it is a conference focused on Reactive Programming and Reactive System Design. Now in its 2nd Edition, it’s a perfect opportunity to meet and share knowledge with experts in this field.

Visit react.sphere.it


Scala.sphere.it is a unique event devoted to a topic important to every Scala Software Developer – Dev Tools.

Visit scala.sphere.it


Already decided?

You can book your tickets today. The tickets also let you attend both sister events happening at the same time and venue (React.sphere.it & Scala.sphere.it).

Get tickets

Practical info

Sponsorship

Please take a look at Sponsorship Offer

Getting around

From the airport, take a taxi or train.
To plan public transport routes you can use jakdojade.pl

Main venue

The Opera of Kraków
Lubicz 48
31-512 Kraków

The afterparty

We invite you to Browar Lubicz

Day of practice

There will be several workshops, hackathons and training sessions on the 15th of April. More details soon.

We are powered by

Organizer

Virtus Lab

Platinum Sponsor

tesco

Partners

Ibm

Community Friends

MBN Solutions Networking IT w Krakowie Frankfurt Data Science Pykonik gdg krakow datakrk TDS

Media Partners

Cloudforum programistka.com No Fluff Jobs justjoin.it DataWorkshop Biznes Myśli


Would you like to become a sponsor?

Code of Conduct

The following Code of Conduct is inspired by that from other prominent conferences such as ScalaDays or Scala eXchange.

DataSphere is dedicated to providing a harassment-free experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, nationality, age or religion. We do not tolerate harassment of participants in any form. Please show respect for those around you. This applies to both in-person and online behavior.

All communication should be appropriate for a technical audience, including people of many different backgrounds. Sexual language, innuendo, and imagery are not appropriate for any conference venue, including talks.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of staff immediately. If an individual engages in harassing behaviour, the DataSphere staff may take any action they deem appropriate, including warning the offender or expulsion from the event.

We expect all attendees to follow the above rules during our DataSphere Conference.