We will present and discuss various publicly available data sources that could be used to train machine learning algorithms, as well as the minimum requirements a dataset must meet to be useful for training.
We will have three speakers:

  • Steffen Brandt: data for natural language processing
  • Matthias Nannt: data for computer vision
  • Thies Schönfeldt: the Kiel Data Hub and open data in Kiel

Depending on the audience, the talks will be given in English or German.

At this meetup, we would also like to discuss additional activities to foster collaboration among Kiel.AI Meetup members, for example written feedback from the Kiel.AI group on possible improvements to data platforms such as the newly launched open data platform of Schleswig-Holstein.

After the presentations, as always, we look forward to having a drink together and continuing the discussion!

To help us estimate the number of participants, please register for the event on Meetup.

Background
Companies such as Google can develop powerful AI software thanks to their large data collections, which is also why they rarely make their training data publicly available. However, many datasets are publicly available, both on platforms such as Kaggle and on sites dedicated to specific purposes, such as the MNIST dataset for recognizing handwritten digits or Fashion-MNIST for recognizing different types of clothing.