It’s our job to refine and extract as much value as we can from what we get from our customers
<HeurekaDevs People> How does machine learning influence e‑commerce and online shopping? Meet Miha Jenko, an experienced machine learning engineer who will tell us more about his team at Ceneje, their challenges, and how to keep up with trends in this area.
What was your journey to becoming a machine learning (ML) engineer?
I started out as a programmer with an interest in fitting curves to multidimensional data points. Many places define a machine learning engineer as a full-stack developer with a specialization in machine learning. To be effective, you basically need to know a bit of everything: programming, data science, distributed computation, system operations, and architecture design. It has taken me years, at three different companies, to learn the tools of the trade. Our stack is ridiculous and challenging, but the upside is that the job is never boring.
Can you describe or give us examples of how machine learning influences online shopping and what it could bring in the future?
These days, machine learning is all over e‑commerce. You might start your product search on Google. How does their autocorrect work? Why can we ask questions and get results back in the form of an answer? You might visit Heureka.cz and realize after searching for a keyword that you have been redirected to a category page instead of getting mixed results. Personalized recommendations of goods cannot work without sophisticated machine learning algorithms powering them.
What is your role at Ceneje?
At Ceneje, data quality is our highest priority. I’m the lead developer of a system for automatically sorting products into our category trees. It saves our content people time and the frustration of dealing with low-quality data.
One Matching is a newly established team. What are its main areas of expertise, and what is its vision?
I'm part of two teams. The main one, the One Matching team, is responsible for creating an end-to-end machine learning pipeline for product category classification, product matching, and product attribute extraction tasks. On top of this core service, we expect to develop a series of tools for content people, automating away the mundane and repetitive tasks. Our goal is both to increase data quality and to reduce the labour costs associated with matching our users with the best offers for the products they are shopping for.
What are the challenges your team is dealing with now?
The main challenge with any machine learning problem is the initial quality of data. Heureka Group is swimming in data of varying degrees of quality. It’s our job to refine and extract as much value as we can from what we get from our customers. Metadata often disappoint us, so we have to get creative. One unique challenge is matching product text snippets across different languages. Imagine having a product description in Czech or Hungarian. Do we have to find Czech-Hungarian experts to translate our data? Do we translate everything to intermediary English first? Those approaches are simply not scalable. Fortunately, recent developments in language technologies offer some solutions, but we need to make them work for us.
Machine learning is a fast-growing field. How do you keep up with trends? Anything you’d like to recommend to other ML engineers?
The easiest way to keep up with trends is to follow relevant people on Twitter. The latest research gets posted on arXiv and is then signal-boosted by Twitter users. Honestly, Twitter is a great content aggregator. YouTube is another one, and I’d argue a bit more accessible too. I also find it valuable to join Slack or Discord communities covering open-source tools. Every so often I skim the threads for interesting use cases. It gives me the opportunity to read about how others in the field solve their problems and to ask them questions.