Once or regularly?
Whether it is the sales report, the quarterly figures, the AI model, the market analysis or the performance dashboard, the same problem almost always occurs: the numbers have been found, the sources have been identified and the reports are actively used in the company. Yet every day, week, month or quarter, the stress returns when the report has to be updated.
This is where data engineers help: they automate the processing of data, make it more robust, keep the pipeline running and thus take work off the specialists' shoulders.
How I help as a data engineer
Pipelines
Various open-source tools have proven useful for building processing pipelines (ETL):
- Kafka
- Hadoop
- Hive
- Spark
- NiFi
- Airflow
- ...
I have already worked with many of these tools and bring a wide range of experience with me.
Operations
Operating a platform has various facets:
- Availability
- Cost
- Maintenance
- Security
Whether the platform runs in the cloud or on-premises, future day-to-day operation must be taken into account from the moment a system is set up.
I bring this experience with me.
Workflows
Data never stands alone. Its collection, processing and use must be documented. Users must be able to work with the data and trust it.
Talking, explaining, listening, helping, advising: in the end, this communication matters more than the nicest new tool.
Consulting
Maybe I don't yet know every tool perfectly. And I can probably no longer recite all the algorithms and data structures from my studies without looking them up.
However, I can bring a wealth of experience to your project and your team, and I know many strategies that others have already used successfully.
Is that a shortcut? Probably, yes. But is it wrong?