Research Software Engineer

Natural Language Processing (NLP) with a focus on Information Retrieval and Machine Learning methods.

Carsten Schnober

Featured Projects

100 Queries

Investigating what content Dutch children find on the internet through web search analysis

Explore →

The Syllabus

Natural Language Processing solutions for information curation and the defense of the public sphere

Explore →

Research R&D

Applied research projects in AI, historical text recognition, and sustainability

Explore →

Open Science & Open Source

Committed to FAIR data principles and sustainable software design at the Netherlands eScience Center

Netherlands eScience Center

Practical Innovation

Turning state-of-the-art research into practical applications grounded in real-world requirements

Community Teaching

Certified Carpentries Instructor spreading digital skills across the scientific community

Let's Work Together

Interested in consulting, software implementation, or collaboration?

Get In Touch

Technology & Software

Over the past 14 years, I have worked for academic and commercial organizations, building specialized, curated search engines and discovery platforms for educational content and historical texts, as well as for researchers, journalists, and job seekers.

I turn state-of-the-art technologies in fields like Natural Language Processing (NLP), Machine Learning, and Deep Learning into practical applications with a strong emphasis on the human perspective and scientific methodology.

🔍 Information Retrieval & Search

Keyword Search: Which documents are most relevant for a search term?

Semantic Search: Which documents cover the most similar topics?

Reranking: What is the best order of the results?
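To illustrate the keyword-search question above, here is a minimal sketch of Okapi BM25 scoring, the ranking function that Elasticsearch and Solr use by default. It assumes that documents and queries are already tokenized on whitespace; the function name `bm25_scores` and the toy documents are hypothetical, and `k1=1.2`, `b=0.75` are set to the values Lucene uses by default rather than tuned for any collection.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each pre-tokenized document against a pre-tokenized query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: the number of documents each term occurs in
    df = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # query terms absent from the document contribute nothing
            # IDF with +1 inside the log so that very common terms never score negatively
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturate term frequency (k1) and normalize by document length (b)
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avg_len))
        scores.append(score)
    return scores


# Hypothetical toy collection and query
documents = [
    "dinosaurs lived millions of years ago".split(),
    "how to draw a dinosaur step by step".split(),
    "weather forecast for today".split(),
]
query = "draw dinosaur".split()

scores = bm25_scores(query, documents)
# Stable sort: documents with equal scores keep their original order
ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0, 2]
```

The first document scores zero because the sketch does no stemming, so "dinosaurs" never matches "dinosaur"; production search engines address this with language-specific analyzers.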

🧠 Language Modelling & Generation

Encoding: Machine-readable vector representation using BERT and similar models

Generation: Creating human-readable texts and summaries

Weaviate

RAG: Retrieval-augmented generation for knowledge-grounded responses
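The retrieval half of RAG can be sketched in a few lines: rank passages by cosine similarity to the query vector, then place the best ones in the prompt for a generator. This is an illustrative sketch under stated assumptions: the three-dimensional vectors are hypothetical stand-ins for encoder output (for example from a BERT-style model), the helper names `retrieve` and `build_prompt` are made up for the example, and the actual call to a language model is omitted. In a deployed system, a vector database such as Weaviate would typically handle the similarity search.

```python
import numpy as np


def retrieve(query_vec, passage_vecs, k=2):
    """Return the indices of the k passages with the highest cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    similarities = p @ q
    return np.argsort(-similarities)[:k].tolist()


def build_prompt(question, passages):
    """Ground the question in the retrieved passages before it is sent to a generator."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


passages = [
    "Topic models discover thematic clusters in document collections.",
    "BM25 ranks documents by weighted keyword overlap.",
    "Spark distributes data processing across a cluster.",
]
# Hypothetical toy embeddings; a real system would obtain these from an encoder model
passage_vecs = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]])
query_vec = np.array([0.8, 0.2, 0.1])

top = retrieve(query_vec, passage_vecs, k=2)
prompt = build_prompt("What do topic models do?", [passages[i] for i in top])
print(prompt)
```

Restricting the generator to retrieved context is what makes the response "knowledge-grounded": the answer can be traced back to specific passages.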

🏷️ Classification & Labeling

Document Classification: Subject, age category, content type

Spam Detection: Identifying unwanted content

Named Entity Extraction: Persons, places, organizations
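A common baseline for document classification is a TF-IDF representation fed into a linear classifier. The sketch below uses scikit-learn and assumes subject labels with six hypothetical training texts; real categories such as age group or content type would require a properly annotated corpus, and this is not meant to describe any specific production system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: short texts labelled by school subject
texts = [
    "addition subtraction exercises",
    "multiplication tables practice",
    "fractions practice worksheet",
    "frog life cycle",
    "animals plants habitats",
    "plants grow sunlight",
]
labels = ["math", "math", "math", "biology", "biology", "biology"]

# TF-IDF turns each text into a sparse term-weight vector;
# logistic regression then learns a linear decision boundary over those weights
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

predictions = classifier.predict(["frog habitats", "fractions multiplication"])
print(predictions)
```

The same pipeline shape extends to spam detection by swapping the labels; only the annotated data changes.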

📊 Topic Modelling & Clustering

Unsupervised Learning: Discovering patterns in data

Topic Discovery: Finding thematic clusters in document collections

Dimensionality Reduction: Making data interpretable

Programming Languages & Frameworks

Python

Primary language for research and data science, with deep expertise in the scientific computing ecosystem.

NumPy Pandas Scikit-Learn

Machine Learning & Deep Learning

Building and deploying modern neural networks and ML models.

PyTorch Hugging Face TensorFlow

Java & JVM Languages

Enterprise software development and backend systems.

Java Scala Spring

Search & Data Technologies

Building scalable search infrastructure and data pipelines.

Elasticsearch Solr Spark

DevOps & Cloud

Containerization, orchestration, and cloud infrastructure.

Kubernetes Docker AWS

CI/CD & Agile

Modern development practices and continuous integration.

Git CI/CD Agile

Key Principles

Grounded Evaluation

Identifying the best solution for a specific context through careful measurement and qualitative analysis.

Data Curation

Manual annotation and qualitative evaluation ensure quality beyond what generic benchmarks can measure.

Efficient Resources

Building sustainable solutions that respect computational constraints and environmental impact.

User-Centered Design

Understanding the purpose and requirements of users instead of providing generic solutions.

"A technological solution needs to fully understand the purpose and specific requirements of the users, not to provide a generic solution based on a benchmark."

Interests & Expertise Areas

Software Engineering Open Source Open Data Open Science Natural Language Processing Information Retrieval Search Engines Machine Learning Deep Learning Artificial Intelligence Data Engineering Data Analysis DevOps Agile Development

Projects & Journalism

Notable Projects

100 Queries

In the context of SlimZoeken, we used web search as a lens to investigate what content Dutch children find on the internet.

We selected one hundred queries from authentic searches by primary school pupils and manually annotated the search results to evaluate quality and suitability.
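Binary suitability judgments of this kind are often summarized as precision at k, the share of the top k results that annotators marked as suitable. The sketch below assumes one list of 0/1 judgments per query in rank order; the query names and judgments are hypothetical, the function name `precision_at_k` is made up for the example, and the 100 Queries study may well have used different or richer measures.

```python
def precision_at_k(judgments, k):
    """Share of the top-k results judged suitable (1) rather than unsuitable (0)."""
    # Dividing by k means that missing results count as unsuitable
    return sum(judgments[:k]) / k


# Hypothetical annotations: binary suitability judgments per query, in rank order
annotations = {
    "query A": [1, 1, 0, 1, 0],
    "query B": [1, 0, 0, 0, 1],
}

k = 5
per_query = {query: precision_at_k(judged, k) for query, judged in annotations.items()}
mean_precision = sum(per_query.values()) / len(per_query)
print(per_query)  # {'query A': 0.6, 'query B': 0.4}
print(mean_precision)
```

Averaging per query, rather than pooling all results, gives every query the same weight regardless of how many results it returned.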

Outputs:

The Syllabus

The Syllabus is a non-profit knowledge curation platform committed to defending and strengthening a well-informed public sphere. They unearth, disseminate, and highlight high-quality information without deepening public dependence on opaque algorithmic solutions.

The Syllabus is associated with the Center for the Advancement of Infrastructural Imagination (CAII). I have been involved since the early stages, designing and implementing Natural Language Processing (NLP) solutions that semi-automatically support the curation process.

Visit The Syllabus →

Research & Development

As a Research Software Engineer at the Netherlands eScience Center, I have been involved in:

  • Leveraging AI for HTR Post-Correction: Improving historical text recognition
  • The Semantics of Sustainability: Understanding sustainability discourse
  • Impact & Fiction: Exploring the impact of fiction on research

Mentoring & Fellowship

As part of the Netherlands eScience Center's fellowship programme, I have mentored:

  • 4CAT: A research tool for analyzing and processing data from online social platforms
  • Inseq: A PyTorch-based toolkit for studying interpretability in sequence generation models

Journalism & Publishing

For more than two decades, I have been an author and editor writing about Open Source Software, Large Language Models, Artificial Intelligence, and other technology-related topics.

Golem

German technology news and analysis

Visit Golem →

iX Magazine

Enterprise technology and development

My Articles →

Linux Magazine (English)

Open source and Linux software

My Articles →

Linux-Magazin (German)

German Linux and open source publication

My Articles →

LinuxUser & EasyLinux

User-friendly Linux and technology guides

My Articles →

Jungle World

Politics, culture, and society (in German)

My Articles →

Workshops & Teaching

I have developed and taught courses for staff at various academic institutions in the Netherlands. As a certified Carpentries Instructor, I focus on practical, hands-on learning in:

  • Deep Learning and Neural Networks
  • Machine Learning with Python
  • Research Software Development
  • Version Control and Collaborative Development
  • Best Practices in Scientific Computing

Available Courses

Introduction to Deep Learning

Foundational course covering neural networks, architectures, and practical applications.

Course Materials →

Intermediate Research Software Development

Advanced Python development for research, including testing, documentation, and best practices.

Course Materials →

Good Practices in Research Software Development

Comprehensive guide to sustainable and effective research software practices.

Course Materials →

Machine Learning in Python with Scikit-Learn

Practical machine learning applications using scikit-learn and modern data science tools.

Course Materials →

Collaborative Version Control with Git and GitHub

Master Git workflows and GitHub for effective team collaboration.

Course Materials →

Customized Corporate Workshops

Tailored training programs designed for your specific organizational needs and technology stack.

Contact for details

Institutions & Partnerships

I have developed and taught customized courses and workshops for:

PhD Summer Schools

ODISSEI

CAA International

Data Center Programs

Data Center Apprenticeship at UCR

Corporate & Research

TU Eindhoven

TNO

Erasmus MC

Certification

I am a certified Carpentries Instructor, committed to teaching foundational computing skills and best practices in research software development.

The Carpentries community teaches foundational coding and data science skills to researchers worldwide through collaborative, evidence-based teaching practices.

Workshop Topics

Deep Learning Machine Learning Python Programming Research Software Git & GitHub DevOps Best Practices Agile Development Scikit-Learn Data Science

About Me

I am a research software engineer with an academic background in Computational Linguistics, Natural Language Processing (NLP), and Machine Learning.

I develop research software at the Netherlands eScience Center, in cooperation with Dutch research institutions. I am passionate about:

  • Building infrastructural tools for Information Retrieval and Language Technology
  • Understanding how technology, society, and education shape one another
  • Open Science and Open Source principles
  • FAIR and sustainable software design
  • Teaching and mentoring in digital skills

I am also involved with the Center for the Advancement of Infrastructural Imagination (CAII), The Syllabus, SlimZoeken, and various print and online publications.

Research & Engineering Experience

I have worked as a research software engineer in academic institutions and industry, bringing state-of-the-art technology from Natural Language Processing (NLP), Machine Learning, and Deep Learning to practical use.

NLP & Research Focus

Language Models & AI

Working with Large Language Models and modern NLP techniques for practical applications

Information Retrieval & Search

Building semantic search, RAG systems, and discovery platforms

Text Analysis

Topic modelling, text classification, and interdisciplinary research

Named Entity Recognition

Extracting persons, places, organizations, and domain-specific entities

Engineering & DevOps

Cloud & Containerization

Kubernetes, Docker, microservices architecture, and AWS

Programming Languages

Python, Java, Scala, and modern development practices

Distributed Computing

Spark, Hadoop, and large-scale data processing

DevOps & MLOps

CI/CD pipelines, monitoring, and sustainable software practices

Education

MSc Speech & Language Processing

University of Edinburgh

This programme covers all areas of speech and language processing, from the foundations of phonetics and speech technology to natural language understanding and the latest methods using large generative speech and language models.

BA Computational Linguistics

University of Heidelberg (Applied Computer Science)

Computational linguistics investigates how human language can be processed and interpreted by computers. It explores the mathematical and logical properties of natural language and develops algorithmic and statistical methods for automatic language processing.

Additional Activities

Teaching & Workshops

Member of the teaching team at the Netherlands eScience Center, organizing workshops on Machine Learning, Deep Learning, Python, and Research Software Development

Project Review & Consulting

Reviewing project proposals, advising applicants, and contributing expertise to new research initiatives

Proposal Writing

Forming consortia and co-authoring proposals for research funding and collaborations

Scientific Publishing

Publishing research findings and insights in academic and professional publications

Mentoring

Participating in mentorship programmes, including the eScience Center Fellowship Programme

Journalism

Contributing articles on Open Source Software, AI, and technology culture to various publications

Core Values

Open Science & Sustainability

Committed to FAIR data principles and to designing software in a sustainable, environmentally conscious way.

Human-Centered Technology

Understanding user needs and societal impact, not just implementing generic solutions.

Knowledge Sharing

Spreading digital skills and fostering a community of learning and collaboration.