Get Involved - Tech For Nepal

We are completely open source, and all we want to do is contribute to our country in the way we believe we can best: through our skills and knowledge in tech and research. If you share these sentiments and a deep love for our country, we’d love to hear from you, whether you’re interested in contributing, have questions, or want to get involved in our community.

We now also have a Discord server. For any inquiries, suggestions, or general feedback, you can join us there or reach out via email at namaste@techfornepal.com.

Contributing via Writing

If you want to send us your writing for the blog, you can do one of the following:

Read our CONTRIBUTING.md file on our GitHub repository and follow the instructions.
Send us your writing via email (along with your name, email, and your photo if you’re comfortable with that). We can do the uploading for you. The blog posts will look something like this.

Contributing via Tech & Research

At the moment, I’m doing this alone, but I believe we have some exceedingly difficult but (what we consider) important projects to work on. I’m focused on AI Research and Development, and this is me showing my dedication to our country in the way I think I can be most useful.

Here are the research directions we’re actively exploring:

Tokenization: Most tokenizers fragment Nepali text into far more tokens than English (sometimes 2-3x more), which inflates cost and shrinks effective context. We want to train and benchmark a dedicated Nepali tokenizer and compare it against existing approaches.
Data curation and corpus engineering: Nepali is a low-resource language, and the data we do have is heavily skewed toward news. We need to assemble a deduplicated, high-quality corpus, audit it for domain coverage, and figure out the right mix of native Nepali, Hindi, and English data for training.
Translationese detection: A lot of available Nepali text is translated from English, and translated text has systematic quirks (unnatural word order, overuse of pronouns, calques) that can degrade model quality. Building tools to detect and manage translationese in training data is an open problem.
Synthetic data generation: Carefully generated synthetic data can help fill gaps in domains where native Nepali text is scarce. The challenge is doing this without degrading naturalness, grounding generation in Nepali sources rather than translating from English, and maintaining a healthy mix of real and synthetic data.
Evaluation and benchmarks: Nepali lacks a unified benchmark for language generation. We want to build evaluation tools like instruction-following benchmarks, cultural adequacy tests for register and honorifics, and bilingual retention suites that check whether a model still works well in English after learning Nepali.
Nepali language model development: The research directions above all feed into what we consider one of the primary goals, which is building our own series of open-source Nepali language models. Whether that’s extending an existing model with Nepali vocabulary and continued pretraining, or training a smaller model from scratch on a carefully curated Nepali-Hindi-English corpus, we want to get there. There are tons of exciting directions to take.
Endangered language preservation: Languages in our country are dying out, and that makes me very sad. Creating datasets for endangered local languages is even harder than for Nepali, but it’s something we believe we must do. For this, I’ve already found some fellow researchers at Sabdakunja too; do check them out!
Accessible research writing: We read papers on machine learning and its applications in our country, then write about them in a way that the general public can easily understand.
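To make the tokenization point above concrete, here is a minimal sketch of how token inflation could be measured as "fertility" (average tokens per whitespace-separated word). Everything in it is an assumption for illustration: `byte_tokenize` is a stand-in that emits one token per UTF-8 byte (roughly what a byte-level fallback does for unseen text), not any tokenizer we have trained or benchmarked, and the two sample sentences are hypothetical. A real comparison would pass an actual tokenizer's encode function and a proper parallel corpus.

```python
from typing import Callable, Iterable, List


def byte_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte."""
    return list(text.encode("utf-8"))


def fertility(texts: Iterable[str], tokenize: Callable[[str], List[int]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    total_tokens = 0
    total_words = 0
    for text in texts:
        total_tokens += len(tokenize(text))
        total_words += len(text.split())
    if total_words == 0:
        raise ValueError("no words found in the input texts")
    return total_tokens / total_words


# Hypothetical sample sentences, for illustration only.
nepali_sample = ["म नेपाली हुँ"]
english_sample = ["I am Nepali"]

nepali_fertility = fertility(nepali_sample, byte_tokenize)
english_fertility = fertility(english_sample, byte_tokenize)

# Ratio > 1 means the Nepali text costs more tokens per word.
ratio = nepali_fertility / english_fertility
print(f"Nepali: {nepali_fertility:.2f} tokens/word")
print(f"English: {english_fertility:.2f} tokens/word")
print(f"Ratio: {ratio:.2f}x")
```

Because `fertility` takes the tokenizer as a plain function, the same measurement can be repeated for each candidate tokenizer and the ratios compared side by side. Whitespace splitting is a rough word boundary for Nepali, so treat the numbers as a relative signal rather than an absolute one.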

We might also need some core members who are good at development, design, and so on. If you think you have something to contribute on the tech side, please open a discussion on our GitHub or join the Discord server.

Grants & Computational Resources

We need computational resources to do this work. If you can help with that, please reach out to us via email at namaste@techfornepal.com.

If you want to get in touch with me personally, you can do so at sumit@sumit.ml.