ryanwhocodes · Mar 27, 2018

Mind Your Repo's Programming Language on GitHub

How to use GitHub Linguist and .gitattributes to get your app’s language detected accurately.

Some GitHub projects appear to be written in a different language than they really are. GitHub detects and displays the main programming language of each repository using its Linguist library, but the language it reports is not always the one that holds your project's core logic. This tutorial shows you how to take control of it.


GitHub Linguist

Linguist estimates a project's language breakdown by calculating the percentage of bytes of code written in each language, then selects the language with the highest share as the repository's main language. A more in-depth explanation of this calculation can be found in the Linguist README.

As a consequence, the reported language can sometimes be dictated by imported libraries, or by files that support the app but do not contain its most significant logic.
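To make the byte-share idea concrete, here is a minimal Python sketch. It is not Linguist's actual implementation (Linguist is a Ruby library that uses far more signals than file extensions); the extension table, file names and sizes are all hypothetical, chosen to show how one large stylesheet can outweigh the JavaScript that contains an app's logic.

```python
from collections import Counter

# Hypothetical, heavily simplified extension-to-language table.
# Real Linguist detection uses many more signals than the extension.
EXTENSIONS = {".js": "JavaScript", ".css": "CSS", ".html": "HTML"}


def language_shares(file_sizes):
    """Return {language: percentage of recognised bytes}."""
    totals = Counter()
    for path, size in file_sizes.items():
        for ext, language in EXTENSIONS.items():
            if path.endswith(ext):
                totals[language] += size
                break
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {lang: 100 * size / grand_total for lang, size in totals.items()}


def primary_language(file_sizes):
    """Pick the language with the largest byte share."""
    shares = language_shares(file_sizes)
    return max(shares, key=shares.get)


# Made-up sizes in bytes: a bundled CSS framework dwarfs the app's own logic.
repo = {
    "index.html": 2_000,
    "css/framework.min.css": 150_000,
    "js/app.js": 40_000,
    "js/qr.js": 30_000,
}

print(primary_language(repo))  # CSS
```

Even though the two JavaScript files hold the app's behaviour, the single stylesheet accounts for roughly two thirds of the bytes, so it wins.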

For example, GitHub Linguist initially reported my project qr-code-pwa, a JavaScript Progressive Web App that generates QR codes, as a CSS project.

A GitHub project identified as using CSS.

If the reported language is not what you expect, you can override Linguist's detection with a .gitattributes file. In short:

  • Create a .gitattributes file in the root of your project.
  • If you are building a web app whose key logic is written in JavaScript rather than HTML or CSS, you could add the following lines to it:

      * linguist-vendored
      *.js linguist-vendored=false

    The first line marks every file as vendored; the second un-vendors JavaScript files, so only they count toward the language statistics.
  • Commit the file and push it to your repo.
  • You should now see the updated language on your repository page.

A GitHub project identified as using JavaScript.

How gitattributes works

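Linguist treats vendored code as libraries or imported files that you didn’t write, and leaves it out of the language statistics. Setting linguist-vendored=false on files in your .gitattributes file tells Linguist that they are not imported library files and that it should include them in its stats. When several patterns match the same file, a later line overrides an earlier one, which is why the *.js rule in the example above has to come after the * rule.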

Use the linguist-vendored attribute to vendor or un-vendor paths.

$ cat .gitattributes
special-vendored-path/* linguist-vendored
jquery.js linguist-vendored=false
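The "later line wins" behaviour can be modelled with a short Python sketch. This is an approximation, not Git's or Linguist's code: it only handles slash-free patterns (which Git matches against the file name at any directory level), uses fnmatchcase as a stand-in for Git's own glob matching, and the file paths are hypothetical. The rules are the two lines from the qr-code-pwa example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The two rules from the qr-code-pwa example, in file order.
# True means "linguist-vendored", False means "linguist-vendored=false".
RULES = [
    ("*", True),
    ("*.js", False),
]


def is_vendored(path, rules=RULES):
    """Resolve linguist-vendored for one path; the last matching rule wins.

    Simplification: only slash-free patterns are supported, and they are
    matched against the file's basename. Returns None when no rule matches,
    in which case Linguist would fall back to its built-in vendor detection.
    """
    name = PurePosixPath(path).name
    vendored = None
    for pattern, value in rules:
        if fnmatchcase(name, pattern):
            vendored = value  # a later match overrides an earlier one
    return vendored


for path in ["index.html", "css/framework.min.css", "js/app.js"]:
    print(path, is_vendored(path))
```

Swapping the order of the two rules would make * the last match for every file, so the JavaScript would be vendored too and nothing would be counted.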

You can also use the linguist-language attribute to override the language Linguist assigns to matching files. For example, this line counts Ruby files as Java:

*.rb linguist-language=Java

The linguist-documentation attribute marks files as documentation, which Linguist excludes from the language statistics:

documented_code.rb linguist-documentation=true
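These two attributes change the byte counts in different ways: linguist-language moves a file's bytes to another language, while linguist-documentation=true drops them from the totals. The Python sketch below models both, assuming a hypothetical extension table and made-up file sizes; it mirrors the two .gitattributes lines above but is not how Linguist is actually implemented.

```python
from collections import Counter
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical extension table; real Linguist detection is far richer.
EXTENSIONS = {".rb": "Ruby", ".java": "Java"}

# Overrides modelled on the .gitattributes lines above.
LANGUAGE_OVERRIDES = [("*.rb", "Java")]  # *.rb linguist-language=Java
DOCUMENTATION = ["documented_code.rb"]   # linguist-documentation=true


def matches(path, pattern):
    """Match a slash-free pattern against the file's basename."""
    return fnmatchcase(PurePosixPath(path).name, pattern)


def language_bytes(file_sizes):
    """Sum bytes per language after applying the overrides."""
    totals = Counter()
    for path, size in file_sizes.items():
        if any(matches(path, p) for p in DOCUMENTATION):
            continue  # documentation is left out of the statistics
        language = EXTENSIONS.get(PurePosixPath(path).suffix)
        for pattern, override in LANGUAGE_OVERRIDES:
            if matches(path, pattern):
                language = override  # later matching rules win
        if language is not None:
            totals[language] += size
    return totals


repo = {"lib/app.rb": 5_000, "documented_code.rb": 20_000, "src/Main.java": 1_000}
print(dict(language_bytes(repo)))  # {'Java': 6000}
```

The large documented_code.rb file disappears from the totals, and lib/app.rb is counted as Java, so no Ruby is reported at all.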

Now your projects should display the correct programming language on GitHub.

Find out more

Learning about GitHub Linguist can help ensure your projects are identified with the programming language you intend. Check out the Linguist documentation and Git's gitattributes documentation to learn more about working with .gitattributes files.

Header image source: outerplaces.com.