Back in the days before I worked for Tabnine, I was tasked with a variety of ML problems, from industrial product defect review and CCAI solutions to fraud detection. Nearly all of these tasks used some version of deep learning via neural networks, at various levels of complexity. As any reader versed in the current state of machine learning will be aware, natural language processing is where the most difficult, and simultaneously most intriguing, problems lie. It has even become a bit of an arms race to see who can train the most cutting-edge NL model. From BERT to XLNet to GPT-3, and numerous derivations of each, natural language models are the definition of large, computationally intensive “artificial intelligence”.
So where does a programming language fit in the machine learning space? Programming languages are “languages”; it says so right there in the name. But are they really in the same problem space as actual spoken or written languages? The phrase “Sam was found by the stream” is famously ambiguous without surrounding context in which to place the various actors. Of course, we as humans know that a stream can’t really “find” anyone, but that’s only because we are each in possession of a fantastically well-trained neural network of gray matter between our ears. For an NL algorithm, learning ambiguity, cultural turns of phrase, or contextual summarization is an incredibly difficult problem.
However, programming languages follow much stricter rules. They have to if we hope to have our code compile and run; ambiguity is not an option. In the bias/variance tradeoff, we can err on the side of overfitting without serious deleterious consequences. There are, of course, multiple ways to write any given function in a programming language that will accomplish a defined task, as the sketch below illustrates. So if we are to use an AI code completion tool like Tabnine, the model running on that code base can be quite specialized.
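To make that concrete, here is a minimal Python sketch (my own illustration, not anything drawn from Tabnine's training data) of three equivalent ways to accomplish the same defined task. Each one compiles and runs with zero ambiguity; the interesting question for a code model is which idiom a given team actually prefers:

```python
# Three unambiguous, equivalent ways to sum the squares of a list of numbers.
# A code model doesn't have to resolve ambiguity here; it only has to learn
# which of these idioms a particular codebase tends to use.

def sum_squares_loop(numbers):
    total = 0
    for n in numbers:
        total += n * n
    return total

def sum_squares_comprehension(numbers):
    return sum(n * n for n in numbers)

def sum_squares_functional(numbers):
    return sum(map(lambda n: n * n, numbers))

assert (
    sum_squares_loop([1, 2, 3])
    == sum_squares_comprehension([1, 2, 3])
    == sum_squares_functional([1, 2, 3])
    == 14
)
```

A specialized model trained on a team's own code can lean toward whichever of these styles that team has standardized on, which is exactly the kind of focus a giant general-purpose model lacks.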
And this brings us to the question of parameter counts. Many of the NL models tout the number of parameters used to build the underlying model, and the leading models are now well into the hundreds of billions. But does this necessarily mean that using one of these models as the basis for programming language transfer learning will give us better suggestions in the IDE? The answer is quite clearly no, it does not. Training a specialized model for a particular task, using specific gold-standard code and libraries, will give the developer some really amazing in-line suggestions, full code snippets, and contextual awareness.
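For a rough sense of what “hundreds of billions of parameters” costs in practice, here is a back-of-the-envelope sketch. GPT-3's 175 billion parameters is a published figure; the bytes-per-parameter value is just the standard size of half-precision floats, and the 350M “specialized model” is a hypothetical point of comparison:

```python
# Back-of-the-envelope memory footprint of model weights alone,
# ignoring activations, optimizer state, and serving overhead.

def weights_gb(num_parameters, bytes_per_param=2):  # 2 bytes per param ~ fp16
    return num_parameters * bytes_per_param / 1e9

# GPT-3 is publicly documented at 175 billion parameters.
print(f"175B-param model, fp16: ~{weights_gb(175e9):.0f} GB of weights")

# A hypothetical specialized code model, orders of magnitude smaller:
print(f"350M-param model, fp16: ~{weights_gb(350e6):.1f} GB of weights")
```

The giant model needs roughly 350 GB just to hold its weights, while the small, focused one fits comfortably on a single commodity GPU. That gap is part of why a specialized model can be both cheaper to run and better at a narrow task.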
So beware of AI-based code tools that purport to have XXXM more parameters than some other tool. More parameters do not mean the tool will work better for a specific programming task. As in life, so goes AI-assisted pair programming: balance is key. A specific model trained for the front-end devs in your company will likely show significant ROI if it is well considered and well deployed. But you would not want to hand that model over to the data science group; it would be the equivalent of two left shoes.
A properly trained base model for your organization’s developer teams is ideal. Tabnine’s team learning model automatically learns from the group, constantly improving and focusing the model on what the team is developing. Metrics surrounding the team’s usage and the model’s effectiveness should be tracked, and changes made if those metrics begin to slip. Like any properly managed MLOps process, AI pair programming is a powerful tool when done right.
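What might such tracking look like? Here is a minimal, hypothetical sketch: the acceptance-rate metric, the weekly counts, and the alert threshold are all my own illustrative choices, not part of Tabnine's product:

```python
# Hypothetical MLOps-style check: watch suggestion acceptance rate
# week over week and flag when the model's usefulness starts to slip.

def acceptance_rate(accepted, shown):
    return accepted / shown if shown else 0.0

# Illustrative weekly counts of (accepted suggestions, shown suggestions).
weekly_counts = [(420, 1000), (431, 1010), (398, 990), (310, 1005)]

rates = [acceptance_rate(a, s) for a, s in weekly_counts]
baseline = rates[0]

for week, rate in enumerate(rates, start=1):
    slipped = rate < 0.9 * baseline  # alert on a >10% relative drop
    status = "INVESTIGATE" if slipped else "ok"
    print(f"week {week}: acceptance {rate:.1%} [{status}]")
```

Whatever the actual metric, the point is the same: treat the model like any other production system, with a baseline, a threshold, and a plan for when the numbers drift.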
In conclusion, while an NL model’s parameter count can tell us something about its applicability to our everyday communication, it isn’t the right metric for judging effectiveness when applied to programming language models. Choose wisely, keep vigilant about your organization’s model library, and the returns will be significant.