Programmers: Searching for a way to accelerate code exploration by mapping line number ranges to namespace/class/method names

TL;DR: I want a tool that can (on a per-line basis) generate a "semantic" context for use when I am grepping through code. For instance, I use ack, and before that I used grep -R. But these tools essentially only give me a filename, a usually-inscrutable line number, and a fixed amount of context that either blows up the output too much or is not enough. I think we can do better than this.

I don't want to start from scratch, so are there better tools than the ones I am already using?

Initially I was going to post this question on StackOverflow, but I realized that the question is really more about searching for existing tools (prior art) rather than getting into the specifics of implementation. If there is nothing I can use now, it will be the wrong decision for me to start building such a tool. However, if I am unable to stop that urge, I already know more or less what techniques I would apply. I'm not sure, but I think this means that people will vote to close if I ask on SO.

More background follows:

I have realized that over the years, and across many code bases, a few common workflows repeat themselves. One of these is the frequent situation where we need to search an entire codebase for a variable name.

I find that even if code is clean and well organized, there is still often quite some time consumed by scrolling around (listlessly... or frantically... or excitedly... doesn't matter, it always consumes time) getting my bearings -- I'm always dealing with this menial task of taking a file and a line number, and converting that into some simple mental representation that I can actually work with.

So today I had the idea of automating this job by seeing if I can change my tools to serve me better. A representation for the search results that actually has any hope of instantly helping me would be, say, the name of a function or class or some such, which contains these lines.

I'll use a PHP style example since that's the job I'm tackling at the moment, but the principles apply in the general case.

Suppose I want to find where the database field "hour_start" is referenced by the code. I might run `ack "hour[^A-Za-z]+start" --color | less` and start browsing through the code in the editor while referring to that output, doing a whole bunch of scrolling or entering line numbers to jump to, opening a ton of tabs, etc.

But what would be really sweet is to just have my search tool give me back the API method names that contain the references to my search pattern. Then I can run it with a few more lines of context and I might not even need to open the files! I'll learn right off the bat that the methods fetchTimeRecord() and deleteTimeRecord() in the recordController are the only methods that I care about, and that maybe fetchTimeRecord has 10 references while deleteTimeRecord has 2 references. Compare this with what essentially amounts to 12 random numbers and 12 lines of code taken entirely out of context (or 36 lines of code in just barely more context, etc.)
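To make the desired output concrete, here is a minimal Python sketch of the summarizing step only. It assumes the search tool prints `path:line:text` lines (the form `grep -n` produces), and it takes a caller-supplied `resolve_scope(path, line)` function. Both `summarize` and `resolve_scope` are hypothetical names for illustration; nothing here is an existing ack feature.

```python
import re
from collections import Counter

# Assumed input format: "path:line:text", as printed by grep -n.
# Paths that themselves contain a colon are not handled.
HIT_RE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+):(?P<text>.*)$")


def parse_hits(output):
    """Yield (path, line_number) pairs from grep-style output."""
    for raw in output.splitlines():
        m = HIT_RE.match(raw)
        if m:
            yield m.group("path"), int(m.group("line"))


def summarize(output, resolve_scope):
    """Count hits per enclosing scope.

    resolve_scope(path, line) is a hypothetical callback that returns
    the name of the method/class containing that line.
    """
    return Counter(resolve_scope(path, line) for path, line in parse_hits(output))
```

With a working resolver, the result would be a table along the lines of "fetchTimeRecord: 10, deleteTimeRecord: 2" instead of twelve bare line numbers.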

So, Exuberant Ctags is a good start because its purpose is somewhat well-aligned with mine. I can set it up easily to capably grab most of the relevant symbols in almost any piece of source code, because it supports a lot of languages. Since I use vim, I have a plugin (tagbar) that cleverly runs ctags automatically so that it is kept up to date with the code, and I can have it always visible to give me a good high-level overview of whatever file's open. That's all pretty neat. But tags has one huge Achilles' heel, which is that a given tag is only mapped to a single location (a line number), which presumably represents the location of the definition of the symbol. In fact this limitation is the one that leads to me not actually using tagbar much, because it isn't actually smart enough to tell which function my cursor is inside of. It tends to just highlight whatever random variable my cursor was last over, which is almost less than useless. I will admit that I'm not 100% sure how it works, because I do recall using it with C++ code and it did a fairly good job of figuring out which method I was in. Still, this doesn't seem terribly promising.
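To illustrate what is and isn't possible with start lines alone, here is a rough Python sketch of the obvious heuristic: pick the last definition that starts at or before the line in question. The `(start_line, name)` list is an assumed, already-extracted form of the tag data, not the actual tags file format, and the line numbers are made up.

```python
import bisect


def nearest_preceding_tag(tags, line):
    """Guess the enclosing definition from start lines only.

    tags: list of (start_line, name) tuples sorted by start_line.
    Returns the name of the last definition starting at or before
    `line`, or None if the line precedes every definition.
    """
    starts = [start for start, _ in tags]
    i = bisect.bisect_right(starts, line) - 1
    return tags[i][1] if i >= 0 else None


# Hypothetical start lines for two methods in one file.
tags = [(20, "fetchTimeRecord"), (62, "deleteTimeRecord")]
print(nearest_preceding_tag(tags, 45))
```

The weakness is that the guess cannot tell when a definition has ended: a line sitting between two functions, or in class-level code after the last method, is still attributed to the preceding function.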

What I need is the metadata that tells me the range of line numbers that are covered by these definitions, so that I can take my line number that ack or grep has found for me and fetch the name of the method/class/namespace/etc. that it lies within.
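Assuming that range metadata existed, the lookup itself would be simple. The following Python sketch uses a hypothetical `Scope` record and made-up line numbers; it returns every scope containing a line, outermost first, so nested class/method names can be joined into one label.

```python
from typing import NamedTuple


class Scope(NamedTuple):
    kind: str   # e.g. "namespace", "class", "function"
    name: str
    start: int  # first line of the definition (1-based, inclusive)
    end: int    # last line of the definition (inclusive)


def enclosing_scopes(scopes, line):
    """Return all scopes whose range contains `line`, outermost first."""
    hits = [s for s in scopes if s.start <= line <= s.end]
    return sorted(hits, key=lambda s: (s.start, -s.end))


def qualified_name(scopes, line):
    """Join the enclosing scope names, or report top-level code."""
    names = [s.name for s in enclosing_scopes(scopes, line)]
    return "::".join(names) or "<top level>"


# Hypothetical ranges for one file.
scopes = [
    Scope("class", "recordController", 10, 120),
    Scope("function", "fetchTimeRecord", 20, 60),
    Scope("function", "deleteTimeRecord", 62, 80),
]
print(qualified_name(scopes, 45))  # recordController::fetchTimeRecord
```

A linear scan per lookup is fine for a handful of definitions per file; an interval tree would only matter for very large files.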

One way I might approach this is to iteratively build more and more robust regexes that are targeted toward matching and calculating the line number ranges of a piece of code that certain language constructs "cover" (say, limit to namespaces, class definitions, methods/functions for now).

For a language like modern JavaScript (which I incidentally also do a lot of work with), regexes may fall well short of being sufficient to decipher this stuff, considering how invasive the closures are.

This is starting to sound really challenging, and maybe it really is just a fundamentally difficult problem to solve.

But I also think that my constraints are relaxed enough that I shouldn't actually require a full parse of the specific programming language. I should be able to use a regex to specify all the types of namespace, class, and function declarations that I care about, and then for most languages other than Python I can simply hunt for the corresponding closing brace.
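As a rough sketch of that regex-plus-brace-matching idea, here is some Python aimed at PHP-style code. The declaration regex and the `find_scopes` name are my own assumptions, and the approach knowingly ignores braces inside strings and comments, anonymous functions, and more than one declaration per line.

```python
import re

# Assumed declaration pattern: keyword followed by an identifier.
DECL_RE = re.compile(r"\b(namespace|class|function)\s+([A-Za-z_][A-Za-z0-9_\\]*)")


def find_scopes(source):
    """Return (kind, name, start_line, end_line) tuples.

    Each declaration is paired with the next opening brace, and that
    brace with its matching closing brace via a stack.
    """
    scopes = []
    stack = []      # one entry per open brace: a declaration tuple or None
    pending = None  # declaration seen, still waiting for its "{"
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = DECL_RE.search(line)
        decl_at = m.start() if m else None
        for pos, ch in enumerate(line):
            if pos == decl_at:
                pending = (m.group(1), m.group(2), lineno)
            if ch == "{":
                stack.append(pending)  # None for if/for/while blocks
                pending = None
            elif ch == "}" and stack:
                opened = stack.pop()
                if opened is not None:
                    scopes.append((*opened, lineno))
            elif ch == ";":
                pending = None  # e.g. "namespace Foo;" or an abstract method
    return scopes
```

The resulting tuples are exactly the line ranges needed to turn a grep hit's line number into a class or method name. Python and other indentation-based languages would need a different end-of-block rule.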

Tuesday, 24 March 2015
