kloc and other foolish metrics

What makes a good metric?

I've talked about metrics before in the context of binary patching. But there's a simpler problem: code complexity. Random people have a tendency to ask the following question of any project with a seemingly large amount of code: how many KLOC (thousands of lines of code) does it have?

I'd argue that it's the wrong question.

Here are some types of files you'll find in Mozilla's CVS:
Ext | File type | Format | Data content
xbm | XBitMap files | C source | Each row in a picture gets a line in the xbm file.
rc | Resource scripts | C source | Approximately the same as xbms if you stuff bmps or icons into them.
css | Cascading Style Sheets | CSS | Whitespace isn't significant and you can put an entire css file onto a single line.
rdf | Resource Description Framework | XML | Whitespace isn't significant and you can put an entire rdf file onto a single line. You can also split that line at each whitespace marker so that each line has a single attribute or tag.
xul | XML User Interface Language | XML | Whitespace isn't significant and you can put an entire xul file onto a single line. You can also split that line at each whitespace marker so that each line has a single attribute or tag.
xml | Extensible Markup Language | XML | Whitespace isn't significant and you can put an entire xml file onto a single line. You can also split that line at each whitespace marker so that each line has a single attribute or tag.
html | HyperText Markup Language | SGML | Whitespace isn't significant and you can put an entire html file onto a single line. You can also split that line at each whitespace marker so that each line has a single attribute or tag. If you happen to split a <pre> section at whitespace markers, you will need to use some css to counteract the pre's magic. If you put a pre block on a single line and want it to have line breaks, you'll have to use <br> or something.
cpp | C++ | C source with single-line comments | If a C++ file has // comments, you have to take care not to merge the line that follows into the comment line when compressing the file to a minimal number of lines (although you might be able to use trigraphs). Otherwise it's just a C file.
c | C | C source | If a C file has preprocessor markings, you have to take care not to merge the line that follows into the preprocessor line.
- | Makefile | Makefile | Mozilla makefiles are generally composed of two things: (1) a very tall license block comment, and (2) long lists of files assigned to variables. These lists are laid out either horizontally or vertically, and you can of course convert from one to the other.
- | Binary pictures | Binary | Any instance of a line marker is purely random.
pl | Perl | Pathological | Whitespace isn't significant and you can put an entire perl file onto a single line. You can also split that line at each whitespace marker so that each line has a single token.
The odd thing about all of these formats isn't that they have lines (in fact, bitmaps don't have lines). Nor is it that they all take up bytes (everything takes up bytes, but the bytes spent on licenses mean nothing to programmers concerned with the learning curve of a programming project).
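
To make the point about whitespace concrete, here is a minimal Python sketch. The stylesheet in it is a made-up sample, not something from the Mozilla tree, and the helper names (count_lines, join_to_one_line, split_at_whitespace) are my own for illustration. It only assumes what the css row says: for a simple stylesheet like this one, collapsing or expanding whitespace doesn't change what the file means.

```python
# Hypothetical sample stylesheet, invented for this illustration.
CSS = """\
body {
  color: black;
  background: white;
}
a:hover {
  color: red;
}
"""


def count_lines(text):
    """Count lines the way a naive KLOC tool would."""
    return len(text.splitlines())


def join_to_one_line(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


def split_at_whitespace(text):
    """Put each whitespace-separated token on its own line."""
    return "\n".join(text.split())


one_line = join_to_one_line(CSS)
tall = split_at_whitespace(CSS)

# The three layouts carry exactly the same tokens in the same order.
assert CSS.split() == one_line.split() == tall.split()

for name, text in [
    ("original", CSS),
    ("one line", one_line),
    ("one token per line", tall),
]:
    print(f"{name}: {count_lines(text)} lines, {len(text.split())} tokens")
```

The token count stays the same across all three layouts while the line count swings from 1 to 12, so the "size" a line counter reports depends on who formatted the file, not on what is in it. Real CSS with quoted strings would need a more careful reflow than a plain whitespace split.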

So if counting lines isn't a good idea, and counting words will give you way too many comments and licenses, could there possibly be a better and more consistent measure?

Yes.