Code Narrator - an inverse literate programming tool.
Code Narrator (Codnar) is an inverse literate programming tool. It splits the source files into “chunks” (including structured comments) and weaves them back into a narrative that describes the overall system.
A simple gem install codnar should do the trick, assuming you have Ruby gems set up. If you want to use the VIM-based syntax highlighting, you also need to install gvim. Similarly, you need to install GraphViz to be able to embed SVG diagrams in your HTML.
The basic usage is:
codnar-split [options] source-file > chunks-file
codnar-weave [options] chunks-files... > codnar.html
Both programs accept a -h or --help flag to print more detailed usage messages. You can also invoke Codnar from a Rakefile:
require "codnar/rake"
Codnar::Rake::SplitTask([ source-files... ], [ configurations... ])
Codnar::Rake::WeaveTask(root-file, [ configurations... ], output)
This is the story of the Code Narrator (Codnar) tool. It serves a dual purpose. It describes the Codnar tool itself, but it also serves as an example of why it exists in the first place. To explain this more fully, we'll have to make a little detour into the issue of system documentation.
Documentation for any system can be grouped into two kinds. The first kind is the reference manual. If you know of a small piece of the system, this kind of documentation will give you the details about it. A good reference will help you find this piece even if you only have a rough idea of what it is named. A really good reference will also link it to related pieces. A great reference will even give you examples of how to use the related pieces in a realistic context.
Reference manuals are invaluable, and there are plenty of tools to help you create them. The common approach is the use of structured comments (e.g., JavaDoc, Doxygen, and a host of similar tools). However, reference manuals by themselves are insufficient.
A reference manual only works if you have some idea about how the system works as a whole. For that, you need some sort of overview. Here there is much less to help you produce good documentation. The common practice is to sprinkle small tutorials inside your reference documentation (the MSDN library is a good example). This doesn't really solve the problem: how do you sufficiently explain a complex new system, so that references and small tutorials become useful?
One possible solution to this problem, literate programming, was proposed by Knuth. In a nutshell, the idea was that the source code for the system fulfilled a dual role. You could compile it into the executable code, as expected. But you could also generate documentation from it.
So far this sounds a lot like structured comments, and indeed structured comments were inspired by literate programming. The key difference between the two approaches is that in literate programming, the generated documentation was not a reference manual. It was a linear narrative describing the system: a story which walked you through the system along a specific path chosen for optimal presentation.
To achieve this, the sources contained the linear documentation, with embedded code "chunks". The order of the chunks in the sources was determined by the narrative, not the programming language requirements. Extracting and re-ordering these chunks was part of the build process, so the regular compiler could process them as usual.
This was the great strength, but also the great weakness, of literate programming. For example, it is next to impossible to create IDEs and similar tools for literate programming source code. The code chunks are split any which way and spread around the source files in any order; the same source file may contain chunks in several languages; etc. Automatically figuring out, say, the list of members of some class would be a daunting task.
In contrast, structured comments stay out of the way of the IDE and similar tools. The source code is still structured exactly the way the compiler wants, which allows for easy, localized processing. The trade-off, of course, is that structured comments produce a reference manual, not a narrative.
Today, structured comments have taken over the coding world, and literate programming has all but been forgotten. The problem it tried to solve, however, is still very much with us. How do we explain a new complex system?
Codnar is an example of a different approach for solving this problem, "inverse literate programming" (similar to, for example, antiweb). This approach is a combination of structured comments and literate programming. Note that this approach is similar to, but different in key aspects from, reverse literate programming.
In inverse literate programming, the source files are organized just the way the compiler, IDE, and similar tools expect them to be. Structured comments are used to document the pieces of code, and a reference manual can be generated from the sources as usual.
In addition, the code is split into (possibly nested) named "chunks". This is done using specially formatted comments. It turns out this functionality is already supported by most coding editors and IDEs, in the form of "folds" or "regions". These allow the developer to collapse or expand such chunks at will.
At this point, inverse literate programming kicks in. The developer writes additional documentation source files, next to the usual code source files. These documentation source files contain a narrative that describes the system, much in the same way that a literate programming documentation would have done, with two important differences.
The first difference is that the documentation source files refer to and embed the code chunks (using their names), as opposed to a literate programming system, where the documentation source files actually contain the code chunks.
The second difference is that the documentation source files do not need to repeat the information that is already covered in the structured comments. When a code chunk is embedded into the documentation, it includes these comments, so all the documentation source files need to contain is the narrative "glue" for placing these pieces into a comprehensible context for the reader.
In this way, inverse literate programming allows generating a linear narrative describing the system, without abandoning the existing code processing tools. It also makes it easy to retrofit such documentation to an existing code base; all that's needed is to mark the already-documented code chunks (or even just treat each source code file as a single chunk), and provide the narrative glue around them.
Structured comments have the advantage that they are easy to maintain. Every time you change a piece of code, you change its comment to match. Similarly, literate programming forced one to maintain the documentation as well, since the same source file was used for code and documentation. Inverse literate programming does not share this advantage. The linear documentation is in a separate file, so it isn't immediately visible to the developer who is making the changes. Also, it is easy to just forget to include some chunks of code in the documentation.
These issues are very similar to the issues of unit testing. Unit tests live in a separate file from the code they test, and it is easy to forget to test some chunks of code. One way to ensure all code is tested is to use a code coverage tool. Similarly, inverse literate programming tools should complain about code chunks that are left out of the final narrative.
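The coverage-style check described above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `unused_chunks` is a hypothetical helper (not part of Codnar's API), and both the chunk names and the names embedded by the narrative are assumed to be available as plain arrays of strings.

```ruby
require "set"

# Hypothetical helper (not Codnar's API): list the chunks that the
# narrative never embeds, in their original order.
def unused_chunks(chunk_names, embedded_names)
  embedded = embedded_names.to_set
  chunk_names.reject { |name| embedded.include?(name) }
end

# Illustrative data: three chunks split from the sources, two of which
# are referenced by the narrative.
all_chunks = ["scanner", "merger", "grouper"]
narrative_references = ["scanner", "grouper"]

unused_chunks(all_chunks, narrative_references).each do |name|
  warn "Chunk not used in the narrative: #{name}"
end
```

A build step could treat a non-empty result as a failure, in the same way a coverage threshold fails a test run.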
A different approach, test-driven development (TDD), ensures that the tests are up-to-date and complete by writing the tests before the code. The same approach can be used for documentation. Documentation-driven development (DDD) means that you first document what you are about to do, and only then follow up with the actual coding. Inverse literate programming and TDD are an excellent practical way to achieve that.
The unit tests are code like any other code. As such, they should be documented using structured comments. Certain unit test tools like RSpec, Cucumber and other BDD tools blur the line between the tests-as-code and the tests-as-documentation anyway, so the amount of unit test structured documentation should be small.
Therefore, if you are writing the tests first, you have done the heavy lifting of documenting what the new code will do. All that is left is providing a bit of surrounding context and embedding it all in the correct location in the narrative. Then, when you write the new code itself, it should be easy to connect it to the narrative at the appropriate point.
In the case of Code Narrator itself, the code library is ~2100 (raw) lines, the test code is ~2200 lines, and the narrative documentation is only ~900 lines. Given that narrative documentation is easier to write than system (or test) code, this indicates that maintaining a narrative is not an unreasonable burden for a well-tested project.
Codnar is an inverse literate programming tool. It allows you to tell a story about your system, which will explain it to others: developers, maintainers, and/or users. It builds on the structured comments you would write anyway to generate a reference manual for the system, requires minimal or no changes to your source code files, and works perfectly well inside your favorite IDE or editor. If you follow TDD or BDD, Codnar will make it easier for you to complement it with DDD.
Codnar is available under the MIT license:
Copyright © 2010-2011 Oren Ben-Kiki
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
And the current Codnar version is:
module Codnar
|
This version number. The third number is automatically updated to track the
number of Git commits by running |
VERSION = "0.1.76" end
The rest of this document goes into the details of Codnar's implementation. The core of the system is the following simple data flow: A set of source files is split into chunks; the chunks are woven into a single HTML. This simple flow can be enhanced by pre-processing the sources, or post-processing the HTML. In a realistic project, all this would be managed by some build tool; either using the command-line (for arbitrary build tools) or using the provided Ruby classes for Rake integration.
Here is a diagram showing the overall data flow in a system documented with Codnar:
Codnar makes the reasonable assumption that each source file can be effectively processed as a sequence of lines. This works well in practice for all "text" source files. It fails miserably for "binary" source files, but such files don't work that well in most generic source management tools (such as version management systems).
A second, less obvious assumption is that it is possible to classify the source file lines to "kinds" using a simple state machine. The classified lines are then grouped into nested chunks based on the two special line kinds begin_chunk and end_chunk. The other line kinds are used to control how the lines are formatted into HTML.
The collected chunks, with the formatted HTML for each one, are then stored in a chunks file to be used later for weaving the overall HTML narrative.
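To make the state machine idea concrete, here is a minimal sketch of classifying lines by kind. It is not Codnar's Scanner: the `SYNTAX` table, the state names "code" and "doc", and the `classify_lines` helper are all assumptions made for illustration. Each state lists transitions that are tried in order; the first matching regexp decides the line's kind and the next state.

```ruby
# Hypothetical syntax table: state name => ordered list of transitions.
SYNTAX = {
  "code" => [
    { regexp: /\A=begin\b/,      kind: "comment", next_state: "doc"  },
    { regexp: /\A\s*#\s?(.*)\z/, kind: "comment", next_state: "code" },
    { regexp: /\A(.*)\z/,        kind: "code",    next_state: "code" },
  ],
  "doc" => [
    { regexp: /\A=end\b/, kind: "comment", next_state: "code" },
    { regexp: /\A(.*)\z/, kind: "comment", next_state: "doc"  },
  ],
}.freeze

# Classify each line using the transitions of the current state.
# Lines that match no transition are classified as "error".
def classify_lines(lines, syntax, start_state)
  state = start_state
  lines.each_with_index.map do |line, index|
    transition = syntax.fetch(state).find { |candidate| candidate[:regexp].match?(line) }
    if transition
      state = transition[:next_state]
      { "kind" => transition[:kind], "line" => line, "number" => index + 1 }
    else
      { "kind" => "error", "line" => line, "number" => index + 1, "state" => state }
    end
  end
end

source = ["# A comment", "x = 1", "=begin", "free text", "=end", "y = 2"]
kinds = classify_lines(source, SYNTAX, "code").map { |line| line["kind"] }
```

The same text ("free text") is classified differently depending on the state, which is what a single flat list of patterns could not express.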
Scanning a file into classified lines is done by the Scanner class.
Here is a simple test that demonstrates using the scanner:
require "codnar"
require "olag/test"
require "test/spec"
Test scanning classified lines. |
class TestScanLines < Test::Unit::TestCase include Test::WithErrors include Test::WithFakeFS def test_scan_lines write_fake_file("comments", INPUT) scanner = Codnar::Scanner.new(@errors, SYNTAX) scanner.lines("comments").should == LINES @errors.should == ERRORS end SYNTAX = { "start_state" => "comment", "patterns" => { "shell" => { "regexp" => "^(\\s*)#+\\s*(.*)$", "groups" => [ "indentation", "payload" ], "kind" => "comment", }, "c++" => { "regexp" => /^(\s*)\/\/+\s*(.*)$/, "groups" => [ "indentation", "payload" ], "kind" => "comment", }, "invalid" => { "regexp" => "(" }, }, "states" => { "comment" => { "transitions" => [ { "pattern" => "shell" }, { "pattern" => "c++" }, { "pattern" => "no-such-pattern", "next_state" => "no-such-state" }, ], }, }, } INPUT = <<-EOF.unindent.gsub("#!", "#") #! foo // bar baz EOF LINES = [ { "kind" => "comment", "line" => "# foo", "indentation" => "", "payload" => "foo", "number" => 1, }, { "kind" => "comment", "line" => " // bar", "indentation" => " ", "payload" => "bar", "number" => 2, }, { "kind" => "error", "line" => " baz", "indentation" => " ", "payload" => "baz", "state" => "comment", "number" => 3, } ] ERRORS = [ "#{$0}: Invalid pattern: invalid regexp: ( error: premature end of regular expression: /(/", "#{$0}: Reference to a missing pattern: no-such-pattern", "#{$0}: Reference to a missing state: no-such-state", "#{$0}: State: comment failed to classify line: baz in file: comments at line: 3" ] end
And here is the implementation:
module Codnar
|
Scan a file into classified lines. |
class Scanner
|
Construct a scanner based on a syntax in the following structure:

  patterns:
    <name>:
      name: <name>
      kind: <kind>
      regexp: <regexp>
      groups:
      - <name>
  states:
    <name>:
      name: <name>
      transitions:
      - pattern: <pattern>
        kind: <kind>
        next_state: <state>
  start_state: <state>

To allow for cleaner YAML files to specify the syntax, the following shorthands are supported:
When the Scanner is constructed, a deep clone of the syntax object is created and modified to expand all the above shorthands. Any problems detected during this process are pushed into the errors. |
def initialize(errors, syntax)
  @errors = errors
  @syntax = syntax.deep_clone
  @syntax.patterns.each { |name, pattern| expand_pattern_shorthands(name, pattern) }
  @syntax.states.each { |name, state| expand_state_shorthands(name, state) }
  @syntax.start_state = resolve_start_state
end
|
Scan a disk file into classified lines in the following format (where the groups contain the text extracted by the matching pattern):

  - kind: <kind>
    line: <text>
    <group>: <text>

By convention, each classified line has a “payload” group that contains the “main” content of the line (chunk name for begin/end/nested chunk lines, clean comment text for comment lines, etc.). In addition, most classified lines have an “indentation” group that contains the leading white space (which is not included in the payload).

If at some state, a file line does not match any pattern, the scanner will push a message into the errors. In addition it will classify the line as follows:

  - kind: error
    state: <name>
    line: <text>
    indentation: <leading white space>
    payload: <line text following the indentation>
def lines(path)
  @path = path
  @lines = []
  @state = @syntax.start_state
  @errors.in_file_lines(path) { |line| scan_line(line.chomp) }
  return @lines
end

protected
Scanner pattern shorthands
Scanner state shorthands
Scanner file processing
Scanner line processing
end end
As we can see, the implementation is split into two main parts. First, all shorthands in the syntax definition are expanded (possibly generating errors). Then, the expanded syntax is applied to a file, to generate a sequence of classified lines.
The syntax is expected to be written by hand in a YAML file. We therefore provide some convenient shorthands (listed above) to make YAML syntax files more readable. These shorthands must be expanded to their full form before we can apply the syntax to a file. There are two sets of shorthands we need to expand:
Expand all the shorthands used in the pattern. |
def expand_pattern_shorthands(name, pattern)
  pattern.kind ||= fill_name(name, pattern, "Pattern")
  pattern.groups ||= [ "indentation", "payload" ]
  pattern.regexp = convert_to_regexp(name, pattern.regexp)
end
Convert a string regexp to a real Regexp. |
def convert_to_regexp(name, regexp)
  return regexp if Regexp === regexp # Already a Regexp; nothing to convert.
  begin
    return Regexp.new(regexp)
  rescue
    @errors << "Invalid pattern: #{name} regexp: #{regexp} error: #{$!}"
  end
end
Fill in the name field for state or pattern object. |
def fill_name(name, data, type)
  data_name = data.name ||= name
  @errors << "#{type}: #{name} has wrong name: #{data_name}" if data_name != name
  return data_name
end
A pattern that matches any line and extracts no data; is meant to be used
for catch-all transitions that transfer the scanning to a different state.
It is used if no explicit pattern is specified in a transition (that is,
you can think of this as the |
CATCH_ALL_PATTERN = { "kind" => nil, "groups" => [], "regexp" => // }
Expand all the shorthands used in the state. |
def expand_state_shorthands(name, state)
  fill_name(name, state, "State")
  state.transitions.each do |transition|
    pattern = transition.pattern = lookup(@syntax.patterns, "pattern", transition.pattern || CATCH_ALL_PATTERN)
    transition.kind ||= pattern.andand.kind
    transition.next_state = lookup(@syntax.states, "state", transition.next_state || state)
  end
end
Convert a string name to an actual data reference. |
def lookup(mapping, type, reference)
  return reference unless String === reference
  data = mapping[reference]
  @errors << "Reference to a missing #{type}: #{reference}" unless data
  return data
end
Resolve the start state reference. |
def resolve_start_state
  return lookup(@syntax.states, "state", @syntax.start_state || "start") || {
    "name" => "missing_start_state",
    "kind" => "error",
    "transitions" => []
  }
end
The above code modifies the syntax object in place. This is safe because we are working on a deep_clone of the original syntax:
Extend the core Hash class. |
class Hash
|
Obtain a deep clone which shares nothing with this hash. |
def deep_clone
  return YAML.load(to_yaml)
end
Deep merge
end
Scanning a file to classified lines is a simple matter of applying the current state transitions to each line:
Scan the next file line. |
def scan_line(line)
  until state_classified_line(line)
    # Do nothing
  end
end
Scan the current line using the current state transitions. Return true if the line was classified, or false if we need to try to classify it again using the updated (next) state.
def state_classified_line(line)
  @state.transitions.each do |transition|
    match = transition.pattern.andand.regexp.andand.match(line) if transition.next_state
    return classify_matching_line(line, transition, match) if match
  end
  classify_error_line(line, @state.name)
  return true
end
If a line matches a state transition, it is classified accordingly. Otherwise, it is reported as an error:
Handle a file line, only if it matches the pattern. |
def classify_matching_line(line, transition, match)
  @state = transition.next_state
  kind = transition.kind
  return false unless kind # A +nil+ kind indicates the next state will classify the line.
  @lines << Scanner.extracted_groups(match, transition.pattern.groups || []).update({
    "line" => line,
    "kind" => kind,
    "number" => @errors.line_number
  })
  return true
end
Extract named groups from a match. As a special case, indentation is deleted if there is no payload. |
def self.extracted_groups(match, groups)
  extracted = {}
  groups.each_with_index do |group, index|
    extracted[group] = match[index + 1]
  end
  extracted.delete("indentation") if match[0] == ""
  return extracted
end
Handle a file line that couldn’t be classified. |
def classify_error_line(line, state_name)
  @lines << {
    "line" => line,
    "indentation" => line.indentation,
    "payload" => line.unindent,
    "kind" => "error",
    "state" => state_name,
    "number" => @errors.line_number
  }
  @errors << "State: #{state_name} failed to classify line: #{@lines.last.payload}"
end
Once we have the array of scanned classified lines, we need to merge them into nested chunks. Here is a simple test that demonstrates using the merger:
require "codnar"
require "olag/test"
require "test/spec"
Test merging classified lines to chunks. |
class TestMergeLines < Test::Unit::TestCase include Test::WithErrors def test_merge_no_chunks lines = [ { "kind" => "code", "line" => "foo", "number" => 1, "indentation" => "", "payload" => "foo" } ] chunks = Codnar::Merger.chunks(@errors, "path", lines) @errors.should == [] chunks.should == [ { "name" => "path", "locations" => [ { "file" => "path", "line" => 1 } ], "containers" => [], "contained" => [], "lines" => lines } ] end def test_valid_merge chunks = Codnar::Merger.chunks(@errors, "path", VALID_LINES) @errors.should == [] chunks.should == VALID_CHUNKS end VALID_LINES = [ { "kind" => "code", "number" => 1, "line" => "before top", "indentation" => "", "payload" => "before top" }, { "kind" => "begin_chunk", "number" => 2, "line" => " {{{ top chunk", "indentation" => " ", "payload" => "top chunk" }, { "kind" => "code", "number" => 3, "line" => " before intermediate", "indentation" => " ", "payload" => "before intermediate" }, { "kind" => "begin_chunk", "number" => 4, "line" => " {{{ intermediate chunk", "indentation" => " ", "payload" => "intermediate chunk" }, { "kind" => "code", "number" => 5, "line" => " before inner", "indentation" => " ", "payload" => "before inner" }, { "kind" => "begin_chunk", "number" => 6, "line" => " {{{ inner chunk", "indentation" => " ", "payload" => "inner chunk" }, { "kind" => "code", "number" => 7, "line" => " inner line", "indentation" => " ", "payload" => "inner line" }, { "kind" => "end_chunk", "number" => 8, "line" => " }}} inner chunk", "indentation" => " ", "payload" => "inner chunk" }, { "kind" => "code", "number" => 9, "line" => " after inner", "indentation" => " ", "payload" => "after inner" }, { "kind" => "end_chunk", "number" => 10, "line" => " }}}", "indentation" => " ", "payload" => "" }, { "kind" => "code", "number" => 11, "line" => " after intermediate", "indentation" => " ", "payload" => "after intermediate" }, { "kind" => "end_chunk", "number" => 12, "line" => " }}} TOP CHUNK", "indentation" => " ", "payload" => 
"TOP CHUNK" }, { "kind" => "code", "number" => 13, "line" => "after top", "indentation" => "", "payload" => "after top" } ] VALID_CHUNKS = [ { "name" => "path", "locations" => [ { "file" => "path", "line" => 1 } ], "containers" => [], "contained" => [ "top chunk" ], "lines" => [ VALID_LINES[0].merge("indentation" => ""), { "kind" => "nested_chunk", "number" => 2, "line" => " {{{ top chunk", "indentation" => " ", "payload" => "top chunk" }, VALID_LINES[12].merge("indentation" => ""), ] }, { "name" => "top chunk", "locations" => [ { "file" => "path", "line" => 2 } ], "containers" => [ "path" ], "contained" => [ "intermediate chunk" ], "lines" => [ VALID_LINES[1].merge("indentation" => ""), VALID_LINES[2].merge("indentation" => ""), { "kind" => "nested_chunk", "number" => 4, "line" => " {{{ intermediate chunk", "indentation" => " ", "payload" => "intermediate chunk" }, VALID_LINES[10].merge("indentation" => ""), VALID_LINES[11].merge("indentation" => ""), ] }, { "name" => "intermediate chunk", "locations" => [ { "file" => "path", "line" => 4 } ], "containers" => [ "top chunk" ], "contained" => [ "inner chunk" ], "lines" => [ VALID_LINES[3].merge("indentation" => ""), VALID_LINES[4].merge("indentation" => ""), { "kind" => "nested_chunk", "number" => 6, "line" => " {{{ inner chunk", "indentation" => " ", "payload" => "inner chunk" }, VALID_LINES[8].merge("indentation" => ""), VALID_LINES[9].merge("indentation" => ""), ] }, { "name" => "inner chunk", "locations" => [ { "file" => "path", "line" => 6 } ], "containers" => [ "intermediate chunk" ], "contained" => [], "lines" => [ VALID_LINES[5].merge("indentation" => ""), VALID_LINES[6].merge("indentation" => ""), VALID_LINES[7].merge("indentation" => "") ] } ] def test_mismatching_end_chunk_line lines = [ { "kind" => "begin_chunk", "number" => 1, "line" => "{{{ top chunk", "indentation" => "", "payload" => "top chunk" }, { "kind" => "end_chunk", "number" => 2, "line" => "}}} not top chunk", "indentation" => "", "payload" => 
"not top chunk" } ] Codnar::Merger.chunks(@errors, "path", lines) @errors.should == [ "#{$0}: End line for chunk: not top chunk mismatches begin line for chunk: top chunk in file: path at line: 2" ] end def test_missing_begin_chunk_name lines = [ { "kind" => "begin_chunk", "number" => 1, "line" => "{{{", "indentation" => "", "payload" => "" }, { "kind" => "end_chunk", "number" => 2, "line" => "}}}", "indentation" => "", "payload" => "" } ] Codnar::Merger.chunks(@errors, "path", lines) @errors.should == [ "#{$0}: Begin line for chunk with no name in file: path at line: 1" ] end def test_missing_end_chunk_line lines = [ { "kind" => "begin_chunk", "number" => 1, "line" => "{{{ top chunk", "indentation" => "", "payload" => "top chunk" } ] Codnar::Merger.chunks(@errors, "path", lines) @errors.should == [ "#{$0}: Missing end line for chunk: top chunk in file: path at line: 1" ] end end
And here is the implementation:
module Codnar
|
Merge classified lines into chunks. |
class Merger
|
Convert classified lines from a disk file into chunks. |
def self.chunks(errors, path, lines)
  return Merger.new(errors, path, lines).chunks
end
|
Return merged chunks containing the classified lines. Each chunk's lines are indented only relative to the chunk. This allows nested chunks to be presented unindented in the final woven HTML.
def chunks
  @chunks = [ file_chunk ]
  @stack = @chunks.dup
  @errors.in_path(@path) { merge_lines }
  @chunks.each { |chunk| Merger.unindent_lines(chunk.lines) }
  return @chunks
end

protected
|
Convert classified lines from a disk file into chunks. |
def initialize(errors, path, lines)
  @errors = errors
  @path = path
  @lines = lines
end
|
The top-level all-the-disk-file chunk (without any classified lines) |
def file_chunk
  return {
    "name" => @path,
    "locations" => [ { "file" => @path, "line" => 1 } ],
    "containers" => [],
    "contained" => [],
    "lines" => []
  }
end
Merging nested chunk lines
Unindenting chunk lines
end end
To merge the nested chunk lines, we maintain a stack of the current chunks. Each begin_chunk line pushes another chunk on the stack, and each end_chunk line pops it. If any chunks are not properly terminated, they will remain in the stack when all the lines are processed.
Merge all the classified lines into chunks |
def merge_lines
  @lines.each do |line|
    @errors.at_line(line.number)
    merge_line(line)
  end
  end_unterminated_chunks
end
End all chunks missing a terminating end chunk classified line. |
def end_unterminated_chunks
  @stack.shift
  @stack.each do |chunk|
    @errors << "Missing end line for chunk: #{chunk.name}"
  end
end
Merge the next classified line. |
def merge_line(line)
  case line.kind
  when "begin_chunk"
    begin_chunk_line(line)
  when "end_chunk"
    end_chunk_line(line)
  else
    @stack.last.lines << line
  end
end
Merge a classified line that starts a new chunk. |
def begin_chunk_line(line)
  chunk = contained_chunk(container = @stack.last, line)
  container.contained << chunk.name
  container.lines << line.merge("kind" => "nested_chunk")
  @chunks << chunk
  @stack << chunk
end
A chunk contained in another chunk. |
def contained_chunk(container, line)
  return {
    "name" => new_chunk_name(line.payload),
    "locations" => [ { "file" => @path, "line" => line.number } ],
    "containers" => [ container.name ],
    "contained" => [],
    "lines" => [ line ]
  }
end
Return the name of a new chunk. |
def new_chunk_name(name)
  return name unless name.nil? || name == ""
  @errors << "Begin line for chunk with no name"
  return "#{@path}/#{@chunks.size}"
end
Merge a classified line that ends an existing chunk. |
def end_chunk_line(line)
  return missing_begin_chunk_line(line) if @stack.size == 1
  chunk = @stack.last
  @errors << "End line for chunk: #{line.payload} mismatches begin line for chunk: #{chunk.name}" \
    unless Merger.matching_end_chunk_line?(chunk, line)
  chunk.lines << line
  @stack.pop
end
Check whether an end chunk classified line matches the begin chunk classified line. |
def self.matching_end_chunk_line?(chunk, line)
  line_name = line.payload
  return line_name.to_s == "" || line_name.to_id == chunk.name.to_id
end
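The stack-based merging above can also be seen in isolation in the following simplified sketch. It is an illustration only, not the Merger class: `merge_into_chunks` is a hypothetical function, it works on plain string-keyed hashes, it returns its errors instead of pushing them to a shared errors object, and it skips name matching and location tracking.

```ruby
# Hypothetical, simplified merger: nest chunks using a stack whose bottom
# element is the whole-file chunk.
def merge_into_chunks(file_name, lines)
  root = { "name" => file_name, "contained" => [], "lines" => [] }
  chunks = [root]
  stack = [root]
  errors = []

  lines.each do |line|
    case line["kind"]
    when "begin_chunk"
      # The container keeps a "nested_chunk" placeholder line; the new
      # chunk becomes the current top of the stack.
      chunk = { "name" => line["payload"], "contained" => [], "lines" => [line] }
      stack.last["contained"] << chunk["name"]
      stack.last["lines"] << line.merge("kind" => "nested_chunk")
      chunks << chunk
      stack << chunk
    when "end_chunk"
      if stack.size == 1
        errors << "End line without a begin line: #{line["payload"]}"
      else
        stack.last["lines"] << line
        stack.pop
      end
    else
      stack.last["lines"] << line
    end
  end

  # Anything left above the file chunk was never terminated.
  stack.drop(1).each { |chunk| errors << "Missing end line for chunk: #{chunk["name"]}" }
  [chunks, errors]
end

lines = [
  { "kind" => "code", "payload" => "before" },
  { "kind" => "begin_chunk", "payload" => "outer" },
  { "kind" => "begin_chunk", "payload" => "inner" },
  { "kind" => "code", "payload" => "body" },
  { "kind" => "end_chunk", "payload" => "inner" },
  { "kind" => "code", "payload" => "after" },
]
chunks, errors = merge_into_chunks("example.rb", lines)
```

In this input the "outer" chunk is deliberately left open, so it is still on the stack at the end and is reported as unterminated.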
Nested chunks are typically indented relative to their container chunks. However, in the generated documentation, these chunks are displayed on their own, and preserving this relative indentation would reduce their readability. We therefore unindent all chunks as much as possible as the final step.
Remove the common indentation from a sequence of classified lines. |
def self.unindent_lines(lines)
  indentation = Merger.minimal_indentation(lines)
  lines.each do |line|
    line.indentation = line.indentation.andand.unindent(indentation)
  end
end
Find out the minimal indentation of all the classified lines. |
def self.minimal_indentation(lines)
  return lines.map { |line| line.indentation }.compact.min
end
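For readers who want to try the unindenting step on its own, here is a small sketch that works on raw strings rather than classified lines. `unindent_block` is a hypothetical helper, and it assumes indentation is consistent (all spaces or all tabs). Unlike the code above, which takes the minimum of the indentation strings, this sketch picks the shortest indentation by length and ignores blank lines.

```ruby
# Hypothetical helper: strip the leading white space shared by all
# non-blank lines of a block.
def unindent_block(lines)
  indentations = lines.reject { |line| line.strip.empty? }
                      .map { |line| line[/\A[ \t]*/] }
  common = indentations.min_by(&:length) || ""
  lines.map do |line|
    # Blank or short lines that lack the common prefix are simply stripped.
    line.start_with?(common) ? line[common.length..-1] : line.lstrip
  end
end

nested = ["    def foo", "      bar", "", "    end"]
unindented = unindent_block(nested)
```

The relative indentation inside the block (the extra two spaces before `bar`) is preserved; only the shared prefix is removed.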
Now that we have each chunk's lines, we need to convert them to HTML.
Instead of formatting each line on its own, we batch the operations to work on all lines of the same kind at once. Here is a simple test that demonstrates using the grouper:
require "codnar"
require "test/spec"
Test grouping classified lines by their kind. |
class TestGroupLines < Test::Unit::TestCase def test_group_empty_lines Codnar::Grouper.lines_to_groups([]).should == [] end def test_group_one_line Codnar::Grouper.lines_to_groups([ { "kind" => "code" } ]).should == [ [ { "kind" => "code" } ] ] end def test_group_lines Codnar::Grouper.lines_to_groups([ { "kind" => "code", "line" => "0" }, { "kind" => "code", "line" => "1" }, { "kind" => "comment", "line" => "2" }, { "kind" => "code", "line" => "3" }, ]).should == [ [ { "kind" => "code", "line" => "0" }, { "kind" => "code", "line" => "1" }, ], [ { "kind" => "comment", "line" => "2" }, ], [ { "kind" => "code", "line" => "3" }, ] ] end end
And here is the implementation:
module Codnar
|
Group classified lines according to kind. |
module Grouper
|
Convert array of classified lines to array of classified line groups with the same line kind. |
def self.lines_to_groups(lines)
  groups = lines.reduce([], &method(:group_next_line))
  return groups
end

protected
|
Add the next classified line to the classified line groups. |
def self.group_next_line(groups, next_line)
  last_group = groups.last
  if last_group.andand.last.andand.kind == next_line.kind
    last_group.push(next_line)
  else
    groups.push([ next_line ])
  end
  return groups
end

end
end
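As an aside, the same grouping of consecutive lines can be expressed with Ruby's standard `Enumerable#slice_when`. The sketch below is an alternative formulation, not the Grouper implementation: `group_by_kind` is a hypothetical name, and the lines are assumed to be plain hashes with a "kind" key.

```ruby
# Hypothetical alternative: start a new group whenever the kind changes
# between two consecutive lines.
def group_by_kind(lines)
  lines.slice_when { |previous, current| previous["kind"] != current["kind"] }.to_a
end

groups = group_by_kind([
  { "kind" => "code", "line" => "0" },
  { "kind" => "code", "line" => "1" },
  { "kind" => "comment", "line" => "2" },
  { "kind" => "code", "line" => "3" },
])
```

Note that the two "code" runs stay separate groups because they are not adjacent, which matches the behavior exercised by the test shown earlier.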
Formatting is based on a configuration that specifies, for (a group of) lines of each kind, how to convert it to HTML. Here is a simple test that demonstrates using the formatter:
require "codnar"
require "olag/test"
require "test/spec"
Test converting classified lines to HTML. |
class TestFormatLines < Test::Unit::TestCase include Test::WithErrors alias_method :original_setup, :setup def setup original_setup Codnar::Formatter.send(:public, *Codnar::Formatter.protected_instance_methods) @formatter = Codnar::Formatter.new(@errors, "code" => "Formatter.lines_to_pre_html(lines)", "fail" => "TestFormatLines.fail") end def test_process_html_lines lines_group = @formatter.process_lines_group([ { "kind" => "html", "number" => 1, "payload" => "foo", }, { "kind" => "html", "number" => 2, "payload" => "bar", }, { "kind" => "html", "number" => 3, "payload" => "baz", }, ]) @errors.should == [] lines_group.should == [ { "kind" => "html", "number" => 1, "payload" => "foo\nbar\nbaz" } ] end def test_process_unknown_lines lines_group = @formatter.process_lines_group([ { "kind" => "unknown-kind", "number" => 1, "payload" => "<foo>", }, ]) @errors.should == [ "#{$0}: No formatter specified for lines of kind: unknown-kind" ] lines_group.should == [ { "kind" => "html", "number" => 1, "payload" => "<pre class='missing formatter error'>\n<foo>\n</pre>" } ] end def test_process_code_lines lines_group = @formatter.process_lines_group([ { "kind" => "code", "number" => 1, "payload" => "<foo>", }, { "kind" => "code", "number" => 2, "payload" => "bar", }, ]) @errors.should == [] lines_group.should == [ { "kind" => "html", "number" => 1, "payload" => "<pre>\n<foo>\nbar\n</pre>" } ] end def test_failed_formatter lines_group = @formatter.process_lines_group([ { "kind" => "fail", "number" => 1, "payload" => "foo", } ]) @errors.size.should == 1 @errors.last.should =~ \ /#{$0}: Formatter: TestFormatLines.fail for lines of kind: fail failed with exception:.*in `fail': Reason/ lines_group.should == [ { "kind" => "html", "number" => 1, "payload" => "<pre class='failed formatter error'>\nfoo\n</pre>" } ] end def test_lines_to_html lines_group = @formatter.lines_to_html([ { "kind" => "html", "number" => 1, "payload" => "foo" }, { "kind" => "code", "number" => 2, "payload" => 
"<bar>" }, { "kind" => "html", "number" => 3, "payload" => "baz" }, ]) @errors.should == [] lines_group.should == "foo\n<pre>\n<bar>\n</pre>\nbaz" end def self.fail raise "Reason" end end
And here is the implementation:
module Codnar
|
Format chunks into HTML. |
class Formatter
|
Construct a Formatter based on a mapping from a classified line kind, to a
Ruby expression, that converts an array of classified lines of that kind,
into an array of lines of another kind. This expression is simply
Formatting repeatedly applies these formatting expressions, until the
result is an array containing a single classified line, which has the kind
The default formatting expression for the kind
If no formatting expression is specified for some classified line kind, an
error is reported and the classified lines are wrapped in a pre HTML
element with a |
def initialize(errors, formatters) @errors = errors @formatters = { "html" => "Formatter.merge_html_lines(lines)" }.merge(formatters) end
|
Repeatedly process an array of classified lines of arbitrary kinds until we obtain a single classified “line” containing a unified final HTML presentation of the original classified lines. |
def lines_to_html(lines) until Formatter.single_html_line?(lines) lines = Grouper.lines_to_groups(lines).map { |group| process_lines_group(group) }.flatten end return lines.last.andand.payload.to_s end protected
|
Check whether we have finally got a single HTML classified “line” for the whole classified lines sequence. |
def self.single_html_line?(lines) return lines.size <= 1 && lines[0].andand.kind == "html" end
|
Perform one pass of processing toward HTML on a group of consecutive classified lines with the same kind. |
def process_lines_group(lines) kind = lines.last.kind formatter = @formatters[kind] ||= missing_formatter(kind) begin return eval formatter rescue return failed_formatter(lines, formatter, $!) end end
|
Return an expression for formatting classified lines of some kind that doesn’t have such a formatting expression already specified. |
def missing_formatter(kind) @errors << "No formatter specified for lines of kind: #{kind}" return "Formatter.lines_to_pre_html(lines, :class => 'missing formatter error')" end
|
Format classified lines as HTML if the original specified formatting expression failed. |
def failed_formatter(lines, formatter, exception) @errors << "Formatter: #{formatter} for lines of kind: #{lines.last.kind} failed with exception: #{exception}" return Formatter.lines_to_pre_html(lines, :class => "failed formatter error") end
Basic formatters
end end
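The repeated-pass processing described above can be illustrated with a small stand-alone sketch. It is not Codnar's actual code: the lines are plain hashes, the formatters are lambdas rather than evaluated Ruby expressions, and the names `FORMATTERS`, `group_by_kind` and `lines_to_html` are chosen for this illustration only. A kind without a formatter simply raises a `KeyError` here instead of being reported and wrapped in a pre element.

```ruby
require "cgi"

# Stand-alone sketch: each classified "line" is a plain Hash with "kind"
# and "payload" keys. Formatters are lambdas (an assumption made for this
# sketch) rather than Ruby expressions evaluated from strings.
FORMATTERS = {
  # Merge consecutive HTML lines into a single HTML "line".
  "html" => ->(group) {
    [{ "kind" => "html", "payload" => group.map { |line| line["payload"] }.join("\n") }]
  },
  # Wrap consecutive code lines in an escaped pre element.
  "code" => ->(group) {
    text = group.map { |line| line["payload"] }.join("\n")
    [{ "kind" => "html", "payload" => "<pre>\n#{CGI.escapeHTML(text)}\n</pre>" }]
  },
}

# Group consecutive lines that share the same kind.
def group_by_kind(lines)
  lines.chunk_while { |a, b| a["kind"] == b["kind"] }.to_a
end

# True once the whole sequence has collapsed into one HTML "line".
def single_html_line?(lines)
  lines.size == 1 && lines[0]["kind"] == "html"
end

# Apply one formatting pass per iteration until a single HTML line remains.
def lines_to_html(lines, formatters)
  return "" if lines.empty?
  until single_html_line?(lines)
    lines = group_by_kind(lines).flat_map do |group|
      formatters.fetch(group.first["kind"]).call(group)
    end
  end
  lines.first["payload"]
end
```

In this sketch the first pass converts each group to HTML and the second pass merges the resulting HTML lines. The loop only terminates if every chain of formatters eventually produces lines of kind "html".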
The implementation contains some basic formatting functions. These are sufficient for generic source code processing.
Merge a group of consecutive indented lines into a group with a single classified “line”. The given block is passed the joined content of all the lines, and may process it to yield the merged “line” content. If an explicit indentation is given, it overrides each line’s indentation. This is useful for avoiding the inclusion of the indentation in the payload. |
def self.merge_lines(lines, kind, indentation = nil) payload = yield lines.map { |line| (indentation || line.indentation || "") + (line.payload || "") }.join("\n") merged_line = lines[0] merged_line.merge!("kind" => kind, "payload" => payload) merged_line.delete("indentation") if indentation.nil? return [ merged_line ] end
Merge a group of consecutive HTML classified lines into a group with a single HTML classified “line”. This is the default formatting expression for HTML lines. |
def self.merge_html_lines(lines) return Formatter.merge_lines(lines, "html") { |payload| payload } end
Format classified lines into HTML using a pre element with optional attributes. This is the default formatting expression for classified lines of unknown kinds. |
def self.lines_to_pre_html(lines, attributes = {}) return Formatter.merge_lines(lines, "html") do |payload| ( "<pre" + Formatter.html_attributes(attributes) + ">\n" \ + CGI.escapeHTML(payload) + "\n" \ + "</pre>" ) end end
Convert an attribute mapping to HTML. |
def self.html_attributes(attributes) return "" if attributes == {} return " " + attributes.map { |name, value| "#{name}='#{CGI.escapeHTML(value.to_s)}'" }.join(" ") end
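To see what the pre element and attribute helpers above produce, here is a minimal sketch that works on a plain string instead of classified lines. The function names `html_attributes` and `pre_html` are local to this example and only mirror the idea of the methods above.

```ruby
require "cgi"

# Render an attribute Hash as HTML attribute text, each with a leading space.
def html_attributes(attributes)
  attributes.map { |name, value| " #{name}='#{CGI.escapeHTML(value.to_s)}'" }.join
end

# Wrap escaped text in a pre element carrying the given attributes.
def pre_html(text, attributes = {})
  "<pre#{html_attributes(attributes)}>\n#{CGI.escapeHTML(text)}\n</pre>"
end

puts pre_html("<foo>", :class => "missing formatter error")
```

Escaping both the payload and the attribute values keeps stray markup in the source text from leaking into the generated page.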
Format classified lines that indicate a nested chunk to HTML. |
def self.nested_chunk_lines_to_html(lines) return lines.each do |line| line.kind = "html" chunk_name = line.payload line.payload = "<pre class='nested chunk'>\n" \ + (line.indentation || "") \ + "<a class='nested chunk' href='##{chunk_name.to_id}'>#{CGI.escapeHTML(chunk_name)}</a>\n" \ + "</pre>" line.delete("indentation") end end
Indent arbitrary HTML lines to line up with the rest of the lines. |
def self.unindented_lines_to_html(lines) merged_line = lines[0] html = lines.map { |line| line.payload + "\n" }.join merged_line.payload = self.indent_html(merged_line.indentation, html) merged_line.kind = "html" return [ merged_line ] end
Indent a chunk of HTML by some spaces. This uses a table, which is arguably the wrong way to do it. |
def self.indent_html(indentation, html) return html.chomp if indentation.nil? return "<table class='layout'>\n<tr>\n" \ + "<td class='indentation'>\n" \ + "<pre>#{indentation}</pre>\n" \ + "</td>\n" \ + "<td class='html'>\n" \ + html \ + "</td>\n" \ + "</tr>\n</table>" end
Cast a sequence of classified lines into a different kind without any processing. |
def self.cast_lines(lines, kind) lines = lines.dup lines.each { |line| line.kind = kind } return lines end
Convert a sequence of marked-up classified lines to (unindented) HTML. |
def self.markup_lines_to_html(lines, klass, css_class = nil) implementation = String === klass ? Kernel.const_get(klass) : klass css_class ||= implementation.to_s.downcase.gsub("::", "-") return Formatter.merge_lines(lines, "unindented_html", "") do |payload| ( "<div class='#{css_class} #{lines[0].kind} markup'>\n" \ + implementation.to_html(payload) \ + "</div>" ) end end
The markup_lines_to_html
formatter above relies on the existence of a class
for converting comments from the specific markup format to HTML. Currently, the
following formats are supported:
RDoc, the default markup format used in Ruby comments. Here is a simple test that demonstrates using RDoc:
require "codnar" require "test/spec"
Test expanding RDoc text. |
class TestExpandRDoc < Test::Unit::TestCase def test_emphasis_text Codnar::RDoc.to_html("_text_").should == "<p>\n<em>text</em>\n</p>\n" end def test_strong_text Codnar::RDoc.to_html("*text*").should == "<p>\n<strong>text</strong>\n</p>\n" end def test_indented_pre Codnar::RDoc.to_html("base\n indented\n more\nback\n").should \ == "<p>\nbase\n</p>\n<pre>indented\n more</pre>\n<p>\nback\n</p>\n" end end
And here is the implementation:
module Codnar
|
Convert RDoc to HTML. |
module RDoc
|
Process an RDoc String and return the resulting HTML. |
def self.to_html(rdoc) return ::RDoc::Markup::ToHtml.new.convert(rdoc).clean_markup_html end end end
Markdown, a generic markup syntax used across many systems and languages. Here is a simple test that demonstrates using Markdown:
require "codnar" require "test/spec"
Test expanding Markdown text. |
class TestExpandMarkdown < Test::Unit::TestCase def test_emphasis_text Codnar::Markdown.to_html("*text*").should == "<p>\n<em>text</em>\n</p>\n" end def test_strong_text Codnar::Markdown.to_html("**text**").should == "<p>\n<strong>text</strong>\n</p>\n" end def test_embed_chunk Codnar::Markdown.to_html("[[Chunk|template]]").should == "<p>\n<embed src='chunk' type='x-codnar/template'/>\n</p>\n" end def test_embed_anchor Codnar::Markdown.to_html("[[#Name]]").should == "<p>\n<a id='name'/>\n</p>\n" end def test_embed_link Codnar::Markdown.to_html("[Label](#Name)").should == "<p>\n<a href=\"#name\">Label</a>\n</p>\n" end end
And here is the implementation:
module Codnar
|
Convert Markdown to HTML. |
module Markdown
|
Process a Markdown String and return the resulting HTML. In addition to the normal Markdown syntax, processing supports the following Codnar-specific extensions:
|
def self.to_html(markdown) markdown = embed_chunks(markdown) markdown = id_anchors(markdown) html = RDiscount.new(markdown).to_html html = id_links(html) return html.clean_markup_html end protected
|
Expand “[[chunk|template]]” to HTML embed tags. Use identifiers instead
of names in the |
def self.embed_chunks(markdown) return markdown.gsub(/\[\[(.*?)\|(.*?)\]\]/) do src = $1 template = $2 src = src.to_id unless Codnar::Weaver::FILE_TEMPLATE_PROCESSORS.include?(template) "<embed src='#{src}' type='x-codnar/#{template}'/>" end end
|
Expand “[[#name]]” anchors to HTML anchor tags with the matching identifier. |
def self.id_anchors(markdown) return markdown.gsub(/\[\[#(.*?)\]\]/) { "<a id='#{$1.to_id}'/>" } end
|
Expand “href=‘#name’” links to the matching “href=‘#id’” links. |
def self.id_links(html) return html.gsub(/href=(["'])#(.*?)(["'])/) { "href=#{$1}##{$2.to_id}#{$3}" } end end end
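The two Codnar-specific expansions that run before the Markdown conversion can be tried in isolation with the sketch below. The `to_id` helper is a hypothetical stand-in (lower-case the name and replace runs of other characters with a dash); Codnar's own identifier conversion may differ. The sketch also leaves out the exemption for file template processors.

```ruby
# Hypothetical identifier helper, assumed for this sketch only.
def to_id(name)
  name.downcase.gsub(/[^a-z0-9]+/, "-")
end

# Expand "[[chunk|template]]" into an HTML embed tag.
def embed_chunks(markdown)
  markdown.gsub(/\[\[(.*?)\|(.*?)\]\]/) do
    src, template = $1, $2 # copy the captures before calling other methods
    "<embed src='#{to_id(src)}' type='x-codnar/#{template}'/>"
  end
end

# Expand "[[#name]]" into an HTML anchor tag with the matching identifier.
def id_anchors(markdown)
  markdown.gsub(/\[\[#(.*?)\]\]/) { "<a id='#{to_id($1)}'/>" }
end

puts id_anchors(embed_chunks("[[Chunk Name|template]] [[#Name]]"))
```

Note that the chunk pattern is non-greedy but not anchored to a single pair of brackets, so an anchor that appears before a chunk reference on the same line would be swallowed by the chunk match. Keeping such constructs on separate lines avoids this.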
Haddock, a specific markup syntax used in comments to document Haskell code. Here is a simple test that demonstrates using Haddock:
require "codnar" require "test/spec"
Test expanding Haddock text. |
class TestExpandHaddock < Test::Unit::TestCase def test_normal_text Codnar::Haddock.to_html("normal").should == "<p>normal\n</p>\n" end def test_identifier_text Codnar::Haddock.to_html("'Int'").should == "<p><code>Int</code>\n</p>\n" end def test_emphasis_text Codnar::Haddock.to_html("/emphasis/").should == "<p><em>emphasis</em>\n</p>\n" end def test_code_text Codnar::Haddock.to_html("@code@").should == "<pre>code</pre>\n" end end
And here is the implementation:
module Codnar
|
Convert Haddock to HTML. |
class Haddock
|
Process a Haddock String and return the resulting HTML. |
def self.to_html(haddock) with_temporary_directory do |path| write_temporary_file(path, haddock) run_haddock(path) html = read_html_file(path) clean_html(html) end end protected
|
Run a block using a temporary directory, that is then removed. TODO: This should be in some more generic place. |
def self.with_temporary_directory path = create_temporary_directory result = yield path FileUtils.rm_rf(path) return result end
|
Create a temporary directory to run Haddock in. |
def self.create_temporary_directory file = Tempfile.open("dir", ".") path = file.path File.delete(path) Dir.mkdir(path) return path end
|
Minimal header to insert before the Haddock String to trick Haddock into generating HTML from it. |
HADDOCK_HEADER = <<-EOF.unindent module Wrapper where -- $doc EOF
|
Write the Haddock String into a wrapper Haskell file so we’ll be able to run Haddock to generate HTML from it. |
def self.write_temporary_file(path, haddock) File.open(path + "/wrapper.hs", "w") do |file| file.write(HADDOCK_HEADER) haddock = self.patch_module_comments(haddock) file.write("-- " + haddock.gsub("\n", "\n-- ")) end end
|
Convert structured module comments to a definition list. TODO: This is rather flaky. |
def self.patch_module_comments(haddock) haddock.sub!(/^\s*Module\s*:\s*\$\s*header\s*\$\s*$/i, "Module") return haddock.gsub(/^(\s*)(Module|Description|Copyright|License|Maintainer|Stability|Portability)(\s*):/, "\n\\1[@\\2@]\\3") end
|
Run Haddock to convert the wrapper Haskell file into HTML documentation. |
def self.run_haddock(path) system("cd #{path} && haddock --html wrapper.hs > haddock.out 2>&1") end
|
Read the HTML generated by Haddock. |
def self.read_html_file(path) return File.read(path + "/Wrapper.html") end
|
Extract the clean generated HTML from Haddock’s output. |
def self.clean_html(html) html.gsub!("\r\n", "\n") html.sub!(/.*<div class="doc">/m, '') html.sub!(/<\/div><\/div><\/div><div id="footer">.*/m, "\n") return html end end end
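The temporary directory helper above is marked with a TODO. As one possible direction (not what Codnar does), Ruby's standard `Dir.mktmpdir` already provides this behavior: given a block, it yields the path, returns the block's value, and removes the directory even if the block raises. The sketch below assumes the directory may live in the system temporary location rather than the current directory, and the "haddock-" prefix is arbitrary.

```ruby
require "tmpdir"

# Run a block with a fresh temporary directory that is removed afterwards,
# whether the block returns normally or raises.
def with_temporary_directory(&block)
  Dir.mktmpdir("haddock-", &block)
end

content = with_temporary_directory do |path|
  file = File.join(path, "wrapper.hs")
  File.write(file, "module Wrapper where\n")
  File.read(file)
end
puts content
```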
In all cases, the HTML generated by the markup format conversion is a bit messy. We therefore clean it up:
Clean HTML generated by markup formatters. Such HTML tends to have extra empty lines for no apparent reason. Cleaning it up seems to be safe enough, and eliminates the ugly additional vertical space in the final HTML. |
def clean_markup_html return gsub("\r\n", "\n") \ .gsub(/\n*<p>\n*/, "\n<p>\n") \ .gsub(/\n*<\/p>\n*/, "\n</p>\n") \ .gsub(/\n*<pre>\n+/, "\n<pre>\n") \ .gsub(/\n+<\/pre>\n*/, "\n</pre>\n") \ .sub(/^\n*/, "") end
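To make the effect of the clean-up concrete, here is the same chain of substitutions written as a free function (in Codnar it is a method on the string itself), applied to a small sample with extra blank lines.

```ruby
# The same substitutions as above, taking the HTML as an argument.
def clean_markup_html(html)
  html.gsub("\r\n", "\n")
      .gsub(/\n*<p>\n*/, "\n<p>\n")
      .gsub(/\n*<\/p>\n*/, "\n</p>\n")
      .gsub(/\n*<pre>\n+/, "\n<pre>\n")
      .gsub(/\n+<\/pre>\n*/, "\n</pre>\n")
      .sub(/^\n*/, "")
end

puts clean_markup_html("<p>text</p>\n\n\n<p>more</p>\n")
# Prints:
# <p>
# text
# </p>
# <p>
# more
# </p>
```

Every paragraph tag ends up on a line of its own, and the run of empty lines between the two paragraphs is gone.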
If you have graphviz
installed, it is possible to use it to generate SVG
diagrams that can be embedded directly into the HTML. This is implemented as an
additional formatter; in principle, this allows embedding the GraphViz
directives directly in the code, but in practice people prefer keeping the
diagrams as separate files.
We pre-process the GraphViz directives using the m4
macro processor. This
allows dramatically reducing the amount of repeated boilerplate in the diagram
definitions, by defining macros for node and edge styles and, if desired, more
advanced techniques.
Here is a simple test that demonstrates generating SVG from a GraphViz diagram:
require "codnar" require "test/spec"
Test generating diagrams using GraphViz. |
class TestGraphVizDiagrams < Test::Unit::TestCase MINIMAL_DIAGRAM_SVG = <<-EOF.unindent #! ((( svg
<svg width="62pt" height="116pt" viewBox="0.00 0.00 62.00 116.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <g id="graph1" class="graph" transform="scale(1 1) rotate(0) translate(4 112)"> <title>_anonymous_0</title> <polygon fill="white" stroke="white" points="-4,5 -4,-112 59,-112 59,5 -4,5"/> <!-- A --> <g id="node1" class="node"><title>A</title> <ellipse fill="none" stroke="black" cx="27" cy="-90" rx="27" ry="18"/> <text text-anchor="middle" x="27" y="-85.4" font-family="Times New Roman,serif" font-size="14.00">A</text> </g> <!-- B --> <g id="node3" class="node"><title>B</title> <ellipse fill="none" stroke="black" cx="27" cy="-18" rx="27" ry="18"/> <text text-anchor="middle" x="27" y="-13.4" font-family="Times New Roman,serif" font-size="14.00">B</text> </g> <!-- A->B --> <g id="edge2" class="edge"><title>A->B</title> <path fill="none" stroke="black" d="M27,-71.8314C27,-64.131 27,-54.9743 27,-46.4166"/> <polygon fill="black" stroke="black" points="30.5001,-46.4132 27,-36.4133 23.5001,-46.4133 30.5001,-46.4132"/> </g> </g> </svg> EOF
#! ))) svg def test_valid_diagram diagram = <<-EOF.unindent #! ((( dot
define(`X', `A') digraph { X -> B; } EOF
#! ))) dot Codnar::GraphViz.to_html(diagram).should == MINIMAL_DIAGRAM_SVG end def test_invalid_diagram diagram = <<-EOF.unindent #! ((( dot
digraph { A -> EOF
#! ))) dot lambda { Codnar::GraphViz.to_html(diagram) }.should.raise end end
And here is the implementation:
module Codnar
|
Generate diagrams using GraphViz. |
class GraphViz
|
Convert a string containing a GraphViz diagram into SVG suitable for embedding into the HTML documentation. We pre-process the diagram using M4 to allow cutting down on the boilerplate (repeating the same styles in many nodes etc.). This should not be harmful for diagrams that do not use M4 commands. |
def self.to_html(diagram) stdin, stdout, stderr = Open3.popen3("m4 | dot -Tsvg") write_diagram(stdin, diagram) check_for_errors(stderr) return clean_output(stdout) end protected
|
Send the diagram to the commands pipe. |
def self.write_diagram(stdin, diagram) stdin.write(diagram) stdin.close end
|
Ensure we got no processing errors from either m4 or dot. If we did, raise them, and they will be handled by the formatter wrapping code. |
def self.check_for_errors(stderr) errors = stderr.read raise errors.sub(/Error: <stdin>:\d+: /, "") if errors != "" end
|
Clean the SVG we got to make it suitable for embedding in HTML. |
def self.clean_output(stdout) return stdout.read.sub(/.*<svg/m, "<svg").gsub(/\r/, "") end end end
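The pattern of piping text through external commands and treating anything on standard error as a failure can also be sketched with `Open3.capture3`, which feeds the input and drains both output streams for us. This is an alternative formulation, not Codnar's implementation. The `run_filter` name is made up for this example, and the demonstration runs a tiny Ruby one-liner instead of `m4 | dot -Tsvg` so that it works without GraphViz installed.

```ruby
require "open3"
require "rbconfig"

# Feed input to a command; return its standard output, or raise if the
# command fails or writes anything to standard error.
def run_filter(input, *command)
  stdout, stderr, status = Open3.capture3(*command, stdin_data: input)
  unless status.success? && stderr.empty?
    raise(stderr.empty? ? "command failed: #{status}" : stderr)
  end
  stdout
end

# Stand-in for the real pipeline: upper-case whatever arrives on stdin.
puts run_filter("a -> b", RbConfig.ruby, "-e", "print STDIN.read.upcase")
```

For the real case one would pass the single string "m4 | dot -Tsvg" as the command, which runs through the shell. The exit status then reflects only the last command of the pipeline, which is one reason to keep checking standard error as well.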
If you have GVim installed, it is possible to use it to generate syntax highlighting. This is a slow operation, as GVim was never meant to be used as a command-line tool. However, what it lacks in speed it compensates for in scope; almost any language you can think of has a GVim syntax highlighting definition. Here is a simple test that demonstrates using GVim for syntax highlighting:
require "codnar" require "test/spec"
Test highlighting syntax using GVim. |
class TestGVimHighlightSyntax < Test::Unit::TestCase def setup Codnar::GVim.force_recompute = true end def teardown Codnar::GVim.force_recompute = false end def test_ruby_no_css ruby = <<-EOF.unindent def foo return bar = baz end EOF Codnar::GVim.cached_syntax_to_html(ruby, "ruby").should == <<-EOF.unindent #! ((( html
<div class='ruby code syntax' bgcolor="#ffffff" text="#000000"> <font face="monospace"> <font color="#ff40ff">def</font> <font color="#00ffff">foo</font><br /> <font color="#ffff00">return</font> bar = baz<br /> <font color="#ff40ff">end</font><br /> </font> </div> EOF
#! ))) html end def test_ruby_css ruby = <<-EOF.unindent def foo return bar = baz end EOF Codnar::GVim.cached_syntax_to_html(ruby, "ruby", [ "+:let html_use_css=1" ]).should == <<-EOF.unindent #! ((( html
<pre class='ruby code syntax'> <span class="PreProc">def</span> <span class="Identifier">foo</span> <span class="Statement">return</span> bar = baz <span class="PreProc">end</span> </pre> EOF
#! ))) html
end
end
And here is the implementation:
module Codnar
|
Syntax highlight using GVim. |
class GVim
|
Convert a sequence of classified code lines to HTML using GVim syntax
highlighting. The commands array allows configuring the way that GVim will
format the output (see the |
def self.lines_to_html(lines, syntax, commands = []) return Formatter.merge_lines(lines, "html") do |payload| GVim.cached_syntax_to_html(payload + "\n", syntax, commands).chomp end end
|
The cache used for speeding up recomputing the same syntax highlighting HTML. |
@cache = Cache.new(".gvim-cache") do |data| GVim.uncached_syntax_to_html(data.text, data.syntax, data.commands) end
|
Force recomputation of the syntax highlighting HTML, even if a cached version exists. |
def self.force_recompute=(force_recompute) @cache.force_recompute = force_recompute end
|
Highlight syntax of text using GVim. This uses the GVim standard CSS classes to mark keywords, identifiers, and so on. See the GVim documentation for details. The commands array allows configuring the way that GVim will format the output. For example:
Additional commands may be useful; GVim provides a full scripting environment so there is no theoretical limit to what can be done here.
Since GVim is as slow as molasses to start up, we cache the results of
highlighting the syntax of each code fragment in a directory called
|
def self.cached_syntax_to_html(text, syntax, commands = []) data = { "text" => text, "syntax" => syntax, "commands" => commands } return @cache[data] end
|
Highlight syntax of text using GVim, without caching. This is
slow (measured in seconds), due to GVim’s start-up time.
See the |
def self.uncached_syntax_to_html(text, syntax, commands = []) file = write_temporary_file(text) run_gvim(file, syntax, commands) html = read_html_file(file) delete_temporary_files(file) return clean_html(html, syntax) end protected
|
Write the text to highlight the syntax of into a temporary file. |
def self.write_temporary_file(text) file = Tempfile.open("codnar-") file.write(text) file.close(false) return file end
|
Run GVim to highlight the syntax of a temporary file. This uses the little-known ability of GVim to emit the syntax highlighting as HTML using only command-line arguments. |
def self.run_gvim(file, syntax, commands) path = file.path ENV["DISPLAY"] = "none" # Otherwise the X11 server *does* affect the result. command = [ "gvim", "-f", "-X", "-u", "none", "-U", "none", "+:let html_ignore_folding=1", "+:let use_xhtml=1", "+:let html_use_css=0", "+:syn on", "+:set syntax=#{syntax}", commands, "+run! syntax/2html.vim", "+:f #{path}", "+:wq", "+:q", path ] system("echo '\n' | '#{command.flatten.join("' '")}' > /dev/null 2>&1") end
|
Read the HTML with the syntax highlighting written out by GVim. |
def self.read_html_file(file) return File.read(html_file_path(file)) end
|
Delete both the text and HTML temporary files. |
def self.delete_temporary_files(file) File.delete(html_file_path(file)) file.delete end
|
Find the path of the generated HTML file. You’d think it would be predictable, but it ends up either “.html” or “.xhtml” depending on the system. |
def self.html_file_path(file) return Dir.glob(file.path + ".*html")[0] end
|
Extract the clean highlighted syntax HTML from GVim’s HTML output. |
def self.clean_html(html, syntax) if html =~ /<pre>/ html.sub!(/.*?<pre>/m, "<pre class='#{syntax} code syntax'>") html.sub!("</body>\n</html>\n", "") else html.sub!(/.*?<body/m, "<div class='#{syntax} code syntax'") html.sub!("</body>\n</html>\n", "</div>\n") end return html end end end
Since GVim is so slow, we use caching to minimize the time it takes to recompute the same code's highlighted HTML. This is pretty useful in practice: changing one chunk in a file does not require recomputing the highlighting for any of the unchanged chunks in the same file. Here is a simple test of using the caching functionality:
require "codnar" require "olag/test" require "test/spec"
Test caching long computations. |
class TestCacheComputations < Test::Unit::TestCase include Test::WithTempfile def test_cached_computation cache = make_addition_cache(directory = create_tempdir("..")) cache[1].should == 2 File.open(Dir.glob(directory + "/*")[0], "w") { |file| file.puts("3") } cache[1].should == 3 cache.force_recompute = true cache[1].should == 2 end def test_uncached_computation stderr = capture_stderr { make_addition_cache("no-such-directory")[1].should == 2 } stderr.should.include?("no-such-directory") end protected
|
Run a block and capture its standard error (without using FakeFS). |
def capture_stderr stderr_path = write_tempfile("stderr", "") Olag::Globals.without_changes do $stderr = File.open(stderr_path, "w") yield end return File.read(stderr_path) end
|
Create a cache for the “+ 1” operation. |
def make_addition_cache(directory) return Codnar::Cache.new(directory) { |value| value + 1 } end end
And here is the implementation:
module Codnar
|
Cache long computations in disk files. |
class Cache
|
Whether to recompute values even if they are cached. |
attr_accessor :force_recompute
|
Connect to an existing disk cache. The cache is expected to be stored in a directory of the specified name, which is either in the current working directory or in one of its parent directories. |
def initialize(directory, &block) @force_recompute = false @computation = block @directory = find_directory(Dir.pwd, directory) if @directory class <<self; alias [] :cached_computation; end else class <<self; alias [] :uncached_computation; end $stderr.puts("#{$0}: Could not find cache directory: #{directory}.") end end
|
Access the results of the computation for the specified input. Fetch the result from the cache if it is there, otherwise invoke the computation and store the result in the cache for the next time. |
def cached_computation(input) file = cache_file(input) return YAML.load_file(file) if File.exist?(file) and not @force_recompute result = @computation.call(input) File.open(file, "w") { |file| file.write(result.to_yaml) } return result end
|
Return the file expected to cache the computed results for a given input. |
def cache_file(input) key = Digest.hexencode(Digest::SHA2.digest(input.to_yaml)) return @directory + "/" + key + ".yaml" end
|
Access the results of a computation for the specified input, in case we do not have a cache directory to look for and store the results in. |
def uncached_computation(input) return @computation.call(input) end protected
|
Find the path of the cache directory, searching from the given working directory upward until finding a match. |
def find_directory(working_directory, cache_directory) directory = working_directory + "/" + cache_directory return directory if File.exist?(directory) parent_directory = File.dirname(working_directory) return nil if parent_directory == working_directory return find_directory(parent_directory, cache_directory) end end end
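The core idea of the cache, storing each result in a file named after a digest of the input, fits in a few lines. The sketch below is a reduced illustration rather than the class above: `DiskCache` is a name invented here, the directory is passed in directly instead of being searched for in parent directories, and there is no forced recomputation.

```ruby
require "digest/sha2"
require "tmpdir"
require "yaml"

# Minimal disk cache: one YAML file per input, named by the input's digest.
class DiskCache
  def initialize(directory, &computation)
    @directory = directory
    @computation = computation
  end

  # Return the cached result if present, otherwise compute and store it.
  def [](input)
    file = File.join(@directory, Digest::SHA256.hexdigest(input.to_yaml) + ".yaml")
    return YAML.load_file(file) if File.exist?(file)
    result = @computation.call(input)
    File.write(file, result.to_yaml)
    result
  end
end

Dir.mktmpdir do |directory|
  calls = 0
  cache = DiskCache.new(directory) { |value| calls += 1; value + 1 }
  cache[1]
  cache[1]
  puts "computed #{calls} time(s)"
end
```

Because the file name depends only on the input, any change to the text, syntax or commands of a highlighting request leads to a different file, so stale entries are never returned for new inputs. They are also never removed, which is the price of this simplicity.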
CodeRay is a Ruby gem that knows how to highlight the syntax of many popular languages. It is much faster than using GVim but doesn't offer the huge range of programming languages offered by GVim (for example, it does not currently offer shell script highlighting). If your languages are covered by it, it may serve as a convenient replacement for the slow GVim approach.
Here is a simple test that demonstrates using CodeRay for syntax highlighting:
require "codnar" require "test/spec"
Test highlighting syntax using CodeRay. |
class TestCodeRayHighlightSyntax < Test::Unit::TestCase def test_coderay_lines Codnar::CodeRay.lines_to_html([ { "kind" => "ruby_code", "number" => 1, "indentation" => "", "payload" => "def foo" }, { "kind" => "ruby_code", "number" => 2, "indentation" => " ", "payload" => "return 1" }, { "kind" => "ruby_code", "number" => 3, "indentation" => "", "payload" => "end" }, ], "ruby").should == [ { "kind" => "html", "number" => 1, "payload" => <<-EOF.unindent.chomp <div class="CodeRay"> <div class="code"><pre><span style="color:#080;font-weight:bold">def</span> <span style="color:#06B;font-weight:bold">foo</span> <span style="color:#080;font-weight:bold">return</span> <span style="color:#00D">1</span> <span style="color:#080;font-weight:bold">end</span></pre></div> </div> EOF }, ] end end
And here is the implementation:
module Codnar
|
Extend the CodeRay module. |
module CodeRay
|
Convert a sequence of classified code lines to HTML using CodeRay syntax highlighting. The options control the way CodeRay behaves (e.g., :css => :class). |
def self.lines_to_html(lines, syntax, options = {}) return Formatter.merge_lines(lines, "html") do |payload| ::CodeRay.scan(payload, syntax).div(options).chomp end end end end
Sunlight offers a different approach for syntax highlighting. Instead of pre-processing the code to generate highlighted HTML while splitting, it provides Javascript files that examine the textual code in the DOM and convert it to highlighted HTML in the browser. This takes virtually no time when splitting the code, but requires recomputing highlighting for all the code chunks every time the HTML file is loaded. This can be pretty slow, especially if using a browser with a slow Javascript engine, like IE. However, this may be a reasonable trade-off, at least for small projects. Since Sunlight is a new project, it supports a limited range of programming languages.
Here is a simple test that demonstrates using Sunlight for syntax highlighting:
require "codnar" require "test/spec"
Test highlighting syntax using Sunlight. |
class TestSunlightHighlightSyntax < Test::Unit::TestCase def test_sunlight_lines Codnar::Sunlight.lines_to_html([ { "kind" => "ruby_code", "number" => 1, "indentation" => "", "payload" => "def foo" }, { "kind" => "ruby_code", "number" => 2, "indentation" => " ", "payload" => "return 1" }, { "kind" => "ruby_code", "number" => 3, "indentation" => "", "payload" => "end" }, ], "ruby").should == [ { "kind" => "html", "number" => 1, "payload" => <<-EOF.unindent.chomp <pre class='sunlight-highlight-ruby'> def foo return 1 end </pre> EOF }, ] end end
And here is the implementation:
module Codnar
|
Syntax highlight using Sunlight. |
class Sunlight
|
Convert a sequence of classified code lines to HTML using Sunlight syntax
highlighting. All we need to do is wrap the lines in an HTML
|
def self.lines_to_html(lines, syntax) return Formatter.lines_to_pre_html(lines, :class => "sunlight-highlight-#{syntax}") end end end
Now that we have all the separate pieces of functionality for splitting source files into HTML chunks, we need to combine them into a single convenient service.
Here is a simple test that demonstrates using the splitter for source code files:
require "codnar" require "olag/test" require "test/spec"
Test splitting code files. |
class TestSplitCode < Test::Unit::TestCase include Test::WithErrors include Test::WithTempfile def test_split_ruby splitter = Codnar::Splitter.new(@errors, RUBY_CONFIGURATION) path = write_tempfile("ruby.rb", RUBY_FILE) chunks = splitter.chunks(path) @errors.should == [] chunks.should == ruby_chunks(path) end protected def ruby_chunks(path) RUBY_CHUNKS[0].name = path RUBY_CHUNKS[1].containers[0] = path RUBY_CHUNKS.each { |chunk| chunk.locations[0].file = path } return RUBY_CHUNKS end RUBY_FILE = <<-EOF.unindent.gsub("#!", "#") #! This is *rdoc*. #! {{{ assignment local = $global indented #! }}} EOF RUBY_CONFIGURATION = { "formatters" => { "code" => "Formatter.cast_lines(lines, 'ruby')", "comment" => "Formatter.cast_lines(lines, 'rdoc')", "ruby" => "GVim.lines_to_html(lines, 'ruby')", "rdoc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')", "begin_chunk" => "[]", "end_chunk" => "[]", "nested_chunk" => "Formatter.nested_chunk_lines_to_html(lines)", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", }, "syntax" => { "start_state" => "ruby", "patterns" => { "comment" => { "regexp" => "^(\\s*)#\\s*(.*)$" }, "code" => { "regexp" => "^(\\s*)(.*)$" }, "begin_chunk" => { "regexp" => "^(\\s*)\\W*\\{\\{\\{\\s*(.*?)\\s*$" }, "end_chunk" => { "regexp" => "^(\\s*)\\W*\\}\\}\\}\\s*(.*?)\\s*$" }, }, "states" => { "ruby" => { "transitions" => [ { "pattern" => "begin_chunk" }, { "pattern" => "end_chunk" }, { "pattern" => "comment" }, { "pattern" => "code" }, ], }, }, }, } RUBY_CHUNKS = [ { "name" => "PATH", "locations" => [ "file" => "PATH", "line" => 1 ], "containers" => [], "contained" => [ "assignment" ], "html" => <<-EOF.unindent.chomp, #! ((( html
<table class='layout'> <tr> <td class='indentation'> <pre></pre> </td> <td class='html'> <div class='rdoc rdoc markup'> <p> This is <strong>rdoc</strong>. </p> </div> </td> </tr> </table> <pre class='nested chunk'> <a class='nested chunk' href='#assignment'>assignment</a> </pre> EOF
#! ))) html }, { "name" => "assignment", "containers" => [ "PATH" ], "contained" => [], "locations" => [ "file" => "PATH", "line" => 2 ], "html" => <<-EOF.unindent.chomp, #! ((( html
<div class='ruby code syntax' bgcolor="#ffffff" text="#000000"> <font face="monospace"> local = <font color="#00ffff">$global</font><br /> indented<br /> </font> </div> EOF
#! ))) html
} ]
end
And here is the implementation:
module Codnar
|
Split disk files into chunks. |
class Splitter
|
Construct a splitter based on a configuration in the following structure: syntax: <syntax> formatters: <kind>: <expression> Where the syntax is passed as-is to (and expanded in-place by) a Scanner, and the formatters are passed as-is to a Formatter to convert the chunk’s classified lines into HTML. |
def initialize(errors, configuration) @errors = errors @configuration = configuration @scanner = Scanner.new(errors, configuration.syntax) @formatter = Formatter.new(errors, configuration.formatters) end
|
Split a disk file into HTML chunks. |
def chunks(path) lines = @scanner.lines(path) chunks = Merger.chunks(@errors, path, lines) chunks.each { |chunk| chunk.html = @formatter.lines_to_html(chunk.delete("lines")) } return chunks end end end
The narrative documentation is expected to reside in one or more files, which are also "split" to a single chunk each. Having both documentation and code exist as chunks allows for uniform treatment of both when weaving, as well as allowing for pre-processing the documentation files, if necessary. For example, Codnar currently supports, for documentation, the same markup formats that are also supported for code comments. Here is a simple test that demonstrates "splitting" documentation (using the same implementation as above):
require "codnar" require "olag/test" require "test/spec"
Test “splitting” documentation files. |
class TestSplitDocumentation < Test::Unit::TestCase include Test::WithErrors include Test::WithFakeFS def test_split_raw write_fake_file("raw.html", "<foo>\nbar\n</foo>\n") splitter = Codnar::Splitter.new(@errors, configuration("html")) chunks = splitter.chunks("raw.html") @errors.should == [] chunks.should == [ { "name" => "raw.html", "containers" => [], "contained" => [], "locations" => [ { "file" => "raw.html", "line" => 1 } ], "html" => "<foo>\nbar\n</foo>" } ] end def test_split_markdown write_fake_file("markdown.md", "*foo*\nbar\n") splitter = Codnar::Splitter.new(@errors, configuration("markdown")) chunks = splitter.chunks("markdown.md") @errors.should == [] chunks.should == [ { "name" => "markdown.md", "containers" => [], "contained" => [], "locations" => [ { "file" => "markdown.md", "line" => 1 } ], "html" => "<div class='markdown markdown markup'>\n<p>\n<em>foo</em>\nbar\n</p>\n</div>" } ] end def test_split_rdoc write_fake_file("rdoc.rdoc", "*foo*\nbar\n") splitter = Codnar::Splitter.new(@errors, configuration("rdoc")) chunks = splitter.chunks("rdoc.rdoc") @errors.should == [] chunks.should == [ { "name" => "rdoc.rdoc", "containers" => [], "contained" => [], "locations" => [ { "file" => "rdoc.rdoc", "line" => 1 } ], "html" => "<div class='rdoc rdoc markup'>\n<p>\n<strong>foo</strong> bar\n</p>\n</div>" } ] end def test_split_unknown_kind write_fake_file("unknown.kind", "foo\nbar\n") splitter = Codnar::Splitter.new(@errors, configuration("unknown-kind")) chunks = splitter.chunks("unknown.kind") @errors.should == [ "#{$0}: No formatter specified for lines of kind: unknown-kind" ] chunks.should == [ { "name" => "unknown.kind", "containers" => [], "contained" => [], "locations" => [ { "file" => "unknown.kind", "line" => 1 } ], "html" => "<pre class='missing formatter error'>\nfoo\nbar\n</pre>" } ] end protected def configuration(kind) return { "formatters" => { "markdown" => "Formatter.markup_lines_to_html(lines, Markdown, 'markdown')", "unindented_html" => 
"Formatter.unindented_lines_to_html(lines)", "rdoc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')", }, "syntax" => { "start_state" => kind, "patterns" => { kind => { "regexp" => "^(.*)$", "groups" => [ "payload" ] }, }, "states" => { kind => { "transitions" => [ { "pattern" => kind } ] } } } } end end
The splitting mechanism defined above is pretty generic. Applying it to a specific system requires providing the appropriate configuration. The system provides a few built-in configurations which may be useful "out of the box".
If one is willing to give up altogether on syntax highlighting and comment formatting, the system would be applicable as-is to any programming language. Properly highlighting almost any known programming language syntax would be a simple matter of passing the correct syntax parameter to GVim.
Properly formatting comments in additional mark-up formats would be trickier. First, a proper pattern needs to be established for extracting the comments (/*, //, --, etc.). Then, the results need to be converted to HTML. One way would be to pass them through GVim syntax highlighting with an appropriate format (e.g., syntax=doxygen). Another would be to invoke some Ruby library; finally, one could invoke some external tool to do the job. The latter two options would require providing additional glue Ruby code, similar to the GVim class above.
At any rate, here are the built-in configurations:
module Codnar
A module for all the “built-in” configurations. The names of these configurations can be passed to the --require option of any Codnar Application.
module Configuration include Documentation include Code include Comments include Highlighting end end
Different source files require different overall configurations but reuse common building blocks. To support this, we allow configurations to be combined using a "deep merge", which allows complex nested structures to be merged. There is even a way for an overriding array to embed the array it is merged with, so that its own elements end up before or after the default ones. Here is a simple test that demonstrates deep-merging complex structures:
require "codnar" require "test/spec"
Test deep-merging complex structures. |
class TestDeepMerge < Test::Unit::TestCase def test_deep_merge default = { "only_default" => "default_value", "overriden" => "default_value", "overriden_array" => [ "default_value" ], "merged_array" => [ "default_value" ], } override = { "only_override" => "overriden_value", "overriden" => "overriden_value", "overriden_array" => [ "overriden_value" ], "merged_array" => [ "overriden_value", [] ], } default.deep_merge(override).should == { "only_default" => "default_value", "only_override" => "overriden_value", "overriden" => "overriden_value", "overriden_array" => [ "overriden_value" ], "merged_array" => [ "overriden_value", "default_value" ], } end end
Here is the implementation:
Perform a deep merge with another hash.
def deep_merge(hash)
  return merge(hash, &Hash::method("deep_merger"))
end
protected
Return a Hash merger that recursively merges nested hashes.
def self.deep_merger(key, default, override)
  if Hash === default && Hash === override
    default.deep_merge(override)
  elsif Array === default && Array === override
    Hash.deep_merge_arrays(default, override)
  else
    override
  end
end
If the overriding data array contains an empty array element (“[]”), it is replaced by the default data array being overridden.
def self.deep_merge_arrays(default, override)
  embed_index = override.find_index([])
  return override unless embed_index
  override = override.dup
  override[embed_index..embed_index] = default
  return override
end
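To see why the empty-array marker matters, here is a minimal self-contained sketch of the same merge rules applied to a list of state transitions. The DeepMergeSketch module and its method names are hypothetical stand-ins (used so the snippet runs without Codnar and without reopening Hash); only the rules themselves mirror the implementation above.

```ruby
# Hypothetical stand-alone re-statement of the deep merge rules shown above.
module DeepMergeSketch
  # Merge two hashes; the block resolves keys that are present in both.
  def self.merge(default, override)
    default.merge(override) { |_key, old, new| merge_values(old, new) }
  end

  # Hashes recurse, arrays may embed the default, anything else is replaced.
  def self.merge_values(default, override)
    if default.is_a?(Hash) && override.is_a?(Hash)
      merge(default, override)
    elsif default.is_a?(Array) && override.is_a?(Array)
      merge_arrays(default, override)
    else
      override
    end
  end

  # An empty array element in the override is replaced by the elements of
  # the default array; without such a marker the override wins outright.
  def self.merge_arrays(default, override)
    embed_index = override.find_index([])
    return override unless embed_index
    merged = override.dup
    merged[embed_index..embed_index] = default
    merged
  end
end

code = { "transitions" => [ { "pattern" => "code" } ] }
comments = { "transitions" => [ { "pattern" => "comment" }, [] ] }

# Fold the configurations from left to right; later ones override earlier ones.
merged = [ code, comments ].inject({}) { |all, nxt| DeepMergeSketch.merge(all, nxt) }
p merged
```

Because the marker is the last element of the overriding array, the comment transition ends up before the code transition, so it is tried first.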
And here is a test module that automates the process of merging configurations and invoking the Splitter:
Tests with Codnar split configurations. |
module Test::WithConfigurations
Test running the Splitter with merged configurations.
def check_split_file(file_text, *configurations, &block) configuration = configurations.inject({}) do |merged_configuration, next_configuration| merged_configuration.deep_merge(next_configuration) end splitter = Codnar::Splitter.new(@errors, configuration) chunks = splitter.chunks(path = write_tempfile("splitted", file_text)) @errors.should == [] chunks.should == yield(path) end end
The documentation configurations are pretty simple; they apply to files containing a piece of the narrative in some supported format. They typically do not need to be combined with other configurations. Here is a simple test that demonstrates "splitting" documentation:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test the built-in split documentation configurations. |
class TestSplitDocumentationConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors #!include Test::WithFakeFS - until FakeFS fixes the tempfile issue. include Test::WithTempfile HTML_FILE = <<-EOF.unindent #! ((( html
<p>This is an HTML file.</p> EOF
# ))) html def test_split_html_documentation check_split_file(HTML_FILE, Codnar::Configuration::SPLIT_HTML_DOCUMENTATION) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => HTML_FILE.chomp } ] end end PRE_FILE = <<-EOF.unindent This is a preformatted raw text file. EOF def test_split_pre_documentation check_split_file(PRE_FILE, Codnar::Configuration::SPLIT_PRE_DOCUMENTATION) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => "<pre class='doc'>\n" + PRE_FILE + "</pre>" } ] end end MARKUP_FILE = <<-EOF.unindent This is a *marked-up* file. EOF RDOC_HTML = <<-EOF.unindent.chomp #! ((( html
<div class='rdoc doc markup'> <p> This is a <strong>marked-up</strong> file. </p> </div> EOF
# ))) html def test_split_rdoc_documentation check_split_file(MARKUP_FILE, Codnar::Configuration::SPLIT_RDOC_DOCUMENTATION) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => RDOC_HTML, } ] end end MARKDOWN_HTML = <<-EOF.unindent.chomp #! ((( html
<div class='markdown doc markup'> <p> This is a <em>marked-up</em> file. </p> </div> EOF
#! ))) html def test_split_markdown_documentation check_split_file(MARKUP_FILE, Codnar::Configuration::SPLIT_MARKDOWN_DOCUMENTATION) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => MARKDOWN_HTML, } ] end end end
And here are the actual configurations:
module Codnar module Configuration
Configurations for “splitting” documentation files.
module Documentation
“Split” a documentation file. All lines are assumed to have the same kind (doc). This is the default configuration as it performs the minimal amount of processing on the input. It isn’t the most useful configuration.
SPLIT_HTML_DOCUMENTATION = { "formatters" => { "doc" => "Formatter.cast_lines(lines, 'html')", }, "syntax" => { "patterns" => { "doc" => { "regexp" => "^(.*)$", "groups" => [ "payload" ] }, }, "states" => { "start" => { "transitions" => [ { "pattern" => "doc" } ] }, }, }, }
“Split” a documentation file containing arbitrary text, which is preserved by escaping it and wrapping it in an HTML pre element.
SPLIT_PRE_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge( "formatters" => { "doc" => "Formatter.lines_to_pre_html(lines, :class => :doc)", } )
“Split” a documentation file containing pure RDoc documentation.
SPLIT_RDOC_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge( "formatters" => { "doc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", } )
“Split” a documentation file containing pure Markdown documentation.
SPLIT_MARKDOWN_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge( "formatters" => { "doc" => "Formatter.markup_lines_to_html(lines, Codnar::Markdown, 'markdown')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", } )
“Split” a documentation file containing a GraphViz diagram.
SPLIT_GRAPHVIZ_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge( "formatters" => { "doc" => "Formatter.markup_lines_to_html(lines, Codnar::GraphViz, 'graphviz')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", } ) end end end
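The effect of SPLIT_PRE_DOCUMENTATION can be pictured with a small stand-alone sketch. The pre_html helper below is hypothetical (it is not Codnar's Formatter.lines_to_pre_html, whose internals are not shown here); it only illustrates escaping arbitrary text and wrapping it in a pre element, giving output of the same shape the test above expects.

```ruby
require "erb"

# Hypothetical helper: escape raw text and wrap it in an HTML pre element.
def pre_html(text, css_class)
  escaped = ERB::Util.html_escape(text)
  "<pre class='#{css_class}'>\n#{escaped}</pre>"
end

puts pre_html("This is a preformatted <raw> text file.\n", "doc")
```

Escaping first is what lets arbitrary text, including angle brackets, survive being embedded in the woven HTML.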
Splitting source code files is a more complex affair, which does typically require combining several configurations.
module Codnar module Configuration
Configurations for splitting source code.
module Code
Source code lines classification configurations
Nested foreign syntax code islands configurations
end end end
The basic configuration marks all lines as belonging to some code syntax, as a single chunk:
Classify all lines as source code of some syntax (kind). This doesn’t distinguish between comment and code lines; to do that, you need to combine this with comment classification configuration(s). Also, it just formats the lines in an HTML pre element.
CLASSIFY_SOURCE_CODE = lambda do |syntax| return { "formatters" => { "#{syntax}_code" => "Formatter.lines_to_pre_html(lines, :class => :code)", }, "syntax" => { "patterns" => { "#{syntax}_code" => { "regexp" => "^(\\s*)(.*)$" }, }, "states" => { "start" => { "transitions" => [ { "pattern" => "#{syntax}_code" }, ], }, }, }, } end
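The pattern above captures two groups per line. Here is a minimal sketch of what the regexp extracts; since the group names are not spelled out in this configuration, it assumes the first group is the indentation and the second is the payload. The split_code_line helper is hypothetical.

```ruby
# Assumption: group 1 is the indentation and group 2 is the payload.
def split_code_line(line)
  # The same regexp string as used in CLASSIFY_SOURCE_CODE.
  match = Regexp.new("^(\\s*)(.*)$").match(line)
  { "indentation" => match[1], "payload" => match[2] }
end

p split_code_line("    b = 1")
```

Every line matches this pattern, including an empty one, which is why this configuration can serve as the catch-all that other configurations are merged in front of.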
Sometimes, code in one syntax contains nested "islands" of code in another syntax. Here is a simple configuration to support that, which can be combined with the above basic configuration:
Allow for comments containing “((( <syntax>” and “))) <syntax>” to designate nested islands of foreign syntax inside the normal code. The designator comment lines are always treated as part of the surrounding code, not as part of the nested foreign syntax code. There is no further classification of the nested foreign syntax code. Therefore, the nested code is not examined for begin/end chunk markers. Likewise, the nested code may not contain deeper nested code using a third syntax. |
CLASSIFY_NESTED_CODE = lambda do |outer_syntax, inner_syntax| { "syntax" => { "patterns" => { "start_#{inner_syntax}_in_#{outer_syntax}" => { "regexp" => "^(\\s*)(.*\\(\\(\\(\\s*#{inner_syntax}.*)$" }, "end_#{inner_syntax}_in_#{outer_syntax}" => { "regexp" => "^(\\s*)(.*\\)\\)\\)\\s*#{inner_syntax}.*)$" }, "#{inner_syntax}_in_#{outer_syntax}" => { "regexp" => "^(\\s*)(.*)$" }, }, "states" => { "start" => { "transitions" => [ { "pattern" => "start_#{inner_syntax}_in_#{outer_syntax}", "kind" => "#{outer_syntax}_code", "next_state" => "#{inner_syntax}_in_#{outer_syntax}" }, [], ], }, "#{inner_syntax}_in_#{outer_syntax}" => { "transitions" => [ { "pattern" => "end_#{inner_syntax}_in_#{outer_syntax}", "kind" => "#{outer_syntax}_code", "next_state" => "start" }, { "pattern" => "#{inner_syntax}_in_#{outer_syntax}", "kind" => "#{inner_syntax}_code" }, ], }, }, }, } end
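The two states above amount to a small toggle. The following sketch walks a list of lines with the same start and end regexps and reports the kind each line would get; classify_islands is a hypothetical helper, not part of Codnar, and it ignores everything else the real Splitter does.

```ruby
# Classify lines as outer or inner syntax code, following the island markers.
def classify_islands(lines, outer_syntax, inner_syntax)
  start_island = Regexp.new("^(\\s*)(.*\\(\\(\\(\\s*#{inner_syntax}.*)$")
  end_island = Regexp.new("^(\\s*)(.*\\)\\)\\)\\s*#{inner_syntax}.*)$")
  inside = false
  lines.map do |line|
    if !inside
      # A start marker line still belongs to the surrounding (outer) code.
      inside = true if start_island.match(line)
      "#{outer_syntax}_code"
    elsif end_island.match(line)
      # So does the end marker line.
      inside = false
      "#{outer_syntax}_code"
    else
      "#{inner_syntax}_code"
    end
  end
end

lines = [ "a = b", "# ((( html", "<p>HTML</p>", "# ))) html", "b = 1" ]
p classify_islands(lines, "ruby", "html")
# => ["ruby_code", "ruby_code", "html_code", "ruby_code", "ruby_code"]
```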
Here is a simple test demonstrating using source code lines classifications:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test combinations of the built-in split code configurations. |
class TestSplitCodeConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile SOURCE_CODE = <<-EOF.unindent a = b b = 1 EOF def test_source_code check_split_file(SOURCE_CODE, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby")) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => "<pre class='code'>\n#{SOURCE_CODE}</pre>" } ] end end ISLAND_CODE = <<-EOF.unindent a = b b = 1 HTML = <<-EOH.unindent # ((( html
<p> HTML </p> EOH
# ))) html EOF ISLAND_HTML = <<-EOF.unindent.chomp <pre class='ruby code syntax'> a = b b = <span class="Constant">1</span> <span class="Type">HTML</span> = <<-<span class="Special">EOH</span>.unindent <span class="Comment"># ((( html</span>
</pre> <pre class='html code syntax'> <span class="Identifier"><</span><span class="Statement">p</span><span class="Identifier">></span> HTML <span class="Identifier"></</span><span class="Statement">p</span><span class="Identifier">></span> EOH </pre> <pre class='ruby code syntax'>
<span class="Comment"># ))) html</span> </pre> EOF def test_island_code check_split_file(ISLAND_CODE, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby"), Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("ruby"), Codnar::Configuration::CLASSIFY_NESTED_CODE.call("ruby", "html"), Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("html")) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => ISLAND_HTML } ] end end end
Classifying comment lines is the most complex part of splitting source code files, requiring the use of one or more configurations specific to the language used.
module Codnar module Configuration
Configurations for splitting source code with comments.
module Comments
Simple comment classification configurations
Denoted comment classification configurations
Delimited comment classification configurations
Comment formatting configurations
end end end
Many languages use a simple comment syntax, where some prefix indicates a comment that spans until the end of the line (e.g., shell # comments or C++ // comments).
Classify simple comment lines. It accepts a restricted format: each comment is expected to start with some exact prefix (e.g. “#” for shell style comments or “//” for C++ style comments). The following space, if any, is stripped from the payload. As a convenience, a prefix that is immediately followed by “!” is not taken to start a comment. This protects the 1st line of shell scripts (“#!”), as well as any other line you wish to avoid being treated as a comment.
This configuration is typically complemented by an additional one specifying how to format the (stripped!) comments; by default they are just displayed as-is using an HTML pre element.
CLASSIFY_SIMPLE_COMMENTS = lambda do |prefix| return Comments.simple_comments(prefix) end
Classify simple shell (“#”) comment lines. |
CLASSIFY_SHELL_COMMENTS = lambda do return Comments.simple_comments("#") end
Classify simple C++ (“//”) comment lines. |
CLASSIFY_CPP_COMMENTS = lambda do return Comments.simple_comments("//") end
Configuration for classifying lines to comments and code based on a simple prefix (e.g. “#” for shell style comments or “//” for C++ style comments). |
def self.simple_comments(prefix) return { "syntax" => { "patterns" => { "comment_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" }, }, "states" => { "start" => { "transitions" => [ { "pattern" => "comment_#{prefix}", "kind" => "comment" }, [] ], }, }, }, } end
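The behavior of the comment regexp, including the “!” escape, can be checked in isolation. This is a minimal sketch; simple_comment_payload is a hypothetical helper that only applies the pattern string used above.

```ruby
# Return the stripped comment payload, or nil if the line is not a comment.
# As in the configuration, the prefix is used as a regexp fragment.
def simple_comment_payload(prefix, line)
  pattern = Regexp.new("^(\\s*)#{prefix}(?!!)\\s?(.*)$")
  match = pattern.match(line)
  match && match[2]
end

p simple_comment_payload("#", "# Comment")      # => "Comment"
p simple_comment_payload("#", "#! /bin/sh")     # => nil
p simple_comment_payload("//", "  // Indented") # => "Indented"
p simple_comment_payload("//", "Code")          # => nil
```

The negative lookahead `(?!!)` is what rejects a prefix directly followed by “!”, so such lines fall through to the code transition.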
Here is a simple test demonstrating using simple comment classifications:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test built-in split simple comment configurations. |
class TestSplitSimpleCommentsConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile def test_custom_comments check_any_comment("!", Codnar::Configuration::CLASSIFY_SIMPLE_COMMENTS.call("!")) end def test_shell_comments check_any_comment("#", Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call) end def test_cpp_comments check_any_comment("//", Codnar::Configuration::CLASSIFY_CPP_COMMENTS.call) end protected
The “?” will be replaced by the simple comment prefix.
ANY_COMMENT_CODE = <<-EOF.unindent ? ? Comment Code ?! Not comment EOF def check_any_comment(prefix, configuration) check_split_file(ANY_COMMENT_CODE.gsub("?", prefix), Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"), Codnar::Configuration::FORMAT_PRE_COMMENTS, configuration) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => "<pre class='comment'>\n\nComment\n</pre>\n<pre class='code'>\nCode\n#{prefix}! Not comment\n</pre>" } ] end end end
Sometimes some simple comments require special treatment if they are denoted by some leading prefix. For example, Haskell simple comments start with -- but Haddock (documentation) comments start with -- |, -- ^ etc.
Classify denoted comment lines. Denoted comments are similar to simple comments, except that the 1st simple comment line must start with a specific prefix (e.g., in Haskell, comment lines start with “--” but Haddock comments start with “-- |”, “-- ^”, etc.). The comment continues in additional simple comment lines.
This configuration is typically complemented by an additional one specifying how to format the (stripped!) comments; by default they are just displayed as-is using an HTML pre element.
CLASSIFY_DENOTED_COMMENTS = lambda do |start_prefix, continue_prefix| return Comments.denoted_comments(start_prefix, continue_prefix) end
Classify denoted Haddock (“--”) comment lines. Note that non-Haddock comment lines are not captured; they would be treated as code and handled by syntax highlighting, if any.
CLASSIFY_HADDOCK_COMMENTS = lambda do return Comments.denoted_comments("-- [|^$]", "--") end
Configuration for classifying lines to comments and code based on a start comment prefix and continuation comment prefix (e.g., “-- |” and “--” for Haddock).
def self.denoted_comments(start_prefix, continue_prefix)
Ruby coverage somehow barfs if we inline this. Go figure.
start_transition = { "pattern" => "comment_start_#{start_prefix}", "next_state" => "comment_continue_#{continue_prefix}", "kind" => "comment" } return { "syntax" => { "patterns" => { "comment_start_#{start_prefix}" => { "regexp" => "^(\\s*)#{start_prefix}\\s?(.*)$" }, "comment_continue_#{continue_prefix}" => { "regexp" => "^(\\s*)#{continue_prefix}\\s?(.*)$" }, }, "states" => { "start" => { "transitions" => [ start_transition, [] ], }, "comment_continue_#{continue_prefix}" => { "transitions" => [ { "pattern" => "comment_continue_#{continue_prefix}", "kind" => "comment" }, { "next_state" => "start" } ], }, }, }, } end
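The two states above can be sketched as a loop that remembers whether the previous line was part of a denoted comment. classify_denoted is a hypothetical helper that reuses the same pattern strings; it is a simplification of the state machine, not Codnar's Splitter.

```ruby
# Classify lines as "comment" or "code". A comment must begin with the start
# prefix and may continue over lines carrying the continuation prefix.
def classify_denoted(lines, start_prefix, continue_prefix)
  start_comment = Regexp.new("^(\\s*)#{start_prefix}\\s?(.*)$")
  continue_comment = Regexp.new("^(\\s*)#{continue_prefix}\\s?(.*)$")
  in_comment = false
  lines.map do |line|
    # Continuation lines only count while we are already inside a comment.
    match = (in_comment && continue_comment.match(line)) || start_comment.match(line)
    in_comment = !match.nil?
    match ? [ "comment", match[2] ] : [ "code", line ]
  end
end

lines = [ "-- Not start comment", "-- | Start comment", "-- Continue comment", "Not a comment" ]
classify_denoted(lines, "-- [|^$]", "--").each { |kind, text| puts "#{kind}: #{text}" }
```

With the Haddock prefixes, the first line stays code because a plain “--” line only counts as a comment once a denoted line has opened one.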
Here is a simple test demonstrating using denoted comment classifications:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test built-in split denoted comment configurations. |
class TestSplitDenotedCommentsConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile def test_custom_comments check_any_comment("// @", "//", Codnar::Configuration::CLASSIFY_DENOTED_COMMENTS.call("// @", "//")) end def test_haddoc_comments check_any_comment("-- |", "--", Codnar::Configuration::CLASSIFY_HADDOCK_COMMENTS.call) end protected
The “<<<” will be replaced by the start comment prefix, and the “>>>” will be replaced by the continue comment prefix.
ANY_COMMENT_CODE = <<-EOF.unindent >>> Not start comment <<< Start comment >>> Continue comment Not a comment EOF
The “>>>” will be replaced by the continue comment prefix.
ANY_COMMENT_HTML = <<-EOF.unindent.chomp # ((( html
<pre class='code'> >>> Not start comment </pre> <pre class='comment'> Start comment Continue comment </pre> <pre class='code'> Not a comment </pre> EOF # ))) def check_any_comment(start_prefix, continue_prefix, configuration) check_split_file(ANY_COMMENT_CODE.gsub("<<<", start_prefix).gsub(">>>", continue_prefix), Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"), Codnar::Configuration::FORMAT_PRE_COMMENTS, configuration) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => ANY_COMMENT_HTML.gsub(">>>", continue_prefix), } ] end end end
Other languages use a delimited multi-line comment syntax, where some prefix indicates the beginning of the comment, some suffix indicates the end, and by convention some prefix is expected for the inner comment lines (e.g., C's "/*", "*", "*/" comments or HTML's "<!--", "-", "-->" comments).
Classify delimited comment lines. It accepts a restricted format: each comment is expected to start with some exact prefix (e.g. “/*” for C style comments or “<!--” for HTML style comments). The following space, if any, is stripped from the payload. Following lines are also considered comments; a leading inner line prefix (e.g., “ *” for C style comments or “ -” for HTML style comments) with an optional following space is stripped from the payload. Finally, a line containing some exact suffix (e.g. “*/” for C style comments, or “-->” for HTML style comments) ends the comment. A one-line comment format is also supported, containing the prefix, the payload, and the suffix. As a convenience, a prefix that is immediately followed by “!” is not taken to start a comment. This allows protecting a comment block you wish to avoid being classified as a comment.
This configuration is typically complemented by an additional one specifying how to format the (stripped!) comments; by default they are just displayed as-is using an HTML pre element.
CLASSIFY_DELIMITED_COMMENTS = lambda do |prefix, inner, suffix| return Comments.delimited_comments(prefix, inner, suffix) end
Classify delimited C (“/*”, “ *”, “ */”) style comments. |
CLASSIFY_C_COMMENTS = lambda do
Since the prefix/inner/suffix passed to the configuration are regexps, we need to escape special characters such as “*”.
return Comments.delimited_comments("/\\*", " \\*", " \\*/") end
Classify delimited HTML (“<!--”, “ -”, “-->”) style comments.
CLASSIFY_HTML_COMMENTS = lambda do return Comments.delimited_comments("<!--", " -", "-->") end
Configuration for classifying lines to comments and code based on a delimited start prefix, inner line prefix and final suffix (e.g., “/*”, “ *”, “ */” for C-style comments or “<!--”, “ -”, “-->” for HTML style comments).
def self.delimited_comments(prefix, inner, suffix)
  return {
    "syntax" => {
      "patterns" => {
        "comment_prefix_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" },
        "comment_inner_#{inner}" => { "regexp" => "^(\\s*)#{inner}\\s?(.*)$" },
        "comment_suffix_#{suffix}" => { "regexp" => "^(\\s*)#{suffix}\\s*$" },
        "comment_line_#{prefix}_#{suffix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*?)\\s*#{suffix}\\s*$" },
      },
      "states" => {
        "start" => {
          "transitions" => [
            { "pattern" => "comment_line_#{prefix}_#{suffix}", "kind" => "comment" },
            { "pattern" => "comment_prefix_#{prefix}", "kind" => "comment", "next_state" => "comment_#{prefix}" },
            [],
          ],
        },
        "comment_#{prefix}" => {
          "transitions" => [
            { "pattern" => "comment_suffix_#{suffix}", "kind" => "comment", "next_state" => "start" },
            { "pattern" => "comment_inner_#{inner}", "kind" => "comment" },
          ],
        },
      },
    },
  }
end
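The interplay of the four patterns is easier to follow on a concrete input. The sketch below uses patterns modeled on the ones above and applies them to the C-style prefixes; classify_delimited is a hypothetical helper, and reporting "unmatched" for an unexpected line inside a comment is an assumption, since the source does not show how the Splitter handles that case.

```ruby
# Classify lines as "comment" or "code" for delimited comments.
def classify_delimited(lines, prefix, inner, suffix)
  one_line = Regexp.new("^(\\s*)#{prefix}(?!!)\\s?(.*?)\\s*#{suffix}\\s*$")
  opening = Regexp.new("^(\\s*)#{prefix}(?!!)\\s?(.*)$")
  inner_line = Regexp.new("^(\\s*)#{inner}\\s?(.*)$")
  closing = Regexp.new("^(\\s*)#{suffix}\\s*$")
  in_comment = false
  lines.map do |line|
    if in_comment
      if closing.match(line)
        in_comment = false
        "comment"
      elsif inner_line.match(line)
        "comment"
      else
        "unmatched" # Assumption: the real handling of this case is not shown.
      end
    elsif one_line.match(line)
      # The one-line form is tried first, so it never opens a multi-line comment.
      "comment"
    elsif opening.match(line)
      in_comment = true
      "comment"
    else
      "code"
    end
  end
end

lines = [ "/* One-line comment */", "Code", "/*", " * Multi-line", " * comment.", " */" ]
p classify_delimited(lines, "/\\*", " \\*", " \\*/")
# => ["comment", "code", "comment", "comment", "comment", "comment"]
```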
Here is a simple test demonstrating using delimited comment classifications:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test built-in split delimited comment configurations. |
class TestSplitDelimitedCommentsConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile def test_custom_comments
Since the prefix/inner/suffix passed to the configuration are regexps, we need to escape special characters such as “{” and “|”.
check_any_comment([ "@{", " |", " }@" ], Codnar::Configuration::CLASSIFY_DELIMITED_COMMENTS.call("@\\{", " \\|", " \\}@")) end def test_c_comments check_any_comment([ "/*", " *", " */" ], Codnar::Configuration::CLASSIFY_C_COMMENTS.call) end def test_html_comments check_any_comment([ "<!--", " -", "-->" ], Codnar::Configuration::CLASSIFY_HTML_COMMENTS.call) end protected
The “<<<” will be replaced by the start comment prefix, the “<>” will be replaced by the inner line comment prefix, and the “>>>” will be replaced by the end comment suffix.
ANY_COMMENT_CODE = <<-EOF.unindent <<< One-line comment >>> Code <<< <> Multi-line <> comment. >>> EOF ANY_COMMENT_HTML = <<-EOF.unindent.chomp # ((( html
<pre class='comment'> One-line comment </pre> <pre class='code'> Code </pre> <pre class='comment'> Multi-line comment. </pre> EOF # ))) def check_any_comment(patterns, configuration) prefix, inner, suffix = patterns check_split_file(ANY_COMMENT_CODE.gsub("<<<", prefix).gsub(">>>", suffix).gsub("<>", inner), Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"), Codnar::Configuration::FORMAT_PRE_COMMENTS, configuration) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => ANY_COMMENT_HTML.gsub("/--", prefix).gsub("--/", suffix).gsub(" -", inner), } ] end end end
In many cases, the text inside comments is written using some markup format (e.g., RDoc for Ruby or JavaDoc for Java). Currently, three such formats are supported (RDoc, Markdown and Haddock), as well as simply wrapping the comment in an HTML pre element:
Format comments as HTML pre elements. This is used to complement a configuration that classifies some lines as comment.
FORMAT_PRE_COMMENTS = { "formatters" => { "comment" => "Formatter.lines_to_pre_html(lines, :class => :comment)", }, }
Format comments that use the RDoc notation. This is used to complement a configuration that classifies some lines as comment.
FORMAT_RDOC_COMMENTS = { "formatters" => { "comment" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", }, }
Format comments that use the Markdown notation. This is used to complement a configuration that classifies some lines as comment.
FORMAT_MARKDOWN_COMMENTS = { "formatters" => { "comment" => "Formatter.markup_lines_to_html(lines, Markdown, 'markdown')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", }, }
Format comments that use the Haddock notation. This is used to complement a configuration that classifies some lines as comment.
FORMAT_HADDOCK_COMMENTS = { "formatters" => { "comment" => "Formatter.markup_lines_to_html(lines, Haddock, 'haddock')", "unindented_html" => "Formatter.unindented_lines_to_html(lines)", }, }
Here is a simple test demonstrating formatting comment contents:
require "codnar" require "olag/test" require "test/spec" require "test_with_configurations"
Test built-in split comment formatting configurations. |
class TestFormatCommentsConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile COMMENT_TEXT = <<-EOF.unindent.gsub("#!", "#") #! Comment *text*. EOF PRE_HTML = <<-EOF.unindent.chomp <pre class='comment'> Comment *text*. </pre> EOF def test_pre_comments check_any_format(PRE_HTML, Codnar::Configuration::FORMAT_PRE_COMMENTS) end RDOC_HTML = <<-EOF.unindent.chomp <table class='layout'> <tr> <td class='indentation'> <pre></pre> </td> <td class='html'> <div class='rdoc comment markup'> <p> Comment <strong>text</strong>. </p> </div> </td> </tr> </table> EOF def test_rdoc_comments check_any_format(RDOC_HTML, Codnar::Configuration::FORMAT_RDOC_COMMENTS) end MARKDOWN_HTML = <<-EOF.unindent.chomp <table class='layout'> <tr> <td class='indentation'> <pre></pre> </td> <td class='html'> <div class='markdown comment markup'> <p> Comment <em>text</em>. </p> </div> </td> </tr> </table> EOF def test_markdown_comments check_any_format(MARKDOWN_HTML, Codnar::Configuration::FORMAT_MARKDOWN_COMMENTS) end protected def check_any_format(html, configuration) check_split_file(COMMENT_TEXT, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"), Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call, configuration) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => html, } ] end end end
Highlighting the syntax of the source code embedded in the documentation improves readability. Codnar provides several ways to achieve this.
module Codnar module Configuration
Configurations for highlighting source code lines.
module Highlighting
GVim syntax highlighting formatting configurations
CodeRay syntax highlighting formatting configurations
Sunlight syntax highlighting formatting configurations
Chunk splitting configurations
end end end
Supporting almost any known programming language (other than dealing with comments) is very easy using GVim for syntax highlighting, as demonstrated here:
Format code using GVim’s syntax highlighting, using explicit HTML constructs. Assumes some previous configuration already classified the code lines. |
FORMAT_CODE_GVIM_HTML = lambda do |syntax| return Highlighting.klass_code_format('GVim', syntax, "[]") end
Format code using GVim’s syntax highlighting, using CSS classes instead of explicit font and color styles. Assumes some previous configuration already classified the code lines. |
FORMAT_CODE_GVIM_CSS = lambda do |syntax| return Highlighting.klass_code_format('GVim', syntax, "[ '+:let html_use_css=1' ]") end
Return a configuration for highlighting a specific syntax using the given highlighter class (e.g., GVim or CodeRay).
def self.klass_code_format(klass, syntax, options) return { "formatters" => { "#{syntax}_code" => "#{klass}.lines_to_html(lines, '#{syntax}', #{options})", }, } end
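Note that the formatter is stored as a string of Ruby code rather than as a callable. The sketch below copies the helper above into a stand-alone method to show the string it builds; that the string is evaluated later, with lines bound to the classified lines, is an assumption drawn from how the formatter expressions are written.

```ruby
# Stand-alone copy of the helper above, for illustration only.
def klass_code_format(klass, syntax, options)
  {
    "formatters" => {
      "#{syntax}_code" => "#{klass}.lines_to_html(lines, '#{syntax}', #{options})",
    },
  }
end

configuration = klass_code_format("GVim", "ruby", "[ '+:let html_use_css=1' ]")
puts configuration["formatters"]["ruby_code"]
# prints: GVim.lines_to_html(lines, 'ruby', [ '+:let html_use_css=1' ])
```

Since the key is derived from the syntax name, deep-merging this result over CLASSIFY_SOURCE_CODE for the same syntax replaces only that syntax's formatter.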
If you choose to use CSS classes instead of directly embedding fonts and colors into the generated HTML, you will need a CSS stylesheet with the relevant classes. Here is the default CSS stylesheet used by GVim:
Colors for GVim classes |
span.Constant { color: Crimson; } span.Identifier { color: Teal; } span.PreProc { color: Indigo; } span.Special { color: Navy; } span.Statement { color: Maroon; } span.Type { color: Green; } span.Comment { color: Purple; }
For supported programming languages, you may choose to use CodeRay instead of GVim.
Format code using CodeRay’s syntax highlighting, using explicit HTML constructs. Assumes some previous configuration already classified the code lines. |
FORMAT_CODE_CODERAY_HTML = lambda do |syntax| return Highlighting.klass_code_format('CodeRay', syntax, "{}") end
Format code using CodeRay’s syntax highlighting, using CSS classes instead of explicit font and color styles. Assumes some previous configuration already classified the code lines. |
FORMAT_CODE_CODERAY_CSS = lambda do |syntax| return Highlighting.klass_code_format('CodeRay', syntax, "{ :css => :class }") end
If you choose to use CSS classes instead of directly embedding fonts and colors into the generated HTML, you will need a CSS stylesheet with the relevant classes. Here is the default CSS stylesheet used by CodeRay:
Extracted from CodeRay output |
.CodeRay .line-numbers a { text-decoration: inherit; color: inherit; } .CodeRay { background-color: hsl(0,0%,95%); border: 1px solid silver; color: black; } .CodeRay pre { margin: 0px; } span.CodeRay { white-space: pre; border: 0px; padding: 2px; } table.CodeRay { border-collapse: collapse; width: 100%; padding: 2px; } table.CodeRay td { padding: 2px 4px; vertical-align: top; } .CodeRay .line-numbers { background-color: hsl(180,65%,90%); color: gray; text-align: right; -webkit-user-select: none; -moz-user-select: none; user-select: none; } .CodeRay .line-numbers a { background-color: hsl(180,65%,90%) !important; color: gray !important; text-decoration: none !important; } .CodeRay .line-numbers a:target { color: blue !important; } .CodeRay .line-numbers .highlighted { color: red !important; } .CodeRay .line-numbers .highlighted a { color: red !important; } .CodeRay span.line-numbers { padding: 0px 4px; } .CodeRay .line { display: block; float: left; width: 100%; } .CodeRay .code { width: 100%; } .CodeRay .code pre { overflow: auto; } .CodeRay .debug { color: white !important; background: blue !important; } .CodeRay .annotation { color:#007 } .CodeRay .attribute-name { color:#b48 } .CodeRay .attribute-value { color:#700 } .CodeRay .binary { color:#509 } .CodeRay .char .content { color:#D20 } .CodeRay .char .delimiter { color:#710 } .CodeRay .char { color:#D20 } .CodeRay .class { color:#B06; font-weight:bold } .CodeRay .class-variable { color:#369 } .CodeRay .color { color:#0A0 } .CodeRay .comment { color:#777 } .CodeRay .comment .char { color:#444 } .CodeRay .comment .delimiter { color:#444 } .CodeRay .complex { color:#A08 } .CodeRay .constant { color:#036; font-weight:bold } .CodeRay .decorator { color:#B0B } .CodeRay .definition { color:#099; font-weight:bold } .CodeRay .delimiter { color:black } .CodeRay .directive { color:#088; font-weight:bold } .CodeRay .doc { color:#970 } .CodeRay .doc-string { color:#D42; font-weight:bold } .CodeRay .doctype { color:#34b } 
.CodeRay .entity { color:#800; font-weight:bold } .CodeRay .error { color:#F00; background-color:#FAA } .CodeRay .escape { color:#666 } .CodeRay .exception { color:#C00; font-weight:bold } .CodeRay .float { color:#60E } .CodeRay .function { color:#06B; font-weight:bold } .CodeRay .global-variable { color:#d70 } .CodeRay .hex { color:#02b } .CodeRay .imaginary { color:#f00 } .CodeRay .include { color:#B44; font-weight:bold } .CodeRay .inline { background-color: hsla(0,0%,0%,0.07); color: black } .CodeRay .inline-delimiter { font-weight: bold; color: #666 } .CodeRay .instance-variable { color:#33B } .CodeRay .integer { color:#00D } .CodeRay .key .char { color: #60f } .CodeRay .key .delimiter { color: #404 } .CodeRay .key { color: #606 } .CodeRay .keyword { color:#080; font-weight:bold } .CodeRay .label { color:#970; font-weight:bold } .CodeRay .local-variable { color:#963 } .CodeRay .namespace { color:#707; font-weight:bold } .CodeRay .octal { color:#40E } .CodeRay .operator { } .CodeRay .predefined { color:#369; font-weight:bold } .CodeRay .predefined-constant { color:#069 } .CodeRay .predefined-type { color:#0a5; font-weight:bold } .CodeRay .preprocessor { color:#579 } .CodeRay .pseudo-class { color:#00C; font-weight:bold } .CodeRay .regexp .content { color:#808 } .CodeRay .regexp .delimiter { color:#404 } .CodeRay .regexp .modifier { color:#C2C } .CodeRay .regexp { background-color:hsla(300,100%,50%,0.06); } .CodeRay .reserved { color:#080; font-weight:bold } .CodeRay .shell .content { color:#2B2 } .CodeRay .shell .delimiter { color:#161 } .CodeRay .shell { background-color:hsla(120,100%,50%,0.06); } .CodeRay .string .char { color: #b0b } .CodeRay .string .content { color: #D20 } .CodeRay .string .delimiter { color: #710 } .CodeRay .string .modifier { color: #E40 } .CodeRay .string { background-color:hsla(0,100%,50%,0.05); } .CodeRay .symbol .content { color:#A60 } .CodeRay .symbol .delimiter { color:#630 } .CodeRay .symbol { color:#A60 } .CodeRay .tag { 
color:#070 } .CodeRay .type { color:#339; font-weight:bold } .CodeRay .value { color: #088; } .CodeRay .variable { color:#037 } .CodeRay .insert { background: hsla(120,100%,50%,0.12) } .CodeRay .delete { background: hsla(0,100%,50%,0.12) } .CodeRay .change { color: #bbf; background: #007; } .CodeRay .head { color: #f8f; background: #505 } .CodeRay .head .filename { color: white; } .CodeRay .delete .eyecatcher { background-color: hsla(0,100%,50%,0.2); border: 1px solid hsla(0,100%,45%,0.5); margin: -1px; border-bottom: none; border-top-left-radius: 5px; border-top-right-radius: 5px; } .CodeRay .insert .eyecatcher { background-color: hsla(120,100%,50%,0.2); border: 1px solid hsla(120,100%,25%,0.5); margin: -1px; border-top: none; border-bottom-left-radius: 5px; border-bottom-right-radius: 5px; } .CodeRay .insert .insert { color: #0c0; background:transparent; font-weight:bold } .CodeRay .delete .delete { color: #c00; background:transparent; font-weight:bold } .CodeRay .change .change { color: #88f } .CodeRay .head .head { color: #f4f }
For small projects in supported languages, you may choose to use Sunlight instead of GVim.
Format code using Sunlight’s syntax highlighting. This assumes the HTML will include and invoke Sunlight’s Javascript file which does the highlighting on the fly inside the DOM, instead of pre-computing it when splitting the file. |
FORMAT_CODE_SUNLIGHT = lambda do |syntax|
  return Highlighting.sunlight_code_format(syntax)
end
Return a configuration for highlighting a specific syntax using Sunlight. |
def self.sunlight_code_format(syntax)
  return {
    "formatters" => {
      "#{syntax}_code" => "Sunlight.lines_to_html(lines, '#{syntax}')",
    },
  }
end
Here is a simple test demonstrating highlighting code syntax using the different configurations (GVim, CodeRay, or Sunlight):
require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"
Test built-in split code formatting configurations. |
class TestFormatCodeConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile def test_gvim_html_code check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_GVIM_HTML.call("c")) <div class='c code syntax' bgcolor=\"#ffffff\" text=\"#000000\"> <font face=\"monospace\"> <font color=\"#00ff00\">int</font> x;<br /> </font> </div> EOF end def test_gvim_css_code check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("c")) <pre class='c code syntax'> <span class=\"Type\">int</span> x; </pre> EOF end def test_coderay_html_code check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_CODERAY_HTML.call("c")) <div class="CodeRay"> <div class="code"><pre><span style="color:#0a5;font-weight:bold">int</span> x;</pre></div> </div> EOF end def test_coderay_css_code check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_CODERAY_CSS.call("c")) <div class="CodeRay"> <div class="code"><pre><span class="predefined-type">int</span> x;</pre></div> </div> EOF end def test_sunlight_code check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_SUNLIGHT.call("c")) <pre class='sunlight-highlight-c'> int x; </pre> EOF end protected def check_any_code(html, configuration) check_split_file("int x;\n", Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("c"), configuration) do |path| [ { "name" => path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [], "html" => html, } ] end end end
There are many ways to denote code "regions" (which become Codnar chunks). The following covers GVim's default scheme; others are easily added. It is safest to merge this configuration as the last of all the combined configurations, to ensure its patterns end up before any others.
Group lines into chunks using VIM-style “{{{”/“}}}” region designations. Assumes other configurations handle the actual content lines. |
CHUNK_BY_VIM_REGIONS = {
  "formatters" => {
    "begin_chunk" => "[]",
    "end_chunk" => "[]",
    "nested_chunk" => "Formatter.nested_chunk_lines_to_html(lines)",
  },
  "syntax" => {
    "patterns" => {
      "begin_chunk" => { "regexp" => "^(\\s*)\\W*\\{\\{\\{\\s*(.*?)\\s*$" },
      "end_chunk" => { "regexp" => "^(\\s*)\\W*\\}\\}\\}\\s*(.*?)\\s*$" },
    },
    "states" => {
      "start" => {
        "transitions" => [
          { "pattern" => "begin_chunk" },
          { "pattern" => "end_chunk" },
          [],
        ],
      },
    },
  },
}
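To see what the two region patterns capture, here is a minimal sketch that applies them to individual lines. The regular expressions are copied from the configuration above; the `classify_region_line` helper is hypothetical and is not part of Codnar, which drives these patterns through its own scanner.

```ruby
# Patterns copied from CHUNK_BY_VIM_REGIONS above.
BEGIN_CHUNK = /^(\s*)\W*\{\{\{\s*(.*?)\s*$/
END_CHUNK = /^(\s*)\W*\}\}\}\s*(.*?)\s*$/

# Hypothetical helper (not part of Codnar): classify a single source line,
# returning the kind, the indentation, and the chunk name (if any).
def classify_region_line(line)
  case line
  when BEGIN_CHUNK
    [:begin_chunk, $1, $2]
  when END_CHUNK
    [:end_chunk, $1, $2]
  else
    [:other, nil, nil]
  end
end

p classify_region_line("  # {{{ Ruby code") # => [:begin_chunk, "  ", "Ruby code"]
p classify_region_line("// }}}")            # => [:end_chunk, "", ""]
p classify_region_line("int x;")            # => [:other, nil, nil]
```

The `\W*` part is what lets the same pattern work regardless of the comment prefix of the language at hand.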
Here is a simple test demonstrating splitting code chunks:
require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"
Test built-in split code formatting configurations. |
class TestSplitChunkConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile CODE_TEXT = <<-EOF.unindent.gsub("#!", "#") int x; #! {{{ chunk int y; #! }}} EOF CODE_HTML = <<-EOF.unindent.chomp <pre class='code'> int x; </pre> <pre class='nested chunk'> <a class='nested chunk' href='#chunk'>chunk</a> </pre> EOF CHUNK_HTML = <<-EOF.unindent.chomp <pre class='code'> int y; </pre> EOF def test_gvim_chunks check_split_file(CODE_TEXT, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("c"), Codnar::Configuration::CHUNK_BY_VIM_REGIONS) do |path| [ { "name"=> path, "locations" => [ { "file" => path, "line" => 1 } ], "containers" => [], "contained" => [ "chunk" ], "html"=> CODE_HTML, }, { "name" => "chunk", "locations" => [ { "file" => path, "line" => 2 } ], "containers" => [ path ], "contained" => [], "html" => CHUNK_HTML, } ] end end end
Here is a test demonstrating putting several of the above configurations together in a meaningful way:
require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"
Test combination of many built-in configurations. |
class TestSplitCombinedConfigurations < Test::Unit::TestCase include Test::WithConfigurations include Test::WithErrors include Test::WithTempfile CODE_TEXT = <<-EOF.unindent.gsub("#!", "#") #!!/usr/bin/ruby -w #! {{{ HTML snippet HELLO_WORLD_IN_HTML = <<-EOH.unindent.chomp #! ((( html
<p> Hello, world! </p> EOH
#! ))) html #! }}} #! {{{ Ruby code #! Hello, *world*! puts HELLO_WORLD_IN_HTML #! }}} EOF FILE_HTML = <<-EOF.unindent.chomp <pre class='ruby code syntax'> <span class="PreProc">#!/usr/bin/ruby -w</span> </pre> <pre class='nested chunk'> <a class='nested chunk' href='#html-snippet'>HTML snippet</a> </pre> <pre class='ruby code syntax'> </pre> <pre class='nested chunk'> <a class='nested chunk' href='#ruby-code'>Ruby code</a> </pre> EOF HTML_CHUNK = <<-EOF.unindent.chomp <pre class='ruby code syntax'> <span class="Type">HELLO_WORLD_IN_HTML</span> = <<-<span class="Special">EOH</span>.unindent.chomp <span class="Comment"># ((( html</span>
</pre> <pre class='html code syntax'> <span class="Identifier"><</span><span class="Statement">p</span><span class="Identifier">></span> Hello, world! <span class="Identifier"></</span><span class="Statement">p</span><span class="Identifier">></span> EOH </pre> <pre class='ruby code syntax'>
<span class="Comment"># ))) html</span> </pre> EOF RUBY_CHUNK = <<-EOF.unindent.chomp <pre class='ruby code syntax'> </pre> <table class='layout'> <tr> <td class='indentation'> <pre></pre> </td> <td class='html'> <div class='rdoc comment markup'> <p> Hello, <strong>world</strong>! </p> </div> </td> </tr> </table> <pre class='ruby code syntax'> puts <span class="Type">HELLO_WORLD_IN_HTML</span> </pre> EOF def test_gvim_chunks check_split_file(CODE_TEXT, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby"), Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("ruby"), Codnar::Configuration::CLASSIFY_NESTED_CODE.call("ruby", "html"), Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("html"), Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call, Codnar::Configuration::FORMAT_RDOC_COMMENTS, Codnar::Configuration::CHUNK_BY_VIM_REGIONS) do |path| [ { "name" => path, "html" => FILE_HTML, "locations" => [ { "line" => 1, "file" => path } ], "containers" => [], "contained" => [ "HTML snippet", "Ruby code" ], }, { "name" => "HTML snippet", "html" => HTML_CHUNK, "locations" => [ { "line" => 3, "file" => path } ], "containers" => [ path ], "contained" => [], }, { "name" => "Ruby code", "html" => RUBY_CHUNK, "locations" => [ { "line" => 14, "file" => path } ], "containers" => [ path ], "contained" => [], } ] end end end
In any realistic system, the number of source files and chunks will be such that it makes sense to store the chunks on the disk for further processing. This allows incorporating the split operation as part of a build tool chain, and only re-splitting modified files. Here is a simple test demonstrating writing chunks to the disk:
require "codnar"
require "olag/test"
require "test/spec"
Test writing chunks to files. |
class TestWriteChunks < Test::Unit::TestCase

  include Test::WithFakeFS

  def test_write_chunks
    check_writing_data([])
    check_writing_data("name" => "foo")
    check_writing_data([ { "name" => "foo" }, { "name" => "bar" } ])
  end

  def test_write_invalid_data
    lambda { check_writing_data("not a chunk") }.should.raise
  end

protected

  def check_writing_data(data)
    Codnar::Writer.write("path", data)
    data = [ data ] unless Array === data
    YAML.load_file("path").should == data
  end

end
And here is the implementation:
module Codnar
|
Write chunks into a disk file. |
class Writer
|
Write one chunk or an array of chunks to a disk file. |
    def self.write(path, data)
      self.new(path) do |writer|
        writer << data
      end
    end
|
Add one chunk or an array of chunks to the disk file. |
    def <<(data)
      case data
      when Array
        @chunks += data
      when Hash
        @chunks << data
      else
        raise "Invalid data class: #{data.class}"
      end
    end

  protected
|
Write chunks into the specified disk file. |
    def initialize(path, &block)
      @chunks = []
      File.open(path, "w") do |file|
        block.call(self)
        file.print(@chunks.to_yaml)
      end
    end

  end

end
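The on-disk format is simply a YAML array of chunk hashes. The following is a minimal stand-alone sketch of that round trip using only the standard library. The `write_chunks` helper is hypothetical and only mimics the "one chunk or an array of chunks" behavior shown above; it is not the Codnar::Writer class itself.

```ruby
require "tempfile"
require "yaml"

# Hypothetical helper: always store an array, wrapping a single chunk if needed.
def write_chunks(path, data)
  chunks = data.is_a?(Array) ? data : [data]
  File.write(path, chunks.to_yaml)
end

chunks_file = Tempfile.new(["example", ".chunks"])
write_chunks(chunks_file.path, "name" => "foo", "html" => "<p>foo</p>")

# Reading the file back yields an array, even though a single chunk was written.
loaded = YAML.load_file(chunks_file.path)
p loaded
```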
Having written the chunks to the disk, we must at some later point read them back into memory. This is the first time we have a view of the whole documented system, which allows us to detect several classes of consistency errors: some chunks may be left out of the final narrative (consider this the equivalent of test code coverage); we may refer to missing (or misspelled) chunk names; and, finally, we need to deal with duplicate chunks.
In literate programming, it is trivial to write a chunk once and use it in several places in the compiled source code. The classical example is C/C++ function signatures that need to appear in both the .h and .c/.cpp files. However, in some cases this practice makes sense for other pieces of code, and since the ultimate source code contains only one copy of the chunk, this does not suffer from the typical copy-and-paste issues.
In inverse literate programming, if the same code appears twice (as a result of copy-and-paste), then it does suffer from the typical copy-and-paste issues. The most serious of these is, of course, that only one of the copies may be changed. Codnar helps alleviate this problem by requiring that if the same chunk appears more than once in the source code, its content is exactly the same in all cases (up to indentation). This should not be viewed as an endorsement of copy-and-paste programming; using duplicate chunks should be a last-resort measure to work around restrictions in the programming language and compilation tool chain.
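As an illustration of what "the same up to indentation" can mean, here is a minimal sketch that compares two pieces of code after removing their common leading margin. Both helpers are hypothetical. Codnar itself compares the stored chunk payloads, as shown in the Reader implementation further on, so this only sketches the idea and not the actual mechanism.

```ruby
# Hypothetical helper: strip the indentation shared by all non-blank lines.
def strip_common_indentation(text)
  lines = text.lines.map(&:chomp)
  margin = lines.reject { |line| line.strip.empty? }
                .map { |line| line[/\A[ \t]*/].size }
                .min || 0
  lines.map { |line| line.strip.empty? ? "" : line[margin..-1] }.join("\n")
end

# Hypothetical helper: two copies are "the same" if only their margin differs.
def same_up_to_indentation?(first, second)
  strip_common_indentation(first) == strip_common_indentation(second)
end

header_copy = "int f(int x);\n"
nested_copy = "    int f(int x);\n"
edited_copy = "    int f(long x);\n"

p same_up_to_indentation?(header_copy, nested_copy) # => true
p same_up_to_indentation?(header_copy, edited_copy) # => false
```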
The above definition raises the obvious question: what does "the same chunk" mean? As far as Codnar is concerned, a chunk is uniquely identified by its name, which is specified on the begin_chunk line. The unique identifier is not the literal name but a transformation of it. This allows us to ignore capitalization, white space, and any punctuation that may appear in the name. It also allows us to use the resulting ID as an HTML anchor name, without worrying about HTML's restrictions on such names.
Here is a simple test demonstrating converting names to identifiers:
require "codnar"
require "test/spec"
Test converting chunk names to identifiers. |
class TestIdentifyChunks < Test::Unit::TestCase

  def test_lower_case_to_id
    "a".to_id.should == "a"
  end

  def test_upper_case_to_id
    "A".to_id.should == "a"
  end

  def test_digits_to_id
    "1".to_id.should == "1"
  end

  def test_non_alnum_to_id
    "!@-$#".to_id.should == "-"
  end

  def test_complex_to_id
    "C# for .NET!".to_id.should == "c-for-net-"
  end

  def test_strip_to_id
    " a ".to_id.should == "a"
  end

end
And here is the implementation:
Extend the core String class. |
class String
|
Convert this String to an identifier. This is a stable (idempotent) operation, so anything that accepts a name will also accept an identifier. |
  def to_id
    return self.strip.gsub(/[^a-zA-Z0-9]+/, "-").downcase
  end
Clean HTML
end
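The following sketch shows the two properties the text relies on: differently written names collapse to the same identifier, and converting an identifier again leaves it unchanged. To avoid re-opening the String class, it uses a stand-alone function, `name_to_id`, which is a hypothetical copy of the body of `to_id` above.

```ruby
# Stand-alone copy of the transformation used by String#to_id above.
def name_to_id(name)
  name.strip.gsub(/[^a-zA-Z0-9]+/, "-").downcase
end

names = ["HTML snippet", "html  snippet", " Html-Snippet "]
ids = names.map { |name| name_to_id(name) }
p ids.uniq # => ["html-snippet"]

# Converting an identifier again does not change it.
p name_to_id("C# for .NET!")             # => "c-for-net-"
p name_to_id(name_to_id("C# for .NET!")) # => "c-for-net-"
```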
Detecting unused and/or duplicate chunks requires us to have in-memory chunk storage that tracks every chunk access. Here is a simple test demonstrating reading chunks into the storage and handling the various error conditions listed above:
require "codnar"
require "olag/test"
require "test/spec"
Test reading chunks from files. |
class TestReadChunks < Test::Unit::TestCase include Test::WithErrors include Test::WithFakeFS def test_read_chunks Codnar::Writer.write("foo.chunks", { "name" => "foo" }) Codnar::Writer.write("bar.chunks", [ { "name" => "bar" }, { "name" => "baz" } ]) reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks")) check_read_data(reader, "foo" => { "name" => "foo" }, "bar" => { "name" => "bar" }, "baz" => { "name" => "baz" }) @errors.should == [] end def test_read_invalid_chunks write_fake_file("foo.chunks") reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks")) @errors.should == [ "#{$0}: Invalid chunks data in file: #{File.expand_path("foo.chunks")}" ] end def test_read_unused_chunks Codnar::Writer.write("foo.chunks", { "name" => "foo", "locations" => [ { "file" => "a", "line" => 1 } ] }) Codnar::Writer.write("bar.chunks", { "name" => "bar", "locations" => [ { "file" => "b", "line" => 2 } ] }) reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks")) check_read_data(reader, "foo" => { "name" => "foo", "locations" => [ { "file" => "a", "line" => 1 } ] }) @errors.should == [ "#{$0}: Unused chunk: bar in file: b at line: 2" ] end def test_read_duplicate_chunks Codnar::Writer.write("foo.chunks", { "name" => "foo", "locations" => [ { "file" => "a" } ], "contained" => [ "A" ], "containers" => [ "c" ] }) Codnar::Writer.write("bar.chunks", [ { "name" => "foo", "locations" => [ { "file" => "b" } ], "contained" => [ "a" ], "containers" => [ "d" ] }, { "name" => "foo", "locations" => [ { "file" => "c" } ], "contained" => [ "a" ], "containers" => [] } ]) reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks")) check_read_data(reader, "foo" => { "name" => "foo", "locations" => [ { "file" => "a" }, { "file" => "b" }, { "file" => "c" } ], "contained" => [ "a" ], "containers" => [ "c", "d" ], }) end def test_read_different_chunks Codnar::Writer.write("foo.chunks", [ { "name" => "foo", "html" => "bar", "locations" => [ { "file" => "foo.chunks", "line" => 1 } 
], "contained" => [ "a" ], "containers" => [] }, { "name" => "foo", "html" => "baz", "locations" => [ { "file" => "foo.chunks", "line" => 2 } ], "contained" => [ "A" ], "containers" => [] } ]) Codnar::Writer.write("bar.chunks", [ { "name" => "foo", "html" => "bar", "locations" => [ { "file" => "bar.chunks", "line" => 1 } ], "contained" => [ "a" ], "containers" => [] } ]) reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks").sort) @errors.should == [ "#{$0}: Chunk: foo is different in file: foo.chunks at line: 2, " \ + "and in file: bar.chunks at line: 1 or in file: foo.chunks at line: 1" ] check_read_data(reader, "foo" => { "name" => "foo", "html" => "bar", "locations" => [ { "file" => "bar.chunks", "line" => 1 }, { "file" => "foo.chunks", "line" => 1 } ], "contained" => [ "a" ], "containers" => [], }) end def test_read_fake_chunk reader = Codnar::Reader.new(@errors, []) reader["foo"].should == Codnar::Reader.fake_chunk("foo") @errors.should == [ "#{$0}: Missing chunk: foo" ] end def test_read_equivalent_name_chunks Codnar::Writer.write("foo.chunks", [ { "name" => "Foo?", "locations" => [ { "file" => "foo.chunks", "line" => 1 } ], "containers" => [ "1" ], "contained" => [ "c" ] }, { "name" => "FOO!!", "locations" => [ { "file" => "foo.chunks", "line" => 2 } ], "containers" => [ "2" ], "contained" => [ "C" ] } ]) reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks")) check_read_data(reader, "foo-" => { "name" => "Foo?", "locations" => [ { "file" => "foo.chunks", "line" => 1 }, { "file" => "foo.chunks", "line" => 2 } ], "containers" => [ "1", "2" ], "contained" => [ "c" ], }) end protected def check_read_data(reader, chunks) chunks.each do |name, chunk| reader[name].should == chunk end reader.collect_unused_chunk_errors end end
And here is the implementation:
module Codnar
|
Read chunks from disk files. |
class Reader
|
Load all chunks from the specified disk files to memory for later access by name. |
    def initialize(errors, paths)
      @errors = errors
      @chunks = {}
      @used = {}
      paths.each do |path|
        read_path_chunks(path)
      end
    end
|
Fetch a chunk by its name. |
    def [](name)
      id = name.to_id
      @used[id] = true
      return @chunks[id] ||= (
        @errors << "Missing chunk: #{name}"
        Reader.fake_chunk(name)
      )
    end
|
Collect errors for unused chunks. |
    def collect_unused_chunk_errors
      @chunks.each do |id, chunk|
        @errors.push("#{$0}: Unused chunk: #{chunk.name} #{Reader.locations_message(chunk)}") unless @used[id]
      end
    end

  protected
|
Load and merge all chunks from a disk file into memory. |
    def read_path_chunks(path)
      @errors.in_path(path) do
        chunks = load_path_chunks(path)
        next unless chunks
        merge_loaded_chunks(chunks)
        @root_chunk ||= chunks[0].name
      end
    end
|
Load all chunks from a disk file into memory. |
    def load_path_chunks(path)
      chunks = YAML.load_file(path)
      @errors << "Invalid chunks data" unless chunks
|
TODO: A bit more validation would be nice. |
      return chunks
    end
|
Merge an array of chunks into memory. |
    def merge_loaded_chunks(chunks)
      chunks.each do |new_chunk|
        old_chunk = @chunks[id = new_chunk.name.to_id]
        if old_chunk.nil?
          @chunks[id] = new_chunk
        elsif Reader.same_chunk?(old_chunk, new_chunk)
          Reader.merge_same_chunks(old_chunk, new_chunk)
        else
          @errors.push(Reader.different_chunks_error(old_chunk, new_chunk))
        end
      end
    end
|
Merge a new “same” chunk into an old one. |
    def self.merge_same_chunks(old_chunk, new_chunk)
      old_chunk.locations = \
        (old_chunk.locations + new_chunk.locations).uniq.sort \
        do |first_location, second_location|
          [ first_location.file.to_id, first_location.line ] \
            <=> [ second_location.file.to_id, second_location.line ]
        end
      old_chunk.containers = \
        (old_chunk.containers + new_chunk.containers).uniq.sort \
        do |first_name, second_name|
          first_name.to_id <=> second_name.to_id
        end
    end
|
Check whether two chunks contain the same “stuff”. |
    def self.same_chunk?(old_chunk, new_chunk)
      return Reader.chunk_payload(old_chunk) == Reader.chunk_payload(new_chunk)
    end
|
Return just the actual payload of a chunk for equality comparison. |
    def self.chunk_payload(chunk)
      chunk = chunk.reject { |key, value| [ "locations", "name", "containers" ].include?(key) }
      chunk.contained.map! { |name| name.to_id }
      return chunk
    end
|
Error message when two different chunks have the same name. |
    def self.different_chunks_error(old_chunk, new_chunk)
      old_location = Reader.locations_message(old_chunk)
      new_location = Reader.locations_message(new_chunk)
      return "#{$0}: Chunk: #{old_chunk.name} is different #{new_location}, and #{old_location}"
    end
|
Format a chunk’s location for an error message. |
    def self.locations_message(chunk)
      locations = chunk.locations.map { |location| "in file: #{location.file} at line: #{location.line}" }
      return locations.join(" or ")
    end
|
Return a fake chunk for the specified name. |
    def self.fake_chunk(name)
      return {
        "name" => name,
        "locations" => [ { "file" => "MISSING" } ],
        "contained" => [],
        "containers" => [],
        "html" => "<div class='missing chunk error'>\nMISSING\n</div>"
      }
    end

  end

end
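The bookkeeping behind the "unused chunk" check can be sketched in isolation. The class below is a hypothetical miniature, not the Codnar::Reader class: it stores chunks by identifier, marks a chunk as used whenever it is fetched, and reports whatever was never fetched. It leaves out error reporting, merging of duplicates, and fake chunks for missing names.

```ruby
# Hypothetical miniature of the usage tracking done by the reader above.
class TinyChunkStore
  def initialize(chunks)
    @chunks = {}
    @used = {}
    # Keep the first chunk seen for each identifier.
    chunks.each { |chunk| @chunks[to_id(chunk["name"])] ||= chunk }
  end

  # Fetching a chunk by name marks its identifier as used.
  def [](name)
    id = to_id(name)
    @used[id] = true
    @chunks[id]
  end

  # Names of all chunks that were never fetched.
  def unused_names
    @chunks.reject { |id, _chunk| @used[id] }.values.map { |chunk| chunk["name"] }
  end

  private

  # Same transformation as String#to_id in the text above.
  def to_id(name)
    name.strip.gsub(/[^a-zA-Z0-9]+/, "-").downcase
  end
end

store = TinyChunkStore.new([{ "name" => "Foo" }, { "name" => "Bar" }])
store["foo"] # Lookup is by identifier, so the capitalization does not matter.
p store.unused_names # => ["Bar"]
```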
Assembling the final HTML requires combining both the narrative documentation and source code chunks. This is done top-down starting at a "root" documentation chunk and recursively embedding nested documentation and code chunks into it.
When embedding a documentation chunk inside another documentation chunk, things are pretty easy - we just need to insert the embedded chunk HTML into the containing chunk. When embedding a source code chunk into the documentation, however, we may want to wrap it in some boilerplate HTML, providing a header, footer, borders, links, etc. Therefore, the HTML syntax we use to embed a chunk into the documentation is <embed src="..." type="x-codnar/template-name"/>. The templates are normal ERB templates, except for the magical file and image templates, described below.
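To make the idea of wrapping a chunk in a template concrete, here is a minimal sketch of rendering an ERB template with the "%" trim mode, in which lines starting with "%" are treated as Ruby code. The `Chunk` struct and the template text are hypothetical stand-ins for illustration, and the sketch assumes a Ruby version (2.6 or later) where `ERB.new` accepts the `trim_mode:` keyword.

```ruby
require "cgi"
require "erb"

# Hypothetical stand-in for a chunk; Codnar uses its own hash-like structure.
Chunk = Struct.new(:name, :expanded_html, :containers)

# Hypothetical template: wraps the chunk HTML and links to its containers.
TEMPLATE = <<~EOF
  <div class="chunk">
  <span><%= CGI.escapeHTML(chunk.name) %></span>
  <%= chunk.expanded_html %>
  % chunk.containers.each do |container|
  <a href="#<%= container %>"><%= CGI.escapeHTML(container) %></a>
  % end
  </div>
EOF

chunk = Chunk.new("Ruby & code", "<pre>puts 1</pre>", ["top"])

# The template refers to the local variable "chunk" through the binding.
html = ERB.new(TEMPLATE, trim_mode: "%").result(binding)
puts html
```

Escaping the chunk name with `CGI.escapeHTML` keeps names containing characters such as "&" from corrupting the generated HTML.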
At any rate, here is a simple test demonstrating applying different templates to the embedded code chunks:
require "codnar"
require "olag/test"
require "test/spec"
Test the built-in weave configurations. |
class TestWeaveConfigurations < Test::Unit::TestCase include Test::WithErrors include Test::WithFakeFS def test_weave_file Codnar::Writer.write("chunks", { "locations" => [ "file" => "chunk" ], "containers" => [], "contained" => [], "name" => "Top", "html" => <<-EOF.unindent, <h1>Top</h1> <embed src="path" type="x-codnar/file"/> EOF }) write_fake_file("path", "<h2>File</h2>\n") html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_INCLUDE).weave("include", "top") @errors.should == [] html.should == <<-EOF.unindent <h1>Top</h1> <h2>File</h2> EOF end def test_weave_include Codnar::Writer.write("chunks", chunks("include")) html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_INCLUDE).weave("include", "top") @errors.should == [] html.should == <<-EOF.unindent #! ((( html
<h1>Top</h1> <h2>Intermediate</h2> <h3>Bottom</h3> EOF
#! ))) html end WOVEN_PLAIN_CHUNK = <<-EOF.unindent #! ((( html
<div class="plain chunk"> <a name="top"/> <h1>Top</h1> <div class="plain chunk"> <a name="intermediate"/> <h2>Intermediate</h2> <div class="plain chunk"> <a name="bottom"/> <h3>Bottom</h3> </div> </div> </div> EOF
#! ))) html def test_weave_plain_chunk Codnar::Writer.write("chunks", chunks("plain_chunk")) html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_PLAIN_CHUNK).weave("plain_chunk", "top") @errors.should == [] html.should == WOVEN_PLAIN_CHUNK end
|
Normally, one does not nest named_chunk_with_containers chunks this way, but it serves as a test. |
WOVEN_NAMED_CHUNK = <<-EOF.unindent #! ((( html
<div class="named_with_containers chunk"> <div class="chunk name"> <a name="top"> <span>Top</span> </a> </div> <div class="chunk html"> <h1>Top</h1> <div class="named_with_containers chunk"> <div class="chunk name"> <a name="intermediate"> <span>Intermediate</span> </a> </div> <div class="chunk html"> <h2>Intermediate</h2> <div class="named_with_containers chunk"> <div class="chunk name"> <a name="bottom"> <span>BOTTOM</span> </a> </div> <div class="chunk html"> <h3>Bottom</h3> </div> <div class="chunk containers"> <span class="chunk containers header">Contained in:</span> <ul class="chunk containers"> <li class="chunk container"> <a class="chunk container" href="#intermediate">Intermediate</a> </li> </ul> </div> </div> </div> <div class="chunk containers"> <span class="chunk containers header">Contained in:</span> <ul class="chunk containers"> <li class="chunk container"> <a class="chunk container" href="#top">Top</a> </li> </ul> </div> </div> </div> </div> EOF
#! ))) html def test_weave_named_chunk_with_containers Codnar::Writer.write("chunks", chunks("named_chunk_with_containers")) weaver = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_NAMED_CHUNK_WITH_CONTAINERS) html = weaver.weave("named_chunk_with_containers", "top") @errors.should == [] html.should == WOVEN_NAMED_CHUNK end protected def chunks(template) return [ { "locations" => [ "file" => "chunk" ], "containers" => [ "Intermediate" ], "contained" => [], "name" => "BOTTOM", "html" => "<h3>Bottom</h3>\n", }, { "locations" => [ "file" => "chunk" ], "containers" => [ "Top" ], "contained" => [ "BOTTOM" ], "name" => "Intermediate", "html" => <<-EOF.unindent, #! ((( html
<h2>Intermediate</h2> <embed type='x-codnar/#{template}' src='bottom'> </embed> EOF
}, { #! ))) html "locations" => [ "file" => "chunk" ], "containers" => [], "contained" => [ "Intermediate" ], "name" => "Top", "html" => <<-EOF.unindent, #! ((( html
<h1>Top</h1> <embed src="##INTERMEDIATE" type="x-codnar/#{template}"/> EOF
} ] #! ))) html
end
end
Here is the implementation:
module Codnar
|
Weave all chunks to a unified HTML. |
class Weaver < Reader
|
Load all chunks from the specified disk files to memory for weaving using the specified templates. |
    def initialize(errors, paths, templates)
      super(errors, paths)
      @templates = templates
    end
|
How to process each magical file template. |
    FILE_TEMPLATE_PROCESSORS = {
      "file" => lambda { |name, data| data },
      "image" => lambda { |name, data| Weaver.embedded_base64_img_tag(name, data) },
    }
|
Weave the HTML for a named chunk. |
    def weave(template, chunk_name = @root_chunk)
      return process_file_template(template, chunk_name) if FILE_TEMPLATE_PROCESSORS.include?(template)
      @last_chunk = chunk = self[chunk_name.to_id]
      expand_chunk_html(chunk)
      return process_template(chunk, template)
    end

  protected
|
Due to github.com/relevance/rcov/issues/#issue/43 the following regular expressions must be on a single line. |
|
Detect embedded chunks ( |
TYPE_SRC_CHUNK = / [ ]* <embed \s+ type = ['\"] x-codnar\/ (.*?) ['\"] \s+ src = ['\"] \#* (.*?) ['\"] \s* (?: \/> | > \s* <\/embed> ) [ ]* /x
|
Detect embedded chunks ( |
SRC_TYPE_CHUNK = / [ ]* <embed \s+ src = ['\"] \#* (.*?) ['\"] \s+ type = ['\"] x-codnar\/ (.*?) ['\"] \s* (?: \/> | > \s* <\/embed> ) [ ]* /x
|
Recursively expand all embedded chunks inside a container chunk. |
    def expand_chunk_html(chunk)
      html = chunk.html
      @errors.push("No HTML in chunk: #{chunk.name} #{Weaver.locations_message(chunk)}") unless html
      #! TRICKY: All "container" chunks are assumed to be whole-file chunks with
      #! a single location. Which makes sense as these are documentation and not
      #! code chunks. TODO: It would be nice to know the exact line number of
      #! the chunk embedding directive for better pinpointing of any error.
      @errors.in_path(chunk.locations[0].file) do
        chunk.expanded_html ||= expand_embedded_chunks(html || "").chomp
      end
    end
|
Recursively expand all embedded chunks inside an HTML. |
    def expand_embedded_chunks(html)
      return html.gsub(TYPE_SRC_CHUNK) { |match| weave($1, $2).chomp } \
                 .gsub(SRC_TYPE_CHUNK) { |match| weave($2, $1).chomp }
    end
|
Process the chunk using an ERB template prior to inclusion in container chunk. |
    def process_template(chunk, template_name)
      template_text = @templates[template_name] ||= (
        @errors << "Missing ERB template: #{template_name}"
        "<%= chunk.expanded_html %>\n"
      )
      return (
        ( chunk.erb ||= {} )[template_name] ||= ERB.new(template_text, nil, "%")
      ).result(binding)
    end
Processing the file template
Processing Base64 embedded data images
end end
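The recursive expansion at the heart of the weaver can be sketched without any of the surrounding machinery. This is a simplified illustration under stated assumptions: the `CHUNKS` table and the `expand` function are hypothetical, the pattern handles only the src-then-type attribute order and the self-closing form, the template name is ignored, and there is no protection against chunks that embed each other in a cycle.

```ruby
# Hypothetical chunk table: identifier => HTML, which may embed other chunks.
CHUNKS = {
  "top" => "<h1>Top</h1>\n<embed src=\"middle\" type=\"x-codnar/include\"/>\n",
  "middle" => "<h2>Middle</h2>\n<embed src=\"#bottom\" type=\"x-codnar/include\"/>\n",
  "bottom" => "<h3>Bottom</h3>\n",
}

# Simplified pattern: src before type, self-closing form only.
EMBED = /<embed\s+src=['"]#*(.*?)['"]\s+type=['"]x-codnar\/(.*?)['"]\s*\/>/

# Replace every embed directive by the (recursively expanded) chunk it names.
def expand(id)
  CHUNKS.fetch(id).gsub(EMBED) { expand($1.downcase).chomp }
end

puts expand("top")
```

As in the implementation above, leading "#" characters in the src attribute are skipped, so a chunk can be referenced either by its name or by an anchor-style link.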
And here are the pre-defined weaving template configurations:
module Codnar module Configuration
|
Weave configuration providing a single simple |
WEAVE_INCLUDE = { "include" => "<%= chunk.expanded_html %>\n" }
|
Weave chunks in the plainest possible way. |
WEAVE_PLAIN_CHUNK = { "plain_chunk" => <<-EOF.unindent, #! ((( html
<div class="plain chunk"> <a name="<%= chunk.name.to_id %>"/> <%= chunk.expanded_html %> </div> EOF
} #! ))) html
|
Weave chunks with their name and the list of container chunks. |
WEAVE_NAMED_CHUNK_WITH_CONTAINERS = { "named_chunk_with_containers" => <<-EOF.unindent, #! ((( html
<div class="named_with_containers chunk"> <div class="chunk name"> <a name="<%= chunk.name.to_id %>"> <span><%= CGI.escapeHTML(chunk.name) %></span> </a> </div> <div class="chunk html"> <%= chunk.expanded_html %> </div> % if chunk.containers != [] <div class="chunk containers"> <span class="chunk containers header">Contained in:</span> <ul class="chunk containers"> % chunk.containers.each do |container| <li class="chunk container"> <a class="chunk container" href="#<%= container.to_id %>"><%= CGI.escapeHTML(container) %></a> </li> % end </ul> </div> % end </div> EOF
} #! ))) html
end
end
The template named file is special in two ways. First, the src is given special treatment. If it begins with a ".", it is assumed to be a normal path name relative to the current working directory; otherwise, it is assumed to be a name of a file packaged inside some gem and is searched for in Ruby's $LOAD_PATH. This allows gems (such as Codnar itself) to provide such files to be used in the woven documentation.
Second, the content of the file is simply embedded into the generated documentation. This allows the documentation to be a stand-alone file, including all the CSS and Javascript required for proper display.
Process one of the magical file templates. The content of the file, optionally processed, is directly embedded into the generated documentation. If the file’s path begins with “.”, it is taken to be relative to the current working directory. Otherwise, it is searched for in Ruby’s load path, allowing easy access to files packaged inside gems. |
def process_file_template(template, path)
  begin
    path = Olag::DataFiles.expand_path(path) unless path[0,1] == "."
    return FILE_TEMPLATE_PROCESSORS[template].call(path, File.read(path))
  rescue Exception => exception
    @errors.push("#{$0}: Reading file: #{path} exception: #{exception} #{Reader.locations_message(@last_chunk)}") \
      if @last_chunk
    return "FILE: #{path} EXCEPTION: #{exception}"
  end
end
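The path rule described above can be sketched as a small stand-alone function. The `resolve_data_file` helper is hypothetical: Codnar delegates the search to `Olag::DataFiles.expand_path`, whose exact behavior (for example, when no file is found) is not shown here, so returning the path unchanged in that case is an assumption made for this sketch.

```ruby
require "tmpdir"

# Hypothetical helper mirroring the rule in the text: paths starting with "."
# are used as-is; anything else is searched for in the load path.
def resolve_data_file(path, load_path = $LOAD_PATH)
  return path if path.start_with?(".")
  load_path.each do |directory|
    candidate = File.join(directory, path)
    return candidate if File.exist?(candidate)
  end
  path # Assumption: leave the path unchanged and let the later read fail.
end

Dir.mktmpdir do |directory|
  File.write(File.join(directory, "style.css"), "body {}\n")
  puts resolve_data_file("./local.css", [directory]) # Relative path, kept as-is.
  puts resolve_data_file("style.css", [directory])   # Found in the search path.
end
```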
See the doc/root.html file for plenty of examples of using this functionality.
The image template is a specialization of the file template for dealing with embedded images. The specified image file is embedded into the generated HTML as an img tag, using a data URL. This is very useful for small images, but is problematic when their size increases beyond browser-specific limits.
Here is a simple test demonstrating processing embedded image files:
require "codnar" require "test/spec"
Test computing embedded image HTML tags. |
class TestEmbedImages < Test::Unit::TestCase def test_embed_image Codnar::Weaver.embedded_base64_img_tag('fake file.png', 'fake file content').should \ == "<img src='data:image/png;base64,ZmFrZSBmaWxlIGNvbnRlbnQ=\n'/>" end end
Here is the implementation:
Create an |
def self.embedded_base64_img_tag(name, data) extension = File.extname(name).sub(".", "/") return "<img src='data:image#{extension};base64," \ + Base64.encode64(data) \ + "'/>" end
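As the test shows, Base64.encode64 ends its output with a line feed (and inserts one after every 60 encoded characters), so the line feeds end up inside the src attribute. The following is a minimal sketch of a variant that avoids them, assuming a single-line data URL is preferred. The name strict_base64_img_tag is hypothetical and this is not how Codnar itself builds the tag.

```ruby
# Hypothetical variant of embedded_base64_img_tag that emits no line feeds.
def strict_base64_img_tag(name, data)
  # "logo.png" => "png"; used as the image MIME subtype.
  subtype = File.extname(name).delete_prefix(".")
  # The "m0" pack directive produces Base64 without any line feeds.
  payload = [data].pack("m0")
  "<img src='data:image/#{subtype};base64,#{payload}'/>"
end

puts strict_base64_img_tag("fake file.png", "fake file content")
# prints <img src='data:image/png;base64,ZmFrZSBmaWxlIGNvbnRlbnQ='/>
```

Using Array#pack keeps the sketch free of any library dependency; Base64.strict_encode64 would give the same payload.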
And here is a sample embedded image:
There are two ways to invoke Codnar's functionality - from the command line, and (for Ruby projects) as integrated Rake tasks.
Executable scripts (tests, command-line applications) start with a require 'codnar' line to access the full Codnar code. This also serves as a convenient list of all of Codnar's parts and dependencies:
require "andand" require "base64" require "cgi" require "coderay" require "digest/sha2" require "erb" require "fileutils" require "irb" require "open3" require "rdiscount" require "rdoc" require "rdoc/markup/to_html" require "tempfile" require "yaml" require "olag/application" require "olag/data_files" require "olag/errors" require "olag/hash_struct" require "olag/string_unindent" require "codnar/version" require "codnar/coderay" require "codnar/haddock" require "codnar/hash_extensions" require "codnar/markdown" require "codnar/rdoc" require "codnar/string_extensions" require "codnar/application" require "codnar/cache" require "codnar/formatter" require "codnar/graphviz" require "codnar/grouper" require "codnar/gvim" require "codnar/merger" require "codnar/split" require "codnar/reader" require "codnar/scanner" require "codnar/configuration/code" require "codnar/configuration/comments" require "codnar/configuration/documentation" require "codnar/configuration/highlighting" require "codnar/split_configurations" require "codnar/splitter" require "codnar/sunlight" require "codnar/weave" require "codnar/weave_configurations" require "codnar/weaver" require "codnar/writer"
The base command line Application class handles execution from the command line, with the usual standard options, as well as some Codnar-specific ones: the ability to specify configuration files and/or built-in configurations, and the ability to include additional extension code triggered from these configurations. Together, these allow configuring and extending Codnar's behavior to cover the specific system's needs.
Here is a simple test demonstrating the standard Codnar application behavior:
require "codnar" require "olag/test" require "test/spec" module Codnar
|
Test running a Codnar Application. |
class TestRunApplication < Test::Unit::TestCase include Test::WithFakeFS include Test::WithTempfile def test_print_version Codnar::Application.with_argv(%w(-o nested/stdout -v -h dummy)) { Codnar::Application.new(true).run }.should == 0 File.read("nested/stdout").should == "#{$0}: Version: #{Codnar::VERSION}\n" end def test_print_help Codnar::Application.with_argv(%w(-o stdout -h -v dummy)) { Codnar::Application.new(true).run }.should == 0 File.read("stdout").should.include?("OPTIONS") end USER_CONFIGURATION = { "formatters" => { "doc" => "Formatter.lines_to_pre_html(lines, :class => :pre)", } } def test_merge_configurations write_fake_file("user_configuration.yaml", USER_CONFIGURATION.to_yaml) Codnar::Application.with_argv(%w(-o stdout -c split_pre_documentation -c user_configuration.yaml -p)) { Codnar::Application.new(true).run }.should == 0 YAML.load_file("stdout").should == Codnar::Configuration::SPLIT_PRE_DOCUMENTATION.deep_merge(USER_CONFIGURATION) end def test_require_missing_configuration status = Application.with_argv(%w(-e stderr -c no-such-configuration)) { Codnar::Application.new(true).run }.should == 1 File.read("stderr").should \ == "#{$0}: Configuration: no-such-configuration is neither a disk file nor a known configuration\n" end def test_require_module FakeFS.deactivate! # The additional_module is read by Ruby and is not affected by FakeFS. directory = create_tempdir write_fake_file(directory + "/additional_module.rb", "puts 'HERE'\n") Application.with_argv(["-o", stdout = directory + "/stdout", "-I", directory, "-r", "additional_module" ]) { Codnar::Application.new(true).run }.should == 0 File.read(stdout).should == "HERE\n" end def test_require_missing_module Application.with_argv(%w(-e stderr -I support -r no_such_module)) { Codnar::Application.new(true).run }.should == 1 File.read("stderr").should == "#{$0}: no such file to load -- no_such_module\n" end end end
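The test_merge_configurations case above depends on merging nested hashes recursively rather than replacing whole top-level keys. Here is a minimal sketch of that idea, assuming only hash values need recursive treatment. The free-standing deep_merge function is a hypothetical stand-in for the method that codnar/hash_extensions adds to Hash, and the real extension may treat arrays or other values differently.

```ruby
# Hypothetical stand-in for Hash#deep_merge: recursively merges nested hashes.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      # Both sides are hashes: merge them key by key.
      deep_merge(old_value, new_value)
    else
      # Otherwise the later configuration wins.
      new_value
    end
  end
end

built_in = { "formatters" => { "code" => "A", "doc" => "B" } }
user = { "formatters" => { "doc" => "C" } }
p deep_merge(built_in, user)
# prints {"formatters"=>{"code"=>"A", "doc"=>"C"}} (exact spacing depends on the Ruby version)
```

A plain Hash#merge would drop the "code" formatter here, which is why the configurations are merged deeply.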
And here is the implementation:
module Codnar
|
Base class for Codnar applications. |
class Application < Olag::Application
|
Create a Codnar application. |
def initialize(is_test = nil) super(is_test) @configuration ||= {} end
|
Run the Codnar application, returning its status. |
def run(&block) super(@configuration, &block) end protected
|
Define Codnar application flags. |
def define_flags super define_include_flag define_require_flag define_merge_flag define_print_flag end
|
Return the application’s version - that is, Codnar’s version. |
def version return Codnar::VERSION end
|
Define a flag for collecting module load path directories. |
def define_include_flag @options.on("-I", "--include DIRECTORY", String, "Add directory to Ruby's load path.") do |path| $LOAD_PATH.unshift(path) end end
|
Define a flag for loading a Ruby module. This may be needed for user-specified configurations to work. |
def define_require_flag @options.on("-r", "--require MODULE", String, "Load a Ruby module for user configurations.") do |path| begin require(path) rescue Exception => exception $stderr.puts("#{$0}: #{exception}") exit(1) end end end
|
Define a flag for applying (merging) a Codnar configuration. |
def define_merge_flag @options.on("-c", "--configuration NAME-or-FILE", String, "Apply a named or disk file configuration.") do |name_or_path| loaded_configuration = load_configuration(name_or_path) @configuration = @configuration.deep_merge(loaded_configuration) end end
|
Define a flag for printing the (merged) Codnar configuration. |
def define_print_flag @options.on("-p", "--print", "Print the merged configuration.") do |name_or_path| puts(@configuration.to_yaml) end end
|
Load a configuration either from the available builtin data or from a disk file. |
def load_configuration(name_or_path) return YAML.load_file(name_or_path) if File.exist?(name_or_path) name, *arguments = name_or_path.split(':') value = configuration_value(name) value = value.call(*arguments) unless Hash === value return value end
|
Compute the value of a named built-in configuration. |
def configuration_value(name) begin value = Configuration.const_get(name.upcase) return value if value rescue value = nil end $stderr.puts("#{$0}: Configuration: #{name} is neither a disk file nor a known configuration") exit(1) end end end
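The built-in lookup in load_configuration splits the flag value on ":" and treats the first part as a constant name and the rest as arguments. The sketch below replays that logic against a tiny stand-in registry. The SketchConfigurations module, its two constants, and the resolve_configuration name are all hypothetical; they only illustrate how a name such as with_syntax:ruby would be resolved.

```ruby
# Hypothetical registry standing in for Codnar::Configuration.
module SketchConfigurations
  # A plain configuration is just a Hash.
  PLAIN = { "formatters" => {} }
  # A parameterized configuration is a callable that builds a Hash.
  WITH_SYNTAX = lambda { |syntax| { "syntax" => syntax } }
end

# Resolve "name" or "name:arg1:arg2" to a configuration Hash.
def resolve_configuration(name_with_arguments, registry = SketchConfigurations)
  name, *arguments = name_with_arguments.split(":")
  value = registry.const_get(name.upcase)
  value.is_a?(Hash) ? value : value.call(*arguments)
end

p resolve_configuration("with_syntax:ruby")
```

Unlike the real code, the sketch lets a NameError propagate for an unknown name instead of printing a message and exiting.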
Here is a simple test demonstrating invoking the command-line application for splitting files:
require "codnar" require "olag/test" require "test/spec"
Test running the Split Codnar Application. |
class TestRunSplit < Test::Unit::TestCase include Test::WithFakeFS def test_print_help Codnar::Application.with_argv(%w(-o stdout -h)) { Codnar::Split.new(true).run }.should == 0 help = File.read("stdout") [ "codnar-split", "OPTIONS", "DESCRIPTION" ].each { |text| help.should.include?(text) } end def test_run_split write_fake_file("input", "<foo>\n") Codnar::Application.with_argv(%w(-o stdout input)) { Codnar::Split.new(true).run }.should == 0 YAML.load_file("stdout").should == [ { "name" => "input", "locations" => [ { "file" => "input", "line" => 1 } ], "html" => "<foo>", "containers" => [], "contained" => [], } ] end end
Here is the implementation:
module Codnar
|
Split application. |
class Split < Application
|
Run the splitting Codnar application, returning its status. |
def run super { split } end protected
|
Split the specified input file into chunks. |
def split @configuration = Codnar::Configuration::SPLIT_HTML_DOCUMENTATION if @configuration == {} splitter = Splitter.new(@errors, @configuration) print(splitter.chunks(ARGV[0]).to_yaml) end
|
Parse remaining command-line file arguments. |
def parse_arguments expect_exactly(1, "files to split") end
|
Return the banner line of the help message. |
def banner return "codnar-split - Split documentation or code files to chunks." end
|
Return the name and description of any final command-line file arguments. |
def arguments return "FILE", "Documentation or code file to split." end
|
Return a short description of the program. |
def description return <<-EOF.unindent Split the documentation of file into chunks that are printed in YAML format to the output (to be read by codnar-weave). Many file formats can be split depending on the specified configuration. The default configuration is called SPLIT_HTML_DOCUMENTATION, and it preserves the whole file as a single formatted HTML documentation chunk. This isn't very useful. The configuration needs to specify a set of line classification patterns, parsing states and pattern-based transitions between them, the initial state, and expressions for formatting classified lines to HTML. See the Codnar documentation for details. EOF end end end
And here is the actual command-line application script:
#!/usr/bin/ruby -w require "codnar" exit Codnar::Split.new.run
Here is a simple test demonstrating invoking the command-line application for weaving chunks into HTML:
require "codnar" require "olag/test" require "test/spec"
Test running the Weave Codnar Application. |
class TestRunWeave < Test::Unit::TestCase include Test::WithFakeFS def test_print_help Codnar::Application.with_argv(%w(-o stdout -h)) { Codnar::Weave.new(true).run }.should == 0 help = File.read("stdout") [ "codnar-weave", "OPTIONS", "DESCRIPTION" ].each { |text| help.should.include?(text) } end ROOT_CHUNKS = [ { "name" => "root", "locations" => [ { "file" => "root", "line" => 1 } ], "html" => "Root\n<embed src='included' type='x-codnar/include'/>\n" } ] INCLUDED_CHUNKS = [ { "name" => "included", "locations" => [ { "file" => "included", "line" => 1 } ], "html" => "Included" } ] def test_run_weave write_fake_file("root", ROOT_CHUNKS.to_yaml) write_fake_file("included", INCLUDED_CHUNKS.to_yaml) Codnar::Application.with_argv(%w(-o stdout root included)) { Codnar::Weave.new(true).run }.should == 0 File.read("stdout").should == "Root\nIncluded\n" end def test_run_weave_missing_chunk write_fake_file("root", ROOT_CHUNKS.to_yaml) Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 1 File.read("stderr").should == "#{$0}: Missing chunk: included in file: root\n" end def test_run_weave_unused_chunk write_fake_file("root", ROOT_CHUNKS.to_yaml) write_fake_file("included", INCLUDED_CHUNKS.to_yaml) Codnar::Application.with_argv(%w(-e stderr -o stdout included root)) { Codnar::Weave.new(true).run }.should == 1 File.read("stderr").should == "#{$0}: Unused chunk: root in file: root at line: 1\n" end FILE_CHUNKS = [ { "name" => "root", "locations" => [ { "file" => "root", "line" => 1 } ], "html" => "Root\n<embed src='included.file' type='x-codnar/file'/>\n" } ] def test_run_weave_missing_file write_fake_file("root", FILE_CHUNKS.to_yaml) Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 1 File.read("stdout").should == "Root\nFILE: included.file EXCEPTION: No such file or directory - included.file\n" File.read("stderr").should \ == "#{$0}: Reading file: included.file exception: No such file or directory - included.file in file: root at line: 1\n" end def test_run_weave_existing_file write_fake_file("root", FILE_CHUNKS.to_yaml) write_fake_file("included.file", "included file\n") Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 0 File.read("stdout").should == "Root\nincluded file\n" end end
Here is the implementation:
module Codnar
|
Weave application. |
class Weave < Application
|
Run the weaving Codnar application, returning its status. |
def run super { weave } end protected
|
Weave all the chunks together to a single HTML. |
def weave @configuration = Codnar::Configuration::WEAVE_INCLUDE if @configuration == {} weaver = Weaver.new(@errors, ARGV, @configuration) puts(weaver.weave("include")) weaver.collect_unused_chunk_errors end
|
Parse remaining command-line file arguments. |
def parse_arguments expect_at_least(1, "chunk files to weave") end
|
Return the banner line of the help message. |
def banner return "codnar-weave - Weave documentation chunks to a single HTML." end
|
Return the name and description of any final command-line file arguments. |
def arguments return "MAIN-CHUNK ADDITIONAL-CHUNKS", "Chunk files to weave together." end
|
Return a short description of the program. |
def description return <<-EOF.unindent Weave chunks in all chunk files (from codnar-split) to a single HTML that is printed to the output. The first file is the main documentation file that is expected to include all the rest of the chunks via directives of the format: <embed src="chunk-name" type="x-codnar/template-name"></embed> Where the template-name is a key in the configuration, whose value is an ERB template for embedding the named chunk into the documentation. If no configuration is specified, the WEAVE_INCLUDE configuration is assumed. This configuration contains a single template named "include", which simply includes the named chunk into the generated HTML. EOF end end end
And here is the actual command-line application script:
#!/usr/bin/ruby -w require "codnar" exit Codnar::Weave.new.run
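To make the weaving step concrete, here is a minimal sketch of expanding include directives, assuming the simplest possible case. It is not Codnar's Weaver: it only understands the include template, uses a regular expression instead of ERB templates, takes chunks as a plain name-to-HTML hash, and does not guard against chunks that include each other. The names EMBED_PATTERN and expand_includes are hypothetical.

```ruby
# Hypothetical pattern for <embed src='name' type='x-codnar/include'/> directives,
# accepting either quote style and either the self-closing or the paired form.
EMBED_PATTERN = %r{<embed\s+src=(['"])(.*?)\1\s+type=(['"])x-codnar/include\3\s*(?:/>|>\s*</embed>)}

# Replace each include directive with the (recursively expanded) chunk HTML.
def expand_includes(html, chunks, errors = [])
  html.gsub(EMBED_PATTERN) do
    name = Regexp.last_match(2)
    if chunks.key?(name)
      expand_includes(chunks[name], chunks, errors)
    else
      # Record the problem and leave nothing in place of the directive.
      errors << "Missing chunk: #{name}"
      ""
    end
  end
end

chunks = { "included" => "Included" }
print expand_includes("Root\n<embed src='included' type='x-codnar/include'/>\n", chunks)
```

Collecting errors in an array instead of raising mirrors the way the tests above expect both partial output and an error message.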
For Ruby projects (or any other project using Rake), it is also possible to invoke Codnar using Rake tasks. Here is a simple test demonstrating using the Rake tasks:
require "codnar/rake" require "olag/test" require "test/spec"
Test rake tasks. |
class TestRakeTasks < Test::Unit::TestCase include Test::WithFakeFS include Test::WithRake def test_default run_rake test_results end protected def run_rake write_fake_file("foo", "foo\n") Codnar::Rake::SplitTask.new([ "foo" ], []) Codnar::Rake::WeaveTask.new("foo", []) @rake["codnar"].invoke end def test_results chunk_file = Codnar::Rake.chunks_dir + "/foo" YAML.load_file(chunk_file).should == [ { "html" => "foo", "name" => "foo", "locations" => [ { "file" => "foo", "line" => 1 } ], "containers" => [], "contained" => [], } ] File.read("codnar.html").should == "foo\n" Codnar::Rake.chunk_files.should == [ chunk_file ] end end
To use these tasks in a Rakefile, one needs to require 'codnar/rake'. The code implements a singleton that holds the global state shared between tasks:
require "rake" require "rake/tasklib" require "codnar" require "codnar/rake/split_task" require "codnar/rake/weave_task" module Codnar
|
This module contains all the Codnar Rake tasks code. |
module Rake class << self
|
The root folder to store all chunk files under. |
attr_accessor :chunks_dir
|
The list of split chunk files for later weaving. |
attr_accessor :chunk_files end Rake.chunk_files = [] Rake.chunks_dir = "chunks"
|
Compute options for invoking an application. |
def self.application_options(output, configurations) options = [ "-o", output ] options += configurations.map { |configuration| [ "-c", configuration.to_s ] }.flatten return options end
|
Return the list of actual configuration files (as opposed to names of built-in configurations) for use as dependencies. |
def self.configuration_files(configurations) return configurations.find_all { |configuration| File.exist?(configuration.to_s) } end end end
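To see what the tasks actually hand to the applications, the following sketch reproduces the option computation as a free-standing function and prints the resulting argument list. The function is a simplified copy for illustration only, and the configuration names in the example call are assumptions chosen to show that symbols and file names are both turned into -c flags.

```ruby
# Illustrative copy of the option computation, outside the Codnar::Rake module.
def application_options(output, configurations)
  # Each configuration becomes its own "-c NAME" pair, in order.
  ["-o", output] + configurations.flat_map { |configuration| ["-c", configuration.to_s] }
end

p application_options("chunks/foo", [:split_html_documentation, "user_configuration.yaml"])
# prints ["-o", "chunks/foo", "-c", "split_html_documentation", "-c", "user_configuration.yaml"]
```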
To split one or more files to chunks, create a new SplitTask. Multiple such tasks may be created; this is required if different files need to be split using different configurations.
module Codnar module Rake
|
A Rake task for splitting source files to chunks. |
class SplitTask < ::Rake::TaskLib
|
Create a new Rake task for splitting source files to chunks. Each of the specified disk files is split using the specified set of configurations. |
def initialize(paths, configurations) @configurations = configurations paths.each do |path| define_tasks(path) end end protected
|
Define the tasks for splitting a single source file to chunks. |
def define_tasks(path) output = Rake.chunks_dir + "/" + path define_split_file_task(path, output) SplitTask.define_common_tasks SplitTask.connect_common_tasks(output) end
|
Define the actual task for splitting the source file. |
def define_split_file_task(path, output) ::Rake::FileTask.define_task(output => [ path ] + Rake.configuration_files(@configurations)) do run_split_application(path, output) end end
|
Run the Split application for a single source file. |
def run_split_application(path, output) options = Rake.application_options(output, @configurations) options << path status = Application.with_argv(options) { Split.new.run } raise "Codnar split errors" unless status == 0 end
|
Define common Rake split tasks. This method may be invoked several times; only the first invocation actually defines the tasks. The common tasks are codnar_split (for splitting all the source files) and clean_codnar (for getting rid of the chunks directory). |
def self.define_common_tasks @defined_common_tasks ||= SplitTask.create_common_tasks end
|
Actually create common Rake split tasks. |
def self.create_common_tasks desc "Split all files into chunks" ::Rake::Task.define_task("codnar_split") desc "Clean all split chunks" ::Rake::Task.define_task("clean_codnar") { FileUtils.rm_rf(Rake.chunks_dir) } ::Rake::Task.define_task(:clean => "clean_codnar") end
|
For some reason, |
def self.desc(description) ::Rake.application.last_description = description end
|
Connect the task for splitting a single source file to the common task of splitting all source files. |
def self.connect_common_tasks(output) ::Rake::Task.define_task("codnar_split" => output) Rake::chunk_files << output end end end end
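The define-once behavior of define_common_tasks rests on the ||= idiom, which only skips the right-hand side once the stored value is truthy. Here is a small sketch of that idiom in isolation, using a hypothetical CommonTasks class with a counter in place of real Rake tasks.

```ruby
# Hypothetical class showing the "define only once" idiom used by SplitTask.
class CommonTasks
  class << self
    # How many times the expensive creation step actually ran.
    attr_reader :creations

    # Safe to call repeatedly; only the first call does any work.
    def define
      @defined ||= create
    end

    private

    def create
      @creations = (@creations || 0) + 1
      true # Must be truthy, otherwise ||= would run create again.
    end
  end
end

3.times { CommonTasks.define }
p CommonTasks.creations
# prints 1
```

In the original code the truthy value is whatever the last define_task call returns, so the guard works without an explicit flag.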
To weave the chunks together, create a single WeaveTask.
module Codnar module Rake
|
A Rake task for weaving chunks to a single HTML. |
class WeaveTask < ::Rake::TaskLib
|
Create a Rake task for weaving chunks to a single HTML. The root source file is expected to embed all the chunks into the output HTML. The chunks are loaded from the results of all the previously created SplitTask instances. |
def initialize(root, configurations, output = "codnar.html") @root = Rake.chunks_dir + "/" + root @output = output @configurations = configurations define_tasks end protected
|
Define the tasks for weaving the chunks to a single HTML. |
def define_tasks define_weave_task connect_common_tasks end
|
Define the actual task for weaving the chunks to a single HTML. |
def define_weave_task desc "Weave chunks into HTML" unless ::Rake.application.last_comment ::Rake::Task.define_task("codnar_weave" => @output) ::Rake::FileTask.define_task(@output => Rake.chunk_files + Rake.configuration_files(@configurations)) do run_weave_application end end
|
Run the Weave application for a single source file. |
def run_weave_application options = Rake.application_options(@output, @configurations) options << @root options += Rake.chunk_files.reject { |chunk| chunk == @root } status = Application.with_argv(options) { Weave.new.run } raise "Codnar weave errors" unless status == 0 end
|
Connect the task for cleaning up after weaving
( |
def connect_common_tasks desc "Build the code narrative HTML" ::Rake::Task.define_task(:codnar => "codnar_weave") desc "Remove woven HTML documentation" ::Rake::Task.define_task("clobber_codnar") { rm_rf(@output) } ::Rake::Task.define_task(:clobber => "clobber_codnar") end end end end
The following Rakefile is in charge of building the gem, with the help of some tools described below.
$LOAD_PATH.unshift(File.dirname(__FILE__) + "/lib") require "olag/rake"
Codnar configurations
spec = Gem::Specification.new do |spec| spec.name = "codnar" spec.version = Codnar::VERSION spec.title = "Code Narrator" spec.author = "Oren Ben-Kiki" spec.email = "rubygems-oren@ben-kiki.org" spec.homepage = "https://rubygems.org/gems/codnar" spec.summary = "Code narrator - an inverse literate programming tool." spec.description = (<<-EOF).gsub(/^\s+/, "").chomp.gsub("\n", " ") Code Narrator (Codnar) is an inverse literate programming tool. It splits the source files into "chunks" (including structured comments) and weaves them back into a narrative that describes the overall system. EOF spec.add_dependency("andand") spec.add_dependency("coderay") spec.add_dependency("rdiscount") end Olag::Rake.new(spec)
The generated HTML requires some tweaking to yield aesthetic, readable results. This tweaking consists of using Javascript to control chunk visibility, generating a table of contents, and using CSS to make the HTML look better.
Here are the modified configurations for generating the correct HTML:
Override the default Codnar configurations. |
Olag::Rake::CODNAR_CONFIGURATIONS.unshift([
|
Exclude the data files and images from the generated documentation. |
"lib/codnar/data/.*/.*|.*\.png", ], [
|
Tests should not have chunks detected in them. They may however contain HTML islands. |
"test/.*\.rb", "classify_source_code:ruby", "format_code_gvim_css:ruby", "classify_nested_code:ruby:html", "classify_nested_code:ruby:dot", "classify_nested_code:ruby:svg", "format_code_gvim_css:html", "format_code_gvim_css:dot", "format_code_gvim_css:svg", "classify_shell_comments", "format_rdoc_comments", ], [
|
Ruby sources contain HTML islands. |
"Rakefile|.*\.rb|bin/.*", "classify_source_code:ruby", "format_code_gvim_css:ruby", "classify_nested_code:ruby:html", "format_code_gvim_css:html", "classify_shell_comments", "format_rdoc_comments", "chunk_by_vim_regions", ], [
|
We also have Javascript sources. |
".*\.js", "classify_source_code:javascript", "format_code_gvim_css:javascript", "classify_c_comments", "format_markdown_comments" ], [
|
We also have CSS sources. |
".*\.css", "classify_source_code:css", "format_code_gvim_css:css", "classify_c_comments", "format_markdown_comments" ])
The following code injects visibility controls ("+"/"-" toggles) next to each embedded code chunk. It also hides all the chunks by default; this increases the readability of the overall narrative, turning it into a high-level summary. Expanding the embedded code chunks allows the reader to delve into the details.
Quick-and-dirty JS for inserting a "+"/"-" control for chunk visibility next to each chunk's name. By default, all chunks are hidden. |
function inject_chunk_controls() { var name_div; foreach_chunk_elements(function(div) { name_div = div; }, function(html_div) { var control_span = document.createElement("span"); var hide = function() { control_span.innerHTML = "+"; html_div.style.display = "none"; } var show = function() { control_span.innerHTML = "–"; // Vertical bar. html_div.style.display = "block"; } name_div.onclick = function() { html_div.style.display == "block" ? hide() : show(); } hide(); // Initializes html_div.style.display control_span.className = "control chunk"; name_div.insertBefore(control_span, name_div.firstChild); }) }
Loop on all DIV elements that contain a chunk name, or that contain chunk HTML. Assumes that they come in pairs - name first, HTML second. |
function foreach_chunk_elements(name_lambda, html_lambda) { var div_elements = document.getElementsByTagName("div"); for (var e in div_elements) { var div = div_elements[e]; classes = " " + div.className + " "; if (!/ chunk /.test(classes)) continue; if (/ name /.test(classes)) name_lambda(div); if (/ html /.test(classes)) html_lambda(div); } }
Only invoke it after all helper functions are defined. |
inject_chunk_controls();
The following code is not very efficient or elegant, but it does a basic job of injecting a table of contents into the generated HTML.
Quick-and-dirty JS for inserting a table of contents inside a DIV with the id "contents". The table of contents is a series of nested UL and LI elements, prefixed with an H1 containing the text "Contents". This H1 comes in addition to the single static H1 expected by HTML best practices. It looks "right" and should not confuse search engines etc. since they do not execute Javascript code. |
function inject_contents() { var contents = document.getElementById("contents"); var lists = contents_lists(); contents.appendChild(contents_header()); // TRICKY: Must be done after contents_lists(). contents.appendChild(lists); }
Create a table of contents H1. |
function contents_header() { var h = document.createElement("h1"); var text = document.createTextNode("Contents"); h.appendChild(text); return h; }
Create nested UL/LI lists for the table of contents. |
function contents_lists() { var container; var indices = []; var h_elements = all_h_elements();
|
Using "for (var e in h_elements)" is too sensitive to other libraries |
for (var e = 0; e < h_elements.length; e++) { h = h_elements[e]; var level = h.tagName.substring(1, 2) - 1; container = pop_container(container, indices, level); container = push_container(container, indices, level); var id = indices.join("."); container.appendChild(list_element(id, h)); h.insertBefore(header_anchor(id), h.firstChild); } return pop_container(container, indices, 1); }
Get a list of all H elements in the DOM. We skip the single H1 element; otherwise it would just have the index "1" which would be prefixed to all other headers. |
function all_h_elements() { var elements = document.getElementsByTagName("*"); var h_elements = []; for (var e in elements) { var h = elements[e]; if (/^h[2-9]$/i.test(h.tagName)) h_elements.push(h); } return h_elements; }
Pop indices (and UL containers) until reaching up to a given level. |
function pop_container(container, indices, level) { while (indices.length > level) { container = container.parentNode; indices.pop(); } return container; }
Push indices (and UL containers) until reaching down to a given level. |
function push_container(container, indices, level) { while (indices.length < level) { // TRICKY: push a 0 for the very last new level, so the ++ at the end // will turn it into a 1. Skipped intermediate levels start at 1. indices.push(indices.length < level - 1 ? 1 : 0); var ul = document.createElement("ul"); if (container) { container.appendChild(ul); } container = ul; } indices[indices.length - 1]++; return container; }
Create a LI for an H element with some id. |
function list_element(id, h) { var a = document.createElement("a"); a.href = "#" + id; a.innerHTML = id + " " + h.innerHTML; var li = document.createElement("li"); li.appendChild(a); return li; }
Create an anchor for an H element with some id. |
function header_anchor(id) { var text = document.createTextNode(id + " "); var a = document.createElement("a"); a.id = id; a.appendChild(text); return a; }
Only invoke it after all helper functions are defined. |
inject_contents();
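The numbering scheme used by the table of contents code is easier to follow without the DOM manipulation. The following Ruby sketch is a hypothetical port of just the index bookkeeping: it takes a list of heading levels (1 for H2, 2 for H3, and so on, matching the "tag number minus one" computation above) and returns the dotted section numbers. It assumes skipped intermediate levels are numbered 1, which is what the TRICKY comment in push_container suggests.

```ruby
# Hypothetical port of the pop/push index bookkeeping, without any DOM work.
def number_headings(levels)
  indices = []
  levels.map do |level|
    # Pop: drop counters of deeper levels that have just been closed.
    indices.pop while indices.length > level
    # Push: open new levels; the last one starts at 0 so the increment makes it 1.
    indices.push(indices.length < level - 1 ? 1 : 0) while indices.length < level
    indices[-1] += 1
    indices.join(".")
  end
end

p number_headings([1, 2, 2, 1, 2])
# prints ["1", "1.1", "1.2", "2", "2.1"]
```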
To avoid dealing with the different default styles used by different browsers, we employ the YUI CSS reset and base files. Resetting and restoring the default CSS styles is inelegant, but it is the only current way to get a consistent presentation of HTML. Once this is out of the way, we apply styles specific to our HTML. Some of these override the default styles established by the base CSS file above. We do this instead of directly tweaking the base CSS file, to allow easy upgrade to new versions if/when YUI release any.
Margin & Padding |
div.chunk.name, div.chunk.html, div.chunk.containers, div.chunk table, div.chunk td, div.chunk pre { margin: 0; padding: 0; } div.chunk *:last-child { margin-bottom: 0; } h4, h5, h6, div.chunk, div.comment pre { margin: 1em 0; } pre, div.comment, div.chunk.html { padding: 0.33em; } span.control.chunk { padding-left: 0.25em; padding-right: 0.25em; }
Table of contents |
div#contents ul { margin-top: 0; margin-bottom: 0; padding: 0; } div#contents li { list-style-type: none; }
Lists |
ul.chunk.containers { padding: 0; margin: 0; display: inline; } ul.chunk.containers li { display: inline; list-style-type: none; }
Borders |
pre, span.control.chunk, div.chunk.html { border: 1px solid #000; } table.layout td.indentation, div.chunk pre { border: none; }
Colors |
span.control.chunk, table.layout td.html { background-color: Beige; }
Fonts |
body { font-family: Sans-Serif; } pre { font-family: Consolas, Inconsolata, Monaco, "Courier New", Monospace; } div.chunk.name { font-weight: bold; }
When using Sunlight for syntax highlighting, we also need to include some CSS and Javascript files to convert the classified pre elements into properly marked-up HTML. We also need to invoke this Javascript code (a one-line operation). Here is what such code might look like inside a Javascript block of the generated HTML:
<embed src="codnar/data/sunlight/min.js" type="x-codnar/file"/> <embed src="codnar/data/sunlight/ruby-min.js" type="x-codnar/file"/> Sunlight.globalOptions.lineNumbers = false; Sunlight.highlightAll();
This module contains all the code narrator code.