Data Flow

Here is a diagram showing the overall data flow in a system documented with Codnar:

Splitting files into chunks

Codnar makes the reasonable assumption that each source file can be effectively processed as a sequence of lines. This works well in practice for all "text" source files. It fails miserably for "binary" source files, but such files don't work that well in most generic source management tools (such as version management systems).

A second, less obvious assumption is that it is possible to classify the source file lines to "kinds" using a simple state machine. The classified lines are then grouped into nested chunks based on the two special line kinds begin_chunk and end_chunk. The other line kinds are used to control how the lines are formatted into HTML.

The collected chunks, with the formatted HTML for each one, are then stored in a chunks file to be used later for weaving the overall HTML narrative.

Scanning Lines

Scanning a file into classified lines is done by the Scanner class. Here is a simple test that demonstrates using the scanner:

test/scan_lines.rb

require "codnar"
require "olag/test"
require "test/spec"

Test scanning classified lines.

class TestScanLines < Test::Unit::TestCase

  include Test::WithErrors
  include Test::WithFakeFS

  def test_scan_lines
    write_fake_file("comments", INPUT)
    scanner = Codnar::Scanner.new(@errors, SYNTAX)
    scanner.lines("comments").should == LINES
    @errors.should == ERRORS
  end

  SYNTAX = {
    "start_state" => "comment",
    "patterns" => {
      "shell" => {
        "regexp" => "^(\\s*)#+\\s*(.*)$",
        "groups" => [ "indentation", "payload" ],
        "kind" => "comment",
      },
      "c++" => {
        "regexp" => /^(\s*)\/\/+\s*(.*)$/,
        "groups" => [ "indentation", "payload" ],
        "kind" => "comment",
      },
      "invalid" => { "regexp" => "(" },
    },
    "states" => {
      "comment" => {
        "transitions" => [
          { "pattern" => "shell" },
          { "pattern" => "c++" },
          { "pattern" => "no-such-pattern", "next_state" => "no-such-state" },
        ],
      },
    },
  }

  INPUT = <<-EOF.unindent.gsub("#!", "#")
    #! foo
     // bar
      baz
  EOF

  LINES = [ {
    "kind" => "comment",
    "line" => "# foo",
    "indentation" => "",
    "payload" => "foo",
    "number" => 1,
  }, {
    "kind" => "comment",
    "line" => " // bar",
    "indentation" => " ",
    "payload" => "bar",
    "number" => 2,
  }, {
    "kind" => "error",
    "line" => "  baz",
    "indentation" => "  ",
    "payload" => "baz",
    "state" => "comment",
    "number" => 3,
  } ]

  ERRORS = [
    "#{$0}: Invalid pattern: invalid regexp: ( error: premature end of regular expression: /(/",
    "#{$0}: Reference to a missing pattern: no-such-pattern",
    "#{$0}: Reference to a missing state: no-such-state",
    "#{$0}: State: comment failed to classify line: baz in file: comments at line: 3"
  ]

end

And here is the implementation:

lib/codnar/scanner.rb

module Codnar

Scan a file into classified lines.

  class Scanner

Construct a scanner based on a syntax in the following structure:

patterns:
  <name>:
    name: <name>
    kind: <kind>
    regexp: <regexp>
    groups:
    - <name>
states:
  <name>:
    name: <name>
    transitions:
    - pattern: <pattern>
      kind: <kind>
      next_state: <state>
start_state: <state>

To allow for cleaner YAML files to specify the syntax, the following shorthands are supported:

A pattern or state reference can be presented by the string name of the pattern or state.
The name field of a state or pattern can be ommitted. If specified, it must be identical to the key in the states or patterns mapping.
The kind field of a pattern can be ommitted; by default it is assumed to be identical to the pattern name.
A pattern regexp can be presented by a plain string.
The pattern groups field can be ommitted or contain nil if it is equal to [ “indentation”, “payload” ].
The kind field of a transition can be ommitted; by default it is assumed to be identical to the pattern kind. If it ends up nil, this indicates that there’s no kind assigned by the pattern, and the current line should be classified again by the next state.
The next state of a transition can be ommitted; by default it is assumed to be identical to the containing state.
The start state can be ommitted; by default it is assumed to be named start.

When the Scanner is constructed, a deep clone of the syntax object is created and modified to expand all the above shorthands. Any problems detected during this process are pushed into the errors.

    def initialize(errors, syntax)
      @errors = errors
      @syntax = syntax.deep_clone
      @syntax.patterns.each { |name, pattern| expand_pattern_shorthands(name, pattern) }
      @syntax.states.each { |name, state| expand_state_shorthands(name, state) }
      @syntax.start_state = resolve_start_state
    end

Scan a disk file into classified lines in the following format (where the groups contain the text extracted by the matching pattern):

- kind: <kind>
  line: <text>
  <group>: <text>

By convention, each classified line has a “payload” group that contains the “main” content of the line (chunk name for begin/end/nested chunk lines, clean comment text for comment lines, etc.). In addition, most classified lines have an “indentation” group that contains the leading white space (which is not included in the payload).

If at some state, a file line does not match any pattern, the scanner will push a message into the errors. In addition it will classify the line as follows:

- kind: error
  state: <name>
  line: <text>
  indentation: <leading white space>
  payload: <line text following the indentation>

    def lines(path)
      @path = path
      @lines = []
      @state = @syntax.start_state
      @errors.in_file_lines(path) { |line| scan_line(line.chomp) }
      return @lines
    end

  protected

    Scanner pattern shorthands

    Scanner state shorthands

    Scanner file processing

    Scanner line processing


  end

end

As we can see, the implementation is split into two main parts. First, all shorthands in the syntax definition are expanded (possibly generating errors). Then, the expanded syntax is applied to a file, to generate a sequence of classified lines.

Scanner Syntax Shorthands

The syntax is expected to be written by hand in a YAML file. We therefore provide some convenient shorthands (listed above) to make YAML syntax files more readable. These shorthands must be expanded to their full form before we can apply the syntax to a file. There are two sets of shorthands we need to expand:

Scanner pattern shorthands

Expand all the shorthands used in the pattern.

def expand_pattern_shorthands(name, pattern)
  pattern.kind ||= fill_name(name, pattern, "Pattern")
  pattern.groups ||= [ "indentation", "payload" ]
  pattern.regexp = convert_to_regexp(name, pattern.regexp)
end

Convert a string regexp to a real Regexp.

def convert_to_regexp(name, regexp)
  return regexp if Regexp == regexp
  begin
    return Regexp.new(regexp)
  rescue
    @errors << "Invalid pattern: #{name} regexp: #{regexp} error: #{$!}"
  end
end

Fill in the name field for state or pattern object.

def fill_name(name, data, type)
  data_name = data.name ||= name
  @errors << "#{type}: #{name} has wrong name: #{data_name}" if data_name != name
  return data_name
end

lib/codnar/scanner.rb

Scanner state shorthands

A pattern that matches any line and extracts no data; is meant to be used for catch-all transitions that transfer the scanning to a different state. It is used if no explicit pattern is specified in a transition (that is, you can think of this as the nil pattern).

CATCH_ALL_PATTERN = {
  "kind" => nil,
  "groups" => [],
  "regexp" => //
}

Expand all the shorthands used in the state.

def expand_state_shorthands(name, state)
  fill_name(name, state, "State")
  state.transitions.each do |transition|
    pattern = transition.pattern = lookup(@syntax.patterns, "pattern", transition.pattern || CATCH_ALL_PATTERN)
    transition.kind ||= pattern.andand.kind
    transition.next_state = lookup(@syntax.states, "state", transition.next_state || state)
  end
end

Convert a string name to an actual data reference.

def lookup(mapping, type, reference)
  return reference unless String === reference
  data = mapping[reference]
  @errors << "Reference to a missing #{type}: #{reference}" unless data
  return data
end

Resolve the start state reference.

def resolve_start_state
  return lookup(@syntax.states, "state", @syntax.start_state || "start") || {
    "name" => "missing_start_state",
    "kind" => "error",
    "transitions" => []
  }
end

lib/codnar/scanner.rb

The above code modifies the syntax object in place. This is safe because we are working on a deep_clone of the original syntax:

lib/codnar/hash_extensions.rb

Extend the core Hash class.

class Hash

Obtain a deep clone which shares nothing with this hash.

  def deep_clone
    return YAML.load(to_yaml)
  end

  Deep merge

end

Classifying Source Lines

Scanning a file to classified lines is a simple matter of applying the current state transitions to each line:

Scanner file processing

Scan the next file line.

def scan_line(line)
  until state_classified_line(line)

Do nothing

  end
end

Scan the current line using the current state transitions. Return true if the line was classified, of false if we need to try and classify it again using the updated (next) state.

def state_classified_line(line)
  @state.transitions.each do |transition|
    match = transition.pattern.andand.regexp.andand.match(line) if transition.next_state
    return classify_matching_line(line, transition, match) if match
  end
  classify_error_line(line, @state.name)
  return true
end

lib/codnar/scanner.rb

If a line matches a state transition, it is classified accordingly. Otherwise, it is reported as an error:

Scanner line processing

Handle a file line, only if it matches the pattern.

def classify_matching_line(line, transition, match)
  @state = transition.next_state
  kind = transition.kind
  return false unless kind # A +nil+ kind indicates the next state will classify the line.
  @lines << Scanner.extracted_groups(match, transition.pattern.groups || []).update({
    "line" => line,
    "kind" => kind,
    "number" => @errors.line_number
  })
  return true
end

Extract named groups from a match. As a special case, indentation is deleted if there is no payload.

def self.extracted_groups(match, groups)
  extracted = {}
  groups.each_with_index do |group, index|
    extracted[group] = match[index + 1]
  end
  extracted.delete("indentation") if match[0] == ""
  return extracted
end

Handle a file line that couldn’t be classified.

def classify_error_line(line, state_name)
  @lines << {
    "line" => line,
    "indentation" => line.indentation,
    "payload" => line.unindent,
    "kind" => "error",
    "state" => state_name,
    "number" => @errors.line_number
  }
  @errors << "State: #{state_name} failed to classify line: #{@lines.last.payload}"
end

lib/codnar/scanner.rb

Merging scanned lines to chunks

Once we have the array of scanned classified lines, we need to merge them into nested chunks. Here is a simple test that demonstrates using the merger:

test/merge_lines.rb

require "codnar"
require "olag/test"
require "test/spec"

Test merging classified lines to chunks.

class TestMergeLines < Test::Unit::TestCase

  include Test::WithErrors

  def test_merge_no_chunks
    lines = [ { "kind" => "code", "line" => "foo", "number" => 1, "indentation" => "", "payload" => "foo" } ]
    chunks = Codnar::Merger.chunks(@errors, "path", lines)
    @errors.should == []
    chunks.should == [ {
      "name" => "path",
      "locations" => [ { "file" => "path", "line" => 1 } ],
      "containers" => [],
      "contained" => [],
      "lines" => lines
    } ]
  end

  def test_valid_merge
    chunks = Codnar::Merger.chunks(@errors, "path", VALID_LINES)
    @errors.should == []
    chunks.should == VALID_CHUNKS
  end

  VALID_LINES = [
    { "kind" => "code",        "number" => 1,  "line" => "before top",
      "indentation" => "",     "payload" => "before top"          },
    { "kind" => "begin_chunk", "number" => 2, "line" => " {{{ top chunk",
      "indentation" => " ",    "payload" => "top chunk"           },
    { "kind" => "code",         "number" => 3, "line" => " before intermediate",
      "indentation" => " ",    "payload" => "before intermediate" },
    { "kind" => "begin_chunk", "number" => 4,  "line" => "  {{{ intermediate chunk",
      "indentation" => "  ",   "payload" => "intermediate chunk"  },
    { "kind" => "code",        "number" => 5,  "line" => "  before inner",
      "indentation" => "  ",   "payload" => "before inner"        },
    { "kind" => "begin_chunk", "number" => 6,  "line" => "   {{{ inner chunk",
      "indentation" => "   ",  "payload" => "inner chunk"         },
    { "kind" => "code",        "number" => 7,  "line" => "   inner line",
      "indentation" => "   ",  "payload" => "inner line"          },
    { "kind" => "end_chunk",   "number" => 8,  "line" => "   }}} inner chunk",
      "indentation" => "   ",  "payload" => "inner chunk"         },
    { "kind" => "code",        "number" => 9,  "line" => "  after inner",
      "indentation" => "  ",   "payload" => "after inner"         },
    { "kind" => "end_chunk",   "number" => 10, "line" => "  }}}",
      "indentation" => "  ",   "payload" => ""                    },
    { "kind" => "code",        "number" => 11, "line" => " after intermediate",
      "indentation" => " ",    "payload" => "after intermediate"  },
    { "kind" => "end_chunk",   "number" => 12, "line" => " }}} TOP CHUNK",
      "indentation" => " ",    "payload" => "TOP CHUNK"           },
    { "kind" => "code",        "number" => 13, "line" => "after top",
      "indentation" => "",     "payload" => "after top"           }
  ]

  VALID_CHUNKS = [
    { "name" => "path",
      "locations" => [ { "file" => "path", "line" => 1 } ],
      "containers" => [],
      "contained" => [ "top chunk" ],
      "lines" => [
        VALID_LINES[0].merge("indentation" => ""),
        { "kind" => "nested_chunk", "number" => 2, "line" => " {{{ top chunk",
          "indentation" => " ",     "payload" => "top chunk" },
        VALID_LINES[12].merge("indentation" => ""),
      ] },
    { "name" => "top chunk",
      "locations" => [ { "file" => "path", "line" => 2 } ],
      "containers" => [ "path" ],
      "contained" => [ "intermediate chunk" ],
      "lines" => [
        VALID_LINES[1].merge("indentation" => ""),
        VALID_LINES[2].merge("indentation" => ""),
        { "kind" => "nested_chunk", "number" => 4, "line" => "  {{{ intermediate chunk",
          "indentation" => " ",     "payload" => "intermediate chunk" },
        VALID_LINES[10].merge("indentation" => ""),
        VALID_LINES[11].merge("indentation" => ""),
      ] },
    { "name" => "intermediate chunk",
      "locations" => [ { "file" => "path", "line" => 4 } ],
      "containers" => [ "top chunk" ],
      "contained" => [ "inner chunk" ],
      "lines" => [
        VALID_LINES[3].merge("indentation" => ""),
        VALID_LINES[4].merge("indentation" => ""),
        { "kind" => "nested_chunk", "number" => 6, "line" => "   {{{ inner chunk",
          "indentation" => " ",     "payload" => "inner chunk" },
        VALID_LINES[8].merge("indentation" => ""),
        VALID_LINES[9].merge("indentation" => ""),
      ] },
    { "name" => "inner chunk",
      "locations" => [ { "file" => "path", "line" => 6 } ],
      "containers" => [ "intermediate chunk" ],
      "contained" => [],
      "lines" => [
        VALID_LINES[5].merge("indentation" => ""),
        VALID_LINES[6].merge("indentation" => ""),
        VALID_LINES[7].merge("indentation" => "")
      ] }
  ]

  def test_mismatching_end_chunk_line
    lines = [
      { "kind" => "begin_chunk", "number" => 1, "line" => "{{{ top chunk",
        "indentation" => "",     "payload" => "top chunk"     },
      { "kind" => "end_chunk",   "number" => 2, "line" => "}}} not top chunk",
        "indentation" => "",     "payload" => "not top chunk" }
    ]
    Codnar::Merger.chunks(@errors, "path", lines)
    @errors.should == [
      "#{$0}: End line for chunk: not top chunk mismatches begin line for chunk: top chunk in file: path at line: 2"
    ]
  end

  def test_missing_begin_chunk_name
    lines = [
      { "kind" => "begin_chunk", "number" => 1, "line" => "{{{", "indentation" => "", "payload" => "" },
      { "kind" => "end_chunk",   "number" => 2, "line" => "}}}", "indentation" => "", "payload" => "" }
    ]
    Codnar::Merger.chunks(@errors, "path", lines)
    @errors.should == [ "#{$0}: Begin line for chunk with no name in file: path at line: 1" ]
  end

  def test_missing_end_chunk_line
    lines = [ { "kind" => "begin_chunk", "number" => 1, "line" => "{{{ top chunk",
                "indentation" => "",     "payload" => "top chunk" } ]
    Codnar::Merger.chunks(@errors, "path", lines)
    @errors.should == [ "#{$0}: Missing end line for chunk: top chunk in file: path at line: 1" ]
  end

end

And here is the implementation:

lib/codnar/merger.rb

module Codnar

Merge classified lines into chunks.

  class Merger

Convert classified lines from a disk file into chunks.

    def self.chunks(errors, path, lines)
      return Merger.new(errors, path, lines).chunks
    end

Return merged chunks containing the classified lines. Each chunk lines are only indented relative to the chunk. This allows nested chunks to be presented unindented in the final weaved HTML.

    def chunks
      @chunks = [ file_chunk ]
      @stack = @chunks.dup
      @errors.in_path(@path) { merge_lines }
      @chunks.each { |chunk| Merger.unindent_lines(chunk.lines) }
      return @chunks
    end

  protected

Convert classified lines from a disk file into chunks.

    def initialize(errors, path, lines)
      @errors = errors
      @path = path
      @lines = lines
    end

The top-level all-the-disk-file chunk (without any classified lines)

    def file_chunk
      return {
        "name" => @path,
        "locations" => [ { "file" => @path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "lines" => []
      }
    end

    Merging nested chunk lines

    Unindenting chunk lines


  end

end

Merging nested chunk lines

To merge the nested chunk lines, we maintain a stack of the current chunks. Each begin_chunk line pushes another chunk on the stack, and each end_chunk line pops it. If any chunks are not properly terminated, they will remain in the stack when all the lines are processed.

Merging nested chunk lines

Merge all the classified lines into chunks

def merge_lines
  @lines.each do |line|
    @errors.at_line(line.number)
    merge_line(line)
  end
  end_unterminated_chunks
end

End all chunks missing a terminating end chunk classified line.

def end_unterminated_chunks
  @stack.shift
  @stack.each do |chunk|
    @errors << "Missing end line for chunk: #{chunk.name}"
  end
end

Merge the next classified line.

def merge_line(line)
  case line.kind
  when "begin_chunk"
    begin_chunk_line(line)
  when "end_chunk"
    end_chunk_line(line)
  else
    @stack.last.lines << line
  end
end

Merge a classified line that starts a new chunk.

def begin_chunk_line(line)
  chunk = contained_chunk(container = @stack.last, line)
  container.contained << chunk.name
  container.lines << line.merge("kind" => "nested_chunk")
  @chunks << chunk
  @stack << chunk
end

A chunk contained in another chunk.

def contained_chunk(container, line)
  return {
    "name" => new_chunk_name(line.payload),
    "locations" => [ { "file" => @path, "line" => line.number } ],
    "containers" => [ container.name ],
    "contained" => [],
    "lines" => [ line ]
  }
end

Return the name of a new chunk.

def new_chunk_name(name)
  return name unless name.nil? || name == ""
  @errors << "Begin line for chunk with no name"
  return "#{@path}/#{@chunks.size}"
end

Merge a classified line that ends an existing chunk.

def end_chunk_line(line)
  return missing_begin_chunk_line(line) if @stack.size == 1
  chunk = @stack.last
  @errors << "End line for chunk: #{line.payload} mismatches begin line for chunk: #{chunk.name}" \
    unless Merger.matching_end_chunk_line?(chunk, line)
  chunk.lines << line
  @stack.pop
end

Check whether an end chunk classified line matches the begin chunk classified line.

def self.matching_end_chunk_line?(chunk, line)
  line_name = line.payload
  return line_name.to_s == "" || line_name.to_id == chunk.name.to_id
end

lib/codnar/merger.rb

Unindenting merged chunk lines

Nested chunks are typically indented relative to their container chunks. However, in the generated documentation, these chunks are displayed on their own, and preserving this relative indentation would reduce their readability. We therefore unindent all chunks as much as possible as the final step.

Unindenting chunk lines

Remove the common indentation from a sequence of classified lines.

def self.unindent_lines(lines)
  indentation = Merger.minimal_indentation(lines)
  lines.each do |line|
    line.indentation = line.indentation.andand.unindent(indentation)
  end
end

Find out the minimal indentation of all the classified lines.

def self.minimal_indentation(lines)
  return lines.map { |line| line.indentation }.compact.min
end

lib/codnar/merger.rb

Generating chunk HTML

Now that we have each chunk's lines, we need to convert them to HTML.

Grouping lines of the same kind

Instead of formatting each line on its own, we batch the operations to work on all lines of the same kind at once. Here is a simple test that demonstrates using the grouper:

test/group_lines.rb

require "codnar"
require "test/spec"

Test grouping classified lines by their kind.

class TestGroupLines < Test::Unit::TestCase

  def test_group_empty_lines
    Codnar::Grouper.lines_to_groups([]).should == []
  end

  def test_group_one_line
    Codnar::Grouper.lines_to_groups([ { "kind" => "code" } ]).should == [ [ { "kind" => "code" } ] ]
  end

  def test_group_lines
    Codnar::Grouper.lines_to_groups([
      { "kind" => "code", "line" => "0" },
      { "kind" => "code", "line" => "1" },
      { "kind" => "comment", "line" => "2" },
      { "kind" => "code", "line" => "3" },
    ]).should == [ [
      { "kind" => "code", "line" => "0" },
      { "kind" => "code", "line" => "1" },
    ], [
      { "kind" => "comment", "line" => "2" },
    ], [
      { "kind" => "code", "line" => "3" },
    ] ]
  end

end

And here is the implementation:

lib/codnar/grouper.rb

module Codnar

Group classified lines according to kind.

  module Grouper

Convert array of classified lines to array of classified line groups with the same line kind.

    def self.lines_to_groups(lines)
      groups = lines.reduce([], &method(:group_next_line))
      return groups
    end

  protected

Add the next classified line to the classified line groups.

    def self.group_next_line(groups, next_line)
      last_group = groups.last
      if last_group.andand.last.andand.kind == next_line.kind
        last_group.push(next_line)
      else
        groups.push([ next_line ])
      end
      return groups
    end

  end

end

Formatting lines as HTML

Formatting is based on a configuration that specifies, for (a group of) lines of each kind, how to convert it to HTML. Here is a simple test that demonstrates using the formatter:

test/format_lines.rb

require "codnar"
require "olag/test"
require "test/spec"

Test converting classified lines to HTML.

class TestFormatLines < Test::Unit::TestCase

  include Test::WithErrors

  alias_method :original_setup, :setup

  def setup
    original_setup
    Codnar::Formatter.send(:public, *Codnar::Formatter.protected_instance_methods)
    @formatter = Codnar::Formatter.new(@errors,
                               "code" => "Formatter.lines_to_pre_html(lines)",
                               "fail" => "TestFormatLines.fail")
  end

  def test_process_html_lines
    lines_group = @formatter.process_lines_group([
      { "kind" => "html", "number" => 1, "payload" => "foo", },
      { "kind" => "html", "number" => 2, "payload" => "bar", },
      { "kind" => "html", "number" => 3, "payload" => "baz", },
    ])
    @errors.should == []
    lines_group.should == [ { "kind" => "html", "number" => 1, "payload" => "foo\nbar\nbaz" } ]
  end

  def test_process_unknown_lines
    lines_group = @formatter.process_lines_group([
      { "kind" => "unknown-kind", "number" => 1, "payload" => "<foo>", },
    ])
    @errors.should == [ "#{$0}: No formatter specified for lines of kind: unknown-kind" ]
    lines_group.should == [ { "kind" => "html", "number" => 1,
                              "payload" => "<pre class='missing formatter error'>\n&lt;foo&gt;\n</pre>" } ]
  end

  def test_process_code_lines
    lines_group = @formatter.process_lines_group([
      { "kind" => "code", "number" => 1, "payload" => "<foo>", },
      { "kind" => "code", "number" => 2, "payload" => "bar", },
    ])
    @errors.should == []
    lines_group.should == [ { "kind" => "html", "number" => 1,
                              "payload" => "<pre>\n&lt;foo&gt;\nbar\n</pre>" } ]
  end

  def test_failed_formatter
    lines_group = @formatter.process_lines_group([ { "kind" => "fail", "number" => 1, "payload" => "foo", } ])
    @errors.size.should == 1
    @errors.last.should =~ \
      /#{$0}: Formatter: TestFormatLines.fail for lines of kind: fail failed with exception:.*in `fail': Reason/
    lines_group.should == [ { "kind" => "html", "number" => 1,
                              "payload" => "<pre class='failed formatter error'>\nfoo\n</pre>" } ]
  end

  def test_lines_to_html
    lines_group = @formatter.lines_to_html([
      { "kind" => "html", "number" => 1, "payload" => "foo" },
      { "kind" => "code", "number" => 2, "payload" => "<bar>" },
      { "kind" => "html", "number" => 3, "payload" => "baz" },
    ])
    @errors.should == []
    lines_group.should == "foo\n<pre>\n&lt;bar&gt;\n</pre>\nbaz"
  end

  def self.fail
    raise "Reason"
  end

end

And here is the implementation:

lib/codnar/formatter.rb

module Codnar

Format chunks into HTML.

  class Formatter

Construct a Formatter based on a mapping from a classified line kind, to a Ruby expression, that converts an array of classified lines of that kind, into an array of lines of another kind. This expression is simply eval-ed, and is expected to make use of a variable called lines that contains an array of classified lines, as produced by a Scanner. The result of evaluating the expressions is expected to be an array of any number of classified lines of any kind.

Formatting repeatedly applies these formatting expressions, until the result is an array containing a single classified line, which has the kind html and whose payload field contains the unified final HTML presentation of the original classified lines. In each processing round, all consecutive lines of the same kind are formated together. This allows for properly formating line kinds that use a multi-line notation such as Markdown.

The default formatting expression for the kind html simply joins all the payloads of all the classified lines into a single html, and returns a single “line” containing this joined HTML. All other line kinds need to have a formatting expression explicitly specified in the formatters mapping.

If no formatting expression is specified for some classified line kind, an error is reported and the classified lines are wrapped in a pre HTML element with a missing_formatter CSS class. Similarly, if a formatting expression fails (raises an exception), an error is reported and the lines are wrapped in a pre HTML element with a failed_formatter CSS class.

    def initialize(errors, formatters)
      @errors = errors
      @formatters = { "html" => "Formatter.merge_html_lines(lines)" }.merge(formatters)
    end

Repeatedly process an array of classified lines of arbitrary kinds until we obtain a single classified “line” containing a unified final HTML presentation of the original classified lines.

    def lines_to_html(lines)
      until Formatter.single_html_line?(lines)
        lines = Grouper.lines_to_groups(lines).map { |group| process_lines_group(group) }.flatten
      end
      return lines.last.andand.payload.to_s
    end

  protected

Check whether we have finally got a single HTML classified “line” for the whole classified lines sequence.

    def self.single_html_line?(lines)
      return lines.size <= 1 && lines[0].andand.kind == "html"
    end

Perform one pass of processing toward HTML on a group of consecutive classified lines with the same kind.

    def process_lines_group(lines)
      kind = lines.last.kind
      formatter = @formatters[kind] ||= missing_formatter(kind)
      begin
        return eval formatter
      rescue
        return failed_formatter(lines, formatter, $!)
      end
    end

Return an expression for formatting classified lines of some kind that doesn’t have such a formatting expression already specified.

    def missing_formatter(kind)
      @errors << "No formatter specified for lines of kind: #{kind}"
      return "Formatter.lines_to_pre_html(lines, :class => 'missing formatter error')"
    end

Format classified lines as HTML if the original specified formatting expression failed.

    def failed_formatter(lines, formatter, exception)
      @errors << "Formatter: #{formatter} for lines of kind: #{lines.last.kind} failed with exception: #{exception}"
      return Formatter.lines_to_pre_html(lines, :class => "failed formatter error")
    end

    Basic formatters


  end

end

Basic formatters

The implementation contains some basic formatting functions. These are sufficient for generic source code processing.

Basic formatters

Merge a group of consecutive indented lines into a group with a single classified “line”. The given block is passed the joined content of all the lines, and may process it to yield the merged “line” content. If an explicit indentation is given, it overrides each line’s indentation. This is useful for avoiding the inclusion of the indentation in the payload.

def self.merge_lines(lines, kind, indentation = nil)
  payload = yield lines.map { |line| (indentation || line.indentation || "") + (line.payload || "") }.join("\n")
  merged_line = lines[0]
  merged_line.merge!("kind" => kind, "payload" => payload)
  merged_line.delete("indentation") if indentation.nil?
  return [ merged_line ]
end

Merge a group of consecutive HTML classified lines into a group with a single HTML classified “line”. This is the default formatting expression for HTML lines.

def self.merge_html_lines(lines)
  return Formatter.merge_lines(lines, "html") { |payload| payload }
end

Format classified lines into HTML using a pre element with optional attributes. This is the default formatting expression for classified lines of unknown kinds.

def self.lines_to_pre_html(lines, attributes = {})
  return Formatter.merge_lines(lines, "html") do |payload|
    ( "<pre" + Formatter.html_attributes(attributes) + ">\n" \
    + CGI.escapeHTML(payload) + "\n" \
    + "</pre>" )
  end
end

Convert an attribute mapping to HTML.

def self.html_attributes(attributes)
  return "" if attributes == {}
  return " " + attributes.map { |name, value| "#{name}='#{CGI.escapeHTML(value.to_s)}'" }.join(" ")
end

Format classified lines that indicate a nested chunk to HTML.

def self.nested_chunk_lines_to_html(lines)
  return lines.each do |line|
    line.kind = "html"
    chunk_name = line.payload
    line.payload = "<pre class='nested chunk'>\n" \
                 + (line.indentation || "") \
                 + "<a class='nested chunk' href='##{chunk_name.to_id}'>#{CGI.escapeHTML(chunk_name)}</a>\n" \
                 + "</pre>"
    line.delete("indentation")
  end
end

Indent arbitrary HTML lines to line up with the rest of the lines.

def self.unindented_lines_to_html(lines)
  merged_line = lines[0]
  html = lines.map { |line| line.payload + "\n" }.join
  merged_line.payload = self.indent_html(merged_line.indentation, html)
  merged_line.kind = "html"
  return [ merged_line ]
end

Indent a chunk of HTML by some spaces. This uses a table, which is arguably the wrong way to do it.

def self.indent_html(indentation, html)
  return html.chomp if indentation.nil?
  return "<table class='layout'>\n<tr>\n" \
       + "<td class='indentation'>\n" \
       + "<pre>#{indentation}</pre>\n" \
       + "</td>\n" \
       + "<td class='html'>\n" \
       + html \
       + "</td>\n" \
       + "</tr>\n</table>"
end

Cast a sequence of classified lines into a different kind without any processing.

def self.cast_lines(lines, kind)
  lines = lines.dup
  lines.each { |line| line.kind = kind }
  return lines
end

Convert a sequence of marked-up classified lines to (unindented) HTML

def self.markup_lines_to_html(lines, klass, css_class = nil)
  implementation = String === klass ? Kernel.const_get(klass) : klass
  css_class ||= implementation.to_s.downcase.gsub("::", "-")
  return Formatter.merge_lines(lines, "unindented_html", "") do |payload|
    ( "<div class='#{css_class} #{lines[0].kind} markup'>\n" \
    + implementation.to_html(payload) \
    + "</div>" )
  end
end

lib/codnar/formatter.rb

Markup formats

The markup_lines_to_html formatter above relies on the existence of a class for converting comments from the specific markup format to HTML. Currently, two such formats are supported:

RDoc, the default markup format used in Ruby comments. Here is a simple test that demonstrates using RDoc:

test/expand_rdoc.rb

require "codnar"
require "test/spec"

Test expanding RDoc text.

class TestExpandRDoc < Test::Unit::TestCase

  def test_emphasis_text
    Codnar::RDoc.to_html("_text_").should == "<p>\n<em>text</em>\n</p>\n"
  end

  def test_strong_text
    Codnar::RDoc.to_html("*text*").should == "<p>\n<strong>text</strong>\n</p>\n"
  end

  def test_indented_pre
    Codnar::RDoc.to_html("base\n  indented\n    more\nback\n").should \
                      == "<p>\nbase\n</p>\n<pre>indented\n  more</pre>\n<p>\nback\n</p>\n"
  end

end

And here is the implementation:

lib/codnar/rdoc.rb

module Codnar

Convert RDoc to HTML.

  module RDoc

Process a RDoc String and return the resulting HTML.

    def self.to_html(rdoc)
      return ::RDoc::Markup::ToHtml.new.convert(rdoc).clean_markup_html
    end

  end

end

Markdown, a generic markup syntax used across many systems and languages. Here is a simple test that demonstrates using Markdown:

test/expand_markdown.rb

require "codnar"
require "test/spec"

Test expanding Markdown text.

class TestExpandMarkdown < Test::Unit::TestCase

  def test_emphasis_text
    Codnar::Markdown.to_html("*text*").should == "<p>\n<em>text</em>\n</p>\n"
  end

  def test_strong_text
    Codnar::Markdown.to_html("**text**").should == "<p>\n<strong>text</strong>\n</p>\n"
  end

  def test_embed_chunk
    Codnar::Markdown.to_html("[[Chunk|template]]").should == "<p>\n<embed src='chunk' type='x-codnar/template'/>\n</p>\n"
  end

  def test_embed_anchor
    Codnar::Markdown.to_html("[[#Name]]").should == "<p>\n<a id='name'/>\n</p>\n"
  end

  def test_embed_link
    Codnar::Markdown.to_html("[Label](#Name)").should == "<p>\n<a href=\"#name\">Label</a>\n</p>\n"
  end

end

And here is the implementation:

lib/codnar/markdown.rb

module Codnar

Convert Markdown to HTML.

  module Markdown

Process a Markdown String and return the resulting HTML. In addition to the normal Markdown syntax, processing supports the following Codnar-specific extensions:

The notation “[[chunk|template]]” is expanded to embedding the specified chunk (name) using the specified template at Weave time.
The notation “[[#name]]” defines an empty anchor. The HTML anchor id is not the specified name, but rather the identifier generated from it (in the same way that chunk names are converted to identifiers).
The notation “[…](#name)” defines a link to an anchor, which is either the chunk with the specified name, or an empty anchor defined as above.

    def self.to_html(markdown)
      markdown = embed_chunks(markdown)
      markdown = id_anchors(markdown)
      html = RDiscount.new(markdown).to_html
      html = id_links(html)
      return html.clean_markup_html
    end

  protected

Expand “[[chunk|template]]” to HTML embed tags. Use identifiers instead of names in the src field for safety, unless the template is a magical file template, in which case we must preserve the file path.

    def self.embed_chunks(markdown)
      return markdown.gsub(/\[\[(.*?)\|(.*?)\]\]/) do
        src = $1
        template = $2
        src = src.to_id unless Codnar::Weaver::FILE_TEMPLATE_PROCESSORS.include?(template)
        "<embed src='#{src}' type='x-codnar/#{template}'/>"
      end
    end

Expand “[[#name]]” anchors to HTML anchor tags with the matching identifier.

    def self.id_anchors(markdown)
      return markdown.gsub(/\[\[#(.*?)\]\]/) { "<a id='#{$1.to_id}'/>" }
    end

Expand “href=‘#name’” links to the matching “href=‘#id’” links.

    def self.id_links(html)
      return html.gsub(/href=(["'])#(.*?)(["'])/) { "href=#{$1}##{$2.to_id}#{$3}" }
    end

  end

end

Haddock, a specific markup syntax used in comments to document Haskell code. Here is a simple test that demonstrates using Haddock:

test/expand_haddock.rb

require "codnar"
require "test/spec"

Test expanding Haddock text.

class TestExpandHaddock < Test::Unit::TestCase

  def test_normal_text
    Codnar::Haddock.to_html("normal").should == "<p>normal\n</p>\n"
  end

  def test_identifier_text
    Codnar::Haddock.to_html("'Int'").should == "<p><code>Int</code>\n</p>\n"
  end

  def test_emphasis_text
    Codnar::Haddock.to_html("/emphasis/").should == "<p><em>emphasis</em>\n</p>\n"
  end

  def test_code_text
    Codnar::Haddock.to_html("@code@").should == "<pre>code</pre>\n"
  end

end

And here is the implementation:

lib/codnar/haddock.rb

module Codnar

Convert Haddoc to HTML.

  class Haddock

Process a Haddock String and return the resulting HTML.

    def self.to_html(haddock)
      with_temporary_directory do |path|
        write_temporary_file(path, haddock)
        run_haddock(path)
        html = read_html_file(path)
        clean_html(html)
      end
    end

  protected

Run a block using a temporary directory, that is then removed. TODO: This should be in some more generic place.

    def self.with_temporary_directory
      path = create_temporary_directory
      result = yield path
      FileUtils.rm_rf(path)
      return result
    end

Create a temporary directory to run Haddock in.

    def self.create_temporary_directory
      file = Tempfile.open("dir", ".")
      path = file.path
      File.delete(path)
      Dir.mkdir(path)
      return path
    end

Minimal header to insert before the Haddock String to trick Haddock into generating HTML from it.

    HADDOCK_HEADER = <<-EOF.unindent
      module Wrapper where
      -- $doc
    EOF

Write the Haddock String into a wrapper Haskell file so we’ll be able to run Haddock to generate HTML from it.

    def self.write_temporary_file(path, haddock)
      File.open(path + "/wrapper.hs", "w") do |file|
        file.write(HADDOCK_HEADER)
        haddock = self.patch_module_comments(haddock)
        file.write("-- " + haddock.gsub("\n", "\n-- "))
      end
    end

Convert structured module comments to a definition list. TODO: This is rather flaky.

    def self.patch_module_comments(haddock)
      haddock.sub!(/^\s*Module\s*:\s*\$\s*header\s*\$\s*$/i, "Module")
      return haddock.gsub(/^(\s*)(Module|Description|Copyright|License|Maintainer|Stability|Portability)(\s*):/, "\n\\1[@\\2@]\\3")
    end

Run Haddock to convert the wrapper Haskell file into HTML documentation.

    def self.run_haddock(path)
      system("cd #{path} && haddock --html wrapper.hs > haddock.out 2>&1")
    end

Read the HTML generated by Haddock.

    def self.read_html_file(path)
      return File.read(path + "/Wrapper.html")
    end

Extract the clean generated HTML from Haddock’s output.

    def self.clean_html(html)
      html.gsub!("\r\n", "\n")
      html.sub!(/.*<div class="doc">/m, '')
      html.sub!(/<\/div><\/div><\/div><div id="footer">.*/m, "\n")
      return html
    end

  end

end

In all cases, the HTML generated by the markup format conversion is a bit messy. We therefore clean it up:

Clean HTML

Clean HTML generated by markup formatters. Such HTML tends to have extra empty lines for no apparent reason. Cleaning it up seems to be safe enough, and eliminates the ugly additional vertical space in the final HTML.

def clean_markup_html
  return gsub("\r\n", "\n") \
        .gsub(/\n*<p>\n*/, "\n<p>\n") \
        .gsub(/\n*<\/p>\n*/, "\n</p>\n") \
        .gsub(/\n*<pre>\n+/, "\n<pre>\n") \
        .gsub(/\n+<\/pre>\n*/, "\n</pre>\n") \
        .sub(/^\n*/, "")
end

lib/codnar/string_extensions.rb

Generating diagrams using GraphViz

If you have graphviz installed, it is possible to use it to generate SVG diagrams that can be embedded directly into the HTML. This is implemented as an additional formatter; in principle, you this allows embeding the GraphViz directives directly in the code, but in practice people prefer keeping the diagrams as separate files.

We pre-process the GraphViz directives using the m4 macro processor. This allows dramatically reducing the amount of repeated boilerplate in the diagram definitions, by defining macros for node and edge styles and, if desired, more advanced techniques.

Here is a simple test that demonstrates generating SVG from a GraphViz diagram:

test/graphviz_diagrams.rb

require "codnar"
require "test/spec"

Test highlighting syntax using GVim.

class TestGraphVizDiagrams < Test::Unit::TestCase

  MINIMAL_DIAGRAM_SVG = <<-EOF.unindent #! ((( svg

    <svg width="62pt" height="116pt"
     viewBox="0.00 0.00 62.00 116.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <g id="graph1" class="graph" transform="scale(1 1) rotate(0) translate(4 112)">
    <title>_anonymous_0</title>
    <polygon fill="white" stroke="white" points="-4,5 -4,-112 59,-112 59,5 -4,5"/>
    <!-- A -->
    <g id="node1" class="node"><title>A</title>
    <ellipse fill="none" stroke="black" cx="27" cy="-90" rx="27" ry="18"/>
    <text text-anchor="middle" x="27" y="-85.4" font-family="Times New Roman,serif" font-size="14.00">A</text>
    </g>
    <!-- B -->
    <g id="node3" class="node"><title>B</title>
    <ellipse fill="none" stroke="black" cx="27" cy="-18" rx="27" ry="18"/>
    <text text-anchor="middle" x="27" y="-13.4" font-family="Times New Roman,serif" font-size="14.00">B</text>
    </g>
    <!-- A&#45;&gt;B -->
    <g id="edge2" class="edge"><title>A&#45;&gt;B</title>
    <path fill="none" stroke="black" d="M27,-71.8314C27,-64.131 27,-54.9743 27,-46.4166"/>
    <polygon fill="black" stroke="black" points="30.5001,-46.4132 27,-36.4133 23.5001,-46.4133 30.5001,-46.4132"/>
    </g>
    </g>
    </svg>
  EOF

  #! ))) svg

  def test_valid_diagram
    diagram = <<-EOF.unindent #! ((( dot

      define(`X', `A')
      digraph {
        X -> B;
      }
    EOF

    #! ))) dot
    Codnar::GraphViz.to_html(diagram).should == MINIMAL_DIAGRAM_SVG
  end

  def test_invalid_diagram
    diagram = <<-EOF.unindent #! ((( dot

      digraph {
        A ->
    EOF

    #! ))) dot
    lambda { Codnar::GraphViz.to_html(diagram) }.should.raise
  end

end

And here is the implementation:

lib/codnar/graphviz.rb

module Codnar

Generate diagrams using GraphViz.

  class GraphViz

Convert a string containing a GraphViz diagram into SVG suitable for embedding into the HTML documentation. We pre-process the diagram using M4 to allow cutting down on the boilerplate (repeating the same styles in many nodes etc.). This should not be harmful for diagrams that do not use M4 commands.

    def self.to_html(diagram)
      stdin, stdout, stderr = Open3.popen3("m4 | dot -Tsvg")
      write_diagram(stdin, diagram)
      check_for_errors(stderr)
      return clean_output(stdout)
    end

  protected

Send the diagram to the commands pipe.

    def self.write_diagram(stdin, diagram)
      stdin.write(diagram)
      stdin.close
    end

Ensure we got no processing errors from either m4 or dot. If we did, raise them, and they will be handled by the formatter wrapping code.

    def self.check_for_errors(stderr)
      errors = stderr.read
      raise errors.sub(/Error: <stdin>:\d+: /, "") if errors != ""
    end

Clean the SVG we got to make it suitable for embedding in HTML.

    def self.clean_output(stdout)
      return stdout.read.sub(/.*<svg/m, "<svg").gsub(/\r/, "")
    end

  end

end

Syntax highlighting using GVIM

If you have GVim istalled, it is possible to use it to generate syntax highlighting. This is a slow operation, as GVim was never meant to be used as a command-line tool. However, what it lacks in speed it compensates for in scope; almost any language you can think of has a GVim syntax highlighting definition. Here is a simple test that demonstrates using GVim for syntax highlighting:

test/gvim_highlight_syntax.rb

require "codnar"
require "test/spec"

Test highlighting syntax using GVim.

class TestGVimHighlightSyntax < Test::Unit::TestCase

  def setup
    Codnar::GVim.force_recompute = true
  end

  def teardown
    Codnar::GVim.force_recompute = false
  end

  def test_ruby_no_css
    ruby = <<-EOF.unindent
      def foo
        return bar = baz
      end
    EOF
    Codnar::GVim.cached_syntax_to_html(ruby, "ruby").should == <<-EOF.unindent #! ((( html

      <div class='ruby code syntax' bgcolor="#ffffff" text="#000000">
      <font face="monospace">
      <font color="#ff40ff">def</font>&nbsp;<font color="#00ffff">foo</font><br />
      &nbsp;&nbsp;<font color="#ffff00">return</font>&nbsp;bar = baz<br />
      <font color="#ff40ff">end</font><br />
      </font>
      </div>
    EOF

    #! ))) html
  end

  def test_ruby_css
    ruby = <<-EOF.unindent
      def foo
        return bar = baz
      end
    EOF
    Codnar::GVim.cached_syntax_to_html(ruby, "ruby", [ "+:let html_use_css=1" ]).should == <<-EOF.unindent #! ((( html

      <pre class='ruby code syntax'>
      <span class="PreProc">def</span> <span class="Identifier">foo</span>
        <span class="Statement">return</span> bar = baz
      <span class="PreProc">end</span>
      </pre>
    EOF

    #! ))) html
  end

end

And here is the implementation:

lib/codnar/gvim.rb

module Codnar

Syntax highlight using GVim.

  class GVim

Convert a sequence of classified code lines to HTML using GVim syntax highlighting. The commands array allows configuring the way that GVim will format the output (see the cached_syntax_to_html method for details).

    def self.lines_to_html(lines, syntax, commands = [])
      return Formatter.merge_lines(lines, "html") do |payload|
        GVim.cached_syntax_to_html(payload + "\n", syntax, commands).chomp
      end
    end

The cache used for speeding up recomputing the same syntax highlighting HTML.

    @cache = Cache.new(".gvim-cache") do |data|
      GVim.uncached_syntax_to_html(data.text, data.syntax, data.commands)
    end

Force recomputation of the syntax highlighting HTML, even if a cached version exists.

    def self.force_recompute=(force_recompute)
      @cache.force_recompute = force_recompute
    end

Highlight syntax of text using GVim. This uses the GVim standard CSS classes to mark keywords, identifiers, and so on. See the GVim documentation for details. The commands array allows configuring the way that GVim will format the output. For example:

The command "+:colorscheme <name>" will override the default color scheme used.
The command "+:let html_use_css=1" will just annotate each HTML tag with a CSS class, instead of embedding some specific style directly into the tag. In this case the colorscheme and background are ignored; you will need to provide your own CSS stylesheet as part of the final woven document to style the marked-up words.

Additional commands may be useful; GVim provides a full scripting environment so there is no theoretical limit to what can be done here.

Since GVim is as slow as molasses to start up, we cache the results of highlighting the syntax of each code fragment in a directory called .gvim-cache, which can appear at the current working directory or in any of its parents.

    def self.cached_syntax_to_html(text, syntax, commands = [])
      data = { "text" => text, "syntax" => syntax, "commands" => commands }
      return @cache[data]
    end

Highlight syntax of text using GVim, without caching. This is slow (measured in seconds), due to GVim’s start-up tim. See the cached_syntax_to_html method for a faster variant and functionality details.

    def self.uncached_syntax_to_html(text, syntax, commands = [])
      file = write_temporary_file(text)
      run_gvim(file, syntax, commands)
      html = read_html_file(file)
      delete_temporary_files(file)
      return clean_html(html, syntax)
    end

  protected

Write the text to highlight the syntax of into a temporary file.

    def self.write_temporary_file(text)
      file = Tempfile.open("codnar-")
      file.write(text)
      file.close(false)
      return file
    end

Run GVim to highlight the syntax of a temporary file. This uses the little-known ability of GVim to emit the syntax highlighting as HTML using only command-line arguments.

    def self.run_gvim(file, syntax, commands)
      path = file.path
      ENV["DISPLAY"] = "none" # Otherwise the X11 server *does* affect the result.
      command = [
        "gvim",
        "-f", "-X",
        "-u", "none",
        "-U", "none",
        "+:let html_ignore_folding=1",
        "+:let use_xhtml=1",
        "+:let html_use_css=0",
        "+:syn on",
        "+:set syntax=#{syntax}",
        commands,
        "+run! syntax/2html.vim",
        "+:f #{path}",
        "+:wq", "+:q",
        path
      ]
      system("echo '\n' | '#{command.flatten.join("' '")}' > /dev/null 2>&1")
    end

Read the HTML with the syntax highlighting written out by GVim.

    def self.read_html_file(file)
      return File.read(html_file_path(file))
    end

Delete both the text and HTML temporary files.

    def self.delete_temporary_files(file)
      File.delete(html_file_path(file))
      file.delete
    end

Find the path of the generate HTML file. You’d think it would be predictable, but it ends up either “.html” or “.xhtml” depending on the system.

    def self.html_file_path(file)
      return Dir.glob(file.path + ".*html")[0]
    end

Extract the clean highlighted syntax HTML from GVim’s HTML output.

    def self.clean_html(html, syntax)
      if html =~ /<pre>/
        html.sub!(/.*?<pre>/m, "<pre class='#{syntax} code syntax'>")
        html.sub!("</body>\n</html>\n", "")
      else
        html.sub!(/.*?<body/m, "<div class='#{syntax} code syntax'")
        html.sub!("</body>\n</html>\n", "</div>\n")
      end
      return html
    end

  end

end

Since GVim is so slow, we are using caching to minimize the time it takes to recompute the same code's highlighted HTML. This is pretty useful in practice - making changes in one chunk in a file will not require recomputing the highlighting for any of the unchanged chunks in the same file. Here is a simple test of using the caching functionality:

test/cache_computations.rb

require "codnar"
require "olag/test"
require "test/spec"

Test caching long computations.

class TestCacheComputations < Test::Unit::TestCase

  include Test::WithTempfile

  def test_cached_computation
    cache = make_addition_cache(directory = create_tempdir(".."))
    cache[1].should == 2
    File.open(Dir.glob(directory + "/*")[0], "w") { |file| file.puts("3") }
    cache[1].should == 3
    cache.force_recompute = true
    cache[1].should == 2
  end

  def test_uncached_computation
    stderr = capture_stderr { make_addition_cache("no-such-directory")[1].should == 2 }
    stderr.should.include?("no-such-directory")
  end

protected

Run a block and capture its standard error (without using FakeFS).

  def capture_stderr
    stderr_path = write_tempfile("stderr", "")
    Olag::Globals.without_changes do
      $stderr = File.open(stderr_path, "w")
      yield
    end
    return File.read(stderr_path)
  end

Create a cache for the “+ 1” operation.

  def make_addition_cache(directory)
    return Codnar::Cache.new(directory) { |value| value + 1 }
  end

end

And here is the implementation:

lib/codnar/cache.rb

module Codnar

Cache long computations in disk files.

  class Cache

Whether to recompute values even if they are cached.

    attr_accessor :force_recompute

Connect to an existing disk cache. The cache is expected to be stored in a directory of the specified name, which is either in the current working directory or in one of its parent directories.

    def initialize(directory, &block)
      @force_recompute = false
      @computation = block
      @directory = find_directory(Dir.pwd, directory)
      if @directory
        class <<self; alias [] :cached_computation; end
      else
        class <<self; alias [] :uncached_computation; end
        $stderr.puts("#{$0}: Could not find cache directory: #{directory}.")
      end
    end

Access the results of the computation for the specified input. Fetch the result from the cache if it is there, otherwise invoke the computation and store the result in the cache for the next time.

    def cached_computation(input)
      file = cache_file(input)
      return YAML.load_file(file) if File.exists?(file) and not @force_recompute
      result = @computation.call(input)
      File.open(file, "w") { |file| file.write(result.to_yaml) }
      return result
    end

Return the file expected to cache the computed results for a given input,

    def cache_file(input)
      key = Digest.hexencode(Digest::SHA2.digest(input.to_yaml))
      return @directory + "/" + key + ".yaml"
    end

Access the results of a computation for the specified input, in case we do not have a cache directory to look for and store the results in.

    def uncached_computation(input)
      return @computation.call(input)
    end

  protected

Find the path of the cache directory, search from the given working directory upward until finding a match.

    def find_directory(working_directory, cache_directory)
      directory = working_directory + "/" + cache_directory
      return directory if File.exists?(directory)
      parent_directory = File.dirname(working_directory)
      return nil if parent_directory == working_directory
      return find_directory(parent_directory, cache_directory)
    end

  end

end

Syntax highlighting using CodeRay

CodeRay is a Ruby gem that knows how to highlight the syntax of many popular languages. It is much faster than using GVim`but doesn't offer the huge range of programming languages offered by GVim (for example, it does not currently offer shell script highlighting). If your languages are covered by it, it may serve as a convenient replacement to the slow GVim approach.

Here is a simple test that demonstrates using CodeRay for syntax highlighting:

test/coderay_highlight_syntax.rb

require "codnar"
require "test/spec"

Test highlighting syntax using CodeRay.

class TestCodeRayHighlightSyntax < Test::Unit::TestCase

  def test_coderay_lines
    Codnar::CodeRay.lines_to_html([
      { "kind" => "ruby_code", "number" => 1, "indentation" => "",   "payload" => "def foo"  },
      { "kind" => "ruby_code", "number" => 2, "indentation" => "  ", "payload" => "return 1" },
      { "kind" => "ruby_code", "number" => 3, "indentation" => "",   "payload" => "end"      },
    ], "ruby").should == [
      { "kind" => "html", "number" => 1,
        "payload" => <<-EOF.unindent.chomp
          <div class="CodeRay">
            <div class="code"><pre><span style="color:#080;font-weight:bold">def</span> <span style="color:#06B;font-weight:bold">foo</span>
            <span style="color:#080;font-weight:bold">return</span> <span style="color:#00D">1</span>
          <span style="color:#080;font-weight:bold">end</span></pre></div>
          </div>
        EOF
      },
    ]
  end

end

And here is the implementation:

lib/codnar/coderay.rb

module Codnar

Extend the CodeRay module.

  module CodeRay

Convert a sequence of classified code lines to HTML using CodeRay syntax highlighting. The options control the way CodeRay behaves (e.g., <tt>:css

> :class</tt>).

    def self.lines_to_html(lines, syntax, options = {})
      return Formatter.merge_lines(lines, "html") do |payload|
        ::CodeRay.scan(payload, syntax).div(options).chomp
      end
    end

  end

end

Syntax highlighting using Sunlight

Sunlight offers a different approach for syntax highlighting. Instead of pre-processing the code to generate highlighted HTML while splitting, it provides Javascript files that examine the textual code in the DOM and convert it to highlighted HTML in the browser. This takes virtually no time when splitting the code, but requires recomputing highlighting for all the code chunks every time the HTML file is loaded. This can be pretty slow, especially if using a browser with a slow Javascript engine, like IE. However, this may be a reasonable trade-off, at least for small projects. Since Sunlight is a new project, it supports a limited range of programming languages.

Here is a simple test that demonstrates using Sunlight for syntax highlighting:

test/sunlight_highlight_syntax.rb

require "codnar"
require "test/spec"

Test highlighting syntax using Sunlight.

class TestSunlightHighlightSyntax < Test::Unit::TestCase

  def test_sunlight_lines
    Codnar::Sunlight.lines_to_html([
      { "kind" => "ruby_code", "number" => 1, "indentation" => "",   "payload" => "def foo"  },
      { "kind" => "ruby_code", "number" => 2, "indentation" => "  ", "payload" => "return 1" },
      { "kind" => "ruby_code", "number" => 3, "indentation" => "",   "payload" => "end"      },
    ], "ruby").should == [
      { "kind" => "html", "number" => 1,
        "payload" => <<-EOF.unindent.chomp
          <pre class='sunlight-highlight-ruby'>
          def foo
            return 1
          end
          </pre>
        EOF
      },
    ]
  end

end

And here is the implementation:

lib/codnar/sunlight.rb

module Codnar

Syntax highlight using Sunlight.

  class Sunlight

Convert a sequence of classified code lines to HTML using Sunlight syntax highlighting. All we need to do is wrap the lines in an HTML pre element with the correct class (sunlight-highlight-syntax). The actual highlighting is done in the HTML DOM using Javascript. Embedding this Javascript into the final HTML should be done separately.

    def self.lines_to_html(lines, syntax)
      return Formatter.lines_to_pre_html(lines, :class => "sunlight-highlight-#{syntax}")
    end

  end

end

Putting it all together

Now that we have all the separate pieces of functionality for splitting source files into HTML chunks, we need to combine them to a single convenient service.

Splitting code files

Here is a simple test that demonstrates using the splitter for source code files:

test/split_code.rb

require "codnar"
require "olag/test"
require "test/spec"

Test splitting code files.

class TestSplitCode < Test::Unit::TestCase

  include Test::WithErrors
  include Test::WithTempfile

  def test_split_ruby
    splitter = Codnar::Splitter.new(@errors, RUBY_CONFIGURATION)
    path = write_tempfile("ruby.rb", RUBY_FILE)
    chunks = splitter.chunks(path)
    @errors.should == []
    chunks.should == ruby_chunks(path)
  end

protected

  def ruby_chunks(path)
    RUBY_CHUNKS[0].name = path
    RUBY_CHUNKS[1].containers[0] = path
    RUBY_CHUNKS.each { |chunk| chunk.locations[0].file = path }
    return RUBY_CHUNKS
  end

  RUBY_FILE = <<-EOF.unindent.gsub("#!", "#")
    #! This is *rdoc*.
      #! {{{ assignment
      local = $global
        indented
      #! }}}
  EOF

  RUBY_CONFIGURATION = {
    "formatters" => {
      "code" => "Formatter.cast_lines(lines, 'ruby')",
      "comment" => "Formatter.cast_lines(lines, 'rdoc')",
      "ruby" => "GVim.lines_to_html(lines, 'ruby')",
      "rdoc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')",
      "begin_chunk" => "[]",
      "end_chunk" => "[]",
      "nested_chunk" => "Formatter.nested_chunk_lines_to_html(lines)",
      "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
    },
    "syntax" => {
      "start_state" => "ruby",
      "patterns" => {
        "comment" => { "regexp" => "^(\\s*)#\\s*(.*)$" },
        "code" => { "regexp" => "^(\\s*)(.*)$" },
        "begin_chunk" => { "regexp" => "^(\\s*)\\W*\\{\\{\\{\\s*(.*?)\\s*$" },
        "end_chunk" => { "regexp" => "^(\\s*)\\W*\\}\\}\\}\\s*(.*?)\\s*$" },
      },
      "states" => {
        "ruby" => {
          "transitions" => [
            { "pattern" => "begin_chunk" },
            { "pattern" => "end_chunk" },
            { "pattern" => "comment" },
            { "pattern" => "code" },
          ],
        },
      },
    },
  }

  RUBY_CHUNKS = [ {
    "name" => "PATH",
    "locations" => [ "file" => "PATH", "line" => 1 ],
    "containers" => [],
    "contained" => [ "assignment" ],
    "html" => <<-EOF.unindent.chomp, #! ((( html

      <table class='layout'>
      <tr>
      <td class='indentation'>
      <pre></pre>
      </td>
      <td class='html'>
      <div class='rdoc rdoc markup'>
      <p>
      This is <strong>rdoc</strong>.
      </p>
      </div>
      </td>
      </tr>
      </table>
      <pre class='nested chunk'>
        <a class='nested chunk' href='#assignment'>assignment</a>
      </pre>
    EOF

    #! ))) html
  }, {
    "name" => "assignment",
    "containers" => [ "PATH" ],
    "contained" => [],
    "locations" => [ "file" => "PATH", "line" => 2 ],
    "html" => <<-EOF.unindent.chomp, #! ((( html

      <div class='ruby code syntax' bgcolor="#ffffff" text="#000000">
      <font face="monospace">
      local =&nbsp;<font color="#00ffff">$global</font><br />
      &nbsp;&nbsp;indented<br />
      </font>
      </div>
    EOF

    #! ))) html
  } ]

end

And here is the implementation:

lib/codnar/splitter.rb

module Codnar

Split disk files into chunks.

  class Splitter

Construct a splitter based on a configuration in the following structure:

syntax: <syntax>
formatters:
  <kind>: <expression>

Where the syntax is passed as-is to (and expanded in-place by) a Scanner, and the formatters are passed as-is to a Formatter to convert the chunk’s classified lines into HTML.

    def initialize(errors, configuration)
      @errors = errors
      @configuration = configuration
      @scanner = Scanner.new(errors, configuration.syntax)
      @formatter = Formatter.new(errors, configuration.formatters)
    end

Split a disk file into HTML chunks.

    def chunks(path)
      lines = @scanner.lines(path)
      chunks = Merger.chunks(@errors, path, lines)
      chunks.each { |chunk| chunk.html = @formatter.lines_to_html(chunk.delete("lines")) }
      return chunks
    end

  end

end

Splitting documentation files

The narrative documentation is expected to reside in one or more files, which are also "split" to a single chunk each. Having both documentation and code exist as chunks allows for uniform treatment of both when weaving, as well as allowing for pre-processing the documentation files, if necessary. For example, Codnar currently supports for documentation the same two markup formats that are also supported for code comments. Here is a simple test that demonstrates "splitting" documentation (using the same implementation as above):

test/split_documentation.rb

require "codnar"
require "olag/test"
require "test/spec"

Test “splitting” documentation files.

class TestSplitDocumentation < Test::Unit::TestCase

  include Test::WithErrors
  include Test::WithFakeFS

  def test_split_raw
    write_fake_file("raw.html", "<foo>\nbar\n</foo>\n")
    splitter = Codnar::Splitter.new(@errors, configuration("html"))
    chunks = splitter.chunks("raw.html")
    @errors.should == []
    chunks.should == [ {
      "name" => "raw.html",
      "containers" => [],
      "contained" => [],
      "locations" => [ { "file" => "raw.html", "line" => 1 } ],
      "html" => "<foo>\nbar\n</foo>"
    } ]
  end

  def test_split_markdown
    write_fake_file("markdown.md", "*foo*\nbar\n")
    splitter = Codnar::Splitter.new(@errors, configuration("markdown"))
    chunks = splitter.chunks("markdown.md")
    @errors.should == []
    chunks.should == [ {
      "name" => "markdown.md",
      "containers" => [],
      "contained" => [],
      "locations" => [ { "file" => "markdown.md", "line" => 1 } ],
      "html" => "<div class='markdown markdown markup'>\n<p>\n<em>foo</em>\nbar\n</p>\n</div>"
    } ]
  end

  def test_split_rdoc
    write_fake_file("rdoc.rdoc", "*foo*\nbar\n")
    splitter = Codnar::Splitter.new(@errors, configuration("rdoc"))
    chunks = splitter.chunks("rdoc.rdoc")
    @errors.should == []
    chunks.should == [ {
      "name" => "rdoc.rdoc",
      "containers" => [],
      "contained" => [],
      "locations" => [ { "file" => "rdoc.rdoc", "line" => 1 } ],
      "html" => "<div class='rdoc rdoc markup'>\n<p>\n<strong>foo</strong> bar\n</p>\n</div>"
    } ]
  end

  def test_split_unknown_kind
    write_fake_file("unknown.kind", "foo\nbar\n")
    splitter = Codnar::Splitter.new(@errors, configuration("unknown-kind"))
    chunks = splitter.chunks("unknown.kind")
    @errors.should == [ "#{$0}: No formatter specified for lines of kind: unknown-kind" ]
    chunks.should == [ {
      "name" => "unknown.kind",
      "containers" => [],
      "contained" => [],
      "locations" => [ { "file" => "unknown.kind", "line" => 1 } ],
      "html" => "<pre class='missing formatter error'>\nfoo\nbar\n</pre>"
    } ]
  end

protected

  def configuration(kind)
    return {
      "formatters" => {
        "markdown" => "Formatter.markup_lines_to_html(lines, Markdown, 'markdown')",
        "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
        "rdoc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')",
      },
      "syntax" => {
        "start_state" => kind,
        "patterns" => {
          kind => { "regexp" => "^(.*)$", "groups" => [ "payload" ] },
        },
        "states" => {
          kind => {
            "transitions" => [
              { "pattern" => kind }
            ]
          }
        }
      }
    }
  end

end

Built-in configurations

The splitting mechanism defined above is pretty generic. To apply it to a specific system requires providing the appropriate configuration. The system provides a few specific built-in configurations which may be useful "out of the box".

If one is willing to give up altogether on syntax highlighting and comment formatting, the system would be applicable as-is to any programming language. Properly highlighting almost any known programming language syntax would be a simple matter of passing the correct syntax parameter to GVim.

Properly formatting comments in additional mark-up formats would be trickier. First, a proper pattern needs to be established for extracting the comments (/*, //, --, etc.). Them, the results need to be converted to HTML. One way would be to pass them through GVim syntax highlighting with an appropriate format (e.g, syntax=doxygen). Another would be to invoke some Ruby library; finally, one could invoke some external tool to do the job. The latter two options would require providing additional glue Ruby code, similar to the GVim class above.

At any rate, here are the built-in configurations:

lib/codnar/split_configurations.rb

module Codnar

A module for all the “built-in” configurations. The names of these configurations can be passed to the –require option of any Codnar Application.

  module Configuration

    include Documentation
    include Code
    include Comments
    include Highlighting

  end

end

Combining configurations

Different source files require different overall configurations but reuse common building blocks. To support it, we allow comfigurations to be combined using a "deep merge". This allows complex nested structures to be merged. There is even a way for arrays to append elements before/after the array they are merged with. Here is a simple test that demonstrates deep-merging complex structures:

test/deep_merge.rb

require "codnar"
require "test/spec"

Test deep-merging complex structures.

class TestDeepMerge < Test::Unit::TestCase

  def test_deep_merge
    default = {
      "only_default" => "default_value",
      "overriden" => "default_value",
      "overriden_array" => [ "default_value" ],
      "merged_array" => [ "default_value" ],
    }
    override = {
      "only_override" => "overriden_value",
      "overriden" => "overriden_value",
      "overriden_array" => [ "overriden_value" ],
      "merged_array" => [ "overriden_value", [] ],
    }
    default.deep_merge(override).should == {
      "only_default" => "default_value",
      "only_override" => "overriden_value",
      "overriden" => "overriden_value",
      "overriden_array" => [ "overriden_value" ],
      "merged_array" => [ "overriden_value", "default_value" ],
    }
  end

end

Here is the implementation:

Deep merge

Perform a deep merge with another hash.

  def deep_merge(hash)
    return merge(hash, &Hash::method("deep_merger"))
  end

protected

Return a Hash merger that recursively merges nested hashes.

  def self.deep_merger(key, default, override)
    if Hash === default && Hash === override
      default.deep_merge(override)
    elsif Array === default && Array === override
      Hash.deep_merge_arrays(default, override)
    else
      override
    end
  end

If the overriding data array contains an empty array element (“[]”), it is replaced by the default data array being overriden.

  def self.deep_merge_arrays(default, override)
    embed_index = override.find_index([])
    return override unless embed_index
    override = override.dup
    override[embed_index..embed_index] = default
    return override
  end

lib/codnar/hash_extensions.rb

And here is a test module that automates the process of merging configurations and invoking the Splitter:

test/lib/test_with_configurations.rb

Tests with Codnar split configurations.

module Test::WithConfigurations

Test running the Splitter with merged configurations.

  def check_split_file(file_text, *configurations, &block)
    configuration = configurations.inject({}) do |merged_configuration, next_configuration|
      merged_configuration.deep_merge(next_configuration)
    end
    splitter = Codnar::Splitter.new(@errors, configuration)
    chunks = splitter.chunks(path = write_tempfile("splitted", file_text))
    @errors.should == []
    chunks.should == yield(path)
  end

end

Documentation "splitting"

These are pretty simple configurations, applicable to files containing a piece of the narrative in some supported format. These configurations typically do not require to be combined with other configurations. Here is a simple test that demonstrates "splitting" documentation:

test/split_documentation_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test the built-in split documentation configurations.

class TestSplitDocumentationConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  #!include Test::WithFakeFS - until FakeFS fixes the tempfile issue.
  include Test::WithTempfile

  HTML_FILE = <<-EOF.unindent #! ((( html

    <p>This is an
    HTML file.</p>
  EOF

  # ))) html

  def test_split_html_documentation
    check_split_file(HTML_FILE, Codnar::Configuration::SPLIT_HTML_DOCUMENTATION) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => HTML_FILE.chomp
      } ]
    end
  end

  PRE_FILE = <<-EOF.unindent
    This is a preformatted
    raw text file.
  EOF

  def test_split_pre_documentation
    check_split_file(PRE_FILE, Codnar::Configuration::SPLIT_PRE_DOCUMENTATION) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => "<pre class='doc'>\n" + PRE_FILE + "</pre>"
      } ]
    end
  end

  MARKUP_FILE = <<-EOF.unindent
    This is a
    *marked-up* file.
  EOF

  RDOC_HTML = <<-EOF.unindent.chomp #! ((( html

    <div class='rdoc doc markup'>
    <p>
    This is a <strong>marked-up</strong> file.
    </p>
    </div>
  EOF

  # ))) html

  def test_split_rdoc_documentation
    check_split_file(MARKUP_FILE, Codnar::Configuration::SPLIT_RDOC_DOCUMENTATION) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => RDOC_HTML,
      } ]
    end
  end

  MARKDOWN_HTML = <<-EOF.unindent.chomp #! ((( html

    <div class='markdown doc markup'>
    <p>
    This is a
    <em>marked-up</em> file.
    </p>
    </div>
  EOF

  #! ))) html

  def test_split_markdown_documentation
    check_split_file(MARKUP_FILE, Codnar::Configuration::SPLIT_MARKDOWN_DOCUMENTATION) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => MARKDOWN_HTML,
      } ]
    end
  end

end

And here are the actual configurations:

lib/codnar/configuration/documentation.rb

module Codnar

  module Configuration

Configurations for “splitting” documentation files.

    module Documentation

“Split” a documentation file. All lines are assumed to have the same kind doc and no indentation is collected. Unless overriden by additional configuration(s), the lines are assumed to contain formatted HTML, and are passed as-is to the output.

This is the default configuration as it performs the minimal amount of processing on the input. It isn’t the most useful configuration.

      SPLIT_HTML_DOCUMENTATION = {
        "formatters" => {
          "doc" => "Formatter.cast_lines(lines, 'html')",
        },
        "syntax" => {
          "patterns" => {
            "doc" => { "regexp" => "^(.*)$", "groups" => [ "payload" ] },
          },
          "states" => {
            "start" => { "transitions" => [ { "pattern" => "doc" } ] },
          },
        },
      }

“Split” a documentation file containing arbitrary text, which is preserved by escaping it and wrapping it in an HTML pre element.

      SPLIT_PRE_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge(
        "formatters" => {
          "doc" => "Formatter.lines_to_pre_html(lines, :class => :doc)",
        }
      )

“Split” a documentation file containing pure RDoc documentation.

      SPLIT_RDOC_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge(
        "formatters" => {
          "doc" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')",
          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
        }
      )

“Split” a documentation file containing pure Markdown documentation.

      SPLIT_MARKDOWN_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge(
        "formatters" => {
          "doc" => "Formatter.markup_lines_to_html(lines, Codnar::Markdown, 'markdown')",
          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
        }
      )

“Split” a documentation file containing a GraphViz diagram.

      SPLIT_GRAPHVIZ_DOCUMENTATION = SPLIT_HTML_DOCUMENTATION.deep_merge(
        "formatters" => {
          "doc" => "Formatter.markup_lines_to_html(lines, Codnar::GraphViz, 'graphviz')",
          "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
        }
      )

    end

  end

end

Source code lines classification

Splitting source code files is a more complex affair, which does typically require combining several configurations.

lib/codnar/configuration/code.rb

module Codnar

  module Configuration

Configurations for splitting source code.

    module Code

      Source code lines classification configurations

      Nested foreign syntax code islands configurations


    end

  end

end

The basic configuration marks all lines as belonging to some code syntax, as a single chunk:

Source code lines classification configurations

Classify all lines as source code of some syntax (kind). This doesn’t distinguish between comment and code lines; to do that, you need to combine this with comment classification configuration(s). Also, it just formats the lines in an HTML pre element, without any syntax highlighting; to do that, you need to combine this with syntax highlighting formatting configuration(s).

CLASSIFY_SOURCE_CODE = lambda do |syntax|
  return {
    "formatters" => {
      "#{syntax}_code" => "Formatter.lines_to_pre_html(lines, :class => :code)",
    },
    "syntax" => {
      "patterns" => {
        "#{syntax}_code" => { "regexp" => "^(\\s*)(.*)$" },
      },
      "states" => {
        "start" => {
          "transitions" => [
            { "pattern" => "#{syntax}_code" },
          ],
        },
      },
    },
  }
end

lib/codnar/configuration/code.rb

Sometimes, a code in one syntax contains nested "islands" of code in another syntax. Here is a simple configuration to support that, which can be combined with the above basic configuration:

Nested foreign syntax code islands configurations

Allow for comments containing “((( <syntax>” and “))) <syntax>” to designate nested islands of foreign syntax inside the normal code. The designator comment lines are always treated as part of the surrounding code, not as part of the nested foreign syntax code. There is no further classification of the nested foreign syntax code. Therefore, the nested code is not examined for begin/end chunk markers. Likewise, the nested code may not contain deeper nested code using a third syntax.

CLASSIFY_NESTED_CODE = lambda do |outer_syntax, inner_syntax|
  {
    "syntax" => {
      "patterns" => {
        "start_#{inner_syntax}_in_#{outer_syntax}" =>
          { "regexp" => "^(\\s*)(.*\\(\\(\\(\\s*#{inner_syntax}.*)$" },
        "end_#{inner_syntax}_in_#{outer_syntax}" =>
          { "regexp" => "^(\\s*)(.*\\)\\)\\)\\s*#{inner_syntax}.*)$" },
        "#{inner_syntax}_in_#{outer_syntax}" =>
          { "regexp" => "^(\\s*)(.*)$" },
      },
      "states" => {
        "start" => {
          "transitions" => [
            { "pattern" => "start_#{inner_syntax}_in_#{outer_syntax}",
              "kind" => "#{outer_syntax}_code",
              "next_state" => "#{inner_syntax}_in_#{outer_syntax}" },
            [],
          ],
        },
        "#{inner_syntax}_in_#{outer_syntax}" => {
          "transitions" => [
            { "pattern" => "end_#{inner_syntax}_in_#{outer_syntax}",
              "kind" => "#{outer_syntax}_code",
              "next_state" => "start" },
            { "pattern" => "#{inner_syntax}_in_#{outer_syntax}",
              "kind" => "#{inner_syntax}_code" },
          ],
        },
      },
    },
  }
end

lib/codnar/configuration/code.rb

Here is a simple test demonstrating using source code lines classifications:

test/split_code_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test combinations of the built-in split code configurations.

class TestSplitCodeConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  SOURCE_CODE = <<-EOF.unindent
    a = b
    b = 1
  EOF

  def test_source_code
    check_split_file(SOURCE_CODE, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby")) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => "<pre class='code'>\n#{SOURCE_CODE}</pre>"
      } ]
    end
  end

  ISLAND_CODE = <<-EOF.unindent
    a = b
    b = 1
    HTML = <<-EOH.unindent # ((( html

      <p>
      HTML
      </p>
    EOH

    # ))) html
  EOF

  ISLAND_HTML = <<-EOF.unindent.chomp
    <pre class='ruby code syntax'>
    a = b
    b = <span class="Constant">1</span>
    <span class="Type">HTML</span> = &lt;&lt;-<span class="Special">EOH</span>.unindent <span class="Comment"># ((( html</span>

    </pre>
    <pre class='html code syntax'>
      <span class="Identifier">&lt;</span><span class="Statement">p</span><span class="Identifier">&gt;</span>
      HTML
      <span class="Identifier">&lt;/</span><span class="Statement">p</span><span class="Identifier">&gt;</span>
    EOH
    </pre>
    <pre class='ruby code syntax'>

    <span class="Comment"># ))) html</span>
    </pre>
  EOF

  def test_island_code
    check_split_file(ISLAND_CODE, Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby"),
                                  Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("ruby"),
                                  Codnar::Configuration::CLASSIFY_NESTED_CODE.call("ruby", "html"),
                                  Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("html")) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => ISLAND_HTML
      } ]
    end
  end

end

Classifying comment lines

Classifying comment lines is the most complex part of splitting source code files, requiring the use of one or more configurations specific to the language used.

lib/codnar/configuration/comments.rb

module Codnar

  module Configuration

Configurations for splitting source code with comments.

    module Comments

      Simple comment classification configurations

      Denoted comment classification configurations

      Delimited comment classification configurations

      Comment formatting configurations


    end

  end

end

Simple comment classification

Many languages use a simple comment syntax, where some prefix indicates a comment that spans until the end of the line (e.g., shell # comments or C++ // comments).

Simple comment classification configurations

Classify simple comment lines. It accepts a restricted format: each comment is expected to start with some exact prefix (e.g. “#” for shell style comments or “//” for C++ style comments). The following space, if any, is stripped from the payload. As a convenience, comment that starts with “!” is not taken to start a comment. This both protects the 1st line of shell scripts (“#!”), and also any other line you wish to avoid being treated as a comment.

This configuration is typically complemented by an additional one specifying how to format the (stripped!) comments; by default they are just displayed as-is using an HTML pre element, which isn’t very useful.

CLASSIFY_SIMPLE_COMMENTS = lambda do |prefix|
  return Comments.simple_comments(prefix)
end

Classify simple shell (“#”) comment lines.

CLASSIFY_SHELL_COMMENTS = lambda do
  return Comments.simple_comments("#")
end

Classify simple C++ (“//”) comment lines.

CLASSIFY_CPP_COMMENTS = lambda do
  return Comments.simple_comments("//")
end

Configuration for classifying lines to comments and code based on a simple prefix (e.g. “#” for shell style comments or “//” for C++ style comments).

def self.simple_comments(prefix)
  return {
    "syntax" => {
      "patterns" => {
        "comment_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" },
      },
      "states" => {
        "start" => {
          "transitions" => [
            { "pattern" => "comment_#{prefix}", "kind" => "comment" },
            []
          ],
        },
      },
    },
  }
end

lib/codnar/configuration/comments.rb

Here is a simple test demonstrating using simple comment classifications:

test/split_simple_comment_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split simple comment configurations.

class TestSplitSimpleCommentsConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  def test_custom_comments
    check_any_comment("!", Codnar::Configuration::CLASSIFY_SIMPLE_COMMENTS.call("!"))
  end

  def test_shell_comments
    check_any_comment("#", Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call)
  end

  def test_cpp_comments
    check_any_comment("//", Codnar::Configuration::CLASSIFY_CPP_COMMENTS.call)
  end

protected

The “?” will be replaced by the simple comment prefix.

  ANY_COMMENT_CODE = <<-EOF.unindent
    ?
    ? Comment
    Code
    ?! Not comment
  EOF

  def check_any_comment(prefix, configuration)
    check_split_file(ANY_COMMENT_CODE.gsub("?", prefix),
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"),
                     Codnar::Configuration::FORMAT_PRE_COMMENTS,
                     configuration) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => "<pre class='comment'>\n\nComment\n</pre>\n<pre class='code'>\nCode\n#{prefix}! Not comment\n</pre>"
      } ]
    end
  end

end

Denoted comment classification

Sometimes some simple comments require special treatment if they are denoted by some leading prefix. For example, Haskell simple comments start with -- but Haddock (documentation) comments start with -- |, -- ^ etc.

Denoted comment classification configurations

Classify denoted comment lines. Denoted comments are similar to simple comments, except that the 1st simple comment line must start with a specific prefix (e.g., in haddock, comment lines start with ‘–’ but haddoc comments start with ‘– |’, ‘– ^’, etc.). The comment continues in additional simple comment lines.

CLASSIFY_DENOTED_COMMENTS = lambda do |start_prefix, continue_prefix|
  return Comments.denoted_comments(start_prefix, continue_prefix)
end

Classify denoted haddock (“–”) comment lines. Note that non-haddock comment lines are not captured; they would treated as code and handled by syntax highlighting, if any.

CLASSIFY_HADDOCK_COMMENTS = lambda do
  return Comments.denoted_comments("-- [|^$]", "--")
end

Configuration for classifying lines to comments and code based on a start comment prefix and continuation comment prefix (e.g., “– |” and “–” for haddock).

def self.denoted_comments(start_prefix, continue_prefix)

Ruby coverage somehow barfs if we inline this. Go figure.

  start_transition = {
    "pattern" => "comment_start_#{start_prefix}",
    "next_state" => "comment_continue_#{continue_prefix}",
    "kind" => "comment"
  }
  return {
    "syntax" => {
      "patterns" => {
        "comment_start_#{start_prefix}" => { "regexp" => "^(\\s*)#{start_prefix}\\s?(.*)$" },
        "comment_continue_#{continue_prefix}" => { "regexp" => "^(\\s*)#{continue_prefix}\\s?(.*)$" },
      },
      "states" => {
        "start" => {
          "transitions" => [ start_transition, [] ],
        },
        "comment_continue_#{continue_prefix}" => {
          "transitions" => [ {
              "pattern" => "comment_continue_#{continue_prefix}",
              "kind" => "comment" },
            { "next_state" => "start" }
          ],
        },
      },
    },
  }
end

lib/codnar/configuration/comments.rb

Here is a simple test demonstrating using denoted comment classifications:

test/split_denoted_comment_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split denoted comment configurations.

class TestSplitDenotedCommentsConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  def test_custom_comments
    check_any_comment("// @", "//", Codnar::Configuration::CLASSIFY_DENOTED_COMMENTS.call("// @", "//"))
  end

  def test_haddoc_comments
    check_any_comment("-- |", "--", Codnar::Configuration::CLASSIFY_HADDOCK_COMMENTS.call)
  end

protected

The “<<<” will be replaced by the start comment prefix, and the “>>>” will be replaced by the continue comment prefix.

  ANY_COMMENT_CODE = <<-EOF.unindent
    >>> Not start comment
    <<< Start comment
    >>> Continue comment
    Not a comment
  EOF

The “>>>” will be replaced by the continue comment prefix.

  ANY_COMMENT_HTML = <<-EOF.unindent.chomp # ((( html

    <pre class='code'>
    >>> Not start comment
    </pre>
    <pre class='comment'>
    Start comment
    Continue comment
    </pre>
    <pre class='code'>
    Not a comment
    </pre>
  EOF
  # )))

  def check_any_comment(start_prefix, continue_prefix, configuration)
    check_split_file(ANY_COMMENT_CODE.gsub("<<<", start_prefix).gsub(">>>", continue_prefix),
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"),
                     Codnar::Configuration::FORMAT_PRE_COMMENTS,
                     configuration) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => ANY_COMMENT_HTML.gsub(">>>", continue_prefix),
      } ]
    end
  end

end

Delimited comment classification

Other languages use a delimited multi-line comment syntax, where some prefix indicates the beginning of the comment, some suffix indicates the end, and by convention some prefix is expected for the inner comment lines (e.g., C's "/*", "*", "*/" comments or HTML's "" comments).

Delimited comment classification configurations

Classify delimited comment lines. It accepts a restricted format: each comment is expected to start with some exact prefix (e.g. “/*” for C style comments or “<!–” for HTML style comments). The following space, if any, is stripped from the payload. Following lines are also considered comments; a leading inner line prefix (e.g., “ *” for C style comments or “ -” for HTML style comments) with an optional following space are stripped from the payload. Finally, a line containing some exact suffix (e.g. “*/” for C style comments, or “–>” for HTML style comments) ends the comment. A one line comment format is also supported containing the prefix, the payload, and the suffix. As a convenience, comment that starts with “!” is not taken to start a comment. This allows protecting comment block you wish to avoid being classified as a comment.

CLASSIFY_DELIMITED_COMMENTS = lambda do |prefix, inner, suffix|
  return Comments.delimited_comments(prefix, inner, suffix)
end

Classify delimited C (“/*”, “ *”, “ */”) style comments.

CLASSIFY_C_COMMENTS = lambda do

Since the prefix/inner/suffix passed to the configuration are regexps, we need to escape special characters such as “*”.

  return Comments.delimited_comments("/\\*", " \\*", " \\*/")
end

Classify delimited HTML (“<!–”, “ -”, “–>”) style comments.

CLASSIFY_HTML_COMMENTS = lambda do
  return Comments.delimited_comments("<!--", " -", "-->")
end

Configuration for classifying lines to comments and code based on a delimited start prefix, inner line prefix and final suffix (e.g., “/*”, “ *”, “ */” for C-style comments or “<!–”, “ -”, “–>” for HTML style comments).

def self.delimited_comments(prefix, inner, suffix)
  return {
    "syntax" => {
      "patterns" => {
        "comment_prefix_#{prefix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\\s?(.*)$" },
        "comment_inner_#{inner}" => { "regexp" => "^(\\s*)#{inner}\\s?(.*)$" },
        "comment_suffix_#{suffix}" => { "regexp" => "^(\\s*)#{suffix}\\s*$" },
        "comment_line_#{prefix}_#{suffix}" => { "regexp" => "^(\\s*)#{prefix}(?!!)\s?(.*?)\s*#{suffix}\\s*$" },
      },
      "states" => {
        "start" => {
          "transitions" => [
            { "pattern" => "comment_line_#{prefix}_#{suffix}",
              "kind" => "comment" },
            { "pattern" => "comment_prefix_#{prefix}",
              "kind" => "comment",
              "next_state" => "comment_#{prefix}" },
            [],
          ],
        },
        "comment_#{prefix}" => {
          "transitions" => [
            { "pattern" => "comment_suffix_#{suffix}",
              "kind" => "comment",
              "next_state" => "start" },
            { "pattern" => "comment_inner_#{inner}",
              "kind" => "comment" },
          ],
        },
      },
    },
  }
end

lib/codnar/configuration/comments.rb

Here is a simple test demonstrating using delimited comment classifications:

test/split_delimited_comment_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split delimited comment configurations.

class TestSplitDelimitedCommentsConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  def test_custom_comments

Since the prefix/inner/suffix passed to the configuration are regexps, we need to escape special characters such as “{” and “|”.

    check_any_comment([ "@{", " |", " }@" ], Codnar::Configuration::CLASSIFY_DELIMITED_COMMENTS.call("@\\{", " \\|", " \\}@"))
  end

  def test_c_comments
    check_any_comment([ "/*", " *", " */" ], Codnar::Configuration::CLASSIFY_C_COMMENTS.call)
  end

  def test_html_comments
    check_any_comment([ "<!--", " -", "-->" ], Codnar::Configuration::CLASSIFY_HTML_COMMENTS.call)
  end

protected

The “<<<” will be replaced by the start comment prefix, the “<>” will be replaced by the inner line comment prefix, and the “>>>” will be replaced by the end comment suffix.

  ANY_COMMENT_CODE = <<-EOF.unindent
    <<< One-line comment >>>
    Code
    <<<
    <> Multi-line
    <> comment.
    >>>
  EOF

  ANY_COMMENT_HTML = <<-EOF.unindent.chomp # ((( html

    <pre class='comment'>
    One-line comment
    </pre>
    <pre class='code'>
    Code
    </pre>
    <pre class='comment'>

    Multi-line
    comment.

    </pre>
  EOF
  # )))

  def check_any_comment(patterns, configuration)
    prefix, inner, suffix = patterns
    check_split_file(ANY_COMMENT_CODE.gsub("<<<", prefix).gsub(">>>", suffix).gsub("<>", inner),
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"),
                     Codnar::Configuration::FORMAT_PRE_COMMENTS,
                     configuration) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => ANY_COMMENT_HTML.gsub("/--", prefix).gsub("--/", suffix).gsub(" -", inner),
      } ]
    end
  end

end

Comment formatting

In many cases, the text inside comments is written using some markup format (e.g., RDoc for Ruby or JavaDoc for Java). Currently, two such formats are supported, as well as simply wrapping the comment in an HTML pre element:

Comment formatting configurations

Format comments as HTML pre elements. Is used to complement a configuration that classifies some lines as comment.

FORMAT_PRE_COMMENTS = {
  "formatters" => {
    "comment" => "Formatter.lines_to_pre_html(lines, :class => :comment)",
  },
}

Format comments that use the RDoc notation. Is used to complement a configuration that classifies some lines as comment.

FORMAT_RDOC_COMMENTS = {
  "formatters" => {
    "comment" => "Formatter.markup_lines_to_html(lines, Codnar::RDoc, 'rdoc')",
    "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
  },
}

Format comments that use the Markdown notation. Is used to complement a configuration that classifies some lines as comment.

FORMAT_MARKDOWN_COMMENTS = {
  "formatters" => {
    "comment" => "Formatter.markup_lines_to_html(lines, Markdown, 'markdown')",
    "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
  },
}

Format comments that use the Haddock notation. Is used to complement a configuration that classifies some lines as comment.

FORMAT_HADDOCK_COMMENTS = {
  "formatters" => {
    "comment" => "Formatter.markup_lines_to_html(lines, Haddock, 'haddock')",
    "unindented_html" => "Formatter.unindented_lines_to_html(lines)",
  },
}

lib/codnar/configuration/comments.rb

Here is a simple test demonstrating formatting comment contents:

test/format_comment_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split comment formatting configurations.

class TestFormatCommentsConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  COMMENT_TEXT = <<-EOF.unindent.gsub("#!", "#")
    #! Comment *text*.
  EOF

  PRE_HTML = <<-EOF.unindent.chomp
    <pre class='comment'>
    Comment *text*.
    </pre>
  EOF

  def test_pre_comments
    check_any_format(PRE_HTML, Codnar::Configuration::FORMAT_PRE_COMMENTS)
  end

  RDOC_HTML = <<-EOF.unindent.chomp
    <table class='layout'>
    <tr>
    <td class='indentation'>
    <pre></pre>
    </td>
    <td class='html'>
    <div class='rdoc comment markup'>
    <p>
    Comment <strong>text</strong>.
    </p>
    </div>
    </td>
    </tr>
    </table>
  EOF

  def test_rdoc_comments
    check_any_format(RDOC_HTML, Codnar::Configuration::FORMAT_RDOC_COMMENTS)
  end

  MARKDOWN_HTML = <<-EOF.unindent.chomp
    <table class='layout'>
    <tr>
    <td class='indentation'>
    <pre></pre>
    </td>
    <td class='html'>
    <div class='markdown comment markup'>
    <p>
    Comment <em>text</em>.
    </p>
    </div>
    </td>
    </tr>
    </table>
  EOF

  def test_markdown_comments
    check_any_format(MARKDOWN_HTML, Codnar::Configuration::FORMAT_MARKDOWN_COMMENTS)
  end

protected

  def check_any_format(html, configuration)
    check_split_file(COMMENT_TEXT,
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("any"),
                     Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call,
                     configuration) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => html,
      } ]
    end
  end

end

Syntax highlighting

Highlighting the syntax of the source code embedded in the documentation improved readability. Codnar provides several ways to achieve this.

lib/codnar/configuration/highlighting.rb

module Codnar

  module Configuration

Configurations for highlighting source code lines.

    module Highlighting

      GVim syntax highlighting formatting configurations

      CodeRay syntax highlighting formatting configurations

      Sunlight syntax highlighting formatting configurations

      Chunk splitting configurations


    end

  end

end

Syntax highlighting using GVim

Supporting almost any known programming language (other than dealing with comments) is very easy using GVim for syntax highlighting, as demonstrated here:

GVim syntax highlighting formatting configurations

Format code using GVim’s syntax highlighting, using explicit HTML constructs. Assumes some previous configuration already classified the code lines.

FORMAT_CODE_GVIM_HTML = lambda do |syntax|
  return Highlighting.klass_code_format('GVim', syntax, "[]")
end

Format code using GVim’s syntax highlighting, using CSS classes instead of explicit font and color styles. Assumes some previous configuration already classified the code lines.

FORMAT_CODE_GVIM_CSS = lambda do |syntax|
  return Highlighting.klass_code_format('GVim', syntax, "[ '+:let html_use_css=1' ]")
end

Return a configuration for highlighting a specific syntax using GVim.

def self.klass_code_format(klass, syntax, options)
  return {
    "formatters" => {
      "#{syntax}_code" => "#{klass}.lines_to_html(lines, '#{syntax}', #{options})",
    },
  }
end

lib/codnar/configuration/highlighting.rb

If you choose to use CSS classes instead of directly embedding fonts and colors into the generated HTML, you will need a CSS stylesheet with the relevant classes. Here is the default CSS stylesheet used by GVim:

lib/codnar/data/gvim.css

Colors for GVim classes

span.Constant   { color: Crimson; }
span.Identifier { color: Teal; }
span.PreProc    { color: Indigo; }
span.Special    { color: Navy; }
span.Statement  { color: Maroon; }
span.Type       { color: Green; }
span.Comment    { color: Purple; }

Syntax highlighting using CodeRay

For supported programming languages, you may choose to use CodeRay instead of GVim.

CodeRay syntax highlighting formatting configurations

Format code using CodeRay’s syntax highlighting, using explicit HTML constructs. Assumes some previous configuration already classified the code lines.

FORMAT_CODE_CODERAY_HTML = lambda do |syntax|
  return Highlighting.klass_code_format('CodeRay', syntax, "{}")
end

Format code using CodeRay’s syntax highlighting, using CSS classes instead of explicit font and color styles. Assumes some previous configuration already classified the code lines.

FORMAT_CODE_CODERAY_CSS = lambda do |syntax|
  return Highlighting.klass_code_format('CodeRay', syntax, "{ :css => :class }")
end

lib/codnar/configuration/highlighting.rb

lib/codnar/data/coderay.css

Extracted from CodeRay output


.CodeRay .line-numbers a {
  text-decoration: inherit;
  color: inherit;
}
.CodeRay {
  background-color: hsl(0,0%,95%);
  border: 1px solid silver;
  color: black;
}
.CodeRay pre {
  margin: 0px;
}

span.CodeRay { white-space: pre; border: 0px; padding: 2px; }

table.CodeRay { border-collapse: collapse; width: 100%; padding: 2px; }
table.CodeRay td { padding: 2px 4px; vertical-align: top; }

.CodeRay .line-numbers {
  background-color: hsl(180,65%,90%);
  color: gray;
  text-align: right;
  -webkit-user-select: none;
  -moz-user-select: none;
  user-select: none;
}
.CodeRay .line-numbers a {
  background-color: hsl(180,65%,90%) !important;
  color: gray !important;
  text-decoration: none !important;
}
.CodeRay .line-numbers a:target { color: blue !important; }
.CodeRay .line-numbers .highlighted { color: red !important; }
.CodeRay .line-numbers .highlighted a { color: red !important; }
.CodeRay span.line-numbers { padding: 0px 4px; }
.CodeRay .line { display: block; float: left; width: 100%; }
.CodeRay .code { width: 100%; }
.CodeRay .code pre { overflow: auto; }

.CodeRay .debug { color: white !important; background: blue !important; }

.CodeRay .annotation { color:#007 }
.CodeRay .attribute-name { color:#b48 }
.CodeRay .attribute-value { color:#700 }
.CodeRay .binary { color:#509 }
.CodeRay .char .content { color:#D20 }
.CodeRay .char .delimiter { color:#710 }
.CodeRay .char { color:#D20 }
.CodeRay .class { color:#B06; font-weight:bold }
.CodeRay .class-variable { color:#369 }
.CodeRay .color { color:#0A0 }
.CodeRay .comment { color:#777 }
.CodeRay .comment .char { color:#444 }
.CodeRay .comment .delimiter { color:#444 }
.CodeRay .complex { color:#A08 }
.CodeRay .constant { color:#036; font-weight:bold }
.CodeRay .decorator { color:#B0B }
.CodeRay .definition { color:#099; font-weight:bold }
.CodeRay .delimiter { color:black }
.CodeRay .directive { color:#088; font-weight:bold }
.CodeRay .doc { color:#970 }
.CodeRay .doc-string { color:#D42; font-weight:bold }
.CodeRay .doctype { color:#34b }
.CodeRay .entity { color:#800; font-weight:bold }
.CodeRay .error { color:#F00; background-color:#FAA }
.CodeRay .escape  { color:#666 }
.CodeRay .exception { color:#C00; font-weight:bold }
.CodeRay .float { color:#60E }
.CodeRay .function { color:#06B; font-weight:bold }
.CodeRay .global-variable { color:#d70 }
.CodeRay .hex { color:#02b }
.CodeRay .imaginary { color:#f00 }
.CodeRay .include { color:#B44; font-weight:bold }
.CodeRay .inline { background-color: hsla(0,0%,0%,0.07); color: black }
.CodeRay .inline-delimiter { font-weight: bold; color: #666 }
.CodeRay .instance-variable { color:#33B }
.CodeRay .integer  { color:#00D }
.CodeRay .key .char { color: #60f }
.CodeRay .key .delimiter { color: #404 }
.CodeRay .key { color: #606 }
.CodeRay .keyword { color:#080; font-weight:bold }
.CodeRay .label { color:#970; font-weight:bold }
.CodeRay .local-variable { color:#963 }
.CodeRay .namespace { color:#707; font-weight:bold }
.CodeRay .octal { color:#40E }
.CodeRay .operator { }
.CodeRay .predefined { color:#369; font-weight:bold }
.CodeRay .predefined-constant { color:#069 }
.CodeRay .predefined-type { color:#0a5; font-weight:bold }
.CodeRay .preprocessor { color:#579 }
.CodeRay .pseudo-class { color:#00C; font-weight:bold }
.CodeRay .regexp .content { color:#808 }
.CodeRay .regexp .delimiter { color:#404 }
.CodeRay .regexp .modifier { color:#C2C }
.CodeRay .regexp { background-color:hsla(300,100%,50%,0.06); }
.CodeRay .reserved { color:#080; font-weight:bold }
.CodeRay .shell .content { color:#2B2 }
.CodeRay .shell .delimiter { color:#161 }
.CodeRay .shell { background-color:hsla(120,100%,50%,0.06); }
.CodeRay .string .char { color: #b0b }
.CodeRay .string .content { color: #D20 }
.CodeRay .string .delimiter { color: #710 }
.CodeRay .string .modifier { color: #E40 }
.CodeRay .string { background-color:hsla(0,100%,50%,0.05); }
.CodeRay .symbol .content { color:#A60 }
.CodeRay .symbol .delimiter { color:#630 }
.CodeRay .symbol { color:#A60 }
.CodeRay .tag { color:#070 }
.CodeRay .type { color:#339; font-weight:bold }
.CodeRay .value { color: #088; }
.CodeRay .variable  { color:#037 }

.CodeRay .insert { background: hsla(120,100%,50%,0.12) }
.CodeRay .delete { background: hsla(0,100%,50%,0.12) }
.CodeRay .change { color: #bbf; background: #007; }
.CodeRay .head { color: #f8f; background: #505 }
.CodeRay .head .filename { color: white; }

.CodeRay .delete .eyecatcher { background-color: hsla(0,100%,50%,0.2); border: 1px solid hsla(0,100%,45%,0.5); margin: -1px; border-bottom: none; border-top-left-radius: 5px; border-top-right-radius: 5px; }
.CodeRay .insert .eyecatcher { background-color: hsla(120,100%,50%,0.2); border: 1px solid hsla(120,100%,25%,0.5); margin: -1px; border-top: none; border-bottom-left-radius: 5px; border-bottom-right-radius: 5px; }

.CodeRay .insert .insert { color: #0c0; background:transparent; font-weight:bold }
.CodeRay .delete .delete { color: #c00; background:transparent; font-weight:bold }
.CodeRay .change .change { color: #88f }
.CodeRay .head .head { color: #f4f }

Syntax highlighting using Sunlight

For small projects in supported languages, you may choose to use Sunlight instead of GVim.

Sunlight syntax highlighting formatting configurations

Format code using Sunlight’s syntax highlighting. This assumes the HTML will include and invoke Sunlight’s Javascript file which does the highlighting on the fly inside the DOM, instead of pre-computing it when splitting the file.

FORMAT_CODE_SUNLIGHT = lambda do |syntax|
  return Highlighting.sunlight_code_format(syntax)
end

Return a configuration for highlighting a specific syntax using Sunlight.

def self.sunlight_code_format(syntax)
  return {
    "formatters" => {
      "#{syntax}_code" => "Sunlight.lines_to_html(lines, '#{syntax}')",
    },
  }
end

lib/codnar/configuration/highlighting.rb

Here is a simple test demonstrating highlighting code syntax using the different configurations (GVim, CodeRay, or Sunlight):

test/format_code_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split code formatting configurations.

class TestFormatCodeConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  def test_gvim_html_code
    check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_GVIM_HTML.call("c"))
      <div class='c code syntax' bgcolor=\"#ffffff\" text=\"#000000\">
      <font face=\"monospace\">
      <font color=\"#00ff00\">int</font>&nbsp;x;<br />
      </font>
      </div>
    EOF
  end

  def test_gvim_css_code
    check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("c"))
      <pre class='c code syntax'>
      <span class=\"Type\">int</span> x;
      </pre>
    EOF
  end

  def test_coderay_html_code
    check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_CODERAY_HTML.call("c"))
      <div class="CodeRay">
        <div class="code"><pre><span style="color:#0a5;font-weight:bold">int</span> x;</pre></div>
      </div>
    EOF
  end

  def test_coderay_css_code
    check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_CODERAY_CSS.call("c"))
      <div class="CodeRay">
        <div class="code"><pre><span class="predefined-type">int</span> x;</pre></div>
      </div>
    EOF
  end

  def test_sunlight_code
    check_any_code(<<-EOF.unindent.chomp, Codnar::Configuration::FORMAT_CODE_SUNLIGHT.call("c"))
      <pre class='sunlight-highlight-c'>
      int x;
      </pre>
    EOF
  end

protected

  def check_any_code(html, configuration)
    check_split_file("int x;\n",
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("c"),
                     configuration) do |path|
      [ {
        "name" => path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [],
        "html" => html,
      } ]
    end
  end

end

Chunk splitting

There are many ways to denote code "regions" (which become Codnar chunks). The following covers GVim's default scheme; others are easily added. It is safest to merge this configuration as the last of all the combined configurations, to ensure its patterns end up before any others.

Chunk splitting configurations

Group lines into chunks using VIM-style “{{{”/“}}}” region designations. Assumes other configurations handle the actual content lines.

CHUNK_BY_VIM_REGIONS = {
  "formatters" => {
    "begin_chunk" => "[]",
    "end_chunk" => "[]",
    "nested_chunk" => "Formatter.nested_chunk_lines_to_html(lines)",
  },
  "syntax" => {
    "patterns" => {
      "begin_chunk" => { "regexp" => "^(\\s*)\\W*\\{\\{\\{\\s*(.*?)\\s*$" },
      "end_chunk" => { "regexp" => "^(\\s*)\\W*\\}\\}\\}\\s*(.*?)\\s*$" },
    },
    "states" => {
      "start" => {
        "transitions" => [
          { "pattern" => "begin_chunk" },
          { "pattern" => "end_chunk" },
          [],
        ],
      },
    },
  },
}

lib/codnar/configuration/highlighting.rb

Here is a simple test demonstrating splitting code chunks:

test/split_chunk_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test built-in split code formatting configurations.

class TestSplitChunkConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  CODE_TEXT = <<-EOF.unindent.gsub("#!", "#")
    int x;
    #! {{{ chunk
    int y;
    #! }}}
  EOF

  CODE_HTML = <<-EOF.unindent.chomp
    <pre class='code'>
    int x;
    </pre>
    <pre class='nested chunk'>
    <a class='nested chunk' href='#chunk'>chunk</a>
    </pre>
  EOF

  CHUNK_HTML = <<-EOF.unindent.chomp
    <pre class='code'>
    int y;
    </pre>
  EOF

  def test_gvim_chunks
    check_split_file(CODE_TEXT,
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("c"),
                     Codnar::Configuration::CHUNK_BY_VIM_REGIONS) do |path|
      [ {
        "name"=> path,
        "locations" => [ { "file" => path, "line" => 1 } ],
        "containers" => [],
        "contained" => [ "chunk" ],
        "html"=> CODE_HTML,
      }, {
        "name" => "chunk",
        "locations" => [ { "file" => path, "line" => 2 } ],
        "containers" => [ path ],
        "contained" => [],
        "html" => CHUNK_HTML,
      } ]
    end
  end

end

Putting it all together

Here is a test demonstrating putting several of the above configurations together in a meaningful way:

test/split_combined_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"
require "test_with_configurations"

Test combination of many built-in configurations.

class TestSplitCombinedConfigurations < Test::Unit::TestCase

  include Test::WithConfigurations
  include Test::WithErrors
  include Test::WithTempfile

  CODE_TEXT = <<-EOF.unindent.gsub("#!", "#")
    #!!/usr/bin/ruby -w

    #! {{{ HTML snippet

    HELLO_WORLD_IN_HTML = <<-EOH.unindent.chomp #! ((( html

      <p>
      Hello, world!
      </p>
    EOH

    #! ))) html

    #! }}}

    #! {{{ Ruby code

    #! Hello, *world*!
    puts HELLO_WORLD_IN_HTML

    #! }}}
  EOF

  FILE_HTML = <<-EOF.unindent.chomp
    <pre class='ruby code syntax'>
    <span class="PreProc">#!/usr/bin/ruby -w</span>

    </pre>
    <pre class='nested chunk'>
    <a class='nested chunk' href='#html-snippet'>HTML snippet</a>
    </pre>
    <pre class='ruby code syntax'>

    </pre>
    <pre class='nested chunk'>
    <a class='nested chunk' href='#ruby-code'>Ruby code</a>
    </pre>
  EOF

  HTML_CHUNK = <<-EOF.unindent.chomp
    <pre class='ruby code syntax'>

    <span class="Type">HELLO_WORLD_IN_HTML</span> = &lt;&lt;-<span class="Special">EOH</span>.unindent.chomp <span class="Comment"># ((( html</span>

    </pre>
    <pre class='html code syntax'>
      <span class="Identifier">&lt;</span><span class="Statement">p</span><span class="Identifier">&gt;</span>
      Hello, world!
      <span class="Identifier">&lt;/</span><span class="Statement">p</span><span class="Identifier">&gt;</span>
    EOH
    </pre>
    <pre class='ruby code syntax'>

    <span class="Comment"># ))) html</span>

    </pre>
  EOF

  RUBY_CHUNK = <<-EOF.unindent.chomp
    <pre class='ruby code syntax'>

    </pre>
    <table class='layout'>
    <tr>
    <td class='indentation'>
    <pre></pre>
    </td>
    <td class='html'>
    <div class='rdoc comment markup'>
    <p>
    Hello, <strong>world</strong>!
    </p>
    </div>
    </td>
    </tr>
    </table>
    <pre class='ruby code syntax'>
    puts <span class="Type">HELLO_WORLD_IN_HTML</span>

    </pre>
  EOF

  def test_gvim_chunks
    check_split_file(CODE_TEXT,
                     Codnar::Configuration::CLASSIFY_SOURCE_CODE.call("ruby"),
                     Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("ruby"),
                     Codnar::Configuration::CLASSIFY_NESTED_CODE.call("ruby", "html"),
                     Codnar::Configuration::FORMAT_CODE_GVIM_CSS.call("html"),
                     Codnar::Configuration::CLASSIFY_SHELL_COMMENTS.call,
                     Codnar::Configuration::FORMAT_RDOC_COMMENTS,
                     Codnar::Configuration::CHUNK_BY_VIM_REGIONS) do |path|
      [ {
        "name" => path, "html" => FILE_HTML,
        "locations" => [ { "line" => 1, "file" => path } ], "containers" => [], "contained" => [ "HTML snippet", "Ruby code" ],
      }, {
        "name" => "HTML snippet", "html" => HTML_CHUNK,
        "locations" => [ { "line" => 3, "file" => path } ], "containers" => [ path ], "contained" => [],
      }, {
        "name" => "Ruby code", "html" => RUBY_CHUNK,
        "locations" => [ { "line" => 14, "file" => path } ], "containers" => [ path ], "contained" => [],
      } ]
    end
  end

end

Storing chunks on the disk

Writing chunks to disk

In any realistic system, the number of source files and chunks will be such that it makes sense to store the chunks on the disk for further processing. This allows incorporating the split operation as part of a build tool chain, and only re-splitting modified files. Here is a simple test demonstrating writing chunks to the disk:

test/write_chunks.rb

require "codnar"
require "olag/test"
require "test/spec"

Test writing chunks to files.

class TestWriteChunks < Test::Unit::TestCase

  include Test::WithFakeFS

  def test_write_chunks
    check_writing_data([])
    check_writing_data("name" => "foo")
    check_writing_data([ { "name" => "foo" }, { "name" => "bar" } ])
  end

  def test_write_invalid_data
    lambda { check_writing_data("not a chunk") }.should.raise
  end

protected

  def check_writing_data(data)
    Codnar::Writer.write("path", data)
    data = [ data ] unless Array === data
    YAML.load_file("path").should == data
  end

end

And here is the implementation:

lib/codnar/writer.rb

module Codnar

Write chunks into a disk file.

  class Writer

Write one chunk or an array of chunks to a disk file.

    def self.write(path, data)
      self.new(path) do |writer|
        writer << data
      end
    end

Add one chunk or an array of chunks to the disk file.

    def <<(data)
      case data
      when Array
        @chunks += data
      when Hash
        @chunks << data
      else
        raise "Invalid data class: #{data.class}"
      end
    end

  protected

Write chunks into the specified disk file.

    def initialize(path, &block)
      @chunks = []
      File.open(path, "w") do |file|
        block.call(self)
        file.print(@chunks.to_yaml)
      end
    end

  end

end

Reading chunks to memory

Having written the chunks to the disk requires us, at some following point in time, to read them back into memort. This is the first time we will have a view of the whole documented system, which allows us to detect several classes of consistency errors: Some chunks may be left out of the final narrative (consider this the equivalent of tests code coverage); we may be referring to missing (or misspelled) chunk names; and, finally, we need to deal with duplicate chunks.

In literate programming, it is trivial to write a chunk once and use it in several places in the compiled source code. The classical example is C/C++ function signatures that need to appear in both the .h and .c/.cpp files. However, in some cases this practice makes sense for other pieces of code, and since the ultimate source code contains only one copy of the chunk, this does not suffer from the typical copy-and-paste issues.

In inverse literate programming, if the same code appears twice (as a result of copy-and-paste), then it does suffer from the typical copy-and-paste issues. The most serious of these is, of course, that when only one copy is changed. The way that Codnar helps alleviate this problem is that if the same chunk appears more than once in the source code, its content is expected to be exactly the same in both cases (up to indentation). This should not be viewed as endorsement of copy-and-paste programming; Using duplicate chunks should be a last resort measure to combat restrictions in the programming language and compilation tool chain.

Chunk identifiers

The above definition raises the obvious question: what does "the same chunk" mean? As far as Codnar is concerned, a chunk is uniquely identified by its name, which is specified on the begin_chunk line. The unique identifier is not the literal name but a transformation of it. This allows us to ignore capitalization, white space, and any punctuation that may appear in the name. It also allows us to use the resulting ID as an HTML anchor name, without worrying about HTML's restictions on such names.

Here is a simple test demonstrating converting names to identifiers:

test/identify_chunks.rb

require "codnar"
require "test/spec"

Test converting chunk names to identifiers.

class TestIdentifyChunks < Test::Unit::TestCase

  def test_lower_case_to_id
    "a".to_id.should == "a"
  end

  def test_upper_case_to_id
    "A".to_id.should == "a"
  end

  def test_digits_to_id
    "1".to_id.should == "1"
  end

  def test_non_alnum_to_id
    "!@-$#".to_id.should == "-"
  end

  def test_complex_to_id
    "C# for .NET!".to_id.should == "c-for-net-"
  end

  def test_strip_to_id
    " a ".to_id.should == "a"
  end


end

And here is the implementation:

lib/codnar/string_extensions.rb

Extend the core String class.

class String

Convert this String to an identifier. This is a stable operation, so anything that accept a name will also accept an identifier as well.

  def to_id
    return self.strip.gsub(/[^a-zA-Z0-9]+/, "-").downcase
  end

  Clean HTML

end

In-memory chunks storage

Detecting unused and/or duplicate chunks requires us to have in-memory chunk storage that tracks all chunks access. Here is a simple test demonstrating reading chunks into the storage and handling the various error conditions listed above:

test/read_chunks.rb

require "codnar"
require "olag/test"
require "test/spec"

Test reading chunks from files.

class TestReadChunks < Test::Unit::TestCase

  include Test::WithErrors
  include Test::WithFakeFS

  def test_read_chunks
    Codnar::Writer.write("foo.chunks", { "name" => "foo" })
    Codnar::Writer.write("bar.chunks", [ { "name" => "bar" }, { "name" => "baz" } ])
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks"))
    check_read_data(reader, "foo" => { "name" => "foo" },
                            "bar" => { "name" => "bar" },
                            "baz" => { "name" => "baz" })
    @errors.should == []
  end

  def test_read_invalid_chunks
    write_fake_file("foo.chunks")
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks"))
    @errors.should == [ "#{$0}: Invalid chunks data in file: #{File.expand_path("foo.chunks")}" ]
  end

  def test_read_unused_chunks
    Codnar::Writer.write("foo.chunks", { "name" => "foo",
                                         "locations" => [ { "file" => "a", "line" => 1 } ] })
    Codnar::Writer.write("bar.chunks", { "name" => "bar",
                                         "locations" => [ { "file" => "b", "line" => 2 } ] })
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks"))
    check_read_data(reader, "foo" => { "name" => "foo",
                                       "locations" => [ { "file" => "a", "line" => 1 } ] })
    @errors.should == [ "#{$0}: Unused chunk: bar in file: b at line: 2" ]
  end

  def test_read_duplicate_chunks
    Codnar::Writer.write("foo.chunks", { "name" => "foo", "locations" => [ { "file" => "a" } ],
                                         "contained" => [ "A" ], "containers" => [ "c" ] })
    Codnar::Writer.write("bar.chunks", [
      { "name" => "foo", "locations" => [ { "file" => "b" } ],
        "contained" => [ "a" ], "containers" => [ "d" ] },
      { "name" => "foo", "locations" => [ { "file" => "c" } ],
        "contained" => [ "a" ], "containers" => [] }
    ])
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks"))
    check_read_data(reader, "foo" => {
      "name" => "foo",
      "locations" => [ { "file" => "a" }, { "file" => "b" }, { "file" => "c" } ],
      "contained" => [ "a" ],
      "containers" => [ "c", "d" ],
    })
  end

  def test_read_different_chunks
    Codnar::Writer.write("foo.chunks", [
      { "name" => "foo", "html" => "bar", "locations" => [ { "file" => "foo.chunks", "line" => 1 } ],
        "contained" => [ "a" ], "containers" => [] },
      { "name" => "foo", "html" => "baz", "locations" => [ { "file" => "foo.chunks", "line" => 2 } ],
        "contained" => [ "A" ], "containers" => [] }
    ])
    Codnar::Writer.write("bar.chunks", [ { "name" => "foo", "html" => "bar",
                                           "locations" => [ { "file" => "bar.chunks", "line" => 1 } ],
                                           "contained" => [ "a" ], "containers" => [] } ])
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks").sort)
    @errors.should == [ "#{$0}: Chunk: foo is different in file: foo.chunks at line: 2, " \
                      + "and in file: bar.chunks at line: 1 or in file: foo.chunks at line: 1" ]
    check_read_data(reader, "foo" => {
      "name" => "foo",
      "html" => "bar",
      "locations" => [ { "file" => "bar.chunks", "line" => 1 }, { "file" => "foo.chunks", "line" => 1 } ],
      "contained" => [ "a" ],
      "containers" => [],
    })
  end

  def test_read_fake_chunk
    reader = Codnar::Reader.new(@errors, [])
    reader["foo"].should == Codnar::Reader.fake_chunk("foo")
    @errors.should == [ "#{$0}: Missing chunk: foo" ]
  end

  def test_read_equivalent_name_chunks
    Codnar::Writer.write("foo.chunks", [
      { "name" => "Foo?", "locations" => [ { "file" => "foo.chunks", "line" => 1 } ],
        "containers" => [ "1" ], "contained" => [ "c" ] },
      { "name" => "FOO!!", "locations" => [ { "file" => "foo.chunks", "line" => 2 } ],
        "containers" => [ "2" ], "contained" => [ "C" ] }
    ])
    reader = Codnar::Reader.new(@errors, Dir.glob("./**/*.chunks"))
    check_read_data(reader, "foo-" => {
      "name" => "Foo?",
      "locations" => [ { "file" => "foo.chunks", "line" => 1 }, { "file" => "foo.chunks", "line" => 2 } ],
      "containers" => [ "1", "2" ],
      "contained" => [ "c" ],
    })
  end

protected

  def check_read_data(reader, chunks)
    chunks.each do |name, chunk|
      reader[name].should == chunk
    end
    reader.collect_unused_chunk_errors
  end

end

And here is the implementation:

lib/codnar/reader.rb

module Codnar

Read chunks from disk files.

  class Reader

Load all chunks from the specified disk files to memory for later access by name.

    def initialize(errors, paths)
      @errors = errors
      @chunks = {}
      @used = {}
      paths.each do |path|
        read_path_chunks(path)
      end
    end

Fetch a chunk by its name.

    def [](name)
      id = name.to_id
      @used[id] = true
      return @chunks[id] ||= (
        @errors << "Missing chunk: #{name}"
        Reader.fake_chunk(name)
      )
    end

Collect errors for unused chunks.

    def collect_unused_chunk_errors
      @chunks.each do |id, chunk|
        @errors.push("#{$0}: Unused chunk: #{chunk.name} #{Reader.locations_message(chunk)}") unless @used[id]
      end
    end

  protected

Load and merge all chunks from a disk file into memory.

    def read_path_chunks(path)
      @errors.in_path(path) do
        chunks = load_path_chunks(path)
        next unless chunks
        merge_loaded_chunks(chunks)
        @root_chunk ||= chunks[0].name
      end
    end

Load all chunks from a disk file into memory.

    def load_path_chunks(path)
      chunks = YAML.load_file(path)
      @errors << "Invalid chunks data" unless chunks

TODO: A bit more validation would be nice.

      return chunks
    end

Merge an array of chunks into memory.

    def merge_loaded_chunks(chunks)
      chunks.each do |new_chunk|
        old_chunk = @chunks[id = new_chunk.name.to_id]
        if old_chunk.nil?
          @chunks[id] = new_chunk
        elsif Reader.same_chunk?(old_chunk, new_chunk)
          Reader.merge_same_chunks(old_chunk, new_chunk)
        else
          @errors.push(Reader.different_chunks_error(old_chunk, new_chunk))
        end
      end
    end

Merge a new “same” chunk into an old one.

    def self.merge_same_chunks(old_chunk, new_chunk)
      old_chunk.locations = \
        (old_chunk.locations + new_chunk.locations).uniq.sort \
          do |first_location, second_location|
            [ first_location.file.to_id, first_location.line ] \
            <=> [ second_location.file.to_id, second_location.line ]
          end
      old_chunk.containers = \
        (old_chunk.containers + new_chunk.containers).uniq.sort \
          do |first_name, second_name|
            first_name.to_id <=> second_name.to_id
          end
    end

Check whether two chunks contain the same “stuff”.

    def self.same_chunk?(old_chunk, new_chunk)
      return Reader.chunk_payload(old_chunk) == Reader.chunk_payload(new_chunk)
    end

Return just the actual payload of a chunk for equality comparison.

    def self.chunk_payload(chunk)
      chunk = chunk.reject { |key, value| [ "locations", "name", "containers" ].include?(key) }
      chunk.contained.map! { |name| name.to_id }
      return chunk
    end

Error message when two different chunks have the same name.

    def self.different_chunks_error(old_chunk, new_chunk)
      old_location = Reader.locations_message(old_chunk)
      new_location = Reader.locations_message(new_chunk)
      return "#{$0}: Chunk: #{old_chunk.name} is different #{new_location}, and #{old_location}"
    end

Format a chunk’s location for an error message.

    def self.locations_message(chunk)
      locations = chunk.locations.map { |location| "in file: #{location.file} at line: #{location.line}" }
      return locations.join(" or ")
    end

Return a fake chunk for the specified name.

    def self.fake_chunk(name)
      return {
        "name" => name,
        "locations" => [ { "file" => "MISSING" } ],
        "contained" => [],
        "containers" => [],
        "html" => "<div class='missing chunk error'>\nMISSING\n</div>"
      }
    end

  end

end

Weaving chunks into HTML

Assembling the final HTML requires combining both the narrative documentation and source code chunks. This is done top-down starting at a "root" documentation chunk and recursively embedding nested documentation and code chunks into it.

Weaving chunks together

When embedding a documentation chunk inside another documentation chunk, things are pretty easy - we just need to insert the embedded chunk HTML into the containing chunk. When embedding a source code chunk into the documentation, however, we may want to wrap it in some boilerplate HTML, providing a header, footer, borders, links, etc. Therefore, the HTML syntax we use to embed a chunk into the documentation is <embed src="..." type="x-codnar/template-name"/>. The templates are normal ERB templates, except for the magical file and image templates, described below.

At any rate, here is a simple test demonstrating applying different templates to the embedded code chunks:

test/weave_configurations.rb

require "codnar"
require "olag/test"
require "test/spec"

Test the built-in weave configurations.

class TestWeaveConfigurations < Test::Unit::TestCase

  include Test::WithErrors
  include Test::WithFakeFS

  def test_weave_file
    Codnar::Writer.write("chunks", {
      "locations" => [ "file" => "chunk" ], "containers" => [], "contained" => [],
      "name" => "Top", "html" => <<-EOF.unindent,
        <h1>Top</h1>
        <embed src="path" type="x-codnar/file"/>
      EOF
    })
    write_fake_file("path", "<h2>File</h2>\n")
    html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_INCLUDE).weave("include", "top")
    @errors.should == []
    html.should == <<-EOF.unindent
      <h1>Top</h1>
      <h2>File</h2>
    EOF
  end

  def test_weave_include
    Codnar::Writer.write("chunks", chunks("include"))
    html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_INCLUDE).weave("include", "top")
    @errors.should == []
    html.should == <<-EOF.unindent #! ((( html

      <h1>Top</h1>
      <h2>Intermediate</h2>
      <h3>Bottom</h3>
    EOF

    #! ))) html
  end

  WOVEN_PLAIN_CHUNK = <<-EOF.unindent #! ((( html

    <div class="plain chunk">
    <a name="top"/>
    <h1>Top</h1>
    <div class="plain chunk">
    <a name="intermediate"/>
    <h2>Intermediate</h2>
    <div class="plain chunk">
    <a name="bottom"/>
    <h3>Bottom</h3>
    </div>
    </div>
    </div>
  EOF

  #! ))) html

  def test_weave_plain_chunk
    Codnar::Writer.write("chunks", chunks("plain_chunk"))
    html = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_PLAIN_CHUNK).weave("plain_chunk", "top")
    @errors.should == []
    html.should == WOVEN_PLAIN_CHUNK
  end

Normally, one does not nest named_chunk_with_containers chunks this way, but it serves as a test.

  WOVEN_NAMED_CHUNK = <<-EOF.unindent #! ((( html

    <div class="named_with_containers chunk">
    <div class="chunk name">
    <a name="top">
    <span>Top</span>
    </a>
    </div>
    <div class="chunk html">
    <h1>Top</h1>
    <div class="named_with_containers chunk">
    <div class="chunk name">
    <a name="intermediate">
    <span>Intermediate</span>
    </a>
    </div>
    <div class="chunk html">
    <h2>Intermediate</h2>
    <div class="named_with_containers chunk">
    <div class="chunk name">
    <a name="bottom">
    <span>BOTTOM</span>
    </a>
    </div>
    <div class="chunk html">
    <h3>Bottom</h3>
    </div>
    <div class="chunk containers">
    <span class="chunk containers header">Contained in:</span>
    <ul class="chunk containers">
    <li class="chunk container">
    <a class="chunk container" href="#intermediate">Intermediate</a>
    </li>
    </ul>
    </div>
    </div>
    </div>
    <div class="chunk containers">
    <span class="chunk containers header">Contained in:</span>
    <ul class="chunk containers">
    <li class="chunk container">
    <a class="chunk container" href="#top">Top</a>
    </li>
    </ul>
    </div>
    </div>
    </div>
    </div>
  EOF

  #! ))) html

  def test_weave_named_chunk_with_containers
    Codnar::Writer.write("chunks", chunks("named_chunk_with_containers"))
    weaver = Codnar::Weaver.new(@errors, [ "chunks" ], Codnar::Configuration::WEAVE_NAMED_CHUNK_WITH_CONTAINERS)
    html = weaver.weave("named_chunk_with_containers", "top")
    @errors.should == []
    html.should == WOVEN_NAMED_CHUNK
  end

protected

  def chunks(template)
    return [
      { "locations" => [ "file" => "chunk" ], "containers" => [ "Intermediate" ], "contained" => [],
        "name" => "BOTTOM", "html" => "<h3>Bottom</h3>\n", },
      { "locations" => [ "file" => "chunk" ], "containers" => [ "Top" ], "contained" => [ "BOTTOM" ],
        "name" => "Intermediate", "html" => <<-EOF.unindent, #! ((( html

          <h2>Intermediate</h2>
          <embed type='x-codnar/#{template}' src='bottom'>
          </embed>
        EOF

      }, { #! ))) html
        "locations" => [ "file" => "chunk" ], "containers" => [], "contained" => [ "Intermediate" ],
        "name" => "Top", "html" => <<-EOF.unindent, #! ((( html

          <h1>Top</h1>
          <embed src="##INTERMEDIATE" type="x-codnar/#{template}"/>
        EOF

    } ] #! ))) html
  end

end

Here is the implementation:

lib/codnar/weaver.rb

module Codnar

Weave all chunks to a unified HTML.

  class Weaver < Reader

Load all chunks from the specified disk files to memory for weaving using the specified templates.

    def initialize(errors, paths, templates)
      super(errors, paths)
      @templates = templates
    end

How to process each magical file template.

    FILE_TEMPLATE_PROCESSORS = {
      "file" => lambda { |name, data| data },
      "image" => lambda { |name, data| Weaver.embedded_base64_img_tag(name, data) },
    }

Weave the HTML for a named chunk.

    def weave(template, chunk_name = @root_chunk)
      return process_file_template(template, chunk_name) if FILE_TEMPLATE_PROCESSORS.include?(template)
      @last_chunk = chunk = self[chunk_name.to_id]
      expand_chunk_html(chunk)
      return process_template(chunk, template)
    end

  protected

Due to github.com/relevance/rcov/issues/#issue/43 the following regular expressions must be on a single line.

Detect embedded chunks (type before src).

    TYPE_SRC_CHUNK = / [ ]* <embed \s+ type = ['\"] x-codnar\/ (.*?) ['\"] \s+ src = ['\"] \#* (.*?) ['\"] \s* (?: \/> | > \s* <\/embed> ) [ ]* /x

Detect embedded chunks (src before type).

    SRC_TYPE_CHUNK = / [ ]* <embed \s+ src = ['\"] \#* (.*?) ['\"] \s+ type = ['\"] x-codnar\/ (.*?) ['\"] \s* (?: \/> | > \s* <\/embed> ) [ ]* /x

Recursively expand all embedded chunks inside a container chunk.

    def expand_chunk_html(chunk)
      html = chunk.html
      @errors.push("No HTML in chunk: #{chunk.name} #{Weaver.locations_message(chunk)}") unless html
      #! TRICKY: All "container" chunks are assumed to be whole-file chunks with
      #! a single location. Which makes sense as these are documentation and not
      #! code chunks. TODO: It would be nice to know the exact line number of
      #! the chunk embedding directive for better pinpointing of any error.
      @errors.in_path(chunk.locations[0].file) do
        chunk.expanded_html ||= expand_embedded_chunks(html || "").chomp
      end
    end

Recursively expand_embedded_chunks all embedded chunk inside an HTML.

    def expand_embedded_chunks(html)
      return html.gsub(TYPE_SRC_CHUNK) { |match| weave($1, $2).chomp } \
                 .gsub(SRC_TYPE_CHUNK) { |match| weave($2, $1).chomp }
    end

Process the chunk using an ERB template prior to inclusion in container chunk.

    def process_template(chunk, template_name)
      template_text = @templates[template_name] ||= (
        @errors << "Missing ERB template: #{template_name}"
        "<%= chunk.expanded_html %>\n"
      )
      return (
        (
          chunk.erb ||= {}
        )[template_name] ||= ERB.new(template_text, nil, "%")
      ).result(binding)
    end

    Processing the file template

    Processing Base64 embedded data images


  end

end

And here are the pre-defined weaving template configurations:

lib/codnar/weave_configurations.rb

module Codnar

  module Configuration

Weave configuration providing a single simple include template.

    WEAVE_INCLUDE = { "include" => "<%= chunk.expanded_html %>\n" }

Weave chunks in the plainest possible way.

    WEAVE_PLAIN_CHUNK = {
      "plain_chunk" => <<-EOF.unindent, #! ((( html

        <div class="plain chunk">
        <a name="<%= chunk.name.to_id %>"/>
        <%= chunk.expanded_html %>
        </div>
      EOF

    } #! ))) html

Weave chunks with their name and the list of container chunks.

    WEAVE_NAMED_CHUNK_WITH_CONTAINERS = {
      "named_chunk_with_containers" => <<-EOF.unindent, #! ((( html

        <div class="named_with_containers chunk">
        <div class="chunk name">
        <a name="<%= chunk.name.to_id %>">
        <span><%= CGI.escapeHTML(chunk.name) %></span>
        </a>
        </div>
        <div class="chunk html">
        <%= chunk.expanded_html %>
        </div>
        % if chunk.containers != []
        <div class="chunk containers">
        <span class="chunk containers header">Contained in:</span>
        <ul class="chunk containers">
        % chunk.containers.each do |container|
        <li class="chunk container">
        <a class="chunk container" href="#<%= container.to_id %>"><%= CGI.escapeHTML(container) %></a>
        </li>
        % end
        </ul>
        </div>
        % end
        </div>
      EOF

    } #! ))) html

  end

end

Embedding files

The template named file is special in two ways. First, the src is given special treatment. If it begins with a ".", it is assumed to be a normal path name relative to the current working directory; otherwise, it is assumed to be a name of a file packaged inside some gem and is searched for in Ruby's $LOAD_PATH. This allows gems (such as Codnar itself) to provide such files to be used in the woven documentation.

Second, the content of the file is simply embedded into the generated documentation. This allows the documentation to be a stand-alone file, including all the CSS and Javascript required for proper display.

Processing the file template

Process one of the magical file templates. The content of the file, optionally processed, is directly embedded into the generated documentation. If the file’s path begins with “.”, it is taken to be relative to the current working directory. Otherwise, it is searched for in Ruby’s load path, allowing easy access to files packaged inside gems.

def process_file_template(template, path)
  begin
    path = Olag::DataFiles.expand_path(path) unless path[0,1] == "."
    return FILE_TEMPLATE_PROCESSORS[template].call(path, File.read(path))
  rescue Exception => exception
    @errors.push("#{$0}: Reading file: #{path} exception: #{exception} #{Reader.locations_message(@last_chunk)}") \
      if @last_chunk
    return "FILE: #{path} EXCEPTION: #{exception}"
  end
end

lib/codnar/weaver.rb

See the doc/root.html file for plenty of examples of using this functionality.

Embedding images

The image template is a specialization of the file template for dealing with embedded images. The specified image file is embedded into the generated HTML as an img tag, using a data URL. This is very useful for small images, but is problematic when their size increase beyond browser-specific limits.

Here is a simple test demonstrating processing embedded image files:

test/embed_images.rb

require "codnar"
require "test/spec"

Test computing embedded image HTML tags.

class TestEmbedImages < Test::Unit::TestCase

  def test_embed_image
    Codnar::Weaver.embedded_base64_img_tag('fake file.png', 'fake file content').should \
      == "<img src='data:image/png;base64,ZmFrZSBmaWxlIGNvbnRlbnQ=\n'/>"
  end

end

Here is the implementation:

Processing Base64 embedded data images

Create an img tag with an embedded data URL. Different browsers have different constraints about the size of the resulting URL, so YMMV.

def self.embedded_base64_img_tag(name, data)
  extension = File.extname(name).sub(".", "/")
  return "<img src='data:image#{extension};base64," \
       + Base64.encode64(data) \
       + "'/>"
end

lib/codnar/weaver.rb

And here is a sample embedded image:

Invoking the functionality

There are two ways to invoke Codnar's functionality - from the command line, and (for Ruby projects) as integrated Rake tasks.

Command Line Applications

Executable scripts (tests, command-line applications) start with a require 'codnar' line to access to the full Codnar code. This also serves as a convenient list of all of Codnar's parts and dependencies:

lib/codnar.rb

require "andand"
require "base64"
require "cgi"
require "coderay"
require "digest/sha2"
require "erb"
require "fileutils"
require "irb"
require "open3"
require "rdiscount"
require "rdoc"
require "rdoc/markup/to_html"
require "tempfile"
require "yaml"

require "olag/application"
require "olag/data_files"
require "olag/errors"
require "olag/hash_struct"
require "olag/string_unindent"

require "codnar/version"

require "codnar/coderay"
require "codnar/haddock"
require "codnar/hash_extensions"
require "codnar/markdown"
require "codnar/rdoc"
require "codnar/string_extensions"

require "codnar/application"
require "codnar/cache"
require "codnar/formatter"
require "codnar/graphviz"
require "codnar/grouper"
require "codnar/gvim"
require "codnar/merger"
require "codnar/split"
require "codnar/reader"
require "codnar/scanner"
require "codnar/configuration/code"
require "codnar/configuration/comments"
require "codnar/configuration/documentation"
require "codnar/configuration/highlighting"
require "codnar/split_configurations"
require "codnar/splitter"
require "codnar/sunlight"
require "codnar/weave"
require "codnar/weave_configurations"
require "codnar/weaver"
require "codnar/writer"

The base command line Application class handles execution from the command line, with the usual standard options, as well as some Codnar-specific ones: the ability to specify configuration files and/or built-in configurations, and the ability to include additional extension code triggered from these configurations. Together, these allow configuring and extending Codnar's behavior to cover the specific system's needs.

Here is a simple test demonstrating the standard Codnar application behavior:

test/run_application.rb

require "codnar"
require "olag/test"
require "test/spec"

module Codnar

Test running a Codnar Application.

  class TestRunApplication < Test::Unit::TestCase

    include Test::WithFakeFS
    include Test::WithTempfile

    def test_print_version
      Codnar::Application.with_argv(%w(-o nested/stdout -v -h dummy)) { Codnar::Application.new(true).run }.should == 0
      File.read("nested/stdout").should == "#{$0}: Version: #{Codnar::VERSION}\n"
    end

    def test_print_help
      Codnar::Application.with_argv(%w(-o stdout -h -v dummy)) { Codnar::Application.new(true).run }.should == 0
      File.read("stdout").should.include?("OPTIONS")
    end

    USER_CONFIGURATION = {
      "formatters" => {
        "doc" => "Formatter.lines_to_pre_html(lines, :class => :pre)",
      }
    }

    def test_merge_configurations
      write_fake_file("user_configuration.yaml", USER_CONFIGURATION.to_yaml)
      Codnar::Application.with_argv(%w(-o stdout -c split_pre_documentation -c user_configuration.yaml -p)) { Codnar::Application.new(true).run }.should == 0
      YAML.load_file("stdout").should == Codnar::Configuration::SPLIT_PRE_DOCUMENTATION.deep_merge(USER_CONFIGURATION)
    end

    def test_require_missing_configuration
      status = Application.with_argv(%w(-e stderr -c no-such-configuration)) { Codnar::Application.new(true).run }.should == 1
      File.read("stderr").should \
        == "#{$0}: Configuration: no-such-configuration is neither a disk file nor a known configuration\n"
    end

    def test_require_module
      FakeFS.deactivate! # The additional_module is read by Ruby and is not affected by FakeFS.
      directory = create_tempdir
      write_fake_file(directory + "/additional_module.rb", "puts 'HERE'\n")
      Application.with_argv(["-o", stdout = directory + "/stdout", "-I", directory, "-r", "additional_module" ]) { Codnar::Application.new(true).run }.should == 0
      File.read(stdout).should == "HERE\n"
    end

    def test_require_missing_module
      Application.with_argv(%w(-e stderr -I support -r no_such_module)) { Codnar::Application.new(true).run }.should == 1
      File.read("stderr").should == "#{$0}: no such file to load -- no_such_module\n"
    end

  end

end

And here is the implementation:

lib/codnar/application.rb

module Codnar

Base class for Codnar applications.

  class Application < Olag::Application

Create a Codnar application.

    def initialize(is_test = nil)
      super(is_test)
      @configuration ||= {}
    end

Run the Codnar application, returning its status.

    def run(&block)
      super(@configuration, &block)
    end

  protected

Define Codnar application flags.

    def define_flags
      super
      define_include_flag
      define_require_flag
      define_merge_flag
      define_print_flag
    end

Return the application’s version - that is, Codnar’s version.

    def version
      return Codnar::VERSION
    end

Define a flag for collecting module load path directories.

    def define_include_flag
      @options.on("-I", "--include DIRECTORY", String, "Add directory to Ruby's load path.") do |path|
        $LOAD_PATH.unshift(path)
      end
    end

Define a flag for loading a Ruby module. This may be needed for user-specified configurations to work.

    def define_require_flag
      @options.on("-r", "--require MODULE", String, "Load a Ruby module for user configurations.") do |path|
        begin
          require(path)
        rescue Exception => exception
          $stderr.puts("#{$0}: #{exception}")
          exit(1)
        end
      end
    end

Define a flag for applying (merging) a Codnar configuration.

    def define_merge_flag
      @options.on("-c", "--configuration NAME-or-FILE", String, "Apply a named or disk file configuration.") do |name_or_path|
        loaded_configuration = load_configuration(name_or_path)
        @configuration = @configuration.deep_merge(loaded_configuration)
      end
    end

Define a flag for printing the (merged) Codnar configuration.

    def define_print_flag
      @options.on("-p", "--print", "Print the merged configuration.") do |name_or_path|
        puts(@configuration.to_yaml)
      end
    end

Load a configuration either from the available builtin data or from a disk file.

    def load_configuration(name_or_path)
      return YAML.load_file(name_or_path) if File.exist?(name_or_path)
      name, *arguments = name_or_path.split(':')
      value = configuration_value(name)
      value = value.call(*arguments) unless Hash === value
      return value
    end

Compute the value of a named built-in configuration.

    def configuration_value(name)
      begin
        value = Configuration.const_get(name.upcase)
        return value if value
      rescue
        value = nil
      end
      $stderr.puts("#{$0}: Configuration: #{name} is neither a disk file nor a known configuration")
      exit(1)
    end

  end

end

Application for splitting files

Here is a simple test demonstrating invoking the command-line application for splitting files:

test/run_split.rb

require "codnar"
require "olag/test"
require "test/spec"

Test running the Split Codnar Application.

class TestRunSplit < Test::Unit::TestCase

  include Test::WithFakeFS

  def test_print_help
    Codnar::Application.with_argv(%w(-o stdout -h)) { Codnar::Split.new(true).run }.should == 0
    help = File.read("stdout")
    [ "codnar-split", "OPTIONS", "DESCRIPTION" ].each { |text| help.should.include?(text) }
  end

  def test_run_split
    write_fake_file("input", "<foo>\n")
    Codnar::Application.with_argv(%w(-o stdout input)) { Codnar::Split.new(true).run }.should == 0
    YAML.load_file("stdout").should == [ {
      "name" => "input",
      "locations" => [ { "file" => "input", "line" => 1 } ],
      "html" => "<foo>",
      "containers" => [],
      "contained" => [],
    } ]
  end

end

Here is the implementation:

lib/codnar/split.rb

module Codnar

Split application.

  class Split < Application

Run the weaving Codnar application, returning its status.

    def run
      super { split }
    end

  protected

Split the specified input file into chunks.

    def split
      @configuration = Codnar::Configuration::SPLIT_HTML_DOCUMENTATION if @configuration == {}
      splitter = Splitter.new(@errors, @configuration)
      print(splitter.chunks(ARGV[0]).to_yaml)
    end

Parse remaining command-line file arguments.

    def parse_arguments
      expect_exactly(1, "files to split")
    end

Return the banner line of the help message.

    def banner
      return "codnar-split - Split documentation or code files to chunks."
    end

Return the name and description of any final command-line file arguments.

    def arguments
      return "FILE", "Documentation or code file to split."
    end

Return a short description of the program.

    def description
      return <<-EOF.unindent
        Split the documentation of file into chunks that are printed in YAML format to
        the output (to be read by codnar-weave). Many file formats can be split
        depending on the specified configuration. The default configuration is called
        SPLIT_HTML_DOCUMENTATION, and it preserves the whole file as a single formatted
        HTML documentation chunk. This isn't very useful.

        The configuration needs to specify a set of line classification patterns,
        parsing states and pattern-based transitions between them, the initial state,
        and expressions for formatting classified lines to HTML. See the Codnar
        documentation for details.
      EOF
    end

  end

end

And here is the actual command-line application script:

bin/codnar-split

#!/usr/bin/ruby -w

require "codnar"

exit Codnar::Split.new.run

Application for weaving chunks

Here is a simple test demonstrating invoking the command-line application for weaving chunk to HTML:

test/run_weave.rb

require "codnar"
require "olag/test"
require "test/spec"

Test running the Weave Codnar Application.

class TestRunWeave < Test::Unit::TestCase

  include Test::WithFakeFS

  def test_print_help
    Codnar::Application.with_argv(%w(-o stdout -h)) { Codnar::Weave.new(true).run }.should == 0
    help = File.read("stdout")
    [ "codnar-weave", "OPTIONS", "DESCRIPTION" ].each { |text| help.should.include?(text) }
  end

  ROOT_CHUNKS = [ {
    "name" => "root",
    "locations" => [ { "file" => "root", "line" => 1 } ],
    "html" => "Root\n<embed src='included' type='x-codnar/include'/>\n"
  } ]

  INCLUDED_CHUNKS = [ {
    "name" => "included",
    "locations" => [ { "file" => "included", "line" => 1 } ],
    "html" => "Included"
  } ]

  def test_run_weave
    write_fake_file("root", ROOT_CHUNKS.to_yaml)
    write_fake_file("included", INCLUDED_CHUNKS.to_yaml)
    Codnar::Application.with_argv(%w(-o stdout root included)) { Codnar::Weave.new(true).run }.should == 0
    File.read("stdout").should == "Root\nIncluded\n"
  end

  def test_run_weave_missing_chunk
    write_fake_file("root", ROOT_CHUNKS.to_yaml)
    Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 1
    File.read("stderr").should == "#{$0}: Missing chunk: included in file: root\n"
  end

  def test_run_weave_unused_chunk
    write_fake_file("root", ROOT_CHUNKS.to_yaml)
    write_fake_file("included", INCLUDED_CHUNKS.to_yaml)
    Codnar::Application.with_argv(%w(-e stderr -o stdout included root)) { Codnar::Weave.new(true).run }.should == 1
    File.read("stderr").should == "#{$0}: Unused chunk: root in file: root at line: 1\n"
  end

  FILE_CHUNKS = [ {
    "name" => "root",
    "locations" => [ { "file" => "root", "line" => 1 } ],
    "html" => "Root\n<embed src='included.file' type='x-codnar/file'/>\n"
  } ]

  def test_run_weave_missing_file
    write_fake_file("root", FILE_CHUNKS.to_yaml)
    Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 1
    File.read("stdout").should == "Root\nFILE: included.file EXCEPTION: No such file or directory - included.file\n"
    File.read("stderr").should \
      == "#{$0}: Reading file: included.file exception: No such file or directory - included.file in file: root at line: 1\n"
  end

  def test_run_weave_existing_file
    write_fake_file("root", FILE_CHUNKS.to_yaml)
    write_fake_file("included.file", "included file\n")
    Codnar::Application.with_argv(%w(-e stderr -o stdout root)) { Codnar::Weave.new(true).run }.should == 0
    File.read("stdout").should == "Root\nincluded file\n"
  end

end

Here is the implementation:

lib/codnar/weave.rb

module Codnar

Weave application.

  class Weave < Application

Run the weaving Codnar application, returning its status.

    def run
      super { weave }
    end

  protected

Weave all the chunks together to a single HTML.

    def weave
      @configuration = Codnar::Configuration::WEAVE_INCLUDE if @configuration == {}
      weaver = Weaver.new(@errors, ARGV, @configuration)
      puts(weaver.weave("include"))
      weaver.collect_unused_chunk_errors
    end

Parse remaining command-line file arguments.

    def parse_arguments
      expect_at_least(1, "chunk files to weave")
    end

Return the banner line of the help message.

    def banner
      return "codnar-weave - Weave documentation chunks to a single HTML."
    end

Return the name and description of any final command-line file arguments.

    def arguments
      return "MAIN-CHUNK ADDITIONAL-CHUNKS", "Chunk files to weave together."
    end

Return a short description of the program.

    def description
      print(<<-EOF.unindent)
        Weave chunks in all chunk files (from codnar-split) to a single HTML that is
        printed to the output. The first file is the main documentation file that is
        expected to include all the rest of the chunks via directives of the format:

          <embed src="chunk-name" type="x-codnar/template-name"></embed>

        Where the template-name is a key in the configuration, whose value is an ERB
        template for embedding the named chunk into the documentation.

        If no configuration is specified, the WEAVE_INCLUDE configuration is assumed.
        This configuration contains a single template named "include", which simply
        includes the named chunk into the generated HTML.
      EOF
    end

  end

end

And here is the actual command-line application script:

bin/codnar-weave

#!/usr/bin/ruby -w

require "codnar"

exit Codnar::Weave.new.run

Rake Integration

For Ruby projects (or any other project using Rake), it is also possible to invoke Codnar using Rake tasks. Here is a simple test demonstrating using the Rake tasks:

test/rake_tasks.rb

require "codnar/rake"
require "olag/test"
require "test/spec"

Test rake tasks.

class TestRakeTasks < Test::Unit::TestCase

  include Test::WithFakeFS
  include Test::WithRake

  def test_default
    run_rake
    test_results
  end

protected

  def run_rake
    write_fake_file("foo", "foo\n")
    Codnar::Rake::SplitTask.new([ "foo" ], [])
    Codnar::Rake::WeaveTask.new("foo", [])
    @rake["codnar"].invoke
  end

  def test_results
    chunk_file = Codnar::Rake.chunks_dir + "/foo"
    YAML.load_file(chunk_file).should == [ {
      "html" => "foo",
      "name" => "foo",
      "locations" => [ { "file" => "foo", "line" => 1 } ],
      "containers" => [],
      "contained" => [],
    } ]
    File.read("codnar.html").should == "foo\n"
    Codnar::Rake.chunk_files.should == [ chunk_file ]
  end

end

To use these tasks in a Rakefile, one needs to require 'codnar/rake'. The code implements a singleton that holds the global state shared between tasks:

lib/codnar/rake.rb

require "rake"
require "rake/tasklib"

require "codnar"
require "codnar/rake/split_task"
require "codnar/rake/weave_task"

module Codnar

This module contains all the Codnar Rake tasks code.

  module Rake

    class << self

The root folder to store all chunk files under.

      attr_accessor :chunks_dir

The list of split chunk files for later weaving.

      attr_accessor :chunk_files

    end

    Rake.chunk_files = []
    Rake.chunks_dir = "chunks"

Compute options for invoking an application.

    def self.application_options(output, configurations)
      options = [ "-o", output ]
      options += configurations.map { |configuration| [ "-c", configuration.to_s ] }.flatten
      return options
    end

Return the list of actual configuration files (as opposed to names of built-in configurations) for use as dependencies.

    def self.configuration_files(configurations)
      return configurations.find_all { |configuration| File.exists?(configuration.to_s) }
    end

  end

end

Task for splitting files

To split one or more files to chunks, create a new SplitTask. Multiple such tasks may be created; this is required if different files need to be split using different configurations.

lib/codnar/rake/split_task.rb

module Codnar

  module Rake

A Rake task for splitting source files to chunks.

    class SplitTask < ::Rake::TaskLib

Create a new Rake task for splitting source files to chunks. Each of the specified disk files is split using the specified set of configurations.

      def initialize(paths, configurations)
        @configurations = configurations
        paths.each do |path|
          define_tasks(path)
        end
      end

    protected

Define the tasks for splitting a single source file to chunks.

      def define_tasks(path)
        output = Rake.chunks_dir + "/" + path
        define_split_file_task(path, output)
        SplitTask.define_common_tasks
        SplitTask.connect_common_tasks(output)
      end

Define the actual task for splitting the source file.

      def define_split_file_task(path, output)
        ::Rake::FileTask.define_task(output => [ path ] + Rake.configuration_files(@configurations)) do
          run_split_application(path, output)
        end
      end

Run the Split application for a single source file.

      def run_split_application(path, output)
        options = Rake.application_options(output, @configurations)
        options << path
        status = Application.with_argv(options) { Split.new.run }
        raise "Codnar split errors" unless status == 0
      end

Define common Rake split tasks. This method may be invoked several times, only the first invocation actually defined the tasks. The common tasks are codnar_split (for splitting all the source files) and clean_codnar (for getting rid of the chunks directory).

      def self.define_common_tasks
        @defined_common_tasks ||= SplitTask.create_common_tasks
      end

Actually create common Rake split tasks.

      def self.create_common_tasks
        desc "Split all files into chunks"
        ::Rake::Task.define_task("codnar_split")
        desc "Clean all split chunks"
        ::Rake::Task.define_task("clean_codnar") { FileUtils.rm_rf(Rake.chunks_dir) }
        ::Rake::Task.define_task(:clean => "clean_codnar")
      end

For some reason, include ::Rake::DSL doesn’t give us this and life is too short…

      def self.desc(description)
        ::Rake.application.last_description = description
      end

Connect the task for splitting a single source file to the common task of splitting all source files.

      def self.connect_common_tasks(output)
        ::Rake::Task.define_task("codnar_split" => output)
        Rake::chunk_files << output
      end

    end

  end

end

Task for weaving chunks

To weave the chunks together, create a single WeaveTask.

lib/codnar/rake/weave_task.rb

module Codnar

  module Rake

A Rake task for weaving chunks to a single HTML.

    class WeaveTask < ::Rake::TaskLib

Create a Rake task for weaving chunks to a single HTML. The root source file is expected to embed all the chunks into the output HTML. The chunks are loaded from the results of all the previous created SplitTask-s.

      def initialize(root, configurations, output = "codnar.html")
        @root = Rake.chunks_dir + "/" + root
        @output = output
        @configurations = configurations
        define_tasks
      end

    protected

Define the tasks for weaving the chunks to a single HTML.

      def define_tasks
        define_weave_task
        connect_common_tasks
      end

Define the actual task for weaving the chunks to a single HTML.

      def define_weave_task
        desc "Weave chunks into HTML" unless ::Rake.application.last_comment
        ::Rake::Task.define_task("codnar_weave" => @output)
        ::Rake::FileTask.define_task(@output => Rake.chunk_files + Rake.configuration_files(@configurations)) do
          run_weave_application
        end
      end

Run the Weave application for a single source file.

      def run_weave_application
        options = Rake.application_options(@output, @configurations)
        options << @root
        options += Rake.chunk_files.reject { |chunk| chunk == @root }
        status = Application.with_argv(options) { Weave.new.run }
        raise "Codnar weave errors" unless status == 0
      end

Connect the task for cleaning up after weaving (clobber_codnar) to the common task of cleaning up everything (clobber).

      def connect_common_tasks
        desc "Build the code narrative HTML"
        ::Rake::Task.define_task(:codnar => "codnar_weave")
        desc "Remove woven HTML documentation"
        ::Rake::Task.define_task("clobber_codnar") { rm_rf(@output) }
        ::Rake::Task.define_task(:clobber => "clobber_codnar")
      end

    end

  end

end

Building the Codnar gem

The following Rakefile is in charge of building the gem, with the help of some tools described below.

Rakefile

$LOAD_PATH.unshift(File.dirname(__FILE__) + "/lib")

require "olag/rake"

Codnar configurations


spec = Gem::Specification.new do |spec|
  spec.name = "codnar"
  spec.version = Codnar::VERSION
  spec.title = "Code Narrator"
  spec.author = "Oren Ben-Kiki"
  spec.email = "rubygems-oren@ben-kiki.org"
  spec.homepage = "https://rubygems.org/gems/codnar"
  spec.summary = "Code narrator - an inverse literate programming tool."
  spec.description = (<<-EOF).gsub(/^\s+/, "").chomp.gsub("\n", " ")
    Code Narrator (Codnar) is an inverse literate programming tool. It splits the
    source files into "chunks" (including structured comments) and weaves them back
    into a narrative that describes the overall system.
  EOF
  spec.add_dependency("andand")
  spec.add_dependency("coderay")
  spec.add_dependency("rdiscount")
end

Olag::Rake.new(spec)

The generated HTML requires some tweaking to yield aesthetic, readable results. This tweaking consists of using Javascript to control chunk visibility, generating a table of content, and using CSS to make the HTML look better.

Here are the modified configurations for generating the correct HTML:

Codnar configurations

Override the default Codnar configurations.

Olag::Rake::CODNAR_CONFIGURATIONS.unshift([

Exclude the data files and images from the generated documentation.

  "lib/codnar/data/.*/.*|.*\.png",
], [

Tests should not have chunks detected in them. They may however contain HTML islands.

  "test/.*\.rb",
  "classify_source_code:ruby",
  "format_code_gvim_css:ruby",
  "classify_nested_code:ruby:html",
  "classify_nested_code:ruby:dot",
  "classify_nested_code:ruby:svg",
  "format_code_gvim_css:html",
  "format_code_gvim_css:dot",
  "format_code_gvim_css:svg",
  "classify_shell_comments",
  "format_rdoc_comments",
], [

Ruby sources contain HTML islands.

  "Rakefile|.*\.rb|bin/.*",
  "classify_source_code:ruby",
  "format_code_gvim_css:ruby",
  "classify_nested_code:ruby:html",
  "format_code_gvim_css:html",
  "classify_shell_comments",
  "format_rdoc_comments",
  "chunk_by_vim_regions",
], [

We also have Javascript sources.

  ".*\.js",
  "classify_source_code:javascript",
  "format_code_gvim_css:javascript",
  "classify_c_comments",
  "format_markdown_comments"
], [

We also have CSS sources.

  ".*\.css",
  "classify_source_code:css",
  "format_code_gvim_css:css",
  "classify_c_comments",
  "format_markdown_comments"
])

Rakefile

Javascript chunk visibilty control

The following code injects visibility controls ("+"/"-" toggles) next to each embedded code chunk. It also hides all the chunks by default; this increases the readability of the overall narrative, turning it into a high-level summary. Expanding the embedded code chunks allows the reader to delve into the details.

lib/codnar/data/control_chunks.js

Quick-and-dirty JS for inserting a "+"/"-" control for chunk visibility next to each chunk's name. By default, all chunks are hidden.

function inject_chunk_controls() {
  var name_div;
  foreach_chunk_elements(function(div) {
    name_div = div;
  }, function(html_div) {
    var control_span = document.createElement("span");
    var hide = function() {
      control_span.innerHTML = "+";
      html_div.style.display = "none";
    }
    var show = function() {
      control_span.innerHTML = "&#8211;"; // Vertical bar.
      html_div.style.display = "block";
    }
    name_div.onclick = function() {
      html_div.style.display == "block" ? hide() : show();
    }
    hide(); // Initializes html_div.style.display
    control_span.className = "control chunk";
    name_div.insertBefore(control_span, name_div.firstChild);
  })
}

Loop on all DIV elements that contain a chunk name, or that contain chunk HTML. Assumes that they come in pairs - name first, HTML second.

function foreach_chunk_elements(name_lambda, html_lambda) {
  var div_elements = document.getElementsByTagName("div");
  for (var e in div_elements) {
    var div = div_elements[e];
    classes = " " + div.className + " ";
    if (!/ chunk /.test(classes)) continue;
    if (/ name /.test(classes)) name_lambda(div);
    if (/ html /.test(classes)) html_lambda(div);
  }
}

Only invoke it after all helper functions are defined.

inject_chunk_controls();

Javascript table of content

The following code is not very efficient or elegant but it does a basic job of iunjecting a table of content into the generated HTML.

lib/codnar/data/contents.js

Quick-and-dirty JS for inserting a table of content inside a DIV with the id "contents". The table of content is a series of nested UL and LI elements, prefixed with an H1 containing the text "0 Contents". This H1 comes in addition to the single static H1 expected by HTML best practices. It looks "right" and should not confuse search engines etc. since they do not execute Javascript code.

function inject_contents() {
  var contents = document.getElementById("contents");
  var lists = contents_lists();
  contents.appendChild(contents_header()); // TRICKY: Must be done after contents_lists().
  contents.appendChild(lists);
}

Create a table of contents H1.

function contents_header() {
  var h = document.createElement("h1");
  var text = document.createTextNode("Contents");
  h.appendChild(text);
  return h;
}

Create nested UL/LI lists for the table of content.

function contents_lists() {
  var container;
  var indices = [];
  var h_elements = all_h_elements();

Using "for (var e in h_elements)" is too sensitive to other libraries

  for (var e = 0; e < h_elements.length; e++) {
    h = h_elements[e];
    var level = h.tagName.substring(1, 2) - 1;
    container = pop_container(container, indices, level);
    container = push_container(container, indices, level);
    var id = indices.join(".");
    container.appendChild(list_element(id, h));
    h.insertBefore(header_anchor(id), h.firstChild);
  }
  return pop_container(container, indices, 1);
}

Get a list of all H elements in the DOM. We skip the single H1 element; otherwise it would just have the index "1" which would be prefixed to all other headers.

function all_h_elements() {
  var elements = document.getElementsByTagName("*");
  var h_elements = [];
  for (var e in elements) {
    var h = elements[e];
    if (/^h[2-9]$/i.test(h.tagName)) h_elements.push(h);
  }
  return h_elements;
}

Pop indices (and UL containers) until reaching up to a given level.

function pop_container(container, indices, level) {
  while (indices.length > level) {
    container = container.parentNode;
    indices.pop();
  }
  return container;
}

Push indices (and UL containers) until reaching doen to a given level.

function push_container(container, indices, level) {
  while (indices.length < level) {
    // TRICKY: push a 0 for the very last new level, so the ++ at the end
    // will turn it into a 1.
    indices.push(indices.level < level - 1);
    var ul = document.createElement("ul");
    if (container) {
      container.appendChild(ul);
    }
    container = ul;
  }
  indices[indices.length - 1]++;
  return container;
}

Create a LI for an H element with some id.

function list_element(id, h) {
  var a = document.createElement("a");
  a.href = "#" + id;
  a.innerHTML = id + "&nbsp;" + h.innerHTML;
  var li = document.createElement("li");
  li.appendChild(a);
  return li;
}

Create an anchor for an H element with some id.

function header_anchor(id) {
  var text = document.createTextNode(id + " ");
  var a = document.createElement("a");
  a.id = id;
  a.appendChild(text);
  return a;
}

Only invoke it after all helper functions are defined.

inject_contents();

CSS style

To avoid dealing with the different default styles used by different browsers, we employ the YUI CSS reset and base files. Resetting and restoring the default CSS styles is inelegant, but it is the only current way to get a consistent presentation of HTML. Once this is out of the way, we apply styles specific to our HTML. Some of these override the default styles established by the base CSS file above. We do this instead of directly tweaking the base CSS file, to allow easy upgrade to new versions if/when YUI release any.

lib/codnar/data/style.css

Margin & Padding


div.chunk.name,
div.chunk.html,
div.chunk.containers,
div.chunk table,
div.chunk td,
div.chunk pre {
  margin: 0;
  padding: 0;
}
div.chunk *:last-child {
  margin-bottom: 0;
}
h4, h5, h6,
div.chunk,
div.comment pre {
  margin: 1em 0;
}
pre,
div.comment,
div.chunk.html {
  padding: 0.33em;
}

span.control.chunk {
  padding-left: 0.25em;
  padding-right: 0.25em;
}

Table of content


div#contents ul {
  margin-top: 0;
  margin-bottom: 0;
  padding: 0;
}

div#contents li {
  list-style-type: none;
}

Lists


ul.chunk.containers {
  padding: 0;
  margin: 0;
  display: inline;
}
ul.chunk.containers li {
  display: inline;
  list-style-type: none;
}

Borders


pre,
span.control.chunk,
div.chunk.html {
  border: 1px solid #000;
}

table.layout td.indentation,
div.chunk pre {
  border: none;
}

Colors


span.control.chunk,
table.layout td.html {
  background-color: Beige;
}

Fonts


body {
  font-family: Sans-Serif;
}
pre {
  font-family: Consolas, Inconsolata, Monaco, "Courier New", Monospace;
}
div.chunk.name {
  font-weight: bold;
}

Using Sunlight

When using Sunlight for syntax highlighting, we also need to include some CSS and Javascript files to convert the classified pre elements into properly marked-up HTML. We also need to invoke this Javascript code (a one-line operations). Here is what such code might look like inside a Javascript block of the generated HTML:

Codnar

TL;DR

Description

Installation

Usage

The Story

The Documentation Problem

A Different Approach

Maintaining the Documentation

Code Narrator

Data Flow

Splitting files into chunks

Scanning Lines

Scanner Syntax Shorthands

Classifying Source Lines

Merging scanned lines to chunks

Merging nested chunk lines

Unindenting merged chunk lines

Generating chunk HTML

Grouping lines of the same kind

Formatting lines as HTML

Basic formatters

Markup formats

Generating diagrams using GraphViz

Syntax highlighting using GVIM

Syntax highlighting using CodeRay

> :class</tt>).

Syntax highlighting using Sunlight

Putting it all together

Splitting code files

Splitting documentation files

Built-in configurations

Combining configurations

Documentation "splitting"

Source code lines classification

Classifying comment lines

Simple comment classification

Denoted comment classification

Delimited comment classification

Comment formatting

Syntax highlighting

Syntax highlighting using GVim

Syntax highlighting using CodeRay

Syntax highlighting using Sunlight

Chunk splitting

Putting it all together

Storing chunks on the disk

Writing chunks to disk

Reading chunks to memory

Chunk identifiers

In-memory chunks storage

Weaving chunks into HTML

Weaving chunks together

Embedding files

Embedding images

Invoking the functionality

Command Line Applications

Application for splitting files

Application for weaving chunks

Rake Integration

Task for splitting files

Task for weaving chunks

Building the Codnar gem

Javascript chunk visibilty control

Javascript table of content

CSS style

Using Sunlight