More AXIS Modules

Square Rooter

The questions below are due on Monday October 07, 2024; 11:59:00 PM.
 
You are not logged in.

Please Log In for full access to the web site.
Note that this link will take you to an external site (https://shimmer.mit.edu) to authenticate, and then you will be redirected back to this page.

OK you just build the thing that splits, squares, and sums our complex numbers. Now we need to take the square root of that value.

We're going to build two modules this week.

Square Rooting

We'd now like to write a module that is AXI-streaming compatible that calculates the incoming square root of a 32 bit number. You can assume it is unsigned. As for an algorithm, there's lots of different ways to calculate the square root. An easy way to do it that just involves some multiplies, adds, and shifts, is to just do it with a binary search.

When I search for "binary-search square root", Google's AI gives me this which should kinda make sense looking at it:

def binary_search_sqrt(x):
  """
  Finds the integer square root of x using binary search.
  """

  if x == 0:
    return 0

  left, right = 1, x

  while left <= right:
    mid = left + (right - left) // 2
    mid_squared = mid * mid

    if mid_squared == x:
      return mid
    elif mid_squared < x:
      left = mid + 1
    else:
      right = mid - 1

  return right  # Return the floor of the square root

x = 16
print(binary_search_sqrt(x))

We'll try to do the same thing, but we'll want to do it fully-pipelined so that we can calcluate one square root per clock cycle (albeit with a certain amount of latency).

An approach like this is simple and avoids any need of non-base-two division. It is however somewhat less efficient than other methods out there. For one, binary search only gives us one more bit of information per iteration (it can't do any more) compared to some other variants out there. The wikipedia article is great. If you do something like find the inverse square root, using Newton's Method, etc... or do a Cordic trig functions, you can often times converge more quickly. For simplicity, we'll just do binary search.

We want this module to be fully pipelined, however, so make sure whatever route you choose, you do it using a pipeline. One layer of logic should be doing one layer of the algorithm. Unlike a software implementation which has the freedom to run "until found" we need to address the worst-case-loop count right up front so we can lay out the correct number of stages. This can be determined pretty easily. If our input is 32 bits, that means our largest number is 32'hFFFF_FFFF. That will have a square root of roughly 16'hFFFF or thereabouts. Every other possible root we may need to find will be less than this value (and greater than or equal to 1). So we can use these two boundary values for our initial high/low values. With that asserted, our initial guess can then be 16'h7FFF. Then binary search can run on from there. Since we have 16 bits we need to figure out, we'll need (worst case) 16 layers of searching to narrow down to our answer. Values may get found before they reach the final stage, but since we're aiming for 100% throughput/pipelining, a early-breakout path like you'll sometimes see in some systems isn't needed here.

Lay it out and build it!

Much like the module from the previous page, this module should be "stateless" in its calculations, (unlike the FIR) so the data-stalling should be much less of an annoyance (if you end up calculating some invalid chunks of data, that's fine as long as the valids and lasts are pipelined properly along with your data.

Improving Our Scoreboard

In the process of building a square root calculator using a binary search approach, because of how things worked out, my final answer would occasionally end up being 1 off the "gold-standard" calculation from Python (such as int(value**0.5)). If this is something you are ok with for your purpose (and to be fair many times we're not actually finding the answer but an approximation), you may not want your Scoreboard to do an == type check on the values.

Below I made my own variant of the Scoreboard that I overrode the compare function in. Most of this code was lifted from the original definition, but the comparison check I made to be effectively a "at or within 1" check.


class SqrtScoreboard (Scoreboard):
  def compare(self, got, exp, log, strict_type=True):
    if abs(got-exp) <= 1: #change to whatever you want for the problem at hand.
      # Don't want to fail the test
      # if we're passed something without __len__
      try:
        log.debug("Received expected transaction %d bytes" %
                  (len(got)))
        log.debug(repr(got))
      except Exception:
        pass
    else:
      self.errors += 1
      # Try our best to print out something useful
      strgot, strexp = str(got), str(exp)
      log.error("Received transaction differed from expected output")
      log.info("Expected:\n" + repr(exp))
      log.info("Received:\n" + repr(got))
      if self._imm:
        assert False, (
          "Received transaction differed from expected "
          "transaction"
        )

This new scoreboard could then be used in your main checking infrastructure!

Build your binary-search square-root module and verify it is working!

Upload your testbench with working driver and monitor and scoreboard for the square-rooter module.
 No file selected

Upload your square-rooter module
 No file selected