2026.03.29 5 min

Rolling your own OFAC search

elixir rust ofac sanction screening compliance

When you do anything with USD payments, one of the first things your CCO will tell you to do is screen the names in a payment against the OFAC lists. Normally, your compliance team will just configure this as a rule in whatever third-party monitoring tool you're using and be done with it, but what if you want to roll your own?

Scope

Functional

The OFAC search website is a baseline that I wanted to replicate.

We take a name and an entity type (individual / legal entity, etc.) as input, choose a confidence score from 0 to 100 (100 being an exact match; in practice it's usually set around 80-85), and check whether the name matches any entry on the OFAC lists - the SDN (Specially Designated Nationals) list or the Non-SDN Consolidated list.


Non-functional

I want this to be reasonably fast - ideally under 200ms per search. I once tested a brute-force implementation of OFAC search in Node.js, and a single search of both the SDN & Consolidated lists took around 1.5-2s.

I also want this to live entirely within this site - which itself is written in Elixir. Now, Elixir is pretty comparable to Node.js in performance, so to go faster we'd have to go native. Elixir lets you define bindings to native code via NIFs, so I decided to offload the hottest paths to Rust.

How does OFAC search even work?

The FAQ provides some details:

Sanctions List Search then uses two matching logic algorithms, and two matching logic techniques to calculate the score. (...)

The first technique involves using the Jaro-Winkler algorithm to compare the entire name string entered against full name strings of potential match entries on OFAC's sanctions lists.

The second technique involves splitting the name string entered into multiple name parts (for example, John Doe would be split into two name parts). Each name part is then compared to name parts on all of OFAC's sanctions lists using the Jaro-Winkler and Soundex algorithms. The search calculates a score for each name part entered, and a composite score for all name parts entered.

Sanctions List Search uses both techniques each time the search is run and returns the higher of the two scores in the Score column.

FAQ 249. How is the Score calculated?

Ok, so we have a combination of fuzzy and phonetic matching. It mentions both the Jaro-Winkler and Soundex algorithms.
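Since the FAQ names Soundex, here's a minimal sketch of the phonetic half in Rust. It's a simplified American Soundex - it skips the standard's H/W adjacency rule - so treat it as illustrative rather than a compliant implementation:

```rust
// Simplified American Soundex: first letter kept, remaining consonants
// mapped to digit classes, adjacent duplicates collapsed, padded to 4.
// (The H/W adjacency rule of the full standard is omitted here.)
fn soundex(name: &str) -> String {
    let chars: Vec<char> = name
        .to_uppercase()
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .collect();
    if chars.is_empty() {
        return String::new();
    }
    // consonant classes; vowels and H/W/Y get no code
    fn code(c: char) -> Option<char> {
        match c {
            'B' | 'F' | 'P' | 'V' => Some('1'),
            'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => Some('2'),
            'D' | 'T' => Some('3'),
            'L' => Some('4'),
            'M' | 'N' => Some('5'),
            'R' => Some('6'),
            _ => None,
        }
    }
    let mut out = String::new();
    out.push(chars[0]); // keep the first letter as-is
    let mut prev = code(chars[0]);
    for &c in &chars[1..] {
        let d = code(c);
        if let Some(digit) = d {
            if prev != Some(digit) {
                out.push(digit);
                if out.len() == 4 {
                    break;
                }
            }
        }
        prev = d;
    }
    // pad with zeros to the canonical 4-character code
    while out.len() < 4 {
        out.push('0');
    }
    out
}
```

The point of Soundex is that different spellings of the same-sounding name collapse to the same code, which is exactly what you want for name screening.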

Unfortunately, diving deeper into the FAQs reveals that there was an additional algorithm change in 2021 that introduced a "new algorithm to the tool's fuzzy logic search functionality", and the exact details are unknown.

V1 - simple Jaro-Winkler

On the other hand, it's rare you'll come across a ticket that provides as much detail as the OFAC FAQ did, so I can't complain. Let's start with the first algorithm it mentions - Jaro-Winkler.

Preprocessing

Let's say we need to screen the extreme case - the esteemed "Dr. José García III" in his full glory. We need to clean up both the OFAC entries and the search input itself. To do that, I used the following pipeline:

  • lowercase -> "dr. josé garcía iii"
  • unicode normalization - Unicode lets you represent the same character in multiple ways; for example, é can be one codepoint or two (e + accent mark). So I applied NFKD normalization, which expands everything into its simplest pieces, so two strings that look identical can actually match in code, regardless of how they were encoded. "dr. josé garcía iii" gets its é decomposed into é → e + ◌́
  • diacritics - strip the accent marks and replace all the ą's and ę's, leaving the doctor as "dr. jose garcia iii"
  • transliteration - converts non-Latin scripts into ASCII equivalents, using the any_ascii crate. Dr. García is not affected by this, but we're prepared for Владимир → Vladimir
  • punctuation removal - "dr jose garcia iii"
  • title removal - leaving "jose garcia iii", no longer a dr.
  • suffix removal - "jose garcia", and no longer the 3rd
  • whitespace cleanup - a no-op here, but any stray spaces are trimmed and duplicates merged into one

So we've reduced the esteemed "Dr. José García III" to the less esteemed "jose garcia", but we'll make up for the lack of titles with ease of comparison.
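Sketched in Rust, the pipeline might look like this. The real normalization and transliteration steps need the unicode-normalization and any_ascii crates, so this standalone version fakes the accent stripping with a small hand-rolled table, and the title/suffix lists are illustrative, not the real ones:

```rust
// Hand-rolled accent stripping for a few common cases. The real pipeline
// uses NFKD normalization plus the any_ascii crate; this table is just
// enough for the example.
fn strip_accents(c: char) -> char {
    match c {
        'á' | 'à' | 'â' | 'ä' | 'ã' | 'ą' => 'a',
        'é' | 'è' | 'ê' | 'ë' | 'ę' => 'e',
        'í' | 'ì' | 'î' | 'ï' => 'i',
        'ó' | 'ò' | 'ô' | 'ö' | 'õ' => 'o',
        'ú' | 'ù' | 'û' | 'ü' => 'u',
        'ç' => 'c',
        other => other,
    }
}

// Illustrative title/suffix lists -- real ones would be longer.
const TITLES: &[&str] = &["dr", "mr", "mrs", "ms", "prof"];
const SUFFIXES: &[&str] = &["jr", "sr", "ii", "iii", "iv"];

fn preprocess(name: &str) -> String {
    let cleaned: String = name
        .to_lowercase()     // "Dr. José García III" -> "dr. josé garcía iii"
        .chars()
        .map(strip_accents) // -> "dr. jose garcia iii"
        .filter(|c| c.is_alphanumeric() || c.is_whitespace()) // drop punctuation
        .collect();
    cleaned
        .split_whitespace() // also merges duplicate whitespace
        .filter(|t| !TITLES.contains(t) && !SUFFIXES.contains(t))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Running `preprocess("Dr. José García III")` walks through every step above and lands on `"jose garcia"`.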

Jaro Winkler

The first fuzzy algorithm - Jaro-Winkler - compares two strings and gives us a score between 0 and 1 for how similar they are: 0 being completely different and 1 identical.

The base of that is the Jaro score - it looks at three things:

  1. How many characters match
  2. How many characters are out of order
  3. The lengths of both strings

The Winkler part of JW is a later modification that adds a "prefix bonus" - strings that share the same starting characters get a score boost. This makes the algorithm "front-biased": test vs xest scores lower than test vs tesx, even though each pair differs by just one character.

Same single-character error, different position: identical Jaro scores, but the Winkler boost only applies when the shared prefix matches. Here's a worked example for TARGET "sergei" vs QUERY "sergey":

  • match window: floor(max(6, 6) / 2) − 1 = 2 - each character can match up to 2 positions to the left or right
  • matches: 5 (s, e, r, g, e); the final "i" finds no "i" within the window; transpositions: 0
  • Jaro: (5/6 + 5/6 + (5 − 0)/5) / 3 = 0.8889
  • prefix: min(5, 4) = 4 - the shared prefix is 5 characters, but the boost caps it at 4
  • boost: 4 × 0.1 × (1 − 0.8889) = +0.0444
  • JW score: 0.8889 + 0.0444 = 0.9333
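To make the mechanics concrete, here's a minimal hand-rolled Jaro-Winkler in Rust. This is a sketch for illustration; for production you'd reach for a battle-tested implementation like the strsim crate's:

```rust
// Jaro similarity: counts matches within a sliding window, then
// penalizes transpositions among the matched characters.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() || b.is_empty() {
        return if a.is_empty() && b.is_empty() { 1.0 } else { 0.0 };
    }
    // each character may match up to `window` positions away
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut b_used = vec![false; b.len()];
    let mut a_matches = Vec::new();
    for (i, &ca) in a.iter().enumerate() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_used[j] && b[j] == ca {
                b_used[j] = true;
                a_matches.push(ca);
                break;
            }
        }
    }
    let m = a_matches.len() as f64;
    if m == 0.0 {
        return 0.0;
    }
    // transpositions: matched characters that appear in a different order
    let b_matches: Vec<char> = b
        .iter()
        .zip(&b_used)
        .filter(|&(_, &used)| used)
        .map(|(&c, _)| c)
        .collect();
    let t = a_matches
        .iter()
        .zip(&b_matches)
        .filter(|&(x, y)| x != y)
        .count() as f64
        / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    // prefix bonus: up to 4 shared leading characters, scaled by 0.1
    let prefix = a
        .chars()
        .zip(b.chars())
        .take_while(|(x, y)| x == y)
        .count()
        .min(4) as f64;
    j + prefix * 0.1 * (1.0 - j)
}
```

Note the front bias in action: `jaro_winkler("test", "tesx")` beats `jaro_winkler("test", "xest")` even though both pairs differ in a single character.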

Elixir has its own String.jaro_distance/2, but for now we'll go native and use jaro_winkler from the strsim crate. So we first take the list, run pre-processing on it, then run the JW comparison and return any entries scoring above the chosen threshold. Here's a demo of that:

DEMO: OFAC v1 — Plain Jaro-Winkler (interactive demo; searches the OFAC SDN & Consolidated lists)

Pretty good! You can find Vladimir Putin easily enough! It will also find "Vladim Ptin". And thanks to all our pre-processing steps, it also catches the Cyrillic spellings - "Владимир Путин" and "Владимiр Путин".

How about "Putin Vladimir"? Well, in this case we get a big fat 0, while the OFAC search website handles it just fine.

On the OFAC site, even "Putin" by itself produces multiple 100-point matches.

As stated in the FAQ, the OFAC website does something with name parts as well, not just full names - let's try to replicate that.

V2 - Token Jaro-Winkler

OFAC search compares each token in the name separately, then produces a "composite score" out of them. To replicate that, we can tokenize both the target and the query, and for each query token find the best JW match among the target tokens.

This should fix both the name-part reordering - tokens are matched individually, regardless of order ("Putin Vladimir" instead of "Vladimir Putin") - and partial queries ("putin" matches the "putin" token in "vladimir putin" directly).

Now all that's left is the "composite score". We'll do the simplest thing here: the final score will be max(full_string_jw, average_of_best_token_matches), so the full-string path from v1 is still there as a fallback. As naive as it gets. Here's the demo:

DEMO: OFAC v2 — Token Jaro-Winkler (interactive demo; searches the OFAC SDN & Consolidated lists)
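The v2 scoring can be sketched as below. `composite_score` is my name for it (the real OFAC composite formula isn't public), and the hand-rolled `jaro_winkler` stands in for strsim's so the block is self-contained:

```rust
// Minimal Jaro-Winkler, standing in for the strsim crate's version.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() || b.is_empty() {
        return if a.is_empty() && b.is_empty() { 1.0 } else { 0.0 };
    }
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut b_used = vec![false; b.len()];
    let mut a_matches = Vec::new();
    for (i, &ca) in a.iter().enumerate() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_used[j] && b[j] == ca {
                b_used[j] = true;
                a_matches.push(ca);
                break;
            }
        }
    }
    let m = a_matches.len() as f64;
    if m == 0.0 {
        return 0.0;
    }
    let b_matches: Vec<char> = b
        .iter()
        .zip(&b_used)
        .filter(|&(_, &used)| used)
        .map(|(&c, _)| c)
        .collect();
    let t = a_matches
        .iter()
        .zip(&b_matches)
        .filter(|&(x, y)| x != y)
        .count() as f64
        / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count().min(4) as f64;
    j + prefix * 0.1 * (1.0 - j)
}

// v2: max of the full-string JW (the v1 fallback) and the average of
// each query token's best match among the target tokens.
fn composite_score(query: &str, target: &str) -> f64 {
    let full = jaro_winkler(query, target);
    let q: Vec<&str> = query.split_whitespace().collect();
    let t: Vec<&str> = target.split_whitespace().collect();
    if q.is_empty() || t.is_empty() {
        return full;
    }
    let token_avg: f64 = q
        .iter()
        .map(|qt| t.iter().map(|tt| jaro_winkler(qt, tt)).fold(0.0_f64, f64::max))
        .sum::<f64>()
        / q.len() as f64;
    full.max(token_avg)
}
```

With this, "putin vladimir" and plain "putin" both hit "vladimir putin" at full score via the token path, while single-name full-string queries still score exactly as in v1.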