Oblivious RAM

An Oblivious RAM (ORAM) simulator is a compiler that transforms an algorithm in such a way that the resulting algorithm preserves the input-output behavior of the original algorithm but the distribution of the memory access patterns of the transformed algorithm is independent of the memory access pattern of the original algorithm.

The use of ORAMs is motivated by the fact that an adversary can obtain nontrivial information about the execution of a program and the data that the program is using just by observing the pattern in which the program accesses various memory locations during its execution. An adversary can get this information even if the data in memory is encrypted.

This definition is suited for settings like protected programs running on unprotected shared memory or clients running programs on their systems by accessing previously stored data on a remote server. The concept was formulated by Oded Goldreich and Rafail Ostrovsky in 1996.[1]

Definition

A Turing machine (TM), a mathematical abstraction of a real computer (program), is said to be oblivious if, for any two inputs of the same length, the motions of the tape heads remain the same. Pippenger and Fischer[2] proved that every TM with running time T(n) can be made oblivious and that the running time of the oblivious TM is O(T(n) log T(n)). A more realistic model of computation is the RAM model. In the RAM model of computation, there is a CPU that can execute the basic mathematical, logical, and control instructions. The CPU is also associated with a few registers and a physical random access memory, where it stores the operands of its instructions. The CPU also has instructions to read the contents of a memory cell and write a specific value to a memory cell. The definition of ORAMs captures a similar notion of obliviousness for memory accesses in the RAM model.

Informally, an ORAM is an algorithm at the interface of a protected CPU and the physical RAM such that it acts like a RAM to the CPU by querying the physical RAM for the CPU while hiding information about the actual memory access pattern of the CPU from the physical RAM. In other words, the distributions of memory accesses of any two programs that make the same number of memory accesses to the RAM are indistinguishable from each other. This description still makes sense if the CPU is replaced by a client with small storage and the physical RAM is replaced with a remote server with a large storage capacity, where the data of the client resides.

The following is a formal definition of ORAMs. Let Π denote a program requiring memory of size n when executing on an input x. Suppose that Π has instructions for basic mathematical and control operations in addition to two special instructions 𝗋𝖾𝖺𝖽(l) and 𝗐𝗋𝗂𝗍𝖾(l,v), where 𝗋𝖾𝖺𝖽(l) reads the value at location l and 𝗐𝗋𝗂𝗍𝖾(l,v) writes the value v to l. The sequence of memory cells accessed by a program Π during its execution is called its memory access pattern and is denoted by Π̃(n,x).

A polynomial-time algorithm C is an Oblivious RAM (ORAM) compiler with computational overhead c(·) and memory overhead m(·) if, given n ∈ ℕ and a deterministic RAM program Π with memory size n, C outputs a program Π′ with memory size m(n)·n such that for any input x the running time of Π′(n,x) is bounded by c(n)·T, where T is the running time of Π(n,x), and there exists a negligible function μ such that the following properties hold:

  • Correctness: For any n ∈ ℕ and any string x ∈ {0,1}*, with probability at least 1 − μ(n), Π(n,x) = Π′(n,x).
  • Obliviousness: For any two programs Π1, Π2, any n ∈ ℕ and any two inputs x1, x2 ∈ {0,1}*, if |Π̃1(n,x1)| = |Π̃2(n,x2)|, then Π̃1′(n,x1) is μ-close to Π̃2′(n,x2) in statistical distance, where Π1′ = C(n,Π1) and Π2′ = C(n,Π2).

Note that the above definition uses the notion of statistical security. One can also have a similar definition for the notion of computational security.[3]

History of ORAMs

ORAMs were introduced by Goldreich and Ostrovsky,[4][5][1] where the key motivation was stated as providing software protection from an adversary who can observe a program's memory access pattern (but not the contents of the memory).

The main result in this work[1] is that there exists an ORAM compiler that uses O(n) server space and incurs a running time overhead of O(log³ n) when making a program that uses n memory cells oblivious. There are several attributes that need to be considered when comparing various ORAM constructions. The most important parameters of an ORAM construction's performance are the client-side space overhead, server-side space overhead, and the time overhead required to make one memory access. Based on these attributes, the construction of Asharov et al.,[6] called "OptORAMa", is the first optimal ORAM construction. It achieves O(1) client storage, O(n) server storage, and O(log n) access overhead, matching the known lower bounds.

Another important attribute of an ORAM construction is whether the access overhead is amortized or worst-case. Several earlier ORAM constructions have good amortized access overhead guarantees but have Ω(n) worst-case access overheads. Some ORAM constructions achieve polylogarithmic worst-case computational overhead.[7][8][9][10][11][12] The constructions of[4][5][1] were in the random oracle model, where the client assumes access to an oracle that behaves like a random function and returns consistent answers for repeated queries. Access to the oracle can be replaced by a pseudorandom function whose seed is a secret key stored by the client, assuming the existence of one-way functions. The papers[13][14] were aimed at removing this assumption completely. The authors of[14] also achieve an access overhead of O(log³ n).

While most of the earlier works focus on proving security computationally, there are more recent works[3][10][13][14] that use the stronger statistical notion of security.

One of the few known lower bounds on the access overhead of ORAMs is due to Goldreich and Ostrovsky.[1] They show an Ω(log n) lower bound for ORAM access overhead, where n is the data size. Another lower bound is by Larsen and Nielsen.[15] There is also a conditional lower bound on the access overhead of ORAMs due to Boyle et al.[16] that relates this quantity to the size of sorting networks.

ORAM constructions

Trivial construction

A trivial ORAM simulator construction, for each read or write operation, reads from and writes to every single element in the array, only performing a meaningful action for the address specified in that single operation. The trivial solution thus scans through the entire memory for each operation. This scheme incurs a time overhead of Ω(n) for each memory operation, where n is the size of the memory.
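
A minimal Python sketch of this trivial scheme (class and method names are illustrative; in practice the cell contents would also be encrypted and re-encrypted on every pass):

 from typing import List, Optional
 
 class TrivialORAM:
     """Linear-scan ORAM: every logical operation touches every physical
     cell, so the physical access pattern is the fixed sequence
     0, 1, ..., n-1 regardless of the requested address."""
 
     def __init__(self, n: int):
         self.n = n
         self.cells: List[Optional[int]] = [None] * n   # contents would be kept encrypted
 
     def access(self, op: str, l: int, v: Optional[int] = None) -> Optional[int]:
         result = None
         for addr in range(self.n):       # fixed, input-independent scan
             cell = self.cells[addr]      # physical read of every cell
             if addr == l:
                 result = cell
                 if op == "write":
                     cell = v
             self.cells[addr] = cell      # physical write of every cell
         return result
 
 ram = TrivialORAM(8)
 ram.access("write", 3, 42)
 assert ram.access("read", 3) == 42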

A simple ORAM scheme

A simple version of a statistically secure ORAM compiler constructed by Chung and Pass[3] is described below, along with an overview of a proof of its correctness. The compiler, on input n and a program Π with memory requirement n, outputs an equivalent oblivious program Π′.

If the input program Π uses r registers, the output program Π′ will need r + n/α + polylog n registers, where α > 1 is a parameter of the construction. Π′ uses O(n·polylog n) memory and its (worst-case) access overhead is O(polylog n).

The ORAM compiler is very simple to describe. Suppose that the original program Π has instructions for basic mathematical and control operations in addition to two special instructions 𝗋𝖾𝖺𝖽(l) and 𝗐𝗋𝗂𝗍𝖾(l,v), where 𝗋𝖾𝖺𝖽(l) reads the value at location l and 𝗐𝗋𝗂𝗍𝖾(l,v) writes the value v to location l. The ORAM compiler, when constructing Π′, simply replaces each 𝗋𝖾𝖺𝖽 and 𝗐𝗋𝗂𝗍𝖾 instruction with the sub-routines 𝖮𝗋𝖾𝖺𝖽 and 𝖮𝗐𝗋𝗂𝗍𝖾 respectively and keeps the rest of the program the same. It may be noted that this construction can be made to work even for memory requests coming in an online fashion.

The ORAM compiler substitutes the read and write instructions in the original program with subroutines Oread and Owrite.
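
As a rough sketch of this substitution (the memory-object names and the delegation style below are illustrative, not the formal compiler), the body of the program is left untouched and only its read/write calls are rerouted through Oread and Owrite:

 class PlainMemory:
     """Memory interface of the original program Π: direct read/write."""
     def __init__(self, n: int):
         self.cells = [None] * n
     def read(self, l):
         return self.cells[l]
     def write(self, l, v):
         self.cells[l] = v
 
 class CompiledMemory:
     """Memory interface of the compiled program Π′: every read/write is
     routed through the Oread/Owrite sub-routines of some ORAM scheme
     (any object exposing oread/owrite, e.g. the tree scheme below)."""
     def __init__(self, oram):
         self.oram = oram
     def read(self, l):
         return self.oram.oread(l)
     def write(self, l, v):
         self.oram.owrite(l, v)
 
 def program(mem, x):
     # The body of the program is unchanged; only the memory object differs.
     mem.write(0, x)
     return mem.read(0) + 1
 
 assert program(PlainMemory(4), 41) == 42     # original program
 # program(CompiledMemory(some_oram), 41)     # compiled program, same output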

Memory organization of the oblivious program

The program Π′ stores a complete binary tree T of depth d = log(n/α) in its memory. Each node in T is represented by a binary string of length at most d. The root is the empty string, denoted by λ. The left and right children of a node represented by the string γ are γ0 and γ1 respectively. The program Π′ thinks of the memory of Π as being partitioned into blocks, where each block is a contiguous sequence of memory cells of size α. Thus, there are at most n/α blocks in total. In other words, the memory cell r corresponds to block b = ⌊r/α⌋.

An illustration of the memory of the oblivious program showing the binary tree and position map.

At any point in time, there is an association between the blocks and the leaves in T. To keep track of this association, Π′ also stores a data structure called a position map, denoted by Pos, using O(n/α) registers. This data structure, for each block b, stores the leaf of T associated with b in Pos(b).

Each node in T contains an array with at most K triples. Each triple is of the form (b, Pos(b), v), where b is a block identifier and v is the contents of the block. Here, K is a security parameter and is O(polylog n).
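
A compact sketch of this layout in Python may help fix the data structures (the names and the flat-array tree representation are illustrative; the description above labels nodes by binary strings instead):

 import math
 from typing import List, Optional, Tuple
 
 # A triple as stored in a bucket: (block identifier b, leaf Pos(b), block contents).
 Triple = Tuple[int, int, List[Optional[int]]]
 
 class TreeLayout:
     """Memory layout only: a complete binary tree of buckets of capacity K
     plus a position map, with memory cells grouped into blocks of size alpha."""
     def __init__(self, n: int, alpha: int, K: int):
         self.alpha, self.K = alpha, K
         self.num_blocks = math.ceil(n / alpha)                       # at most n/alpha blocks
         self.depth = max(1, math.ceil(math.log2(self.num_blocks)))   # d = log(n/alpha)
         # The tree is stored as a flat array; node j has children 2j+1 and 2j+2.
         self.buckets: List[List[Triple]] = [[] for _ in range(2 ** (self.depth + 1) - 1)]
         # Pos(b): the leaf currently associated with block b (None until first use).
         self.pos: List[Optional[int]] = [None] * self.num_blocks
 
     def block_of(self, l: int) -> Tuple[int, int]:
         """Cell l lives in block b = l // alpha, at offset i = l mod alpha."""
         return l // self.alpha, l % self.alpha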

Description of the oblivious program

The program Π′ starts by initializing its memory as well as its registers to ⊥. Describing the procedures 𝖮𝗐𝗋𝗂𝗍𝖾 and 𝖮𝗋𝖾𝖺𝖽 is enough to complete the description of Π′. The sub-routine 𝖮𝗐𝗋𝗂𝗍𝖾 is given below. The inputs to the sub-routine are a memory location l ∈ [n] and the value v to be stored at location l. It has three main phases, namely FETCH, PUT_BACK, and FLUSH.

 input: a location l, a value v
 Procedure FETCH  // Search for the required block.
   b ← ⌊l/α⌋    // b is the block containing l.
   i ← l mod α    // i is l's component in block b.
   pos ← Pos(b)
   if pos = ⊥ then pos ←R [n/α].    // Set pos to a uniformly random leaf in T.
   flag ← 0.
   for each node N on the path from the root to pos do
     if N has a triple of the form (b, pos, x) then
       Remove (b, pos, x) from N, store x in a register, and write back the updated N to T.
       flag ← 1.
     else
       Write back N to T.
 Procedure PUT_BACK  // Add back the updated block at the root.
   pos ←R [n/α]; Pos(b) ← pos.  // Associate the block with a fresh uniformly random leaf in T.
   if flag = 1 then
     Set x to be the fetched block with v written at the i-th position.
   else
     Set x to be a block with v at the i-th position and ⊥'s everywhere else.
   if there is space left in the root then
     Add the triple (b, pos, x) to the root of T.
   else
     Abort outputting overflow.
 Procedure FLUSH  // Push the blocks present in a random path as far down as possible.
   pos* ←R [n/α].  // Set pos* to a uniformly random leaf in T.
   for each triple (b, pos, v) in the nodes traversed along the path from the root to pos* do
     Push down this triple to the node N that corresponds to the longest common prefix of pos and pos*.
     if at any point some bucket is about to overflow then
       Abort outputting overflow.

The task of the FETCH phase is to look for the location l in the tree T. Suppose pos is the leaf associated with the block containing location l. For each node N in T on the path from the root to pos, this procedure goes over all triples in N and looks for the triple corresponding to the block containing l. If it finds that triple in N, it removes the triple from N and writes back the updated state of N. Otherwise, it simply writes back the whole node N.

In the next phase, it updates the block containing l with the new value v, associates that block with a freshly sampled uniformly random leaf of the tree, and writes back the updated triple to the root of T.

The last phase, which is called FLUSH, is an additional operation to release the memory cells in the root and other higher internal nodes. Specifically, the algorithm chooses a uniformly random leaf pos* and then tries to push every triple it encounters as far down as possible along the path from the root to pos*. It aborts, outputting overflow, if at any point some bucket is about to exceed its capacity.

The sub-routine 𝖮𝗋𝖾𝖺𝖽 is similar to 𝖮𝗐𝗋𝗂𝗍𝖾. Its input is just a memory location l ∈ [n], and it proceeds almost exactly as 𝖮𝗐𝗋𝗂𝗍𝖾. In the FETCH stage, if it does not find a triple corresponding to location l, it returns ⊥ as the value at location l. In the PUT_BACK phase, it writes back to the root the same block that it read, after associating it with a freshly sampled uniformly random leaf.
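
The following Python sketch simulates 𝖮𝗐𝗋𝗂𝗍𝖾 and 𝖮𝗋𝖾𝖺𝖽 as described above (an illustrative simulation only: class, method, and parameter names are invented, bucket contents are left unencrypted, and the fixed low-level physical access pattern of a real implementation is not reproduced):

 import math
 import random
 from typing import List, Optional
 
 class SimpleTreeORAM:
     """Buckets of capacity K arranged in a binary tree, a position map,
     and the three phases FETCH, PUT_BACK and FLUSH."""
 
     def __init__(self, n: int, alpha: int = 2, K: int = 32):
         self.n, self.alpha, self.K = n, alpha, K
         self.num_blocks = math.ceil(n / alpha)
         depth = max(1, math.ceil(math.log2(self.num_blocks)))
         self.leaves = 2 ** depth
         self.tree: List[list] = [[] for _ in range(2 * self.leaves - 1)]  # buckets of triples
         self.pos: List[Optional[int]] = [None] * self.num_blocks          # position map Pos
 
     def _path(self, leaf: int) -> List[int]:
         """Bucket indices on the root-to-leaf path, root first."""
         node, nodes = leaf + self.leaves - 1, []
         while True:
             nodes.append(node)
             if node == 0:
                 break
             node = (node - 1) // 2
         return nodes[::-1]
 
     def _fetch(self, b: int) -> Optional[list]:
         """FETCH: walk the path of Pos(b); remove and return b's block if present."""
         if self.pos[b] is None:
             self.pos[b] = random.randrange(self.leaves)
         found = None
         for node in self._path(self.pos[b]):      # every bucket on the path is visited
             for t in self.tree[node]:
                 if t[0] == b:
                     found = t[2]
                     self.tree[node].remove(t)
                     break
         return found
 
     def _put_back(self, b: int, block: list) -> None:
         """PUT_BACK: re-insert b's block at the root under a fresh random leaf."""
         self.pos[b] = random.randrange(self.leaves)
         if len(self.tree[0]) >= self.K:
             raise OverflowError("overflow")
         self.tree[0].append((b, self.pos[b], block))
 
     def _flush(self) -> None:
         """FLUSH: push triples on a random root-to-leaf path as far down as possible."""
         path = self._path(random.randrange(self.leaves))
         for here, node in enumerate(path):
             for t in list(self.tree[node]):
                 leaf_path = self._path(t[1])
                 deepest = here
                 for d in range(len(path)):        # longest common prefix of the two paths
                     if leaf_path[d] == path[d]:
                         deepest = d
                     else:
                         break
                 if deepest > here:
                     if len(self.tree[path[deepest]]) >= self.K:
                         raise OverflowError("overflow")
                     self.tree[node].remove(t)
                     self.tree[path[deepest]].append(t)
 
     def _access(self, l: int, v=None, write: bool = False):
         b, i = l // self.alpha, l % self.alpha
         block = self._fetch(b) or [None] * self.alpha
         value = block[i]
         if write:
             block[i] = v
         self._put_back(b, block)
         self._flush()
         return value
 
     def owrite(self, l: int, v) -> None:
         self._access(l, v, write=True)
 
     def oread(self, l: int):
         return self._access(l)
 
 oram = SimpleTreeORAM(16)
 oram.owrite(5, "hello")
 assert oram.oread(5) == "hello"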

Correctness of the simple ORAM scheme

Let C stand for the ORAM compiler that was described above. Given a program Π, let Π′ denote C(Π). Let Π(n,x) denote the execution of the program Π on an input x using n memory cells. Also, let Π̃(n,x) denote the memory access pattern of Π(n,x). Let μ denote a function such that for any n ∈ ℕ, any program Π and any input x ∈ {0,1}*, the probability that Π′(n,x) outputs overflow is at most μ(n). The following lemma is easy to see from the description of C.

Equivalence Lemma
Let n ∈ ℕ and x ∈ {0,1}*. Given a program Π, with probability at least 1 − μ(n), the output of Π′(n,x) is identical to the output of Π(n,x).

It is easy to see that each 𝖮𝗋𝖾𝖺𝖽 and 𝖮𝗐𝗋𝗂𝗍𝖾 operation traverses root-to-leaf paths in T chosen uniformly and independently at random. This fact implies that the distributions of the memory access patterns of any two programs that make the same number of memory accesses are indistinguishable, provided neither of them overflows.

Obliviousness Lemma
Given two programs Π1 and Π2 and two inputs x1, x2 ∈ {0,1}* such that |Π̃1(n,x1)| = |Π̃2(n,x2)|, with probability at least 1 − 2μ(n), the access patterns Π̃1′(n,x1) and Π̃2′(n,x2) are identically distributed.

The following lemma completes the proof of correctness of the ORAM scheme.

Overflow Lemma
There exists a negligible function μ such that for every program Π, every n ∈ ℕ and every input x ∈ {0,1}*, the program Π′(n,x) outputs overflow with probability at most μ(n).

Computational and memory overheads

During each 𝖮𝗋𝖾𝖺𝖽 and 𝖮𝗐𝗋𝗂𝗍𝖾 operation, two random root-to-leaf paths of T are fully explored by Π′. This takes O(K·log(n/α)) time. This is the computational overhead, and it is O(polylog n) since K is O(polylog n).

The total memory used up by Π′ is equal to the size of T. Each triple stored in the tree has α + 2 words in it and thus there are K(α + 2) words per node of the tree. Since the total number of nodes in the tree is O(n/α), the total memory size is O(nK) words, which is O(n·polylog n). Hence, the memory overhead of the construction is O(polylog n).
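
A short back-of-the-envelope check of both overheads, with arbitrary illustrative parameter values (any K = O(polylog n) would do):

 import math
 
 n, alpha = 2**20, 2                      # memory size and block size (illustrative)
 K = int(math.log2(n)) ** 2               # bucket capacity, e.g. K = log^2 n
 depth = math.ceil(math.log2(n / alpha))  # d = log(n/alpha)
 
 work_per_access = 2 * K * (depth + 1)    # two root-to-leaf paths, K triples per bucket
 nodes = 2 ** (depth + 1) - 1             # nodes in a complete binary tree of depth d
 total_words = nodes * K * (alpha + 2)    # K triples of alpha + 2 words per node
 
 print(work_per_access)                   # O(K log(n/alpha)) = O(polylog n)
 print(total_words / n)                   # per-word memory overhead, O(polylog n)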


References


See also