# A new proposed Cache Memory cell using Josephson resistive DCI logic gate

K. SRINIVAS, DEPARTMENT OF PHYSICS, GMR INSTITUTE OF TECHNOLOGY, RAJAM-532127, A.P., INDIA. Email: srinkura@gmail.com, srinivas.k@gmrit.org

Abstract--- A new Cache memory cell has been proposed using the DCI logic gate. The principle and operation of the cell are described, and the current equation at each stage of the memory cell is derived. The dynamic response of the cell has been obtained by computer simulation. The performance of our Cache memory cell has been compared with that of the IBM Cache memory cell. The cycle and access times of the proposed cell are 250 ps and 210 ps, respectively, compared with 1000 ps and 650 ps for the earlier Cache memory. Our Cache memory cycle and access times can be reduced further, since the cell is based on DCI logic, which places few restrictions on device size.

*Keywords*---Superconducting electronics, Memory device, Cache memory cell, Dynamic simulation

#### I. Introduction

Generally, any large general-purpose (mainframe) computer has two types of storage: (i) a very high capacity (of the order of megabytes) but relatively slow primary or main memory, and (ii) a high-speed but comparatively low-capacity (of the order of kilobytes) buffer or Cache memory. The Cache memory plays a vital role in today's machines of all sorts, improving computing speed by rapidly transferring data between the CPU and the main memory. The Cache reduces the otherwise long cycle and access times of fetching data from the main memory by placing blocks of data, on a demand basis, very close to the processor so that it can fetch and process them quickly.

The cache concept was first introduced in the IBM 360. The need for a Cache in a Josephson supercomputer has already been recognized by IBM [1]-[7], and all of these designs are based on the Cache memory cell proposed by Faris et al. [1]. The earlier Cache memory cell [1] is a superconducting loop containing a single write junction coupled to a single read junction. The two states of the cell are zero circulating current, representing binary "0", and a clockwise circulating current, representing "1".

The main drawbacks of SQUID devices for logic or memory applications are their relatively large device area and high sensitivity to stray magnetic fields. In SQUIDs, 80% of the area is occupied by the transformers. Further, the high sensitivity to stray magnetic fields requires that SQUID-based logic or memory circuits be well shielded from stray magnetic fields.

Resistive logic families such as DCI and RCJL eliminate these drawbacks and occupy less area because they do not use transformer coupling. Resistive logic gates have also been found to switch faster than SQUID-based logic gates. Van Duzer reviewed the importance of RSFQ device research for realizing a successful superconductor digital technology [8]. Kleinsasser described the Hybrid Technology Multi-Threaded (HTMT) approach to petaflops computing, which includes large numbers of ultra-high-performance Rapid Single Flux Quantum (RSFQ) processor and memory chips using Nb technology [9]. K. Fujiwara et al. developed a 64-kb sub-nanosecond Josephson-CMOS hybrid random-access memory (RAM) with ultrafast hybrid interface circuits; this hybrid memory was designed, fabricated, and shown to have an access time of 600 ps [10], [11].

In the present paper we propose a new Cache memory cell using the DCI logic gate. Section II discusses the earlier Cache memory cell. Section III gives the principle and operation of our proposed Cache memory cell. Section IV deals with the dynamic analysis of the proposed cell. Results and discussions are given in Section V, and conclusions in Section VI.

#### II. EARLIER IBM CACHE MEMORY CELL

The basic Cache memory cell [1] is in the form of a superconducting loop containing a single WRITE junction coupled to a single READ junction, as shown in Fig. 1. The principle of operation is as follows:



Fig. 1 Equivalent circuit of an IBM Cache memory cell (Faris et al. [4])

A "1" is written into an empty cell as follows. The supply current $I_Y$ is applied first, and because of flux conservation it splits into left- and right-hand branch currents $I_L$ and $I_R$, respectively, as determined
by $I_L L_L = I_R L_R$, where $L_L$ and $L_R$ are the inductances of the left and right branches of the loop. The currents $I_X$ and $I_Y$ are applied to the cell after the current $I_L$ is established in the WRITE gate. This forces the gate to switch, and $I_L$ transfers to the right branch. Upon removal of all currents, a clockwise circulating current is left in the loop because of flux conservation. A "0" is written into a cell holding "1" by dissipating the circulating current through the coincident application of $I_X$ and $I_Y$. The cell is read nondestructively by the coincident application of $I_Y$ and $I_S$, and the resulting current in the right-hand branch is detected by the sense gate. If the cell contains a "1", the clockwise circulating current adds to the fraction of $I_Y$ that flows in the right-hand branch and is sufficient to switch the sense gate. If there is no circulating current ("0" state), $I_R$ is insufficient to switch the sense gate.
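The flux-conservation condition $I_L L_L = I_R L_R$, together with $I_L + I_R = I_Y$, fixes the branch currents as $I_L = I_Y L_R/(L_L + L_R)$ and $I_R = I_Y L_L/(L_L + L_R)$. The sketch below evaluates this split and the read current in the right-hand branch, assuming that a stored "1" simply adds the circulating current $I_{circ}$ to that branch, as described above. The function names and the example inductance and current values are hypothetical, chosen only for illustration.

```python
def branch_currents(i_y, l_left, l_right):
    """Split the supply current I_Y between the two branches of the loop.

    Flux conservation gives I_L * L_L = I_R * L_R, and I_L + I_R = I_Y.
    Returns (I_L, I_R) in the same units as i_y.
    """
    total = l_left + l_right
    i_left = i_y * l_right / total
    i_right = i_y * l_left / total
    return i_left, i_right


def sense_branch_current(i_y, l_left, l_right, i_circ, stored_one):
    """Current in the right-hand (sense) branch during a read.

    A stored '1' adds the clockwise circulating current i_circ to the
    fraction of I_Y that flows in the right-hand branch; a '0' adds nothing.
    """
    _, i_right = branch_currents(i_y, l_left, l_right)
    return i_right + (i_circ if stored_one else 0.0)
```

For example, `branch_currents(1e-3, 10e-12, 30e-12)` gives $I_L = 0.75$ mA and $I_R = 0.25$ mA: the branch with the smaller inductance carries the larger share. A sense gate whose threshold lies between `sense_branch_current(..., stored_one=False)` and `sense_branch_current(..., stored_one=True)` distinguishes the two states.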

#### III. PRINCIPLE AND OPERATION OF THE PROPOSED CACHE MEMORY CELL

The basic circuit diagram of the proposed prototype Cache memory cell is shown in Fig. 2. The circuit uses four Josephson junctions in total. Two of them, $J_1$ and $J_2$, form the two parallel arms of the four-arm bridge of the basic DCI device. $J_3$ is switched from the superconducting (OFF) state to the normal (ON) state when a "1" is written into the cell, and is therefore called the WRITE junction; $J_4$, which senses the stored state when $I_S$ is applied, serves as the READ junction. The operation of the cell is as follows.



Fig. 2 Equivalent circuit of a proposed Cache memory cell using Direct Coupled Isolation (DCI) logic.

When $I_X$ and $I_Y$ are applied simultaneously, junction $J_2$ switches first and then junction $J_1$. When both $J_1$ and $J_2$ have switched to the voltage state, the current $I_Y$ is steered to the output of the first stage through the resistor $R_3$ and thereby acts as an input to junction $J_3$. When this current is sufficient, $J_3$ also switches to the voltage state and performs the WRITE operation. The critical current of junction $J_4$ is chosen such that the output current through the resistor $R_4$ cannot switch $J_4$ unless $I_S$ is also applied; when $I_S$ is applied together with this output current, $J_4$ switches to the voltage state. Conversely, $J_4$ cannot be switched by $I_S$ alone. The operation of the cell can thus be summarized as follows.

If $I_X$, $I_Y$, or both are not applied, $J_1$, $J_2$ and $J_3$ cannot be switched, and even if $I_S$ is applied, $J_4$ cannot be switched.

To write '1', both $I_X$ and $I_Y$ must be applied. When junctions $J_1$ and $J_2$ switch to the voltage state, $I_Y$ is steered through the resistor $R_3$ into junction $J_3$, and $J_3$ switches to the voltage state. A '1' is now written in the cell.

To read this '1', $I_S$ must be applied. $I_S$ together with the output current through $R_4$ switches $J_4$ into the voltage state, and the '1' is read.

To write '0' when '1' is already written, the polarities of $I_X$ and $I_Y$ are reversed.

The read operation is nondestructive: the stored information is not disturbed by reading.
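The write and read rules above can be collected into a logic-level model. This is a minimal sketch, not a circuit simulation: it assumes that a stored "1" corresponds to $J_3$ remaining in the voltage state (as the read condition in Section V describes), treats each write input as +1 (applied), -1 (applied with reversed polarity) or 0 (absent), and uses hypothetical values for the $R_4$ output current and the critical current of $J_4$. The class, method and parameter names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DCICacheCellModel:
    """Logic-level sketch of the proposed cell (hypothetical parameters).

    i_out: current delivered to J4 through R4 while J3 is in the voltage state (A)
    i_c4:  critical current of the READ junction J4 (A)
    stored: 1 if J3 is latched in the voltage state, else 0
    """
    i_out: float = 60e-6
    i_c4: float = 100e-6
    stored: int = 0

    def thresholds_ok(self, i_s: float) -> bool:
        # J4 must not switch on I_S alone or on the R4 output alone,
        # but must switch when both are present.
        return i_s < self.i_c4 and self.i_out < self.i_c4 < i_s + self.i_out

    def write(self, i_x: int, i_y: int) -> None:
        if i_x == 1 and i_y == 1:
            self.stored = 1  # J1, J2 switch; I_Y steered through R3 switches J3
        elif i_x == -1 and i_y == -1:
            self.stored = 0  # reversed polarities write '0'
        # Any other combination leaves J1, J2 and J3 unswitched.

    def read(self, i_s: float) -> bool:
        """Return True if J4 switches (a '1' is read); the state is unchanged."""
        i_from_r4 = self.i_out if self.stored == 1 else 0.0
        return i_s + i_from_r4 > self.i_c4
```

With the default values, `cell.write(1, 1)` followed by `cell.read(60e-6)` returns `True` and leaves `cell.stored` at 1, mirroring the nondestructive read, while `cell.read(0.0)` returns `False` because the $R_4$ output alone cannot switch $J_4$.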

#### IV. DYNAMIC RESPONSE OF THE CACHE MEMORY CELL

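The current equations of the Cache memory cell at each stage are given below. Here $I_0$ denotes the junction critical current and $C_j$ the junction capacitance, and each voltage is related to the corresponding junction phase by the Josephson relation $V = (\phi_0/2\pi)\, d\theta/dt$.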

$$i_1 = 2 I_0 \sin\theta_a + 2 C_j \frac{dV_a}{dt} \qquad (1)$$

$$i_2 = I_0 \sin(\theta_a - \theta_b) + C_j \frac{d}{dt} (V_a - V_b) \qquad (2)$$

$$i_3 = V_a / R_1 \qquad (3)$$

$$i_4 = V_a / R_2 \qquad (4)$$

$$i_5 = V_a / R_3 \qquad (5)$$

$$I_Y = i_2 + i_4 \qquad (6)$$

$$i_4 = i_1 + i_5 \qquad (7)$$

$$i_5 = I_0 \sin\theta_c + C_j \frac{dV_c}{dt} + \frac{V_c}{R_C} \qquad (8)$$

$$i_6 = V_c / R_4 \qquad (9)$$

$$I_S + i_6 = 2 I_0 \sin\theta_d + 2 C_j \frac{dV_d}{dt} + \frac{V_d}{R_L} \qquad (10)$$

Substituting the Josephson relation into the above equations gives the following second-order equations for the junction phases:

$$\frac{d^{2}\theta_{a}}{dt^{2}} = \frac{\pi}{\phi_{0} C_{j}} \left[ I_{Y} + I_{X} - 2I_{0} \sin\theta_{a} - \frac{\phi_{0}}{2\pi R_{1}} \frac{d\theta_{b}}{dt} - \frac{\phi_{0}}{2\pi R_{2}} \frac{d\theta_{a}}{dt} \right] \qquad (11)$$

$$-\frac{d^{2}\theta_{b}}{dt^{2}} = \frac{2\pi}{\phi_{0} C_{j}} \left[ \frac{I_{Y}}{2} + \frac{3}{2} I_{X} - I_{0} \sin(\theta_{a} - \theta_{b}) - 2I_{0} \sin\theta_{a} - \frac{3\phi_{0}}{4\pi R_{1}} \frac{d\theta_{b}}{dt} - \frac{\phi_{0}}{4\pi R_{2}} \frac{d\theta_{a}}{dt} \right] \qquad (12)$$



UACEE International Journal of Advances in Computer Networks and its Security - Volume 2: Issue 3 [ISSN 2250 - 3757]

$$\frac{d^2\theta_c}{dt^2} = \frac{2\pi}{\phi_o C_j} \left[ \frac{\phi_o}{2\pi R_2} \frac{d\theta_a}{dt} \right] - I_0 \sin\theta_c - \frac{\phi_o}{2\pi R_3} \frac{d\theta_c}{dt} - I_0 \sin\theta_c - \frac{\phi_o}{2\pi R_3} \frac{d\theta_c}{dt} + \frac{\phi_o}{2\pi R_3} \frac{d\theta_c}{dt} - \frac{(I_S - 2I_o \sin\theta_d)}{2\pi R_4} - \frac{(I_S - 2I_o \sin\theta_d)}{dt} + \frac{\phi_o}{2\pi R_3} \frac{d\theta_c}{dt} - \frac{\phi_o}{2\pi R_4} \frac{d\theta_a}{dt} \right]$$

Eqns. 11-14 are second-order differential equations. The simulated responses of the READ and WRITE operations are obtained by solving them numerically with the fourth-order Runge-Kutta method.
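The coupled Eqns. 11-14 are not reproduced in code here. As a minimal sketch of the numerical method only, the block below applies classical fourth-order Runge-Kutta to a single Josephson junction in the standard resistively and capacitively shunted junction (RCSJ) model, $C\,(\phi_0/2\pi)\,\ddot\theta + (\phi_0/2\pi R)\,\dot\theta + I_0\sin\theta = I_{bias}$, rewritten as two first-order equations. The capacitance default uses the 0.39 pF Nb/AlOx/Nb value quoted in Section V; the critical current (100 µA), shunt resistance (10 Ω), time step and all function names are hypothetical choices for illustration, not the author's simulation parameters.

```python
import math

PHI0 = 2.067833848e-15      # magnetic flux quantum (Wb)
K = PHI0 / (2.0 * math.pi)  # converts dtheta/dt (rad/s) to volts


def rcsj_rhs(theta, omega, i_bias, i0, r, c):
    """First-order form of C*K*theta'' + K*theta'/R + I0*sin(theta) = I_bias."""
    domega = (i_bias - i0 * math.sin(theta) - K * omega / r) / (K * c)
    return omega, domega


def simulate_junction(i_bias, i0=100e-6, r=10.0, c=0.39e-12,
                      t_end=100e-12, dt=0.01e-12):
    """Integrate one junction from rest with classical RK4.

    Returns lists of time (s), phase (rad) and voltage V = K*dtheta/dt (V).
    """
    def f(th, om):
        return rcsj_rhs(th, om, i_bias, i0, r, c)

    theta, omega = 0.0, 0.0  # zero-voltage (superconducting) initial state
    times, phases, volts = [0.0], [theta], [0.0]
    for n in range(1, int(round(t_end / dt)) + 1):
        k1 = f(theta, omega)
        k2 = f(theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1])
        k3 = f(theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1])
        k4 = f(theta + dt * k3[0], omega + dt * k3[1])
        theta += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        omega += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        times.append(n * dt)
        phases.append(theta)
        volts.append(K * omega)
    return times, phases, volts


def mean_voltage(times, phases, window):
    """Average voltage over the final `window` seconds, from the phase advance."""
    start = next(i for i, t in enumerate(times) if t >= times[-1] - window)
    return K * (phases[-1] - phases[start]) / (times[-1] - times[start])


if __name__ == "__main__":
    for fraction in (0.5, 1.5):
        t, ph, _ = simulate_junction(fraction * 100e-6)
        print(f"I_bias = {fraction:.1f} I0 -> mean V = "
              f"{mean_voltage(t, ph, 20e-12) * 1e3:.3f} mV")
```

With a bias below $I_0$ the phase settles and the mean voltage vanishes; above $I_0$ the junction switches to the voltage state. This is the qualitative switching behaviour that the WRITE and READ junction traces illustrate, although the full cell requires integrating all four coupled phase equations together.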

#### V. RESULTS AND DISCUSSIONS

Fig. 3 shows the dynamic response of a DCI Cache memory cell using Nb/AlOx/Nb technology. The solid curve shows the dynamic response of the WRITE junction, whereas the dotted curve shows that of the READ junction. Writing "1" takes approximately 40 ps, whereas reading "1" takes approximately 70 ps.



Fig. 3 Dynamic response of the new Cache memory cell using Nb/AlOx/Nb technology. The solid curve shows the dynamic response of the WRITE junction, whereas the dotted curve indicates the dynamic response of the READ junction.

Fig. 4 shows voltage versus time for the WRITE "1" operation in different technologies. Solid curves show the dynamic response of the WRITE gate using Pb-alloy technology, whereas dotted curves are for the corresponding Nb/AlOx/Nb technology. The WRITE "1" time is smaller for Nb/AlOx/Nb technology, owing to the lower junction capacitance (0.39 pF) of Nb/AlOx/Nb Josephson junctions.



Fig. 4 Dynamic response of the WRITE junction of the Cache memory cell. The solid curve indicates the dynamic response of the WRITE junction using Pb-alloy technology, whereas the dotted curve shows the dynamic response of the WRITE junction using Nb/AlOx/Nb Josephson technology.

Further, Fig. 5 shows the dynamic response of the WRITE and READ junctions for the Nb/AlOx/Nb-based DCI Cache memory cell. Fig. 5a shows the dynamic response of the WRITE junction, Fig. 5b that of the READ junction, and Fig. 5c that of the WRITE junction when input $I_Y$ is absent.



Fig. 5 Dynamic response of the Cache memory cell using Nb/AlOx/Nb Josephson technology. Curve (a) shows the dynamic response of the WRITE junction, (b) that of the READ junction, and (c) that of the WRITE junction when the input current $I_X = 0$.



Fig. 6 Dynamic response of the Cache memory cell using Nb/AlOx/Nb Josephson technology. Curve (a) shows the dynamic response of the WRITE junction, (b) that of the READ junction, and (c) that of the WRITE junction when the input current $I_Y = 0$.


If $I_X$ or $I_Y$ is not applied, the first stage of the DCI gate (Fig. 2), i.e. junctions $J_1$ and $J_2$, does not switch, and therefore $J_3$ cannot switch either. Thus WRITE "0" is achieved. Fig. 6 shows the dynamic response of the WRITE gate when $I_Y$ is applied without $I_X$: the WRITE junction is not switched, and if the WRITE junction is not switched, the READ junction cannot be switched. To read "1", the WRITE junction must be in the voltage state and $I_S$ must be present. Since one of $I_X$ and $I_Y$ is absent in both cases (Fig. 5 and Fig. 6), the WRITE junction is not switched to the voltage state.

Estimation of total cycle and access times of the proposed cell:

It can be observed from the dynamic response of the Cache memory cell that the preliminary cycle time and access time of the unit cell using Nb/AlOx/Nb Josephson technology are as follows:

Preliminary WRITE time = 50 ps; preliminary READ time = 20 ps.

The preliminary READ and WRITE times taken here are essentially the access-time and cycle-time components, respectively. They are only first-order estimates of the delays: they are further increased by the delays in peripheral circuits such as decoders and drivers, and the chip-crossing delay must also be added.

Assuming a standard 6 mm × 6 mm chip layout, the average distance of a cell from the chip mid-point is 3 mm. In a 2.5 µm design, a minimum of four line-width spacings (cross-over margins) between the junction and resistor branches of the memory must be kept in the fabrication process to separate adjacent layers. The Cache memory cells must also be connected to output branches and auxiliary circuits such as decoders and drivers. To cover the worst case, an extra 40 µm of wire length is therefore included, giving a line length of 2.5 × 4 + 40 = 50 µm. The total length then becomes 3000 + 50 = 3050 µm. The transmission delay per unit length of the stripline is taken as 0.01016 ps/µm.

Therefore, the chip-crossing delay for one chip crossing (in the case of the Cache) = $3050 \times 0.01016$ ps $\approx 31$ ps, taken as 30 ps in the estimates below.

(Note: In the case of main memory, one card crossing and three chip crossings are involved.)
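The chip-crossing estimate above is a length-times-delay product. The sketch below reproduces it with the values stated in the text as defaults (3000 µm to the chip mid-point, four 2.5 µm spacings plus a 40 µm routing margin, and 0.01016 ps/µm stripline delay); the function and parameter names are hypothetical.

```python
def extra_line_length_um(line_width_um=2.5, spacings=4, routing_margin_um=40.0):
    """Worst-case extra wire length: minimum spacings plus routing margin (um)."""
    return line_width_um * spacings + routing_margin_um


def chip_crossing_delay_ps(mid_distance_um=3000.0, delay_ps_per_um=0.01016,
                           line_width_um=2.5, spacings=4, routing_margin_um=40.0):
    """Stripline delay (ps) for one chip crossing, as used for the Cache."""
    length_um = mid_distance_um + extra_line_length_um(
        line_width_um, spacings, routing_margin_um)
    return length_um * delay_ps_per_um
```

`extra_line_length_um()` returns 50.0 µm and `chip_crossing_delay_ps()` returns about 30.99 ps for the 3050 µm path. Passing a different `delay_ps_per_um` shows how sensitive the total timing budget is to the stripline propagation speed.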

Therefore, cycle time = preliminary cycle time + chip-crossing delay + other delays = 50 + 30 + 20 = 100 ps. Similarly, access time = 20 + 30 + 20 = 70 ps.

These times are further increased by the peripheral-circuit delays, i.e. the decoder and driver delays. Taking the decoder delay as 50 ps (Nakanishi and Fujita [12]) and the driver delay as 100 ps (Tazoh et al. [13]), the worst-case peripheral-circuit delay is 150 ps. Then,

The total cycle time = 100 + 150 = 250 ps. The final access time = 60 + 150 = 210 ps.
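The worst-case cycle-time budget is a plain sum of delay components. The sketch below uses the values stated in the text as defaults (30 ps chip crossing, 20 ps other delays, 50 ps decoder and 100 ps driver delay) and also returns the repetition rate as the reciprocal of the cycle time. The function names are hypothetical.

```python
def total_time_ps(preliminary_ps, chip_crossing_ps=30.0, other_ps=20.0,
                  decoder_ps=50.0, driver_ps=100.0):
    """Sum the delay components (ps) used in the worst-case estimate."""
    return preliminary_ps + chip_crossing_ps + other_ps + decoder_ps + driver_ps


def repetition_rate_hz(cycle_time_ps):
    """Repetition rate as the reciprocal of the cycle time."""
    return 1.0 / (cycle_time_ps * 1e-12)
```

`total_time_ps(50.0)` gives 250.0 ps for the 50 ps preliminary cycle time, and `repetition_rate_hz(250.0)` gives about 4 × 10^9 Hz. Keeping each delay as a separate parameter makes it easy to see which component dominates; here the peripheral circuits contribute 150 of the 250 ps.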

These values are, in any case, much less than the 600 ps access time and 1000 ps cycle time of the previously reported IBM Cache memory cell [1].

Speed:

The speed response, i.e. the frequency of operation of the cell, is

$$f = \frac{1}{250~\text{ps}} = 4 \times 10^9~\text{Hz} = 4~\text{GHz}.$$

The periodic repetition rate is therefore 4 GHz, so this Cache memory cell operates much faster than the earlier cell.

Cell dimension:

Taking the line width as 3 µm, the total cell area estimated from Fig. 2 is 30.5 µm × 16.5 µm. This area can be reduced further with a 2 µm line width and 1.5 µm interspacing. The area is very small compared with that of the earlier Cache memory cell.

Power dissipation:

In the quiescent state the cell consumes no power; power is dissipated only when it changes state, i.e. switches from '0' to '1' or vice versa. Zero standby power is another advantage of this cell, as in earlier cells. A rough estimate of the power needed for an average load $R_{\rm L}=50~\Omega$ in the circuit is as follows:

Average power dissipation:

$$P_{avg} = \frac{1}{2} I_{in}^2 R_L = \frac{1}{2} \times (100~\mu\text{A})^2 \times 50~\Omega = 2.5 \times 10^{-7}~\text{W} = 250~\text{nW}$$

Of course, this would be combined with the power consumed in the arrays, increasing the overall system power dissipation, possibly into the microwatt range. Taking the overall worst-case power as $10~\mu W$,

Power-delay product of the system = $(10~\mu\text{W}) \times (250~\text{ps}) = 2.5 \times 10^{-15}$ J = 2.5 fJ.
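Both power figures are simple products, and the arithmetic can be checked directly. The sketch below evaluates $P = \tfrac{1}{2} I_{in}^2 R_L$ for $I_{in} = 100~\mu$A and $R_L = 50~\Omega$, and the power-delay product for the assumed 10 µW worst-case system power and the 250 ps cycle time. The function names are hypothetical.

```python
def average_power_w(i_in_a, r_load_ohm):
    """Average dissipation estimate P = (1/2) * I_in**2 * R_L (watts)."""
    return 0.5 * i_in_a ** 2 * r_load_ohm


def power_delay_product_j(power_w, delay_s):
    """Power-delay product (joules)."""
    return power_w * delay_s


# 0.5 * (100e-6 A)**2 * 50 ohm = 2.5e-7 W
p_cell = average_power_w(100e-6, 50.0)
# 10e-6 W * 250e-12 s = 2.5e-15 J (2.5 fJ)
pdp_system = power_delay_product_j(10e-6, 250e-12)
```

Working in SI units throughout (amperes, ohms, watts, seconds) avoids the exponent slips that are easy to make when mixing µA, ps and fJ by hand.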

This power-delay product is much less than that of the earlier Cache memory cell. Table-I compares our proposed Cache memory with that of IBM and shows that the proposed memory has clear advantages over the earlier Cache memory cell.



#### TABLE-I

### A COMPARISON BETWEEN OUR CACHE MEMORY CELL AND IBM CACHE MEMORY CELL

| Cache Memory Cell | Capacity | Cycle Time | Access Time | Power | Power Delay Product |
|-------------------|----------|------------|-------------|-------|---------------------|
| IBM Cache Memory (Faris et al [1]) | 4K | 1000 ps | 600 ps | 6 mW | 7200 fJ |
| Proposed Cache | 4K | 250 ps | 210 ps | µW range | 2.5 fJ (based on a total system power dissipation of 10 µW) |

#### VI. CONCLUSIONS

It can be concluded from the preceding sections that the proposed Cache memory cell using DCI logic has advantages over the IBM Cache memory in both speed and circuit dimensions. The cycle and access times of the proposed cell are 250 ps and 210 ps, respectively, compared with 1000 ps and 650 ps for the earlier Cache memory. Our Cache memory cycle and access times can be reduced further, since the cell is based on DCI logic, which places few restrictions on device size.

#### VII. Acknowledgements

My greatest thanks go to my guide, Prof. Dr. J. C. Biswas, Ex-Professor, ECE Department, Indian Institute of Technology, India, whose quality-oriented suggestions had a crucial impact on the value and importance of this work. I would like to acknowledge Prof. C.L.R.S.V. Prasad, Principal, GMR Institute of Technology, Rajam, A.P., India, for his constant encouragement to finish this work.

#### REFERENCES

- [1] S. M. Faris, W. H. Henkels, E. A. Valsamakis, and H. H. Zappe, "Basic design of a Josephson Cache memory," IBM J. Res. Devel., vol. 24, no. 2, March 1980, pp. 143-154.
- [2] W. H. Henkels and J. H. Greiner, "Experimental single flux quantum Josephson memory cell," IEEE J. Solid-State Circuits, vol. SC-14, no. 5, October 1979, pp. 794-796.
- [3] W. H. Henkels, K. H. Brown, T. V. Rajeevakumar, L. Geppert, J. W. Allan, Y. H. Lee, and J. T. Yeh, "Experiments on a cross-section of a Josephson RAM chip," Proceedings of the 1983 IEEE Conference on Computer Design (ICCD '83), pp. 570-573.
- [4] W. H. Henkels, "Fundamental criteria for the design of high-performance Josephson nondestructive readout random access memory cells and experimental confirmation," J. Appl. Phys., vol. 50, no. 12, December 1979, pp. 8143-8168.
- [5] W. H. Henkels, L. M. Geppert, J. Kadlec, P. W. Epperlein, H. Beha, W. H. Chang, and H. Jaeckel, "Josephson 4K-bit Cache memory design for a prototype signal processor. I. General overview," J. Appl. Phys., vol. 58, no. 6, September 1985, pp. 2371-2378.
- [6] W. H. Henkels, L. M. Geppert, J. Kadlec, P. W. Epperlein, H. Beha, W. H. Chang, and H. Jaeckel, "Josephson 4K-bit Cache memory design for a prototype signal processor. II. Cell array and drivers," J. Appl. Phys., vol. 58, no. 6, September 1985, pp. 2379-2388.
- [7] W. H. Henkels, L. M. Geppert, J. Kadlec, P. W. Epperlein, H. Beha, W. H. Chang, and H. Jaeckel, "Josephson 4K-bit Cache memory design for a prototype signal processor. III. Decoding, sensing, and timing," J. Appl. Phys., vol. 58, no. 6, September 1985, pp. 2389-2399.
- [8] T. Van Duzer, "Superconductor digital technology: past, present and future," IEICE Transactions on Electronics, vol. E91-C, no. 3, 2008, pp. 260-271.
- [9] Kleinsasser, "High performance Nb Josephson devices for petaflops computing," IEEE Transactions on Applied Superconductivity, vol. 11, no. 1, pt. 1, March 2001, pp. 1043-1049.
- [10] Q. Liu, K. Fujiwara, X. Meng, S. R. Whiteley, T. Van Duzer, N. Yoshikawa, Y. Thakahashi, T. Hikida, and N. Kawai, "Latency and power measurements on 64-kB hybrid Josephson-CMOS memory," IEEE Transactions on Applied Superconductivity, vol. 17, no. 2, pt. 1, June 2007, pp. 526-529.
- [11] K. Fujiwara, Q. Liu, T. Van Duzer, X. Meng, and N. Yoshikawa, "New delay-time measurements on 64-kB Josephson-CMOS hybrid memory with a 600-ps access time," IEEE Transactions on Applied Superconductivity, vol. 20, no. 1, February 2010, pp. 14-20.
- [12] T. Nakanishi and S. Fujita, "Josephson NOR decoder circuit for Josephson memory arrays," Jpn. J. Appl. Phys., vol. 23, no. 8, August 1984, pp. 1002-1006.
- [13] Y. Tazoh, K. Aoki, and H. Yoshikiyo, "A new driver circuit for chip-to-chip signal transmission in Josephson packaging," IEEE Trans. Electron Devices, vol. ED-33, no. 7, July 1986, pp. 1049-1053.

