| leading the way to the
new storage frontier|
zero to three seconds - aspects
extreme diversity in
- March 23, 2015 |
An ever present tension in SSD design has always been that making one thing
better can result in something else inevitably getting worse.
In a past article -
11 SSD design
symmetries - I showed why stretching an SSD specification in one direction
to optimize behavior for one beloved application can render the same SSD much
less suitable for other starring roles.
That's one reason why there
can never be such a thing as the "perfect SSD".
If anyone tries to tell you otherwise - you know that you understand this a lot
better than they do.
I always find it interesting to think about the
extreme cases of SSD design (and
conditions in markets) because you can learn a lot about what to expect to
see in the market - by pushing parameters to their limits.
You can do
this in your imagination. It's rarely affordable (or advisable) to test all
these extreme limits in the lab (or in your own systems).
But the market provides us with many examples we can learn from.
hold up time in 0 to 3 seconds
This note - the 0 to 3 seconds part
- is about the range of power hold up times I've seen in the market - in the
context of rugged / military 2.5" SATA SSDs.
Most (but not all)
flash SSDs use internal capacitors to hold up the power island of the flash
array and controller so that the SSD can save its state and stash data
and metadata securely in the event of sudden power loss.
In general, the bigger the hold up capacitance, and the longer the hold up time - the less
likely it is that power line disturbances will corrupt data.
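The link between capacitance and hold up time can be roughed out with a simple energy balance. The sketch below is only that - a sketch. It assumes a constant power load and an ideal capacitor, and the voltages, load power and the 1 mF comparison part are my own illustrative assumptions, not figures from any vendor. The function name hold_up_time_s is hypothetical too.

```python
def hold_up_time_s(capacitance_f, v_start, v_min, load_w, efficiency=1.0):
    """Seconds a capacitor can feed a constant-power load.

    Energy balance: 0.5 * C * (v_start^2 - v_min^2) * efficiency = load_w * t
    """
    if capacitance_f <= 0 or load_w <= 0:
        raise ValueError("capacitance and load must be positive")
    if not 0 <= v_min < v_start:
        raise ValueError("need 0 <= v_min < v_start")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_joules = 0.5 * capacitance_f * (v_start ** 2 - v_min ** 2)
    return usable_joules * efficiency / load_w


# All values below are illustrative assumptions, not vendor data.
small_cap_s = hold_up_time_s(capacitance_f=0.001, v_start=5.0, v_min=3.0, load_w=4.0)
big_cap_s = hold_up_time_s(capacitance_f=3.0, v_start=2.7, v_min=1.0, load_w=3.0)

print(f"1 mF bank: {small_cap_s * 1000:.1f} ms")
print(f"3 F supercap: {big_cap_s:.2f} s")
```

With these assumed numbers the small capacitor bank gives a couple of milliseconds, while a 3F part gives a few seconds. Real designs also lose energy in regulators and capacitor ESR, which is what the efficiency parameter is standing in for.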
The firmware in the SSD controller, the type of flash used, and the size and
technology make up of the internal cached data all factor into how long that
minimum hold up time needs to be. And hold up isn't the only concern - line
noise can be too. For more
see the papers linked in
the 3 seconds case
An extreme example of long hold up time is the Rana - a 2.5" rugged SSD from
Solidata - which has
the longest hold up time I'm aware of in a device this size and with this type
of capacity, ruggedness and performance.
Its hold up time is about 3 seconds!
But such a long hold up time - due to a big capacitor (3F) - also has an
impact on what happens at power up. The effect is to lengthen the power up
ready time and to require control of the charging current.
So I asked Solidata about
that aspect of the design.
Solidata said their Rana SSD includes a protection circuit to avoid
current surges, and that the power on to ready time is 2 to 4 seconds.
That's similar to the power on ready time for hard drives -
and in most applications will be OK.
On the other hand - if you have
a military application which needs very fast cold boot - then this is not the
SSD for you.
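Why does a big capacitor stretch the power on ready time? If the charging current is limited to avoid surges, the charge time grows in direct proportion to the capacitance. Here's a minimal sketch of that relationship, assuming an ideal constant current charger. The 2.7V target and the current limits are my own assumptions for illustration (they are not Solidata's figures) and the sketch ignores controller boot time.

```python
def charge_time_s(capacitance_f, v_target, current_limit_a, v_initial=0.0):
    """Constant-current charge time: t = C * (v_target - v_initial) / I."""
    if capacitance_f <= 0 or current_limit_a <= 0:
        raise ValueError("capacitance and current limit must be positive")
    if v_target < v_initial:
        raise ValueError("v_target must be >= v_initial")
    return capacitance_f * (v_target - v_initial) / current_limit_a


# Assumed: a 3 F capacitor charged from empty to 2.7 V at three different
# current limits. None of these are measured or vendor-supplied numbers.
times = {amps: charge_time_s(3.0, 2.7, amps) for amps in (1.0, 2.0, 4.0)}
for amps, seconds in times.items():
    print(f"{amps:.0f} A limit -> {seconds:.1f} s to charge")
```

Halving the current limit doubles the wait. So a design which is gentle on the host power rail pays for that with a slower cold boot.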
Another question which arises in capacitive hold up
systems - is what are the failure modes of the capacitor?
The important failure modes are:-
- fail to open circuit - in which case the failed capacitor no longer
provides the hold up protection - (which in some designs is mitigated by a
parallel array of caps).
Some vendors choose caps which - due to their
internal layout and materials - are known to fail in a particular mode - that is
open circuit - which is the easiest mode to design around.
- fail to short circuit - which requires current limiting protection
I didn't ask
Solidata about this aspect of their design - but there's nothing to stop you
asking them - if you need to know.
the 0 seconds case
This is the type of design in which there are no hold up capacitors - or batteries -
assumed in the design of the data integrity system. (And any capacitance in or
around the SSD is purely for EMI compatibility - and not for power fail
protection.)
An example of this is an SSD from Microsemi - which I
wrote about when the product was launched in
March 2011. (In
2016 the product line and business unit was acquired by
When I interviewed Jack Bogdanski, Director of New Product Development, in 2011
he didn't want to discuss the exact way they'd solved this design problem - but
my guess is that the SSD probably uses a combination of 2 (or more) things:-
- a small amount of raw fast write
nvm (distinct from
flash) for the write activity metadata.
As you can imagine - you have
to ensure that all data in flight is controlled and monitored as well.
- a guardian angel state machine which filters all R/W activities to check
that internal transactions which have started are known to have completed.
The initial assumption is always saved as being incomplete -
unless proved otherwise.
And if the activities haven't completed - the SSD rolls
back to the last raw saved data fragments and resumes the rebuild
in flash - until it's known to be good.
This can be done - if you have control of all the firmware inside the SSD and
invest a lot of work.
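To make that guess more concrete, here's a toy model of the "assume incomplete unless proved otherwise" idea. It is not Microsemi's design (which they didn't disclose). Every name in it (IntentRecord, TinyNvm, GuardedStore) is hypothetical, a Python dict stands in for the flash array, and the power loss is simulated by raising an exception.

```python
from dataclasses import dataclass, field


@dataclass
class IntentRecord:
    lba: int
    data: bytes
    complete: bool = False  # pessimistic default: incomplete until proved otherwise


@dataclass
class TinyNvm:
    """Stands in for the small fast-write nvm; it survives the simulated power loss."""
    records: list = field(default_factory=list)


class GuardedStore:
    def __init__(self, nvm, flash):
        self.nvm = nvm
        self.flash = flash  # dict of lba -> bytes, standing in for the flash array

    def write(self, lba, data, fail_before_flash=False):
        rec = IntentRecord(lba, data)
        self.nvm.records.append(rec)      # 1. save intent and raw data first
        if fail_before_flash:
            raise RuntimeError("simulated power loss")
        self.flash[lba] = data            # 2. the slow flash program step
        rec.complete = True               # 3. only now is the write proved complete

    def recover(self):
        """Replay every write not proved complete, then clear the log."""
        replayed = 0
        for rec in self.nvm.records:
            if not rec.complete:
                self.flash[rec.lba] = rec.data
                rec.complete = True
                replayed += 1
        self.nvm.records = [r for r in self.nvm.records if not r.complete]
        return replayed


nvm, flash = TinyNvm(), {}
store = GuardedStore(nvm, flash)
store.write(7, b"old")
try:
    store.write(7, b"new", fail_before_flash=True)
except RuntimeError:
    pass

# Power returns: a fresh controller instance sees the same nvm and flash.
store = GuardedStore(nvm, flash)
replayed = store.recover()
print(replayed, flash[7])
```

The point of the sketch is the ordering: the intent reaches the fast nvm before the slow flash write starts, and the "complete" flag is only set afterwards. A real controller would also have to cope with half programmed pages and reclaim log space as it goes, which this toy ignores.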
How much work?
Microsemi told me (in
2011) a team of 5 people had worked for nearly 2 years on the SSD's power
fail design.
And - I suspect they haven't stopped working on it - as
they've added more security features in recent years - each of which carries its
own data integrity burdens.
The only downside in this design approach
(aside from the need to create the design IP and patents) is that a true zero
capacitance hold up time compatible design won't give you the highest data
throughput - unless you scale up the alternative nvm - which you can't do in a
very small footprint.
And another thing is that Microsemi's design is
skinny - which
means the ratio of its nvm cache registers to flash capacity is close to zero.
Later (in 2017):-
described how its ST-MRAM has been used in some SSD designs to protect
data in flight and remove the need for large capacitors. "The use of
ST-MRAM enables improving the power fail window from 100mS in the case of NAND
Flash to less than 10µS." This demonstrates the principles involved in
using a small NVRAM alongside a flash array.
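The two window figures in that quote (100 ms and 10 µs) translate directly into how much capacitance you'd need to ride through them. Here's a rough worked example, assuming a constant power load and an ideal capacitor. The 3W load and the 5V to 3V discharge range are my own assumptions, picked only to show the scale of the difference.

```python
def capacitance_needed_f(window_s, load_w, v_start, v_min):
    """Capacitance to carry a constant-power load through a power fail window.

    From 0.5 * C * (v_start^2 - v_min^2) = load_w * window_s
    """
    if window_s <= 0 or load_w <= 0:
        raise ValueError("window and load must be positive")
    if not 0 <= v_min < v_start:
        raise ValueError("need 0 <= v_min < v_start")
    return 2.0 * load_w * window_s / (v_start ** 2 - v_min ** 2)


# Assumed load and voltage range (illustrative only, not from the quote).
LOAD_W, V_START, V_MIN = 3.0, 5.0, 3.0

nand_window_f = capacitance_needed_f(100e-3, LOAD_W, V_START, V_MIN)  # 100 ms window
mram_window_f = capacitance_needed_f(10e-6, LOAD_W, V_START, V_MIN)   # 10 us window
ratio = nand_window_f / mram_window_f

print(f"100 ms window: {nand_window_f * 1e3:.1f} mF")
print(f"10 us window:  {mram_window_f * 1e6:.2f} uF")
print(f"ratio: {ratio:.0f}x")
```

Whatever load and voltages you assume, the ratio stays the same - because the required capacitance scales linearly with the window. Shrinking the window by four orders of magnitude shrinks the capacitor by the same factor.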
In a solo SSD
context the fewer components you start with the more reliable the design. But
there can be different considerations in an
FT SSD array.
For more about reliability modeling problems see
FITs - data architecture and
flaws in component based SSD failure analysis
Even within the narrow space of military oriented 2.5" SATA SSDs - you can find
a wide range of differences in design approaches in nearly every aspect of the
design.
I've used the power hold up architecture as my example today.
But there are other aspects I could have chosen.
In any single
project - it's extremely unlikely that you'd be looking at the 2 SSDs above
for the exact same application - because of other factors such as:- power
consumption, IOPS performance, location of suppliers, and longevity of
supply - all of which would take precedence.
But I had noted these
differences in my own reading before (which are directly due to differences in
the RAM cache
flash architecture) and I was reminded of them today by an email from a
reader who is designing power loss protection for another market and a
different form factor.
For more about this subject see my article and
related resources in -
sudden power loss
...Later:- prompted by my many
questions, Clark Yu, R&D Engineer at Solidata, provided
more details about the thinking behind the
military SSD in -
Power Loss Protection in Solidata's Rana series military SSD. The key
point here is that the design uses "an industrial grade 3F capacitor."
what happened in SSD
year 2015? /
Fast purge and autonomous
data destruct flash SSDs
challenges in flash SSD design and emerging nvms
was it so hard to compile a simple list of military SSD companies?
|Designers of military and
secure industrial systems for whom SLC is the only flash memory good enough -
but who also needed higher capacities in their 2.5" SATA slots have - until
recently - had little choice but to consider SSDs with significant internal
capacitor holdup for their toughest designs. And that, in turn, means a complex
qualification process and really getting to know the ad hoc internal
details of SSD architectures and related firmware which might well change
considerably over the lifetime of their projects.
|new MIL SSD for those
who loathe supercaps (July 16, 2015)|
No matter how much UPS you have...|
power fails during writing to a page in
are still possible."
|Tony Pond, Virtium in his blog -
which outlines (May 28, 2015)|
|In 2011 SMART said they
didn't think supercaps were reliable enough for enterprise SSDs because...|
every 10 degrees C of ambient operating temperature rise, the life expectancy of
a supercapacitor can be cut approximately in half."
Instead they used NbO capacitors in an array in their XceedIOPS2.
|SSD news story - April
|Viking's DDR-3 flash backed
DRAM DIMM - the ArxCis-NV - relied for power fail protection on an optional external 25F
|SSD news story - October
|"Power fail protection
is a differentiator for embedded SSDs, and many vendors tout solutions. However,
developing effective power fail protection is as much an art as it is a science,
and is not a trivial endeavor."
|The above quote comes from a 2013 paper -
Art of SSD Power Fail Protection (pdf) - by
Among other things it provides the justification and marketing support framework for
a technology called
- which WD acquired when it bought
was the first SSD company to invest in
and branding about
sudden SSD power loss - and ways of testing such schemes in system
You can judge how much importance they attached to the
awareness of these reliability issues by the fact that they ran expensive
banner ads here on from 2005 to 2012 - to promote their
You can still see these old ads in my article
- the cultivation
and nurturing of "reliability" in a 2.5" SSD brand
|The willingness to offer
customization and professional design engineering support opens doors to
valuable customers who are leaders in their own vertical markets but whose unit
volumes are too small to be of interest to high volume standard SSD vendors.|
|some thoughts about SSD
|Do the power up / down
tradeoffs in other types of non volatile memories provide a better applications
It's tempting to think that the grass is greener with nvms which
have faster write cycles and therefore shorter power holdup requirements.
aside from density constraints there are other systems problems which they
ECC architectures can't simply be migrated from the DRAM and
nand/nor flash experience into newer emerging nvms. They have their own
The scale of these difficulties with soft error rates was
discussed in an SSD
news story in February 2017 - Soft-Error Mitigation for PCM and STT-RAM.