Anyone who’s spent more that a few minutes with me knows I’m obsessed productivity and metrics. So it’s no surprise that I try to apply the same approach I do to my life as I do to my code. Today I want to share one of my “hacks” for my personal life that came out of basic project planning.
The inspiration for this is heavily influenced by Steve Kamb’s Epic Quest of Awesome (by the way if you are a nerd and interested in fitness, please poke around his site for some awesome workouts.
What I’ve come up with is a very basic system using Emacs org-mode. I have also created a system to interpret the org-mode file and create a dashboard of sorts, that I can easily track my progress on various goals.
The key to this (for me at least) is that I use the same organization file for my “Epic” goals or my yearly goals as I do for my day to day tasks. This forces me to continually (and literally) keep an eye on the bigger picture. I do build in regular time during the week to take that “step back” but I find this helps me keep focused on that bigger picture.
Again, this is nothing grandiose just a little hack I’d like to share that might help to get you on your own Epic Quest. In a future post I will be going over the code and methods you can use to great your own progress dashboard on your Epic Quest of Awesome!
If you have any questions about my particular setup in org-mode or my status board, drop a comment. Thank you.
A simple Go frontend to a json database. This is a simple “key store” that allows you to save statues of various services or objects. Updates are automatically timestamped. Contents of database and be dumped to a html file. “Success” and “Fail” status are considered “special” status and will be highlighted with green or red in the html output
Added a new command “test” to take place of the cumbersome “if” statements I’ve been using in crontabs. This will allow you to specify a command and then update with a “success” or “failure” message depending on the return code.
Get the source: https://github.com/zpeters/GoStatusBoard
See the original post: A Status Board for your Crontab
I’ve been a long time user of Free and Open Source Software (FOSS). By many measures my IT career has been a brief blink of the eye (12 years this summer), but all along the way I’ve been a dedicated user of FOSS software. I can’t count the number of times I’ve felt a rush of exhilaration when I’ve installed, setup and starting hacking away at real “industrial strength” server software right in front of my face.
It is a rare opportunity for someone to be able to cut their teeth on tools of the same level of sophistication as the “big boys” use. No trials. No watering down. No bullshit.
In my IT career, thus far, nothing has compared to being able to say “I know this”. Not because I’ve seen it in a book or watched a video but because I’ve been hands on, I’ve bent the software to my needs and have examined its inner-workings.
Lately, I’ve felt guilty.
The FOSS community is huge. Think of the scale – office applications, web servers, text editors, programming languages, entire operating systems. All of this made of millions of individual contributions. Of all of the software I’ve consumed, I’ve never truly given back.
Today I wanted to announce and share my first tiny contribution to the FOSS world. Over the past few months I’ve been working on a command-line interface to speedtest.net. It isn’t much and it’s far from perfect, but it scratches an itch I had and it is my first step at sharing with the community at large.
If you find this software helpful or useful please join in the fun and contribute to the project as you see fit – send me feedback, contribute code, fork the code and start your own project. Even if you don’t find this software particularly useful I’d like to challenge you to share, in your own way, to the FOSS community.
Speedtest.net command line interface – https://github.com/zpeters/speedtest
When you are thinking about IT often the first thought is hardware or software – some server in a backroom or a mission-critical application. When we speak of these important issues, critical processes, data integrity and security it is often in terms of software bugs, mean time between failure and other factors.
Today I’d like to bring up one area of IT that is often overlooked – the very language that we use. The words we choose to describe situations often have more of a “framing effect” than we’d like to admit.
The following list is my accumulation of what I’ve believe to be some of the dirtiest words in IT.
“Probably” comes first on the list because I feel it is the most dangerous. It’s so subtle that it slips into conversation imperceptible, however its effect is detrimental to the logical process needed to evaluate a situation. Often this is used when we are “reasonably certain” that a problem has been solved. The problem with this statement is that it short-circuits the logical process as “probably” always comes across as meaning “certainly”.
When we hear the word “probably” it can ring through with a certain air of assurance. We are placing our trust in the speaker that their experience and judgment is leading them to this conclusion and it is just a matter of formality to “prove” the circumstances. And herein lies the trap…
Why did we not just go the final 10% and prove conclusion is correct?
This is logical laziness!
True, there are circumstance that we cannot prove our assumptions – perhaps the necessary data is unavailable or the circumstances cannot be reproduced. We make our best guess and move on with life.
Unfortunately, “probably” is used in numerous situations where we can prove our assumptions. One classic example of this is in troubleshooting performance issues. Time and time again what I see the word “probably” used to mask or avoid the dirty work of truly tracing down the root cause of a performance issue. “It’s ‘probably’ Application FooBar, we just installed it yesterday and that’s when the performance issues started”. The best recent anti-example of this I’ve seen ( http://utcc.utoronto.ca/~cks/space/blog/sysadmin/MailProblemAnatomy ) goes to show that if we leave our assumptions at our first gut instinct we will never truly solve the issue.
Just remember “probably” comes from “probability”, so if you are saying something is “probably X” ask yourself what you are basing that opinion on. If you are basing it on a guess what is the harm of proving that assumption is correct?
Though less often spoken, the “never/always” dichotomy is entrenched in the thought process in IT. We are constantly planning for “future proof” systems, adding more drive space than we will “ever need” or building systems so resilient that they will “never” fail.
The root of this problem is that we are building an abstract logical system on top of a substrate that is firmly rooted in real world conditions. Variables such as component failure, unexpected latency and human creativity will continue to add unanticipated exceptions to your Platonic ideal of how a user will interact with your systems.
The key to keep in mind is that if something “never” or “always” happens then it would not need to be stated, it would be an unquestioned expectation. However, if you find yourself stating or thinking “X will never happen” or “Y will always do this” then ask yourself if this is a truism or if you are just taking the mental shortcut of “probably” and if so, what is the true probability?
The classic scenario for this would be a server with three drives in a RAID 5 configuration. RAID 5 is considered to provide durable storage since it is highly unlikely that multiple drives will fail at the same time, thus it is always true that we can survive a single drive failure. Or is it?
Consider the following, during the past decade drive speeds and capacities have grown leaps and bounds, but what has happened to the error-rate or mean time between failure? The “likelihood of read failure per storage unit” has not increased to compensate. Well that is what RAID 5 is to protect against, right? Check out http://www.standalone-sysadmin.com/blog/2012/08/i-come-not-to-praise-raid-5/ for a further explanation but generally at the capacity of over a few terabytes your likelihood of failure on rebuild is a coin-flip.
Again, calling back to our previous “dirty word” what is the true likelihood of failure in your particular environment?
Often when we are discussing cost we are exclaiming at the outrageous cost of the new Foobar Widget or how we got a great deal on last years discontinued model. Often these are just idle conversation starters, no one would really base business and technological decisions on the sticker price of an item. Surely there is other research involved…
Sadly too often this is the initial, and only, consideration made when evaluating a new technology. All things being equal, we should be able to make our decision on the intersection of “does what I want” and “cheap” and arrive at a logical conclusion – the correct conclusion. Ahh, but this is not the case. The sticker price of an item, any item, is far more complex than “the relative worth of the widget”, at it’s essence it is “what the market will bare”.
There are many factors that come into play when a manufacture is pricing any item. Some are very straightforward – the raw materials cost X, labor costs Y, we anticipate high demand for our new widget. We’d like to pretend that the retailer is covering their costs, making a fair profit and using that to base the cost on – this simply is not the case. In the world of retail there are many psychological pricing tricks that get used for various purposes. There is the “framing” effect of using Small, Medium and Large that will set guidelines for what you see as a realistic and fair price regardless of intrinsic value or other vendors pricing. Another common trick is simply pricing something higher to make it seem like a better item. “It costs more so it must be better, right?”
Once we get past the initial purchase price of an item there are the “hidden” costs. What are the maintenance costs, training costs, cost of consumables? What are the training costs to use the new widget? How long with Widget X last? How often does the manufacturer release a new model or for that matter how long will it be supported.
Again the running theme here is that we want to avoid “logical laziness” and truly evaluate the situation beyond our initial gut reactions. Even if the answers to these questions are unknown we still have made a more considered evaluation and are in a better position to understand the true costs involved.
“As Soon As Possible” has become one of the most trite expressions that has crept into professional language. Typically, this is used as a throwaway phrase or filler adding emphasis to the overall message being delivered (i.e. “Do It Now”). Sadly, this phrase is so lacking in information that it often comes across as more of a threat then any sort of actual priority assignment. When this phrase is used often what the speaker means to say is “this is the highest priority I have at the present moment and I’d like you to make it yours as well, the time frame for completion is limited”.
The problem with this phrase is that we are working backwards from our goal instead of the other way around. To say this another way – if you have made a convincing argument for why the other person should care about this issue, why it is important and time sensitive, the priority will be implicit and self-evident.
Whenever we are giving direction, making suggestion, etc. the global scope needs to be considered. Yes this problem is important, but at what scale? Is solving this problem?
Is it critical to continued service? Critical to continued business operation? Impacting one customer? Impacting multiple customers …
Having an idea of scale/scope helps us to start to prioritize this with other ongoing issues as well as determine how much “effort” to apply to the issue.
The difficulty with a statement like “X is complete” is that the very idea of “completion” can be elusive and have many meanings. Does “X is complete” mean the initial problem is circumvented for now? Is it fixed in the sense that the faulty part was replace? Have new pieces been put in place to predict or prevent failure of a similar scenario?
As you can see from just these few questions to idea of “complete” from person to person will have many different meanings in different situations. It is best to define the completion in terms of what has been accomplished and what can be expected in the future. After all, the very notion of “complete” or “finished” in IT is an asymptote. Given more time/money/desire surely a proceed or process could be made at least somewhat better.
When you are defining a product or solution it is best to come up with a list of agreed upon deliverables that way the “completeness” of the project is clear to everyone involved.
Saying something “just works” implies some sort of “magic” or an incomplete understanding of the system. If something really “just works” it could be that it is a truly durable and robust system – it is clearly thought out, has a high degree of reliability and transparency. Sadly, “just works” is often just the expression of “logical laziness” that we have discussed.
If one is afraid or unwilling to peel back the layers of the system it will appear as a simple “just works” black box. However, if you are willing to truly examine the foundations and inner workings of a system you will undoubtedly reveal a layer of complexity that you were previously unaware of.
Be careful not to confuse a truly elegant system where the pieces are well thought out and understandable with a system that is just opaque.
The very concept of a backup is a duality of backup/restore, there is no “backup”. Let me say this again, a “backup” is not a “backup” it’s only half of a thing that we typically call a “backup”. Without a verified restore we must consider the backup to be “write only”.
Regular and rigorous restore tests are the only way to ensure that you are truly able to retrieve the data you wish to share. Is there a bug in the vendor’s software? Does your filesystem checksum data read and written, does it do it correctly? What about your backup media? Are you even backing up the correct files? Server? None of these questions can truly be verified without testing the viability of the date you’ve attempted to backup.
Again the key to keep in mind is that every backup is a “backup attempt” and cannot be fully trusted until it’s been restored. The restore must be of the same level and quality you wish you use the backups for. If you only are truly concerned about recovering a text file, then restoring that file and verifying it’s contents is enough. However if you wish to use any data with any sort of complexity it needs to be loaded into the appropriate application. Do you know, truly know, that the data file you are backing up is in a consistent state on the disk – are you certain it was entirely flushed to disk when your backup ran? You’ll never really know until you do a full test restore and run it in at least an approximation of your production environment…
I hope you’ve enjoyed my tour through the seven “dirty words” of IT. The purpose of this is not to by hyper-analytical and just pick apart common mistakes or communication difficulties – it hopefully has left you with some scenarios in your life to think about and find new ways to think about and approach these problems.
If you’ve encountered any situations like theses, or have some “dirty words” of your own please share in the comments.
Thank you for reading
This is more of a reminder for myself because I forget this every time…
Please note the
-Restart at the end. This will reboot without mercy :-)