December 13, 2017

My Etherealmind

Internet Giants Should Be Broken Up

This is a 30-minute presentation that highlights the lack of societal value that Google, Apple, Facebook and Amazon deliver. Galloway examines their market dominance and how the market is failing to regulate or control the tech companies. I recommend watching this and considering the ideas proposed here. Galloway is well known and worth listening […]

by Greg Ferro at December 13, 2017 06:13 PM

Blog (Ivan Pepelnjak)

Create IP Multicast Tree Graphs from Operational Data

A while ago I created an Ansible playbook that creates network diagrams from LLDP information. Ben Roberts, a student in my Building Network Automation Solutions online course, used those ideas to create an awesome solution: he’s graphing multicast trees.

Here’s how he described his solution:

Read more ...

by Ivan Pepelnjak ( at December 13, 2017 08:01 AM

XKCD Comics

December 12, 2017

Networking Now (Juniper Blog)

The Art of Fighting Cyber Crime

When it comes to defending your organization from cyber crime, time matters. Visibility matters. Environment matters. And, more than ever, conditions matter. In order to shrink the time from detection to remediation, security operators need a cyber defense system that truly adapts to a hyper-active threat climate and is designed from inception to be agile. That window of time between detection and remediation defines the overall potential impact of a security breach: the longer the time, the greater the potential for damage. The diversity of environments – physical, virtual, private cloud, public cloud, locations, and departments – drives the need for a more responsive and unified approach to cybersecurity. The sheer volume of information generated by your security environment creates a firehose of alerts from so many sources that security operators often have difficulty seeing the most crucial characteristics of the threats that come into their view.

by Amy James at December 12, 2017 09:25 PM

Moving Packets

One Little Thing Can Break All Your Automation

I’ve been doing some work automating A10 Networks load balancers recently, and while testing I discovered a bug which broke my scripts. As hard as I try to code my scripts to cope with both success and failure, I realized that the one thing I can’t do much with is a failure of a key protocol.

A10 Networks Logo

Breaking Badly

So what matters when I’m automating device configurations? Three things come to mind immediately:

The Network / Transport

I need reliable network connectivity, or my automation requests will constantly be timing out or failing. While a script should be able to cope with an occasional network failure, unreliable networks are not fun to work with.
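A simple way for a script to cope with those occasional network failures is to wrap its calls in a retry helper. Here's a hedged Python sketch (my own illustration, not from the original post; names are made up):

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on network-style errors with exponential
    backoff plus jitter. Illustrative helper, not a real library API."""
    for attempt in range(attempts):
        try:
            return fn()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide
            time.sleep(base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
```

This papers over a transient hiccup but still fails loudly on a genuinely broken network, which is the behavior you want in an automation workflow.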

Data Transfer Format

Pick a format (XML, JSON) and use it consistently. I’m happy so long as I can send a request and read a response in a given format: if I send a request using JSON, send the response back using JSON. Funnily enough, I was troubleshooting WordPress xmlrpc recently and noticed that when there was an error, the XML command being issued received an HTTP 404 error followed by, well, you’d hope an XML error response, right? No: because it was an HTTP 404, the site returned the blog search page instead. I would have preferred an XML response explaining what the error actually was. Unsurprisingly, the client code using the XMLRPC connection was complaining about an unexpected XML response (correct, since it was HTML).
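One defence against that kind of mismatch is to sanity-check the body before decoding it. A minimal Python sketch (the function name and behavior are my own illustration, not part of any real client library):

```python
import json

def parse_api_response(body: str):
    """Decode a response we expect to be JSON, but guard against a
    server that returns HTML (e.g. an error page) instead."""
    stripped = body.lstrip()
    if stripped.startswith("<"):
        # Almost certainly HTML/XML markup, not the JSON we asked for.
        raise ValueError("expected JSON, got markup: " + stripped[:40])
    return json.loads(stripped)
```

Failing with a clear "you sent me HTML" error beats letting a JSON decoder choke on `<html>` three layers down the stack.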

Consistent API

Create an API that makes sense (I can only dream). Create consistent responses so that I don’t have to “special case” every single response based on the request I make. If a particular response element can be an array, always send back an array, even if it only has a single entry; don’t send a string instead. Wrap responses consistently so that errors and responses can be easily distinguished and extracted. For example, I found this note to self in some code I wrote last year:

// Decode response into an object.
 // But because infoblox is stupid:
 //  - if it's an error it returns a hash;
 //  - if it's an actual result, it provides an array of hashes, because screw you.
 // So, we need to test the first byte of the response to see whether it's [ or {. 
 // If it's a { decode it as an error. Otherwise decode as an IPAM object.

Ok, it’s not the end of the world, but it does add an additional step which I really don’t appreciate.
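That note-to-self translates into a small defensive decoder. Here's a Python sketch of the first-byte test, assuming the hypothetical error shape (an object like {"Error": "..."}) described in the comment:

```python
import json

def decode_infoblox_response(body: str):
    """Sketch of the workaround above: the endpoint (as described in
    the comment) returns a JSON object ({...}) on error but an array
    of objects ([...]) on success, so peek at the first byte to decide.
    Illustrative only, not a real Infoblox client."""
    stripped = body.lstrip()
    data = json.loads(stripped)
    if stripped.startswith("{"):
        # Error object; surface it as an exception.
        raise RuntimeError(data.get("Error", "unknown IPAM error"))
    return data  # list of IPAM result objects
```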

ACOS For Concern

So, my recent discovery with the A10 Networks load balancers, which run the ACOS operating system, was that the encoding of escaped characters within the configuration can mean that ACOS will return invalid JSON in response to my request. For example, imagine that a health check must be configured to request the URL /checkseq\8s1. It’s an unusual URL because it has a backslash in it, but that’s what the server in question asks for, so that’s the health check that’s needed. ACOS understands escaped characters (using a backslash), so to send a \ in the health check, it would have to be entered as \\. Similarly, to send \r\n (carriage return, new line) the health check would contain \\r\\n, and that allows the addition of a custom HTTP header as well, which in this example is called “X-Custom-Field” and has a value of 101:

health monitor checkweb
 method http url GET "/checkseq\\8s1\\r\\nX-Custom-Field: 101" expect 200

When the health check is used, the GET string is analyzed and the escaped characters resolve to a more normal looking string:

/checkseq\8s1\r\nX-Custom-Field: 101

However, when viewing the health monitor’s configuration via the REST API, the same exact process occurs and the JSON for the method is encoded something like this:

    "method": "http",
    "type": "url",
    "subtype": "GET",
    "expect": "200",
    "url": "/checkseq\8s1\r\nX-Custom-Field: 101"

When read by the receiver, the url string is again analyzed for escaped characters, and this time the sequences "\8", "\r" and "\n" are discovered.


Unfortunately, \8 is not a valid escape code, thus the JSON decoding process spews an error at this point. To me this is a failure in the JSON encoder in ACOS; it should take the interpreted string then make it ‘JSON-safe’. By having encoded a string including the invalid character “\8”, ACOS generated invalid JSON. Since my JSON decoder can’t handle invalid JSON, my automation fails on the spot. I don’t know if the query worked or not; I only know it couldn’t be decoded. Highly annoying.
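The failure is easy to reproduce with any strict JSON decoder. A Python illustration (the wrapper function is my own sketch, not part of any A10 tooling):

```python
import json

# A response fragment like the one ACOS returns: \r and \n are legal
# JSON escapes, but \8 is not, so decoding blows up.
bad = '{"url": "/checkseq\\8s1\\r\\nX-Custom-Field: 101"}'

def safe_loads(body: str):
    """Wrap json.loads so the workflow can fail gracefully instead of
    crashing mid-run (illustrative, not A10's API)."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # Flag the failure instead of raising; the caller can decide
        # whether to retry, alert, or skip this device.
        return {"_decode_error": err.msg, "_raw": body}
```

Of course, flagging the bad payload doesn't tell you whether the underlying request succeeded, which is exactly the problem described above.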

The Workaround

This all started because of a health check URL containing a backslash. The workaround, rather than using “\\”, is to URL-encode that backslash as %5C (or similar) in the original health check. However, there’s no way to stop a user from creating another “url bomb” in the future, because ACOS will accept \\ in the url string without generating an error.
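Percent-encoding the backslash is straightforward to verify, for example with Python's standard library:

```python
from urllib.parse import quote, unquote

# The workaround: store the backslash URL-encoded rather than as an
# escaped "\\" in the ACOS config.
raw_url = "/checkseq\\8s1"      # what the server actually expects
encoded = quote(raw_url)        # the backslash becomes %5C

print(encoded)                  # /checkseq%5C8s1
```

The encoded form round-trips cleanly (`unquote(encoded)` gives the original back) and, crucially, never puts a bare backslash into the configuration for ACOS to mangle.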

My 2 Bits

What this really brings home to me is how a breakdown in a key protocol – in this case JSON – can bring automation to its knees. We assume that protocols like TCP will just work, and at this point I think of JSON, largely, in the same way. Scripts rely upon formats like JSON to allow the accurate storage and transport of information, but if the JSON can’t be read by the recipient, the data is lost. In the case of my automation scripts, it brought a workflow to a screeching halt, and it was not possible to get past that point in the process without manually applying a workaround to the health check which was causing problems.

There’s certainly a lesson here about checking results, and raising alarms when an unexpected result shows up. Even a reliable automation script will need some tender loving care at times.

If you liked this post, please do click through to the source at One Little Thing Can Break All Your Automation and give me a share/like. Thank you!

by John Herbert at December 12, 2017 08:13 PM

Networking Now (Juniper Blog)

Adaptive Security Policies for Dynamic Security Environments


Statically defined network security policies impose a significant operational burden when adapting to an ever-changing security environment. Let’s see how Juniper’s innovative, patent-pending construct called “Dynamic Policy Actions” allows you to have the right security for the right conditions.

by snimmagadda at December 12, 2017 02:45 PM

Blog (Ivan Pepelnjak)

Moving Complexity to Application Layer?

One of my readers sent me this question:

One thing that I notice is you mentioned moving the complexity to the upper layer. I was wondering why browsers don't support multiple IP addresses for a single site – when a browser receives more than one IP address in a DNS response, it could try to perform TCP SYN to the first address, and if it fails it will move to the other address. This way we don't need an anycast solution for DR site.

Of course I pointed out an old blog post ;), and we all know that Happy Eyeballs work this way.
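For the curious, the sequential-fallback behavior the reader describes can be sketched in a few lines of Python. This is an illustration only, not real Happy Eyeballs (which races address families in parallel rather than trying them strictly in order):

```python
import socket

def connect_first_working(host: str, port: int, timeout: float = 3.0):
    """Try each address the DNS lookup returns, in order, and use the
    first one that accepts a TCP connection."""
    last_err = None
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock  # first reachable address wins
        except OSError as err:
            last_err = err
    raise last_err or OSError("no addresses returned")
```

Browsers effectively do this already; the interesting question, as the linked blog post discusses, is what such client-side retries do and don't buy you compared to an anycast DR design.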

Read more ...

by Ivan Pepelnjak ( at December 12, 2017 07:53 AM

December 11, 2017

A Christmas Support Story

Warning: Non-Technical Post

As it’s the festive period and this time of the year is for caring and sharing, here’s a short story from many years ago. This might make some chuckle, but some of these times were not pleasant and I can assure you, they were very real!

Like most IT related people, I started in support. The job paid peanuts, it was shift work and I had much to learn. Being quite eager to please, I made many mistakes, and in these cases seniors were supposed to help the younglings (like me). For some companies, a functioning support network just isn’t there, and low-rank power struggles leave you fighting fires solo.

Within the first three months of the job, I experienced two major backhaul fibre outages, a group of people stealing our generator power cables and the air conditioning system failed to the point of meltdown. We also had a total power outage which took 40 hours or so of non-stop work to get everything back online and healthy.

These kinds of experiences make or break you. The phones do not stop ringing (at least when the power is on) and customers rightfully do not stop complaining. If you survive the pressure, your skin begins to thicken.

The thing that happened that should never have happened can also happen again. Lightning does strike twice in the same place!

It can always happen, so be honest with yourself and customers. Understanding contractual obligations for both parties is important and, despite the customer trying to push you into a corner, you have real limits on what you can do.


When it comes to customers, I like to make sure they get what they pay for and feel connected to us. When you have a good relationship with your customers, awkward conversations are sometimes easier to have. I recall one particular customer who had leased line issue after leased line issue. Her issues combined with our issues made her job very risky after placing her eggs in our basket. Knowing that her job was at risk, she was often the first customer I called proactively when we were having issues. I nearly lost my job over this.

Despite how proactive you think you’re being, always ensure your management team understands what position your customers may be in due to issues in your realm. Sometimes, management want to come forward with a more rounded package and story as compensation, instead of a young happy-go-lucky person ringing them up to apologise and assure them everything is being done to restore a service.

Do not allow yourself to be brushed off by poor management but allow yourself to be guided by them when they understand what the full situation is. Shared responsibility goes a long way for your customer and for yourself. If they do not understand, try harder; that’s their job and don’t let them forget it!


To all of the support people over this festive period, good luck my friends. May it be an uneventful and merry time.

The post A Christmas Support Story appeared first on

by David Gee at December 11, 2017 06:45 PM

Blog (Ivan Pepelnjak)

First Speakers in the Spring 2018 Automation Online Course

For the first two sessions of the Building Network Automation Solutions online course I got awesome guest speakers, and it seems we’ll have another fantastic lineup in the Spring 2018 course:

Most network automation solutions focus on device configuration based on user request – service creation or change of data model describing the network. Another very important but often ignored aspect is automatic response to external events, and that’s what David Gee will describe in his presentation.

Read more ...

by Ivan Pepelnjak ( at December 11, 2017 07:43 AM

Networking Now (Juniper Blog)

Necurs Malspam Delivers GlobeImposter Ransomware



The Necurs botnet seems to be coming up with a fresh wave of malspam delivering GlobeImposter ransomware. The malspam comes in the form of a quasi-blank email with little to no message content, a short subject line and an attached 7z archive containing a VBScript that downloads the ransomware.


Indicators of Compromise (IOCs)

GlobeImposter Payload MD5sum: c99e32fb49a2671a6136535c6537c4d7


Technical Analysis


Mail Attachment VBScript Analysis

Looking at spam mail that has already been gathered and comparing it to other samples seen on VirusTotal in the last few days, the samples seen so far are mostly blank mails with little to no content and a malicious attachment in the form of a 7z archive containing a VBScript file. The attachment naming scheme looks either numerical, like 10006000420.7z, or like FL_386828_11.30.2017.7z; other attachment naming schemes have also been seen.


The VBScript file is somewhat obfuscated. It stores a hard-coded string that it parses to obtain various sub-strings, which it then uses to figure out the objects it needs to create for network communication, the file name to use for the downloaded ransomware, and so on. The format of this string looks like the example below, delimited by the character “>”, and is present in reverse form, which the malware reverses before extracting the necessary substrings.




The VBScript file has a hardcoded list of URLs, as shown in the snapshot below, to download the ransomware from. It then loops through the list of URLs until one succeeds. Various attachments from multiple spam emails were analyzed and showed a largely non-intersecting list of URLs in each attachment. Below left is the obfuscated code looping through the URL list trying to download the malware; below right is the corresponding simplified pseudocode.
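The two behaviors described above (the reversed, ">"-delimited configuration string and the download loop) can be expressed in a short Python sketch. The string and URLs here are hypothetical stand-ins for illustration, not actual indicators from the samples:

```python
# Hypothetical stand-in for the malware's reversed, ">"-delimited
# configuration string (real samples differ).
blob = "exe.tset>TEG>ptth"
parts = blob[::-1].split(">")   # reverse first, then split on ">"
# parts is now ["http", "GET", "test.exe"]

def first_successful(urls, fetch):
    """Loop through the URL list until a download succeeds, mirroring
    the simplified pseudocode described above. `fetch` stands in for
    the actual download routine."""
    for url in urls:
        try:
            return fetch(url)
        except OSError:
            continue  # this host is down; try the next one
    return None
```

The largely non-intersecting URL lists per attachment suggest the campaign rotates download hosts, which is why the loop-until-success pattern matters to the attacker.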



The VBScript file saves the payload to the “C:\Users\user\AppData\Local\Temp\” folder and executes it, as seen below.



While the attachment itself is compressed, the VBScript file inside the archive, seen across various attachment samples, shows a recurring pattern in the code that can be leveraged by an IDS/IPS or YARA engine that can decompress the archive and match on the VBScript file.


Common patterns seen across the VBScript attachments include:


"krapivec\s*=\s*Array(" -> Regex



CUA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0"



And so on.


 Ransomware Payload Analysis

The ransomware payload is in the form of an NSIS installer. When executed, it unpacks itself, spawns a copy of itself in suspended mode and injects its code into the child using the process hollowing technique.




When the malicious payload is executed, the malware encrypts all files, adding the “..doc” extension. Previous and other versions of the ransomware are known to use other extensions.


The malware ensures persistence by creating a run key entry named “UpdateBrowserCheck”, with the path to the ransomware executable, under HKCU\Software\Microsoft\Windows\RunOnce.



It writes a temporary batch file to the system, as shown below, with commands to delete any shadow volume copies in order to prevent restoration of encrypted files. It also clears all event logs from the system to cover its tracks.




Strings from the unpacked sample:


The ransom file name and the ransom note.



Batch commands run by the malware:





Ransom Note

The ransom note, “Read__ME.html”, is dropped into every directory where files are encrypted, asking the user to connect to the TOR network for the decryptor. Clicking the “Buy Decryptor” button redirects the user to an onion link that mentions the ransom amount, starts a 48-hour timer, and doubles the ransom amount when the timer expires.


It is interesting to note the difference from other ransomware: the start of the timer is not tied to the encryption time, and the ransom amount is not known at the time the victim is notified of what just happened. Both require the victim to visit the TOR web address, which triggers the timer and reveals the ransom amount.


Akin to other ransomware, this variant of GlobeImposter allows the victim to decrypt one file of their choice to gain assurance that the decryption is possible prior to paying the ransom. Many victims indeed refuse to pay the ransom because they do not believe there is a guarantee of recovering their files. The ransomware criminal gangs are having to recover from the damage to their image done by wipers disguised as ransomware.




Both Cyphort (now a Juniper company) and Juniper Sky ATP detect the email attachment and the ransomware payload.



Many thanks to Abhijit Mohanta from the Threat Research Team for co-authoring this blog.

by asaldanha at December 11, 2017 04:36 AM

XKCD Comics

December 10, 2017

Blog (Ivan Pepelnjak)

New Content: Debugging Ansible Playbooks and Jinja2 Templates

Here’s a quote from one of my friends who spent years working with Ansible playbooks:

Debugging Ansible is one of the most terrible experiences one can endure…

It’s not THAT bad, particularly if you have a good debugging toolbox. I described mine in the Debugging Ansible Playbooks part of the Ansible for Networking Engineers online course.

Please note that the Building Network Automation Solutions online course includes all material from the Ansible online course.

by Ivan Pepelnjak ( at December 10, 2017 12:23 PM

December 09, 2017

Blog (Ivan Pepelnjak)

Worth Reading: Postpone Inbox Procrastination

This blog post by Ethan Banks totally describes my (bad) Inbox habits. If you're anything like me, you might find Ethan's ideas useful (I do... following them is a different story though).

by Ivan Pepelnjak ( at December 09, 2017 06:12 AM

December 08, 2017

Ethan Banks on Technology

Pre-Order My Computer Networking Problems & Solutions Book And Save 40%

I co-authored Computer Networking Problems And Solutions with Russ White. The nice folks at InformIT are accepting pre-orders of the book and ebook at 40% off until December 16, 2017. Go get yourself a copy of this short 832-page read via this link containing all of InformIT’s titles coming soon.

Or, if you use the book’s product page instead of the “coming soon” link above, use code PREORDER to get the discount.

All “coming soon” titles on sale at InformIT:

Product Page for Computer Networking Problems & Solutions:

by Ethan Banks at December 08, 2017 08:11 PM

The Networking Nerd

Network Visibility with Barefoot Deep Insight

As you may have heard this week, Barefoot Networks is back in the news with the release of their newest product, Barefoot Deep Insight. Choosing to go down the road of naming a thing after what it actually does, Barefoot has created a solution to finding out why network packets are behaving the way they are.

Observer Problem

It’s no secret that modern network monitoring is coming out of the Dark Ages. ping, traceroute, and SNMP aren’t exactly the best tools for getting any kind of real information about what’s happening. They were designed for a different time with much less packet flow. Even NetFlow can’t keep up with modern networks running at multi-gigabit speeds. And even if it could, it’s still missing in-flight data about network paths and packet delays.

Imagine standing outside of the Holland Tunnel. You know that a car entered at a specific time. And you see the car exit. But you don’t know what happened to the car in between. If the car takes 5 minutes to traverse the tunnel you have no way of knowing if that’s normal or not. Likewise, if a car is delayed and takes 7-8 minutes to exit you can’t tell what caused the delay. Without being able to see the car at various points along the journey you are essentially guessing about the state of the transit network at any given time.

Trying to solve this problem in a network can be difficult. That’s because the OS running on the devices doesn’t generally lend itself to easy monitoring. The old days of SNMP proved that time and time again. Today’s networks are getting a bit better with regard to APIs and the like. You could even go all the way up the food chain and buy something like Cisco Tetration if you absolutely needed that much visibility.

Embedding Reporting

Barefoot solves this problem by using their P4 language in concert with the Tofino chipset to provide a way for there to be visibility into the packets as they traverse the network. P4 gives Tofino the flexibility to build on to the data plane processing of a packet. Rather than bolting the monitoring on after the fact you can now put it right along side the packet flow and collect information as it happens.

The other key is that the real work is done by the Deep Insight Analytics software running outside of the switch. The analytics platform takes the data collected from the Tofino switches and starts processing it. It creates baselines of traffic patterns and starts looking for anomalies in the data. This is why Deep Insight claims to be able to detect microbursts: the monitoring platform can analyze the data being fed to it and provide the operator with insights.
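Barefoot hasn't published Deep Insight's actual algorithm, but a toy baseline-plus-deviation check conveys the general idea of baselining traffic and flagging anomalies:

```python
from statistics import mean, stdev

def flag_anomalies(latencies_us, window=20, nsigma=3.0):
    """Toy illustration of baseline/anomaly detection: learn a rolling
    baseline from the previous `window` samples and flag any sample
    that deviates more than `nsigma` standard deviations from it.
    This is NOT Barefoot's algorithm; it is a minimal sketch."""
    flagged = []
    for i in range(window, len(latencies_us)):
        base = latencies_us[i - window:i]
        mu, sigma = mean(base), stdev(base)
        if sigma and abs(latencies_us[i] - mu) > nsigma * sigma:
            flagged.append(i)
    return flagged
```

A production system would of course operate on telemetry collected in the data plane at line rate, not on a Python list, but the baseline-then-deviate structure is the same shape of reasoning.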

It’s important to note that this is info only. The insights gathered from Deep Insight are for informational purposes. This is where the skill of the network professional comes into play. By gaining perspective into what could be causing issues like microbursts from the software, you gain the ability to take your skills and fix those issues. Perhaps it’s a misconfigured ECMP pair. Maybe it’s a dead or dying cable in a link. Armed with the data from the platform, you can work your networking magic to make it right.

Barefoot says that Deep Insight builds on itself via machine learning. While machine learning seems to be one of the buzzwords du jour, one would hope that a platform that can analyze the state of packets can start to build an idea of what’s causing them to behave in certain ways. While not mentioned in the press release, it could also be inferred that there are ways to upload the data from your system to a larger set of servers, where more analytics can be applied to the datasets and more insights extracted.

Tom’s Take

The Deep Insight platform is what I was hoping to see from Barefoot after I saw them earlier this year at Networking Field Day 14. They are taking the flexibility of the Tofino chip and the extensibility of P4 and combining them to build new and exciting things that run right alongside the data plane on the switches. This means that they can provide the kinds of tools that companies are willing to pay quite a bit for and do it in a way that is 100% capable of being audited and extended by brilliant programmers. I hope that Deep Insight takes off and sees wide adoption for Barefoot customers. That will be the biggest endorsement of what they’re doing and give them a long runway to building more in the future.

by networkingnerd at December 08, 2017 02:34 PM

Blog (Ivan Pepelnjak)
XKCD Comics

December 07, 2017

Dyn Research (Was Renesys Blog)

Puerto Rico’s Slow Internet Recovery

On 20 September 2017, Hurricane Maria made landfall in Puerto Rico.  Two and a half months later, the island is still recovering from the resulting devastation.  This extended phase of recovery is reflected in the state of the local internet and reveals how far Puerto Rico still has to go to make itself whole again.

While most of the BGP routes for Puerto Rico have returned, DNS query volumes from the island are still only a fraction of what they were on September 19th  — the day before the storm hit.  DNS activity is a better indicator of actual internet use (or lack thereof) than the simple announcements of BGP routes.

We have been analyzing the impacts of natural disasters such as hurricanes and earthquakes going back to Hurricane Katrina in 2005.  Compared to the earthquake near Japan in 2011, Hurricane Sandy in 2012, or the earthquake in Nepal in 2015, Puerto Rico’s disaster stands alone with respect to its prolonged and widespread impact on internet access.  The following analysis tells that story.

DNS statistics

Queries from Puerto Rico to our Internet Guide recursive DNS service have still not recovered to pre-hurricane levels, as illustrated in the plot below.  Earlier this week, on December 4th, we handled only 53% of the query volume from Puerto Rico that was received on September 18th, just before the hurricane.  Both dates are Mondays, hopefully ruling out possible day-of-the-week effects.


Queries from Puerto Rico to our authoritative DNS services are also reduced from prior to the hurricane, but not as much as to our recursive DNS service.  This may be because caching effects are more pronounced with our authoritative DNS, since they handle queries for a smaller set of domains than our recursive DNS service.  Additionally, we may have lost some Internet Guide clients if those computers reverted to a default DNS configuration upon returning to service.  Regardless, the volume is still lower than pre-hurricane levels for authoritative DNS. On December 4th, we handled 71% of the query volume from Puerto Rico as compared to September 18th.


Based on these two figures (53% and 71%), we estimate that internet service in Puerto Rico is only a little more than half of where it was before the hurricane.
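The "little more than half" figure is consistent with simply averaging the two ratios (an assumption on my part; Dyn doesn't state its exact method):

```python
# Back-of-the-envelope from the two figures quoted above: the simple
# average of the recursive (53%) and authoritative (71%) DNS query
# volume ratios.
recursive, authoritative = 0.53, 0.71
estimate = (recursive + authoritative) / 2
print(f"{estimate:.0%}")  # prints 62%
```

62% sits between the two measurements and matches the article's characterization of internet service being a little more than half restored.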

BGP and traceroute measurement statistics

The graphic below shows the impact of the hurricane on the routed networks of Puerto Rico colored by the major providers.  Many of these BGP routes were withdrawn as the hurricane came ashore and the island suffered what has been labeled the largest power outage in US history.  By early November, most of these routes were once again being announced in the global routing table, however, damage to last-mile infrastructure meant that many Puerto Ricans were still unable to obtain internet access.


Our traceroute measurements to Puerto Rico, illustrated below, tell a similar story: a steep drop-off on 20 September 2017, followed by a long, slow recovery that appears to come incrementally as different pieces of Puerto Rican infrastructure come back online.  Despite an island-wide power outage, some networks in Puerto Rico (like Critical Hub Networks) continued to be reachable throughout the period of analysis.  While the plot below shows a steeper dip than the BGP-based plot above, the responding hosts we measured are often part of the core infrastructure and more likely to be connected to backup power than access-layer networks; like the BGP routes above, they could therefore overstate the degree of recovery.


Submarine cable impact

Perhaps less appreciated about this incident is Hurricane Maria’s impact on connectivity in several South American countries.  Puerto Rico is an important landing site for several submarine cables that link South America to the global internet.  The cable landing station serving Telecom Italia’s Seabone network had to be powered down due to flooding.  A statement from Seabone read:

We must inform you that Hurricane Maria (Category 5) has impacted Puerto Rico causing serious damage and flooding on the island. We had to de-energize our nodes at the station to avoid serious damage to the equipment.

As a result, in the early afternoon on 21 September 2017, we observed traffic shifts away from Telecom Italia in multiple South American countries as the submarine cable became unavailable.  To illustrate the impact, below are four South American ASNs that experienced a loss of one of their transit providers at this moment in time.  Cablevision Argentina (AS10481) is from Argentina, while the other three are from Brazil.  Brazilian provider Citta Telecom (AS27720) lost service from Eweka Internet, while the others lost Telecom Italia transit.

Additionally, following the hurricane, Venezuelan incumbent CANTV announced that their international capacity had been cut by 50% due to storm-related submarine cable damage. The announcement was met with skepticism from a citizenry increasingly subjected to censorship and surveillance by their government.


However, our data shows impairment of CANTV’s international links aligns with other outages in the region due to the effects of the hurricane. The plots below show latencies to CANTV from several cities around the world spiking on 21 September 2017 after the submarine cable station in Puerto Rico was flooded.


Immediately following Hurricane Maria’s arrival in Puerto Rico, Sean Donelan, Principal Security Architect at Qmulos, began dutifully posting status updates he had collected to the NANOG email list about the connectivity situation on the island.  In addition, a website was set up to collect various metrics about the status of the recovery.

Now with over two months of hindsight, we can truly appreciate just how devastating the hurricane was in many respects other than simply internet impairment.  Puerto Rico may no longer be in the headlines as it was just after the storm, but the resources required to get this part of the United States back on its feet are truly extensive.

Below are links about how you can help:


by Doug Madory at December 07, 2017 05:07 PM

Blog (Ivan Pepelnjak)

December 06, 2017

Blog (Ivan Pepelnjak)

Automate Remote Site Hardware Refresh Process

Every time we finish the Building Network Automation Solutions online course I ask the attendees to share their success stories with me. Stan Strijakov was quick to reply:

I have yet to complete the rest of the course and assignments, but the whole package was a tremendous help for me to get our Ansible running. We now deploy whole WAN sites within an hour.

Of course I wanted to know more and he sent me a detailed description of what they’re doing:

Read more ...

by Ivan Pepelnjak ( at December 06, 2017 07:12 AM

XKCD Comics

December 05, 2017

Blog (Ivan Pepelnjak)

Stop Googling and Start Testing

Here’s a question I got on one of my ancient blog posts:

How many OSPF process ID can be used in a single VRF instance?

Seriously? You have to ask that? OK, maybe the question isn’t as simple as it looks. It could be understood as:

Read more ...

by Ivan Pepelnjak ( at December 05, 2017 12:43 PM

December 04, 2017

Ethan Banks on Technology

Postpone Inbox Procrastination

I’ve recently admitted to myself that my ineptitude with my inbox is due largely to procrastination. That is, I can’t face the task that a particular inbox message presents, and thus I ignore the message. With this admission comes a desire to reach inbox zero each and every day. I don’t like my productivity squashed by ineptitude. I must overcome!

But how?

  1. Getting to inbox zero each day is, first of all, an important goal. In other words, I really want to be at inbox zero each day. I don’t want to leave items hanging around for the next day. Therefore, among all my tasks, I have to prioritize inbox management.
  2. I filter messages heavily. I use Gmail, and have begun digging into the filtering system. At the moment, I have 27 rules that route messages to folders. Those rules are covering several dozen PR agencies, newsletters, and auto-notifiers. This helps me to focus when I’m working on my inbox, making it much easier to evaluate and react to messages depending on the folder they were routed to.
  3. I unsubscribe from uninteresting lists. Because I work in media, I receive pitches every day from PR firms who don’t know me, but found me in a database and hope I’ll cover their customer’s product. Therefore, every day I have to unsubscribe from certain lists I didn’t ask to be on.
  4. Outbound messages breed inbound messages. Therefore, I don’t respond to messages unless absolutely necessary. When I do respond, I attempt to be as complete as possible to minimize the conversational exchange. That means I anticipate questions and action items, and handle everything up front in a single message if possible. I don’t create a minimum effort message and throw it over the wall, which is really just delaying completion of the task.
  5. I remind myself that my inbox is not a task management tool. If I have a message I can’t complete that moment, I will create a task with a due date and tackle it when my task manager says I need to get it done. Then I am free to archive the message and respond to it later. I’m starting to feel that archiving is greater than deleting, because the message goes away while still being searchable. On the other hand, that could lead to a large mail database, and I’m not sure how I feel about that.
  6. I postpone procrastination. If I open my inbox, that means I’m there to tackle each and every item, moving them all towards closure. I’m not going to cherry-pick favorite items for that dopamine hit. Rather, I’m going to go through each message chronologically (I’m still terrible at this), and work it through. I will not leave for another day items that invoke dread, because another day becomes another week. A week becomes a month. A month becomes two months, or even three. Procrastination is not getting things done, so I leave procrastination for another day.

The big deal here…

…is focus. To be able to grind through the daily inbox flood, I stack the deck in favor of focus. When I focus, I get the inbox cleared out.

I think of inbox management like cleaning the catbox. Doing it every day is best. If I miss a day, it’s tolerable, but sort of gross. If I skip a couple of days, I don’t really want to go in there, because cleaning it up is just a nasty, nasty job.

Therefore, it’s best to exercise self-discipline, focus once a day, and sift the inbox clean.

by Ethan Banks at December 04, 2017 04:50 PM Blog (Ivan Pepelnjak)

Simplifying Products

When I started my project life was simple: I had a few webinars, and you could register for the live sessions. After a while I started adding recordings, subscriptions, bundles, roadmaps (and tracks), books… and a few years later workshops and online courses.

As you can imagine, the whole thing became a hard-to-navigate mess. Right now you can buy almost 70 different products on Time for a cleanup.

Read more ...

by Ivan Pepelnjak ( at December 04, 2017 07:12 AM

Potaroo blog

Network Neutrality - Again

It strikes me as odd to see a developed and, by any reasonable standard, a prosperous economy getting into so much trouble with its public communications policy framework.

December 04, 2017 12:45 AM

XKCD Comics

December 03, 2017 Blog (Ivan Pepelnjak)

Worth Reading: The Basic Math behind Reliability

Diptanshu Singh wrote a nice explanation of the math behind reliability calculations. Definitely worth reading even if you hated statistics.

by Ivan Pepelnjak ( at December 03, 2017 09:03 AM

December 02, 2017 Blog (Ivan Pepelnjak)

How Did NETCONF Start on Software Gone Wild

A long while ago Marcel Wiget sent me an interesting email along the lines of “I think you should do a Software Gone Wild podcast with Phil Shafer, the granddaddy of NETCONF.”

Not surprisingly, as we started discovering the history behind NETCONF we quickly figured out that all the API and automation hype being touted these days is nothing new – some engineers have been doing that stuff for almost 20 years.

Read more ...

by Ivan Pepelnjak ( at December 02, 2017 02:59 PM

December 01, 2017

The Networking Nerd

Does Juniper Need To Be Purchased?

You probably saw the news this week that Nokia was looking to purchase Juniper Networks. You also saw pretty quickly that the news was denied, emphatically. It was a curious few hours when the network world was buzzing about the potential to see Juniper snapped up into a somewhat larger organization. There was also talk of product overlap and other kinds of less exciting but very necessary discussions during mergers like this. Which leads me to a great thought exercise: Does Juniper Need To Be Purchased?

Sins of The Father

More than any other networking company I know of, Juniper has paid the price for trying to break out of their mold. When you think Juniper, most networking professionals will tell you about their core routing capabilities. They’ll tell you how Juniper has a great line of carrier and enterprise switches. And, if by some chance, you find yourself talking to a security person, you’ll probably hear a lot about the SRX Firewall line. Forward thinking people may even tell you about their automation ideas and their charge into the world of software defined things.

Would you hear about their groundbreaking work with Puppet from 2013? How about their wireless portfolio from 2012? Would anyone even say anything about Junosphere and their modeling environments from years past? Odds are good you wouldn’t. The Puppet work is probably bundled in somewhere, but the person driving it in that video is on to greener pastures at this point. The wireless story is no longer a story, but a footnote. And the list could go on longer than that.

When Cisco makes a misstep, we see it buried, written off, and eventually become the butt of really inside jokes between groups of engineers that worked with the product during the short life it had on this planet. Sometimes it’s a hardware mistake. Other times it’s software architecture missteps. But in almost every case, those problems are anecdotes you tell as you watch the 800lb gorilla of networking squash their competitors.

With Juniper, it feels different. Every failed opportunity is just short of disaster. Every misstep feels like it lands on a land mine. Every advance not expanded upon is the “one that got away”. Yet we see it time and time again. If a company like Cisco pushed the envelope the way we see Juniper pushing it we would laud them with praise and tell the world that they are on the verge of greatness all over again.

Crimes Of The Family

Why then does Juniper look like a juicy acquisition target? Why are they slowly being supplanted by Arista as the favored challenger to the Cisco Empire? How is it that we find Juniper in the crosshairs of everyone, fighting to stay alive?

As it turns out, wars are expensive. And when you’re gearing up to fight Cisco you need all the capital you can get. That forces you to make alliances that may not be the best for you in the long run. And in the case of Juniper, it brought in some of the people who thought they could get in on the ground floor of a company that was ready to take on the 800lb gorilla and win.

Sadly, those “friends” tend to be the kind that desert you when you need them the most. When Juniper was fighting tooth and nail to build their offerings up to compete against Cisco, the investors were looking for easy gains and ways to make money. And when those investors realized that toppling empires takes more than two quarters, they got antsy. Some bailed. Those needed to go. But the ones that stayed caused more harm than good.

I’ve written before about Juniper’s issues with Elliott Capital Management, but it bears repeating here. Elliott is an activist investor in the same vein as Carl Icahn. They take a substantial position in a company and then immediately start demanding changes to raise the stock price. If they don’t get their way, they release paper after paper decrying the situation to the market until the stock price is depressed enough to get the company to listen to Elliott. Once Elliott’s demands are met, they exit their position. They get a small profit and move on to do it all over again, leaving behind a shell of a company wondering what happened.

Elliott has done this to Juniper repeatedly. Pulse VPN. Trapeze. They’ve demanded executive changes and forced Juniper to abandon good projects with long-term payoffs because they won’t bounce the stock price higher this quarter. And worse yet, if you look back over the last five years you can find story after story in the finance industry about Juniper being up for sale or being a potential acquisition target. Five. Years. When’s the last time you heard about Cisco being a potential target for a buyout? Hell, even Arista doesn’t get shopped around as much as Juniper.

Tom’s Take

I think these symptoms all share the same root issue. Juniper is a great technology company that does some exciting and innovative things. But, much like a beautiful potted plant in my house, they have reached the maximum size they can grow to without making a move. Like a plant, they can only grow as big as their container. If you leave them in a small one, they’ll only ever be small. You can transfer them to something larger, but you risk harm or death. But you’ll never grow if you don’t change. Juniper has the minds and the capability to grow. And maybe, with the eyes of the Wall Street buzzards looking elsewhere for a while, they can build a practice that gives them the capability to challenge in the areas they are good at, not just being the answer for everything Cisco is doing.

by networkingnerd at December 01, 2017 05:37 PM

Ethan Banks on Technology

All Of Ethan’s Podcasts And Articles For November 2017

Here’s a catalog of all the media I produced (or helped produce) in November 2017. I’ve included content summaries to motivate you to click. See, that’s coming right at you with how I’m trying to manipulate your behavior. I’m honest like that.



  • Episode 134 – Meet ZeroTier–Open Source Networking. I interview Adam Ierymenko about ZeroTier, his overlay networking baby that connects devices at L2 no matter where they are. Really interesting tech. This could be one of the most discussed shows in the Packet Pushers catalog. Plus, Adam joined the Packet Pushers audience Slack channel and has been interacting with the community.
  • Episode 135 – Master Python Networking–The Book. I interview Eric Chou, author of this book, who is donating all the proceeds to charity. Lots of folks have reacted to this interview, reflecting the strong interest from the networking community in automation.


  • Episode 108 – Building Service Meshes With Avi Networks (Sponsored). Service meshes are the latest in the evolution of application delivery controllers. The big idea is to put a service anywhere it’s needed and route traffic through it in a dynamic infrastructure environment. Pair “service mesh” with “cloud native,” and you’re starting to get it.
  • Episode 109 – Run VMware Apps In The Cloud With Ravello (Sponsored). Oracle Ravello makes a product that allows you to pick up your data center as-is and run it in the cloud. Lots of use cases–lab, change validation, infrastructure modeling, user acceptance testing, quality assurance, and even production.
  • Episode 110 – The Future Of Storage. We interview Tom Lyon, Sun Microsystems employee number 8, about where storage is headed. He seems to be a good person to ask, as he’s working at DriveScale these days, creating a distributed storage product designed for leading edge compute.
  • Episode 111 – NVMe And Its Network Impact. Cisco’s J Metz makes a repeat Datanauts appearance. We go full nerd discussing how the incredible performance of NVMe drives will impact storage networks. This is a very big deal that not enough people are talking about, IMHO.
  • Episode 112 – Building The Perfect Data Center Beast. We talk through several aspects of building a physical data center including power distribution, hot/cold aisle designs, racks, and cabling plant. A longer-than-average show that’s seen a good bit of feedback on Twitter already, including opening up the question of, “Does anyone build raised floor facilities anymore?”

Briefings In Brief Podcast

  • Understanding Wireshark Capture Filters. I dive deep on a specific Wireshark capture filter, explaining how it works piece by piece, and concluding with a list of resources to find even more information.


  • Nothing new this month, although I have decided that I am going to focus on personal productivity in this blog. I have felt for a while that I needed a specific topic to write about here, and productivity is an area where I continue to evolve.


  • Human Infrastructure Magazine 70 – How Do You Learn? I ask for feedback from you about how you learn. We’re working on new styles of content over at Packet Pushers Heavy Industries, and want to come as close as we can to getting it correct out of the gate. Your feedback appreciated.


  • I delivered a QoS Fundamentals webinar over at this month. That went reasonably well, although I got some feedback that made me question how I should be doing slides, etc. when doing live over-the-Internet presentations. I’ve since bought a Wacom tablet that I need to figure out how to use. My idea is to use the Wacom to do live whiteboarding during webinars.
  • I spent a day with a higher ed institution, acting as facilitator for a devops workshop they ran internally. That was quite intriguing as their issues were far more human than technical. There’s content there somewhere. I need to think it through and decide what to focus on.
  • The devops workshop did leave me with a technical question I don’t have a great answer to yet. That is, can devops practices be applied in the case of shops deploying lots of shrink-wrap software? That’s a different pipeline than an in-house dev shop pushing code through a CI/CD pipeline into prod using microservices over cloud native. And yet…there are many parallels as well as demands of efficiency. Where does devops, as traditionally defined, fit? I have homework to do and perhaps some folks to interview to shine some light on this topic as I have mixed opinions right now.
  • I am knee-deep into the Todoist task management app, working to make it my single source of truth. Post coming.
  • I have completed migration of my home and lab networks to a D-Link L3 switch, replacing a Cisco SG-300 that still ran but had taken a lightning hit, lost an ASIC (I think), and with it several front panel ports. The D-Link is a DGS-1510-52 gigabit Ethernet switch with a ton of capabilities. I have spent time going through the manual, and I’m favorably impressed. I will likely blog about some of the lesser-known features when I get a chance to study them.
  • I have also migrated my home firewall from a VMware instance of pfSense to a bare-metal instance. Now I have a beast-mode firewall at home with a quad-core Xeon CPU and 32GB of RAM. It’s barely ticking over with the load I’m placing on it, but that was the point. I am going to be loading it up with as many features as I think I can take advantage of, and I don’t want hardware to be a question mark. I still have a ways to go on this box, but so far I’ve got it serving forward and reverse DNS locally, which has made some of the auxiliary packages like BandwidthD offer some more interesting statistics. For example, I have Amazon Echo devices sucking down gigabytes of data from the Internet. Fascinating, and vaguely worrying until I have a better idea of what that data is. In any case, I have plans for ZeroTier on this pfSense box, but I need to do more homework to grok how to install the package, as it doesn’t seem to be supported as a core function. Not sure yet on this, as I’ve heard it can be done, but haven’t managed to hit the right support page explaining it.
  • After many hours, I managed to get minikube (virtualized Kubernetes cluster running on a single host) running on my iMac. I was making it harder than it needed to be, wanting to run minikube in a Linux VM. It kept failing miserably until I opted to do the minikube install like the guide suggested, leveraging Fusion as the hypervisor but otherwise running OS X native. I need minikube to support my reading of the Kubernetes Up And Running book sitting on my desk. It’s not a long book, but it won’t mean much without the lab work to reinforce concepts.

by Ethan Banks at December 01, 2017 05:00 AM

XKCD Comics

November 30, 2017 Blog (Ivan Pepelnjak)

Automate End-to-End Latency Measurements

Here’s another idea from the Building Network Automation Solutions online course: Ruben Tripiana decided to implement a latency measurement tool. His playbook takes a list of managed devices from Ansible inventory, generates a set of unique device pairs, measures latency between them, and produces a summary report (see also his description of the project).
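Ruben’s pair-generation step can be sketched in a few lines of Python (the device names below are hypothetical; his playbook reads them from Ansible inventory):

```python
from itertools import combinations

# Hypothetical device list -- the actual playbook pulls this from Ansible inventory.
devices = ["edge1", "edge2", "core1", "core2"]

# Every unordered pair exactly once: n*(n-1)/2 probes instead of
# measuring each direction separately.
pairs = list(combinations(devices, 2))

for a, b in pairs:
    print(f"measure latency {a} <-> {b}")
```

With four devices this yields six unique pairs; each pair then becomes one latency measurement in the summary report.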

Read more ...

by Ivan Pepelnjak ( at November 30, 2017 07:19 AM

November 29, 2017 Blog (Ivan Pepelnjak)

BGP as a Better IGP? When and Where?

A while ago I helped a large enterprise redesign their data center fabric. They did a wonderful job optimizing their infrastructure, so all they really needed were two switches in each location.

Some vendors couldn’t fathom that. One of them proposed to build a “future-proof” (and twice as expensive) leaf-and-spine fabric with two leaves and two spines. On top of that they proposed to use EBGP as the only routing protocol because draft-lapukhov-bgp-routing-large-dc – a clear case of missing the customer needs.

Read more ...

by Ivan Pepelnjak ( at November 29, 2017 08:13 AM

XKCD Comics

November 28, 2017

Dyn Research (Was Renesys Blog)

The Migration of Political Internet Shutdowns

In January 2011, what was arguably the first significant disconnection of an entire country from the Internet took place when routes to Egyptian networks disappeared from the Internet’s global routing table, leaving no valid paths by which the rest of the world could exchange Internet traffic with Egypt’s service providers. It was followed in short order by nationwide disruptions in Bahrain, Libya, and Syria. These outages took place during what became known as the Arab Spring, highlighting the role that the Internet had come to play in political protest, and heralding the wider use of national Internet shutdowns as a means of control.

“How hard is it to disconnect a country from the Internet, really?”

After these events, and another significant Internet outage in Syria, this question led to a blog post published in November 2012 by former Dyn Chief Scientist Jim Cowie that examined the risk of Internet disconnection for countries around the world, based on the number of Internet connections at their international border. “You can think of this, to [a] first approximation,” Cowie wrote, “as the number of phone calls (or legal writs, or infrastructure attacks) that would have to be performed in order to decouple the domestic Internet from the global Internet.”

Defining Internet Disconnection Risk

Based on our aggregated view of the global Internet routing table at the time, we identified the set of border providers in each country: domestic network providers (autonomous systems, in BGP parlance) who have direct connections, visible in routing, to international (foreign) providers. From that data set, four tiers were defined to classify a country’s risk of Internet disconnection. A summary of these classifications is below – additional context can be found in the original blog post:

  • If a country has only 1 or 2 service providers at its international frontier, it is classified as being at severe risk of Internet disconnection.
  • With fewer than 10 service providers at its international frontier, a country is classified as being at significant risk of Internet disconnection.
  • A country’s risk of Internet disconnection is classified as low risk with between 10 and 40 internationally-connected service providers.
  • Finally, countries with more than 40 providers at their borders are considered to be resistant to Internet disconnection.

The original blog post classified 223 countries and territories, with the largest number of them classified as being at significant risk of Internet disconnection.
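The four tiers boil down to a simple threshold function; here is a sketch using the boundaries from the summary above:

```python
def disconnection_risk(border_providers: int) -> str:
    """Classify a country's Internet disconnection risk from the number of
    its internationally-connected border providers (2012 tier boundaries)."""
    if border_providers <= 2:
        return "severe"
    if border_providers < 10:
        return "significant"
    if border_providers <= 40:
        return "low"
    return "resistant"

print(disconnection_risk(2))   # severe
print(disconnection_risk(39))  # low
print(disconnection_risk(41))  # resistant
```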

A February 2014 update to the original post, entitled “Syria, Venezuela, Ukraine: Internet Under Fire” examined changes observed in the 16 months since the original post, highlighting both increases and decreases in Internet disconnection risk level across a number of countries. The post noted the continued fragility of Internet connectivity in Syria, owing in part to its classification of being at severe risk of Internet disconnection, as well as mentioning the lack of nationwide Internet disruptions in Venezuela despite periodic slowdowns and regional access disruptions.

It has been five years since the original blog post, and over three and a half years since the followup post, so we thought that it would be interesting to take a new look at Internet resiliency around the world. Has connection diversity increased, and does that lead to a potential decrease in vulnerability to Internet shutdown?

However, as the 2014 blog post notes, “We acknowledge the limitations of such a simple model in predicting complex events such as Internet shutdowns. Many factors can contribute to making countries more fragile than they appear at the surface (for example, shared physical infrastructure under the control of a central authority, or the physical limitations of a few shared fiber optic connections to distant countries).” For instance, at the time of the original (2012) post, New Zealand relied primarily on the Southern Cross submarine cable connection to Australia for international Internet connectivity, despite our data showing dozens of border network providers. And while Iraq has gained numerous border relationships since 2012, most of the country (except for Kurdistan in the north) relies on a national fiber backbone which the Iraqi government has shut down dozens of times since 2014 to combat cheating on student exams, stifle protests, and disrupt ISIS communication.

In addition, it’s worth recognizing that there likely isn’t a meaningful difference in resilience between a country with 39 border providers (which would classify it as “low risk”) and one with 41 border providers (which would classify it as “resistant”). With these caveats in mind, an updated world map reflecting the risk of Internet disconnection as classified in our 2017 data set is presented below.

What’s Happened Since Then?

In reviewing other notable Internet shutdowns that have occurred since the 2014 post was published, a few things stood out:

However, the most interesting observation was the ‘migration’ of politically-motivated nationwide Internet disruptions. The outages that occurred during the Arab Spring time frame were largely concentrated in North Africa and the Middle East, shifting over the last several years into sub-Saharan Africa. This shift has not gone unnoticed, with online publication Quartz also highlighting the growing trend of African governments blocking the Internet to silence dissent, and the United Nations taking note as well. In addition, as these shutdowns are now a more regular occurrence, both in Africa and in other areas around the world, it is also worth looking at the financial impact that they have on affected countries.

Nearly three years ago, in January 2015, an Internet shutdown was put into place in Kinshasa, the capital of the Democratic Republic of Congo, after bloody clashes took place between opponents of President Joseph Kabila and police.  Banks and government agencies reportedly regained access after four days, while subscribers remained offline for three weeks. Almost two years later, in December 2016, an Internet shutdown was ordered as a means of blocking access to social media sites to prevent mobilization of those protesting against the president’s stay in office beyond the two-term limit.


While many governments force Internet shutdowns that last for just a few hours, or across multiple days or weeks, Gabon combined both in September 2016, implementing a nightly “Internet curfew” that lasted for 23 days. The regular Internet disruptions occurred on the heels of a disputed national election that ultimately saw the incumbent president win a second term by a slim vote margin. International Internet connectivity was also reportedly restricted in the week before the election. With Internet access largely concentrated through Gabon Telecom, the country is at severe risk of Internet shutdown.


In late November 2016, Internet connectivity in Gambia was shut down ahead of a national election that saw the country’s president of more than 20 years upset by the opposition candidate. Published reports noted that the opposition party relied on Internet messaging apps to organize rallies and demonstrations. Efforts by the incumbent party to disrupt Internet connectivity were presumably intended to derail this organizing, as well as to limit potential protests depending on the outcome of the election.


In Cameroon, Internet connectivity was blocked in English-speaking parts of the country starting in January 2017, reportedly affecting about 20 percent of the population. The government reportedly suspended Internet service for users in the Southwest and Northwest provinces after a series of protests that resulted in violence and the arrest of community leaders. Ten months later, Internet access remains unstable in Cameroon, highlighted by the #BringBackOurInternet hashtag on Twitter.


In Togo, throughout the fall of 2017, protesters have been calling for the resignation of President Faure Gnassingbe, who has been in power since his father died in 2005. In response, the country’s government has limited Internet access in an effort to prevent demonstrators from organizing on social media, and has also blocked text messaging. Published reports indicate that the mobile messaging app WhatsApp was a particular target, although some users resorted to VPNs to maintain access to the tool. Looking at the graph below, the Internet restrictions have not generally been implemented through broad manipulation or removal of routes — while some instability is evident, there have not been widespread outages, as have been seen in the past in countries such as Syria.


Most recently, the government of Equatorial Guinea widely blocked access to the Internet ahead of a nationwide election that was widely expected to keep the ruling party in power. Local service providers GuineaNet and IPXEG, among others, were taken completely offline. This disruption followed the blocking of opposition Web sites, which has been going on since 2013, as well as the blocking of Facebook, which was put into place when the electoral campaign started on October 27.


“Swift and Dramatic” Economic Damage

In 2011, the Organisation for Economic Co-operation and Development (OECD) estimated that the economic impact of Egypt’s five-day national Internet shutdown “incurred direct costs of at minimum USD 90 million.” They estimated that lost revenues due to blocked telecommunications and Internet services accounted for approximately USD 18 million per day. However, the OECD also noted that “this amount does not include the secondary economic impacts which resulted from a loss of business in other sectors affected by the shutdown of communication services e.g. e-commerce, tourism and call centres.”

The true cost to a country of a nationwide Internet shutdown can be significant. An October 2016 study produced by Deloitte reached the following conclusions:

“The impacts of a temporary shutdown of the Internet grow larger as a country develops and as a more mature online ecosystem emerges. It is estimated that for a highly Internet connected country, the per day impact of a temporary shutdown of the Internet and all of its services would be on average $23.6 million per 10 million population. With lower levels of Internet access, the average estimated GDP impacts amount to $6.6 million and to $0.6 million per 10 million population for medium and low Internet connectivity economies, respectively.”

The study also noted that if Internet disruptions become more frequent and longer-term in nature, these impacts are likely to be magnified.
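Plugging the Deloitte figures into a quick calculation shows how the per-day estimates scale; the country, population, and duration below are hypothetical:

```python
# Deloitte's October 2016 per-day GDP-impact estimates, in USD per
# 10 million population, keyed by level of Internet connectivity.
DAILY_COST_PER_10M = {"high": 23.6e6, "medium": 6.6e6, "low": 0.6e6}

def shutdown_cost(connectivity: str, population: float, days: float) -> float:
    """Rough GDP impact of a nationwide Internet shutdown under the Deloitte model."""
    return DAILY_COST_PER_10M[connectivity] * (population / 10e6) * days

# Hypothetical: a 5-day shutdown in a medium-connectivity country of 90 million people.
print(f"${shutdown_cost('medium', 90e6, 5):,.0f}")  # $297,000,000
```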

The Brookings Institution also published a report in October 2016 that looked at the cost of Internet shutdowns over the previous year. The report’s headline claim that “Internet shutdowns cost countries $2.4 billion last year” was cited in publications including Techcrunch and an Internet Society Policy Brief. However, within the report, so-called Internet shutdowns are broken down into a number of categories. By their count, 36 instances of “national Internet” shutdowns led to just under 20 days of aggregate downtime, responsible for almost USD 295 million of financial impact. In contrast, blocking access to apps at a nationwide level accounted for nearly half of the claimed financial impact.

The costs of a nationwide Internet shutdown to a country’s economy are clearly very real. In an October 2016 article in The Atlantic on this topic, my colleague Doug Madory noted “The hope is that a government would be less likely to order an Internet blackout if it knew the negative impacts of such a decision in terms of hard dollar figures.” We can hope that in the future, national governments will recognize that the money that these nationwide outages would cost them would be better redirected into improving Internet connectivity for citizens and businesses across their countries.


In 2012, we published the “Could It Happen In Your Country?” analysis in the aftermath of the Internet disruptions of the Arab Spring. Since then, we have observed and documented the trend of national Internet blackouts as they have migrated, most recently, to Africa.

While the studies by Deloitte and Brookings have pointed out the severe negative economic consequences of these blackouts, NGOs like AccessNow and Internet Sans Frontières do advocacy work by drawing attention to the adverse impacts on human rights when governments decide to cut communications lines. The role we play, and have played for many years, is to inform the Internet blackout discussion with expert technical analysis.

We can only hope that our combined efforts help to reduce the frequency of future government-directed Internet disruptions. Given the number of blackouts we’ve observed in recent months, help can’t come fast enough.

by David Belson at November 28, 2017 03:30 PM Blog (Ivan Pepelnjak)

Security or Convenience, That’s the Question

One of my readers was so delighted that something finally happened after I wrote about an NX-OS bug that he sent me a pointer to another one that has been pending for a long while, and is now officially terminated as FAD (Functions-as-Designed… even documented in the Further Problem Description).

Here’s what he wrote (slightly reworded)…

Read more ...

by Ivan Pepelnjak ( at November 28, 2017 08:21 AM

November 27, 2017 Blog (Ivan Pepelnjak)

It’s Bash Scripts All the Way Down (more on CLI versus API)

Netfortius made an interesting comment to my Ansible playbook as a bash script blog post:

Ivan - aren't we now moving the "CLI"[-like] approach, upstream (the one we are just trying to depart, via the more structured and robust approach of RESTAPI).

As I explained several times, I don’t know where the “we must get rid of CLI” ideas are coming from; the “CLI is the root of all evil” mantra is just hype generated by startups selling alternative approaches (the best part: one of them was actually demonstrating their product using CLI).

Read more ...

by Ivan Pepelnjak ( at November 27, 2017 07:56 AM

Potaroo blog

Helping Resolvers to help the DNS

Here, I'd like to look at ways that recursive resolvers in the DNS can take some further steps to help other parts of the DNS, notably the authoritative name servers (including the root zone servers), function more efficiently, and to mitigate some of the negative consequences when these authoritative name servers are exposed to damaging DoS attacks.

November 27, 2017 12:45 AM

Hiding the DNS

I’d like to look in a little more detail at the efforts to hide the DNS behind HTTPS, and put the work in the IETF's DOH Working Group into a broader perspective. There are a number of possible approaches here, and they can be classified according to the level of interaction between the DNS application and the underlying HTTPS encrypted session.
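
As a minimal illustration of the tightest end of that spectrum (the DNS query carried directly inside HTTPS), here's a sketch, using only the Python standard library, of how a DoH client could build the GET request path along the lines of the DOH working group's draft: a DNS query in standard wire format, base64url-encoded without padding. The function name and endpoint path are illustrative, not taken from any particular implementation.

```python
import base64
import struct

def doh_get_path(name: str, qtype: int = 1) -> str:
    """Build the ?dns= parameter for a DNS-over-HTTPS GET request:
    a DNS query in wire format, base64url-encoded without padding."""
    # Header: ID 0 (keeps identical queries cacheable), RD flag set,
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    wire = header + question
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode()
    return "/dns-query?dns=" + encoded

print(doh_get_path("example.com"))
```

To the HTTPS infrastructure in the middle this is just another GET request, which is precisely the point of hiding the DNS this way.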

November 27, 2017 12:30 AM

XKCD Comics

November 26, 2017

Potaroo blog


It took some hundreds of years, but Europe eventually reacted to the introduction of gunpowder and artillery by recognising that they simply could not build castles large enough to defend against any conceivable attack. So they stopped. I hope it does not take us the same amount of time to understand that building ever more massively fortified and over-provisioned DNS servers is simply a tactic for today, not a strategy for tomorrow.

November 26, 2017 10:30 PM


 Blog (Ivan Pepelnjak)

New Video: Whitespace Handling in Jinja2

Whitespace handling is one of the most confusing aspects of Jinja2, thoroughly frustrating many attendees of my Ansible and Network Automation online courses.

I decided to fix that, ran a few well-controlled experiments, and documented the findings and common caveats in the Whitespace Handling in Jinja2 video.
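
To see the source of the confusion in a trivial case, here's a minimal sketch using the jinja2 library (the template source and VLAN numbers are made up for illustration). Jinja2 keeps the newline that follows a block tag unless you tell it otherwise, which is why naively written templates produce stray blank lines:

```python
from jinja2 import Template

src = "{% for vlan in vlans %}\nvlan {{ vlan }}\n{% endfor %}\n"

# Default behavior: the newline after each block tag is kept,
# so the rendered output contains extra blank lines.
print(repr(Template(src).render(vlans=[10, 20])))

# trim_blocks removes the first newline after a block tag;
# lstrip_blocks strips whitespace from the start of a line to a block tag.
tidy = Template(src, trim_blocks=True, lstrip_blocks=True)
print(repr(tidy.render(vlans=[10, 20])))  # → 'vlan 10\nvlan 20\n'
```

The same two options can be set on Ansible's template module via the `trim_blocks` and `lstrip_blocks` parameters, or handled per-tag with the `{%-` / `-%}` whitespace-control modifiers.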

by Ivan Pepelnjak ( at November 26, 2017 06:06 AM

November 25, 2017 Blog (Ivan Pepelnjak)

Video: Using Simple PowerShell Scripts

After explaining the basics of PowerShell, Mitja Robas described how to implement the “Hello, World!” of network automation (collecting printouts from network devices) in PowerShell.

To watch all videos from this free webinar, register here.

by Ivan Pepelnjak ( at November 25, 2017 11:52 AM

November 24, 2017

The Networking Nerd

Complexity Isn’t Always Bad

I was reading a great post this week from Gian Paolo Boarina (@GP_Ifconfig) about complexity in networking. He raises some great points about the overall complexity of systems and how we can never really reduce it, just move or hide it. And it made me think about complexity in general. Why are we against complex systems?

Confusion and Delay

Complexity is difficult. The more complicated we make something the more likely we are to have issues with it. Reducing complexity makes everything easier, or at least appears to do so. My favorite non-tech example of this is the carburetor of an internal combustion engine.

Carburetors are wonderful devices that are necessary for the operation of the engine, and they are very complicated indeed. A minor mistake in configuring the spray pattern or alignment of the jets can cause your engine to fail to work at all. However, when you spend the time to learn how to work with one properly, you can make the engine perform even above its normal specifications.

Carburetors have been largely replaced in modern engines by computerized fuel injectors. These systems accomplish the same goal of injecting the fuel-air mixture into the engine. However, they are completely controlled by a computer system instead of being mechanically configured. It’s a great leap forward for people that aren’t mechanics or gear heads. The system either works or it doesn’t. There are no configuration parameters. Of course, if it doesn’t work there’s also very little that you as a non-mechanic can do to rectify the situation. As Gian Paolo points out, the complexity in the system has just been moved from the carburetor to the computer system running it.

But why is that a bad thing? If the standard user is never supposed to fiddle with the system, why is moving the complexity a bad thing? It could be argued that removing complications from the operation and diagnostics of a system is good, but only if you intend untrained people to work on it. A non-mechanic might never be able to fix a fuel injector system, but a trained person should be able to fix it quickly. Here, the complexity isn’t a barrier to the people who have been properly trained to anticipate it.

Complexity is only a problem for people who don’t understand it. Whether it’s a routing protocol or a file system, complex things are going to exist no matter what we do. Understanding them doesn’t have to be the job of everyone that uses the system.

A Tangled Web

I remember briefly working with Novell’s original identity management system back when it was still called DirXML. It was horribly complicated. It required a number of XML drivers importing information into eDirectory, which itself had quirks. And that identity repository fed multiple systems via XML rules to populate those data structures. It was a complicated nightmare to end all nightmares.

Except when it worked. When the system did the job properly, it looked like magic. Information entered for a new employee in the HR system automatically created an Active Directory user account in a different system, provisioned an email account in a third different system, and even created a time card entry in a fourth totally different system. The complexity under the hood churned its way through to provide usability to the people that relied on the system. Could they have manually entered all of that information? Sure. But having it automatically happen was a huge time saver for them. And when you apply it to a school where those actions needed to be repeated dozens of times for new students you can see how it would save a significant amount of time.

Here, complexity is the reason the system exists. You didn’t have the capability to feed those individual systems at the time because of the lack of API support or various other reasons. You had to find a way to force feed the information to a system that wasn’t expecting to get it any other way. Complexity here was required. And it worked. Until it didn’t.

Troubleshooting the XML issues in the system and keeping it running with new updates and broken links consumed a huge amount of time for the people I knew who were good at using it. So much time, in fact, that a couple of them made a business out of remotely supporting DirXML for customers that used it and either didn’t know how to use it or didn’t have the specific knowledge necessary to make it work the way they wanted. Here, the complexity wasn’t only a necessity of the system; it was a driver to create a new support level for it.

Ultimately, DirXML went away as it was consumed by Novell Identity Manager. And now the idea of these systems not having an API is silly. We focus our efforts on programming the API rather than on the extra complexity of the layers on top of it. But even those API interactions can be complex. So we’ve essentially traded one complexity for another: we’ve simplified some aspects while introducing others, and we’ve standardized things without necessarily making them any easier to do.

Tom’s Take

Complexity is bad when we don’t understand it. Trying to explain Lagrange points and orbital dynamics is a huge pain when you aren’t talking to rocket scientists. However, most people who understand the complexities of the college football playoff system are more than happy to explain it to you in depth, simply because they “get it”. Complexity isn’t always the enemy. If the people working on a system understand why it needs to be complex to fulfill a job requirement, then it’s not a bad thing. Instead of trying to move or reduce complexity, we should try to ensure that we don’t add any additional complexity to the system. That’s how you keep the complexity snowball from rolling over you.

by networkingnerd at November 24, 2017 05:42 PM

XKCD Comics

November 23, 2017 Blog (Ivan Pepelnjak)

Worth Reading: Designing Container Networking

Diane Patton (Cumulus Networks) published a short overview of container networking design options, from traditional MLAG to running Quagga on the Docker host.

If you want to learn more about individual designs described in that blog post, watch the Leaf-and-Spine Fabric Architectures and Docker Networking webinars, or join one of the data center online courses.

by Ivan Pepelnjak ( at November 23, 2017 09:07 AM

November 22, 2017

XKCD Comics