Introduction
Monitoring RAM usage is a key part of operating system administration, and every systems specialist should be able to interpret the output of system tools correctly. In practice, however, the numbers often spark debates among IT professionals, and additional RAM is sometimes added unnecessarily. Where do these interpretation errors come from, and how should free memory actually be monitored? That’s what I’d like to cover in this article.
The topic of RAM in the AIX system is fairly complex, so I’d like to focus on what I consider the most practical and fundamental question: “How much memory is available on a given system?”
Of course, the details of memory usage are also very important, but I want this article to help readers understand just the basic interpretation. It should serve as an introduction and an encouragement to explore more expert-level knowledge in the future.
Where do the misunderstandings come from?
The most common mistake in interpreting free memory is looking only at the “Free” value in tools like svmon, NMON, TOPAS, etc.
But doesn’t “Free” mean free memory? It does, of course, but it’s important to understand that AIX tries not to waste available memory. As a result, a significant portion is often used as Filesystem Cache.
Different tools may display available memory in slightly different ways. For example, in the older 2.4 version of the Zabbix monitoring system, the “available” memory was shown exactly the same as “free.” However, starting from version 3.0 and continuing into current versions, Zabbix reports “available” memory as “free + cached”. If you are using a monitoring system, it’s worth checking its documentation to understand what the various labels represent and how they relate to values visible at the operating system level.
What is the Filesystem Cache?
The Filesystem Cache is a mechanism that buffers data from disk storage into RAM, significantly accelerating I/O (input/output) operations. This means data doesn’t have to be read from or written to disk as frequently – and as we know, disk access is much slower than RAM.
It’s important to note that the Filesystem Cache does not take memory away from your application or database. When, for example, an application process requires memory, AIX dynamically releases the necessary portion of the cache.
In general, we shouldn’t be concerned that the Filesystem Cache occupies a large part of our memory – it simply means that memory is being used to boost I/O performance.
In the chart below, RAM usage is visualized. The dark blue area represents used memory, light blue indicates Filesystem Cache, and the black area up to the yellow line shows free memory.
In this particular case, we can see that the used memory (Used) represents only a small portion of the total RAM allocated to the LPAR. Over a broader time range, the Filesystem Cache level can be observed dynamically rising and falling. An administrator who looks only at the Free value at 3:00 a.m. might start to panic and try to expand the RAM allocated to the LPAR, fearing that the application could soon crash due to memory exhaustion. However, that will not happen in this situation, because the cache can be released on demand.
If the monitoring system triggers alerts based solely on the “Free” memory value, then – as seen in this chart – such alerts will appear and disappear dynamically. Depending on the behavior of a specific application, or for example during backup operations, the rise and fall of the Filesystem Cache can vary significantly.
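To illustrate the difference, here is a minimal sketch of an alert rule based on available memory rather than on “Free” alone. The class name, the function name, the 10% threshold, and the sample numbers are all hypothetical, and the sketch treats the whole Filesystem Cache as reclaimable, which is the same simplification used throughout this article.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    """One memory reading, all values in megabytes."""
    total_mb: float
    free_mb: float
    filecache_mb: float

    @property
    def available_mb(self) -> float:
        # Available = free pages plus file cache that can be released on demand.
        return self.free_mb + self.filecache_mb


def should_alert(sample: MemorySample, min_available_pct: float = 10.0) -> bool:
    """Return True when available memory drops below the threshold (in percent)."""
    available_pct = 100.0 * sample.available_mb / sample.total_mb
    return available_pct < min_available_pct


# Hypothetical night-time reading: very little "free", but a large file cache.
night = MemorySample(total_mb=20480, free_mb=400, filecache_mb=11000)
print(should_alert(night))  # False
```

A rule that looked only at free_mb would fire here (about 2% free), even though more than half of the RAM could still be handed to applications.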
How to interpret values presented by popular administrative tools
There are many tools available for monitoring the state of an AIX system, but in this article, I’d like to highlight those that I believe are most commonly used.
Memory usage consists of many components (real system, real user, real process, pinned, etc.), but to keep things simple, this article focuses only on free memory.
SVMON
The command svmon -G -O unit=auto allows you to display memory usage values in a human-readable format, meaning values are shown in KB/MB/GB/TB as appropriate.
In the example case, 20 GB of memory is assigned to the LPAR. If you want to know how much actual free memory you have, you should look at the “free” field, marked in yellow. If you want to see how much available memory you have, including the Filesystem Cache, refer to the “available” field, marked in green.
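If you want to pick these two fields out of the output in a script, something like the sketch below might do. It assumes the usual layout of the global report, where a header row beginning with “size” is followed directly by the “memory” row. The function name is my own, and the sample text is illustrative rather than captured from a real system.

```python
def parse_svmon_global(text: str) -> dict[str, str]:
    """Map column names to values for the 'memory' row of svmon -G output."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    for header, row in zip(lines, lines[1:]):
        # The header row starts with "size"; the row right below starts with "memory".
        if header[0] == "size" and row[0] == "memory":
            return dict(zip(header, row[1:]))
    raise ValueError("no 'memory' row found below a 'size' header")


# Illustrative sample only; the numbers are made up.
SVMON_SAMPLE = """\
Unit: auto
--------------------------------------------------------------
               size       inuse        free         pin     virtual  available   mmode
memory        20.0G       19.6G      406.7M       3.51G       7.50G      11.9G     Ded
pg space      4.00G       30.1M
"""

memory = parse_svmon_global(SVMON_SAMPLE)
print(memory["free"], memory["available"])
```

Looking columns up by name rather than by position keeps the sketch working if the order of the columns differs between AIX levels.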
TOPAS
Topas on AIX can be treated as a rough equivalent of the top command known from Linux distributions. When run without any parameters, it displays a screen similar to the one shown below (the section related to RAM is highlighted in yellow). Administrators often rely on this default screen to assess memory usage. But is this really the right approach? In my opinion, it does give a rough estimate, but it may also oversimplify the situation.
For detailed explanations of what the fields %Comp, %Noncomp, and %Client mean, I recommend checking the MEMORY section of the official documentation: https://www.ibm.com/docs/en/aix/7.3.0?topic=t-topas-command
The default Topas screen does not explicitly show free memory (but we wanted this to be simple!). To make your life easier, I recommend pressing the “M” key, or running the tool with the -M parameter, which launches the Memory Topology Panel.
In this mode, the screen will show the values for FREE and FILECACHE in human-readable units. Therefore, to calculate the available memory, simply add these two values together: (FREE + FILECACHE).
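Because the two values may be shown in different units, a small helper can do the addition. This is only a sketch: the function names are hypothetical, the input strings are made-up examples, and I am assuming binary units (1G = 1024M).

```python
UNIT_FACTORS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value: str) -> float:
    """Convert a string such as '512M' or '11.5G' to bytes."""
    value = value.strip().upper().removesuffix("B")  # accept "512MB" as well
    if value and value[-1] in UNIT_FACTORS:
        return float(value[:-1]) * UNIT_FACTORS[value[-1]]
    return float(value)  # no suffix: assumed to be plain bytes


def available_gb(free: str, filecache: str) -> float:
    """Available memory in GB, computed as FREE + FILECACHE."""
    return (to_bytes(free) + to_bytes(filecache)) / 1024**3


# Made-up values for illustration.
print(round(available_gb("512M", "11.5G"), 2))  # 12.0
```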
NMON
The NMON tool (Nigel’s Monitor) in interactive mode, after pressing the M key (for Memory), displays the following screen:
Many IT professionals, upon seeing a Used memory value above 95%, experience a spike in heart rate. That’s why it’s important to take a deep breath and calmly look to the right-hand side of the screen, where the FileSystemCache (numperm) is shown.
If we add Free (3.8%) and numperm (57.9%) together, we get the available memory, which is 61.7%.
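The same arithmetic can be written down as a short sketch. The function name is my own, and the second value it returns (what remains in use once the cache is set aside) is simply 100% minus the available share, again under the simplification that the whole cache can be released.

```python
def memory_breakdown(free_pct: float, numperm_pct: float) -> dict[str, float]:
    """Split RAM into 'available' and 'used without file cache', in percent."""
    available = free_pct + numperm_pct
    return {
        "available_pct": round(available, 1),
        "used_without_cache_pct": round(100.0 - available, 1),
    }


# Values from the NMON example above.
print(memory_breakdown(3.8, 57.9))
# {'available_pct': 61.7, 'used_without_cache_pct': 38.3}
```

So the alarming “Used” figure of over 95% shrinks to roughly 38% once the Filesystem Cache is taken into account.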
(For explanations of the remaining memory-related values in NMON, please refer to the documentation: https://www.ibm.com/docs/en/aix/7.2.0?topic=tool-memory-statistics)
Grafana + NIMON/NJMON
In my opinion, the most valuable method of monitoring memory usage is to record statistics over time, allowing them to be visualized in the form of charts. Available memory can fluctuate greatly, so a post-crash diagnosis like “this system had plenty of free memory” – just after memory was released due to application processes stopping – is unlikely to be meaningful.
It’s important to have the ability to observe not only the current state of the system, but also what happened before, allowing for a broader context and correlations with other events or resource utilization charts.
I believe that a chart like the one below is very readable, and in this particular example, it clearly shows a large portion of RAM occupied by the Filesystem Cache. If we looked only at the Real free value, it would be easy to mistakenly conclude that the system urgently needs more RAM.
In the next article, I would like to present a ready-to-use example of such a panel in Grafana.
The chart below is also a good example of how long-term observation of memory utilization can help identify a memory leak. There are cases where memory is gradually “eaten” by a process due to a software bug, over weeks or even months, until the system runs out of memory. “Constant dripping wears away the stone”, so it’s definitely worth tracking memory utilization over time and setting alert thresholds accordingly.
Can Filesystem Cache usage be tuned?
Depending on the server’s purpose, it is possible to tune the use of the Filesystem Cache, primarily by configuring parameters of the Virtual Memory Manager such as:
- minperm%
- maxperm%
- maxclient%
- lru_file_repage
It is generally recommended to leave these parameters at their default values, but there may be cases where the defaults are not optimal. It’s advisable to follow the recommendations of the software vendor for the application you’re configuring AIX to run – especially in the case of databases.
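For example, Oracle often uses its own caching mechanism, and in such cases, duplicating cache functionality at the OS level may be unnecessary. On older AIX releases such as 5.3, the lru_file_repage parameter was typically changed from its default value of 1 to 0 in these scenarios. From AIX 6.1 onward, 0 is already the default, so check the documentation for your AIX level before changing anything.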
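Before changing any of these parameters, it is worth recording their current values. The sketch below assumes that vmo -a prints one “name = value” pair per line. The function name is my own and the sample text is illustrative, so compare it with the output on your own system.

```python
def parse_vmo(text: str) -> dict[str, str]:
    """Turn 'name = value' lines (as printed by vmo -a) into a dictionary."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            settings[name.strip()] = value.strip()
    return settings


# Illustrative sample only, limited to the parameters discussed here.
VMO_SAMPLE = """\
      maxclient% = 90
        maxperm% = 90
        minperm% = 3
"""

vmo_settings = parse_vmo(VMO_SAMPLE)
for name in ("minperm%", "maxperm%", "maxclient%"):
    print(name, "=", vmo_settings[name])
```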
Where to find more detailed information?
This publication was intentionally written to be as simple, short, and understandable as possible. If you’re looking for more in-depth information on memory management and system tool usage, I strongly recommend referring to the following PDF resources:
AIX 7.3
- AIX Version 7.3: Performance management:
https://www.ibm.com/docs/en/ssw_aix_73/pdf/performance_pdf.pdf
- AIX Version 7.3: Performance Tools Guide and Reference:
https://www.ibm.com/docs/en/ssw_aix_73/pdf/performancetools_pdf.pdf
AIX 7.2
- AIX Version 7.2: Performance management:
https://www.ibm.com/docs/en/ssw_aix_72/pdf/performance_pdf.pdf
- AIX Version 7.2: Performance Tools Guide and Reference:
https://www.ibm.com/docs/en/ssw_aix_72/pdf/performancetools_pdf.pdf
Summary
The purpose of this article was to simplify the interpretation of free RAM availability and explain how to check it in a simple way. I hope the text was helpful to you and that the experts won’t be angry at me for oversimplifying things too much 🙂
I plan to describe topics such as SWAP and the components of used memory in a future, more complex publication.