How packet headers change in the Linux network protocol stack

On the receive path, each layer calls skb_pull() to strip its own protocol header; on the transmit path, each layer calls skb_push() to prepend its header. The kernel excerpts below appear to come from a 2.6.3x-era source tree.
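
A minimal sketch of what these two helpers do to the buffer, assuming a simplified model of struct sk_buff (the real kernel versions also check for underflow and handle non-linear data):

/* Simplified model for illustration only; not the kernel implementation. */
struct skb_model {
        unsigned char *head;    /* start of the allocated buffer          */
        unsigned char *data;    /* current start of the outermost header  */
        unsigned int   len;     /* valid bytes from data to the tail      */
};

/* Receive direction: consume n header bytes so data points at the next layer. */
static unsigned char *model_pull(struct skb_model *skb, unsigned int n)
{
        skb->len -= n;
        return skb->data += n;
}

/* Transmit direction: grow the packet into the headroom to prepend a header. */
static unsigned char *model_push(struct skb_model *skb, unsigned int n)
{
        skb->data -= n;
        skb->len  += n;
        return skb->data;
}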

First, the receive path. The three excerpts below show each layer removing its own header: eth_type_trans() pulls the Ethernet header, ip_local_deliver_finish() pulls the IP header, and tcp_rcv_established() pulls the TCP header before queuing the payload to the socket.

150  * eth_type_trans - determine the packet's protocol ID.
151  * @skb: received socket data
152  * @dev: receiving network device
153  *
154  * The rule here is that we
155  * assume 802.3 if the type field is short enough to be a length.
156  * This is normal practice and works for any 'now in use' protocol.
157  */
158 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
159 {
160         struct ethhdr *eth;
161         unsigned char *rawp;
162
163         skb->dev = dev;
164         skb_reset_mac_header(skb);
165         skb_pull(skb, ETH_HLEN);
166         eth = eth_hdr(skb);
167
168         if (unlikely(is_multicast_ether_addr(eth->h_dest))) {
169                 if (!compare_ether_addr_64bits(eth->h_dest, dev->broadcast))
170                         skb->pkt_type = PACKET_BROADCAST;
171                 else
172                         skb->pkt_type = PACKET_MULTICAST;
173         }
174
175         /*
176          *      This ALLMULTI check should be redundant by 1.4
177          *      so don't forget to remove it.
178          *
179          *      Seems, you forgot to remove it. All silly devices
180          *      seems to set IFF_PROMISC.
181          */
182
183         else if (1 /*dev->flags&IFF_PROMISC */ ) {
184                 if (unlikely(compare_ether_addr_64bits(eth->h_dest, dev->dev_addr)))
185                         skb->pkt_type = PACKET_OTHERHOST;
186         }
187
188         /*
189          * Some variants of DSA tagging don't have an ethertype field
190          * at all, so we check here whether one of those tagging
191          * variants has been configured on the receiving interface,
192          * and if so, set skb->protocol without looking at the packet.
193          */
194         if (netdev_uses_dsa_tags(dev))
195                 return htons(ETH_P_DSA);
196         if (netdev_uses_trailer_tags(dev))
197                 return htons(ETH_P_TRAILER);
198
199         if (ntohs(eth->h_proto) >= 1536)
200                 return eth->h_proto;
201
202         rawp = skb->data;
203
204         /*
205          *      This is a magic hack to spot IPX packets. Older Novell breaks
206          *      the protocol design and runs IPX over 802.3 without an 802.2 LLC
207          *      layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
208          *      won't work for fault tolerant netware but does for the rest.
209          */
210         if (*(unsigned short *)rawp == 0xFFFF)
211                 return htons(ETH_P_802_3);
212
213         /*
214          *      Real 802.2 LLC
215          */
216         return htons(ETH_P_802_2);
217 }
218 EXPORT_SYMBOL(eth_type_trans);
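
For context, a typical Ethernet driver hands each received frame to eth_type_trans() right after DMA completes; a hedged sketch of such a call site (not part of the excerpt above, details vary per driver, and frame_len is a placeholder for the DMA length):

skb_put(skb, frame_len);                   /* frame as written by the NIC        */
skb->protocol = eth_type_trans(skb, dev);  /* pulls ETH_HLEN, sets skb->pkt_type */
netif_rx(skb);                             /* hand the skb to the protocol stack */

At this point skb->data already points at the IP header. Once ip_rcv() and routing decide the packet is for the local host, ip_local_deliver_finish() strips the IP header: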
193 static int ip_local_deliver_finish(struct sk_buff *skb)
194 {
195         struct net *net = dev_net(skb->dev);
196
197         __skb_pull(skb, ip_hdrlen(skb));
198
199         /* Point into the IP datagram, just past the header. */
200         skb_reset_transport_header(skb);
201
202         rcu_read_lock();
203         {
204                 int protocol = ip_hdr(skb)->protocol;
205                 int hash, raw;
206                 const struct net_protocol *ipprot;
207
208         resubmit:
209                 raw = raw_local_deliver(skb, protocol);
210
211                 hash = protocol & (MAX_INET_PROTOS - 1);
212                 ipprot = rcu_dereference(inet_protos[hash]);
213                 if (ipprot != NULL) {
214                         int ret;
215
216                         if (!net_eq(net, &init_net) && !ipprot->netns_ok) {
217                                 if (net_ratelimit())
218                                         printk("%s: proto %d isn't netns-ready\n",
219                                                 __func__, protocol);
220                                 kfree_skb(skb);
221                                 goto out;
222                         }
223
224                         if (!ipprot->no_policy) {
225                                 if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
226                                         kfree_skb(skb);
227                                         goto out;
228                                 }
229                                 nf_reset(skb);
230                         }
231                         ret = ipprot->handler(skb);
232                         if (ret < 0) {
233                                 protocol = -ret;
234                                 goto resubmit;
235                         }
236                         IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
237                 } else {
238                         if (!raw) {
239                                 if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
240                                         IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
241                                         icmp_send(skb, ICMP_DEST_UNREACH,
242                                                   ICMP_PROT_UNREACH, 0);
243                                 }
244                         } else
245                                 IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
246                         kfree_skb(skb);
247                 }
248         }
249  out:
250         rcu_read_unlock();
251
252         return 0;
253 }
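
The ipprot->handler(skb) call above dispatches through the inet_protos[] hash; for TCP it ends up in tcp_v4_rcv(). A hedged sketch of how that handler is registered (based on net/ipv4/af_inet.c in kernels of this era; only a subset of the fields is shown):

static const struct net_protocol tcp_protocol = {
        .handler        = tcp_v4_rcv,   /* what ipprot->handler() resolves to */
        .err_handler    = tcp_v4_err,
        .no_policy      = 1,
        .netns_ok       = 1,
};

/* in inet_init() */
inet_add_protocol(&tcp_protocol, IPPROTO_TCP);

tcp_v4_rcv() locates the socket and, for connections in the ESTABLISHED state, calls tcp_rcv_established(), which removes the TCP header before queuing the payload: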
5224 int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
5225                         struct tcphdr *th, unsigned len)
5226 {
5227         struct tcp_sock *tp = tcp_sk(sk);
5228         int res;
5229
5230         /*
5231          *      Header prediction.
5232          *      The code loosely follows the one in the famous
5233          *      "30 instruction TCP receive" Van Jacobson mail.
5234          *
5235          *      Van's trick is to deposit buffers into socket queue
5236          *      on a device interrupt, to call tcp_recv function
5237          *      on the receive process context and checksum and copy
5238          *      the buffer to user space. smart...
5239          *
5240          *      Our current scheme is not silly either but we take the
5241          *      extra cost of the net_bh soft interrupt processing...
5242          *      We do checksum and copy also but from device to kernel.
5243          */
5244
5245         tp->rx_opt.saw_tstamp = 0;
5246
5247         /*      pred_flags is 0xS?10 << 16 + snd_wnd
5248          *      if header_prediction is to be made
5249          *      'S' will always be tp->tcp_header_len >> 2
5250          *      '?' will be 0 for the fast path, otherwise pred_flags is 0 to
5251          *  turn it off (when there are holes in the receive
5252          *       space for instance)
5253          *      PSH flag is ignored.
5254          */
5255
5256         if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
5257             TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
5258             !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
5259                 int tcp_header_len = tp->tcp_header_len;
5260
5261                 /* Timestamp header prediction: tcp_header_len
5262                  * is automatically equal to th->doff*4 due to pred_flags
5263                  * match.
5264                  */
5265
5266                 /* Check timestamp */
5267                 if (tcp_header_len == sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) {
5268                         /* No? Slow path! */
5269                         if (!tcp_parse_aligned_timestamp(tp, th))
5270                                 goto slow_path;
5271
5272                         /* If PAWS failed, check it more carefully in slow path */
5273                         if ((s32)(tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent) < 0)
5274                                 goto slow_path;
5275
5276                         /* DO NOT update ts_recent here, if checksum fails
5277                          * and timestamp was corrupted part, it will result
5278                          * in a hung connection since we will drop all
5279                          * future packets due to the PAWS test.
5280                          */
5281                 }
5282
5283                 if (len <= tcp_header_len) {
5284                         /* Bulk data transfer: sender */
5285                         if (len == tcp_header_len) {
5286                                 /* Predicted packet is in window by definition.
5287                                  * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5288                                  * Hence, check seq<=rcv_wup reduces to:
5289                                  */
5290                                 if (tcp_header_len ==
5291                                     (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
5292                                     tp->rcv_nxt == tp->rcv_wup)
5293                                         tcp_store_ts_recent(tp);
5294
5295                                 /* We know that such packets are checksummed
5296                                  * on entry.
5297                                  */
5298                                 tcp_ack(sk, skb, 0);
5299                                 __kfree_skb(skb);
5300                                 tcp_data_snd_check(sk);
5301                                 return 0;
5302                         } else { /* Header too small */
5303                                 TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);
5304                                 goto discard;
5305                         }
5306                 } else {
5307                         int eaten = 0;
5308                         int copied_early = 0;
5309
5310                         if (tp->copied_seq == tp->rcv_nxt &&
5311                             len - tcp_header_len <= tp->ucopy.len) {
5312 #ifdef CONFIG_NET_DMA
5313                                 if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
5314                                         copied_early = 1;
5315                                         eaten = 1;
5316                                 }
5317 #endif
5318                                 if (tp->ucopy.task == current &&
5319                                     sock_owned_by_user(sk) && !copied_early) {
5320                                         __set_current_state(TASK_RUNNING);
5321
5322                                         if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
5323                                                 eaten = 1;
5324                                 }
5325                                 if (eaten) {
5326                                         /* Predicted packet is in window by definition.
5327                                          * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5328                                          * Hence, check seq<=rcv_wup reduces to:
5329                                          */
5330                                         if (tcp_header_len ==
5331                                             (sizeof(struct tcphdr) +
5332                                              TCPOLEN_TSTAMP_ALIGNED) &&
5333                                             tp->rcv_nxt == tp->rcv_wup)
5334                                                 tcp_store_ts_recent(tp);
5335
5336                                         tcp_rcv_rtt_measure_ts(sk, skb);
5337
5338                                         __skb_pull(skb, tcp_header_len);
5339                                         tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
5340                                         NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
5341                                 }
5342                                 if (copied_early)
5343                                         tcp_cleanup_rbuf(sk, skb->len);
5344                         }
5345                         if (!eaten) {
5346                                 if (tcp_checksum_complete_user(sk, skb))
5347                                         goto csum_error;
5348
5349                                 /* Predicted packet is in window by definition.
5350                                  * seq == rcv_nxt and rcv_wup <= rcv_nxt.
5351                                  * Hence, check seq<=rcv_wup reduces to:
5352                                  */
5353                                 if (tcp_header_len ==
5354                                     (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
5355                                     tp->rcv_nxt == tp->rcv_wup)
5356                                         tcp_store_ts_recent(tp);
5357
5358                                 tcp_rcv_rtt_measure_ts(sk, skb);
5359
5360                                 if ((int)skb->truesize > sk->sk_forward_alloc)
5361                                         goto step5;
5362
5363                                 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS);
5364
5365                                 /* Bulk data transfer: receiver */
5366                                 __skb_pull(skb, tcp_header_len);
5367                                 __skb_queue_tail(&sk->sk_receive_queue, skb);
5368                                 skb_set_owner_r(skb, sk);
5369                                 tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
5370                         }
5371
5372                         tcp_event_data_recv(sk, skb);
5373
5374                         if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) {
5375                                 /* Well, only one small jumplet in fast path... */
5376                                 tcp_ack(sk, skb, FLAG_DATA);
5377                                 tcp_data_snd_check(sk);
5378                                 if (!inet_csk_ack_scheduled(sk))
5379                                         goto no_ack;
5380                         }
5381
5382                         if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
5383                                 __tcp_ack_snd_check(sk, 0);
5384 no_ack:
5385 #ifdef CONFIG_NET_DMA
5386                         if (copied_early)
5387                                 __skb_queue_tail(&sk->sk_async_wait_queue, skb);
5388                         else
5389 #endif
5390                         if (eaten)
5391                                 __kfree_skb(skb);
5392                         else
5393                                 sk->sk_data_ready(sk, 0);
5394                         return 0;
5395                 }
5396         }
5397
5398 slow_path:
5399         if (len < (th->doff << 2) || tcp_checksum_complete_user(sk, skb))
5400                 goto csum_error;
5401
5402         /*
5403          *      Standard slow path.
5404          */
5405
5406         res = tcp_validate_incoming(sk, skb, th, 1);
5407         if (res <= 0)
5408                 return -res;
5409
5410 step5:
5411         if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
5412                 goto discard;
5413
5414         tcp_rcv_rtt_measure_ts(sk, skb);
5415
5416         /* Process urgent data. */
5417         tcp_urg(sk, skb, th);
5418
5419         /* step 7: process the segment text */
5420         tcp_data_queue(sk, skb);
5421
5422         tcp_data_snd_check(sk);
5423         tcp_ack_snd_check(sk);
5424         return 0;
5425
5426 csum_error:
5427         TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);
5428
5429 discard:
5430         __kfree_skb(skb);
5431         return 0;
5432 }
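
The pred_flags value tested at the top of the fast path is precomputed whenever the fast path is (re)enabled. A hedged sketch of that helper, as found in include/net/tcp.h of the same era:

static inline void __tcp_fast_path_on(struct tcp_sock *tp, u32 snd_wnd)
{
        /* doff in the top 4 bits, the ACK flag set, and the expected (unscaled)
         * receive window in the low 16 bits: the "0xS?10 << 16 + snd_wnd" layout
         * described in the comment inside tcp_rcv_established() above. */
        tp->pred_flags = htonl((tp->tcp_header_len << 26) |
                               ntohl(TCP_FLAG_ACK) |
                               snd_wnd);
}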

Now for the transmit path, where headers are filled in the opposite order: tcp_transmit_skb() pushes the TCP header, ip_queue_xmit() pushes the IP header, and ip_finish_output2() hands the packet to the neighbour layer, which prepends the Ethernet header (from the cached header in neigh_hh_output(), or built by eth_header() via neigh_resolve_output()).

777 /* This routine actually transmits TCP packets queued in by
778  * tcp_do_sendmsg().  This is used by both the initial
779  * transmission and possible later retransmissions.
780  * All SKB's seen here are completely headerless.  It is our
781  * job to build the TCP header, and pass the packet down to
782  * IP so it can do the same plus pass the packet off to the
783  * device.
784  *
785  * We are working here with either a clone of the original
786  * SKB, or a fresh unique copy made by the retransmit engine.
787  */
788 static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
789                             gfp_t gfp_mask)
790 {
791         const struct inet_connection_sock *icsk = inet_csk(sk);
792         struct inet_sock *inet;
793         struct tcp_sock *tp;
794         struct tcp_skb_cb *tcb;
795         struct tcp_out_options opts;
796         unsigned tcp_options_size, tcp_header_size;
797         struct tcp_md5sig_key *md5;
798         struct tcphdr *th;
799         int err;
800
801         BUG_ON(!skb || !tcp_skb_pcount(skb));
802
803         /* If congestion control is doing timestamping, we must
804          * take such a timestamp before we potentially clone/copy.
805          */
806         if (icsk->icsk_ca_ops->flags & TCP_CONG_RTT_STAMP)
807                 __net_timestamp(skb);
808
809         if (likely(clone_it)) {
810                 if (unlikely(skb_cloned(skb)))
811                         skb = pskb_copy(skb, gfp_mask);
812                 else
813                         skb = skb_clone(skb, gfp_mask);
814                 if (unlikely(!skb))
815                         return -ENOBUFS;
816         }
817
818         inet = inet_sk(sk);
819         tp = tcp_sk(sk);
820         tcb = TCP_SKB_CB(skb);
821         memset(&opts, 0, sizeof(opts));
822
823         if (unlikely(tcb->flags & TCPCB_FLAG_SYN))
824                 tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
825         else
826                 tcp_options_size = tcp_established_options(sk, skb, &opts,
827                                                            &md5);
828         tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
829
830         if (tcp_packets_in_flight(tp) == 0)
831                 tcp_ca_event(sk, CA_EVENT_TX_START);
832
833         skb_push(skb, tcp_header_size);
834         skb_reset_transport_header(skb);
835         skb_set_owner_w(skb, sk);
836
837         /* Build TCP header and checksum it. */
838         th = tcp_hdr(skb);
839         th->source              = inet->inet_sport;
840         th->dest                = inet->inet_dport;
841         th->seq                 = htonl(tcb->seq);
842         th->ack_seq             = htonl(tp->rcv_nxt);
843         *(((__be16 *)th) + 6)   = htons(((tcp_header_size >> 2) << 12) |
844                                         tcb->flags);
845
846         if (unlikely(tcb->flags & TCPCB_FLAG_SYN)) {
847                 /* RFC1323: The window in SYN & SYN/ACK segments
848                  * is never scaled.
849                  */
850                 th->window      = htons(min(tp->rcv_wnd, 65535U));
851         } else {
852                 th->window      = htons(tcp_select_window(sk));
853         }
854         th->check               = 0;
855         th->urg_ptr             = 0;
856
857         /* The urg_mode check is necessary during a below snd_una win probe */
858         if (unlikely(tcp_urg_mode(tp) && before(tcb->seq, tp->snd_up))) {
859                 if (before(tp->snd_up, tcb->seq + 0x10000)) {
860                         th->urg_ptr = htons(tp->snd_up - tcb->seq);
861                         th->urg = 1;
862                 } else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
863                         th->urg_ptr = 0xFFFF;
864                         th->urg = 1;
865                 }
866         }
867
868         tcp_options_write((__be32 *)(th + 1), tp, &opts);
869         if (likely((tcb->flags & TCPCB_FLAG_SYN) == 0))
870                 TCP_ECN_send(sk, skb, tcp_header_size);
871
872 #ifdef CONFIG_TCP_MD5SIG
873         /* Calculate the MD5 hash, as we have all we need now */
874         if (md5) {
875                 sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
876                 tp->af_specific->calc_md5_hash(opts.hash_location,
877                                                md5, sk, NULL, skb);
878         }
879 #endif
880
881         icsk->icsk_af_ops->send_check(sk, skb->len, skb);
882
883         if (likely(tcb->flags & TCPCB_FLAG_ACK))
884                 tcp_event_ack_sent(sk, tcp_skb_pcount(skb));
885
886         if (skb->len != tcp_header_size)
887                 tcp_event_data_sent(tp, skb, sk);
888
889         if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
890                 TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
891
892         err = icsk->icsk_af_ops->queue_xmit(skb, 0);
893         if (likely(err <= 0))
894                 return err;
895
896         tcp_enter_cwr(sk, 1);
897
898         return net_xmit_eval(err);
899 }
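
The skb_push(skb, tcp_header_size) above never needs to reallocate, because TCP transmit buffers are allocated with headroom for every lower layer up front. A hedged sketch of that allocation pattern (simplified; the real sk_stream_alloc_skb() also handles fclones and socket memory accounting):

skb = alloc_skb(MAX_TCP_HEADER + payload_len, GFP_KERNEL);
if (!skb)
        return -ENOBUFS;
skb_reserve(skb, MAX_TCP_HEADER);   /* headroom for TCP + IP + link-layer headers */
skb_put(skb, payload_len);          /* application data goes at the tail          */
/* tcp_transmit_skb() later pushes the TCP header into this headroom,
 * ip_queue_xmit() pushes the IP header, and the neighbour layer pushes the
 * Ethernet header, all without copying the payload. */

tcp_transmit_skb() then passes the segment to the network layer through icsk->icsk_af_ops->queue_xmit(), which for IPv4 is ip_queue_xmit():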
314 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
315 {
316         struct sock *sk = skb->sk;
317         struct inet_sock *inet = inet_sk(sk);
318         struct ip_options *opt = inet->opt;
319         struct rtable *rt;
320         struct iphdr *iph;
321
322         /* Skip all of this if the packet is already routed,
323          * f.e. by something like SCTP.
324          */
325         rt = skb_rtable(skb);
326         if (rt != NULL)
327                 goto packet_routed;
328
329         /* Make sure we can route this packet. */
330         rt = (struct rtable *)__sk_dst_check(sk, 0);
331         if (rt == NULL) {
332                 __be32 daddr;
333
334                 /* Use correct destination address if we have options. */
335                 daddr = inet->inet_daddr;
336                 if(opt && opt->srr)
337                         daddr = opt->faddr;
338
339                 {
340                         struct flowi fl = { .oif = sk->sk_bound_dev_if,
341                                             .mark = sk->sk_mark,
342                                             .nl_u = { .ip4_u =
343                                                       { .daddr = daddr,
344                                                         .saddr = inet->inet_saddr,
345                                                         .tos = RT_CONN_FLAGS(sk) } },
346                                             .proto = sk->sk_protocol,
347                                             .flags = inet_sk_flowi_flags(sk),
348                                             .uli_u = { .ports =
349                                                        { .sport = inet->inet_sport,
350                                                          .dport = inet->inet_dport } } };
351
352                         /* If this fails, retransmit mechanism of transport layer will
353                          * keep trying until route appears or the connection times
354                          * itself out.
355                          */
356                         security_sk_classify_flow(sk, &fl);
357                         if (ip_route_output_flow(sock_net(sk), &rt, &fl, sk, 0))
358                                 goto no_route;
359                 }
360                 sk_setup_caps(sk, &rt->u.dst);
361         }
362         skb_dst_set(skb, dst_clone(&rt->u.dst));
363
364 packet_routed:
365         if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
366                 goto no_route;
367
368         /* OK, we know where to send it, allocate and build IP header. */
369         skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
370         skb_reset_network_header(skb);
371         iph = ip_hdr(skb);
372         *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
373         if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)
374                 iph->frag_off = htons(IP_DF);
375         else
376                 iph->frag_off = 0;
377         iph->ttl      = ip_select_ttl(inet, &rt->u.dst);
378         iph->protocol = sk->sk_protocol;
379         iph->saddr    = rt->rt_src;
380         iph->daddr    = rt->rt_dst;
381         /* Transport layer set skb->h.foo itself. */
382
383         if (opt && opt->optlen) {
384                 iph->ihl += opt->optlen >> 2;
385                 ip_options_build(skb, opt, inet->inet_daddr, rt, 0);
386         }
387
388         ip_select_ident_more(iph, &rt->u.dst, sk,
389                              (skb_shinfo(skb)->gso_segs ?: 1) - 1);
390
391         skb->priority = sk->sk_priority;
392         skb->mark = sk->sk_mark;
393
394         return ip_local_out(skb);
395
396 no_route:
397         IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
398         kfree_skb(skb);
399         return -EHOSTUNREACH;
400 }
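
The packed 16-bit store *((__be16 *)iph) = htons((4 << 12) | (5 << 8) | ...) above fills three IP header fields at once; a field-by-field equivalent for readability:

iph->version = 4;           /* IPv4                                       */
iph->ihl     = 5;           /* 5 * 4 = 20 bytes; IP options grow it below */
iph->tos     = inet->tos;

From ip_local_out() the packet passes through dst_output() and ip_output()/ip_finish_output() (netfilter hooks, fragmentation and GSO handling omitted) until it reaches ip_finish_output2(), where the link-layer header is finally added: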
178 static inline int ip_finish_output2(struct sk_buff *skb)
179 {
180         struct dst_entry *dst = skb_dst(skb);
181         struct rtable *rt = (struct rtable *)dst;
182         struct net_device *dev = dst->dev;
183         unsigned int hh_len = LL_RESERVED_SPACE(dev);
184
185         if (rt->rt_type == RTN_MULTICAST) {
186                 IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTMCAST, skb->len);
187         } else if (rt->rt_type == RTN_BROADCAST)
188                 IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTBCAST, skb->len);
189
190         /* Be paranoid, rather than too clever. */
191         if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
192                 struct sk_buff *skb2;
193
194                 skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
195                 if (skb2 == NULL) {
196                         kfree_skb(skb);
197                         return -ENOMEM;
198                 }
199                 if (skb->sk)
200                         skb_set_owner_w(skb2, skb->sk);
201                 kfree_skb(skb);
202                 skb = skb2;
203         }
204
205         if (dst->hh)
206                 return neigh_hh_output(dst->hh, skb);   // cached hardware header: prepends the Ethernet header
207         else if (dst->neighbour)
208                 return dst->neighbour->output(skb);     // typically neigh_resolve_output(): prepends the Ethernet header
209
210         if (net_ratelimit())
211                 printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
212         kfree_skb(skb);
213         return -EINVAL;
214 }
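
Which branch runs above depends on whether a hardware-header cache has already been built for this destination. For IPv4 over Ethernet the neighbour operations look roughly like this (hedged sketch based on net/ipv4/arp.c of the same era; only the relevant fields are shown):

static const struct neigh_ops arp_hh_ops = {
        .family           = AF_INET,
        .output           = neigh_resolve_output,  /* dst->neighbour->output above        */
        .connected_output = neigh_resolve_output,
        .hh_output        = dev_queue_xmit,        /* hh->hh_output once the cache exists */
        .queue_xmit       = dev_queue_xmit,
};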
Ethernet-header fill path #1: using the cached hardware header.

302 static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
303 {
304         unsigned seq;
305         int hh_len;
306
307         do {
308                 int hh_alen;
309
310                 seq = read_seqbegin(&hh->hh_lock);
311                 hh_len = hh->hh_len;
312                 hh_alen = HH_DATA_ALIGN(hh_len);
313                 memcpy(skb->data - hh_alen, hh->hh_data, hh_alen);
314         } while (read_seqretry(&hh->hh_lock, seq));
315
316         skb_push(skb, hh_len);
317         return hh->hh_output(skb);
318 }
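
A worked example of the copy above, assuming plain Ethernet (ETH_HLEN == 14) and the usual HH_DATA_MOD of 16:

hh_len  = 14;                               /* ETH_HLEN                                */
hh_alen = HH_DATA_ALIGN(hh_len);            /* rounds up to 16                         */
memcpy(skb->data - 16, hh->hh_data, 16);    /* 2 pad bytes plus the cached 14-byte
                                             * header, which ends exactly at skb->data */
skb_push(skb, 14);                          /* expose just the Ethernet header         */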
Ethernet-header fill path #2: no cached header yet, so the neighbour entry is resolved and the header is built through dev_hard_header().

1195 int neigh_resolve_output(struct sk_buff *skb)
1196 {
1197         struct dst_entry *dst = skb_dst(skb);
1198         struct neighbour *neigh;
1199         int rc = 0;
1200
1201         if (!dst || !(neigh = dst->neighbour))
1202                 goto discard;
1203
1204         __skb_pull(skb, skb_network_offset(skb));
1205
1206         if (!neigh_event_send(neigh, skb)) {
1207                 int err;
1208                 struct net_device *dev = neigh->dev;
1209                 if (dev->header_ops->cache && !dst->hh) {
1210                         write_lock_bh(&neigh->lock);
1211                         if (!dst->hh)
1212                                 neigh_hh_init(neigh, dst, dst->ops->protocol);
1213                         err = dev_hard_header(skb, dev, ntohs(skb->protocol),
1214                                               neigh->ha, NULL, skb->len);
1215                         write_unlock_bh(&neigh->lock);
1216                 } else {
1217                         read_lock_bh(&neigh->lock);
1218                         err = dev_hard_header(skb, dev, ntohs(skb->protocol),
1219                                               neigh->ha, NULL, skb->len);
1220                         read_unlock_bh(&neigh->lock);
1221                 }
1222                 if (err >= 0)
1223                         rc = neigh->ops->queue_xmit(skb);
1224                 else
1225                         goto out_kfree_skb;
1226         }
1227 out:
1228         return rc;
1229 discard:
1230         NEIGH_PRINTK1("neigh_resolve_output: dst=%p neigh=%p\n",
1231                       dst, dst ? dst->neighbour : NULL);
1232 out_kfree_skb:
1233         rc = -EINVAL;
1234         kfree_skb(skb);
1235         goto out;
1236 }
1237 EXPORT_SYMBOL(neigh_resolve_output);
1280 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
1281                                   unsigned short type,
1282                                   const void *daddr, const void *saddr,
1283                                   unsigned len)
1284 {
1285         if (!dev->header_ops || !dev->header_ops->create)
1286                 return 0;
1287
1288         return dev->header_ops->create(skb, dev, type, daddr, saddr, len); // for Ethernet this calls eth_header()
1289 }
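
For Ethernet devices, dev->header_ops is installed by ether_setup(), so header_ops->create above resolves to eth_header() below (hedged sketch based on net/ethernet/eth.c of the same era; subset of the fields):

const struct header_ops eth_header_ops ____cacheline_aligned = {
        .create = eth_header,        /* called through dev_hard_header() above       */
        .parse  = eth_header_parse,
        .cache  = eth_header_cache,  /* builds the hh_data used by neigh_hh_output() */
};

/* in ether_setup() */
dev->header_ops = &eth_header_ops;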
 79 int eth_header(struct sk_buff *skb, struct net_device *dev,
 80                unsigned short type,
 81                const void *daddr, const void *saddr, unsigned len)
 82 {
 83         struct ethhdr *eth = (struct ethhdr *)skb_push(skb, ETH_HLEN);
 84
 85         if (type != ETH_P_802_3 && type != ETH_P_802_2)
 86                 eth->h_proto = htons(type);
 87         else
 88                 eth->h_proto = htons(len);
 89
 90         /*
 91          *      Set the source hardware address.
 92          */
 93
 94         if (!saddr)
 95                 saddr = dev->dev_addr;
 96         memcpy(eth->h_source, saddr, ETH_ALEN);
 97
 98         if (daddr) {
 99                 memcpy(eth->h_dest, daddr, ETH_ALEN);
100                 return ETH_HLEN;
101         }
102
103         /*
104          *      Anyway, the loopback-device should never use this function...
105          */
106
107         if (dev->flags & (IFF_LOOPBACK | IFF_NOARP)) {
108                 memset(eth->h_dest, 0, ETH_ALEN);
109                 return ETH_HLEN;
110         }
111
112         return -ETH_HLEN;
113 }
114 EXPORT_SYMBOL(eth_header);
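
Putting the transmit side together, the net effect on skb->data is (headroom assumed to have been reserved when the buffer was allocated):

  after tcp_transmit_skb():              skb->data -> [TCP hdr][payload]
  after ip_queue_xmit():                 skb->data -> [IP hdr][TCP hdr][payload]
  after neigh_hh_output()/eth_header():  skb->data -> [ETH hdr][IP hdr][TCP hdr][payload]

The finished frame is then handed to dev_queue_xmit() and the device driver. On the receive side the same headers are removed in the opposite order by the skb_pull() calls shown at the start of this article.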